From Checkbox to Textbox

Capturing Nuanced Public Opinion with Large Language Models

Laurence-Olivier M. Foisy
Hubert Cadieux
Étienne Proulx
Camille Pelletier
Yannick Dufresne

Université Laval

June 29, 2025

The Challenge of Open-Ended Data

"If it is possible to group stimuli in almost any conceivable manner and to classify and subclassify them indefinitely, it is strictly true that the number of attitudes which any given person possesses is almost infinite." (Likert 1932)

  • Open-ended questions offer more nuance but are notoriously difficult to analyze at scale.
  • As a result, open-ended questions are often left unanalyzed.
  • Is there a trade-off between the qualitative richness of open-ended text and the quantitative scalability of closed-ended items?
  • Can we resolve this long-standing methodological tension?

Why Use Open-Ended Questions?

  • Reduce Bias: Avoids the cueing bias common in closed-ended formats. (Iyengar 1996)

  • Capture Nuance: Uncovers the full spectrum of opinion, including detailed, complex, and unexpected responses.

  • Detect Emerging Issues before they become salient in public discourse.

  • Create Durable Data: Raw text can be re-analyzed with new methods and theories, increasing the long-term value of the survey. (Roberts et al. 2014)

Using Large Language Models

The literature raises valid concerns about transparency, reproducibility, and accuracy (the “black box” problem). However,

  • LLMs present a potential solution, capable of understanding and categorizing text.
  • LLMs can replace expert human coders in many political science applications. (Benoit et al. 2025; Wu 2025; Le Mens and Gallego 2025)
  • Are LLMs replicable?
  • Are LLMs scalable?
  • Are LLMs reliable?
  • Are LLMs multilingual?

Questions

  1. Can LLMs accurately and efficiently process open-ended responses for quantitative analysis?

  2. Can LLM-cleaned open-ended questions measure the same latent constructs as traditional closed-ended questions?

  3. Can LLM analysis of open-ended responses reveal insights that closed-ended questions fundamentally cannot capture?

Survey Design

  • Design: Two treatment groups:
    • Group 1: Traditional closed-ended questions
    • Group 2: The same questions in open-ended format
  • 20 Questions:
    • 7 Socioeconomic
    • 1 Vote Intention
    • 7 Environmental Support Attitudes
    • 5 Anti-Immigration Attitudes
    • +1 Open-ended question about survey appreciation
  • Characteristics:
    • Datagotchi respondents
    • n = 1,685 for the Open-ended group
    • n = 1,687 for the Closed-ended group

Awkward Open-Ended Questions

How much should the federal government spend on the environment?

How much should the federal government spend on immigrants and minorities?

How many immigrants should the country admit?

Two-Step LLM Prompting Process

A shell script orchestrates two steps:

  • Step 1: Create custom prompts. Gemini 2.5 Pro generates 20 prompts, one per variable.
  • Step 2: Code responses. Gemini 2.0 Flash Lite runs in 20 R processes (one per variable), each issuing 10 parallel requests per response; the modal response across requests is kept as the consensus code.
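A minimal Python sketch of the consensus logic in Step 2 (the actual pipeline ran R processes; `llm_call` here is a stand-in for a real API call to the coding model):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def code_open_response(llm_call, prompt, n_requests=10):
    """Send the same coding prompt n_requests times in parallel and keep
    the modal (consensus) answer.

    llm_call is a placeholder for whatever function sends one prompt to
    the coding model (Gemini 2.0 Flash Lite in the pipeline above) and
    returns the numeric code as a string."""
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        codes = list(pool.map(llm_call, [prompt] * n_requests))
    winner, count = Counter(codes).most_common(1)[0]
    return winner, count / n_requests  # consensus code + agreement share
```

With 10 requests per response, the agreement share gives a per-response analogue of the overall consensus rate reported later.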

Prompt

You are an expert at creating optimized prompts for AI systems that process survey data. Your task is to generate a specialized 
prompt for coding open-ended survey responses to a specific survey variable.

## Variable Information:
- Variable name: {variable_name}
- Question text: {question_text}
- Variable type: {variable_type}
- Domain/topic: {variable_domain}
- Sample response values: {sample_values}

## Response Categories/Options:
{categories_info}

## Sample Open-ended Responses (if available):
{sample_responses}

## Language Information:
{language_info}
IMPORTANT: This is a bilingual survey (French and English). Responses may be in either language.

## Your Task:
Generate an optimal prompt consisting of TWO parts:

### PART 1: SYSTEM MESSAGE
Create a system message that:
- Defines the AI assistant's role and expertise for this specific variable type
- Explains the task clearly (mapping open responses to codes)
- Provides domain-specific guidance relevant to this variable's topic
- Emphasizes returning only the numeric code
- Includes any special considerations for this variable type
- CRITICAL: Explicitly mentions handling both French and English responses
- Provides key French translations for common responses (oui=yes, non=no, etc.)
- Warns against coding valid French responses as Don't know

### PART 2: USER TEMPLATE
Create a user message template that:
- Uses placeholder variables: {{variable_name}}, {{question_text}}, {{options_block}}, {{open_response}}
- Is formatted clearly for easy reading
- Includes appropriate context for this variable type
- Follows this general structure but adapt the labels/sections as needed:

Variable: {{variable_name}}
Question: {{question_text}}

[Appropriate section title for the options]:
{{options_block}}

Open-ended response:
"{{open_response}}"

## Requirements:
- Be specific to this variable's domain and characteristics
- Consider the types of responses likely for this question
- Optimize for accuracy in mapping responses to the correct codes
- Keep instructions clear and concise
- Ensure the prompt will work well for the sample responses shown
- MUST handle bilingual responses correctly (French and English)
- Include guidance on common French political terms if relevant (e.g., Libéral = Liberal Party)
- Provide clear French-English mappings for agreement/disagreement terms

## Output Format:
IMPORTANT: Return ONLY valid JSON in this exact format. Do not include any other text or explanation:

{{
  "system_message": "Your system message here...",
  "user_template": "Your user template here..."
}}

Generate the optimized prompt now:
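Note the two brace styles in the meta-prompt above: single braces ({variable_name}) are filled when the meta-prompt itself is built, while double braces ({{variable_name}}) survive that first substitution so the generated prompt can be filled later, once per response. A minimal Python illustration of the same escaping pattern:

```python
# Two-stage substitution: {sample_values} is filled when the meta-prompt
# is built; {{open_response}} is escaped so it survives until coding time.
meta = 'Sample values: {sample_values}\nOpen-ended response: "{{open_response}}"'

generated = meta.format(sample_values="1, 4, 5, 6")      # stage 1: build the prompt
coded = generated.format(open_response="Beaucoup plus")  # stage 2: one fill per response

print(coded)
```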

Custom AI-Generated Prompt

Question: How much should the federal government spend on the environment?

### System Message

You are an expert AI assistant specializing in coding survey data for political science and public opinion research. Your task is to 
analyze an open-ended survey response and assign the single most appropriate numeric code based on the provided categories. 
The response is for a question about federal government spending on the environment.

## Task Instructions:

1.  **Analyze the Response**: Carefully read the open-ended response to understand the respondent's opinion on the level of 
environmental spending.

2.  **Identify Sentiment**: Determine if the respondent wants to spend MORE, LESS, or maintain the SAME level of spending.

3.  **Assign Code**: Select the single numeric code that best corresponds to this sentiment.

4.  **Output**: Return ONLY the single numeric code. Do not provide any explanation, text, or punctuation.

## Critical Considerations for this Variable:
-   **Bilingual Survey**: This is a bilingual survey. Responses may be in English or French (FR-CA). You MUST correctly interpret both 
languages. Do NOT code a valid French response as 'Don't know' (6).
-   **Focus on Quantity**: The core of the question is about the *amount* of spending. Look for words indicating an increase, decrease, 
or maintenance of the current level.
-   **Typo in Options**: Note that the option 'Spend about the same as snow' contains a typo and should be interpreted as 'Spend about 
the same as now'.

## French Language Guidance:
To ensure accuracy, use the following French-to-English translations for common terms related to this question:

**More / Increase:**
-   `Plus` = More
-   `Augmenter` / `Augmentation` = Increase
-   `Davantage` = More
-   `Beaucoup plus` = A lot more
-   `Plus qu'en ce moment` = More than right now

**Less / Decrease:**
-   `Moins` = Less
-   `Diminuer` / `Réduire` = Decrease / Reduce
-   `Beaucoup moins` = A lot less

**Same / Maintain:**
-   `Pareil` / `La même chose` = The same
-   `Comme maintenant` = Like now
-   `Le même montant` = The same amount
-   `Garder le même` = Keep the same

**Don't Know / Refusal:**
-   `Je ne sais pas` / `Sais pas` / `NSP` = I don't know
-   `Aucune idée` = No idea

If the response is ambiguous, expresses no opinion, or is a clear 'don't know' or refusal to answer (in either language), use code 6.

### User Template

Variable: {variable_name}
Question: {question_text}

Response Options & Codes:
{options_block}

Open-ended response:
"{open_response}"

### Response Options

1: Spend less
4: Spend about the same as snow
5: Spend more
6: Don't know/Prefer not to answer
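To make the French-to-English mappings concrete, here is a toy keyword lookup in Python. The actual coding is done by the LLM; this rule-based fallback is purely illustrative:

```python
# Illustrative only: mirrors the French/English keyword mappings from the
# system message above. The real coding is done by the LLM.
KEYWORDS = {
    1: ["less", "moins", "diminuer", "réduire"],        # Spend less
    4: ["same", "pareil", "comme maintenant", "même"],  # Spend about the same as now
    5: ["more", "plus", "augmenter", "davantage"],      # Spend more
}

def keyword_code(response, dont_know=6):
    """Return the first matching spending code, or 6 for don't know/refusal."""
    text = response.lower()
    for code, words in KEYWORDS.items():
        if any(word in text for word in words):
            return code
    return dont_know
```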

LLM Characteristics

  • Models:

    • Custom AI prompts: Gemini 2.5 Pro
      • Price: $1.25 input, $10.00 output per 1M tokens
      • Rate limit: 150 RPM
    • Main analysis: Gemini 2.0 Flash Lite
      • Price: $0.075 input, $0.30 output per 1M tokens
      • Rate limit: 4000 RPM
  • Parameters: Default parameters (Salimian et al. 2025)

  • 223,639 Prompts total

  • Cost:

    • Around $0.023 per 1,000 responses
    • Total cost: under $5
  • Time: About 55 minutes

  • Consensus: 92.7%
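A back-of-envelope check that these prices and volumes are consistent with a total under $5. The per-prompt token counts below are assumptions for illustration, not reported figures:

```python
# Hypothetical averages: ~150 input and ~5 output tokens per prompt.
PROMPTS = 223_639
IN_TOKENS, OUT_TOKENS = 150, 5        # assumed, not reported
IN_PRICE, OUT_PRICE = 0.075, 0.30     # USD per 1M tokens (Gemini 2.0 Flash Lite)

cost = PROMPTS * (IN_TOKENS * IN_PRICE + OUT_TOKENS * OUT_PRICE) / 1_000_000
print(f"Estimated total: ${cost:.2f}")  # about $2.85, i.e. under $5
```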

Appreciation

18% of respondents complained about the difficulty of answering numerical open-ended questions.

Discussion & Implications

  • Open-ended questions can be effectively cleaned and analyzed using LLMs.
  • The results show that LLM-cleaned open-ended questions can measure the same latent constructs as traditional closed-ended questions.
  • The approach still struggles with nuanced or ambiguous responses.

Merci!

Bibliography

Benoit, Kenneth, Scott De Marchi, Conor Laver, Michael Laver, and Jinshuai Ma. 2025. “Replacing Experts with LLMs When Analyzing Political Texts.” In NASP International and Interdisciplinary Seminars. University of Milan.
Iyengar, Shanto. 1996. “Framing Responsibility for Political Issues.” Annals of the American Academy of Political and Social Science 546: 59–70. https://www.jstor.org/stable/1048170.
Likert, R. 1932. “A Technique for the Measurement of Attitudes.” Archives of Psychology 22 (140): 1–55.
Le Mens, Gaël, and Aina Gallego. 2025. “Positioning Political Texts with Large Language Models by Asking and Averaging.” Political Analysis 33 (3): 274–82. https://doi.org/10.1017/pan.2024.29.
Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. 2014. “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science 58 (4): 1064–82. https://doi.org/10.1111/ajps.12103.
Salimian, Sina, Gias Uddin, Most Husne Jahan, and Shaina Raza. 2025. “Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs.” arXiv. https://doi.org/10.48550/arXiv.2502.07186.
Wu, Patrick Y. 2025. “Large Language Models Can Be a Viable Substitute for Expert Political Surveys When a Shock Disrupts Traditional Measurement Approaches.” arXiv. https://doi.org/10.48550/arXiv.2506.06540.