Capturing Nuanced Public Opinion with Large Language Models
June 29, 2025
“Since it is possible to group stimuli in almost any conceivable manner and to classify and subclassify them indefinitely, it is strictly true that the number of attitudes which any given person possesses is almost infinite.” (Likert 1932)
Reduce Bias: Avoids the cueing bias common in closed-ended formats (Iyengar 1996).
Capture Nuance: Uncovers the full spectrum of opinion, including detailed, complex, and unexpected responses.
Detect Emerging Issues: Surfaces concerns before they become salient in public discourse.
Create Durable Data: Raw text can be re-analyzed with new methods and theories, increasing the long-term value of the survey (Roberts et al. 2014).
The literature raises valid concerns about transparency, reproducibility, and accuracy (the “black box” problem). However,
Can LLMs accurately and efficiently process open-ended responses for quantitative analysis?
Can LLM-cleaned open-ended questions measure the same latent constructs as traditional closed-ended questions?
Can LLM analysis of open-ended responses reveal insights that closed-ended questions fundamentally cannot capture?
How much should the federal government spend on the environment?
How much should the federal government spend on immigrants and minorities?
How many immigrants should the country admit?
You are an expert at creating optimized prompts for AI systems that process survey data. Your task is to generate a specialized
prompt for coding open-ended survey responses to a specific survey variable.
## Variable Information:
- Variable name: {variable_name}
- Question text: {question_text}
- Variable type: {variable_type}
- Domain/topic: {variable_domain}
- Sample response values: {sample_values}
## Response Categories/Options:
{categories_info}
## Sample Open-ended Responses (if available):
{sample_responses}
## Language Information:
{language_info}
IMPORTANT: This is a bilingual survey (French and English). Responses may be in either language.
## Your Task:
Generate an optimal prompt consisting of TWO parts:
### PART 1: SYSTEM MESSAGE
Create a system message that:
- Defines the AI assistant's role and expertise for this specific variable type
- Explains the task clearly (mapping open responses to codes)
- Provides domain-specific guidance relevant to this variable's topic
- Emphasizes returning only the numeric code
- Includes any special considerations for this variable type
- CRITICAL: Explicitly mentions handling both French and English responses
- Provides key French translations for common responses (oui=yes, non=no, etc.)
- Warns against coding valid French responses as Don't know
### PART 2: USER TEMPLATE
Create a user message template that:
- Uses placeholder variables: {{variable_name}}, {{question_text}}, {{options_block}}, {{open_response}}
- Is formatted clearly for easy reading
- Includes appropriate context for this variable type
- Follows this general structure but adapt the labels/sections as needed:
Variable: {{variable_name}}
Question: {{question_text}}
[Appropriate section title for the options]:
{{options_block}}
Open-ended response:
"{{open_response}}"
## Requirements:
- Be specific to this variable's domain and characteristics
- Consider the types of responses likely for this question
- Optimize for accuracy in mapping responses to the correct codes
- Keep instructions clear and concise
- Ensure the prompt will work well for the sample responses shown
- MUST handle bilingual responses correctly (French and English)
- Include guidance on common French political terms if relevant (e.g., Libéral = Liberal Party)
- Provide clear French-English mappings for agreement/disagreement terms
## Output Format:
IMPORTANT: Return ONLY valid JSON in this exact format. Do not include any other text or explanation:
{{
"system_message": "Your system message here...",
"user_template": "Your user template here..."
}}
Generate the optimized prompt now:
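The mix of single and double braces in the meta-prompt above suggests it is filled with Python's `str.format`: single-brace fields are substituted when the meta-prompt is built, while double-brace fields survive as literal single-brace placeholders for the generated user template. The sketch below illustrates that behaviour on an abbreviated stand-in for the meta-prompt and then validates the JSON reply. The function names, the `env_spend` variable name, and the example reply are assumptions for illustration, not part of the original pipeline.

```python
import json

# Abbreviated stand-in for the meta-prompt above (assumption: the real
# template is filled with str.format, as its brace escaping suggests).
META_PROMPT = (
    "## Variable Information:\n"
    "- Variable name: {variable_name}\n"
    "- Question text: {question_text}\n"
    "## Your Task:\n"
    "- Uses placeholder variables: {{variable_name}}, {{open_response}}\n"
    "## Output Format:\n"
    "{{\n"
    '"system_message": "Your system message here...",\n'
    '"user_template": "Your user template here..."\n'
    "}}\n"
)


def build_meta_prompt(variable_name: str, question_text: str) -> str:
    """Fill single-brace fields; '{{' and '}}' collapse to literal braces."""
    return META_PROMPT.format(
        variable_name=variable_name, question_text=question_text
    )


def parse_generated_prompt(raw_reply: str) -> dict:
    """Parse the model's JSON reply and check that both parts are present."""
    data = json.loads(raw_reply)
    missing = {"system_message", "user_template"} - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Hypothetical variable name; the question text is taken from the slides.
filled = build_meta_prompt(
    "env_spend",
    "How much should the federal government spend on the environment?",
)

# Hypothetical model reply in the requested output format.
example_reply = '{"system_message": "You are...", "user_template": "Variable: {variable_name}"}'
generated = parse_generated_prompt(example_reply)
```

Checking for the two required keys before use guards against replies that wrap the JSON in extra text or omit one part, which the "Return ONLY valid JSON" instruction is meant to prevent but cannot guarantee.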
Question: How much should the federal government spend on the environment?
### System Message
You are an expert AI assistant specializing in coding survey data for political science and public opinion research. Your task is to
analyze an open-ended survey response and assign the single most appropriate numeric code based on the provided categories.
The response is for a question about federal government spending on the environment.
## Task Instructions:
1. **Analyze the Response**: Carefully read the open-ended response to understand the respondent's opinion on the level of
environmental spending.
2. **Identify Sentiment**: Determine if the respondent wants to spend MORE, LESS, or maintain the SAME level of spending.
3. **Assign Code**: Select the single numeric code that best corresponds to this sentiment.
4. **Output**: Return ONLY the single numeric code. Do not provide any explanation, text, or punctuation.
## Critical Considerations for this Variable:
- **Bilingual Survey**: This is a bilingual survey. Responses may be in English or French (FR-CA). You MUST correctly interpret both
languages. Do NOT code a valid French response as 'Don't know' (6).
- **Focus on Quantity**: The core of the question is about the *amount* of spending. Look for words indicating an increase, decrease,
or maintenance of the current level.
- **Typo in Options**: Note that the option 'Spend about the same as snow' contains a typo and should be interpreted as 'Spend about
the same as now'.
## French Language Guidance:
To ensure accuracy, use the following French-to-English translations for common terms related to this question:
**More / Increase:**
- `Plus` = More
- `Augmenter` / `Augmentation` = Increase
- `Davantage` = More
- `Beaucoup plus` = A lot more
- `Plus qu'en ce moment` = More than right now
**Less / Decrease:**
- `Moins` = Less
- `Diminuer` / `Réduire` = Decrease / Reduce
- `Beaucoup moins` = A lot less
**Same / Maintain:**
- `Pareil` / `La même chose` = The same
- `Comme maintenant` = Like now
- `Le même montant` = The same amount
- `Garder le même` = Keep the same
**Don't Know / Refusal:**
- `Je ne sais pas` / `Sais pas` / `NSP` = I don't know
- `Aucune idée` = No idea
If the response is ambiguous, expresses no opinion, or is a clear 'don't know' or refusal to answer (in either language), use code 6.
### User Template
Variable: {variable_name}
Question: {question_text}
Response Options & Codes:
{options_block}
Open-ended response:
"{open_response}"
### Response Options
1: Spend less
4: Spend about the same as snow
5: Spend more
6: Don't know/Prefer not to answer
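A minimal sketch of how the user template and response options above could be combined into one request, and how the model's reply could be checked against the allowed codes. The function names, the `env_spend` variable name, and the choice to return `None` for invalid replies (for example, to flag them for manual review) are assumptions; the slides do not describe how off-format replies were handled.

```python
import re

# User template as shown above, with single-brace placeholders.
USER_TEMPLATE = (
    "Variable: {variable_name}\n"
    "Question: {question_text}\n"
    "Response Options & Codes:\n"
    "{options_block}\n"
    "Open-ended response:\n"
    '"{open_response}"'
)

# Codes and labels copied from the slide, including the typo in code 4.
OPTIONS = {
    1: "Spend less",
    4: "Spend about the same as snow",
    5: "Spend more",
    6: "Don't know/Prefer not to answer",
}


def render_user_message(variable_name, question_text, options, open_response):
    """Build the options block and fill the user template."""
    options_block = "\n".join(f"{code}: {label}" for code, label in options.items())
    # str.format only interprets braces in the template, so braces typed
    # by a respondent in open_response are passed through unchanged.
    return USER_TEMPLATE.format(
        variable_name=variable_name,
        question_text=question_text,
        options_block=options_block,
        open_response=open_response,
    )


def parse_code(reply, options):
    """Return the code if the reply is exactly one allowed code, else None."""
    text = reply.strip()
    if re.fullmatch(r"[0-9]+", text) and int(text) in options:
        return int(text)
    return None


message = render_user_message(
    "env_spend",  # hypothetical variable name
    "How much should the federal government spend on the environment?",
    OPTIONS,
    "Beaucoup plus",
)
```

Rejecting anything other than a bare allowed code keeps the check consistent with the system message's instruction to return only the numeric code.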
Models:
Parameters: Default parameters (Salimian et al. 2025)
Prompts: 223,639 total
Cost:
Time: About 55 minutes
Consensus: 92.7%
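The slides do not define how the consensus figure is computed. One plausible reading, assumed here, is the share of responses for which every model returned the same code. The sketch below computes that rate and a simple majority vote on made-up toy data; the function names and the tie-handling rule are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def consensus_rate(codes_per_response):
    """Share of responses where all models assigned the same code."""
    if not codes_per_response:
        raise ValueError("no responses to score")
    unanimous = sum(1 for codes in codes_per_response if len(set(codes)) == 1)
    return unanimous / len(codes_per_response)


def majority_vote(codes):
    """Most frequent code, or None when the top two codes are tied."""
    ranked = Counter(codes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Toy data (made up): each inner list holds the codes that three
# hypothetical models assigned to one open-ended response.
toy_codes = [
    [5, 5, 5],
    [1, 1, 4],
    [6, 6, 6],
    [4, 5, 1],
]

rate = consensus_rate(toy_codes)
final_codes = [majority_vote(codes) for codes in toy_codes]
```

Under this definition a response counts toward consensus only when agreement is unanimous; a looser majority-based definition would give a higher rate on the same data.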
18% of respondents complained about the difficulty of answering numerical open-ended questions.