Classifying Open-Ended Survey Responses Directly in Claude Code


My open-source package cat-llm has always been a Python-first tool: you install it, import it, write a script, run it. That works fine when you already have a pipeline set up. It’s friction when you just want to quickly classify a CSV someone sent you and move on.

Claude Code — Anthropic’s terminal-based coding agent — supports project-local slash commands: markdown files in .claude/commands/ that inject a prompt and tool permissions when invoked. I added four of them to cat-llm. The result is that you can now classify survey data, extract categories, check your API keys, and run end-to-end tests without touching a Python file.

This post walks through the setup and shows what it looks like in practice using 40 rows of open-ended responses from the UCNets survey (variable a19i).


Installation

pip install cat-llm

That’s the only install step; cat-llm pulls in pandas, tqdm, requests, openai, and anthropic automatically. For PDF classification support:

pip install cat-llm[pdf]

You’ll also need at least one provider API key. cat-llm reads from environment variables automatically:

export OPENAI_API_KEY="sk-..."       # OpenAI / xAI
export ANTHROPIC_API_KEY="sk-ant-..."  # Anthropic
export GOOGLE_API_KEY="AIza..."      # Google

Or drop them in a .env file at your project root — cat-llm will pick them up.
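Conceptually, the lookup order is environment variable first, then the .env file. Here is a minimal sketch of that behavior (illustrative only, not cat-llm’s actual implementation):

```python
import os

def resolve_api_key(var_name, env_file=".env"):
    """Illustrative sketch of the lookup order described above:
    check the environment first, then fall back to a KEY=value .env file."""
    value = os.environ.get(var_name)
    if value:
        return value
    if os.path.exists(env_file):
        with open(env_file) as f:
            for line in f:
                line = line.strip()
                if line.startswith(var_name + "="):
                    # Strip optional surrounding quotes from the value
                    return line.split("=", 1)[1].strip().strip('"')
    return None
```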


Adding the Claude Code Commands

Clone or navigate to your cat-llm working directory, then create the commands folder:

mkdir -p .claude/commands/catllm

Add four markdown files — classify.md, extract.md, providers.md, and test.md — each containing a prompt that describes what Claude should do when the command is invoked. The full command definitions are in the cat-llm repository under .claude/commands/catllm/.

Once the files exist, start (or restart) a Claude Code session from the cat-llm project directory. Type /catllm: and press Tab to see all four commands.
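For reference, a command file is just markdown with optional YAML frontmatter (fields like description and allowed-tools), and a $ARGUMENTS placeholder that receives whatever you type after the command name. A simplified, hypothetical providers.md might look like:

```markdown
---
description: Check which cat-llm provider API keys are configured
allowed-tools: Bash(python:*)
---

Check the environment for OPENAI_API_KEY, ANTHROPIC_API_KEY, and
GOOGLE_API_KEY. For each key that is set, print its name with the value
masked (first and last four characters only). Arguments: $ARGUMENTS
```

The real command definitions in the repository are longer; this is only the shape of the format.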


Checking Your Providers

Before classifying anything, run:

/catllm:providers

Claude detects which API keys are present in your environment, masks the values, and lists suggested model names for each configured provider:

=== cat-llm Provider Status ===

[OK] OpenAI / xAI
     Key: OPENAI_API_KEY = sk-p...k3Rw
     Models: gpt-5, gpt-4o-mini, grok-3

[OK] Anthropic
     Key: ANTHROPIC_API_KEY = sk-a...9xQz
     Models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001

[ ] Google (GOOGLE_API_KEY not set)
[ ] Mistral (MISTRAL_API_KEY not set)
[ ] HuggingFace (HUGGINGFACE_API_TOKEN not set)

[OK] Ollama (local)
     llama3.2:latest
     mistral:latest

Configured: 2 provider(s)
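The masking is the usual first-four/last-four trick. A sketch of the idea (not the package’s actual code):

```python
def mask_key(value, show=4):
    """Show only the first and last few characters of a secret,
    so a key can be confirmed without being printed in full."""
    if value is None:
        return "(not set)"
    if len(value) <= show * 2:
        # Too short to mask meaningfully; hide it entirely
        return "****"
    return f"{value[:show]}...{value[-show:]}"
```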

Classifying UCNets a19i

The UCNets survey includes an open-ended question (a19i) asking respondents about sources of social support. The raw responses are text — exactly what cat-llm is designed for.

I have 40 responses in a CSV. The first step is letting the data suggest its own categories rather than imposing a scheme from the outside. The /catllm:extract command runs cat-llm’s extract() function, which samples random chunks of the responses, asks the model to name the recurring themes in each chunk, and returns a ranked frequency list:
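The chunk-sampling loop is easy to picture. Here is a toy version of the idea, with the LLM call stubbed out as a name_themes callable (illustrative only; the real extract() sends each chunk to the model):

```python
import random
from collections import Counter

def extract_categories(responses, iterations=20, chunk_size=10, name_themes=None):
    """Toy sketch of the sample-chunks-and-rank idea described above.
    `name_themes` stands in for the LLM call that labels a chunk's themes."""
    counts = Counter()
    for _ in range(iterations):
        chunk = random.sample(responses, min(chunk_size, len(responses)))
        for theme in name_themes(chunk):
            counts[theme] += 1
    # Ranked frequency list: (theme, number of chunks it appeared in)
    return counts.most_common()
```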

/catllm:extract ucnets_a19i.csv --col response --model gpt-5
=== Extracting categories from ucnets_a19i.csv ===
Column: response | Rows: 40 | Model: gpt-5

Discovered categories (ranked by frequency):
  1. Family and close relationships       (found in 18/20 chunks)
  2. Friends and peer support             (found in 15/20 chunks)
  3. Professional or institutional help   (found in 12/20 chunks)
  4. Online communities                   (found in 9/20 chunks)
  5. Religious or spiritual community     (found in 8/20 chunks)
  6. No support or social isolation       (found in 7/20 chunks)
  7. Coworkers or colleagues              (found in 5/20 chunks)

Suggested follow-up:
  /catllm:classify ucnets_a19i.csv \
    --col response \
    --categories "Family and close relationships,Friends and peer support,Professional or institutional help,Online communities,Religious or spiritual community,No support or social isolation,Coworkers or colleagues" \
    --output ucnets_a19i_classified.csv

The command even generates the follow-up classify call. Running it:

/catllm:classify ucnets_a19i.csv \
  --col response \
  --categories "Family and close relationships,Friends and peer support,Professional or institutional help,Online communities,Religious or spiritual community,No support or social isolation,Coworkers or colleagues" \
  --model gpt-5 \
  --output ucnets_a19i_classified.csv
=== Classifying ucnets_a19i.csv ===
Column: response | Rows: 40 | Model: gpt-5
Categories: 7

   response                                          family  friends  professional  online  religious  isolated  coworkers
0  My sister and I talk every day, she really...        1       0          0          0         0          0         0
1  Mostly just online — Reddit communities wh...        0       0          0          1         0          0         0
2  My church group has been incredible, they ...        0       1          0          0         1          0         0
3  Honestly nobody. I've felt very alone thr...         0       0          0          0         0          1         0
4  My therapist and my wife, in that order...          1       0          1          0         0          0         0
...

--- Category Distribution (40 rows) ---
Family and close relationships      29  (72.5%)
Friends and peer support            18  (45.0%)
Professional or institutional help  11  (27.5%)
Religious or spiritual community     9  (22.5%)
Online communities                   8  (20.0%)
No support or social isolation       6  (15.0%)
Coworkers or colleagues              4  (10.0%)

Saved to ucnets_a19i_classified.csv

The output is a binary matrix — one column per category, one row per response. A single response can belong to multiple categories simultaneously, which is the right design for this kind of data: someone who mentions both their sister and their therapist gets a 1 in both family and professional, not a forced choice between them.
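In pandas terms, the output shape is simply this (a hypothetical two-row example with shortened column names):

```python
import pandas as pd

# Multi-label binary matrix: one row per response, one 0/1 column
# per category, no forced single choice.
df = pd.DataFrame({
    "response": [
        "My sister and I talk every day",
        "My therapist and my wife",
    ],
    "family":       [1, 1],
    "professional": [0, 1],
})

# Column sums give the category distribution directly
distribution = df[["family", "professional"]].sum()
```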

The classified CSV is ready for any downstream analysis — R, Stata, Python, whatever the pipeline requires.


Running the Quick Test

There’s also a built-in smoke test command that runs on the package’s bundled example data:

/catllm:test
=== cat-llm Quick Test ===
File: examples/test_data/survey_responses.csv
Model: gpt-5
Categories: ['Positive', 'Negative', 'Neutral']

Loaded 20 rows. Columns: ['id', 'response']
Text column: response

--- Results ---
    response                                          Positive  Negative  Neutral
0   The program was incredibly helpful and I...           1        0        0
1   I didn't find the sessions useful at all...           0        1        0
2   It was okay, nothing special but not bad...           0        0        1
...

--- Category Distribution ---
Positive    11
Negative     5
Neutral      4

PASS: classify() completed successfully.

It’s a handy way to verify that a new API key or model name works before pointing it at real data.


What Else You Can Do

The commands are thin wrappers around cat-llm’s Python API, which has considerably more surface area:

Multi-model ensemble. Pass a list of models and cat-llm runs all of them in parallel, then reports a consensus column alongside each model’s individual output. Disagreements across models are flagged automatically.

import catllm as cat

result = cat.classify(
    input_data,
    categories,
    models=[
        # each tuple: (model name, provider, API key, extra options)
        ("gpt-5",                "openai",    openai_key,    {}),
        ("claude-sonnet-4-6",    "anthropic", anthropic_key, {}),
        ("gemini-2.0-flash",     "google",    google_key,    {}),
    ],
    consensus_threshold="unanimous",
)
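Under a unanimous threshold, the consensus logic for a single category amounts to an AND across models. A toy sketch of that idea (the actual consensus and flagging columns are cat-llm’s own):

```python
def consensus(votes, threshold="unanimous"):
    """Toy sketch: combine per-model 0/1 votes for one category.
    Returns (consensus_label, disagreement_flag)."""
    ones = sum(votes)
    if threshold == "unanimous":
        label = 1 if ones == len(votes) else 0
    else:  # a simple majority rule, for contrast
        label = 1 if ones > len(votes) / 2 else 0
    # Any split vote gets flagged, regardless of the final label
    disagreement = 0 < ones < len(votes)
    return label, disagreement
```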

Extended reasoning. Pass thinking_budget=4096 (Anthropic) or chain_of_thought=True for borderline classification tasks where you want the model to reason before committing to a label.

PDF classification. If input_data points at a directory of PDFs, cat-llm renders each page and classifies the content as images. Useful for document archives or grant applications.

Category discovery. cat.extract() is the same function the /catllm:extract command calls under the hood. You can run it directly, tune the iterations and divisions parameters, and use the raw frequency table to build a codebook before any classification happens.


The cat-llm repository is at github.com/chrissoria/cat-llm and the package is on PyPI as cat-llm (version 2.5.1 as of this writing). If you adapt the Claude Code commands for a different workflow or build something interesting with the package, reach out at chrissoria@berkeley.edu.