Getting Started with CatLLM for R

Introduction

CatLLM is an ecosystem of R packages that use large language models (LLMs) to categorize open-ended text — survey responses, social media posts, academic papers, policy documents, web content — at scale. It’s designed for researchers who want quantitative analysis of free-text data without manual coding or hiring research assistants.

CatLLM achieves 98% accuracy compared to human consensus on classification tasks using leading models such as GPT-5, Gemini 2.5 Pro, and Qwen 3. These results were validated against expert human coders across 21 LLMs and 4 surveys; see the SocArXiv preprint for methodology.

The R packages are thin reticulate wrappers around the underlying Python implementation. Every parameter, default, and behavior is identical to the Python version — only the calling syntax differs. For deep conceptual content, advanced configuration, or the full 50-parameter classify() reference, the Python cat-llm README is the canonical source.

Installation

Install the meta-package (brings in all 7 sub-packages) from R-universe:

install.packages(
  "cat.llm",
  repos = c("https://chrissoria.r-universe.dev",
            "https://cloud.r-project.org")
)

Or install a single domain package for a lighter footprint:

install.packages(
  c("cat.stack", "cat.survey"),
  repos = c("https://chrissoria.r-universe.dev",
            "https://cloud.r-project.org")
)

One-time setup: install the Python backend (requires Python 3.9+ on your system):

library(cat.llm)
install_cat_stack()

# With PDF processing support:
# install_cat_stack(pdf = TRUE)
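
Before running install_cat_stack(), it can help to confirm that a suitable Python is visible from R. The sketch below uses only base R. The helper names (find_python, python_version, meets_minimum) are hypothetical and not part of CatLLM, and reticulate may end up selecting a different interpreter than the one found on the PATH here.

```r
# Hypothetical helpers (not part of CatLLM): look for Python 3.9+ on the PATH.

# Return the first Python executable found on the PATH, or NA if none.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)
  paths <- paths[nzchar(paths)]
  if (length(paths) == 0) NA_character_ else unname(paths[[1]])
}

# Ask the interpreter for its version; NULL if it cannot be determined.
python_version <- function(python) {
  out <- tryCatch(
    suppressWarnings(system2(python, "--version", stdout = TRUE, stderr = TRUE)),
    error = function(e) character(0)
  )
  if (length(out) == 0) return(NULL)
  hit <- regmatches(out, regexpr("[0-9]+(\\.[0-9]+)+", out))
  if (length(hit) == 0) NULL else numeric_version(hit[[1]])
}

# Compare as version numbers, not strings: "3.10" sorts before "3.9" as text.
meets_minimum <- function(version, minimum = "3.9") {
  as.numeric_version(version) >= as.numeric_version(minimum)
}

python <- find_python()
if (is.na(python)) {
  message("No Python found on the PATH; install Python 3.9+ first.")
} else {
  version <- python_version(python)
  if (is.null(version)) {
    message("Found ", python, " but could not read its version.")
  } else if (meets_minimum(version)) {
    message("Found Python ", as.character(version), " at ", python)
  } else {
    message("Python ", as.character(version), " is older than 3.9.")
  }
}
```

The version comparison goes through as.numeric_version() on purpose: a plain string comparison would rank "3.10" below "3.9".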

Quick Start

CatLLM is designed for building datasets at scale, not one-off queries. While you can classify individual responses, its primary purpose is batch processing entire text columns, image collections, or PDF corpora into structured research datasets. All outputs are R data.frames ready for analysis or CSV export.
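
Because results come back as a data.frame, saving them is plain base R. The sketch below uses a mock data.frame in place of real classify() output; the column names are illustrative assumptions, not necessarily the columns CatLLM returns.

```r
# Mock stand-in for classify() output; column names are assumptions.
results <- data.frame(
  input_text = c("I love this product!", "Terrible experience.", "It was fine."),
  category_1 = c(1L, 0L, 0L),
  category_2 = c(0L, 1L, 0L),
  category_3 = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Count how many rows were assigned to each category column.
category_counts <- colSums(results[grep("^category_", names(results))])

# Write to CSV (a temp directory here; use your project path in practice).
out_file <- file.path(tempdir(), "classified_responses.csv")
write.csv(results, out_file, row.names = FALSE)

# Read it back to confirm the round trip preserved rows and columns.
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
```

Passing row.names = FALSE keeps R's row index out of the file, which avoids a stray unnamed first column when the CSV is opened elsewhere.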

Option A — via the meta-package

library(cat.llm) attaches every domain package and exposes domain-suffixed aliases (classify_survey(), classify_political(), classify_social(), etc.):

library(cat.llm)

api_key <- Sys.getenv("OPENAI_API_KEY")

# Domain-neutral classification (from cat.stack)
results <- classify(
  input_data  = c("I love this product!", "Terrible experience.", "It was fine."),
  categories  = c("Positive", "Negative", "Neutral"),
  description = "Customer feedback sentiment",
  api_key     = api_key
)

# Survey classification — adds survey-tuned prompts
results <- classify_survey(
  input_data      = df$responses,
  categories      = c("Job change", "Family reasons", "Cost of living"),
  survey_question = "Why did you move to a new city?",
  api_key         = api_key
)

# Academic paper classification — fetches by journal
results <- classify_academic(
  input_data    = NULL,
  categories    = c("Empirical", "Theoretical", "Review"),
  journal_issn  = "0894-4393",
  paper_limit   = 50L,
  polite_email  = "you@university.edu",
  api_key       = api_key
)

# Social media classification
results <- classify_social(
  input_data = df$posts,
  categories = c("Misinformation", "Opinion", "News"),
  api_key    = api_key
)

# Political text classification (built-in registered sources)
results <- classify_political(
  source     = "city_san_diego",
  doc_type   = "ordinance",
  since      = "2025-01-01",
  n          = 50L,
  categories = c("Housing", "Public Safety", "Finance"),
  api_key    = api_key
)

# Cognitive assessment scoring (CERAD drawings)
scores <- cerad_drawn_score(
  shape       = "diamond",
  image_input = df$drawing_paths,
  api_key     = api_key
)

Option B — install only the domain you need

For a lighter dependency footprint, install only the package you actually use:

# install.packages("cat.survey", repos = ...)
library(cat.survey)

results <- cat.survey::classify(
  input_data      = df$responses,
  categories      = c("Job change", "Family reasons", "Cost of living"),
  survey_question = "Why did you move to a new city?",
  api_key         = Sys.getenv("OPENAI_API_KEY")
)

The two options produce identical results — classify_survey() from cat.llm is just a thin re-export of cat.survey::classify().

The Ecosystem

Package	Domain	Wraps
cat.stack	General-purpose classification base	`classify`, `extract`, `explore`, `summarize`
cat.survey	Open-ended survey responses	Adds `survey_question=` framing
cat.vader	Social media posts	Platform connectors (Threads, Reddit, Bluesky, etc.)
cat.ademic	Academic papers	OpenAlex-based journal/topic fetching, PDF support
cat.cog	Cognitive assessment scoring	`cerad_drawn_score()` for CERAD constructional praxis
cat.pol	Policy documents	17 registered sources (ordinances, federal laws, EOs, political speech)
cat.web	Web content	Automatic URL fetching, web-context prompt injection
cat.llm	Meta-package (installs all 7)	Re-exports + domain-suffixed aliases

Every domain package shares the same core API — classify(), extract(), explore(), summarize() (where applicable) — and depends on cat.stack, which holds the underlying classification engine.

Best Practices for Classification

These recommendations are based on empirical testing across 4 surveys, 4 models (7B to frontier-class), and 250-row subsamples compared against human-coded ground truth. They apply identically to R and Python.
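
Agreement with human-coded ground truth can be computed in a few lines of base R. This is a minimal sketch for the single-label case using made-up labels; it is not the evaluation code behind the figures reported here, and multi-label output would need a per-category comparison instead.

```r
# Made-up labels for illustration only; not data from the CatLLM validation.
human <- c("Job", "Family", "Cost", "Job", "Other", "Family")
model <- c("Job", "Family", "Job",  "Job", "Other", "Cost")

# Overall accuracy: share of rows where the model matches the human label.
accuracy <- mean(model == human)

# Confusion table: rows are human labels, columns are model labels.
levels_all <- sort(union(human, model))
confusion <- table(
  human = factor(human, levels = levels_all),
  model = factor(model, levels = levels_all)
)

# Per-category recall: of the rows humans put in a category,
# the share the model also put there.
recall <- diag(confusion) / rowSums(confusion)

print(round(accuracy, 3))
print(confusion)
print(recall)
```

Using a shared set of factor levels keeps the confusion table square even when the model never predicts one of the human categories.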

What works

Detailed category descriptions: the single biggest lever for accuracy. Instead of short labels like "Job change", use verbose descriptions like "The person had a job or school or career change, including transferred and retired." Doing so consistently improves accuracy by several percentage points across all models.

verbose_categories <- c(
  "Job/school: A change in employment, education, or career, including transfers and retirement.",
  "Family: Relationship changes, having children, supporting relatives, or relocating to be near family.",
  "Cost of living: Housing affordability, cost of goods, or general economic pressure.",
  "Other: The response does not fit any of the above categories."
)

results <- classify(
  input_data = df$responses,
  categories = verbose_categories,
  api_key    = Sys.getenv("OPENAI_API_KEY")
)
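
If you maintain a codebook, one way to keep it tidy is to store labels and definitions separately and join them into the "Label: definition" form used above. This is just a base-R convenience sketch; the `definitions` object is a hypothetical name, not something CatLLM requires.

```r
# Keep labels (names) and definitions (values) separate in one named vector.
definitions <- c(
  "Job/school"     = "A change in employment, education, or career, including transfers and retirement.",
  "Family"         = "Relationship changes, having children, supporting relatives, or relocating to be near family.",
  "Cost of living" = "Housing affordability, cost of goods, or general economic pressure.",
  "Other"          = "The response does not fit any of the above categories."
)

# Join into "Label: definition" strings, ready to pass as categories.
verbose_categories <- unname(paste0(names(definitions), ": ", definitions))

print(verbose_categories)
```

The short labels stay available via names(definitions), which is handy later for column headers or plot labels.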

Include an “Other” category: a catch-all like "Other: The response does not fit any of the above categories." prevents the model from forcing ambiguous responses into ill-fitting categories. By default, the R wrappers prompt you to add one if your category list lacks one (add_other = "prompt").
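
In scripts where you would rather settle this before calling classify(), a small check can append the catch-all yourself. The helper below, ensure_other(), is hypothetical and not part of CatLLM; it only inspects the category strings and does not interact with the add_other argument.

```r
# Hypothetical helper (not part of CatLLM): append a catch-all if none exists.
ensure_other <- function(categories,
                         other = "Other: The response does not fit any of the above categories.") {
  # Treat any category whose label starts with the word "Other" as a catch-all.
  has_other <- any(grepl("^other\\b", trimws(categories),
                         ignore.case = TRUE, perl = TRUE))
  if (has_other) categories else c(categories, other)
}

# No catch-all present: one is appended.
with_other <- ensure_other(c("Job change", "Family reasons"))

# Catch-all already present: the vector is returned unchanged.
unchanged <- ensure_other(c("Job change", "Other: none of the above"))

print(with_other)
print(unchanged)
```

The word-boundary match means a label such as "Otherwise engaged" is not mistaken for a catch-all.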

Few-shot examples (example1–example6): providing 2–4 labeled examples can help, especially for weaker models. Effects are modest (+0–1 pp on average) and model-dependent.

results <- classify(
  input_data = df$responses,
  categories = verbose_categories,
  example1   = list(text = "Took a new job in Chicago", label = "Job/school"),
  example2   = list(text = "Wanted to be closer to grandkids", label = "Family"),
  api_key    = Sys.getenv("OPENAI_API_KEY")
)

Low temperature (creativity = 0): for classification, output that is as deterministic as possible is preferable. Higher temperatures add noise without improving accuracy.
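
A minimal sketch of a call pinned to creativity = 0. It assumes install_cat_stack() has already been run and that OPENAI_API_KEY is set; the guard simply skips the call otherwise. The responses and shortened category definitions are made up for illustration.

```r
# Made-up inputs for illustration.
responses <- c("Took a new job in Chicago", "Rent got too expensive")
categories <- c(
  "Job/school: A change in employment, education, or career.",
  "Cost of living: Housing affordability or general economic pressure.",
  "Other: The response does not fit any of the above categories."
)

api_key <- Sys.getenv("OPENAI_API_KEY")

# Only call the API when a key is set and cat.stack is installed.
can_run <- nzchar(api_key) && requireNamespace("cat.stack", quietly = TRUE)

if (can_run) {
  results <- cat.stack::classify(
    input_data = responses,
    categories = categories,
    user_model = "gpt-4o-mini",
    creativity = 0,        # low temperature for repeatable labels
    api_key    = api_key
  )
  print(results)
} else {
  message("Skipping the API call: set OPENAI_API_KEY and install cat.stack to run it.")
}
```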

What doesn’t help (or hurts)

Chain of Thought (chain_of_thought = TRUE): no measurable improvement in our testing; slightly degraded performance for some models. Off by default.
Chain of Verification (chain_of_verification = TRUE): uses ~4x the API calls for self-verification. Consistently reduced accuracy by 1–2 pp by retracting correct classifications. Not recommended for classification.
Step-back prompting (step_back_prompt = TRUE): inconsistent — slight gains for weaker models (~+1.8 pp), slight losses for stronger ones (~−0.5 pp). Not recommended as a default.
Context prompting (context_prompt = TRUE): no consistent benefit observed.

Summary

The most effective approach is straightforward: write detailed category descriptions, include an “Other” category, and use a capable model at low temperature. Advanced prompting adds complexity and cost without reliable gains for classification.

Configuration

Get an API key

Get an API key from your preferred provider:

Provider	Where
OpenAI	platform.openai.com
Anthropic	console.anthropic.com
Google	aistudio.google.com
HuggingFace	huggingface.co/settings/tokens
xAI	console.x.ai
Mistral	console.mistral.ai
Perplexity	perplexity.ai/settings/api

Most providers require adding a payment method. Store your key securely and never share it publicly.

Store your key in `.Renviron` (recommended)

Rather than pasting your key into scripts, store it in ~/.Renviron so it’s automatically available to every R session:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

Then restart R and use:

api_key <- Sys.getenv("OPENAI_API_KEY")

To find or edit your .Renviron:

usethis::edit_r_environ()   # opens it for editing; creates if missing

After saving, restart R for the changes to take effect.
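
To confirm the keys are actually visible after the restart, a quick check like the following can help. The require_api_key() helper is hypothetical, not part of CatLLM; it just wraps Sys.getenv() so a missing key fails early with a readable message instead of surfacing as an authentication error later.

```r
# Hypothetical helper (not part of CatLLM): fetch a key or fail with a clear message.
require_api_key <- function(name = "OPENAI_API_KEY") {
  key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable ", name, " is not set. ",
      "Add it to ~/.Renviron and restart R.",
      call. = FALSE
    )
  }
  key
}

# Report which provider keys are visible, without printing the keys themselves.
key_names <- c("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")
key_status <- vapply(key_names, function(n) nzchar(Sys.getenv(n)), logical(1))
print(key_status)
```

In a script you would then write api_key <- require_api_key("OPENAI_API_KEY") in place of the bare Sys.getenv() call.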

Run entirely locally with no API key

For sensitive data, free use, or air-gapped environments, run against a local model via Ollama:

# In a separate terminal: install Ollama, then pull a model.
# Recommended (larger, more accurate, ~9 GB):
#   ollama pull qwen2.5:14b
# Smaller fallback if disk/RAM constrained (~4.7 GB):
#   ollama pull qwen2.5:7b

results <- classify(
  input_data    = df$responses,
  categories    = c("Positive", "Negative", "Neutral"),
  user_model    = "qwen2.5:14b",   # or "qwen2.5:7b" if you pulled the smaller one
  model_source  = "ollama"
)

⚠️ Disk-space heads-up: qwen2.5:14b is ~9 GB on disk and Ollama needs roughly that much free during the download. Check df -h / first — if under ~12 GB free, use qwen2.5:7b.

No API key needed; your data never leaves the machine.
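
Before pointing classify() at a local model, you may want to check from R that Ollama is installed and the model has been pulled. The sketch below shells out to the ollama command-line tool (`ollama list`); the list_ollama_models() helper is hypothetical, not part of CatLLM, and it assumes the first whitespace-delimited field of each output row is the model name.

```r
# Is the ollama command-line tool on the PATH?
ollama_installed <- unname(nzchar(Sys.which("ollama")))

# Hypothetical helper: names of locally pulled models, or character(0)
# if Ollama is missing or not responding.
list_ollama_models <- function() {
  if (!nzchar(Sys.which("ollama"))) return(character(0))
  out <- tryCatch(
    suppressWarnings(system2("ollama", "list", stdout = TRUE, stderr = FALSE)),
    error = function(e) character(0)
  )
  # The first line is a header; keep the first field of each remaining row.
  if (length(out) < 2) return(character(0))
  sub("[[:space:]].*$", "", out[-1])
}

model <- "qwen2.5:14b"
if (!ollama_installed) {
  message("Ollama not found on the PATH.")
} else if (!(model %in% list_ollama_models())) {
  message("Model not available yet; try: ollama pull ", model)
} else {
  message(model, " is available locally.")
}
```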

Supported Models

Specify any of these via user_model = "...":

OpenAI: gpt-4o, gpt-4o-mini, gpt-4, gpt-5, …
Anthropic: claude-sonnet-4-20250514, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, …
Google: gemini-2.5-flash, gemini-2.5-pro, …
HuggingFace: Qwen/Qwen3-235B, meta-llama/Llama-4-Scout, deepseek-ai/DeepSeek-V3, and thousands of community models
xAI: grok-2, …
Mistral: mistral-large-latest, pixtral-large-latest, …
Perplexity: sonar-large, sonar-small, …
Ollama (local): qwen2.5:14b (recommended, ~9 GB), qwen2.5:7b (smaller fallback, ~4.7 GB), llama3.1:8b, … (set model_source = "ollama")

Fully tested: OpenAI, Anthropic, Perplexity, Google Gemini (free tier has a 5 RPM limit), HuggingFace, xAI, Mistral.
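
To get a feel for what a requests-per-minute cap means for a batch job, here is the arithmetic as a sketch. It assumes roughly one API request per input row, which is an assumption about CatLLM's behavior rather than something documented here; the helper names are hypothetical.

```r
# Minimum spacing between requests for a given requests-per-minute cap.
seconds_per_request <- function(rpm) 60 / rpm

# Rough lower bound on wall-clock minutes for n requests at that cap.
estimated_minutes <- function(n_requests, rpm) n_requests / rpm

seconds_per_request(5)      # 12 seconds between requests
estimated_minutes(250, 5)   # 50 minutes for a 250-row sample

# One way to stay under the cap: process the input in chunks of `rpm` rows
# and pause a minute between chunks.
responses <- sprintf("response %d", 1:12)
chunks <- split(responses, ceiling(seq_along(responses) / 5))
lengths(chunks)             # chunk sizes: 5, 5, 2

# for (chunk in chunks) {
#   # call classify(input_data = chunk, ...) here
#   Sys.sleep(60)   # wait out the one-minute window before the next chunk
# }
```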

For best results when starting out, OpenAI (gpt-4o-mini) or Anthropic (claude-3-5-haiku-20241022) are cheap, fast, and reliable.

Ensemble & multi-model classification

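Run the same input through multiple models and combine the results by voting. The consensus_threshold argument sets how much agreement is required, from a simple majority (0.5) up to "unanimous". This often improves accuracy by reducing individual model biases.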

results <- classify(
  input_data = df$responses,
  categories = verbose_categories,
  models = list(
    c("gpt-4o-mini",               "openai",    Sys.getenv("OPENAI_API_KEY")),
    c("claude-3-5-haiku-20241022", "anthropic", Sys.getenv("ANTHROPIC_API_KEY")),
    c("gemini-2.5-flash",          "google",    Sys.getenv("GOOGLE_API_KEY"))
  ),
  consensus_threshold = "unanimous"   # or 0.5 for majority, etc.
)

The output data.frame includes per-model predictions (e.g., category_1_gpt_4o_mini, category_1_claude) plus a consensus column.
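
To make the voting idea concrete, here is a base-R sketch of thresholded consensus over per-model 0/1 columns. It is an illustration, not CatLLM's implementation: the votes are made up, the column names are placeholders, and treating the threshold as "agreement share greater than or equal to the threshold" is an assumption.

```r
# Made-up 0/1 votes from three models for one category; names are placeholders.
votes <- data.frame(
  category_1_model_a = c(1, 1, 0, 1),
  category_1_model_b = c(1, 0, 0, 1),
  category_1_model_c = c(1, 1, 0, 0)
)

# Share of models that assigned the category to each row.
agreement <- rowMeans(votes)

# A row gets the category when the agreement share meets the threshold.
consensus_at <- function(agreement, threshold) {
  as.integer(agreement >= threshold)
}

majority  <- consensus_at(agreement, 0.5)   # 1 1 0 1
unanimous <- consensus_at(agreement, 1)     # 1 0 0 0

print(data.frame(agreement, majority, unanimous))
```

Rows two and four show the trade-off: a majority rule keeps labels that two of three models agree on, while a unanimous rule drops them.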

API Reference (brief)

Every parameter from the Python classify(), extract(), explore(), and summarize() functions is exposed in R with identical semantics. The full per-parameter documentation lives in the in-R help system and on the R-universe per-package reference manuals.

Function	In-R help	Online
Domain-neutral `classify()`	`?cat.stack::classify`	cat.stack manual
Survey `classify()`	`?cat.survey::classify`	cat.survey manual
Academic `classify()`	`?cat.ademic::classify`	cat.ademic manual
Political `classify()`	`?cat.pol::classify`	cat.pol manual
Web `classify()`	`?cat.web::classify`	cat.web manual
Social `classify()`	`?cat.vader::classify`	cat.vader manual
CERAD scoring	`?cat.cog::cerad_drawn_score`	cat.cog manual
List registered policy sources	`?cat.pol::list_sources`	cat.pol manual

For full conceptual coverage of every parameter — batch mode, prompt tuning, embeddings, JSON formatting, advanced ensemble configurations — see the Python README API Reference. The R wrappers expose every Python kwarg.

R ↔︎ Python type translation

When adapting Python examples from the project README, the table below covers the syntax differences. All conversions are handled automatically by reticulate::r_to_py() inside the R wrappers: you write R, and the wrapper passes the equivalent Python objects.

Python	R
`["a", "b", "c"]`	`c("a", "b", "c")`
`{"key": "value"}`	`list(key = "value")`
`True` / `False` / `None`	`TRUE` / `FALSE` / `NULL`
`[(model, provider, key), (...)]` (ensemble)	`list(c(model, provider, key), c(...))`
`df['col']`	`df$col`
`import catllm`	`library(cat.llm)`
`catllm.classify_survey(...)`	`classify_survey(...)` (after `library(cat.llm)`)

Where to go from here

Full conceptual reference: the Python cat-llm README — covers every parameter, advanced configuration, prompt tuning, embeddings, etc. Since R is a thin reticulate layer, every Python concept applies directly.
Per-package R reference manuals: https://chrissoria.r-universe.dev — pick a package, then click the “Reference Manual” link for full @param docs.
End-to-end smoke test: see r-package/test-all-packages.R in the GitHub repo — a single R script that installs all 8 packages and runs a minimal classification per package.
Issues, questions, contributions: github.com/chrissoria/cat-llm/issues
Citation — if you use CatLLM in published research, please cite:

Soria, C. (2026). Scaling Open-Ended Survey Coding: An LLM Pipeline Where Definitions Do the Heavy Lifting. SocArXiv. https://osf.io/preprints/socarxiv/gjvcf_v1

and the software DOI:

Soria, C. (2026). CatLLM: A Reproducible Python Ecosystem for Generating, Assigning, and Scoring Open-Ended Text, Images, and Documents Across Research Domains (v3.0.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.19960067

Chris Soria

2026-05-18
