CatLLM

Project site: catllm.com, the full project landing page with documentation, examples, and citation information.

Overview

CatLLM is an open-source Python and R ecosystem for systematic LLM-powered text classification. The cat-llm meta-package installs a family of domain-specific tools (for survey responses, social media, academic papers, political text, web content, and more) that all share the same classify() / extract() / summarize() API. CatLLM has been validated against expert human coders on multiple datasets.


The Ecosystem

Meta-package

cat-llm: The full ecosystem in one install (pip install cat-llm). Exposes every domain package through a single catllm namespace (import catllm). [GitHub]

General-purpose base

cat-stack: The domain-agnostic classification engine underlying every other package. Use it directly for general text, or build your own domain wrapper on top of it. [GitHub]
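As a rough illustration of the "domain wrapper" idea, the sketch below fixes a piece of domain context once and delegates every call to a generic classifier. It assumes a hypothetical engine signature (texts, category definitions, context) and uses keyword_stub, a keyword-matching stand-in for an LLM call; BaseClassifier, keyword_stub, and make_domain_classifier are illustrative names only, not part of cat-stack's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical generic classifier signature (NOT cat-stack's actual API):
# (texts, {category: definition}, domain_context) -> one label per text.
BaseClassifier = Callable[[List[str], Dict[str, str], str], List[str]]
DomainClassifier = Callable[[List[str], Dict[str, str]], List[str]]


def keyword_stub(texts: List[str], categories: Dict[str, str], context: str) -> List[str]:
    """Stand-in for an LLM call: returns the first category name found in each text, else 'other'."""
    labels = []
    for text in texts:
        lowered = text.lower()
        label = next((name for name in categories if name.lower() in lowered), "other")
        labels.append(label)
    return labels


def make_domain_classifier(domain_context: str, base: BaseClassifier) -> DomainClassifier:
    """Return a classifier that always passes the same domain context to the generic engine."""
    def classify(texts: List[str], categories: Dict[str, str]) -> List[str]:
        return base(texts, categories, domain_context)
    return classify


# Usage: a hypothetical survey wrapper built on the stub engine.
classify_commute = make_domain_classifier("Open-ended survey answers about commuting.", keyword_stub)
print(classify_commute(["I take the bus", "Mostly I bike"], {"bus": "Public bus", "bike": "Cycling"}))
# ['bus', 'bike']
```

The design keeps the engine ignorant of any domain: the wrapper only supplies context, so swapping the stub for a real model call would not change the wrapper.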


Domain packages

cat-survey: Classify open-ended survey responses at scale. Verbose category definitions and ensemble voting help resolve ambiguous responses. [GitHub]
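To make the ensemble-voting idea concrete, here is a minimal majority-vote sketch for the single-label case, assuming each response has already been labelled by several model runs. The function names and the min_share threshold are hypothetical and do not describe how cat-survey implements its ensemble.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(labels: List[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most frequent label if its vote share is strictly above min_share, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # With the default 0.5 threshold a tie can never win, so ties come back as None.
    return label if count / len(labels) > min_share else None


def ensemble_labels(votes_per_response: List[List[str]], min_share: float = 0.5) -> List[Optional[str]]:
    """Combine several runs' labels for each response; None marks responses without a clear majority."""
    return [majority_vote(votes, min_share) for votes in votes_per_response]


# Three hypothetical runs labelled each of two responses.
votes = [["commute", "commute", "cost"], ["cost", "safety", "commute"]]
print(ensemble_labels(votes))  # ['commute', None]
```

Returning None rather than forcing a label is one way to surface ambiguous responses, for example by routing them to a human coder.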


cat-pol: Classify political text such as municipal ordinances, federal laws, executive orders, and presidential speeches. Ships with 17 built-in datasets on Hugging Face, updated weekly. [GitHub]


cat-vader: Classify and analyze social media posts. Connects to the Threads API to pull your full post history, classify posts into custom categories, and return an enriched dataset with engagement metrics. [Learn more] [GitHub]


cat-ademic: Classify and summarize academic papers, including abstracts, full texts, and other research documents across disciplines, with built-in journal and field context. [GitHub]


cat-cog: Cognitive assessment scoring, including CERAD Constructional Praxis test evaluation for dementia research. [GitHub]


cat-web: Classify scraped web content (pages, articles, and HTML) using domain-tuned prompts for long-form online text. [GitHub]


llm-web-research: A separate package in the CatLLM family for LLM-powered web research with a focus on accuracy over quantity. Uses a multi-step verification pipeline to catch ambiguous queries before returning potentially incorrect answers. [GitHub]
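The "accuracy over quantity" idea can be pictured as a pipeline that is allowed to abstain. The sketch below is a toy version under loudly stated assumptions: looks_ambiguous uses a crude pronoun-and-length heuristic, answer sources are plain callables standing in for search or LLM steps, and verified_answer only answers when enough sources agree. None of these names or steps come from llm-web-research itself.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical answer source: any callable mapping a query to an answer (or None).
AnswerSource = Callable[[str], Optional[str]]

# Toy ambiguity heuristic: pronouns that usually need an antecedent.
VAGUE_WORDS = {"it", "they", "this", "that", "he", "she"}


def looks_ambiguous(query: str) -> bool:
    """Flag queries that are very short or rely on pronouns with no clear referent."""
    words = query.lower().rstrip("?.!").split()
    return len(words) < 3 or any(word in VAGUE_WORDS for word in words)


def verified_answer(query: str, sources: List[AnswerSource], min_agree: int = 2) -> Optional[str]:
    """Answer only when the query looks unambiguous and at least min_agree sources agree."""
    # Step 1: stop early on ambiguous queries instead of guessing.
    if looks_ambiguous(query):
        return None
    # Step 2: collect normalized answers from each independent source.
    answers = []
    for source in sources:
        answer = source(query)
        if answer:
            answers.append(answer.strip().lower())
    if not answers:
        return None
    # Step 3: return the consensus answer, or abstain if agreement is too weak.
    best, count = Counter(answers).most_common(1)[0]
    return best if count >= min_agree else None


sources = [lambda q: "Canberra", lambda q: "Canberra", lambda q: "Sydney"]
print(verified_answer("What is the capital of Australia?", sources))  # canberra
print(verified_answer("When did it open?", sources))  # None
```

Returning None trades coverage for precision: the caller gets fewer answers, but each returned answer has passed both checks.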


Web Apps

Classify Survey Responses: A web-based tool for categorizing survey responses without writing code.

Citation

If you use CatLLM in your research, please cite:

Soria, C. (2026). Scaling Open-Ended Survey Coding: An LLM Pipeline Where Definitions Do the Heavy Lifting. Journal of Open Source Software. https://doi.org/10.21105/joss.09678