CatLLM

Project site: catllm.com — the full project landing page with documentation, examples, and citation info.

Overview

CatLLM is an open-source Python and R ecosystem for systematic LLM-powered text classification. The cat-llm meta-package installs a family of domain-specific tools for survey responses, social media, academic papers, political text, web content, and more, all sharing the same classify() / extract() / summarize() API. CatLLM has been validated against expert human coders across multiple datasets.

The Ecosystem

Meta-package

cat-llm: The full ecosystem in one install (pip install cat-llm). It exposes every domain package through a single catllm namespace (import catllm). [GitHub]

General-purpose base

cat-stack: The domain-agnostic classification engine underlying every other package. Use it directly for general text, or build your own domain wrapper on top of it. [GitHub]
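The idea of a domain wrapper on top of a general engine can be illustrated with a small sketch. It does not use cat-stack's actual API. build_prompt and make_domain_prompt_builder are hypothetical names, and the assumption is that a wrapper's main job is to supply fixed domain context (like the journal/field context or domain-tuned prompts described for the packages below) to a domain-agnostic prompt. No LLM is called here; the sketch only builds the prompt text.

```python
from collections.abc import Callable


def build_prompt(text: str, categories: dict[str, str], context: str = "") -> str:
    """Hypothetical domain-agnostic prompt: optional context, category definitions, then the text."""
    lines = []
    if context:
        lines.append(f"Context: {context}")
    lines.append("For each category, answer 1 if it applies to the text, otherwise 0.")
    # Verbose definitions give the model more to go on than bare category names.
    for name, definition in categories.items():
        lines.append(f"- {name}: {definition}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


def make_domain_prompt_builder(domain_context: str) -> Callable[[str, dict[str, str]], str]:
    """Hypothetical domain wrapper: fixes the domain context so callers pass only text and categories."""
    def domain_prompt(text: str, categories: dict[str, str]) -> str:
        return build_prompt(text, categories, context=domain_context)
    return domain_prompt


if __name__ == "__main__":
    survey_prompt = make_domain_prompt_builder("Texts are open-ended survey responses.")
    print(survey_prompt("Too expensive to see a doctor", {"cost": "Mentions price or affordability"}))
```

The generic function stays unaware of any domain; each wrapper is just a closure that pins the context, so adding a new domain means adding one small function rather than changing the engine.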

Domain packages

cat-survey: Classify open-ended survey responses at scale. Support for verbose category definitions and ensemble voting helps it handle ambiguous responses. [GitHub]

cat-pol: Classify political text — municipal ordinances, federal laws, executive orders, presidential speeches. Ships with 17 built-in datasets on HuggingFace, updated weekly. [GitHub]

cat-vader: Classify and analyze social media posts. Connects to the Threads API to pull your full post history, classify posts into custom categories, and return an enriched dataset with engagement metrics. [Learn more] [GitHub]

cat-ademic: Classify and summarize academic papers — abstracts, full texts, and research documents across disciplines. Built-in journal/field context. [GitHub]

cat-cog: Cognitive assessment scoring, including CERAD Constructional Praxis test evaluation for dementia research. [GitHub]

cat-web: Classify scraped web content — pages, articles, and HTML. Domain-tuned prompts for long-form online text. [GitHub]
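The cat-survey entry mentions ensemble voting for ambiguous responses. The sketch below illustrates the general technique of majority voting over several classification runs of the same text. It is not cat-survey's implementation: the function name ensemble_vote and the input format (one dict of 0/1 codes per run) are assumptions made for illustration.

```python
def ensemble_vote(
    runs: list[dict[str, int]], threshold: float = 0.5
) -> tuple[dict[str, int], dict[str, float]]:
    """Hypothetical helper: combine several 0/1 codings of one text into a consensus coding.

    runs: one dict per classification run, mapping category name -> 0 or 1.
    Returns (codes, shares): a category is coded 1 when the share of runs
    voting 1 is strictly greater than threshold; shares holds that share.
    """
    if not runs:
        raise ValueError("ensemble_vote needs at least one run")
    n = len(runs)
    # Union of categories across runs; a run that omits a category counts as a 0 vote.
    categories = sorted({name for run in runs for name in run})
    shares = {name: sum(run.get(name, 0) for run in runs) / n for name in categories}
    codes = {name: int(share > threshold) for name, share in shares.items()}
    return codes, shares


if __name__ == "__main__":
    runs = [
        {"cost": 1, "access": 0, "quality": 1},
        {"cost": 1, "access": 1, "quality": 0},
        {"cost": 1, "access": 0, "quality": 0},
    ]
    codes, shares = ensemble_vote(runs)
    print(codes)  # {'access': 0, 'cost': 1, 'quality': 0}
```

Categories whose vote share sits near 0.5 are the ones the runs disagreed on, so the shares double as a simple signal for flagging responses for human review. Because the threshold test is strict, an even split (for example one yes and one no) resolves to 0.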

Apps

CatLLM Desktop (Mac): A self-contained Mac app — drag in a CSV, pick categories, get a coded dataset back. Same engine as the Python and R packages, no Python install required.

Classify Survey Responses: A web-based tool for categorizing survey responses without writing code.

Citation

If you use CatLLM in your research, please cite:

Soria, C. (2026). CatLLM: A Python package for Generating, Assigning, and Scoring Open-Ended Survey Data and Images. Journal of Open Source Software. https://doi.org/10.21105/joss.09678

Contact Information