CatLLM: A Python package for Generating, Assigning, and Scoring Open-Ended Survey Data and Images

Published in Journal of Open Source Software, 2026

Recommended citation: Soria C. CatLLM: A Python package for Generating, Assigning, and Scoring Open-Ended Survey Data and Images. Journal of Open Source Software. 2026. doi:10.21105/joss.09678 https://doi.org/10.21105/joss.09678

CatLLM is an open-source Python and R toolkit for systematic, reproducible LLM-powered text classification. The package implements a provider-agnostic pipeline supporting frontier and open-weight models, with defaults calibrated against the consensus of double-blind coding by sociologists and demographers across multiple survey datasets. This short software paper documents the design, scope, and reproducibility guarantees of the toolkit; the full empirical validation is reported in a companion preprint under review at the Journal of Computational Social Science.
