Skip to contents

Political and policy document classification with LLMs. A domain wrapper around cat.stack that adds a registered-source fetcher (city ordinances, federal laws, executive orders, presidential speeches, social-media archives) and policy-document prompt framing.

cat.pol wraps the Python catpol package via reticulate.

Installation

# From R-universe (recommended once published):
install.packages("cat.pol", repos = "https://chrissoria.r-universe.dev")

# Or from a local clone:
devtools::install("path/to/cat.stack")
devtools::install("path/to/cat.pol")

# Install the Python backend
# pip install cat-pol

Quick Start

Classify ordinances from a built-in source

library(cat.pol)

results <- classify(
  source     = "city_san_diego",
  doc_type   = "ordinance",
  since      = "2024-01-01",
  n          = 50,
  categories = c("Housing", "Public Safety", "Finance",
                 "Infrastructure", "Health"),
  api_key    = Sys.getenv("OPENAI_API_KEY")
)

Discover categories from your own text

result <- extract(
  input_data        = df$bill_text,
  document_context  = "California state legislation",
  api_key           = Sys.getenv("OPENAI_API_KEY")
)
print(result$top_categories)

Summarize policy documents in plain English

results <- summarize(
  source = "federal_executive_orders",
  since  = "2025-01-01",
  format = "paragraph",
  tone   = "eli5",
  api_key = Sys.getenv("OPENAI_API_KEY")
)

List every available data source

list_sources()
#> [1] "city_san_diego"            "city_san_francisco"
#> [3] "federal_laws"              "federal_executive_orders"
#> [5] "social_trump_truth"        ...

Functions

Function Description
classify() Classify policy documents into categories
extract() Discover and extract categories from policy text
explore() Get raw category extractions for saturation analysis
summarize() Summarize policy documents (with tone parameter)
list_sources() List every registered political data source

License

MIT