What cat.vader adds

cat.vader is a thin domain wrapper around cat.stack that adds social-media platform connectors plus prompt framing tuned for short, informal, online language. You can use it two ways:

  1. Pull posts directly from a platform (Threads, Reddit, Bluesky, Mastodon, YouTube, etc.) using the sm_source, sm_handle, and sm_credentials arguments, then classify them.
  2. Classify text you already have as a plain character vector — identical to cat.stack::classify() but with social-media-aware prompt context.

Everything else — supported models, output format, ensemble voting — is identical to cat.stack.

Install

install.packages(
  "cat.vader",
  repos = c("https://chrissoria.r-universe.dev",
            "https://cloud.r-project.org")
)
library(cat.vader)

Classify text you already have

The simplest path — pass a character vector of post text:

posts <- c(
  "Just had the best coffee ever! Highly recommend the new place downtown.",
  "Politicians are all the same. Nothing ever changes.",
  "Looking forward to the game tonight!",
  "This new policy is going to ruin small businesses.",
  "Anyone know a good vet in the area?"
)

results <- classify(
  input_data = posts,
  categories = c("Positive sentiment", "Negative sentiment",
                 "Question/request", "Other"),
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)

Pull and classify in one call

If you want to analyse your own posting history or a public account, cat.vader can pull posts directly and feed them through the classifier:

# Authenticate once with your Threads API credentials, then:
results <- classify(
  sm_source      = "threads",
  sm_handle      = "your_username",
  sm_months      = 6L,                      # last 6 months
  sm_credentials = Sys.getenv("THREADS_TOKEN"),
  categories     = c("Personal", "Political", "Promotional",
                     "Question", "Other"),
  api_key        = Sys.getenv("OPENAI_API_KEY"),
  user_model     = "gpt-4o-mini"
)

The returned data.frame includes the original text and platform engagement metrics (likes, replies, reposts, etc.) so you can correlate content categories with reach.
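
As a rough sketch of that kind of analysis, the snippet below averages engagement per category with base R. The column names (category, likes, replies) and the toy values are assumptions for illustration only; run names(results) to see what cat.vader actually returns for your platform and adapt accordingly.

```r
# Toy stand-in for classify() output; column names here are hypothetical.
toy_results <- data.frame(
  text     = c("post a", "post b", "post c", "post d"),
  category = c("Political", "Personal", "Political", "Promotional"),
  likes    = c(120, 15, 80, 5),
  replies  = c(30, 2, 10, 0),
  stringsAsFactors = FALSE
)

# Mean likes and replies per category
by_category <- aggregate(
  cbind(likes, replies) ~ category,
  data = toy_results,
  FUN  = mean
)

# Sort so the category with the most likes comes first
by_category <- by_category[order(-by_category$likes), ]
print(by_category)
```

With only a handful of posts per category the means are noisy, so treat a comparison like this as exploratory.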

Each platform has its own authentication setup; check the cat-vader Python docs for the current credential formats.
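
Because the examples here read tokens with Sys.getenv(), which returns an empty string when a variable is unset, it can help to fail early with a clear message instead of sending an empty credential. The helper below is a minimal sketch: require_env() and DEMO_TOKEN are hypothetical names used for illustration and are not part of cat.vader.

```r
# Hypothetical helper (not part of cat.vader): stop early if a variable is unset.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  value
}

# Demo with a placeholder variable so the snippet runs anywhere.
Sys.setenv(DEMO_TOKEN = "abc123")
token <- require_env("DEMO_TOKEN")
```

In real use you would call require_env("THREADS_TOKEN") and pass the result as sm_credentials.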

Discovering categories from a feed

Before classifying, see what themes are actually present:

# Assumes `df` is a data.frame you have already loaded, with post text in df$posts
cats <- extract(
  input_data     = df$posts,
  max_categories = 10L,
  api_key        = Sys.getenv("OPENAI_API_KEY"),
  user_model     = "gpt-4o-mini"
)
cats$top_categories

Then iterate — drop noise categories, merge similar ones, and re-run classify() with the cleaned scheme.
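
The cleanup step can be sketched in base R as follows. The category labels, the noise list, and the merge map are all invented for illustration; substitute whatever the discovery step returns for your data.

```r
# Hypothetical labels, standing in for what a discovery step might return.
raw_categories <- c("Politics", "Political opinion", "Sports",
                    "Food and drink", "Misc", "Spam")

# Labels judged to be noise for this analysis (an assumption).
noise <- c("Misc", "Spam")

# Map near-duplicate labels onto one canonical name.
merge_map <- c("Politics" = "Political", "Political opinion" = "Political")

# Drop noise, apply the merge map, then de-duplicate.
kept    <- setdiff(raw_categories, noise)
merged  <- ifelse(kept %in% names(merge_map), merge_map[kept], kept)
cleaned <- unique(unname(merged))
print(cleaned)
```

The resulting vector can then be passed as categories = cleaned in the next classify() call.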

Tips for social-media data

  1. Short text is harder. Tweets/posts often lack context ("lol", "this"). Including the platform name and a brief description in description= helps the model interpret short posts.
  2. Emoji and slang. Frontier models handle these well; older or smaller models less so. If you’re seeing weird classifications on highly informal text, try gpt-4o or claude-3-5-sonnet over the mini/haiku tier.
  3. Engagement isn’t ground truth. A high-engagement post isn’t necessarily correctly classified — engagement reflects audience, not category. Validate on a hand-coded subsample.
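
The validation step in the last tip can be sketched like this: draw a random subsample, code it by hand, and compare the two sets of labels. The data, the model_label and human_label column names, and the invented disagreement rule are assumptions for illustration; in practice the human labels come from an actual coder.

```r
set.seed(1)  # make the random draw reproducible

# Toy stand-in for classified posts; `model_label` is a hypothetical column.
classified <- data.frame(
  id          = 1:10,
  model_label = c("Positive", "Negative", "Question", "Other", "Positive",
                  "Negative", "Positive", "Other", "Question", "Negative"),
  stringsAsFactors = FALSE
)

# 1. Draw a random subsample to code by hand.
to_code <- classified[sample(nrow(classified), size = 5), ]

# 2. Attach the human labels. Here they are invented: the coder is assumed
#    to disagree with the model only on posts it labelled "Other".
to_code$human_label <- ifelse(to_code$model_label == "Other",
                              "Positive", to_code$model_label)

# 3. Raw agreement plus a confusion table showing where the labels diverge.
agreement <- mean(to_code$model_label == to_code$human_label)
confusion <- table(model = to_code$model_label, human = to_code$human_label)
print(agreement)
print(confusion)
```

Raw agreement is the simplest check; if some categories are rare, inspect the confusion table row by row, since overall agreement can hide poor performance on small classes.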

Where to learn more