
GitHub Deploys Perplexity‑Powered Agentic Pipeline to Predict CFPB Enforcement Actions

Published by
SectorHQ Editorial


GitHub launched a Perplexity‑powered agentic pipeline that autonomously builds a Bayesian Optimization model to predict CFPB enforcement actions from public complaint data, achieving an F1 score of 0.725 at a $200/month cost, reports indicate.

Key Facts

  • Key company: Perplexity

GitHub’s new agentic pipeline leverages Perplexity’s LLM to orchestrate a full Bayesian Optimization (BO) workflow without any human‑engineered code. According to the public GitHub repository sign‑of‑fourier/cfpb‑complaint‑enforcement, the AI was instructed only to “predict which companies the CFPB will take enforcement action against” using the Consumer Financial Protection Bureau’s publicly available complaint dataset. The agent independently selected the BoTorch library, chose MixedSingleTaskGP as the surrogate model, and settled on LogExpectedImprovement as the acquisition function—components it had never been explicitly taught. The entire research design, spanning eight hyperparameters (lookback window, minimum complaints, class‑weight ratio, decision threshold, feature subset, model type, text‑feature inclusion, and control‑match ratio), was explored via BO rather than a manual grid search.
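To make the scale of that search concrete, the sketch below defines an eight-dimensional mixed (integer, continuous, and categorical) configuration space matching the hyperparameters named above, plus a uniform sampler that serves as the random-search baseline. The parameter bounds and category values are illustrative assumptions, not published figures; in the actual pipeline, BoTorch's MixedSingleTaskGP surrogate and LogExpectedImprovement acquisition function would replace uniform sampling with model-guided proposals.

```python
import random

# Hypothetical bounds for the eight hyperparameters named in the repo;
# the actual ranges used by the agent are not stated in the article.
SEARCH_SPACE = {
    "lookback_days":       ("int",   30, 365),
    "min_complaints":      ("int",   1, 100),
    "class_weight_ratio":  ("float", 1.0, 30.0),
    "decision_threshold":  ("float", 0.1, 0.9),
    "feature_subset":      ("cat",   ["volume", "distribution", "response",
                                      "text", "geo", "all"]),
    "model_type":          ("cat",   ["logistic", "random_forest", "gbt"]),
    "use_text_features":   ("cat",   [True, False]),
    "control_match_ratio": ("int",   1, 10),
}

def sample_config(rng=random):
    """Draw one configuration uniformly at random (the random-search baseline
    that Bayesian Optimization is compared against)."""
    cfg = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "int":
            cfg[name] = rng.randint(spec[1], spec[2])
        elif kind == "float":
            cfg[name] = rng.uniform(spec[1], spec[2])
        else:  # categorical
            cfg[name] = rng.choice(spec[1])
    return cfg
```

A BO loop would evaluate each sampled configuration (here, by training the full pipeline and scoring F1), fit the surrogate to the observed scores, and propose the next configuration by maximizing the acquisition function instead of calling `sample_config`.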

Performance metrics show the pipeline’s practical edge over naïve random search. Across 48 evaluation runs, the mean F1 score reached 0.725, compared with 0.389 for random search, while the best‑case configuration achieved a perfect 1.000 F1. The BO process converged after roughly 19 evaluations, and in 86 percent of the runs it outperformed random search. The optimizer discovered that a short look‑back window of about 156 days (≈5 months) maximized predictive power, whereas longer windows diluted the signal. Moreover, recent complaint velocity—not cumulative volume—proved to be the dominant predictor, and heavy class weighting (an 18.5× up‑weight of enforcement cases) was essential to counter the severe class imbalance (26 matched enforcement actions out of 213 total).
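Since every result above is an F1 score, it is worth recalling how the metric is computed; under heavy class imbalance like 26 positives out of 213, F1 is preferred over accuracy because it ignores true negatives. The counts below are illustrative, not from the study.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall,
    which simplifies to 2*TP / (2*TP + FP + FN).
    True negatives never enter the formula, which is why F1 is
    informative on imbalanced data where accuracy is not."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives
# precision = 8/10, recall = 8/12, F1 = 16/22 ≈ 0.727
score = f1_score(tp=8, fp=2, fn=4)
```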

Feature engineering emerged as another decisive factor. The study reports that combining all feature groups—complaint volume, distributional patterns, response metrics, textual content, and geographic information—outperformed any subset. Model choice proved less critical; logistic regression, random forest, and gradient‑boosted trees all reached comparable performance when the surrounding pipeline configuration was optimal. This suggests that the BO search was effectively optimizing the research design rather than merely fine‑tuning model hyperparameters.
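One reason BO is preferable to a manual grid here: the feature-subset dimension alone is combinatorial. Assuming the five feature groups named above are toggled independently (an assumption; the repo may restrict the subsets it considers), a grid search would face 31 non-empty combinations from this single dimension before multiplying in the other seven hyperparameters.

```python
from itertools import chain, combinations

# The five feature groups named in the article.
FEATURE_GROUPS = ["volume", "distribution", "response", "text", "geo"]

def nonempty_subsets(groups):
    """All non-empty combinations of feature groups that an
    exhaustive grid search would have to evaluate."""
    return list(chain.from_iterable(
        combinations(groups, k) for k in range(1, len(groups) + 1)))

subsets = nonempty_subsets(FEATURE_GROUPS)
# 2^5 - 1 = 31 candidate subsets from this one dimension alone;
# the study found the full set ("all five groups") was best.
```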

The pipeline’s cost structure underscores its accessibility. Per the repository, the only expense was a $200 monthly Perplexity Max subscription; no cloud compute, GPU clusters, or dedicated research staff were required. Live predictions generated on March 16, 2026 ranked Complaints LL Holdings LLC (9,537 complaints) at a risk score of 0.9999, followed closely by SchoolsFirst Federal Credit Union (72 complaints) with 0.9993, and State Employees Credit Union (288 complaints) at 0.9989. The authors note that the high score for SchoolsFirst, despite its modest complaint count, reflects the model’s reliance on distributional and response signatures rather than raw volume, reinforcing the claim that the predictor captures statistical patterns in public data rather than accusations of wrongdoing.
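The ranking logic behind that observation can be shown with the reported scores themselves: ordering by model risk score rather than complaint count is what lifts SchoolsFirst (72 complaints) above State Employees (288 complaints). The three tuples below use the figures from the March 16, 2026 run; the function is a trivial sketch, not the repository's code.

```python
# (company, complaint_count, risk_score) as reported for the live run.
predictions = [
    ("Complaints LL Holdings LLC",        9537, 0.9999),
    ("State Employees Credit Union",       288, 0.9989),
    ("SchoolsFirst Federal Credit Union",   72, 0.9993),
]

def rank_by_risk(preds):
    """Sort descending by model risk score, not by raw complaint volume."""
    return sorted(preds, key=lambda p: p[2], reverse=True)

ranked = rank_by_risk(predictions)
# SchoolsFirst (72 complaints) outranks State Employees (288 complaints),
# because the score tracks distributional and response signatures.
```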

Caveats temper the optimism. The matched enforcement dataset is small—only 26 of 213 actions linked via strict name matching—so fuzzy matching could potentially triple the positive cases. The test set comprises just 16 samples, inflating the best‑case F1 of 1.0 and raising concerns about over‑fitting. Moreover, the train‑test split was random rather than chronological, precluding temporal validation. The authors propose a version 2 that would train on data from 2017‑2021 and test on 2022‑2024 to address this limitation. Nonetheless, the mean F1 of 0.725 across multiple BO runs demonstrates a robust baseline for an entirely autonomous AI‑driven research pipeline.
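The proposed version-2 fix is a chronological rather than random split, and it is simple to state in code: partition records by year so the model never trains on events later than any test case. The record layout and cutoff year below are illustrative assumptions matching the 2017-2021 / 2022-2024 proposal.

```python
from datetime import date

def chronological_split(records, train_end_year=2021):
    """Proposed v2 validation: train on records through 2021,
    test on 2022 onward, so no test event predates any training event."""
    train = [r for r in records if r["date"].year <= train_end_year]
    test = [r for r in records if r["date"].year > train_end_year]
    return train, test

# Toy records, one per year; real data would be CFPB complaint/enforcement rows.
records = [{"date": date(y, 6, 1), "company": f"co{y}"}
           for y in range(2017, 2025)]
train, test = chronological_split(records)
# train covers 2017-2021, test covers 2022-2024
```

Unlike a random split, this layout prevents information from post-2021 enforcement patterns leaking into training, which is the leakage risk the authors flag.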

Sources

Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.
