Anti-AI detection: how Chesky verifies case study authenticity
Our detection engine reaches 94% confidence on held-out test sets. Here's exactly how it works — and why that accuracy number matters more than you might think.
When a candidate submits a case study, you're making an assumption: that the work represents their thinking. That assumption is increasingly untenable. In our beta testing, we found that roughly 31% of case study submissions showed significant AI generation signals.
This isn't a moral judgment about AI use — it's a measurement problem. If you're trying to evaluate a candidate's reasoning and communication ability, and the work was generated by GPT-4, you're evaluating the model, not the candidate.
What we detect
Chesky's detection engine analyzes four signal categories:
- Lexical patterns: vocabulary distribution, sentence structure variation, punctuation density
- Semantic patterns: topic coherence, reasoning depth, logical sequencing
- Stylistic consistency: comparison against the candidate's other written work, where available
- Behavioral signals: submission timing, revision patterns, time-to-complete vs. expected duration
No single signal is conclusive. A candidate who writes in clean, well-structured prose isn't necessarily using AI — they may simply be a strong writer. Our engine looks for converging evidence across all four categories before producing a confidence score.
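One way to picture "converging evidence" is a scoring rule where disagreement between categories suppresses the final score. The sketch below is purely illustrative: the four category names come from this article, but the function, weights, and aggregation rule are assumptions, not Chesky's actual engine.

```python
def ai_confidence(signals: dict[str, float]) -> float:
    """Combine per-category signal strengths (each 0.0-1.0) into one score.

    Illustrative rule: start from the mean, then penalize spread, so one
    anomalous category (e.g. unusually clean prose pushing the lexical
    signal up) cannot drive a high score on its own.
    """
    categories = ["lexical", "semantic", "stylistic", "behavioral"]
    values = [signals[c] for c in categories]
    mean = sum(values) / len(values)
    spread = max(values) - min(values)  # disagreement between categories
    return max(0.0, mean - 0.5 * spread)

# A strong writer: high lexical signal, but nothing else converges.
solo = ai_confidence({"lexical": 0.9, "semantic": 0.3,
                      "stylistic": 0.2, "behavioral": 0.2})

# Converging evidence: all four categories point the same way.
converged = ai_confidence({"lexical": 0.9, "semantic": 0.85,
                           "stylistic": 0.8, "behavioral": 0.9})
```

Here `solo` lands near zero while `converged` lands above 0.8, which is the behavior the prose describes: clean writing alone is not enough to flag anyone.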
The 94% number in context
On our held-out test set, 94% accuracy leaves roughly a 6% error rate, and some of those errors are false positives: genuine work occasionally flagged as AI-generated. Before you use that to dismiss the tool, consider the alternative. Without any detection, your false negative rate is close to 100%: you catch almost nothing. Imperfect detection at 94% accuracy is orders of magnitude better than that status quo.
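To make the trade-off concrete, here is some back-of-the-envelope arithmetic. It assumes, purely for illustration, that the 94% figure can be read as a 94% true positive rate with a 6% false positive rate, and it reuses the ~31% AI-generation base rate from our beta testing; real operating points will differ.

```python
def expected_flags(n: int, base_rate: float, tpr: float, fpr: float):
    """Expected flag counts for n submissions at a given base rate.

    Illustrative assumption: tpr = true positive rate on AI-generated
    work, fpr = false positive rate on genuine work.
    """
    ai_count = n * base_rate
    genuine_count = n - ai_count
    true_flags = ai_count * tpr        # AI-generated work, correctly flagged
    false_flags = genuine_count * fpr  # genuine work, wrongly flagged
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision

tp, fp, precision = expected_flags(100, 0.31, 0.94, 0.06)
```

Under these assumptions, out of 100 submissions you would flag about 29 of the 31 AI-generated ones and wrongly flag about 4 genuine ones, so roughly 7 in 8 flags are correct. Good odds, but exactly why a flag should open a conversation rather than close a candidacy.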
We surface the confidence score alongside the flag, never as a binary pass/fail. A 71% AI confidence score should prompt a follow-up conversation, not automatic rejection. A 96% score is a much stronger signal. The decision remains with your team — we provide the evidence.
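A team consuming these scores might encode that guidance as a simple triage policy. The sketch below is hypothetical: the 71% and 96% examples come from this article, but the cutoffs and action labels are illustrative choices, not Chesky defaults.

```python
def triage(confidence: float) -> str:
    """Map an AI-confidence score (0.0-1.0) to a suggested next step.

    Thresholds are illustrative; the score is evidence for a human
    decision, never an automatic pass/fail.
    """
    if confidence >= 0.90:
        return "strong signal: schedule a live walkthrough of the case study"
    if confidence >= 0.60:
        return "follow-up: ask the candidate to explain key decisions"
    return "no action: treat the submission as authentic"

triage(0.71)  # falls in the follow-up band
triage(0.96)  # falls in the strong-signal band
```

Note that even the highest band routes to a human step (a live walkthrough), keeping the final call with your team.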
Why this matters for culture
Companies building high-performance remote cultures are selecting for people who do original thinking under pressure — not people who are good at prompting AI tools. The case study is one of the few evaluation moments where you can directly observe a candidate's reasoning process. Protecting the integrity of that signal matters.