vibetell checks whether a credential's
pattern is consistent with autoregressive generation.
Introduction
vibetell is an upcoming CLI tool designed to analyze credentials for indicators of LLM generation. It does not evaluate cryptographic randomness; instead, it identifies signatures of LLM generation that password strength tools often miss.
The blind spot
The gap in existing tools
Modern strength meters measure which characters appear, not how they are ordered, and LLM-generated credentials score highly on all of them. Autoregressive generation appears to leave a structural fingerprint that CSPRNG passwords rarely carry: consistent class-cycling patterns. vibetell's core metric, the Same-Class Transition rate (SCT), is built to measure exactly this. For the full methodology, see the paper.
G7$kL9#mQ2&xP4!wN8@v
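The example above cycles upper, digit, symbol, lower on every step. A minimal sketch of how SCT could be computed, assuming SCT is the fraction of adjacent character pairs that share a class (the paper's exact definition may differ):

```python
import string

def char_class(c: str) -> str:
    # Map a character to one of four broad classes.
    if c in string.ascii_lowercase:
        return "lower"
    if c in string.ascii_uppercase:
        return "upper"
    if c in string.digits:
        return "digit"
    return "symbol"

def sct(password: str) -> float:
    # Fraction of adjacent pairs whose characters share a class.
    pairs = list(zip(password, password[1:]))
    same = sum(char_class(a) == char_class(b) for a, b in pairs)
    return same / len(pairs)

print(sct("G7$kL9#mQ2&xP4!wN8@v"))  # → 0.0 (perfect class cycling)
```

The example credential never places two characters of the same class side by side, so its SCT is exactly zero, which is the signature this metric is built to catch.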
Signal distribution
How LLM and random passwords score
The two distributions barely overlap. LLMs cycle character types so rigidly that the vast majority of their passwords have zero same-class adjacent pairs, pulling the entire distribution to the left. Toggle to multi-layered to add the other signals that catch LLM-generated credentials which are harder to spot from structure alone.
False positive rate
How often vibetell flags a genuinely random password
When vibetell flags a password as LLM_LIKELY, almost no genuinely random passwords are caught in that net: fewer than 1 in 100,000. This low false positive rate makes vibetell a practical component of a larger credential auditing pipeline.
Why should I care?
LLMs are embedded in the tools that write your code: Copilot, Claude Code, ChatGPT. If a coding agent generates credentials for a .env file or a service configuration on its own, the result looks strong by every conventional measure. A report by security lab Irregular (2025) estimated that an LLM-generated password carries roughly 27 bits of realistic entropy despite appearing to have ~98 bits, a gap large enough to make brute force feasible. An attacker who guesses that a credential was LLM-generated can apply a mask brute-force attack and crack it in mere hours. vibetell is the first tool to detect whether a credential seems LLM-generated.
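The arithmetic behind that gap, as a rough illustration (the 27-bit and 98-bit figures are the report's estimates; the guess rate is an assumption chosen for illustration, not a measured number):

```python
# Illustrative arithmetic for the entropy gap described above.
apparent_bits = 98                 # what strength meters report
realistic_bits = 27                # the report's estimated effective entropy
guesses = 2 ** realistic_bits      # ~1.3e8 candidates to exhaust
rate = 10_000                      # guesses/sec (assumed throttled rate)
hours = guesses / rate / 3600
print(f"{guesses:,} guesses, ~{hours:.1f} hours at {rate:,}/s")
# A true 98-bit credential would need 2**(98 - 27) times more guesses.
```

Even at this deliberately slow rate, a 27-bit keyspace falls in an afternoon; offline GPU attacks would finish in seconds.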
How is vibetell useful?
- A new verification axis. Existing password strength tools measure one dimension: can this credential be guessed by a rule-based attack? None ask whether it has an anomalous structure. These are orthogonal questions. vibetell adds the missing axis to password strength assessment.
- Credential auditing. Scan codebases and config files for credentials silently generated by AI agents: ones that pass conventional strength checks but are actually weak. vibetell can integrate with secret-scanning tools like trufflehog.
- Verifying CSPRNG delegation. LLMs sometimes generate credentials directly instead of delegating to a secure generator as instructed. There is now a way to confirm that a key-generation task actually delegated to a secure generator.
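For reference, the delegation pattern vibetell is meant to confirm looks like this in Python (a sketch; any CSPRNG-backed generator serves the same purpose):

```python
import secrets
import string

# Correct delegation: draw every character from the OS CSPRNG
# instead of letting a model write the credential token by token.
alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
password = "".join(secrets.choice(alphabet) for _ in range(20))
```

A credential produced this way carries no class-cycling fingerprint, which is exactly what vibetell checks for.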
- Breach forensics. Knowing whether a leaked credential is structurally consistent with LLM generation can narrow down how it was created and what else the same system may have produced. Preliminary testing suggests model-specific quirks exist beyond the universal bias — particular character preferences and structural templates that differ by model family — an area of ongoing research.
- Research and education. A live demonstration of a measurable, reproducible structural failure mode in autoregressive generation — and a concrete illustration of why high apparent entropy and actual strength are not the same thing.
Can vibetell detect passwords from any LLM?
Yes. The structural signal (SCT) is parameter-free: it measures deviation from a mathematical baseline, not a fitted profile. The vocabulary signal (LLR) was built from Claude and GPT output but fires on models outside that set. Our leading hypothesis is that it captures not model-specific quirks but what instruction-tuned autoregressive generation does to character preferences in general. In the same sense that no one would say zxcvbn is "fitted to specific humans" because it was built on human password data, vibetell isn't "fitted to specific LLMs".
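The mathematical baseline mentioned above can be sketched directly. Assuming uniform draws from the 94 printable ASCII characters split into four classes (an assumed charset; the paper's exact baseline may differ), the expected same-class transition rate is:

```python
# Probability that two independent uniform draws land in the same class.
sizes = {"lower": 26, "upper": 26, "digit": 10, "symbol": 32}
total = sum(sizes.values())                       # 94 printable ASCII chars
p_same = sum(n * n for n in sizes.values()) / total ** 2
print(round(p_same, 3))  # → 0.28
```

A CSPRNG password's SCT should scatter around this value, while rigid class cycling pins it near zero, so the deviation needs no fitted parameters to detect.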
Why does this work on a single credential?
vibetell isn't measuring randomness — it's detecting indicators of autoregressive generation. One sample is enough when you know what pattern to look for, the same way malware signatures or EXIF metadata work. Most LLM passwords carry a specific, measurable structural fingerprint that genuinely random passwords almost never produce by chance.
Does this only apply to passwords?
The tool uses "password" throughout, but the detection applies to any gibberish-looking credential: API keys, secret tokens, .env values, signing keys, and so on. The structural bias is a property of how LLMs generate character sequences, not of how those sequences are used. If an LLM produced it, the fingerprint is there regardless of what it's called.
Why doesn't vibetell analyze passwords containing words or recognizable patterns?
vibetell is designed specifically for gibberish credentials — strings that look random to the eye. Passwords built from words, phrases, or word-plus-number combinations occupy a completely different structural space and require different detection methods. More importantly, they're already caught by existing tools: zxcvbn and similar analyzers are excellent at identifying dictionary words, keyboard walks, and predictable substitutions. The blind spot vibetell fills is the credential that defeats all of those checks — pure gibberish that scores maximum entropy everywhere yet was produced by an LLM.
What does INCONCLUSIVE mean?
No indicators of autoregressive generation were found. It does not mean the password is random or safe — the tool detects specific patterns, and their absence is honest silence, not a certificate of randomness.
Why LLM_POSSIBLE instead of LLM_LIKELY?
LLM_LIKELY requires both signals to agree. LLM_POSSIBLE means one fired without the other — usually the structure is LLM-like but the character choices don't match expected vocabulary, or vice versa. It's a real signal, not a near-miss.
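The verdict logic described above can be sketched as follows (illustrative names only, not vibetell's actual implementation):

```python
def verdict(structure_fired: bool, vocab_fired: bool) -> str:
    # Both signals agree -> high-precision verdict.
    if structure_fired and vocab_fired:
        return "LLM_LIKELY"
    # Exactly one signal fired -> a real signal, not a near-miss.
    if structure_fired or vocab_fired:
        return "LLM_POSSIBLE"
    # No indicators found: honest silence, not proof of randomness.
    return "INCONCLUSIVE"
```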
Could a genuine random password get flagged?
Yes, but rarely. At LLM_LIKELY, fewer than 1 in 100,000 genuinely random passwords are flagged, a threshold deliberately tuned for precision so that a LIKELY verdict can be acted on directly. At LLM_POSSIBLE the net is wider by design, catching more LLM passwords at the cost of a higher false positive rate of about 3 in 1,000; treat that figure as a rough lower bound.
Does it work on short passwords?
Detection degrades below 16 characters: fewer adjacent pairs means less structural signal. The tool stays conservative rather than false-alarming; at length 12, the FPR at LLM_LIKELY is still near zero. The minimum supported length is 12 characters. Below that, the tool returns a null verdict rather than guessing.
What if someone tries to evade detection?
The realistic threat is AI coding agents generating credentials silently, with no evasion intent. For deliberate evasion, the simplest path is to call secrets.token_urlsafe(), which vibetell correctly classifies as random, so the problem solves itself. In our testing, explicitly instructing models to avoid the pattern did not produce genuinely independent character choices.
Will this still work as models improve?
In our testing, the bias has been observed across different architectures, parameter scales, and labs. We don't have a clear answer to what would fix it short of training models to delegate credential generation to CSPRNG. Until then, we expect vibetell to correctly classify LLM-generated secrets in the vast majority of cases.
Is my password sent anywhere?
No. All analysis runs entirely in your browser; no data leaves your device. You can disconnect from the internet after loading the page and it will still work.