Examples
LLM →
Random →
⚠ Contains a recognizable token ≥5 chars — credential is potentially a hybrid/passphrase. If it is, vibetell's analysis does not apply.
⚠ This password is shorter than 12 characters. vibetell requires at least 12 characters for reliable analysis.
Conventional Strength Metrics — what existing tools say
Entropy
Measures how unpredictable the character choices are. Higher means more varied characters. But a password that perfectly alternates character types can still score maximum entropy — it doesn't check ordering patterns.
Character-frequency information content
KeePass (Simplified)
Estimates strength based on what types of characters are used and how long the password is. It doesn't look at the order of characters — so a perfectly alternating LLM password scores the same as a truly random one.
Pool-size × length bit estimate
zxcvbn
Estimates how long it would take to crack a password by looking for common patterns like dictionary words, keyboard walks, and dates. It's great at catching human patterns, but sees LLM passwords as random noise and rates them as strong.
Pattern-matching strength estimator
(beta)
Enter a credential and press Analyze.
vibetell checks whether a credential's
pattern is consistent with autoregressive generation.
Signal Strength
Shows how many detection signals fired and how strongly they agree. More filled boxes = more evidence. This is not a probability — it shows what was measured, not how likely a conclusion is.
SCT Rate
How often two neighboring characters are the same type (uppercase, lowercase, digit, symbol). LLMs almost never repeat the same type back-to-back (~0.9%), while truly random passwords do it ~28% of the time.
threshold < 0.024
E[SCT]
What the SCT rate should be if the characters were randomly shuffled, based on this password's own mix of character types. Random passwords land close to this number; LLM passwords fall far below it.
expected under randomness
Delta (Δ)
The gap between the actual SCT and what's expected. Negative means the password avoids repeating character types more than chance would explain. LLMs average around −0.21; random passwords hover near zero.
SCT − E[SCT]
Z-Score
How far the SCT falls below what's normal, measured in standard deviations. More negative = more unusual. Values below −2.2 are very rare for random passwords. Shown for context.
std. devs from baseline
LLR Total
Compares the specific characters used against what a random generator would pick. Positive means the character choices look more like an LLM's preferences than pure randomness.
threshold > 0
LLR Digits
How LLM-like the digit choices are. LLMs tend to favor 9, 2, 7, 4 and avoid 0 and 1. Positive means the digits lean toward LLM habits.
digit class component
LLR Letters
How LLM-like the letter choices are. LLMs tend to pick m, v, x, n, L, Q, K and avoid letters like i, o, I, O. Positive means the letters lean toward LLM habits.
letter class component
LLR Symbols
How LLM-like the symbol choices are. LLMs heavily favor #, $, @, ! and rarely use characters like ;, ], [, or backtick. Positive means the symbols lean toward LLM habits.
symbol class component
Soft Indicators
Extra context clues that don't affect the verdict. Rare symbols and repeated characters are more common in random passwords, but can appear in LLM output too. For longer passwords, repeats are expected by chance.
info only, no verdict effect
Max Run
The longest streak of consecutive characters of the same type. LLMs almost never go above 1 (they switch types every character). Random passwords average about 3, humans about 4.
longest same-class streak
Classes
How many character types are present: uppercase, lowercase, digits, symbols. LLMs almost always use all 4. Random passwords and human-typed passwords sometimes use fewer.
U · l · D · S present
Active Path
Which combination of signals triggered the result. When both structure and vocabulary signals fire together, the evidence is strongest. A single signal alone is weaker but still noteworthy.
signals fired
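For the curious, the structural metrics above (SCT, E[SCT], Delta, Z-Score) can be computed in a few lines. This is an illustrative sketch, not vibetell's implementation; the z-score here uses a binomial approximation that ignores dependence between overlapping character pairs:

```python
import math

def char_class(c):
    # Four classes: uppercase, lowercase, digit, everything else as symbol.
    if c.isupper(): return "U"
    if c.islower(): return "l"
    if c.isdigit(): return "D"
    return "S"

def sct_stats(pw):
    classes = [char_class(c) for c in pw]
    n = len(classes)
    pairs = n - 1
    same = sum(a == b for a, b in zip(classes, classes[1:]))
    sct = same / pairs
    # E[SCT]: probability two randomly placed characters share a class,
    # computed from this password's own class mix.
    counts = {k: classes.count(k) for k in set(classes)}
    e_sct = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    delta = sct - e_sct
    # Z-score under a simple binomial approximation.
    var = e_sct * (1 - e_sct) / pairs
    z = delta / math.sqrt(var) if var > 0 else 0.0
    return sct, e_sct, delta, z

sct, e, d, z = sct_stats("G7$kL9#mQ2&xP4!wN8@v")
print(f"SCT={sct:.3f}  E[SCT]={e:.3f}  Δ={d:.3f}  Z={z:.2f}")
# The demo credential cycles U→D→S→l perfectly: SCT is 0.000,
# Δ is about −0.21, and Z falls below −2.2.
```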

Introduction

vibetell is an upcoming CLI tool that analyzes credentials for indicators of LLM generation. It does not evaluate cryptographic randomness; instead, it identifies signatures of LLM generation that password strength tools often miss.

The blind spot

The gap in existing tools

Modern strength meters measure what characters appear, not how they are ordered, and LLM-generated credentials score highly on all of them. Autoregressive generation appears to leave a structural fingerprint that CSPRNG passwords rarely carry — consistent class-cycling patterns. vibetell's core metric, the Same-Class Transition rate (SCT), is built to measure exactly this. For the full methodology, see the paper.

G7$kL9#mQ2&xP4!wN8@v

Entropy
6.49 bits/char · 99% of theoretical max Strong ✓
KeePass (Simplified)
128.5 bits · pool 94 × length 20 Very strong ✓
zxcvbn
Score 4/4 · centuries to crack offline Very strong ✓
vibetell
SCT 0.000 · LLR +19.41 (class mix matches LLM profile) LLM_LIKELY ✗
Apparent entropy ~105 bits what strength meters see
Minus class structure ~92 bits assumption of rigid cycling
With known biases ~73 bits in reality, it's even less secure.
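The entropy collapse shown above can be reproduced approximately with round class sizes (26 uppercase, 26 lowercase, 10 digits, 32 symbols). These are illustrative figures that land near, but not exactly on, the numbers above, which use vibetell's own accounting:

```python
import math

# Pool-size × length estimate: what a conventional meter sees for a
# 20-character password drawn from 94 printable ASCII characters.
apparent_bits = 20 * math.log2(94)

# If an attacker knows the class sequence cycles rigidly
# (U → D → S → l, five times), each position is a choice within
# one class only. Illustrative class sizes: 26/26/10/32.
class_bits = 5 * (math.log2(26) + math.log2(10) + math.log2(32) + math.log2(26))

print(f"apparent:        {apparent_bits:.1f} bits")
print(f"known structure: {class_bits:.1f} bits")
```

Known character biases (the LLR tables above) shrink the effective per-class choice further, which is where the final row's estimate comes from.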

Signal distribution

How LLM and random passwords score

LLM-generated CSPRNG
← more LLM-like more random →

The two distributions barely overlap. LLMs cycle character types so rigidly that the vast majority of their passwords have zero same-class adjacent pairs, pulling the entire distribution to the left. Toggle to multi-layered to add other signals that catch LLM-generated credentials that structure alone would miss.

False positive rate

How often vibetell flags a genuinely random password

Fixed length · 1M passwords <0.001% Fewer than 1 in 100,000 genuine random passwords flagged as LLM_LIKELY.
Mixed lengths 12–128 · 1M passwords <0.001% Same result across a wide range of lengths.

When vibetell flags a password as LLM_LIKELY, almost no genuine random passwords are caught in that net — fewer than 1 in 100,000. This low false positive rate makes vibetell a practical component of a larger credential auditing pipeline.

Why should I care?

LLMs are embedded in the tools that write your code — Copilot, Claude Code, ChatGPT. If a coding agent generates credentials for a .env file or a service configuration on its own, the result looks strong by every conventional measure. A report by security lab Irregular ↗ (2025) estimated an LLM-generated password carries roughly 27 bits of realistic entropy despite appearing to have ~98 bits — a gap large enough to make brute-force feasible. An attacker who guesses a credential was LLM-generated can apply a mask brute-force attack and crack it in mere hours. vibetell is the first tool to detect whether a credential seems LLM-generated.
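The 27-versus-98-bit gap can be put in time terms with a back-of-the-envelope calculation. The guess rate below is an assumption (roughly a single rig against a slow salted hash such as bcrypt), not a figure from the report:

```python
SECONDS_PER_YEAR = 31_557_600
rate = 10_000.0  # guesses/sec — assumed rig speed against a slow salted hash

for bits in (98, 27):
    # Worst case: exhausting the full keyspace at the assumed rate.
    seconds = 2**bits / rate
    print(f"{bits} bits: {seconds/3600:.2f} hours ({seconds/SECONDS_PER_YEAR:.3g} years)")
```

At ~98 bits the keyspace is unreachable; at ~27 bits it falls to a few hours even against a deliberately slow hash, which is what makes the mask attack practical.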

How is vibetell useful?

  1. A new verification axis. Existing password strength tools measure one dimension: can this credential be guessed by a rule-based attack? None ask whether it has an anomalous structure. These are orthogonal questions. vibetell adds the missing axis to password strength assessment.
  2. Credential auditing. Scan codebases and config files for credentials silently generated by AI agents — ones that pass conventional strength checks but are actually weak. This can integrate with tools like trufflehog.
  3. Verifying CSPRNG delegation. LLMs sometimes generate credentials directly instead of delegating to a secure generator as instructed. There is now a way to confirm that a key-generation task was actually delegated to a secure generator.
  4. Breach forensics. Knowing whether a leaked credential is structurally consistent with LLM generation can narrow down how it was created and what else the same system may have produced. Preliminary testing suggests model-specific quirks exist beyond the universal bias — particular character preferences and structural templates that differ by model family — an area of ongoing research.
  5. Research and education. A live demonstration of a measurable, reproducible structural failure mode in autoregressive generation — and a concrete illustration of why high apparent entropy and actual strength are not the same thing.
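As a sketch of the auditing use case in point 2, here is a minimal pre-filter that walks KEY=value lines in a .env file and flags values whose same-class transition rate falls below the structural threshold. The regex, function names, and single-signal shortcut are illustrative; this is not vibetell's scanner:

```python
import re

def char_class(c):
    if c.isupper(): return "U"
    if c.islower(): return "l"
    if c.isdigit(): return "D"
    return "S"

def looks_llm_generated(value, threshold=0.024):
    # Structural pre-filter only: flag when the SCT rate is below threshold.
    if len(value) < 12:  # below vibetell's minimum supported length
        return False
    classes = [char_class(c) for c in value]
    same = sum(a == b for a, b in zip(classes, classes[1:]))
    return same / (len(classes) - 1) < threshold

def scan_env(text):
    # Hypothetical .env scanner: uppercase KEY=value lines, quotes stripped.
    hits = []
    for line in text.splitlines():
        m = re.match(r"\s*([A-Z0-9_]+)=['\"]?([^'\"\s]+)", line)
        if m and looks_llm_generated(m.group(2)):
            hits.append(m.group(1))
    return hits

sample = 'DB_PASSWORD="G7$kL9#mQ2&xP4!wN8@v"\nAPP_ENV=production\n'
print(scan_env(sample))  # → ['DB_PASSWORD']
```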

Can vibetell detect passwords from any LLM?

Yes. The structural signal (SCT) is parameter-free — it measures deviation from a mathematical baseline, not a fitted profile. The vocabulary signal (LLR) was built from Claude and GPT output but fires on models outside that set. Our leading hypothesis is that it's not capturing model-specific quirks — it's capturing what instruction-tuned autoregressive generation does to character preferences in general. In the same sense that no one would say zxcvbn is "fitted to specific humans" because it was built on human password data, vibetell isn't "fitted to specific LLMs".

Why does this work on a single credential?

vibetell isn't measuring randomness — it's detecting indicators of autoregressive generation. One sample is enough when you know what pattern to look for, the same way malware signatures or EXIF metadata work. Most LLM passwords carry a specific, measurable structural fingerprint that genuinely random passwords almost never produce by chance.

Does this only apply to passwords?

The tool uses "password" throughout, but the detection applies to any gibberish-looking credential — API keys, secret tokens, .env values, signing keys, and so on. The structural bias is a property of how LLMs generate character sequences, not of how those sequences are used. If an LLM produced it, the fingerprint is there regardless of what it's called.

Why doesn't vibetell analyze passwords containing words or recognizable patterns?

vibetell is designed specifically for gibberish credentials — strings that look random to the eye. Passwords built from words, phrases, or word-plus-number combinations occupy a completely different structural space and require different detection methods. More importantly, they're already caught by existing tools: zxcvbn and similar analyzers are excellent at identifying dictionary words, keyboard walks, and predictable substitutions. The blind spot vibetell fills is the credential that defeats all of those checks — pure gibberish that scores maximum entropy everywhere yet was produced by an LLM.

What does INCONCLUSIVE mean?

No indicators of autoregressive generation were found. It does not mean the password is random or safe — the tool detects specific patterns, and their absence is honest silence, not a certificate of randomness.

Why LLM_POSSIBLE instead of LLM_LIKELY?

LLM_LIKELY requires both signals to agree. LLM_POSSIBLE means one fired without the other — usually the structure is LLM-like but the character choices don't match expected vocabulary, or vice versa. It's a real signal, not a near-miss.
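The verdict logic described here and in the neighboring answers can be sketched as a simple combination rule. The thresholds mirror the panel above (SCT below 0.024, LLR above 0); the function itself is illustrative, not vibetell's implementation:

```python
def verdict(sct, llr, sct_threshold=0.024):
    # Structure signal: same-class transition rate far below random baseline.
    structure = sct < sct_threshold
    # Vocabulary signal: character choices lean toward LLM preferences.
    vocabulary = llr > 0
    if structure and vocabulary:
        return "LLM_LIKELY"      # both signals agree
    if structure or vocabulary:
        return "LLM_POSSIBLE"    # one fired without the other
    return "INCONCLUSIVE"        # no indicators found

print(verdict(sct=0.000, llr=19.41))  # the demo credential → LLM_LIKELY
```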

Could a genuine random password get flagged?

Yes, but rarely. At LLM_LIKELY fewer than 1 in 100,000 genuinely random passwords are flagged — a threshold deliberately tuned for precision, so that a LIKELY verdict can be acted on directly. At LLM_POSSIBLE the net is wider by design, catching more LLM passwords at the cost of a higher false positive rate of about 3 in 1,000. This is a rough lower bound.

Does it work on short passwords?

Detection degrades below 16 characters — fewer adjacent pairs means less structural signal. The tool stays conservative rather than false-alarming: at length 12, the FPR at LLM_LIKELY is still near zero. The minimum supported length is 12 characters. Below that, the tool returns a null verdict rather than guessing.

What if someone tries to evade detection?

The realistic threat is AI coding agents generating credentials silently, with no evasion intent. For deliberate evasion, the simplest path is just calling secrets.token_urlsafe() — which vibetell correctly classifies as random, so the problem solves itself. In our testing, explicitly instructing models to avoid the pattern didn't eliminate it.
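The remediation mentioned above, delegating to a CSPRNG, is a few lines with Python's standard secrets module:

```python
import secrets
import string

# Delegate credential generation to the OS CSPRNG instead of the model.
token = secrets.token_urlsafe(24)  # 24 random bytes → 32 URL-safe chars, 192 bits

# Or build a password from an explicit alphabet with a CSPRNG.
alphabet = string.ascii_letters + string.digits + "!@#$%"
password = "".join(secrets.choice(alphabet) for _ in range(20))

print(token)
print(password)
```

A credential produced this way has no class-cycling fingerprint for vibetell's structural signal to find.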

Will this still work as models improve?

In our testing, the bias has been observed across different architectures, parameter scales, and labs. We don't have a clear answer to what would fix it short of training models to delegate credential generation to CSPRNG. Until then, we expect vibetell to correctly classify LLM-generated secrets in the vast majority of cases.

Is my password sent anywhere?

No. All analysis runs entirely in your browser. No data leaves your device. You can disconnect your internet after loading the page and it will still work.