VectoPulse is a proof-of-concept behavioral intelligence & forensic stylometry engine built for OSINT analysts. It helps correlate anonymous or pseudonymous text to identify mass misinformants, sock puppets, coordinated personas, and cross-platform actors by analyzing how someone writes, not what they say. IPs can be spoofed. Accounts can be burned. VPNs are cheap. Typing behavior is not.
Disclaimer: VectoPulse produces probabilistic similarity signals, not identity confirmation. Results are suggestive and must never be treated as proof.
- 🧠 Cognitive Complexity Analysis (character-level Shannon entropy)
- 🧬 Structural DNA Mapping via trigram overlap
- ✍️ Punctuation Rhythm Profiling (subconscious symbol usage)
- 🌍 Multilingual by Design (tested with EN, SQ, DE, AR, ZH)
- 🖥️ High-contrast, analyst-friendly Streamlit UI
- 🔒 Offline processing: no data exfiltration
# 1. Clone the repository
git clone https://github.com/korabkeqekolla/vectopulse.git
cd vectopulse
# 2. Install dependencies
pip install -r requirements.txt
- Launch the application:
streamlit run vectopulse.py
# If you are on Windows, use:
python -m streamlit run vectopulse.py
- Paste Reference Text (A) and Suspect Text (B) from any source:
  - Forums
  - Comment sections
  - Paste sites
  - Social media posts
- Run Behavioral Correlation to receive:
  - Similarity Index (SI)
  - Entropy comparison
  - Punctuation alignment
  - Shared structural trigrams
⚠️ SI scores are directional indicators, not identity proof.
VectoPulse analyzes how a text is written rather than what is written. This allows analysts to detect behavioral patterns, cross-platform personas, coordinated accounts, and potential AI-generated content.
- Measures unpredictability of character sequences.
- Formula used: \[ H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) \] where \(P(x_i)\) is the probability of character \(x_i\) in the text.
- Human-written text tends to show author-specific entropy patterns.
- AI text often has smoother, more uniform entropy.
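The entropy formula above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the `entropy_similarity` mapping of two entropy values onto a 0 to 1 scale is an assumption, since the text does not say how the entropy comparison is scored.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy H(X), in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # P(x_i) is the relative frequency of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_similarity(a: str, b: str) -> float:
    """Assumed scoring: 1 minus the relative gap between the two entropies."""
    h_a, h_b = shannon_entropy(a), shannon_entropy(b)
    top = max(h_a, h_b)
    if top == 0:
        return 1.0  # both texts are empty or use a single repeated character
    return 1.0 - abs(h_a - h_b) / top
```

For example, `shannon_entropy("aabb")` is 1.0 bit (two equally likely characters) and `shannon_entropy("abcd")` is 2.0 bits (four equally likely characters).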
- Breaks text into 3-character sequences (trigrams).
- Overlap score calculated as: \[ \text{Trigram\_Similarity}(A,B) = \frac{|\text{Trigrams}(A) \cap \text{Trigrams}(B)|}{|\text{Trigrams}(A) \cup \text{Trigrams}(B)|} \]
- Humans exhibit repeated unconscious structures; AI tends to be more uniform.
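The overlap score above is the Jaccard index of the two trigram sets. A minimal sketch, assuming overlapping character trigrams and no case folding or whitespace cleanup (the text does not specify either); the function names are hypothetical:

```python
def char_trigrams(text: str) -> set[str]:
    """Set of all overlapping 3-character sequences in the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard index: shared trigrams divided by all distinct trigrams."""
    tri_a, tri_b = char_trigrams(a), char_trigrams(b)
    union = tri_a | tri_b
    if not union:
        return 0.0  # both texts are shorter than 3 characters
    return len(tri_a & tri_b) / len(union)
```

Because the score works on sets, a trigram that appears a hundred times counts the same as one that appears once. A frequency-weighted variant is possible, but it would be a different measure from the formula given here.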
- Analyzes frequency and placement of punctuation marks.
- Rhythm similarity is computed from the normalized frequency difference for each punctuation mark: \[ \text{PR\_Similarity} = 1 - \frac{|\text{Freq}_A - \text{Freq}_B|}{\max(\text{Freq}_A, \text{Freq}_B)} \]
- Humans have subconscious punctuation habits.
- AI may use overly consistent or "correct" punctuation.
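A sketch of the frequency part of this comparison follows. Several details are assumptions because the text leaves them open: frequencies are taken per character of text, only ASCII punctuation from `string.punctuation` is counted, the per-mark scores are averaged into one number, and placement is not modeled at all. The function names are hypothetical.

```python
import string
from collections import Counter

# Assumption: ASCII punctuation only. Arabic or Chinese marks would need
# a wider set (for example, one built from Unicode categories).
PUNCTUATION = set(string.punctuation)


def punctuation_frequencies(text: str) -> dict[str, float]:
    """Occurrences of each punctuation mark per character of text."""
    if not text:
        return {}
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    return {mark: n / len(text) for mark, n in counts.items()}


def pr_similarity(a: str, b: str) -> float:
    """Mean of 1 - |Freq_A - Freq_B| / max(Freq_A, Freq_B) over all marks seen."""
    freq_a, freq_b = punctuation_frequencies(a), punctuation_frequencies(b)
    marks = set(freq_a) | set(freq_b)
    if not marks:
        return 0.0  # no punctuation in either text, so there is no signal
    scores = []
    for mark in marks:
        x, y = freq_a.get(mark, 0.0), freq_b.get(mark, 0.0)
        # max(x, y) > 0 here because the mark occurs in at least one text.
        scores.append(1.0 - abs(x - y) / max(x, y))
    return sum(scores) / len(scores)
```

A mark used by only one of the two texts scores 0 for that mark, which pulls the average down sharply on short inputs.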
- Works across EN, SQ, DE, AR, ZH.
- Adjusts trigram and entropy calculations per language.
- Non-Latin scripts may have lower absolute similarity scores.
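The text does not describe how the per-language adjustment is done. As one possible starting point, a script-agnostic normalization pass could run before trigram and entropy extraction. The sketch below is an assumption, not the tool's documented behavior, and `normalize_text` is a hypothetical helper built on the standard-library `unicodedata` module.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize Unicode form, case, and whitespace before feature extraction."""
    # NFKC composes accents and folds compatibility forms such as
    # full-width Latin letters into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a stronger lower() meant for caseless comparison.
    text = text.casefold()
    # Collapse whitespace runs so layout differences do not create
    # spurious trigrams.
    return " ".join(text.split())
```

Whether to fold case is a judgment call: capitalization habits can themselves be a stylometric signal, so an analyst may prefer to keep case and normalize only the Unicode form.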
- Aggregates entropy, trigram overlap, and punctuation rhythm into a single score: \[ SI = w_1 \cdot \text{Entropy\_Similarity} + w_2 \cdot \text{Trigram\_Similarity} + w_3 \cdot \text{PR\_Similarity} \] where \(w_1, w_2, w_3\) are weighting coefficients.
- Provides a directional probability of behavioral similarity.
- Not a confirmation of identity.
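The weighted sum can be sketched as below. The default weights are placeholders chosen for illustration only, since the text does not state the values VectoPulse uses. Requiring non-negative weights that sum to 1 is also an assumption; it keeps SI in the 0 to 1 range when each component is in that range.

```python
def similarity_index(
    entropy_sim: float,
    trigram_sim: float,
    pr_sim: float,
    weights: tuple[float, float, float] = (0.3, 0.4, 0.3),  # hypothetical values
) -> float:
    """SI = w1 * Entropy_Similarity + w2 * Trigram_Similarity + w3 * PR_Similarity."""
    w1, w2, w3 = weights
    if min(weights) < 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w1 * entropy_sim + w2 * trigram_sim + w3 * pr_sim
```

With weights `(0.5, 0.25, 0.25)` and component scores 0.5, 0.25, and 1.0, the result is 0.25 + 0.0625 + 0.25 = 0.5625.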
- Human: uneven punctuation, unique sentence lengths, idiosyncratic trigram patterns.
- AI: uniform punctuation, consistent sentence length, smooth entropy.
- VectoPulse can indicate possible AI-generated or heavily templated text.
- All processing is done locally.
- No text leaves your machine.
- Input Reference Text (A) and Suspect Text (B).
- Compute entropy, trigrams, and punctuation rhythm.
- Output Similarity Index (SI).
- Analyst interprets directional signal for cross-platform linkage, coordinated activity, or AI text detection.
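The workflow above amounts to running each metric on the same pair of texts and combining the results. A minimal orchestration sketch, assuming each metric is a function that takes two strings and returns a score between 0 and 1; `run_correlation` and its report layout are hypothetical, not the tool's actual interface:

```python
from typing import Callable, Dict

Metric = Callable[[str, str], float]


def run_correlation(
    reference: str,
    suspect: str,
    metrics: Dict[str, Metric],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Score every metric on (A, B), then add the weighted Similarity Index."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must use the same names")
    report = {name: fn(reference, suspect) for name, fn in metrics.items()}
    # Sum over the metric names only, so the "SI" key is never fed back in.
    report["SI"] = sum(weights[name] * report[name] for name in metrics)
    return report
```

Returning the per-metric scores alongside SI lets the analyst see which signal drives a high or low index rather than reading a single number in isolation.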
- Validated as a behavioral correlation PoC, not an attribution engine.
- Tested across 5 languages with controlled cases.
- Same-author samples consistently score higher similarity.
- Works best with:
  - Medium-to-long texts
  - Informal writing
  - Repeated human interaction (comments, replies, forum posts)
- Short texts reduce signal strength.
- Topic overlap can inflate similarity.
- Similarity Index values are not comparable across languages.
- Matches punctuation rhythms, entropy, and trigrams.
- Indicates potential cross-platform linkage.
- Reveals repeated linguistic micro-patterns.
- Suggests single operator or coordinated group.
- Detects structural DNA similarity and cognitive complexity.
- Supports persona continuity hypothesis.
VectoPulse can illustrate behavioral correlation in real investigations using historical cases like Ted Kaczynski (the Unabomber):
Scenario:
- Reference Text (A): Kaczynski's manifesto and known letters
- Suspect Text (B): Anonymous letters of unknown origin
Step 1: Cognitive Complexity (Entropy)
- Compute Shannon entropy of character distributions in both texts.
- High similarity indicates comparable cognitive complexity and writing style.
Step 2: Structural DNA Mapping (Trigram Overlap)
- Extract 3-character sequences (trigrams) from both texts.
- Compute overlap using: \[ \text{Trigram\_Similarity}(A,B) = \frac{|\text{Trigrams}(A) \cap \text{Trigrams}(B)|}{|\text{Trigrams}(A) \cup \text{Trigrams}(B)|} \]
- Identifies repeated micro-patterns unique to the author.
Step 3: Punctuation Rhythm Profiling
- Analyze punctuation frequency and placement using: \[ \text{PR\_Similarity} = 1 - \frac{|\text{Freq}_A - \text{Freq}_B|}{\max(\text{Freq}_A, \text{Freq}_B)} \]
- Detects subconscious punctuation habits, e.g., dashes, semicolons, commas.
Step 4: Combined Similarity Index
- Weighted aggregation of entropy, trigram, and punctuation similarities: \[ SI = w_1 \cdot \text{Entropy\_Similarity} + w_2 \cdot \text{Trigram\_Similarity} + w_3 \cdot \text{PR\_Similarity} \]
- High SI suggests strong behavioral similarity between the texts.
Interpretation:
- Directional evidence supports the hypothesis that both texts may have been written by the same individual.
- Note: SI is a probabilistic indicator, not definitive proof.
Disclaimer:
- Used here for educational and illustrative purposes.
- VectoPulse complements other forensic evidence but does not replace legal or investigative procedures.
- Open research PoC
- Contributions welcome for:
  - Additional behavioral signals
  - Language-specific normalization
  - Adversarial testing & evaluation
MIT License. See the LICENSE file.
Korab Keqekolla // KING KOBRA II