Detecting homograph attacks

Threats from 4t2.co users.

The main threats to detect:

  1. Homograph/IDN attacks: using Unicode lookalikes (е vs e, ο vs o, etc.) to impersonate trusted domains. This is exactly what whale-phish exploits.
  2. Punycode domains: legitimate-looking URLs that are actually xn-- encoded internationalized domains.
  3. Known phishing/malware domains: checked against threat intel feeds.
  4. Suspicious patterns: brand names in subdomains/paths to hijack trust (e.g., paypal.evil.com), URL shorteners inside URL shorteners, newly registered domains, etc.
  5. Excessive redirects or shadowed domains.
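
As a rough illustration of threats 1 and 2, the sketch below decodes xn-- labels and flags any hostname label that mixes scripts. It is a minimal sketch, not the module's actual code: the function names are hypothetical, and using the first word of each character's Unicode name as its "script" is a simplifying assumption.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts used by the letters in one hostname label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Unicode names look like "CYRILLIC SMALL LETTER A"; the first
            # word is used here as a rough script tag (an assumption).
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def decode_label(label: str) -> str:
    """Decode a punycode (xn--) label to Unicode; return other labels unchanged."""
    if label.lower().startswith("xn--"):
        try:
            return label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return label  # malformed punycode: leave it for other checks
    return label


def is_mixed_script(hostname: str) -> bool:
    """True if any single label combines letters from more than one script."""
    return any(
        len(scripts_in_label(decode_label(label))) > 1
        for label in hostname.split(".")
    )


# "p\u0430ypal.com" uses Cyrillic "а" (U+0430) in place of Latin "a".
print(is_mixed_script("p\u0430ypal.com"))
print(is_mixed_script("paypal.com"))
```

A label written entirely in one non-Latin script would pass this check, so a mixed-script test alone does not cover whole-script lookalikes.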

I’ll build you a Python-based URL analysis module you can drop into your backend, covering all of these.

Now let me run the self-test to verify it works:

The homograph/confusable and punycode detection is working perfectly. The score display shows ? for the remaining rows because after 5 blocked attempts from the same test user ID, the ban manager kicks in (which is correct behavior!). Let me fix the test IDs and the deprecation warnings:

Edited 3 files, ran a command

Fix test to use unique user IDs per test case and fix datetime deprecation warnings (url_safety_checker.py, +21 -21)

Fix is_banned to use timezone-aware datetime (url_safety_checker.py, +6 -6)

Fix test to use unique user IDs per test to avoid hitting the ban limit mid-test (url_safety_checker.py, +9 -9)

Re-run with fixes applied (script)
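
The ban behavior and the timezone-aware fix described above can be sketched roughly as follows. This is an illustrative stand-in, not the real UserBanManager: the class name BanTracker, the one-hour ban duration, and the optional `now` parameter are all assumptions. The five-strike limit comes from the test output described earlier.

```python
from datetime import datetime, timedelta, timezone


class BanTracker:
    """Hypothetical in-memory stand-in for the module's ban manager."""

    def __init__(self, max_strikes=5, ban_duration=timedelta(hours=1)):
        self.max_strikes = max_strikes
        self.ban_duration = ban_duration  # assumed value, not from the source
        self._strikes = {}
        self._banned_until = {}

    def record_blocked(self, user_id, now=None):
        """Count one blocked submission and start a ban at the strike limit."""
        # datetime.now(timezone.utc) returns an aware datetime, unlike the
        # deprecated datetime.utcnow(), which returns a naive one.
        now = now or datetime.now(timezone.utc)
        self._strikes[user_id] = self._strikes.get(user_id, 0) + 1
        if self._strikes[user_id] >= self.max_strikes:
            self._banned_until[user_id] = now + self.ban_duration

    def is_banned(self, user_id, now=None):
        """True while the user's ban window is still open."""
        now = now or datetime.now(timezone.utc)
        until = self._banned_until.get(user_id)
        return until is not None and now < until
```

Using a fresh user ID for each test case, as in the fix above, keeps one case's blocked attempts from counting toward another's strike total.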

All working cleanly. Let me make two small tweaks. First, javascript: and data: URIs should be caught by the exotic scheme check before the parser runs. Second, excessive subdomains should block at score 35 (adjusting the threshold):

Edited 2 files, ran a command

Run exotic scheme check before parsing so JS/data URIs get the right score (url_safety_checker.py, +5 -2)

Remove duplicate exotic scheme check lower in the method (url_safety_checker.py, +1 -4)
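
A rough sketch of why the scheme check goes first: for a URL such as javascript:alert(1), urlsplit returns no hostname, so any host-based check that runs later has nothing to score. The function names, the allow-list, and the return shape below are assumptions for illustration, not the module's real interface.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allow-list for a URL shortener


def check_scheme(raw_url: str):
    """Return a reason string if the scheme is not allowed, else None."""
    candidate = raw_url.strip().lower()
    scheme, separator, _ = candidate.partition(":")
    if not separator:
        return "missing scheme"
    if scheme not in ALLOWED_SCHEMES:
        # Also rejects scheme-less input like "example.com:8080/x",
        # which this sketch treats as not an absolute URL.
        return f"exotic scheme: {scheme}"
    return None


def analyze(raw_url: str) -> dict:
    """Reject exotic schemes first, then hand the URL to the parser."""
    reason = check_scheme(raw_url)
    if reason:
        return {"blocked": True, "reason": reason}
    parts = urlsplit(raw_url.strip())
    return {"blocked": False, "host": parts.hostname}
```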

Final test run (script)

All 14 test cases behaving correctly. Clean URLs score 0, dangerous URLs are blocked with explanations.

Here’s a breakdown of what the module catches and how to use it:

Detection layers (scored 0–100, blocked at ≥50):

  • Homograph/mixed-script attacks — detects Cyrillic е next to Latin a, Greek ο next to Latin characters, etc. (the core whale-phish technique). Scores 85.
  • Visually confusable characters — a hardcoded map of ~100+ lookalikes (fullwidth ASCII, Cyrillic, Greek) directly matched against the hostname. Scores 80.
  • Punycode/IDN labels (xn--) — flags these and attempts to decode and recheck. Scores 60+.
  • Brand impersonation — 50+ targeted brands checked; flags them appearing in subdomains of unrelated domains. Scores 60.
  • Shortener-in-shortener — prevents redirect chain abuse via bit.ly, tinyurl, etc. Scores 55.
  • Private/loopback IP hosts — SSRF protection. Scores 90.
  • Credentials embedded in URLs (user:pass@host). Scores 70.
  • Double/null-byte encoding — evasion techniques. Scores 50–90.
  • Newline injection — HTTP header injection risk. Scores 90.
  • High-entropy paths — signals generated phishing URLs (warning only).
  • Suspicious TLDs (.tk, .xyz, etc.) — warning + 15 points.
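
To make the scoring model concrete, here is a minimal sketch of three of the layers above (embedded credentials, private/loopback hosts, suspicious TLDs) feeding one score with the block threshold of 50. The per-layer scores are taken from the list. The function name, the findings format, and the choice to sum scores and cap them at 100 are assumptions, since the list does not say how layers combine.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCK_THRESHOLD = 50
SUSPICIOUS_TLDS = {"tk", "xyz"}  # the two examples named in the list


def score_url(url: str) -> dict:
    """Score a URL against three illustrative layers and decide on blocking."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    score = 0
    findings = []

    # Credentials embedded in the URL (user:pass@host).
    if parts.username or parts.password:
        score += 70
        findings.append("credentials embedded in URL")

    # Literal IP hosts in private or loopback ranges (SSRF risk).
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not a literal IP address
    if ip is not None and (ip.is_private or ip.is_loopback):
        score += 90
        findings.append("private or loopback IP host")

    # Suspicious TLDs add a warning and 15 points, below the threshold.
    if ip is None and host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS:
        score += 15
        findings.append("suspicious TLD (warning)")

    score = min(score, 100)  # assumed combination rule: sum, capped at 100
    return {"score": score, "blocked": score >= BLOCK_THRESHOLD, "findings": findings}
```

This only recognizes IP hosts written in standard dotted or IPv6 form. Hosts given as a single decimal or hex number would need separate handling.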

Drop-in usage:

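```python
from url_safety_checker import ShortenerGuard  # assumes the class lives in this module

guard = ShortenerGuard()

# Inside the handler that receives a URL to shorten:
decision = guard.evaluate(user_id, submitted_url)
if decision["blocked"]:
    return 403, decision["message"]
```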

To connect to Google Safe Browsing / VirusTotal, pass an external_reputation_fn to ShortenerGuard and call check_async() instead.

Recommended next steps:

  1. Back UserBanManager with Redis (INCR + EXPIRE keys) for persistence across restarts
  2. Hook up Google Safe Browsing API (free tier, 10k/day) for known phishing domain lookup
  3. Consider logging all warnings (non-blocked) to a separate queue for manual review—those are borderline cases worth a human eye
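
For step 1, the INCR + EXPIRE pattern might look roughly like the sketch below. It is written against any client object that exposes incr, expire and get, which redis-py's Redis client does. The class names, the key format, and the window length are assumptions. The in-memory client is only there so the sketch runs without a Redis server, and it does not actually expire keys.

```python
class InMemoryClient:
    """Tiny stand-in exposing the incr/expire/get subset used below."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds  # recorded only; no real expiry here
        return True

    def get(self, key):
        value = self.values.get(key)
        return None if value is None else str(value).encode()


class RedisStrikeCounter:
    """Hypothetical persistent strike counter using INCR + EXPIRE keys."""

    def __init__(self, client, max_strikes=5, window_seconds=3600):
        self.client = client
        self.max_strikes = max_strikes
        self.window_seconds = window_seconds  # assumed window length

    def _key(self, user_id):
        return f"strikes:{user_id}"  # assumed key naming scheme

    def record_blocked(self, user_id) -> bool:
        """Count one blocked submission; return True once the user is banned."""
        count = self.client.incr(self._key(user_id))
        if count == 1:
            # Start the expiry window on the first strike only.
            self.client.expire(self._key(user_id), self.window_seconds)
        return count >= self.max_strikes

    def is_banned(self, user_id) -> bool:
        value = self.client.get(self._key(user_id))
        return value is not None and int(value) >= self.max_strikes


counter = RedisStrikeCounter(InMemoryClient())
# With redis-py installed, a real client could be passed in instead,
# for example RedisStrikeCounter(redis.Redis()).
```

INCR followed by EXPIRE is two separate commands, so a crash between them would leave a key with no expiry. A Lua script or a transaction is one way to close that gap.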

claude.ai
“Glad it’s useful! Good luck with 4t2.co: phishing via URL shorteners is a genuinely nasty attack vector, and it’s great you’re taking it seriously. If you ever want to extend it (e.g., Safe Browsing integration, a Redis-backed ban store, or an admin dashboard to review flagged URLs), just shout.” So any of those extensions can be done easily enough: just send this page and the file to Claude, which avoids spending more tokens on rebuilding the context.