Releases: bk86a/PostalCode2NUTS
v0.14.0
v0.13.0
What's Changed
Added
- Automated test suite (#25): 69 pytest tests covering `postal_patterns.py` (preprocessing, tercet_map, extraction), `data_loader.py` (normalize functions, all 5 lookup tiers), and FastAPI endpoints (`/lookup`, `/pattern`, `/health`). CI now runs tests before publish.
- Makefile (#24): standard targets for `lint`, `format`, `test`, `run`, `docker-build`, `docker-run`.
- Pre-commit hooks (#24): ruff lint + format via `.pre-commit-config.yaml`.
- `requirements-dev.txt` (#22): dev/test dependencies (ruff, bandit, pip-audit, pytest).
- `ruff format` CI check (#24): enforces consistent code formatting in CI.
Changed
- Centralized duplicated logic (#22): `normalize_country()` replaces duplicate GR→EL blocks, the `_db_connection()` context manager replaces 6 manual SQLite connect/close patterns, and the `_build_result()` helper replaces repetitive result dict construction across all lookup tiers.
- Narrowed exception handling (#23): 9 broad `except Exception` blocks in `data_loader.py` replaced with specific types (`sqlite3.Error`, `httpx.RequestError`, `OSError`, `csv.Error`, etc.). A silent catch in `import_estimates.py` now logs a message.
- Return type hints added to `dispatch()` and `_rate_limit_handler()` in `main.py`.
- Branch protection enabled on `main`: required status checks + PR reviews.
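The refactors in #22 and #23 can be illustrated with a minimal sketch. The function names follow the notes above, but their signatures and bodies here are assumptions, not the project's code; `count_rows` is a hypothetical caller showing a narrowed `except sqlite3.Error` in place of a broad `except Exception`.

```python
import logging
import sqlite3
from contextlib import contextmanager
from typing import Iterator, Optional

logger = logging.getLogger(__name__)

# ISO 3166-1 uses GR for Greece; NUTS/Eurostat uses EL.
_COUNTRY_ALIASES = {"GR": "EL"}


def normalize_country(code: str) -> str:
    """Uppercase a country code and map ISO aliases to their NUTS prefix."""
    code = code.strip().upper()
    return _COUNTRY_ALIASES.get(code, code)


@contextmanager
def _db_connection(path: str) -> Iterator[sqlite3.Connection]:
    """Open a SQLite connection and always close it, even if the body raises."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()


def count_rows(path: str) -> Optional[int]:
    """Hypothetical caller: count cached names, logging only SQLite errors."""
    try:
        with _db_connection(path) as conn:
            return conn.execute("SELECT COUNT(*) FROM nuts_names").fetchone()[0]
    except sqlite3.Error as exc:  # narrowed from a broad `except Exception`
        logger.warning("could not read nuts_names: %s", exc)
        return None
```

A wrapper is needed because using a `sqlite3.Connection` directly in a `with` statement only commits or rolls back the transaction; it does not close the connection.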
Full Changelog: v0.12.0...v0.13.0
v0.12.0
What's Changed
Fixed
- MT regex (#14): the separator between the alpha prefix and the digits is now optional (`MST1000` accepted alongside `MST 1000` and `MST-1000`). Previously, codes without a separator failed regex extraction and fell through to approximate matching with lower confidence.
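An optional separator is a single `[ -]?` in the pattern. The sketch below assumes a 2–3 letter prefix followed by 4 digits; `MT_PATTERN` and `extract_mt` are hypothetical names, not the curated pattern in `postal_patterns.json`.

```python
import re
from typing import Optional

# Hypothetical MT pattern: 2-3 letter district prefix, optional space or hyphen, 4 digits.
MT_PATTERN = re.compile(r"^([A-Z]{2,3})[ -]?(\d{4})$", re.IGNORECASE)


def extract_mt(raw: str) -> Optional[str]:
    """Return the canonical 'AAA NNNN' form, or None if the input does not match."""
    m = MT_PATTERN.match(raw.strip())
    if m is None:
        return None
    return f"{m.group(1).upper()} {m.group(2)}"
```

Digit-only input such as `1043` still fails this pattern; v0.12.0 handles it with the country-level fallback instead.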
Added
- Country-level majority-vote fallback: new Tier 4 in the lookup chain for countries where all postal codes map to the same NUTS1/NUTS2 but NUTS3 has a dominant winner. Returns `match_type: "approximate"` with NUTS1/NUTS2 confidence 1.0 and NUTS3 confidence based on the agreement ratio (capped at 0.80). This naturally captures MT (MT0/MT00/MT001 at ~77%): digit-only MT codes like `1043` that previously returned 404 now get a valid approximate result.
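The vote can be sketched from the NUTS3 codes of all known postal codes in a country. This is an illustration under assumptions: the 0.80 cap is taken from the note above, but the `MIN_SHARE` threshold for a "dominant" winner and the result field names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

NUTS3_CONFIDENCE_CAP = 0.80  # cap stated in the release note
MIN_SHARE = 0.5  # hypothetical threshold for a "dominant" NUTS3 winner


def country_majority_fallback(nuts3_codes: List[str]) -> Optional[dict]:
    """Country-wide approximate result from the NUTS3 codes of all known postal codes."""
    if not nuts3_codes:
        return None
    # NUTS codes nest by prefix: 3 chars = NUTS1, 4 = NUTS2, 5 = NUTS3.
    # A single NUTS2 therefore implies a single NUTS1.
    if len({code[:4] for code in nuts3_codes}) != 1:
        return None
    winner, votes = Counter(nuts3_codes).most_common(1)[0]
    share = votes / len(nuts3_codes)
    if share < MIN_SHARE:
        return None
    return {
        "match_type": "approximate",
        "nuts1": winner[:3], "nuts1_confidence": 1.0,
        "nuts2": winner[:4], "nuts2_confidence": 1.0,
        "nuts3": winner, "nuts3_confidence": min(share, NUTS3_CONFIDENCE_CAP),
    }
```

With 77 of 100 codes on MT001, the result is MT0/MT00/MT001 with NUTS3 confidence 0.77. A 90% share would be capped at 0.80.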
Full Changelog: v0.11.0...v0.12.0
v0.11.0
Added
- FR CEDEX estimates (#8): ~511 French CEDEX postal codes (enterprise/university mail routing) added to `tercet_missing_codes.csv` with high-confidence département→NUTS3 mappings.
- FR DOM-TOM estimates (#9): 15 postal codes from the French overseas départements (Guadeloupe, Martinique, Guyane, La Réunion, Mayotte) added with high-confidence mappings. French Polynesia (987xx) and New Caledonia (988xx) are excluded: they are Overseas Countries and Territories (OCTs) with no valid NUTS mapping.
- NL missing code estimates (#13): 8 Dutch postal codes for major cities (Amsterdam, The Hague, Utrecht, Maastricht, Arnhem, Apeldoorn, Zwolle) added with high-confidence mappings. Willemstad (3059) excluded — belongs to Curaçao, not the Netherlands.
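The département→NUTS3 mappings above start from the département encoded in the postal code. The sketch below assumes the usual French convention: the first two digits for metropolitan codes (including CEDEX codes) and the first three for the overseas départements. `fr_departement` is a hypothetical helper. Corsica (2A/2B) is deliberately not handled, and only the five overseas départements named above are recognised.

```python
from typing import Optional

# Overseas départements named in the release note.
DOM_PREFIXES = {"971": "Guadeloupe", "972": "Martinique", "973": "Guyane",
                "974": "La Réunion", "976": "Mayotte"}
# French Polynesia and New Caledonia: OCTs with no NUTS mapping.
OCT_PREFIXES = {"987", "988"}


def fr_departement(postal_code: str) -> Optional[str]:
    """Return the département number encoded in a French postal code, or None."""
    code = postal_code.strip()
    if len(code) != 5 or not code.isdigit():
        return None
    prefix3 = code[:3]
    if prefix3 in OCT_PREFIXES:
        return None
    if code.startswith(("97", "98")):
        # Overseas: three-digit département, only the listed DOM are accepted.
        return prefix3 if prefix3 in DOM_PREFIXES else None
    # Metropolitan France; Corsica (2A/2B) is not split in this sketch.
    return code[:2]
```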
v0.10.1 — Preprocessing order fix and regex relaxations
Fixes
- Preprocessing order: dot thousands-separator removal now runs before `.0` stripping, so locale-formatted codes like `13.000` correctly become `13000` instead of `13` (regression from v0.10.0).
- IE regex (#10): the space between the Eircode routing key and the identifier is now optional, so `D02X285` is accepted alongside `D02 X285`.
- PT regex (#12): a space is accepted as the separator between digit groups, so `1000 001` is accepted alongside `1000-001` and `1000001`.
- NO (#11): closed as already handled; all regexes are compiled with `re.IGNORECASE` and input is uppercased before matching.
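Both relaxations reduce to making a separator optional. The patterns below are simplified assumptions, not the curated ones. The IE routing key is modelled as a letter plus two digits (or `D6W`), and the identifier as any 4 alphanumerics. `normalize_ie` and `normalize_pt` are hypothetical helpers that also show the uppercase + `re.IGNORECASE` combination mentioned for #11.

```python
import re
from typing import Optional

# Hypothetical simplified patterns.
IE_PATTERN = re.compile(r"^([A-Z]\d{2}|D6W) ?([0-9A-Z]{4})$", re.IGNORECASE)
PT_PATTERN = re.compile(r"^(\d{4})[- ]?(\d{3})$")


def normalize_ie(raw: str) -> Optional[str]:
    """Accept 'D02X285' or 'D02 X285' (any case) and return 'D02 X285'."""
    m = IE_PATTERN.match(raw.strip().upper())
    return f"{m.group(1)} {m.group(2)}" if m else None


def normalize_pt(raw: str) -> Optional[str]:
    """Accept '1000-001', '1000 001' or '1000001' and return '1000-001'."""
    m = PT_PATTERN.match(raw.strip())
    return f"{m.group(1)}-{m.group(2)}" if m else None
```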
Backward compatibility
Fully backward compatible — all previously valid inputs continue to work. Only adds acceptance of additional input formats.
v0.10.0 — Input preprocessing for Excel artifacts
What's new
Generic input preprocessing for postal codes mangled by Excel, CSV exports, or database dumps. Three country-agnostic steps are applied automatically before regex matching:
| Step | Problem | Example | Result |
|---|---|---|---|
| Strip `.0` suffix | Excel stores numbers as floats | `28040.0` | `28040` |
| Remove dot thousands separators | Dot used as thousands separator | `13.600` | `13600` |
| Restore leading zeros | Excel strips leading zeros from numbers | `8461` (ES) | `08461` |
This recovers an estimated 2,000–4,000 additional postal code mappings from real-world datasets without any changes to the curated regex patterns.
New metadata: `expected_digits`
A new `expected_digits` field in `postal_patterns.json` enables country-aware leading-zero restoration for 30 countries with fixed-length all-numeric postal codes. Countries with non-numeric formats (IE, MT, NL) are excluded.
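The three steps can be sketched as one function that takes the country's `expected_digits` value. This is a minimal illustration, not the project's `_preprocess()`: the name `preprocess`, its signature, and both regexes are assumptions. It applies the order adopted in v0.10.1 (dot thousands first). With `.0` stripping first, a pattern like `\.0+$` would turn `13.000` into `13`.

```python
import re
from typing import Optional

# Hypothetical patterns for the two formatting artefacts.
_DOT_THOUSANDS = re.compile(r"^\d{1,3}(\.\d{3})+$")  # e.g. "13.600"
_FLOAT_SUFFIX = re.compile(r"\.0+$")                 # e.g. "28040.0"


def preprocess(raw: str, expected_digits: Optional[int] = None) -> str:
    """Undo common Excel/CSV damage before regex matching."""
    code = raw.strip()
    # 1. Dot thousands separator: "13.600" -> "13600" (runs before step 2).
    if _DOT_THOUSANDS.match(code):
        code = code.replace(".", "")
    # 2. Float artefact: "28040.0" -> "28040".
    code = _FLOAT_SUFFIX.sub("", code)
    # 3. Leading zeros, only for fixed-length all-numeric countries.
    if expected_digits and code.isdigit() and len(code) < expected_digits:
        code = code.zfill(expected_digits)
    return code
```

Correctly formatted input passes through unchanged. For example, `D02 X285` is not all-digit, so step 3 never touches it.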
Backward compatibility
- Fully backward compatible — correctly formatted postal codes pass through preprocessing unchanged
- No regex patterns were modified
- No API contract changes
Files changed
| File | Change |
|---|---|
| `app/postal_patterns.py` | New `_preprocess()` function, updated `extract_postal_code()` |
| `app/postal_patterns.json` | Added `expected_digits` to 30 country entries |
| `app/__init__.py` | Version bump to 0.10.0 |
| `CHANGELOG.md` | New 0.10.0 entry |
| `README.md` | Documented preprocessing steps and `expected_digits` |
v0.9.0
Added
- NUTS region names in `/lookup` responses: `nuts1_name`, `nuts2_name`, `nuts3_name` fields provide human-readable region names (Latin script) alongside NUTS codes. Names are sourced from the GISCO NUTS CSV distribution.
- `total_nuts_names` field in the `/health` endpoint showing how many region names are loaded.
- NUTS names are cached in the SQLite DB (`nuts_names` table) for fast restarts.
Notes
- Backward compatible: name fields default to `null` when names are unavailable. Existing clients that ignore unknown fields are unaffected.
- Graceful degradation: if the NUTS names CSV cannot be downloaded, all name fields are `null` but lookups continue to work normally. Pre-0.9.0 SQLite caches (without the `nuts_names` table) remain fully valid.
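The `null` defaults can be sketched as a dictionary lookup that falls back to `None`, which JSON serialises as `null`. `attach_nuts_names` and the code field names `nuts1`/`nuts2`/`nuts3` are hypothetical. If the names CSV was never loaded, `names` is simply empty and every name field becomes `null`.

```python
import json
from typing import Dict


def attach_nuts_names(result: dict, names: Dict[str, str]) -> dict:
    """Add nutsN_name fields to a lookup result; missing names become None."""
    enriched = dict(result)
    for level in (1, 2, 3):
        code = result.get(f"nuts{level}")
        enriched[f"nuts{level}_name"] = names.get(code) if code else None
    return enriched


# Names unavailable (e.g. the CSV download failed): lookup still succeeds.
degraded = attach_nuts_names({"nuts1": "LU0", "nuts2": "LU00", "nuts3": "LU000"}, {})
print(json.dumps(degraded))
```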
v0.8.0 — Single-NUTS3 country fallback
What's Changed
Added a fourth lookup tier that guarantees a NUTS mapping for countries where the entire territory falls under a single NUTS3 region.
How it works
At startup, the service detects countries where every TERCET postal code maps to the same NUTS3 code. When tiers 1–3 (exact, estimated, approximate) all fail for such a country, tier 4 returns the sole NUTS3 region with confidence 1.0.
This is data-driven — no hardcoded country list. If TERCET data changes (e.g. a country gains a second NUTS3 region), the fallback automatically stops applying.
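The startup detection can be sketched as grouping TERCET rows by country and keeping countries with exactly one distinct NUTS3 code. `single_nuts3_countries` and the `(country, postal_code, nuts3)` row shape are assumptions. The sample rows are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def single_nuts3_countries(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Return {country: nuts3} for countries whose postal codes all share one NUTS3."""
    regions: Dict[str, Set[str]] = defaultdict(set)
    for country, _postal_code, nuts3 in rows:
        regions[country].add(nuts3)
    return {country: next(iter(codes)) for country, codes in regions.items() if len(codes) == 1}


rows = [
    ("LU", "1009", "LU000"),
    ("LU", "9999", "LU000"),
    ("MT", "VLT 1010", "MT001"),  # two NUTS3 regions, so MT is not included
    ("MT", "VCT 1000", "MT002"),
]
fallbacks = single_nuts3_countries(rows)
```

Because the map is rebuilt from the data on every start, a country that gains a second NUTS3 region drops out automatically.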
Countries affected
| Country | NUTS3 | Effect |
|---|---|---|
| LI (Liechtenstein) | LI000 | All postal codes now resolve — previously 14–16 unmapped |
| CY (Cyprus) | CY000 | All postal codes now resolve — previously 32–97 unmapped |
| LU (Luxembourg) | LU000 | All postal codes now resolve — previously 21–116 unmapped |
Lookup tiers (updated)
- Exact — direct TERCET match → confidence 1.0
- Estimated — pre-computed from gap analysis → stored confidence
- Approximate — runtime prefix match + majority vote → calculated confidence
- Single-NUTS3 fallback (new) — country has one NUTS3 region → confidence 1.0
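The tiers form an ordered chain in which the first tier that returns a result wins. The sketch below makes that explicit. The stub tiers and the `lookup` signature are hypothetical, and only the fallback tier returns anything, so a Luxembourg code resolves through tier 4.

```python
from typing import Callable, Optional, Sequence

# A tier takes (country, postal_code) and returns a result dict or None.
Tier = Callable[[str, str], Optional[dict]]


def lookup(country: str, postal_code: str, tiers: Sequence[Tier]) -> Optional[dict]:
    """Try each tier in order and return the first hit."""
    for tier in tiers:
        result = tier(country, postal_code)
        if result is not None:
            return result
    return None


# Hypothetical stubs standing in for tiers 1-3.
def exact(country: str, code: str) -> Optional[dict]:
    return None

def estimated(country: str, code: str) -> Optional[dict]:
    return None

def approximate(country: str, code: str) -> Optional[dict]:
    return None

SINGLE_NUTS3 = {"LU": "LU000"}  # as detected at startup

def single_nuts3_fallback(country: str, code: str) -> Optional[dict]:
    nuts3 = SINGLE_NUTS3.get(country)
    return {"nuts3": nuts3, "confidence": 1.0} if nuts3 else None

TIERS = [exact, estimated, approximate, single_nuts3_fallback]
```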
v0.7.4 — Complete Malta postal district coverage
What's Changed
Resolved all 8 previously unmapped Malta postal district codes using address data from the ORS and BM databases, verified against OpenStreetMap/Nominatim geocoding.
Background
The TERCET team confirmed that Malta uses only higher-level postal districts — the 2–3 letter area prefix (e.g. VLT in VLT 1010) is the sole determinant for NUTS mapping. The numeric portion is a delivery point within the district.
Resolved codes
All 8 turned out to be typos or transpositions of official MaltaPost codes:
| Code | Actual locality | Official code | Island | NUTS3 |
|---|---|---|---|---|
| ARM | Armier (Mellieha) | MLH | Malta | MT001 |
| CMR | Luqa (AFM HQ) | LQA | Malta | MT001 |
| FLR | Floriana | FRN | Malta | MT001 |
| HRM | Hamrun | HMR | Malta | MT001 |
| MEC | Guardamangia, Pietà | PTA | Malta | MT001 |
| OTP | Tigne Point, Sliema | TPO | Malta | MT001 |
| TNX | Tarxien | TXN | Malta | MT001 |
| VTL | Victoria, Gozo | VCT | Gozo | MT002 |
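Only the district prefix matters, so a lookup can ignore the numeric part, correct known misspellings, and then map the prefix. The correction table below is copied from the table above. `PREFIX_TO_NUTS3` is a hypothetical, deliberately partial map covering only those official codes. `malta_nuts3` is an illustrative helper, not the service's code.

```python
import re
from typing import Optional

# Misspelled prefix -> official MaltaPost prefix (from the table above).
PREFIX_CORRECTIONS = {"ARM": "MLH", "CMR": "LQA", "FLR": "FRN", "HRM": "HMR",
                      "MEC": "PTA", "OTP": "TPO", "TNX": "TXN", "VTL": "VCT"}
# Hypothetical partial prefix -> NUTS3 map (Malta island MT001, Gozo MT002).
PREFIX_TO_NUTS3 = {"MLH": "MT001", "LQA": "MT001", "FRN": "MT001", "HMR": "MT001",
                   "PTA": "MT001", "TPO": "MT001", "TXN": "MT001", "VCT": "MT002"}

_PREFIX = re.compile(r"^([A-Z]{2,3})")


def malta_nuts3(postal_code: str) -> Optional[str]:
    """Map a Maltese postal code to NUTS3 using only its letter prefix."""
    m = _PREFIX.match(postal_code.strip().upper())
    if m is None:
        return None
    prefix = PREFIX_CORRECTIONS.get(m.group(1), m.group(1))
    return PREFIX_TO_NUTS3.get(prefix)
```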
Impact
Malta now has zero unmapped postal codes — all 17 estimated entries have full NUTS1/2/3 mappings.