Releases: bk86a/PostalCode2NUTS

v0.14.0

03 Mar 13:09
4cd44c8

What's new

  • Add _meta block (version, date) to postal_patterns.json for change detection by external consumers
  • Surface patterns_version in /health endpoint response
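An external consumer could use the new _meta block to tell whether the patterns file changed since its last check. The following is a minimal sketch. It assumes the block sits at the top level of postal_patterns.json as {"_meta": {"version": ..., "date": ...}}. The helper names and the sample values are hypothetical. The same comparison could be run against patterns_version from /health, so the file itself does not need to be downloaded.

```python
import json
from typing import Optional


def read_patterns_meta(raw_json: str) -> dict:
    """Return the (assumed top-level) _meta block, or {} if the file predates it."""
    data = json.loads(raw_json)
    meta = data.get("_meta", {})
    return meta if isinstance(meta, dict) else {}


def patterns_changed(raw_json: str, last_seen_version: Optional[str]) -> bool:
    """True when _meta.version differs from the version recorded at the last check."""
    return read_patterns_meta(raw_json).get("version") != last_seen_version


# Illustrative payload only; real version/date values are not shown in these notes.
sample = '{"_meta": {"version": "example-1", "date": "example-date"}}'
print(patterns_changed(sample, None))         # True: nothing cached yet
print(patterns_changed(sample, "example-1"))  # False: unchanged since last check
```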

Closes #34

v0.13.0

23 Feb 20:35
1c9163a

What's Changed

Added

  • Automated test suite (#25): 69 pytest tests covering postal_patterns.py (preprocessing, tercet_map, extraction), data_loader.py (normalize functions, all 5 lookup tiers), and FastAPI endpoints (/lookup, /pattern, /health). CI now runs tests before publish.
  • Makefile (#24): standard targets for lint, format, test, run, docker-build, docker-run.
  • Pre-commit hooks (#24): ruff lint + format via .pre-commit-config.yaml.
  • requirements-dev.txt (#22): dev/test dependencies (ruff, bandit, pip-audit, pytest).
  • ruff format CI check (#24): enforces consistent code formatting in CI.

Changed

  • Centralized duplicated logic (#22): normalize_country() replaces duplicate GR→EL blocks, _db_connection() context manager replaces 6 manual SQLite connect/close patterns, _build_result() helper replaces repetitive result dict construction across all lookup tiers.
  • Narrowed exception handling (#23): 9 broad except Exception blocks in data_loader.py replaced with specific types (sqlite3.Error, httpx.RequestError, OSError, csv.Error, etc.). A previously silent catch in import_estimates.py now logs a message.
  • Return type hints added to dispatch() and _rate_limit_handler() in main.py.
  • Branch protection enabled on main: required status checks + PR reviews.

Full Changelog: v0.12.0...v0.13.0

v0.12.0

23 Feb 19:21
31c1be8

What's Changed

Fixed

  • MT regex (#14): the separator between the alpha prefix and the digits is now optional (MST1000 is accepted alongside MST 1000 and MST-1000). Previously, codes without a separator failed regex extraction and fell through to approximate matching with lower confidence.
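The relaxed MT rule can be illustrated with a single optional separator class. This pattern and the extract_mt helper are hypothetical and are not the shipped regex. The sketch assumes a 2–3 letter district prefix followed by 4 digits.

```python
import re
from typing import Optional

# Hypothetical pattern: 2-3 letter district prefix, an optional space or hyphen, 4 digits.
MT_PATTERN = re.compile(r"^([A-Z]{2,3})[ -]?(\d{4})$", re.IGNORECASE)


def extract_mt(raw: str) -> Optional[str]:
    """Return the code in 'ABC 1234' form, or None if the input does not match."""
    m = MT_PATTERN.match(raw.strip())
    if m is None:
        return None
    prefix, digits = m.groups()
    return f"{prefix.upper()} {digits}"


for raw in ("MST1000", "MST 1000", "MST-1000", "1043"):
    print(raw, "->", extract_mt(raw))
```

Digit-only input such as 1043 still fails here, which is the case the majority-vote fallback below addresses.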

Added

  • Country-level majority-vote fallback: a new Tier 4 in the lookup chain for countries where all postal codes map to the same NUTS1/NUTS2 but NUTS3 has a dominant winner. It returns match_type: "approximate" with NUTS1/NUTS2 confidence 1.0 and a NUTS3 confidence based on the agreement ratio (capped at 0.80). This naturally covers MT, where every code maps to MT0/MT00 and MT001 wins NUTS3 with ~77% agreement. As a result, digit-only MT codes like 1043 that previously returned 404 now get a valid approximate result.
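The following is a minimal sketch of the fallback, assuming the input is the list of NUTS3 codes for every known postal code in the country. The function name and the result keys are hypothetical. The release note does not state a threshold for a "dominant" winner, so this sketch does not enforce one.

```python
from collections import Counter
from typing import List, Optional

NUTS3_CONFIDENCE_CAP = 0.80  # cap stated in the release note


def country_majority_vote(nuts3_codes: List[str]) -> Optional[dict]:
    """Country-level fallback over the NUTS3 codes of all known postal codes.

    NUTS codes are hierarchical: NUTS1 = first 3 chars, NUTS2 = first 4, NUTS3 = 5.
    The fallback only applies when every code shares the same NUTS2 (hence NUTS1).
    """
    if not nuts3_codes or len({code[:4] for code in nuts3_codes}) != 1:
        return None
    winner, votes = Counter(nuts3_codes).most_common(1)[0]
    agreement = votes / len(nuts3_codes)
    return {
        "match_type": "approximate",
        "nuts1": winner[:3],
        "nuts2": winner[:4],
        "nuts3": winner,
        "nuts1_confidence": 1.0,
        "nuts2_confidence": 1.0,
        "nuts3_confidence": round(min(agreement, NUTS3_CONFIDENCE_CAP), 2),
    }
```

With a 77/23 split between MT001 and MT002, this returns MT001 with a NUTS3 confidence of 0.77. A 90/10 split would be capped at 0.80.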

Full Changelog: v0.11.0...v0.12.0

v0.11.0

23 Feb 19:14
7223267

Added

  • FR CEDEX estimates (#8): ~511 French CEDEX postal codes (enterprise/university mail routing) added to tercet_missing_codes.csv with high-confidence département→NUTS3 mappings.
  • FR DOM-TOM estimates (#9): 15 French overseas territory postal codes (Guadeloupe, Martinique, Guyane, La Réunion, Mayotte) added with high-confidence mappings. French Polynesia (987xx) and New Caledonia (988xx) excluded — these are OCTs with no valid NUTS mapping.
  • NL missing code estimates (#13): 8 Dutch postal codes for major cities (Amsterdam, The Hague, Utrecht, Maastricht, Arnhem, Apeldoorn, Zwolle) added with high-confidence mappings. Willemstad (3059) excluded — belongs to Curaçao, not the Netherlands.
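The French estimates are keyed on the département, which a French postal code carries in its leading digits. Mainland départements use two digits, and the 97x overseas départements use three. The following is a minimal sketch of that derivation. DEPT_TO_NUTS3 is a small illustrative subset using NUTS 2021 codes, not the project's CSV, and the helper names are hypothetical.

```python
from typing import Optional

# Illustrative subset of département -> NUTS3 (NUTS 2021); a full table covers every département.
DEPT_TO_NUTS3 = {
    "13": "FRL04",   # Bouches-du-Rhône
    "75": "FR101",   # Paris
    "971": "FRY10",  # Guadeloupe
    "972": "FRY20",  # Martinique
    "973": "FRY30",  # Guyane
    "974": "FRY40",  # La Réunion
    "976": "FRY50",  # Mayotte
}
# OCTs with no valid NUTS mapping: French Polynesia (987xx), New Caledonia (988xx).
NO_NUTS_PREFIXES = ("987", "988")


def departement(code: str) -> str:
    """Overseas codes (97x) carry a 3-digit département; mainland codes use 2 digits."""
    return code[:3] if code.startswith("97") else code[:2]


def fr_nuts3(code: str) -> Optional[str]:
    """Map a 5-digit French postal code (including CEDEX codes) to NUTS3, if known."""
    if code.startswith(NO_NUTS_PREFIXES):
        return None
    return DEPT_TO_NUTS3.get(departement(code))
```

Corsica (20xxx) is split into départements 2A and 2B and cannot be resolved from the prefix alone, so this sketch leaves it unmapped.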

v0.10.1 — Preprocessing order fix and regex relaxations

23 Feb 18:57
e023c59

Fixes

  • Preprocessing order: dot thousand-separator removal now runs before .0 stripping, so locale-formatted codes like 13.000 correctly become 13000 instead of 13 (regression from v0.10.0).
  • IE regex (#10): space between Eircode routing key and identifier is now optional — D02X285 accepted alongside D02 X285.
  • PT regex (#12): space accepted as separator between digit groups — 1000 001 alongside 1000-001 and 1000001.
  • NO (#11): closed as already handled — all regexes are compiled with re.IGNORECASE and input is uppercased before matching.
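The IE and PT relaxations amount to making the separator optional. The patterns below are hypothetical illustrations, not the shipped ones. The IE pattern simplifies the Eircode character set: a 3-character routing key (a letter plus two digits, or the special D6W) and a 4-character identifier. Input is uppercased before matching, mirroring the NO note above. The canonical output forms are chosen for illustration only.

```python
import re
from typing import Optional

# IE Eircode (simplified): routing key, optional whitespace, 4-char identifier.
IE_PATTERN = re.compile(r"^([A-Z]\d{2}|D6W)\s?([A-Z0-9]{4})$", re.IGNORECASE)
# PT: 4 digits, an optional hyphen or space, then 3 digits.
PT_PATTERN = re.compile(r"^(\d{4})[-\s]?(\d{3})$")


def normalize_ie(raw: str) -> Optional[str]:
    m = IE_PATTERN.match(raw.strip().upper())
    return f"{m.group(1)} {m.group(2)}" if m else None


def normalize_pt(raw: str) -> Optional[str]:
    m = PT_PATTERN.match(raw.strip())
    return f"{m.group(1)}-{m.group(2)}" if m else None


print(normalize_ie("D02X285"), normalize_pt("1000 001"))
```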

Backward compatibility

Fully backward compatible — all previously valid inputs continue to work. Only adds acceptance of additional input formats.

v0.10.0 — Input preprocessing for Excel artifacts

23 Feb 18:51
370514f

What's new

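Generic input preprocessing for postal codes mangled by Excel, CSV exports, or database dumps. Three steps are applied automatically before regex matching. The first two are country-agnostic, while leading-zero restoration uses the per-country expected_digits metadata described below: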

| Step | Problem | Example | Result |
|---|---|---|---|
| Strip .0 suffix | Excel stores numbers as floats | 28040.0 | 28040 |
| Remove dot thousands | Dot-as-thousand-separator formatting | 13.600 | 13600 |
| Restore leading zeros | Excel strips leading zeros from numbers | 8461 (ES) | 08461 |

This recovers an estimated 2,000–4,000 additional postal code mappings from real-world datasets without any changes to the curated regex patterns.

New metadata: expected_digits

A new expected_digits field in postal_patterns.json enables country-aware leading-zero restoration for 30 countries with fixed-length all-numeric postal codes. Countries with non-numeric formats (IE, MT, NL) are excluded.
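The three steps can be sketched as follows. This is a minimal illustration, not the project's _preprocess(). The regexes and the preprocess_sketch name are assumptions. The steps run in the order fixed in v0.10.1 (dot-thousands removal before .0 stripping), which differs from the order listed in the table above.

```python
import re
from typing import Optional

# Dot-as-thousands formatting such as "13.600" or "1.234.567".
_DOT_THOUSANDS = re.compile(r"^\d{1,3}(?:\.\d{3})+$")
# Excel float suffix such as ".0" or ".00".
_FLOAT_SUFFIX = re.compile(r"\.0+$")


def preprocess_sketch(raw: str, expected_digits: Optional[int] = None) -> str:
    code = raw.strip()
    # 1. Remove dot thousands separators first, so "13.000" becomes "13000"
    #    instead of having ".000" stripped as a float suffix (leaving "13").
    if _DOT_THOUSANDS.match(code):
        code = code.replace(".", "")
    # 2. Strip an Excel float suffix: "28040.0" -> "28040".
    code = _FLOAT_SUFFIX.sub("", code)
    # 3. Restore leading zeros for fixed-length all-numeric countries: "8461" -> "08461".
    if expected_digits and code.isdigit() and len(code) < expected_digits:
        code = code.zfill(expected_digits)
    return code
```

Correctly formatted input such as 08461 or D02 X285 matches none of the three rules and passes through unchanged.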

Backward compatibility

  • Fully backward compatible — correctly formatted postal codes pass through preprocessing unchanged
  • No regex patterns were modified
  • No API contract changes

Files changed

| File | Change |
|---|---|
| app/postal_patterns.py | New _preprocess() function, updated extract_postal_code() |
| app/postal_patterns.json | Added expected_digits to 30 country entries |
| app/__init__.py | Version bump to 0.10.0 |
| CHANGELOG.md | New 0.10.0 entry |
| README.md | Documented preprocessing steps and expected_digits |

Closes #16. Subsumes #15.

v0.9.0

20 Feb 08:57

Added

  • NUTS region names in /lookup responses: nuts1_name, nuts2_name, nuts3_name fields provide human-readable region names (Latin script) alongside NUTS codes. Names are sourced from the GISCO NUTS CSV distribution.
  • total_nuts_names field in /health endpoint showing how many region names are loaded.
  • NUTS names are cached in the SQLite DB (nuts_names table) for fast restarts.

Notes

  • Backward compatible: name fields default to null when names are unavailable. Existing clients that ignore unknown fields are unaffected.
  • Graceful degradation: if the NUTS names CSV cannot be downloaded, all name fields are null but lookups continue to work normally. Pre-0.9.0 SQLite caches (without the nuts_names table) remain fully valid.
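Graceful degradation can be sketched as a lookup that treats a missing row or a missing table as null. The schema below, nuts_names(code, name), is an assumption, and the helper names are hypothetical.

```python
import sqlite3
from typing import Optional


def nuts_name(conn: sqlite3.Connection, code: Optional[str]) -> Optional[str]:
    """Look up a region name; return None if the code, row, or table is missing."""
    if code is None:
        return None
    try:
        row = conn.execute(
            "SELECT name FROM nuts_names WHERE code = ?", (code,)
        ).fetchone()
    except sqlite3.OperationalError:
        # e.g. a pre-0.9.0 cache without the nuts_names table: degrade to null.
        return None
    return row[0] if row else None


def add_names(conn: sqlite3.Connection, result: dict) -> dict:
    """Attach nuts1_name/nuts2_name/nuts3_name alongside the NUTS codes."""
    for level in ("nuts1", "nuts2", "nuts3"):
        result[f"{level}_name"] = nuts_name(conn, result.get(level))
    return result
```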

v0.8.0 — Single-NUTS3 country fallback

19 Feb 14:31

What's Changed

Added a fourth lookup tier that guarantees a NUTS mapping for countries where the entire territory falls under a single NUTS3 region.

How it works

At startup, the service detects countries where every TERCET postal code maps to the same NUTS3 code. When tiers 1–3 (exact, estimated, approximate) all fail for such a country, tier 4 returns the sole NUTS3 region with confidence 1.0.

This is data-driven — no hardcoded country list. If TERCET data changes (e.g. a country gains a second NUTS3 region), the fallback automatically stops applying.
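The startup detection can be expressed as a group-by over (country, NUTS3) pairs that keeps countries with exactly one distinct region. This is a minimal sketch, and the input shape and function name are assumptions. Because the result is recomputed from the data on each start, a country with a second NUTS3 region drops out automatically.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_single_nuts3_countries(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each country whose TERCET codes all share one NUTS3 to that NUTS3.

    rows yields (country, nuts3) pairs, one per TERCET postal code.
    """
    regions: Dict[str, Set[str]] = defaultdict(set)
    for country, nuts3 in rows:
        regions[country].add(nuts3)
    return {
        country: next(iter(codes))
        for country, codes in regions.items()
        if len(codes) == 1
    }
```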

Countries affected

| Country | NUTS3 | Effect |
|---|---|---|
| LI (Liechtenstein) | LI000 | All postal codes now resolve (previously 14–16 unmapped) |
| CY (Cyprus) | CY000 | All postal codes now resolve (previously 32–97 unmapped) |
| LU (Luxembourg) | LU000 | All postal codes now resolve (previously 21–116 unmapped) |

Lookup tiers (updated)

  1. Exact — direct TERCET match → confidence 1.0
  2. Estimated — pre-computed from gap analysis → stored confidence
  3. Approximate — runtime prefix match + majority vote → calculated confidence
  4. Single-NUTS3 fallback (new) — country has one NUTS3 region → confidence 1.0
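The tier chain is a first-match dispatch: each tier either returns a result or defers to the next one. The following is a minimal sketch in which tiers are plain callables. The names are hypothetical. The SINGLE_NUTS3 dict stands in for the output of the startup detection and lists the countries from the table above.

```python
from typing import Callable, Dict, List, Optional

# A tier takes (country, postal_code) and returns a result dict or None.
Tier = Callable[[str, str], Optional[dict]]

SINGLE_NUTS3: Dict[str, str] = {"LI": "LI000", "CY": "CY000", "LU": "LU000"}


def single_nuts3_tier(country: str, postal_code: str) -> Optional[dict]:
    """Last-resort tier: the country's only NUTS3 region, with confidence 1.0."""
    nuts3 = SINGLE_NUTS3.get(country)
    return {"nuts3": nuts3, "confidence": 1.0} if nuts3 else None


def lookup(country: str, postal_code: str, tiers: List[Tier]) -> Optional[dict]:
    """Try tiers in priority order; the first non-None result wins."""
    for tier in tiers:
        result = tier(country, postal_code)
        if result is not None:
            return result
    return None


def no_match(country: str, postal_code: str) -> Optional[dict]:
    return None  # stand-in for tiers 1-3 (exact, estimated, approximate)


print(lookup("LU", "9999", [no_match, no_match, no_match, single_nuts3_tier]))
```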

v0.7.4 — Complete Malta postal district coverage

19 Feb 12:05

What's Changed

Resolved all 8 previously unmapped Malta postal district codes using address data from the ORS and BM databases, verified against OpenStreetMap/Nominatim geocoding.

Background

The TERCET team confirmed that Malta uses only higher-level postal districts — the 2–3 letter area prefix (e.g. VLT in VLT 1010) is the sole determinant for NUTS mapping. The numeric portion is a delivery point within the district.

Resolved codes

All 8 turned out to be typos or transpositions of official MaltaPost codes:

| Code | Actual locality | Official code | Island | NUTS3 |
|---|---|---|---|---|
| ARM | Armier (Mellieha) | MLH | Malta | MT001 |
| CMR | Luqa (AFM HQ) | LQA | Malta | MT001 |
| FLR | Floriana | FRN | Malta | MT001 |
| HRM | Hamrun | HMR | Malta | MT001 |
| MEC | Guardamangia, Pietà | PTA | Malta | MT001 |
| OTP | Tigne Point, Sliema | TPO | Malta | MT001 |
| TNX | Tarxien | TXN | Malta | MT001 |
| VTL | Victoria, Gozo | VCT | Gozo | MT002 |
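Because only the district prefix matters, resolving these codes reduces to extracting the prefix, applying the typo correction, and mapping the official code to NUTS3. The sketch below uses only the eight rows of the table above. A complete mapping would cover every MaltaPost district, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Observed (typo/transposed) code -> official MaltaPost code, from the table above.
MT_CORRECTIONS = {"ARM": "MLH", "CMR": "LQA", "FLR": "FRN", "HRM": "HMR",
                  "MEC": "PTA", "OTP": "TPO", "TNX": "TXN", "VTL": "VCT"}
# Official code -> NUTS3, limited to the districts listed in the table.
MT_NUTS3 = {"MLH": "MT001", "LQA": "MT001", "FRN": "MT001", "HMR": "MT001",
            "PTA": "MT001", "TPO": "MT001", "TXN": "MT001", "VCT": "MT002"}

_PREFIX = re.compile(r"^([A-Z]{2,3})")


def mt_nuts3(raw: str) -> Optional[str]:
    """Resolve a Malta code by its district prefix; the numeric part is ignored."""
    m = _PREFIX.match(raw.strip().upper())
    if m is None:
        return None
    prefix = MT_CORRECTIONS.get(m.group(1), m.group(1))
    return MT_NUTS3.get(prefix)
```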

Impact

Malta now has zero unmapped postal codes — all 17 estimated entries have full NUTS1/2/3 mappings.

v0.7.3

18 Feb 21:10

What's Changed

  • Health endpoint uncached: /health now explicitly sets Cache-Control: no-cache, no-store so edge caches always pass through to origin for live status checks