
feat: add KR OHLCV ingest pipeline with scheduled precompute#181

Open
robin-watcha wants to merge 1 commit into main from codex/kr-ohlcv-ingest-schedule-only

Conversation


@robin-watcha (Collaborator) commented Feb 22, 2026

Summary

  • add TimescaleDB KR OHLCV ingest foundation with 1m hypertable and 1h CAGG migrations
  • add scheduled KR precompute jobs/tasks for incremental and nightly sync
  • add ingestion services for exchange-aware upsert, quarantine handling, and trading-calendar session bounds
  • wire scheduler/taskiq modules and config flags needed for ingest-only rollout
  • add focused tests for migrations, precompute job flow, and timeseries store behavior

Test Plan

  • uv run pytest --no-cov tests/test_kr_cagg_migration.py tests/test_kr_ohlcv_precompute.py tests/test_kr_ohlcv_timeseries_store.py -q
  • uv run pytest --no-cov tests/test_mcp_server_tools.py::test_get_quote_korean_equity tests/test_routers.py::TestHealthRouter::test_health_check -q

Summary by CodeRabbit

  • New Features

    • Added Korean market OHLCV time-series data infrastructure with TimescaleDB support.
    • Implemented automated data precomputation for Korean market candlestick data (1-minute and 1-hour intervals).
    • Added support for multiple Korean trading routes with dual-write capabilities.
    • Introduced configuration flags for canary testing and data validation modes.
  • Chores

    • Enhanced data ingestion and storage with quarantine handling for invalid records.
    • Added periodic scheduled tasks for incremental and nightly data updates.


coderabbitai bot commented Feb 22, 2026

📝 Walkthrough

Walkthrough

Introduces a comprehensive Korean OHLCV (Open-High-Low-Close-Volume) time-series infrastructure built on TimescaleDB. Includes three database migrations establishing tables and materialized views, new services for data normalization and storage, precomputation jobs with incremental and nightly scheduling, configuration flags for dual-route handling, and extensive test coverage for migrations, precomputation logic, and storage operations.

Changes

Cohort / File(s): Summary

  • Database Migrations (alembic/versions/c4d9e2a8f7b1_*, alembic/versions/c9e4f5b8a2d1_*, alembic/versions/d2f4a8c1b9e3_*): Creates TimescaleDB hypertables for KR market candles (1m and 1h), establishes continuous aggregate materialized views with exchange-specific logic and offsets, configures retention policies (30 days for 1m, 400 days for 1h), and sets up continuous aggregate policies with 5-minute scheduling. Includes data migration from generic to exchange-specific schemas and validation of route mappings.
  • Configuration & Startup (app/core/config.py, app/core/taskiq_broker.py): Adds four new KR OHLCV feature flags (dual-route support, canary symbols, v2 dual-write) and a field validator for symbol list normalization. Ensures TimescaleDB readiness at worker initialization.
  • Market Data Services (app/services/kr_trading_calendar.py, app/services/kr_ohlcv_timeseries_store.py, app/services/kr_ohlcv_metrics.py): Provides KRX trading calendar and session utilities with route/exchange normalization. Implements end-to-end OHLCV storage via async upserts with data normalization, quarantine handling for invalid records, continuous aggregate refresh triggering, and retrieval utilities. Includes lightweight metrics collection for fetch, upsert, and quarantine tracking.
  • Job Orchestration & Tasks (app/jobs/kr_ohlcv_precompute.py, app/tasks/kr_ohlcv_precompute_tasks.py, app/tasks/__init__.py): Defines incremental (5-min, weekday 08:00–20:59 KST) and nightly (02:25 KST) precompute workflows that gather symbols from multiple sources, resolve routes via dual-route canaries, fetch and aggregate intraday data from KIS, and upsert into the time-series store. TaskIQ wrappers add error handling and logging; exports task module registration.
  • Tests (tests/test_kr_cagg_migration.py, tests/test_kr_ohlcv_precompute.py, tests/test_kr_ohlcv_timeseries_store.py): Validates Alembic migration SQL generation (policy recreation, route validation, bigint conversions), precompute orchestration logic (bootstrap/incremental modes, route resolution, session bounds filtering), and storage service behavior (timezone handling, quarantine, fetch operations).

Sequence Diagram(s)

sequenceDiagram
    participant Job as KR OHLCV<br/>Precompute Job
    participant Store as kr_ohlcv<br/>timeseries_store
    participant DB as TimescaleDB<br/>(market_candles_1m_kr)
    participant CAgg as Continuous<br/>Aggregate<br/>(market_candles_1h_kr)
    participant KIS as KIS<br/>API Client

    Job->>Store: ensure_timescale_ready()
    Store->>DB: Check extension + tables exist
    DB-->>Store: OK
    
    Job->>KIS: Fetch intraday data<br/>(symbol, route, day)
    KIS-->>Job: OHLCV frame (minute bars)
    
    Job->>Job: Aggregate & normalize<br/>minute data
    
    Job->>Store: upsert_market_candles_1m()<br/>(symbol, frame, exchange, route)
    Store->>Store: Validate & normalize
    Store->>DB: INSERT/ON CONFLICT<br/>market_candles_1m_kr
    DB-->>Store: Upsert stats
    
    alt Data covers time range
        Job->>Store: refresh_market_candles_1h_kr()<br/>(start_ts, end_ts)
        Store->>CAgg: CALL refresh_continuous_aggregate()
        CAgg->>DB: Re-aggregate 1h buckets<br/>from market_candles_1m_kr
        DB-->>CAgg: Updated materialized view
    end
    
    Store-->>Job: Return stats (rows, min/max ts)
    Job-->>Job: Return status dict

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A timely hop through TimescaleDB's door,
Korean candles now forevermore!
With buckets of data and paths so divine,
Precomputed hourly, aggregates align!
Exchange-aware routes and quarantine care,
OHLCV magic floating through the air!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 1.18%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: add KR OHLCV ingest pipeline with scheduled precompute' clearly and concisely describes the primary change: adding a Korean OHLCV data ingestion pipeline with scheduled precomputation jobs.




claude bot commented Feb 22, 2026

PR Review: feat: add KR OHLCV ingest pipeline with scheduled precompute

Overall this is a well-structured addition with good separation of concerns (jobs/services/tasks layers), feature-flag gating for incremental rollout, and a solid suite of unit tests. Below are the issues I found, roughly ordered by severity.


🔴 Bugs / Correctness Issues

1. Route/Exchange inconsistency: NXT vs NX

The second migration's fail-fast guard accepts routes 'J', 'NX', and 'NXT', and maps both NX and NXT to exchange NXT. But _VALID_ROUTES in kr_ohlcv_timeseries_store.py is {"J", "NX"}; "NXT" is absent. Similarly, _ROUTE_TO_EXCHANGE in kr_trading_calendar.py only has "J" and "NX". Any row arriving with route "NXT" will be quarantined as invalid_route instead of being stored. If KIS ever returns the string "NXT" for NXT securities, this silently loses data.

2. fetch_previous_close_before_bucket minute fallback queries market_candles_1m_kr instead of market_candles_1m_kr_v2

After the v2 migration market_candles_1m_kr may be stale. The minute fallback path in kr_ohlcv_timeseries_store.py still queries the old table. If the v2 write-path becomes the source of truth, this fallback will return outdated values.

3. Time string comparison for pagination termination

In _page_kr_intraday_day:

next_end_time = (oldest - datetime.timedelta(minutes=1)).strftime("%H%M%S")
if next_end_time < session_start:
    break
if next_end_time == end_time:
    break

session_start is also a %H%M%S string so the < comparison works lexicographically — this is fragile and only correct because the format zero-pads. More importantly, the next_end_time == end_time guard can loop infinitely if the API returns exactly one candle whose timestamp minus one minute equals the current end_time. Convert to datetime objects for comparisons.


🟠 Design / Architecture Issues

4. Three migrations instead of one: schema churn

Migrations c4d9e2a8f7b1, c9e4f5b8a2d1, and d2f4a8c1b9e3 represent iterative design that landed in the codebase:

  • Migration 1 creates market_candles_1m (generic, with market/route columns)
  • Migration 2 creates market_candles_1m_kr (KR-specific, with exchange column), backfills from migration 1, rebuilds the CAGG — but never drops market_candles_1m
  • Migration 3 creates market_candles_1m_kr_v2 (BIGINT prices), backfills from migration 2

market_candles_1m is never dropped, leaving a dead hypertable. If this is a new pipeline the three migrations could be squashed into one that goes directly to the v2 schema, avoiding the dead table and the confusing version trail.

5. ensure_timescale_ready as a hard startup blocker

taskiq_broker.py now calls ensure_timescale_ready() in WorkerInitMiddleware.startup() unconditionally. Any environment without TimescaleDB (plain Postgres, CI, local dev without Docker) will fail to start the taskiq worker entirely. The function already supports allow_test_bypass but only checks ENVIRONMENT == "test". Consider also checking whether the KR OHLCV feature is enabled (KR_OHLCV_DUAL_ROUTE_ENABLED or a new flag) so the guard is opt-in.
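A minimal sketch of the opt-in guard, assuming the existing `KR_OHLCV_DUAL_ROUTE_ENABLED` flag is the gate (the dependency injection here is only for illustration; the real middleware would import both names):

```python
import logging

logger = logging.getLogger(__name__)


async def maybe_ensure_timescale_ready(settings, ensure_timescale_ready) -> bool:
    """Run the TimescaleDB readiness check only when the KR OHLCV feature
    is enabled, so plain-Postgres environments (CI, local dev) can still
    start the taskiq worker. Returns whether the check actually ran.
    """
    if not getattr(settings, "KR_OHLCV_DUAL_ROUTE_ENABLED", False):
        logger.info("KR OHLCV ingest disabled; skipping TimescaleDB readiness check")
        return False
    await ensure_timescale_ready()
    return True
```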

6. Per-symbol CAGG refresh calls

_sync_symbol_minutes calls refresh_market_candles_1h_kr once per symbol. For 100 symbols this is 100 separate SELECT refresh_continuous_aggregate(...) calls per job run. These can be batched: collect (min_ts, max_ts) across all symbols in the caller (run_kr_ohlcv_incremental_precompute) and issue a single refresh spanning the union of all touched ranges at the end.

7. Millisecond offset hack for exchange priority in CAGG

first(open, ts + CASE WHEN exchange = 'NXT' THEN INTERVAL '1 millisecond' ELSE INTERVAL '0' END) AS open

This relies on no two candles from the same symbol being within 1ms of each other. Since candles are floored to the minute in _normalize_minute_frame, this won't hold if KRX and NXT both have a candle at the same UTC minute: the tie-breaker is a phantom 1ms shift, not a real timestamp difference. A real priority column (e.g., exchange_priority INTEGER) fed into first(open, exchange_priority) would be more explicit and not depend on timestamp resolution.
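The collision is easy to reproduce with plain stdlib code: once timestamps are floored to the minute, a KRX candle and an NXT candle for the same symbol carry identical timestamps, so the 1 ms offset is the only thing separating them.

```python
import datetime


def floor_minute(ts: datetime.datetime) -> datetime.datetime:
    """Mimic the minute flooring done in _normalize_minute_frame."""
    return ts.replace(second=0, microsecond=0)


krx = floor_minute(datetime.datetime(2026, 2, 23, 9, 1, 17))
nxt = floor_minute(datetime.datetime(2026, 2, 23, 9, 1, 43))

# Both candles land on the exact same timestamp, so ordering between them
# is undefined without an explicit priority.
assert krx == nxt

# An explicit priority column is unambiguous regardless of timestamp
# resolution; lower value wins inside first(open, exchange_priority).
EXCHANGE_PRIORITY = {"KRX": 0, "NXT": 1}
```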


🟡 Code Quality Issues

8. Double exception handling in task wrapper (dead code)

run_kr_ohlcv_incremental_precompute (in kr_ohlcv_precompute.py) already catches all exceptions and returns {"status": "failed", ...}. The task wrapper in kr_ohlcv_precompute_tasks.py then wraps the same call in another try/except. The outer handler can never be reached because the inner one swallows all exceptions. Remove one of them.

9. In-memory metrics not usable in multi-worker deployments

kr_ohlcv_metrics.py uses a module-level Counter. In a multi-process taskiq deployment each worker has its own counter; there is no aggregation. Counters are also lost on restart. This is fine as a placeholder, but worth a comment noting the limitation and a TODO to wire into Prometheus/StatsD when ready.

10. KISClient() constructed without user credentials in _collect_kr_symbols

holdings = await KISClient().fetch_my_stocks()

If KISClient uses a single global credential set this is fine, but the function collects holdings for all active users then uses one client. If the system ever needs per-user KIS credentials this will silently fetch from only one account.

11. NXT post-close offset of 4h30m is very wide

_KRX_POSTCLOSE_OFFSET = datetime.timedelta(hours=4, minutes=30) extends the NXT session window until ~20:00 KST. This may ingest after-hours data that shouldn't be included in regular candles. Is this intentional? A comment explaining why 4.5 hours is the right boundary would help reviewers.

12. active_users variable fetched but not needed

In _collect_kr_symbols, active_users is fetched solely to gate the KIS holdings fetch:

if active_users:
    holdings = await KISClient().fetch_my_stocks()

The list itself is unused. The intent seems to be "only call KIS if there are active users", but the check is redundant given that the loop over holdings is a no-op when empty. Either remove the active_users query or explain why the guard is needed.


🟢 Positives

  • Feature-flag gating via KR_OHLCV_DUAL_ROUTE_ENABLED / KR_OHLCV_DUAL_ROUTE_CANARY_SYMBOLS is a good pattern for safe rollout
  • Quarantine table for invalid/unknown exchange data prevents silent data loss
  • All SQL uses parameterized queries via SQLAlchemy text() — no injection risk
  • Migration fail-fast guard (the PL/pgSQL RAISE EXCEPTION block) that aborts backfill if unexpected route data is present is excellent defensive practice
  • ensure_timescale_ready check at startup is better than discovering missing tables mid-job
  • Unit tests cover the critical paths (bootstrap vs incremental day selection, exchange routing, quarantine, CAGG refresh) without requiring a live database
  • The lru_cache on get_xkrx_calendar() is the right call for an expensive object

Summary

  • Correctness: Route NXT/NX inconsistency may silently quarantine valid data
  • Correctness: Minute fallback queries stale table post-v2 migration
  • Correctness: Pagination loop termination uses fragile string comparison
  • Architecture: Three migrations with a dead table; consider squashing
  • Architecture: TimescaleDB startup blocker breaks non-TS environments
  • Architecture: Per-symbol CAGG refresh should be batched
  • Architecture: Millisecond priority hack is fragile
  • Quality: Dead outer exception handler in task wrapper
  • Quality: In-memory metrics siloed per-process


codecov bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 66.38298% with 158 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.62%. Comparing base (0dae5fa) to head (d6febd0).
⚠️ Report is 178 commits behind head on main.

Files with missing lines Patch % Lines
app/services/kr_ohlcv_timeseries_store.py 61.67% 64 Missing ⚠️
app/jobs/kr_ohlcv_precompute.py 78.19% 41 Missing ⚠️
app/services/kr_trading_calendar.py 45.28% 29 Missing ⚠️
app/services/kr_ohlcv_metrics.py 62.96% 10 Missing ⚠️
app/tasks/kr_ohlcv_precompute_tasks.py 47.36% 10 Missing ⚠️
app/core/config.py 71.42% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #181       +/-   ##
===========================================
+ Coverage   53.11%   64.62%   +11.51%     
===========================================
  Files          82      177       +95     
  Lines       12530    22420     +9890     
===========================================
+ Hits         6655    14489     +7834     
- Misses       5875     7931     +2056     


@robin-watcha force-pushed the codex/kr-ohlcv-ingest-schedule-only branch from d6febd0 to 7178085 on February 23, 2026 02:27
@sonarqubecloud

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud



claude bot commented Feb 23, 2026

PR Review: KR OHLCV Ingest Pipeline

Overall, this is a well-structured foundation for the TimescaleDB OHLCV ingest system. The exchange-priority trick for the CAGG (first/last with ts + 1ms offsets), the quarantine pattern, and the canary rollout flags are all solid design choices. Below are findings by severity.


🔴 Bugs / Correctness Issues

1. fetch_previous_close_before_bucket queries the wrong table

The minute-level fallback reads from market_candles_1m_kr (the intermediate NUMERIC-price table) instead of the final market_candles_1m_kr_v2 (BIGINT):

# kr_ohlcv_timeseries_store.py
minute_statement = text(
    """
    SELECT close
    FROM market_candles_1m_kr   ← should be market_candles_1m_kr_v2
    ...
    """
)

Once dual-write is disabled and data lives only in _v2, this fallback will silently return None for all symbols.

2. _VALID_ROUTES is inconsistent with migration backfill logic

# kr_ohlcv_timeseries_store.py
_VALID_ROUTES = {"J", "NX"}   # ← "NXT" is absent

The migration validation check accepts 'NXT' as a legacy route (NOT IN ('J', 'NX', 'NXT')) and maps it to exchange 'NXT'. If any row arrives with route='NXT' in the store, it's sent to quarantine as invalid_route, even though the migration accepted it. Document which routes are valid at which layer, or align the sets.

3. record_fetch_success called even when merged is empty

# jobs/kr_ohlcv_precompute.py  _sync_symbol_minutes()
record_fetch_success(route_market)   # ← called unconditionally
if merged.empty:
    continue

A non-trading day (or a day where the API returns nothing) will increment the success counter anyway. Move the metric call inside the not merged.empty branch, or introduce a separate record_fetch_empty counter.
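A sketch of the suggested restructuring; the function and counter names here stand in for the real metrics module, whose API may differ:

```python
def record_day_result(merged_is_empty: bool, route_market: str, metrics: dict) -> str:
    """Classify one fetched day so empty API responses don't inflate the
    success counter. `metrics` is a plain dict standing in for the real
    kr_ohlcv_metrics counters; a separate fetch_empty key tracks the
    non-trading-day / empty-response case distinctly.
    """
    if merged_is_empty:
        metrics["fetch_empty"] = metrics.get("fetch_empty", 0) + 1
        return "empty"
    key = f"fetch_success:{route_market}"
    metrics[key] = metrics.get(key, 0) + 1
    return "success"
```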


🟠 Migration / Schema Issues

4. Legacy tables are never dropped

After three migrations, the schema contains three overlapping tables:

  • market_candles_1m – original generic table (migration 1)
  • market_candles_1m_kr – exchange-aware table with NUMERIC prices (migration 2)
  • market_candles_1m_kr_v2 – final table with BIGINT prices (migration 3, active)

Migrations 1 and 2's primary tables are superseded but not dropped. They waste storage and hypertable resources. If this is a fresh deployment (no production data yet), consider consolidating all three migrations into one. If not, add a follow-up migration that drops the unused tables after the cutover window.

5. Incomplete downgrade in migration 2

The downgrade for c9e4f5b8a2d1 recreates the CAGG from market_candles_1m_kr but never drops that table. After a downgrade -1 the schema has both the restored CAGG and the now-dangling market_candles_1m_kr table. The downgrade should also restore the CAGG to point at market_candles_1m (the migration-1 table) to be truly reversible.


🟡 Design / Quality Issues

6. Config key naming convention is inconsistent

New settings use SCREAMING_SNAKE_CASE while existing ones use lowercase:

# Existing
kis_ohlcv_cache_max_days: int = 400

# New
KR_OHLCV_DUAL_ROUTE_ENABLED: bool = False
KR_OHLCV_V2_DUAL_WRITE_ENABLED: bool = False

Pydantic BaseSettings treats env vars case-insensitively, but attribute-level consistency matters for readability and grep-ability. Align to the project's existing lowercase convention.

7. _resolve_route opens one DB session per symbol, per run

async def _resolve_route(symbol: str) -> list[str] | None:
    async with AsyncSessionLocal() as session:
        row = (await session.execute(...)).scalar_one_or_none()

With N symbols this creates N sequential round-trips. A single bulk query fetching all active symbols and their nxt_eligible flags in _collect_kr_symbols (or a separate helper) would reduce this to one query and likely shorten the incremental run significantly as the universe grows.
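The route-resolution step itself is pure once the rows are in memory. A sketch of the shape, assuming one bulk SELECT returns (symbol, nxt_eligible) pairs (the row layout and flag names follow the review's description, not confirmed code):

```python
def build_route_map(
    rows: list[tuple[str, bool]],
    canary_symbols: set[str],
    canary_all: bool = False,
) -> dict[str, list[str]]:
    """Map each symbol to its ingest routes from a single bulk query result,
    replacing the per-symbol AsyncSessionLocal round-trip in _resolve_route.
    """
    routes: dict[str, list[str]] = {}
    for symbol, nxt_eligible in rows:
        in_canary = canary_all or symbol in canary_symbols
        if nxt_eligible and in_canary:
            routes[symbol] = ["J", "NX"]  # dual-route: KRX + NXT
        else:
            routes[symbol] = ["J"]  # KRX only
    return routes
```

One query plus an O(N) in-memory pass replaces N sequential round-trips, which should matter increasingly as the symbol universe grows.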

8. _collect_kr_symbols fetches active_users rows unnecessarily

active_users = list(
    (await session.execute(select(User.id).where(...).order_by(User.id)))
    .scalars().all()
)

This is used only as a truthiness check later (if active_users:). Use EXISTS or COUNT instead of materialising all user IDs.

9. In-memory metrics (kr_ohlcv_metrics.py) have no consumer

snapshot() and reset() are exported but nothing in the diff reads them. Multi-worker deployments will also see per-process counters rather than aggregate totals. Either wire snapshot() to an admin endpoint/health check route, or note the limitation in a comment and plan the integration as a follow-up.

10. Redundant exception handling in task layer

run_kr_ohlcv_incremental_precompute (the job function) already catches all exceptions and returns {"status": "failed", ...}. The task wrapper in kr_ohlcv_precompute_tasks.py adds another try/except around the same call that can never be triggered (the job never raises). Remove the outer handler or document why it's there.

11. ensure_timescale_ready during worker startup will crash the worker if TimescaleDB is absent

# taskiq_broker.py WorkerInitMiddleware.startup
await ensure_timescale_ready()   # no try/except

A development environment without TimescaleDB will fail to start the worker entirely, even if the OHLCV feature is disabled by flags. Consider guarding with a feature flag or converting the crash to a warning with graceful degradation.


🟢 Positive Notes

  • The exchange-priority trick (first(open, ts + CASE WHEN exchange='NXT' THEN 1ms END)) is clever and correct for combining KRX and NXT minute data into a unified hourly view.
  • Quarantine pattern is well-designed — invalid rows land in market_candles_ingest_quarantine with a typed reason field instead of silently failing.
  • Session-bound filtering via exchange_calendars is solid; weekend/holiday filtering will prevent ingesting off-hours noise.
  • The test suite covers the critical paths (bootstrap vs incremental day-count selection, quarantine routing, CAGG priority logic in SQL) with good use of monkeypatching instead of requiring a live DB.
  • Canary rollout flags (KR_OHLCV_DUAL_ROUTE_ENABLED, KR_OHLCV_DUAL_ROUTE_CANARY_SYMBOLS) provide a safe incremental path to enabling NXT dual-route ingestion.

Minor

  • _to_kst_timestamp in kr_trading_calendar.py uses value: object as its type hint — datetime.datetime | pd.Timestamp | str would be more informative.
  • Verify that exchange_calendars is listed in pyproject.toml (it's not visible in this diff).
  • The 30-day retention on market_candles_1m may be too short if backfills need to read raw minute data older than that window.


@coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (9)
app/services/kr_ohlcv_metrics.py (1)

10-10: Thread-safety note on global Counter.

_counters[key] += value is not atomic even under CPython's GIL (it's a read-modify-write across multiple bytecode ops). For a lightweight in-process metrics counter this is acceptable, but be aware that concurrent async tasks sharing a thread won't race (single-threaded event loop), while multi-threaded usage (e.g., from TaskIQ workers in thread pools) could lose increments. If precision matters later, consider threading.Lock or an asyncio-aware accumulator.
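A lock-guarded variant is a small change; this is a sketch of the shape, not the module's actual API:

```python
import threading
from collections import Counter

_counters: Counter = Counter()
_counters_lock = threading.Lock()


def increment(key: str, value: int = 1) -> None:
    """Atomic counter update. A bare `_counters[key] += value` is a
    read-modify-write across several bytecode ops and can lose increments
    when TaskIQ workers touch the counter from multiple threads."""
    with _counters_lock:
        _counters[key] += value


def snapshot() -> dict[str, int]:
    """Consistent point-in-time copy of all counters."""
    with _counters_lock:
        return dict(_counters)
```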

tests/test_kr_cagg_migration.py (1)

44-48: next(...) without a default will raise StopIteration on mismatch.

If the migration SQL changes and the assertion target string no longer matches, next(stmt for stmt in statements if ...) on lines 44, 62, and 66 raises StopIteration instead of a clear test failure. Using next(..., None) with an explicit assert gives a better failure message.

Example
-    create_cagg_sql = next(
-        stmt
-        for stmt in statements
-        if "CREATE MATERIALIZED VIEW market_candles_1h_kr" in stmt
-    )
+    create_cagg_sql = next(
+        (stmt for stmt in statements
+         if "CREATE MATERIALIZED VIEW market_candles_1h_kr" in stmt),
+        None,
+    )
+    assert create_cagg_sql is not None, "Expected CREATE MATERIALIZED VIEW statement"
app/core/config.py (1)

49-52: Naming convention inconsistency with adjacent fields.

The new fields use UPPER_CASE (KR_OHLCV_DUAL_ROUTE_ENABLED, etc.) while the immediately preceding OHLCV-related fields use lower_case (kis_ohlcv_cache_enabled, upbit_ohlcv_cache_enabled, etc.). Since case_sensitive=False in model_config, this won't break env-var loading, but it creates an inconsistent API surface when accessing settings.KR_OHLCV_DUAL_ROUTE_ENABLED vs. settings.kis_ohlcv_cache_enabled in Python code.

Also, per coding guidelines, ensure .env.example or equivalent documentation is updated to include these new environment variables.

As per coding guidelines: "Treat app/core/config.py as the source of truth for runtime settings and environment keys. Keep root documentation and config examples synchronized when adding or renaming settings in app/core/config.py."

app/core/taskiq_broker.py (1)

9-9: Service-specific import in core broker module.

Importing ensure_timescale_ready from app.services.kr_ohlcv_timeseries_store creates a coupling between the core broker entrypoint and a service-specific module. If this import fails (e.g., missing dependency), all broker processes fail to start. Consider lazy-importing within the guarded block if the feature-flag approach is adopted.

Based on learnings: "Do not add service-specific orchestration logic into core utility modules in app/core/."

alembic/versions/d2f4a8c1b9e3_add_kr_quarantine_and_bigint_v2.py (1)

62-103: Large data migration runs in a single transaction.

The INSERT INTO market_candles_1m_kr_v2 ... SELECT FROM market_candles_1m_kr can be extremely slow and hold a long-running transaction on large datasets, potentially causing lock contention and WAL bloat. Consider whether batching (e.g., via a PL/pgSQL loop with LIMIT/OFFSET or time-range windows) is warranted based on the expected row count.
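The time-range windowing reduces to a small helper; this sketch assumes the backfill is driven by the `ts` column, with each window feeding one bounded `INSERT ... SELECT` (and a commit between batches):

```python
import datetime


def time_windows(
    start: datetime.datetime,
    end: datetime.datetime,
    step: datetime.timedelta,
) -> list[tuple[datetime.datetime, datetime.datetime]]:
    """Split [start, end) into half-open windows so the v2 backfill can run
    one bounded INSERT ... SELECT per window instead of a single
    long-running transaction that holds locks and bloats WAL."""
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows
```

Each window would parameterize the copy, e.g. `WHERE ts >= :lo AND ts < :hi`, keeping the same ROUND-to-BIGINT transformation and ON CONFLICT handling per batch so the migration stays idempotent and resumable.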

app/services/kr_ohlcv_timeseries_store.py (1)

44-91: Extract the repeated empty-frame construction into a private helper.

The same 7-column empty pd.DataFrame is constructed in three separate return paths (lines 46–56, 66–76, and 81–91). An _empty_minute_frame() helper (parallel to the existing _empty_hourly_frame) would eliminate the duplication.

♻️ Proposed refactor
+_MINUTE_COLUMNS = ["datetime", "open", "high", "low", "close", "volume", "value"]
+
+
+def _empty_minute_frame() -> pd.DataFrame:
+    return pd.DataFrame(columns=_MINUTE_COLUMNS)
+
+
 def _normalize_minute_frame(frame: pd.DataFrame) -> pd.DataFrame:
     if frame.empty:
-        return pd.DataFrame(
-            columns=[
-                "datetime",
-                "open",
-                "high",
-                "low",
-                "close",
-                "volume",
-                "value",
-            ]
-        )
+        return _empty_minute_frame()

     normalized = frame.copy()
     if "datetime" not in normalized.columns:
         if "date" in normalized.columns and "time" in normalized.columns:
             normalized["datetime"] = pd.to_datetime(...)
         else:
-            return pd.DataFrame(columns=["datetime", "open", "high", "low", "close", "volume", "value"])
+            return _empty_minute_frame()

     ...
     normalized = normalized.dropna(subset=["datetime"]).copy()
     if normalized.empty:
-        return pd.DataFrame(columns=["datetime", "open", "high", "low", "close", "volume", "value"])
+        return _empty_minute_frame()
tests/test_kr_ohlcv_precompute.py (1)

102-121: Extract shared test helpers to conftest.py to eliminate duplication.

_ScalarResult, _DummySession, and _DummySessionManager are defined identically here (lines 102–121) and in tests/test_kr_ohlcv_timeseries_store.py (lines 12–48). Move them to tests/conftest.py as fixtures or module-level helpers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_ohlcv_precompute.py` around lines 102 - 121, The three
duplicated test helper classes (_ScalarResult, _DummySession,
_DummySessionManager) should be extracted into tests/conftest.py as shared
helpers/fixtures; create module-level helpers (or async fixtures) that return
the same behavior used here—_ScalarResult with scalar_one_or_none returning the
stored value, _DummySession.execute returning an instance of _ScalarResult
wrapping a Row-like object (with is_active and nxt_eligible), and
_DummySessionManager implementing async __aenter__/__aexit__ to yield
_DummySession—and update both tests (tests/test_kr_ohlcv_precompute.py and
tests/test_kr_ohlcv_timeseries_store.py) to import or use those fixtures instead
of redefining the classes.
tests/test_kr_ohlcv_timeseries_store.py (1)

115-126: Missing test for the third ensure_timescale_ready check — CAGG absent.

ensure_timescale_ready raises RuntimeError("market_candles_1h_kr continuous aggregate is missing") when the CAGG lookup returns None, but there is no test covering this path. Only the extension-missing and minute-table-missing paths are exercised.

✅ Proposed additional test
@pytest.mark.unit
@pytest.mark.asyncio
async def test_ensure_timescale_ready_fails_when_cagg_missing(monkeypatch):
    monkeypatch.setattr(settings, "ENVIRONMENT", "production", raising=False)
    dummy = _DummySession(values=["timescaledb", "public.market_candles_1m_kr", None])
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(dummy),
    )

    with pytest.raises(RuntimeError, match="market_candles_1h_kr continuous aggregate is missing"):
        await kr_ohlcv_timeseries_store.ensure_timescale_ready(allow_test_bypass=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_ohlcv_timeseries_store.py` around lines 115 - 126, Add a unit
test that covers the CAGG-missing branch in ensure_timescale_ready: create a
test similar to the existing ones that sets settings.ENVIRONMENT to
"production", uses _DummySession(values=["timescaledb",
"public.market_candles_1m_kr", None]) and monkeypatches
kr_ohlcv_timeseries_store.AsyncSessionLocal to return
_DummySessionManager(dummy), then assert that await
kr_ohlcv_timeseries_store.ensure_timescale_ready(allow_test_bypass=False) raises
RuntimeError with message "market_candles_1h_kr continuous aggregate is
missing". Ensure the test uses pytest.mark.asyncio (and pytest.mark.unit if you
want consistency).
app/tasks/kr_ohlcv_precompute_tasks.py (1)

14-51: The try/except blocks in both task wrappers are dead code.

run_kr_ohlcv_incremental_precompute and run_kr_ohlcv_nightly_precompute already wrap their entire body in a broad except Exception and return a {"status": "failed", ...} dict — they never raise. The outer try/except in these task wrappers will never execute.

Either remove the redundant guards, or (preferably) strip the inner except from the job functions and let the task layer be the sole error boundary.

♻️ Simplify task wrappers (option A — remove dead guard)
 @broker.task(
     task_name="ohlcv.kr.precompute.incremental",
     schedule=[{"cron": "*/5 8-20 * * 1-5", "cron_offset": "Asia/Seoul"}],
 )
 async def run_kr_ohlcv_incremental_precompute_task() -> dict[str, int | str]:
-    try:
-        return await run_kr_ohlcv_incremental_precompute()
-    except Exception as exc:
-        logger.error(
-            "TaskIQ KR OHLCV incremental precompute failed: %s",
-            exc,
-            exc_info=True,
-        )
-        return {
-            "status": "failed",
-            "mode": "incremental",
-            "error": str(exc),
-        }
+    return await run_kr_ohlcv_incremental_precompute()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/tasks/kr_ohlcv_precompute_tasks.py` around lines 14 - 51, The outer
try/except in the task wrappers run_kr_ohlcv_incremental_precompute_task and
run_kr_ohlcv_nightly_precompute_task is dead because
run_kr_ohlcv_incremental_precompute and run_kr_ohlcv_nightly_precompute already
catch exceptions and return failure dicts; either remove the outer try/except
blocks from the task functions to simplify them, or (preferred) remove the inner
broad except handlers inside run_kr_ohlcv_incremental_precompute and
run_kr_ohlcv_nightly_precompute so the task wrappers become the single error
boundary that logs exceptions and returns the failure dicts (update only those
functions named above to keep behavior consistent).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@alembic/versions/c4d9e2a8f7b1_add_timescale_kr_ohlcv_tables.py`:
- Around line 14-31: The migration creates market_candles_1m with PRIMARY KEY
(market, symbol, ts) but omits the route column, causing rows from different
routes/exchanges with the same timestamp to be dropped; update the migration to
include route (or exchange) in the primary key (i.e., PRIMARY KEY (market,
symbol, route, ts)) or add a composite unique constraint that includes route so
multi-route data is preserved—modify the CREATE TABLE statement for
market_candles_1m in the migration (and any related index/constraint statements)
to reference the route column in the PK/unique key to avoid silent data loss.

In
`@alembic/versions/c9e4f5b8a2d1_rebuild_kr_hour_cagg_with_exchange_priority.py`:
- Around line 184-240: The downgrade() currently recreates the continuous
aggregate but does not remove the backing table created in upgrade
(market_candles_1m_kr); update downgrade() to either drop the table explicitly
(e.g., issue a DROP TABLE IF EXISTS for market_candles_1m_kr after removing
policies and before finishing) or, if keeping the table is intentional, add a
clear comment in downgrade() explaining that market_candles_1m_kr is preserved
on downgrade; reference the downgrade() function and the market_candles_1m_kr
table name when making the change.

In `@alembic/versions/d2f4a8c1b9e3_add_kr_quarantine_and_bigint_v2.py`:
- Around line 169-176: The downgrade SQL call using op.execute currently invokes
remove_continuous_aggregate_policy('market_candles_1h_kr_v2', if_not_exists =>
TRUE) which uses the wrong parameter name; update the SQL string in this
migration (and other occurrences) to call
remove_continuous_aggregate_policy('market_candles_1h_kr_v2', if_exists => TRUE)
so TimescaleDB 2.8+ recognizes the argument; locate the op.execute calls that
reference remove_continuous_aggregate_policy and replace any if_not_exists =>
TRUE with if_exists => TRUE.

In `@app/core/taskiq_broker.py`:
- Around line 15-16: The startup method calls ensure_timescale_ready()
unconditionally which makes TimescaleDB required for all broker processes;
update app/core/taskiq_broker.py so startup only invokes
ensure_timescale_ready() when KR is enabled and/or for processes that need
it—e.g., check settings.KR_OHLCV_DUAL_ROUTE_ENABLED (or a dedicated feature
flag) and the process type via is_worker_process/is_scheduler_process before
calling ensure_timescale_ready(); alternatively move this orchestration out of
the core module into a higher-level startup hook so non-KR brokers aren't
blocked.

In `@app/jobs/kr_ohlcv_precompute.py`:
- Around line 251-254: The call to record_fetch_success(route_market) is
executed regardless of whether any data was fetched, inflating the success
metric; update the logic in the loop that handles fetched data (the block
referencing merged, merged.empty, and route_market) so that
record_fetch_success(route_market) is only called when merged is not empty
(i.e., move the call below the if merged.empty: continue guard or add a
conditional that checks not merged.empty before invoking record_fetch_success).
Ensure you only change placement/guarding around record_fetch_success and keep
the existing continue behavior intact.

In `@app/services/kr_trading_calendar.py`:
- Around line 14-28: normalize_route currently rejects the string "NXT" because
_ROUTE_TO_EXCHANGE only contains keys "J" and "NX", while migration c9e4f5b8a2d1
treats "NXT" as a valid route; update _ROUTE_TO_EXCHANGE to include the mapping
"NXT": "NXT" so normalize_route("NXT") succeeds (or alternatively make the
migration stricter), ensuring the change is applied to the _ROUTE_TO_EXCHANGE
dict used by normalize_route.

In `@tests/test_kr_cagg_migration.py`:
- Line 22: Add the pytest unit marker to the test functions by importing pytest
if not present and decorating the target functions (e.g.,
test_cagg_sql_policy_recreated and the other test at line 76) with
`@pytest.mark.unit` so they are categorized as unit tests; ensure the decorator
uses the registered marker name "unit" and apply it immediately above each test
function definition.

In `@tests/test_kr_ohlcv_precompute.py`:
- Around line 109-113: The _DummySession.execute stub has the wrong signature
and will raise TypeError when callers pass params; update the method signature
of _DummySession.execute to accept params=None (e.g., async def execute(self,
_statement, params=None)) and ignore or forward params as needed so it matches
the real session contract used by _resolve_route and mirrors the pattern in
test_kr_ohlcv_timeseries_store.py; keep return of _ScalarResult(...) unchanged.
- Around line 12-45: Add the pytest marker `@pytest.mark.unit` above each test
function in this file (for example above
test_incremental_bootstraps_new_symbols_with_7_days) so the tests are
discoverable by filtered runs; place the decorator immediately before the async
def, e.g. add `@pytest.mark.unit` above
test_incremental_bootstraps_new_symbols_with_7_days and do the same for the
other test functions in tests/test_kr_ohlcv_precompute.py.

In `@tests/test_kr_ohlcv_timeseries_store.py`:
- Around line 85-267: All nine test functions lack a pytest marker; add
`@pytest.mark.unit` above each test definition
(test_ensure_timescale_ready_bypasses_in_test_env,
test_ensure_timescale_ready_fails_when_extension_missing,
test_ensure_timescale_ready_fails_when_minute_table_missing,
test_upsert_market_candles_1m_returns_zero_for_empty_frame,
test_upsert_market_candles_1m_invalid_exchange_to_quarantine,
test_frame_from_hour_rows_builds_dataframe,
test_fetch_market_candles_1h_kr_returns_empty_when_range_invalid,
test_fetch_previous_close_before_bucket_prefers_hour_cagg,
test_fetch_previous_close_before_bucket_uses_minute_fallback_tiebreak) so they
are discoverable by marker-filtered runs; ensure pytest is imported and place
the decorator directly above each async/def test function.

---

Duplicate comments:
In `@alembic/versions/c4d9e2a8f7b1_add_timescale_kr_ohlcv_tables.py`:
- Around line 104-131: The downgrade() block is using the wrong keyword for the
Timescale removal calls; change the SQL argument from if_not_exists => TRUE to
if_exists => TRUE for each remove call (specifically in the
remove_continuous_aggregate_policy('market_candles_1h_kr'),
remove_retention_policy('market_candles_1h_kr'), and
remove_retention_policy('market_candles_1m') statements) so the functions use
if_exists => TRUE instead of if_not_exists => TRUE.

---

Nitpick comments:

In `@app/core/config.py`:
- Around line 49-52: Rename the new settings to follow the existing
lower_snake_case convention used by neighboring fields (e.g., change
KR_OHLCV_DUAL_ROUTE_ENABLED, KR_OHLCV_DUAL_ROUTE_CANARY_SYMBOLS,
KR_OHLCV_DUAL_ROUTE_CANARY_ALL, KR_OHLCV_V2_DUAL_WRITE_ENABLED to
kis_ohlcv_dual_route_enabled, kis_ohlcv_dual_route_canary_symbols,
kis_ohlcv_dual_route_canary_all, kis_ohlcv_v2_dual_write_enabled or another
consistent lower_snake_case prefix matching kis_/upbit_ names), update all
references in the codebase to the new symbols (search for the four uppercase
names), and add the corresponding environment variable keys and descriptions to
.env.example or the project config docs so runtime environment docs stay in sync
with app/core/config.py.

In `@app/core/taskiq_broker.py`:
- Line 9: The top-level import of ensure_timescale_ready from
app.services.kr_ohlcv_timeseries_store couples the core broker and can break
broker startup if that service or its deps are missing; remove the module-level
import and instead perform a lazy import inside the guarded startup path where
ensure_timescale_ready is invoked (e.g., inside the broker startup/ready hook or
the feature-flag branch), wrapping the import and call in try/except to log
failures and continue so broker processes don’t crash; reference
ensure_timescale_ready and the broker startup/ready function to locate where to
move the import.

In `@app/services/kr_ohlcv_metrics.py`:
- Line 10: The global Counter _counters is not thread-safe because operations
like _counters[key] += value are read-modify-write; protect all mutations by
adding a lock (e.g., create _counters_lock = threading.Lock() at module scope)
and wrap every increment/decrement/clear of _counters in with _counters_lock:
... to ensure atomic updates; if the module is used only in async
single-threaded contexts prefer an asyncio.Lock and await it around mutations
instead, but do not leave raw _counters[...] += ... unguarded.



In `@tests/test_kr_cagg_migration.py`:
- Around line 44-48: The test uses next(...) without a default which raises
StopIteration on mismatch; update each usage (e.g., the generator that assigns
create_cagg_sql and the similar generators around lines where drop_cagg_sql and
the replacement cagg are found) to call next(<generator>, None) and then add an
explicit assert that the result is not None with a helpful message (e.g.,
"expected CREATE MATERIALIZED VIEW market_candles_1h_kr in statements") so the
test fails with a clear assertion error instead of StopIteration.
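The recommended pattern can be illustrated with a toy statement list (the SQL strings here are placeholders, not the migration's actual output):

```python
statements = [
    "CREATE TABLE market_candles_1m_kr (symbol TEXT, ts TIMESTAMPTZ)",
    "CREATE MATERIALIZED VIEW market_candles_1h_kr WITH (timescaledb.continuous) AS SELECT 1",
]

# Fragile form: next(...) without a default raises StopIteration on mismatch,
# which pytest reports far less clearly than an assertion failure.
# create_cagg_sql = next(s for s in statements if "MATERIALIZED VIEW" in s)

# Robust form: default to None, then assert with a helpful message.
create_cagg_sql = next(
    (s for s in statements if "CREATE MATERIALIZED VIEW market_candles_1h_kr" in s),
    None,
)
assert create_cagg_sql is not None, (
    "expected CREATE MATERIALIZED VIEW market_candles_1h_kr in statements"
)
```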



Comment on lines +14 to +31
CREATE TABLE IF NOT EXISTS market_candles_1m (
    market TEXT NOT NULL,
    symbol TEXT NOT NULL,
    ts TIMESTAMPTZ NOT NULL,
    open NUMERIC(20, 6) NOT NULL,
    high NUMERIC(20, 6) NOT NULL,
    low NUMERIC(20, 6) NOT NULL,
    close NUMERIC(20, 6) NOT NULL,
    volume BIGINT NOT NULL DEFAULT 0,
    value BIGINT NOT NULL DEFAULT 0,
    source TEXT NOT NULL DEFAULT 'kis',
    route TEXT NOT NULL,
    fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (market, symbol, ts)
)
"""
)

⚠️ Potential issue | 🟡 Minor

Primary key on (market, symbol, ts) silently drops multi-route data.

The route column is NOT NULL but excluded from the primary key. If KRX and NXT both report the same symbol at the same timestamp, only one row survives the PK constraint, losing data from one exchange. This is resolved in the subsequent migration (c9e4f5b8a2d1) which creates market_candles_1m_kr with (exchange, symbol, ts) PK, but be aware of this data-loss window if ingest is running before all migrations complete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@alembic/versions/c4d9e2a8f7b1_add_timescale_kr_ohlcv_tables.py` around lines
14 - 31, The migration creates market_candles_1m with PRIMARY KEY (market,
symbol, ts) but omits the route column, causing rows from different
routes/exchanges with the same timestamp to be dropped; update the migration to
include route (or exchange) in the primary key (i.e., PRIMARY KEY (market,
symbol, route, ts)) or add a composite unique constraint that includes route so
multi-route data is preserved—modify the CREATE TABLE statement for
market_candles_1m in the migration (and any related index/constraint statements)
to reference the route column in the PK/unique key to avoid silent data loss.
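A sketch of the amended DDL the comment suggests, with route promoted into the primary key. The column list mirrors the quoted migration, but this is illustrative rather than the actual follow-up migration:

```python
# Illustrative only: the original DDL with `route` in the PK so KRX and NXT
# rows for the same (market, symbol, ts) can coexist instead of conflicting.
CREATE_1M_WITH_ROUTE_PK = """
CREATE TABLE IF NOT EXISTS market_candles_1m (
    market TEXT NOT NULL,
    symbol TEXT NOT NULL,
    ts TIMESTAMPTZ NOT NULL,
    route TEXT NOT NULL,
    open NUMERIC(20, 6) NOT NULL,
    high NUMERIC(20, 6) NOT NULL,
    low NUMERIC(20, 6) NOT NULL,
    close NUMERIC(20, 6) NOT NULL,
    volume BIGINT NOT NULL DEFAULT 0,
    value BIGINT NOT NULL DEFAULT 0,
    source TEXT NOT NULL DEFAULT 'kis',
    fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (market, symbol, route, ts)
)
"""
```

With route in the PK, the ON CONFLICT target used by the upsert path would need to be widened to match.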

Comment on lines +184 to +240
def downgrade() -> None:
    op.execute(
        """
        SELECT remove_continuous_aggregate_policy(
            'market_candles_1h_kr',
            if_not_exists => TRUE
        )
        """
    )

    op.execute("DROP MATERIALIZED VIEW IF EXISTS market_candles_1h_kr")

    op.execute(
        """
        CREATE MATERIALIZED VIEW market_candles_1h_kr
        WITH (timescaledb.continuous) AS
        SELECT
            symbol,
            time_bucket(INTERVAL '1 hour', ts, 'Asia/Seoul') AS bucket_start,
            first(open, ts) AS open,
            MAX(high) AS high,
            MIN(low) AS low,
            last(close, ts) AS close,
            SUM(volume)::BIGINT AS volume,
            SUM(value)::BIGINT AS value
        FROM market_candles_1m_kr
        WHERE exchange IN ('KRX', 'NXT')
        GROUP BY symbol, bucket_start
        WITH NO DATA
        """
    )

    op.execute(
        """
        ALTER MATERIALIZED VIEW market_candles_1h_kr
        SET (timescaledb.materialized_only = false)
        """
    )

    op.execute(
        """
        CREATE UNIQUE INDEX IF NOT EXISTS ux_market_candles_1h_kr_symbol_bucket
        ON market_candles_1h_kr (symbol, bucket_start)
        """
    )

    op.execute(
        """
        SELECT add_continuous_aggregate_policy(
            'market_candles_1h_kr',
            start_offset => INTERVAL '8 days',
            end_offset => INTERVAL '1 minute',
            schedule_interval => INTERVAL '5 minutes',
            if_not_exists => TRUE
        )
        """
    )

⚠️ Potential issue | 🟡 Minor

Downgrade does not drop market_candles_1m_kr table.

The upgrade creates market_candles_1m_kr (line 12) and populates it, but the downgrade only recreates the CAGG view — it never drops market_candles_1m_kr. If the intent of downgrade is to fully revert, the table should be dropped. If it's intentionally left (to preserve data), consider documenting that choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@alembic/versions/c9e4f5b8a2d1_rebuild_kr_hour_cagg_with_exchange_priority.py`
around lines 184 - 240, The downgrade() currently recreates the continuous
aggregate but does not remove the backing table created in upgrade
(market_candles_1m_kr); update downgrade() to either drop the table explicitly
(e.g., issue a DROP TABLE IF EXISTS for market_candles_1m_kr after removing
policies and before finishing) or, if keeping the table is intentional, add a
clear comment in downgrade() explaining that market_candles_1m_kr is preserved
on downgrade; reference the downgrade() function and the market_candles_1m_kr
table name when making the change.

Comment on lines +169 to +176
op.execute(
    """
    SELECT remove_continuous_aggregate_policy(
        'market_candles_1h_kr_v2',
        if_not_exists => TRUE
    )
    """
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

TimescaleDB remove_continuous_aggregate_policy if_exists parameter documentation

💡 Result:

TimescaleDB documents if_exists as an optional parameter to remove_continuous_aggregate_policy, with this signature:

  • remove_continuous_aggregate_policy(continuous_aggregate REGCLASS, if_exists BOOL = NULL) RETURNS VOID [1]

if_exists behavior (per docs):

  • When true: it prints a warning/notice instead of raising an error if the policy doesn’t exist (default is false) [1].
  • It was renamed in TimescaleDB 2.8 from if_not_exists to if_exists [1].

Example:

SELECT remove_continuous_aggregate_policy('cpu_view', if_exists => true);

Sources: Timescale API reference page for remove_continuous_aggregate_policy() [1].


🏁 Script executed:

# Check the actual migration file content at lines 169-176
cat -n alembic/versions/d2f4a8c1b9e3_add_kr_quarantine_and_bigint_v2.py | sed -n '165,180p'

Repository: mgh3326/auto_trader

Length of output: 598


🏁 Script executed:

# Also check the other migration files mentioned for the same pattern
fd "\.py$" alembic/versions/ | head -20 | while read f; do
  if grep -l "remove_continuous_aggregate_policy" "$f" 2>/dev/null; then
    echo "=== $f ==="
    grep -n "if_not_exists\|if_exists" "$f" | head -5
  fi
done

Repository: mgh3326/auto_trader

Length of output: 984


🏁 Script executed:

# Check for TimescaleDB version constraints in the codebase
fd -e txt -e md -e toml -e yml -e yaml | xargs grep -l "timescaledb\|TimescaleDB" 2>/dev/null | head -10

Repository: mgh3326/auto_trader

Length of output: 45


🏁 Script executed:

# Check the context around remove_continuous_aggregate_policy calls to confirm they're for removal
grep -B 3 "remove_continuous_aggregate_policy" alembic/versions/d2f4a8c1b9e3_add_kr_quarantine_and_bigint_v2.py | head -20

Repository: mgh3326/auto_trader

Length of output: 166


🏁 Script executed:

# Also check one of the other files for add_continuous_aggregate_policy to understand the pattern
grep -B 2 "add_continuous_aggregate_policy\|remove_continuous_aggregate_policy" alembic/versions/c4d9e2a8f7b1_add_timescale_kr_ohlcv_tables.py | head -30

Repository: mgh3326/auto_trader

Length of output: 220


Use if_exists => TRUE instead of if_not_exists => TRUE for remove_continuous_aggregate_policy.

TimescaleDB 2.8+ uses if_exists as the parameter for remove_continuous_aggregate_policy. The current code uses if_not_exists, which is incorrect and will cause the downgrade to fail since the parameter is not recognized.

This issue affects multiple migration files (c4d9e2a8f7b1, c9e4f5b8a2d1, d2f4a8c1b9e3) with 13+ occurrences total.

Proposed fix for this file
     op.execute(
         """
         SELECT remove_continuous_aggregate_policy(
             'market_candles_1h_kr_v2',
-            if_not_exists => TRUE
+            if_exists => TRUE
         )
         """
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     op.execute(
         """
         SELECT remove_continuous_aggregate_policy(
             'market_candles_1h_kr_v2',
-            if_not_exists => TRUE
+            if_exists => TRUE
         )
         """
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@alembic/versions/d2f4a8c1b9e3_add_kr_quarantine_and_bigint_v2.py` around
lines 169 - 176, The downgrade SQL call using op.execute currently invokes
remove_continuous_aggregate_policy('market_candles_1h_kr_v2', if_not_exists =>
TRUE) which uses the wrong parameter name; update the SQL string in this
migration (and other occurrences) to call
remove_continuous_aggregate_policy('market_candles_1h_kr_v2', if_exists => TRUE)
so TimescaleDB 2.8+ recognizes the argument; locate the op.execute calls that
reference remove_continuous_aggregate_policy and replace any if_not_exists =>
TRUE with if_exists => TRUE.

Comment on lines 15 to +16
    async def startup(self) -> None:
        await ensure_timescale_ready()

⚠️ Potential issue | 🟠 Major

Unconditional ensure_timescale_ready() makes TimescaleDB a hard startup dependency for all broker processes.

This call runs before is_worker_process / is_scheduler_process branching, so every broker, worker, and scheduler process will fail to start if TimescaleDB isn't available or the KR tables haven't been migrated yet. This conflicts with the stated goal of an "ingest-only rollout" — existing non-KR workers would crash if deployed before the migrations run.

Consider gating this behind settings.KR_OHLCV_DUAL_ROUTE_ENABLED or a dedicated feature flag:

Proposed fix
     async def startup(self) -> None:
-        await ensure_timescale_ready()
+        if settings.KR_OHLCV_DUAL_ROUTE_ENABLED:
+            await ensure_timescale_ready()
         if self.broker.is_worker_process:

As per coding guidelines: "Do not add service-specific orchestration logic into core utility modules in app/core/."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/core/taskiq_broker.py` around lines 15 - 16, The startup method calls
ensure_timescale_ready() unconditionally which makes TimescaleDB required for
all broker processes; update app/core/taskiq_broker.py so startup only invokes
ensure_timescale_ready() when KR is enabled and/or for processes that need
it—e.g., check settings.KR_OHLCV_DUAL_ROUTE_ENABLED (or a dedicated feature
flag) and the process type via is_worker_process/is_scheduler_process before
calling ensure_timescale_ready(); alternatively move this orchestration out of
the core module into a higher-level startup hook so non-KR brokers aren't
blocked.

Comment on lines +251 to +254

        record_fetch_success(route_market)
        if merged.empty:
            continue

⚠️ Potential issue | 🟡 Minor

record_fetch_success is called even when merged is empty (no data fetched).

The success metric is incremented unconditionally at line 252, before the if merged.empty: continue guard. An API call that returns no data in-session (e.g., a holiday or data gap) is still recorded as a fetch success, which inflates the counter.

🐛 Proposed fix
-        record_fetch_success(route_market)
         if merged.empty:
             continue
+        record_fetch_success(route_market)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/kr_ohlcv_precompute.py` around lines 251 - 254, The call to
record_fetch_success(route_market) is executed regardless of whether any data
was fetched, inflating the success metric; update the logic in the loop that
handles fetched data (the block referencing merged, merged.empty, and
route_market) so that record_fetch_success(route_market) is only called when
merged is not empty (i.e., move the call below the if merged.empty: continue
guard or add a conditional that checks not merged.empty before invoking
record_fetch_success). Ensure you only change placement/guarding around
record_fetch_success and keep the existing continue behavior intact.

Comment on lines +14 to +28
_ROUTE_TO_EXCHANGE = {
    "J": "KRX",
    "NX": "NXT",
}
_EXCHANGE_TO_ROUTE = {
    "KRX": "J",
    "NXT": "NX",
}


def normalize_route(route: str) -> str:
    normalized = str(route or "").strip().upper()
    if normalized not in _ROUTE_TO_EXCHANGE:
        raise ValueError(f"Unsupported KR route: {route}")
    return normalized

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the migration file c9e4f5b8a2d1
find . -type f -name "*c9e4f5b8a2d1*" -o -name "*migration*" | head -20

Repository: mgh3326/auto_trader

Length of output: 383


🏁 Script executed:

# Search for the migration content and normalize_route usage
rg -n "c9e4f5b8a2d1|normalize_route" --type=py -C2

Repository: mgh3326/auto_trader

Length of output: 2662


🏁 Script executed:

# Look for 'NXT' route usage in the codebase
rg -n "'NXT'|\"NXT\"" --type=py -C2

Repository: mgh3326/auto_trader

Length of output: 7847


🏁 Script executed:

# Get full context of kr_trading_calendar.py
cat -n app/services/kr_trading_calendar.py

Repository: mgh3326/auto_trader

Length of output: 3224


normalize_route rejects 'NXT' but the migration accepts it as a valid route.

In migration c9e4f5b8a2d1, route values 'J', 'NX', and 'NXT' are all accepted (lines 50, 81, 96). Both 'NX' and 'NXT' are treated as equivalent routes mapping to the 'NXT' exchange. However, _ROUTE_TO_EXCHANGE only maps 'J' and 'NX', so calling normalize_route("NXT") raises ValueError. If upstream data carries 'NXT' as a route value, this breaks session-bound lookups.

Either add a 'NXT' → 'NXT' entry to _ROUTE_TO_EXCHANGE to align with the migration's route handling, or document that only 'J' and 'NX' are canonical route identifiers and update the migration to reject 'NXT'.
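A minimal sketch of the first option, extending the mapping quoted above so the alias round-trips (the "NXT": "NXT" entry is the suggested addition, not current code):

```python
_ROUTE_TO_EXCHANGE = {
    "J": "KRX",
    "NX": "NXT",
    "NXT": "NXT",  # suggested: accept the alias the migration already allows
}


def normalize_route(route: str) -> str:
    normalized = str(route or "").strip().upper()
    if normalized not in _ROUTE_TO_EXCHANGE:
        raise ValueError(f"Unsupported KR route: {route}")
    return normalized
```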

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_trading_calendar.py` around lines 14 - 28, normalize_route
currently rejects the string "NXT" because _ROUTE_TO_EXCHANGE only contains keys
"J" and "NX", while migration c9e4f5b8a2d1 treats "NXT" as a valid route; update
_ROUTE_TO_EXCHANGE to include the mapping "NXT": "NXT" so normalize_route("NXT")
succeeds (or alternatively make the migration stricter), ensuring the change is
applied to the _ROUTE_TO_EXCHANGE dict used by normalize_route.

return module


def test_cagg_sql_policy_recreated(monkeypatch):

⚠️ Potential issue | 🟡 Minor

Missing pytest markers on test functions.

Both test functions lack the required @pytest.mark.unit marker. Since these tests perform no I/O and only verify SQL string emission, they should be categorized as unit tests.

Proposed fix
+import pytest
+
+
+@pytest.mark.unit
 def test_cagg_sql_policy_recreated(monkeypatch):
     ...

+@pytest.mark.unit
 def test_v2_migration_creates_bigint_tables_and_policy(monkeypatch):
     ...

As per coding guidelines: "Use test markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow) to categorize tests appropriately" and "Use only registered pytest markers (slow, integration, unit) when strict markers are enabled."

Also applies to: 76-76

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_cagg_migration.py` at line 22, Add the pytest unit marker to
the test functions by importing pytest if not present and decorating the target
functions (e.g., test_cagg_sql_policy_recreated and the other test at line 76)
with `@pytest.mark.unit` so they are categorized as unit tests; ensure the
decorator uses the registered marker name "unit" and apply it immediately above
each test function definition.
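A reduced sketch of the marked tests (trivial bodies stand in for the real migration assertions), showing that the decorator attaches the mark pytest inspects during `-m unit` collection:

```python
import pytest


@pytest.mark.unit
def test_cagg_sql_policy_recreated():
    # Real assertions live in tests/test_kr_cagg_migration.py.
    assert True


@pytest.mark.unit
def test_v2_migration_creates_bigint_tables_and_policy():
    assert True


# pytest.mark stores applied marks on the function's `pytestmark` list,
# which is what marker filtering and --strict-markers read.
mark_names = [mark.name for mark in test_cagg_sql_policy_recreated.pytestmark]
```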

Comment on lines +12 to +45
@pytest.mark.asyncio
async def test_incremental_bootstraps_new_symbols_with_7_days(monkeypatch):
    monkeypatch.setattr(
        kr_ohlcv_precompute.kr_ohlcv_timeseries_store,
        "ensure_timescale_ready",
        AsyncMock(return_value=None),
    )
    monkeypatch.setattr(
        kr_ohlcv_precompute,
        "_collect_kr_symbols",
        AsyncMock(return_value={"005930", "000660"}),
    )

    latest_bucket_mock = AsyncMock(
        side_effect=[None, datetime.datetime(2026, 2, 19, 9, 0)]
    )
    monkeypatch.setattr(
        kr_ohlcv_precompute.kr_ohlcv_timeseries_store,
        "fetch_latest_hourly_bucket",
        latest_bucket_mock,
    )

    sync_mock = AsyncMock(return_value={"status": "completed", "rows": 12})
    monkeypatch.setattr(kr_ohlcv_precompute, "_sync_symbol_minutes", sync_mock)

    result = await kr_ohlcv_precompute.run_kr_ohlcv_incremental_precompute()

    assert result["status"] == "completed"
    assert result["mode"] == "incremental"
    assert result["symbols"] == 2
    assert result["bootstrapped"] == 1

    called_days = [call.args[1] for call in sync_mock.await_args_list]
    assert sorted(called_days) == [1, 7]

🛠️ Refactor suggestion | 🟠 Major

Add @pytest.mark.unit markers to all test functions.

Same issue as tests/test_kr_ohlcv_timeseries_store.py — none of the five tests carry a marker, making them invisible to filtered pytest runs.

As per coding guidelines: "Use test markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow) to categorize tests appropriately."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_ohlcv_precompute.py` around lines 12 - 45, Add the pytest
marker `@pytest.mark.unit` above each test function in this file (for example
above test_incremental_bootstraps_new_symbols_with_7_days) so the tests are
discoverable by filtered runs; place the decorator immediately before the async
def, e.g. add `@pytest.mark.unit` above
test_incremental_bootstraps_new_symbols_with_7_days and do the same for the
other test functions in tests/test_kr_ohlcv_precompute.py.

Comment on lines +109 to +113
class _DummySession:
    async def execute(self, _statement):
        return _ScalarResult(
            type("Row", (), {"is_active": True, "nxt_eligible": True})
        )

⚠️ Potential issue | 🟡 Minor

_DummySession.execute is missing the params=None parameter.

async def execute(self, _statement):   # ← no params argument

If _resolve_route (or any future call site) invokes session.execute(stmt, params) with a positional or keyword params argument, this will raise TypeError. The pattern used in test_kr_ohlcv_timeseries_store.py correctly includes params=None.

🐛 Proposed fix
-        async def execute(self, _statement):
+        async def execute(self, _statement, params=None):
+            del params
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_ohlcv_precompute.py` around lines 109 - 113, The
_DummySession.execute stub has the wrong signature and will raise TypeError when
callers pass params; update the method signature of _DummySession.execute to
accept params=None (e.g., async def execute(self, _statement, params=None)) and
ignore or forward params as needed so it matches the real session contract used
by _resolve_route and mirrors the pattern in test_kr_ohlcv_timeseries_store.py;
keep return of _ScalarResult(...) unchanged.
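A quick illustration of why the signature matters, using throwaway stubs (not the test file's actual classes): the strict stub fails as soon as a caller passes a second argument, while the tolerant one matches the session contract.

```python
import asyncio


class _StrictStub:
    async def execute(self, _statement):  # no params slot
        return "ok"


class _TolerantStub:
    async def execute(self, _statement, params=None):  # matches the contract
        del params
        return "ok"


async def main() -> tuple[str, str]:
    try:
        await _StrictStub().execute("SELECT 1", {"symbol": "005930"})
        strict_error = ""
    except TypeError as exc:  # raised at call time, before awaiting
        strict_error = str(exc)
    tolerant_result = await _TolerantStub().execute("SELECT 1", {"symbol": "005930"})
    return strict_error, tolerant_result


strict_error, tolerant_result = asyncio.run(main())
```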

Comment on lines +85 to +267
@pytest.mark.asyncio
async def test_ensure_timescale_ready_bypasses_in_test_env(monkeypatch):
    monkeypatch.setattr(settings, "ENVIRONMENT", "test", raising=False)

    def _should_not_open_session():
        raise AssertionError("AsyncSessionLocal should not be called in test env")

    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        _should_not_open_session,
    )

    await kr_ohlcv_timeseries_store.ensure_timescale_ready()


@pytest.mark.asyncio
async def test_ensure_timescale_ready_fails_when_extension_missing(monkeypatch):
    monkeypatch.setattr(settings, "ENVIRONMENT", "production", raising=False)
    dummy = _DummySession(values=[None])
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(dummy),
    )

    with pytest.raises(RuntimeError, match="TimescaleDB extension is not installed"):
        await kr_ohlcv_timeseries_store.ensure_timescale_ready(allow_test_bypass=False)


@pytest.mark.asyncio
async def test_ensure_timescale_ready_fails_when_minute_table_missing(monkeypatch):
    monkeypatch.setattr(settings, "ENVIRONMENT", "production", raising=False)
    dummy = _DummySession(values=["timescaledb", None])
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(dummy),
    )

    with pytest.raises(RuntimeError, match="market_candles_1m_kr table is missing"):
        await kr_ohlcv_timeseries_store.ensure_timescale_ready(allow_test_bypass=False)


@pytest.mark.asyncio
async def test_upsert_market_candles_1m_returns_zero_for_empty_frame():
    result = await kr_ohlcv_timeseries_store.upsert_market_candles_1m(
        symbol="005930",
        exchange="KRX",
        route="J",
        frame=pd.DataFrame(),
    )

    assert result == {
        "rows": 0,
        "min_ts": None,
        "max_ts": None,
    }


@pytest.mark.asyncio
async def test_upsert_market_candles_1m_invalid_exchange_to_quarantine(monkeypatch):
    session = _CaptureWriteSession()
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(session),
    )

    frame = pd.DataFrame(
        [
            {
                "datetime": pd.Timestamp("2026-02-19 09:01:00"),
                "open": 100.0,
                "high": 101.0,
                "low": 99.0,
                "close": 100.5,
                "volume": 100,
                "value": 10050,
            }
        ]
    )

    result = await kr_ohlcv_timeseries_store.upsert_market_candles_1m(
        symbol="005930",
        exchange="BADX",
        route="J",
        frame=frame,
    )

    assert result["rows"] == 0
    assert result["quarantined_rows"] == 1
    assert any(
        "INSERT INTO market_candles_ingest_quarantine" in stmt
        for stmt in session.statements
    )


def test_frame_from_hour_rows_builds_dataframe():
    frame = kr_ohlcv_timeseries_store.frame_from_hour_rows(
        [
            {
                "datetime": pd.Timestamp("2026-02-19 09:00:00"),
                "date": pd.Timestamp("2026-02-19").date(),
                "time": pd.Timestamp("2026-02-19 09:00:00").time(),
                "open": 100.0,
                "high": 101.0,
                "low": 99.0,
                "close": 100.5,
                "volume": 100,
                "value": 10050,
            }
        ]
    )

    assert len(frame) == 1
    assert set(frame.columns) == {
        "datetime",
        "date",
        "time",
        "open",
        "high",
        "low",
        "close",
        "volume",
        "value",
    }


@pytest.mark.asyncio
async def test_fetch_market_candles_1h_kr_returns_empty_when_range_invalid():
    start_bucket = pd.Timestamp("2026-02-19 12:00:00").to_pydatetime()
    end_bucket = pd.Timestamp("2026-02-19 10:00:00").to_pydatetime()

    frame = await kr_ohlcv_timeseries_store.fetch_market_candles_1h_kr(
        symbol="005930",
        start_bucket=start_bucket,
        end_bucket=end_bucket,
    )

    assert frame.empty


@pytest.mark.asyncio
async def test_fetch_previous_close_before_bucket_prefers_hour_cagg(monkeypatch):
    session = _RecordingSession(cagg_value=101.25, minute_value=88.0)
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(session),
    )

    close = await kr_ohlcv_timeseries_store.fetch_previous_close_before_bucket(
        symbol="005930",
        before_bucket=datetime.datetime(2026, 2, 19, 10, 0),
    )

    assert close == 101.25
    assert any("FROM market_candles_1h_kr" in sql for sql in session.statements)
    assert not any("FROM market_candles_1m_kr" in sql for sql in session.statements)


@pytest.mark.asyncio
async def test_fetch_previous_close_before_bucket_uses_minute_fallback_tiebreak(
    monkeypatch,
):
    session = _RecordingSession(cagg_value=None, minute_value=77.7)
    monkeypatch.setattr(
        kr_ohlcv_timeseries_store,
        "AsyncSessionLocal",
        lambda: _DummySessionManager(session),
    )

    close = await kr_ohlcv_timeseries_store.fetch_previous_close_before_bucket(
        symbol="005930",
        before_bucket=datetime.datetime(2026, 2, 19, 10, 0),
    )

    assert close == 77.7
    minute_sql = next(
        sql for sql in session.statements if "FROM market_candles_1m_kr" in sql
    )
    assert "CASE WHEN exchange = 'KRX' THEN 1 ELSE 0 END DESC" in minute_sql

🛠️ Refactor suggestion | 🟠 Major

Add @pytest.mark.unit markers to all tests.

None of the 9 test functions carry a pytest marker. With strict marker enforcement enabled, these tests are invisible to filtered runs (pytest -m unit) and violate the test categorization guideline.

🏷️ Example fix (applies to all test functions)
+@pytest.mark.unit
 @pytest.mark.asyncio
 async def test_ensure_timescale_ready_bypasses_in_test_env(monkeypatch):
     ...

+@pytest.mark.unit
 @pytest.mark.asyncio
 async def test_ensure_timescale_ready_fails_when_extension_missing(monkeypatch):
     ...

+@pytest.mark.unit
 def test_frame_from_hour_rows_builds_dataframe():
     ...
 # (apply to all remaining test functions)

As per coding guidelines: "Use test markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow) to categorize tests appropriately."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_ohlcv_timeseries_store.py` around lines 85 - 267, All nine test
functions lack a pytest marker; add `@pytest.mark.unit` above each test definition
(test_ensure_timescale_ready_bypasses_in_test_env,
test_ensure_timescale_ready_fails_when_extension_missing,
test_ensure_timescale_ready_fails_when_minute_table_missing,
test_upsert_market_candles_1m_returns_zero_for_empty_frame,
test_upsert_market_candles_1m_invalid_exchange_to_quarantine,
test_frame_from_hour_rows_builds_dataframe,
test_fetch_market_candles_1h_kr_returns_empty_when_range_invalid,
test_fetch_previous_close_before_bucket_prefers_hour_cagg,
test_fetch_previous_close_before_bucket_uses_minute_fallback_tiebreak) so they
are discoverable by marker-filtered runs; ensure pytest is imported and place
the decorator directly above each async/def test function.
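For the markers to survive strict enforcement, they also have to be registered; one possible ini-style registration (an assumption — the repo may register markers in pyproject.toml instead):

```ini
[pytest]
addopts = --strict-markers
markers =
    unit: fast, isolated tests with no I/O
    integration: tests that touch external services
    slow: long-running tests
```

With this in place, `pytest -m unit` selects only the decorated tests, and any unregistered marker becomes a collection error.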
