Harden KR candles Timescale migration and SQL#186

Open
robin-watcha wants to merge 8 commits into main from
codex/kr-candles-timescale-hardening

Conversation


@robin-watcha robin-watcha commented Feb 23, 2026

Summary

  • add KR candles Timescale migration and companion SQL script
  • harden version checks with minimum TimescaleDB 2.8.1 guard
  • use microsecond integer tie-break keys for deterministic FIRST/LAST ordering
  • remove IF NOT EXISTS from schema-creation paths to fail fast on drift

Validation

  • uv run ruff check alembic/versions/87541fdbc954_add_kr_candles_timescale.py
  • uv run alembic heads

Summary by CodeRabbit

  • New Features

    • KR 1m ingestion and DB-backed 1h continuous aggregates; hourly reads now include session and venues and rebuild the current hour from 1m data.
  • Data Maintenance

    • Automated 90-day retention and periodic continuous-aggregate refresh/maintenance.
  • Tasks & Scheduling

    • Scheduled incremental sync every 10 minutes on weekdays (KST).
  • CLI & Automation

    • CLI entrypoint and Make targets for backfill and incremental sync.
  • Infrastructure

    • DB switched to a TimescaleDB-compatible image.
  • Tests & Docs

    • Extensive tests and planning/validation docs for backfill and hourly ingestion.


coderabbitai bot commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds TimescaleDB-backed KR candle storage and maintenance, plus sync tooling: a 1-minute hypertable, a 1-hour continuous aggregate with policy and guarded refresh, a sync service/CLI/job/task to populate it, Makefile targets, Alembic retention migration, tests, and docker image change to TimescaleDB.

Changes

  • Alembic migrations — alembic/versions/87541fdbc954_add_kr_candles_timescale.py, alembic/versions/d31f0a2b4c6d_add_kr_candles_retention_policy.py: new revisions that create the public.kr_candles_1m hypertable and the public.kr_candles_1h continuous aggregate (with extension/version checks, index, policy management, and guarded refresh) and that add/remove 90-day retention policies.
  • Timescale SQL — scripts/sql/kr_candles_timescale.sql: standalone SQL implementing the table/hypertable, index, 1h continuous aggregate (Asia/Seoul bucketing), policy add/remove blocks, and guarded refresh logic with extension/version checks.
  • Sync service — app/services/kr_candles_sync_service.py: new sync service implementing backfill/incremental flows: symbol/venue planning, KIS API fetch/normalization, dedupe/upsert into kr_candles_1m, timezone/cutoff logic, transactional commits/rollbacks, and metrics; exposes sync_kr_candles.
  • Hourly read service — app/services/kr_hourly_candles_read_service.py: new read service read_kr_hourly_candles_1h that reads DB hourly aggregates, optionally rebuilds the current hour from 1m data plus the KIS API, computes session/venues, and enforces a minimum count.
  • Job / CLI / Makefile — app/jobs/kr_candles.py, scripts/sync_kr_candles.py, Makefile: adds the job wrapper run_kr_candles_sync, a CLI entrypoint with Sentry and exit codes, and Makefile targets sync-kr-candles-backfill and sync-kr-candles-incremental.
  • Task registration — app/tasks/__init__.py, app/tasks/kr_candles_tasks.py: registers kr_candles_tasks in TASKIQ_TASK_MODULES and adds a scheduled TaskIQ task candles.kr.sync (cron every 10 minutes, weekdays, Asia/Seoul) invoking incremental sync.
  • MCP integration — app/mcp_server/tooling/market_data_quotes.py, app/mcp_server/README.md: switches the KR 1h path to read_kr_hourly_candles_1h and removes prior intraday paging/caching hooks; README updated to document DB-first 1h behavior and in-memory rebuild of the current hour.
  • Tests — tests/test_kr_candles_sync.py, tests/test_kr_hourly_candles_read_service.py, tests/test_mcp_server_tools.py: extensive new/updated tests covering the sync service, hourly read service, MCP tool integrations, typing adjustments, and script/task/job flows; heavy mocking for DB and KIS interactions.
  • Docker — docker-compose.yml: replaces DB image postgres:17 with timescale/timescaledb-ha:pg17.
  • Docs / plans — docs/plans/*kr-candles-local-backfill-validation*.md, docs/plans/kr-ohlcv-1h-v2.md: new design and implementation plan docs for local backfill validation and the KR 1h v2 architecture and tasks.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler (TaskIQ)
    participant Task as Task (kr_candles_tasks)
    participant Job as Job (run_kr_candles_sync)
    participant Service as Service (kr_candles_sync_service)
    participant KIS as KIS API
    participant DB as Postgres/TimescaleDB

    Scheduler->>Task: cron trigger (*/10 * * * 1-5 Asia/Seoul)
    Task->>Job: await run_kr_candles_sync(mode="incremental")
    Job->>Service: sync_kr_candles(mode="incremental", sessions, user_id)
    Service->>DB: SELECT last timestamps / read kr_candles_1m / read kr_candles_1h
    alt needs API fetch
        Service->>KIS: fetch minute frames (per symbol/venue/session)
        KIS-->>Service: minute frames
    end
    Service->>DB: INSERT ... ON CONFLICT upsert into public.kr_candles_1m
    DB-->>Service: acknowledge rows
    Service->>DB: manage continuous aggregate policy / refresh_continuous_aggregate on public.kr_candles_1h
    DB-->>Service: policy/refresh result
    Service-->>Job: result summary
    Job-->>Task: {"status":"completed" | "failed", ...}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble minutes into neat little rows,
I bundle hours where Seoul sunrise glows.
Hypertables hum and aggregates sing,
Sync scripts hop fast on a scheduler string.
Hop—candles settle, and market light grows.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 4.35%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the PR title 'Harden KR candles Timescale migration and SQL' accurately summarizes the main changes: hardened TimescaleDB migrations and SQL scripts for KR candles with version checks and deterministic ordering.



claude bot commented Feb 23, 2026

PR Review: Harden KR candles Timescale migration and SQL

Overall the migration is well-structured with good defensive guards (version check, to_regclass checks, deterministic tie-break keys). Here are issues worth addressing before merge.


Critical

1. No SQLAlchemy ORM model for kr_candles_1m

The table is created in the migration but there's no corresponding model in app/models/. This has two consequences:

  • alembic autogenerate will treat kr_candles_1m as an untracked table and may generate a migration to drop it.
  • The application has no type-safe way to insert/query this table.

At minimum, add a model and import it in app/models/__init__.py, or add the table to the exclude_tables list in alembic/env.py if it's intentionally managed outside Alembic's autogenerate scope.

2. create_hypertable and CALL refresh_continuous_aggregate inside a transaction

Alembic wraps migrations in a transaction by default (context.begin_transaction() in env.py). Some TimescaleDB operations — particularly CALL refresh_continuous_aggregate — cannot run inside a transaction block and will raise:

ERROR: refresh_continuous_aggregate() cannot run inside a transaction block

(In TimescaleDB ≥ 2.x CALL syntax was introduced precisely because the refresh needs its own transaction management.)

Consider adding transaction_per_migration = False or running the refresh in a separate op.execute() outside the Alembic transaction. See the Alembic docs on non-transactional DDL.
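A minimal sketch of the second option, assuming SQLAlchemy 1.4+ (the helper name and the NULL refresh window are illustrative, not the migration's actual code): run the CALL on a connection switched to AUTOCOMMIT so it is not wrapped in the migration transaction.

```python
REFRESH_SQL = "CALL refresh_continuous_aggregate('public.kr_candles_1h', NULL, NULL)"

def run_refresh_autocommit(engine) -> None:
    # CALL manages its own transactions, so execute it on an AUTOCOMMIT
    # connection instead of inside Alembic's migration transaction.
    # `engine` is a SQLAlchemy Engine (1.4+).
    with engine.connect() as conn:
        conn.execution_options(isolation_level="AUTOCOMMIT")
        conn.exec_driver_sql(REFRESH_SQL)
```

The same shape works from inside `upgrade()` by building a second engine from the Alembic config URL rather than reusing `op.get_bind()`.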

3. Initial refresh_continuous_aggregate during migration can time out in production

If kr_candles_1m has historical data when the migration runs, the refresh over the full range can take minutes. This risks hitting connection timeouts or holding locks that affect other operations. Consider either:

  • Skipping the initial refresh in the migration (the continuous aggregate policy will backfill via the scheduler), or
  • Documenting that this step is expected to be slow and should be run with a dedicated maintenance window.

Important

4. Missing chunk_time_interval in create_hypertable

The default chunk interval for TIMESTAMPTZ is 7 days. For 1-minute candles, 7-day chunks are very large and will degrade query performance for recent-data lookups (the most common pattern for trading). A 1-day interval is typical for minute-resolution time series:

SELECT create_hypertable(
    'public.kr_candles_1m',
    'time',
    chunk_time_interval => INTERVAL '1 day'
)

5. migrate_data => TRUE is a no-op on a freshly created table

Since kr_candles_1m is created by this very migration, it will always be empty when create_hypertable is called. migrate_data => TRUE only matters when converting an existing table. Remove it to avoid confusion:

SELECT create_hypertable('public.kr_candles_1m', 'time')

6. Venue tie-break logic needs a comment

The asymmetric CASE in the tie-break key is subtle and easy to mis-read:

-- open uses: KRX=0, NTX=1  → FIRST picks KRX when timestamps tie
-- close uses: KRX=1, NTX=0  → LAST picks KRX when timestamps tie

Both expressions prefer KRX as the authoritative source for open and close. That may be intentional (KRX is the primary venue), but a single-line comment explaining the intent would prevent future contributors from "fixing" this apparent asymmetry.
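For intuition, the tie-break idea can be sketched in Python (the key construction and names here are illustrative; the real keys are built in the migration's SQL):

```python
from datetime import datetime, timezone

# Illustrative sketch of the deterministic tie-break: fold the timestamp
# (as integer microseconds) together with a venue rank so FIRST/LAST
# resolve equal timestamps identically on every refresh.
VENUE_RANK_OPEN = {"KRX": 0, "NTX": 1}  # FIRST prefers KRX on a tie

def open_key(ts: datetime, venue: str) -> int:
    epoch_us = int(ts.timestamp() * 1_000_000)
    return epoch_us * 10 + VENUE_RANK_OPEN[venue]

ts = datetime(2026, 2, 23, 0, 0, tzinfo=timezone.utc)
rows = [(ts, "NTX", 101.0), (ts, "KRX", 100.0)]  # same minute, two venues
first = min(rows, key=lambda r: open_key(r[0], r[1]))
# `first` is the KRX row regardless of input order
```

The close-side key would invert the venue rank so that LAST also lands on KRX when timestamps tie.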

7. Docker Compose uses postgres:17, not TimescaleDB

The dev docker-compose.yml uses plain Postgres. Developers running make dev cannot test this migration locally without manually switching to a TimescaleDB image (e.g. timescale/timescaledb:latest-pg17). The PR should either update the dev compose file or document the prerequisite clearly (e.g. in DOCKER_USAGE.md).

8. No tests

There are no tests covering the new table/view. At a minimum consider:

  • A migration smoke-test that checks kr_candles_1m exists after alembic upgrade head
  • A unit test for any future service that writes to or reads from this table

Minor

9. Duplicate logic between migration and SQL script

scripts/sql/kr_candles_timescale.sql is an almost exact copy of the migration's upgrade path. When the migration changes, the SQL script must be updated in sync. Consider adding a header comment to the SQL file stating it's a convenience script derived from the Alembic migration, or consolidating the logic so there's one source of truth.

10. start_offset => INTERVAL '2 days' continuous aggregate policy

The policy only re-materialises the last 2 days. Data older than 2 days that has been corrected/inserted late won't be refreshed automatically. If backfill corrections are expected, consider a wider start_offset or a separate backfill procedure.

11. Downgrade silently drops data

op.execute("DROP TABLE IF EXISTS public.kr_candles_1m")

The IF EXISTS makes sense in idempotent tooling, but this silently swallows all candle data without warning. This is standard migration practice, but it conflicts with the PR's stated goal of "fail fast on drift" (the upgrade path removes IF NOT EXISTS). A comment noting the data-loss risk would be consistent with that philosophy.


Validation Gap

The PR notes:

uv run ruff check alembic/versions/87541fdbc954_add_kr_candles_timescale.py
uv run alembic heads

alembic heads only checks the revision DAG, not whether the SQL executes correctly. It's worth documenting whether this was tested against an actual TimescaleDB instance.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
alembic/versions/87541fdbc954_add_kr_candles_timescale.py (1)

59-75: Consider adding a foreign key from symbol to stock_info for referential integrity.

The symbol column lacks a FK constraint. While this table stores raw market data rather than analysis results, linking symbol to the stock_info table would enforce data integrity and prevent orphaned rows with invalid symbols.

Note: FKs from hypertables to regular tables are supported in TimescaleDB. However, ensure column type compatibility: stock_info.symbol is VARCHAR(50) while kr_candles_1m.symbol is TEXT, so the symbol column type may need to be adjusted to match for the FK to work properly.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scripts/sql/kr_candles_timescale.sql`:
- Around line 57-83: The array_agg(DISTINCT venue ORDER BY venue) expression in
the CREATE MATERIALIZED VIEW for kr_candles_1h may not be supported by
TimescaleDB continuous aggregates/partial aggregation; replace it with a
Timescale-compatible approach: remove DISTINCT and ORDER BY from the aggregate
in the continuous aggregate (e.g., use array_agg(venue) AS venues) and move the
de-duplication/ordering into a separate step or subquery when querying the
materialized view, or compute a deduplicated ordered venues array via a lateral
subquery over public.kr_candles_1m keyed by time_bucket(bucket, time) and
symbol; update the SELECT for kr_candles_1h to reference the simplified
aggregate and ensure the downstream logic performs DISTINCT+ORDER to produce the
final venues list.



codecov bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 59.57447% with 247 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (ea285d2) to head (afde231).

Files with missing lines:

  • app/services/kr_candles_sync_service.py — patch coverage 34.61%, 221 lines missing ⚠️
  • app/services/kr_hourly_candles_read_service.py — patch coverage 89.25%, 26 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #186      +/-   ##
==========================================
- Coverage   63.53%   63.27%   -0.26%     
==========================================
  Files         180      184       +4     
  Lines       22421    22927     +506     
==========================================
+ Hits        14245    14508     +263     
- Misses       8176     8419     +243     

☔ View full report in Codecov by Sentry.

claude bot commented Feb 23, 2026

PR Review: Harden KR Candles Timescale Migration and SQL

Overall this is a well-structured PR with solid TimescaleDB integration, good use of parametrized SQL, clean dataclasses, and meaningful test coverage. A few issues worth addressing before merge:


Bugs / Correctness

1. _upsert_rows returns attempted count, not actual upserted count

_UPSERT_SQL uses ON CONFLICT DO UPDATE ... WHERE (IS DISTINCT FROM ...), which means unchanged rows are skipped. But the function returns len(payload) (rows attempted), not actual rows written. The rows_upserted counter in the final result is therefore misleading — it reports attempted, not changed.

# app/services/kr_candles_sync_service.py ~line 740
_ = await session.execute(_UPSERT_SQL, payload)
return len(payload)  # This is rows attempted, not rows upserted

Consider using result.rowcount (or SQLAlchemy's CursorResult.rowcount) if you want accurate upsert counts, or rename the field to rows_attempted.
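The gap between attempted and changed rows can be demonstrated with stdlib sqlite3, which also supports conditional upserts (table and column names here are illustrative, not the service's schema):

```python
import sqlite3

# One pre-existing row; the upsert's WHERE clause skips unchanged rows,
# so only rowcount reflects the real number of modifications.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (t INTEGER PRIMARY KEY, close REAL)")
conn.execute("INSERT INTO candles VALUES (1, 100.0)")

payload = [(1, 100.0), (2, 200.0)]  # row 1 is unchanged, row 2 is new
cur = conn.executemany(
    """
    INSERT INTO candles (t, close) VALUES (?, ?)
    ON CONFLICT (t) DO UPDATE SET close = excluded.close
    WHERE candles.close IS NOT excluded.close
    """,
    payload,
)
rows_attempted = len(payload)  # 2: what the current code reports
rows_changed = cur.rowcount    # 1: the unchanged conflict row was skipped
```

SQLAlchemy's async `CursorResult.rowcount` gives the same per-statement semantics against Postgres.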


2. Session not used as async context manager

# ~line 948
session = cast(AsyncSession, cast(object, AsyncSessionLocal()))
try:
    ...
finally:
    await session.close()

The double cast(...) pattern suppresses type errors rather than fixing them, which is a smell. The idiomatic pattern in this codebase (based on CLAUDE.md referencing AsyncSessionLocal) should be async with AsyncSessionLocal() as session:. The current pattern is safe due to the finally block but is inconsistent with standard SQLAlchemy async usage. If AsyncSessionLocal is a factory that doesn't support the context manager protocol, that's worth fixing at the source.
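A minimal sketch of the idiomatic shape, using a stand-in session class since the real AsyncSessionLocal lives in the app:

```python
import asyncio

class FakeAsyncSession:
    # Stand-in for sqlalchemy.ext.asyncio.AsyncSession: the real class
    # supports `async with`, closing the session on exit.
    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "FakeAsyncSession":
        return self

    async def __aexit__(self, *exc) -> None:
        self.closed = True

AsyncSessionLocal = FakeAsyncSession  # illustrative factory

async def sync_once() -> FakeAsyncSession:
    # Idiomatic replacement for the cast(...) plus try/finally pattern:
    # the context manager guarantees cleanup and type-checks cleanly.
    async with AsyncSessionLocal() as session:
        pass  # ... fetch, upsert, commit ...
    return session

session = asyncio.run(sync_once())
```

If the real factory does not support the async context-manager protocol, fixing that at the factory removes the need for casts everywhere it is used.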


3. Partial failure aborts remaining pairs but commits already-processed pairs

# ~lines 1014-1027
try:
    stats = await _sync_symbol_venue(...)
    await session.commit()
except Exception:
    await session.rollback()
    raise  # Aborts ALL remaining pairs

If pair N fails, pairs 0..N-1 are already committed and pair N's data is rolled back, but pairs N+1..end are never processed. The function then returns "status": "failed". This creates partial success that looks like a complete failure. Consider whether continuing with the next pair (logging the error, not re-raising) is acceptable, or document this behavior explicitly.


Design / Reliability

4. Cron runs every minute on weekdays — even outside all session hours

# app/tasks/kr_candles_tasks.py
schedule=[{"cron": "*/1 * * * 1-5", "cron_offset": "Asia/Seoul"}]

NTX session ends at 20:00 KST, but the task runs every minute from midnight to midnight on weekdays. The venue gate (_should_process_venue) will short-circuit the work, but the task still fires ~500+ no-op times per day. Tightening to "* 7-21 * * 1-5" would cover both KRX (9:00-15:30) and NTX (8:00-20:00) with buffer, and reduce unnecessary scheduler overhead.
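A quick sanity check of the tightened hour range (a sketch; the session bounds are taken from the review text above):

```python
# Parse the hour field of "* 7-21 * * 1-5" and confirm it covers both
# sessions with buffer. Session hours are as described in the review.
def hours_from_field(field: str) -> set[int]:
    lo, hi = map(int, field.split("-"))
    return set(range(lo, hi + 1))

hours = hours_from_field("7-21")
krx_hours = set(range(9, 16))  # KRX 09:00-15:30
ntx_hours = set(range(8, 21))  # NTX 08:00-20:00
covered = krx_hours <= hours and ntx_hours <= hours
```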


5. user_id=1 hardcoded default in task

# app/jobs/kr_candles.py
async def run_kr_candles_sync(*, mode: str, sessions: int = 10, user_id: int = 1)

The task sync_kr_candles_incremental_task calls run_kr_candles_sync(mode="incremental") with no user_id, always using user 1. If the system ever has multiple users with KR holdings, this will silently omit other users' holdings. Consider making this configurable via a settings variable (settings.KR_CANDLES_USER_ID or similar).


6. Missing exchange_calendars in dependency verification

import exchange_calendars as xcals

The diff doesn't include changes to pyproject.toml. If exchange-calendars isn't in the declared dependencies, this will fail at runtime in fresh environments. Please confirm it's listed under project dependencies.


Code Quality

7. value_value naming (line ~676)

value_value = _parse_float(item.get("value"))

value_value is an awkward collision of the field name value and the variable suffix _value. Something like trade_value or notional would be clearer.


8. Retention migration missing module docstring

d31f0a2b4c6d_add_kr_candles_retention_policy.py has no module-level docstring, unlike 87541fdbc954_add_kr_candles_timescale.py. Alembic convention (and the project's own pattern) is to include the """description""" docstring and the Create Date comment. Minor but inconsistent.


9. SQL script vs. migration inconsistency for retention policies

scripts/sql/kr_candles_timescale.sql includes retention policy setup, but the first migration (87541fdbc954) does not — that's in the separate d31f0a2b4c6d migration. A developer who runs the SQL script manually gets retention policies; one who runs only the first migration and forgets the second does not. The SQL script should note this split, or the README/docstring for the SQL script should clarify it's an all-in-one alternative to running both migrations.


Missing Test Coverage

10. _collect_day_rows pagination loop is untested

This is the most complex function in the service — it controls the KIS API page cursor, loop termination, cutoff logic, and deduplication. None of the tests exercise it. A mock-based test covering the pagination boundary (when next_cursor reaches session_start) and the cutoff guard would significantly increase confidence.
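A self-contained sketch of such a boundary test; the paginator here is illustrative and its signature does not match the real _collect_day_rows:

```python
from datetime import datetime, timedelta

def collect_pages(fetch_page, *, session_start, start_cursor):
    # Illustrative paginator shaped like the described loop: walk the
    # cursor backwards page by page and stop once it reaches session_start.
    rows, cursor = [], start_cursor
    while cursor > session_start:
        page, cursor = fetch_page(cursor)
        rows.extend(page)
    return rows

# Mock-style boundary test: three pages, cursor stepping back one hour.
session_start = datetime(2026, 2, 23, 9, 0)
calls = []

def fake_fetch(cursor):
    calls.append(cursor)
    return ([f"rows@{cursor:%H:%M}"], cursor - timedelta(hours=1))

rows = collect_pages(
    fake_fetch,
    session_start=session_start,
    start_cursor=datetime(2026, 2, 23, 12, 0),
)
# the loop terminates exactly when the cursor reaches session_start
```

The same pattern with a mocked KIS client would also let the cutoff guard and dedupe paths be asserted without a network.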

11. _sync_symbol_venue orchestration is untested

The function's backfill-vs-incremental branching, the allowed_days filtering, and the day_before_cutoff early termination are all untested. Even a simple mock-based test would help.


Minor Positives Worth Calling Out

  • The microsecond integer tie-break for FIRST/LAST in the continuous aggregate is clever and deterministic.
  • The IS DISTINCT FROM change detection in the UPSERT avoids unnecessary WAL writes on no-change conflicts — good TimescaleDB practice.
  • Removing IF NOT EXISTS from schema creation paths (fail-fast on drift) is the right call for production migrations.
  • Version guard for TimescaleDB ≥ 2.8.1 is a good safety net.
  • Parametrized SQL throughout — no injection vectors.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
app/services/kr_candles_sync_service.py (3)

510-521: min(allowed_days) recomputed on every loop iteration

allowed_days is a set[date] and min(allowed_days) is called inside the while True body on every pass. Although allowed_days is bounded to at most sessions elements (default 10), this is still an avoidable O(n) call per iteration. Precompute once before the loop.

♻️ Proposed fix
+    min_allowed_day = min(allowed_days) if allowed_days is not None else None
+
     while True:
         if _day_before_cutoff(
             target_day=current_day, venue=venue, cutoff_kst=cutoff_kst
         ):
             break

         if allowed_days is not None:
-            if current_day < min(allowed_days):
+            if current_day < min_allowed_day:  # type: ignore[operator]
                 break

598-598: Double-cast is a code smell; use a type annotation instead

cast(AsyncSession, cast(object, AsyncSessionLocal())) is a type-checker workaround. The idiomatic fix is a local type annotation:

♻️ Proposed fix
-    session = cast(AsyncSession, cast(object, AsyncSessionLocal()))
+    session: AsyncSession = AsyncSessionLocal()

664-681: Fail-fast raise aborts the entire batch on a single pair error

A transient API error or DB issue on one symbol/venue pair re-raises immediately, causing all subsequent pairs to be silently skipped and their metrics lost. For a batch pipeline that should be resilient to per-symbol failures, consider logging the error and continuing:

♻️ Proposed fix – per-pair isolation with error logging
                 try:
                     stats = await _sync_symbol_venue(
                         session=session,
                         kis=kis,
                         symbol=symbol,
                         venue=venue,
                         mode=normalized_mode,
                         now_kst=now_kst,
                         backfill_days=backfill_days,
                     )
                     await session.commit()
                 except Exception:
                     await session.rollback()
-                    raise
+                    logger.exception(
+                        "KR candles sync failed symbol=%s venue=%s",
+                        symbol,
+                        venue.venue,
+                    )
+                    pairs_skipped += 1
+                    continue
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/services/kr_candles_sync_service.py`:
- Around line 632-639: The backfill include_today gate currently uses
_VENUE_CONFIG["KRX"].session_end (15:30) which causes partial NTX (20:00) days
to be included; change the logic that sets include_today inside the
normalized_mode == "backfill" branch to compute the threshold from the actual
planned venues' session_end values (e.g. take max(session_end) across the venues
you will fetch) instead of hardcoding KRX, then compare now_kst.time() against
that derived threshold to decide include_today before calling
_recent_session_days; update references to include_today, normalized_mode,
now_kst, _VENUE_CONFIG, session_end, and _recent_session_days accordingly so
_initial_end_time bounded partial NTX days are not persisted.
- Around line 175-185: The current block in sync_kr_candles that builds inactive
from target_symbols and raises ValueError stops the whole run when any holding
is flagged is_active=False; instead, detect inactive (using the existing
inactive = sorted(...) logic), log a warning including count and preview (use
process logger or the existing logger in sync_kr_candles), remove those symbols
from target_symbols/rows_by_symbol so they are skipped, and continue processing
the remaining active symbols; do not raise—just warn and filter out inactive
entries before any upserts.

In `@app/tasks/kr_candles_tasks.py`:
- Around line 15-24: The except block in sync_kr_candles_incremental_task is
dead because run_kr_candles_sync catches/returns all Exceptions; remove the
try/except and either (a) return await run_kr_candles_sync(mode="incremental")
directly, or (b) call run_kr_candles_sync, inspect its returned dict and
log/error-handle based on its "status" field; update the
sync_kr_candles_incremental_task function (and similarly any other sync_*
wrapper) to reflect that control flow change instead of relying on an
unreachable exception handler.

In `@scripts/sync_kr_candles.py`:
- Around line 51-64: The current except block is unreachable because
run_kr_candles_sync swallows exceptions, so failures where result.get("status")
!= "completed" never get reported to Sentry; update the failure branch to call
Sentry (e.g., capture_exception or capture_message) with the result and relevant
context so failures are routed to Sentry. Concretely, in the block after
run_kr_candles_sync where you check result.get("status") != "completed", invoke
capture_exception (or capture_message) with an informative message and include
the result payload and process="sync_kr_candles" as context, keep the existing
logger.error call, and ensure the original except still captures unexpected
exceptions via capture_exception and logger.error.

In `@tests/test_kr_candles_sync.py`:
- Around line 243-252: Remove the dead exception-path subcase in the
test_script_main_exit_codes test: delete the block that patches
sync_kr_candles.run_kr_candles_sync to raise RuntimeError (and the related
capture_mock/capture_exception assertions), because run_kr_candles_sync absorbs
exceptions in production and that branch is unreachable; leave the other
exit-code assertions intact so the test no longer asserts behavior for the
removed except branch in sync_kr_candles.main.
- Around line 255-258: The test uses relative Paths tied to the current working
directory; update them to be repo-root-relative by resolving from the test file
location: in
test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql replace
Path("alembic/versions") with Path(__file__).resolve().parent.parent / "alembic"
/ "versions" (so the glob still works), and likewise replace
Path("scripts/sql/kr_candles_timescale.sql") with
Path(__file__).resolve().parent.parent / "scripts" / "sql" /
"kr_candles_timescale.sql"; keep the same variable names and match/glob logic in
the existing function to avoid changing behavior.
- Around line 28-276: All tests in this file lack the required pytest marker;
add classification by either decorating each test function (e.g.
`@pytest.mark.unit` above test_build_symbol_union_combines_kis_and_manual_symbols,
test_validate_universe_rows_fails_when_table_empty,
test_validate_universe_rows_fails_when_symbol_missing,
test_validate_universe_rows_fails_when_symbol_inactive,
test_build_venue_plan_maps_dual_and_single_venues,
test_should_process_venue_skips_holiday_in_incremental_mode,
test_should_process_venue_skips_outside_session_in_incremental_mode,
test_compute_incremental_cutoff_uses_five_minute_overlap,
test_convert_kis_datetime_to_utc_interprets_naive_as_kst,
test_run_kr_candles_sync_success_payload,
test_run_kr_candles_sync_failure_payload, test_task_payload_success,
test_task_payload_failure, test_script_main_exit_codes,
test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql,
test_sql_script_contains_90_day_retention_policy_for_both_tables) or add a
module-level marker like pytestmark = pytest.mark.unit at top of the file so
every test is categorized as a unit test per guidelines.

---

Nitpick comments:
In `@app/services/kr_candles_sync_service.py`:
- Around line 510-521: The loop repeatedly calls min(allowed_days) each
iteration which is O(n); precompute it once before the while True loop (e.g.,
min_allowed_day = min(allowed_days) if allowed_days is not None else None) and
then replace uses of min(allowed_days) inside the loop with min_allowed_day;
update the condition that checks current_day < min(allowed_days) to use
min_allowed_day and keep the rest of the logic (current_day adjustment and
_day_before_cutoff calls) unchanged, referencing the variables allowed_days,
current_day, and the while True block in kr_candles_sync_service.py.
- Line 598: The double-cast on session is a type-checker workaround; replace it
with a proper local type annotation to remove cast noise: declare session with
type AsyncSession (e.g., "session: AsyncSession = AsyncSessionLocal()") instead
of using cast(AsyncSession, cast(object, AsyncSessionLocal())), referencing the
AsyncSessionLocal() factory and the session variable so the type checker knows
the concrete type without casts.
- Around line 664-681: The current try/except around _sync_symbol_venue
re-raises on any Exception which aborts the whole batch; instead, catch
Exception, await session.rollback(), log the error with context (include symbol,
venue, normalized_mode, and any error message/stack) and optionally increment a
failure metric or a local counter, then continue to the next pair without
re-raising so subsequent pairs still run; keep using the same symbols (session,
kis, _sync_symbol_venue, stats, pairs_processed, rows_upserted, pages_fetched,
now_kst, backfill_days) to locate where to add the logging and failure handling.
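The continue-on-error pattern this nitpick describes can be sketched in isolation (all names here are hypothetical stand-ins; the real loop would also await session.rollback() and log with symbol/venue context):

```python
# Hypothetical sketch of per-pair failure isolation: one failing
# (symbol, venue) pair is counted and skipped instead of aborting the batch.
pairs = [("005930", "KRX"), ("BROKEN", "NTX"), ("000660", "KRX")]
processed: list[tuple[str, str]] = []
failures = 0

def process(symbol: str, venue: str) -> None:
    # Stand-in for _sync_symbol_venue; raises for one pair to simulate a failure.
    if symbol == "BROKEN":
        raise RuntimeError("KIS page fetch failed")
    processed.append((symbol, venue))

for symbol, venue in pairs:
    try:
        process(symbol, venue)
    except Exception:
        failures += 1  # real code: await session.rollback(); log with context
        continue
```

With this shape, the two healthy pairs still complete even though the middle pair fails.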
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c1e04c and f98a849.

📒 Files selected for processing (9)
  • Makefile
  • alembic/versions/d31f0a2b4c6d_add_kr_candles_retention_policy.py
  • app/jobs/kr_candles.py
  • app/services/kr_candles_sync_service.py
  • app/tasks/__init__.py
  • app/tasks/kr_candles_tasks.py
  • scripts/sql/kr_candles_timescale.sql
  • scripts/sync_kr_candles.py
  • tests/test_kr_candles_sync.py

Comment on lines +175 to +185
    inactive = sorted(
        symbol
        for symbol in target_symbols
        if symbol in rows_by_symbol and not rows_by_symbol[symbol].is_active
    )
    if inactive:
        preview = ", ".join(inactive[:10])
        raise ValueError(
            f"KR symbol is inactive in kr_symbol_universe: "
            f"count={len(inactive)} symbols=[{preview}]"
        )

⚠️ Potential issue | 🟠 Major

Inactive symbols in holdings abort the entire sync run

If any symbol returned by kis.fetch_my_stocks() or the manual holdings is flagged is_active=False in kr_symbol_universe, the entire sync_kr_candles call raises before a single candle is upserted. In live portfolios it is normal for a ticker to become suspended or delisted while still appearing in holdings. This hard-raise blocks candle syncing for all other active symbols.

Consider filtering inactive symbols out (with a warning) and continuing rather than raising:

🛡️ Proposed fix – warn and filter instead of raise
-    inactive = sorted(
-        symbol
-        for symbol in target_symbols
-        if symbol in rows_by_symbol and not rows_by_symbol[symbol].is_active
-    )
-    if inactive:
-        preview = ", ".join(inactive[:10])
-        raise ValueError(
-            f"KR symbol is inactive in kr_symbol_universe: "
-            f"count={len(inactive)} symbols=[{preview}]"
-        )
-
-    return {symbol: rows_by_symbol[symbol] for symbol in target_symbols}
+    inactive = sorted(
+        symbol
+        for symbol in target_symbols
+        if symbol in rows_by_symbol and not rows_by_symbol[symbol].is_active
+    )
+    if inactive:
+        preview = ", ".join(inactive[:10])
+        logger.warning(
+            "KR symbol is inactive in kr_symbol_universe, skipping: "
+            "count=%d symbols=[%s]",
+            len(inactive),
+            preview,
+        )
+
+    active_symbols = target_symbols - set(inactive)
+    return {symbol: rows_by_symbol[symbol] for symbol in active_symbols}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_candles_sync_service.py` around lines 175 - 185, The current
block in sync_kr_candles that builds inactive from target_symbols and raises
ValueError stops the whole run when any holding is flagged is_active=False;
instead, detect inactive (using the existing inactive = sorted(...) logic), log
a warning including count and preview (use process logger or the existing logger
in sync_kr_candles), remove those symbols from target_symbols/rows_by_symbol so
they are skipped, and continue processing the remaining active symbols; do not
raise—just warn and filter out inactive entries before any upserts.

Comment on lines +632 to +639
    backfill_days: list[date] | None = None
    if normalized_mode == "backfill":
        include_today = now_kst.time() >= _VENUE_CONFIG["KRX"].session_end
        backfill_days = _recent_session_days(
            now_kst,
            session_count,
            include_today=include_today,
        )

⚠️ Potential issue | 🟡 Minor

include_today threshold is hardcoded to KRX session end, leaving partial NTX data in backfill runs

_VENUE_CONFIG["KRX"].session_end (15:30 KST) is used to decide whether to include today in backfill_days for all venues. NTX closes at 20:00 KST, so between 15:30 and 20:00 KST a backfill run will include today in backfill_days, fetch partial NTX today data (bounded by _initial_end_time), and store an incomplete day. The per-venue session_end should drive the gate for each venue, or the threshold should be the latest close across all planned venues.

🛡️ Proposed fix – derive threshold from the latest planned venue
-        include_today = now_kst.time() >= _VENUE_CONFIG["KRX"].session_end
+        latest_close = max(
+            v.session_end
+            for venues in venue_plan.values()
+            for v in venues
+        )
+        include_today = now_kst.time() >= latest_close
         backfill_days = _recent_session_days(
             now_kst,
             session_count,
             include_today=include_today,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_candles_sync_service.py` around lines 632 - 639, The backfill
include_today gate currently uses _VENUE_CONFIG["KRX"].session_end (15:30) which
causes partial NTX (20:00) days to be included; change the logic that sets
include_today inside the normalized_mode == "backfill" branch to compute the
threshold from the actual planned venues' session_end values (e.g. take
max(session_end) across the venues you will fetch) instead of hardcoding KRX,
then compare now_kst.time() against that derived threshold to decide
include_today before calling _recent_session_days; update references to
include_today, normalized_mode, now_kst, _VENUE_CONFIG, session_end, and
_recent_session_days accordingly so _initial_end_time bounded partial NTX days
are not persisted.

Comment on lines +15 to +24
async def sync_kr_candles_incremental_task() -> dict[str, object]:
    try:
        return await run_kr_candles_sync(mode="incremental")
    except Exception as exc:
        logger.error("TaskIQ KR candles sync failed: %s", exc, exc_info=True)
        return {
            "status": "failed",
            "mode": "incremental",
            "error": str(exc),
        }

⚠️ Potential issue | 🟡 Minor

Dead except block — run_kr_candles_sync never raises.

run_kr_candles_sync absorbs all Exception subclasses and returns a status dict. Lines 18-24 are unreachable, which creates a maintenance trap (developers may expect this logger path to fire).

♻️ Simplify to reflect the actual control flow
 async def sync_kr_candles_incremental_task() -> dict[str, object]:
-    try:
-        return await run_kr_candles_sync(mode="incremental")
-    except Exception as exc:
-        logger.error("TaskIQ KR candles sync failed: %s", exc, exc_info=True)
-        return {
-            "status": "failed",
-            "mode": "incremental",
-            "error": str(exc),
-        }
+    return await run_kr_candles_sync(mode="incremental")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/tasks/kr_candles_tasks.py` around lines 15 - 24, The except block in
sync_kr_candles_incremental_task is dead because run_kr_candles_sync
catches/returns all Exceptions; remove the try/except and either (a) return
await run_kr_candles_sync(mode="incremental") directly, or (b) call
run_kr_candles_sync, inspect its returned dict and log/error-handle based on its
"status" field; update the sync_kr_candles_incremental_task function (and
similarly any other sync_* wrapper) to reflect that control flow change instead
of relying on an unreachable exception handler.

Comment on lines +51 to +64
    try:
        result = await run_kr_candles_sync(
            mode=args.mode,
            sessions=max(args.sessions, 1),
            user_id=args.user_id,
        )
    except Exception as exc:
        capture_exception(exc, process="sync_kr_candles")
        logger.error("KR candles sync crashed: %s", exc, exc_info=True)
        return 1

    if result.get("status") != "completed":
        logger.error("KR candles sync failed: %s", result)
        return 1

⚠️ Potential issue | 🟠 Major

capture_exception is unreachable — sync failures bypass Sentry.

run_kr_candles_sync swallows all Exception subclasses internally, so the except on line 57 is dead code and capture_exception is never called. The status != "completed" branch on line 62 only logs locally; no Sentry event is emitted for sync failures.

🛡️ Proposed fix — route failure status to Sentry
-    try:
-        result = await run_kr_candles_sync(
-            mode=args.mode,
-            sessions=max(args.sessions, 1),
-            user_id=args.user_id,
-        )
-    except Exception as exc:
-        capture_exception(exc, process="sync_kr_candles")
-        logger.error("KR candles sync crashed: %s", exc, exc_info=True)
-        return 1
+    result = await run_kr_candles_sync(
+        mode=args.mode,
+        sessions=max(args.sessions, 1),
+        user_id=args.user_id,
+    )
 
     if result.get("status") != "completed":
         logger.error("KR candles sync failed: %s", result)
+        capture_exception(
+            RuntimeError(f"KR candles sync failed: {result}"),
+            process="sync_kr_candles",
+        )
         return 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/sync_kr_candles.py` around lines 51 - 64, The current except block is
unreachable because run_kr_candles_sync swallows exceptions, so failures where
result.get("status") != "completed" never get reported to Sentry; update the
failure branch to call Sentry (e.g., capture_exception or capture_message) with
the result and relevant context so failures are routed to Sentry. Concretely, in
the block after run_kr_candles_sync where you check result.get("status") !=
"completed", invoke capture_exception (or capture_message) with an informative
message and include the result payload and process="sync_kr_candles" as context,
keep the existing logger.error call, and ensure the original except still
captures unexpected exceptions via capture_exception and logger.error.

Comment on lines +28 to +276
def test_build_symbol_union_combines_kis_and_manual_symbols() -> None:
    from app.services import kr_candles_sync_service as svc

    kis_holdings = [
        {"pdno": "5930"},
        {"pdno": "035420"},
        {"pdno": None},
    ]
    manual_holdings = [
        SimpleNamespace(ticker="005930"),
        SimpleNamespace(ticker="000660"),
    ]

    symbols = svc._build_symbol_union(kis_holdings, manual_holdings)

    assert symbols == {"005930", "035420", "000660"}


def test_validate_universe_rows_fails_when_table_empty() -> None:
    from app.services import kr_candles_sync_service as svc

    with pytest.raises(ValueError, match="kr_symbol_universe is empty"):
        svc._validate_universe_rows(
            target_symbols={"005930"},
            universe_rows=[],
            table_has_rows=False,
        )


def test_validate_universe_rows_fails_when_symbol_missing() -> None:
    from app.services import kr_candles_sync_service as svc

    row = _make_universe_row("005930", nxt_eligible=True, is_active=True)

    with pytest.raises(ValueError, match="not registered"):
        svc._validate_universe_rows(
            target_symbols={"005930", "000660"},
            universe_rows=[row],
            table_has_rows=True,
        )


def test_validate_universe_rows_fails_when_symbol_inactive() -> None:
    from app.services import kr_candles_sync_service as svc

    inactive_row = _make_universe_row("005930", nxt_eligible=False, is_active=False)

    with pytest.raises(ValueError, match="inactive"):
        svc._validate_universe_rows(
            target_symbols={"005930"},
            universe_rows=[inactive_row],
            table_has_rows=True,
        )


def test_build_venue_plan_maps_dual_and_single_venues() -> None:
    from app.services import kr_candles_sync_service as svc

    rows_by_symbol = {
        "005930": _make_universe_row("005930", nxt_eligible=True, is_active=True),
        "000660": _make_universe_row("000660", nxt_eligible=False, is_active=True),
    }

    plan = svc._build_venue_plan(rows_by_symbol)

    assert [v.venue for v in plan["005930"]] == ["KRX", "NTX"]
    assert [v.market_code for v in plan["005930"]] == ["J", "NX"]
    assert [v.venue for v in plan["000660"]] == ["KRX"]


def test_should_process_venue_skips_holiday_in_incremental_mode() -> None:
    from app.services import kr_candles_sync_service as svc

    venue = svc._VENUE_CONFIG["KRX"]
    now_kst = datetime(2026, 1, 1, 10, 0, tzinfo=svc._KST)

    should_process, reason = svc._should_process_venue(
        mode="incremental",
        now_kst=now_kst,
        is_session_day=False,
        venue=venue,
    )

    assert should_process is False
    assert reason == "holiday"


def test_should_process_venue_skips_outside_session_in_incremental_mode() -> None:
    from app.services import kr_candles_sync_service as svc

    venue = svc._VENUE_CONFIG["KRX"]
    now_kst = datetime(2026, 2, 23, 8, 10, tzinfo=svc._KST)

    should_process, reason = svc._should_process_venue(
        mode="incremental",
        now_kst=now_kst,
        is_session_day=True,
        venue=venue,
    )

    assert should_process is False
    assert reason == "outside_session"


def test_compute_incremental_cutoff_uses_five_minute_overlap() -> None:
    from app.services import kr_candles_sync_service as svc

    cursor_utc = datetime(2026, 2, 23, 1, 30, tzinfo=UTC)

    cutoff_kst = svc._compute_incremental_cutoff_kst(cursor_utc)

    assert cutoff_kst is not None
    assert cutoff_kst.tzinfo == svc._KST
    assert cutoff_kst.strftime("%Y-%m-%d %H:%M:%S") == "2026-02-23 10:25:00"


def test_convert_kis_datetime_to_utc_interprets_naive_as_kst() -> None:
    from app.services import kr_candles_sync_service as svc

    converted = svc._convert_kis_datetime_to_utc(datetime(2026, 2, 23, 9, 0, 0))

    assert converted.tzinfo == UTC
    assert converted.strftime("%Y-%m-%d %H:%M:%S") == "2026-02-23 00:00:00"


@pytest.mark.asyncio
async def test_run_kr_candles_sync_success_payload(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    from app.jobs import kr_candles

    monkeypatch.setattr(
        kr_candles,
        "sync_kr_candles",
        AsyncMock(return_value={"mode": "incremental", "rows_upserted": 11}),
    )

    result = await kr_candles.run_kr_candles_sync(mode="incremental")

    assert result["status"] == "completed"
    assert result["mode"] == "incremental"
    assert result["rows_upserted"] == 11


@pytest.mark.asyncio
async def test_run_kr_candles_sync_failure_payload(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    from app.jobs import kr_candles

    monkeypatch.setattr(
        kr_candles,
        "sync_kr_candles",
        AsyncMock(side_effect=RuntimeError("sync failure")),
    )

    result = await kr_candles.run_kr_candles_sync(mode="incremental")

    assert result["status"] == "failed"
    assert "sync failure" in str(result["error"])


@pytest.mark.asyncio
async def test_task_payload_success(monkeypatch: pytest.MonkeyPatch) -> None:
    from app.tasks import kr_candles_tasks

    monkeypatch.setattr(
        kr_candles_tasks,
        "run_kr_candles_sync",
        AsyncMock(return_value={"status": "completed", "rows_upserted": 3}),
    )

    result = await kr_candles_tasks.sync_kr_candles_incremental_task()

    assert result["status"] == "completed"
    assert result["rows_upserted"] == 3


@pytest.mark.asyncio
async def test_task_payload_failure(monkeypatch: pytest.MonkeyPatch) -> None:
    from app.tasks import kr_candles_tasks

    monkeypatch.setattr(
        kr_candles_tasks,
        "run_kr_candles_sync",
        AsyncMock(side_effect=RuntimeError("task crash")),
    )

    result = await kr_candles_tasks.sync_kr_candles_incremental_task()

    assert result["status"] == "failed"
    assert "task crash" in str(result["error"])


@pytest.mark.asyncio
async def test_script_main_exit_codes(monkeypatch: pytest.MonkeyPatch) -> None:
    from scripts import sync_kr_candles

    monkeypatch.setattr(sync_kr_candles, "init_sentry", lambda **_: None)
    monkeypatch.setattr(
        sync_kr_candles,
        "run_kr_candles_sync",
        AsyncMock(return_value={"status": "completed", "rows_upserted": 1}),
    )
    success_code = await sync_kr_candles.main(["--mode", "incremental"])
    assert success_code == 0

    monkeypatch.setattr(
        sync_kr_candles,
        "run_kr_candles_sync",
        AsyncMock(return_value={"status": "failed", "error": "boom"}),
    )
    failed_status_code = await sync_kr_candles.main(["--mode", "incremental"])
    assert failed_status_code == 1

    capture_mock = MagicMock()
    monkeypatch.setattr(sync_kr_candles, "capture_exception", capture_mock)
    monkeypatch.setattr(
        sync_kr_candles,
        "run_kr_candles_sync",
        AsyncMock(side_effect=RuntimeError("hard crash")),
    )
    exception_code = await sync_kr_candles.main(["--mode", "incremental"])
    assert exception_code == 1
    capture_mock.assert_called_once()


def test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql() -> None:
    versions_dir = Path("alembic/versions")
    matches = sorted(versions_dir.glob("*_add_kr_candles_retention_policy.py"))
    assert matches, "retention migration file is missing"

    content = matches[-1].read_text(encoding="utf-8")

    assert "add_retention_policy" in content
    assert "remove_retention_policy" in content
    assert "kr_candles_1m" in content
    assert "kr_candles_1h" in content
    assert "90 days" in content


def test_sql_script_contains_90_day_retention_policy_for_both_tables() -> None:
    content = Path("scripts/sql/kr_candles_timescale.sql").read_text(encoding="utf-8")

    assert "add_retention_policy" in content
    assert "remove_retention_policy" in content
    assert "public.kr_candles_1m" in content
    assert "public.kr_candles_1h" in content
    assert "90 days" in content

⚠️ Potential issue | 🟡 Minor

All tests are missing required category markers — coding guideline violation.

Every test function lacks @pytest.mark.unit (or integration/slow). All tests here are unit tests (no DB, no network). Per coding guidelines, all tests must be categorized with registered markers.

♻️ Example fix for the first few functions
+@pytest.mark.unit
 def test_build_symbol_union_combines_kis_and_manual_symbols() -> None:
 
+@pytest.mark.unit
 def test_validate_universe_rows_fails_when_table_empty() -> None:
 
+@pytest.mark.unit
+@pytest.mark.asyncio
 async def test_run_kr_candles_sync_success_payload(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_candles_sync.py` around lines 28 - 276, All tests in this file
lack the required pytest marker; add classification by either decorating each
test function (e.g. `@pytest.mark.unit` above
test_build_symbol_union_combines_kis_and_manual_symbols,
test_validate_universe_rows_fails_when_table_empty,
test_validate_universe_rows_fails_when_symbol_missing,
test_validate_universe_rows_fails_when_symbol_inactive,
test_build_venue_plan_maps_dual_and_single_venues,
test_should_process_venue_skips_holiday_in_incremental_mode,
test_should_process_venue_skips_outside_session_in_incremental_mode,
test_compute_incremental_cutoff_uses_five_minute_overlap,
test_convert_kis_datetime_to_utc_interprets_naive_as_kst,
test_run_kr_candles_sync_success_payload,
test_run_kr_candles_sync_failure_payload, test_task_payload_success,
test_task_payload_failure, test_script_main_exit_codes,
test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql,
test_sql_script_contains_90_day_retention_policy_for_both_tables) or add a
module-level marker like pytestmark = pytest.mark.unit at top of the file so
every test is categorized as a unit test per guidelines.

Comment on lines +243 to +252
    capture_mock = MagicMock()
    monkeypatch.setattr(sync_kr_candles, "capture_exception", capture_mock)
    monkeypatch.setattr(
        sync_kr_candles,
        "run_kr_candles_sync",
        AsyncMock(side_effect=RuntimeError("hard crash")),
    )
    exception_code = await sync_kr_candles.main(["--mode", "incremental"])
    assert exception_code == 1
    capture_mock.assert_called_once()

⚠️ Potential issue | 🟡 Minor

test_script_main_exit_codes exception-path branch tests dead production code.

The test patches run_kr_candles_sync to raise RuntimeError to cover lines 57-60 of sync_kr_candles.py. Those lines are unreachable in production (as run_kr_candles_sync absorbs all exceptions). If the dead except is removed per the fix suggested above, this test sub-case will need to be updated or removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_candles_sync.py` around lines 243 - 252, Remove the dead
exception-path subcase in the test_script_main_exit_codes test: delete the block
that patches sync_kr_candles.run_kr_candles_sync to raise RuntimeError (and the
related capture_mock/capture_exception assertions), because run_kr_candles_sync
absorbs exceptions in production and that branch is unreachable; leave the other
exit-code assertions intact so the test no longer asserts behavior for the
removed except branch in sync_kr_candles.main.

Comment on lines +255 to +258
def test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql() -> None:
    versions_dir = Path("alembic/versions")
    matches = sorted(versions_dir.glob("*_add_kr_candles_retention_policy.py"))
    assert matches, "retention migration file is missing"

⚠️ Potential issue | 🟡 Minor

Relative Path("alembic/versions") ties the test to the repo-root working directory.

This is fine when run via make test (repo root), but fragile if run from any other directory. Prefer anchoring to __file__:

♻️ Proposed fix
-    versions_dir = Path("alembic/versions")
+    versions_dir = Path(__file__).parent.parent / "alembic" / "versions"

Same fix applies to line 270 (Path("scripts/sql/kr_candles_timescale.sql")):

-    content = Path("scripts/sql/kr_candles_timescale.sql").read_text(encoding="utf-8")
+    content = (Path(__file__).parent.parent / "scripts" / "sql" / "kr_candles_timescale.sql").read_text(encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_candles_sync.py` around lines 255 - 258, The test uses relative
Paths tied to the current working directory; update them to be
repo-root-relative by resolving from the test file location: in
test_new_retention_migration_contains_upgrade_and_downgrade_policy_sql replace
Path("alembic/versions") with Path(__file__).resolve().parent.parent / "alembic"
/ "versions" (so the glob still works), and likewise replace
Path("scripts/sql/kr_candles_timescale.sql") with
Path(__file__).resolve().parent.parent / "scripts" / "sql" /
"kr_candles_timescale.sql"; keep the same variable names and match/glob logic in
the existing function to avoid changing behavior.

@claude

claude bot commented Feb 23, 2026

PR Review: Harden KR Candles Timescale Migration and SQL

Overall this is a well-structured PR with solid TimescaleDB integration, good idempotent SQL guards, and respectable test coverage. The microsecond tie-break keys for deterministic FIRST/LAST ordering are a clever approach. Below are findings from high to low severity.


Critical

1. Session not used as async context manager app/services/kr_candles_sync_service.py:948

# Current – fragile double-cast, not a proper context manager
session = cast(AsyncSession, cast(object, AsyncSessionLocal()))
try:
    ...
finally:
    await session.close()

This bypasses the SQLAlchemy session lifecycle. If the session factory returns an AsyncSession (as AsyncSessionLocal does), use the async context manager:

async with AsyncSessionLocal() as session:
    ...
    # no explicit close needed

The double cast pattern is a type-checker workaround that obscures intent and makes the code fragile.


2. min(allowed_days) called on every loop iteration app/services/kr_candles_sync_service.py:867

while True:
    ...
    if allowed_days is not None:
        if current_day < min(allowed_days):   # O(n) on every iteration
            break

allowed_days is a set[date]. min() on a set is O(n) and is recalculated thousands of times in a tight loop. Precompute before the loop:

earliest_allowed = min(allowed_days) if allowed_days is not None else None
while True:
    if earliest_allowed is not None and current_day < earliest_allowed:
        break

Bugs

3. _upsert_rows always returns payload length, not actual changed rows app/services/kr_candles_sync_service.py:722-741

_ = await session.execute(_UPSERT_SQL, payload)
return len(payload)    # counts no-ops too

The ON CONFLICT ... DO UPDATE SET ... WHERE ... IS DISTINCT FROM clause means rows that haven't changed are not touched. The returned count rows_upserted in the summary will overstate actual writes. If observability matters, use result.rowcount or switch to RETURNING.
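The distinction can be demonstrated with sqlite3 standing in for the real asyncpg/SQLAlchemy stack (the table, guard clause, and payload here are illustrative, not the actual schema): a conditional DO UPDATE skips unchanged rows, so the cursor's rowcount diverges from the payload length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (symbol TEXT PRIMARY KEY, close REAL)")
conn.execute("INSERT INTO candles VALUES ('005930', 100.0)")

payload = [("005930", 100.0), ("000660", 50.0)]  # first row is a no-op conflict
cur = conn.executemany(
    "INSERT INTO candles (symbol, close) VALUES (?, ?) "
    "ON CONFLICT(symbol) DO UPDATE SET close = excluded.close "
    "WHERE candles.close IS NOT excluded.close",
    payload,
)
rows_attempted = len(payload)  # what _upsert_rows currently reports
rows_written = cur.rowcount    # only rows genuinely inserted or changed
```

Here rows_attempted is 2 but rows_written is 1, which is the overcount the review describes.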


4. Layered exception handling creates dead code in scripts/sync_kr_candles.py:1332

run_kr_candles_sync() (app/jobs/kr_candles.py) catches all exceptions internally and returns {"status": "failed"} – it never raises. This means the outer except Exception in main() is unreachable for normal sync failures:

try:
    result = await run_kr_candles_sync(...)  # never raises
except Exception as exc:
    capture_exception(exc, ...)              # dead code for sync errors
    return 1

Either have the job layer re-raise (so the script layer is the single error boundary), or remove the redundant try/except in main(). As-is, Sentry capture_exception is never called for sync failures.
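A minimal sketch of the single-boundary option (run_sync, captured, and the status dict shape are hypothetical stand-ins for run_kr_candles_sync and capture_exception): the job layer lets failures propagate, and the entry point is the only place that reports them.

```python
import asyncio

captured: list[Exception] = []

async def run_sync() -> dict[str, object]:
    # Job layer: no blanket except; failures propagate to the caller.
    raise RuntimeError("sync failure")

async def main() -> int:
    try:
        result = await run_sync()
    except Exception as exc:
        captured.append(exc)  # stand-in for capture_exception(exc, ...)
        return 1
    return 0 if result.get("status") == "completed" else 1

exit_code = asyncio.run(main())
```

With one boundary, the Sentry hook is reachable by construction instead of guarded by a layer that never raises.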


Code Quality

5. Missing module docstring in retention migration alembic/versions/d31f0a2b4c6d_add_kr_candles_retention_policy.py

The retention migration file is missing the module-level docstring and the Create Date comment that the companion migration (87541fdbc954) has. Alembic generates these automatically; they aid in alembic history output.


6. Undocumented venue priority logic in the continuous aggregate alembic/versions/87541fdbc954_add_kr_candles_timescale.py:134-147

The FIRST/LAST tie-break keys use opposite priority for open vs. close:

  • open: KRX wins (CASE WHEN venue = 'KRX' THEN 0 ELSE 1 END)
  • close: NTX wins (CASE WHEN venue = 'KRX' THEN 1 ELSE 0 END)

This appears intentional (NTX closes later so it holds the canonical last trade price for an hour), but there is no comment explaining the reasoning. This logic will confuse future maintainers. A short inline comment is warranted.
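The intent can be illustrated in plain Python (assumed semantics of the aggregate, not the actual TimescaleDB FIRST/LAST internals): ordering ties by a venue-priority component makes the chosen row deterministic, and the same key gives KRX the open and NTX the close.

```python
rows = [
    {"ts": 1, "venue": "KRX", "price": 100.0},
    {"ts": 1, "venue": "NTX", "price": 101.0},  # same minute, different venue
]

def krx_first(row: dict) -> tuple[int, int]:
    # Composite key: timestamp first, then venue priority (KRX before NTX).
    return (row["ts"], 0 if row["venue"] == "KRX" else 1)

open_row = min(rows, key=krx_first)   # smallest key: KRX wins the tie
close_row = max(rows, key=krx_first)  # largest key: NTX wins the tie
```

A one-line comment to this effect next to each CASE expression would record the reasoning for future maintainers.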


7. Variable naming: value_value app/services/kr_candles_sync_service.py:676

value_value = _parse_float(item.get("value"))

Consider trade_value or amount to match the domain concept (거래대금) and avoid the repeated word.


8. Redundant max(sessions, 1) clamping

Applied in both scripts/sync_kr_candles.py:1329 (max(args.sessions, 1)) and inside sync_kr_candles at kr_candles_sync_service.py:942 (session_count = max(int(sessions), 1)). Pick one boundary – either validate at the CLI entry point or in the service, not both.


9. Tests missing @pytest.mark.unit markers tests/test_kr_candles_sync.py

Per CLAUDE.md, unit tests should be decorated with @pytest.mark.unit to allow selective runs (make test-unit). None of the 13 test functions in this file carry a marker, so they won't be picked up by pytest -m unit.


10. _normalize_symbol silently drops symbols longer than 6 characters app/services/kr_candles_sync_service.py:463-471

if len(text_value) == 6 and text_value.isalnum():
    return text_value
return None

Any symbol that is 7+ characters returns None without a log warning. If an ETF or future code arrives in holdings data with a non-standard length, it will be silently skipped. At minimum, add a logger.debug so dropped symbols are visible.


11. Dual-type handling in _build_symbol_union may indicate dead code app/services/kr_candles_sync_service.py:489-494

if isinstance(item, dict):
    raw_symbol = cast(object | None, item.get("pdno"))
else:
    raw_symbol = getattr(item, "pdno", None)

If kis.fetch_my_stocks() always returns a list of dicts (per the KIS API response), the else branch is dead code. If it can return model objects, the type annotation on kis_holdings: Sequence[object] is too loose. Narrowing this would remove the ambiguity.


Minor Observations (no action required)

  • The SQL script (scripts/sql/kr_candles_timescale.sql) includes the retention policy, but the retention policy is also in a separate Alembic migration. This is fine for "run once" manual use, but the README/docstring should note that the script and the migrations are not interchangeable.
  • The NTX session window (08:00–20:00 KST) is notably wider than KRX (09:00–15:30). If NTX hours ever change, _VENUE_CONFIG is the single source of truth – good.
  • _MAX_PAGE_CALLS_PER_DAY = 30 is a hard cap but there is no rate-limiting delay between KIS API pages. Under a large portfolio this may trigger KIS 429s; consider adding a small asyncio.sleep between page calls.
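The last observation can be sketched as a paginated fetch with a small inter-page delay (fetch_page, the delay constant, and the page-keyed API shape are hypothetical stand-ins for the KIS pagination code):

```python
import asyncio

_MAX_PAGE_CALLS_PER_DAY = 30
_PAGE_DELAY_SECONDS = 0.1  # assumption: small pause to stay under KIS rate limits


async def fetch_all_pages(fetch_page) -> list[dict]:
    # Accumulate pages until the API returns an empty batch or the cap is hit,
    # sleeping briefly between calls to avoid burst 429s.
    rows: list[dict] = []
    for page in range(_MAX_PAGE_CALLS_PER_DAY):
        batch = await fetch_page(page)
        if not batch:
            break
        rows.extend(batch)
        await asyncio.sleep(_PAGE_DELAY_SECONDS)
    return rows
```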

Summary: Two items need attention before merge (session lifecycle, min(allowed_days) in the hot loop). The dead code in the error-handling chain should also be resolved to ensure Sentry captures sync failures. The rest are polish items.

@claude

claude bot commented Feb 23, 2026

PR Review: Harden KR candles Timescale migration and SQL

Overall this is a well-structured addition. The architecture is clean — the sync service is decomposed into small, testable pure functions, the migration version-guards are solid, and the job/task separation follows the existing project patterns. Below are findings by severity.


🔴 High — Session creation anti-pattern

File: app/services/kr_candles_sync_service.py:948

session = cast(AsyncSession, cast(object, AsyncSessionLocal()))

The double-cast is a workaround to suppress type errors and is fragile. If AsyncSessionLocal() raises before the try: block, the session object is never cleaned up (though this is unlikely). More importantly, the project's own app/core/db.py provides an async context-manager pattern — use it:

async with AsyncSessionLocal() as session:
    # all sync logic here
    ...

This makes rollback/close automatic and matches every other service in the codebase.


🟡 Medium — _upsert_rows reports attempted rows, not actual rows changed

File: app/services/kr_candles_sync_service.py:722–741

_ = await session.execute(_UPSERT_SQL, payload)
return len(payload)          # ← always the input count

The DO UPDATE ... WHERE (columns IS DISTINCT FROM ...) guard means no-op updates are silently skipped, which is the right efficiency choice. But discarding the CursorResult means rows_upserted in the summary dict always reflects rows attempted, not rows written. The logged metrics and task return payload will overcount when most rows are unchanged (i.e., normal incremental runs). Consider renaming the field to rows_attempted, or retrieve result.rowcount if the driver exposes it.
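A stdlib sqlite3 stand-in illustrating the gap (the service itself goes through SQLAlchemy, where CursorResult.rowcount support depends on the driver and parameter style; this just demonstrates attempted vs. written counts):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (symbol TEXT PRIMARY KEY, close REAL)")
conn.execute("INSERT INTO candles VALUES ('005930', 100.0)")

payload = [("005930", 100.0), ("000660", 50.0)]  # first row is a no-op, second is new
cur = conn.executemany(
    "INSERT INTO candles VALUES (?, ?) "
    "ON CONFLICT(symbol) DO UPDATE SET close = excluded.close "
    "WHERE candles.close IS NOT excluded.close",
    payload,
)
rows_attempted = len(payload)  # what _upsert_rows reports today
rows_changed = cur.rowcount    # rows the database actually wrote
```

Here rows_attempted is 2 while rows_changed is 1, which is exactly the overcount the summary dict would show on a normal incremental run.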


🟡 Medium — Single failing symbol-venue aborts the entire sync

File: app/services/kr_candles_sync_service.py:1025–1027

except Exception:
    await session.rollback()
    raise

Re-raising inside the outer for symbol, venues loop means one bad symbol stops all subsequent pairs. For a nightly/periodic job this is usually undesirable — a temporary KIS API error on one symbol shouldn't block the remaining 20. Consider collecting per-pair errors and continuing, then surfacing them in the summary dict.
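A sketch of the collect-and-continue pattern (pairs and sync_one are hypothetical stand-ins for the symbol-venue loop and the per-pair sync coroutine):

```python
import asyncio


async def sync_all_pairs(pairs, sync_one) -> dict:
    # Keep going past per-pair failures; report them in the summary
    # instead of aborting the remaining symbol-venue pairs.
    errors: dict[str, str] = {}
    synced = 0
    for symbol, venue in pairs:
        try:
            await sync_one(symbol, venue)
            synced += 1
        except Exception as exc:  # noqa: BLE001 - collected and surfaced below
            errors[f"{symbol}:{venue}"] = str(exc)
    return {
        "synced": synced,
        "errors": errors,
        "status": "completed" if not errors else "partial",
    }
```

The per-pair rollback would still happen inside sync_one; only the decision to abort the whole run moves out of the loop.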


🟡 Medium — scripts/sync_kr_candles.py always exits 1 on any non-"completed" result, including "skipped" runs

File: scripts/sync_kr_candles.py:1607–1609

if result.get("status") != "completed":
    logger.error("KR candles sync failed: %s", result)
    return 1

run_kr_candles_sync only returns "status": "completed" or "status": "failed" — it has no "skipped" status variant. However, sync_kr_candles (the inner function) returns a dict without a "status" key that includes "skipped": True. Since the job wrapper always adds "status": "completed" when no exception occurs, the current code works — but it's fragile. If sync_kr_candles is ever called directly from a script the check will always fail. Recommend having sync_kr_candles return an explicit status field.


🟡 Medium — Docker volume incompatibility not documented

File: docker-compose.yml

-    image: postgres:17
+    image: timescale/timescaledb-ha:pg17

timescaledb-ha uses a different binary and initializes a different system catalog than vanilla postgres:17. An existing dev environment with a named volume from postgres:17 will fail to start (catalog version mismatch). This should be called out in the PR description or a DEPLOYMENT.md note, with the fix: docker compose down -v && docker compose up -d.


🟡 Medium — Missing @pytest.mark.unit markers

File: tests/test_kr_candles_sync.py

CLAUDE.md and the Makefile define make test-unit as running @pytest.mark.unit tests. None of the new synchronous tests carry this marker, so make test-unit will skip them entirely.


🟡 Medium — scripts/ package import in tests may be fragile

File: tests/test_kr_candles_sync.py:1846

from scripts import sync_kr_candles

This works only if scripts/ is a Python package (has __init__.py) or the interpreter finds it via namespace packages. The existing scripts/sync_kr_symbol_universe.py etc. are not imported this way in other tests — verify that pytest has scripts/ in sys.path (e.g., via pyproject.toml pythonpath setting), otherwise this test will fail in CI.


🟢 Low — min(allowed_days) called on every loop iteration

File: app/services/kr_candles_sync_service.py:867

if current_day < min(allowed_days):

allowed_days is a set[date]; min() is O(n) on each iteration. For sessions=10 this is negligible, but pre-compute min_allowed_day = min(allowed_days) before the while True: loop for clarity and to avoid the repeated scan.


🟢 Low — value_value variable name

File: app/services/kr_candles_sync_service.py:676

value_value = _parse_float(item.get("value"))

This reads poorly. trade_value or just value (since it's in its own scope block) would be cleaner.


🟢 Low — Redundant float() casts in _normalize_intraday_rows

File: app/services/kr_candles_sync_service.py:692–699

_parse_float already returns float | None; by the time the None-check passes, the values are already float. The float(open_value) etc. wraps are harmless but unnecessary.


✅ Positive observations

  • Version guard (>= 2.8.1) with proper tuple comparison — good defensive practice.
  • Microsecond tie-break keys for FIRST/LAST in the continuous aggregate are well-reasoned: same-minute KRX vs NTX rows deterministically prefer KRX for both open and close.
  • ON CONFLICT ... DO UPDATE ... WHERE IS DISTINCT FROM is the right pattern for idempotent upserts.
  • _OVERLAP_MINUTES = 5 overlap on cursor lookback is a solid safeguard against partial-minute candles at the last sync boundary.
  • _normalize_symbol zero-padding for short codes is a nice robustness touch.
  • lru_cache on _get_xkrx_calendar() avoids repeated calendar instantiation across paginated calls.
  • Test coverage is comprehensive — pure functions, job wrapper, task, and script exit-codes are all covered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md (1)

40-42: Add a precondition-check query and remove the fixed LIMIT 20 cap.

Two small but concrete gaps in the validation coverage:

  1. The failure criteria flag an empty/inactive kr_symbol_universe as a hard blocker, but no SQL query is provided to check it before starting ingestion — making that precondition opaque.
  2. LIMIT 20 in Query 1 silently hides any symbols beyond the 20th row. If the universe is larger, incomplete coverage passes unnoticed.
📄 Suggested additions

Add a precondition query under Validation Queries (or in a new Precondition Queries subsection):

-- Precondition: confirm active symbols exist
SELECT symbol, venue, is_active
FROM public.kr_symbol_universe
WHERE is_active = TRUE
ORDER BY symbol;

Drop the arbitrary LIMIT 20 from Query 1, or replace it with a count cross-check:

-ORDER BY cnt DESC
-LIMIT 20;
+ORDER BY cnt DESC;

Also applies to: 49-55

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md` around
lines 40 - 42, Add an explicit precondition-check SQL query to verify active
symbols exist in the kr_symbol_universe before ingestion (e.g., select symbol,
venue, is_active where is_active = TRUE ordered by symbol) and place it under
"Validation Queries" or a new "Precondition Queries" subsection; also remove the
fixed "LIMIT 20" in Query 1 (or replace it with a count-based cross-check that
validates the returned rows equal the universe size) so the validation covers
the full kr_symbol_universe rather than silently truncating results.
docker-compose.yml (2)

12-12: Heads-up: existing pg_data volume must be destroyed before first use.

The pg_data volume populated by the previous postgres:17 image is incompatible with timescale/timescaledb-ha. Attempting to start the container against an existing vanilla-PostgreSQL data directory will fail or produce a broken state. Developers (and CI environments) must run docker compose down -v before pulling the new image.

Consider adding a note to the project README or a docker-compose.override.yml comment to signal this one-time teardown requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 12, The docker volume pg_data used in
docker-compose.yml is incompatible with the new image timescale/timescaledb-ha
and must be removed before first use; update documentation and/or the compose
config to warn users to run docker compose down -v (or docker-compose down -v)
before starting the new service, and add a clear comment near the pg_data entry
in docker-compose.yml (reference symbol: pg_data) and a short note in the README
describing the one-time teardown step and rationale (reference image names:
postgres:17 → timescale/timescaledb-ha).

4-4: Pin the image to a specific TimescaleDB version tag.

timescale/timescaledb-ha:pg17 is a floating tag that resolves to the latest TimescaleDB build at pull time. The codebase enforces a minimum TimescaleDB version of 2.8.1 (checked in migrations), and using an unpinned tag means different developers or CI runs may silently pull different versions, defeating that guarantee.

Use a pinned tag like pg17-ts2.24 (or a more recent major.minor series) to ensure reproducible environments:

🔧 Proposed fix
-    image: timescale/timescaledb-ha:pg17
+    image: timescale/timescaledb-ha:pg17-ts2.24
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 4, Replace the floating Docker image tag
"timescale/timescaledb-ha:pg17" in docker-compose.yml with a pinned TimescaleDB
release that satisfies the project's minimum (e.g., use
"timescale/timescaledb-ha:pg17-ts2.24" or a newer pg17-ts<major.minor> tag);
update the image line so CI and developers always pull the same TimescaleDB
minor version that meets the migrations' minimum (>=2.8.1).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md`:
- Around line 48-62: Validation queries scan entire tables and can be fooled by
pre-existing rows; add explicit time-bounded predicates anchored to the
recent-3-trading-sessions window so they only validate newly targeted ranges.
Update the first query on public.kr_candles_1m to include a WHERE time >=
:start_time (or computed start_time for the 3-session window) and the second
query on public.kr_candles_1h to include WHERE bucket >= :start_bucket (or
equivalent timestamp) and, if helpful, AND bucket <= :end_bucket; use these
parameters when running the validation so results reflect only the current
backfill window.
- Around line 28-30: Add an explicit TimescaleDB continuous-aggregate refresh
between the ingestion step and validation to avoid stale/empty results: after
running the backfill (scripts/sync_kr_candles.py --mode backfill) call a refresh
for the public.kr_candles_1h continuous aggregate (using
refresh_continuous_aggregate or equivalent) so the 1h aggregate is up-to-date
before running the validation query against kr_candles_1h.

---

Nitpick comments:
In `@docker-compose.yml`:
- Line 12: The docker volume pg_data used in docker-compose.yml is incompatible
with the new image timescale/timescaledb-ha and must be removed before first
use; update documentation and/or the compose config to warn users to run docker
compose down -v (or docker-compose down -v) before starting the new service, and
add a clear comment near the pg_data entry in docker-compose.yml (reference
symbol: pg_data) and a short note in the README describing the one-time teardown
step and rationale (reference image names: postgres:17 →
timescale/timescaledb-ha).
- Line 4: Replace the floating Docker image tag "timescale/timescaledb-ha:pg17"
in docker-compose.yml with a pinned TimescaleDB release that satisfies the
project's minimum (e.g., use "timescale/timescaledb-ha:pg17-ts2.24" or a newer
pg17-ts<major.minor> tag); update the image line so CI and developers always
pull the same TimescaleDB minor version that meets the migrations' minimum
(>=2.8.1).

In `@docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md`:
- Around line 40-42: Add an explicit precondition-check SQL query to verify
active symbols exist in the kr_symbol_universe before ingestion (e.g., select
symbol, venue, is_active where is_active = TRUE ordered by symbol) and place it
under "Validation Queries" or a new "Precondition Queries" subsection; also
remove the fixed "LIMIT 20" in Query 1 (or replace it with a count-based
cross-check that validates the returned rows equal the universe size) so the
validation covers the full kr_symbol_universe rather than silently truncating
results.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c38947 and 203c542.

📒 Files selected for processing (3)
  • docker-compose.yml
  • docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md
  • docs/plans/2026-02-23-kr-candles-local-backfill-validation-implementation-plan.md
✅ Files skipped from review due to trivial changes (1)
  • docs/plans/2026-02-23-kr-candles-local-backfill-validation-implementation-plan.md

Comment on lines +28 to +30
2. Ingestion:
- Run:
- `uv run python scripts/sync_kr_candles.py --mode backfill --sessions 3`

⚠️ Potential issue | 🟡 Minor

Add a manual continuous-aggregate refresh step before querying kr_candles_1h.

TimescaleDB continuous aggregates are NOT refreshed in real-time — they are driven by a background policy. On a local dev instance that policy may not have fired, meaning kr_candles_1h will be empty (or stale) even after a successful backfill of kr_candles_1m. The current Ingestion step goes straight from script execution to validation without triggering a refresh, which will produce a false failure every time the policy hasn't run yet.

Add an explicit refresh call between ingestion and validation:

📄 Proposed addition to the Ingestion step
 2. Ingestion:
    - Run:
      - `uv run python scripts/sync_kr_candles.py --mode backfill --sessions 3`
+   - Then manually refresh the continuous aggregate so `kr_candles_1h` is up-to-date:
+     ```sql
+     CALL refresh_continuous_aggregate('public.kr_candles_1h', NULL, NULL);
+     ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md` around
lines 28 - 30, Add an explicit TimescaleDB continuous-aggregate refresh between
the ingestion step and validation to avoid stale/empty results: after running
the backfill (scripts/sync_kr_candles.py --mode backfill) call a refresh for the
public.kr_candles_1h continuous aggregate (using refresh_continuous_aggregate or
equivalent) so the 1h aggregate is up-to-date before running the validation
query against kr_candles_1h.

Comment on lines +48 to +62
## Validation Queries
```sql
SELECT symbol, venue, COUNT(*) AS cnt, MIN(time) AS min_time, MAX(time) AS max_time
FROM public.kr_candles_1m
GROUP BY symbol, venue
ORDER BY cnt DESC
LIMIT 20;
```

```sql
SELECT symbol, bucket, open, high, low, close, volume, value, venues
FROM public.kr_candles_1h
ORDER BY bucket DESC
LIMIT 50;
```

⚠️ Potential issue | 🟡 Minor

Validation queries lack a time-bound filter — false positives from pre-existing data.

Both Query 1 and Query 2 scan the full table with no WHERE time/bucket >= ... predicate anchored to the "recent 3 trading sessions" window stated in the Goal. If any rows from a previous run already exist, the queries will report success even if the current backfill wrote nothing.

📄 Proposed time-bounded variants
-SELECT symbol, venue, COUNT(*) AS cnt, MIN(time) AS min_time, MAX(time) AS max_time
-FROM public.kr_candles_1m
-GROUP BY symbol, venue
-ORDER BY cnt DESC
-LIMIT 20;
+SELECT symbol, venue, COUNT(*) AS cnt, MIN(time) AS min_time, MAX(time) AS max_time
+FROM public.kr_candles_1m
+WHERE time >= now() - INTERVAL '3 days'   -- adjust to session-start boundary as needed
+GROUP BY symbol, venue
+ORDER BY cnt DESC;
-SELECT symbol, bucket, open, high, low, close, volume, value, venues
-FROM public.kr_candles_1h
-ORDER BY bucket DESC
-LIMIT 50;
+SELECT symbol, bucket, open, high, low, close, volume, value, venues
+FROM public.kr_candles_1h
+WHERE bucket >= now() - INTERVAL '3 days'
+ORDER BY bucket DESC;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-23-kr-candles-local-backfill-validation-design.md` around
lines 48 - 62, Validation queries scan entire tables and can be fooled by
pre-existing rows; add explicit time-bounded predicates anchored to the
recent-3-trading-sessions window so they only validate newly targeted ranges.
Update the first query on public.kr_candles_1m to include a WHERE time >=
:start_time (or computed start_time for the 3-session window) and the second
query on public.kr_candles_1h to include WHERE bucket >= :start_bucket (or
equivalent timestamp) and, if helpful, AND bucket <= :end_bucket; use these
parameters when running the validation so results reflect only the current
backfill window.

@sonarqubecloud

Quality Gate failed

Failed conditions
5.9% Duplication on New Code (required ≤ 3%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@claude

claude bot commented Feb 23, 2026

PR #186 Code Review: KR Candles Timescale Migration and SQL

Overall this is a well-structured PR that introduces TimescaleDB-backed minute candle storage and a DB-first hourly aggregation layer. The test coverage is comprehensive and the migration design is careful. Below are the key findings, from most to least critical.


Bugs / Correctness Issues

1. OHLC accuracy in current-hour reconstruction (kr_hourly_candles_read_service.py)

When merging KRX and NTX data for the current hour, high/low are taken only from the source venue (KRX preferred), not the true max/min across both venues:

# combined is built as one row per minute, each with source.high/source.low
high_ = max(m.high for m in combined)
low_ = min(m.low for m in combined)

If NTX has a higher high or lower low than KRX at the same minute, those extremes are silently discarded. The true hourly high/low should take the maximum/minimum across all venues per minute before aggregating:

high_ = max(max(r.high for r in group.values()) for group in minutes_by_time.values())
low_ = min(min(r.low for r in group.values()) for group in minutes_by_time.values())

2. CLOSE tie-break direction in continuous aggregate (migration + SQL script)

LAST(
    close,
    ((extract(epoch from time) * 1000000)::bigint * 2
    + CASE WHEN venue = 'KRX' THEN 1 ELSE 0 END)
) AS close

LAST picks the row with the maximum sort key. KRX gets +1 so KRX wins over NTX at the same timestamp. Since NTX trades after KRX (until 20:00), for the 15:30–20:00 window only NTX rows exist, so this works by default. But for the 09:00–15:30 overlap, KRX always wins close. This appears intentional — confirm it matches the intended price convention.
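The tie-break arithmetic can be checked in isolation; this Python sketch mirrors the SQL expression above (epoch microseconds times 2, plus 1 for KRX), using an illustrative helper name:

```python
from datetime import datetime, timezone


def tie_break_key(ts: datetime, venue: str) -> int:
    # Mirrors the migration's LAST()/FIRST() sort key:
    # microsecond epoch * 2, with KRX getting the odd (higher) slot.
    micros = int(ts.timestamp() * 1_000_000)
    return micros * 2 + (1 if venue == "KRX" else 0)
```

At the same timestamp the KRX key is exactly one greater than the NTX key, so LAST prefers KRX; any later timestamp dominates both, which preserves normal time ordering.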


Code Quality / Design Issues

3. Session not used as a context manager (kr_candles_sync_service.py, line ~484)

session = cast(AsyncSession, cast(object, AsyncSessionLocal()))
try:
    ...
finally:
    await session.close()

The double-cast is a type-checking workaround that bypasses __aenter__/__aexit__ on the session factory. The correct pattern is:

async with AsyncSessionLocal() as session:
    ...

If using the context manager form is impractical here due to the commit-per-pair loop, explicitly document why. The current pattern works but is fragile and misleads readers — in kr_hourly_candles_read_service.py the same factory is correctly used as a context manager, creating an inconsistency.

4. Bare KISClient() instantiation inside sync_kr_candles

KISClient() is instantiated once and reused across all symbol+venue pairs. If the KIS token expires mid-sync (long-running backfill), all subsequent API calls will fail. Consider whether the token refresh lifecycle of KISClient handles this, and add a note if it does.

5. Strict row count guard may fail during initial deployment (read_kr_hourly_candles_1h)

if len(out) < capped_count:
    raise ValueError(
        f"DB does not have enough KR 1h candles for {universe.symbol}: ..."
    )

This raises on any call before the backfill is complete, including any tool call that requests more rows than exist in the DB. Consider returning available rows (up to count) and documenting the partial-result behavior instead, or adding a dedicated readiness check so callers can distinguish "data not yet available" from "symbol not found".
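One way to express the partial-result alternative (take_available is a hypothetical helper, not existing service code):

```python
def take_available(rows: list[dict], count: int) -> tuple[list[dict], bool]:
    # Return up to `count` rows plus a completeness flag, so callers can
    # distinguish "not enough data yet" from a hard failure without parsing
    # exception messages.
    out = rows[:count]
    return out, len(out) == count
```

The read service would then return the flag (or a row count) in its payload, and the tool layer could decide whether a short result is acceptable.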


Infrastructure / Breaking Change

6. docker-compose.yml image switch is breaking for existing deployments

-    image: postgres:17
+    image: timescale/timescaledb-ha:pg17

This changes the database engine for all developers. The -ha variant is larger and has different defaults. Existing docker compose up runs will fail to start against the old volume if the data directory format differs. The PR should document:

  • How to migrate existing local dev databases (dump/restore procedure)
  • Whether production deployments can do an in-place upgrade or require a new instance
  • That uv run alembic upgrade head alone is insufficient — the TimescaleDB extension must be installed in the target DB first (CREATE EXTENSION IF NOT EXISTS timescaledb)

Minor Issues

7. Planning docs contain internal AI agent instructions (docs/plans/)

The files added under docs/plans/ contain:

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

These are internal AI workflow instructions and should be stripped before merging, or the docs/plans/ directory should be added to .gitignore.

8. Missing module docstring in retention migration

alembic/versions/d31f0a2b4c6d_add_kr_candles_retention_policy.py is missing the standard Alembic header (module docstring, Create Date). Minor but inconsistent with the other migration in this PR.

9. 90-day retention on kr_candles_1h

Applying a separate 90-day retention policy to the continuous aggregate view will drop hourly bucket data even if the underlying 1m data is also retained for 90 days. Since the 1h view is rebuilt on demand via refresh_continuous_aggregate, this may be redundant overhead. Verify this is intentional (e.g., to control storage for the materialized portion of the view).


What's Done Well

  • The TimescaleDB version guard (>= 2.8.1) with structured comparison is robust.
  • The microsecond integer tie-break for FIRST/LAST is a clean solution for deterministic ordering.
  • Test coverage is thorough: API time windows, venue eligibility, partial API failure, DB-vs-API priority, venue merging, and script exit codes are all tested.
  • The incremental + backfill mode separation with overlap-window logic (_OVERLAP_MINUTES = 5) is a sensible design.
  • Removing the Redis OHLCV cache for KR 1h and replacing it with a DB-first read is architecturally cleaner.
  • The upsert SQL using IS DISTINCT FROM correctly avoids unnecessary write amplification.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
tests/test_kr_hourly_candles_read_service.py (1)

144-174: DummyDB is duplicated across all 9 tests with minor variations.

Each test defines its own DummyDB class with nearly identical structure — only the nxt_eligible, returned hour/minute rows, and sometimes the active flag differ. Consider extracting a reusable factory or fixture to reduce ~200 lines of boilerplate.

♻️ Sketch of a shared factory
def _make_dummy_db(
    *,
    symbol: str,
    nxt_eligible: bool = True,
    is_active: bool = True,
    hour_rows: list[dict] | None = None,
    minute_rows: list[dict] | None = None,
):
    class DummyDB:
        def __init__(self):
            self.calls: list[str] = []

        async def execute(self, query, params=None):
            sql = str(getattr(query, "text", query))
            self.calls.append(sql)
            if "FROM public.kr_symbol_universe" in sql and "LIMIT 1" in sql:
                return _ScalarResult(symbol)
            if "FROM public.kr_symbol_universe" in sql and "WHERE symbol" in sql:
                return _MappingsResult(
                    [{"symbol": symbol, "nxt_eligible": nxt_eligible, "is_active": is_active}]
                )
            if "FROM public.kr_candles_1h" in sql:
                return _MappingsResult(hour_rows or [])
            if "FROM public.kr_candles_1m" in sql:
                rows = list(minute_rows or [])
                if isinstance(params, dict) and params.get("start_time") is not None:
                    start, end = params["start_time"], params["end_time"]
                    rows = [r for r in rows if start <= r["time"] < end]
                return _MappingsResult(rows)
            raise AssertionError(f"unexpected sql: {sql}")
    return DummyDB()

Also applies to: 289-315, 372-391, 454-473, 522-541, 613-632, 678-697, 748-767, 793-812, 836-855

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_hourly_candles_read_service.py` around lines 144 - 174,
Multiple tests duplicate the DummyDB class with only small variations; replace
the repeated classes with a single reusable factory/fixture. Create a function
(e.g., _make_dummy_db) that accepts parameters symbol, nxt_eligible, is_active,
hour_rows, minute_rows and returns an instance whose execute method implements
the existing SQL branching logic (matching "FROM public.kr_symbol_universe",
"FROM public.kr_candles_1h", "FROM public.kr_candles_1m") and time-range
filtering; then update each test to call _make_dummy_db(...) instead of
redefining DummyDB. Ensure the returned object's .calls list and behavior
exactly mirror the original DummyDB so existing assertions keep working.
app/services/kr_hourly_candles_read_service.py (3)

29-30: Unnecessary double cast.

The cast(AsyncSession, cast(object, AsyncSessionLocal())) is overly convoluted. A single cast suffices.

♻️ Simplify
 def _async_session() -> AsyncSession:
-    return cast(AsyncSession, cast(object, AsyncSessionLocal()))
+    return cast(AsyncSession, AsyncSessionLocal())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_hourly_candles_read_service.py` around lines 29 - 30, The
_async_session function uses an unnecessary double cast; replace
cast(AsyncSession, cast(object, AsyncSessionLocal())) with a single cast of the
AsyncSessionLocal() return to AsyncSession (e.g., cast(AsyncSession,
AsyncSessionLocal())) or, even better, annotate AsyncSessionLocal to return
AsyncSession and remove the cast entirely; update the _async_session function
(symbol: _async_session and AsyncSessionLocal) accordingly.

181-215: Multiple independent DB sessions for a single logical read.

read_kr_hourly_candles_1h opens up to 4 separate DB sessions (universe check in _resolve_universe_row, hour rows in _fetch_hour_rows, minute rows in _fetch_minute_rows via _build_current_hour_row, plus the universe queries themselves open one session). Consider accepting an optional session parameter or using a shared session across the read path to reduce connection overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_hourly_candles_read_service.py` around lines 181 - 215, The
code currently opens multiple independent DB sessions across the read path;
modify the functions to accept an optional AsyncSession and reuse a single
session instead of creating new ones: add an optional parameter session:
Optional[AsyncSession] = None to read_kr_hourly_candles_1h,
_resolve_universe_row, _fetch_hour_rows, _fetch_minute_rows and
_build_current_hour_row (or whichever helpers call DB), thread the provided
session through each call, and only create/close a new session in
read_kr_hourly_candles_1h when session is None (i.e., use "async with
_async_session() as session" only in the top-level function when needed); update
calls to pass the session to avoid multiple connections and ensure you only
commit/close the session if you created it.

444-463: KRX-priority merge discards NTX high/low extremes.

When both KRX and NTX report for the same minute, source = group.get("KRX") or group.get("NTX") takes OHLC entirely from KRX. The per-minute high/low could be more extreme on NTX, and those values are silently dropped before the hourly max(high) / min(low) aggregation at lines 469-470.

If this is intentional (KRX is canonical price), consider adding a brief inline comment to document the design choice. If the hourly candle should reflect the true high/low across both venues, the merge should take max(high) and min(low) across all venue rows per minute.

♻️ Option: true cross-venue high/low per minute
         source = group.get("KRX") or group.get("NTX")
         if source is None:
             continue
         volume = sum(r.volume for r in group.values())
         value = sum(r.value for r in group.values())
+        high = max(r.high for r in group.values())
+        low = min(r.low for r in group.values())
         combined.append(
             _MinuteRow(
                 minute_time=minute_time,
                 venue=source.venue,
                 open=source.open,
-                high=source.high,
-                low=source.low,
+                high=high,
+                low=low,
                 close=source.close,
                 volume=volume,
                 value=value,
             )
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/kr_hourly_candles_read_service.py` around lines 444 - 463, The
current per-minute merge (iterating minutes_by_time and building combined with
_MinuteRow) always selects OHLC from a single source via source =
group.get("KRX") or group.get("NTX"), which discards NTX extrema; change the
merge so minute high = max(r.high for r in group.values()) and minute low =
min(r.low for r in group.values()) (keep open/close/venue selection as-is or
document choice) and continue summing volume/value across group.values();
alternatively, if KRX dominance is intentional, add a concise inline comment
next to the source selection explaining that KRX is canonical for OHLC to avoid
silent data loss.
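For illustration, here is a runnable version of the cross-venue merge option, with `_MinuteRow` reduced to a namedtuple and made-up sample rows for one minute:

```python
# Sketch of the cross-venue minute merge: OHLC open/close/venue still come
# from the KRX-priority source, but high/low take the true extremes across
# all venue rows. Sample data is illustrative, not from the service.
from collections import namedtuple

_MinuteRow = namedtuple(
    "_MinuteRow", "minute_time venue open high low close volume value"
)

minutes_by_time = {
    "09:00": {
        "KRX": _MinuteRow("09:00", "KRX", 100.0, 101.0, 99.5, 100.5, 1000, 100500.0),
        "NTX": _MinuteRow("09:00", "NTX", 100.1, 102.0, 99.0, 100.4, 200, 20080.0),
    }
}

combined = []
for minute_time, group in minutes_by_time.items():
    source = group.get("KRX") or group.get("NTX")
    if source is None:
        continue
    combined.append(
        _MinuteRow(
            minute_time=minute_time,
            venue=source.venue,
            open=source.open,
            # Take the true extremes across every venue row for the minute.
            high=max(r.high for r in group.values()),
            low=min(r.low for r in group.values()),
            close=source.close,
            volume=sum(r.volume for r in group.values()),
            value=sum(r.value for r in group.values()),
        )
    )
```

Here the merged minute keeps KRX's open/close but picks up NTX's wider high/low, so the later hourly `max(high)`/`min(low)` aggregation sees the full range.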
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/plans/2026-02-23-kr-ohlcv-1h-v2.md`:
- Line 13: The headings jump from h1 to h3; change the task headings to h2 to
satisfy markdownlint MD001 — update the "### Task 1: Fix typing/LSP issues in
MCP integration tests" heading to "## Task 1: Fix typing/LSP issues in MCP
integration tests" and likewise change the "###" headings for Tasks 2–4 (the
headings referenced in the details block) to "##" so each task heading
increments properly from the document h1.

In `@tests/test_kr_hourly_candles_read_service.py`:
- Around line 95-96: Add the missing pytest unit markers by decorating each test
function with `@pytest.mark.unit` (ensure pytest is imported) — specifically add
`@pytest.mark.unit` above test_api_prefetch_plan_time_boundaries_for_nxt_eligible
and the other test functions referenced (test names at lines 250-251, 353-354,
410-411, 500-501, 581-582, 656-657, 715-716, 787-788, 829-830) so all nine tests
are explicitly marked as unit tests; keep the existing `@pytest.mark.asyncio`
decorator and place `@pytest.mark.unit` directly above or alongside it.

In `@tests/test_mcp_server_tools.py`:
- Around line 1430-1461: This test function
test_get_ohlcv_kr_equity_period_1h_includes_session_and_venues is fully mocked
and needs the pytest unit marker; add `@pytest.mark.unit` above the async def and
ensure pytest is imported at the top of the file if not already present so the
marker resolves.

---

Nitpick comments:
In `@app/services/kr_hourly_candles_read_service.py`:
- Around line 29-30: The _async_session function uses an unnecessary double
cast; replace cast(AsyncSession, cast(object, AsyncSessionLocal())) with a
single cast of the AsyncSessionLocal() return to AsyncSession (e.g.,
cast(AsyncSession, AsyncSessionLocal())) or, even better, annotate
AsyncSessionLocal to return AsyncSession and remove the cast entirely; update
the _async_session function (symbol: _async_session and AsyncSessionLocal)
accordingly.
- Around line 181-215: The code currently opens multiple independent DB sessions
across the read path; modify the functions to accept an optional AsyncSession
and reuse a single session instead of creating new ones: add an optional
parameter session: Optional[AsyncSession] = None to read_kr_hourly_candles_1h,
_resolve_universe_row, _fetch_hour_rows, _fetch_minute_rows and
_build_current_hour_row (or whichever helpers call DB), thread the provided
session through each call, and only create/close a new session in
read_kr_hourly_candles_1h when session is None (i.e., use "async with
_async_session() as session" only in the top-level function when needed); update
calls to pass the session to avoid multiple connections and ensure you only
commit/close the session if you created it.
- Around line 444-463: The current per-minute merge (iterating minutes_by_time
and building combined with _MinuteRow) always selects OHLC from a single source
via source = group.get("KRX") or group.get("NTX"), which discards NTX extrema;
change the merge so minute high = max(r.high for r in group.values()) and minute
low = min(r.low for r in group.values()) (keep open/close/venue selection as-is
or document choice) and continue summing volume/value across group.values();
alternatively, if KRX dominance is intentional, add a concise inline comment
next to the source selection explaining that KRX is canonical for OHLC to avoid
silent data loss.

In `@tests/test_kr_hourly_candles_read_service.py`:
- Around line 144-174: Multiple tests duplicate the DummyDB class with only
small variations; replace the repeated classes with a single reusable
factory/fixture. Create a function (e.g., _make_dummy_db) that accepts
parameters symbol, nxt_eligible, is_active, hour_rows, minute_rows and returns
an instance whose execute method implements the existing SQL branching logic
(matching "FROM public.kr_symbol_universe", "FROM public.kr_candles_1h", "FROM
public.kr_candles_1m") and time-range filtering; then update each test to call
_make_dummy_db(...) instead of redefining DummyDB. Ensure the returned object's
.calls list and behavior exactly mirror the original DummyDB so existing
assertions keep working.
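A possible shape for that factory, with the table-name branching from the tests but simplified placeholder row shapes (the real row tuples and result API would need to match the existing `DummyDB`):

```python
# Sketch of a _make_dummy_db factory replacing the per-test DummyDB copies.
# _DummyResult and the row shapes are illustrative assumptions.
import asyncio


class _DummyResult:
    """Minimal stand-in for a SQLAlchemy result object."""

    def __init__(self, rows):
        self._rows = rows

    def all(self):
        return self._rows

    def first(self):
        return self._rows[0] if self._rows else None


def _make_dummy_db(*, symbol, nxt_eligible=True, is_active=True,
                   hour_rows=(), minute_rows=()):
    """Build a DummyDB whose execute() branches on the table in the SQL."""

    class DummyDB:
        def __init__(self):
            self.calls = []

        async def execute(self, statement, params=None):
            sql = str(statement)
            self.calls.append(sql)
            if "FROM public.kr_symbol_universe" in sql:
                return _DummyResult([(symbol, nxt_eligible, is_active)])
            if "FROM public.kr_candles_1h" in sql:
                return _DummyResult(list(hour_rows))
            if "FROM public.kr_candles_1m" in sql:
                return _DummyResult(list(minute_rows))
            return _DummyResult([])

    return DummyDB()


db = _make_dummy_db(symbol="005930", hour_rows=[("2026-02-23 09:00", 100.0)])
hour = asyncio.run(db.execute("SELECT * FROM public.kr_candles_1h")).all()
```

Each test then parameterizes `_make_dummy_db(...)` instead of redefining the class, and the `.calls` list stays available for assertions.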

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 203c542 and afde231.

📒 Files selected for processing (6)
  • app/mcp_server/README.md
  • app/mcp_server/tooling/market_data_quotes.py
  • app/services/kr_hourly_candles_read_service.py
  • docs/plans/2026-02-23-kr-ohlcv-1h-v2.md
  • tests/test_kr_hourly_candles_read_service.py
  • tests/test_mcp_server_tools.py


---

### Task 1: Fix typing/LSP issues in MCP integration tests

⚠️ Potential issue | 🟡 Minor

Heading level skips from h1 to h3.

Static analysis (markdownlint MD001) flags that heading levels should increment by one. Line 1 is # (h1), but line 13 jumps to ### (h3), skipping h2.

📝 Fix heading levels
-### Task 1: Fix typing/LSP issues in MCP integration tests
+## Task 1: Fix typing/LSP issues in MCP integration tests

And similarly for Tasks 2–4 on lines 34, 57, 77.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Task 1: Fix typing/LSP issues in MCP integration tests
## Task 1: Fix typing/LSP issues in MCP integration tests
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-23-kr-ohlcv-1h-v2.md` at line 13, The headings jump from
h1 to h3; change the task headings to h2 to satisfy markdownlint MD001 — update
the "### Task 1: Fix typing/LSP issues in MCP integration tests" heading to "##
Task 1: Fix typing/LSP issues in MCP integration tests" and likewise change the
"###" headings for Tasks 2–4 (the headings referenced in the details block) to
"##" so each task heading increments properly from the document h1.

Comment on lines +95 to +96
@pytest.mark.asyncio
async def test_api_prefetch_plan_time_boundaries_for_nxt_eligible(monkeypatch):

⚠️ Potential issue | 🟡 Minor

Missing @pytest.mark.unit markers on all tests.

Per the coding guidelines, tests should be categorized using registered pytest markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow). These are all unit tests (mocked DB and API) but lack the @pytest.mark.unit marker.

📝 Example fix for each test
+@pytest.mark.unit
 @pytest.mark.asyncio
 async def test_api_prefetch_plan_time_boundaries_for_nxt_eligible(monkeypatch):

Apply the same pattern to all 9 test functions.

As per coding guidelines: "Use test markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow) to categorize tests appropriately."

Also applies to: 250-251, 353-354, 410-411, 500-501, 581-582, 656-657, 715-716, 787-788, 829-830

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kr_hourly_candles_read_service.py` around lines 95 - 96, Add the
missing pytest unit markers by decorating each test function with
`@pytest.mark.unit` (ensure pytest is imported) — specifically add
`@pytest.mark.unit` above test_api_prefetch_plan_time_boundaries_for_nxt_eligible
and the other test functions referenced (test names at lines 250-251, 353-354,
410-411, 500-501, 581-582, 656-657, 715-716, 787-788, 829-830) so all nine tests
are explicitly marked as unit tests; keep the existing `@pytest.mark.asyncio`
decorator and place `@pytest.mark.unit` directly above or alongside it.

Comment on lines +1430 to +1461
async def test_get_ohlcv_kr_equity_period_1h_includes_session_and_venues(monkeypatch):
    tools = build_tools()
    df = _single_row_df()
    monkeypatch.setattr(settings, "kis_ohlcv_cache_enabled", False, raising=False)
    route_mock = AsyncMock(return_value="UN")
    monkeypatch.setattr(market_data_quotes, "_resolve_kr_intraday_route", route_mock)

    class DummyKISClient:
        async def inquire_time_dailychartprice(
            self, code, market, n, end_date=None, end_time=None
        ):
            del code, market, n, end_date, end_time
            return df

    df = pd.DataFrame(
        [
            {
                "datetime": pd.Timestamp("2026-02-23 09:00:00"),
                "date": date(2026, 2, 23),
                "time": datetime.time(9, 0, 0),
                "open": 100.0,
                "high": 110.0,
                "low": 90.0,
                "close": 105.0,
                "volume": 1000,
                "value": 105000.0,
                "session": "REGULAR",
                "venues": ["KRX", "NTX"],
            }
        ]
    )
    read_mock = AsyncMock(return_value=df)
    monkeypatch.setattr(market_data_quotes, "read_kr_hourly_candles_1h", read_mock)

    _patch_runtime_attr(monkeypatch, "KISClient", DummyKISClient)
    result = await tools["get_ohlcv"]("005930", market="kr", count=50, period="1h")

    read_mock.assert_awaited_once_with(symbol="005930", count=50, end_date=None)
    assert result["instrument_type"] == "equity_kr"
    assert result["period"] == "1h"
    assert result["source"] == "kis"
    route_mock.assert_awaited_once_with("005930")


@pytest.mark.asyncio
async def test_get_ohlcv_kr_equity_period_1h_backfills_multiple_days(monkeypatch):
    tools = build_tools()
    target_day = date(2026, 2, 19)
    monkeypatch.setattr(settings, "kis_ohlcv_cache_enabled", False, raising=False)
    route_mock = AsyncMock(return_value="UN")
    monkeypatch.setattr(market_data_quotes, "_resolve_kr_intraday_route", route_mock)
    calls: list[tuple[date | None, str | None, str]] = []

    class DummyKISClient:
        async def inquire_time_dailychartprice(
            self, code, market, n, end_date=None, end_time=None
        ):
            del code
            calls.append((end_date, end_time, market))
            requested_day = end_date or target_day
            assert n >= 1
            return pd.DataFrame(
                [
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 09:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 09:00:00").time(),
                        "open": 100.0,
                        "high": 101.0,
                        "low": 99.0,
                        "close": 100.5,
                        "volume": 1000,
                        "value": 100500.0,
                    },
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 10:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 10:00:00").time(),
                        "open": 100.5,
                        "high": 102.0,
                        "low": 100.0,
                        "close": 101.5,
                        "volume": 1100,
                        "value": 111650.0,
                    },
                ]
            )

    _patch_runtime_attr(monkeypatch, "KISClient", DummyKISClient)
    result = await tools["get_ohlcv"](
        "005930",
        market="kr",
        count=4,
        period="1h",
        end_date=target_day.isoformat(),
    )

    assert len(calls) >= 3
    assert calls[0][2] == "UN"
    assert calls[0][0] == target_day
    assert calls[1][0] == target_day
    assert calls[0][1] == "200000"
    assert calls[1][1] is not None
    assert calls[1][1] < calls[0][1]
    assert any(
        day == target_day - timedelta(days=1) and end_time == "200000"
        for day, end_time, _ in calls
    )
    assert len(result["rows"]) == 4
    assert result["period"] == "1h"
    assert result["instrument_type"] == "equity_kr"
    route_mock.assert_awaited_once_with("005930")


@pytest.mark.asyncio
async def test_get_ohlcv_kr_equity_period_1h_backfill_uses_j_close_for_history(
    monkeypatch,
):
    tools = build_tools()
    target_day = date(2026, 2, 19)
    monkeypatch.setattr(settings, "kis_ohlcv_cache_enabled", False, raising=False)
    monkeypatch.setattr(
        market_data_quotes,
        "_resolve_kr_intraday_route",
        AsyncMock(return_value="J"),
    )
    calls: list[tuple[date | None, str | None, str]] = []

    class DummyKISClient:
        async def inquire_time_dailychartprice(
            self, code, market, n, end_date=None, end_time=None
        ):
            del code
            calls.append((end_date, end_time, market))
            requested_day = end_date or target_day
            assert n >= 1
            return pd.DataFrame(
                [
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 09:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 09:00:00").time(),
                        "open": 100.0,
                        "high": 101.0,
                        "low": 99.0,
                        "close": 100.5,
                        "volume": 1000,
                        "value": 100500.0,
                    },
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 10:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 10:00:00").time(),
                        "open": 100.5,
                        "high": 102.0,
                        "low": 100.0,
                        "close": 101.5,
                        "volume": 1100,
                        "value": 111650.0,
                    },
                ]
            )

    _patch_runtime_attr(monkeypatch, "KISClient", DummyKISClient)
    result = await tools["get_ohlcv"](
        "005930",
        market="kr",
        count=4,
        period="1h",
        end_date=target_day.isoformat(),
    )

    assert len(calls) >= 2
    assert calls[0][2] == "J"
    assert calls[0][1] == "153000"
    assert any(
        day == target_day - timedelta(days=1) and end_time == "153000"
        for day, end_time, _ in calls
    )
    assert len(result["rows"]) == 4
    assert result["period"] == "1h"
    assert result["instrument_type"] == "equity_kr"


@pytest.mark.asyncio
async def test_get_ohlcv_kr_1h_paginates_within_same_day(monkeypatch):
    tools = build_tools()
    target_day = date(2026, 2, 19)
    monkeypatch.setattr(settings, "kis_ohlcv_cache_enabled", False, raising=False)
    monkeypatch.setattr(
        market_data_quotes,
        "_resolve_kr_intraday_route",
        AsyncMock(return_value="UN"),
    )
    calls: list[tuple[date | None, str | None, str]] = []

    class DummyKISClient:
        async def inquire_time_dailychartprice(
            self, code, market, n, end_date=None, end_time=None
        ):
            del code, n
            calls.append((end_date, end_time, market))
            requested_day = end_date or target_day
            if end_time is None or end_time >= "180000":
                hours = (19, 18)
            else:
                hours = (17, 16)
            return pd.DataFrame(
                [
                    {
                        "datetime": pd.Timestamp(f"{requested_day} {hour:02d}:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp(f"2026-01-01 {hour:02d}:00:00").time(),
                        "open": float(hour),
                        "high": float(hour) + 0.5,
                        "low": float(hour) - 0.5,
                        "close": float(hour) + 0.25,
                        "volume": 1000 + hour,
                        "value": float((1000 + hour) * hour),
                    }
                    for hour in hours
                ]
            )

    _patch_runtime_attr(monkeypatch, "KISClient", DummyKISClient)
    result = await tools["get_ohlcv"](
        "005930",
        market="kr",
        count=4,
        period="1h",
        end_date=target_day.isoformat(),
    )

    assert len(calls) >= 2
    assert len(result["rows"]) == 4
    assert all(day == target_day for day, _, _ in calls)
    assert all(market == "UN" for _, _, market in calls)
    assert calls[0][1] == "200000"
    assert calls[1][1] is not None
    assert calls[1][1] < calls[0][1]


@pytest.mark.asyncio
async def test_get_ohlcv_kr_1h_stops_when_cursor_stalls(monkeypatch):
    tools = build_tools()
    target_day = date(2026, 2, 19)
    monkeypatch.setattr(settings, "kis_ohlcv_cache_enabled", False, raising=False)
    monkeypatch.setattr(
        market_data_quotes,
        "_resolve_kr_intraday_route",
        AsyncMock(return_value="UN"),
    )
    calls: list[tuple[date | None, str | None, str]] = []

    class DummyKISClient:
        async def inquire_time_dailychartprice(
            self, code, market, n, end_date=None, end_time=None
        ):
            del code, n
            calls.append((end_date, end_time, market))
            requested_day = end_date or target_day
            if requested_day != target_day:
                return pd.DataFrame()
            return pd.DataFrame(
                [
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 10:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 10:00:00").time(),
                        "open": 100.0,
                        "high": 101.0,
                        "low": 99.0,
                        "close": 100.5,
                        "volume": 1000,
                        "value": 100500.0,
                    },
                    {
                        "datetime": pd.Timestamp(f"{requested_day} 09:00:00"),
                        "date": requested_day,
                        "time": pd.Timestamp("2026-01-01 09:00:00").time(),
                        "open": 99.5,
                        "high": 100.5,
                        "low": 99.0,
                        "close": 100.0,
                        "volume": 900,
                        "value": 90000.0,
                    },
                ]
            )

    _patch_runtime_attr(monkeypatch, "KISClient", DummyKISClient)
    result = await tools["get_ohlcv"](
        "005930",
        market="kr",
        count=6,
        period="1h",
        end_date=target_day.isoformat(),
    )

    same_day_calls = [call for call in calls if call[0] == target_day]
    assert len(same_day_calls) == 2
    assert same_day_calls[0][1] == "200000"
    assert same_day_calls[1][1] is not None
    assert same_day_calls[1][1] < same_day_calls[0][1]
    assert len(result["rows"]) == 2
    assert result["rows"]
    row = result["rows"][0]
    assert row["session"] == "REGULAR"
    assert row["venues"] == ["KRX", "NTX"]

⚠️ Potential issue | 🟡 Minor

Add a unit marker for this async test.

This test is unit-scoped (fully mocked) but lacks a unit/integration marker required for tests in this path.

🔧 Add a unit marker
-@pytest.mark.asyncio
-async def test_get_ohlcv_kr_equity_period_1h_includes_session_and_venues(monkeypatch):
+@pytest.mark.asyncio
+@pytest.mark.unit
+async def test_get_ohlcv_kr_equity_period_1h_includes_session_and_venues(monkeypatch):

As per coding guidelines, “tests/test_*.py: Use test markers (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow) to categorize tests appropriately”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_mcp_server_tools.py` around lines 1430 - 1461, This test function
test_get_ohlcv_kr_equity_period_1h_includes_session_and_venues is fully mocked
and needs the pytest unit marker; add `@pytest.mark.unit` above the async def and
ensure pytest is imported at the top of the file if not already present so the
marker resolves.
