[CBRD-26291] Recovery for Coordinator #6820

Open
beyondykk9 wants to merge 15 commits into CUBRID:develop from beyondykk9:CBRD-26291-1

@beyondykk9
Contributor

http://jira.cubrid.org/browse/CBRD-26291

Purpose

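A transaction that contains dblink DML queries must be processed using two-phase commit (2PC). If the server terminates abnormally while such a transaction is in progress, it must perform recovery when it restarts, and that recovery must also follow the 2PC rules.

CUBRID's recovery logging for 2PC transactions is partially implemented, but it does not work correctly, so it needs to be fixed.

Complete recovery of a 2PC transaction requires logs on both the coordinator server and each participant server. In terms of transaction phases, recovery logs must be written for the coordinator and the participant servers at two points: the PREPARE phase (Participant's VOTEs for COMMIT) and the COMMIT phase.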

Implementation

Add the "_db_global_tran" catalog table. The coordinator retries until the commit/abort decision succeeds, updating the contents of this catalog as it goes.
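
The retry behavior described above might look roughly like the following. This is a minimal sketch, not the PR's code: `global_tran_row`, `send_decision_fn`, `coordinator_decide`, and `flaky_send` are hypothetical names, the struct is an in-memory stand-in for a `_db_global_tran` row (the real columns are not shown here), and the `max_attempts` bound is an assumption added only so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one _db_global_tran row. */
typedef struct
{
  int gtrid;             /* global transaction id (assumed column) */
  char state;            /* 'P' = prepare, 'C' = commit, 'A' = abort */
  bool decision_acked;   /* true once the participant acknowledged the decision */
  int attempts;          /* number of delivery attempts so far */
} global_tran_row;

/* Hypothetical callback: returns true when the participant acknowledges. */
typedef bool (*send_decision_fn) (int gtrid, char decision, void *ctx);

/* Record the decision first, then retry delivery until it is acknowledged
 * or max_attempts is reached (the bound is an assumption of this sketch). */
static bool
coordinator_decide (global_tran_row *row, char decision, send_decision_fn send,
                    void *ctx, int max_attempts)
{
  assert (decision == 'C' || decision == 'A');
  row->state = decision;   /* stands in for updating the catalog row */

  while (!row->decision_acked && row->attempts < max_attempts)
    {
      row->attempts++;
      if (send (row->gtrid, decision, ctx))
        {
          row->decision_acked = true;
        }
    }
  return row->decision_acked;
}

/* Sample sender: fails as many times as *ctx says, then succeeds. */
static bool
flaky_send (int gtrid, char decision, void *ctx)
{
  int *failures_left = (int *) ctx;
  (void) gtrid;
  (void) decision;
  if (*failures_left > 0)
    {
      (*failures_left)--;
      return false;
    }
  return true;
}
```

Recording the decision before the first send is the point of the sketch: if the coordinator crashes between the two steps, the stored state still says which decision to resend after restart.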

Remarks

N/A

@beyondykk9 beyondykk9 self-assigned this Jan 30, 2026
@beyondykk9 beyondykk9 requested a review from hornetmj as a code owner January 30, 2026 07:45
@beyondykk9 beyondykk9 marked this pull request as draft January 30, 2026 07:45
@beyondykk9 beyondykk9 changed the title from "[CBRD-26291] temp: Recovery for Coordinator" to "[CBRD-26291] Recovery for Coordinator" Feb 2, 2026
@beyondykk9 beyondykk9 added this to the guava milestone Feb 2, 2026
@kangmin5505 kangmin5505 requested a review from Copilot February 2, 2026 07:53
@beyondykk9 beyondykk9 marked this pull request as ready for review February 2, 2026 07:54

Copilot AI left a comment

Pull request overview

This PR implements 2PC (Two-Phase Commit) coordinator recovery for dblink DML transactions to handle server crashes during distributed transaction processing. The implementation introduces a _db_global_tran catalog table to persist transaction state and a background daemon thread to asynchronously send commit/abort decisions to participants.

Changes:

  • Adds _db_global_tran system catalog table to persist coordinator recovery information with states 'P' (prepare), 'A' (abort), and 'C' (commit)
  • Implements a daemon thread (send_2pc_decision_daemon) that processes recovery queue and sends decisions to participants
  • Modifies 2PC coordinator logic to use catalog-based recovery instead of LOG_2PC_START when CCI_XA is enabled
  • Enables CCI_XA unconditionally for Linux builds and increases DB_MAX_PASSWORD_LENGTH from 8 to 32
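
As an illustration of the queue-plus-daemon idea in the list above, here is a minimal single-threaded sketch. It is not the PR's `send_2pc_decision_daemon`: the names `decision_entry`, `decision_queue`, `daemon_pass`, and `deliver_unless_down` are hypothetical, the fixed-size ring buffer is an assumption, and locking and thread wake-up are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical queue entry: which transaction, and which decision to send. */
typedef struct
{
  int gtrid;
  char decision;   /* 'C' = commit, 'A' = abort */
} decision_entry;

/* Fixed-size ring buffer (an assumption of this sketch). */
typedef struct
{
  decision_entry items[QUEUE_CAP];
  size_t head;
  size_t count;
} decision_queue;

static bool
queue_push (decision_queue *q, decision_entry e)
{
  if (q->count == QUEUE_CAP)
    {
      return false;
    }
  q->items[(q->head + q->count) % QUEUE_CAP] = e;
  q->count++;
  return true;
}

static bool
queue_pop (decision_queue *q, decision_entry *out)
{
  if (q->count == 0)
    {
      return false;
    }
  *out = q->items[q->head];
  q->head = (q->head + 1) % QUEUE_CAP;
  q->count--;
  return true;
}

/* Hypothetical delivery callback: true when the participant accepted. */
typedef bool (*deliver_fn) (const decision_entry *e, void *ctx);

/* One daemon pass: try every entry queued at the start of the pass once,
 * and put undelivered entries back for the next pass. */
static size_t
daemon_pass (decision_queue *q, deliver_fn deliver, void *ctx)
{
  size_t pending = q->count;
  size_t delivered = 0;
  decision_entry e;

  for (size_t i = 0; i < pending; i++)
    {
      if (!queue_pop (q, &e))
        {
          break;
        }
      if (deliver (&e, ctx))
        {
          delivered++;
        }
      else
        {
          bool requeued = queue_push (q, e);
          assert (requeued);   /* a slot was just freed by the pop */
          (void) requeued;
        }
    }
  return delivered;
}

/* Sample callback: the participant whose gtrid equals *ctx is unreachable. */
static bool
deliver_unless_down (const decision_entry *e, void *ctx)
{
  return e->gtrid != *(const int *) ctx;
}
```

Bounding each pass by the count at entry keeps a permanently unreachable participant from spinning the loop; its entry simply waits for the next pass.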

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| src/transaction/log_recovery.c | Integrates daemon recovery and startup during log recovery phase |
| src/transaction/log_2pc.c | Modifies 2PC coordinator to persist state to catalog and enqueue to daemon |
| src/query/dblink_scan.h | Adds forward declarations for VAL_DESCR to fix compilation |
| src/query/dblink_global_tran_catalog.h | Header for catalog table operations (insert/update/delete/scan) |
| src/query/dblink_global_tran_catalog.c | Implements catalog operations using heap/locator APIs |
| src/query/dblink_2pc_daemon.h | Header for daemon thread and queue management |
| src/query/dblink_2pc_daemon.c | Implements daemon thread, queue, and recovery logic |
| src/query/dblink_2pc.h | Adds declaration for send_decision_one_participant function |
| src/query/dblink_2pc.c | Implements sending decision to single participant for recovery |
| src/query/DBLINK_2PC_RECOVERY_DESIGN.md | Design documentation in Korean explaining the architecture |
| src/object/schema_system_catalog_install.hpp | Adds get_global_tran() declaration |
| src/object/schema_system_catalog_install.cpp | Implements _db_global_tran table schema definition |
| src/object/schema_system_catalog_constants.h | Adds CT_GLOBAL_TRAN_NAME constant |
| src/object/schema_system_catalog.cpp | Registers _db_global_tran in system catalog list |
| src/compat/dbtype_def.h | Adds DB_OBJECT_GLOBAL_TRAN type and increases password length to 32 |
| sa/CMakeLists.txt | Adds new source files to build |
| cubrid/CMakeLists.txt | Adds new source files to build |
| CMakeLists.txt | Enables CCI_XA unconditionally for Linux |

Comment on lines +297 to +305
  heap_scancache_end (thread_p, &scan);
  scan_inited = false;

  if (heap_scancache_start_modify (thread_p, &scan, hfid_p, &class_oid, SINGLE_ROW_UPDATE, NULL) != NO_ERROR)
    {
      error = ER_FAILED;
      goto cleanup;
    }
  scan_inited = true;

Copilot AI Feb 2, 2026

If heap_scancache_start_modify() fails, the function jumps to cleanup but scan_inited is still false, so heap_scancache_end_modify() won't be called. However, if heap_scancache_start() succeeded earlier, heap_scancache_end() will be called instead (line 297). This is correct, but the logic could be clearer by using separate cleanup labels or by ensuring the scancache type (read vs modify) is tracked separately.
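
One way to make the read-versus-modify state explicit, as the comment suggests, is a small mode enum in place of the boolean flag. This is a sketch under stated assumptions: `scan_mode`, `release_scan`, `update_row`, and the `stub_end_*` helpers are hypothetical, and the stubs only count calls where the real code would call `heap_scancache_end()` or `heap_scancache_end_modify()`.

```c
#include <assert.h>

/* Hypothetical replacement for the boolean scan_inited flag. */
typedef enum
{
  SCAN_NONE,     /* no scan cache is open */
  SCAN_READ,     /* opened for reading */
  SCAN_MODIFY    /* opened for modification */
} scan_mode;

/* Hypothetical stubs that only count calls; the real heap_scancache_end* 
 * functions take thread and scan cache arguments. */
static int read_end_calls;
static int modify_end_calls;

static void
stub_end_read (void)
{
  read_end_calls++;
}

static void
stub_end_modify (void)
{
  modify_end_calls++;
}

/* Single release path: picks the matching end call from the tracked mode. */
static void
release_scan (scan_mode *mode)
{
  switch (*mode)
    {
    case SCAN_READ:
      stub_end_read ();
      break;
    case SCAN_MODIFY:
      stub_end_modify ();
      break;
    case SCAN_NONE:
      break;
    }
  *mode = SCAN_NONE;
}

/* Read scan followed by an upgrade to a modify scan.
 * modify_ok simulates whether starting the modify scan succeeds. */
static int
update_row (int modify_ok)
{
  scan_mode mode = SCAN_NONE;
  int error = 0;

  mode = SCAN_READ;        /* stands in for a successful read-scan start */
  /* ... locate the row ... */
  release_scan (&mode);    /* ends the read scan; mode becomes SCAN_NONE */

  if (!modify_ok)
    {
      error = -1;          /* stands in for ER_FAILED */
      goto cleanup;
    }
  mode = SCAN_MODIFY;
  /* ... update the row ... */

cleanup:
  release_scan (&mode);    /* no-op when nothing is open */
  return error;
}
```

With the mode tracked in one place, the cleanup label needs a single call and cannot invoke the wrong end function.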

@beyondykk9
Contributor Author

/run shell
