…tering

## Problem
Bundle export API filters relationships by entity type using wildcard queries on source_ref/target_ref URIs (e.g., `source_ref:*malware*`). This is inefficient as it doesn't use the ES inverted index.

## Solution
- Add explicit `source_type` and `target_type` fields to relationship docs
- Extract type from URI on write (e.g., `malware` from `.../ctia/malware/...`)
- Query uses backward-compatible OR clause during migration: `((source_type:malware) OR (source_ref:*malware*))`

## Migration Strategy (Zero-Downtime)
1. Deploy this PR → new docs get fast path, old docs use wildcard fallback
2. Run ES update_by_query to populate fields on existing docs
3. Remove wildcard fallback in follow-up PR

## Files
- src/ctia/entity/relationship/es_store.clj - New ES store with mapping & transforms
- src/ctia/bundle/core.clj - Backward-compatible query generation
- src/ctia/entity/relationship.clj - Use new es-store
- src/ctia/task/check_es_stores.clj - Update schema reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
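As a rough illustration of the two ideas in this commit message (type extraction from a reference URI and the migration-period OR clause), here is a minimal bash sketch. The PR itself does this in Clojure on write; the function names `extract_type` and `type_filter` are hypothetical and are not taken from the code.

```shell
# Hypothetical helpers for illustration; not the PR's Clojure implementation.

# Print the entity type embedded in a CTIA reference URI, e.g.
# "malware" from "http://example.com/ctia/malware/malware-123".
# Prints nothing when the URI has no "/ctia/<type>/" segment.
extract_type() {
  local ref="$1"
  case "$ref" in
    */ctia/*/*) ;;          # looks like a CTIA ref, keep going
    *) return 0 ;;          # unparseable ref: emit nothing
  esac
  local rest="${ref#*/ctia/}"   # drop everything up to the first "/ctia/"
  printf '%s\n' "${rest%%/*}"   # keep the text before the next "/"
}

# Build the backward-compatible filter: exact match on the new field,
# with the wildcard on the old ref field as a fallback for old docs.
# $1 is "source" or "target", $2 is the entity type.
type_filter() {
  local side="$1" type="$2"
  printf '((%s_type:%s) OR (%s_ref:*%s*))\n' "$side" "$type" "$side" "$type"
}
```

For example, `type_filter source malware` prints the same query string quoted in the commit message, and `extract_type` prints nothing for a ref it cannot parse, so a caller can skip the field.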
f8145d0 to 95901e0
…gr6)

Adds a bash script that uses Elasticsearch's _update_by_query API with a Painless script to backfill source_type and target_type fields on existing relationship documents.

Usage:
  # Dry run
  ./scripts/migrate_relationship_types.sh localhost:9200 ctia_relationship --dry-run

  # Execute
  ./scripts/migrate_relationship_types.sh localhost:9200 ctia_relationship

The script extracts entity type from source_ref/target_ref URLs:
  http://example.com/ctia/malware/malware-123 -> source_type: "malware"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
bbee7f9 to e158c27
Tests for:
- stored-relationship->es-stored-relationship (extracts source_type/target_type)
- es-stored-relationship->stored-relationship (removes type fields)
- es-partial-stored-relationship->partial-stored-relationship
- store-opts functions with :doc wrapper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New options:
- --throttle <N>: Limit to N documents per second using requests_per_second
- --async: Run in background with wait_for_completion=false, returns task ID

Recommended for production:
  ./scripts/migrate_relationship_types.sh host:9200 index --throttle 500 --async

Also includes:
- Estimated time calculation in dry-run mode
- Task monitoring and cancellation commands for async mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
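To make the two options concrete, the sketch below shows how they might translate into `_update_by_query` URL parameters, along with a simple time estimate for a throttled run. `requests_per_second` and `wait_for_completion` are the parameters named above; the function names and the round-up estimate are assumptions, not the script's actual code.

```shell
# Illustrative only: function names and argument order are assumptions.

# Build the _update_by_query URL.
# $1 host:port, $2 index, $3 docs per second (empty = unthrottled),
# $4 "true" to run asynchronously.
update_by_query_url() {
  local host="$1" index="$2" throttle="${3:-}" async="${4:-false}"
  local url="http://${host}/${index}/_update_by_query"
  local sep="?"
  if [ -n "$throttle" ]; then
    url="${url}${sep}requests_per_second=${throttle}"
    sep="&"
  fi
  if [ "$async" = "true" ]; then
    url="${url}${sep}wait_for_completion=false"
  fi
  printf '%s\n' "$url"
}

# Estimate the duration of a throttled run in seconds, rounded up.
estimate_seconds() {
  local docs="$1" per_second="$2"
  echo $(( (docs + per_second - 1) / per_second ))
}
```

With `wait_for_completion=false`, the response carries a task ID. Elasticsearch's task management API can then report on it (`GET _tasks/<task_id>`) or cancel it (`POST _tasks/<task_id>/_cancel`).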
- Use heredoc with temp file to avoid shell escaping issues with Painless script
- The forward slashes in '/ctia/' were being misinterpreted
- Tested and verified working against OpenSearch 2.19

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
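The heredoc-plus-temp-file approach can be sketched as follows. The quoted heredoc delimiter keeps the shell from touching the Painless source, so `'/ctia/'` reaches Elasticsearch unchanged. The Painless source and the query are assumptions written for illustration, not the PR's script. They cover `source_ref` only (`target_ref` would need the same treatment) and have not been run against a cluster.

```shell
# Sketch under assumptions: the request body below is illustrative,
# not the body used by migrate_relationship_types.sh.

# Write an _update_by_query body to a temp file and print the file's path.
# The quoted delimiter ('JSON') disables shell expansion inside the heredoc.
write_backfill_body() {
  local body
  body="$(mktemp)"
  cat > "$body" <<'JSON'
{
  "query": { "bool": { "must_not": { "exists": { "field": "source_type" } } } },
  "script": {
    "lang": "painless",
    "source": "String r = ctx._source.source_ref; int i = r.indexOf('/ctia/'); if (i >= 0) { String rest = r.substring(i + 6); int j = rest.indexOf('/'); if (j > 0) { ctx._source.source_type = rest.substring(0, j); } }"
  }
}
JSON
  printf '%s\n' "$body"
}
```

The file can then be passed to curl with `--data-binary "@$(write_backfill_body)"` and a `Content-Type: application/json` header, so the script source never goes through shell quoting.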
Adds check_relationship_migration.sh that:
- Shows migration statistics (total, migrated, remaining)
- Displays progress percentage
- Shows breakdown by source_type and target_type
- Validates sample documents (ensures extracted types match refs)
- Returns exit codes: 0=success, 1=validation error, 2=incomplete

Usage:
  ./scripts/check_relationship_migration.sh <ES_HOST> <INDEX>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
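The progress and exit-code behaviour described above could look roughly like this. It is a sketch with hypothetical function names. Treating an empty index as complete, and reporting a validation error ahead of an incomplete migration, are both assumptions; the commit message does not say which takes precedence.

```shell
# Hypothetical helpers; the real script's logic may differ.

# Print progress as a whole-number percentage of migrated documents.
progress_percent() {
  local total="$1" migrated="$2"
  if [ "$total" -eq 0 ]; then
    echo 100                      # assumption: nothing to migrate counts as done
  else
    echo $(( migrated * 100 / total ))
  fi
}

# Map the results to the documented exit codes:
# 0 = success, 1 = validation error, 2 = incomplete.
migration_exit_code() {
  local total="$1" migrated="$2" invalid="$3"
  if [ "$invalid" -gt 0 ]; then
    echo 1                        # assumption: validation errors win over "incomplete"
  elif [ "$migrated" -lt "$total" ]; then
    echo 2
  else
    echo 0
  fi
}
```

A wrapper script would end with `exit "$(migration_exit_code "$total" "$migrated" "$invalid")"`, so that CI or a deployment runbook can branch on the result.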
- Use cond-> to only assoc source_type/target_type when non-nil
- Update test data to use valid CTIA ID format (type-uuid pattern)
- Add test case for graceful handling of unparseable refs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
Add `source_type` and `target_type` fields to relationship documents for efficient filtering in the bundle/export API.

## Problem
- The bundle export API filters relationships by `source_type`/`target_type` params.
- Current query: `source_ref:*malware*` (wildcard = inefficient, doesn't use ES index)
- New query: `source_type:malware` (exact match = uses ES index efficiently)

## Solution
- `source_type`/`target_type` fields extracted from URIs on write
- Backward-compatible query during migration: `((source_type:X) OR (source_ref:*X*))`
- `_update_by_query` to backfill existing documents

## Files Changed
- `src/ctia/entity/relationship/es_store.clj` - ES mapping and transforms
- `src/ctia/bundle/core.clj` - Backward-compatible query generation
- `src/ctia/entity/relationship.clj` - Use new es-store
- `scripts/migrate_relationship_types.sh` - ES migration script
- `scripts/check_relationship_migration.sh` - Migration verification script
- `test/ctia/entity/relationship/es_store_test.clj` - Unit tests for transforms
- `test/ctia/bundle/core_test.clj` - Updated query tests

## Deployment Steps
### Phase 1: Deploy this PR
New documents will automatically get `source_type`/`target_type` populated. Queries use a backward-compatible OR clause (works for both new and old docs).

### Phase 2: Update ES mapping
Add the new `source_type` and `target_type` field definitions to the existing index mapping.
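A mapping update of this kind might look like the sketch below, which only prints the request body. The `keyword` field type is an assumption (it suits exact-match filters such as `source_type:malware`); the PR's actual mapping lives in `es_store.clj` and may differ. The function name is hypothetical.

```shell
# Sketch: print a put-mapping body that adds the two new fields.
# The "keyword" type is an assumption, not confirmed by the PR text.
mapping_body() {
  cat <<'JSON'
{
  "properties": {
    "source_type": { "type": "keyword" },
    "target_type": { "type": "keyword" }
  }
}
JSON
}
```

The body could be sent with `curl -X PUT "http://localhost:9200/ctia_relationship/_mapping" -H 'Content-Type: application/json' -d "$(mapping_body)"`, using the host and index names from the usage examples above.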
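### Phase 3: Run migration script
Backfill existing documents with `source_type`/`target_type` values by running `./scripts/migrate_relationship_types.sh host:9200 index`, first with `--dry-run`, then with `--throttle 500 --async` as recommended for production.

### Phase 4: Verify migration
Check migration status and validate data with `./scripts/check_relationship_migration.sh <ES_HOST> <INDEX>`.

Output includes migration statistics (total, migrated, remaining), progress percentage, a breakdown by `source_type` and `target_type`, and validation of sample documents.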
### Phase 5: Refresh mappings
Reindex documents so the new fields are properly searchable.
### Phase 6: Follow-up PR
Once migration is complete in all environments, deploy a follow-up PR to remove the wildcard fallback from queries.
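## Script Options

### migrate_relationship_types.sh
- `--dry-run`: dry run; includes an estimated time calculation
- `--throttle <N>`: limit to N documents per second using `requests_per_second`
- `--async`: run in background with `wait_for_completion=false`; returns a task ID

### check_relationship_migration.sh
Usage: `./scripts/check_relationship_migration.sh <ES_HOST> <INDEX>`. Exit codes: 0=success, 1=validation error, 2=incomplete.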
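## QA Checklist
- New documents have `source_type`/`target_type` populated
- `source_type` filter works for both new and old docs
- Run the migration with `--dry-run` first
- Run the migration with `--throttle 500 --async`
- Run `check_relationship_migration.sh` to verify

## Related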
🤖 Generated with Claude Code