Relationships store types #1473

Open
ereteog wants to merge 7 commits into master from relation-store-types

ereteog commented Apr 29, 2025

Summary

Add source_type and target_type fields to relationship documents so the bundle/export API can filter by entity type efficiently.

Related: https://cisco-sbg.atlassian.net/browse/XDR-45534

Problem

  • The bundle export API filters relationships by entity type using the source_type/target_type params
  • Current query: source_ref:*malware* (leading wildcard: ES has to scan the field's terms instead of doing a direct index lookup)
  • Needed query: source_type:malware (exact match: resolved directly from the ES index)

Solution

  1. Add source_type/target_type fields extracted from URIs on write
  2. Backward-compatible query during migration: ((source_type:X) OR (source_ref:*X*))
    • New docs: fast path via exact match
    • Legacy docs: slow path via wildcard fallback
  3. ES migration script using _update_by_query to backfill existing documents
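
The transitional query in step 2 can be sketched as a small string builder. This is only an illustration in Python, not the code in src/ctia/bundle/core.clj; the function name `type_filter_query` and its parameters are hypothetical.

```python
def type_filter_query(side: str, entity_type: str) -> str:
    """Build the backward-compatible clause for one side of a relationship.

    `side` is "source" or "target". The exact-match clause serves documents
    that already carry the new field; the wildcard clause is the fallback
    for legacy documents that only have the *_ref URI.
    """
    if side not in ("source", "target"):
        raise ValueError(f"unexpected side: {side!r}")
    exact = f"({side}_type:{entity_type})"        # fast path, new docs
    legacy = f"({side}_ref:*{entity_type}*)"      # slow path, legacy docs
    return f"({exact} OR {legacy})"


print(type_filter_query("source", "malware"))
# prints ((source_type:malware) OR (source_ref:*malware*))
```

Once every document has been backfilled, the `legacy` clause is the part the follow-up PR would drop.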

Files Changed

  • src/ctia/entity/relationship/es_store.clj - ES mapping and transforms
  • src/ctia/bundle/core.clj - Backward-compatible query generation
  • src/ctia/entity/relationship.clj - Use new es-store
  • scripts/migrate_relationship_types.sh - ES migration script
  • scripts/check_relationship_migration.sh - Migration verification script
  • test/ctia/entity/relationship/es_store_test.clj - Unit tests for transforms
  • test/ctia/bundle/core_test.clj - Updated query tests

Deployment Steps

Phase 1: Deploy this PR

New documents automatically get source_type/target_type populated on write.
Queries use a backward-compatible OR clause, so they match both new and old docs.

Phase 2: Update ES mapping

Add the new field definitions to the existing index:

java -Dctia.store.es.relationship.update-mappings=true \
  -cp ctia.jar clojure.main -m ctia.task.update-index-state

Phase 3: Run migration script

Backfill existing documents with source_type/target_type values:

# Dry run first
./scripts/migrate_relationship_types.sh <ES_HOST> <INDEX> --dry-run

# Execute with throttling (recommended for production)
./scripts/migrate_relationship_types.sh <ES_HOST> <INDEX> --throttle 500 --async

# Monitor async task
curl -X GET "http://<ES_HOST>/_tasks/<TASK_ID>"

Phase 4: Verify migration

Check migration status and validate data:

./scripts/check_relationship_migration.sh <ES_HOST> <INDEX>

Output includes:

  • Migration statistics (total, migrated, remaining)
  • Progress percentage
  • Breakdown by source_type and target_type
  • Sample validation (ensures extracted types match refs)
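
As a rough illustration of the statistics listed above, the following Python sketch computes the progress figures and a per-type breakdown from plain inputs. It does not reproduce check_relationship_migration.sh; the function names, the dict layout, and the choice to treat an empty index as 100% complete are assumptions.

```python
from collections import Counter


def migration_progress(total: int, migrated: int) -> dict:
    """Return total/migrated/remaining counts and a progress percentage."""
    remaining = total - migrated
    # Assumption: an empty index is reported as fully migrated.
    percent = 100.0 if total == 0 else round(100.0 * migrated / total, 1)
    return {
        "total": total,
        "migrated": migrated,
        "remaining": remaining,
        "percent": percent,
    }


def breakdown(docs: list, field: str) -> Counter:
    """Count documents per value of `field`, skipping docs without it."""
    return Counter(doc[field] for doc in docs if doc.get(field) is not None)


print(migration_progress(200, 50))
print(breakdown([{"source_type": "malware"}, {"source_type": "malware"}, {}], "source_type"))
```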

Phase 5: Refresh mappings

Reindex documents so new fields are properly searchable:

java -Dctia.store.es.relationship.refresh-mappings=true \
  -cp ctia.jar clojure.main -m ctia.task.update-index-state

Phase 6: Follow-up PR

Once the migration is complete in all environments, deploy a follow-up PR to remove the wildcard fallback from queries.


Script Options

migrate_relationship_types.sh

| Option | Description |
|---|---|
| --dry-run | Show count and sample doc without making changes |
| --throttle <N> | Limit to N documents per second |
| --async | Run in background, returns task ID for monitoring |

check_relationship_migration.sh

| Exit Code | Meaning |
|---|---|
| 0 | Migration complete and validated |
| 1 | Migration complete but validation errors |
| 2 | Migration incomplete |
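
The exit codes above imply a simple precedence: an incomplete migration is reported before validation problems are considered, since code 1 is described as "complete but validation errors". A minimal Python sketch of that decision, with a hypothetical function name and inputs (the script itself is bash and may count things differently):

```python
def migration_exit_code(remaining: int, validation_errors: int) -> int:
    """Map check results to the documented exit codes.

    2 = documents still lack the new fields
    1 = all documents migrated, but sampled docs failed validation
    0 = all documents migrated and the sample validated
    """
    if remaining > 0:
        return 2
    if validation_errors > 0:
        return 1
    return 0
```

A deployment pipeline could gate Phase 5 on a return value of 0 and simply retry later on 2.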

QA Checklist

  • Verify new relationships have source_type/target_type populated
  • Verify bundle/export with source_type filter works for both new and old docs
  • Run migration script in INT with --dry-run first
  • Run migration in INT with --throttle 500 --async
  • Run check_relationship_migration.sh to verify
  • Run refresh-mappings after migration completes
  • Verify queries use new fields efficiently (check ES slow logs)

🤖 Generated with Claude Code

ereteog changed the title from "Relation store types" to "Relationships store types" on Apr 30, 2025
…tering

## Problem
Bundle export API filters relationships by entity type using wildcard
queries on source_ref/target_ref URIs (e.g., `source_ref:*malware*`).
This is inefficient as it doesn't use the ES inverted index.

## Solution
- Add explicit `source_type` and `target_type` fields to relationship docs
- Extract type from URI on write (e.g., `malware` from `.../ctia/malware/...`)
- Query uses backward-compatible OR clause during migration:
  `((source_type:malware) OR (source_ref:*malware*))`

## Migration Strategy (Zero-Downtime)
1. Deploy this PR → new docs get fast path, old docs use wildcard fallback
2. Run ES update_by_query to populate fields on existing docs
3. Remove wildcard fallback in follow-up PR

## Files
- src/ctia/entity/relationship/es_store.clj - New ES store with mapping & transforms
- src/ctia/bundle/core.clj - Backward-compatible query generation
- src/ctia/entity/relationship.clj - Use new es-store
- src/ctia/task/check_es_stores.clj - Update schema reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ereteog force-pushed the relation-store-types branch from f8145d0 to 95901e0 on February 13, 2026 12:28
…gr6)

Adds a bash script that uses Elasticsearch's _update_by_query API with a
Painless script to backfill source_type and target_type fields on existing
relationship documents.

Usage:
  # Dry run
  ./scripts/migrate_relationship_types.sh localhost:9200 ctia_relationship --dry-run

  # Execute
  ./scripts/migrate_relationship_types.sh localhost:9200 ctia_relationship

The script extracts entity type from source_ref/target_ref URLs:
  http://example.com/ctia/malware/malware-123 -> source_type: "malware"
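
For readers who want to see the extraction rule in isolation, here is a minimal Python sketch. It is not the Painless script from this commit; the regular expression and the name `extract_type` are assumptions based only on the example URL above, and unparseable refs yield None rather than a guess.

```python
import re
from typing import Optional

# Assumed URL shape: .../ctia/<entity-type>/<entity-id>
_REF_PATTERN = re.compile(r"/ctia/([^/]+)/[^/]+$")


def extract_type(ref: Optional[str]) -> Optional[str]:
    """Return the entity type segment of a CTIA ref, or None if it does not parse."""
    if not ref:
        return None
    match = _REF_PATTERN.search(ref)
    return match.group(1) if match else None


print(extract_type("http://example.com/ctia/malware/malware-123"))
# prints malware
```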

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ereteog force-pushed the relation-store-types branch from bbee7f9 to e158c27 on February 13, 2026 12:53
ereteog and others added 5 commits February 13, 2026 14:08
Tests for:
- stored-relationship->es-stored-relationship (extracts source_type/target_type)
- es-stored-relationship->stored-relationship (removes type fields)
- es-partial-stored-relationship->partial-stored-relationship
- store-opts functions with :doc wrapper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New options:
- --throttle <N>: Limit to N documents per second using requests_per_second
- --async: Run in background with wait_for_completion=false, returns task ID

Recommended for production:
  ./scripts/migrate_relationship_types.sh host:9200 index --throttle 500 --async

Also includes:
- Estimated time calculation in dry-run mode
- Task monitoring and cancellation commands for async mode
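
The two options map onto the `requests_per_second` and `wait_for_completion` URL parameters of `_update_by_query`. The Python sketch below shows how such a request URL and a rough time estimate could be derived; the helper names, the `http://` scheme, and the estimate formula (document count divided by the throttle rate, ignoring write time) are assumptions, not a description of what the script computes.

```python
from typing import Optional
from urllib.parse import urlencode


def update_by_query_url(es_host: str, index: str,
                        throttle: Optional[int] = None,
                        run_async: bool = False) -> str:
    """Build the _update_by_query URL for the given throttle/async settings."""
    params = {}
    if throttle is not None:
        params["requests_per_second"] = throttle
    if run_async:
        # ES then returns a task ID immediately instead of blocking.
        params["wait_for_completion"] = "false"
    url = f"http://{es_host}/{index}/_update_by_query"
    return f"{url}?{urlencode(params)}" if params else url


def estimated_seconds(doc_count: int, throttle: int) -> float:
    """Lower-bound duration if the throttle rate is the only limit."""
    return doc_count / throttle


print(update_by_query_url("localhost:9200", "ctia_relationship", 500, True))
print(estimated_seconds(1_000_000, 500))
```

With `--throttle 500`, one million documents would need at least about 2000 seconds under this estimate.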

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use heredoc with temp file to avoid shell escaping issues with Painless script
- The forward slashes in '/ctia/' were being misinterpreted
- Tested and verified working against OpenSearch 2.19

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds check_relationship_migration.sh that:
- Shows migration statistics (total, migrated, remaining)
- Displays progress percentage
- Shows breakdown by source_type and target_type
- Validates sample documents (ensures extracted types match refs)
- Returns exit codes: 0=success, 1=validation error, 2=incomplete

Usage:
  ./scripts/check_relationship_migration.sh <ES_HOST> <INDEX>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use cond-> to only assoc source_type/target_type when non-nil
- Update test data to use valid CTIA ID format (type-uuid pattern)
- Add test case for graceful handling of unparseable refs
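
The conditional assoc described above can be pictured with a Python analogue of the two transforms. This is a sketch only: the real code is Clojure using `cond->`, and the names `to_es_doc`, `from_es_doc`, and the injected `extract` callable are hypothetical.

```python
from typing import Callable, Optional


def to_es_doc(doc: dict, extract: Callable[[Optional[str]], Optional[str]]) -> dict:
    """Copy `doc` and add source_type/target_type only when extraction succeeds."""
    es_doc = dict(doc)
    for side in ("source", "target"):
        entity_type = extract(doc.get(f"{side}_ref"))
        if entity_type is not None:  # unparseable refs add no field at all
            es_doc[f"{side}_type"] = entity_type
    return es_doc


def from_es_doc(es_doc: dict) -> dict:
    """Strip the ES-only type fields before returning a document to callers."""
    return {k: v for k, v in es_doc.items()
            if k not in ("source_type", "target_type")}
```

Skipping the key entirely, rather than storing a null, keeps legacy-looking documents eligible for the wildcard fallback and for a later backfill.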

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>