-
Notifications
You must be signed in to change notification settings - Fork 18
ClickHouse DDL Schema Improvements for Observability #144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ClickHouse Docs SchemaRegarding point 2, I'm not sure. This is what's present in the docs: CREATE TABLE otel_traces
(
`Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
`TraceId` String CODEC(ZSTD(1)),
`SpanId` String CODEC(ZSTD(1)),
`ParentSpanId` String CODEC(ZSTD(1)),
`TraceState` String CODEC(ZSTD(1)),
`SpanName` LowCardinality(String) CODEC(ZSTD(1)),
`SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
`ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
`ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`ScopeName` String CODEC(ZSTD(1)),
`ScopeVersion` String CODEC(ZSTD(1)),
`SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`Duration` Int64 CODEC(ZSTD(1)),
`StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
`StatusMessage` String CODEC(ZSTD(1)),
`Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
`Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
`Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
`Links.TraceId` Array(String) CODEC(ZSTD(1)),
`Links.SpanId` Array(String) CODEC(ZSTD(1)),
`Links.TraceState` Array(String) CODEC(ZSTD(1)),
`Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_duration Duration TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
CREATE TABLE otel_traces_trace_id_ts
(
`TraceId` String CODEC(ZSTD(1)),
`Start` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
`End` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY (TraceId, toUnixTimestamp(Start))
CREATE MATERIALIZED VIEW otel_traces_trace_id_ts_mv TO otel_traces_trace_id_ts
(
`TraceId` String,
`Start` DateTime64(9),
`End` DateTime64(9)
)
AS SELECT
TraceId,
min(Timestamp) AS Start,
max(Timestamp) AS End
FROM otel_traces
WHERE TraceId != ''
GROUP BY TraceIdClickHouse Exporter SchemaAnd this is what's generated by the ClickHouse Exporter: CREATE TABLE IF NOT EXISTS "otel"."otel_traces" (
Timestamp DateTime64(9) CODEC(Delta, ZSTD(1)),
TraceId String CODEC(ZSTD(1)),
SpanId String CODEC(ZSTD(1)),
ParentSpanId String CODEC(ZSTD(1)),
TraceState String CODEC(ZSTD(1)),
SpanName LowCardinality(String) CODEC(ZSTD(1)),
SpanKind LowCardinality(String) CODEC(ZSTD(1)),
ServiceName LowCardinality(String) CODEC(ZSTD(1)),
ResourceAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
ScopeName String CODEC(ZSTD(1)),
ScopeVersion String CODEC(ZSTD(1)),
SpanAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
Duration UInt64 CODEC(ZSTD(1)),
StatusCode LowCardinality(String) CODEC(ZSTD(1)),
StatusMessage String CODEC(ZSTD(1)),
Events Nested (
Timestamp DateTime64(9),
Name LowCardinality(String),
Attributes Map(LowCardinality(String), String)
) CODEC(ZSTD(1)),
Links Nested (
TraceId String,
SpanId String,
TraceState String,
Attributes Map(LowCardinality(String), String)
) CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_duration Duration TYPE minmax GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toDateTime(Timestamp))
TTL toDateTime(Timestamp) + INTERVAL 259200 SECOND
SETTINGS index_granularity=8192, ttl_only_drop_parts = 1
CREATE TABLE IF NOT EXISTS "otel"."otel_traces_trace_id_ts" (
TraceId String CODEC(ZSTD(1)),
Start DateTime64(9) CODEC(Delta, ZSTD(1)),
End DateTime64(9) CODEC(Delta, ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toDate(Start)
ORDER BY (TraceId, toDateTime(Start))
TTL toDateTime(Start) + INTERVAL 259200 SECOND
SETTINGS index_granularity=8192, ttl_only_drop_parts = 1
CREATE MATERIALIZED VIEW IF NOT EXISTS "otel"."otel_traces_trace_id_ts_mv"
TO "otel"."otel_traces_trace_id_ts"
AS SELECT
TraceId,
min(Timestamp) as Start,
max(Timestamp) as End
FROM "otel"."otel_traces"
GROUP BY TraceId |
|
Thanks @garysassano for taking a look at this. Taking a brief look at this I've got a few feedback points, but I wonder if @SpencerTorres would have some insight on this too. For 1: While there is a precision loss, I wonder how much it truly matters for the Regarding the datetime conversions in TTL and ORDER BY clauses, I think the primary importance would be maintaining compatibility with the existing queries from Grafana datasources and HyperDX/Clickstack. It does seem like there are mixed opinions (1 and 2) on the best schema for this data. At a glance it doesn't seem like these changes would break queries and would just improve performance, but I'd want to verify that first. |
Summary
This PR aligns ClickHouse DDL schema with the official observability schema design by fixing inconsistencies in timestamp handling and removing unnecessary type conversions.
Changes Made
1. Fixed DateTime Precision Inconsistency
Issue:
traces_trace_id_tstable usedDateTimewhile maintracestable usesDateTime64(9), causing precision loss in materialized view aggregations.Reference: ClickHouse DateTime64 Documentation
2. Removed Unnecessary Type Conversions
Issue: TTL expressions and ordering keys used
toDateTime()conversions that add overhead and reduce performance.Reference: ClickHouse TTL Documentation
3. Performance Issues with Ordering Keys
Issue: Using
toDateTime()inORDER BYclauses causes slow query performance withDateTime64as noted in ClickHouse documentation.Reference: ClickHouse Primary Keys and Ordering
4. Metrics Schema Using
toUnixTimestamp64Nano()Note: The metrics schema uses
toUnixTimestamp64Nano(TimeUnix)inORDER BYclauses, which may need future evaluation for performance impact:ORDER BY (ServiceName, MetricName, toUnixTimestamp64Nano(TimeUnix))Reference: ClickHouse Observability Schema Design
Impact
Files Modified
src/bin/clickhouse-ddl/ddl_traces.rssrc/bin/clickhouse-ddl/ddl_metrics.rsMigration Notes
For existing deployments, the DateTime precision change requires data migration. TTL and ordering key changes are backward compatible.