diff --git a/docs/configuration/pgdog.toml/general.md b/docs/configuration/pgdog.toml/general.md
index 586c6dc..71484ff 100644
--- a/docs/configuration/pgdog.toml/general.md
+++ b/docs/configuration/pgdog.toml/general.md
@@ -410,6 +410,12 @@ Available options:
 
 Default: **`auto`**
 
+### `system_catalogs_omnisharded`
+
+Enables sticky routing for system catalog tables and treats them as [omnisharded](../../features/sharding/omnishards.md) tables. This makes tools like `psql` work out of the box.
+
+Default: **`true`** (enabled)
+
 ## Logging
 
 ### `log_connections`
diff --git a/docs/configuration/pgdog.toml/sharded_tables.md b/docs/configuration/pgdog.toml/sharded_tables.md
index decf45f..294fb66 100644
--- a/docs/configuration/pgdog.toml/sharded_tables.md
+++ b/docs/configuration/pgdog.toml/sharded_tables.md
@@ -83,10 +83,16 @@ The data type of the column. Currently supported options are:
 
 [Omnisharded](../../features/sharding/omnishards.md) tables are tables that have the same data on all shards. They typically are small and contain metadata, e.g., list of countries, cities, etc., and are used in joins. PgDog allows to read from these tables directly and load balances traffic evenly across all shards.
 
-#### Example
+By default, all tables, unless otherwise configured as sharded, are considered omnisharded.
+
+#### Sticky routing
+
+Sticky routing disables round robin for omnisharded tables and sends the queries touching those tables to the same shard, guaranteeing consistent results for the duration of a client's connection:
+
 ```toml
 [[omnisharded_tables]]
 database = "prod"
+sticky = true
 tables = [
     "settings",
     "cities",
diff --git a/docs/features/sharding/omnishards.md b/docs/features/sharding/omnishards.md
index f3a0fab..0189283 100644
--- a/docs/features/sharding/omnishards.md
+++ b/docs/features/sharding/omnishards.md
@@ -9,24 +9,39 @@ Other names for these tables include **mirrored tables** and **replicated tables
 
 ## Configuration
 
-Omnisharded tables are configured in [`pgdog.toml`](../../configuration/pgdog.toml/sharded_tables.md#omnisharded-tables):
+Unless otherwise specified as a [sharded table](../../configuration/pgdog.toml/sharded_tables.md), all tables are omnisharded by default. This makes configuration simpler, and doesn't require explicitly enumerating all tables in `pgdog.toml`. For example:
 
 ```toml
-[[omnisharded_tables]]
+[[sharded_tables]]
 database = "prod"
-tables = [
-    "settings",
-    "cities",
-    "terms_of_service",
-    "ip_blocks",
-]
+column = "user_id"
 ```
 
-## Query routing
+This will configure all tables that have the `user_id` column as sharded and all others as omnisharded.
+
+### Query routing
 
 Omnisharded tables are treated differently by the query router. Write queries are sent to all shards concurrently, while read queries are distributed evenly between shards using round robin.
 
-If the query contains a sharding key, it will be used instead, and omnisharded tables in that query will be ignored.
+For example, the following `INSERT` query will be sent to all shards concurrently:
+
+```postgresql
+INSERT INTO omnisharded_table (id, value) VALUES ($1, $2);
+```
+
+All configured shards will receive and store the same row. When reading that row, PgDog will choose one of the shards using the round robin algorithm, to distribute read load evenly.
+
+#### Sharded and omnisharded tables
+
+If a query references both sharded and omnisharded tables, the **sharded** table routing will take priority. Omnisharded tables are assumed to contain the same data on all shards, so joins referencing omnisharded tables will work as expected.
+
+For example, assuming the `users` table is sharded on the `id` column and the `global_settings` table is omnisharded, the following query will be sent to the shard corresponding to the value of the `users.id` filter:
+
+```postgresql
+SELECT * FROM users
+INNER JOIN global_settings ON global_settings.active = true
+WHERE users.id = $1;
+```
 
 ### Consistency
 
@@ -34,22 +49,43 @@ Writing data to omnisharded tables is atomic if you enable [two-phase commit](2p
 
 If you can't or choose not to use 2pc, make sure writes to omnisharded tables can be repeated in case of failure. This can be achieved by using unique indexes and `INSERT ... ON CONFLICT ... DO UPDATE` queries.
 
-Since reads from omnisharded tables are routed to individual shards, while a two-phase commit takes place, queries to these tables may return different results for a brief period of time.
+Since data in all omnisharded tables is identical, no cross-shard indexes are necessary to achieve data integrity. You can use regular PostgreSQL `UNIQUE` indexes on individual shards.
+
+!!! note "Eventual consistency"
+    Reads from omnisharded tables are routed to individual shards using round robin. While a two-phase commit takes place, different transactions may return different results for a brief period of time (usually less than a millisecond).
+
 
 ### Sticky routing
 
 While most omnisharded tables should be identical on all shards, others could differ in subtle ways.
 
-For example, if you configure system catalogs as omnisharded, e.g. to make Rails or other ORMs work out of the box, round robin query routing will return different results for each query.
+For example, system catalogs (e.g. `pg_database`, `pg_class`, etc.) could have different OIDs for custom data types (e.g. `VECTOR` or types created with `CREATE TYPE`) on different shards. To make Rails and some other ORMs work out of the box, you can enable sticky routing, which disables round robin and sends omnisharded queries to one shard for the duration of a client's connection.
+
+For example:
 
-When enabled, sticky routing will ensure that queries sent by a client to omnisharded tables will be consistently routed to the same shard, for the duration of the client connection.
+```toml
+[[omnisharded_tables]]
+database = "prod"
+sticky = true
+tables = [
+    "pg_class",
+    "pg_database"
+]
+```
 
-To enable it, configure your omnisharded tables as follows:
+You can enable sticky routing for all omnisharded tables in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#omnisharded_sticky):
+
+```toml
+[general]
+omnisharded_sticky = true
+```
+
+The following system catalogs use sticky routing by default:
 
 ```toml
 [[omnisharded_tables]]
 database = "prod"
-sticky = true # Enable sticky routing for the following tables.
+sticky = true
 tables = [
     "pg_class",
     "pg_attribute",
@@ -70,4 +106,11 @@ tables = [
 ]
 ```
 
-Once configured, commands like `\d`, `\d+` and others sent from `psql` will start to return correct results as well.
+This is configurable with the `system_catalogs_omnisharded` setting in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#system_catalogs_omnisharded):
+
+```toml
+[general]
+system_catalogs_omnisharded = true
+```
+
+If enabled (it is by default), commands like `\d`, `\d+` and others sent from `psql` return correct results.
diff --git a/docs/features/transaction-mode.md b/docs/features/transaction-mode.md
index 0c2cc0b..2554487 100644
--- a/docs/features/transaction-mode.md
+++ b/docs/features/transaction-mode.md
@@ -59,7 +59,7 @@ This is performed efficiently, and server parameters are updated only if they di
  1. The database has a primary and replica(s)
  2. The database has more than one shard
  3. [`prepared_statements`](../configuration/pgdog.toml/general.md#prepared_statements) is set to `"full"`
- 4. [`query_parser_enabled`](../configuration/pgdog.toml/general.md#query_parser_enabled) is set to `true`
+ 4. [`query_parser`](../configuration/pgdog.toml/general.md#query_parser_enabled) is set to `"on"`
 
 This is to avoid unnecessary overhead of using `pg_query` (however small), when we don't absolutely have to.
 
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 81bee4a..91d20ea 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -44,14 +44,14 @@ Query engine provides a uniform view over multiple shards. Clients can use regul
 | Feature | Status | Notes |
 |----------|--------|-------|
 | [Direct-to-shard reads](features/sharding/query-routing.md#select) | :material-check-circle-outline: | Sharding key must be specified in the query. |
-| [Direct-to-shard writes](features/sharding/query-routing.md#insert) | :material-wrench: | Sharding key must be specified in the query. Multi-tuple `INSERT`s not supported yet. |
+| [Direct-to-shard writes](features/sharding/query-routing.md#insert) | :material-check-circle-outline: | Sharding key must be specified in the query. Multi-tuple `INSERT`s are supported and sent to their respective shards automatically with a cross-shard query. Sharding key updates are supported for one row at a time. |
 | [Cross-shard queries](features/sharding/cross-shard-queries/index.md) | :material-wrench: | Partial [aggregates](#aggregates) and [sorting](#sorting) support. CTEs & subqueries not supported yet. |
 | Cross-shard CTEs | :material-calendar-check: | [#380](https://github.com/pgdogdev/pgdog/issues/380) |
 | Cross-shard subqueries | :material-calendar-check: | [#381](https://github.com/pgdogdev/pgdog/issues/381) |
 | Cross-shard joins | :material-calendar-check: | [#94](https://github.com/pgdogdev/pgdog/issues/94) |
 | [Cross-shard transactions](features/sharding/2pc.md) | :material-wrench: | Supports [two-phase commit](features/sharding/2pc.md). Not benchmarked yet. |
 | [Omnisharded tables](features/sharding/omnishards.md) | :material-wrench: | Unsharded tables with identical data on all shards. |
-| Rewrite queries | :material-calendar-check: | Alter queries to support aggregate/sorting by rows not returned in result set. |
+| Rewrite queries | :material-wrench: | Alter queries to support aggregate/sorting by rows not returned in result set. |
 | [`COPY`](features/sharding/cross-shard-queries/copy.md) | :material-check-circle-outline: | Sharding key must be specified in the statement and the data. Supports text, CSV, and binary formats only. |
 | Multi-statement queries | :material-calendar-check: | e.g.: `SELECT 1; SELECT 2;`. First query is used for routing only, entire request sent to the same shard(s). [#395](https://github.com/pgdogdev/pgdog/issues/395). |
 
@@ -66,7 +66,9 @@ Support for aggregate functions in [cross-shard](features/sharding/cross-shard-q
 | `COUNT` | :material-check-circle-outline: | 〃 |
 | `MIN` | :material-check-circle-outline: | 〃 |
 | `MAX` | :material-check-circle-outline: | 〃 |
-| `AVG` | :material-calendar-check: | [#434](https://github.com/pgdogdev/pgdog/issues/434) |
+| `AVG` | :material-wrench: | Works in top-level statements, but not in subqueries or CTEs. |
+| `STDDEV` | :material-wrench: | 〃 |
+| `VARIANCE` | :material-wrench: | 〃 |
 | Percentile distributions | :material-close: | Could be expensive to calculate, need spill to disk. |
 
 #### Sorting
@@ -87,8 +89,8 @@ Support for sorting rows in [cross-shard](features/sharding/cross-shard-queries/
 
 | Feature | Status | Notes |
 |-|-|-|
-| [Data sync](features/sharding/resharding/hash.md) | :material-wrench: | Sync table data with logical replication. Not benchmarked yet. |
-| [Schema sync](features/sharding/resharding/schema.md) | :material-wrench: | Sync table, index and constraint definitions. Not benchmarked yet. |
+| [Data sync](features/sharding/resharding/hash.md) | :material-wrench: | Sync table data with logical replication. |
+| [Schema sync](features/sharding/resharding/schema.md) | :material-wrench: | Sync table, index and constraint definitions. |
 | Online rebalancing | :material-calendar-check: | Not automated yet, requires manual orchestration. |
 
 ### Schema & data integrity