Add `tsvector` type support by abrightwell · Pull Request #2510 · jackc/pgx

abrightwell · 2026-03-03T16:07:33Z

Implement PostgreSQL tsvector type with support for:

- Lexemes with positions and weights (A, B, C, D)
- Binary and text format encoding/decoding
- Quote and backslash escape handling
- Array type support
- CopyFrom operations

Note: Some escape sequences (doubled quotes, backslash escapes) are PostgreSQL-specific and not supported by CockroachDB.

Resolves #2483
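
To make the text format concrete, here is a minimal sketch of how one lexeme with positions and weights might be rendered. The `lexeme` and `position` types and the `formatLexeme` helper are hypothetical names for illustration and are not the types added in this PR. The sketch assumes PostgreSQL's output convention: the word is single-quoted, embedded quotes and backslashes are doubled, and the default weight D is left implicit.

```go
package main

import (
	"fmt"
	"strings"
)

// position and lexeme are hypothetical illustration types, not pgx's API.
type position struct {
	Pos    uint16 // 1-based position of the word in the document
	Weight byte   // 'A', 'B', 'C', or 'D'
}

type lexeme struct {
	Word      string
	Positions []position
}

// quoteWord wraps the word in single quotes and doubles any embedded
// single quote or backslash.
func quoteWord(w string) string {
	var b strings.Builder
	b.WriteByte('\'')
	for i := 0; i < len(w); i++ {
		if w[i] == '\'' || w[i] == '\\' {
			b.WriteByte(w[i]) // write the escaping duplicate first
		}
		b.WriteByte(w[i])
	}
	b.WriteByte('\'')
	return b.String()
}

// formatLexeme renders 'word':pos[weight],pos[weight]...
// Weight D (or an unset weight) is omitted because it is the default.
func formatLexeme(l lexeme) string {
	var b strings.Builder
	b.WriteString(quoteWord(l.Word))
	for i, p := range l.Positions {
		if i == 0 {
			b.WriteByte(':')
		} else {
			b.WriteByte(',')
		}
		fmt.Fprintf(&b, "%d", p.Pos)
		if p.Weight == 'A' || p.Weight == 'B' || p.Weight == 'C' {
			b.WriteByte(p.Weight)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(formatLexeme(lexeme{
		Word:      "cat",
		Positions: []position{{Pos: 3, Weight: 'A'}, {Pos: 5, Weight: 'D'}},
	}))
	fmt.Println(formatLexeme(lexeme{Word: "it's"}))
}
```

A lexeme without positions is rendered as just the quoted word, which is why the position list is only started once the first position is seen.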

abrightwell

For the most part, I followed the hstore implementation as an example, since it seemed like the closest existing analogue. Granted, it isn't a perfect 1:1 match, but overall I think it tracks.

In the tests, I tried to be as comprehensive as possible. Initially I had considered incorporating the many different permutations found in the core PG test cases. Ultimately, though, I settled on what I felt was a good cross section, trying to hit the core while also including sufficient coverage of the edges.

There were some cases that were not supported by CRDB, which have been appropriately noted and skipped. These were entirely related to how CRDB parses/handles escapes. These differences were confirmed via testing with CRDB as well as identifying how they are handled in the CRDB source.

I intentionally did not include tsquery as part of these changes. While I think it's obviously important that tsvector and tsquery both be available, tsvector didn't seem like it would be useless in isolation, and it's a prerequisite for working with tsquery anyway. I'm happy to follow up with a second PR to introduce the other type if that's desired.

abrightwell · 2026-03-03T16:08:04Z

pgconn/pgconn_test.go

 	_, err = pgconn.ConnectConfig(context.Background(), config)
 	require.Error(t, err, "connect should return error for invalid token")
 }
+


This was caught/fixed via the linter.

abrightwell · 2026-03-03T16:15:02Z

pgtype/tsvector_test.go

+	t.Run("PostgreSQL", func(t *testing.T) {
+		skipCockroachDB(t, "CockroachDB does not support these escape sequences in tsvector")


So, interestingly, the CRDB parser does not handle the '' case for escaping single quotes. As I was looking into it, I found that it's because the parser doesn't do any kind of lookahead to check the value of the next character. Therefore, if it encounters a second ', it assumes that it has finished parsing the word portion of the lexeme. We do, however, make sure to support that case as well as the \' case, as Postgres allows both.
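
As an illustration of that failure mode only, the sketch below shows a reader with no lookahead. It is not CockroachDB's code; `splitAtFirstQuote` is a hypothetical helper that treats the first single quote after the opening one as the closing quote.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAtFirstQuote is a hypothetical no-lookahead reader. It expects s to
// start with an opening single quote and ends the word at the very next
// single quote, without checking whether that quote is doubled.
func splitAtFirstQuote(s string) (word, rest string) {
	if len(s) == 0 || s[0] != '\'' {
		return "", s
	}
	body := s[1:]
	i := strings.IndexByte(body, '\'')
	if i < 0 {
		return body, ""
	}
	return body[:i], body[i+1:]
}

func main() {
	// The intended lexeme is it's, written with a doubled quote.
	word, rest := splitAtFirstQuote(`'it''s'`)
	fmt.Printf("word=%q rest=%q\n", word, rest)
}
```

For the input `'it''s'` this reader stops after `it` and leaves `'s'` unread, so the doubled quote is never recognized as an escape.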

abrightwell · 2026-03-03T16:19:42Z

pgtype/tsvector.go

+type TSVector struct {
+	Lexemes []TSVectorLexeme
+	Valid   bool
+}


The one thing that I waffled a bit back and forth on was building out a composite type like this. Candidly, it felt a little clumsy to work with in practice, but after some chewing I figured it made sense.

Initially, I thought perhaps going with type TSVector []TSVectorLexeme might have been a better approach, but ultimately decided against it to ensure the explicit inclusion of Valid.
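
A minimal sketch of the trade-off, using hypothetical stand-in types (`VectorStruct`, `VectorSlice`, and a one-field `Lexeme`) rather than the actual definitions in this PR. With a bare slice type, NULL can only be signalled by a nil slice, which is easy to confuse with an empty but non-NULL tsvector. The struct form carries an explicit `Valid` flag, in line with types such as `pgtype.Text`.

```go
package main

import "fmt"

// Lexeme is a hypothetical placeholder; the real lexeme type has more fields.
type Lexeme struct {
	Word string
}

// VectorStruct mirrors the struct-with-Valid shape chosen in the PR.
type VectorStruct struct {
	Lexemes []Lexeme
	Valid   bool
}

// VectorSlice is the alternative that was considered and rejected.
type VectorSlice []Lexeme

// describeStruct reports NULL based on the explicit Valid flag.
func describeStruct(v VectorStruct) string {
	if !v.Valid {
		return "NULL"
	}
	return fmt.Sprintf("%d lexeme(s)", len(v.Lexemes))
}

// describeSlice has to infer NULL from nil-ness, which callers can lose
// by accident (for example by building an empty slice literal).
func describeSlice(v VectorSlice) string {
	if v == nil {
		return "NULL"
	}
	return fmt.Sprintf("%d lexeme(s)", len(v))
}

func main() {
	fmt.Println(describeStruct(VectorStruct{}))            // NULL value
	fmt.Println(describeStruct(VectorStruct{Valid: true})) // empty, non-NULL
	fmt.Println(describeSlice(nil))                        // NULL only by convention
	fmt.Println(describeSlice(VectorSlice{}))              // empty, non-NULL
}
```

Both slice values have length zero, so any code that checks `len(v) == 0` instead of `v == nil` would conflate NULL with an empty vector. The struct form avoids that ambiguity.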

abrightwell · 2026-03-03T16:24:27Z

pgtype/tsvector.go

+		case '\'':
+			// Escaped quote ('') — write a literal single quote
+			if !p.atEnd() && p.peek() == '\'' {
+				p.consume()
+				buf.WriteByte('\'')
+			} else {
+				// Closing quote — lexeme is complete
+				return buf.String(), nil
+			}


This is where we do the lookahead that CRDB doesn't, to determine whether we are at the end of the lexeme or escaping a single quote.
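
For readers who want to run the idea in isolation, here is a self-contained sketch built around the same lookahead. The `parser` type, its method bodies, the backslash branch, and the `parseQuotedLexeme` wrapper are assumptions written for this example. Only the quote-handling branch follows the diff above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parser is a hypothetical cursor over the input. The method names match
// the ones visible in the diff, but the bodies are assumptions.
type parser struct {
	s   string
	pos int
}

func (p *parser) atEnd() bool { return p.pos >= len(p.s) }
func (p *parser) peek() byte  { return p.s[p.pos] }
func (p *parser) consume() byte {
	b := p.s[p.pos]
	p.pos++
	return b
}

// parseQuoted reads a quoted lexeme, assuming the opening quote has
// already been consumed.
func (p *parser) parseQuoted() (string, error) {
	var buf strings.Builder
	for !p.atEnd() {
		switch c := p.consume(); c {
		case '\'':
			// Lookahead: a second quote means an escaped literal quote.
			if !p.atEnd() && p.peek() == '\'' {
				p.consume()
				buf.WriteByte('\'')
			} else {
				// Closing quote: the lexeme is complete.
				return buf.String(), nil
			}
		case '\\':
			// Backslash escape: take the next byte literally.
			if p.atEnd() {
				return "", errors.New("unexpected end of input after backslash")
			}
			buf.WriteByte(p.consume())
		default:
			buf.WriteByte(c)
		}
	}
	return "", errors.New("unterminated quoted lexeme")
}

// parseQuotedLexeme is a hypothetical convenience wrapper for the demo.
func parseQuotedLexeme(s string) (string, error) {
	if s == "" || s[0] != '\'' {
		return "", errors.New("expected opening quote")
	}
	p := &parser{s: s, pos: 1}
	return p.parseQuoted()
}

func main() {
	for _, in := range []string{`'it''s'`, `'it\'s'`, `'cat'`} {
		word, err := parseQuotedLexeme(in)
		fmt.Printf("%s -> %q (err=%v)\n", in, word, err)
	}
}
```

Working byte by byte is safe for UTF-8 input here because the quote and backslash are ASCII and cannot occur inside a multi-byte sequence.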

jackc · 2026-03-07T01:46:53Z

It seems reasonable. Though to be honest, I lack context for how tsvector and tsquery are used outside of the database. Whenever I've used them, values of those types were only used internally to the PG server.


abrightwell · 2026-03-08T14:29:53Z

Yeah, to be fair, the application side of it isn't always clear to me either. Though in the context of serialize/deserialize support for replication, snapshotting, etc., minimally supporting tsvector seemed like a good starting point as it relates to the whole 'tsearch' feature. With tsquery, honestly, I'm not convinced that there are many use cases where it would be a column type, as my interactions with it have always been entirely at query time and never persisted.

abrightwell · 2026-03-09T00:24:07Z

The initial workflow run failed on the Check formatting step. That's been corrected.

Then, on my fork, it made it through but failed on Test (1.24, 15). Basically, it SIGTERMed on that particular test. My suspicion was that it's an OOM issue in the container, as I know that the -race flag can push memory usage quite high.

Locally, I was able to reproduce the SIGTERM case while observing docker stats. Sometimes memory would max out and the run would fail; other times it would get VERY close to the limit and eventually pass:

❯ devcontainer exec --workspace-folder . go version
go version go1.24.12 linux/arm64

❯ devcontainer exec --workspace-folder . ./test.sh pg15 -parallel=1 -race ./...
==> Testing against PostgreSQL 15 (port 5415)
ok      github.com/jackc/pgx/v5 17.651s
?       github.com/jackc/pgx/v5/examples/chat   [no test files]
?       github.com/jackc/pgx/v5/examples/todo   [no test files]
?       github.com/jackc/pgx/v5/examples/url_shortener  [no test files]
?       github.com/jackc/pgx/v5/internal/faultyconn     [no test files]
ok      github.com/jackc/pgx/v5/internal/iobufpool      1.148s
ok      github.com/jackc/pgx/v5/internal/pgio   1.007s
ok      github.com/jackc/pgx/v5/internal/pgmock 1.014s
ok      github.com/jackc/pgx/v5/internal/sanitize       1.009s
?       github.com/jackc/pgx/v5/internal/stmtcache      [no test files]
?       github.com/jackc/pgx/v5/log/testingadapter      [no test files]
ok      github.com/jackc/pgx/v5/multitracer     1.009s
ok      github.com/jackc/pgx/v5/pgconn  55.439s
ok      github.com/jackc/pgx/v5/pgconn/ctxwatch 2.103s
ok      github.com/jackc/pgx/v5/pgconn/internal/bgreader        5.001s
ok      github.com/jackc/pgx/v5/pgproto3        86.007s
?       github.com/jackc/pgx/v5/pgproto3/example/pgfortune      [no test files]
ok      github.com/jackc/pgx/v5/pgtype  8.784s
ok      github.com/jackc/pgx/v5/pgtype/zeronull 1.086s
ok      github.com/jackc/pgx/v5/pgxpool 23.783s
?       github.com/jackc/pgx/v5/pgxtest [no test files]
ok      github.com/jackc/pgx/v5/stdlib  7.174s
?       github.com/jackc/pgx/v5/testsetup       [no test files]
ok      github.com/jackc/pgx/v5/tracelog        2.684s
==> Tests passed against PostgreSQL 15

Interestingly, these passed for pretty much every other configuration in the CI matrix. So it's likely just one of those 'things'... 🤷

jackc · 2026-03-09T12:45:31Z

LGTM - thanks

abrightwell force-pushed the abrightwell-tsvector branch from 29c0f91 to ea6b093 Compare March 8, 2026 13:43

jackc merged commit 6e1e9eb into jackc:master Mar 9, 2026
14 checks passed

jackc merged 1 commit into jackc:master from abrightwell:abrightwell-tsvector
