
Add BigQuery API support with query execution and permission model #275

Open
wbssbw wants to merge 18 commits into obelisk:main from wbssbw:feat/bigquery-apis

Conversation

@wbssbw
Collaborator

@wbssbw wbssbw commented Feb 24, 2026

Summary

Implements a new BigQuery API that allows Plaid modules to execute read-only queries against configured tables. The implementation includes service account authentication, a permission model that maps modules to allowed datasets and tables, SQL query construction with strict identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types instead of coercing all values to strings.

Changes

Standard library (plaid-stl):

  • Added gcp::bigquery module with query_table() function
  • Defined ReadTableRequest and ReadTableResponse types
  • Implemented Filter expression tree for WHERE clauses with And, Or, and Condition nodes
  • Added Operator enum (Eq, Ne, Lt, Le, Gt, Ge, Like, IsNull, IsNotNull)
  • Implemented Deref and IntoIterator for ReadTableResponse to enable direct iteration
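The filter types above can be sketched roughly as follows. The variant and operator names come from the list; the exact field layout in plaid-stl is an assumption, so treat this as an illustration rather than the PR's actual definitions:

```rust
// Hypothetical sketch of the plaid-stl filter types described above;
// the real definitions in runtime/plaid-stl/src/gcp/bigquery.rs may differ.
#[derive(Debug, Clone)]
enum Operator {
    Eq, Ne, Lt, Le, Gt, Ge, Like, IsNull, IsNotNull,
}

#[derive(Debug, Clone)]
enum Filter {
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Condition {
        column: String,
        operator: Operator,
        value: Option<String>, // None for IsNull / IsNotNull
    },
}

fn main() {
    // Expresses: WHERE (status = 'active' AND (age > '21' OR age IS NULL))
    let filter = Filter::And(vec![
        Filter::Condition {
            column: "status".into(),
            operator: Operator::Eq,
            value: Some("active".into()),
        },
        Filter::Or(vec![
            Filter::Condition {
                column: "age".into(),
                operator: Operator::Gt,
                value: Some("21".into()),
            },
            Filter::Condition {
                column: "age".into(),
                operator: Operator::IsNull,
                value: None,
            },
        ]),
    ]);
    println!("{filter:?}");
}
```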

Runtime API (plaid):

  • Created gcp::bigquery module with BigQuery client
  • Implemented service account authentication
  • Added permission checking via the permissions config map: module → dataset → allowed tables
  • Built SQL query generator with strict identifier validation ([A-Za-z_][A-Za-z0-9_]*)
  • Implemented recursive WHERE clause builder that validates all column names
  • Added schema support (schemas config) to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)
  • Integrated with existing API error handling and function registration
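The identifier rule above (`[A-Za-z_][A-Za-z0-9_]*`) can be checked without a regex dependency. This is a sketch, not the PR's actual `is_valid_identifier`:

```rust
// Sketch of the identifier allowlist described above: first character must be
// an ASCII letter or underscore, the rest ASCII alphanumeric or underscore.
// The real function in runtime/plaid/src/apis/gcp/bigquery.rs may differ.
fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false, // empty string or invalid first character
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    assert!(is_valid_identifier("user_id"));
    assert!(is_valid_identifier("_private"));
    assert!(!is_valid_identifier("1column"));           // digit first
    assert!(!is_valid_identifier("users; DROP TABLE")); // injection attempt
    assert!(!is_valid_identifier(""));
}
```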

Configuration:

  • Added bigquery field to GcpConfig containing credentials, permissions, schemas, and timeout
  • Permissions use three-level nesting: permissions[module][dataset] = [table1, table2, ...]
  • Schemas use three-level nesting: schemas[dataset][table][column] = type
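A hypothetical TOML rendering of this nesting (the actual Plaid config file format, key names, and credential handling are not shown in this PR, so all names below are illustrative):

```toml
# Hypothetical shape only; real key names may differ.
[gcp.bigquery]
credentials = "..."   # service account credentials, read from config
timeout = 5           # seconds

# module -> dataset -> allowed tables
[gcp.bigquery.permissions.audit_module]
logs = ["events", "requests"]

# dataset -> table -> column -> type, used for type decoding
[gcp.bigquery.schemas.logs.events]
id = "integer"
latency = "float"
success = "boolean"
message = "string"
```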

Dependencies:

  • Added google-cloud-bigquery v0.15.0 (gated behind gcp feature)
  • Pinned chrono to 0.4.39 (compatibility issues with the google-cloud-bigquery crate)

Security Considerations

SQL injection prevention:

  • All dataset, table, and column identifiers are validated against a strict alphanumeric+underscore allowlist before query construction
  • String values in WHERE clauses are escaped using standard SQL single-quote doubling (' becomes '')
  • No user input is directly concatenated into SQL; all queries are built programmatically
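The quote-doubling rule can be sketched as below. Note that a later revision of this PR replaced inline literals with named query parameters, so treat this purely as an illustration of the escaping described in this bullet, not the final code:

```rust
// Illustration of SQL single-quote doubling: each ' in the value becomes '',
// and the result is wrapped in single quotes. Function name is illustrative.
fn escape_string_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

fn main() {
    assert_eq!(escape_string_literal("O'Brien"), "'O''Brien'");
    // An injection attempt stays inside the literal instead of closing it:
    assert_eq!(
        escape_string_literal("x' OR '1'='1"),
        "'x'' OR ''1''=''1'"
    );
}
```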

Authorization:

  • Modules must be explicitly granted access to each dataset and table via the permissions config map
  • Queries from modules without permission are rejected with BadRequest before any network call
  • Credentials are read from config rather than files, reducing exposure surface
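The deny-by-default lookup described above can be sketched as follows (type alias and function name are illustrative; the real check lives in the runtime's BigQuery module):

```rust
use std::collections::HashMap;

// Three-level permission map: module -> dataset -> allowed tables.
type Permissions = HashMap<String, HashMap<String, Vec<String>>>;

// Returns false for unknown modules, datasets, or tables: deny by default,
// so the query is rejected before any network call.
fn is_permitted(perms: &Permissions, module: &str, dataset: &str, table: &str) -> bool {
    perms
        .get(module)
        .and_then(|datasets| datasets.get(dataset))
        .map(|tables| tables.iter().any(|t| t == table))
        .unwrap_or(false)
}

fn main() {
    let mut perms: Permissions = HashMap::new();
    perms
        .entry("audit_module".to_string())
        .or_default()
        .insert("logs".to_string(), vec!["events".to_string()]);

    assert!(is_permitted(&perms, "audit_module", "logs", "events"));
    assert!(!is_permitted(&perms, "audit_module", "logs", "secrets"));
    assert!(!is_permitted(&perms, "other_module", "logs", "events"));
}
```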

Data access:

  • API is read-only (SELECT queries only)
  • Modules must explicitly list columns; SELECT * is rejected

Performance/Operational Impact

  • Queries are executed with a configurable timeout (default 5 seconds)
  • Rows are streamed one at a time rather than buffered entirely in memory
  • Return buffer is currently capped at 1 MiB; large result sets may be truncated (modules must handle partial results or add pagination)
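A module-side illustration of the 1 MiB cap (the constant name is an assumption; the PR description only states the figure): a response at or near the cap should be treated as possibly truncated.

```rust
// The PR caps the return buffer at 1 MiB; a response that fills the buffer
// may have been cut off, so modules should check for this before trusting
// the result set to be complete. Names here are illustrative.
const RETURN_BUFFER_SIZE: usize = 1024 * 1024; // 1 MiB

fn may_be_truncated(response_len: usize) -> bool {
    response_len >= RETURN_BUFFER_SIZE
}

fn main() {
    assert!(!may_be_truncated(512));
    assert!(may_be_truncated(RETURN_BUFFER_SIZE));
}
```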

@wbssbw wbssbw force-pushed the feat/bigquery-apis branch from 6dcf5a0 to a831dd1 Compare February 24, 2026 17:19
@wbssbw wbssbw marked this pull request as ready for review February 24, 2026 17:20

Copilot AI left a comment


Pull request overview

This PR implements a comprehensive BigQuery API that enables Plaid modules to execute read-only SQL queries against configured BigQuery tables. The implementation follows a defense-in-depth security model with service account authentication, a three-level permission system (module → dataset → table), strict SQL identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types in JSON responses.

Changes:

  • Added gcp::bigquery module to both plaid-stl and plaid runtime with query_table() function
  • Implemented Filter expression tree API for building WHERE clauses with validation
  • Added schema configuration to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.

Summary per file:

  • runtime/plaid/src/functions/api.rs: Registers gcp_bigquery_query_table function with proper feature gating and test mode restrictions
  • runtime/plaid/src/apis/mod.rs: Adds BigQueryError to ApiError enum with From trait implementation for error conversion
  • runtime/plaid/src/apis/gcp/mod.rs: Integrates BigQuery client into GCP module structure with async initialization
  • runtime/plaid/src/apis/gcp/bigquery.rs: Core implementation: authentication, permission checking, SQL generation with identifier validation, type decoding, and query execution
  • runtime/plaid/Cargo.toml: Adds google-cloud-bigquery dependency and pins chrono to 0.4.39 for compatibility
  • runtime/plaid-stl/src/gcp/mod.rs: Exports bigquery module in standard library
  • runtime/plaid-stl/src/gcp/bigquery.rs: Defines request/response types, Filter expression tree, Operator enum, and query_table() function with 1 MiB return buffer
  • runtime/Cargo.lock: Adds transitive dependencies for BigQuery support including arrow, prost, tonic, and authentication libraries


@wbssbw wbssbw force-pushed the feat/bigquery-apis branch from 60b6ad5 to f4b64a4 Compare February 24, 2026 18:03
/// Column names inside `Condition` nodes are validated with
/// [`is_valid_identifier`] before use. `And` and `Or` nodes must contain at
/// least one child.
fn build_filter_sql(filter: &Filter) -> Result<String, ApiError> {
Owner

No chance of causing stack exhaustion here?

Collaborator Author

@wbssbw wbssbw Feb 26, 2026

Good call - added max depth limit in 1645008 and fd7131b

@wbssbw wbssbw force-pushed the feat/bigquery-apis branch from 7e76ee1 to fd7131b Compare February 26, 2026 16:43
@wbssbw wbssbw requested a review from obelisk February 26, 2026 16:44
Comment on lines +343 to +348
let mut sql = format!("SELECT {column_list} FROM `{dataset}`.`{table}`");

if let Some(f) = filter {
sql.push_str(" WHERE ");
sql.push_str(&build_filter_sql(f, 0)?);
}
Owner

Using format strings to build SQL is highly discouraged. This should use a query builder for safety, or we need more information on why a query builder can't be used.

Collaborator Author

I looked at full query builder libraries but none have mature BigQuery/GoogleSQL support, so that path would have added a dependency without meaningfully improving security

Instead I switched the WHERE clause to use GoogleSQL named query parameters (@pn). Filter values are now passed as typed QueryParameter entries on QueryRequest rather than being inlined as SQL literals. Now, BigQuery owns value serialization and the hand-rolled string-escaping code (build_value_sql) is gone entirely

See 489bc48
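The approach described here can be sketched as follows: each condition emits an @pN placeholder and records its value in a side list that would later become typed QueryParameter entries on the QueryRequest. Names below are illustrative, not the PR's exact code:

```rust
// Sketch of parameterized WHERE-clause generation: values never appear in
// the SQL text; only "@pN" placeholders do, and the collected values are
// handed to BigQuery as named query parameters. Illustrative names only.
fn condition_sql(column: &str, params: &mut Vec<String>, value: String) -> String {
    let placeholder = format!("@p{}", params.len());
    params.push(value);
    format!("{column} = {placeholder}")
}

fn main() {
    let mut params = Vec::new();
    let a = condition_sql("status", &mut params, "active".into());
    let b = condition_sql("region", &mut params, "us-east1".into());
    let sql = format!("SELECT id FROM `ds`.`t` WHERE ({a} AND {b})");
    assert_eq!(
        sql,
        "SELECT id FROM `ds`.`t` WHERE (status = @p0 AND region = @p1)"
    );
    assert_eq!(params, vec!["active".to_string(), "us-east1".to_string()]);
}
```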

Comment on lines +359 to +398
fn build_filter_sql(filter: &Filter, depth: usize) -> Result<String, ApiError> {
if depth > MAX_FILTER_DEPTH {
return Err(ApiError::BadRequest);
}
match filter {
Filter::And(children) | Filter::Or(children) if children.is_empty() => {
Err(ApiError::BadRequest)
}
Filter::And(children) => {
if children.len() > MAX_FILTER_CHILDREN {
return Err(ApiError::BadRequest);
}
let parts = children
.iter()
.map(|c| build_filter_sql(c, depth + 1))
.collect::<Result<Vec<_>, _>>()?;
Ok(format!("({})", parts.join(" AND ")))
}
Filter::Or(children) => {
if children.len() > MAX_FILTER_CHILDREN {
return Err(ApiError::BadRequest);
}
let parts = children
.iter()
.map(|c| build_filter_sql(c, depth + 1))
.collect::<Result<Vec<_>, _>>()?;
Ok(format!("({})", parts.join(" OR ")))
}
Filter::Condition {
column,
operator,
value,
} => {
if !is_valid_identifier(column) {
return Err(ApiError::BadRequest);
}
build_condition_sql(column, operator, value)
}
}
}
Owner

This is possibly going to blow the stack because it's not tail recursive. So this should either be a loop, be tail recursive, or limit the number of recursions to a number that will not break the stack

Collaborator Author

The recursion depth is already bounded by MAX_FILTER_DEPTH, which limits the call stack to 5 frames before returning BadRequest. Each frame is a small fixed allocation (no large locals), so the worst-case stack consumption here is on the order of a few hundred bytes

Owner

@obelisk obelisk left a comment

This seems like it has some rather large structural issues

@wbssbw wbssbw requested a review from obelisk March 2, 2026 14:28
@obelisk
Owner

obelisk commented Mar 4, 2026

@wbssbw I think I've fixed all the issues with 1.93. Please try rebasing on main

Owner

@obelisk obelisk left a comment

Please rebase on main where 1.93 should have all the issues resolved

@wbssbw wbssbw force-pushed the feat/bigquery-apis branch from 489bc48 to 09bc509 Compare March 4, 2026 14:00

3 participants