Add BigQuery API support with query execution and permission model#275
Add BigQuery API support with query execution and permission model#275wbssbw wants to merge 18 commits intoobelisk:mainfrom
Conversation
6dcf5a0 to
a831dd1
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements a comprehensive BigQuery API that enables Plaid modules to execute read-only SQL queries against configured BigQuery tables. The implementation follows a defense-in-depth security model with service account authentication, a three-level permission system (module → dataset → table), strict SQL identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types in JSON responses.
Changes:
- Added
gcp::bigquerymodule to both plaid-stl and plaid runtime withquery_table()function - Implemented Filter expression tree API for building WHERE clauses with validation
- Added schema configuration to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| runtime/plaid/src/functions/api.rs | Registers gcp_bigquery_query_table function with proper feature gating and test mode restrictions |
| runtime/plaid/src/apis/mod.rs | Adds BigQueryError to ApiError enum with From trait implementation for error conversion |
| runtime/plaid/src/apis/gcp/mod.rs | Integrates BigQuery client into GCP module structure with async initialization |
| runtime/plaid/src/apis/gcp/bigquery.rs | Core implementation: authentication, permission checking, SQL generation with identifier validation, type decoding, and query execution |
| runtime/plaid/Cargo.toml | Adds google-cloud-bigquery dependency and pins chrono to 0.4.39 for compatibility |
| runtime/plaid-stl/src/gcp/mod.rs | Exports bigquery module in standard library |
| runtime/plaid-stl/src/gcp/bigquery.rs | Defines request/response types, Filter expression tree, Operator enum, and query_table() function with 1 MiB return buffer |
| runtime/Cargo.lock | Adds transitive dependencies for BigQuery support including arrow, prost, tonic, and authentication libraries |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
60b6ad5 to
f4b64a4
Compare
| /// Column names inside `Condition` nodes are validated with | ||
| /// [`is_valid_identifier`] before use. `And` and `Or` nodes must contain at | ||
| /// least one child. | ||
| fn build_filter_sql(filter: &Filter) -> Result<String, ApiError> { |
There was a problem hiding this comment.
No chance of causing stack exhaustion here?
7e76ee1 to
fd7131b
Compare
| let mut sql = format!("SELECT {column_list} FROM `{dataset}`.`{table}`"); | ||
|
|
||
| if let Some(f) = filter { | ||
| sql.push_str(" WHERE "); | ||
| sql.push_str(&build_filter_sql(f, 0)?); | ||
| } |
There was a problem hiding this comment.
Format strings for building SQL is highly discouraged. This should use a query builder for safety or we need more information on why we can't use a query builder.
There was a problem hiding this comment.
I looked at full query builder libraries but none have mature BigQuery/GoogleSQL support, so that path would have added a dependency without meaningfully improving security
Instead I switched the WHERE clause to use GoogleSQL named query parameters (@pn). Filter values are now passed as typed QueryParameter entries on QueryRequest rather than being inlined as SQL literals. Now, BigQuery owns value serialization and the hand-rolled string-escaping code (build_value_sql) is gone entirely
See 489bc48
| fn build_filter_sql(filter: &Filter, depth: usize) -> Result<String, ApiError> { | ||
| if depth > MAX_FILTER_DEPTH { | ||
| return Err(ApiError::BadRequest); | ||
| } | ||
| match filter { | ||
| Filter::And(children) | Filter::Or(children) if children.is_empty() => { | ||
| Err(ApiError::BadRequest) | ||
| } | ||
| Filter::And(children) => { | ||
| if children.len() > MAX_FILTER_CHILDREN { | ||
| return Err(ApiError::BadRequest); | ||
| } | ||
| let parts = children | ||
| .iter() | ||
| .map(|c| build_filter_sql(c, depth + 1)) | ||
| .collect::<Result<Vec<_>, _>>()?; | ||
| Ok(format!("({})", parts.join(" AND "))) | ||
| } | ||
| Filter::Or(children) => { | ||
| if children.len() > MAX_FILTER_CHILDREN { | ||
| return Err(ApiError::BadRequest); | ||
| } | ||
| let parts = children | ||
| .iter() | ||
| .map(|c| build_filter_sql(c, depth + 1)) | ||
| .collect::<Result<Vec<_>, _>>()?; | ||
| Ok(format!("({})", parts.join(" OR "))) | ||
| } | ||
| Filter::Condition { | ||
| column, | ||
| operator, | ||
| value, | ||
| } => { | ||
| if !is_valid_identifier(column) { | ||
| return Err(ApiError::BadRequest); | ||
| } | ||
| build_condition_sql(column, operator, value) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This is possibly going to blow the stack because it's not tail recursive. So this should either be a loop, be tail recursive, or limit the number of recursions to a number that will not break the stack
There was a problem hiding this comment.
The recursion depth is already bounded by MAX_FILTER_DEPTH, which limits the call stack to 5 frames before returning BadRequest. Each frame is a small fixed allocation (no large locals), so the worst-case stack consumption here is on the order of a few hundred bytes
obelisk
left a comment
There was a problem hiding this comment.
This seems like it has some rather large structural issues
|
@wbssbw I think I've fixed all the issues with 1.93. Please try rebasing on main |
obelisk
left a comment
There was a problem hiding this comment.
Please rebase on main where 1.93 should have all the issues resolved
489bc48 to
09bc509
Compare
Summary
Implements a new BigQuery API that allows Plaid modules to execute read-only queries against configured tables. The implementation includes service account authentication, a permission model that maps modules to allowed datasets and tables, SQL query construction with strict identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types instead of coercing all values to strings.
Changes
Standard library (plaid-stl):
gcp::bigquerymodule withquery_table()functionReadTableRequestandReadTableResponsetypesFilterexpression tree for WHERE clauses withAnd,Or, andConditionnodesOperatorenum (Eq,Ne,Lt,Le,Gt,Ge,Like,IsNull,IsNotNull)DerefandIntoIteratorforReadTableResponseto enable direct iterationRuntime API (plaid):
gcp::bigquerymodule withBigQueryclientrconfig map: module → dataset → allowed tables[A-Za-z_][A-Za-z0-9_]*)schemasconfig) to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)Configuration:
bigqueryfield toGcpConfigcontaining credentials, permissions, schemas, and timeoutr[module][dataset] = [table1, table2, ...]schemas[dataset][table][column] = typeDependencies:
google-cloud-bigqueryv0.15.0 (gated behindgcpfeature)chronoto0.4.39(compatibility problems with bigquery crate)Security Considerations
SQL injection prevention:
'→'')Authorization:
rconfig mapBadRequestbefore any network callData access:
SELECT *is rejectedPerformance/Operational Impact