Skip to content

Fix UNNEST bulk insert column mapping#2239

Open
myers wants to merge 4 commits intodolthub:mainfrom
myers:unnest-bug
Open

Fix UNNEST bulk insert column mapping#2239
myers wants to merge 4 commits intodolthub:mainfrom
myers:unnest-bug

Conversation

@myers
Copy link

@myers myers commented Jan 25, 2026

Summary

Adds native ROWS FROM table function support for multi-argument UNNEST.

Django's bulk_create generates queries like:

INSERT INTO table (col1, col2)
SELECT * FROM UNNEST(array1, array2)

Multi-argument UNNEST(arr1, arr2, ...) is PostgreSQL syntactic sugar for ROWS FROM(unnest(arr1), unnest(arr2), ...), which zips results together with NULL padding for shorter arrays.

Changes

  • server/ast/table_expr.go: Detect and expand multi-arg UNNEST to RowsFromExpr
  • server/ast/aliased_table_expr.go: Support WITH ORDINALITY via RowsFromExpr
  • server/ast/select_clause.go: Update multi-arg UNNEST rewrite to use RowsFromExpr

Dependencies

This PR depends on:

Test plan

  • TestArrayFunctions/multi-argument_unnest - Tests equal/unequal length arrays, NULL padding, aliases
  • TestInsert/insert_from_unnest - Django bulk_create pattern
  • TestCreateFunctionsLanguageSQL/function_returning_multiple_rows - User-defined SRFs still work
  • Full test suite passes

@Hydrocharged
Copy link
Collaborator

Thank you for the contribution! The best way to handle this would be to rewrite the UNNEST table function using the ROWS FROM syntax, which we do not yet support. This is specified in the Postgres documentation:

The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.19) had been called on each parameter separately and combined using the ROWS FROM construct.

Expands multi-argument UNNEST(arr1, arr2, ...) into
ROWS FROM(unnest(arr1), unnest(arr2), ...) which properly
zips results together with NULL padding for shorter arrays.

Changes:
- table_expr.go: Detect and expand multi-arg UNNEST to RowsFromExpr
- aliased_table_expr.go: Support WITH ORDINALITY via RowsFromExpr
- select_clause.go: Update multi-arg UNNEST rewrite to use RowsFromExpr

Depends on:
- dolthub/vitess#454
- dolthub/go-mysql-server#3412
@myers
Copy link
Author

myers commented Feb 1, 2026

@Hydrocharged Like this?

@Hydrocharged
Copy link
Collaborator

We need tests for ROWS FROM as well to ensure that it works outside of the UNNEST case too. Once those are in, I'll do a deep dive with the review.

- Add comprehensive tests for explicit ROWS FROM syntax:
  - ROWS FROM with generate_series
  - ROWS FROM with unnest
  - ROWS FROM with mixed functions
- Fix table_expr.go and aliased_table_expr.go to use RowsFromExpr
  for all multi-function ROWS FROM cases, not just multi-arg UNNEST

Depends on:
- dolthub/vitess#454
- dolthub/go-mysql-server#3412
@myers
Copy link
Author

myers commented Feb 3, 2026

We need tests for ROWS FROM as well to ensure that it works outside of the UNNEST case too. Once those are in, I'll do a deep dive with the review.

@Hydrocharged done!

@Hydrocharged
Copy link
Collaborator

I apologize for the delay on the review, but I'm starting on it now. I'll have something for you much later in the day.

Copy link
Collaborator

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have here does work for the specific case of ROWS FROM, however we can generalize the changes to enable more broad support.

From looking through the documentation, it appears that ROWS FROM behaves no differently than an expression. Therefore, SELECT * FROM ROWS FROM (...); has the same relative behavior as SELECT * FROM expression();. With that in mind, we have vitess.TableFuncExpr to handle that case, which would apply to all expressions that are positioned after FROM.

With the above information, you can remove the new vitess.RowsFromExpr and replace it with vitess.TableFuncExpr. You'll still need to modify vitess though, as TableFuncExpr allows for a table alias, but not column aliases (for example, SELECT * FROM generate_series(1, 5) AS s(n); fails while removing the (n) works). The new Columns field will require GMS modifications (similar to your existing changes made for RowsFromExpr). We should also leave a comment in vitess that the Columns field is a Doltgres extension, and not used by MySQL.

Lastly, plan.RowsFrom would be moved out of GMS and into Doltgres. From here, we have two routes. The first would be to allow TableFuncExpr to reference expressions rather than strictly functions and table functions. The second would be to implement the expression as a function. The long-term goal is to implement the first one, but that may be a larger lift than just implementing this as a function. If you do go the function route, then you can use a unique name (such as dolt_rowsfrom as we disallow user-functions with the dolt prefix), with a TODO mentioning the above.

Move RowsFrom and RowsFromIter from GMS into server/node as an
ExecBuilderNode. Register BuildMultiExprTableFunc factory to handle
nameless TableFuncExpr (ROWS FROM pattern). Update AST conversion to
emit TableFuncExpr instead of the removed RowsFromExpr.
@myers
Copy link
Author

myers commented Feb 12, 2026

@Hydrocharged changes made.

Copy link
Collaborator

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is closer, but there's still a bit of work left to do.

Comment on lines +133 to +136
// For non-RowsFromExpr expressions, ordinality is not yet supported
if node.Ordinality {
return nil, errors.Errorf("ordinality is only supported for ROWS FROM expressions")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems false due to the more generalized TableFuncExpr

return nil, err
}

// Wrap in a subquery as the original code did
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to reference the original code. Was this written with AI assistance?

}

// Handle RowsFromExpr specially - it can have WITH ORDINALITY and column aliases
if rowsFrom, ok := node.Expr.(*tree.RowsFromExpr); ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in the switch statement below

Comment on lines +169 to +170
selectExprs := make(vitess.SelectExprs, 0, len(tableFuncExpr.Exprs))
for _, argExpr := range tableFuncExpr.Exprs {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to use append here since we're creating selectExprs based on the length of tableFuncExpr.Exprs, and then directly iterating over tableFuncExpr.Exprs. This should use the index variable rather than ignore it.

Comment on lines +104 to +129
// Handle multi-argument UNNEST specially: UNNEST(arr1, arr2, ...)
// is syntactic sugar for ROWS FROM(unnest(arr1), unnest(arr2), ...)
if len(node.Items) == 1 {
if funcExpr, ok := node.Items[0].(*tree.FuncExpr); ok {
funcName := funcExpr.Func.String()
if strings.EqualFold(funcName, "unnest") && len(funcExpr.Exprs) > 1 {
// Expand multi-arg UNNEST into separate unnest calls
selectExprs := make(vitess.SelectExprs, len(funcExpr.Exprs))
for i, arg := range funcExpr.Exprs {
argExpr, err := nodeExpr(ctx, arg)
if err != nil {
return nil, err
}
selectExprs[i] = &vitess.AliasedExpr{
Expr: &vitess.FuncExpr{
Name: vitess.NewColIdent("unnest"),
Exprs: vitess.SelectExprs{&vitess.AliasedExpr{Expr: argExpr}},
},
}
}
return &vitess.TableFuncExpr{
Exprs: selectExprs,
}, nil
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why this is here. UNNEST won't resolve to *tree.RowsFromExpr, unless you mean for the nested case? Because for that, it should be handled in the natural path (probably under *tree.AliasedTableExpr, but definitely not here).

if r.withOrdinality {
schema = append(schema, &sql.Column{
Name: "ordinality",
Type: types.Int64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a GMS type, not a Doltgres type, which we usually use pgtypes as an alias to make that distinction within a file. All returned types (unless GMS specifically has to use the type, and even then there are usually workarounds) should be Doltgres types. Also, is ordinality an int4 or int8? This should be confirmed via Postgres (generally through documentation so that it can be verified later, or from direct testing if it's not in the docs).

// DebugString implements the sql.DebugStringer interface.
func (r *RowsFrom) DebugString() string {
var sb strings.Builder
sb.WriteString("RowsFrom(")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String writes "ROWS FROM" while this one writes RowsFrom.

Comment on lines +371 to +386
type singleValueIter struct {
value interface{}
consumed bool
}

func (s *singleValueIter) Next(ctx *sql.Context) (sql.Row, error) {
if s.consumed {
return nil, io.EOF
}
s.consumed = true
return sql.Row{s.value}, nil
}

func (s *singleValueIter) Close(ctx *sql.Context) error {
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, we already have sql.RowsToRowIter

// It executes multiple set-returning functions in parallel and zips their results together.
// When one function is exhausted before another, NULL is used for its values.
type RowsFromIter struct {
functions []sql.Expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than building the iterators here, why not just build them in BuildRowIter and return an error there? Then we don't have to worry about this initialized stuff.

},
},
},
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need tests for more general ROWS FROM behavior. For example, using ordinality and column aliases, not just table aliases. Same goes for other statements that use TableFuncExpr (which you can use generate_series for now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments