Skip to content

Conversation

@jemiahw
Copy link
Collaborator

@jemiahw jemiahw commented Nov 11, 2025

Users can now define a schema for entity types, including validation rules and indexing. New API endpoints have been added to fetch a list of entity types and schemas.

Adds support for per-entity-type schemas. Schemas are definied in configuration and applied as MongoDB data validations and indexes. The /api/v1/schemas/ endpoint exposes schemas.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a schema validation framework for entity types, enabling users to define field validation rules and database indexes. The implementation includes MongoDB/DocumentDB JSON schema validation, automatic index management, and new API endpoints for querying entity types and schemas.

  • Added comprehensive schema validation framework with support for field types, patterns, enums, and case rules
  • Implemented automatic MongoDB index creation/deletion with retry logic for handling concurrent operations
  • Added /schemas and /entity-types REST API endpoints to expose schema configurations

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
server/server.go Integrated schema DDL execution during server boot with retry logic for MongoDB collMod conflicts
schema/mongo_schema.go Core schema validation and MongoDB index management implementation
schema/mongo_schema_test.go Unit tests for BSON schema validator and index operations
schema/mongo_integration_test.go Integration tests verifying schema validation and index management against MongoDB
schema/config.go Schema configuration data structures and YAML mappings
schema/README.md Documentation for schema validation framework usage and extension
config/config.go Added Schemas field to application config
api/api_gomux.go Added HTTP handlers for schemas and entity-types endpoints
api/get_schemas_test.go Tests for GET /schemas endpoint
api/entity_types_test.go Tests for GET /entity-types endpoint
entity/validate.go Added EntityTypes() method to Validator interface
go.mod Added github.com/pkg/errors dependency
go.sum Updated checksums for new dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"context"
"fmt"
"log"
"math/rand"
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using math/rand for jitter in retry logic is not cryptographically secure. While this is acceptable for jitter, consider using crypto/rand or math/rand/v2 for better randomness quality.

Suggested change
"math/rand"
"crypto/rand"

Copilot uses AI. Check for mistakes.
jemiahw and others added 2 commits November 11, 2025 10:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 15 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +263 to +268
for ; try < 5; try++ {
if err = schema.CreateOrUpdateMongoSchema(context.Background(), db, s.appCtx.Config.Schemas); err == nil {
return nil
}
// Sleep 2-4 seconds before retrying. Updates are very fast, so we don't need long waits.
jitter := time.Duration(rand.Intn(2000)) * time.Millisecond
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of math/rand without seeding could lead to predictable retry patterns. Consider using math/rand/v2 (available in Go 1.22+) which is automatically seeded, or explicitly seed the random number generator with rand.New(rand.NewSource(time.Now().UnixNano())) to ensure different jitter values across application restarts.

Suggested change
for ; try < 5; try++ {
if err = schema.CreateOrUpdateMongoSchema(context.Background(), db, s.appCtx.Config.Schemas); err == nil {
return nil
}
// Sleep 2-4 seconds before retrying. Updates are very fast, so we don't need long waits.
jitter := time.Duration(rand.Intn(2000)) * time.Millisecond
// Seed a local random number generator for jitter
rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
for ; try < 5; try++ {
if err = schema.CreateOrUpdateMongoSchema(context.Background(), db, s.appCtx.Config.Schemas); err == nil {
return nil
}
// Sleep 2-4 seconds before retrying. Updates are very fast, so we don't need long waits.
jitter := time.Duration(rnd.Intn(2000)) * time.Millisecond

Copilot uses AI. Check for mistakes.
// CreateOrUpdateMongoSchema creates or updates the MongoDB schema for the given entity. If the schema is nil or has
// empty fields, it removes the JSON schema validation. If the schema is not nil, it ensures that the indexes in the
// schema exists, and any indexes that are not in the schema are removed. Entity Collection creation is handled by the
// index creation process. We assume that any reasonably designed schema should not solely rely on full collection scans.
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "reasonaby" should be "reasonably".

Copilot uses AI. Check for mistakes.
# How It Works
## Basic Structure
### Syntax
**Syntax:** YAML format with `validations` at the top level.
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation inconsistency: Line 8 mentions validations at the top level, but the example YAML at line 37 uses schemas at the top level. This inconsistency could confuse users about the correct configuration structure.

Suggested change
**Syntax:** YAML format with `validations` at the top level.
**Syntax:** YAML format with `schemas` at the top level.

Copilot uses AI. Check for mistakes.

Define how the validation appears in your YAML file:
```yaml
validations:
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation inconsistency: The example at line 125 again uses validations as the top-level key, but the actual implementation in the codebase uses schemas (as seen in config.go line 125 and the first README example at line 37). This inconsistency should be corrected to match the actual implementation.

Suggested change
validations:
schemas:

Copilot uses AI. Check for mistakes.
func updateMongoJSONValidation(ctx context.Context, db *mongo.Database, entity string, schema Schema, global Global) error {
log.Printf("INFO: Generating bson schema validator for %s", entity)

// If there are no field defined, we assume the entity owner want to bypass JSON validation and only use
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar issue in comment: "owner want" should be "owner wants".

Suggested change
// If there are no field defined, we assume the entity owner want to bypass JSON validation and only use
// If there are no field defined, we assume the entity owner wants to bypass JSON validation and only use

Copilot uses AI. Check for mistakes.
// We only handle enums for strings right now.
if field.Type != "string" && field.Enum != nil {
return nil, errors.Wrapf(errEnumNotString, "field %s is of type %q", field.Name, field.Type)
}
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum validation logic only checks if field.Type != "string", but bool-str type is treated as a string with a hardcoded enum of ["true", "false"]. This could cause confusion if someone tries to define a custom enum for a bool-str field. Consider adding a specific check to prevent enums on bool-str fields since the enum is automatically set, or document this behavior clearly.

Suggested change
}
}
// Prevent custom enums for "bool-str" fields, as the enum is always ["true", "false"]
if field.Type == "bool-str" && field.Enum != nil {
return nil, errors.New(fmt.Sprintf("Custom enums are not supported for bool-str fields (field %s)", field.Name))
}

Copilot uses AI. Check for mistakes.
Comment on lines +1214 to +1215
cfg.Entities = map[string]schema.EntitySchema{
entityType: api.schemas.Entities[entityType],
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential nil pointer dereference: when an entity type is valid but not present in api.schemas.Entities, accessing api.schemas.Entities[entityType] at line 1215 will return a zero-value EntitySchema with a nil Schema field. While this may be intentional to return an empty schema for known types without configured schemas, it could be clearer to explicitly handle this case or document the behavior.

Suggested change
cfg.Entities = map[string]schema.EntitySchema{
entityType: api.schemas.Entities[entityType],
// If the entity type is valid but not present in api.schemas.Entities,
// explicitly return an empty schema for that type.
entitySchema, ok := api.schemas.Entities[entityType]
if !ok {
// Entity type is valid but has no configured schema; return empty schema.
entitySchema = schema.EntitySchema{}
}
cfg.Entities = map[string]schema.EntitySchema{
entityType: entitySchema,

Copilot uses AI. Check for mistakes.
return nil, errors.Wrap(err, "Failed to decode index description")
}

log.Printf("INFO: Current existing index for %s: %+v", coll.Name(), index)
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in log message: "exising" should be "existing".

Copilot uses AI. Check for mistakes.
func updateMongoJSONValidation(ctx context.Context, db *mongo.Database, entity string, schema Schema, global Global) error {
log.Printf("INFO: Generating bson schema validator for %s", entity)

// If there are no field defined, we assume the entity owner want to bypass JSON validation and only use
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar issue in comment: "field defined" should be "fields defined".

Suggested change
// If there are no field defined, we assume the entity owner want to bypass JSON validation and only use
// If there are no fields defined, we assume the entity owner want to bypass JSON validation and only use

Copilot uses AI. Check for mistakes.
type Field struct {
// Name is the name of the field in the schema.
Name string `yaml:"name"`
// Type is the type of the field. This can be string, int, bool, object, or enum.
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation inconsistency in comment: The comment at line 37 says "This can be string, int, bool, object, or enum" but enum is not a valid type. Looking at the BSONSchemaValidator function, the valid types are: string, int, bool, object, datetime, int-str, and bool-str. The comment should be updated to reflect the actual supported types.

Suggested change
// Type is the type of the field. This can be string, int, bool, object, or enum.
// Type is the type of the field. Supported types: string, int, bool, object, datetime, int-str, bool-str.

Copilot uses AI. Check for mistakes.
@jemiahw
Copy link
Collaborator Author

jemiahw commented Nov 11, 2025

I copied the schema code from block-etre and made a few changes along the way. The main changes are:

  • In the configuration, I renamed from validations->schema

  • Renamed the validations package to schema
    This keeps naming a little more consistent.

  • API endpoint for schemas returns the configured schema, not the Mongo DB schema
    This is simpler to implement and keeps us database agnostic. If we swapped out MongoDB for something else, the schema endpoint would not change. Also, it lets us easily add description or other fields to the config, that can then be returned in the payload.

  • Added an API endpoint to list all entity types
    This is mostly for convenience and prepping for an eventual MCP. Its handy to be able to discover the entity types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants