-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Task: Design Core Database Schema
Is your feature request related to a problem? Please describe.
A well-defined database schema is fundamental for storing user information, document metadata, extracted text, tokens, and other application data. This task is to finalize the initial core schema.
Describe the solution you'd like
Define and document the relational database schema for the core entities of UnravelDocs. This includes tables for users, documents, tokens (verification, password reset), and any other essential supporting tables. Specify table names, columns, data types, constraints (primary keys, foreign keys, unique constraints, not null), and relationships.
Acceptance Criteria:
- User Table (
users):- Schema defined for storing user credentials, profile information, roles, and verification status.
- Document Table (
documents):- Schema defined for storing document metadata: uploader, original filename, storage path, file type, size, upload timestamp, processing status (e.g., UPLOADED, QUEUED_FOR_OCR, PROCESSING_OCR, PROCESSED, FAILED_OCR), extracted text (or path to it).
- Token Tables (e.g.,
email_verification_tokens,password_reset_tokens):- Schema for storing tokens, their expiry, usage status, and association with users.
- Relationships:
- Clearly defined relationships between tables (e.g., one-to-many between users and documents).
- Foreign key constraints are specified.
- Indexes:
- Initial set of necessary indexes identified for performance (e.g., on foreign keys, frequently queried columns like
user_idindocuments,statusindocuments,tokenin token tables).
- Initial set of necessary indexes identified for performance (e.g., on foreign keys, frequently queried columns like
- Data Types: Appropriate data types chosen for all columns.
- Constraints: Primary keys, unique constraints (e.g., user email), and NOT NULL constraints are defined.
- Documentation: The schema is documented (e.g., using an ERD diagram, SQL DDL statements, or a markdown description).
Proposed Tables (Initial Draft - to be refined):
-
usersid(PK, e.g., UUID or BIGINT auto-increment)email(VARCHAR, UNIQUE, NOT NULL)password_hash(VARCHAR, NOT NULL)first_name(VARCHAR, NULLABLE)last_name(VARCHAR, NULLABLE)roles(VARCHAR or separate roles table, e.g., "ROLE_USER,ROLE_ADMIN")is_email_verified(BOOLEAN, DEFAULT FALSE)created_at(TIMESTAMP, NOT NULL)updated_at(TIMESTAMP, NOT NULL)
-
documentsid(PK, e.g., UUID or BIGINT auto-increment)user_id(FK referencesusers.id, NOT NULL)original_filename(VARCHAR, NOT NULL)storage_identifier(VARCHAR, NOT NULL, e.g., S3 key or file path)file_type(VARCHAR, NOT NULL, e.g., "image/png")file_size_bytes(BIGINT, NOT NULL)status(VARCHAR, NOT NULL, e.g., "UPLOADED", "QUEUED_FOR_OCR", "PROCESSING_OCR", "PROCESSED", "FAILED_OCR")extracted_text(TEXT, NULLABLE - or path to text file if very large)ocr_failure_reason(VARCHAR, NULLABLE)uploaded_at(TIMESTAMP, NOT NULL)processed_at(TIMESTAMP, NULLABLE)updated_at(TIMESTAMP, NOT NULL)
-
email_verification_tokensid(PK)user_id(FK referencesusers.id, NOT NULL, UNIQUE - one active token per user)token(VARCHAR, UNIQUE, NOT NULL)expiry_date(TIMESTAMP, NOT NULL)created_at(TIMESTAMP, NOT NULL)
-
password_reset_tokensid(PK)user_id(FK referencesusers.id, NOT NULL, UNIQUE - one active token per user)token(VARCHAR, UNIQUE, NOT NULL)expiry_date(TIMESTAMP, NOT NULL)created_at(TIMESTAMP, NOT NULL)
-
(Optional)
user_google_auth_tokensuser_id(PK, FK referencesusers.id)access_token(VARCHAR, NOT NULL)refresh_token(VARCHAR, NULLABLE)expiry_timestamp(TIMESTAMP, NOT NULL)scopes(VARCHAR)updated_at(TIMESTAMP, NOT NULL)
Tasks:
- Review and refine the proposed table structures.
- Define all columns, data types, PKs, FKs, UNIQUE constraints, and NOT NULL constraints.
- Identify and define necessary indexes.
- Create SQL DDL scripts for creating the tables.
- (Optional) Create an Entity-Relationship Diagram (ERD).
- Document the final schema.
- Set up database migration tooling (e.g., Flyway, Liquibase) to manage schema changes.
Open Questions/Considerations:
- Primary Key Strategy: UUID vs. Auto-incrementing integers.
- Storing Extracted Text: Directly in the
documentstable (if typically not excessively large) or as separate files/objects in storage (with a path in thedocumentstable)? - Roles Management: Simple string in
userstable or a separaterolesanduser_rolesjunction table for more complex role management? - Audit Fields: Consistent use of
created_at,updated_at. - Database Choice: While the schema is somewhat RDBMS-agnostic, specific data types might vary (e.g., PostgreSQL, MySQL).