A RAG (Retrieval-Augmented Generation) application for document analysis: upload PDFs, ask questions, and get AI-powered answers with source citations. Built with an NX monorepo, LangChain, Groq, and Pinecone.
| Capability | Stack |
|---|---|
| LLM | Groq (Llama 3.3 70B) |
| Embeddings | Hugging Face (all-MiniLM-L6-v2) |
| Vector DB | Pinecone |
| RAG Pipeline | LangChain.js |
| Documents | PDF, TXT, MD |
| Contextual chat | History in prompt |
| Streaming | Real-time responses |
```mermaid
flowchart TB
    subgraph Ingestion
        A[PDF/TXT/MD] --> B[RecursiveCharacterTextSplitter]
        B --> C[HuggingFace Embeddings]
        C --> D[(Pinecone)]
    end
    subgraph Query
        Q[Question] --> C
        D --> E[Similarity Search]
        E --> F[Top-K Context]
        F --> G[Groq LLM]
        G --> R[Response + Sources]
    end
```
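The ingestion side of the diagram hinges on chunking. As an illustrative sketch (simplified, not LangChain's actual `RecursiveCharacterTextSplitter` implementation), a recursive character splitter tries separators from coarsest to finest until every chunk fits a size budget:

```typescript
// Simplified sketch of recursive character splitting. The real LangChain.js
// splitter also supports chunk overlap and merges adjacent small pieces;
// this version only shows the core recursion over separators.
function splitText(
  text: string,
  chunkSize = 200,
  separators: string[] = ["\n\n", "\n", " "]
): string[] {
  if (text.length === 0) return [];
  if (text.length <= chunkSize) return [text];
  if (separators.length === 0) {
    // No separators left: hard-split at the size limit.
    const chunks: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }
  // Split on the coarsest separator, then recurse into oversized pieces.
  const [sep, ...rest] = separators;
  return text
    .split(sep)
    .filter((piece) => piece.length > 0)
    .flatMap((piece) => splitText(piece, chunkSize, rest));
}

const doc = "First paragraph.\n\nSecond paragraph that is a bit longer.";
console.log(splitText(doc, 30));
```

Each chunk produced this way is embedded and upserted to Pinecone, so the chunk size trades off retrieval granularity against context per match.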
```
smartdoc-analyst/
├── apps/
│   ├── frontend/        # Angular 17+ UI (Tailwind CSS, RxJS)
│   └── server/          # NestJS API (Groq + Pinecone)
└── libs/
    ├── api-interfaces/  # Shared TypeScript interfaces
    └── ai-engine/       # LangChain RAG orchestration
```
```bash
npm install
```

Copy `.env.example` to `.env` and fill in your API keys:

```bash
cp .env.example .env
```

Required:
- `GROQ_API_KEY` - Groq Cloud
- `PINECONE_API_KEY` - Pinecone
- `PINECONE_INDEX_NAME` - Your Pinecone index name (default: `smartdoc-index`)
- `HUGGINGFACE_API_KEY` - Hugging Face, required for embeddings (chat + document ingestion)
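The server validates these keys before booting. A minimal sketch of that kind of startup check (the key names come from the list above; the repo's actual `validate-env` target may check more):

```typescript
// Minimal env validation sketch: report which required keys are unset.
// Key names are taken from the README; the project's real validator may differ.
const REQUIRED_KEYS = [
  "GROQ_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_INDEX_NAME",
  "HUGGINGFACE_API_KEY",
];

function missingEnvKeys(
  env: Record<string, string | undefined> = process.env
): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Example with an explicit (hypothetical) partial config object:
const example = { GROQ_API_KEY: "gsk_test", PINECONE_API_KEY: "pc_test" };
console.log(missingEnvKeys(example));
```

Running such a check before the NestJS bootstrap turns a confusing runtime failure (e.g. an embedding call rejected for a missing key) into an immediate, readable startup error.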
**Option A: Local Development**

API server:

```bash
npm run serve:server
```

Frontend:

```bash
npm run serve:frontend
```

**Option B: Docker (Recommended for Production)**

Production:

```bash
# Build and start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

Development (with hot reload):

```bash
docker-compose -f docker-compose.dev.yml up
```

The application will be available at:
- Frontend: http://localhost (or http://localhost:4200 in dev mode)
- API: http://localhost:3000
- API Docs: http://localhost:3000/api/docs
| Command | Description |
|---|---|
| `npm run serve:server` | Start NestJS API on port 3000 (validates `.env` first) |
| `npm run serve:frontend` | Start Angular app |
| `npm run build` | Build all apps and libs |
| `npm run lint` | Lint all projects |
| `nx run server:validate-env` | Validate that all required `.env` keys are set |
| `nx run server:test` | Run server unit and e2e tests (29 tests) |
| `nx run frontend:test` | Run frontend unit tests (components, services, pipes) |
Interactive API documentation is available via Swagger/OpenAPI:
- Local: http://localhost:3000/api/docs (when the server is running)
- Features:
  - Try out endpoints directly from the browser
  - View request/response schemas
  - See example requests and responses
  - All endpoints documented with descriptions and examples
GitHub Actions runs on every push and pull request to main/master:
- Lint: Code quality checks across all projects
- Tests: Server unit + e2e + Frontend unit tests with coverage reporting (mock API keys in CI)
- Build: All apps and libs in production mode
- Security: Automated security scanning with npm audit and Snyk
Dependencies: Automated dependency updates via Dependabot.
Health check:
- `GET /health` - Quick check (environment variables only)
- `GET /health?checkServices=true` - Full check with connectivity tests to Pinecone, Groq, and Hugging Face
- Returns `{ status, timestamp, env, services? }` for monitoring
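A monitoring client can narrow the health response with a small type guard before reading its fields. The shape follows the `{ status, timestamp, env, services? }` description above; the exact field types and the `"ok"` status value are assumptions for illustration:

```typescript
// Response shape from the README's /health description. Field types and the
// "ok" status value are assumptions, not verified against the server code.
interface HealthResponse {
  status: string;
  timestamp: string;
  env: Record<string, unknown>;
  services?: Record<string, unknown>; // only present with ?checkServices=true
}

function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.status === "string" &&
    typeof v.timestamp === "string" &&
    typeof v.env === "object" &&
    v.env !== null
  );
}

// Usage sketch: fetch the endpoint and narrow before reading fields
// (uses the global fetch available in Node 18+).
async function checkHealth(baseUrl = "http://localhost:3000"): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  const body: unknown = await res.json();
  return isHealthResponse(body) && body.status === "ok";
}
```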
The project includes Dockerfiles for both server and frontend:
- Server: Multi-stage build with Node.js 20 Alpine
- Frontend: Angular build served with Nginx
- Health checks: Built-in health monitoring
- Data persistence: Volume mounts for the `data/` directory
```bash
# Build images
docker-compose build

# Start services
docker-compose up -d

# View logs
docker-compose logs -f server
docker-compose logs -f frontend

# Stop services
docker-compose down

# Rebuild and restart
docker-compose up -d --build
```

Create a `.env` file in the project root (or set environment variables):
```env
GROQ_API_KEY=your_key_here
PINECONE_API_KEY=your_key_here
PINECONE_INDEX_NAME=smartdoc-index
HUGGINGFACE_API_KEY=your_key_here
```

Docker Compose will automatically load these variables.
We welcome contributions! Please see our Contributing Guide for details on how to submit pull requests, report bugs, and suggest enhancements.
For security vulnerabilities, please see our Security Policy.
- Clean Architecture & SOLID principles
- api-interfaces: Shared contracts between frontend and backend
- ai-engine: LangChain.js orchestration (Groq LLM, Pinecone vector store, Hugging Face embeddings)
- ChatModule: NestJS module that receives prompts, queries Pinecone, and returns LLM responses. Includes conversation history in the RAG prompt for contextual answers
- Rate limiting: 60 requests/minute per IP (configurable via `THROTTLE_TTL`, `THROTTLE_LIMIT`); `/health` is excluded
- Logging: Pino structured JSON logs. Use `node dist/apps/server/main.js | npx pino-pretty` for readable dev output
- DocumentsModule: Upload PDF/TXT/MD files; parses, chunks, embeds, and upserts to Pinecone. The registry is persisted to `data/documents.json`; `POST /api/documents/upload-stream` streams progress (parsing → chunking → indexing)
- ConversationsModule: Persists conversations to `data/conversations.json` (survives server restart)
- ChatService (frontend): RxJS-based reactive stream for chat messages
- HTTP Interceptors: Global error handling, loading states, and automatic retry for transient failures
- Performance: Optimized template subscriptions to prevent memory leaks
- Security: Markdown sanitization prevents XSS attacks in AI-generated content
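The "history in prompt" approach described above can be sketched as plain prompt assembly: retrieved chunks and prior turns are serialized into the prompt sent to the Groq LLM. Function and field names here are hypothetical, not the repo's actual ChatModule code:

```typescript
// Illustrative prompt assembly for contextual RAG. All names here
// (buildRagPrompt, ChatTurn) are hypothetical, not the repo's actual code.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

function buildRagPrompt(
  question: string,
  contextChunks: string[],
  history: ChatTurn[]
): string {
  // Number the retrieved chunks so the model can cite them as [Source N].
  const context = contextChunks
    .map((chunk, i) => `[Source ${i + 1}] ${chunk}`)
    .join("\n");
  // Serialize prior turns so follow-up questions keep their context.
  const priorTurns = history
    .map((turn) => `${turn.role}: ${turn.content}`)
    .join("\n");
  return [
    "Answer using only the sources below. Cite sources as [Source N].",
    `Sources:\n${context}`,
    priorTurns ? `Conversation so far:\n${priorTurns}` : "",
    `user: ${question}`,
  ]
    .filter((part) => part !== "")
    .join("\n\n");
}

console.log(
  buildRagPrompt(
    "What model is used?",
    ["The app uses Groq's Llama 3.3 70B model."],
    [
      { role: "user", content: "Hi" },
      { role: "assistant", content: "Hello!" },
    ]
  )
);
```

Because history is appended verbatim, long conversations eventually compete with retrieved context for the model's context window, which is one reason the top-K retrieval step stays bounded.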
