A document analysis and viewing application that provides an intuitive interface for browsing, searching, and analyzing collections of documents. Integrates with the Document Analysis Tool for OCR, entity extraction, and document processing.
- Collections Management: Organize documents into collections, create new collections, and add files to existing ones
- Document Viewer: View document pages with high-quality images and extracted text
- Entity Extraction: Automatic extraction of people, places, dates, and objects mentioned in documents
- Geographic Visualization: Interactive map showing locations mentioned in documents (powered by Leaflet)
- Timeline Visualization: Visual timeline of dates referenced in documents
- False Redaction Detection: Detect hidden text under visual redactions in PDFs
- Correspondence Support: Special handling for letters with From/To information
- Search: Live search across document content, entities, and metadata within collections
- Real-time Processing: Monitor document processing progress with a minimizable status indicator
document-viewer/
├── client/ # Next.js frontend application
│ ├── app/ # App Router pages and API routes
│ ├── components/ # React components
│ └── lib/ # Utilities and database client
├── prisma/ # Database schema and migrations
├── scripts/ # Utility scripts
├── server/ # Backend services (news-scraper, text-analysis)
└── shared/ # Shared types and utilities
- Node.js 18+
- PostgreSQL 12+
- Document Analysis Tool running on localhost:3001
- Clone the repository:
  git clone https://github.com/your-repo/document-viewer.git
  cd document-viewer
- Install dependencies:
  npm install
- Configure environment variables:
  Copy the example environment file and update the values:
  cp .env.example .env
  Key environment variables:
  # Database connection
  DATABASE_URL="postgresql://postgres:password@localhost:5432/jfk_documents?schema=public"
  # Media source: 'local' uses Document Analysis Tool, 'remote' uses external API
  NEXT_PUBLIC_MEDIA_SOURCE=local
  # Document Analysis Tool API URL (required when NEXT_PUBLIC_MEDIA_SOURCE=local)
  DOCUMENT_ANALYZER_URL=http://localhost:3001
- Set up the database:
  # Start PostgreSQL (if using Docker)
  npm run db:local
  # Generate Prisma client and run migrations
  npm run post
  npx prisma migrate dev
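The environment variables described above depend on each other: `DOCUMENT_ANALYZER_URL` is only required when `NEXT_PUBLIC_MEDIA_SOURCE` is `local`. A minimal sketch of a startup check for that rule might look like the following. The names `AppConfig` and `loadConfig` are hypothetical and not part of the project, and rejecting an unset media source is an assumption made for this example.

```typescript
// Sketch only: validates the three environment variables listed above.
// `AppConfig` and `loadConfig` are hypothetical names, not project code.
type MediaSource = "local" | "remote";

interface AppConfig {
  databaseUrl: string;
  mediaSource: MediaSource;
  analyzerUrl?: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is not set");
  }

  // Assumption: the media source must be set explicitly to one of the two
  // documented values; this sketch does not apply a default.
  const rawSource = env["NEXT_PUBLIC_MEDIA_SOURCE"];
  let mediaSource: MediaSource;
  if (rawSource === "local" || rawSource === "remote") {
    mediaSource = rawSource;
  } else {
    throw new Error("NEXT_PUBLIC_MEDIA_SOURCE must be 'local' or 'remote'");
  }

  // The analyzer URL is only mandatory for the local media source.
  const analyzerUrl = env["DOCUMENT_ANALYZER_URL"];
  if (mediaSource === "local" && !analyzerUrl) {
    throw new Error(
      "DOCUMENT_ANALYZER_URL is required when NEXT_PUBLIC_MEDIA_SOURCE=local"
    );
  }

  return { databaseUrl, mediaSource, analyzerUrl };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` early during startup, so that a missing value fails fast instead of surfacing later as a failed request.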
# Start the client application
npm run dev:client
The application will be available at http://localhost:3000.
# Start all services with Docker Compose
docker compose -f docker-compose.dev.yml up -d --build
- Click "Create New Collection" in the sidebar
- Select a folder or file to process
- Choose OCR settings (Grok, DocTR, or TrOCR)
- Optionally enable entity analysis and false redaction detection
- Click "Start Processing"
The progress display can be minimized to a floating indicator while processing continues in the background.
- Click the ... menu next to a collection in the sidebar
- Select "Add Files"
- Choose files/folders to add and processing options
- Click on any document to open the document viewer
- Navigate pages using the page selector
- View extracted entities (people, places, dates, objects) in the sidebar panels
- Explore the geographic map and timeline visualizations
- Use the "Sync" button to refresh document data from the analyzer
On a collection page, use the search bar to filter documents by:
- Document title
- Summary content
- People mentioned
- Places mentioned
- Dates referenced
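The filtering described above can be sketched as a case-insensitive substring match over the listed fields. This is an illustration only: the `DocumentSummary` shape and the function names are hypothetical, and the application's actual search may be implemented differently (for example, on the server).

```typescript
// Hypothetical document shape covering the searchable fields listed above.
// The real database model may differ.
interface DocumentSummary {
  title: string;
  summary: string;
  people: string[];
  places: string[];
  dates: string[];
}

// Returns true when the query appears in any searchable field.
function matchesQuery(doc: DocumentSummary, query: string): boolean {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return true; // an empty search bar shows every document
  }
  const haystack = [
    doc.title,
    doc.summary,
    ...doc.people,
    ...doc.places,
    ...doc.dates,
  ];
  return haystack.some((field) => field.toLowerCase().includes(needle));
}

// Filters a collection down to the documents matching the query.
function filterDocuments(
  docs: DocumentSummary[],
  query: string
): DocumentSummary[] {
  return docs.filter((doc) => matchesQuery(doc, query));
}

// Placeholder data for illustration only.
const sampleDocs: DocumentSummary[] = [
  {
    title: "Memo A",
    summary: "Meeting notes",
    people: ["Alice Example"],
    places: ["Paris"],
    dates: ["1963-01-15"],
  },
  {
    title: "Letter B",
    summary: "Travel arrangements",
    people: ["Bob Example"],
    places: ["Dallas"],
    dates: ["1963-11-22"],
  },
];

const parisDocs = filterDocuments(sampleDocs, "paris");
```

Running the filter on every keystroke gives the "live" behavior; for large collections, debouncing the input would be a reasonable refinement.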
# Import documents from local analysis output
npm run db:import-local
# Update existing documents from local analysis
npm run db:update-local
# Reset database
npm run db:reset
- Frontend: Next.js 15 (App Router), React 19, Tailwind CSS
- UI Components: Radix UI, Lucide Icons, Framer Motion
- Maps: Leaflet, React-Leaflet
- Charts: Recharts, D3.js
- Database: PostgreSQL with Prisma ORM
- Real-time: Server-Sent Events (SSE) for processing updates
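As a rough illustration of how Server-Sent Events carry processing updates, the sketch below parses the SSE wire format (events separated by a blank line, `event:` and `data:` fields, comment lines starting with `:`). The function name `parseSseChunk` and the payload fields in the sample are hypothetical; the project's actual event names and payloads are not documented here. The sketch also assumes each chunk contains only complete events. In a browser, the built-in `EventSource` performs this parsing automatically.

```typescript
interface SseEvent {
  event: string; // event type; "message" when no `event:` field is present
  data: string; // payload; multiple `data:` lines are joined with "\n"
}

// Parses a chunk of text in the SSE wire format into discrete events.
// Assumes the chunk contains only complete events (no cross-chunk buffering).
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Skip empty lines and comments (often used as keep-alives).
      if (line === "" || line.startsWith(":")) {
        continue;
      }
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      // A single leading space after the colon is not part of the value.
      if (value.startsWith(" ")) {
        value = value.slice(1);
      }
      if (field === "event") {
        event = value;
      } else if (field === "data") {
        dataLines.push(value);
      }
    }
    // Blocks without any data field do not produce an event.
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Hypothetical stream: the event name and JSON fields are made up.
const sampleStream =
  'event: progress\ndata: {"processed":1,"total":4}\n\n' +
  ": keep-alive\n\n" +
  "data: done\n\n";

const parsedEvents = parseSseChunk(sampleStream);
```

SSE fits this use case because updates only flow from server to client, so a plain HTTP response stream is enough and no WebSocket handshake is needed.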
The application includes several API endpoints:
- /api/docs/documents/[id] - Get document details
- /api/docs/documents/[id]/sync - Sync document from analyzer
- /api/docs/document-status - List documents with filtering
- /api/docs/document-groups - Get available collections
- /api/docs/sync-all - Bulk sync all documents
- /api/analyzer-proxy/[...path] - Proxy requests to Document Analysis Tool
- /api/filesystem/browse - Browse local filesystem for file selection
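For scripting against these routes, a small set of path builders keeps the URLs in one place. The sketch below is an assumption-laden illustration: the `api` object and `getJson` helper are hypothetical names, the HTTP methods and response shapes are not specified above (the helper assumes a plain GET returning JSON), and no query parameters are shown because the filtering parameters are not documented here.

```typescript
// Path builders for the routes listed above. Hypothetical helper, not project code.
const api = {
  document: (id: string): string =>
    `/api/docs/documents/${encodeURIComponent(id)}`,
  syncDocument: (id: string): string =>
    `/api/docs/documents/${encodeURIComponent(id)}/sync`,
  documentStatus: "/api/docs/document-status",
  documentGroups: "/api/docs/document-groups",
  syncAll: "/api/docs/sync-all",
  // The catch-all route forwards the remaining path segments to the analyzer.
  analyzerProxy: (...segments: string[]): string =>
    `/api/analyzer-proxy/${segments
      .map((segment) => encodeURIComponent(segment))
      .join("/")}`,
  filesystemBrowse: "/api/filesystem/browse",
};

// Assumption: the route answers a GET request with a JSON body.
async function getJson<T>(baseUrl: string, path: string): Promise<T> {
  const response = await fetch(new URL(path, baseUrl).toString());
  if (!response.ok) {
    throw new Error(`${path} failed with status ${response.status}`);
  }
  return (await response.json()) as T;
}
```

With the client running locally, a call such as `getJson<unknown>("http://localhost:3000", api.documentGroups)` would fetch the collection list; the result is typed as `unknown` here because the response shape is not described above.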
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the GNU General Public License v3.0.