TimeNet is a specialized Question Answering (QA) agent designed to handle temporal reasoning in natural language queries. It addresses challenges in identifying, normalizing, and reasoning over time-related information — especially for questions that involve recurring events, ambiguous time expressions, and temporal relationships between events. Built with a ReAct-style Agent architecture and powered by a temporal knowledge graph, TimeNet demonstrates improved performance over standard RAG (Retrieval-Augmented Generation) systems.
- ⚙️ Temporal Knowledge Graph: Constructed using Memgraph, storing over 2,000 entities and 750 time-related nodes with rich temporal relations.
- 🗂️ Graph Construction Pipeline: Automated pipeline for crawling, preprocessing, and enriching event data.
- 🤖 ReAct Agent Architecture: Modular reasoning agent that supports multi-step inference and temporal normalization.
- 📊 Benchmark Evaluation: 210+ temporal QA examples spanning 6 question types, evaluated using LLM-as-Judge, F1-Score, and Time IoU.
- 🔁 Time-aware Embedding Training: Fine-tuning `intfloat/multilingual-e5-large` on existing triplets to sharpen temporal reasoning. (In progress)
```
TimeNet/
├── agent_workflow/        # Core agent implementation
│   ├── state.py           # Agent state management
│   ├── tool.py            # Agent tools and utilities
│   └── workflow.py        # Main workflow logic
├── benchmark/             # Evaluation and benchmarking
├── config/                # Configuration files
├── data_processing/       # Data preprocessing utilities
├── embedding_training/    # Training embeddings for temporal understanding
├── experiment/            # Experimental results and analysis
│   └── results/           # Evaluation results (F1 scores, LLM evaluations)
├── prompt/                # Prompt engineering and graph extraction
└── utils/                 # General utilities
```
```shell
conda create -n timenet python=3.11
conda activate timenet
pip install -r requirements.txt
```

Set the following environment variables (e.g., in a `.env` file):

```
NEO4J_URL =
NEO4J_USERNAME =
NEO4J_PASSWORD =
GOOGLE_API_KEY =
OPENAI_API_KEY =
GROQ_API_KEY =
AZURE_GPT_KEY =
AZURE_GPT_URL =
TAVILY_API_KEY =
MONGO_URI =
MONGO_DB_NAME = TimeNet
QDRANT_URL =
QDRANT_API_KEY =
QDRANT_DB_NAME = kg_triplets
WANDB_API_KEY =
```

```shell
docker-compose up
langgraph dev
```

TimeNet builds a temporal knowledge graph to store structured, time-anchored information, mainly on Viettel-related events and Vietnamese public holidays.
- 📥 Data Collection (`data_processing/data_crawling`)
  - Crawl structured event data from Wikipedia and Viettel news.
  - Use `ScrapeGraphAI` for web scraping + keyword-based search (e.g., "What was Viettel’s biggest milestone in 2021?").
  - Ensure timestamps (day/month/year) are clearly extracted.
- 🧹 Preprocessing & Normalization (`data_processing/data_crawling`)
  - Clean and normalize raw data into structured form:
    - Event name
    - Start/end time (Gregorian + Lunar)
    - Description and location
  - Store in MongoDB.
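The normalized form above can be sketched as a plain dictionary before insertion into MongoDB. The field names below are illustrative assumptions, not the project's actual schema:

```python
from datetime import date

# Illustrative sketch of a normalized event record; field names are
# assumptions, not the project's actual MongoDB schema.
def normalize_event(name, start, end, description, location,
                    lunar_start=None, lunar_end=None):
    """Build a structured event document ready to insert into MongoDB."""
    if end < start:
        raise ValueError("event cannot end before it starts")
    return {
        "event_name": name,
        "start_date": start.isoformat(),  # Gregorian
        "end_date": end.isoformat(),
        "lunar_start": lunar_start,       # lunar-calendar day/month, as a string
        "lunar_end": lunar_end,
        "description": description,
        "location": location,
    }

doc = normalize_event(
    "Tết Nguyên Đán 2024", date(2024, 2, 10), date(2024, 2, 10),
    "Vietnamese Lunar New Year", "Vietnam",
    lunar_start="01/01", lunar_end="01/01",
)
```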
- 🔍 Entity & Relation Extraction (`data_processing/data_transform`)
  - Use GPT-4o + few-shot CoT to extract:
    - Time expressions
    - Relationships (e.g., `OCCURRED_AT`, `PRECEDES`)
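The extraction output might be post-processed along these lines. The `(subject | predicate | object)` line format is an assumption for illustration, not the actual prompt contract in `prompt/`:

```python
import re

# Parse LLM extraction output into triplets, one per line, assuming the
# hypothetical "(subject | predicate | object)" format.
def parse_triplets(llm_output: str):
    triplets = []
    for line in llm_output.splitlines():
        m = re.match(r"\((.+?)\s*\|\s*(.+?)\s*\|\s*(.+?)\)", line.strip())
        if m:
            triplets.append(tuple(m.groups()))
    return triplets

sample = """(Viettel founding | OCCURRED_AT | 1989-06-01)
(Viettel founding | PRECEDES | Viettel Mobile launch)"""
triplets = parse_triplets(sample)
```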
- 🧱 Graph Updating (`data_processing/data_transform`)
  - Use Cypher queries to check for duplicates and merge or insert nodes.
  - Automatically update and scale the graph over time.
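The duplicate-check-and-merge step reduces to a parameterized Cypher `MERGE` keyed on a unique property. The label and property names below are assumptions, not the project's actual graph schema:

```python
# Hypothetical sketch: build a Cypher MERGE that inserts an event node if it
# is new, or fills in a missing description if it already exists.
def build_merge_query(label: str = "Event") -> str:
    return (
        f"MERGE (e:{label} {{name: $name}})\n"
        "ON CREATE SET e.start_date = $start_date, e.description = $description\n"
        "ON MATCH SET e.description = coalesce(e.description, $description)"
    )

query = build_merge_query()
```

A Bolt-compatible driver session (Memgraph speaks the Bolt protocol) would then execute it with the event's parameters.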
- 📐 Embedding & Indexing (`data_processing/data_transform`)
  - Generate embeddings via `text-embedding-003-small` for:
    - Triplets (⟨subject, predicate, object⟩)
    - Node names
  - Store embeddings in Qdrant for fast vector search.
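Before indexing, each triplet has to be flattened into text for the embedding model; a minimal sketch, assuming a simple verbalization template (the project's real template may differ):

```python
# Flatten a ⟨subject, predicate, object⟩ triplet into a single string for the
# embedding model; the verbalization template is an assumption.
def triplet_to_text(subject: str, predicate: str, obj: str) -> str:
    relation = predicate.replace("_", " ").lower()  # "OCCURRED_AT" -> "occurred at"
    return f"{subject} {relation} {obj}"

text = triplet_to_text("Viettel founding", "OCCURRED_AT", "1989-06-01")
```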
🤖 ReAct Agent Flow (Workflow)
TimeNet follows a ReAct-style agent design combining reasoning and tool-use.
- 🔍 Analysis Node
  - Analyzes the user query
  - Decides the next action (tool use, graph search, answer generation)
  - Reformulates sub-queries for optimization
- 🧠 Subgraph Retriever
  - Keyword Extraction: finds temporal + event-related terms
  - Vector Search: retrieves relevant subgraphs from Qdrant
  - Triplet Selection: selects the ~15 most relevant triplets using cosine similarity
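The triplet-selection step can be sketched in pure Python. Toy 3-d vectors stand in for the real embeddings, which would come back from Qdrant:

```python
import math

# Rank candidate triplets by cosine similarity to the query embedding and
# keep the top k (the retriever keeps ~15; k=2 here for the toy example).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_triplets(query_vec, candidates, k=15):
    """candidates: list of (triplet, vector) pairs."""
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [triplet for triplet, _ in ranked[:k]]

query = [1.0, 0.0, 0.0]
cands = [("t1", [0.9, 0.1, 0.0]), ("t2", [0.0, 1.0, 0.0]), ("t3", [0.7, 0.7, 0.0])]
best = top_k_triplets(query, cands, k=2)
```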
- 🛠️ Toolset
  - Web Search Tool: uses Tavily for missing or up-to-date info
  - Time Normalization Tools: convert various time formats (e.g., "next Friday", "last month") into standard Gregorian dates
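A time-normalization tool reduces to mapping a relative expression plus a reference date to a Gregorian range; a minimal sketch handling just two of the expression types mentioned above:

```python
from datetime import date, timedelta

# Map a relative time expression to a concrete (start, end) Gregorian range,
# given a reference "today". Only two expressions are handled for illustration.
def normalize_relative(expr: str, today: date):
    if expr == "last month":
        first_this = today.replace(day=1)
        last_prev = first_this - timedelta(days=1)   # final day of last month
        return (last_prev.replace(day=1), last_prev)
    if expr == "next Friday":
        days_ahead = (4 - today.weekday()) % 7 or 7  # Friday = weekday 4
        target = today + timedelta(days=days_ahead)
        return (target, target)
    raise ValueError(f"unhandled expression: {expr}")

start, end = normalize_relative("last month", date(2024, 3, 15))
next_fri, _ = normalize_relative("next Friday", date(2024, 3, 15))
```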
- ✅ Answer Node
  - Synthesizes reasoning results
  - Formats the final answer (timeline, events, durations)
📊 Temporal Embedding Training (In Progress)
TimeNet is currently developing specialized temporal embeddings to better capture time-related semantic relationships. The embedding training process is ongoing work and aims to improve the system's ability to understand and reason about temporal expressions.
- Base Model Selection
  - Starting with `intfloat/multilingual-e5-large` as our foundation model
  - Selected for its strong multilingual capabilities and performance on semantic similarity tasks
- Training Data Preparation
  - Using temporally-rich triplets from the knowledge graph (`<subject, predicate, object>`)
  - Processing pipeline: Triplets CSV → Data Loading → Negative Sample Generation → Training Example Creation
  - Implementing negative sampling strategies:
    - Wrong object for given subject-predicate pairs
    - Wrong predicate for given subject-object pairs
  - Creating query-answer pairs with temporal descriptions
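The two negative-sampling strategies can be sketched as a single corruption function (the 50/50 split between strategies is an assumption):

```python
import random

# Corrupt a gold triplet either by swapping in a wrong object (same
# subject-predicate) or a wrong predicate (same subject-object).
def corrupt(triplet, all_objects, all_predicates, rng):
    s, p, o = triplet
    if rng.random() < 0.5:
        wrong_o = rng.choice([x for x in all_objects if x != o])
        return (s, p, wrong_o)    # strategy 1: wrong object
    wrong_p = rng.choice([x for x in all_predicates if x != p])
    return (s, wrong_p, o)        # strategy 2: wrong predicate

rng = random.Random(0)  # seeded for reproducibility
gold = ("Viettel founding", "OCCURRED_AT", "1989-06-01")
neg = corrupt(gold, ["1989-06-01", "2004-10-15"], ["OCCURRED_AT", "PRECEDES"], rng)
```

The negative pair `(query, neg)` would then be labeled 0.0 for the contrastive loss described below.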
- Contrastive Learning Approach
  - Using cosine similarity loss function
  - Training with positive examples (labeled 1.0) and negative examples (labeled 0.0)
  - Employing `SentenceTransformer` with mean pooling for sequence representation
- Training Configuration
  - Hyperparameters:
    - Learning rate: 2e-5
    - Batch size: 16
    - Training epochs: 40
    - Warmup steps: 500
  - Training-validation split: 80%-20%
  - Using Weights & Biases for experiment tracking
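The warmup hyperparameter implies a schedule along these lines: linear warmup to the peak rate of 2e-5 over 500 steps. Whether the actual trainer decays the rate afterwards is not specified, so this sketch simply holds it constant:

```python
# Linear warmup to the peak learning rate; held constant afterwards in this
# sketch (the real trainer may apply a decay instead).
def warmup_lr(step: int, peak: float = 2e-5, warmup_steps: int = 500) -> float:
    if step < warmup_steps:
        return peak * step / warmup_steps
    return peak

lr_mid = warmup_lr(250)    # halfway through warmup
lr_peak = warmup_lr(500)   # warmup complete
```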
- Caching & Optimization
  - Implementing data preprocessing caching for faster iteration
  - Pre-computing training data and storing in pickle format
Note: This component is still under active development. Performance metrics and integration results will be reported in future updates.
- Data Volume:
  - 800+ annotated events from 2018–2025.
  - Stored in MongoDB and embedded via `text-embedding-003-small`.
- Question Categories: `Explicit`, `Implicit`, `Ordinal`, `Temporal Answer`, `Duration`, `Non-temporal`.
- Evaluation Set:
  - 180 curated questions from events + 30 general temporal questions.
| Metric | Description |
|---|---|
| LLM-as-Judge | GPT-4o scores (0–5) comparing predicted vs. reference answers |
| F1-Score | Event match quality (name similarity threshold 0.8 via embeddings) |
| Time IoU | Overlap between predicted and true event time ranges |
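The Time IoU metric can be computed directly from date ranges; a sketch assuming inclusive, day-granularity endpoints:

```python
from datetime import date

# Time IoU: intersection over union of the predicted and reference event
# date ranges, measured in days with inclusive endpoints (an assumption).
def time_iou(pred, gold):
    (ps, pe), (gs, ge) = pred, gold
    inter = (min(pe, ge) - max(ps, gs)).days + 1
    union = (max(pe, ge) - min(ps, gs)).days + 1
    return max(inter, 0) / union

iou = time_iou(
    (date(2024, 1, 1), date(2024, 1, 10)),   # predicted range
    (date(2024, 1, 6), date(2024, 1, 15)),   # reference range
)
```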
| Metric | Baseline (RAG) | TimeNet |
|---|---|---|
| LLM-as-Judge | 3.16 | 3.38 |
| F1-Score | 0.59 | 0.81 |
| Time IoU | 0.49 | 0.58 |
- ❌ Difficulty with broad time queries (e.g., "List all events in 2024")
- ⚠️ Noisy data can introduce inconsistencies in the graph
- 📉 Generic embeddings may miss subtle temporal cues
- 🔁 Train a time-aware embedding model for better temporal matching.
- 🧹 Add modules for data validation, de-duplication, and graph update optimization.

