GitHub - Duguce/GuessArena-Demo: A web-based interactive demo for the GuessArena evaluation framework

GuessArena Demo

A web-based interactive demo for the GuessArena evaluation framework

Note

GuessArena Demo is a lightweight web application that simulates a card-guessing game with both player interaction and AI-versus-AI simulation.
It provides an intuitive, hands-on interface to explore the evaluation methodology introduced in our paper:
“GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning”.

🚀 Features

Player Mode – Interactively play the guessing game by asking yes/no questions to an AI judge
AI Simulation Mode – Observe two LLMs engaging in a self-play guessing process
Leaderboard Tracking – Compare model performance across different domains and settings
Customizable Decks – Use built-in card sets or define your own domain-specific decks
Domain-Specific Scenarios – Evaluate reasoning in different industries and knowledge areas
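
To make the game mechanics concrete, here is a minimal sketch of a single guessing round. It assumes each card carries a set of attribute tags and uses a rule-based function in place of the AI judge. The names `Card`, `judge_answer`, and `play_round` are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Card:
    """A hypothetical card: a name plus a set of attribute tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def judge_answer(secret: Card, tag: str) -> bool:
    """Stand-in for the AI judge: answer one yes/no question about a tag."""
    return tag in secret.tags


def play_round(deck, secret, questions):
    """Ask yes/no questions, pruning candidates until one card remains.

    Returns (guessed_card_or_None, number_of_questions_asked).
    """
    candidates = list(deck)
    asked = 0
    for tag in questions:
        if len(candidates) <= 1:
            break
        answer = judge_answer(secret, tag)
        asked += 1
        # Keep only the cards consistent with the judge's answer.
        candidates = [c for c in candidates if (tag in c.tags) == answer]
    guess = candidates[0] if len(candidates) == 1 else None
    return guess, asked


# Illustrative deck; the card names and tags are made up for this example.
deck = [
    Card("GPU", frozenset({"hardware", "compute"})),
    Card("CPU", frozenset({"hardware"})),
    Card("Python", frozenset({"software"})),
]
guess, asked = play_round(deck, deck[0], ["hardware", "compute"])
print(guess.name, asked)
```

In the demo itself, an LLM plays the judge and, in AI Simulation Mode, also asks the questions. The sketch only illustrates how each yes/no answer narrows the candidate set.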

📦 Requirements

Python 3.8+
Flask
OpenAI API access

🛠 Installation

Clone the repository:

```
git clone git@github.com:Duguce/GuessArena-Demo.git
cd GuessArena-Demo
```

Create and activate a conda environment:

```
conda create -n guessarena python=3.10
conda activate guessarena
```

Install dependencies:
```
pip install -r requirements.txt
```
Configure your API settings in config/settings.json

🌅 Usage

Set up your API keys in config/models.ini for the AI models you want to use.
Start the application:
```
python app.py
```
Open your browser and go to http://localhost:8888
Choose between Player Mode or AI Simulation
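
For the API-key step above, the layout of `config/models.ini` is not documented here. The following sketch assumes one INI section per model with `api_key` and `base_url` entries. The section name, the key names, and the environment-variable override are assumptions made for illustration, so check the file shipped in `/config` for the actual format.

```python
import configparser
import os

# Hypothetical file content; the real models.ini may use different keys.
EXAMPLE_INI = """
[gpt-4o]
api_key = sk-your-key-here
base_url = https://api.openai.com/v1
"""


def load_model_settings(ini_text: str) -> dict:
    """Parse INI text into {section: {key: value}}."""
    # interpolation=None keeps literal '%' characters in keys or URLs intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    models = {}
    for section in parser.sections():
        settings = dict(parser[section])
        # Assumed convention: GPT_4O_API_KEY in the environment overrides the file.
        env_name = section.upper().replace("-", "_") + "_API_KEY"
        settings["api_key"] = os.environ.get(env_name, settings.get("api_key", ""))
        models[section] = settings
    return models


models = load_model_settings(EXAMPLE_INI)
print(sorted(models))
```

Reading the key from an environment variable first is one way to keep real credentials out of the repository while still keeping a template file under version control.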

📁 Project Structure

/config - Configuration files and model settings
/data - Leaderboard data, logs, and card decks
/prompts - Prompt templates for AI models
/static - Static assets (CSS, JavaScript)
/templates - HTML templates

🔒 Security

The application includes several security features:

Content Security Policy
Rate limiting for API endpoints
Path traversal prevention
Secure file access
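
As an illustration of two of the protections listed above, here is a minimal sketch of path traversal prevention and per-client rate limiting. It shows the general techniques only, not the code in `app.py`. The names `safe_resolve` and `SlidingWindowLimiter`, along with the example paths and limits, are hypothetical.

```python
import time
from collections import defaultdict, deque
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when target is not inside base.
        target.relative_to(base)
    except ValueError:
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return target


class SlidingWindowLimiter:
    """Allow at most max_calls per client within a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_calls:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60)
print(limiter.allow("127.0.0.1", now=0.0))
```

In a Flask application, checks like these would typically run before a request reaches the route logic, for example in a `before_request` handler.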

📖 Citation

@inproceedings{
    GuessArena,
    title = "{G}uess{A}rena: Guess Who {I} Am? A Self-Adaptive Framework for Evaluating {LLM}s in Domain-Specific Knowledge and Reasoning",
    author = "Yu, Qingchen  and
      Zheng, Zifan  and
      Chen, Ding  and
      Niu, Simin  and
      Tang, Bo  and
      Xiong, Feiyu  and
      Li, Zhiyu",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.534/",
    doi = "10.18653/v1/2025.acl-long.534",
    pages = "10897--10912",
    ISBN = "979-8-89176-251-0",
}
