Repositories list
10 repositories
- General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
- BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
- CCPO (Public). Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
- FinMTM (Public). FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
- BizFinBench (Public). A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
- PuzzleClone (Public). PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data
- [MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
- NEXUS-O (Public). [MM 2025] NEXUS-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
- PolyhedronEvaluator (Public)
- Published_Papers (Public)