WiserUI-Bench

Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding

Authors: Jaehyun Jeon, Min Soo Kim, Jang Han Yoon, Sumin Shim, Yejin Choi, Hanbin Kim, Dae Hyun Kim, Youngjae Yu

We introduce WiserUI-Bench, a benchmark for evaluating MLLMs’ understanding of user behavior-oriented UI/UX design using real-world A/B-tested interfaces and expert-curated key interpretations. Results show current models struggle with nuanced reasoning about UI/UX design and its behavioral impact. For further details, please check out our paper.

Inference

We provide an inference framework for two tasks on WiserUI-Bench: (1) UI/UX design selection and (2) UI/UX design interpretation.

pip install -r requirements.txt
cd inference
bash execute.sh

In execute.sh:
- Set the model, method, task number, and GPU count to match your needs.
- Set your OpenAI/Claude API key if needed.
All the open-source models we used are supported by vLLM.
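
Before launching a run with a proprietary model, it can help to confirm that the relevant API key is available. The snippet below is a minimal sketch, assuming the keys are supplied through the OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables (the default names read by the official OpenAI and Anthropic Python SDKs). execute.sh may pass keys differently, and the `missing_api_keys` helper and the provider names are illustrative only.

```python
import os

# Assumed mapping from provider to environment variable. These are the
# default names read by the official SDKs, not necessarily by execute.sh.
KEY_ENV_VARS = {"openai": "OPENAI_API_KEY", "claude": "ANTHROPIC_API_KEY"}


def missing_api_keys(providers, environ=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    return [KEY_ENV_VARS[p] for p in providers if not env.get(KEY_ENV_VARS[p])]


if __name__ == "__main__":
    missing = missing_api_keys(["openai", "claude"])
    if missing:
        print("Missing API keys:", ", ".join(missing))
```

Open-source models served locally would not need this check.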

Code Structure

inference/
├── prompts_task1/  # Prompts for Task 1 (selection)
├── prompts_task2/  # Prompts for Task 2 (interpretation)
├── task.py         # entry-point on WiserUI-Bench
├── methods.py      # handling prompting methods
└── VLM.py          # model inference wrapper

You can also use custom prompts by placing them in the corresponding prompts folder.
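
One way a custom prompt could be looked up is sketched below. It assumes each prompt is a plain-text file inside the task's prompts folder and is selected by method name. The `.txt` extension, the `load_prompt` helper, and the method-to-file mapping are assumptions for illustration, so check methods.py for the actual lookup.

```python
from pathlib import Path


def load_prompt(inference_dir, task, method):
    """Read the prompt text for a task (1 or 2) and a prompting method.

    Hypothetical layout: <inference_dir>/prompts_task<task>/<method>.txt
    """
    if task not in (1, 2):
        raise ValueError("task must be 1 (selection) or 2 (interpretation)")
    path = Path(inference_dir) / f"prompts_task{task}" / f"{method}.txt"
    if not path.is_file():
        raise FileNotFoundError(f"no prompt file at {path}")
    return path.read_text(encoding="utf-8")
```

Under these assumptions, a file saved as inference/prompts_task1/my_method.txt would be read with `load_prompt("inference", 1, "my_method")`.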

Supported Models

We currently support inference with the following models:

Proprietary: o1, GPT-4o, Claude 3.5 Sonnet
Open-source: Qwen-2.5-VL (7B, 32B), InternVL-2.5 (8B, 38B), LLaVA-NeXT 7B, LLaVA-OneVision 7B

You can also use your own models by modifying the provided code.
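
As a rough sketch of what adding a model might involve, the example below defines a minimal wrapper interface and a name-based registry. Every name here (`BaseVLM`, `generate`, `register_model`, `MODEL_REGISTRY`, `EchoVLM`) is hypothetical and does not reflect the actual contents of VLM.py; adapt the idea to the structure you find there.

```python
from abc import ABC, abstractmethod

MODEL_REGISTRY = {}  # hypothetical: maps a model name to its wrapper class


def register_model(name):
    """Class decorator that records a wrapper under the given model name."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


class BaseVLM(ABC):
    """Hypothetical interface: one prompt plus UI screenshots in, text out."""

    @abstractmethod
    def generate(self, prompt, image_paths):
        ...


@register_model("echo")
class EchoVLM(BaseVLM):
    """Placeholder that calls no real backend; swap in your own inference."""

    def generate(self, prompt, image_paths):
        return f"[{len(image_paths)} image(s)] {prompt}"
```

A real wrapper would replace the body of `generate` with a call to your model, taking the two A/B-tested screenshots as `image_paths`.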

TODO

[] Release evaluation code
[] Release mechanism supporting custom A/B-tested UI datasets

Citation

If you find our project useful, please cite:

@misc{jeon2026mllmscaptureinterfacesguide,
      title={Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding}, 
      author={Jaehyun Jeon and Min Soo Kim and Jang Han Yoon and Sumin Shim and Yejin Choi and Hanbin Kim and Dae Hyun Kim and Youngjae Yu},
      year={2026},
      eprint={2505.05026},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.05026}, 
}
