UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Simulator Overview (figure)

Overview

The UCFE Benchmark provides a user-centric framework for evaluating the performance of large language models (LLMs) on complex financial tasks. The complete benchmark dataset is available in UCFE_bench.json.
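
If you want to inspect the dataset before running the simulator, it can be loaded with standard JSON tooling. The snippet below is only a minimal sketch: it assumes UCFE_bench.json sits at the repository root and contains a list (or mapping) of task records, and the printed field names depend on the actual schema shipped with the repository.

```python
import json

# Load the UCFE benchmark dataset (path relative to the repository root).
with open("UCFE_bench.json", "r", encoding="utf-8") as f:
    benchmark = json.load(f)

# Inspect the overall size and the fields of the first record.
# The exact field names depend on the dataset schema in this repository.
first = benchmark[0] if isinstance(benchmark, list) else next(iter(benchmark.values()))
print(f"Number of benchmark entries: {len(benchmark)}")
print(f"Fields in the first entry: {list(first.keys())}")
```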

How to Run the Simulator

Follow these steps to set up and run the simulator:

  1. Set your API key in the config folder (a hedged sketch of what this might look like follows the steps below).
  2. Run the simulator with the following command: python run_ckpt.py
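
The exact config file name and format are defined by the files in the config folder, so the following is only an illustrative sketch. It assumes a hypothetical JSON file such as config/api_config.json with an api_key field that the simulator would read before calling the LLM API; check the config folder for the actual names and schema.

```python
import json
from pathlib import Path

# Hypothetical config file; the real filename and schema live in the config/ folder.
CONFIG_PATH = Path("config") / "api_config.json"

def load_api_key(path: Path = CONFIG_PATH) -> str:
    """Read the API key from a JSON config file (illustrative only)."""
    with path.open("r", encoding="utf-8") as f:
        cfg = json.load(f)
    return cfg["api_key"]

if __name__ == "__main__":
    key = load_api_key()
    print("Loaded API key ending in:", key[-4:])
```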

How to Evaluate the Model

You can evaluate individual models or run evaluations for all models:

  1. Evaluate a single model: bash scripts/eval_model.sh
  2. Evaluate all models: bash scripts/eval_all.sh