Question 1

What is Sarisari-Bench?

Accepted Answer

Sarisari-Bench is an AI agent benchmark that simulates managing a sari-sari store (small neighborhood store) in the Philippines. It evaluates how well AI models can make coherent business decisions over a 30-day period.

Question 2

How does the benchmark work?

Accepted Answer

Each AI model starts with 10,000 PHP and must manage inventory, handle customer demand, and make purchasing decisions over 30 simulated days. The primary metric is the final cash balance, measuring profitability and decision-making quality.
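As a rough illustration of the loop described above, here is a minimal sketch, not the benchmark's actual implementation. Only the ₱10,000 starting cash, the 30-day horizon, and the final-cash metric come from the description. The single product, its cost and price, the demand range, and the `restock_policy` function (a stand-in for the model's purchasing decision) are all hypothetical.

```python
import random
from dataclasses import dataclass

STARTING_CASH = 10_000.0  # PHP, from the benchmark description
NUM_DAYS = 30             # simulated days, from the benchmark description


@dataclass
class StoreState:
    cash: float = STARTING_CASH
    stock: int = 0  # units of a single hypothetical product


def restock_policy(state, day):
    """Hypothetical stand-in for the model's purchasing decision."""
    # Top the shelf back up to 40 units each morning.
    return max(0, 40 - state.stock)


def simulate(policy, unit_cost=8.0, unit_price=10.0, seed=0):
    """Run 30 days and return the final cash balance (the primary metric)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    state = StoreState()
    for day in range(1, NUM_DAYS + 1):
        # 1. Purchasing decision, capped by what the store can afford.
        order = min(policy(state, day), int(state.cash // unit_cost))
        state.cash -= order * unit_cost
        state.stock += order
        # 2. Customer demand (assumed uniform here); sales are limited by stock.
        demand = rng.randint(20, 50)
        sold = min(demand, state.stock)
        state.stock -= sold
        state.cash += sold * unit_price
    return state.cash


if __name__ == "__main__":
    print(f"Final cash: {simulate(restock_policy):,.2f} PHP")
```

A policy that never buys anything ends the run at exactly the starting balance, which gives a natural break-even baseline for comparing models.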

Question 3

Which AI models are supported?

Accepted Answer

Sarisari-Bench supports major API models (GPT-4o, Claude, Gemini) and local LLMs via Ollama and LM Studio, including Llama, Phi, CodeLlama, and Gemma models.

Question 4

What is a sari-sari store?

Accepted Answer

A sari-sari store is a small neighborhood convenience store commonly found in the Philippines. These stores sell everyday items such as snacks, drinks, canned goods, and household essentials in small quantities.

Question 5

How can I run the benchmark myself?

Accepted Answer

You can clone the repository from GitHub and use the provided Python scripts, such as run_benchmark.py, to test models locally with your own API keys or local LLM setup.

Rank	Model	Final Cash (₱)	Return (% of starting cash)	Profit (₱)
1	Gemini 2.5 Flash	₱12,871.00	128.7%	+₱2,871.00
2	Claude Sonnet 4	₱11,911.00	119.1%	+₱1,911.00
3	GPT-4o	₱11,377.00	113.8%	+₱1,377.00
4	GPT-4.1	₱11,310.00	113.1%	+₱1,310.00
5	GPT-4o Mini	₱11,067.00	110.7%	+₱1,067.00
6	Gemini 2.0 Flash	₱10,965.00	109.7%	+₱965.00
7	CodeLlama 7B	₱10,952.00	109.5%	+₱952.00
8	GPT-4.1 Mini	₱10,924.00	109.2%	+₱924.00
9	Grok 3 Mini	₱10,664.00	106.6%	+₱664.00
10	Claude 3.5 Haiku	₱10,000.00	100.0%	+₱0.00
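
The percentage column in the table above is consistent with final cash divided by the ₱10,000 starting capital, so 100% means break-even and the net gain is that figure minus 100 points. A small sketch of the arithmetic follows. The function name `summarize` is illustrative and not part of the benchmark's code.

```python
STARTING_CASH = 10_000.00  # PHP, the starting capital stated in the FAQ


def summarize(final_cash, starting_cash=STARTING_CASH):
    """Return (profit, final cash as % of start, net return %)."""
    profit = final_cash - starting_cash
    ratio_pct = final_cash / starting_cash * 100
    net_return_pct = profit / starting_cash * 100
    return profit, ratio_pct, net_return_pct


# Top row of the leaderboard: final cash of 12,871.00 PHP.
profit, ratio_pct, net_pct = summarize(12_871.00)
print(f"{profit:+,.2f} PHP, {ratio_pct:.1f}% of start, {net_pct:.1f}% net")
```

For the top-ranked row this reproduces the listed profit of +₱2,871.00 and 128.7%, which corresponds to a net gain of about 28.7% over the 30 days.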

Sarisari-Bench

[Charts: Return on Investment; Cash Balance Over Time]
