M4 Max — LLM Benchmarks

Measured LLM inference benchmarks for the M4 Max. The dataset covers one RAM configuration (128 GB), with 8 benchmark rows across 8 models, so it cannot show how RAM affects throughput. All figures come from real runs, not estimates.

8 benchmark rows

8 models tested

1 RAM configuration

184.4 tok/s fastest average

RAM configurations

Configurations differ only in the amount of unified memory. More RAM lets larger models fit; at the same model size, throughput is similar across RAM tiers.

M4 Max (128 GB) · 8 rows · 8 models · 184.4 tok/s peak

All benchmark rows — M4 Max

Sorted by avg tok/s descending.

Chip (RAM)	Model	Quant	RAM req.	Avg tok/s	Prompt tok/s	Runtime	Source
M4 Max (128 GB)	Qwen 3 0.6B	Q8_0	—	184.4 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Gemma 3 4B	Q4_0	—	100.5 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Qwen 3 30B A3B	Q4_K_M	—	70.2 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Qwen 3 8B	Q4_K_M	—	63.1 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Qwen 2.5 7B Instruct	Q8_0	—	49.7 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Gemma 3 27B	Q8_0	—	14.5 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Qwen 3 32B	Q4_K_M	—	11.7 tok/s	—	LM Studio	ref
M4 Max (128 GB)	Qwen 3 235B A22B	Q4_K_M	—	8.1 tok/s	—	LM Studio	ref
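
The summary figures at the top of the page (row count, model count, peak tok/s) and the sort order can be rechecked from the table itself. The following is a minimal Python sketch that assumes the rows have been copied by hand into a list; the names ROWS, parse_tok_s, and summarize are illustrative and are not part of the published dataset or its files.

```python
# Rows copied from the table above: (model, quant, avg tok/s cell).
# This hand-copied list is an assumption for illustration; it is not
# the schema of benchmarks.json or benchmarks.csv.
ROWS = [
    ("Qwen 3 0.6B", "Q8_0", "184.4 tok/s"),
    ("Gemma 3 4B", "Q4_0", "100.5 tok/s"),
    ("Qwen 3 30B A3B", "Q4_K_M", "70.2 tok/s"),
    ("Qwen 3 8B", "Q4_K_M", "63.1 tok/s"),
    ("Qwen 2.5 7B Instruct", "Q8_0", "49.7 tok/s"),
    ("Gemma 3 27B", "Q8_0", "14.5 tok/s"),
    ("Qwen 3 32B", "Q4_K_M", "11.7 tok/s"),
    ("Qwen 3 235B A22B", "Q4_K_M", "8.1 tok/s"),
]


def parse_tok_s(cell):
    """Convert a cell such as '184.4 tok/s' to the float 184.4."""
    return float(cell.split()[0])


def summarize(rows):
    """Recompute the headline figures from the table rows."""
    speeds = [parse_tok_s(cell) for _, _, cell in rows]
    return {
        "rows": len(rows),
        "models": len({model for model, _, _ in rows}),
        "peak_tok_s": max(speeds),
        # True when the rows are already in descending order of speed.
        "sorted_desc": speeds == sorted(speeds, reverse=True),
    }


summary = summarize(ROWS)
print(summary)
```

Splitting the cell on whitespace strips the repeated "tok/s" unit, which would otherwise prevent the values from being compared numerically.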

Models tested on M4 Max

Gemma 3 27B Gemma 3 4B Qwen 2.5 7B Instruct Qwen 3 0.6B Qwen 3 235B A22B Qwen 3 30B A3B Qwen 3 32B Qwen 3 8B

Data

benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export

Data sourced from factory lab measurements and community reference runs.