M5 MacBook — LLM Inference Benchmarks
How fast is the M5 chip family for local LLM inference? M5 base and M5 Max community reference data are both live. M5 Max (32-core, 36 GB) hits 61.6 tok/s on Llama 3.1 8B — matching M2 Ultra.
M5 vs M4 vs M3 — side-by-side (early data)
Early community reference benchmarks from LocalScore. All runs use Q4_K - Medium quantization. These come from community submissions, not factory lab runs — treat them as directional until verified.
| Model | M5 (32 GB) | M4 (32 GB) | M3 (24 GB) |
|---|---|---|---|
| Llama 3.2 1B Instruct | 98.4 tok/s | 75.6 tok/s | 64.7 tok/s |
| Llama 3.1 8B Instruct | 22.3 tok/s | 16.8 tok/s | 10.2 tok/s |
| Qwen 2.5 14B Instruct | 11.5 tok/s | 8.6 tok/s | 6.1 tok/s |
M5 source: LocalScore accelerator/2912. M4/M3 sources: LocalScore community aggregation. Factory lab data will replace or supplement these rows.
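The generation-over-generation speedups implied by the table can be read off directly. A minimal sketch using the tok/s values above:

```python
# tok/s values copied from the comparison table above (Q4_K - Medium).
table = {
    "Llama 3.2 1B": {"M5": 98.4, "M4": 75.6, "M3": 64.7},
    "Llama 3.1 8B": {"M5": 22.3, "M4": 16.8, "M3": 10.2},
    "Qwen 2.5 14B": {"M5": 11.5, "M4": 8.6, "M3": 6.1},
}

for model, r in table.items():
    vs_m4 = (r["M5"] / r["M4"] - 1) * 100   # percent gain over M4
    vs_m3 = (r["M5"] / r["M3"] - 1) * 100   # percent gain over M3
    print(f"{model:<14} +{vs_m4:.0f}% vs M4, +{vs_m3:.0f}% vs M3")
```

On these rows, M5 lands roughly 30–34% ahead of M4 across model sizes, with a much larger gap to M3.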
Visual comparison — Llama 3.1 8B Instruct
Bars show tok/s relative to M5 (32 GB): M4 at ~75% and M3 at ~46% of the M5 figure. All Q4_K - Medium.
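The bar comparison can be rebuilt in text from the Llama 3.1 8B row of the table above — a minimal sketch:

```python
# Llama 3.1 8B Instruct, Q4_K - Medium tok/s from the comparison table.
results = {"M5 (32 GB)": 22.3, "M4 (32 GB)": 16.8, "M3 (24 GB)": 10.2}

baseline = results["M5 (32 GB)"]
for chip, tps in results.items():
    pct = tps / baseline * 100
    bar = "#" * round(pct / 2)   # one '#' per 2% of the M5 baseline
    print(f"{chip:<12} {bar:<50} {tps:.1f} tok/s ({pct:.0f}%)")
```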
What we know about M5 so far
Confirmed specs (M5 MacBook)
- 10-core CPU (4 performance + 6 efficiency cores)
- 10-core GPU (base configuration)
- Unified memory: 16 GB, 24 GB, or 32 GB
- Memory bandwidth: significantly higher than M4 base
- Neural Engine: 38 TOPS (claimed)
- Built on 3nm process (TSMC N3P)
LLM inference implications
- ~33% higher tok/s vs M4 base on Llama 3.1 8B (measured LocalScore data)
- 32 GB RAM tier enables 14B models at Q8 (13.4 GB) and some 32B at Q3
- Memory bandwidth is the primary driver for this gain
- Model fit is identical to M4 base at same RAM tier
- First-party benchmarks from the 128 GB factory lab unit will add 32B and potentially 70B models
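The model-fit points above follow from a standard rule of thumb: weight bytes ≈ parameters × bits-per-weight / 8, plus headroom for KV cache and runtime overhead. A rough sketch — the 20% overhead and the fraction of unified memory usable by the GPU are assumptions, not measured values:

```python
GB = 1024**3

def fits(params_b: float, bits: float, ram_gb: float, usable_frac: float = 0.75) -> bool:
    """Rough fit check: params_b is billions of parameters, bits is the
    effective bits per weight for the quantization level. usable_frac
    approximates how much unified memory macOS lets the GPU address."""
    weight_gb = params_b * 1e9 * bits / 8 / GB
    need_gb = weight_gb * 1.2  # assumed ~20% headroom for KV cache etc.
    return need_gb <= ram_gb * usable_frac

print(fits(14, 8, 32))    # 14B at Q8 on a 32 GB tier
print(fits(32, 3.5, 32))  # 32B at ~Q3 on a 32 GB tier
print(fits(70, 4.5, 32))  # 70B at ~Q4 needs far more than 32 GB
```

By this estimate a 14B model at 8 bits weighs in around 13 GB, consistent with the figure quoted above.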
Should you upgrade?
For the M5 base MacBook: probably not. The ~30–65% throughput gain is real, but the RAM ceiling stays at 32 GB, which limits you to the same model sizes as before. If your bottleneck is running larger models (32B+), M5 base doesn't change that. If you're running 8B–14B models and want faster responses, the upgrade is meaningful.
M5 Max is a different story. At 36 GB, M5 Max (32-core) hits 61.6 tok/s on Llama 3.1 8B — matching M2 Ultra and beating M3 Max 30-core by 64%. It's the highest-bandwidth option at 36 GB. See M5 Max vs M3 Max →
All M5 benchmark rows
Community reference rows from LocalScore. Factory lab rows (128 GB unit) coming soon. Sorted by avg tok/s.
| Chip | Model | Quant | Avg tok/s | Source |
|---|---|---|---|---|
| M5 Max (32-core GPU, 36 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 229.0 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.4 tok/s | ref |
| M5 (10-core GPU, 16 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.1 tok/s | ref |
| M5 Max (32-core GPU, 36 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 61.6 tok/s | ref |
| M5 Max (32-core GPU, 36 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 34.3 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 22.3 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 11.5 tok/s | ref |
Factory lab unit (128 GB) will add Qwen 3 32B, Llama 3.3 70B, and quantization ladder rows. Subscribe to be notified.
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export
Rows marked "ref" are community-submitted reference benchmarks. Factory lab rows are marked "factory lab" and are first-party verified.