← All benchmarks

M5 MacBook — LLM Inference Benchmarks

How fast is the M5 chip family for local LLM inference? M5 base and M5 Max community reference data are both live. M5 Max (32-core, 36 GB) hits 61.6 tok/s on Llama 3.1 8B — matching M2 Ultra.

Community benchmarks are live; factory lab runs are coming. M5 community data from LocalScore is in the dataset, and first-party factory lab runs (128 GB unit) will be published as verified rows. Subscribe via RSS to be notified.
  • ~33%: faster than M4 base on Llama 3.1 8B
  • 128 GB: lab unit unified memory
  • 7: published M5 rows (0 verified)
  • 32 GB: M5 base RAM ceiling (same as M4 base)

M5 vs M4 vs M3 — side-by-side (early data)

Early community reference benchmarks from LocalScore, all at Q4_K - Medium quantization. These are community submissions, not factory lab runs; treat them as directional until verified.

Model | M5 (32 GB) | M4 (32 GB) | M3 (24 GB) | M5 vs M4 gain
Llama 3.2 1B Instruct | 98.4 tok/s | 75.6 tok/s | 64.7 tok/s | +30%
Llama 3.1 8B Instruct | 22.3 tok/s | 16.8 tok/s | 10.2 tok/s | +33%
Qwen 2.5 14B Instruct | 11.5 tok/s | 8.6 tok/s | 6.1 tok/s | +34%

M5 source: LocalScore accelerator/2912. M4/M3 sources: LocalScore community aggregation. Factory lab data will replace or supplement these rows.
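The gain column can be recomputed directly from the tok/s figures above; a quick sanity check in Python:

```python
# Recompute M5-vs-M4 percentage gains from the table's tok/s values
# (community LocalScore numbers quoted above).
rows = [
    ("Llama 3.2 1B Instruct", 98.4, 75.6),
    ("Llama 3.1 8B Instruct", 22.3, 16.8),
    ("Qwen 2.5 14B Instruct", 11.5, 8.6),
]

for model, m5_tok_s, m4_tok_s in rows:
    gain_pct = (m5_tok_s / m4_tok_s - 1) * 100
    print(f"{model}: +{gain_pct:.0f}%")  # prints +30%, +33%, +34%
```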

Visual comparison — Llama 3.1 8B Instruct

Bar represents tok/s relative to M5 (32 GB). All Q4_K - Medium.

M5 (10-core GPU, 32 GB) 22.3 tok/s
M4 (10-core GPU, 32 GB) 16.8 tok/s
M4 (10-core GPU, 24 GB) 15.9 tok/s
M3 (10-core GPU, 16 GB) 13.5 tok/s
M3 (10-core GPU, 24 GB) 10.2 tok/s

What we know about M5 so far

Confirmed specs (M5 MacBook)

  • 10-core CPU (4 performance + 6 efficiency cores)
  • 10-core GPU (base configuration)
  • Unified memory: 16 GB, 24 GB, or 32 GB
  • Memory bandwidth: 153 GB/s (vs 120 GB/s on M4 base, ~28% higher)
  • Neural Engine: 38 TOPS (claimed)
  • Built on 3nm process (TSMC N3P)
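Generation speed on Apple silicon is largely bandwidth-bound: each generated token streams the full set of model weights from unified memory. A rough ceiling can be sketched from the specs, assuming the ~153 GB/s M5 base bandwidth figure and a ~4.9 GB weight file for Llama 3.1 8B at Q4_K - Medium (both assumptions, not lab-verified):

```python
# Rough bandwidth-bound upper limit on generation tok/s.
# Assumes each token reads all model weights once from unified memory;
# real throughput is lower due to KV-cache reads, compute, and overhead.
bandwidth_gb_s = 153.0   # assumed M5 base unified memory bandwidth
model_size_gb = 4.9      # approx. Llama 3.1 8B at Q4_K - Medium (assumption)

ceiling_tok_s = bandwidth_gb_s / model_size_gb
print(f"theoretical ceiling: {ceiling_tok_s:.1f} tok/s")  # ~31.2 tok/s
```

The measured 22.3 tok/s is roughly 70% of that ceiling, a plausible fraction for real-world inference runtimes.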

LLM inference implications

  • ~33% higher tok/s vs M4 base on Llama 3.1 8B (measured LocalScore data)
  • 32 GB RAM tier enables 14B models at Q8 (13.4 GB) and some 32B at Q3
  • Memory bandwidth is the primary driver for this gain
  • Model fit is identical to M4 base at same RAM tier
  • First-party benchmarks on the 128 GB lab unit will add 32B and potentially 70B models
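Whether a given model fits a RAM tier is simple arithmetic; a sketch, using approximate bits-per-weight for common quants and a rough usable-memory fraction (both are rule-of-thumb assumptions, not exact figures):

```python
# Rough check: does a quantized model fit in a unified-memory tier?
# bits_per_weight values are approximate for common GGUF-style quants;
# the 0.75 usable fraction (OS + runtime + KV-cache headroom) is a guess.
def fits(params_b: float, bits_per_weight: float, ram_gb: float) -> bool:
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb <= ram_gb * 0.75

print(fits(14, 8.5, 32))  # 14B at ~Q8 in 32 GB -> True
print(fits(32, 3.5, 32))  # 32B at ~Q3 in 32 GB -> True
print(fits(70, 4.5, 32))  # 70B at ~Q4 in 32 GB -> False
```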

Should M3/M4 MacBook owners upgrade for LLM work?

For the M5 base MacBook: probably not. The ~30–34% generation throughput gain is real, but the RAM ceiling stays at 32 GB, which limits you to the same model sizes as before. If your bottleneck is running larger models (32B+), M5 base doesn't change that. If you're running 8B–14B models and want faster responses, the upgrade is meaningful.

M5 Max is a different story. At 36 GB, M5 Max (32-core) hits 61.6 tok/s on Llama 3.1 8B — matching M2 Ultra and beating M3 Max 30-core by 64%. It's the highest-bandwidth option at 36 GB. See M5 Max vs M3 Max →

All M5 benchmark rows

Community reference rows from LocalScore. Factory lab rows (128 GB unit) coming soon. Sorted by avg tok/s.

Chip | Model | Quant | Prompt tok/s | Avg tok/s | TTFT | Source
M5 Max (32-core GPU, 36 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 3620.7 | 229.0 | 0.3 s | ref
M5 (10-core GPU, 32 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 1271.5 | 98.4 | 1.0 s | ref
M5 (10-core GPU, 16 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 1244.9 | 98.1 | 1.0 s | ref
M5 Max (32-core GPU, 36 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 630.3 | 61.6 | 1.9 s | ref
M5 Max (32-core GPU, 36 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 343.2 | 34.3 | 3.6 s | ref
M5 (10-core GPU, 32 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 210.7 | 22.3 | 6.0 s | ref
M5 (10-core GPU, 32 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 110.4 | 11.5 | 11.6 s | ref

Factory lab unit (128 GB) will add Qwen 3 32B, Llama 3.3 70B, and quantization ladder rows. Subscribe to be notified.

Related pages

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export

Rows marked "ref" are community-submitted reference benchmarks. Factory lab rows are marked "factory lab" and are first-party verified.
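The exports above can be consumed programmatically. A minimal sketch of working with the CSV; the column names here are assumptions shaped like the table headers, not the export's documented schema, so check the actual file:

```python
import csv
import io

# Sample shaped like the M5 rows above; real data comes from benchmarks.csv.
# Column names (chip, model, quant, prompt_tok_s, avg_tok_s, ttft_s) are
# assumptions, not the export's documented schema.
sample = '''chip,model,quant,prompt_tok_s,avg_tok_s,ttft_s
"M5 Max (32-core GPU, 36 GB)",Llama 3.1 8B Instruct,Q4_K - Medium,630.3,61.6,1.9
"M5 (10-core GPU, 32 GB)",Llama 3.1 8B Instruct,Q4_K - Medium,210.7,22.3,6.0
'''

rows = list(csv.DictReader(io.StringIO(sample)))
fastest = max(rows, key=lambda r: float(r["avg_tok_s"]))
print(fastest["chip"], fastest["avg_tok_s"])  # the M5 Max row, 61.6
```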