
LLM Models on Apple Silicon

Inference benchmark data for 15 models across 75 chip variants. Sorted by model size, largest first.

15 models benchmarked
75 chip variants covered
194 benchmark rows
0.6B – 235B parameter range

Llama 3.3

70B

1 model in family · 1 benchmark row · 1 chip tested

7.1 tok/s fastest (M4 Max (24-core GPU))

Llama 2

7B-8B

1 model in family · 5 benchmark rows · 5 chips tested

94.3 tok/s fastest (M2 Ultra (76-core GPU, 192 GB))

Llama 3.1

7B-8B

1 model in family · 52 benchmark rows · 52 chips tested

63.3 tok/s fastest (M3 Ultra (80-core GPU, 256 GB))

Qwen 2.5

7B-8B

3 models in family · 54 benchmark rows · 54 chips tested

49.7 tok/s fastest (M4 Max (128 GB))

Gemma 3

0.5B-4B

2 models in family · 2 benchmark rows · 1 chip tested

100.5 tok/s fastest (M4 Max (128 GB))

Llama 3.2

0.5B-4B

1 model in family · 63 benchmark rows · 63 chips tested

229.0 tok/s fastest (M5 Max (32-core GPU, 36 GB))

Qwen 3

0.5B-4B

6 models in family · 17 benchmark rows · 3 chips tested

184.4 tok/s fastest (M4 Max (128 GB))

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
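A common use of the CSV export is finding the fastest chip for each model, as the cards above do. Here is a minimal sketch; the column names (`model`, `chip`, `tok_per_s`) and the inline sample rows are assumptions for illustration, not the dataset's documented schema.

```python
import csv
import io

# Illustrative rows mirroring an assumed benchmarks.csv layout.
# Column names and the 190.2 figure are hypothetical.
SAMPLE = """model,chip,tok_per_s
Llama 3.2 3B,M5 Max (32-core GPU),229.0
Llama 3.2 3B,M4 Max (40-core GPU),190.2
Llama 3.1 8B,M3 Ultra (80-core GPU),63.3
"""

def fastest_per_model(csv_text):
    """Return {model: (chip, tok/s)} keeping only the fastest chip per model."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        speed = float(row["tok_per_s"])
        if row["model"] not in best or speed > best[row["model"]][1]:
            best[row["model"]] = (row["chip"], speed)
    return best

print(fastest_per_model(SAMPLE))
```

With the real file, replace `SAMPLE` with `open("benchmarks.csv").read()` after checking the actual header row.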

See full benchmark table →