LLM Models on Apple Silicon
Inference benchmark data for 15 models across 75 chip variants. Sorted by model size, largest first.
15 models benchmarked
75 chip variants covered
194 benchmark rows
0.6B – 235B parameter range
Llama 3.3
70B · 1 model in family · 1 benchmark row · 1 chip tested
Fastest: 7.1 tok/s (M4 Max, 24-core GPU)
Llama 2
7B–8B · 1 model in family · 5 benchmark rows · 5 chips tested
Fastest: 94.3 tok/s (M2 Ultra, 76-core GPU, 192 GB)
Llama 3.1
7B–8B · 1 model in family · 52 benchmark rows · 52 chips tested
Fastest: 63.3 tok/s (M3 Ultra, 80-core GPU, 256 GB)
Qwen 2.5
7B–8B · 3 models in family · 54 benchmark rows · 54 chips tested
Fastest: 49.7 tok/s (M4 Max, 128 GB)
Gemma 3
0.5B–4B · 2 models in family · 2 benchmark rows · 1 chip tested
Fastest: 100.5 tok/s (M4 Max, 128 GB)
Llama 3.2
0.5B–4B · 1 model in family · 63 benchmark rows · 63 chips tested
Fastest: 229.0 tok/s (M5 Max, 32-core GPU, 36 GB)
Qwen 3
0.5B–4B · 6 models in family · 17 benchmark rows · 3 chips tested
Fastest: 184.4 tok/s (M4 Max, 128 GB)
Data
benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
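To work with the CSV export programmatically, a typical query is "fastest chip per model", which is what the per-family cards above report. Below is a minimal sketch of that grouping using only the standard library. The column names (`model`, `chip`, `tokens_per_sec`) and the inline sample rows are assumptions for illustration; check the actual header of benchmarks.csv before adapting this.

```python
import csv
import io

# Inline sample standing in for benchmarks.csv; the schema here
# (model, chip, tokens_per_sec) is an assumed layout, not confirmed
# by the dataset itself.
sample = """model,chip,tokens_per_sec
Llama 3.3 70B,M4 Max (24-core GPU),7.1
Llama 3.2,M5 Max (32-core GPU),229.0
Llama 3.2,M4 Pro,150.2
"""

rows = csv.DictReader(io.StringIO(sample))

# Group rows by model, keeping the chip with the highest throughput.
fastest: dict[str, tuple[str, float]] = {}
for row in rows:
    tps = float(row["tokens_per_sec"])
    model = row["model"]
    if model not in fastest or tps > fastest[model][1]:
        fastest[model] = (row["chip"], tps)

print(fastest["Llama 3.2"])  # → ('M5 Max (32-core GPU)', 229.0)
```

For the real file, replace `io.StringIO(sample)` with `open("benchmarks.csv", newline="")` and adjust the column names to match the actual header.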