LLM Models on Apple Silicon
Inference benchmark data for 15 models across 75 chip variants. Sorted by model size, largest first.
15 models benchmarked
75 chip variants covered
194 benchmark rows
0.6B – 235B parameter range
Llama 3.3
70B · 1 model in family · 1 benchmark row · 1 chip tested
Fastest: 7.1 tok/s (M4 Max, 24-core GPU)
Llama 2
7B–8B · 1 model in family · 5 benchmark rows · 5 chips tested
Fastest: 94.3 tok/s (M2 Ultra, 76-core GPU, 192 GB)
Llama 3.1
7B–8B · 1 model in family · 52 benchmark rows · 52 chips tested
Fastest: 63.3 tok/s (M3 Ultra, 80-core GPU, 256 GB)
Qwen 2.5
7B–8B · 3 models in family · 54 benchmark rows · 54 chips tested
Fastest: 49.7 tok/s (M4 Max, 128 GB)
Gemma 3
0.5B–4B · 2 models in family · 2 benchmark rows · 1 chip tested
Fastest: 100.5 tok/s (M4 Max, 128 GB)
Llama 3.2
0.5B–4B · 1 model in family · 63 benchmark rows · 63 chips tested
Fastest: 229.0 tok/s (M5 Max, 32-core GPU, 36 GB)
Qwen 3
0.5B–4B · 6 models in family · 17 benchmark rows · 3 chips tested
Fastest: 184.4 tok/s (M4 Max, 128 GB)
Data
benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
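To work with the CSV export programmatically, a typical query is "fastest chip per model", which is what the per-family cards above report. Below is a minimal sketch of that grouping using only the standard library. The column names (`model`, `chip`, `tokens_per_sec`) and the inline sample rows are assumptions for illustration; check the actual header of benchmarks.csv before adapting this.

```python
import csv
import io

# Inline sample standing in for benchmarks.csv; the schema here
# (model, chip, tokens_per_sec) is an assumed layout, not confirmed
# by the dataset itself.
sample = """model,chip,tokens_per_sec
Llama 3.3 70B,M4 Max (24-core GPU),7.1
Llama 3.2,M5 Max (32-core GPU),229.0
Llama 3.2,M4 Pro,150.2
"""

rows = csv.DictReader(io.StringIO(sample))

# Group rows by model, keeping the chip with the highest throughput.
fastest: dict[str, tuple[str, float]] = {}
for row in rows:
    tps = float(row["tokens_per_sec"])
    model = row["model"]
    if model not in fastest or tps > fastest[model][1]:
        fastest[model] = (row["chip"], tps)

print(fastest["Llama 3.2"])  # → ('M5 Max (32-core GPU)', 229.0)
```

For the real file, replace `io.StringIO(sample)` with `open("benchmarks.csv", newline="")` and adjust the column names to match the actual header.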