Benchmarking AI Models on Mac M3 Max, Windows Desktop, and ASUS ROG SCAR: Battle of GPUs

Introduction

With AI models becoming more advanced, the hardware powering them plays a crucial role in delivering efficient performance. In this article, I set out to compare three powerful machines—Mac M3 Max, ASUS ROG SCAR, and Windows Desktop—using three AI models: Llama3.2:3B, Qwen2.5:7B, and Qwen2.5:14B. The primary goal is to highlight how NVIDIA GPUs, particularly the 4090, stack up against other configurations in terms of latency, throughput, and request handling capabilities.

Machines Tested

Mac M3 Max: 40-core GPU.
ASUS ROG SCAR (Windows Laptop): NVIDIA 4090 GPU, Intel i9-14900k, 32GB RAM.
Windows Desktop: NVIDIA 3060ti GPU, Intel i7-13700k, 64GB RAM.

Test Parameters

The benchmarking tool Bombardier was used with the following configurations:

Single Request Test: 1 connection, 100 requests.
Multi-Request Test: 10 connections, 100 requests.

Each test consisted of POST requests sent to a local AI inference endpoint, measuring performance across various dimensions.

Benchmark Results

Single Request Test: Llama3.2:3B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.32	3.16s	11.05	5m15s
Windows Desktop	0.38	2.44s	14.49	4m3s
ASUS ROG SCAR	0.46	2.01s	17.46	3m20s

Single Request Test: Qwen2.5:7B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.43	3.14s	6.76	5m14s
Windows Desktop	0.36	2.64s	8.12	4m23s
ASUS ROG SCAR	0.51	2.12s	10.24	3m32s

Single Request Test: Qwen2.5:14B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.16	6.45s	3.55	10m45s
ASUS ROG SCAR	0.23	4.35s	5.45	7m15s

Multi-Request Test: Llama3.2:3B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.48	21.74s	15.70	3m45s
Windows Desktop	0.92	11.42s	29.97	1m59s
ASUS ROG SCAR	1.20	8.30s	24.72	1m26s

Multi-Request Test: Qwen2.5:7B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.41	23.66s	8.68	4m6s
Windows Desktop	0.86	11.92s	17.49	2m3s
ASUS ROG SCAR	1.20	8.30s	24.72	1m26s

Multi-Request Test: Qwen2.5:14B

Machine	Avg Req/sec	Latency (Avg)	Throughput (KB/s)	Completion Time
Mac M3 Max	0.20	49.31s	4.58	8m33s
ASUS ROG SCAR	0.60	15.70s	13.96	2m43s

Graphical Representation

1. Average Latency by Machine and Model

2. Throughput by Machine and Model

3. Average Requests Per Second by Machine and Model

4. Completion Time by Machine and Model

Key Insights

ASUS ROG SCAR consistently outperformed other machines, demonstrating the power of the NVIDIA 4090 GPU and Intel i9-14900k CPU for AI inference workloads.
Windows Desktop provided solid performance but lagged slightly behind the ASUS ROG SCAR in concurrent tests. I included it in the tests just to create a stable baseline.
Mac M3 Max struggled in multi-request tests, with higher latency and lower throughput, highlighting optimization gaps for specific AI tasks.
The Qwen2.5:14B model showed significant performance drops across all machines, highlighting the increased demands of larger models.

Conclusion

This benchmarking analysis showcases the strengths and limitations of each machine under AI inference workloads. The ASUS ROG SCAR emerges as the best performer, while the Mac M3 Max shows potential in single-request scenarios. These results can guide hardware selection for AI developers seeking optimal performance.