Benchmarking AI Models on Mac M3 Max, Windows Desktop, and ASUS ROG SCAR: Battle of GPUs


Introduction

As AI models grow larger, the hardware running them largely determines how fast they respond. In this article, I compare three machines (a Mac M3 Max, an ASUS ROG SCAR laptop, and a Windows Desktop) using three AI models: Llama3.2:3B, Qwen2.5:7B, and Qwen2.5:14B. The primary goal is to show how NVIDIA GPUs, particularly the RTX 4090, stack up against the other configurations in terms of latency, throughput, and requests handled per second.

Machines Tested

  1. Mac M3 Max: 40-core GPU.
  2. ASUS ROG SCAR (Windows Laptop): NVIDIA RTX 4090 GPU, Intel i9-14900K, 32GB RAM.
  3. Windows Desktop: NVIDIA RTX 3060 Ti GPU, Intel i7-13700K, 64GB RAM.

Test Parameters

The benchmarking tool Bombardier was used with the following configurations:

  • Single Request Test: 1 connection, 100 requests.
  • Multi-Request Test: 10 connections, 100 requests.

Each test consisted of POST requests sent to a local AI inference endpoint. For every run I recorded average requests per second, average latency, throughput in KB/s, and total completion time.
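For readers who want to reproduce the two test shapes without Bombardier, here is a rough Python sketch of the same idea: a fixed number of POST requests spread over a fixed number of concurrent connections. It is not the tool used for the numbers in this article, and the function names are my own. The endpoint URL and request body are not given in this article, so they are left to the caller.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_json(url, payload, timeout=120.0):
    """Send one POST request with a JSON body and return the response bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def run_benchmark(send, connections, total_requests):
    """Call send() total_requests times using `connections` worker threads.

    Returns average latency, requests per second, and total wall-clock time.
    """
    def timed_call(_):
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=connections) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "avg_latency_s": sum(latencies) / len(latencies),
        "req_per_sec": len(latencies) / wall_time,
        "completion_s": wall_time,
    }
```

The single request test would correspond to run_benchmark(send, connections=1, total_requests=100) and the multi-request test to connections=10, where send is a small function that calls post_json with your own endpoint and prompt.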


Benchmark Results

Single Request Test: Llama3.2:3B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.32        | 3.16s         | 11.05             | 5m15s
Windows Desktop | 0.38        | 2.44s         | 14.49             | 4m3s
ASUS ROG SCAR   | 0.46        | 2.01s         | 17.46             | 3m20s

Single Request Test: Qwen2.5:7B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.43        | 3.14s         | 6.76              | 5m14s
Windows Desktop | 0.36        | 2.64s         | 8.12              | 4m23s
ASUS ROG SCAR   | 0.51        | 2.12s         | 10.24             | 3m32s

Single Request Test: Qwen2.5:14B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.16        | 6.45s         | 3.55              | 10m45s
ASUS ROG SCAR   | 0.23        | 4.35s         | 5.45              | 7m15s
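The columns can be cross-checked against each other. Every run sends 100 requests, so the overall request rate should be roughly 100 divided by the completion time. The sketch below does that for the Llama3.2:3B single-request rows; the helper names are my own, and the result is only an approximation of the reported figure, since as far as I understand Bombardier averages sampled rates rather than dividing the total by the elapsed time.

```python
import re


def parse_duration(text):
    """Convert a duration such as '5m15s' or '45s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


def implied_req_per_sec(completion_time, total_requests=100):
    """Requests per second implied by the total completion time."""
    return total_requests / parse_duration(completion_time)


# Completion times from the Llama3.2:3B single-request table.
llama_single = {
    "Mac M3 Max": "5m15s",
    "Windows Desktop": "4m3s",
    "ASUS ROG SCAR": "3m20s",
}

for machine, completion in llama_single.items():
    print(f"{machine}: {implied_req_per_sec(completion):.2f} req/s")
```

For these three rows the implied rates come out at roughly 0.32, 0.41, and 0.50 req/s, against reported averages of 0.32, 0.38, and 0.46.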

Multi-Request Test: Llama3.2:3B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.48        | 21.74s        | 15.70             | 3m45s
Windows Desktop | 0.92        | 11.42s        | 29.97             | 1m59s
ASUS ROG SCAR   | 1.20        | 8.30s         | 24.72             | 1m26s

Multi-Request Test: Qwen2.5:7B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.41        | 23.66s        | 8.68              | 4m6s
Windows Desktop | 0.86        | 11.92s        | 17.49             | 2m3s
ASUS ROG SCAR   | 1.20        | 8.30s         | 24.72             | 1m26s

Multi-Request Test: Qwen2.5:14B

Machine         | Avg Req/sec | Latency (Avg) | Throughput (KB/s) | Completion Time
Mac M3 Max      | 0.20        | 49.31s        | 4.58              | 8m33s
ASUS ROG SCAR   | 0.60        | 15.70s        | 13.96             | 2m43s
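In the multi-request tests, latency and request rate are tied together. With 10 connections kept busy, Little's law says the request rate should be close to 10 divided by the average latency. The sketch below applies that to a few rows from the multi-request tables; it assumes all 10 connections always have a request in flight, which is my reading of the test setup rather than something measured here.

```python
def expected_req_per_sec(connections, avg_latency_s):
    """Little's law: throughput = requests in flight / average latency."""
    return connections / avg_latency_s


# (average latency in seconds, reported req/sec) from the multi-request tables.
multi_request = {
    ("Llama3.2:3B", "Mac M3 Max"): (21.74, 0.48),
    ("Llama3.2:3B", "Windows Desktop"): (11.42, 0.92),
    ("Llama3.2:3B", "ASUS ROG SCAR"): (8.30, 1.20),
    ("Qwen2.5:14B", "Mac M3 Max"): (49.31, 0.20),
    ("Qwen2.5:14B", "ASUS ROG SCAR"): (15.70, 0.60),
}

for (model, machine), (latency, reported) in multi_request.items():
    estimate = expected_req_per_sec(10, latency)
    print(f"{model} on {machine}: estimated {estimate:.2f}, reported {reported:.2f}")
```

The estimates land close to the reported rates, which suggests the high multi-request latencies are mostly time spent queued behind other requests, not a sign that each individual request became slower to compute.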

Graphical Representation

  1. [Chart: Average Latency by Machine and Model]
  2. [Chart: Throughput by Machine and Model]
  3. [Chart: Average Requests Per Second by Machine and Model]
  4. [Chart: Completion Time by Machine and Model]


Key Insights

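  1. ASUS ROG SCAR had the lowest average latency and the shortest completion time in every test, demonstrating the power of the NVIDIA RTX 4090 GPU and Intel i9-14900K CPU for AI inference workloads.
  2. Windows Desktop placed between the other two machines on latency in every test it ran, trailing the ASUS ROG SCAR by roughly 3 to 4 seconds of average latency in the concurrent tests. I included it mainly as a stable baseline.
  3. Mac M3 Max struggled in multi-request tests, with average latency of 21.74s to 49.31s against 8.30s to 15.70s on the ASUS ROG SCAR, and lower throughput. This may point to optimization gaps for concurrent inference on this platform.
  4. The Qwen2.5:14B model was markedly slower than Qwen2.5:7B on both machines that ran it: single-request latency roughly doubled (3.14s to 6.45s on the Mac M3 Max, 2.12s to 4.35s on the ASUS ROG SCAR). No Windows Desktop results are reported for this model.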
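To put numbers on the gap between the Mac M3 Max and the ASUS ROG SCAR, the sketch below divides one machine's average latency by the other's for each test. The values are copied from the tables above; the dictionary layout and function name are just one way I chose to organize them.

```python
# Average latency in seconds, copied from the benchmark tables.
latency = {
    "single": {
        "Llama3.2:3B": {"Mac M3 Max": 3.16, "ASUS ROG SCAR": 2.01},
        "Qwen2.5:7B": {"Mac M3 Max": 3.14, "ASUS ROG SCAR": 2.12},
        "Qwen2.5:14B": {"Mac M3 Max": 6.45, "ASUS ROG SCAR": 4.35},
    },
    "multi": {
        "Llama3.2:3B": {"Mac M3 Max": 21.74, "ASUS ROG SCAR": 8.30},
        "Qwen2.5:7B": {"Mac M3 Max": 23.66, "ASUS ROG SCAR": 8.30},
        "Qwen2.5:14B": {"Mac M3 Max": 49.31, "ASUS ROG SCAR": 15.70},
    },
}


def latency_ratio(test, model, slower="Mac M3 Max", faster="ASUS ROG SCAR"):
    """How many times longer the slower machine takes per request."""
    row = latency[test][model]
    return row[slower] / row[faster]


for test, models in latency.items():
    for model in models:
        print(f"{test:6} {model:12} {latency_ratio(test, model):.2f}x")
```

The ratio stays near 1.5x in the single-request tests and rises to between about 2.6x and 3.1x with 10 connections, so the Mac falls further behind as concurrency increases.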

Conclusion

This benchmarking analysis shows the strengths and limitations of each machine under AI inference workloads. The ASUS ROG SCAR emerges as the best performer, while the Mac M3 Max stays closest to it in single-request scenarios and falls further behind under concurrent load. These results can guide hardware selection for AI developers who need low latency or high request throughput from locally hosted models.

