May 10, 2025

These leaderboards measure how efficiently models run: latency, memory use, throughput, and power consumption. They are often used by ML engineers optimizing for deployment on edge devices, so they are more technical and infrastructure-focused. Some also benchmark quantized or distilled models, or the effectiveness of fine-tuning.

Tools:

Optimum LLM Performance Leaderboard: Measures throughput and latency of LLMs across hardware types and precision or quantization settings (e.g., FP16, INT8).

Sotabench: Tracks reproducible model benchmarks submitted by users, focusing on vision and NLP models across classic datasets like ImageNet and SQuAD.
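To make the latency and throughput numbers above more concrete, here is a rough sketch of how such a measurement could be taken. It is not how either leaderboard is implemented. The names `benchmark`, `BenchResult`, and `fake_generate` are made up for illustration, and `fake_generate` is just a stand-in for a real model call that returns the number of tokens it produced.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchResult:
    p50_latency_s: float      # median time per call, in seconds
    p95_latency_s: float      # approximate 95th-percentile time per call
    throughput_tok_s: float   # tokens produced per second of measured time


def benchmark(generate: Callable[[], int], warmup: int = 2, runs: int = 20) -> BenchResult:
    """Time a hypothetical `generate` callable that returns a token count."""
    # Warm-up calls are discarded so one-time costs (caches, lazy init) do not skew results.
    for _ in range(warmup):
        generate()

    latencies = []
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate()
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last element.
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return BenchResult(
        p50_latency_s=statistics.median(latencies),
        p95_latency_s=latencies[p95_index],
        throughput_tok_s=total_tokens / sum(latencies),
    )


def fake_generate() -> int:
    # Assumption: stands in for a real model call; sleeps briefly and "produces" 32 tokens.
    time.sleep(0.001)
    return 32


if __name__ == "__main__":
    result = benchmark(fake_generate)
    print(f"p50 latency: {result.p50_latency_s * 1000:.2f} ms")
    print(f"p95 latency: {result.p95_latency_s * 1000:.2f} ms")
    print(f"throughput:  {result.throughput_tok_s:.0f} tokens/s")
```

Running the same harness against an FP16 and an INT8 build of a model on the same hardware would give the kind of side-by-side comparison these leaderboards report. Memory and power consumption need device-specific tooling and are left out of this sketch.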