Configuration

Number of requests processed simultaneously by a single instance

Resource Requirements

Understanding Concurrency vs. Requests/Second

Concurrency Level: The number of requests a single GPU instance processes in parallel. Higher concurrency means more parallel processing but potentially higher per-request latency.

Target Requests/Second: Your desired throughput, i.e., how many requests per second your entire system needs to handle. Together with the concurrency level, this determines how many GPU instances you need (see the sizing sketch below).
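The sketch below shows one way to turn these two inputs into an instance count. It is a minimal illustration, not the calculator's actual logic; the function name and the avg_request_latency_s parameter are assumptions you would replace with your own measurements.

```python
import math

def instances_needed(target_rps: float,
                     concurrency: int,
                     avg_request_latency_s: float) -> int:
    """Estimate GPU instances required to reach a target requests/second.

    By Little's Law, one instance running `concurrency` requests in parallel,
    each taking `avg_request_latency_s` on average, sustains roughly
    concurrency / avg_request_latency_s requests per second.
    """
    per_instance_rps = concurrency / avg_request_latency_s
    return math.ceil(target_rps / per_instance_rps)

# Example: 100 req/s target, 32 concurrent requests per instance,
# 2.5 s average end-to-end latency -> ceil(100 / 12.8) = 8 instances.
print(instances_needed(target_rps=100, concurrency=32, avg_request_latency_s=2.5))
```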

Performance Metrics

Time to First Token (TTFT), reported in ms
Inter-Token Latency (ITL), reported in ms
Throughput, reported in tokens/sec
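These three metrics are related: end-to-end request latency is roughly the TTFT plus one inter-token gap for each remaining output token, and per-request throughput follows from that. The snippet below is a rough sketch of that relationship; the function names and the output_tokens parameter are illustrative assumptions, not outputs of the calculator itself.

```python
def request_latency_s(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """End-to-end latency: time to first token plus one inter-token
    gap for each remaining generated token."""
    return (ttft_ms + (output_tokens - 1) * itl_ms) / 1000.0

def tokens_per_second(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Per-request generation throughput in tokens/sec."""
    return output_tokens / request_latency_s(ttft_ms, itl_ms, output_tokens)

# Example: TTFT = 200 ms, ITL = 25 ms, 500 output tokens
# -> latency ~ 12.7 s, throughput ~ 39 tokens/sec.
print(request_latency_s(200, 25, 500), tokens_per_second(200, 25, 500))
```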

Detailed Calculations

Performance Visualization

Resource Requirements & Cost Estimation
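One simple way to estimate cost is to multiply the instance count from the sizing step by an hourly GPU price. The sketch below assumes per-GPU hourly pricing and a 730-hour month; both figures, and the parameter names, are placeholders for your own hardware and cloud rates.

```python
def monthly_cost_usd(num_instances: int,
                     gpus_per_instance: int,
                     hourly_gpu_price_usd: float,
                     hours_per_month: float = 730.0) -> float:
    """Estimated monthly cost: instances x GPUs per instance x hourly price x hours."""
    return num_instances * gpus_per_instance * hourly_gpu_price_usd * hours_per_month

# Example: 8 instances, 1 GPU each, $2.50 per GPU-hour -> about $14,600/month.
print(monthly_cost_usd(num_instances=8, gpus_per_instance=1, hourly_gpu_price_usd=2.50))
```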