Configuration

Number of requests processed simultaneously by a single instance

Resource Requirements

Understanding Concurrency vs. Requests/Second

Concurrency Level: The number of requests a single GPU instance processes in parallel. Higher concurrency means more parallel processing but potentially higher per-request latency.

Target Requests/Second: Your desired throughput, i.e., how many requests per second your entire system needs to handle. Together with the concurrency level, this determines how many GPU instances you need (see the sizing sketch below).
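The sketch below shows one way to turn these two inputs into an instance count. It is a minimal illustration, not the calculator's actual logic; the function name and the avg_request_latency_s parameter are assumptions you would replace with your own measurements.

```python
import math

def instances_needed(target_rps: float,
                     concurrency: int,
                     avg_request_latency_s: float) -> int:
    """Estimate GPU instances required to reach a target requests/second.

    By Little's Law, one instance running `concurrency` requests in parallel,
    each taking `avg_request_latency_s` on average, sustains roughly
    concurrency / avg_request_latency_s requests per second.
    """
    per_instance_rps = concurrency / avg_request_latency_s
    return math.ceil(target_rps / per_instance_rps)

# Example: 100 req/s target, 32 concurrent requests per instance,
# 2.5 s average end-to-end latency -> ceil(100 / 12.8) = 8 instances.
print(instances_needed(target_rps=100, concurrency=32, avg_request_latency_s=2.5))
```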

Performance Metrics

Time to First Token (TTFT), reported in ms
Inter-Token Latency (ITL), reported in ms
Throughput, reported in tokens/sec
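These three metrics are related: end-to-end request latency is roughly the TTFT plus one inter-token gap for each remaining output token, and per-request throughput follows from that. The snippet below is a rough sketch of that relationship; the function names and the output_tokens parameter are illustrative assumptions, not outputs of the calculator itself.

```python
def request_latency_s(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """End-to-end latency: time to first token plus one inter-token
    gap for each remaining generated token."""
    return (ttft_ms + (output_tokens - 1) * itl_ms) / 1000.0

def tokens_per_second(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Per-request generation throughput in tokens/sec."""
    return output_tokens / request_latency_s(ttft_ms, itl_ms, output_tokens)

# Example: TTFT = 200 ms, ITL = 25 ms, 500 output tokens
# -> latency ~ 12.7 s, throughput ~ 39 tokens/sec.
print(request_latency_s(200, 25, 500), tokens_per_second(200, 25, 500))
```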

Detailed Calculations

Performance Visualization

Resource Requirements & Cost Estimation
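One simple way to estimate cost is to multiply the instance count from the sizing step by an hourly GPU price. The sketch below assumes per-GPU hourly pricing and a 730-hour month; both figures, and the parameter names, are placeholders for your own hardware and cloud rates.

```python
def monthly_cost_usd(num_instances: int,
                     gpus_per_instance: int,
                     hourly_gpu_price_usd: float,
                     hours_per_month: float = 730.0) -> float:
    """Estimated monthly cost: instances x GPUs per instance x hourly price x hours."""
    return num_instances * gpus_per_instance * hourly_gpu_price_usd * hours_per_month

# Example: 8 instances, 1 GPU each, $2.50 per GPU-hour -> about $14,600/month.
print(monthly_cost_usd(num_instances=8, gpus_per_instance=1, hourly_gpu_price_usd=2.50))
```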