{ Prompt Token Budget Planner }

// estimate room for instructions, context, examples, and output

Plan and estimate token budgets for AI prompts. Allocate room for system instructions, context, examples, and output within your model token limit.

TOTAL BUDGET 128,000 tokens

💡 Reserve this for the model's response. Set your API max_tokens to this value. This section has no text input; it's a budget allocation only.

System 10%
Context 50%
Examples 20%
Output 20%
TOTAL USED 100%

HOW TO USE

  1. Select Your Model

    Pick your LLM and its context window from the dropdown, or enter a custom token limit.

  2. Set Percentages

    Use the + / − buttons to allocate what percent of the budget goes to each prompt section. The total must equal 100%.

  3. Paste & Verify

    Optionally paste real text into each section to see how many tokens it uses vs. your allocation.

  4. Export Your Plan

    Download your budget as JSON or plain text to reference when building your prompt.
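The arithmetic behind these steps is simple. Here is a minimal sketch (illustrative function names, not the tool's actual code) that turns a total budget and a percentage split into per-section token allocations:

```python
def allocate(total_tokens, split):
    """Map a {section: percent} split onto a {section: tokens} allocation."""
    assert sum(split.values()) == 100, "percentages must sum to 100"
    return {name: total_tokens * pct // 100 for name, pct in split.items()}

# The default split from this page on a 128K budget:
plan = allocate(128_000, {"system": 10, "context": 50, "examples": 20, "output": 20})
print(plan)
# {'system': 12800, 'context': 64000, 'examples': 25600, 'output': 25600}
```

Integer division keeps each allocation conservative: any rounding remainder stays unallocated rather than overshooting the budget.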

FEATURES

  • 4-Section Budget
  • 20+ Model Presets
  • Live Token Estimate
  • Visual Bar Chart
  • JSON Export
  • Custom Limits

USE CASES

  • 🤖 RAG pipeline prompt planning
  • 🔧 API prompt engineering & optimization
  • 📄 Document Q&A context allocation
  • 🧪 Few-shot learning experiment setup
  • 💡 Chatbot system prompt budgeting

WHAT IS THIS?

Every LLM has a context window, a hard limit on how many tokens it can process at once. Exceeding it causes truncation, errors, or degraded responses. This planner helps you divide that budget intelligently across the four main components of a production prompt: system instructions, context/documents, examples, and output reserve.

FREQUENTLY ASKED QUESTIONS

What is a token in the context of LLMs?

A token is roughly 3–4 characters, or about ¾ of a word, in English. LLMs process text as tokens rather than characters or words. A 1,000-word document is approximately 1,300–1,500 tokens. Images and other modalities have their own token-counting rules.
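Those rules of thumb translate into quick back-of-envelope estimates. A hedged sketch (the ratios are approximations, not tokenizer output):

```python
def tokens_from_chars(n_chars):
    """~4 characters per token for English prose."""
    return n_chars // 4

def tokens_from_words(n_words):
    """~3/4 of a word per token, i.e. ~4/3 tokens per word."""
    return round(n_words / 0.75)

print(tokens_from_words(1000))  # 1333, inside the 1,300-1,500 range above
```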

Why does token budgeting matter?

Every model has a fixed context window. If your total prompt + response exceeds this limit, the model will either truncate your input, throw an error, or produce incomplete responses. Planning ahead prevents these issues and ensures predictable API costs.

What percentage should I give to each section?

It depends on your use case. For RAG (retrieval-augmented generation), context may need 60–70%. For few-shot tasks, examples might need 30–40%. A common starting split is: System 10%, Context 50%, Examples 20%, Output 20%.

Is the token count in this tool 100% accurate?

This tool uses a fast heuristic estimator (approximately 4 characters per token) for real-time feedback. For production accuracy, use the official tokenizer for your model: tiktoken for OpenAI, or the Anthropic and Google tokenizer APIs. The estimates here are a reliable planning guide.

What is "Output Reserve" and how does it differ?

Output Reserve is the portion of the context window set aside for the model's response. You don't fill it with text; instead, you use this number as your API's max_tokens parameter. If your prompt consumes 80K tokens on a 128K model, you have up to 48K tokens for output.
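In API terms, the reserve is just what's left over. A sketch, assuming the 128K-window model from the example above:

```python
CONTEXT_WINDOW = 128_000  # assumed model limit, per the example above

def max_output_tokens(prompt_tokens, safety_margin=0):
    """Tokens left for the response once the prompt is counted."""
    return CONTEXT_WINDOW - prompt_tokens - safety_margin

print(max_output_tokens(80_000))  # 48000 -> pass this as the API's max_tokens
```

A small safety margin guards against the heuristic undercounting the prompt.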

Can I use this for multi-turn conversations?

Yes. For chat-based models, treat the entire conversation history as your "Context" section. Reserve system instructions separately, and leave enough output budget for the next response. Many teams budget 10–20% per turn for rolling conversation windows.
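One common way to enforce a rolling window is to drop the oldest turns until the history fits its budget. A sketch with hypothetical inputs; turn lengths are pre-counted in tokens (in practice you would measure them with your tokenizer):

```python
def trim_history(turn_tokens, history_budget):
    """Keep the most recent turns whose combined size fits history_budget."""
    kept, total = [], 0
    for t in reversed(turn_tokens):   # walk newest turn first
        if total + t > history_budget:
            break                     # oldest remaining turns are dropped
        kept.append(t)
        total += t
    return list(reversed(kept))       # restore chronological order

print(trim_history([500, 800, 1200, 900], 2500))  # [1200, 900]
```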

What happens if my percentages don't add up to 100%?

The tool will show a warning in the summary bar. The percentages must sum to exactly 100% for the plan to be valid. You can adjust any section's percentage using the + / − controls or by typing directly into the percentage field.

How do I export my budget plan?

Use the "Export JSON" button to download a machine-readable plan you can embed in your codebase or CI config. "Export Text" gives a human-readable summary. Both include the model name, total budget, and per-section allocations in tokens and percentages.
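The exported JSON presumably looks something like the structure below; the field names are an assumption based on the description, not the tool's documented schema:

```python
import json

plan = {
    "model": "custom-128k",  # hypothetical model label
    "total_budget": 128_000,
    "sections": {
        "system":   {"percent": 10, "tokens": 12_800},
        "context":  {"percent": 50, "tokens": 64_000},
        "examples": {"percent": 20, "tokens": 25_600},
        "output":   {"percent": 20, "tokens": 25_600},
    },
}
print(json.dumps(plan, indent=2))
```

A machine-readable plan like this can be checked in CI, e.g. asserting that percentages sum to 100 and tokens sum to the total budget.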

What Is a Prompt Token Budget Planner?

A prompt token budget planner is a tool that helps AI developers and prompt engineers divide their model's context window across the different components of a prompt. Every large language model (LLM), from OpenAI's GPT-4o to Anthropic's Claude and Google's Gemini, operates within a fixed token limit called the context window. Exceeding this limit causes truncation, API errors, or unpredictable model behavior.

This planner treats the context window as a finite budget and lets you allocate it deliberately across four prompt sections: system instructions, context/documents, few-shot examples, and output reserve. Instead of guessing or discovering limits at runtime, you plan your token distribution before writing a single line of code.


The Four Components of a Prompt Budget

Structuring a production prompt means dividing your token budget into four logical layers:

  • System instructions: the model's role, rules, and output format
  • Context/documents: retrieved passages, reference material, or conversation history
  • Few-shot examples: input/output demonstrations that steer model behavior
  • Output reserve: tokens set aside for the model's response (your max_tokens)

Context Windows Across Major LLM Models

Context window sizes vary dramatically across models, which affects how you plan your budget:

  • OpenAI GPT-4o: 128K tokens
  • Anthropic Claude 3.5 Sonnet: 200K tokens
  • Google Gemini 1.5 Pro: up to 1M tokens

Larger context windows don't eliminate the need for budgeting โ€” they just raise the ceiling. Poorly structured prompts with redundant context still degrade quality and inflate costs.

Why Token Budgeting Improves Prompt Quality

Deliberate token budgeting is not just about staying under limits โ€” it actively improves prompt quality. When you plan your budget, you're forced to evaluate the information density of each section. This surfaces wasteful patterns like over-long system prompts, redundant examples, or context that is too broad to be useful.

Production AI systems often have multiple prompt templates running in parallel. Without a documented budget plan, teams discover context window violations at runtime, in production, often when edge-case inputs are longer than expected. Maintaining a budget plan as part of your prompt documentation eliminates this class of bug entirely.

How Token Estimation Works

The industry-standard approximation is 1 token ≈ 4 characters for English text, or roughly ¾ of a word. This heuristic is accurate enough for planning purposes. For production systems requiring exact counts, use the official tokenizers: tiktoken for OpenAI models, or the Anthropic and Google tokenizer APIs. Non-English languages, code, and special characters may tokenize differently: code tends to tokenize at roughly 1 token per 3 characters, while CJK languages may tokenize at 1–2 characters per token.
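Those per-content-type ratios can be folded into a single hedged estimator; the divisors below are the approximations just described, not tokenizer output:

```python
CHARS_PER_TOKEN = {"english": 4.0, "code": 3.0, "cjk": 1.5}

def estimate_tokens(text, kind="english"):
    """Heuristic count; use tiktoken or a vendor tokenizer for exact numbers."""
    return round(len(text) / CHARS_PER_TOKEN[kind])

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```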

Best Practices for Token Budget Planning
