Changes to Felix AI Rate Limits
We're updating how rate limits work on Felix.
The old system:
Rate limits were based on the number of messages you sent in a rolling one-hour window. The issue: every message counted equally, regardless of size. A quick one-liner used the same share of your allowance as a long, token-heavy prompt. That meant users doing light, conversational work hit limits just as quickly as users running large jobs, and the system didn't reflect actual usage.
The new system:
We're moving to token budgets. Instead of counting messages, we count tokens, the actual unit of work behind every request.
Each user now has two budgets:
- Daily budget - resets every 24 hours.
- Weekly budget - resets every 7 days.
Short messages draw a small amount from your budget. Longer messages draw more. Both budgets are visible in the interface so you can track where you stand at any time.
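To make the two-budget idea concrete, here is a minimal sketch of how a request could be checked against a daily and a weekly budget. It is an illustration only, not Felix's actual implementation: the `Budget` class, the `try_charge` function, the limits of 1,000 and 5,000 tokens, and the choice of fixed windows that reset once their duration has elapsed are all assumptions made for the example.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60
WEEK_SECONDS = 7 * DAY_SECONDS


@dataclass
class Budget:
    limit: int            # tokens allowed per window (hypothetical value)
    window_seconds: int   # how long a window lasts before the budget resets
    used: int = 0
    window_start: float = 0.0

    def refresh(self, now: float) -> None:
        # Assumption: a fixed window that restarts once its duration has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.used = 0
            self.window_start = now

    def remaining(self, now: float) -> int:
        self.refresh(now)
        return self.limit - self.used


def try_charge(daily: Budget, weekly: Budget, tokens: int, now: float) -> bool:
    """Deduct tokens from both budgets, or from neither if either would be exceeded."""
    if tokens > daily.remaining(now) or tokens > weekly.remaining(now):
        return False
    daily.used += tokens
    weekly.used += tokens
    return True


daily = Budget(limit=1_000, window_seconds=DAY_SECONDS)
weekly = Budget(limit=5_000, window_seconds=WEEK_SECONDS)

print(try_charge(daily, weekly, 20, now=0.0))     # short message, small deduction
print(try_charge(daily, weekly, 900, now=60.0))   # long message, large deduction
print(daily.remaining(now=120.0))                 # tokens left in the daily budget
print(try_charge(daily, weekly, 200, now=180.0))  # does not fit in the daily budget
print(try_charge(daily, weekly, 200, now=DAY_SECONDS + 1.0))  # fits after the daily reset
```

In this sketch a request has to fit in both budgets at once, so a fresh daily budget does not help if the weekly one is already spent.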
Cached tokens are not counted. To reward users who stay in active, recent conversations, any tokens served from cache are excluded from your budget; only non-cached tokens are deducted. The cache time-to-live (TTL) ranges from 5 to 60 minutes, depending on the task.
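As a rough illustration of the cache rule, the sketch below works out how many tokens a single request would deduct. The function name `deducted_tokens`, its parameters, and the token counts are hypothetical, and it assumes a cache entry simply stops serving tokens once its TTL has elapsed; the real caching behavior may differ.

```python
FIVE_MINUTES = 5 * 60  # shortest TTL mentioned above, in seconds


def deducted_tokens(request_tokens: int, cacheable_tokens: int,
                    cache_age_seconds: float, ttl_seconds: float) -> int:
    """Return how many tokens a request draws from the budget.

    request_tokens:    all tokens in the request
    cacheable_tokens:  the part that could be served from cache (hypothetical input)
    cache_age_seconds: time since the cache entry was last written
    ttl_seconds:       cache lifetime, somewhere between 5 and 60 minutes
    """
    if not 0 <= cacheable_tokens <= request_tokens:
        raise ValueError("cacheable_tokens must be between 0 and request_tokens")
    cache_hit = cache_age_seconds < ttl_seconds
    cached = cacheable_tokens if cache_hit else 0
    # Only the non-cached tokens count against the budget.
    return request_tokens - cached


# Follow-up sent 2 minutes later: the earlier conversation is still cached.
print(deducted_tokens(4_000, 3_800, cache_age_seconds=120, ttl_seconds=FIVE_MINUTES))  # 200

# Follow-up sent 10 minutes later: the cache has expired, so everything counts.
print(deducted_tokens(4_000, 3_800, cache_age_seconds=600, ttl_seconds=FIVE_MINUTES))  # 4000
```

Under these assumptions, the same follow-up costs 200 tokens inside the cache window and 4,000 outside it, which is why staying in an active conversation stretches your budget.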
What this means for you:
Most users will notice more headroom, especially anyone whose work involves frequent shorter exchanges.