There's a gap between ephemeral prompt caching (5min/1h TTL) and fine-tuning. For apps with a large, stable system context (~50-100K tokens) and moderate but irregular traffic, neither option fits ...