Cost-Aware Contrastive Routing for LLMs
NeurIPS 2025 Spotlight
CSCR embeds both prompts and LLMs into a shared space, using fast logit or perplexity fingerprints to represent each model. A cost‑banded InfoNCE loss trains the space to balance quality against cost. The method generalizes to unseen models and out‑of‑distribution prompts.
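To make the shared-space idea concrete, here is a minimal sketch of how routing might look once prompts and models live in the same embedding space. It is not the paper's implementation: the `route` function, the linear `cost_weight` penalty, and the toy embeddings and costs are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def route(prompt_vec, models, cost_weight=0.0):
    """Return the name of the model with the best similarity-minus-cost score.

    `models` maps a model name to an (embedding, cost) pair. The linear cost
    penalty is a hypothetical scoring rule, not the one used by CSCR.
    """
    def score(name):
        embedding, cost = models[name]
        return cosine(prompt_vec, embedding) - cost_weight * cost

    return max(models, key=score)


# Toy, made-up model embeddings and per-query costs (assumptions).
models = {
    "small": ([1.0, 0.0], 0.1),
    "large": ([0.6, 0.8], 1.0),
}
prompt = [0.5, 0.85]  # made-up prompt embedding

print(route(prompt, models))                   # cost ignored
print(route(prompt, models, cost_weight=1.0))  # cost penalized
```

With the cost ignored, the prompt is routed to the model whose embedding is closest; raising `cost_weight` shifts the choice toward the cheaper model. Because routing only needs an embedding per model, a new model could in principle be added by computing its fingerprint and inserting it into `models`, without retraining.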