Token-count-based Batching: Faster, Cheaper Embedding Inference for Queries
• Token-count-based Batching: Faster, Cheaper Embedding Inference for Queries Embedding model inference often struggles with efficiency when serving large volumes of short requests