MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking
arXiv preprint, 2026
Abstract: The paper addresses the high inference cost of transformer-based cross-encoders in Information Retrieval. The authors propose MICE (Minimal Interaction Cross-Encoders), a new architecture that bridges standard cross-encoders and late-interaction models (like ColBERT). By identifying and removing unnecessary or detrimental interactions within the model, MICE achieves a fourfold decrease in inference latency while maintaining high ranking effectiveness and demonstrating superior generalization on out-of-domain datasets.

How MICE Works
MICE reduces the computational cost of standard cross-encoders by identifying and pruning token interactions that contribute little to ranking quality. The architecture is defined by three principal choices:
- Mid-fusion (Independent Encoding): Query and document are encoded separately in the initial layers. This allows document representations to be pre-computed and stored offline, similar to bi-encoders.
- Light Cross-Attention: Interaction is restricted to a one-way flow where query tokens attend to “frozen” document states. By eliminating self-attention among document tokens in the interaction layers, MICE avoids the quadratic cost associated with long documents.
- Layer Dropping: The total number of interaction layers is strategically reduced. By using fewer, more effective layers for the final ranking, MICE achieves a significant speedup without sacrificing accuracy.
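The light cross-attention step can be illustrated with a minimal NumPy sketch, assuming single-head attention without learned projections or layer normalization; all function and variable names here are illustrative, not the authors' implementation. Query token states attend to precomputed ("frozen") document states, and the document states themselves are never updated, so no doc-to-doc self-attention is computed in the interaction layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def light_cross_attention(query_states, frozen_doc_states):
    """One-way attention: query tokens attend to frozen document states.

    The document states can be precomputed offline (mid-fusion), and
    because they are never re-encoded, the cost per layer is linear in
    document length rather than quadratic.
    """
    d_k = query_states.shape[-1]
    # Score matrix has shape (n_query_tokens, n_doc_tokens).
    scores = query_states @ frozen_doc_states.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    # Updated query states; document states are left untouched.
    return weights @ frozen_doc_states

# Hypothetical sizes: 8 query tokens, 200 document tokens, dim 64.
rng = np.random.default_rng(0)
query_states = rng.standard_normal((8, 64))
frozen_doc_states = rng.standard_normal((200, 64))  # precomputed offline

updated_query = light_cross_attention(query_states, frozen_doc_states)
print(updated_query.shape)  # (8, 64): only query states change
```

Only the query side is transformed, which is what makes the per-query online cost independent of document-side self-attention.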
These choices result in:
- 4x Latency Reduction: Matching the efficiency of late-interaction models like ColBERT.
- Generalization: Retaining the high accuracy and out-of-domain robustness of full cross-encoders.
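A back-of-envelope comparison shows where the savings come from. Counting only score-matrix entries per attention layer (a simplification: the paper's 4x figure measures end-to-end latency, not this term alone), a full cross-encoder runs self-attention over the concatenated query and document, while the interaction layers sketched above compute only query self-attention plus one-way query-to-document attention:

```python
def full_cross_encoder_entries(n_q, n_d):
    # Joint self-attention over [query; document]: quadratic in total length.
    n = n_q + n_d
    return n * n

def light_interaction_entries(n_q, n_d):
    # Query self-attention plus one-way query-to-document cross-attention;
    # no doc-to-doc term, so cost is linear in document length.
    return n_q * n_q + n_q * n_d

# Hypothetical lengths: a 16-token query and a 512-token document.
n_q, n_d = 16, 512
print(full_cross_encoder_entries(n_q, n_d))   # 278784
print(light_interaction_entries(n_q, n_d))    # 8448
```

For long documents the dropped doc-to-doc term dominates, which is why removing it (together with fewer interaction layers and offline document encoding) yields the reported speedup.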
