MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking

arXiv preprint, 2026

Abstract: The paper addresses the high inference cost of transformer-based cross-encoders in Information Retrieval. The authors propose MICE (Minimal Interaction Cross-Encoders), a new architecture that bridges standard cross-encoders and late-interaction models (like ColBERT). By identifying and removing unnecessary or detrimental interactions within the model, MICE achieves a fourfold decrease in inference latency while maintaining high ranking effectiveness and demonstrating superior generalization on out-of-domain datasets.

Figure: Overview of the MICE architecture and its interaction pruning mechanism.

How MICE Works

MICE reduces the computational cost of traditional cross-encoders by identifying and pruning unnecessary token interactions. The architecture is defined by three principal choices:

  • Mid-fusion (Independent Encoding): Query and document are encoded separately in the initial layers. This allows document representations to be pre-computed and stored offline, similar to bi-encoders.
  • Light Cross-Attention: Interaction is restricted to a one-way flow where query tokens attend to “frozen” document states. By eliminating self-attention among document tokens in the interaction layers, MICE avoids the quadratic cost associated with long documents.
  • Layer Dropping: The total number of interaction layers is strategically reduced. By using fewer, more effective layers for the final ranking, MICE achieves a significant speedup without sacrificing accuracy.
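To make the interaction pattern concrete, the sketch below implements the one-way cross-attention described above with numpy: query states attend to pre-computed, frozen document states, and only the query side is updated across a small number of interaction layers. This is an illustrative sketch under stated assumptions, not the paper's implementation; the weight matrices, layer count, and mean-pooled relevance score are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def one_way_cross_attention(q_states, doc_states, Wq, Wk, Wv):
    """Query tokens attend to frozen document states.

    Document tokens receive no self-attention updates in the
    interaction layers, so the cost is O(|q| * |d|) rather than
    the O((|q| + |d|)^2) of a full cross-encoder layer.
    """
    Q = q_states @ Wq            # (n_q, d): projected query states
    K = doc_states @ Wk          # (n_d, d): keys from frozen doc states
    V = doc_states @ Wv          # (n_d, d): values from frozen doc states
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V   # updated query states only

rng = np.random.default_rng(0)
d = 16
# Mid-fusion: document states are encoded independently and can be
# pre-computed offline, like a bi-encoder's document embeddings.
doc_states = rng.normal(size=(128, d))
q_states = rng.normal(size=(8, d))       # encoded at query time
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

# Layer dropping: a short stack of interaction layers (e.g. 2 instead
# of a full 12-layer cross-encoder), applied with residual connections.
for _ in range(2):
    q_states = q_states + one_way_cross_attention(
        q_states, doc_states, Wq, Wk, Wv
    )

# Hypothetical pooling of the interacted query states into a scalar
# relevance score (a real model would use a learned scoring head).
score = float(q_states.mean())
```

Because `doc_states` is never modified, the same pre-computed document representations can be reused across queries, which is where the latency savings over a standard cross-encoder come from.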

These choices result in:

  • 4x Latency Reduction: Matching the efficiency of late-interaction models like ColBERT.
  • Generalization: Retaining the high accuracy and out-of-domain robustness of full cross-encoders.