ICML 2026

MMPD-Bench

Bridging Multimodal Fission with Multi-Polarimetric Modalities Decomposition

Replacing iterative numerical Mueller-matrix inversion with deep surrogate models. 21,412 real-world Mueller-matrix observations, 16 baselines spanning state-space models, vision transformers, conditional diffusion, and neural operators — evaluated under a multi-faceted protocol of fidelity, physical consistency, robustness, and efficiency.

Yi He*,1 Zimo Zhao*,2 Yiming Yang1 Xiaoyuan Cheng1 Chao He†,2 Yukun Hu†,1
1Dynamic Systems Lab, University College London · 2Vectorial Optics and Photonics Group, University of Oxford · *Equal contribution · †Corresponding authors
21,412 Paired MM Samples · 4 Specialised Test Sets · 16 Benchmarked Variants · 6 Decomposed Modalities

A Benchmark for Polarimetric Modality Fission

Recovering multiple physical parameters from high-dimensional optical measurements remains challenging in computational optics. MMPD-Bench is a pioneering benchmark that reframes multi-polarimetric modalities decomposition from Mueller-matrix observations as a modality fission problem under the multimodal learning paradigm — replacing iterative numerical inversion with deep surrogate models and providing data, standardised solutions, and evaluations for this multi-physics generation challenge.

We benchmark representative architectures — state space models, vision transformers, conditional diffusion models, and neural operators — under a multi-faceted protocol that jointly assesses perceptual fidelity, physical consistency, robustness, and computational efficiency. Our analysis reveals non-trivial accuracy–robustness trade-offs and key limitations of existing surrogates. To support reproducible research, we open-source the full codebase together with 21,412 high-resolution Mueller-matrix observations and four specialised test sets acquired through physical polarimetric measurements.

  • Modality Fission Framing

    Formally defines MMPD as a modality-fission problem, bridging high-dimensional Mueller-matrix decomposition with the standardised multimodal-generation paradigm.

  • Neural Operators for MMPD

    First adaptation of FNO & UNO to Mueller-matrix decomposition — a benchmark that spans attention, state-space, generative, and operator-learning paradigms.

  • 21,412 Real-World Samples

    High-resolution paired observations from a custom wide-field transmissive Mueller polarimeter on healthy & diseased tissue, with four external test sets (waveplate, multi-wavelength).

  • Multi-Faceted Evaluation

    Goes beyond standard vision metrics by adding physical-consistency checks, scale-normalised numeric matching, and 1-D Wasserstein statistical distances, all released as an open-source platform.

How MMPD-Bench Works

A spatially resolved 4×4 Mueller matrix observation describes the transformation of the polarisation state under the Stokes–Mueller formalism, and is decomposed into six physically interpretable parameters — diattenuation (D), depolarisation (Δ), linear retardance (η), total retardance (R), fast-axis orientation (θ), and optical rotation (ψ). Conventional MMPD relies on physics-based numerical inversion (Lu–Chipman), which can introduce numerical instability and computational burden at large scale. MMPD-Bench reframes this process as a modality-fission problem and benchmarks deep surrogate models — state space models, vision transformers, diffusion models, and neural operators — under unified evaluations of fidelity, statistical alignment, physical consistency, robustness, and efficiency.

Figure 1. Overview of MMPD-Bench. (a) A spatially resolved 4×4 Mueller matrix observation describes the transformation of the polarisation state under the Stokes–Mueller formalism, and is decomposed into physically interpretable parameters — diattenuation, depolarisation, linear and total retardance, fast-axis orientation, and optical rotation. (b) Conventional MMPD factorises the measured Mueller matrix and conducts physics-based numerical inversion to derive the components, but this process can introduce numerical instability and computational burden for large-scale polarimetric imaging. (c) MMPD-Bench reframes this process as a modality-fission problem and benchmarks deep surrogate models — including state space models, vision transformers, diffusion models, and neural operators — using unified evaluations of fidelity, statistical alignment, physical consistency, robustness, and efficiency.
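To make the Stokes–Mueller formalism concrete, the following is a minimal sketch in plain Python, not taken from the MMPD-Bench codebase. It applies a 4×4 Mueller matrix to a Stokes vector and reads the diattenuation magnitude from the first row of the matrix, which is the standard definition used in Lu–Chipman-style analysis. The function names and the ideal-polariser example are illustrative assumptions; the benchmark itself operates on spatially resolved matrices, one per pixel.

```python
import math

def apply_mueller(M, S):
    """Return S_out = M * S_in for a 4x4 Mueller matrix M and a 4-element Stokes vector S."""
    return [sum(M[i][j] * S[j] for j in range(4)) for i in range(4)]

def diattenuation(M):
    """Diattenuation magnitude from the first row: sqrt(m01^2 + m02^2 + m03^2) / m00."""
    m00, m01, m02, m03 = M[0]
    return math.sqrt(m01 ** 2 + m02 ** 2 + m03 ** 2) / m00

# Ideal horizontal linear polariser (textbook example, not benchmark data).
POLARISER_H = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

unpolarised = [1.0, 0.0, 0.0, 0.0]
s_out = apply_mueller(POLARISER_H, unpolarised)  # half the intensity, fully horizontal
d = diattenuation(POLARISER_H)                   # an ideal polariser has D = 1
print(s_out, d)
```

The remaining five parameters require the full polar decomposition, which is the iterative step the surrogate models are meant to replace.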

Quantitative Results

Representative base (-b) results on the combined test set (clear Mueller-matrix observations); the best model per chart is listed first. Neural operators (FNO / UNO) and FactFormer lead on the difficult angular-phase modalities (θ, ψ); deterministic surrogates lead on depolarisation (Δ); diffusion models generally trail on quantitative fidelity. Tables 8 & 9 in the paper report the full set of 16 model variants.

Fast-Axis Orientation (θ)

PSNR ↑ (dB) · ImageTheta

UNO-b (NO): 27.3
FNO-b (NO): 25.9
SU-b (Mamba): 24.8
FF-b (ViT): 24.1
DDPM-b (DM): 20.0
SU-b (ViT): 15.2
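For readers unfamiliar with the metric, PSNR can be sketched as follows. This is a generic definition, not the benchmark's evaluation code. It assumes the predicted and reference maps have been flattened to equal-length sequences and scaled to a known range; the `data_range` parameter and the sample values are illustrative assumptions.

```python
import math

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length flat sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / mse)

# Toy values only; these are not benchmark pixels.
target = [0.0, 0.5, 1.0, 0.5]
pred = [0.1, 0.5, 0.9, 0.5]
score = psnr(pred, target)  # MSE = 0.005, so roughly 23 dB
print(round(score, 2))
```

Because PSNR is logarithmic in the mean squared error, a gap of about 3 dB corresponds to roughly a factor of two in MSE.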

Depolarisation (Δ)

SSIM ↑ (%) · ImageDelta

FF-b (ViT): 99.8
SU-b (Mamba): 99.3
FNO-b (NO): 98.9
UNO-b (NO): 97.4
DDIM-b (DM): 85.9
DDPM-b (DM): 79.4
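As a rough illustration of what SSIM measures, the sketch below computes a single global SSIM value over two flattened images. This is a simplification: standard SSIM averages the same expression over local windows, and the benchmark's exact settings are not stated here. The constants K1 = 0.01 and K2 = 0.03 are the conventional defaults; the function name and toy inputs are assumptions.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over two equal-length flat sequences (no sliding window)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Toy values only; these are not benchmark pixels.
image = [0.1, 0.4, 0.7, 0.9]
inverted = [1.0 - v for v in image]
ssim_same = global_ssim(image, image)         # identical inputs score 1
ssim_inverted = global_ssim(image, inverted)  # anti-correlated structure scores far lower
print(ssim_same, ssim_inverted)
```

The chart reports SSIM as a percentage, so a value of 1 in this sketch corresponds to 100%.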

Statistical Alignment (θ)

WD-1d ↓ · whole test set, lower is better

FNO-b (NO): 1.55
SU-b (Mamba): 1.60
FF-b (ViT): 1.70
UNO-b (NO): 1.73
DDPM-b (DM): 4.25
DDIM-b (DM): 6.45
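The 1-D Wasserstein distance compares the distribution of predicted values with that of the reference values, ignoring where each value sits spatially. For two empirical samples of equal size it reduces to the mean absolute difference between the sorted samples, which the sketch below implements. The equal-size restriction and the function name are simplifying assumptions; how the benchmark pools values across the test set is not specified here.

```python
def wasserstein_1d(u, v):
    """W1 distance between two equal-size empirical samples: mean |sorted(u) - sorted(v)|."""
    if len(u) != len(v) or not u:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

# Toy values only; these are not benchmark data.
reference = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 0.5 for x in reference]  # same shape, moved by 0.5
permuted = [3.0, 0.0, 2.0, 1.0]         # same values in a different order

wd_shifted = wasserstein_1d(reference, shifted)    # 0.5: a uniform shift costs its magnitude
wd_permuted = wasserstein_1d(reference, permuted)  # 0.0: order does not matter
print(wd_shifted, wd_permuted)
```

A prediction can therefore score well on WD-1d while being spatially wrong, which is why it is reported alongside PSNR and SSIM rather than in place of them.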

Key Findings

Across the three task pillars (computational efficiency, modality fidelity and physical consistency, and robustness under perturbations), we identify the following key observations from 16 model variants on 21,412 Mueller-matrix samples.

1. Efficiency: Swin-Unet transformers scale best with batch size

Window-attention models maintain strong throughput as batch grows, while linear-complexity Mamba and FactFormer hit out-of-memory limits earlier than expected.

2. Efficiency: Tailored architectures beat brute-force scaling

FNO-s and UNO-s strike the best efficiency–accuracy balance; scaling to -b variants adds cost without proportional gains, and diffusion sampling cost dominates total latency.

3. Fidelity: Deterministic methods outperform generative priors

FactFormer, FNO, and SwinUMamba dominate quantitative metrology and physical consistency; diffusion surrogates show notable retardance-consistency residuals.

4. Fidelity: Neural Operators & FactFormer resolve angular phase

UNO-b achieves 27.3 dB PSNR on ImageTheta and FF-b 52.5 dB on ImagePsi (Table 8) — spectral parameterisation preserves θ/ψ phase structure better than patch attention.

5. Fidelity: Diffusion models struggle with multi-polarimetric fission

Four factors: non-linear error amplification, hallucination violating physics constraints, incoherent inter-channel denoising, and data sparsity on the high-dimensional Mueller manifold.

6. Robustness: Larger models are not more robust

Smaller (-s) variants degrade less under additive Gaussian noise than their (-b) counterparts — scale amplifies sensitivity to high-frequency measurement noise.
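An additive Gaussian perturbation of this kind can be sketched as below. This is a generic illustration, not the benchmark's robustness protocol: the noise level `sigma`, the fixed seed, and the use of a flattened identity matrix as the clean input are all assumptions made for the example.

```python
import random

def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with independent zero-mean Gaussian noise of std `sigma` added."""
    rng = random.Random(seed)  # local generator, so results are reproducible per seed
    return [v + rng.gauss(0.0, sigma) for v in values]

# Flattened 4x4 identity Mueller matrix (a non-polarising element), used as a toy clean input.
clean = [1.0 if i % 5 == 0 else 0.0 for i in range(16)]

noisy = add_gaussian_noise(clean, sigma=0.05)        # hypothetical noise level
unchanged = add_gaussian_noise(clean, sigma=0.0)     # zero noise leaves the input intact
max_shift = max(abs(n - c) for n, c in zip(noisy, clean))
print(max_shift)
```

Feeding the same trained model the clean and the noisy inputs, then comparing the resulting metrics, gives the kind of degradation figure discussed in these findings.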

7. Robustness: U-shaped models are vulnerable to noise

U-shaped Mamba and neural-operator (NO) designs show degradations of up to 49.8% in PSNR, 63.7% in SSIM, and 83.2% in WD-1d. Global-attention transformers stay substantially more stable.

8. Robustness: Diffusion models improve with noisy inputs

Measurement noise smooths the otherwise peaked Mueller-matrix distribution, stabilising score estimation. Caveat: angular modalities (θ) remain noise-sensitive due to non-linear arctan amplification.
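The arctan amplification mentioned in the caveat can be illustrated numerically. The sketch below assumes a generic half-angle arctangent, θ = 0.5 · atan2(a, b), as a stand-in for the orientation formula; the exact expression used in the decomposition is not given here, and the magnitudes and perturbation size are illustrative. The same small perturbation produces a much larger angle error when the underlying signal magnitude is small.

```python
import math

def orientation(a, b):
    """Angle recovered as 0.5 * atan2(a, b), in radians (assumed generic form)."""
    return 0.5 * math.atan2(a, b)

def angle_error(magnitude, true_angle, delta):
    """Absolute angle error after adding `delta` to the sine-like component."""
    a = magnitude * math.sin(2 * true_angle)
    b = magnitude * math.cos(2 * true_angle)
    return abs(orientation(a + delta, b) - true_angle)

delta = 0.01       # identical perturbation in both cases
true_angle = 0.3   # radians

err_strong = angle_error(1.0, true_angle, delta)   # strong signal: error stays small
err_weak = angle_error(0.02, true_angle, delta)    # weak signal: error grows sharply
print(err_strong, err_weak)
```

To first order the angle error scales as the perturbation divided by the signal magnitude, so pixels with weak retardance or diattenuation signal are where orientation estimates are most fragile.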

Authors

Yi He*,1 Zimo Zhao*,2 Yiming Yang1 Xiaoyuan Cheng1 Chao He†,2 Yukun Hu†,1
1Dynamic Systems Lab, University College London 2Vectorial Optics and Photonics Group, University of Oxford

* Equal Contribution  ·  † Corresponding Authors  ·  chao.he@eng.ox.ac.uk  ·  yukun.hu@ucl.ac.uk

BibTeX

If you find MMPD-Bench useful, please cite our ICML 2026 paper.

@inproceedings{he2026mmpdbench,
  title     = {MMPD-Bench: Bridging Multimodal Fission with Multi-Polarimetric Modalities Decomposition},
  author    = {He, Yi and Zhao, Zimo and Yang, Yiming and Cheng, Xiaoyuan and He, Chao and Hu, Yukun},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {306},
  year      = {2026},
  publisher = {PMLR}
}