MMPD-Bench: Bridging Multimodal Fission with Multi-Polarimetric Modalities Decomposition

Abstract

A Benchmark for Polarimetric Modality Fission

Recovering multiple physical parameters from high-dimensional optical measurements remains challenging in computational optics. MMPD-Bench is a pioneering benchmark that reframes multi-polarimetric modalities decomposition (MMPD) from Mueller-matrix observations as a modality-fission problem under the multimodal learning paradigm. It replaces iterative numerical inversion with deep surrogate models and provides data, standardised solutions, and evaluations for this multi-physics generation challenge.

We benchmark representative architectures — state space models, vision transformers, conditional diffusion models, and neural operators — under a multi-faceted protocol that jointly assesses perceptual fidelity, physical consistency, robustness, and computational efficiency. Our analysis reveals non-trivial accuracy–robustness trade-offs and key limitations of existing surrogates. To support reproducible research, we open-source the full codebase together with 21,412 high-resolution Mueller-matrix observations and four specialised test sets acquired through physical polarimetric measurements.

Modality Fission Framing

Formally defines MMPD as a modality-fission problem, bridging high-dimensional Mueller-matrix decomposition with the standardised multimodal-generation paradigm.
Neural Operators for MMPD

First adaptation of the Fourier Neural Operator (FNO) and U-shaped Neural Operator (UNO) to Mueller-matrix decomposition, within a benchmark that spans attention, state-space, generative, and operator-learning paradigms.
21,412 Real-World Samples

High-resolution paired observations from a custom wide-field transmissive Mueller polarimeter on healthy & diseased tissue, with four external test sets (waveplate, multi-wavelength).
Multi-Faceted Evaluation

Transcends standard vision metrics with physical-consistency checks, scale-normalised numeric matching, and 1-D Wasserstein statistical distances — released as an open-source platform.
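The page does not spell out how WD-1d is computed. Below is a minimal sketch, assuming WD-1d is the 1-D Wasserstein-1 distance between the pooled pixel-value distributions of a predicted and a ground-truth modality map. For two samples of equal size, this distance reduces to the mean absolute difference of their sorted values. The helper name `wd1d` is hypothetical. For unequal sample sizes, `scipy.stats.wasserstein_distance` computes the same quantity.

```python
import numpy as np

def wd1d(pred: np.ndarray, target: np.ndarray) -> float:
    """1-D Wasserstein-1 distance between two equal-size empirical distributions.

    Hypothetical helper: both maps are flattened into pixel-value samples, so the
    spatial arrangement is ignored and only the value distributions are compared.
    """
    a = np.sort(np.asarray(pred, dtype=float).ravel())
    b = np.sort(np.asarray(target, dtype=float).ravel())
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples the optimal 1-D transport plan matches sorted values.
    return float(np.mean(np.abs(a - b)))


# Usage: a constant offset shows up in full, a spatial shuffle not at all.
rng = np.random.default_rng(0)
gt = rng.uniform(-1.0, 1.0, size=(64, 64))
print(wd1d(gt + 0.1, gt))                                        # ≈ 0.1
print(wd1d(rng.permutation(gt.ravel()).reshape(gt.shape), gt))   # 0.0
```

Because WD-1d ignores where pixels sit, it complements PSNR and SSIM. It checks population-level statistics rather than pixel-wise or structural agreement.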

Method

How MMPD-Bench Works

A spatially resolved 4×4 Mueller matrix observation describes the transformation of the polarisation state under the Stokes–Mueller formalism, and is decomposed into six physically interpretable parameters — diattenuation (D), depolarisation (Δ), linear retardance (η), total retardance (R), fast-axis orientation (θ), and optical rotation (ψ). Conventional MMPD relies on physics-based numerical inversion (Lu–Chipman), which can introduce numerical instability and computational burden at large scale. MMPD-Bench reframes this process as a modality-fission problem and benchmarks deep surrogate models — state space models, vision transformers, diffusion models, and neural operators — under unified evaluations of fidelity, statistical alignment, physical consistency, robustness, and efficiency.
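Under the Stokes–Mueller formalism, each pixel's 4×4 Mueller matrix M maps an input Stokes vector S_in = (S0, S1, S2, S3) to S_out = M S_in. The minimal NumPy sketch below assumes the 16 observation channels are stored in row-major order (m00, m01, m02, m03, m10, and so on up to m33) as a (16, H, W) array. This layout and the helper names `to_mueller` and `propagate` are hypothetical and are not taken from the released code.

```python
import numpy as np

def to_mueller(stack: np.ndarray) -> np.ndarray:
    """Reshape a (16, H, W) channel stack into per-pixel (H, W, 4, 4) Mueller matrices.

    Assumes row-major channel order: m00, m01, m02, m03, m10, ..., m33.
    """
    c, h, w = stack.shape
    if c != 16:
        raise ValueError("expected 16 Mueller channels")
    return np.moveaxis(stack, 0, -1).reshape(h, w, 4, 4)

def propagate(mueller: np.ndarray, stokes_in: np.ndarray) -> np.ndarray:
    """Apply S_out = M @ S_in at every pixel; stokes_in has shape (4,)."""
    return np.einsum("hwij,j->hwi", mueller, stokes_in)

# Usage: an ideal horizontal linear polariser acting on unpolarised light.
polariser = 0.5 * np.array([[1, 1, 0, 0],
                            [1, 1, 0, 0],
                            [0, 0, 0, 0],
                            [0, 0, 0, 0]], dtype=float)
stack = np.broadcast_to(polariser.reshape(16, 1, 1), (16, 2, 3))  # uniform 2x3 image
s_out = propagate(to_mueller(stack), np.array([1.0, 0.0, 0.0, 0.0]))
print(s_out[0, 0])  # S_out = (0.5, 0.5, 0, 0): half intensity, horizontally polarised
```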

**Figure 1.** Overview of MMPD-Bench. (a) A spatially resolved 4×4 Mueller matrix observation describes the transformation of the polarisation state under the Stokes–Mueller formalism, and is decomposed into physically interpretable parameters: diattenuation, depolarisation, linear and total retardance, fast-axis orientation, and optical rotation. (b) Conventional MMPD factorises the measured Mueller matrix and conducts physics-based numerical inversion to derive the components, but this process can introduce numerical instability and computational burden for large-scale polarimetric imaging. (c) MMPD-Bench reframes this process as a modality-fission problem and benchmarks deep surrogate models, including state space models, vision transformers, diffusion models, and neural operators, using unified evaluations of fidelity, statistical alignment, physical consistency, robustness, and efficiency.
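For readers unfamiliar with the numerical inversion that MMPD-Bench replaces, below is a minimal single-pixel sketch of the Lu–Chipman polar decomposition M = M_Δ M_R M_D. It makes several assumptions:

- The input is a physically valid Mueller matrix with an invertible depolariser block.
- The retarder factors as M_R = M_ψ M_LR, that is, a circular retarder applied after a linear retarder.
- θ and ψ follow one particular sign convention.

Conventions differ between papers, so this is not necessarily the exact pipeline used to produce the MMPD-Bench labels. The function name `lu_chipman` is hypothetical.

```python
import numpy as np

def lu_chipman(M, eps=1e-9):
    """Lu-Chipman polar decomposition M = M_Delta @ M_R @ M_D of one 4x4 Mueller matrix.

    Returns diattenuation D, depolarisation Delta, total retardance R, linear
    retardance eta, fast-axis orientation theta and optical rotation psi (radians).
    The sign conventions used for theta and psi are assumptions.
    """
    M = np.asarray(M, dtype=float)
    M = M / M[0, 0]                        # normalise so that m00 = 1
    d_vec = M[0, 1:]                       # diattenuation vector
    D = float(np.linalg.norm(d_vec))

    # 1. Diattenuator M_D (identity when D is negligible).
    M_D = np.eye(4)
    if D > eps:
        d_hat = d_vec / D
        a = np.sqrt(1.0 - D**2)
        M_D[0, 1:] = d_vec
        M_D[1:, 0] = d_vec
        M_D[1:, 1:] = a * np.eye(3) + (1.0 - a) * np.outer(d_hat, d_hat)

    # 2. M' = M @ inv(M_D) = M_Delta @ M_R; only its 3x3 block m' is needed here.
    m_p = (M @ np.linalg.inv(M_D))[1:, 1:]

    # 3. Depolariser block from the eigenvalues of m' m'^T (Lu & Chipman, 1996).
    mmt = m_p @ m_p.T
    r1, r2, r3 = np.sqrt(np.clip(np.linalg.eigvalsh(mmt), 0.0, None))
    sign = 1.0 if np.linalg.det(m_p) >= 0 else -1.0
    m_delta = sign * np.linalg.solve(
        mmt + (r1 * r2 + r2 * r3 + r1 * r3) * np.eye(3),
        (r1 + r2 + r3) * mmt + r1 * r2 * r3 * np.eye(3),
    )
    Delta = 1.0 - abs(np.trace(m_delta)) / 3.0

    # 4. Retarder block m_R = inv(m_Delta) @ m' and its parameters.
    m_R = np.linalg.solve(m_delta, m_p)
    R = np.arccos(np.clip((np.trace(m_R) - 1.0) / 2.0, -1.0, 1.0))
    eta = np.arccos(np.clip(m_R[2, 2], -1.0, 1.0))    # M_psi leaves the last row unchanged
    theta = 0.5 * np.arctan2(m_R[2, 0], -m_R[2, 1])   # meaningful for 0 < eta < pi
    psi = 0.5 * np.arctan2(m_R[0, 1] - m_R[1, 0], m_R[0, 0] + m_R[1, 1])
    return {"D": D, "Delta": Delta, "R": R, "eta": eta, "theta": theta, "psi": psi}
```

The function handles one pixel. A full (H, W, 4, 4) map can be processed with a plain loop. NumPy's `linalg` routines also accept stacked matrices if you need a batched version. Per-pixel matrix inversions and eigen-solves of this kind are the computational burden that the surrogate models aim to avoid.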

Benchmarks

Quantitative Results

Representative base (-b) results on the combined test set of clear (unperturbed) Mueller-matrix observations. Models are listed from best to worst in each chart. Family tags: NO = neural operator, Mamba = state space model, ViT = vision transformer, DM = diffusion model. Neural operators (FNO, UNO) and FactFormer lead on the difficult angular-phase modalities (θ, ψ). Deterministic surrogates lead on depolarisation (Δ). Diffusion models generally trail on quantitative fidelity. Tables 8 and 9 in the paper report the full set of 16 model variants.

Fast-Axis Orientation (θ)

PSNR ↑ (dB) · ImageTheta

UNO-b (NO): 27.3
FNO-b (NO): 25.9
SU-b (Mamba): 24.8
FF-b (ViT): 24.1
DDPM-b (DM): 20.0
SU-b (ViT): 15.2

Depolarisation (Δ)

SSIM ↑ (%) · ImageDelta

FF-b (ViT): 99.8
SU-b (Mamba): 99.3
FNO-b (NO): 98.9
UNO-b (NO): 97.4
DDIM-b (DM): 85.9
DDPM-b (DM): 79.4

Statistical Alignment (θ)

WD-1d ↓ · whole test set, lower is better

FNO-b (NO): 1.55
SU-b (Mamba): 1.60
FF-b (ViT): 1.70
UNO-b (NO): 1.73
DDPM-b (DM): 4.25
DDIM-b (DM): 6.45
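PSNR depends on the assumed peak-to-peak data range L, and the page does not state L for each modality. Below is a minimal sketch that assumes PSNR = 10 log10(L² / MSE) with a caller-supplied L, for example π for an orientation map spanning a range of π.

The optional `period` argument wraps angular errors so that values on either side of the branch cut are not scored as large errors. The function name, the choice of L, and the wrapping option are all assumptions. The benchmark's own normalisation may differ.

```python
import numpy as np

def psnr(pred, target, data_range, period=None):
    """PSNR in dB: 10 * log10(data_range**2 / MSE).

    If `period` is given (e.g. np.pi for an orientation map), errors are wrapped
    into [-period/2, period/2) before squaring. Both options are illustrative.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    if period is not None:
        diff = np.mod(diff + period / 2.0, period) - period / 2.0
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Usage: a constant error of 0.1 on a unit-range map gives about 20 dB.
gt = np.zeros((8, 8))
print(psnr(gt + 0.1, gt, data_range=1.0))                             # ≈ 20.0
theta_gt = np.full((8, 8), np.pi / 2 - 0.01)
theta_pred = np.full((8, 8), -np.pi / 2 + 0.01)
print(psnr(theta_pred, theta_gt, data_range=np.pi))                   # low: raw error ≈ π
print(psnr(theta_pred, theta_gt, data_range=np.pi, period=np.pi))     # high: wrapped error 0.02
```

Whether to wrap angular errors is a design choice. Without wrapping, an orientation of +89.4° and one of −89.4° count as nearly maximally different, although they describe almost the same axis.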

Insights

Key Findings

Across the three task pillars (computational efficiency, modality fidelity and physical consistency, and robustness under perturbations), we identify the following key observations from 16 model variants on 21,412 Mueller-matrix samples.

1. Efficiency: Swin-Unet transformers scale best with batch size

Window-attention models maintain strong throughput as batch size grows, whereas the linear-complexity Mamba and FactFormer models run out of memory at smaller batch sizes than their complexity would suggest.

2. Efficiency: Tailored architectures beat brute-force scaling

FNO-s and UNO-s strike the best efficiency–accuracy balance; scaling to -b variants adds cost without proportional gains, and diffusion sampling cost dominates total latency.

3. Fidelity: Deterministic methods outperform generative priors

FactFormer, FNO, and SwinUMamba lead on quantitative fidelity metrics and physical consistency, while diffusion surrogates show notable retardance-consistency residuals.

4. Fidelity: Neural Operators & FactFormer resolve angular phase

UNO-b achieves 27.3 dB PSNR on ImageTheta and FF-b 52.5 dB on ImagePsi (Table 8) — spectral parameterisation preserves θ/ψ phase structure better than patch attention.

5. Fidelity: Diffusion models struggle with multi-polarimetric fission

Four factors contribute: non-linear error amplification, hallucinations that violate physical constraints, incoherent inter-channel denoising, and data sparsity on the high-dimensional Mueller manifold.

6. Robustness: Larger models are not more robust

Smaller (-s) variants degrade less under additive Gaussian noise than their (-b) counterparts — scale amplifies sensitivity to high-frequency measurement noise.

7. Robustness: U-shaped models are vulnerable to noise

U-shaped Mamba and neural-operator designs degrade by up to 49.8% in PSNR, 63.7% in SSIM, and 83.2% in WD-1d, whereas global-attention transformers remain substantially more stable.

8. Robustness: Diffusion models improve with noisy inputs

Measurement noise smooths the otherwise peaked Mueller-matrix distribution, stabilising score estimation. Caveat: angular modalities (θ) remain noise-sensitive due to non-linear arctan amplification.
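The arctan caveat in Finding 8 can be illustrated numerically. The minimal Monte-Carlo sketch below assumes an orientation of the form θ = ½ arctan2(u, q), computed from two noisy measured quantities q and u. It is a generic illustration of the amplification mechanism, not the benchmark's noise model. The helper name `theta_spread` and all parameter values are hypothetical.

Under the same additive noise σ, the spread of θ grows as the signal magnitude r = √(q² + u²) shrinks. In the high-SNR regime it grows roughly as σ / (2r).

```python
import numpy as np

def theta_spread(magnitude, sigma, true_theta=0.3, n=20000, seed=0):
    """Std of theta = 0.5 * arctan2(u, q) when q and u carry Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    q = magnitude * np.cos(2 * true_theta) + sigma * rng.standard_normal(n)
    u = magnitude * np.sin(2 * true_theta) + sigma * rng.standard_normal(n)
    theta = 0.5 * np.arctan2(u, q)
    return float(np.std(theta))

# Same noise level, shrinking signal: the angular error grows roughly as 1 / magnitude.
for r in (1.0, 0.3, 0.1):
    print(f"magnitude={r:.1f}  std(theta)={theta_spread(r, sigma=0.02):.4f}  "
          f"linearised={0.02 / (2 * r):.4f}")
```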

Gallery

Visual Companion

A visual deep-dive across four themes: qualitative decomposition outputs (Figure 8), the multi-dimensional performance radar (Figures 2 and 7), retardance physical-consistency analysis (Figure 3), and robustness under acquisition noise (Figures 5 and 9). All figures are sourced from the paper.

Five-axis radar comparing model families on clear Mueller-matrix observations. Each axis is a normalised metric: PSNR ↑ (pixel fidelity) · SSIM ↑ (structural similarity) · WD-1d ↓ (statistical alignment) · Time ↓ (per-sample inference; per-step for diffusion) · R-Consist ↑ (retardance physical consistency).

**Figure 2.** Multi-dimensional performance analysis across modalities for the base **(-b)** models against 5 metrics on clear observations: 1. **PSNR ↑**: pixel-wise reconstruction fidelity and signal quality; 2. **SSIM ↑**: preservation of structural information and spatial patterns; 3. **WD-1d ↓**: global statistical alignment across the entire test population; 4. **Time ↓**: inference runtime per sample (for diffusion models, the plot compares the runtime of a single denoising step); 5. **R-Consist ↑**: consistency with the physical retardance relation.

**Figure 7.** Multi-dimensional performance analysis across modalities for the small **(-s)** models against the same 5 metrics on clear observations. This is the small-scale counterpart of Figure 2. The smaller FNO and UNO variants retain a notable share of the area covered by their (-b) versions on PSNR and SSIM, while reaching further along the Time axis. This reflects the efficiency–fidelity trade-off discussed in Findings 2 and 6.
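The radar description says each axis is a normalised metric but does not give the normalisation. Below is a minimal sketch, assuming per-axis min–max scaling across the compared models. Lower-is-better axes (WD-1d, Time) are flipped so that a larger value always means better. The function name is hypothetical. The example uses the θ-modality PSNR and WD-1d values listed in the Quantitative Results charts for the five base models that appear in both.

```python
import numpy as np

def radar_normalise(scores: dict, lower_is_better: set) -> dict:
    """Min-max scale each metric across models to [0, 1]; 1 is always the best value."""
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    out = {m: {} for m in models}
    for k in metrics:
        vals = np.array([scores[m][k] for m in models], dtype=float)
        span = vals.max() - vals.min()
        if span == 0:
            norm = np.ones_like(vals)            # all models tie on this axis
        else:
            norm = (vals - vals.min()) / span
            if k in lower_is_better:
                norm = 1.0 - norm                # flip so larger is better
        for m, v in zip(models, norm):
            out[m][k] = float(v)
    return out

# θ-modality PSNR (dB) and WD-1d values from the Quantitative Results charts.
scores = {
    "UNO-b (NO)": {"PSNR": 27.3, "WD-1d": 1.73},
    "FNO-b (NO)": {"PSNR": 25.9, "WD-1d": 1.55},
    "SU-b (Mamba)": {"PSNR": 24.8, "WD-1d": 1.60},
    "FF-b (ViT)": {"PSNR": 24.1, "WD-1d": 1.70},
    "DDPM-b (DM)": {"PSNR": 20.0, "WD-1d": 4.25},
}
print(radar_normalise(scores, lower_is_better={"WD-1d", "Time"}))
```

Min-max scaling makes the radar area relative to the models being compared. Adding or removing a model can therefore change every other model's shape.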

Models that achieve both high visual accuracy and strict adherence to Stokes–Mueller physics fall inside the High Fidelity Zone. FactFormer, FNO and SwinUMamba consistently land there; diffusion models drift far outside (cf. Finding 3).

**Figure 3.** Retardance consistency evaluates the physical validity of the generated decompositions by plotting the normalised mean absolute error against the physical-consistency residual derived from the relationship between retardance (R, η) and orientation (ψ). The **High Fidelity Zone** inset highlights models that achieve both high visual accuracy and strict adherence to Stokes–Mueller physics.
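The caption does not give the residual's formula. One identity links the three quantities when the retarder factors as a linear retarder followed by a circular retarder (M_R = M_ψ M_LR): cos R = 2 cos²ψ cos²(η/2) − 1.

The minimal sketch below assumes the residual is the per-pixel absolute violation of this identity, averaged over the map. The function name and the choice to compare cosines rather than angles are assumptions, and the benchmark's exact definition may differ.

```python
import numpy as np

def retardance_residual(R, eta, psi):
    """Mean |cos R - (2 cos^2(psi) cos^2(eta / 2) - 1)| over all pixels (radians)."""
    R, eta, psi = (np.asarray(x, dtype=float) for x in (R, eta, psi))
    predicted_cos_R = 2.0 * np.cos(psi) ** 2 * np.cos(eta / 2.0) ** 2 - 1.0
    return float(np.mean(np.abs(np.cos(R) - predicted_cos_R)))

# Usage: maps built to satisfy the identity give a residual near zero;
# perturbing R alone, as an unconstrained surrogate might, makes it grow.
rng = np.random.default_rng(0)
eta = rng.uniform(0.0, np.pi, size=(32, 32))
psi = rng.uniform(-np.pi / 4, np.pi / 4, size=(32, 32))
R = np.arccos(np.clip(2.0 * np.cos(psi) ** 2 * np.cos(eta / 2.0) ** 2 - 1.0, -1.0, 1.0))
print(retardance_residual(R, eta, psi))
print(retardance_residual(R + 0.2, eta, psi))
```

Comparing cosines avoids applying arccos to predicted values. Its derivative diverges at ±1, which would make the residual unstable near R = 0 and R = π.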

Stress test with additive Gaussian noise at σ_noise = 0.1 σ_pixel. Together, the histograms (Figure 5) and the qualitative comparison (Figure 9) explain why smaller models stay more robust and why diffusion models improve with noisy inputs (Findings 6–8).

**Figure 5.** Histograms of pixel-intensity distributions across a sampled 16-channel Mueller matrix normalised to [−1, 1]. The plots compare clean measurements against noisy observations perturbed by Gaussian noise, where the noise standard deviation is set to 10% of the pixel-value standard deviation (*σ_noise = 0.1 σ_pixel*).
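Below is a minimal sketch of this perturbation. The page does not say whether σ_pixel is computed per channel or over the whole matrix, so the sketch offers both. The per-channel option takes the standard deviation of each channel of the [−1, 1]-normalised observation.

The relative-degradation helper shows one plausible reading of the percentages quoted in Finding 7. That definition, the function names, and the example numbers are assumptions.

```python
import numpy as np

def add_measurement_noise(obs, ratio=0.1, per_channel=True, seed=None):
    """Add Gaussian noise with std = ratio * std(pixel values) to a (16, H, W) stack."""
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    if per_channel:
        sigma_pixel = obs.std(axis=(1, 2), keepdims=True)   # shape (16, 1, 1)
    else:
        sigma_pixel = obs.std()                             # one value for the whole stack
    return obs + ratio * sigma_pixel * rng.standard_normal(obs.shape)

def relative_degradation(clean, noisy, higher_is_better=True):
    """Percentage change from clean to noisy inputs; positive means worse."""
    change = (clean - noisy) if higher_is_better else (noisy - clean)
    return 100.0 * change / abs(clean)

# Usage with synthetic data standing in for a normalised 16-channel observation.
obs = np.random.default_rng(0).uniform(-1.0, 1.0, size=(16, 64, 64))
noisy = add_measurement_noise(obs, seed=1)
print((noisy - obs).std() / obs.std())                           # ≈ 0.1
print(relative_degradation(30.0, 15.0))                          # 50.0 (e.g. PSNR in dB)
print(relative_degradation(2.0, 2.5, higher_is_better=False))    # 25.0 (e.g. WD-1d)
```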

Authors

Yi He*¹, Zimo Zhao*², Yiming Yang¹, Xiaoyuan Cheng¹, Chao He†², Yukun Hu†¹

¹ Dynamic Systems Lab, University College London · ² Vectorial Optics and Photonics Group, University of Oxford

* Equal Contribution · † Corresponding Authors · chao.he@eng.ox.ac.uk · yukun.hu@ucl.ac.uk

Citation

BibTeX

If you find MMPD-Bench useful, please cite our ICML 2026 paper.


@inproceedings{he2026mmpdbench,
  title     = {MMPD-Bench: Bridging Multimodal Fission with Multi-Polarimetric Modalities Decomposition},
  author    = {He, Yi and Zhao, Zimo and Yang, Yiming and Cheng, Xiaoyuan and He, Chao and Hu, Yukun},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {306},
  year      = {2026},
  publisher = {PMLR}
}
