Image and Video Processing 13
☆ Topology Optimization in Medical Image Segmentation with Fast Euler Characteristic
Deep learning-based medical image segmentation techniques have shown
promising results when evaluated based on conventional metrics such as the Dice
score or Intersection-over-Union. However, these fully automatic methods often
fail to meet clinically acceptable accuracy, especially when topological
constraints should be observed, e.g., continuous boundaries or closed surfaces.
In medical image segmentation, the correctness of a segmentation in terms of
the required topological genus is sometimes even more important than its
pixel-wise accuracy. Existing topology-aware approaches commonly estimate and
constrain the topological structure via the concept of persistent homology
(PH). However, these methods are difficult to apply to high-dimensional data
due to their polynomial computational complexity. To overcome this
problem, we propose a novel and fast approach for topology-aware segmentation
based on the Euler Characteristic ($\chi$). First, we propose a fast
formulation for $\chi$ computation in both 2D and 3D. The scalar $\chi$ error
between the prediction and ground-truth serves as the topological evaluation
metric. Then we estimate the spatial topology correctness of any segmentation
network via a so-called topological violation map, i.e., a detailed map that
highlights regions with $\chi$ errors. Finally, the segmentation results from
an arbitrary network are refined based on the topological violation maps by a
topology-aware correction network. Our experiments are conducted on both 2D and
3D datasets and show that our method can significantly improve topological
correctness while preserving pixel-wise segmentation accuracy.
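As a rough illustration of the quantity involved (not the paper's fast formulation), the Euler characteristic of a 2D binary mask equals the number of connected components minus the number of holes, which can be computed with `scipy.ndimage`:

```python
import numpy as np
from scipy import ndimage

def euler_characteristic_2d(mask: np.ndarray) -> int:
    """chi = (#connected components) - (#holes) for a 2D binary mask.

    Uses 4-connectivity for foreground and 8-connectivity for background,
    the standard complementary pairing in digital topology.
    """
    mask = mask.astype(bool)
    four_conn = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    _, n_components = ndimage.label(mask, structure=four_conn)
    # Pad so the exterior background forms a single 8-connected region;
    # every remaining background region is then a hole.
    padded = np.pad(mask, 1, constant_values=False)
    _, n_background = ndimage.label(~padded, structure=np.ones((3, 3)))
    n_holes = n_background - 1  # subtract the exterior region
    return n_components - n_holes

solid = np.ones((5, 5), dtype=int)     # one blob, no hole
ring = solid.copy(); ring[2, 2] = 0    # one blob with one hole
print(euler_characteristic_2d(solid))  # 1
print(euler_characteristic_2d(ring))   # 0
```

A prediction/ground-truth pair with different χ values then yields a nonzero scalar topological error of the kind the method penalizes.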
☆ Towards Field-Ready AI-based Malaria Diagnosis: A Continual Learning Approach MICCAI 2025
Louise Guillon, Soheib Biga, Yendoube E. Kantchire, Mouhamadou Lamine Sane, Grégoire Pasquier, Kossi Yakpa, Stéphane E. Sossou, Marc Thellier, Laurent Bonnardot, Laurence Lachaud, Renaud Piarroux, Ameyo M. Dorkenoo
Malaria remains a major global health challenge, particularly in low-resource
settings where access to expert microscopy may be limited. Deep learning-based
computer-aided diagnosis (CAD) systems have been developed and demonstrate
promising performance on thin blood smear images. However, their clinical
deployment may be hindered by limited generalization across sites with varying
conditions. Yet very few practical solutions have been proposed. In this work,
we investigate continual learning (CL) as a strategy to enhance the robustness
of malaria CAD models to domain shifts. We frame the problem as a
domain-incremental learning scenario, where a YOLO-based object detector must
adapt to new acquisition sites while retaining performance on previously seen
domains. We evaluate four CL strategies, two rehearsal-based and two
regularization-based, under real-life conditions on a multi-site
clinical dataset of thin blood smear images. Our results suggest that CL, and
rehearsal-based methods in particular, can significantly improve performance.
These findings highlight the potential of continual learning to support the
development of deployable, field-ready CAD tools for malaria.
comment: MICCAI 2025 AMAI Workshop, Accepted, Submitted Manuscript Version
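Rehearsal-based strategies like those evaluated above replay stored samples from earlier acquisition sites while training on a new one. A generic reservoir-sampling buffer (a sketch of the idea, not the paper's exact method) looks like:

```python
import random

class ReplayBuffer:
    """Reservoir-sampling rehearsal buffer: keeps a uniform random subset
    of all samples seen so far, within a fixed memory budget."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            # Replace an existing slot with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k: int):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```

During domain-incremental training, each mini-batch from the new site would be mixed with a draw from the buffer so the detector retains performance on previously seen domains.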
☆ JPEG Processing Neural Operator for Backward-Compatible Coding
Despite significant advances in learning-based lossy compression algorithms,
standardizing codecs remains a critical challenge. In this paper, we present
the JPEG Processing Neural Operator (JPNeO), a next-generation JPEG algorithm
that maintains full backward compatibility with the current JPEG format. Our
JPNeO improves chroma component preservation and enhances reconstruction
fidelity compared to existing artifact removal methods by incorporating neural
operators in both the encoding and decoding stages. JPNeO achieves practical
benefits in terms of reduced memory usage and parameter count. We further
validate our hypothesis about the existence of a space with high mutual
information through empirical evidence. In summary, JPNeO functions as a
high-performance, out-of-the-box image compression pipeline without changing
the source coding protocol. Our source code is available at
https://github.com/WooKyoungHan/JPNeO.
☆ Smart Video Capsule Endoscopy: Raw Image-Based Localization for Enhanced GI Tract Investigation ICONIP 2025
For many real-world applications involving low-power sensor edge devices, deep
neural networks used for image classification might not be suitable. This is
due to their typically large model size and operation counts that often
exceed the capabilities of such resource-limited devices. Furthermore,
camera sensors usually capture images with a Bayer color filter applied, which
are subsequently converted to the RGB images commonly used for neural
network training. On resource-constrained devices, however, this conversion
demands its share of energy and should optimally be skipped if possible. This
work addresses the need for hardware-suitable AI targeting sensor edge
devices by means of Video Capsule Endoscopy, an important medical procedure
for the investigation of the small intestine, which is strongly limited by
its battery lifetime. Accurate organ classification is performed with a final
accuracy of 93.06%, evaluated directly on Bayer images, using a CNN with
only 63,000 parameters and time-series analysis in the form of Viterbi
decoding. Finally, the process of capturing images with a camera and raw image
processing is demonstrated on a customized PULPissimo System-on-Chip with a
RISC-V core and an ultra-low-power hardware accelerator, providing an
energy-efficient AI-based image classification approach requiring just 5.31
μJ per image. As a result, it is possible to save an average of 89.9% of
energy before entering the small intestine compared to classic video capsules.
comment: Accepted at the 32nd International Conference on Neural Information
Processing - ICONIP 2025
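Viterbi decoding over the per-frame CNN outputs, as used above for organ classification, can be sketched as follows. The transition and emission values here are illustrative, not the paper's; they only assume organs are traversed in one direction:

```python
import numpy as np

def viterbi(log_emissions, log_transition, log_prior):
    """Most likely state sequence given per-frame log-probabilities.

    log_emissions: (T, S) per-frame class log-probabilities (e.g. from a CNN).
    log_transition: (S, S) log P(next state | previous state).
    log_prior: (S,) initial state log-probabilities.
    """
    T, S = log_emissions.shape
    backptr = np.zeros((T, S), dtype=int)
    score = log_prior + log_emissions[0]
    for t in range(1, T):
        cand = score[:, None] + log_transition      # (prev, next)
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Two states (0 = stomach, 1 = small intestine), no backward transitions:
# a single spurious frame prediction at t=2 is smoothed away.
trans = np.log(np.array([[0.9, 0.1], [1e-12, 1.0]]))
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1],
                  [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
print(viterbi(np.log(probs), trans, np.log([0.99, 0.01])))  # [0, 0, 0, 0, 0, 1, 1]
```

The monotone transition matrix is what lets the capsule power down the classifier once the small intestine is confidently entered.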
☆ Pixel Embedding Method for Tubular Neurite Segmentation
Automatic segmentation of neuronal topology is critical for handling large
scale neuroimaging data, as it can greatly accelerate neuron annotation and
analysis. However, the intricate morphology of neuronal branches and the
occlusions among fibers pose significant challenges for deep learning based
segmentation. To address these issues, we propose an improved framework. First,
we introduce a deep network that outputs pixel-level embedding vectors and
design a corresponding loss function, enabling the learned features to
effectively distinguish different neuronal connections within occluded regions.
Second, building on this model, we develop an end-to-end pipeline that directly
maps raw neuronal images to SWC-formatted neuron structure trees. Finally,
recognizing that existing evaluation metrics fail to fully capture segmentation
accuracy, we propose a novel topological assessment metric to more
appropriately quantify the quality of neuron segmentation and reconstruction.
Experiments on our fMOST imaging dataset demonstrate that, compared to several
classical methods, our approach significantly reduces the error rate in
neuronal topology reconstruction.
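One common way to train pixel embeddings so that touching or occluding instances separate, in the spirit of the loss described above (the paper's exact formulation may differ), is a pull/push discriminative loss:

```python
import numpy as np

def discriminative_loss(emb, labels, delta_v=0.5, delta_d=1.5):
    """Pull pixels toward their instance mean, push instance means apart.

    emb: (N, D) pixel embeddings; labels: (N,) instance ids, 0 = background.
    """
    ids = [i for i in np.unique(labels) if i != 0]
    means, pull = [], 0.0
    for i in ids:
        e = emb[labels == i]
        mu = e.mean(axis=0)
        means.append(mu)
        dist = np.linalg.norm(e - mu, axis=1)
        pull += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)  # variance term
    pull /= max(len(ids), 1)
    push, pairs = 0.0, 0
    for a in range(len(means)):
        for b in range(a + 1, len(means)):
            gap = np.linalg.norm(means[a] - means[b])
            push += np.maximum(delta_d - gap, 0.0) ** 2        # distance term
            pairs += 1
    return pull + (push / pairs if pairs else 0.0)
```

Pixels belonging to the same neurite are drawn within `delta_v` of their cluster center, while centers of different neurites are repelled beyond `delta_d`, so crossing fibers map to separable clusters in embedding space.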
☆ Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
Speech-driven methods for portraits are figuratively known as "Talkers"
because of their capability to synthesize speaking mouth shapes and facial
movements. Especially with the rapid development of the Text-to-Image (T2I)
models, AI-Generated Talking Heads (AGTHs) have gradually become an emerging
digital human media. However, challenges persist regarding the quality of these
talkers and the AGTHs they generate, and comprehensive studies addressing these
issues remain limited. To address this gap, this paper presents the largest
AGTH quality assessment dataset THQA-10K to date, which selects 12 prominent
T2I models and 14 advanced talkers to generate AGTHs for 14 prompts. After
excluding instances where AGTH generation is unsuccessful, the THQA-10K dataset
contains 10,457 AGTHs. Then, volunteers are recruited to subjectively rate the
AGTHs and label the corresponding distortion categories. In our analysis of
the subjective experimental results, we evaluate the performance of talkers in
terms of generalizability and quality, and also expose the distortions of
existing AGTHs. Finally, an objective quality assessment method based on the
first frame, Y-T slice and tone-lip consistency is proposed. Experimental
results show that this method can achieve state-of-the-art (SOTA) performance
in AGTH quality assessment. The work is released at
https://github.com/zyj-2000/Talker.
☆ EMedNeXt: An Enhanced Brain Tumor Segmentation Framework for Sub-Saharan Africa using MedNeXt V2 with Deep Supervision MICCAI 2025
Ahmed Jaheen, Abdelrahman Elsayed, Damir Kim, Daniil Tikhonov, Matheus Scatolin, Mohor Banerjee, Qiankun Ji, Mostafa Salem, Hu Wang, Sarim Hashmi, Mohammad Yaqub
Brain cancer affects millions worldwide, and in nearly every clinical
setting, doctors rely on magnetic resonance imaging (MRI) to diagnose and
monitor gliomas. However, the current standard for tumor quantification through
manual segmentation of multi-parametric MRI is time-consuming, requires expert
radiologists, and is often infeasible in under-resourced healthcare systems.
This problem is especially pronounced in low-income regions, where MRI scanners
are of lower quality and radiology expertise is scarce, leading to incorrect
segmentation and quantification. In addition, the number of acquired MRI scans
in Africa is typically small. To address these challenges, the BraTS-Lighthouse
2025 Challenge focuses on robust tumor segmentation in sub-Saharan Africa
(SSA), where resource constraints and image quality degradation introduce
significant shifts. In this study, we present EMedNeXt -- an enhanced brain
tumor segmentation framework based on MedNeXt V2 with deep supervision and
optimized post-processing pipelines tailored for SSA. EMedNeXt introduces three
key contributions: a larger region of interest, an improved nnU-Net v2-based
architectural skeleton, and a robust model ensembling system. Evaluated on the
hidden validation set, our solution achieved an average LesionWise DSC of
0.897, with average LesionWise NSDs of 0.541 and 0.84 at tolerances of 0.5 mm
and 1.0 mm, respectively.
comment: Submitted to the BraTS-Lighthouse 2025 Challenge (MICCAI 2025)
☆ BS-1-to-N: Diffusion-Based Environment-Aware Cross-BS Channel Knowledge Map Generation for Cell-Free Networks
Channel knowledge map (CKM) inference across base stations (BSs) is the key
to achieving efficient environment-aware communications. This paper proposes an
environment-aware cross-BS CKM inference method called BS-1-to-N based on a
generative diffusion model. To this end, we first design the BS location
embedding (BSLE) method tailored for cross-BS CKM inference to embed BS
location information in the feature vector of CKM. Further, we utilize the
cross- and self-attention mechanisms in the proposed BS-1-to-N model to
respectively learn the relationships between source and target BSs, as well as
that among target BSs. Therefore, given the locations of the source and target
BSs, together with the source CKMs as control conditions, cross-BS CKM
inference can be performed for an arbitrary number of source and target BSs.
Specifically, in architectures with massive distributed nodes like cell-free
networks, traditional methods of sequentially traversing each BS for CKM
construction are prohibitively costly. By contrast, the proposed BS-1-to-N
model is able to achieve efficient CKM inference for a target BS at any
potential location based on the CKMs of source BSs. This is achieved by
exploiting the fact that, within a given area, different BSs share the same
wireless environment, which gives rise to their respective CKMs. Therefore,
similar to
multi-view synthesis, CKMs of different BSs are representations of the same
wireless environment from different BS locations. By mining the implicit
correlation between CKM and BS location based on the wireless environment, the
proposed BS-1-to-N method achieves efficient CKM inference across BSs. We
provide extensive comparisons of CKM inference between the proposed BS-1-to-N
generative model and benchmark schemes, and present one use case study to
demonstrate its practical application for the optimization of BS deployment.
☆ EMORe: Motion-Robust 5D MRI Reconstruction via Expectation-Maximization-Guided Binning Correction and Outlier Rejection
We propose EMORe, an adaptive reconstruction method designed to enhance
motion robustness in free-running, free-breathing self-gated 5D cardiac
magnetic resonance imaging (MRI). Traditional self-gating-based motion binning
for 5D MRI often results in residual motion artifacts due to inaccuracies in
cardiac and respiratory signal extraction and sporadic bulk motion,
compromising clinical utility. EMORe addresses these issues by integrating
adaptive inter-bin correction and explicit outlier rejection within an
expectation-maximization (EM) framework, whereby the E-step and M-step are
executed alternately until convergence. In the E-step, probabilistic (soft) bin
assignments are refined by correcting misassignment of valid data and rejecting
motion-corrupted data to a dedicated outlier bin. In the M-step, the image
estimate is improved using the refined soft bin assignments. Validation in a
simulated 5D MRXCAT phantom demonstrated EMORe's superior performance compared
to standard compressed sensing reconstruction, showing significant improvements
in peak signal-to-noise ratio, structural similarity index, edge sharpness, and
bin assignment accuracy across varying levels of simulated bulk motion. In vivo
validation in 13 volunteers further confirmed EMORe's robustness, significantly
enhancing blood-myocardium edge sharpness and reducing motion artifacts
compared to compressed sensing, particularly in scenarios with controlled
coughing-induced motion. Although EMORe incurs a modest increase in
computational complexity, its adaptability and robust handling of bulk motion
artifacts significantly enhance the clinical applicability and diagnostic
confidence of 5D cardiac MRI.
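Schematically, an E-step with soft bin assignments and an explicit outlier bin resembles a responsibility computation. The sketch below assumes a Gaussian data-consistency likelihood and a uniform outlier density; `sigma` and `log_outlier_density` are illustrative hyperparameters, not values from the paper:

```python
import numpy as np

def e_step_soft_bins(residuals, sigma=1.0, log_outlier_density=-8.0):
    """Soft bin assignments with an explicit outlier bin.

    residuals: (N, B) data-consistency error of readout n against the
    current image estimate of bin b. Returns (N, B+1) responsibilities,
    where the last column is the outlier bin.
    """
    log_lik = -residuals / (2.0 * sigma ** 2)       # Gaussian, up to a constant
    outlier = np.full((residuals.shape[0], 1), log_outlier_density)
    logits = np.concatenate([log_lik, outlier], axis=1)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

A readout that fits no bin well (e.g. bulk motion from coughing) receives most of its weight in the outlier column, and the subsequent M-step reconstructs each bin's image from the reweighted data.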
☆ Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction ACM MM 2025
Image downscaling is critical for efficient storage and transmission of
high-resolution (HR) images. Existing learning-based methods focus on
performing downscaling within the sRGB domain, which typically suffers from
blurred details and unexpected artifacts. RAW images, with their unprocessed
photonic information, offer greater flexibility but lack specialized
downscaling frameworks. In this paper, we propose a wavelet-based recurrent
reconstruction framework that leverages the lossless property of the wavelet
transform to perform arbitrary-scale RAW image downscaling in
a coarse-to-fine manner, in which the Low-Frequency Arbitrary-Scale Downscaling
Module (LASDM) and the High-Frequency Prediction Module (HFPM) are proposed to
preserve structural and textural integrity of the reconstructed low-resolution
(LR) RAW images, alongside an energy-maximization loss to align high-frequency
energy between the HR and LR domains. Furthermore, we introduce the Realistic
Non-Integer RAW Downscaling (Real-NIRD) dataset, featuring a non-integer
downscaling factor of 1.3$\times$, and incorporate it with publicly available
datasets with integer factors (2$\times$, 3$\times$, 4$\times$) for
comprehensive benchmarking of arbitrary-scale image downscaling.
Extensive experiments demonstrate that our method outperforms existing
state-of-the-art competitors both quantitatively and visually. The code and
dataset will be released at https://github.com/RenYangSCU/ASRD.
comment: Accepted by ACM MM 2025
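The lossless property the framework relies on, namely that a wavelet transform splits an image into low- and high-frequency sub-bands without discarding information, can be checked with a single orthonormal Haar step (a generic illustration, unrelated to the paper's code):

```python
import numpy as np

def haar2d(x):
    """One orthonormal 2D Haar step: returns (LL, LH, HL, HH) sub-bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: perfect reconstruction of the input."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

Because the step is invertible, a downscaler can operate on the LL band (as a coarse LR estimate) and separately predict high-frequency content without any information being lost in the decomposition itself.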
☆ Single Image Rain Streak Removal Using Harris Corner Loss and R-CBAM Network
The problem of single-image rain streak removal goes beyond simple noise
suppression, requiring the simultaneous preservation of fine structural details
and overall visual quality. In this study, we propose a novel image restoration
network that effectively constrains the restoration process by introducing a
Corner Loss, which prevents the loss of object boundaries and detailed texture
information during restoration. Furthermore, we integrate a Residual
Convolutional Block Attention Module (R-CBAM) block into the encoder and
decoder to dynamically adjust the importance of features in both spatial and
channel dimensions, enabling the network to focus more effectively on regions
heavily affected by rain streaks. Quantitative evaluations conducted on the
Rain100L and Rain100H datasets demonstrate that the proposed method
significantly outperforms previous approaches, achieving a PSNR of 33.29 dB on
Rain100L and 26.16 dB on Rain100H.
comment: 21 pages
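The PSNR values reported above follow the standard definition; for reference, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

A gain of roughly 1 dB corresponds to about a 21% reduction in mean squared error, which is why differences of a few dB on Rain100L/H are considered substantial.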
♻ ☆ DeepForest: Sensing Into Self-Occluding Volumes of Vegetation With Aerial Imaging
Access to below-canopy volumetric vegetation data is crucial for
understanding ecosystem dynamics. We address the long-standing limitation of
remote sensing to penetrate deep into dense canopy layers. LiDAR and radar are
currently considered the primary options for measuring 3D vegetation
structures, while cameras can only extract the reflectance and depth of top
layers. Using conventional, high-resolution aerial images, our approach allows
sensing deep into self-occluding vegetation volumes, such as forests. It is
similar in spirit to the imaging process of wide-field microscopy, but can
handle much larger scales and strong occlusion. We scan focal stacks by
synthetic-aperture imaging with drones and reduce out-of-focus signal
contributions using pre-trained 3D convolutional neural networks with mean
squared error (MSE) as the loss function. The resulting volumetric reflectance
stacks contain low-frequency representations of the vegetation volume.
Combining multiple reflectance stacks from various spectral channels provides
insights into plant health, growth, and environmental conditions throughout the
entire vegetation volume. Compared with simulated ground truth, our correction
leads to ~7x average improvements (min: ~2x, max: ~12x) for forest densities
of 220-1680 trees/ha. In our field experiment, we achieved an MSE of
0.05 when comparing with the top-vegetation layer that was measured with
classical multispectral aerial imaging.
♻ ☆ Exploiting Scale-Variant Attention for Segmenting Small Medical Objects
Early detection and accurate diagnosis can predict the risk of malignant
disease transformation, thereby increasing the probability of effective
treatment. Identifying mild syndrome with small pathological regions serves as
an ominous warning and is fundamental in the early diagnosis of diseases. While
deep learning algorithms, particularly convolutional neural networks (CNNs),
have shown promise in segmenting medical objects, analyzing small areas in
medical images remains challenging. This difficulty arises due to information
losses and compression defects from convolution and pooling operations in CNNs,
which become more pronounced as the network deepens, especially for small
medical objects. To address these challenges, we propose a novel scale-variant
attention-based network (SvANet) for accurately segmenting small-scale objects
in medical images. The SvANet consists of scale-variant attention, cross-scale
guidance, Monte Carlo attention, and vision transformer, which incorporates
cross-scale features and alleviates compression artifacts for enhancing the
discrimination of small medical objects. Quantitative experimental results
demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%,
89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for
segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical
excision cells, retinal vasculature, and sperm cells, which occupy less than
1% of
the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and
SpermHealth datasets, respectively.
comment: 14 pages, 9 figures, under review
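The mean Dice coefficient reported above is computed per class from binary masks as follows:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

For objects covering under 1% of the image, Dice is far more informative than pixel accuracy, since predicting all-background already scores above 99% accuracy but 0 Dice.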