Image and Video Processing 11
☆ Multipath Exploitation in Highly Reflective Environments for Enhanced Microwave Imaging via Inverse Source Reconstruction
Multipath effects significantly influence the quality of microwave imaging in highly reflective environments, while the physical measurement aperture size constrains resolution. It is shown that, by exploiting multipath reflections, improved resolution can be achieved while maintaining acceptable artifact levels. Based on image theory, strong scattered fields from an ideal reflection plane can be represented by virtual image sources. Using a single-frequency inverse source solver, the spatially distributed original and image sources are reconstructed and separated, which allows the imaging algorithm to be applied to each set independently. The coherent combination of both sets of sources, together with appropriate phase correction, results in an effective aperture expansion that yields superior resolution. Furthermore, this separation strategy significantly mitigates interference artifacts. Simulation results, supported by theoretical analysis and a comparison with a ray-tracing-enhanced backprojection algorithm, are presented to verify the effectiveness of the proposed approach.
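The image-theory step described above can be illustrated with a minimal numpy sketch (coordinates are made up; this is not code from the paper): a point source near an ideal reflecting plane is equivalent to the original source plus a virtual image source at the mirrored position.

```python
import numpy as np

def mirror_source(pos, plane_point, plane_normal):
    """Mirror a source position across an ideal reflection plane.

    pos, plane_point: 3-vectors; plane_normal: normal of the plane.
    The virtual image source sits at the mirrored position; for a
    perfect conductor the reflected field also carries a sign flip.
    """
    p = np.asarray(pos, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    d = np.dot(p - np.asarray(plane_point, dtype=float), n)
    return p - 2.0 * d * n

# A source 0.3 m in front of a wall at x = 0 has its image 0.3 m behind it.
src = np.array([0.3, 1.0, 0.5])
img = mirror_source(src, plane_point=[0.0, 0.0, 0.0],
                    plane_normal=[1.0, 0.0, 0.0])
```

Treating the reconstructed image sources as real radiators on the far side of the plane is what widens the effective aperture.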
☆ Phase-Corrected Near-Field Microwave Imaging via Inverse Source Reconstruction with Modulated Signals
An inverse source reconstruction (ISR)-based 3-D near-field (NF) passive radar microwave imaging method utilizing modulated signals is presented. The modulated signals from a non-cooperative transmitter are scattered by the targets of interest and captured by a fixed reference antenna together with an NF scanning probe at different positions. By normalizing with the reference signals, spatial coherence of the NF observations is obtained, and a single-frequency inverse source solver is subsequently utilized for ISR and image generation. A corresponding phase correction method is proposed for the coherent superposition of multi-frequency images and verified through simulations. In addition, it is shown that for realistic narrowband signals, an incoherent imaging approach is sufficient. The presented technical scheme is validated using a planar scanning system in a typical office room, where software-defined radios are employed for the transmission and reception of narrowband orthogonal frequency-division multiplexing signals at Wi-Fi operating frequencies. With the aid of background subtraction and reference signals, images of a mannequin placed in the office room are successfully obtained.
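The reference-normalization step can be illustrated with a toy numpy simulation (all channel values below are invented): dividing each probe measurement by the simultaneously captured reference signal cancels the unknown transmitter phase, restoring coherence across scan positions up to a common constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos = 8

# Unknown transmitter phase drifts between scan positions (non-cooperative source).
tx_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, n_pos))

# Hypothetical fixed channel responses: target-to-probe (varies with probe
# position) and transmitter-to-reference (constant: the reference antenna
# does not move).
probe_channel = np.exp(1j * np.linspace(0.0, 1.0, n_pos))
ref_channel = 0.7 * np.exp(1j * 0.4)

probe = probe_channel * tx_phase   # what the scanning probe records
ref = ref_channel * tx_phase       # what the reference antenna records

# Normalizing by the reference cancels the unknown phase, leaving the
# probe-position-dependent response times a common constant.
coherent = probe / ref
```

The residual constant factor is harmless for imaging, since it scales every observation identically.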
☆ deSEO: Physics-Aware Dataset Creation for High-Resolution Satellite Image Shadow Removal
Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Phillipp Fanta-Jende, Jasmin Lampert, Radu Timofte, Marco Körner
Shadows cast by terrain and tall structures remain a major obstacle for high-resolution satellite image analysis, degrading classification, detection, and 3D reconstruction performance. Public resources offering geometry-consistent paired shadow/shadow-free satellite imagery are essentially missing, and most Earth-observation datasets are designed for shadow detection or 3D modelling rather than removal. Existing deep shadow-removal datasets either target ground-level or aerial scenes or rely on unpaired and weakly supervised formulations rather than explicit satellite pairs. We address this gap with deSEO, a geometry-aware and physics-informed methodology that, to the best of our knowledge, is the first to derive paired supervision for satellite shadow removal from the S-EO shadow detection dataset through a fully replicable pipeline. For each tile, deSEO selects a minimally shadowed acquisition as a weak reference and pairs it with shadowed counterparts using temporal and geometric filtering, Jacobian-based orientation normalisation, and LoFTR-RANSAC registration. A per-pixel validity mask restricts learning to reliably aligned regions, enabling supervision despite residual off-nadir parallax. In addition to this paired dataset, we develop a DSM-aware deshadowing model that combines residual translation, perceptual objectives, and mask-constrained adversarial learning. In contrast, a direct adaptation of a UAV-based SRNet/pix2pix architecture fails to converge under satellite viewpoint variability. Our model consistently reduces the visual impact of cast shadows across diverse illumination and viewing conditions, achieving improved structural and perceptual fidelity on held-out scenes. deSEO therefore provides the first reproducible, geometry-aware paired dataset and baseline for shadow removal in satellite Earth observation.
comment: 8 pages, 6 figures, 5 tables. Accepted in the annals track at the ISPRS 2026 Congress. Code and materials: https://github.com/AIT-Assistive-Autonomous-Systems/deSEO
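The role of the per-pixel validity mask can be sketched as a masked reconstruction loss (an illustrative numpy toy with made-up array values, not deSEO's actual training objective):

```python
import numpy as np

def masked_l1(pred, target, valid_mask):
    """L1 loss restricted to reliably aligned pixels.

    valid_mask is 1 where the registered pair is trustworthy and 0 where
    residual off-nadir parallax makes supervision unreliable.
    """
    mask = valid_mask.astype(float)
    return np.sum(np.abs(pred - target) * mask) / np.maximum(mask.sum(), 1.0)

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [3.0, 0.0]])
mask = np.array([[1, 0], [1, 0]])  # second column is misaligned: excluded
```

Masking the loss (rather than the inputs) lets the network still see full tiles while gradients flow only through trusted regions.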
☆ Dante: An Open Source Model Pre-Training and Fine-Tuning Tool for the Dafne Federated Framework for Medical Image Segmentation
Adapting pre-trained deep learning segmentation models to new clinical domains is a persistent challenge in medical image analysis, particularly when annotated data at the target site are scarce. Parameter-efficient fine-tuning strategies offer a principled solution by selectively updating a controlled subset of model parameters, preserving previously acquired representations while reducing the risk of overfitting on small datasets. This paper introduces DAfNe TrainEr (Dante), an open-source module that integrates with the Dafne federated segmentation ecosystem as a dedicated training and fine-tuning backend. Dante supports training from scratch with automatic architecture configuration, configurable layer freezing schedules, and Low-Rank Adaptation (LoRA) extended to N-dimensional convolutional layers through channel-wise factorization. To validate the module, Gradual Unfreezing (GU) and LoRA are assessed across realistic cross-domain MRI transfer scenarios covering abdominal organ segmentation and brain white matter lesion segmentation, under full-data and few-shot conditions. GU reduced the epochs required to reach 85% of peak performance by up to 63.6% compared to training from scratch, while LoRA achieved Dice Similarity Coefficients up to 0.957 in data-rich scenarios. Both strategies outperformed the baseline across all tested domains, with gains amplified by richer pre-training datasets. These results validate Dante as a domain-agnostic fine-tuning module for medical image segmentation in real clinical deployment conditions. Dante code is available at https://github.com/dafne-imaging/dafne-torch-trainer, while the Dafne ecosystem project is available at https://github.com/dafne-imaging.
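The LoRA idea for convolutional layers can be sketched in numpy (a generic flattened-kernel variant for illustration; Dante's channel-wise factorization differs in detail, and the shapes here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_conv_update(weight, rank, alpha=1.0):
    """Attach a low-rank update to an N-D convolution kernel.

    The kernel (out_ch, in_ch, *spatial) is flattened to 2-D so the
    standard LoRA update W + (alpha / rank) * B @ A applies; only the
    small factors A and B would be trained, the base weight is frozen.
    """
    out_ch = weight.shape[0]
    fan_in = int(np.prod(weight.shape[1:]))
    A = rng.normal(0.0, 0.01, (rank, fan_in))  # trainable down-projection
    B = np.zeros((out_ch, rank))               # trainable up-projection, zero init
    delta = (alpha / rank) * (B @ A)
    return (weight.reshape(out_ch, fan_in) + delta).reshape(weight.shape)

w = rng.normal(size=(16, 8, 3, 3, 3))  # a 3-D conv kernel (N-dimensional case)
w_adapted = lora_conv_update(w, rank=4)
# Because B starts at zero, the adapted kernel initially equals the
# pre-trained one, so fine-tuning starts from the original behavior.
```

The parameter count of the update, rank * (fan_in + out_ch), is tiny compared to the full kernel, which is what limits overfitting on small clinical datasets.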
♻ ☆ StreamGuard: Exploring a 5G Architecture for Efficient, Quality of Experience-Aware Video Conferencing
Video conferencing over 5G is increasingly prevalent, yet its Quality of Experience (QoE) often degrades under limited radio resources. This has two causes: 5G networks must serve many users, and interactive traffic requires careful handling. Motivated by the insight that different subflows within an interactive session have a disproportionate effect on QoE, we present the design and implementation of StreamGuard, a practical 5G architecture for subflow-level, QoE-aware prioritization. StreamGuard forms a closed control loop with three components: (1) a monitor in the Radio Access Network (RAN) that uses deep packet inspection to infer QoE and RAN state, (2) a controller that selects prioritization actions to balance QoE and fairness, and (3) a marking module that applies these decisions by marking packets to steer subflows into appropriate priority queues. StreamGuard further shapes application behavior, via mechanisms including selective subflow dropping and probe-based rate control, to align it with radio constraints. Implemented in a real 5G testbed, StreamGuard achieves a superior QoE-fairness tradeoff compared to vanilla 5G and prior state-of-the-art approaches, improving QoE by up to 70% at comparable background throughput or preserving up to 2x higher background throughput at similar QoE.
comment: 31 pages, 35 figures
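The controller's core tradeoff can be caricatured as a tiny greedy budget allocator (a toy sketch with invented subflow numbers, not StreamGuard's actual algorithm): spend a limited priority-bandwidth budget on the subflows with the highest QoE impact per unit rate.

```python
def prioritize(subflows, budget):
    """Choose which subflows get the priority queue.

    subflows: list of (name, rate_mbps, qoe_impact) tuples.
    Greedy by impact density (impact / rate); anything left over stays
    in the best-effort queue. Numbers below are purely illustrative.
    """
    chosen = []
    for name, rate, impact in sorted(subflows, key=lambda s: -s[2] / s[1]):
        if rate <= budget:
            chosen.append(name)
            budget -= rate
    return chosen

# Audio matters far more per byte than bulk video in an interactive session.
flows = [("audio", 0.1, 10.0), ("video", 4.0, 8.0), ("screenshare", 1.0, 6.0)]
print(prioritize(flows, budget=1.5))  # audio and screenshare fit the budget
```

A real controller would additionally weigh fairness toward background users, which is the other half of the tradeoff the abstract describes.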
♻ ☆ Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute
Subject-driven video generation (SDV-Gen) aims to produce videos of a specific subject by adapting a pretrained video model, enabling personalized and application-driven content creation. To achieve this goal, per-subject tuning methods require approximately 200 A100 GPU hours to generate a customized video, whereas zero-shot methods avoid per-subject tuning but typically rely on millions of subject-video pairs for supervision, incurring massive network fine-tuning costs (10K-200K A100 GPU hours). We propose a data- and compute-efficient zero-shot SDV-Gen framework that avoids test-time per-subject tuning and the use of large-scale subject-video pairs. Our key idea decomposes SDV-Gen into (i) identity injection, learned from subject-image pairs, and (ii) motion-awareness preservation, maintained by a small set of arbitrary videos. We optimize the two tasks with stochastic switching, using random reference-frame sampling and image-token dropout to prevent trivial first-frame copying. Our gradient analysis shows that the two objectives rapidly evolve toward nearly orthogonal update subspaces, explaining the stable optimization. Using CogVideoX-5B, we adapt a single model with 200K subject-image pairs and 4,000 arbitrary videos in 288 A100 GPU hours. This yields about 1% of the compute of prior zero-shot baselines (i.e., 0.4% of VACE and 2.8% of Phantom) while using no subject-video pairs, yet remains competitive in subject fidelity and motion quality. We show that the same recipe transfers to Wan 2.2-5B.
comment: [v3 updated] Project Page : https://carpedkm.github.io/projects/disentangled_sub/index.html
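The stochastic-switching recipe can be sketched as follows (an illustrative toy with assumed switching probability, dropout rate, and token counts, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_step(p_identity=0.5, drop_prob=0.3, n_tokens=10):
    """One stochastically switched training step.

    With probability p_identity, train the identity-injection objective on
    a subject-image pair, dropping image tokens at random so the model
    cannot trivially copy the reference. Otherwise, train motion
    preservation on an arbitrary video with a random reference frame.
    All constants here are assumptions for illustration.
    """
    if rng.random() < p_identity:
        keep = rng.random(n_tokens) >= drop_prob  # image-token dropout mask
        return "identity", keep
    ref_frame = int(rng.integers(0, 49))  # random reference-frame index
    return "motion", ref_frame
```

Alternating the two objectives step-by-step, rather than mixing them in one loss, is what allows the gradient subspaces of the two tasks to be analyzed separately.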
♻ ☆ Context- and Pixel-aware Large Language Model for Video Quality Assessment ICIP 2026
Video quality assessment (VQA) is a challenging research topic with broad applications. Traditional hand-crafted and discriminative learning-based VQA models mainly focus on pixel-level distortions and lack contextual understanding, while recent multimodal large language models (MLLMs) either lack sensitivity to small distortions or treat quality scoring and description as separate tasks. To address these shortcomings, we introduce CP-LLM: a Context- and Pixel-aware Large Language Model. CP-LLM is a novel multimodal LLM architecture featuring dual vision encoders designed to independently analyze perceptual quality at both high-level (video context) and low-level (pixel distortion) granularity, along with a language decoder that subsequently reasons about the interplay between these aspects. This design enables CP-LLM to simultaneously produce robust quality scores and interpretable quality descriptions, with enhanced sensitivity to pixel distortions (e.g., compression artifacts). Experimental results demonstrate that CP-LLM achieves state-of-the-art cross-dataset performance on VQA benchmarks and superior robustness to pixel distortions.
comment: Accepted to ICIP 2026
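The dual-branch design can be sketched at the tensor level (token counts and feature widths below are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_encoder_tokens(context_tokens, pixel_tokens):
    """Concatenate the two encoders' token streams for the decoder.

    A context encoder summarizes high-level video semantics while a
    separate pixel encoder preserves low-level distortion cues; the
    language decoder then attends over both streams jointly.
    """
    return np.concatenate([context_tokens, pixel_tokens], axis=0)

context = rng.normal(size=(32, 256))  # high-level video-context tokens
pixels = rng.normal(size=(64, 256))   # low-level pixel-distortion tokens
tokens = dual_encoder_tokens(context, pixels)
```

Keeping the two streams separate until the decoder is what lets small distortions survive without being averaged away by semantic pooling.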
♻ ☆ Equivariance2Inverse: A Practical Self-Supervised CT Reconstruction Method Benchmarked on Real, Limited-Angle, and Blurred Data
Deep learning has shown impressive results in reducing noise and artifacts in X-ray computed tomography (CT) reconstruction. Self-supervised CT reconstruction methods are especially appealing for real-world applications because they require no ground truth training examples. However, these methods involve a simplified X-ray physics model during training, which may make inaccurate assumptions, for example, about scintillator blurring, the scanning geometry, or the distribution of the noise. As a result, they can be less robust to real-world imaging circumstances. In this paper, we review the model assumptions of six recent self-supervised CT reconstruction methods. Based on this review, we combine concepts from the Robust Equivariant Imaging and Sparse2Inverse methods into a new self-supervised CT reconstruction method, Equivariance2Inverse, that is robust to scintillator blurring and limited-angle data. We benchmark Equivariance2Inverse and the existing methods on the real-world 2DeteCT dataset and on synthetic data with and without scintillator blurring and a limited-angle scanning geometry. The results of our benchmark show that methods that assume pixel-wise independent noise do not perform well on data with scintillator blurring. Moreover, they show that when the distribution of objects is rotationally invariant, this invariance can be used to reduce artifacts in limited-angle reconstructions.
comment: 13 pages, 4 figures
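The equivariant-imaging idea exploited above can be sketched in a few lines (a conceptual toy with an identity system as a sanity check, not the benchmarked method): if object statistics are rotation invariant, a rotated reconstruction should be recoverable from its own simulated measurements.

```python
import numpy as np

def equivariance_loss(recon_fn, forward_op, x):
    """Self-supervised equivariance penalty (conceptual sketch).

    Rotate a reconstruction, push it through the simulated measurement
    operator, reconstruct again, and penalize the mismatch. The rotated
    copy exercises view angles the limited-angle scan never measured.
    """
    x_rot = np.rot90(x)            # exploit rotational invariance of objects
    y_rot = forward_op(x_rot)      # simulate the (limited-angle) measurement
    return float(np.mean((recon_fn(y_rot) - x_rot) ** 2))

# Sanity check: with an identity system the penalty vanishes.
x = np.arange(16.0).reshape(4, 4)
loss = equivariance_loss(lambda y: y, lambda z: z, x)
```

In a real pipeline forward_op would be a limited-angle Radon transform with blurring, and recon_fn the trainable network; here both are identities purely to show the mechanics.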
♻ ☆ Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy
Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high-speed in vivo optical biopsy in handheld scenarios. However, Lissajous scanning traces a resonant trajectory and visits only a subset of pixels in each frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining computational efficiency suitable for clinical deployment.
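The structured holes left by a fast Lissajous scan are easy to visualize in numpy (the frequencies, grid size, and sample counts below are illustrative, not the device's parameters):

```python
import numpy as np

def lissajous_mask(fx, fy, n_samples, size=64, phase=np.pi / 2):
    """Pixels visited by one frame of a Lissajous scan.

    The resonant trajectory x = sin(2*pi*fx*t), y = sin(2*pi*fy*t + phase)
    is rasterized onto a size x size grid. Fewer samples per frame (i.e.
    a higher frame rate) leave more unvisited pixels between the lobes.
    """
    t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    x = np.sin(2 * np.pi * fx * t)
    y = np.sin(2 * np.pi * fy * t + phase)
    ix = np.clip(np.round((x + 1) / 2 * (size - 1)).astype(int), 0, size - 1)
    iy = np.clip(np.round((y + 1) / 2 * (size - 1)).astype(int), 0, size - 1)
    mask = np.zeros((size, size), dtype=bool)
    mask[iy, ix] = True
    return mask

fast = lissajous_mask(fx=13, fy=11, n_samples=2_000)   # high frame rate
slow = lissajous_mask(fx=13, fy=11, n_samples=40_000)  # slow scan
print(fast.mean(), slow.mean())  # fraction of pixels covered per frame
```

Filling the complement of this mask from temporal context is exactly the restoration task the benchmark poses.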
♻ ☆ Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures
Today's high-performance architectures are increasingly constrained by data movement latency and energy overhead, as the slowdown of single-core performance scaling coincides with the rise of highly data-intensive workloads. In-memory architectures have emerged as a complementary solution to conventional von Neumann systems by alleviating memory bandwidth bottlenecks, exploiting massive concurrency, and mitigating excessive data movement between memory and processing units. This study proposes a parallel in-memory stochastic computing (SC) architecture that implements an end-to-end computation pipeline within Magnetic Tunnel Junction (MTJ)-based memory augmented with logic-in-memory (LIM) capabilities. By leveraging the inherent stochasticity and write-read characteristics of MTJ devices, the proposed architecture enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams, eliminating the need for energy-intensive external random number generation circuitry. These bit-streams are processed by parallel stochastic arithmetic units integrated directly within the memory arrays to efficiently implement core arithmetic and transcendental functions with minimal hardware complexity and inherent noise tolerance. The resulting stochastic outputs can be either reused as an input of future stochastic processing or converted back to binary form using parallel accumulation mechanisms and stored in the MTJ memory. By tightly integrating data storage, bit-stream generation, and computation within a unified in-memory fabric, the proposed design maximizes memory-level parallelism while substantially minimizing data movement.
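The bit-stream arithmetic at the heart of SC can be sketched in numpy (a software emulation only; in the proposed architecture the randomness comes from MTJ stochasticity, not a software generator): with independent unipolar streams, a bitwise AND multiplies the encoded probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, length):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit-stream."""
    return rng.random(length) < p

n = 100_000
a = to_bitstream(0.6, n)
b = to_bitstream(0.5, n)

# For independent streams, P(a AND b) = P(a) * P(b): a single AND gate
# per stream pair performs multiplication, approximating 0.6 * 0.5 = 0.3.
product = (a & b).mean()
```

This single-gate multiplier, replicated across memory arrays, is what gives SC its minimal hardware complexity and graceful tolerance to bit-flip noise.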
♻ ☆ ARC: Consistent, Low-Latency Delivery via Receiver-Side Scheduling
Applications such as cloud gaming, video streaming, telemetry, ML inference, and data transfer provide a better experience when data is released at the receiver with timing that reflects how the data entered the sender. In practice, network delay variation and recovery dynamics at the receiver distort this timing even when transports deliver all packets correctly, producing visible jitter, stalls, and unstable playback. Many such applications operate best when delivery preserves this timing behavior and its implied order; out-of-order or irregular delivery can significantly degrade performance even when all data eventually arrives. We present a lightweight receiver-side release scheduling protocol, Adaptive Release Control (ARC), that restores this timing at the receiver. ARC releases recovered data in a manner that follows the sender's timing, maintaining ordering and limiting reordering when necessary, while producing smooth delivery with minimal added latency given network conditions. It operates entirely on the receiver clock and requires no feedback, synchronization, or changes to the underlying transport. As an example, we integrate ARC into LT3, a network-layer system currently deployed as a software overlay that forwards traffic without altering the transport protocols it carries, where ARC functions as an independent module that regulates release timing for forwarded data. Evaluating LT3 with ARC on a cloud-gaming workload shows that the protocol removes virtually all large jitter excursions and yields release intervals that closely match the sender's timing, translating into improved perceptual smoothness. Broader latency improvements arise from the behavior of the full LT3 system. The benefits of ARC extend to transport protocols carried over LT3, including TCP, QUIC, WebRTC, UDP, and RTP, as preserving sender timing improves their behavior across a wide range of conditions.
comment: 30 pages, 6 figures, 1 table
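The release-scheduling idea can be sketched with a toy pacer (timestamps below are invented, and this is not ARC's actual algorithm): each packet is released no earlier than a fixed slack plus its sender-relative timestamp, computed entirely on the receiver clock.

```python
def release_times(sender_ts, arrival_ts, slack):
    """Schedule receiver-side release so spacing mirrors sender timing.

    Each packet's target is first_arrival + slack + (its sender timestamp
    relative to the first packet). A packet arriving after its target is
    released on arrival; releases are also kept monotone (in order).
    All times share the receiver clock; no sender feedback is needed.
    """
    base = arrival_ts[0] + slack - sender_ts[0]
    out, prev = [], 0
    for s, a in zip(sender_ts, arrival_ts):
        r = max(base + s, a, prev)  # never before arrival, never out of order
        out.append(r)
        prev = r
    return out

# Packets sent every 10 ms but arriving with jitter are released smoothly.
print(release_times([0, 10, 20, 30], [50, 68, 71, 95], slack=20))
```

The slack parameter trades a small fixed delay for headroom to absorb jitter; too little slack forces late packets to be released on arrival, reintroducing irregular spacing.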