Image & Video Restoration

Orthogonal Decoupling Contrastive Regularization: Toward Uncorrelated Feature Decoupling for Unpaired Image Restoration

Author: Zhongze Wang, Jingchao Peng, Haitao Zhao, Lujian Yao, Kaijie Zhao

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Unpaired image restoration (UIR) is a significant task due to the difficulty of acquiring paired degraded/clear images with identical backgrounds. In this paper, we propose a novel UIR method based on the assumption that an image contains both degradation-related features, which affect the level of degradation, and degradation-unrelated features, such as texture and semantic information. Our method aims to ensure that the degradation-related features of the restoration result closely resemble those of the clear image, while the degradation-unrelated features align with the input degraded image. Specifically, we introduce a Feature Orthogonalization Module optimized on Stiefel manifold to decouple image features, ensuring feature uncorrelation. A task-driven Depth-wise Feature Classifier is proposed to assign weights to uncorrelated features based on their relevance to degradation prediction. To avoid the dependence of the training process on the quality of the clear image in a single pair of input data, we propose to maintain several degradation-related proxies describing the degradation level of clear images to enhance the model’s robustness. Finally, a weighted PatchNCE loss is introduced to pull degradation-related features in the output image toward those of clear images, while bringing degradation-unrelated features close to those of the degraded input.
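The weighted PatchNCE objective described above can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical rendering of per-patch weighted contrastive learning, assuming pre-extracted patch features and per-patch relevance weights; the function name, tensor shapes, and temperature are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_patch_nce(feat_out, feat_ref, weights, temperature=0.07):
    """InfoNCE over spatial patches, with a per-patch weight on each term.

    feat_out: (N, C) patch features from the restored image (queries).
    feat_ref: (N, C) patch features from the reference image at the same
              locations (positives); all other locations serve as negatives.
    weights:  (N,) relevance weights, e.g. produced by a feature classifier.
    """
    q = F.normalize(feat_out, dim=1)
    k = F.normalize(feat_ref, dim=1)
    logits = q @ k.t() / temperature                 # (N, N): diagonal = positives
    targets = torch.arange(q.size(0), device=q.device)
    per_patch = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_patch).sum() / weights.sum().clamp_min(1e-8)
```

In the setting described above, one such loss would pull degradation-related features toward those of clear images and another would pull degradation-unrelated features toward those of the degraded input, with the weights supplied by the Depth-wise Feature Classifier.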

Paper
Image Super-Resolution

Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution

Author: Ao Li, Le Zhang, Yun Liu, Ce Zhu

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. To tackle the inherent intricacies of transformer structures, we introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency, incorporating adaptive dual clipping and boundary refinement. To further amplify the versatility of our proposed approach, we extend our PTQ strategy to function as a general quantization method for transformer-based SISR techniques. Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods, both in full-precision and quantization scenarios. These results underscore the efficacy and universality of our PTQ strategy.
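As a rough illustration of the high-frequency prior discussed above, the sketch below approximates a high-frequency enhancement residual block by treating the residual of a local average as the high-frequency component. It is a simplified stand-in for CRAFT's HFERB; the block structure and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class HighFreqResidualBlock(nn.Module):
    """Illustrative high-frequency enhancement residual block (not CRAFT's exact HFERB).

    High frequencies are approximated as the residual between the input and a
    locally averaged (low-pass) version, then refined by a small conv branch.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.low_pass = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        high = x - self.low_pass(x)      # crude high-frequency estimate
        return x + self.refine(high)     # residual enhancement

# y = HighFreqResidualBlock(64)(torch.randn(1, 64, 48, 48))
```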

Paper
Code
Image & Video Enhancement

High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network

Author: Feng Zhang, Haoyou Deng, Zhiqiang Li, Lida Li, Bin Xu, Qingbo Lu

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Photo enhancement plays a crucial role in augmenting the visual aesthetics of a photograph. In recent years, photo enhancement methods have either focused on enhancement performance, producing powerful models that cannot be deployed on edge devices, or prioritized computational efficiency, resulting in inadequate performance for real-world applications. To this end, this paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. This approach enables fast processing of high-resolution images while also achieving excellent performance. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images, while incorporating two distinct weight fusion strategies to achieve coarse global image enhancement. To implement this strategy, we design a spatial-frequency transformer weight predictor that effectively extracts the desired distinct weights by leveraging frequency features. Additionally, we apply local Laplacian filters to adaptively refine edge details in high-frequency components. After meticulously redesigning the network structure and transformer model, LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, with 4K resolution images processed in just 13 ms on a single GPU. Extensive experimental results on two benchmark datasets further show that the proposed approach performs favorably compared to state-of-the-art methods. The source code will be made publicly available at https://github.com/fengzhang427/LLF-LUT.
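The image-adaptive 3D LUT step can be sketched with `torch.nn.functional.grid_sample`, which performs the trilinear lookup. The sketch below assumes RGB inputs in [0, 1] and a LUT layout of our own choosing, and omits the weight predictor and Laplacian-pyramid parts described above.

```python
import torch
import torch.nn.functional as F

def apply_3d_lut(img, lut):
    """Apply a trilinearly interpolated 3D LUT to an RGB image.

    img: (N, 3, H, W) with values in [0, 1].
    lut: (3, S, S, S) lookup table; lut[c, b, g, r] gives the output value of
         channel c for input colour (r, g, b) -- an illustrative convention.
    """
    n, _, h, w = img.shape
    # grid_sample expects coordinates in [-1, 1], ordered (x, y, z) = (r, g, b) here
    grid = img.permute(0, 2, 3, 1) * 2 - 1             # (N, H, W, 3)
    grid = grid.view(n, 1, h, w, 3)                    # (N, D=1, H, W, 3)
    lut = lut.unsqueeze(0).expand(n, -1, -1, -1, -1)   # (N, 3, S, S, S)
    out = F.grid_sample(lut, grid, mode="bilinear", align_corners=True)
    return out.view(n, 3, h, w)

# enhanced = apply_3d_lut(torch.rand(1, 3, 256, 256), torch.rand(3, 33, 33, 33))
```

In an image-adaptive setting, the LUT entries (or weights over several basis LUTs) would be predicted from the downsampled input rather than fixed, as the abstract describes.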

Paper
Code
Image & Video Enhancement

Generalized Task-Driven Medical Image Quality Enhancement With Gradient Promotion

Author: Dong Zhang, Kwang-Ting Cheng

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Thanks to the recent achievements in task-driven image quality enhancement (IQE) models like ESTR (Liu et al. 2023), the image enhancement model and the visual recognition model can mutually enhance each other's quantitative performance while producing high-quality processed images that are perceivable by our human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact: different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only the parameters of the image enhancement model, using the gradients of both the visual recognition model and the image enhancement model when these gradients are aligned in the same direction, as measured by their cosine similarity. When the gradients of the two sub-models are not aligned, GradProm uses only the gradient of the image enhancement model to update its parameters. Theoretically, we prove that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrate the superior performance of GradProm over existing state-of-the-art methods.
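A minimal sketch of the gating rule described above, assuming the recognition loss is computed on the enhanced image so that its gradient flows back into the enhancement model; the per-parameter cosine test and the helper names are illustrative, not the released GradProm code.

```python
import torch
import torch.nn.functional as F

def gradprom_step(enhancer, enhance_loss, recog_loss, optimizer):
    """Cosine-similarity-gated update of the enhancement model (illustrative sketch).

    The auxiliary recognition gradient is added only where it points in the same
    direction as the enhancement gradient; otherwise only the enhancement
    gradient is used, so the recognition model never biases the update direction.
    """
    params = [p for p in enhancer.parameters() if p.requires_grad]
    g_enh = torch.autograd.grad(enhance_loss, params, retain_graph=True, allow_unused=True)
    g_rec = torch.autograd.grad(recog_loss, params, allow_unused=True)

    optimizer.zero_grad()
    for p, ge, gr in zip(params, g_enh, g_rec):
        if ge is None:
            continue
        if gr is not None and F.cosine_similarity(ge.flatten(), gr.flatten(), dim=0) > 0:
            p.grad = ge + gr        # aligned: promote with the auxiliary gradient
        else:
            p.grad = ge.clone()     # conflicting: keep only the enhancement gradient
    optimizer.step()
```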

Paper
Image & Video Enhancement

EventAid: Benchmarking Event-Aided Image/Video Enhancement Algorithms With Real-Captured Hybrid Dataset

Author: Peiqi Duan, Boyu Li, Yixin Yang, Hanyue Lou, Minggui Teng, Xinyu Zhou

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Event cameras are an emerging imaging technology that offers advantages over conventional frame-based imaging sensors in dynamic range and sensing speed. Complementing the rich texture and color perception of traditional image frames, the hybrid camera system of event and frame-based cameras enables high-performance imaging. With the assistance of event cameras, high-quality image/video enhancement methods make it possible to break the limits of traditional frame-based cameras, especially the exposure time, resolution, dynamic range, and frame rate limits. This paper focuses on five event-aided image and video enhancement tasks (i.e., event-based video reconstruction, event-aided high frame rate video reconstruction, image deblurring, image super-resolution, and high dynamic range image reconstruction), and provides an analysis of the effects of different event properties, a real-captured and ground-truth-labeled benchmark dataset, a unified benchmarking of state-of-the-art methods, and an evaluation of two mainstream event simulators. In detail, this paper collects a real-captured evaluation dataset, EventAid, for five event-aided image/video enhancement tasks, using an "Event-RGB" multi-camera hybrid system and taking into account scene diversity and spatiotemporal synchronization. We further perform quantitative and visual comparisons of state-of-the-art algorithms, provide a controlled experiment to analyze the performance limit of event-aided image deblurring methods, and discuss open problems to inspire future research.

Paper
Code
Image & Video Enhancement

Learning With Self-Calibrator for Fast and Robust Low-Light Image Enhancement

Author: Long Ma, Tengyu Ma, Chengpei Xu, Jinyuan Liu, Xin Fan, Zhongxuan Luo

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Convolutional Neural Networks (CNNs) have shown significant success in the low-light image enhancement task. However, most existing works encounter challenges in balancing quality and efficiency simultaneously. This limitation hinders practical applicability in real-world scenarios and downstream vision tasks. To overcome these obstacles, we propose a Self-Calibrated Illumination (SCI) learning scheme, introducing a new perspective to boost the model's capability. Based on a weight-sharing illumination estimation process, we construct an embedded self-calibrator to accelerate stage-level convergence, so that only a single basic block is needed for inference, which drastically reduces the computation cost. Additionally, by introducing an additivity condition on the basic block, we acquire a reinforced version dubbed SCI++, which disentangles the relationship between the self-calibrator and the illumination estimator, providing a more interpretable and effective learning paradigm with faster convergence and better stability. We assess the proposed enhancers on standard benchmarks and in-the-wild datasets, confirming that they can restore clean images from diverse scenes with higher quality and efficiency. Verification on different levels of low-light vision tasks further shows the applicability of our method compared with other approaches.
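A toy PyTorch sketch of the weight-sharing and self-calibration idea, under the assumption of a Retinex-style division by the estimated illumination; the network sizes, the calibration rule, and the stage count are assumptions rather than the released SCI/SCI++ code.

```python
import torch
import torch.nn as nn

class SCISketch(nn.Module):
    """Toy sketch of self-calibrated illumination learning (not the released SCI code).

    A single weight-shared illumination estimator is unrolled for a few stages
    during training; a calibrator re-maps each stage's input toward the original
    low-light image so that, at inference, one pass of the estimator suffices.
    """
    def __init__(self, channels=3, stages=3):
        super().__init__()
        self.stages = stages
        self.estimator = nn.Sequential(            # shared across stages
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, channels, 3, padding=1), nn.Sigmoid(),
        )
        self.calibrator = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, low):
        outputs, x = [], low
        for _ in range(self.stages):
            illum = self.estimator(x).clamp_min(1e-3)
            restored = low / illum                 # Retinex-style enhancement
            outputs.append(restored)
            x = low + self.calibrator(restored)    # self-calibrated input for next stage
        return outputs                             # inference would use only the first stage

# outs = SCISketch()(torch.rand(1, 3, 64, 64))
```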

Paper
Code
Image & Video Enhancement

Burst Image Restoration and Enhancement

Author: Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Burst Image Restoration aims to reconstruct a high-quality image by efficiently combining complementary inter-frame information. However, it is quite challenging since individual burst images often have inter-frame misalignments that usually lead to ghosting and zipper artifacts. To mitigate this, we develop a novel approach for burst image processing named BIPNet that focuses solely on the information exchange between burst frames and filters out the inherent degradations while preserving and enhancing the actual scene details. Our central idea is to generate a set of pseudo-burst features that combine complementary information from all the burst frames to exchange information seamlessly. However, due to inter-frame misalignment, the information cannot be effectively combined in the pseudo-burst features. Thus, we initially align the incoming burst features with respect to the reference frame using the proposed edge-boosting feature alignment. Lastly, we progressively upscale the pseudo-burst features in multiple stages while adaptively combining the complementary information. Unlike existing works, which usually deploy single-stage up-sampling with a late fusion scheme, we first deploy the pseudo-burst mechanism followed by adaptive progressive feature up-sampling. The proposed BIPNet significantly outperforms existing methods on burst super-resolution, low-light image enhancement, low-light image super-resolution, and denoising tasks.
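The pseudo-burst idea can be sketched as regrouping aligned burst features channel-wise and fusing each group, so that every fused channel sees complementary information from all frames. The grouped-convolution fusion below is a simplification chosen for compactness, not BIPNet's actual module.

```python
import torch
import torch.nn as nn

class PseudoBurstFusion(nn.Module):
    """Illustrative pseudo-burst feature generation (simplified from the paper's idea).

    Given aligned features of a burst of B frames, the c-th pseudo-burst feature
    stacks channel c from every frame and fuses it with a light grouped conv.
    """
    def __init__(self, burst_size: int, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(burst_size * channels, channels, kernel_size=3,
                              padding=1, groups=channels)

    def forward(self, feats):
        # feats: (B, C, H, W) aligned burst features (batch dimension omitted)
        b, c, h, w = feats.shape
        # regroup so the B copies of each channel are contiguous along the channel axis
        pseudo = feats.permute(1, 0, 2, 3).reshape(1, c * b, h, w)
        return self.fuse(pseudo)                   # (1, C, H, W) fused features

# out = PseudoBurstFusion(burst_size=8, channels=64)(torch.randn(8, 64, 32, 32))
```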

Paper
Code
Image & Video Enhancement

Interpretable Optimization-Inspired Unfolding Network for Low-Light Image Enhancement

Author: Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, Jianmin Jiang

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Retinex model-based methods have been shown to be effective for layer-wise manipulation with well-designed priors in low-light image enhancement (LLIE). However, the hand-crafted priors and conventional optimization algorithms adopted to solve the layer decomposition problem result in a lack of adaptivity and efficiency. To this end, this paper proposes a Retinex-based deep unfolding network (URetinex-Net++), which unfolds an optimization problem into a learnable network to decompose a low-light image into reflectance and illumination layers. By formulating the decomposition problem as an implicit-priors-regularized model, three learning-based modules are carefully designed, responsible for data-dependent initialization, highly efficient unfolding optimization, and fairly flexible component adjustment, respectively. In particular, the proposed unfolding optimization module, which introduces two networks to adaptively fit implicit priors in a data-driven manner, realizes noise suppression and detail preservation for the decomposed components. URetinex-Net++ is a further augmented version of URetinex-Net that introduces a cross-stage fusion block to alleviate the color defect of URetinex-Net. Therefore, boosted LLIE performance is obtained in both visual quality and quantitative metrics, while introducing only a few additional parameters and little extra runtime. Extensive experiments on real-world low-light images qualitatively and quantitatively demonstrate the effectiveness and superiority of the proposed URetinex-Net++ over state-of-the-art methods.
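One unfolding stage of the Retinex decomposition can be sketched as a gradient step on the data-fidelity term followed by small networks acting as learned proximal operators for the implicit priors. The step size, network shapes, and update order below are assumptions, not URetinex-Net++'s exact modules.

```python
import torch
import torch.nn as nn

class UnfoldingStage(nn.Module):
    """One illustrative unfolding stage for the Retinex decomposition I ~ R * L.

    Each variable takes a gradient step on ||I - R*L||^2 and is then passed
    through a small network serving as a learned proximal operator for its
    implicit prior.
    """
    def __init__(self, channels=3, step=0.5):
        super().__init__()
        self.step = step
        self.prox_r = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(16, channels, 3, padding=1))
        self.prox_l = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(16, channels, 3, padding=1))

    def forward(self, img, refl, illum):
        # gradient steps on the data-fidelity term
        refl = refl - self.step * (refl * illum - img) * illum
        refl = self.prox_r(refl)                   # learned prior on reflectance
        illum = illum - self.step * (refl * illum - img) * refl
        illum = self.prox_l(illum)                 # learned prior on illumination
        return refl, illum

# img = torch.rand(1, 3, 64, 64)
# r, l = UnfoldingStage()(img, img.clone(), torch.ones_like(img))
```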

Paper
Code
Image & Video Enhancement

Diff-Retinex++: Retinex-Driven Reinforced Diffusion Model for Low-Light Image Enhancement

Author: Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


This paper proposes a Retinex-driven reinforced diffusion model for low-light image enhancement, termed Diff-Retinex++, to address the various degradations caused by low light. Our main approach integrates the diffusion model with Retinex-driven restoration to achieve physically inspired generative enhancement, making it a pioneering effort. In detail, Diff-Retinex++ consists of two stage-wise modules: the Denoising Diffusion Model (DDM) and the Retinex-Driven Mixture of Experts Model (RMoE). First, DDM treats low-light image enhancement as a type of image generation task, benefiting from the powerful generative ability of the diffusion model to handle the enhancement. Second, we build Retinex theory into a plug-and-play supervision attention module, which leverages the latent features in the backbone and knowledge distillation to learn Retinex rules, and further regulates these latent features through the attention mechanism. In this way, it couples Retinex decomposition and image enhancement from a new perspective, achieving improvement in both. In addition, the Low-Light Mixture of Experts preserves the vividness of the diffusion model and the fidelity of the Retinex-driven restoration to the greatest extent. Ultimately, the iteration of DDM and RMoE achieves the goal of a Retinex-driven reinforced diffusion model. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method.

Paper
Code