All-in-One Image Restoration

UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity

Authors: Jingbo Lin, Zhilu Zhang, Wenbo Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo

Year: 2026

Publication: International Conference on Learning Representations (ICLR)


Recently, considerable progress has been made in all-in-one image restoration. Existing methods are generally either degradation-agnostic or degradation-aware. However, the former cannot fully leverage degradation-specific restoration, while the latter suffer from inevitable errors in degradation estimation. Consequently, existing methods still exhibit a large performance gap relative to single-task models. In this work, we take a step forward on this topic and present UniRestorer with improved restoration performance. Specifically, we perform hierarchical clustering on the degradation space and train a multi-granularity mixture-of-experts (MoE) restoration model. UniRestorer then uses both degradation and granularity estimation to adaptively select an appropriate expert for image restoration. In contrast to existing degradation-agnostic and -aware methods, UniRestorer can leverage degradation estimation to benefit degradation-specific restoration, and use granularity estimation to make the model robust to degradation estimation error. Experimental results show that UniRestorer outperforms state-of-the-art all-in-one methods by a large margin and is promising in closing the performance gap to single-task models.
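The routing idea described above can be illustrated with a small sketch. This is not the paper's implementation: the centroids, the two-level hierarchy, and the confidence threshold are all hypothetical, chosen only to show how a granularity estimate can fall back to a coarser (more robust) expert when the degradation estimate is uncertain.

```python
import numpy as np

# Hypothetical degradation-cluster centroids at two granularity levels
# (coarse: 2 clusters, fine: 4); each cluster would own one restoration expert.
coarse_centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
fine_centroids = np.array([[0.0, 0.0], [0.2, 0.0], [0.9, 1.0], [1.1, 1.2]])

def route(deg_feat, granularity_conf, threshold=0.5):
    """Pick an expert: use a fine-level expert when the granularity estimator
    is confident in the degradation estimate, otherwise fall back to a coarse
    expert that is robust to estimation error."""
    if granularity_conf >= threshold:
        centroids, level = fine_centroids, "fine"
    else:
        centroids, level = coarse_centroids, "coarse"
    idx = int(np.argmin(np.linalg.norm(centroids - deg_feat, axis=1)))
    return level, idx

# Confident estimate -> nearest fine expert; uncertain -> robust coarse expert.
print(route(np.array([0.95, 1.05]), granularity_conf=0.9))  # ('fine', 2)
print(route(np.array([0.95, 1.05]), granularity_conf=0.2))  # ('coarse', 1)
```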

Paper
Code
Image & Video Restoration

Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration

Authors: Haoran Bai, Xiaoxu Chen, Canqian Yang, Zongyao He, Sibin Deng, Ying Chen

Year: 2026

Publication: International Conference on Learning Representations (ICLR)


We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process and ensure content consistency. However, conventional fine-tuning of such controllable pipelines frequently suffers from distribution drift due to imperfect multimodal alignment, resulting in compromised texture realism and temporal coherence. To tackle this challenge, we propose a concept distillation training strategy that utilizes the pretrained T2V model to synthesize training samples with embedded textual concepts, thereby distilling its conceptual understanding to preserve texture and temporal quality. To enhance generation controllability, we redesign the control architecture with two key components: 1) a control feature projector that filters degradation artifacts from input video latents to minimize their propagation through the generation pipeline, and 2) a new ControlNet connector employing a dual-branch design. This connector synergistically combines MLP-based feature mapping with a cross-attention mechanism for dynamic control feature retrieval, enabling both content preservation and adaptive control signal modulation. Extensive experiments show that Vivid-VR performs favorably against existing approaches on both synthetic and real-world benchmarks, as well as AIGC videos, achieving impressive texture realism, visual vividness, and temporal consistency. The codes and checkpoints are publicly available at this https URL.
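A minimal numpy sketch of the dual-branch connector idea: one branch maps control tokens through an MLP, the other lets generation tokens retrieve control tokens via cross-attention, and the two outputs are fused by summation. The weight names, single-layer MLP, and sum fusion are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def connector(ctrl, gen, W_mlp, Wq, Wk, Wv):
    """Dual-branch connector sketch: a static MLP mapping of control features
    plus dynamic retrieval of control tokens `ctrl` by generation tokens `gen`
    through cross-attention; branch outputs are summed (assumed fusion)."""
    mlp_out = np.tanh(ctrl @ W_mlp)               # static feature mapping
    q, k, v = gen @ Wq, ctrl @ Wk, ctrl @ Wv      # cross-attention projections
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return mlp_out + attn @ v

rng = np.random.default_rng(0)
n, d = 4, 8                                        # toy token count / width
ctrl, gen = rng.normal(size=(n, d)), rng.normal(size=(n, d))
W_mlp, Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(4))
out = connector(ctrl, gen, W_mlp, Wq, Wk, Wv)      # fused control signal
```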

Paper
Code
Image & Video Deraining

DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining

Authors: Shuning Sun, Jialang Lu, Xiang Chen, Jichao Wang, Dianjie Lu, Guijuan Zhang, Guangwei Gao, Zhuoran Zheng

Year: 2026

Publication: International Conference on Learning Representations (ICLR)


Videos captured in the wild often suffer from rain streaks, blur, and noise. In addition, even slight changes in camera pose can amplify cross-frame mismatches and temporal artifacts. Existing methods rely on optical flow or heuristic alignment, which are computationally expensive and less robust. Lie groups, by contrast, provide a principled way to represent continuous geometric transformations, making them well suited for enforcing spatial and temporal consistency in video modeling. Building on this insight, we propose DeLiVR, an efficient video deraining method that injects spatiotemporal Lie-group differential biases directly into the attention scores of the network. The method introduces two complementary components. First, a rotation-bounded Lie relative bias predicts the in-plane angle of each frame using a compact prediction module, where normalized coordinates are rotated and compared with base coordinates to achieve geometry-consistent alignment before feature aggregation. Second, a differential group displacement computes angular differences between adjacent frames to estimate a velocity. This bias computation combines temporal decay and attention masks to focus on inter-frame relationships while precisely matching the direction of rain streaks. Extensive experimental results demonstrate the effectiveness of our method on publicly available benchmarks.
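A toy sketch of how per-frame rotation angles might become an additive attention bias. The penalty form (negative absolute relative angle) and the exponential temporal decay are guesses at the flavor of the mechanism, not the paper's exact formulation.

```python
import numpy as np

def lie_attention_bias(angles, decay=0.5):
    """Build a (T, T) additive bias for frame-to-frame attention scores from
    predicted in-plane rotation angles `angles` (shape (T,)): frames whose
    relative rotation is large are penalized, and the penalty is weighted by
    an exponential decay over temporal distance (assumed form)."""
    t = np.arange(len(angles))
    rel_angle = angles[:, None] - angles[None, :]            # relative rotation
    temporal_decay = np.exp(-decay * np.abs(t[:, None] - t[None, :]))
    return -np.abs(rel_angle) * temporal_decay               # additive bias

bias = lie_attention_bias(np.array([0.0, 0.1, 0.3]))
# The diagonal is zero: each frame is perfectly aligned with itself,
# while misaligned nearby frames receive a negative score.
```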

Paper
Image & Video Restoration

Orthogonal Decoupling Contrastive Regularization: Toward Uncorrelated Feature Decoupling for Unpaired Image Restoration

Authors: Zhongze Wang, Jingchao Peng, Haitao Zhao, Lujian Yao, Kaijie Zhao

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Unpaired image restoration (UIR) is an important task because acquiring paired degraded/clear images with identical backgrounds is difficult. In this paper, we propose a novel UIR method based on the assumption that an image contains both degradation-related features, which affect the level of degradation, and degradation-unrelated features, such as texture and semantic information. Our method aims to ensure that the degradation-related features of the restoration result closely resemble those of the clear image, while the degradation-unrelated features align with the input degraded image. Specifically, we introduce a Feature Orthogonalization Module optimized on the Stiefel manifold to decouple image features, ensuring feature uncorrelation. A task-driven Depth-wise Feature Classifier is proposed to assign weights to the uncorrelated features based on their relevance to degradation prediction. To prevent training from depending on the quality of the clear image in any single input pair, we maintain several degradation-related proxies describing the degradation level of clear images, enhancing the model's robustness. Finally, a weighted PatchNCE loss is introduced to pull degradation-related features in the output image toward those of clear images, while bringing degradation-unrelated features close to those of the degraded input.
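To make "feature uncorrelation" concrete: the paper optimizes on the Stiefel manifold, but a plain QR projection is a simple stand-in that shows the target property, namely that the decoupled feature directions are mutually orthonormal so no feature carries linear information about another. This substitution is ours, not the paper's.

```python
import numpy as np

def orthogonalize(F):
    """Project the k feature columns of F (n x k) onto an orthonormal basis
    (QR decomposition as a stand-in for Stiefel-manifold optimization), so
    that the resulting features satisfy F_ortho^T @ F_ortho = I."""
    Q, _ = np.linalg.qr(F)
    return Q

rng = np.random.default_rng(0)
F = rng.normal(size=(16, 4))        # 16-dim features, 4 correlated channels
Fo = orthogonalize(F)
gram = Fo.T @ Fo                    # ~ identity: channels are uncorrelated
```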

Paper
Image Super-Resolution

Local Texture Pattern Estimation for Image Detail Super-Resolution

Authors: Fan Fan, Yang Zhao, Yuan Chen, Nannan Li, Wei Jia, Ronggang Wang

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


In the image super-resolution (SR) field, recovering missing high-frequency textures has always been an important goal. However, deep SR networks based on pixel-level constraints tend to focus on stable edge details and cannot effectively restore random high-frequency textures. It was not until the emergence of the generative adversarial network (GAN) that SR models achieved realistic texture restoration, and GAN-based approaches quickly became the mainstream method for texture SR. However, GAN-based SR models still have drawbacks, such as relying on a large number of parameters and generating fake textures that are inconsistent with the ground truth. Inspired by traditional texture analysis research, this paper proposes a novel SR network based on local texture pattern estimation (LTPE), which can restore fine high-frequency texture details without a GAN. A differentiable local texture operator is first designed to extract local texture structures, and a texture enhancement branch is used to predict the high-resolution local texture distribution based on the LTPE. The predicted high-resolution texture structure map then serves as a reference for the texture fusion SR branch to obtain high-quality texture reconstruction. Finally, L1 loss and Gram loss are used jointly to optimize the network. Experimental results demonstrate that the proposed method can effectively recover high-frequency texture without using GAN structures. In addition, the restored high-frequency details are constrained by the local texture distribution, reducing significant errors in texture generation.
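A "differentiable local texture operator" plausibly resembles a soft local binary pattern: the classic LBP thresholds each neighbor against the patch center, and replacing the hard threshold with a sigmoid makes the code differentiable. This soft-LBP reading is our guess, not the paper's definition of LTPE.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_lbp(patch, tau=10.0):
    """Differentiable local-texture code for a 3x3 patch: compare each of the
    8 neighbors with the center through a sigmoid (temperature tau) instead of
    a hard threshold, so gradients can flow through the texture descriptor."""
    center = patch[1, 1]
    neighbors = np.delete(patch.ravel(), 4)       # the 8 surrounding pixels
    return sigmoid(tau * (neighbors - center))    # soft binary pattern in (0, 1)

patch = np.array([[0.2, 0.8, 0.2],
                  [0.8, 0.5, 0.8],
                  [0.2, 0.8, 0.2]])
code = soft_lbp(patch)   # near 1 for brighter neighbors, near 0 for darker
```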

Paper
Code
Image Super-Resolution

Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution

Authors: Xiaoming Li, Wangmeng Zuo, Chen Change Loy

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Faithful text image super-resolution (SR) is challenging because each character has a unique structure and usually exhibits diverse font styles and layouts. While existing methods primarily focus on English text, less attention has been paid to more complex scripts like Chinese. In this paper, we introduce a high-quality text image SR framework designed to restore the precise strokes of low-resolution (LR) Chinese characters. Unlike methods that rely on character recognition priors to regularize the SR task, we propose a novel structure prior that offers structure-level guidance to enhance visual quality. Our framework incorporates this structure prior within a StyleGAN model, leveraging its generative capabilities for restoration. To maintain the integrity of character structures while accommodating various font styles and layouts, we implement a codebook-based mechanism that restricts the generative space of StyleGAN. Each code in the codebook represents the structure of a specific character, while the vector w in StyleGAN controls the character’s style, including typeface, orientation, and location. Through the collaborative interaction between the codebook and style, we generate a high-resolution structure prior that aligns with LR characters both spatially and structurally. Experiments demonstrate that this structure prior provides robust, character-specific guidance, enabling the accurate restoration of clear strokes in degraded characters, even for real-world LR Chinese text with irregular layouts.

Paper
Code
Image Super-Resolution

Test-Time Training for Hyperspectral Image Super-Resolution

Authors: Ke Li, Luc Van Gool, Dengxin Dai

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Progress on hyperspectral image (HSI) super-resolution (SR) still lags behind research on RGB image SR. HSIs usually have a high number of spectral bands, so accurately modeling spectral band interaction for HSI SR is hard. Moreover, training data for HSI SR is hard to obtain, so datasets are usually rather small. In this work, we propose a new test-time training method to tackle these problems. Specifically, we develop a novel self-training framework in which more accurate pseudo-labels and more accurate LR-HR relationships are generated, so that the model can be further trained on them to improve performance. To better support our test-time training method, we also propose a new network architecture that learns HSI SR without modeling spectral band interaction, and a new data augmentation method, Spectral Mixup, to increase the diversity of the training data at test time. We further collect a new HSI dataset with a diverse set of images of interesting objects, ranging from food and vegetation to materials and general scenes. Extensive experiments on multiple datasets show that our method significantly improves the performance of pre-trained models after test-time training and outperforms competing methods for HSI SR.
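One plausible reading of a Spectral-Mixup-style augmentation: synthesize new spectral bands as random convex combinations of the original bands, so band statistics vary while pixel values stay in range. The exact mixing scheme here is assumed, not taken from the paper.

```python
import numpy as np

def spectral_mixup(hsi, rng):
    """Augment a hyperspectral cube hsi of shape (H, W, B) by replacing each
    band with a random convex combination of all B original bands (rows of
    the mixing matrix are non-negative and sum to 1), so values in [0, 1]
    remain in [0, 1]."""
    B = hsi.shape[-1]
    A = rng.random((B, B))
    A /= A.sum(axis=1, keepdims=True)   # rows sum to 1: convex combinations
    return hsi @ A.T                    # new band i = sum_b A[i, b] * band b

rng = np.random.default_rng(0)
hsi = rng.random((8, 8, 31))            # toy 31-band hyperspectral image
aug = spectral_mixup(hsi, rng)          # same shape, remixed spectra
```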

Paper
Image Super-Resolution

Rotation Equivariant Arbitrary-Scale Image Super-Resolution

Authors: Qi Xie, Jiahong Fu, Zongben Xu, Deyu Meng

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Arbitrary-scale image super-resolution (ASISR), a recently popular topic in computer vision, aims to achieve high-resolution recoveries at arbitrary scales from a low-resolution input image. This task is realized by representing the image as a continuous implicit function through two fundamental modules: a deep-network-based encoder and an implicit neural representation (INR) module. Despite notable progress, a crucial challenge of such a highly ill-posed setting is that many common geometric patterns, such as repetitive textures, edges, or shapes, are seriously warped and deformed in low-resolution images, naturally leading to unexpected artifacts in their high-resolution recoveries. Embedding rotation equivariance into the ASISR network is thus necessary, as it has been widely demonstrated that this property enables the recovery to faithfully maintain the original orientations and structural integrity of the geometric patterns underlying the input image. Motivated by this, we construct a rotation equivariant ASISR method in this study. Specifically, we carefully redesign the basic architectures of the INR and encoder modules, incorporating intrinsic rotation equivariance capabilities beyond those of conventional ASISR networks. With these improvements, the ASISR network can, for the first time, be implemented with end-to-end rotational equivariance maintained from input to output. We also provide a solid theoretical analysis of its intrinsic equivariance error, demonstrating the inherent nature of the embedded equivariance structure. The superiority of the proposed method is substantiated by experiments on both simulated and real datasets. We further validate that the proposed framework can be readily integrated into current ASISR methods in a plug & play manner to enhance their performance.
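Rotation equivariance means f(rot(x)) = rot(f(x)). A minimal numerical check with a 90-degree rotation and an isotropic operator (a 3x3 box filter, whose kernel is symmetric under 90-degree rotation) illustrates the property the paper enforces network-wide; this toy says nothing about the paper's actual architecture.

```python
import numpy as np

def isotropic_blur(x):
    """3x3 box filter with circular padding. Its kernel is invariant under
    90-degree rotation, so the operator commutes with np.rot90."""
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(x, (dy, dx), axis=(0, 1))
    return out / 9.0

rng = np.random.default_rng(0)
x = rng.random((8, 8))
lhs = isotropic_blur(np.rot90(x))   # f(rot(x))
rhs = np.rot90(isotropic_blur(x))   # rot(f(x))
# lhs equals rhs up to floating-point error: the operator is equivariant
# under 90-degree rotations.
```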

Paper
Code
Image Super-Resolution

Towards Lightweight Super-Resolution With Dual Regression Learning

Authors: Yong Guo, Mingkui Tan, Zeshuai Deng, Jingdong Wang, Qi Chen, Jiezhang Cao

Year: 2025

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)


Deep neural networks have exhibited remarkable performance in image super-resolution (SR) by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods come with several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images may be super-resolved from the same LR image; as a result, it is hard to directly learn a promising SR mapping from such a large space. Second, it is often inevitable to develop very large models with extremely high computational cost to yield promising SR performance. In practice, one can use model compression techniques to obtain compact models by reducing model redundancy. Nevertheless, it is hard for existing model compression methods to accurately identify the redundant components due to the extremely large SR mapping space. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping to estimate the downsampling kernel and reconstruct LR images; the dual mapping thus acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method that reduces model redundancy at both the layer level and the channel level via channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further exploit the dual regression manner to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.
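The dual regression constraint can be sketched as a two-term loss: the usual SR error plus a penalty for how badly the downsampled SR output reconstructs the original LR input. Here a fixed 2x average pool stands in for the learned dual mapping, and the L1 form and unit weighting are assumptions for illustration.

```python
import numpy as np

def avg_downsample(x, s=2):
    """Simple 2x average-pool stand-in for the learned dual (HR -> LR) mapping."""
    H, W = x.shape
    return x.reshape(H // s, s, W // s, s).mean(axis=(1, 3))

def dual_regression_loss(lr, hr, sr, downsample):
    """Primary SR error |sr - hr| plus a dual term |down(sr) - lr| that
    constrains the SR output to stay consistent with the LR input, shrinking
    the space of valid SR mappings."""
    primary = np.abs(sr - hr).mean()
    dual = np.abs(downsample(sr) - lr).mean()
    return primary + dual

hr = np.ones((4, 4))
lr = avg_downsample(hr)
# A perfect SR output incurs zero loss in both terms.
loss = dual_regression_loss(lr, hr, sr=hr, downsample=avg_downsample)  # 0.0
```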

Paper
Code