Image & Video Denoising

Learning Continuous Spatiotemporal Implicit Neural Fields for Unsupervised Video Denoising

Author: Xiaowan Hu, Henan Liu, Ce Zheng, Xinyang Li, Mai Xu

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Video denoising is fundamental to low-level vision and real-world imaging, yet existing self-supervised methods remain fragile under severe noise and complex motion. Most approaches still rely on spatially and temporally discrete grid-based representations: blind-spot networks enforce J-invariance by masking center pixels with a limited receptive field, while recurrent models build temporal dependencies on discretized frame sequences and noise-sensitive optical flow, leading to error accumulation and motion artifacts. We address this model bottleneck by reformulating self-supervised video denoising as learning a continuous spatiotemporal implicit field. Building on coordinate-based implicit neural representations, we propose a unified video denoising model with a spatiotemporal implicit neural field (SINF). In the spatial domain, a blind-spot implicit spatial field maps coordinates directly to pixel-level representations, enabling globally informed texture recovery beyond receptive-field limits. In the temporal domain, an implicit temporal embedding with periodic activations encodes motion continuously over time, while a time-aware spatial graph module refines cross-frame alignment. Together, these components let SINF remodel discretized video signals as a continuous spatiotemporal intensity field, enabling more robust pixel-wise associations than coarse optical flow. Extensive experiments on synthetic and real noisy video benchmarks demonstrate that SINF achieves state-of-the-art performance.
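
To make the idea of a coordinate-based field with periodic activations concrete, here is a minimal sketch of a sine-activated MLP that maps continuous (x, y, t) coordinates to an intensity. It is not the SINF architecture: the layer sizes, the frequency scale `w0`, the initialization recipe, and the function names `init_sine_mlp` and `query_field` are all assumptions chosen for illustration, and the network is untrained.

```python
import numpy as np

def init_sine_mlp(rng, in_dim=3, hidden=32, out_dim=1, depth=3, w0=30.0):
    """Random weights for a small sine-activated coordinate MLP (hypothetical sizes)."""
    dims = [in_dim] + [hidden] * depth + [out_dim]
    layers = []
    for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        # First layer: U(-1/d_in, 1/d_in); later layers: U(-sqrt(6/d_in)/w0, +...),
        # a commonly used recipe for networks with periodic activations.
        bound = 1.0 / d_in if i == 0 else np.sqrt(6.0 / d_in) / w0
        W = rng.uniform(-bound, bound, size=(d_in, d_out))
        layers.append((W, np.zeros(d_out)))
    return layers

def query_field(layers, coords, w0=30.0):
    """Evaluate the field at continuous coordinates of shape (N, 3) = (x, y, t)."""
    h = coords
    for W, b in layers[:-1]:
        h = np.sin(w0 * (h @ W + b))   # periodic activation
    W, b = layers[-1]
    return h @ W + b                   # linear output: one intensity per coordinate

rng = np.random.default_rng(0)
layers = init_sine_mlp(rng)
# Same pixel location queried at a frame time (t = 0.0) and between frames (t = 0.5).
coords = np.array([[0.10, 0.20, 0.0],
                   [0.10, 0.20, 0.5]])
values = query_field(layers, coords)
```

Because the input is a real-valued coordinate rather than a grid index, the same function can be queried at any spatial position or time, which is the property the abstract contrasts with discrete frame sequences.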

Paper
Code
Image Quality Assessment

SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack Against No-Reference Image Quality Assessment Models

Author: Yujia Liu, Dingquan Li, Zhixuan Li, Tiejun Huang

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

No-Reference Image Quality Assessment (NR-IQA) models play an important role in various real-world applications. Recently, adversarial attacks against NR-IQA models have attracted increasing attention, as they provide valuable insights for revealing model vulnerabilities and guiding robust system design. Some effective attacks have been proposed against NR-IQA models in white-box settings, where the attacker has full access to the target model. However, these attacks often suffer from poor transferability to unknown target models in more realistic black-box scenarios, where the target model is inaccessible. This work makes the first attempt to address the challenge of low transferability in attacking NR-IQA models by proposing a transferable Signed Ensemble Gaussian black-box Attack (SEGA). The main idea is to approximate the gradient of the target model by applying Gaussian smoothing to source models and ensembling their smoothed gradients. To ensure the imperceptibility of adversarial perturbations, SEGA further removes inappropriate perturbations using a specially designed perturbation filter mask. Experimental results demonstrate the superior transferability of SEGA, validating its effectiveness in enabling successful transfer-based black-box attacks against NR-IQA models.
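
The main idea stated above (Gaussian-smoothed gradients from source models, ensembled, then signed) can be sketched roughly as follows. This is only an illustration under stated assumptions: the source models are toy scorers with analytic gradients, the step size `eps`, smoothing scale `sigma`, and sample count are arbitrary, the helper names are hypothetical, and the perturbation filter mask is omitted.

```python
import numpy as np

def smoothed_grad(grad_fn, x, sigma, n_samples, rng):
    """Average a source model's gradient over Gaussian-perturbed copies of x."""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        g += grad_fn(x + sigma * rng.standard_normal(x.shape))
    return g / n_samples

def signed_ensemble_step(grad_fns, x, eps=2 / 255, sigma=0.05, n_samples=16, seed=0):
    """One signed step along the ensemble of smoothed source-model gradients."""
    rng = np.random.default_rng(seed)
    g = np.mean([smoothed_grad(f, x, sigma, n_samples, rng) for f in grad_fns], axis=0)
    # +sign pushes the predicted score up; use -sign to push it down.
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)

# Toy source models (assumption): score(x) = tanh(sum(w * x)), with analytic gradient.
def make_grad(w):
    return lambda x: (1.0 - np.tanh(np.sum(w * x)) ** 2) * w

w1 = np.linspace(-1.0, 1.0, 16).reshape(4, 4)
w2 = np.cos(np.arange(16.0)).reshape(4, 4)
x = np.full((4, 4), 0.5)
x_adv = signed_ensemble_step([make_grad(w1), make_grad(w2)], x)
```

The target model never appears in the sketch: the perturbation is computed from source models only and is then expected to transfer.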

Paper
Code
Image Quality Assessment

Semantic Contrast for Domain-Robust Underwater Image Quality Assessment

Author: Jingchun Zhou, Chunjiang Liu, Qiuping Jiang, Xianping Fu, Junhui Hou, Xuelong Li

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Underwater image quality assessment (UIQA) is hindered by complex degradation and domain shifts across aquatic environments. Existing no-reference IQA methods rely on costly and subjective mean opinion scores (MOS), which limit their generalization to unseen domains. To overcome these challenges, we propose SCUIA, an unsupervised UIQA framework leveraging semantic contrastive learning for quality prediction without human annotations. Specifically, we introduce a vision-language contrastive learning strategy that aligns image features with textual embeddings in a unified semantic space, capturing implicit degradation-quality correlations. We further enhance quality discrimination with a hierarchical contrastive learning mechanism that combines image-specific statistical priors and semantic prompts. A triplet-based inter-group contrastive loss explicitly models relative quality relationships. To tackle cross-domain variations, we develop an unsupervised domain adaptation module that uses local statistical features to guide CLIP fine-tuning to disentangle domain-invariant quality representations from domain-specific noise. This enables zero-shot cross-domain quality prediction without labeled data. Extensive experiments on public UIQA benchmarks demonstrate significant improvements over existing methods, highlighting superior generalization and domain adaptability.
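
As a rough illustration of how a triplet-based loss can encode relative quality, the sketch below uses a standard triplet margin loss with cosine distance. It is not the inter-group loss from the paper: the distance, the margin value, and the toy embeddings are assumptions.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity, computed row-wise."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize triplets where the anchor is not closer to the positive by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy embeddings: positive shares the anchor's quality group, negative does not.
anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.1]])
negative = np.array([[0.0, 1.0]])
loss_ok = triplet_loss(anchor, positive, negative)       # ordering already satisfied
loss_swapped = triplet_loss(anchor, negative, positive)  # ordering violated
```

The loss is zero once the anchor is closer to the same-group sample than to the other-group sample by at least the margin, so it only constrains relative ordering, not absolute scores.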

Paper
Video Frame Interpolation

Velocity Disambiguation for Video Frame Interpolation

Author: Zhihang Zhong, Yiming Zhang, Wei Wang, Xiao Sun, Yu Qiao, Gurunandan Krishnan

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t (“time indexing”), which struggles to predict precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly together with predicting the frames, we provide the network with an explicit hint on how far the object has traveled between start and end frames, a novel approach termed “distance indexing”. This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. We further observed that, even with this extra guidance, objects can still be blurry especially when they are equally far from both input frames (i.e., halfway in-between), due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When integrating our plug-and-play strategies into state-of-the-art learning-based models, they exhibit markedly sharper outputs and superior perceptual quality in arbitrary time interpolations, using a uniform distance indexing map in the same format as time indexing without requiring extra computation. Furthermore, we demonstrate that if additional latency is acceptable, a continuous map estimator can be employed to compute a pixel-wise dense distance indexing using multiple nearby frames. Combined with efficient multi-frame refinement, this extension can further disambiguate complex motion, thus enhancing performance both qualitatively and quantitatively. 
Additionally, the ability to manually specify distance indexing allows for independent temporal manipulation of each object, providing a novel tool for video editing tasks such as re-timing.
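
The difference between time indexing and distance indexing can be seen on a toy trajectory. In the sketch below an object accelerates from rest along a straight line; the distance index is computed as the fraction of total path length covered, which is an assumption made for illustration rather than the paper's exact definition.

```python
import numpy as np

def distance_index(positions):
    """Fraction of the total path length covered at each sample of a trajectory."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    return cum / cum[-1]

# Object accelerating from rest: x(t) = 10 * t^2, sampled at t = 0, 0.25, ..., 1.
t = np.linspace(0.0, 1.0, 5)
positions = np.stack([10.0 * t**2, np.zeros_like(t)], axis=1)
d = distance_index(positions)
# At the temporal midpoint t = 0.5 the object has covered only a quarter of the path.
```

A network given only t = 0.5 must guess how far the object has moved, whereas a distance index of 0.25 states it directly.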

Paper
Code
Image & Video Enhancement

UniFES: A Unified Recurrent Network for Quality Enhancement and Stabilization in Face Videos

Author: Tie Liu, Mai Xu, Shengxi Li, Jialu Zhang, Lai Jiang

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Recent years have witnessed an explosive increase in face content, accompanied by a distinct shift from static images to dynamic video formats. This shift inherently alters the characteristics of face videos, in which pixel-wise artifacts are intertwined with motion-related impairments. Addressing these two kinds of distortion, which in practice almost always appear together, is challenging and non-trivial because spatial and temporal frequencies in videos have distinct characteristics. In this paper, we propose a novel Unified recurrent network for joint Face video quality Enhancement and Stabilization (UniFES), the first successful attempt at both quality enhancement and motion stabilization. UniFES effectively aggregates the mutual information between the pixel and motion domains. For quality enhancement, UniFES decomposes the temporal alignment problem for shaky video into progressive feature alignment with explicit physical information, which includes the global dynamics from the motion domain, i.e., from the stabilization task. For video stabilization, we integrate the mixed dynamics from the enhancement task (i.e., from the pixel domain) to take into account both pixel-wise and motion-related characteristics, ensuring robust trajectory estimation and motion stabilization. Subsequently, we refine the warping masks to achieve high-quality full-frame rendering. We further establish a synthetic dataset for training and evaluation on this emerging task. Comprehensive experiments demonstrate the superior performance of UniFES over 32 baselines on both the newly established synthetic dataset and real-world datasets.

Paper
Image & Video Enhancement

Leveraging Color Naming for Image Enhancement

Author: David Serrano-Lozano, Luis Herranz, Michael S. Brown, Javier Vazquez-Corral

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Enhancing images to make them visually appealing is a persistent challenge in computer vision. Many deep-learning methods train models on paired datasets to replicate expert editing styles. However, these approaches struggle with two key issues: (1) interpretability and (2) a parametrization suitable for user adjustments. To address these challenges, we present NamedCurves+, an approach inspired by the concept of Color Naming, a universal set of familiar colors widely used in software tools for intuitive editing. Our method integrates color names into a learning-based framework, enabling global adjustments for each named color through tone curves. To address local image variations, we incorporate a transformer block that captures spatial dependencies, enabling context-aware edits across the image. NamedCurves+ enhances the retouching process's interpretability and supports user interaction, allowing flexible modifications of individual tone curves to refine the retouched image according to personal preferences. Extensive experiments on tasks such as image retouching, tone mapping, and exposure correction demonstrate that NamedCurves+ outperforms state-of-the-art methods. Notably, our approach is both explainable, as the tone curves explicitly represent how each color name contributes to the enhancement, and interactive, allowing users to customize the retouching process and achieve results tailored to their liking.
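
One way to picture per-color-name tone curves is a soft assignment of each pixel to a few named colors, with one curve applied per name and the results blended by the assignment weights. The sketch below does exactly that with gamma curves. It is not NamedCurves+: the four prototype colors, the gamma values, the softmax temperature, and the function names are hypothetical, and the transformer block for local variations is omitted.

```python
import numpy as np

# Hypothetical named colors (red, green, blue, gray) and one gamma curve per name.
PROTOTYPES = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [0.5, 0.5, 0.5]])
GAMMAS = np.array([0.8, 1.0, 1.2, 1.0])

def color_name_weights(img, temperature=0.1):
    """Soft membership of each pixel in each named color, shape (H, W, K)."""
    d2 = ((img[..., None, :] - PROTOTYPES) ** 2).sum(-1)   # squared RGB distance
    logits = -d2 / temperature
    logits -= logits.max(-1, keepdims=True)                # numerical stability
    w = np.exp(logits)
    return w / w.sum(-1, keepdims=True)

def enhance(img):
    """Apply one tone curve per color name and blend by membership."""
    w = color_name_weights(img)                            # (H, W, K)
    curves = img[..., None, :] ** GAMMAS[:, None]          # (H, W, K, 3)
    return (w[..., None] * curves).sum(-2)                 # (H, W, 3)

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(4, 4, 3))
out = enhance(img)
```

Editing a single entry of `GAMMAS` changes only the pixels that belong to that color name, which is the kind of interpretable, user-adjustable parametrization the abstract describes.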

Paper
Code
Image & Video Denoising

D2S-RSG-SSD: Dual Double-Sampling with Random Sub-Samples Generation for Self-Supervised Real Image Denoising

Author: Xiao Liu, Xiuya Shi, Yizhong Pan, Shuhang Gu, Wei Liu, Chao Ren

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Recent advances in self-supervised image denoising have highlighted the potential of Blind-Spot Networks (BSNs). However, existing methods suffer from three major limitations: (1) Their effectiveness in real-world scenarios is limited by strong assumptions, such as noise independence, which rarely hold in practice. (2) While sampling-based strategies can partially improve performance, BSNs inherently suffer from information loss caused by centroid masking, and removing the blind spot leads to noise overfitting, both of which hinder denoising performance. (3) Sampling-based methods often introduce checkerboard artifacts, yet existing studies typically overlook the fundamental differences between these artifacts and real noise. To address these issues, we propose a novel self-supervised denoising framework, Dual Double-Sampling with Random Sub-samples Generation (D2S-RSG-SSD). To address Limitation 1, we introduce a sampling-based framework that breaks noise dependence by combining Random Sub-samples Generation (RSG) with a cross-paired loss L_RSG. RSG generates diverse sub-samples with inherent variance, referred to as sampling differences, which serve as natural perturbations to augment training data and disrupt spatial noise correlations. The proposed loss function ensures full utilization of these sub-samples while stabilizing optimization. To address Limitation 2, we propose a Dual Double-Sampling (D2S) strategy with fixed sampling patterns and a dual-branch architecture. This design reduces reliance on pixel-level information and leverages complementary features to mitigate both noise overfitting and information loss. A key advantage is its compatibility with various advanced denoising networks, lifting the constraint of using BSNs in self-supervised settings. Additionally, we introduce a fixed sub-image sampling strategy to prevent pattern collapse during inference and ensure stability.
To address Limitation 3, we explicitly differentiate checkerboard artifacts from real noise and develop a dedicated artifact remover to correct pixel discontinuities caused by sampling-based operations. This design preserves fine image details while reducing over-smoothing. Experiments on benchmark real-noise datasets and self-captured noisy images demonstrate the robustness and generalizability of our framework, which outperforms existing methods.
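
A generic form of random sub-sampling, in the spirit of the sampling-based strategies discussed above, splits the noisy image into small cells and draws two different pixels from each cell to form a pair of sub-images. The sketch below shows that operation only. The 2x2 cell size, the uniform random choice, and the function name are assumptions, and it does not reproduce RSG, the cross-paired loss, or the D2S branches.

```python
import numpy as np

def random_subsample_pair(img, rng, cell=2):
    """Draw two sub-images by picking two distinct pixels from every cell x cell block."""
    H, W = img.shape
    h, w = H // cell, W // cell
    # Gather the pixels of each block along the last axis: shape (h, w, cell*cell).
    cells = (img[:h * cell, :w * cell]
             .reshape(h, cell, w, cell)
             .transpose(0, 2, 1, 3)
             .reshape(h, w, cell * cell))
    # A nonzero offset modulo the block size guarantees two different positions.
    idx_a = rng.integers(0, cell * cell, size=(h, w))
    idx_b = (idx_a + rng.integers(1, cell * cell, size=(h, w))) % (cell * cell)
    sub_a = np.take_along_axis(cells, idx_a[..., None], axis=-1)[..., 0]
    sub_b = np.take_along_axis(cells, idx_b[..., None], axis=-1)[..., 0]
    return sub_a, sub_b

rng = np.random.default_rng(0)
noisy = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for a noisy image
sub_a, sub_b = random_subsample_pair(noisy, rng)
```

The two sub-images see nearly the same scene content but different noise samples, so one can serve as the training target for the other. Each call draws a new pair, which is the source of the sampling variation mentioned in the abstract.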

Paper
Code
Image & Video Denoising

Learning Physics-Informed Noise Models from Dark Frames for Low-Light Raw Image Denoising

Author: Hansen Feng, Lizhi Wang, Yiqi Huang, Yuzhi Wang, Lin Zhu, Hua Huang

Year: 2026

Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Recently, the mainstream practice for training low-light raw image denoising methods has shifted towards employing synthetic data. Noise modeling, which focuses on characterizing the noise distribution of real-world sensors, profoundly influences the effectiveness and practicality of synthetic data. Currently, physics-based noise modeling struggles to characterize the entire real noise distribution, while learning-based noise modeling impractically depends on paired real data. In this paper, we propose a novel strategy: learning the noise model from dark frames instead of paired real data, to break the data dependency. Based on this strategy, we introduce an efficient physics-informed noise neural proxy (PNNP) to approximate the real-world sensor noise model. Specifically, we integrate physical priors into neural proxies and introduce three efficient techniques: physics-guided noise decoupling (PND), physics-aware proxy model (PPM), and differentiable distribution loss (DDL). PND decouples the dark frame into different components and handles different levels of noise flexibly, which reduces the complexity of noise modeling. PPM incorporates physical priors to constrain the synthetic noise, which improves the accuracy of noise modeling. DDL provides explicit and reliable supervision for the noise distribution, which improves the precision of noise modeling. PNNP shows strong potential in characterizing the real noise distribution. Extensive experiments on public datasets demonstrate superior performance in practical low-light raw image denoising.
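
Why dark frames are informative can be seen with a simplified Poisson-Gaussian sensor model: with no incident light the signal-dependent shot noise vanishes, so a dark frame contains only the signal-independent part. The sketch below is that textbook model, not PNNP. The gain, the read-noise level, and the function name are assumed values chosen for illustration.

```python
import numpy as np

def synth_raw(clean, gain, read_std, rng):
    """Simplified raw noise: Poisson shot noise plus Gaussian read noise."""
    shot = rng.poisson(clean / gain) * gain          # signal-dependent part
    read = rng.normal(0.0, read_std, clean.shape)    # signal-independent part
    return shot + read

rng = np.random.default_rng(0)
# A dark frame has zero signal, so only the signal-independent noise remains.
dark = synth_raw(np.zeros((256, 256)), gain=2.0, read_std=3.0, rng=rng)
read_std_est = dark.std()   # statistic recovered from the dark frame alone
```

Real sensors add further signal-independent components beyond a single Gaussian, which is where a learned proxy fitted to dark frames becomes useful.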

Paper
Code