FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis

Ziqi Ni, Ao Fu, Yi Zhou

Paper | 2025-03-06 | English

high frequency trading, arxiv

Description

Achieving high-fidelity lip-speech synchronization in audio-driven talking portrait synthesis remains challenging. Multi-stage pipelines and diffusion models yield high-quality results but incur high computational costs. Some approaches perform well on specific individuals with low resource requirements, yet still exhibit mismatched lip movements. All of these methods operate in the pixel domain. We observed noticeable discrepancies in the frequency domain between the s...