SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes

Abstract

Neural implicit surface representations have emerged as a promising paradigm to capture 3D shapes in a continuous and resolution-independent manner. However, adapting them to articulated shapes is non-trivial. Existing approaches learn a backward warp field that maps deformed to canonical points. This is problematic because the backward warp field is pose-dependent and thus requires large amounts of data to learn.

To address this, we introduce SNARF, which combines the advantages of linear blend skinning (LBS) for polygonal meshes with those of neural implicit surfaces by learning a forward deformation field without direct supervision. This deformation field is defined in canonical, pose-independent space, enabling generalization to unseen poses. Learning the deformation field from posed meshes alone is challenging because the correspondences of deformed points are defined implicitly and may not be unique under changes of topology. We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding. We derive analytical gradients via implicit differentiation, enabling end-to-end training from 3D meshes with bone transformations.

Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy. We demonstrate our method in challenging scenarios on (clothed) 3D humans in diverse and unseen poses.


Backward vs. Forward

Backward warping/skinning has been used to model non-rigid implicit shapes. It maps points from deformed space to canonical space. The backward skinning weight field is defined in deformed space; it is therefore pose-dependent and does not generalize to unseen poses.

We propose to use forward skinning for animating implicit shapes. It maps points from canonical space to deformed space. The forward skinning weight field is defined in canonical space and is therefore pose-independent. Thus, forward skinning naturally generalizes to unseen poses.
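To make the contrast concrete, here is a toy 2D sketch in pure Python. It is not SNARF's code: the two bones (one fixed, one rotated about the origin), the sigmoid weight field `canonical_weights`, and the helpers `forward_skin` and `backward_skin` are all hypothetical. Forward skinning queries the weights at the canonical point, so one field serves every pose. A naive backward warp that queries the same pose-independent field at the deformed point fails to invert the forward map once the pose changes, which illustrates why a backward weight field has to depend on the pose.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rot(p, a):
    # Rotate a 2D point p by angle a (radians) about the origin.
    c, s = math.cos(a), math.sin(a)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def canonical_weights(x):
    # Hypothetical skinning weight field over canonical space (2 bones).
    w1 = sigmoid(4.0 * x[0])
    return (1.0 - w1, w1)


def lbs(weights, transforms, p):
    # Linear blend skinning: sum_i w_i * T_i(p).
    out = [0.0, 0.0]
    for w, T in zip(weights, transforms):
        q = T(p)
        out[0] += w * q[0]
        out[1] += w * q[1]
    return tuple(out)


def forward_skin(x_c, angle):
    # Hypothetical pose: bone 0 fixed, bone 1 rotated by `angle`.
    # Weights are queried at the canonical point, independent of the pose.
    bones = (lambda p: p, lambda p: rot(p, angle))
    return lbs(canonical_weights(x_c), bones, x_c)


def backward_skin(x_d, angle, weight_field):
    # Inverse bone transforms blended with weights queried at the deformed point.
    inv_bones = (lambda p: p, lambda p: rot(p, -angle))
    return lbs(weight_field(x_d), inv_bones, x_d)
```

With `x_c = (1.0, 0.0)` and a 90 degree rotation, `forward_skin` moves the point close to `(0, 1)`, whereas `backward_skin(x_d, math.pi / 2, canonical_weights)` lands far from `x_c`; with a zero rotation both maps reduce to the identity. A correct backward field would have to return different weights at the same deformed location for different poses.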

Method Overview

To generate a deformed shape or to train with deformed observations, we need to determine the canonical correspondence of any given deformed point. This is trivial for backward skinning but not straightforward for forward skinning. The core of our method is to find the canonical correspondences of any deformed point using forward skinning weights. We use an iterative root-finding algorithm with multiple initializations to numerically find all correspondences, and then aggregate their occupancy probabilities with a max operator to obtain the occupancy of the deformed point. Finally, we derive analytical gradients via implicit differentiation, so that the whole pipeline is end-to-end differentiable and can be trained with deformed observations directly.
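The steps above can be illustrated with a toy 2D sketch in pure Python. Everything in it is an illustrative assumption rather than SNARF's implementation: two bones (one fixed, one rotated by 90 degrees), a hypothetical sigmoid weight field whose scalar sharpness `k` stands in for the learned skinning network, a hypothetical soft disk as the canonical shape, and plain Newton iterations with finite-difference Jacobians as the root finder (the paper's own solver may differ). Each bone's inverse transform applied to the deformed point serves as one initialization. The gradient of a root with respect to `k` follows from the implicit function theorem, dx*/dk = -(∂d/∂x)^{-1} ∂d/∂k, evaluated at the root, without differentiating through the solver iterations.

```python
import math

# Hypothetical toy setup: bone 0 is fixed, bone 1 rotates by ANGLE about the origin.
ANGLE = math.pi / 2


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rot(p, a):
    c, s = math.cos(a), math.sin(a)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def skin(x, k):
    # Forward LBS with hypothetical canonical weights w1 = sigmoid(k * x0), w0 = 1 - w1.
    w1 = sigmoid(k * x[0])
    r = rot(x, ANGLE)
    return ((1 - w1) * x[0] + w1 * r[0], (1 - w1) * x[1] + w1 * r[1])


def residual(x, x_d, k):
    # d(x) = skin(x) - x_d; canonical correspondences are the roots of d.
    s = skin(x, k)
    return (s[0] - x_d[0], s[1] - x_d[1])


def jacobian(f, x, eps=1e-6):
    # 2x2 Jacobian of f at x via central finite differences, as ((J00, J01), (J10, J11)).
    cols = []
    for i in range(2):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = f(xp), f(xm)
        cols.append(((fp[0] - fm[0]) / (2 * eps), (fp[1] - fm[1]) / (2 * eps)))
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))


def solve2(J, r):
    # Solve J v = r for a 2x2 matrix J (raises ZeroDivisionError if J is singular).
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det)


def find_roots(x_d, k, iters=50, tol=1e-12):
    roots = []
    # One initialization per bone: undo that bone's rigid transform.
    for x0 in (tuple(x_d), rot(x_d, -ANGLE)):
        x = list(x0)
        try:
            for _ in range(iters):  # Newton iterations
                r = residual(x, x_d, k)
                if math.hypot(*r) < tol:
                    break
                v = solve2(jacobian(lambda p: residual(p, x_d, k), x), r)
                x = [x[0] - v[0], x[1] - v[1]]
        except ZeroDivisionError:  # singular Jacobian: drop this initialization
            continue
        converged = math.hypot(*residual(x, x_d, k)) < 1e-9
        if converged and all(math.dist(x, q) > 1e-6 for q in roots):
            roots.append(tuple(x))
    return roots


def canonical_occupancy(x):
    # Hypothetical canonical shape: a soft disk of radius 0.5 centred at (1, 0).
    return sigmoid(10.0 * (0.5 - math.dist(x, (1.0, 0.0))))


def deformed_occupancy(x_d, k):
    # Max over all valid correspondences; 0 if no correspondence is found.
    return max((canonical_occupancy(r) for r in find_roots(x_d, k)), default=0.0)


def implicit_grad(x_star, x_d, k, eps=1e-6):
    # Implicit function theorem at the root: dx*/dk = -(dd/dx)^-1 (dd/dk).
    J = jacobian(lambda p: residual(p, x_d, k), x_star)
    dp = residual(x_star, x_d, k + eps)
    dm = residual(x_star, x_d, k - eps)
    v = solve2(J, ((dp[0] - dm[0]) / (2 * eps), (dp[1] - dm[1]) / (2 * eps)))
    return (-v[0], -v[1])
```

For example, skinning `x_c = (1.0, 0.0)` with `k = 4.0` and passing the result to `find_roots` should recover a root near `x_c`, and `deformed_occupancy` then reports the canonical occupancy at that root. Comparing `implicit_grad` with a finite difference of roots re-solved at `k ± h` is a quick consistency check. The implicit route needs only quantities at the converged root, so the solver's iterations never have to be stored or backpropagated through.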

Comparison

We train our method on meshes in various poses and ask the model to generate novel poses at inference time:

As shown, our method generalizes to these challenging, unseen poses. In comparison, backward skinning produces distorted shapes for unseen poses. The other baseline, NASA, models the human body as a composition of multiple parts and suffers from discontinuity artifacts at the joints.

BibTeX

@inproceedings{chen2021snarf,
      title={SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes},
      author={Chen, Xu and Zheng, Yufeng and Black, Michael J and Hilliges, Otmar and Geiger, Andreas},
      booktitle={International Conference on Computer Vision (ICCV)},
      year={2021}
    }