: The appearance of a real-world feather results from the complex interaction of light with its multi-scale biological structure, including the central shaft, branching barbs, and interlocking barbules on those barbs. In this work, we propose a practical surface-based appearance model for feathers. We represent the far-field appearance of feathers using a BSDF that implicitly represents the light scattering from the main biological structures of a feather, such as the shaft, barbs, and barbules. Our model accounts for the particular characteristics of feather barbs, such as their non-cylindrical cross-sections and internal scattering media, via a numerically computed BCSDF. To model the relative visibility between barbs and barbules, we derive a masking term from the differential projected areas of the different components of the feather’s microgeometry, which allows us to compute the masking between barbs and barbules analytically. Unlike previous works, our model uses a lightweight representation of the geometry based on a 2D texture and does not require explicitly representing the barbs as curves. We show the flexibility and potential of our appearance model by reproducing the most important visual features of several pennaceous feathers.
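To make the projected-area masking idea concrete, here is a generic illustration in my own notation (not the paper's derivation): each component's contribution can be weighted by its share of the total projected area toward the outgoing direction.

```latex
% Illustrative sketch only; A^{\perp}_{b} and A^{\perp}_{bl} denote hypothetical
% differential projected areas of barbs and barbules toward direction \omega_o.
w_{\mathrm{barb}}(\omega_o) =
  \frac{A^{\perp}_{b}(\omega_o)}{A^{\perp}_{b}(\omega_o) + A^{\perp}_{bl}(\omega_o)},
\qquad
w_{\mathrm{barbule}}(\omega_o) = 1 - w_{\mathrm{barb}}(\omega_o),
\qquad
f(\omega_i,\omega_o) \approx
    w_{\mathrm{barb}}(\omega_o)\, f_{\mathrm{barb}}(\omega_i,\omega_o)
  + w_{\mathrm{barbule}}(\omega_o)\, f_{\mathrm{barbule}}(\omega_i,\omega_o).
```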
: Encoding information from 2D views of an object into a 3D representation is crucial for generalized 3D feature extraction. Such features can then enable 3D reconstruction, 3D generation, and other applications. We propose GOEmbed (Gradient Origin Embeddings), which encodes input 2D images into any 3D representation without requiring a pre-trained image feature extractor. This contrasts with typical prior approaches, in which input images are either encoded using 2D features extracted from large pre-trained models or using customized features designed for particular 3D representations; worse, encoders may not yet be available for specialized 3D neural representations such as MLPs and hash-grids. We extensively evaluate our proposed GOEmbed under different experimental settings on the OmniObject3D benchmark. First, we evaluate how well the mechanism compares against prior encoding mechanisms on multiple 3D representations using an illustrative experiment called Plenoptic-Encoding. Second, the efficacy of the GOEmbed mechanism is further demonstrated by achieving a new SOTA FID of 22.12 on the OmniObject3D generation task using a combination of GOEmbed and DFM (Diffusion with Forward Models), which we call GOEmbedFusion. Finally, we evaluate how the GOEmbed mechanism bolsters sparse-view 3D reconstruction pipelines.
>>authors: Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny
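As a rough mental model of a gradient-origin-style encoding, here is a toy sketch under my own assumptions (`toy_render` and `go_embed` are hypothetical names, and an orthographic mean-projection stands in for a real differentiable renderer; this is not the authors' implementation): start from a zero, "origin" 3D representation, measure how poorly it explains the input views, and use the gradient of that error with respect to the representation as the 3D encoding of those views.

```python
import torch

def toy_render(volume, view_axis):
    # Hypothetical stand-in for a differentiable renderer: an orthographic
    # mean-projection of a dense feature volume along one spatial axis.
    return volume.mean(dim=view_axis)

def go_embed(images, view_axes, shape=(3, 32, 32, 32)):
    # Gradient-origin-style embedding (illustrative sketch only): start from a
    # zero ("origin") 3D representation, compute a reconstruction loss against
    # the observed 2D views, and use the gradient of that loss with respect to
    # the representation as the 3D encoding of the views.
    origin = torch.zeros(shape, requires_grad=True)
    loss = sum(torch.nn.functional.mse_loss(toy_render(origin, ax), img)
               for img, ax in zip(images, view_axes))
    (grad,) = torch.autograd.grad(loss, origin)
    return -grad  # has the same shape as the 3D representation itself

# Usage: two toy orthographic "views", one along each of two volume axes.
imgs = [torch.rand(3, 32, 32), torch.rand(3, 32, 32)]
embedding = go_embed(imgs, view_axes=[1, 2])
print(embedding.shape)  # torch.Size([3, 32, 32, 32])
```

Because the gradient has the same shape as whatever representation it is taken with respect to, the same recipe applies unchanged to voxel grids, triplanes, MLP weights, or hash-grid parameters, which is the representation-agnostic property the abstract emphasizes.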
: Simulating light transport in biological tissues is a longstanding challenge, given their complex multilayered structure. In biology, one of the most remarkable and well-studied examples of such tissues is the scales that cover the skin of reptiles, which combine photonic structures and pigmentation. This is, however, a somewhat ignored problem in computer graphics. In this work, we propose a multilayered appearance model based on the anatomy of snake skin. Some snakes are known for their striking, highly iridescent scales resulting from light interference. We model snake skin as a two-layered reflectance function: the top layer is a thin layer producing a specular iridescent reflection, while the bottom layer is a diffuse, highly absorbing layer that results in a dark diffuse appearance and maximizes the iridescent color of the skin. We demonstrate our layered material on a wide range of appearances and show that our model is able to qualitatively match the appearance of snake skin.
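For intuition about how such a two-layer model can behave, here is a small numerical sketch under assumptions of my own (a single thin film with made-up refractive index and thickness, normal-incidence Fresnel amplitudes, unpolarized light, and an almost-black Lambertian base); it is not the paper's model, only an illustration of why a thin specular layer over an absorbing base produces wavelength- and view-dependent color.

```python
import numpy as np

def thin_film_reflectance(wavelength_nm, cos_theta_i, n_film=1.56, d_nm=450.0):
    # Illustrative single thin-film (Airy) reflectance with normal-incidence
    # Fresnel amplitudes; index and thickness are made-up values, not
    # measurements of real snake skin.
    n_air, n_base = 1.0, 1.40
    sin_t = np.clip(np.sqrt(1.0 - cos_theta_i**2) * n_air / n_film, 0.0, 1.0)
    cos_t = np.sqrt(1.0 - sin_t**2)
    r1 = (n_air - n_film) / (n_air + n_film)     # air -> film
    r2 = (n_film - n_base) / (n_film + n_base)   # film -> substrate
    phi = 4.0 * np.pi * n_film * d_nm * cos_t / wavelength_nm
    num = r1**2 + r2**2 + 2.0 * r1 * r2 * np.cos(phi)
    den = 1.0 + (r1 * r2)**2 + 2.0 * r1 * r2 * np.cos(phi)
    return num / den

def snake_scale_reflectance(wavelength_nm, cos_theta_i, diffuse_albedo=0.03):
    # Two-layer toy model: iridescent specular film on top of a nearly black
    # (highly absorbing) diffuse base, as described in the abstract.
    specular = thin_film_reflectance(wavelength_nm, cos_theta_i)
    diffuse = diffuse_albedo / np.pi
    return specular, diffuse

# Usage: sweep the visible spectrum at 30 degrees incidence.
wl = np.linspace(400, 700, 4)
print(snake_scale_reflectance(wl, np.cos(np.radians(30.0)))[0])
```

The dark diffuse base barely adds color of its own, so the interference term dominates the perceived hue, which is the mechanism the abstract attributes to iridescent snake scales.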
: Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D-consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism. We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples while learning from a collection of multi-view 2D images only. The method first generates coarse 3D samples using a variant of the recently proposed HoloDiffusion generator. Then, it independently renders and upsamples a large number of views of the coarse 3D model, super-resolves them to add detail, and distills those into a single, high-fidelity implicit 3D representation, which also ensures view consistency of the final renders. The super-resolution network is trained as an integral part of HoloFusion, end-to-end, and the final distillation uses a new sampling scheme to capture the space of super-resolved signals. We compare our method against existing baselines, including DreamFusion, Get3D, EG3D, and HoloDiffusion, and achieve, to the best of our knowledge, the most realistic results on the challenging CO3Dv2 dataset.
>>authors: Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny
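To picture the distillation stage concretely, here is a toy sketch under my own assumptions (the `ToyImplicit3D` grid and its mean-projection "renderer" are placeholders, and the super-resolver is passed in as a callable; this is not the authors' pipeline): many independently super-resolved views are reconciled by fitting a single 3D representation to all of them, which is what enforces view consistency of the final renders.

```python
import torch

class ToyImplicit3D(torch.nn.Module):
    # Minimal placeholder for a single high-fidelity 3D representation.
    def __init__(self):
        super().__init__()
        self.grid = torch.nn.Parameter(torch.zeros(3, 64, 64, 64))

    def render(self, view_axis):
        # Toy orthographic "render": mean-projection along one spatial axis.
        return self.grid.mean(dim=view_axis + 1)

def distill(coarse_views, super_resolver, steps=200, lr=1e-2):
    # Sketch of the distillation idea (my own toy code): super-resolve many
    # independently rendered coarse views, then optimize one 3D representation
    # so that its renders match all of them.
    targets = [(axis, super_resolver(img)) for axis, img in coarse_views]
    model = ToyImplicit3D()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(torch.nn.functional.mse_loss(model.render(axis), target)
                   for axis, target in targets)
        loss.backward()
        opt.step()
    return model

# Usage with a hypothetical identity "super-resolver" and two coarse views.
views = [(0, torch.rand(3, 64, 64)), (1, torch.rand(3, 64, 64))]
fused = distill(views, super_resolver=lambda img: img, steps=10)
```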
: Material appearance is commonly modeled with Bidirectional Reflectance Distribution Functions (BRDFs), which must trade accuracy against complexity and storage cost. To investigate the current practices of BRDF modeling, we collect the first high-dynamic-range stereoscopic video dataset that captures the perceived quality degradation with respect to a number of parametric and non-parametric BRDF models. Our dataset shows that the loss functions currently used to fit BRDF models, such as the mean-squared error of logarithmic reflectance values, correlate poorly with the perceived quality of materials in rendered videos. We further show that quality metrics that compare rendered material samples give a significantly higher correlation with subjective quality judgments, and that a simple Euclidean distance in the ITP color space (ΔEITP) shows the highest correlation. Additionally, we investigate the use of different BRDF-space metrics as loss functions for fitting BRDF models and find that logarithmic mapping is the most effective approach for BRDF-space loss functions.
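For readers unfamiliar with the fitting losses being compared, here is a minimal sketch of a logarithmically mapped, cosine-weighted BRDF-space loss in the spirit of the "log reflectance" objectives the abstract discusses (the exact weighting and epsilon used in the paper may differ; `log_mapped_brdf_loss` is my own name).

```python
import numpy as np

def log_mapped_brdf_loss(brdf_fit, brdf_ref, cos_theta_o, eps=1e-3):
    # Mean-squared error between log-mapped, cosine-weighted reflectance
    # values; the log compresses the huge dynamic range of specular peaks.
    f = np.log(eps + brdf_fit * cos_theta_o)
    g = np.log(eps + brdf_ref * cos_theta_o)
    return np.mean((f - g) ** 2)

# Usage on made-up reflectance samples.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 2.0, size=1000)            # tabulated reference BRDF values
fit = ref * rng.normal(1.0, 0.05, size=1000)      # an imperfect parametric fit
cos_o = rng.uniform(0.1, 1.0, size=1000)
print(log_mapped_brdf_loss(fit, ref, cos_o))
```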
: Diffusion models have emerged as the best approach for generative modeling of 2D images. Part of their success is due to the possibility of training them on millions if not billions of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more complex than for 2D images. Second, while it is conceptually trivial to extend the models to operate on 3D rather than 2D grids, the associated cubic growth in memory and compute complexity makes this infeasible. We address the first challenge by introducing a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. We evaluate our method on real-world data, using the CO3D dataset which has not been used to train 3D generative models before. We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
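As a very loose sketch of what "supervision from posed 2D images only" can mean in a 3D diffusion setup (the callables `lift`, `denoiser`, and `render` are hypothetical stand-ins, and the actual formulation in the paper differs in important details): build a 3D feature volume from some views, noise and denoise it, and penalize the photometric error of its renders against held-out posed views instead of comparing volumes directly.

```python
import torch

def posed_2d_diffusion_loss(lift, denoiser, render, images, poses, t):
    # Toy objective: 3D supervision comes entirely from posed 2D photographs.
    volume = lift(images[:2], poses[:2])                  # 2D views -> 3D features
    noise = torch.randn_like(volume)
    alpha = torch.cos(t * torch.pi / 2.0)                 # simple noise schedule
    noisy = alpha * volume + (1.0 - alpha**2).sqrt() * noise
    denoised = denoiser(noisy, t)
    renders = torch.stack([render(denoised, p) for p in poses[2:]])
    return torch.nn.functional.mse_loss(renders, torch.stack(images[2:]))

# Usage with toy stand-ins just to exercise the code.
lift = lambda imgs, poses: torch.rand(8, 16, 16, 16)
denoiser = lambda v, t: v
render = lambda v, p: v.mean(dim=1)
imgs = [torch.rand(8, 16, 16) for _ in range(4)]
loss = posed_2d_diffusion_loss(lift, denoiser, render, imgs,
                               poses=[0, 1, 2, 3], t=torch.tensor(0.3))
print(loss)
```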
: This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We mathematically and empirically show the equivalence of the 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.
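The speed and memory gains come from exploiting separable, mode-wise structure; here is a tiny illustration with made-up sizes (my own sketch, not the paper's code): applying one small operator per mode yields exactly the same measurements as one huge Kronecker-product matrix acting on the vectorized light field, but without ever forming that matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = (4, 4, 3, 3, 5)          # toy (x, y, u, v, lambda) light-field sizes
meas = (3, 3, 2, 2, 4)          # undersampled measurements per dimension
A = [rng.standard_normal((m, d)) for m, d in zip(meas, dims)]
X = rng.standard_normal(dims)   # toy multi-spectral light field

# "5D" model: apply one small operator per mode (five cheap matrix products).
Y = X
for k, Ak in enumerate(A):
    Y = np.moveaxis(np.tensordot(Ak, np.moveaxis(Y, k, 0), axes=1), 0, k)

# Equivalent "1D" model: one big Kronecker matrix acting on the vectorization.
A_kron = A[0]
for Ak in A[1:]:
    A_kron = np.kron(A_kron, Ak)
y_1d = A_kron @ X.reshape(-1)

print(np.allclose(Y.reshape(-1), y_1d))  # True: the two sensing models agree
```

At realistic light-field resolutions the Kronecker matrix would be far too large to store, while the mode-wise products remain cheap, which is the intuition behind the reported orders-of-magnitude speedup.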
: Hand in hand with the rapid development of machine learning, deep learning, and generative AI algorithms and architectures, the graphics community has seen a remarkable evolution of novel techniques for material and appearance capture. Typically, these machine-learning-driven methods and technologies, in contrast to traditional techniques, rely on only a single or very few input images, while enabling the recovery of detailed, high-quality measurements of bi-directional reflectance distribution functions, as well as the corresponding spatially varying material properties, also known as Spatially Varying Bi-directional Reflectance Distribution Functions (SVBRDFs). Learning-based approaches for appearance capture will play a key role in the development of new technologies that will have a significant impact on virtually all domains of graphics. Therefore, to facilitate future research, this State-of-the-Art Report (STAR) presents an in-depth overview of the state of the art in machine-learning-driven material capture in general, and focuses on SVBRDF acquisition in particular, due to its importance in accurately modelling complex light interaction properties of real-world materials. The overview includes a categorization of current methods along with a summary of each technique, an evaluation of their functionality, and a discussion of their complexity in terms of acquisition requirements, computational aspects, and usability constraints. The STAR concludes by looking forward and summarizing open challenges in research and development toward predictive and general appearance capture in this field. A complete list of the methods and papers reviewed in this survey is available at computergraphics.on.liu.se/star_svbrdf_dl/.
: The problem of detecting and localizing defects in images has been tackled with various approaches, including what are now called traditional computer vision techniques, as well as machine learning. Notably, most of these efforts have been directed toward the normality-supervised setting of this problem. That is, these algorithms assume the availability of a curated set of normal images, known not to contain any anomalies. The anomaly-free images constitute reference data, used to detect anomalies in a one-class classification setting. While this kind of data is easier to acquire than anomaly-annotated images, it is still costly or difficult to obtain in-domain data for certain applications. We address the problem of anomaly detection and localization under a training-set-free paradigm that does not require any anomaly-free reference data. Concretely, we introduce a truly zero-shot method that can localize anomalies in a single image of a previously unobserved texture class. Then, we develop a mechanism to leverage additional test images, which may themselves contain anomalies. Furthermore, we extend our analysis to include a categorization of the anomalies in the given population through clustering. Importantly, we focus our attention on textures and texture-like images as we develop an anomaly detection method for structural defects, rather than logical anomalies. This aligns with the proposed setting, which avoids the supervisory signal generally needed for detecting logical and semantic anomalies. This poster summarizes our recent line of research on localization and classification of anomalies in real-world texture images.
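To illustrate what training-free, single-image anomaly localization on a texture can look like in principle (a generic baseline of my own, not the poster's method; function and parameter names are hypothetical): describe each patch by simple statistics, fit a Gaussian to all patches of the same image, and score each patch by its Mahalanobis distance to that within-image model.

```python
import numpy as np

def zero_shot_texture_anomaly_map(image, patch=8):
    # Training-free sketch: the only "reference data" is the image itself.
    h, w = image.shape[:2]
    feats, coords = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = image[y:y + patch, x:x + patch].reshape(-1, image.shape[-1])
            feats.append(np.concatenate([p.mean(0), p.std(0)]))
            coords.append((y, x))
    feats = np.asarray(feats)
    mu = feats.mean(0)
    cov = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    inv = np.linalg.inv(cov)
    scores = np.einsum('nd,dk,nk->n', feats - mu, inv, feats - mu)
    amap = np.zeros((h, w))
    for s, (y, x) in zip(scores, coords):
        amap[y:y + patch, x:x + patch] = s
    return amap

# Usage on a synthetic texture with one small bright defect.
rng = np.random.default_rng(0)
img = rng.normal(0.5, 0.05, size=(64, 64, 3))
img[20:28, 32:40] += 0.4
print(zero_shot_texture_anomaly_map(img).argmax())
```

This only targets structural defects (local deviations from the texture statistics), which matches the poster's stated focus; logical or semantic anomalies would require external supervision that this setting deliberately avoids.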
: Biophysical skin appearance modeling has previously focused on spectral absorption and scattering due to chromophores in various skin layers. In this work, we extend recent practical skin appearance measurement methods employing RGB illumination to provide a novel estimate of skin fluorescence, as well as direct measurements of two parameters related to blood distribution in skin: blood volume fraction and blood oxygenation. The proposed method involves the acquisition of RGB facial skin reflectance responses under RGB illumination produced by regular desktop LCD screens. Unlike previous works that have employed hyperspectral imaging for this purpose, we demonstrate successful isolation of elastin-related fluorescence, as well as blood distributions in capillaries and veins, using our practical RGB imaging procedure.
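A simplified picture of the kind of data such an RGB-illumination capture produces (my own toy sketch, not the paper's algorithm; names and the fluorescence cue are illustrative): averaging the camera's R, G, B responses under sequential pure R, G, B screen illumination gives a 3x3 response matrix per skin patch, and fluorescence shows up as energy appearing in a camera band other than the illuminant's band, for example a green response under blue light.

```python
import numpy as np

def rgb_response_matrix(captures):
    # captures: one HxWx3 image per illuminant (R, G, B), e.g. from an LCD.
    # Returns M with M[i, j] = mean camera channel i response under illuminant j.
    return np.stack([img.reshape(-1, 3).mean(0) for img in captures], axis=1)

def fluorescence_cue(M):
    # Ordinary reflectance cannot move energy between well-separated bands, so
    # an unusually strong green response under blue illumination (relative to
    # the blue response itself) is used here as a crude fluorescence indicator.
    return M[1, 2] / (M[2, 2] + 1e-6)

# Usage with synthetic captures: one small image per illuminant (R, G, B).
rng = np.random.default_rng(0)
captures = [np.clip(rng.normal(m, 0.01, size=(16, 16, 3)), 0, 1)
            for m in ([0.5, 0.1, 0.05], [0.1, 0.4, 0.1], [0.05, 0.15, 0.35])]
print(fluorescence_cue(rgb_response_matrix(captures)))
```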