There have been several attempts to convert 2D images into 3D, including AI research at companies such as Facebook and Nvidia and at start-ups like Threedy.AI. Recently, a research team from Microsoft also published a preprint paper demonstrating a method that generates 3D shapes from unstructured 2D images.
Generally speaking, training such a framework requires a differentiable rendering step in place of standard rasterization. As a result, past work in this field has focused on building customized differentiable renderers. However, the images produced by these custom renderers are not realistic or natural-looking, and they are unsuitable for the industrial-quality renderings expected in games and graphics.
Microsoft researchers have now made a new breakthrough: their paper details a framework that, for the first time, applies "scalable" training techniques in this field. The researchers report that when trained on 2D images, the framework consistently produces 3D shapes that outperform those of existing models, which is welcome news for video game developers, e-commerce companies, and animation studios that lack the expertise to create 3D models.
Specifically, the researchers set out to use a full-featured industrial renderer, which produces images from scene data. To do this, they train a 3D shape generation model so that its rendered shapes yield images matching the distribution of a 2D dataset. The generator model takes a random input vector (representing features of the dataset) and produces a continuous voxel representation of a 3D object (values on a grid in 3D space); the voxels are then passed into a non-differentiable rendering process, where they are thresholded to discrete values before being rendered with the off-the-shelf renderer.
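The paper does not spell out the network details in this article, but the pipeline described above can be illustrated with a minimal PyTorch sketch: a latent vector is decoded into a continuous voxel grid, which is then hard-thresholded before being handed to a conventional renderer. The grid size (64³) and the 0.5 threshold are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Illustrative generator: maps a latent vector to a continuous voxel grid in [0, 1]."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 4 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # 4 -> 8
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),   # 8 -> 16
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),    # 16 -> 32
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),              # 32 -> 64
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 4, 4, 4)
        return self.deconv(x)  # continuous occupancy values on a 64^3 grid

# Sample a continuous voxel grid and threshold it to a discrete occupancy grid
# before passing it to the (non-differentiable) off-the-shelf renderer.
gen = VoxelGenerator()
z = torch.randn(1, 128)
continuous_voxels = gen(z)                            # values in [0, 1]
discrete_voxels = (continuous_voxels > 0.5).float()   # assumed threshold for rendering
```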
In other words, the novelty lies in rendering the continuous voxel grid produced by the 3D shape generator directly through a proxy neural renderer. As the researchers explain, given a 3D input, this proxy is trained to match the rendering output of the off-the-shelf renderer.
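A rough way to picture the proxy renderer is as a neural network fitted to reproduce the real renderer's images, so that gradients can flow back to the generator. The sketch below is an assumed architecture and training loop, not the authors' implementation; `offtheshelf_render` stands in for whatever industrial renderer is used and is treated as a black box.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyNeuralRenderer(nn.Module):
    """Differentiable stand-in for the off-the-shelf renderer (illustrative architecture)."""
    def __init__(self, depth=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        # collapse the depth axis of the voxel features, then decode to a 2D image
        self.decoder = nn.Sequential(
            nn.Conv2d(32 * depth, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, voxels):                 # (B, 1, D, H, W)
        feats = self.encoder(voxels)           # (B, 32, D, H, W)
        b, c, d, h, w = feats.shape
        return self.decoder(feats.view(b, c * d, h, w))  # (B, 1, H, W)

def train_proxy(proxy, offtheshelf_render, voxel_batches, optimizer):
    """Fit the proxy to reproduce the real renderer's images on thresholded voxels."""
    for voxels in voxel_batches:
        target = offtheshelf_render((voxels > 0.5).float())  # non-differentiable ground truth
        pred = proxy(voxels)
        loss = F.mse_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```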
GANs have produced impressive results on 2D image data. However, many visual applications, such as games, need 3D models as input, not just images, and directly extending existing GAN models to 3D would require 3D training data.
The picture above shows 3D mushroom images generated by the Microsoft model.
In the experiments, the research team used a 3D convolutional GAN architecture for the generator described above (a GAN is an AI model with two parts: a generator, which turns random noise into synthetic samples, and a discriminator, which receives those samples alongside real samples from the training dataset and tries to tell the two apart). Using both datasets generated from 3D models and real datasets, the team synthesized images of different object categories and rendered them from different viewpoints throughout training.
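Putting the pieces together, one plausible adversarial training step looks like the following sketch: the generator produces voxels, the proxy renderer turns them into a 2D image, and a 2D discriminator compares rendered images against real photographs. The loss formulation and discriminator interface here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def train_step(gen, proxy_renderer, disc, real_images, g_opt, d_opt, latent_dim=128):
    """One adversarial step: voxels -> proxy-rendered 2D image -> 2D discriminator."""
    batch = real_images.size(0)
    z = torch.randn(batch, latent_dim)
    fake_images = proxy_renderer(gen(z))        # differentiable render of generated voxels

    # Discriminator: distinguish real images from rendered fakes.
    d_opt.zero_grad()
    d_loss = (F.binary_cross_entropy(disc(real_images), torch.ones(batch, 1)) +
              F.binary_cross_entropy(disc(fake_images.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator: fool the discriminator through the proxy renderer.
    g_opt.zero_grad()
    g_loss = F.binary_cross_entropy(disc(fake_images), torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```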
The researchers also said the framework extracts lighting and shading information from images, allowing it to pull more meaningful data from each training sample and produce better results. After training on a dataset of natural images, the framework can generate realistic samples. In addition, it can exploit exposure differences between surfaces to detect the internal structure of concave objects, accurately capturing concavity and hollow spaces.
In the future, color, material, and lighting information could be incorporated into the system so that it can work with more "regular" real-world datasets.