Zihao, from Aofei Temple
Qubit Report | Official account QbitAI
First, take a look at this stunning island scene:
It is not a photographer's masterpiece; it comes from GANcraft.
The original image is a block scene in Minecraft.
Now "my world" (the Chinese name for Minecraft) has truly become my world!
GANcraft is an unsupervised 3D neural rendering framework that can turn large block worlds, such as Minecraft maps, into realistic images.
How realistic is it? Compare it with other models.
Here is GANcraft's output once more (colors and image quality are compressed here):
The comparison shows:
im2im (image-to-image translation) methods such as MUNIT and SPADE cannot maintain view consistency, because the models have no notion of 3D geometry and each frame is generated independently.
wc-vid2vid produces view-consistent video, but image quality degrades rapidly over time because of the crude block geometry and the train-test domain gap, whose errors accumulate.
NSVF-W also produces view-consistent output, but it looks dull and lacks detail.
Images generated by GANcraft maintain view consistency while keeping high quality.
How is this done?
GANcraft's use of neural rendering guarantees view consistency, while its novel model architecture and training scheme achieve unprecedented realism.
Specifically, the researchers combine 3D volume rendering with 2D image-space rendering in a hybrid voxel-conditional neural rendering approach.
First, a neural radiance field bounded by voxels (volume elements) is defined, and a learnable feature vector is assigned to each corner of every block;
Then, trilinear interpolation defines a location code at any position inside a voxel, representing the world as a continuous volumetric function; each block is also assigned a semantic label, such as dirt, grass, or water.
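The corner-feature interpolation above can be sketched as follows. This is a minimal NumPy illustration; the function name and array shapes are assumptions for exposition, not the paper's code:

```python
import numpy as np

def trilinear_feature(corner_feats, local_xyz):
    """Interpolate a location code inside one voxel.

    corner_feats: (2, 2, 2, C) learnable feature vectors at the
    voxel's eight corners; local_xyz: position in [0, 1]^3 within
    the voxel. Returns a (C,) interpolated feature vector.
    """
    x, y, z = local_xyz
    wx = np.array([1.0 - x, x])
    wy = np.array([1.0 - y, y])
    wz = np.array([1.0 - z, z])
    # Outer product of the per-axis weights gives the 8 corner weights.
    w = np.einsum("i,j,k->ijk", wx, wy, wz)          # (2, 2, 2)
    return np.einsum("ijk,ijkc->c", w, corner_feats)

# Sanity check: at a corner, interpolation returns that corner's feature.
feats = np.random.rand(2, 2, 2, 16)
assert np.allclose(trilinear_feature(feats, (0.0, 0.0, 0.0)), feats[0, 0, 0])
```

Because the weights sum to one, the location code varies smoothly as the query point moves through the voxel, which is what makes the world a continuous volumetric function.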
Next, an MLP implicitly defines the radiance field: it takes the location code, the semantic label, and a shared style code as input, and outputs a point feature (analogous to radiance) and its volume density.
Finally, given the camera parameters, the radiance field is rendered into a 2D feature map, which a CNN then converts into an image.
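The final rendering step reduces the per-sample features and densities along each camera ray to one pixel of the 2D feature map. A minimal sketch of the standard volume-rendering weights, with illustrative names and shapes (not the paper's implementation):

```python
import numpy as np

def composite_ray(features, sigmas, deltas):
    """Alpha-composite point features along one camera ray.

    features: (N, C) radiance-like features produced by the MLP;
    sigmas: (N,) volume densities; deltas: (N,) distances between
    consecutive samples. Returns the (C,) rendered feature; over a
    full image these form the 2D feature map passed to the CNN.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # opacity per sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans                                # contribution per sample
    return (weights[:, None] * features).sum(axis=0)

# A nearly opaque first sample dominates the rendered feature.
feats = np.array([[1.0, 2.0], [5.0, 6.0]])
out = composite_ray(feats, np.array([50.0, 0.0]), np.array([1.0, 1.0]))
assert np.allclose(out, [1.0, 2.0], atol=1e-6)
```

Since this compositing is differentiable, gradients from the CNN and the losses can flow back through the weights into the MLP and the corner features.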
A voxel-conditional neural rendering model can thus be built, but no real image exists to serve as ground truth, so the researchers turn to adversarial training.
"My world" differs from the real world, though: its blocks often follow a completely different label distribution. For example, a scene may be entirely covered by snow or water, or multiple biomes may appear in a single area.
With random sampling, adversarial training against Internet photos therefore produces unrealistic results.
So the researchers generate pseudo ground truth for training.
Using a pretrained SPADE model, they obtain pseudo-ground-truth images with matching semantics from 2D semantic segmentation masks.
This reduces the mismatch between the label and image distributions, and the stronger losses it enables make training faster and more stable, significantly improving generation quality.
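As a rough illustration of how a pseudo ground truth enters training, a paired reconstruction penalty on the rendered frame could look like the sketch below. The L2 form and the weight are assumptions for exposition; the paper's actual loss terms may differ:

```python
import numpy as np

def pseudo_gt_loss(rendered, pseudo_gt, lam=1.0):
    """Pixel-wise reconstruction loss against a SPADE-generated
    pseudo-ground-truth image that shares the frame's semantic layout.
    This paired term (illustrative L2 here) complements the adversarial
    loss, which alone cannot exploit per-frame correspondence.
    """
    return lam * np.mean((rendered - pseudo_gt) ** 2)

# Identical images incur zero loss; disagreement is penalized per pixel.
img = np.random.rand(4, 4, 3)
assert pseudo_gt_loss(img, img) == 0.0
```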
In addition, GANcraft lets users control the scene semantics and output style.
Its project page says it turns every Minecraft player into a 3D artist!
It also simplifies 3D modeling of complex landscape scenes, without requiring years of professional expertise.
GANcraft will be open-sourced soon; interested readers can follow the links below to learn more.
Reference links:
[1] https://nvlabs.github.io/GANcraft/
[2] https://arxiv.org/abs/2104.07659