Original title: With just 2 photos, 2D becomes 3D; this AI imagines the candle-blowing process all by itself, and the first two authors are Chinese
Snap two "wasted" shots together!
The missed moment is instantly recreated, and the result is even upgraded from 2D to a 3D effect.
Look, the little boy's cute smile appears right away:
The moment of blowing out the birthday cake candle is also restored:
The process of breaking into a grin looks downright heartwarming.
Let's just say the wasted shots of naughty kids and furry pets sitting in your camera roll have finally been saved!
And there is no visible trace of after-the-fact synthesis at all, as if the footage were shot natively.
This is a recent joint result from Google, Cornell University, and the University of Washington: using only 2 similar photos, it recreates the moment in 3D, and it has been accepted to CVPR 2022.
The first two authors of the paper are both Chinese, and the first author graduated from Zhejiang University.
Predicting the in-between scene from 2 photos, forward and backward
This method works well on two very similar photos, such as a pair produced by continuous (burst) shooting.
The key to the method is to convert the 2 photos into a pair of feature-based layered depth images (LDIs), enhanced with scene flow.
The whole process can be viewed as treating the two photos as the "starting point" and "ending point", and then gradually predicting how the scene changes at each moment in between.
Specifically, the process is as follows:
First, the two photos are aligned with a homography, and a dense depth map is predicted for each of them.
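Purely as an illustration of what this alignment step could look like, here is a minimal sketch using OpenCV's ORB features and RANSAC homography estimation (the specific matcher is an assumption, not the authors' code); the dense depth maps would come from some off-the-shelf monocular depth estimator, which is not shown here:

```python
# Minimal sketch of homography alignment between two near-duplicate photos.
# ORB + brute-force matching is an illustrative choice, not the paper's pipeline.
import cv2
import numpy as np

def align_with_homography(img_a, img_b):
    """Warp img_b onto img_a using a homography estimated from ORB matches."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_b, des_a), key=lambda m: m.distance)[:500]

    pts_b = np.float32([kp_b[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_a = np.float32([kp_a[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    H, _ = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)
    h, w = img_a.shape[:2]
    return cv2.warpPerspective(img_b, H, (w, h))
```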
Each RGBD image is then converted to a color LDI, with the occluded parts of the background filled in by depth-aware inpainting.
Here, an RGBD image is simply an ordinary RGB image plus a depth map.
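To make the idea of a color LDI concrete, here is a toy sketch that slices an RGBD image into depth layers; the `rgbd_to_ldi` helper and the quantile-based layering are hypothetical simplifications, and the real method additionally inpaints the occluded background behind each layer:

```python
# Hypothetical illustration: build a simple layered depth image (LDI)
# by slicing the depth range of an RGBD image into bins.
import numpy as np

def rgbd_to_ldi(rgb, depth, num_layers=3):
    """rgb: (H, W, 3), depth: (H, W). Returns a list of (color, alpha, depth) layers."""
    edges = np.quantile(depth, np.linspace(0.0, 1.0, num_layers + 1))
    layers = []
    for i in range(num_layers):
        mask = (depth >= edges[i]) & (depth <= edges[i + 1])
        alpha = mask.astype(np.float32)            # which pixels belong to this layer
        color = rgb * alpha[..., None]             # layer color (visible pixels only)
        layer_depth = np.where(mask, depth, 0.0)   # per-layer depth values
        # The real method would also inpaint the occluded background here.
        layers.append((color, alpha, layer_depth))
    return layers
```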
A 2D feature extractor is then applied to each color layer of the LDI to obtain a feature layer, yielding two feature-augmented LDIs.
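A rough PyTorch-style sketch of this per-layer feature extraction, with a small placeholder convolutional network standing in for the paper's actual extractor:

```python
# Placeholder 2D feature extractor applied to every LDI color layer.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
)

def extract_layer_features(color_layers):
    """color_layers: list of (3, H, W) tensors -> list of (64, H, W) feature layers."""
    with torch.no_grad():
        return [feature_extractor(c.unsqueeze(0)).squeeze(0) for c in color_layers]
```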
The next step is the part that simulates scene motion.
By predicting depth and optical flow between the two input images, the scene flow of each pixel in the LDIs can be computed.
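Conceptually, the scene flow of a pixel is the 3D displacement between its back-projected position in the first image and that of its optical-flow correspondence in the second. A hedged NumPy sketch, assuming the camera intrinsics `K`, the depth maps, and the 2D optical flow are already given:

```python
# Sketch: per-pixel 3D scene flow from two depth maps and 2D optical flow.
import numpy as np

def unproject(u, v, depth, K):
    """Back-project pixel grids (u, v) with depth into camera-space 3D points."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def scene_flow(depth0, depth1, flow01, K):
    """depth0/depth1: (H, W), flow01: (H, W, 2) optical flow from image 0 to 1."""
    H, W = depth0.shape
    v, u = np.mgrid[0:H, 0:W].astype(np.float32)
    p0 = unproject(u, v, depth0, K)

    # Sample depth1 at the flowed pixel locations (nearest neighbour for brevity).
    u1 = np.clip(np.round(u + flow01[..., 0]).astype(int), 0, W - 1)
    v1 = np.clip(np.round(v + flow01[..., 1]).astype(int), 0, H - 1)
    p1 = unproject(u + flow01[..., 0], v + flow01[..., 1], depth1[v1, u1], K)

    return p1 - p0   # 3D motion vector for every pixel
```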
To render a new view between the two images and achieve the 3D effect, the two feature LDIs are lifted into a pair of 3D point clouds and moved bidirectionally along the scene flow to the intermediate point in time.
The 3D feature points are then projected and splatted into forward and backward 2D feature maps with their corresponding depth maps.
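A simplified sketch of this lift-move-project step for the forward direction (the backward direction is symmetric); the naive z-buffer splatting here stands in for a proper differentiable point renderer:

```python
# Sketch: move lifted 3D feature points to time t and project them to 2D.
import numpy as np

def render_features_at_t(points, feats, flow3d, t, K, H, W):
    """points: (N, 3), feats: (N, C), flow3d: (N, 3) scene flow, t in [0, 1]."""
    moved = points + t * flow3d                    # move points to time t
    z = moved[:, 2]
    u = (K[0, 0] * moved[:, 0] / z + K[0, 2]).round().astype(int)
    v = (K[1, 1] * moved[:, 1] / z + K[1, 2]).round().astype(int)

    feat_map = np.zeros((H, W, feats.shape[1]), dtype=np.float32)
    depth_map = np.full((H, W), np.inf, dtype=np.float32)
    for i in np.argsort(-z):                       # far-to-near so nearer points win
        if 0 <= u[i] < W and 0 <= v[i] < H:
            feat_map[v[i], u[i]] = feats[i]
            depth_map[v[i], u[i]] = z[i]
    return feat_map, depth_map
```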
Finally, these maps are linearly blended with weights determined by the corresponding point on the timeline, and the result is fed into an image synthesis network to produce the final output.
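In its simplest form, the blending step looks something like the following, where `synthesis_net` is just a stand-in for the paper's image synthesis network:

```python
# Sketch: blend forward/backward feature maps with time weights, then decode.
def blend_and_synthesize(feat_fwd, feat_bwd, t, synthesis_net):
    """Blend with weights (1 - t) and t, then pass to the synthesis network."""
    blended = (1.0 - t) * feat_fwd + t * feat_bwd
    return synthesis_net(blended)
```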
Experimental results
In terms of the numbers, the method beats the baselines on all error metrics.
On the UCSD dataset, this method preserves more detail in the frame, as shown in (d).
Ablation experiments on the NVIDIA dataset showed that the method also performed well at improving rendering quality.
However, there are still some issues: when the change between the two images is relatively large, objects can end up misaligned.
For example, in the picture below, the spout of the bottle has shifted, and the wine glass, which should not have changed, also wobbles.
There is also an unavoidable "cut-off" artifact when the subject is not fully captured in the photo, such as the hand feeding the koala in the picture below.
User comments