
Sub-millisecond face detection on mobile phones! A major breakthrough: Google's BlazeFace algorithm for mobile GPUs

via: 博客园 (cnblogs)     time: 2019/7/13 14:25:46     reads: 271


Xin Zhiyuan report

Source: arXiv

Editor: Xiao Qin and Peng Fei

[Xin Zhiyuan introduction] Google recently unveiled BlazeFace, a sub-millisecond face detection algorithm: a lightweight face detector tailored for mobile GPU inference that runs at over 200 FPS with excellent accuracy.

In recent years, architectural improvements to deep neural networks have made real-time object detection possible. In the laboratory, accuracy approaching the theoretical limit can be pursued at any cost; in practical applications, however, response speed, energy consumption, and accuracy all matter. This demands algorithms of low complexity that are well suited to hardware acceleration.

In mobile applications, real-time object detection is often the first step of a video-processing pipeline, followed by task-specific components such as segmentation, tracking, or geometric inference.

Therefore, inference for the object detection model should run as fast as possible, ideally well above the standard real-time benchmark.

Google has just uploaded a paper to arXiv, "BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs", introducing the BlazeFace algorithm: a lightweight face detector tailored for mobile GPU inference, with excellent performance.


How remarkable is it? Google tested it on flagship devices and found that BlazeFace can run at 200 to 1000 FPS.


This super-real-time performance lets it serve any augmented-reality task that requires an accurate facial region as input to a task-specific model, such as 2D/3D facial keypoint or geometry estimation, facial feature or expression classification, and face region segmentation.

Google has already put the algorithm to use in its products.

First, two major algorithmic innovations, all in pursuit of speed and accuracy

BlazeFace includes a lightweight feature-extraction network inspired by, but distinct from, MobileNet V1/V2. It also adopts a modified SSD object detection scheme that is more GPU-friendly, and an improved tie-resolution strategy that replaces non-maximum suppression.

BlazeFace detects one or more faces in images captured by a smartphone's front-facing camera. It returns a bounding box and six keypoints for each face (from the observer's point of view: left eye, right eye, nose tip, mouth center, left ear tragion, and right ear tragion).
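As an illustration of this output (the field layout and names below are hypothetical, not BlazeFace's actual output format), a single detection could be unpacked as:

```python
# Hypothetical sketch: one detection as 16 floats --
# a bounding box [ymin, xmin, ymax, xmax] followed by six (x, y) keypoints.
KEYPOINT_NAMES = [
    "left_eye", "right_eye", "nose_tip",
    "mouth_center", "left_ear_tragion", "right_ear_tragion",
]

def unpack_detection(raw):
    """Split a flat detection vector into a box dict and named keypoints."""
    box = {"ymin": raw[0], "xmin": raw[1], "ymax": raw[2], "xmax": raw[3]}
    keypoints = {
        name: (raw[4 + 2 * i], raw[5 + 2 * i])
        for i, name in enumerate(KEYPOINT_NAMES)
    }
    return box, keypoints

box, kps = unpack_detection([0.2, 0.3, 0.6, 0.7] + [0.1 * i for i in range(12)])
```

The downstream task-specific models would then consume the box and keypoints directly.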

Algorithmic innovations include:

1. Innovations related to inference speed:

  • A very compact feature-extractor convolutional neural network, structurally related to MobileNetV1/V2 but purpose-built for lightweight object detection.
  • A GPU-friendly anchor scheme modified from SSD, improving GPU utilization. Anchors are predefined static bounding boxes that serve as the basis for network prediction adjustments and determine the prediction granularity.

2. Innovations related to prediction quality:

  • A tie-resolution strategy that replaces non-maximum suppression, achieving a more stable, smoother resolution of overlapping predictions.


BlazeBlock (left) and double BlazeBlock (right)

BlazeFace's model architecture, as shown in the figure above, considers the following four factors in its design:

The size of the receptive field:

Although most modern convolutional network architectures, including MobileNet, tend to use 3×3 convolution kernels, this study found that enlarging the kernel of the depthwise part adds relatively little cost. We therefore use 5×5 kernels in the model architecture.

The low overhead of depthwise convolution also lets us insert another such layer between the two pointwise convolutions, further accelerating the growth of the receptive field. This forms the double BlazeBlock, shown on the right of the figure above.
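Under the usual stride-1 approximation, each convolution layer adds k − 1 pixels to the receptive field, so two 5×5 layers cover the same receptive field as four 3×3 layers:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions:
    rf = 1 + sum of (k - 1) over all layers."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

four_3x3 = receptive_field([3, 3, 3, 3])  # 9
two_5x5 = receptive_field([5, 5])         # 9
```

This is the arithmetic behind trading many small kernels for fewer large ones when the large kernels are cheap (as in depthwise convolutions).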

Feature extractor:

In the experiments, we focus on the feature extractor for the front-camera model. It must handle a smaller range of object scales, so its computational requirements are lower. The extractor takes a 128×128 input.


Improved Anchor mechanism:

Object detection models in the SSD family rely on predefined fixed-size base bounding boxes, called priors in SSD terminology (or anchors in Faster R-CNN).

We stop downsampling at an 8×8 feature map and place all anchors at that single resolution.
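As a rough sketch of such a single-resolution anchor grid (the anchor count per cell and the 1:1 aspect ratio are illustrative assumptions, not figures from the paper):

```python
def make_anchors(grid_size=8, anchors_per_cell=6):
    """Generate anchor centers on a single feature-map resolution.
    Keeping every anchor at one resolution makes indexing uniform,
    which is friendlier to GPU execution than multi-scale gathers."""
    anchors = []
    for y in range(grid_size):
        for x in range(grid_size):
            cx = (x + 0.5) / grid_size  # normalized cell center
            cy = (y + 0.5) / grid_size
            for _ in range(anchors_per_cell):
                anchors.append((cx, cy))  # size comes from the regression head
    return anchors

anchors = make_anchors()  # 8 * 8 * 6 = 384 anchors
```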


Pipeline example. Red: Output of BlazeFace. Green: Task-specific model output.

Post-processing mechanism:

Because our feature extractor does not reduce the resolution below 8×8, the number of predictions overlapping a given object grows significantly with object size.

To minimize this problem, we replace the suppression algorithm with a blending strategy that estimates the regression parameters of a bounding box as a weighted average over the overlapping predictions. In practice, this adds virtually no cost over the original NMS algorithm. For our face detection task, this adjustment improves accuracy by 10%.

Second, designed for the GPU, with accuracy beyond MobileNetV2

Super-real-time performance. It unlocks applications that require the facial region as input:

  • Accurate 3D facial geometry
  • Puppeteering through Blendshapes
  • Facial segmentation
  • AR cosmetic try-on/beautification
  • Hair/lips/iris recoloration
  • Skin smoothing


Design for Mobile GPU

  • Design for Mobile GPU and CPU
  • Lightweight feature extraction network
  • Anchor scheme more suitable for GPU
  • Improved tie resolution strategy


Fast Inference on GPU




  • The mean absolute error, measured relative to the inter-ocular distance, is about 10%, which is sufficiently accurate.
  • Face alignment for subsequent models
  • Generates six facial keypoint coordinates
  • On low-end devices, this model alone can be used for simple AR effects (e.g., ears).
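A small sketch of the inter-ocular-distance-normalized error metric implied above (the keypoint ordering, with the two eyes first, is an assumption):

```python
import math

def mean_relative_error(pred, gt):
    """Mean keypoint error, normalized by the ground-truth inter-ocular
    distance (keypoints 0 and 1 are assumed to be the eye centers)."""
    iod = math.dist(gt[0], gt[1])
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors) / iod

gt = [(0, 0), (10, 0), (5, 3), (5, 6), (-2, 1), (12, 1)]
pred = [(x, y + 1) for x, y in gt]   # every keypoint off by 1 pixel
err = mean_relative_error(pred, gt)  # 1 / 10 = 10% of inter-ocular distance
```

Normalizing by inter-ocular distance makes the metric independent of how large the face appears in the frame.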

