
Finally! Supervise.ly published image segmentation datasets (free open source)

Via: Cnblogs (博客园) · 2018/4/12 18:01:52

This article is a technical blog post compiled by the Lei Feng subtitle group; the original author is Supervise.ly.

Translation: Guo Naijiao, Wang Ning, Zhang Hu; collation: Fan Jiang, Wu Xuan

We are very proud to announce that the Supervisely image dataset has officially been released. It is open and free of charge, for academic purposes only.


To make AI accessible to everyone, we need not only open source code but also a powerful platform.

Of course we agree with this idea, and let us expand on it. There are many deep neural networks for semantic segmentation. However, in most cases collecting the data those algorithms run on is more difficult and expensive than developing and applying the algorithms themselves.

That is why we need a specially designed platform that covers the whole machine learning workflow, from building training datasets to training and deploying neural networks.


Several examples come from the "Supervisely portrait dataset"

We think our work will help developers, researchers, and businesses. It can be regarded not only as an open dataset, but also as a set of innovative methods and tools for building large training datasets faster.

Next, we'll show you how we built this dataset from scratch. First, some interesting facts:

  • The dataset consists of 5,711 images with 6,884 high-quality annotated person instances.

  • All the steps below were done inside Supervisely, without any coding.

  • More importantly, these steps were performed by our in-house annotators, who have no machine learning expertise. Data scientists only controlled and managed the process.

  • The annotation team consisted of two members, and the whole process took only 4 days.

Supervisely is an intelligent platform for machine learning and data science. It allows data scientists to focus on real innovation and leave the routine work to others. (Yes, training well-known neural network architectures is also routine work.)

Problems to be solved

In many real-world applications, person detection is a key task in analyzing images of humans; it is used in action recognition, autonomous driving, video surveillance, mobile applications, and so on.

We conducted an internal study at DeepSystems that made us aware of the lack of data for person detection tasks. You may ask: what about public datasets such as COCO, Pascal VOC, and Mapillary? To answer that question, it is better to show you a few examples.


Several examples of human annotations from COCO datasets

The quality of person detection data in most public datasets does not meet our requirements, so we had to create our own dataset with high-quality annotations. I will tell you how we did it.

Step 0: upload and prepare public datasets as a starting point for training the initial neural network

We uploaded public datasets to the system: Pascal VOC, Mapillary, and others.

We then executed DTL (Data Transformation Language) queries on the uploaded data.


Merging, cropping, and filtering the raw data from the public datasets

There seems to be a lot of publicly available data, but as we mentioned earlier, there are some hidden problems: low quality annotations, low resolution, and so on.

So we built the first training dataset.
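In plain Python, the merge-and-filter logic of step 0 could be sketched as follows. This is only an illustration: the real work is done with Supervisely's DTL queries, and the item fields and `min_side` threshold here are assumptions.

```python
def build_initial_dataset(datasets, min_side=500):
    """Merge several public datasets and drop images whose resolution is
    too low (one of the 'hidden problems' mentioned above). Each item is
    a dict with 'width', 'height', and 'annotation' keys -- a simplified
    stand-in for the real DTL merge/filter pipeline."""
    merged = [item for ds in datasets for item in ds]
    return [it for it in merged if min(it["width"], it["height"]) >= min_side]
```

The same idea (merge sources, then filter by quality criteria) is what the DTL query expresses declaratively inside the platform.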

Step 1: training the neural network

We made a few customizations to a UNet-like architecture.


Unet_v2 architecture

Loss = binary cross-entropy + (1 − Dice coefficient).
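As a rough illustration, a binary cross-entropy plus (1 − Dice) loss could be computed like this in NumPy. This is a sketch, not the actual training code; `eps` and the exact Dice formulation are assumptions.

```python
import numpy as np

def bce_dice_loss(pred, target, eps=1e-7):
    """Binary cross-entropy plus (1 - Dice coefficient).

    pred   -- predicted probabilities in (0, 1), any shape
    target -- binary ground-truth mask, same shape
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    intersection = np.sum(pred * target)
    dice = (2 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return bce + (1 - dice)
```

A perfect prediction drives both terms toward zero; the Dice term helps with the class imbalance between person pixels and background.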

The network trains fast, is quite accurate, and is easy to implement and customize, which allowed us to run many experiments. Supervisely can be distributed over several nodes of a cluster,

so we could train several neural networks at the same time. All neural networks on our platform support multi-GPU training. Each training experiment used an input resolution of 256 × 256 and took no more than 15 minutes.


Step 2: prepare the data for annotations

We had not collected unlabeled images ourselves, so we decided to download them from the Internet. We have published this project on GitHub so that you can download data from Pexels, an excellent photo library (thanks to them for this really cool work).

So we downloaded about 15K images with tags related to our task, uploaded them to Supervisely, and ran a DTL resize query, because the originals had very high resolution.
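Offline, the resize step could look roughly like this, assuming Pillow and an arbitrary 1024-pixel target for the longer side (the post does not state the exact target resolution):

```python
from PIL import Image

MAX_SIDE = 1024  # assumed target; the post only says the images were very high resolution

def resize_inplace(path, max_side=MAX_SIDE):
    """Downscale an image so its longer side is at most `max_side`,
    preserving aspect ratio -- roughly what the DTL resize query does."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # in-place, keeps aspect ratio
    img.save(path)
    return img.size
```

`Image.thumbnail` never upscales, so images already below the target are left untouched.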

Step 3: apply the neural network to unlabeled images

The architecture above does not support instance segmentation, and we chose not to use Mask R-CNN, because its segmentation quality near object edges is quite low.

That is why we decided on a two-step scheme: apply Faster R-CNN (NASNet) to detect all the people in the image, then segment the dominant object for each detected person with the segmentation network. This way we both simulate instance segmentation and accurately segment object edges.
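The two-step scheme can be sketched as follows; `detect_people` and `segment_person` are hypothetical stand-ins for the real Faster R-CNN (NASNet) detector and the UNet-like segmentation network:

```python
import numpy as np

def detect_people(image):
    """Return person bounding boxes as (x0, y0, x1, y1). Stub for illustration:
    pretends it found two people, one in each half of the image."""
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h), (w // 2, 0, w, h)]

def segment_person(crop):
    """Return a binary mask of the dominant person in the crop. Stub."""
    return np.ones(crop.shape[:2], dtype=np.uint8)

def two_stage_instances(image):
    """Detect every person, then segment each crop separately, producing
    one full-size mask per instance (simulated instance segmentation)."""
    instances = []
    for (x0, y0, x1, y1) in detect_people(image):
        crop = image[y0:y1, x0:x1]
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        mask[y0:y1, x0:x1] = segment_person(crop)  # paste crop mask back
        instances.append(mask)
    return instances
```

Because each person is segmented inside its own detected box, the binary network only ever sees one dominant object at a time, which keeps the edges sharp.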


A 3-minute video of applying the model and manually correcting detections:


We tried different resolutions: the higher the resolution we feed to the NN, the better the results it produces. We do not care much about total inference time, because Supervisely supports inference distributed across multiple machines. This is good enough for an automatic pre-labeling task.

Step 4: manual verification and error correction

All inference results are displayed in real time in the dashboard. Our operators preview all results and mark each image with one of several labels: bad prediction, prediction requiring correction, or good prediction. This process is fast because it requires only a few keyboard shortcuts.


How we mark images: left - bad prediction, middle - prediction requiring mild manual correction, right - good prediction.

Marked as


Video of how neural network predictions are corrected:


Manual correction takes much less time than annotating from scratch.

Step 5: add the result to the training dataset and go to the first step

Done!
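Putting steps 1 through 5 together, the whole human-in-the-loop cycle can be sketched as a simple loop. All callables here are placeholders for the platform's real components; only the overall control flow reflects the post.

```python
def iterative_annotation(unlabeled, train_set, train_fn, predict_fn, review_fn, rounds=6):
    """The human-in-the-loop cycle from steps 1-5: train, pre-label,
    manually verify/correct, fold the results back into the training set.
    `rounds=6` matches the number of iterations mentioned in the post."""
    model = train_fn(train_set)                              # step 1: train the NN
    for _ in range(rounds):
        predictions = [predict_fn(model, x) for x in unlabeled]  # step 3: pre-label
        corrected = [review_fn(p) for p in predictions]          # step 4: verify/fix
        train_set = train_set + corrected                    # step 5: grow the dataset
        model = train_fn(train_set)                          # retrain and repeat
    return model, train_set
```

Each round the training set grows and the model improves, so the share of predictions needing manual correction shrinks.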

Some hints:

  1. When we applied an NN trained only on public data, the share of good predictions was quite small.

  2. After three fast iterations, that share increased to 70%. We completed 6 iterations in total, and the final NN became quite accurate. :-)

  3. Before training, we added thin bands along object edges to smooth jagged boundaries, and performed various augmentations: flips, random crops, random-angle rotations, and color transforms. As you can see, this method applies to many computer vision tasks, even when you need to annotate several object classes in one image.
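The geometric and color augmentations mentioned in hint 3 could be sketched like this with NumPy. Rotation is simplified to multiples of 90 degrees, and the crop size and brightness range are assumptions; the point is that image and mask must receive identical geometric transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Apply flip, rotation, random crop, and a simple color (brightness)
    shift. `image` is HxWxC, `mask` is HxW; geometry is kept in sync."""
    if rng.random() < 0.5:                             # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = rng.integers(0, 4)                             # rotation by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    h, w = mask.shape
    ch, cw = h * 3 // 4, w * 3 // 4                    # random crop to 3/4 size
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    image, mask = image[y:y+ch, x:x+cw], mask[y:y+ch, x:x+cw]
    image = np.clip(image * rng.uniform(0.8, 1.2), 0, 255)  # brightness jitter
    return image, mask
```

Note that the brightness jitter is applied to the image only: the mask stays binary.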


This dataset helped us improve our AI-powered annotation tool, customizing it to detect people. In the latest release we added the ability to train NNs inside the system. Below is a comparison between the generic tool and its customized version. It is available now, and you can try it on your own data.


Video address:https://v.qq.com/iframe/player.html?vid=y06262nmylm

How to access the dataset

Register at Supervisely, enter


How to download the result


It is very interesting to see how people without any ML background performed all these steps. As deep learning experts, we saved a lot of time, and our annotation team became more efficient in both speed and quality.

We hope that the Supervisely platform will help each deep learning team make AI products faster and more easily.

Let me list the most valuable Supervisely features we used in this work:





