Google opens up Cloud TPU in a big way: a full reading of Jeff Dean's ten tweets

via: 博客园     time: 2018/2/14 10:02:40     reads: 1306

Google recently released a beta version of Cloud TPU, currently available in the U.S. at $6.50 an hour. Article from: Xinzhiyuan (ID: AI_era). Editors: Wen Fei, Pei Qi, Zhang Qian.

Jeff Dean's ten tweets give a full view of Cloud TPU


1. Google has released a beta version of Cloud TPU for those who want high-speed accelerators to train machine learning models. For details, see the blog:



2. Accessed through Google Cloud VMs, these devices provide 180 teraflops of computing power via the TensorFlow programming model.


3. Many researchers and engineers are constrained by limited machine learning compute, and we think Cloud TPU will be an excellent solution. For example, one Cloud TPU can train the ResNet-50 model to 75% accuracy within 24 hours.


4. Users with early access seem happy. Alfred Spector, CTO at investment firm Two Sigma, said: "We found that moving TensorFlow workloads to TPUs dramatically reduced the complexity of programming new models and shortened training time."


5. Anantha Kancherla, director of software at ride-sharing company Lyft, said, "We have been amazed by the speed of Google Cloud TPU. Things that used to take a few days now finish in a matter of hours."


6. Model implementations such as ResNet, MobileNet, DenseNet and SqueezeNet (object classification), RetinaNet (object detection) and Transformer (language modeling and machine translation) can help users get started quickly:



7. Cloud TPU is initially offered in the U.S. at a rate of $6.50 an hour.


8. You can fill out a request for Cloud TPU quota.


9. New York Times reporter Cade Metz published a story today titled "Google Makes Its Special A.I. Chips Available to Others".


10. Although we have been using it internally for some time, making Cloud TPU available to external users is the result of work by many Google employees, including the Google Cloud, data center, and platform teams, Google Brain, the XLA team, and many others.

Google Cloud TPU beta opens: limited quantity, $6.50 per hour

Starting today, Cloud TPU is available in beta on Google Cloud Platform (GCP) to help machine learning specialists train and run models more quickly.


Cloud TPU is a family of hardware accelerators designed by Google and optimized to speed up and scale up specific machine learning workloads programmed with TensorFlow. Each Cloud TPU is built from four custom ASICs, and a single board delivers up to 180 teraflops of compute with 64 GB of high-bandwidth memory. These boards can be used alone or connected over an ultra-fast dedicated network to form a "TPU pod". Google will offer these larger supercomputers through Google Cloud later this year.
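To put the board-level figures in per-chip terms, here is a minimal sketch. It assumes compute and memory are split evenly across the four ASICs, which the article does not state; it only gives the board totals. The constant names are illustrative.

```python
# Board-level figures quoted in the article.
BOARD_TERAFLOPS = 180
BOARD_HBM_GB = 64
ASICS_PER_BOARD = 4

# Assumption: compute and memory are divided evenly across the four ASICs.
teraflops_per_asic = BOARD_TERAFLOPS / ASICS_PER_BOARD
hbm_gb_per_asic = BOARD_HBM_GB / ASICS_PER_BOARD

print(f"{teraflops_per_asic:.0f} teraflops and {hbm_gb_per_asic:.0f} GB of HBM per ASIC")
```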

Google designed Cloud TPU to deliver differentiated performance for TensorFlow workloads and to let machine learning engineers and researchers iterate faster. For example:

Instead of waiting for a job to be scheduled on a shared compute cluster, you get interactive, exclusive access to a network-attached Cloud TPU through a Google Compute Engine VM that you control and can customize.

Rather than waiting days or weeks to train a business-critical machine learning model, you can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate one to production the next day.

With a single Cloud TPU and the tutorial (https://cloud.google.com/tpu/docs/tutorials/resnet), you can train the ResNet-50 model to benchmark accuracy on ImageNet in under a day for less than $200.
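The cost claim can be checked with simple arithmetic. The sketch below assumes the $6.50 hourly beta rate quoted in this article and counts Cloud TPU time only; VM and storage charges are not modeled. The function and constant names are illustrative.

```python
HOURLY_RATE_USD = 6.50   # beta price per Cloud TPU per hour, as quoted in the article
BUDGET_USD = 200.0       # the "less than $200" figure from the article


def training_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Return the Cloud TPU charge for a run of the given length (TPU time only)."""
    return hours * rate


# A run that takes a full 24 hours on one Cloud TPU.
full_day_cost = training_cost(24)

# How many TPU-hours the $200 budget would cover at this rate.
max_hours_in_budget = BUDGET_USD / HOURLY_RATE_USD

print(f"24 hours on one Cloud TPU: ${full_day_cost:.2f}")
print(f"Hours covered by ${BUDGET_USD:.0f}: {max_hours_in_budget:.1f}")
```

Even a run that uses the whole day stays under the quoted budget at this rate, which is consistent with the article's figure.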

Simpler machine learning model training

Traditionally, programming custom ASICs and supercomputers has required very deep expertise. Now you can program Cloud TPU with high-level TensorFlow APIs. Google is also open-sourcing a set of high-performance Cloud TPU model implementations to get you started right away, including:

ResNet-50 and other image classification models

Transformer for machine translation and language modeling

RetinaNet for object detection

Google said in its blog post that it continuously tests these models for performance and for convergence to the expected accuracy on standard datasets.

Google will gradually release more model implementations over time. Machine learning specialists who want to explore can also optimize other TensorFlow models for Cloud TPU themselves, using the documentation and tools Google provides.

By starting with Cloud TPU now, users will be able to benefit from dramatic improvements in time to accuracy when TPU pods launch later this year.

As Google announced at NIPS 2017, training times for both ResNet-50 and Transformer drop from most of a day to under 30 minutes on a full TPU pod, with no code changes required.

Alfred Spector, CTO of investment management firm Two Sigma, commented on the performance and ease of use of Google Cloud TPU as follows.

"We decided to focus our deep learning research on the cloud for a number of reasons, but mainly to get access to the latest machine learning infrastructure. Google Cloud TPU is an example of innovative, rapidly evolving technology that supports deep learning. We found that moving TensorFlow workloads to TPUs has boosted our productivity by greatly reducing both the complexity of programming new models and the time required to train them. Using Cloud TPUs instead of clusters of other accelerators, we were able to focus on building our own models without being distracted by managing complex cluster communication patterns."

A scalable ML platform

Cloud TPU also simplifies planning and managing ML computing resources:

Provide your team with state-of-the-art ML acceleration and adjust capacity dynamically as needs change.

Eliminate the capital, time and expertise required to design, install, and maintain an on-site ML compute cluster with specialized power, cooling, networking, and storage requirements, and benefit instead from Google's years of experience with large-scale, tightly integrated ML infrastructure.

There are no drivers to install: Cloud TPU is fully pre-configured and enjoys the same sophisticated security and protection as all Google Cloud services.

Anantha Kancherla, director of software at ride-sharing company Lyft, said, "We've been impressed with the speed of Google Cloud TPUs: what used to take a few days can now take hours. Deep learning is becoming the backbone of the software that runs autonomous vehicles."

With Google Cloud, Google hopes to offer customers the cloud that best suits each machine learning workload. Alongside Cloud TPU, it will offer a variety of high-performance CPUs, including Intel Skylake, and GPUs, including the NVIDIA Tesla V100.

Currently, Cloud TPU is available in limited quantities and costs $6.50 per hour.

Cloud machine learning showdown: Google Cloud TPU may change the game

With the release of Cloud TPU, Google now offers more services for machine learning in the cloud. Amazon Machine Learning, Microsoft Azure Machine Learning, and Google Cloud AI are the three leading machine-learning-as-a-service (MLaaS) offerings, allowing rapid model training and deployment with little or no data science expertise.

The following compares the key features of the main machine-learning-as-a-service platforms from Amazon, Microsoft and Google:


Amazon's machine learning services come at two levels: Amazon ML for predictive analytics and the SageMaker tool for data scientists.

Amazon Machine Learning for predictive analytics is one of the most automated solutions on the market. It can load data from multiple sources, including Amazon RDS, Amazon Redshift, CSV files, and more. All data preprocessing is done automatically: the service identifies which fields are categorical and which are numeric, and it does not ask the user to choose methods for further data preprocessing (dimensionality reduction and whitening).
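To make the idea of automatic field typing concrete, here is a minimal sketch of one possible heuristic: a CSV column counts as numeric only if every value parses as a number. This is an assumption for illustration, not Amazon's actual logic, and `infer_field_types` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict


def _is_number(text) -> bool:
    """Return True if the value parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def infer_field_types(csv_text: str) -> Dict[str, str]:
    """Hypothetical heuristic: label each CSV column 'numeric' or 'categorical'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    field_types = {}
    for name in reader.fieldnames or []:
        # A column is numeric only if there is data and every value parses as a number.
        all_numeric = bool(rows) and all(_is_number(row[name]) for row in rows)
        field_types[name] = "numeric" if all_numeric else "categorical"
    return field_types


# Made-up sample data, for illustration only.
sample_csv = "age,plan,monthly_spend\n34,basic,19.99\n51,premium,49.50\n"
field_types = infer_field_types(sample_csv)
print(field_types)
```

A real service would need to handle harder cases, such as numeric codes that are really categories, which this sketch does not attempt.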

Amazon ML's prediction capabilities are limited to three options: binary classification, multiclass classification, and regression. That is, this Amazon ML service does not support any unsupervised learning methods, and the user must select a target variable and label it in the training set. Users also do not need to know any machine learning methods, because Amazon chooses them automatically after looking at the data provided.
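The three options map naturally onto the kind of target variable the user selects. The sketch below shows one possible way such a choice could be made from labeled target values. It is a hypothetical heuristic for illustration, not Amazon's implementation, and `infer_model_type` is an invented name.

```python
from typing import Sequence


def infer_model_type(target_values: Sequence[object]) -> str:
    """Hypothetical heuristic: pick a model type from the labeled target column."""
    distinct = set(target_values)
    if not distinct:
        raise ValueError("the target column must contain labeled values")
    if len(distinct) == 2:
        # Exactly two outcomes, e.g. yes/no or 0/1.
        return "binary"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        # Numeric target with more than two distinct values.
        return "regression"
    # Anything else is treated as a set of category labels.
    return "multiclass"


print(infer_model_type(["yes", "no", "yes"]))
print(infer_model_type([12.5, 99.0, 41.3]))
print(infer_model_type(["cat", "dog", "bird"]))
```

Note that this heuristic would treat integer class codes such as 0, 1, 2 as a regression target, so a real system needs an explicit schema or a user override.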

This level of automation is both a plus and a minus for Amazon ML. If you need a fully automated but limited solution, this service will meet your expectations. If not, there is SageMaker.

Amazon SageMaker and framework-based services:

SageMaker is a machine learning environment that simplifies the work of data scientists by providing tools for rapid model building and deployment. For example, it offers Jupyter, an authoring notebook, which simplifies data exploration and analysis without requiring server management. Amazon also provides built-in algorithms that are optimized for large datasets and for computation in distributed systems.

If you do not want to use these built-in algorithms, you can add your own methods and run models through SageMaker, taking advantage of its deployment features. Or you can integrate SageMaker with the TensorFlow and MXNet deep learning libraries.

In general, Amazon's machine learning services offer enough freedom for experienced data scientists and enough automation for those who just need to get the job done without digging into dataset preparation and modeling. For those who already use the Amazon environment and do not plan to move to another cloud provider, this is a good choice.

Microsoft Azure Machine Learning:

The goal of Azure Machine Learning is to provide a powerful playground for both beginners and experienced data scientists. Microsoft's lineup of ML products is similar to Amazon's, but for now Azure appears to be more flexible in terms of off-the-shelf algorithms.

Services provided by Azure fall into two broad categories: Azure Machine Learning and Bot Services.

ML Studio is the main MLaaS package. Almost everything in Azure ML Studio has to be done manually. This includes data exploration, preprocessing, choosing methods, and validating modeling results.

Approaching machine learning with Azure involves a learning curve. On the other hand, Azure ML provides a graphical interface that visualizes every step of the workflow. Perhaps the main benefit of using Azure is the variety of algorithms available. Studio supports about 100 methods that address classification (binary and multiclass), anomaly detection, regression, recommendation, and text analysis. It is worth mentioning that the platform also has a clustering algorithm (K-means).

The other part of Azure ML is the Cortana Intelligence Gallery. It is a collection of machine learning solutions contributed by the community for data scientists to explore and reuse. Azure's offering is a powerful tool for getting started with machine learning and for introducing its capabilities to new employees.

Google Prediction API

Google offers AI services on two levels: the machine learning engine for data scientists and the highly automated Google Prediction API. Unfortunately, the Google Prediction API has recently been deprecated, and Google will shut it down on April 30, 2018.

The Prediction API is similar to Amazon ML. Its minimalist approach narrows down to solving two major problems: classification (binary and multiclass) and regression. Trained models can be deployed through a REST API interface.

Google does not disclose which algorithms are used to make predictions, nor does it let engineers customize models. On the other hand, Google's environment is well suited to doing machine learning under tight deadlines and getting an early version of an ML program launched. But the product does not seem to have been as popular as Google expected, and users of the Prediction API will have to use other platforms to "re-create their existing models."

Google Cloud Machine Learning Engine

The Prediction API's high degree of automation comes at the expense of flexibility. Google ML Engine is the opposite. It caters to experienced data scientists and suggests using cloud infrastructure with TensorFlow as the machine learning driver. In principle, therefore, ML Engine is very similar to SageMaker.

TensorFlow is another Google product: an open source machine learning library that includes a variety of data science tools, rather than ML-as-a-service. It has no visual interface, and its learning curve is quite steep.

It seems that Azure currently has the most feature-rich toolset in the MLaaS market. It covers most ML-related tasks, provides a visual interface for building custom models, and offers a robust set of APIs for those who do not want to tackle data science with their bare hands. However, it still lacks Amazon's automation capabilities.

Comparing the machine learning APIs of Amazon, Microsoft and Google

In addition to full-fledged platforms, developers can also use high-level APIs. These are services backed by pre-trained models, and they require no machine learning expertise to use. At present, the APIs of these three vendors can be roughly divided into three categories:

1) Text recognition, translation and text analysis

2) Image + video recognition and correlation analysis

3) Others, including specific unclassified services


In addition to text and speech, Amazon, Microsoft and Google also offer general-purpose APIs for image and video analysis.


Although image analysis is closely related to video APIs, many video analysis tools are still in development or in beta. For example, Google offers rich support for a variety of image processing tasks, but entirely lacks the video analysis features already offered by Microsoft and Amazon.

