Google Brain Engineer on the Focus of 2018's Top Academic Conferences: Adversarial Learning + Reinforcement Learning

via: 博客园     time: 2018/6/14 12:02:09     reads: 281

Compiled by Xinzhiyuan

Author: Alex Irpan

Translation: Xiao Qin

The author of this article, Alex Irpan, is a software engineer on the Google Brain Robotics team. He attended two academic conferences in less than a month: ICLR 2018 and ICRA 2018. The former is a deep learning conference and the latter is a robotics conference. In this article he compares the two.


ICLR 2018

From a research perspective, one of the major focuses of ICLR this year was adversarial learning.

The most popular topic in deep learning is generative adversarial networks (GANs). However, I am casting a wider net here and also including adversarial examples and environments with competing agents. In fact, any form of minimax optimization can be counted as adversarial learning.
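To make "minimax optimization" concrete, here is a minimal sketch that is not taken from the talk or the papers. It assumes a toy objective f(x, y) = 0.5x² + xy − 0.5y², chosen only because its saddle point at (0, 0) is easy to verify; the function name `grad_descent_ascent` and the step size are likewise illustrative assumptions. One player minimizes over x while the other maximizes over y.

```python
def grad_descent_ascent(x, y, lr=0.1, steps=200):
    """Simultaneous gradient descent on x and gradient ascent on y.

    Toy objective (an assumption for illustration only):
        f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2
    It is convex in x and concave in y, with a saddle point at (0, 0).
    """
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        # The minimizing player steps against its gradient,
        # the maximizing player steps along its gradient.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


final_x, final_y = grad_descent_ascent(1.0, -1.0)
print(final_x, final_y)  # both values end up very close to 0
```

On this well-behaved toy problem the two players spiral into the saddle point. Real adversarial setups such as GANs have no such guarantee, which is part of why they are harder to train.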

I don't know whether GANs were really that popular or whether my memory is selectively biased because I have a soft spot for these methods. They feel powerful. One way to view a GAN is that you learn a generator using a learned, implicit cost rather than a human-defined one. This lets the cost adapt to the capabilities of the generator, and lets you define costs that would be cumbersome to specify by hand.
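For reference, the "learned cost" view can be read off the standard GAN objective (from Goodfellow et al., 2014; it is not quoted in this article). In this notation, G is the generator, D is the discriminator, p_data is the data distribution, and p_z is the noise prior:

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The inner maximization over D acts as the learned cost: instead of a hand-written loss, the generator is scored by a discriminator that is trained alongside it.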

Of course, this makes your problem more complicated. But if you have strong enough optimization and modeling capabilities, the implicitly learned cost gives you sharper images than other methods. One benefit of replacing part of a system with learned components is that advances in optimization and modeling then apply to more aspects of the problem. You are improving both your ability to learn cost functions and your ability to minimize those learned costs.

From an abstract point of view, this touches on the power of having expressive, optimizable families of functions, such as neural networks. Minimax optimization is not a new idea. It has been around for a long time. What is new is that deep learning lets you model and learn complex cost functions on high-dimensional data. For me, what is interesting about GANs is not image generation, but that they are a proof of concept on complex data such as images. Nothing about the framework requires image data.

There are other parts of the learning process that could be replaced by learned methods rather than human-defined ones, and deep learning may be the way to do it. Does it make sense to do so? Maybe. The problem is that the more parts you replace with learned components, the harder it becomes to get everything to learn.

Recently there was an article in Quanta Magazine in which Judea Pearl expressed his disappointment that deep learning only learns correlations and does curve fitting, which does not cover all of intelligence. I agree with Judea Pearl's point, but as an advocate of deep learning, I think that if you train a large enough neural network with a good enough optimizer, you may learn something that looks a lot like causal reasoning, or whatever else counts as intelligence. But this is getting close to philosophy, so I will stop here.

From an attendee's perspective, I liked that this conference had a lot of poster sessions. This was my first ICLR. The ML conference I had attended before was NIPS, and NIPS felt enormous. Reading every poster carefully at NIPS does not feel feasible. At ICLR it is possible to read all the posters, although you may not actually want to.

I also appreciated that corporate recruiting at ICLR was not as ridiculous as at NIPS. At NIPS, some companies were handing out fidget spinners and slinkies... The weirdest thing I got at ICLR was a pair of socks, which was odd, but not excessively so.

Papers I noted to follow up on after the conference:

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning

Policy Optimization by Genetic Distillation

Measuring the Intrinsic Dimension of Objective Landscapes

Eigenoption Discovery Through the Deep Successor Representation

Self-Ensembling for Visual Domain Adaptation

TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

Online Learning Rate Adaptation with Hypergradient Descent

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Learning to Multi-Task by Active Sampling

ICRA 2018

ICRA 2018 was my first robotics conference, and I did not know what to expect. I started out doing ML research and moved into robotics later, so my interests are closer to learning for control than to building new robots. My ideal setting is one where I can treat real-world hardware as an abstraction.

That, combined with my poor understanding of control theory, meant I was unfamiliar with many topics at the conference. Even so, there were plenty of learning papers, and I am very glad I attended.

Among the research I did understand, I was surprised by how many reinforcement learning papers there were. It was somewhat amusing to see that almost none of them used purely model-free RL. At ICRA, your paper is much more likely to be accepted if your method has been run on a real-world robot. This forces authors to focus on data efficiency, and so there is a strong bias against doing only model-free RL. As I walked around listening to talks, I kept hearing "We combine model-free learning with X", where X was model-based RL, learning from human demonstrations, learning from motion planning, or anything else that helps with the exploration problem.

From a broader perspective, the conference felt practical. It was a research conference, and much of the content was still speculative, but it also felt like people were fine with narrow, targeted solutions. I think this is another consequence of having to use real hardware. If you need to run the model in real time, you cannot ignore inference time. If you need to collect data from real robots, you cannot ignore data efficiency. Real hardware does not care about your problems.

(1) It Has To Work.

(2) No matter how hard you push and no matter what the priority, you can't increase the speed of light.

(RFC 1925)

This surprises many of the ML researchers I talk to, but robotics has not fully embraced ML the way people at NIPS/ICLR/ICML have, partly because ML does not always work. Machine learning is a solution, but it is not guaranteed to make sense. My impression was that only a few people at ICRA actively wanted ML to fail. Everyone else is happy to use ML once it proves itself, and in some areas it already has. Every perception paper I saw used CNNs in one way or another. However, far fewer people were using deep learning for control, because that is where things are more uncertain.

Like ICLR, ICRA had many companies holding recruiting events and running booths. Unlike ICLR, the booths here were more interesting. Most companies brought robots to demonstrate, which is of course more fun than listening to a recruiting pitch.

At last year's NIPS, I noticed that the ML company booths reminded me of Berkeley career fairs. Every technology company wants to recruit Berkeley's recent graduates. It is like an arms race to see who can hand out the best swag and the best free food. It felt like their goal was to look as cool as possible, without telling you what they actually wanted to hire you to do. Robotics has not gone that far yet. It is growing, but without as much hype.

I attended several workshops where people talked about how they were using robots in the real world. It was fun. Research conferences tend to focus on research and networking, which makes it easy to forget that research can have clear, direct economic value. One agricultural robotics talk was about using computer vision to detect weeds and spray herbicide only on them, which sounds good: you use less herbicide, kill fewer crops, and slow the development of herbicide resistance.

Rodney Brooks gave a similarly great talk. Using the Roomba as an example, he talked about what it takes to turn robotics technology into a consumer product. He said that when designing the Roomba, they started with a price and then fit all the features into that price. It turns out that a price of a few hundred dollars leaves very little margin in the choice of sensors and hardware, which limits how much inference you can run on the device.

In terms of organization, the conference was run very well. The conference center was close to a print shop, so at registration the organizers said that if you e-mailed a PDF file by a certain deadline, they would handle the rest of the process. All you had to do was pay for your poster online and pick it up at the conference. All demos took place in demo rooms, each equipped with a whiteboard and a shelf where you could place a laptop to play video.
