What exactly is reinforcement learning and how does it work?
Time:2021-05-27
Reinforcement learning is a behavioral learning model in which the algorithm provides feedback from data analysis and guides the user, step by step, toward the best result.
Unlike the various kinds of supervised learning, which train a model on sample datasets, reinforcement learning picks up its tricks through trial and error. With each correct decision in a sequence, the model is gradually reinforced and progressively masters better ways to solve the problem.
Reinforcement learning closely resembles the way humans learn as infants and toddlers. None of us grows up without this kind of reinforcement: it is with our parents' help, and by falling down again and again, that we finally learn to stand.
This is learning based on experience: the machine keeps trying and making mistakes until it finally finds the right solution.
We only need to give the model the most basic "rules of the game" and leave the rest to its own exploration. The model starts with random attempts, builds up increasingly complex tactics step by step, and, over countless attempts, learns to complete the task and collect rewards.
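To make this trial-and-error loop concrete, here is a minimal tabular Q-learning sketch. The "game" is a hypothetical five-cell corridor in which the only reward is +1 for reaching the last cell; the environment, the reward, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions, not taken from the article. The agent starts out acting at random and gradually settles on a strategy.

```python
import random

# Hypothetical toy "game": a corridor of 5 cells. The agent starts in cell 0
# and receives a reward of +1 only when it reaches cell 4, the goal.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """The 'rules of the game': return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False

def choose_action(q, state, epsilon, rng):
    # Explore with probability epsilon; otherwise exploit, breaking ties at random.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: estimated future reward of each (state, action) pair, initially 0.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """The learned strategy: the highest-valued action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

q_table = train()
print(greedy_policy(q_table))  # [1, 1, 1, 1]: always step right
```

Q-learning is only one of many reinforcement learning algorithms; it is used here because a lookup table keeps the idea of "reward shapes the strategy" easy to see.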
In practice, reinforcement learning has become one of the key methods for developing the "imagination" of robots. Unlike a human, an AI can accumulate knowledge from thousands of rounds of play, and powerful computing infrastructure provides the reliable compute such models need.
Video recommendations on YouTube are one application of reinforcement learning. After you watch a video, the platform shows you similar content it thinks may interest you. If you click a recommended video but don't finish watching it, the system treats the recommendation as a failure and tries a different recommendation approach next time.
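The recommendation loop described here can be approximated as a multi-armed bandit, a simplified reinforcement learning setting with no state transitions. The sketch below is an epsilon-greedy recommender under clearly hypothetical assumptions: three made-up videos, a simulated user with fixed probabilities of finishing each one, and a reward of 1 for a finished video and 0 for an abandoned one. It illustrates the feedback idea only and does not describe how YouTube's actual system works.

```python
import random

class EpsilonGreedyRecommender:
    """Hypothetical recommender: each candidate video is a bandit arm."""

    def __init__(self, videos, epsilon=0.1, seed=0):
        self.videos = list(videos)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {v: 0 for v in self.videos}    # times each video was recommended
        self.value = {v: 0.0 for v in self.videos}  # running mean reward per video

    def recommend(self):
        # Occasionally explore a random video; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.videos)
        return max(self.videos, key=lambda v: self.value[v])

    def feedback(self, video, finished):
        # Reward is 1 if the user finished the video, 0 if they abandoned it.
        reward = 1.0 if finished else 0.0
        self.shows[video] += 1
        self.value[video] += (reward - self.value[video]) / self.shows[video]

# Simulated user: assumed probability of finishing each (made-up) video.
finish_prob = {"cooking": 0.2, "chess": 0.7, "travel": 0.4}
rec = EpsilonGreedyRecommender(finish_prob, epsilon=0.1, seed=42)
user = random.Random(7)
for _ in range(2000):
    video = rec.recommend()
    rec.feedback(video, finished=user.random() < finish_prob[video])

print(max(rec.value, key=rec.value.get))  # the video the recommender now favors
```

An abandoned video lowers that video's running average, so the recommender shifts toward other choices, which mirrors the "try other recommendation methods next time" behavior in a very reduced form.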
The challenges of reinforcement learning
The core challenge for reinforcement learning is scaling up the simulation environment, which is largely determined by the task to be performed. For chess, Go, or Atari games, the simulation environment is relatively simple and easy to build. Training a safe, reliable self-driving car the same way, however, requires a highly realistic street environment that includes pedestrians stepping out suddenly and other factors that could cause a collision. If the simulation is not realistic enough, a series of problems will surface once the model moves from the training environment to the real world.
Another problem is how to extend and modify the agent's neural network. Apart from rewards and penalties, we have no other way to communicate with the network. This can lead to severe "forgetting" (often called catastrophic forgetting): after the network learns new information, it may erase old knowledge that could be very important. In other words, we need a way to manage the learning model's "memory".
Finally, we have to keep agents from "cheating". Sometimes a model achieves good results in a way that is far from what we intended, and some agents even collect the most reward by exploiting loopholes without completing the actual task.
Application areas of reinforcement learning
Games
Reinforcement learning owes much of its reputation to its remarkable strength at solving all kinds of games.
The best-known examples are AlphaGo and AlphaGo Zero. AlphaGo was trained extensively on the game records of a huge number of human players and reached superhuman strength by combining Monte Carlo tree search (MCTS) with a policy network and a value network. The researchers then tried a purer reinforcement learning approach: training the model from scratch. The result was a new agent, AlphaGo Zero, which learned entirely through its own exploration, without any human game data, and ultimately defeated its predecessor AlphaGo 100 to 0.
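Monte Carlo tree search is the search half of AlphaGo's recipe. Below is a minimal sketch of plain UCT (MCTS with UCB1 selection and random playouts) on a hypothetical toy game of Nim: players alternately take 1 to 3 stones, and whoever takes the last stone wins. Random playouts stand in for the policy and value networks that AlphaGo actually used, so this shows the four-step search loop only, not AlphaGo itself; the exploration constant `c=1.4` and the iteration count are illustrative assumptions.

```python
import math
import random

MOVES = (1, 2, 3)  # toy Nim: take 1, 2, or 3 stones; taking the last stone wins

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones  # stones left for the player to move at this node
        self.parent = parent
        self.move = move      # the move that led from parent to this node
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0       # wins for the player who made `move`

    def best_ucb_child(self, c=1.4):
        # UCB1: average win rate plus an exploration bonus for rarely visited children.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )

def random_playout(stones, rng):
    """Play random moves to the end; True if the player to move now wins."""
    turn = 0  # 0 = the player to move at the start of the playout
    while stones > 0:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        turn ^= 1
    return turn == 1  # turn flips after each move, so 1 means player 0 moved last

def mcts(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 down through fully expanded nodes.
        while not node.untried and node.children:
            node = node.best_ucb_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout; convert to the mover's perspective.
        mover_won = not random_playout(node.stones, rng)
        # 4. Backpropagation: flip perspective at each level on the way up.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # Play the most-visited move, a common robust final choice.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))  # 1: leaves the opponent 4 stones, a losing position
```

In AlphaGo Zero the random playout in step 3 is replaced by a value-network evaluation, and the policy network biases step 1 toward promising moves.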
Personalized recommendation
News recommendation is a long-standing problem. Rapidly changing news, user preferences that can shift at any time, and a click-through rate that is closely tied to user retention all give researchers headaches. The paper "DRN: A Deep Reinforcement Learning Framework for News Recommendation" by Guanjie Zheng and colleagues explores how to apply reinforcement learning to news recommender systems to tackle this challenge.
To this end, they built four categories of features: 1) user features; 2) context features (part of the environment state); 3) user-news features; 4) news features (part of the action). They feed these four feature groups into a deep Q-network (DQN) to compute Q-values. They then select a list of news items to recommend based on the Q-values, and use the user's clicks on the recommended content as an important reward signal for the reinforcement learning agent.
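The ranking step can be sketched with a much simpler Q-function. In the hypothetical example below, a linear function over the concatenated feature groups stands in for DRN's deep Q-network, each group is a single made-up number, and a click gives reward 1. The names `q_value`, `recommend`, and `q_learning_update`, all feature values, and the learning-rate and discount settings are assumptions for illustration, not details from the paper.

```python
# Hypothetical linear stand-in for DRN's deep Q-network:
# Q = w . [user, context, user_news, news]
def q_value(weights, user, context, user_news, news):
    features = user + context + user_news + news  # concatenate the four feature groups
    return sum(w * x for w, x in zip(weights, features)), features

def recommend(weights, user, context, candidates, k):
    """Rank candidate news items by Q-value and return the ids of the top k."""
    scored = []
    for news_id, (user_news, news) in candidates.items():
        q, _ = q_value(weights, user, context, user_news, news)
        scored.append((q, news_id))
    scored.sort(reverse=True)
    return [news_id for _, news_id in scored[:k]]

def q_learning_update(weights, features, reward, next_max_q, alpha=0.1, gamma=0.9):
    """One Q-learning step for a linear Q-function: w <- w + alpha * TD error * x."""
    q = sum(w * x for w, x in zip(weights, features))
    td_error = reward + gamma * next_max_q - q
    return [w + alpha * td_error * x for w, x in zip(weights, features)]

# Toy, made-up features: one number per group for readability.
user, context = [1.0], [0.5]
candidates = {
    "sports": ([1.0], [0.2]),  # (user_news features, news features)
    "finance": ([0.0], [0.9]),
    "tech": ([0.5], [0.4]),
}
weights = [0.0, 0.0, 1.0, 0.5]

print(recommend(weights, user, context, candidates, k=2))  # ['sports', 'tech']

# Suppose the user clicks "tech" (reward 1). next_max_q=0.0 treats this as
# the end of the session, a simplifying assumption.
old_q, feats = q_value(weights, user, context, *candidates["tech"])
weights = q_learning_update(weights, feats, reward=1.0, next_max_q=0.0)
new_q, _ = q_value(weights, user, context, *candidates["tech"])
```

After the click, the Q-value of the clicked item rises, so similar items rank higher in later recommendations. The real DRN model also folds user activeness into the reward, which this sketch leaves out.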
The authors also use other techniques to address related problems, including memory replay (experience replay), survival models, and Dueling Bandit Gradient Descent.
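Memory replay, more commonly called experience replay, stores past interactions and trains the network on random mini-batches drawn from them, so older experience keeps being revisited alongside the newest clicks. This is also one common way to soften the forgetting problem mentioned earlier. The sketch below is a generic buffer with assumed names (`ReplayBuffer`, `push`, `sample`) and an arbitrary capacity; it is not taken from the DRN paper, whose exact replay scheme may differ.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience-replay memory (a generic sketch, not DRN's implementation)."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch: breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(capacity=5)
for t in range(10):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

print(len(buffer))  # 5: only the five most recent transitions are kept
batch = buffer.sample(3)
```

A bounded `deque` keeps memory use fixed; production systems often add prioritized sampling, which this sketch omits.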