A few years ago, BBC Earth highlighted a memory test performed by a chimpanzee named Ayumu. The chimp displayed traits of machine learning (ML): shown the numbers 1 to 9 in randomized positions on a screen, it correctly recalled the sequence after the numbers disappeared almost 90% of the time. The test is significant because the human subject identified the correct order only once. So does this mean that we’ll never be able to compete with AI? In raw speed and recall, yes. However, we don’t need to compete with AI. Something to remember about this scenario is that the chimp first had to take time to learn how the sequence from 1 to 9 works. If the numbers were exchanged for the letters A to Z, the chimp would need more time and training, whereas human subjects could adapt instantly.
If you watch the video, you’ll notice that the chimp gets a reward every time it chooses the right sequence on the machine. That is, in essence, reinforcement learning (RL): a computational technique of ML in which the machine is rewarded for picking the correct action among the options available to it. Examples include Google DeepMind’s agents learning Atari video games and AlphaGo beating the world’s best human Go player, both of which I have covered in my vlogs.
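To make the reward idea concrete, here is a minimal sketch in Python (not DeepMind’s code, just an illustration with made-up numbers): an epsilon-greedy agent facing a hypothetical three-armed slot machine learns, purely from rewards, which arm pays out most often.

```python
import random

# Hypothetical 3-armed bandit: each arm pays a reward of 1 with a
# different probability that is unknown to the agent.
TRUE_PROBS = [0.2, 0.5, 0.8]

def pull(arm):
    """Environment: return reward 1 with the arm's payout probability, else 0."""
    return 1 if random.random() < TRUE_PROBS[arm] else 0

def train(steps=5000, epsilon=0.1):
    values = [0.0] * len(TRUE_PROBS)   # estimated value of each arm
    counts = [0] * len(TRUE_PROBS)
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-known arm.
        if random.random() < epsilon:
            arm = random.randrange(len(TRUE_PROBS))
        else:
            arm = max(range(len(TRUE_PROBS)), key=lambda a: values[a])
        reward = pull(arm)
        counts[arm] += 1
        # Incremental average of the rewards observed for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

random.seed(0)
print(train())  # the estimate for arm 2 should end up close to 0.8
```

The agent is never told which arm is best; the estimate for the highest-paying arm simply rises as rewards accumulate, which is the same feedback loop the chimp experienced.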
Reinforcement Learning Allows AI to Play Games
Recent advances in ML allow AI agents to work directly from sensory inputs such as vision and speech, which opens up a large range of applications for RL. Think about Atari: it has many games, such as Breakout, Space Invaders, and Seaquest, and each has its own unique way of being played. DeepMind’s deep RL agent learned these games by playing them by itself, over and over. For example, Space Invaders is a fixed shooter in which the player controls a laser cannon, moving it horizontally across the bottom of the screen and firing at aliens that shuffle sideways and descend toward the player. When DeepMind’s agent was set loose on this game, it was pointed to just two signals on the screen: the area showing the score and the area showing that the game was lost. From there, the deep RL agent played the game, reinforcing whatever made the score go up. In a short time it mastered the game and developed creative techniques of its own.
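The “reinforce whatever makes the score go up” loop can be sketched with tabular Q-learning, a simple ancestor of DeepMind’s deep RL. Everything below is a toy assumption, not DeepMind’s setup: a made-up 6-cell corridor stands in for the game screen, and reaching the last cell stands in for scoring.

```python
import random

# Toy stand-in for a game: a 1-D corridor of 6 cells. The agent starts
# at cell 0 and earns +1 (the "score going up") only on reaching cell 5;
# every other step pays 0.
N_STATES, GOAL = 6, 5
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(q_s, epsilon):
    # Explore with probability epsilon; break ties randomly.
    if random.random() < epsilon or q_s[0] == q_s[1]:
        return random.randrange(2)
    return 0 if q_s[0] > q_s[1] else 1

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose(q[state], epsilon)
            nxt, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward + discounted best next value.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

random.seed(0)
q = q_learning()
# after training, "move right" should outrank "move left" in every
# cell before the goal
```

DeepMind’s version replaces the tiny Q table with a deep neural network reading raw screen pixels, but the update rule is the same in spirit: actions that eventually lead to a higher score get their estimated values pushed up.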
Similar to the Atari agent, AlphaGo adapted its moves by playing many games and being rewarded whenever it made the right move. In fact, when AlphaGo Zero was released, it learned Go entirely by playing against itself, with no human game data, in a matter of 3 days. After those 3 days of training, AlphaGo Zero beat its predecessor, AlphaGo, in 100 straight games, using creative moves that its human counterparts had never thought of. (Its successor, AlphaZero, went on to master chess and shogi the same way.)
RL Agents and Environments
RL has an agent and an environment. The agent is what you build in your codebase, and the environment is the setting in which you run RL. In addition, RL has two kinds of environments and learning: episodic learning and continuous learning. Episodic learning is where the computation has a start and an end, whereas a continuous task has no end and runs until it is force-stopped. In episodic learning, the reward is assessed and analyzed at the end of every episode and the agent is improved before the next time the episode is run. In continuous learning, however, the reward is assessed on the fly and improvement of the RL model is performed continuously.
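The agent/environment split and an episodic loop can be sketched as follows. The classes here are made-up toys for illustration, not a real RL library API.

```python
import random

class Environment:
    """Toy episodic environment: ends after 10 steps; the agent earns
    a reward whenever its guess matches a hidden coin flip."""
    def reset(self):
        self.t = 0
        return self.t                      # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= 10                # the episode boundary
        return self.t, reward, done

class Agent:
    """Trivial agent: guesses at random."""
    def act(self, observation):
        return random.randint(0, 1)

def run_episodes(n):
    env, agent = Environment(), Agent()
    returns = []
    for _ in range(n):                     # episodic: a clear start and end
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
        returns.append(total)              # assess the reward per episode
    return returns

print(run_episodes(3))
```

A continuous task would replace the outer episode loop with a single `while True` loop and fold the improvement step into the loop itself, assessing reward as it goes rather than at episode boundaries.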
Reinforcement Learning Complications
In order to identify the right action to take, RL needs a large volume of experience: the agent must interact with its environment many times over before it learns a good policy. In addition, the reward signal must be designed carefully to avoid introducing biases. The difference from an approach such as supervised learning (SL) is that the performance of an SL model can be tracked during training, evaluation, and validation using labelled datasets. However, given the trial-and-error nature of RL, it is tough to track progress.
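One common way to cope, sketched below with made-up illustrative data, is to track a moving average of episode returns: the smoothed reward curve plays roughly the role that a validation score plays in SL.

```python
from collections import deque
import random

def moving_average_returns(episode_returns, window=100):
    """Running mean of the last `window` episode returns -- a common
    way to check whether an RL agent is actually improving."""
    recent = deque(maxlen=window)
    averages = []
    for r in episode_returns:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages

# Illustrative data: noisy returns that trend upward as training proceeds.
random.seed(0)
returns = [i / 100 + random.random() for i in range(300)]
curve = moving_average_returns(returns)
# the smoothed curve should end higher than it starts
```

If the smoothed curve plateaus or collapses, that is often the first visible sign that the reward design or the exploration schedule needs attention.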