Genetic Reinforcement

A quick overview of my Minor Thesis project for my Master of Information Technology.

EDUCATION

Jarrod Muddyman

2/7/20244 min read

Machine learning has revolutionized various industries by enabling computers to learn and make decisions without explicit programming. One area of machine learning that has gained significant attention is reinforcement learning, which involves training an agent to make sequential decisions through interactions with an environment.

My minor thesis project focused on investigating the effectiveness of utilizing reinforcement learning algorithms within a genetic algorithm. The goal was to explore if any synergy between these two approaches exists and determine if the combination could lead to improved performance in training times.

Understanding Genetic Algorithms

Genetic algorithms are a class of optimization algorithms inspired by the process of natural selection. They mimic the principles of evolution, such as selection, crossover, and mutation, to iteratively search for optimal solutions to a given problem. Genetic algorithms work by maintaining a population of candidate solutions, evaluating their fitness based on a predefined objective function, and iteratively evolving the population over generations.

Integrating Reinforcement Learning Algorithms

Reinforcement learning algorithms, on the other hand, involve an agent learning to maximize a reward signal by taking actions in an environment. These algorithms typically employ a trial-and-error approach, where the agent explores different actions and learns from the feedback received through rewards or penalties.

In my research, I explored the integration of reinforcement learning algorithms, specifically by using a Deep Q-Network, within the population of a genetic algorithm. The idea was to leverage the exploration and exploitation capabilities of the reinforcement learning algorithm as an agent in the population to enhance the overall population of the genetic algorithms.

Ideally the genetic algorithms population would be able to help explore areas of further improvement without the cost associated with running a Deep-Q Network.

Experiments

Mixed Agent Population
The structure of this experiment is designed to allow a DQN Agent to gather experience from a large population of GA agents, which is an agent with the same Neural Network Structure but without the DQN algorithm.

This experience sharing is done through the shared memory buffer which has replaced the internal memory buffer of the DQN agents. The crossover phase of the GA is to ensure that any helpful innovations found by individual agents will spread throughout the population, by spreading the weights and bias of high fitness individuals, this includes the DQN Agent.

Concurrent DQN and GA
The structure of this setup is fairly straightforward, with the fact that the population of the GA in this experiment is solely comprised of DQN agents. These agents go through periods of training and are then ranked based on the cumulative rewards they achieved in the training epoch, this ranking is used to select the best ‘parents’ for the GA to select, and then perform crossover and mutation on.

During the initial approach of this experiment, each individual DQN agent had its own internal memory buffer, necessitating a large amount of memory or a small population. To combat this, a shared replay memory was eventually implemented, as seen in the Mixed Agent Population experiment.

Seeded Genetic Algorithm
The structure of this experiment is fairly simple: Four networks previously trained using my DQN, each of them trained for approximately 10 million frames, are implemented into the starting population of a simple GA baseline population, which would then train for at least 5 generations.

In my initial structure I had no weighting which tied the overall performance on levels with the fitness that the agents would be evaluated on, and as such had some inherent unpredictability when training.

My original method to compile an agent’s fitness was by simply summing the fitness scored on each level and normalizing it. However, again this gave inherently unpredictable results, to combat this I eventually landed on a the below expression for calculating fitness:

for all i such that 1 ≤ i ≤ n and 1 ≤ i ≤ j, Where w1 and w2 are weights that control the trade-off between the two objectives of similarity and a total higher value. The first term, which is affected by w1 is taking the sum of the fitness minus the sum of the difference between the fitness. The second term, affected by w2 is taking the sum of fitness and lastly, the entire equation is normalized using 1/n . The normalization is to ensure that levels that have a higher total possible fitness than others do not take precedents.

Results and Findings

My most promising results came from the Seeded Genetic Algorithm experiments, in which we had seen a small amount of generalization in even naive implementations, and with a more refined approach we had a moderate increase in agent generalization.

While my Mixed Agent Population with a shared memory buffer experiments yielded mixed results, which indicated that while the increased rate of experience gathering performed by the GA agents in the population was a net benefit, the overhead incurred from the GA outweighs this benefit.

Finally, the Concurrent DQN and GA experiments yielded poor results, and on investigation found that the agents were simply weighed down by detrimental mutations and the overhead of training a population. The result was a significantly slower method of training a DQN versus training a single DQN agent on its own.

Conclusion

My minor thesis project on investigating the utilization of reinforcement learning algorithms within a genetic algorithm showcased a potential synergy of these two approaches. The integration of reinforcement learning algorithms enhanced the performance of the genetic algorithm, albeit not by a wide margin I'm at least happy I got to explore and build a machine learning algorithm from a practical standpoint using Tensorflow and other python libraries.

Check out the project repository on my GitHub.

If you want to read further about this project, you can download my Thesis in pdf format here