Ever since I learned about evolutive algorithms, I’ve been fascinated by them. This class of algorithms use concepts from evolution, such as diversity and selection, to learn how to solve problems they know nothing about. It seems both magical and too good to be true ; and at the same time, being based on evolution, makes complete sense.

There’s also something strangely satisfying in grooming new AI breeds and savagely kill them because they’re not “fit enough”.

NEAT is a neuroevolutive genetic machine learning algorithm. The paper presenting it is sufficiently clear to give a good guideline to implement it from scratch.

In this post, I will build a game of snake, then groom an AI that masters it without ever even knowing it’s playing snake. The entirety of the code is in the post itself¹, feel free to look at the source to figure what’s happening. If you don’t want the details, you can skip directly to the conclusion, and watch a virtual species learn how to play snake directly in your browser!

The game

We will start with a simple game of snake. You can play it with the IJKL keys of your keyboard:

Now that’s all well an fun, but we need to simplify the problem for the algorithm. We center the game on the player and rotate so that straight forward is up. That way the problem becomes relative to the snake, rather than absolute to the board.

Since directions are now relative, commands should also be: we’ll use U and O to turn left and right:

It gets harder to play as a human, but the problem we are looking at is now standardized. Apples are passing by, and you want to turn to hit them, without eating you tail.

Behind the scenes, the board is encoded as a set of numbers. 0 are empty, 1 is an apple, and -1 is the snake’s body. This matrix, from the snake’s point of view, is what the AI will see².

Now, we’ve got something working that we can feed the computer. Time to switch to the AI

Genomes

We will build a neural network that will learn how to play this game. In fact, we will build a bunch of these, kill the worst, keep the best, make them mate together, take their progeny, expose them to radiations for accelerated mutations, and keep iterating till we find one that actually works. Pretty atrocious algorithm if you push the metaphor. But efficient.

We’ll loosely follow Neuro Evolution of Augmenting Topologies. There are a few bits we need to build.

Two entities: genomes, and networks. Networks take inputs, transform them to output. Like in biology, genomes are the encoding of the network.
Convert genetic material to a functional network.
Handle mating and mutations
Handle our collection of genomes, notably ablation (removing the least fit)
A loop to orchestrate all that, iterate, run the game, etc.

NEAT uses a genome that encodes each individual’s brain wiring. As with biology, it represents the information transmitted from a generation to the next through mating. It’s also how mutations are introduced. This genome is then expressed as a functional brain that allows an individual to decide what to do based on what it sees.

The genome encodes links and neurons.

We’ll use this structure:

interface Link {
  innovationId: string,
  from: neuron,
  to: neuron,
  weight: number, // 0 .. 1
  enabled: bool
}

interface Neuron {
  type: "in" | "out" | "hidden",
  pos: {                    // when type = in
    row: number, 
    col: number 
  }, 
  dir: "left" | "right",    // when type = out
  neuronId: number  // when type = hidden
}

interface Genome {
    links: Link[],
    genomeId: number,
    speciesId: number,
    neurons: Neuron[]
}

First let’s look at an intelligently designed organism’s genome. These are the link genes for a genome: (we don’t show the neurons because they don’t carry any more information).

Each of the genes is encoding a link from a neuron to another. The number on top of each gene is called the “innovation number”. We’ll come back to it in a moment.

The neurons formatted like “1x2” are input neurons. They represent coordinates relative to the player. The first number is in-front/behind. In front is > 0, behind is < 0. The second number is left/right. Negative is left, positive is right.

Neurons indicating directions are output neurons. Neurons prefixed by a “H” are hidden neurons: they’re neurons internal to the network. There aren’t any in that specific network.

Here is a view of the topology it generates:

Looking at it, we see links with a positive weight from the left of the snake to the “left” direction, and from the right of the snake to the “right” direction. This weight acts as a factor to the value in the corresponding cell. Apples are encoded with a 1. This means that if an apple is seen to the right of the snake, the cell will send a positive and significant signal to the right neuron, and correspondingly, the snake will turn right. Respectively to the left.

We also see a few links with negative weight in front of the snake. Body is encoded negatively, and so, if a body is seen, a positive signal will be sent to the left or right neuron, preventing the snake to eat itself.

Let’s see what it can do. We’ll transform the genome into a set of functions. There will be one function per output, taking the board as input, and determine how much the output is activated, applying an activation function over the sum of the incoming signal.

Starting small, we simulate the board positionally to our snake: (use ^ > < v to change its direction), the output receives the following activation:

Any activation greater than .5 will activate the neuron. If both sides are above, the strongest activation wins.

We now have a functional transformation of the genome into an activation function. The last step is to build a player from that and let it play snake. Hit play below:

This is a pretty simple genome, but it works ok. It turns whenever it sees an apple or when it’s gonna eat itself. Running it a few times already teaches us a few things. A) it’s much faster than you at snake. B) it’s not perfect, despite being intelligently designed. Or it might just be that its designer is not that intelligent C) scores are not entirely deterministic.

Let’s try to understand how much runs settle a good average with a few simulations. It will be an indication for this specific species, but should still give a good order of magnitude.

iterations

This yields interesting results. First, we see a significant spread between min and max. Second, we see that the game is not terribly optimized, and iterations take a few ms each. Third, we observe that the average results are consistent to +/-5% over 5 iterations, +/- 10% over 10 iterations, and about +/- 2% over 100 iterations.

We are directly using an average of the score over a few rounds as the fitness function. The fitness is how we determine how well a genome is doing.

Figuring the right number of rounds per species is a balancing game between obtaining precise results each round, and having more rounds. 10% fidelity seems enough.

Evolution

We have the basic elements that will allow for an AI to play. The next step is to build the evolutionary loop that will breed genomes capable of playing the game, and maybe beat our simple intelligent design genome.

There are two aspects to evolution. Variation: changing the species ; and selection: survival of the fittest. Let’s focus on variation first.

Variation

Variation is the set of mechanisms changing the genome of a species. It introduces entropy in the system so that better fitted (and worse…) species appear.

Mutation

The first mechanism of variation is mutation, introducing random changes within a genome. Since we can’t actually use radiation to modify our virtual genome, we’ll have to directly touch the genes with that.

We can mutate genome by changing the existing genes: activating deactivated genes and vice-versa, or randomizing the weight of existing links. We can also create new genes, i.e. create new connections, and new neurons. The example below is applying a healthy amount of variation on a genome.

with probabilities ([0, 1]) for mutation on weight: enable: disable: split link: new link:

Mating

The second mechanism of variation is mating. It allows traits to be propagated between generations by crossing the genomes of two individuals. Randomly merging topologies is hard: how do you know which neurons and links are matching between two genomes? How do you know if a given topology will make sense?

NEAT offers a neat way of doing just that. It introduces the notion of innovation to handle this. Every time a new neuron or a new link is inserted, we give it an innovation number. The innovation number is global. If two species evolve the same innovation (e.g. a new link between neurons A and B, or a new neuron C) it will get a single innovation number. NEAT therefor allows to trace ancestry and allows simple mating between genomes: a given gene’s innovation id will be the same throughout genomes.

We implement a dictionary that will generate innovation ids accordingly. If the neuron or the link gene is indeed new, it gets a new id. If it creates a topology that we already encountered, we reuse the same innovation id.

between and . Result:

Now we can proceed with mating. This happens between two genomes. Matching genes are selected randomly. Excess genes (which exist only on one of the genomes) are picked from the fittest parent. If fitness is equal, all are selected.

Fitness delta: (> 0 means genome on the left is fittest)

Viewed as graph:

Selection

We have the basics of variation, now we’ll tackle selection. This will happen by having generation after generation compete in our game, eliminate the lowest performers and select the fittest and breed them, rinse and repeat.

NEAT introduces the concept of speciation, so that new genes that could be beneficial compete “locally” and have more chances to survive their introduction. We group genomes that are similar into species, operate selection and ablation within a species, mate, reassemble species, then carry on.

Species

Compatibility distance is calculated based on the number of genes that differ, and the distance of weights.

Distance using coefficients: weight: excess genes count: disjointed genes count:

Distance: 0

Calculating distance allows to group genomes by species, and allowing the development of variations in a semi-protected environments. This happens by maintaining a list of species from a generation to the next. For each species an individual genome is picked randomly as a representative. Each individual from the new generation is placed with the first species with which it is compatible (i.e. compatibility > threshold). If it ain’t compatible, then a new species is created, with that genome as representative.

interface Species {
  id: number,
  representative: Genome,
  genomes: Genome[],
  speciesAge: number
}

We now need to executing all of that. Starting with an initial population that has no link genes.

From there, we can make the population compete at snake and obtain their fitness:

Once we have the fitness, we prune the least fit portion of each species. A larger portion of eliminated species allows faster evolution, but also tends to remove evolutions that didn’t have time to prove themselves. A good start is to keep 80%.

We also remove under-performing species older than a given number of generations.

We repopulate the remaining half through random breeding, sparkling some mutations to the mix.

Fight

We now have all the recipients for breeding snake players. Let’s plug all of that into an evolutionary loop. We will first generate a base population, with input neurons for every cell in the board, output neurons for right and left, and no links. Then iterate:

Compete all genomes
Ditch the lowest half of each species
Ditch any species lower than the average and older than X generations
Repopulate with offspring of the remaining species, introduce mutations
Re-sort into species
Loop

We pause every so often (every 10 generations, or every time a new genome beats the fitness record). We also track the progression of fitness from a generation to the next.

You can play with all the parameters we have discussed.

Pause for demos every 10 generations Pause when a new record is achieved (I recommend you uncheck these boxes while the fitness is < 2, you can check them while the loop is running)

Generation Max fitness Number of species

Maximum fitness over generations:

Some things to note:

Better scores are obtained much faster with smaller boards.
On larger boards, more mutations (especially, adding more connections), is helping bootstrapping useful genomes. This is because it’s much harder for the snakes to find anything to beginning with
You can frequently observe a jump in the fitness when a species evolves a particularly useful mutation. A frequent one is at a fitness equal to the board size, when a species evolves the gene that keeps it from eating itself.

There you have it! An evolutionary neural network from scratch directly in your browser. It’s both magical and mesmerizing. Now that I have one running, I have a few ideas for other applications that I hope we can see these learn to play. Next time!

Notes

and lesson learned, not necessarily the best way of doing things. But I wanted to see what it would do. ↩
in truth we don’t rotate the entire board but really just map the coordinates. ↩