Baby Naming

Generating Novel Names with a Markov Model

Feb 23, 2017

When it comes to naming a baby, most parents follow name fads, while a small proportion are bold enough to name their babies in creative ways. Either way, they all need to make sure the name is pronounceable; otherwise it loses its primary purpose. That raises a question: how can we generate names that sound normal?

In this post, I'll try to tackle this question with a Markov chain, which models a random variable that changes over time steps, under the assumption that the current step depends only on the previous N steps. In this case, the random variable is the letter in the name. If the current letter depends only on the previous letter, we call it a first-order Markov chain; if it depends on the previous two letters, we call it a second-order Markov chain... You get the idea. With that in place, I'm going to generate some names letter by letter.
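Written out, the assumption looks roughly like this. The notation is mine, not from any particular textbook: $x_t$ stands for the letter at position $t$ of the name, and $N$ is the order of the chain.

$$
P(x_t \mid x_1, x_2, \dots, x_{t-1}) = P(x_t \mid x_{t-N}, \dots, x_{t-1})
$$

For $N = 2$ the right-hand side is simply $P(x_t \mid x_{t-2}, x_{t-1})$: the next letter given the two letters before it.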

Specifically, for a second-order Markov chain, a letter is chosen randomly, weighted by how often it occurs after the previous two letters. For example, let us assume we have just three names in our dataset: [ Timothy, Tim, Tiana ]. If we have already selected “Ti” to start our new name and are trying to choose the third letter, we would generate a random number and use it to choose between “a” and “m”. Since “a” occurs about 33% of the time after “Ti” (once, in Tiana) and “m” occurs about 67% of the time (twice, in Timothy and Tim), we would select “m” twice as often as “a”.
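To make the example concrete, here is a minimal Python sketch of the counting and the weighted pick. It is only an illustration of the idea, not necessarily how the linked code does it; the function names `count_transitions` and `next_letter` are made up for this post, and I lowercase everything so that “Ti” and “ti” count as the same context.

```python
import random
from collections import Counter, defaultdict


def count_transitions(names, order=2):
    """Count which letter follows each `order`-letter context."""
    counts = defaultdict(Counter)
    for name in names:
        word = name.lower()
        for i in range(order, len(word)):
            context = word[i - order:i]   # the previous `order` letters
            counts[context][word[i]] += 1
    return counts


def next_letter(counts, context, rng=random):
    """Pick the next letter at random, weighted by how often it was seen."""
    letters = list(counts[context])
    weights = [counts[context][letter] for letter in letters]
    return rng.choices(letters, weights=weights, k=1)[0]


names = ["Timothy", "Tim", "Tiana"]
counts = count_transitions(names, order=2)

# "m" follows "ti" twice (Timothy, Tim) and "a" once (Tiana).
print(dict(counts["ti"]))  # {'m': 2, 'a': 1}

# Roughly two out of three calls should return "m".
print(next_letter(counts, "ti"))
```

`random.choices` does the "generate a random number and use it to choose" step for us: it accepts raw counts as weights, so there is no need to turn them into probabilities first.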

Code is up here.

This name generator is a third-order Markov chain trained on 1K boy names and 1K girl names. If you notice familiar names being generated, note that they do not exist in the training data. The model simply learns what sounds normal for a name.
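For the curious, a third-order generator that only returns names missing from its training data could look something like the sketch below. Treat it as a guess at the general shape rather than the actual code: the padding symbols `^` and `$`, the function names, the `max_len` and `max_tries` limits, and the ten-name `TRAINING` list are all assumptions I made up to keep the example small (the real generator uses about 2K names).

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical padding symbols marking name boundaries

# Tiny stand-in list for illustration only; not the real training data.
TRAINING = ["Marian", "Mariah", "Marcus", "Adrian", "Adriana",
            "Brian", "Briana", "Darian", "Dorian", "Florian"]


def train(names, order=3):
    """Count next-letter frequencies for every `order`-letter context."""
    counts = defaultdict(Counter)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def generate(counts, order=3, rng=random, max_len=12):
    """Build one name letter by letter until END is drawn or max_len is hit."""
    context, letters = START * order, []
    while len(letters) < max_len:
        options = counts[context]
        letter = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if letter == END:
            break
        letters.append(letter)
        context = context[1:] + letter  # slide the window forward by one letter
    return "".join(letters).capitalize()


def novel_names(names, how_many, order=3, seed=None, max_tries=10_000):
    """Collect generated names that do not appear in the training list."""
    rng = random.Random(seed)
    counts = train(names, order)
    known = {name.lower() for name in names}
    found = set()
    for _ in range(max_tries):
        if len(found) == how_many:
            break
        candidate = generate(counts, order, rng)
        if candidate.lower() not in known:
            found.add(candidate)
    return sorted(found)


print(novel_names(TRAINING, 5, seed=0))
```

Padding each name with start and end symbols lets the same table decide how a name begins and when it stops, and the `known` check is what keeps training names out of the results.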