Build your own Twitter AI using Markov Chains

In this post we’re going to explore Markov Chains and how to use them to create a twitter 'AI' that can create new tweets based on the behavior of other twitter users.

This post comes with a corresponding repository that contains a working example of a Twitter AI based on Markov Chains.

So what are Markov Chains? Markov chains, named after Andrey Markov, are a way to model behavior. They work by modeling behavior as a series of transitions between nodes in a graph. They can also be seen as a state machines where movement between states is based on chance.

A very simple Markov Chain can be drawn like this:

This diagram only has two states, A and B. You can also see that each state has two transitions: one to itself and one to the other state. These transitions also have a chance attached and for each state the sum of the chances adds up to 100%.

Very pretty visual representations with also a more in-depth explanation can be found in this blog post.

Modeling language behavior

So how does this knowledge apply to creating a twitter AI? Like many other forms of behavior the language patterns of a group of persons or a person can be expressed as a Markov chain. We model the behavior by feeding nodes to the graph 'strengthening' bond between certain words while also weakening bonds between others.

We do this by, in essence, counting occurrences of word combinations. We split the sentence (square in orange below) into a sequence of tokens (fat arrows in purple). From that we build a graph (circles) where the words are connected via edges leading to another word (arrows). The process is represented visually here:

So we have a sentence "foo bar baz", it’s split into individual words and then turned into a graph You might notice the {S} and {E} nodes. These are 'special' nodes that represent the {s}tart and {e}nd of the sentence. We need a start node so we know where to enter the graph and we also need an end node so we don’t generate sentences of infinite length.

The numbers above the arrows are the transition counts. Since we only fed the chain the sentence "foo bar baz" this is now also the only possible outcome: from the start the only possible transition is to "foo", from "foo" the only possible transition is "bar", then "baz" and then the only option is to finish the sentence. Since this is obviously quite boring we now feed it the next sentence:

We have now also inserted "foo baz bar" and as you can see three new connections have been created.

The possible results from this graph immediately have grown substantially. Not only can "foo baz bar" now be a result from this chain, if you trace through the graph you can see it is also possible for, for example, "foo baz" or "foo baz bar baz" to be generated. In fact: there is now a cycle between "baz" and "baz" so in theory it is possible to generate a string starting with "foo" and then repeating "bar baz" infinitely.

So why won’t this be an issue? Because each transition has a chance that is simply the transition count divided by the sum of all transitions out of that state. So in the case of {S} to "foo" this is 2/2 = 100%. In the case of transitions out of "baz" there is a 50% chance (1/2) of it transitioning to "bar" and a 50% chance (1/2) of it transitioning to {E}.

Let’s add one last sentence:

We’ve added "foo foo bar" and you notice that we now have a transition from "foo" to "foo" added while also incrementing the counts on "foo" → "bar" and "bar" → {E}.

We now have a beautiful stochastic model of our little textual world that can now be used to generate completely new text. It will often generate sentences we’ve fed into the chain (like "foo foo bar") but also sentences we didn’t ("foo foo baz bar") while still using the same type of language. It will never generate something like "bar bar foo" because the model doesn’t allow it to: we never allowed it to make a connection from the {S} node to "bar".

Building our AI

After learning this I hope it becomes clear that we can apply this to any sequence of 'things'. It doesn’t really matter if it’s words from a bible text, tweets or even individual letters in english words: they can all be modeled this way as transitions between nodes in a graph.

So we’ll do just that: the most important class in our solution is the MarkovChain class. The add method allows you to add sentences to the chain while the generate() method generates sentences.

A very quick explanation of the 'add' process is that you split the input text up in pairs of words, with the first pair having a "from" that is empty and the last pair having a "to" that is empty. So "foo bar baz" gets split into (null, foo), (foo, bar), (bar, baz) and (baz, null). Each of these then gets passed to the word nodes (which get created on the fly if they don’t exist yet) and the counts get incremented.

Generating a sentence is also fairly straightforward: we grab the start node and from there select a random transition based on the transition chances. The chance for a transition is simply the count of an edge divided by the sum for the node (which we store with the node). The sentence 'ends' when the randomly selected node is the end node.

Running the example

The repository contains a runnable Spring Boot application that showcases the 'bot'. To run the example you will need to provide a configuration.properties file with your Twitter OAuth2 credentials:

twitter4j.oauth.consumerKey=...
twitter4j.oauth.consumerSecret=...
twitter4j.oauth.accessToken=...
twitter4j.oauth.accessTokenSecret=...

So first fork and clone the example application and assuming you created the properties file (anywhere on the classpath works) correctly you can either start the Application class or use maven to build a runnable jar and run it from your command line.

After it’s started successfully you should be able to point your browser to http://localhost:8080 and see a small AngularJS application that automatically refreshes

Playing Around

The main view shows the state of the 'main' Markov Chain that gets constantly fed from a sample of the twitter feed. Quite soon you’ll see that some words (The, I) occur more often than others. You should also see a constant feed (1 per second) of generated tweets based on the chain.

If you switch to the user view you can input a user name (or click one of the provided ones) and do the same thing. But instead of a random selection of all english tweets it’ll 'mimic' a real used based on their language. So by inputting 'realdonaldtrump' for example you will create tweets similar to those of DeepDrumpf.

Conclusion

I hope I at least entertained you and perhaps even inspired you to build your own 'AI'. Feel free to contact me if anything is unclear either via the contactdetails in my about page or through issue’s in the repository.