In a manner most typical of me, I have once again added a new annoying feature to my wonderfully retarded artificial son KevBot. If you'll recall, the last feature I added was the ability for KevBot to masquerade as IRC users (provided they have given enough data to KevBot for it to form a decent Markov model for that user). This time I am spelunking, not unlike the player in Minecraft, into the depths of recursion.
Now I'm aware that not everyone who reads my blog is a computer scientist and I'm also aware that I have a reputation of explaining things well that I have to maintain should those non-computer-scientist people follow the rest of this post and therefore stick around and leave comments (the cardinality of comments left on my blog being a measurement of my worth as a human being (I like run-on sentences (and parenthetical nesting))).
To put it tersely, recursion is defining something in terms of itself. To understand how that makes any sense, think of a Matryoshka doll. Let's define a Matryoshka doll as a hollow doll with a Matryoshka doll inside of it. So a Matryoshka doll can be defined in terms of Matryoshka dolls. Look at me, I'm using recursion!
(As an aside, recursion is not to be confused with looping, which is what a lot of "recursion" jokes actually represent. Like Google's "Did you mean: recursion" loop. Looping and recursion are different things. Google's awful joke is not recursion because it doesn't define recursion in terms of itself; all it does is provide a link to itself. In more computery terms, there is no nested recursive structure to the joke. You'd think that Google would know better.)
So anyway, one day at work I was talking about KevBot (y'know, just 'cause) and my friend and coworker Rarzipace mentioned that she thought it would be neat to allow KevBot's factoid system to embed results from the Markov chaining system directly into otherwise normal replies. While I thought the feature idea was a good one, I couldn't help but think how KevBot's factoid system couldn't do anything like that at all because it was so poorly written.
But I realize the importance of maintaining a good codebase, and if KevBot's factoid system was built properly, it should be possible to embed Markov results in factoids. And if it were designed properly, I would even be able to define factoids in terms of themselves: recursive factoids! That kind of overengineering would allow Rarzipace's feature request to be built into the design trivially. (Don't try overengineering at home, kids — leave it to the professionals like me.)
I've been writing KevBot since 2004, back before I had any real discipline as a programmer and also back before I knew what the hell I was doing. The factoid feature, being one of the earliest features I wrote into KevBot, is easily the worst, least readable block of code in KevBot. It was not encapsulated at all, meaning to change how it works I potentially have to change dozens of different locations all over the codebase, ranging from the storage mechanism to the main message-parsing function.
So my first hurdle was to find all of the factoid-related functionality and move it to one place so that I don't have to hunt around the code to find where to make the necessary changes. That didn't take too much effort, and once I had the functionality consolidated I had to run regression tests to make sure nothing broke before I even had the chance to start adding functionality. Once all the bugs had been worked out of that, I had to solve a few other problems.
Some of KevBot's functions have side effects and unchecked recursion could invoke them with disastrous results. For example, if I allowed users to embed the "say" command into a factoid, KevBot could be forced to send two messages to the channel instead of just one. That would break one of my fundamental goals with KevBot: never send more than one message to the channel for any given command to prevent spam attacks. Thus I needed to devise some way of isolating things with side effects from things without side effects or suppress side effects in recursive calls.
The other problem is infinite recursion. If a Matryoshka doll's only definition is a doll containing a Matryoshka doll, there would be no way to have a single Matryoshka doll without having an infinite number of them. Therefore it's necessary to have some kind of limit to the definition to keep it from going forever. For a Matryoshka doll you just define a solid Matryoshka doll that doesn't contain a smaller doll as a base. For KevBot, the limit is five levels of recursion: KevBot can only retrieve five factoids in one message.
I chose five mostly for speed since each recursive call is potentially another trip to KevBot's MySQL database to retrieve more data. This can get very slow very quickly even with good indexing. Thanks to the limit, it is entirely possible to define a factoid in terms of itself, with no base case to limit it. KevBot will limit the recursion depth to five and simply not bother going deeper.
Now KevBot can be told to store all kinds of factoids that are defined in terms of other factoids, or indeed any of KevBot's other functions (math expressions, Markov chains, random-number generation, etc). For example, I can define a name factoid of common IRC users, an adverb factoid and an adjective factoid filled with various arbitrary words and then define a new factoid like this: "`user` is `adverb` `adjective`." This has some fairly humorous results:
Nemuri is figuratively criminal.
Inferno is unconditionally boring.
Xaenyth is frighteningly fast.
I'm not sure if I'm OK with my bot claiming that I am boring. But its random number generator is entitled to its own opinions. It's also possible to embed Markov chaining into factoids:
I asked Inferno what he thought about Nick, and he said, "Presentation is part of random comedy cannot be spoiled.".
In this case, the quoted part is an impersonating Markov chain and the rest is a normal factoid.
At this point the functionality is too new to really use for amazing results. Until someone gets an interesting idea on how to use it we're going to be working with simple definitions like the examples above. Recursion is one of those things that is easy to implement and use but requires real creativity and inspiration to turn it into something amazing.
All of us in #kevex on irc.slashnet.org are thinking up new ways to use the feature. As always, you're welcome to drop in and give the bot a whirl.