The Right Glue
October 2010

Algorithmic depths

By Dean

In a manner most typical of me, I have once again added a new annoying feature to my wonderfully retarded artificial son KevBot. If you'll recall, the last feature I added was the ability for KevBot to masquerade as IRC users (provided they have given enough data to KevBot for it to form a decent Markov model for that user). This time I am spelunking, not unlike the player in Minecraft, into the depths of recursion.

Now I'm aware that not everyone who reads my blog is a computer scientist and I'm also aware that I have a reputation of explaining things well that I have to maintain should those non-computer-scientist people follow the rest of this post and therefore stick around and leave comments (the cardinality of comments left on my blog being a measurement of my worth as a human being (I like run-on sentences (and parenthetical nesting))).

To put it tersely, recursion is defining something in terms of itself. To understand how that makes any sense, think of a Matryoshka doll. Let's define a Matryoshka doll as a hollow doll with a Matryoshka doll inside of it. So a Matryoshka doll can be defined in terms of Matryoshka dolls. Look at me, I'm using recursion!

(As an aside, recursion is not to be confused with looping, which is what a lot of "recursion" jokes actually represent. Like Google's "Did you mean: recursion" loop. Looping and recursion are different things. Google's awful joke is not recursion because it doesn't define recursion in terms of itself; all it does is provide a link to itself. In more computery terms, there is no nested recursive structure to the joke. You'd think that Google would know better.)

So anyway, one day at work I was talking about KevBot (y'know, just 'cause) and my friend and coworker Rarzipace mentioned that she thought it would be neat to allow KevBot's factoid system to embed results from the Markov chaining system directly into otherwise normal replies. While I thought the feature idea was a good one, I couldn't help but think how KevBot's factoid system couldn't do anything like that at all because it was so poorly written.

But I realize the importance of maintaining a good codebase, and if KevBot's factoid system was built properly, it should be possible to embed Markov results in factoids. And if it were designed properly, I would even be able to define factoids in terms of themselves: recursive factoids! That kind of overengineering would allow Rarzipace's feature request to be built into the design trivially. (Don't try overengineering at home, kids — leave it to the professionals like me.)

I've been writing KevBot since 2004, back before I had any real discipline as a programmer and also back before I knew what the hell I was doing. The factoid feature, being one of the earliest features I wrote into KevBot, is easily the worst, least readable block of code in KevBot. It was not encapsulated at all, meaning to change how it works I potentially have to change dozens of different locations all over the codebase, ranging from the storage mechanism to the main message-parsing function.

So my first hurdle was to find all of the factoid-related functionality and move it to one place so that I don't have to hunt around the code to find where to make the necessary changes. That didn't take too much effort, and once I had the functionality consolidated I had to run regression tests to make sure nothing broke before I even had the chance to start adding functionality. Once all the bugs had been worked out of that, I had to solve a few other problems.

Some of KevBot's functions have side effects and unchecked recursion could invoke them with disastrous results. For example, if I allowed users to embed the "say" command into a factoid, KevBot could be forced to send two messages to the channel instead of just one. That would break one of my fundamental goals with KevBot: never send more than one message to the channel for any given command to prevent spam attacks. Thus I needed to devise some way of isolating things with side effects from things without side effects or suppress side effects in recursive calls.

The other problem is infinite recursion. If a Matryoshka doll's only definition is a doll containing a Matryoshka doll, there would be no way to have a single Matryoshka doll without having an infinite number of them. Therefore it's necessary to have some kind of limit to the definition to keep it from going forever. For a Matryoshka doll you just define a solid Matryoshka doll that doesn't contain a smaller doll as a base. For KevBot, the limit is five levels of recursion: KevBot can only retrieve five factoids in one message.

I chose five mostly for speed since each recursive call is potentially another trip to KevBot's MySQL database to retrieve more data. This can get very slow very quickly even with good indexing. Thanks to the limit, it is entirely possible to define a factoid in terms of itself, with no base case to limit it. KevBot will limit the recursion depth to five and simply not bother going deeper.

Now KevBot can be told to store all kinds of factoids that are defined in terms of other factoids, or indeed any of KevBot's other functions (math expressions, Markov chains, random-number generation, etc). For example, I can define a name factoid of common IRC users, an adverb factoid and an adjective factoid filled with various arbitrary words and then define a new factoid like this: "`user` is `adverb` `adjective`." This has some fairly humorous results:

Nemuri is figuratively criminal.
Inferno is unconditionally boring.
Xaenyth is frighteningly fast.

I'm not sure if I'm OK with my bot claiming that I am boring. But its random number generator is entitled to its own opinions. It's also possible to embed Markov chaining into factoids:

I asked Inferno what he thought about Nick, and he said, "Presentation is part of random comedy cannot be spoiled.".

In this case, the quoted part is an impersonating Markov chain and the rest is a normal factoid.

At this point the functionality is too new to really use for amazing results. Until someone gets an interesting idea on how to use it we're going to be working with simple definitions like the examples above. Recursion is one of those things that is easy to implement and use but requires real creativity and inspiration to turn it into something amazing.

All of us in #kevex on irc.slashnet.org are thinking up new ways to use the feature. As always, you're welcome to drop in and give the bot a whirl.

Latest comments:

Design patterns

By Dean

Australia is spiting me again, this time with a blog post I found in a thread on The Daily WTF's forum. The post is called Design patterns have (mostly) flopped. You can read it if you want, but I can summarize it for you.

Basically in this article the author makes the claim that knowing a lot of design patterns is supposed to arm programmers with the terminology required to communicate complex designs to each other. (This is on top of the usual benefit of patterns — which is taking advantage of the fact that someone has already solved your problem for you.) He goes on to say that patterns have failed to accomplish this goal because so few programmers know what design patterns are and those that do tend to not know very many of them.

I'm going to try to show why design patterns are a poor choice for communicating designs and why memorizing design patterns is a silly idea in and of itself. Also I am really getting tired of Australia.

I'm going to assume you're already familiar with patterns. If you aren't, then you can read up on them in the link two paragraphs up, or if you're exceptionally lazy you can think of them as well-defined common problems with robust solutions. Basically the idea is that a bunch of people with some free time looked at a lot of software and noticed that a lot of the problems faced by a lot of software are similar, and some solutions to those common problems are better than others. So they named a bunch of the problems and their best solutions and called the idea "design patterns". That way, if a programmer ever faces one of these common problems, all he has to do is look up the best solution and implement that. It saves him a lot of mental effort in solving his problem.

Whether a programmer is familiar with the idea of patterns or not, it's highly likely that he's met a lot of the problems covered by patterns in the past. In fact, unless he's not a very good programmer, it's likely that he's come up with robust solutions similar to the ones documented as patterns. Programmers solve problems and typically do so in the best way they can conceive; it follows that patterns have been re-invented countless times over the years.

That's wasted effort, yes. But then again a lot of people (like me) learn much better by doing something than by reading about it. I'm much more likely to remember a problem and its best solution if I actually encounter it and solve it on my own. But the idea that they can be re-invented so often indicates that many patterns are mostly obvious. Many patterns document things that many experienced programmers wouldn't even have thought to name because they are so obvious. Because of this experienced programmers would tend to ignore design patterns.

Beginner programmers wouldn't even know they existed in the first place. So that leaves the guys in the middle. People who have looked at a few problems and noticed that they have many things in common with each other. These guys poke around online and find "design patterns", often touted as silver bullets for solving software engineering problems. They find articles like the one our Australian friend has written that claim it is very important for communicating with other developers. So they misguidedly learn all of the design patterns they can. Then they find they can't use them to communicate with beginner or experienced programmers.

More importantly, people who memorize patterns will start to exhibit the everything-is-a-nail-when-you-have-a-hammer effect: they will begin designing software in such a way that it bends to patterns' solutions before they bother fleshing out the problem. Not only might this mean that they might design their software using patterns inappropriately, but also they might creating an overcomplicated design as an excuse to use their favourite patterns. Future developers who look at the design will wonder why it is so complicated. The effort in maintaining it will increase.

The goal of software design is to reduce complexity. Design patterns are one tool among many to help developers accomplish this goal. Like any tool, it needs to be understood for what it is more than its details need to be fleshed out. It's better for a programmer to know that patterns exist and, upon encountering a problem he thinks is common, consult pattern documentation to see if his problem is covered. If it is, that's great. But if it isn't, then he's going to have to think on his own. I'd rather he be confident in his ability to do that than to be able to name a dozen or more design patterns off the top of his head.

Latest comments:
This is some kind of footnote. This webpage is awesome and can be viewed in any browser. Even ones that suck ass like Safari and Firefox. Isn't that awesome? This site is best viewed with browsers that aren't maximized on large-resolution displays (> 1024 pixels in width). But then again, if you are running a large resolution and browsing maximized, then you're a terrible person so you don't really deserve to see this site at its finest. Jerk. I mean, seriously. I spend all this time making a nice site and your silly browsing habits ruin its look. That's really cold, man. If you're using IE6, then in order to see the cool avatar effects you need to enable JavaScript. This site conforms 100% with the laws (both known and unknown) governing physical reality. No rights reserved by Dean Whelton (who is awesome) of any of the content, images, design, backend or electrons used in this site. Steal at your convenience. None of it is worth stealing anyway, so there. I have even made an RSS feed for more efficient theft of my intellectual property. Now, don't say I'm not generous. I guess if you want to know more about me, you can visit the about page. I actually made a real about page! It's more like a FAQ, though. You can contact me, too, if you feel like it. Are you really wasting time reading this? Go outside or something.