Lately at work, I've been optimizing a Bash script used to build the data of a game. Its role is to parse the game directory and, for each file, match it against a build rule. If there's no rule for a file, well, then nothing happens. If there's a rule, the dates of the source file and the built file are compared, and if the source file is more recent, the file is rebuilt. Yes, that's pretty much the basics of a build system, but in this precise case Make wasn't an option.
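
To give you an idea, the whole tool boils down to a loop of this shape (a minimal sketch: FindRuleFor and the build/ output layout are my inventions here, not the actual tool's):

# Walk the data directory; rebuild every file whose source is newer
# than its built counterpart -- a poor man's Make, in effect.
find data -type f | while read -r src; do
    rule=$(FindRuleFor "$src")      # hypothetical rule lookup
    [ -z "$rule" ] && continue      # no rule: nothing happens
    dst="build/${src#data/}"        # output path (extension changes omitted)
    # -nt is true when $src is more recent than $dst (or $dst is missing)
    if [ "$src" -nt "$dst" ]; then
        "$rule" "$src" "$dst"       # run the matched build rule
    fi
done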

What really bothered me about that tool was its extreme slowness. Rebuilding modified data lacked a bit of punch, but more importantly, parsing data that hadn't been modified was dead slow. Even when a single value was modified in a single file, the script took at least 15 seconds to run. I know that doesn't sound too bad, but let's put it into perspective: doing nothing on about 300 files, with a quad-core processor at 2 GHz, took 15 seconds. That was already frustrating enough. Now consider that the game contains at least 7400 files, and that when someone changes a file you can't always know which one, so you have to rebuild the whole directory. Do the math: that blocks your machine for at least 6 minutes, just to do nothing. And that's a minimum, because many files get rebuilt anyway. And you do this many, many times a day. Fast iteration is key to success in game development, and this was nothing like it. Here is a chart with the approximate times I benchmarked:

[Chart: time to build x files]

At first, I thought the problem was Bash itself: maybe browsing every file in the directory recursively was slow in a shell script. It wasn't. Then I suspected that comparing the dates took long. It didn't. And then I saw the function that matched file extensions with rules. It looked up extensions in an array, but had to do string manipulation on every element, because each element was of the form "extension ruleName". And it scanned the whole array for every single file to build (I've reconstructed it below)! So I made improvements there.

The array looked like this:

# Each entry pairs a file extension with the build rule to run.
ACTIONS=(
    "txt DoStuffForTxt"
    "avi EncodeVideo"
    "lua PreCompileLua"
    # etc.
)
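
And the lookup had roughly this shape: for every file to build, walk the whole array and split each entry just to compare extensions (my reconstruction, not the original code):

# O(number of rules) string manipulations for every single file.
FindRuleFor() {
    local ext="${1##*.}"                # extension of the file
    local entry
    for entry in "${ACTIONS[@]}"; do
        if [ "${entry%% *}" = "$ext" ]; then
            echo "${entry#* }"          # print the matching rule name
            return 0
        fi
    done
    return 1                            # no rule for this extension
}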

My solution was to replace the "extension rule" array with an associative array. But associative arrays are not supported in Bash (even though the manual says so)! So I did some hacking to circumvent that (I'll talk about it in another post). It didn't go as I expected, though: the build time got longer. I finally understood that my solution was using too many "echo"s, and removing them improved the build time, but not by much. I couldn't think of anything else slowing things down, because the rest was pretty trivial, so I started a hunt for "echo"s. And there it was.
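
I'll keep the details for that other post, but the classic trick for faking an associative array in Bash 3 is to encode the key into a variable name. A sketch of the idea (set_rule and get_rule are made-up names, and this assumes the extension is a valid identifier):

# Emulate RULES[ext]=rule by storing into a variable named RULE_<ext>.
set_rule() {                  # usage: set_rule <extension> <ruleName>
    eval "RULE_$1=\"\$2\""
}
get_rule() {                  # usage: get_rule <extension>
    eval "printf '%s\n' \"\$RULE_$1\""
}

set_rule txt DoStuffForTxt
set_rule lua PreCompileLua
get_rule lua                  # prints: PreCompileLua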

In each build rule, the second line was:

echo "Extension: $ext" >> log.txt

This seems harmless, of course; nevertheless, it can eat a lot of time if log.txt is actually a big file. My intuition was right: I discovered that the file was about 8 MB. After all this investigation, the real solution to the problem was to add

echo "" > log.txt

somewhere at the beginning of the script, so that the log gets truncated on every run. The build time then improved a lot.
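
By the way, if you ever want to verify that kind of hypothesis yourself, a quick and dirty benchmark is enough (the sizes and counts below are arbitrary):

# Compare 1000 appends against an ~8 MB log vs. an empty one.
dd if=/dev/zero of=log.txt bs=1M count=8 2>/dev/null
time for i in $(seq 1000); do echo "Extension: txt" >> log.txt; done

> log.txt        # truncate the log
time for i in $(seq 1000); do echo "Extension: txt" >> log.txt; done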

In fact, rebuilding even 1000 files with no changes now takes less than 5 seconds. So mark my words: when you optimize code, any kind of code, first get a good overview of its performance and track down the lines creating the bottlenecks. Unlike me, you'll end up saving a lot of time! Plus, if you're doing some Bash scripting, remember that echo can be quite greedy, and avoid it as much as you can!
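
A cheap way to get that kind of overview in a shell script, by the way, is the built-in SECONDS counter, or an xtrace whose prompt carries a timestamp:

# Coarse timing of a script section with the SECONDS builtin.
SECONDS=0
# ... the phase you suspect, e.g. the directory scan ...
echo "scan took ${SECONDS}s"

# Finer-grained: prefix every traced command with a timestamp.
PS4='+ $(date +%s.%N) '    # %N needs GNU date
set -x

Either one would have pointed me at that log line much sooner.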