cvsps memory diet
23rd May 2005

At first I started converting it from in-memory data structures to gdbm, but that got too tedious after a while.

I then found that the cached data on disk, which mirrors the in-memory data, was only 30MB. So I started looking around for the culprit.

Apparently, there were some huge over-allocations. A log message in that repository is at most 1K, yet 8K was allocated for each one, and there were over 15K log messages. For each filename 4K was allocated, while the longest filename was 200 bytes. Revisions and branch information were kept in oversized hashes where a linked list would do just as well. A few other minor optimizations were needed too.
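To put numbers on it: over 15K log messages at 8K each comes to roughly 120MB for the message buffers alone, while the same messages at their 1K maximum would need 15MB or so at most. Below is a minimal sketch of the kind of fix involved, sizing each buffer to the string it holds. The struct and function names are made up for illustration and are not the actual cvsps code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record, not the real cvsps structure. */
struct log_entry {
    size_t len;   /* length of the message, without the terminator */
    char *text;   /* heap copy sized to the message, not a fixed 8K buffer */
};

/* Copy msg into a buffer of exactly strlen(msg) + 1 bytes.
 * Returns 0 on success, -1 if the allocation fails. */
int log_entry_set(struct log_entry *e, const char *msg)
{
    assert(e != NULL && msg != NULL);

    size_t len = strlen(msg);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;

    memcpy(copy, msg, len + 1);   /* includes the trailing '\0' */
    e->len = len;
    e->text = copy;
    return 0;
}

/* Release the buffer and reset the entry. */
void log_entry_free(struct log_entry *e)
{
    assert(e != NULL);
    free(e->text);
    e->text = NULL;
    e->len = 0;
}
```

The same idea applies to the filenames: a 200 byte name gets a 201 byte buffer instead of 4K.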

All in all, the memory requirement dropped from 500MB to less than 60MB, which is still a lot but liveable, at least until the repository grows too much.

I added a small statistics collector/reporter to the code to help guide my way, and used the large repository as well as the gaim repository as the basis for my decisions. It was fun.

I did notice a need for a statistics collector library for this sort of thing. It should report max, average, median and similar data. I didn't do the median because I was lazy, but the difference between the max and the average is so large that a median would help here. Dumping the data and showing histograms would be great for such a task.
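For what it's worth, here is a rough sketch of what such a collector could look like, median included. It is not the code I added to cvsps; all the names are invented for the example, and it assumes it is fine to keep every sample in memory and sort them when the median is asked for.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical collector: keeps every sample so a median can be computed. */
struct stats {
    size_t *samples;
    size_t count;
    size_t cap;
};

/* Record one sample (e.g. a log message length). Returns 0, or -1 on OOM. */
int stats_add(struct stats *s, size_t value)
{
    assert(s != NULL);
    if (s->count == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 64;
        size_t *p = realloc(s->samples, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        s->samples = p;
        s->cap = ncap;
    }
    s->samples[s->count++] = value;
    return 0;
}

static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

size_t stats_max(const struct stats *s)
{
    assert(s != NULL && s->count > 0);
    size_t max = s->samples[0];
    for (size_t i = 1; i < s->count; i++)
        if (s->samples[i] > max)
            max = s->samples[i];
    return max;
}

double stats_average(const struct stats *s)
{
    assert(s != NULL && s->count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < s->count; i++)
        sum += (double)s->samples[i];
    return sum / (double)s->count;
}

/* Sorts the samples in place, then picks the middle one
 * (or the mean of the two middle ones for an even count). */
double stats_median(struct stats *s)
{
    assert(s != NULL && s->count > 0);
    qsort(s->samples, s->count, sizeof *s->samples, cmp_size);
    size_t mid = s->count / 2;
    if (s->count % 2)
        return (double)s->samples[mid];
    return ((double)s->samples[mid - 1] + (double)s->samples[mid]) / 2.0;
}

void stats_free(struct stats *s)
{
    assert(s != NULL);
    free(s->samples);
    s->samples = NULL;
    s->count = s->cap = 0;
}
```

Feed it the samples 10, 20, 30, 40 and 1000 and the average comes out as 220 while the median is 30, which is exactly the kind of gap that the max and average alone don't explain.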

Now I need to clear it at work and submit the patches to the author. I've got one of those all-your-code-are-belong-to-us type of contracts, but with a special clause for Open-Source projects. I still need to get permission for each new project to ensure it doesn't clash with my work-related tasks.

