Saturday, September 23, 2006

Making directory traversals more efficient

I wrote a script that takes less than a second to run and splits a bulky folder into 28 subfolders. When a well-organized folder has more than 30 to 50 subfolders under it, they are usually all the same type of subdirectory and differ only in name. More than thirty to fifty subfolders is too much for your users to digest at once, and it slows down traversing the file system. It takes longer for Linux to read all the entries needed to list the subfolders, and if you're browsing through Apache, the whole listing might take a while to download.


A very popular solution for this type of directory is to subdivide the collection of similar subfolders by name. My script does this based on the first character of the folder name: one folder for each letter a through z (ignoring case), one for names that start with a digit (0-9), and one for everything else (misc). This way, you can divide a huge directory into chunks that are more efficient for the computer to traverse and more responsive for your users (even over CIFS, FTP, NFS, etc.).
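To make the first-character rule concrete, here is a minimal sketch of how a single name could be mapped to its folder. It is only an illustration and not part of the script; the function name bucket_for is made up for this example, and the ranges assume plain ASCII names in the C locale.

```shell
#!/bin/sh

# bucket_for NAME: print the folder that NAME would be sorted into.
# (Hypothetical helper for illustration only.) The body runs in a
# subshell so the locale change does not leak into the caller.
bucket_for() (
    LC_ALL=C; export LC_ALL      # keep [0-9] and [A-Za-z] ASCII-only
    name=$1
    # Strip everything after the first character, leaving just that character.
    first=${name%"${name#?}"}
    case $first in
        [0-9])    echo 0-9 ;;
        [A-Za-z]) printf '%s\n' "$first" | tr 'A-Z' 'a-z' ;;
        *)        echo misc ;;
    esac
)
```

With this sketch, bucket_for Apple prints a, bucket_for 2006-photos prints 0-9, and bucket_for _drafts prints misc.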

Here is the script:

#!/bin/sh

# Staging area; stop if it is left over from an earlier run.
mkdir /tmp/myindex || exit 1

mkdir /tmp/myindex/0-9
mkdir /tmp/myindex/misc
mkdir /tmp/myindex/a
mkdir /tmp/myindex/b
mkdir /tmp/myindex/c
mkdir /tmp/myindex/d
mkdir /tmp/myindex/e
mkdir /tmp/myindex/f
mkdir /tmp/myindex/g
mkdir /tmp/myindex/h
mkdir /tmp/myindex/i
mkdir /tmp/myindex/j
mkdir /tmp/myindex/k
mkdir /tmp/myindex/l
mkdir /tmp/myindex/m
mkdir /tmp/myindex/n
mkdir /tmp/myindex/o
mkdir /tmp/myindex/p
mkdir /tmp/myindex/q
mkdir /tmp/myindex/r
mkdir /tmp/myindex/s
mkdir /tmp/myindex/t
mkdir /tmp/myindex/u
mkdir /tmp/myindex/v
mkdir /tmp/myindex/w
mkdir /tmp/myindex/x
mkdir /tmp/myindex/y
mkdir /tmp/myindex/z

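# Sort the current directory's entries by first character.
# A pattern that matches nothing only makes mv print an error.
mv ./[0-9]* /tmp/myindex/0-9/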
mv ./[!0-9a-zA-Z]* /tmp/myindex/misc/
mv ./[Aa]* /tmp/myindex/a/
mv ./[Bb]* /tmp/myindex/b/
mv ./[Cc]* /tmp/myindex/c/
mv ./[Dd]* /tmp/myindex/d/
mv ./[Ee]* /tmp/myindex/e/
mv ./[Ff]* /tmp/myindex/f/
mv ./[Gg]* /tmp/myindex/g/
mv ./[Hh]* /tmp/myindex/h/
mv ./[Ii]* /tmp/myindex/i/
mv ./[Jj]* /tmp/myindex/j/
mv ./[Kk]* /tmp/myindex/k/
mv ./[Ll]* /tmp/myindex/l/
mv ./[Mm]* /tmp/myindex/m/
mv ./[Nn]* /tmp/myindex/n/
mv ./[Oo]* /tmp/myindex/o/
mv ./[Pp]* /tmp/myindex/p/
mv ./[Qq]* /tmp/myindex/q/
mv ./[Rr]* /tmp/myindex/r/
mv ./[Ss]* /tmp/myindex/s/
mv ./[Tt]* /tmp/myindex/t/
mv ./[Uu]* /tmp/myindex/u/
mv ./[Vv]* /tmp/myindex/v/
mv ./[Ww]* /tmp/myindex/w/
mv ./[Xx]* /tmp/myindex/x/
mv ./[Yy]* /tmp/myindex/y/
mv ./[Zz]* /tmp/myindex/z/

# Move the sorted folders back and remove the staging area.
mv /tmp/myindex/* ./
rmdir /tmp/myindex

exit 0


Some 'for' loops would make this code substantially smaller, and if someone makes a suggestion I'll try it.
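For what it's worth, here is one way a 'for' loop version might look. Treat it as a sketch under a few assumptions, not a tested replacement. The function names move_into and split_by_initial are made up for this example. It also stages the folders in a hidden directory inside the target (named .myindex plus the process ID) instead of /tmp/myindex, so every mv stays on the same file system, and it forces the C locale so the ranges stay ASCII-only.

```shell
#!/bin/sh

# move_into DEST ENTRY...: move each entry into DEST, skipping the literal
# pattern the shell leaves behind when a glob matches nothing.
move_into() {
    dest=$1
    shift
    for entry in "$@"; do
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            mv -- "$entry" "$dest/" || return 1
        fi
    done
}

# split_by_initial [DIR]: sort the entries of DIR (default: current directory)
# into the folders 0-9, misc, and a through z. The body runs in a subshell so
# the locale change and the variables do not leak into the caller.
split_by_initial() (
    LC_ALL=C; export LC_ALL      # keep ranges like [0-9] and [a-z] ASCII-only
    target=${1:-.}
    stage="$target/.myindex.$$"  # hypothetical hidden staging folder
    mkdir "$stage" "$stage/0-9" "$stage/misc" || exit 1

    move_into "$stage/0-9" "$target"/[0-9]* || exit 1
    move_into "$stage/misc" "$target"/[!0-9a-zA-Z]* || exit 1

    for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        upper=$(printf '%s' "$letter" | tr 'a-z' 'A-Z')
        mkdir "$stage/$letter" || exit 1
        move_into "$stage/$letter" "$target"/[$letter$upper]* || exit 1
    done

    # Only hidden entries should be left in the target by now, so the
    # folder names coming back from the staging area cannot collide.
    mv -- "$stage"/* "$target"/ && rmdir "$stage"
)
```

To try it, define both functions and call split_by_initial /path/to/bulky/folder, or call it with no argument to split the current directory. As in the original, all 28 folders are created even if some end up empty, and hidden entries (names starting with a dot) are left where they are.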
