Saturday, September 23, 2006

Making directory traversals more efficient

I wrote a script that runs in less than a second and splits a bulky folder into 28 subfolders. If a well-organized folder has more than 30-50 subfolders under it, they are usually all the same type of subdirectory and differ only by name. More than thirty to fifty subfolders is too much for your users to digest at once, and it slows down traversing the file system. It takes longer for Linux to read all the entries needed to list the subfolders, and if you're browsing with Apache, it might take a while to download the whole listing.


A very popular solution for this type of directory is to subdivide the collection of similar subfolders by their name. I wrote a script which does this based on the first character of the folder name. This way, you can divide a huge directory into chunks that make traversals more efficient for the computer and more responsive to your users (even over CIFS, FTP, NFS, etc.).

Following is the script code:

#!/bin/sh

mkdir /tmp/myindex

mkdir /tmp/myindex/0-9
mkdir /tmp/myindex/misc
mkdir /tmp/myindex/a
mkdir /tmp/myindex/b
mkdir /tmp/myindex/c
mkdir /tmp/myindex/d
mkdir /tmp/myindex/e
mkdir /tmp/myindex/f
mkdir /tmp/myindex/g
mkdir /tmp/myindex/h
mkdir /tmp/myindex/i
mkdir /tmp/myindex/j
mkdir /tmp/myindex/k
mkdir /tmp/myindex/l
mkdir /tmp/myindex/m
mkdir /tmp/myindex/n
mkdir /tmp/myindex/o
mkdir /tmp/myindex/p
mkdir /tmp/myindex/q
mkdir /tmp/myindex/r
mkdir /tmp/myindex/s
mkdir /tmp/myindex/t
mkdir /tmp/myindex/u
mkdir /tmp/myindex/v
mkdir /tmp/myindex/w
mkdir /tmp/myindex/x
mkdir /tmp/myindex/y
mkdir /tmp/myindex/z

mv ./[0-9]* /tmp/myindex/0-9/
mv ./[!0-9a-zA-Z]* /tmp/myindex/misc/
mv ./[Aa]* /tmp/myindex/a/
mv ./[Bb]* /tmp/myindex/b/
mv ./[Cc]* /tmp/myindex/c/
mv ./[Dd]* /tmp/myindex/d/
mv ./[Ee]* /tmp/myindex/e/
mv ./[Ff]* /tmp/myindex/f/
mv ./[Gg]* /tmp/myindex/g/
mv ./[Hh]* /tmp/myindex/h/
mv ./[Ii]* /tmp/myindex/i/
mv ./[Jj]* /tmp/myindex/j/
mv ./[Kk]* /tmp/myindex/k/
mv ./[Ll]* /tmp/myindex/l/
mv ./[Mm]* /tmp/myindex/m/
mv ./[Nn]* /tmp/myindex/n/
mv ./[Oo]* /tmp/myindex/o/
mv ./[Pp]* /tmp/myindex/p/
mv ./[Qq]* /tmp/myindex/q/
mv ./[Rr]* /tmp/myindex/r/
mv ./[Ss]* /tmp/myindex/s/
mv ./[Tt]* /tmp/myindex/t/
mv ./[Uu]* /tmp/myindex/u/
mv ./[Vv]* /tmp/myindex/v/
mv ./[Ww]* /tmp/myindex/w/
mv ./[Xx]* /tmp/myindex/x/
mv ./[Yy]* /tmp/myindex/y/
mv ./[Zz]* /tmp/myindex/z/

mv /tmp/myindex/* ./
rmdir /tmp/myindex

exit 0


Some 'for' loops would make this code substantially smaller, and if someone makes a suggestion I'll try it.
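
Here is one way the loops could look. This is only a sketch, not a tested replacement: the function name bucket_by_first_char is made up for illustration, it assumes mktemp is available, and it stages the buckets in a hidden folder inside the target directory instead of /tmp/myindex so that every move stays on one filesystem.

```shell
#!/bin/sh
# Loop-based sketch of the script above. bucket_by_first_char is a
# hypothetical helper name; it is not part of any standard tool.

bucket_by_first_char() (
    # The body runs in a subshell, so the caller's working directory
    # is left untouched.
    LC_ALL=C; export LC_ALL          # keep [a-zA-Z] limited to ASCII letters
    cd "${1:-.}" || exit 1

    # Stage inside the target directory: the leading dot hides the staging
    # folder from the ./* glob below.
    stage=$(mktemp -d ./.myindex.XXXXXX) || exit 1

    # Create all 28 buckets, as the original script does.
    for bucket in 0-9 misc a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        mkdir "$stage/$bucket" || exit 1
    done

    for entry in ./*; do
        # In an empty directory the glob stays literal; skip that case.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry#./}
        case $name in
            [0-9]*)    bucket=0-9 ;;
            [a-zA-Z]*) bucket=$(printf '%.1s' "$name" | tr 'A-Z' 'a-z') ;;
            *)         bucket=misc ;;
        esac
        mv "$entry" "$stage/$bucket/" || exit 1
    done

    # Move the filled buckets back and drop the empty staging folder.
    mv "$stage"/* ./ && rmdir "$stage"
)
```

Calling bucket_by_first_char /path/to/bulky/folder should then leave the same 28 subfolders as the long version. Like the original, it ignores names that start with a dot.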

Compiling the vt1211.ko Kernel Module

I basically followed the directions from VIA, except that I used the latest patch file for kernel 2.6.17, available at Lars Ekman's website (http://hem.bredband.net/ekmlar/vt1211.html), instead of the one VIA provides in their VIA FC5 Hardware Monitor Application Notes (www.viaarena.com). The patch only touches a small part of the kernel, but it is designed to be applied to the whole kernel tree, so you need the full source. If you have 2.6.17-1.2174_FC5, the module I compiled is already available on Lars Ekman's website.


Below is a copy & paste of the instructions from VIA, available on VIA Arena, with annotations made where I remember their instructions didn't work. Fill in the mentioned kernel name with your kernel. If I ever have to do this again, I'll make my own instructions.


I got the corresponding kernel source package from YUM instead of downloading the RPM. The package is called kernel-devel. This puts files in /usr/src/redhat. Note that when you're done, you can use YUM to remove this package again. I had to do a few things to get rpmbuild installed and working (I remember it needed a few software packages from YUM), but then I could make it through the first few lines.



#rpm -ivh kernel-2.6.15-1.2054_FC5.src.rpm
#cd /usr/src/redhat/SPECS
#rpmbuild -bp --target=i686 kernel-2.6.spec
#cd /usr/src/redhat/BUILD


Then they want you to move the kernel source tree to /usr/src/{name of kernel} and then apply the patch.



You can find the kernel source directory in path
/usr/src/redhat/BUILD/kernel-2.6.15/linux-2.6.15.i686. Move the
kernel source directory to path /usr/src/kernels and patch
the os default kernel.
#cd /usr/src/redhat/BUILD/kernel-2.6.15
#mv linux-2.6.15.i686 /usr/src/linux-2.6.15-1.2054_FC5
#cp vt1211_FC5.patch /usr/src
#cd /usr/src
#patch -p0 < vt1211_FC5.patch


Now you have the new kernel tree, and you need to compile the kernel with the vt1211 source as a module to get the vt1211.ko.



Select needed item and rebuild the patched kernel
Edit the Makefile in directory linux-2.6.15-1.2054_FC5. Find the string
“EXTRAVERSION = -prep” and modify it to “EXTRAVERSION = -1.2054_FC5”.
#cd linux-2.6.15-1.2054_FC5
#cp /boot/config-2.6.15-1.2054_FC5 .config
#make menuconfig
Device Drivers ---> Hardware Monitoring support --->
[M] Hardware Monitoring support
[M] VT1211
After set the kernel item completely and save it, we can rebuild the kernel
source. When the compiling module procedure is completed, you can find
the “vt1211.ko” module to path
/lib/modules/2.6.15-1.2054_FC5/kernel/drivers/hwmon
#make
#cp linux-2.6.15-1.2054_FC5/drivers/hwmon/vt1211.ko /lib/modules/2.6.15-1.2054_FC5/kernel/drivers/hwmon
#depmod -a
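
The EXTRAVERSION edit in the VIA instructions above could also be scripted instead of done by hand. This is a rough sketch rather than part of VIA's procedure: set_extraversion is a made-up helper name, and it assumes the Makefile has a single line starting with "EXTRAVERSION =" and that the new suffix contains no "|" or "&" characters.

```shell
#!/bin/sh
# set_extraversion is a hypothetical helper name used for illustration.
# Usage: set_extraversion <path to Makefile> <new suffix, e.g. -1.2054_FC5>

set_extraversion() {
    makefile=$1
    suffix=$2
    # Rewrite only the line that starts with "EXTRAVERSION =", whatever
    # value it currently holds, then swap the edited copy into place.
    sed "s|^EXTRAVERSION *=.*|EXTRAVERSION = $suffix|" "$makefile" \
        > "$makefile.new" && mv "$makefile.new" "$makefile"
}
```

For the kernel in these instructions the call would be set_extraversion Makefile -1.2054_FC5, run from inside the source directory. The suffix should make the tree's version string match what uname -r reports on the running kernel.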


The menu-driven menuconfig didn't work for me. If you run make config, you can do the same thing in plain text mode, one question at a time. If you just run make, I think it will ask whether you want each option built into the kernel or built as a module whenever the .config file you copied from your boot folder doesn't specify it. Make sure you select Module (type the letter 'm') for the vt1211 objects. Then, about six hours later, it'll finally make it through all the source and you'll get your .ko file. Copy it out of there and into your /lib/modules tree and then run depmod as in the instructions above.


When you're done, you can go back to the /usr/src/{linux kernel name} and run make clean, or you can remove that whole directory if you no longer need it. You can also remove the kernel-devel package, rpmbuild, etc.

Update

Linux kernel 2.6.21 now includes an even better version of the vt1211 module! This kernel is the default on RedHat Fedora 7 Linux. Finally time to upgrade my server installation.


Monday, September 18, 2006

Create md5 sum files in Linux

I learned that the Linux shell can quickly generate md5 checksums for an entire directory and write them to an md5 file. The command is of the form:


cd [directory to generate check sums for]
find ./ -type f ! -name checksums.md5 -printf "%p\0" | xargs -0 -n 1 md5sum > ./checksums.md5

Then, to check the files in a new location (e.g. on a CD-ROM), use the command:


md5sum -c ./checksums.md5

Or there are other programs for validating the files with the checksums, like wxChecksums which is available for Windows.


I wanted this because the graphical wxChecksums was too difficult to compile, and didn't seem worth the effort. Although I'm sure a couple of backwards-compatibility packages for gcc would fix this problem, the command-line method makes it straightforward to generate many checksums with a single command in text mode.
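
The generate and verify steps above can be wrapped in two small shell functions. Treat this as a sketch under a few assumptions: make_checksums and verify_checksums are made-up names, it relies on md5sum plus the -print0 and -0 options of find and xargs, and it drops -n 1 so that xargs can pass many files to each md5sum run.

```shell
#!/bin/sh
# make_checksums and verify_checksums are hypothetical helper names.

# Write checksums.md5 at the top of the given directory.
make_checksums() (
    cd "$1" || exit 1
    # Regular files only, skipping the checksum file itself; NUL-separated
    # names keep paths that contain spaces intact.
    find . -type f ! -name checksums.md5 -print0 \
        | xargs -0 md5sum > ./checksums.md5
)

# Re-check every file listed in checksums.md5; the exit status is
# non-zero if any file is missing or has changed.
verify_checksums() (
    cd "$1" || exit 1
    md5sum -c ./checksums.md5
)
```

A typical run would be make_checksums /srv/data before burning a disc and verify_checksums /media/cdrom afterwards, where both paths are just examples.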

Sunday, September 17, 2006

Python BitTorrent Daemon

If you're running a server that leeches/seeds torrents and it's a headless machine, you might not want to use a BitTorrent client with a graphical interface. The main reason is that the interface needs a lot of CPU cycles to keep itself updated, and if you're not even looking, why bother? Azureus is the most powerful BitTorrent client, but it's also the most CPU-intensive.


On my Mini-ITX server, Azureus running on a VNC Server (xvnc) left the CPU at 60% idle or less when the GUI was open or when traffic was heavy (30-40% of cycles went to Azureus), and Azureus is also known for hogging memory (a weakness of Java). After I switched to just letting btseed run, total CPU usage is about 8% (92% idle). This means that the system is more responsive to new tasks.


A Python BitTorrent service (Daemon) either comes with Fedora or is part of the packages installed as dependencies of the YUM bittorrent-gui package. There is a Daemon set up called btseed, which is capable of monitoring .torrent files in a specified directory and downloading them automatically. There's essentially no documentation I can find about how to use this correctly, but I got it working OK.


The /etc/init.d/btseed script should be used to start/stop/restart the Daemon. This script launches btseed, which is an incarnation of the launchmany-console Python program. The file /etc/sysconfig/bittorrent should be configured to your preferences.


The Daemon supports UPnP, which makes it pretty nice. Here is my /etc/sysconfig/bittorrent configuration file:


SEEDDIR=/srv/bittorrent/data
SEEDOPTS="--max_upload_rate 50 --display_interval 30 --minport 49900 --maxport 52000"
SEEDLOG=/var/log/bittorrent/btseed.log
TRACKPORT=6969
TRACKDIR=/srv/bittorrent/data
TRACKSTATEFILE=/srv/bittorrent/state/bttrack
TRACKLOG=/var/log/bittorrent/bttrack.log
TRACKOPTS="--min_time_between_log_flushes 4.0 --show_names 1 --hupmonitor 1"

Note that the TRACK* settings are not important unless you're running a tracker.

Get a free domain name (IP alias) for your subnet

DynDNS.com provides a free IP Address Alias for everyone. The only problem is, I have a dynamic IP address, so I had to also get ddclient (for Linux) working. It was pretty simple: a package was available on YUM, and it installs itself by default as a daemon (service). All you need to do is set up the /etc/ddclient.conf file to work correctly. You need to do two things with ddclient: discover your IP address, and check in with DynDNS. I basically just had to uncomment some lines and fill in the details. DynDNS has a facility for giving you your WAN IP address. Your router knows it too, but it may be too hard to get at.