Monday, December 08, 2008

Nikon D-50 Cleaning Dust from Sensor

So, on some bright long-exposure shots, I noticed some consistent dust spots. I took some slow-shutter, out-of-focus test shots to confirm. I changed lenses thinking it could be one of the internal elements, but nothing changed---the dirt was on the sensor. I decided to clean out the inside of the body. Some dust had accumulated on the reflex mirror and the viewfinder window. However, if you move the mirror out of the way (ever so carefully!), there is a door at the rear called the shutter curtain. I thought I was out of luck and needed professional help, until (completely by chance) I came across a section in the User Manual about cleaning the low-pass filter.

You can see the dirt at about the 4:30 position in this test shot.

I knew that "low-pass filter" was a fancy term for an IR filter that also provides an air-tight seal over the CCD sensor. There's a procedure in the manual for locking the reflex mirror and the shutter curtain open, exposing the low-pass filter and sensor. I highly recommend reading those instructions; they're very easy to follow. They recommend not touching the low-pass filter, and I totally agree.

I followed the instructions for locking the mirror and curtain open, and when I removed the body cap, I could clearly see the debris on the sensor. I simply blew some air in there, and the debris came free. I recommend blowing air while holding the body upside-down, so dislodged dust falls out instead of resettling. Make sure you have plenty of light.

Nikon also has directions for setting up test shots to see whether you have a dust problem. They are geared toward Nikon's dust-removal software, but they also serve as a diagnostic.

Test shot following Nikon's directions after cleaning


You turn the camera off to exit the special lock mode. Presto! No more dirt on my photos!

Sunday, April 27, 2008

MD5SUM reorganization revisited

I was running my shell script on a collection of over 10,000 files, and it had been running for over three days. The shell interpreter is just too slow for this amount of work. So I re-wrote the script in Perl, and it is now lightning fast because I can take advantage of Perl hashes (associative arrays).
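The win, concretely: each file's computed checksum becomes a single constant-time hash lookup instead of a scan over the entire checksum list. A toy illustration (not from the script below; the path is made up):

my $md5 = "d41d8cd98f00b204e9800998ecf8427e";   # the md5sum of empty input
my %sums = ( $md5 => "docs/empty.txt" );
# One constant-time lookup per file, instead of re-scanning the whole
# checksum list for every file as the shell version effectively did.
print "found: $sums{$md5}\n" if exists $sums{$md5};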



#!/usr/bin/perl

#
# This file takes a checksum file (output from md5sum utility)
# and attempts to reorganize the files in the directory to
# match the listing in the md5 file.
#
# Files not found in the md5 input are left alone.
# Files already in the right place are left alone.
# Other files have their checksums computed and, if they are found
# in the md5 input, they are moved to the appropriate location.
#
# WARNING: It confuses duplicate files!!!
#
use File::Basename;

if ( $#ARGV < 0 ) {
    print "Usage: $0 [checksums file] [reorg path]\n";
    exit;
}

my $sumsFile = $ARGV[0];
my $reorgPath = defined $ARGV[1] ? $ARGV[1] : ".";

open(SUMSFILE, $sumsFile) or die "Cannot open $sumsFile: $!\n";
my @lines = <SUMSFILE>;
close(SUMSFILE);

# Build the hash table: 32-character MD5 -> desired path
my %sums;
foreach my $line (@lines) {
    chomp($line);
    $sums{substr($line,0,32)} = substr($line,34);
}

print "Read in ".($#lines+1)." checksums and paths.\n";

&reorg($reorgPath);

sub reorg {
    my $dir = shift;

    opendir DIR, $dir or return;
    my @contents =
        map "$dir/$_",
        sort grep !/^\.\.?$/,
        readdir DIR;
    closedir DIR;

    foreach my $file (@contents) {
        if ( -d $file ) {
            &reorg($file);
        } else {
            my $tmpLine = `md5sum "$file"`;
            chomp($tmpLine);
            my $tmpHash = substr($tmpLine,0,32);
            my $tmpPath = substr($tmpLine,34);

            # Look up the hash; if the paths differ, move the file
            if (defined $sums{$tmpHash}) {
                if ($tmpPath ne $sums{$tmpHash}) {
                    print "Moving ".$tmpPath." to ".$sums{$tmpHash}."\n";

                    # Make directory for move if it doesn't exist
                    my @args = ("mkdir", "-p", dirname($sums{$tmpHash}));
                    system(@args) == 0
                        or print STDERR "Couldn't create directory @args: $!\n";
                    # Move file to path found in checksum file
                    @args = ("mv", $tmpPath, $sums{$tmpHash});
                    system(@args) == 0 or print STDERR "Couldn't move: $!\n";
                } else {
                    print "File $tmpPath is already in place.\n";
                }
            } else {
                print "No hash for $tmpPath found.\n";
            }
        }
    }
    return;
}

exit;


A few things still need to be fixed: inserting a checksum into the hash table should fail if the key is already present (a hash collision or duplicate files); silently overwriting means duplicates get reorganized arbitrarily. Also, a reverse lookup in the hash table (path to checksum) could avoid computing the MD5 sum of files that are already in place, but at what computational cost?
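For the first of those fixes, a minimal sketch of a duplicate-checksum guard for the read-in loop above (same variable names as the script; a sketch, not tested against the edge cases it is meant to catch):

foreach my $line (@lines) {
    chomp($line);
    my $hash = substr($line, 0, 32);
    die "Duplicate checksum $hash (duplicate files?)\n" if exists $sums{$hash};
    $sums{$hash} = substr($line, 34);
}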

Tuesday, March 04, 2008

Data/Documents backup script

I used to always create a directory under my home called data, where I'd store the files I need to do my work. Then this was the only directory I had to synchronize between machines with Unison (http://www.cis.upenn.edu/~bcpierce/unison/). Since Fedora now includes a directory called Documents with a similar purpose (as does Mac OS X), I've simply renamed this special directory from data to Documents.
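A minimal Unison invocation for this setup might look like the following, assuming another machine (here called "laptop", a made-up hostname) reachable over SSH with the same layout:

unison ~/Documents ssh://laptop//home/ryan/Documents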



That brings me to the topic of this article: what happens if I mess up a file in this directory and then synchronize the mistake to the other mirrors? Then none of the computers I work on has a good copy anymore. The answer is rdiff-backup (http://www.nongnu.org/rdiff-backup/). I used to have a per-user cron job call rdiff-backup to update an incremental backup repository on my behalf, but I wanted this to be more mechanical (what if I add a new user, etc.). So I wrote the following script, which replaces an old script of mine that only made a mirror with rsync.



Note that, in my configuration, the incremental backup repo is on the same device as the original files, but both are mirrored in an overnight backup job. This means I have (at least) four mirrors of the same files, which is sort of a waste, but I don't worry about it since it's only about 200 MB per user. Ideally, you would keep just the local mirror on the disk and the rdiff-backup repository on the backup device (or a system across the network). The downside is that if the backup device (or remote system) crashes, you lose the ability to restore a previous increment.
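For the record, restoring an older increment with rdiff-backup looks something like this (the file name and age are hypothetical; the repository path matches the script below):

rdiff-backup -r 3D /opt/backup/ryan/Documents/notes.txt /tmp/notes.txt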



Following is the script for maintaining incremental backup repositories with rdiff-backup for all users. Caution: the backup path should be as read-only as possible, but it is not if the user can log in to the system. In that case, you'll have to count on users not to modify the files, or keep the files on a filesystem they cannot mount. For simple remote access over samba (the kind of access in my setup), you would configure a per-user share (so Joe can't see Susan's backup, for instance) and make each one read-only (so Joe can't corrupt his repository by changing the files there).
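For the samba case, a per-user read-only share might look like this in smb.conf (a sketch; the share name and user are hypothetical, and the path matches the script's BACKUPPATH):

[backup-joe]
    path = /opt/backup/joe
    valid users = joe
    read only = yes
    browseable = no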




#!/bin/sh
#
# Make incremental backups of 'Documents' directories for users
#
# Ryan Helinski

LOGFILE=/var/log/rdiff-backup.log

# Close stdout and stderr
1>&-; 2>&-;

# Direct stdout and stderr to logfile
exec 1>>$LOGFILE; exec 2>>$LOGFILE;

echo Begin $0, `date`;

# For debugging
#set -x

# Script-specific configuration
HOMESPATH=/home
BACKUPPATH=/opt/backup
RDIFF_OPTIONS="--print-statistics"
DOCSDIR=Documents

# Max increment life
# The time interval is an integer followed by the character s, m, h, D, W,
# M, or Y, indicating seconds, minutes, hours, days, weeks, months, or
# years respectively, or a number of these concatenated.
MAX_LIFE=4W

for USERHOME in $HOMESPATH/* ; do

    USER=`stat -c "%U" $USERHOME`;

    if [ -d "$USERHOME/$DOCSDIR" ] ; then
        echo "Updating $BACKUPPATH/$USER/$DOCSDIR";

        # Create backup repo if it doesn't exist
        if [ ! -d "$BACKUPPATH/$USER/$DOCSDIR" ] ; then
            echo "Creating rdiff-backup repo";
            mkdir -p "$BACKUPPATH/$USER/$DOCSDIR";
            chown $USER "$BACKUPPATH/$USER";
            chown $USER "$BACKUPPATH/$USER/$DOCSDIR";
        fi

        # Update repo
        rdiff-backup $RDIFF_OPTIONS "$USERHOME/$DOCSDIR" \
            "$BACKUPPATH/$USER/$DOCSDIR" ;

        # Clean up repo
        #rdiff-backup --remove-older-than $MAX_LIFE --force \
        #    "$BACKUPPATH/$USER/$DOCSDIR" ;

    fi

done

echo End $0, `date`;
echo

exit


To actually remove old increments, uncomment the command in the "Clean up repo" section.

Wednesday, January 30, 2008

Shell Script to Tunnel VNC Sessions over SSH (Improved)

Following is a parameterized shell script which you can use with this syntax:
Usage: vnctunnel [ssh session] [screen] [vnc options]
Example: vnctunnel user@server :1

It's "improved" because it's completely parameterized--you can specify the SSH session, screen, and even vnc options.

(Assuming you copy it to your ~/bin directory and make it executable)

#!/bin/sh
# This is a parameterized script to SSH tunnel a VNC session
# Ryan Helinski

# Local settings
# Change this if you don't like `vncviewer'
VNCVIEWER=vncviewer
# Change this if you use ports near 25900 for something else locally
PORTOFFSET=20000
# Apply extra options to vncviewer
VNCOPTS="--AutoSelect=0 $3"

if [ $# -lt 2 ] ; then
    echo "Usage: $0 [ssh session] [screen] [vnc options]";
    echo "Example: $0 user@server :1";
    exit 1;
fi

SSHPARM=$1;
SCREEN=`echo $2 | cut -d':' -f2`;
SSHPORT=$((SCREEN+5900));
LOCALPORT=$((PORTOFFSET+SSHPORT));

echo "Session: $SSHPARM, Screen: $SCREEN, Port: $SSHPORT"

# Open the tunnel in the background (-f) just long enough to start the
# viewer, then point the viewer at the local end of the tunnel.
# The host::port form assumes a TightVNC/TigerVNC-style viewer.
ssh -f -L $LOCALPORT:localhost:$SSHPORT $SSHPARM sleep 10;
$VNCVIEWER $VNCOPTS localhost::$LOCALPORT

exit
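To install it as described above (assuming you saved the script as vnctunnel, a name I'm making up):

cp vnctunnel ~/bin/ && chmod +x ~/bin/vnctunnel
vnctunnel user@server :1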

Using 7-zip for Archival

I've been using 7-zip to archive large collections of files--mainly college work and documents that have high potential to be compressed. 7-zip has a high compression ratio--higher than bzip2, but also higher than the evil RAR archive everyone seems to like--and it's free software, widely available for GNU+Linux, UNIX and Mac via `p7zip'. However, I've been slightly concerned that it doesn't retain POSIX file modes. For files like text and graphics this doesn't really bother me, but if I were going to archive something containing scripts, or a shared directory where user ownership needs to be preserved, I'd need to retain this extra information.

I had seen the tar-to-7z trick mentioned a couple of times while browsing and thought "oh...". When I actually tried to look for it, examples were scarce. After careful reading of the manuals for tar and 7z, here is the solution:

To encode:
tar cf - data/ | 7za a data.tar.7z -si


To decode:
7za x ~/data.tar.7z -so | tar xf -
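To sanity-check the archive afterwards without unpacking it, 7za's `t' command tests the archive's integrity, and piping through `tar tvf -' lists the contents along with their preserved modes and owners:

7za t data.tar.7z
7za x data.tar.7z -so | tar tvf -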

Monday, January 28, 2008

Trash Management Scripts

Moving files to trash



Today's post is about some trash management scripts I have written for myself and would like to share. They should run on most GNU+Linux environments.


If you're like me, you prefer to mv file ~/.Trash/ rather than rm file, but a plain mv is both dangerous (it silently overwrites any same-named file already in the trash) and hard to type. I created a (very small) script that acts as a command you can use to move files to your trash bin without overwriting anything.


It uses the 'numbered' backup scheme of mv, so the most recently trashed file will have its name intact, and the backups will have the suffix ".~n~" appended, where a higher n means a more recent backup.



#!/bin/sh
#
# Moves files in the command arguments to the trash bin, while keeping
# backups of any files already in that directory.
#
# Uses 'numbered' backup scheme of 'mv', so the most recently trashed
# file will have its name intact, and the backups will have the suffix
# .~n~ appended where the higher the n, the more recent the backup.
#
# It uses the basename of the file so that no (absolute or relative)
# directory is preserved.
#
TRASH="$HOME/.Trash";

while [ $# -gt 0 ];
do
    # Quote both the argument and its basename so names with spaces survive
    mv --backup=numbered "$1" "$TRASH/$(basename "$1")";

    shift;
done


After you copy this file to your favorite bin directory and make it executable, you can use the following syntax

trash file1 file2 path/to/file3 path/to/file4

And, even if file1 and file3 have the same name, you'll still be able to find both in your trash bin.


Rounding up and deleting trash automatically



If your trash is anything like mine, the probability that a given file in there is not actually trash grows every day. For this reason, I wanted to tag when I threw something out, so I could tell how old it was and maybe delete everything older than a specific number of days.


The solution I came up with was to create siblings of the .Trash directory, named with UNIX time-stamps. In conjunction with a cron task, yesterday's trash ends up in a directory at ~/.Trash-XXXXXXXXXX, where the X's are the time-stamp for 4:00 am today. In this manner, you'll have a bin for each day you threw something out. The script follows.



#!/bin/sh
#
# The first step is to move all files under ~/.Trash, other than
# ., .., and .#bin-n into a new trash bin for yesterday.
#
#

NEWBIN=`date +%s`;
NEWBINPATH="$HOME/.Trash-$NEWBIN";
OLDESTBIN="20"; # days

OLDESTBINTIME=`date -d "now - $OLDESTBIN days" +%s`;
OLDESTBINPATH="$HOME/.Trash-$OLDESTBINTIME";

echo "Moving current trash to $NEWBINPATH";
mkdir $NEWBINPATH;
mv $HOME/.Trash/* $HOME/.Trash/.[!.]* $NEWBINPATH/;

for BIN in $HOME/.Trash-*; do

    STAMP=`echo "$BIN" | cut -d'-' -f2`;
    # echo $BIN $STAMP $OLDESTBINTIME;
    if [ $STAMP -lt $OLDESTBINTIME ] ; then
        echo "Deleting $BIN";
        # rm -Rf $BIN;
    fi

done


So I copied this file into my ~/bin/ directory and used crontab -e to add the following line to my crontab:

00 4 * * * /home/ryan/bin/trash-roundup.sh


Right now this script will send errors to crond, which should be delivered to your local mail (accessed using mail). Also, deleting old bins is disabled, since I haven't had a chance to test it thoroughly.


To actually delete old trash, choose a value for the OLDESTBIN variable in the script; this is the longest time that a bin will hang around. Then, un-comment the line with rm in the script.

Friday, January 25, 2008

Using the Seagate FreeAgent as a (periodic) Mirror Backup

I recently purchased a Seagate FreeAgent 100D USB drive. The drive is quite nice: small, quiet, and it has advanced power management. I plugged it into a Fedora 7 installation, and it came right up.



The first problem I had was the NTFS partition that ships on the device. This is easily fixed with fdisk and mkfs.ext3. Don't forget to use e2label, especially if you have more than one of the same device.
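For reference, the repartition-and-format sequence looks something like this, assuming the drive shows up as /dev/sdb (check dmesg to confirm yours before touching anything):

fdisk /dev/sdb            # delete the NTFS partition, create a Linux one
mkfs.ext3 /dev/sdb1
e2label /dev/sdb1 FreeAgent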



The next thing you want to do is determine the "id" of the device, so that you can identify it uniquely. The problem is that udev names devices in the order they are connected (/dev/sda, /dev/sdb, ...). To remedy this, use the symbolic links that udev creates under /dev/disk/ to identify the disk by one of the following means:




  • /dev/disk/by-id/ - Uses connection, make, model, and serial number

  • /dev/disk/by-uuid/ - Uses the UUID given to the partition when it was created

  • /dev/disk/by-label/ - If you have used e2label, this is more meaningful than the UUID

  • /dev/disk/by-path/ - Uses the physical connection path (bus and port), which changes if you plug the drive into a different port; this is the one you do not want to use!
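A quick way to see which of these names exist for your disk (the by-id link is what goes into the fstab entry below):

ls -l /dev/disk/by-id/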



I want to prevent normal users (including myself) from modifying the contents of the backup, so the plan is to keep the partition unmounted and not allow users to mount it. Therefore, this is the appropriate entry for me in /etc/fstab:



/dev/disk/by-id/usb-Seagate_FreeAgentDesktop_30DFK39D-0:0-part1 /media/FreeAgent        ext3    defaults        1 2


Note this will require the device to be present and consistent at boot-time. If you don't need that, change the last two numbers in the fstab entry to zero. (To keep the partition unmounted at boot, as planned above, you would also replace defaults with noauto.)

Before automating the backup, I had a problem: the drive spins down automatically, so a bare command like rsync may terminate with an I/O error simply because the drive timed out while spinning back up. This happened to me with ls, but it doesn't always happen -- so you should be paranoid about the power mode.

The solution is a utility called sdparm, which is made to control SCSI (and SATA) disks. It is available for Fedora in a yum package called simply sdparm, and it lets you send the drive a command to "start" (spin up) or "stop" (spin down).
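For example, to spin the drive up by hand (using the device id from my fstab entry above; yours will differ):

/usr/bin/sdparm --command=start /dev/disk/by-id/usb-Seagate_FreeAgentDesktop_30DFK39D-0:0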

Then, the final task is to create a backup procedure that is invoked by crond. Rather than writing something to /etc/crontab, I added the following file to /etc/cron.daily/. Make sure it has execute permissions, and call it (as root) a couple times to make sure it's working.

#!/bin/sh
#
# Backup to a USB disk
#

# Env variables
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# OK to change these
LOGFILE=/var/log/backup-usb.log
DISKDEVICE=/dev/disk/by-id/usb-Seagate_FreeAgentDesktop_30DFK39D-0\:0
DISKPARTITION=$DISKDEVICE-part1
MOUNTPOINT=/media/FreeAgent

# Leave these alone
DEVICEFILE=`readlink -f $DISKDEVICE`
PARTITIONFILE=`readlink -f $DISKPARTITION`

# Close stdout
1>&-;
# Close stderr
2>&-;

# Log needs to be rotated manually
#mv $LOGFILE $LOGFILE.1

# Direct stdout and stderr to logfile
exec 1>>$LOGFILE;
exec 2>>$LOGFILE;

echo Begin $0, `date`;

# Preliminary work (get backup device online)
if [ -e $DISKDEVICE ] ; then
    echo "Device exists (is connected).";

    # /usr/bin/sdparm --all $DISKDEVICE

    echo "Sending start (wake up) signal...";
    /usr/bin/sdparm --command=start $DISKDEVICE
    if [ "$?" -eq "0" ] ; then
        echo "Success";
    else
        echo "Failed to start device";
        exit;
    fi

    # Get backup partition mounted
    if [ `grep $PARTITIONFILE /etc/mtab | wc -l` -le "0" ] ; then
        echo "Device $PARTITIONFILE is not mounted!";
        mount $PARTITIONFILE $MOUNTPOINT;
        if [ "$?" -eq "0" ] ; then
            echo "Mounted OK.";
        else
            echo "Failed to mount.";
            exit;
        fi
    else
        echo "Partition $PARTITIONFILE is already mounted.";
    fi
else
    echo "Device doesn't exist, must not be connected!";
    exit;
fi

# Backup procedure
#
# For now I'm just using RSYNC because these files don't often change
#
rsync --verbose --itemize-changes \
--archive --hard-links --partial --delete-before \
/opt/srv/ $MOUNTPOINT/srv/ && \
date && \
df -h $MOUNTPOINT && \
echo ;

# NOTE might also want to un-mount the device at this point
# to prevent users from modifying it directly !
echo "Unmounting $PARTITIONFILE"
umount $PARTITIONFILE
echo "Stopping device $DISKDEVICE"
/usr/bin/sdparm --command=stop $DISKDEVICE
# /usr/bin/sdparm --all $DISKDEVICE && \

echo End $0, `date`;


The variables section and the rsync command have to be updated to fit your installation and your backup scheme.



Again, this worked well for me; you may have to make adjustments.

Friday, January 18, 2008

Shell Script to Reorganize a Mirror to Match a Newer Mirror

This script has a very specific purpose, but you've probably run into a situation where it would have come in handy. The idea: you have two mirrors of a bunch of files, and you reorganize (move around and/or rename) files on one mirror. When you then rsync to the other mirror, every file that was merely moved looks like one file deleted and another added.

To use this script, you would first move the files on one mirror, and generate an MD5 sum file, as in:
$ find ./ -type f -exec md5sum {} \; | tee checksums.md5

Then, copy this small file to the root of the other mirror, and invoke the script with the name of the checksums file:
$ md5sum-reorg.sh checksums.md5

The script will process every file under the working directory and attempt to correct its path so that it matches an entry in the new 'checksums.md5'. Files not found in the checksums file are left alone. Files already in place are left alone (obviously), without even computing their md5sum.

Script follows:


#!/bin/bash
#
# This file takes a checksum file (output from md5sum utility)
# and attempts to reorganize the files in the directory to
# match the listing in the md5 file.
#
# Files not found in the md5 input are left alone.
# Files already in the right place are left alone.
# Other files have their checksums computed and, if they are found
# in the md5 input, they are moved to the appropriate location.
#
# WARNING: It confuses duplicate files!!!
#

if [ $# -lt 1 ]; then
    echo "Usage: $0 [checksums file]";
    exit;
fi

declare -a SUMS
declare -a NEWPATHS

exec 10<$1
let count=0

echo "Parsing checksums and paths from input file...";

while read LINE <&10; do
    SUM=`echo "$LINE" | cut -d' ' -f1`;
    NEWPATH=`echo "$LINE" | cut -d' ' -f1 --complement | cut -d' ' -f1 --complement`;
    SUMS[$count]=$SUM;
    NEWPATHS[$count]=$NEWPATH;
    ((count++));
done
# close file
exec 10>&-

echo "Compiling list of files that need to be checked...";

TMPFILE=`mktemp`;
find ./ -type f -printf "%p\n" > $TMPFILE;

exec 10<$TMPFILE
while read OLDFILE <&10; do
    echo "Trying to find new path for $OLDFILE";

    # If the file is already listed at its current path, skip it
    # without computing its checksum
    SKIP=0;
    let count=0;
    while [ $count -lt ${#NEWPATHS[@]} ] ; do
        if [ "${NEWPATHS[$count]}" == "$OLDFILE" ]; then
            echo "File already exists at '${NEWPATHS[$count]}'";
            SKIP=1;
            break;
        fi
        ((count++));
    done

    if [ $SKIP -eq 1 ]; then
        continue; # skip the rest of this iteration
    fi

    echo "Computing checksum of $OLDFILE";
    OLDSUM=`md5sum "$OLDFILE" | cut -d' ' -f1`;

    # Iterate over the pair of arrays until we might find a matching sum
    let count=0;
    while [ "$count" -lt ${#SUMS[@]} ]; do
        SUM=${SUMS[$count]};
        NEWPATH=${NEWPATHS[$count]};

        if [ "$SUM" == "$OLDSUM" ]; then
            if [ "$OLDFILE" != "$NEWPATH" ] ; then
                NEWPARENT=`dirname "$NEWPATH"`;
                if [ ! -d "$NEWPARENT" -a "$NEWPARENT" != "." ]; then
                    echo "Making directory $NEWPARENT";
                    mkdir -p "$NEWPARENT";
                fi
                echo "Moving $OLDFILE to $NEWPATH";
                mv "$OLDFILE" "$NEWPATH";
            else
                echo "Path hasn't changed.";
            fi
            break;
        fi
        ((count++));
    done

done

exec 10>&-

exit



In case it's not clear, this is offered without any warranty or guarantee whatsoever.

Friday, January 11, 2008

Scripts for Moving Large Files to DVD (GNU+Linux environment)

Often, we have tons of files (videos, ISOs, software packages, etc.) that we're done with but don't want to throw away. I have written a few scripts to pack a directory of these kinds of files into volumes (without any compression) so that they can be written to DVD. Each volume gets an MD5 sum file, which serves both as a directory listing for that volume (keep a copy locally and search it with grep) and as a means to verify the disc's integrity.
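For example, to find which volume a file ended up on later (assuming the checksum copies are collected in one directory; the file name is made up):

grep -i 'somefile.avi' *.md5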



The first script attempts to "pack" the files at the current directory level (it's non-recursive) into volumes of a specified size. This is really the only part of this post that warrants a real script, since it's a nested loop. Note that really small files will tend to land in the leftover space on the first volume, so you may want to move them out of the way first.




#!/bin/sh
#
# Script takes files (or directories) in the current working directory
# and moves them to subdirectories for writing out to discs
#
# This allows collections of relatively same-sized files or directories
# of files to be packed into volumes for storage on optical media.
#
# Modified to output a shell script instead of making any changes
# Disk capacity can now be entered

# Defaults
discSize="0";
discDefaultSize="4380000";
discInitNumDef="0";
discInitNum="0";
scriptPathDef="pack.sh";
diskPath="disks";

echo -n "Enter the volume number at which to start [$discInitNumDef]: ";
read discNumOffset;

if [ "$discNumOffset" == "" ] ;
then
discNumOffset=$discInitNumDef;
fi
echo $discNumOffset;

echo -n "Enter the maximum capacity of the media [$discDefaultSize]: ";
read discMaxSize;

if [ "$discMaxSize" == "" ] ;
then
discMaxSize=$discDefaultSize;
fi
echo $discMaxSize;

echo -n "A shell script will be output, move files now? [y/N]";
read moveFiles;

if [ "$moveFiles" == "" ];
then
moveFiles="N";
echo "Not going to move files.";
fi

echo -n "Enter the path to save the shell script [$scriptPathDef]: ";
read $scriptPath;

if [ "$scriptPath" == "" ] ;
then
scriptPath=$scriptPathDef;
fi

echo "Going to write shell script to '$scriptPath'.";

# Declare disk size array
diskSizes[0]=0;
arraySize=1;

echo "#!/bin/sh" > $scriptPath;

if [ ! -d "$diskPath" ];
then
echo "mkdir \"$diskPath\";" >> $scriptPath;
fi

if [ ! -d "$diskPath/`expr $discNum + $discNumOffSet`" ] ;
then
echo "mkdir \"$diskPath/`expr $discInitNum + $discNumOffset`\";" >> $scriptPath;
fi

for file in * ; do

    if [ "$file" != "$diskPath" -a "$file" != "$scriptPath" ] ; then
        echo "$file";

        discNum=$discInitNum;

        newSize=`du -s "$file" | cut -f1`;
        discSize=${diskSizes[$discNum]};

        echo "newSize = $newSize, discSize = $discSize";

        if [ $newSize -gt $discMaxSize ] ; then
            echo "$file is larger than the disc size, skipping it.";
        else
            # First fit: walk the volumes until one has room left
            while [ `expr $discSize + $newSize` -gt $discMaxSize ]
            do
                echo "Won't fit in volume `expr $discNum + $discNumOffset`: $discSize + $newSize > $discMaxSize";

                discNum=`expr $discNum + 1`;

                if [ $discNum -ge $arraySize ] ; then
                    diskSizes=( ${diskSizes[@]} 0 );
                    arraySize=`expr $arraySize + 1`;

                    if [ ! -d "$diskPath/`expr $discNum + $discNumOffset`" ]; then
                        echo "mkdir \"$diskPath/`expr $discNum + $discNumOffset`\";" >> $scriptPath;
                    fi
                fi

                discSize=${diskSizes[$discNum]};
            done

            echo "Going to move $file into volume $discNum, bringing it to `expr $discSize + $newSize` kb";

            echo "mv \"$file\" \"$diskPath/`expr $discNum + $discNumOffset`/\";" >> $scriptPath;

            # Update disc size entry
            diskSizes[$discNum]=`expr $discSize + $newSize`;

        fi

    fi

done

echo "Disk sizes:";

for DISC in ${diskSizes[@]} ; do
    echo "$DISC kb";
done

exit;


The next, simpler script creates checksum files (for later use with md5sum -c ...) in each volume, and offers to save a copy of each checksum file to another location.




#!/bin/sh
#
# Create checksum files for disk volumes generated by 'disk-pack'.
# These files allow the fidelity of the optical media to be
# evaluated, and allow the contents of the disk to be catalogued.
#
# This file should not change any files; only add new files.
#

CATDIRDEF="`pwd`";
echo -n "Path to save a duplicate of the MD5 checksums [$CATDIRDEF]: ";
read CATDIR;

if [ "$CATDIR" == "" ];
then
CATDIR=$CATDIRDEF;
fi

echo "Saving duplicate checksums in '$CATDIR'";

if [ ! -d "$CATDIR" ];
then
echo "Directory doesn't exist.";
exit;
fi

PREFIXDEF="disk";
echo -n "Prefix to use in checksum file names [$PREFIXDEF]: ";
read PREFIX;

if [ "$PREFIX" == "" ];
then
PREFIX=PREFIXDEF;
fi
echo "Using prefix '$PREFIX'.";

for DISK in [0-9]* ; do

    if [ "$DISK" != "." -a "$DISK" != ".." ] ; then
        echo "Processing volume $DISK";
        cd $DISK;
        find . -type f -exec md5sum {} \; | tee ../tempsums.md5;
        cd ..;

        if [ -e "$CATDIR/$PREFIX$DISK.md5" ]; then
            echo "WARNING: Catalog file already exists, using alternate name.";
            NUM="0";
            while [ -e "$CATDIR/$PREFIX$DISK-$NUM.md5" ]; do
                NUM=`expr $NUM + 1`;
            done
            cp tempsums.md5 $CATDIR/$PREFIX$DISK-$NUM.md5;
        else
            cp tempsums.md5 $CATDIR/$PREFIX$DISK.md5;
        fi

        if [ -e "$DISK/$PREFIX$DISK.md5" ]; then
            echo "WARNING: File $DISK/$PREFIX$DISK.md5 already exists, using alternate name.";
            NUM="0";
            while [ -e "$DISK/$PREFIX$DISK-$NUM.md5" ]; do
                NUM=`expr $NUM + 1`;
            done
            mv tempsums.md5 $DISK/$PREFIX$DISK-$NUM.md5;
        else
            mv tempsums.md5 $DISK/$PREFIX$DISK.md5;
        fi
    fi

done


Finally, you're ready to put these volumes onto optical media (having minimized internal fragmentation, captured a catalog of the files, and taken an extra step to preserve integrity). You can use your favorite method, but when there are many volumes (more than three, say), I prefer the following steps.



The following command, if you have genisoimage, will create a .iso file for the directory '40', and the volume will be named "Volume40" when you mount it.


genisoimage -o volume40.iso -J -r -V Volume40 40/


After you have a .iso file, you're almost ready to burn. Always, always, always mount the ISO image (mount -o loop -t iso9660 volume40.iso isotest/), enter it, and check some of the MD5 sums to make sure you have a good .iso file! If files in the ISO seem corrupted, check the man page for genisoimage and make sure you're passing the command-line options correctly.



If you're familiar with cdrecord, note that it is now provided by wodim. You need to be root. The command looks like:

wodim -v -eject speed=8 dev='/dev/scd0' volume40.iso


Then, before I delete anything, I always insert the disc, preferably into another optical drive, and run md5sum -c volume40.md5. Once you know you have an exact copy, you can put the disc in a case and delete the originals. Note I'm assuming that if the disc decays, the files can be found again on the Internet -- make sure you have even more redundancy if these are your personal files!

New Split Directory Script

Follow-up to: "Making directory traversals more efficient"

Back at the beginning, I posted a lengthy script to split up a congested directory alphabetically. Recently I needed it again, but needed it to be smarter, so I re-wrote it. Also, I figured out how to insert code into Blogger. Enjoy.




#!/bin/sh
#
# by Ryan Helinski, January 2008
#
# This is the second revision of a script that should be used
# when there are too many files or directories at a single level
# on the file system.
#
# It now recognizes the word, "the", and that the name should
# be alphabetized by the words following.
#
# A script is output so the changes can be reviewed before
# any are made.
#
# The script should add to existing bin directories if they
# already exist.
#
# A further improvement would be to allow the split to be
# multi-level.

BINS=(0-9 a b c d e f g h i j k l m n o p q r s t u v w x y z);
BIN_EXPS=(0-9 Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz);

SCRIPT_FILE=".script.sh";

echo "#!/bin/sh" > $SCRIPT_FILE;

for BIN in ${BINS[*]} ; do
    if [ -d $BIN ]; then
        echo "mv $BIN .$BIN" >> $SCRIPT_FILE;
    else
        echo "mkdir .$BIN" >> $SCRIPT_FILE;
    fi
done

# Names beginning with "the" are binned by the word that follows
INDEX="0";
while [ "$INDEX" -lt "${#BINS[*]}" ]; do
    echo "mv [Tt][Hh][Ee]\ [${BIN_EXPS[$INDEX]}]* .${BINS[$INDEX]}/" >> $SCRIPT_FILE;
    INDEX=`expr $INDEX + 1`;
done

INDEX="0";
while [ "$INDEX" -lt "${#BINS[*]}" ]; do
    echo "mv [${BIN_EXPS[$INDEX]}]* .${BINS[$INDEX]}/" >> $SCRIPT_FILE;
    INDEX=`expr $INDEX + 1`;
done

for BIN in ${BINS[*]} ; do
    echo "mv .$BIN $BIN" >> $SCRIPT_FILE;
done

ANSWER="";
while [ "$ANSWER" != "yes" -a "$ANSWER" != "no" ]; do
    echo "Script written to \"$SCRIPT_FILE\", execute now? (yes, no)";
    read ANSWER;
done

if [ "$ANSWER" == "yes" ]; then
    sh $SCRIPT_FILE;

    ANSWER="";
    while [ "$ANSWER" != "yes" -a "$ANSWER" != "no" ]; do
        echo "Delete script file? (yes, no)";
        read ANSWER;
    done

    if [ "$ANSWER" == "yes" ]; then
        rm "$SCRIPT_FILE";
    fi
fi

exit;