Friday, January 11, 2008

Scripts for Moving Large Files to DVD (GNU+Linux environment)

Often, we have tons of files (videos, ISO's, software packages, etc.) that we're done with but don't want to throw away. I have written a few scripts to pack a directory of these kinds of files up into volumes (without any kind of compression) so that they can be written to DVD. Each volume should have an MD5 sum file generated which provides both a directory for that volume (since you can keep a copy locally and search using grep) and a means to verify its integrity.



The first script attempts to "pack" files at the current directory level (it's non-recursive) into volumes of a specified size. This is really the only part of this post that warrants a real script since it's a nested loop. Note that really small files will always end up in the extra space on the first volume, so you may want to move them first.




#!/bin/sh
#
# Script takes files (or directories) in the current working directory
# and moves them to subdirectories for writing out to discs
#
# This allows collections of relatively same-sized files or directories
# of files to be packed into volumes for storage on optical media.
#
# Modified to output a shell script instead of making any changes
# Disk capacity can now be entered

# Defaults
discSize="0";
discDefaultSize="4380000";
discInitNumDef="0";
discInitNum="0";
scriptPathDef="pack.sh";
diskPath="disks";

echo -n "Enter the volume number at which to start [$discInitNumDef]: ";
read discNumOffset;

if [ "$discNumOffset" == "" ] ;
then
discNumOffset=$discInitNumDef;
fi
echo $discNumOffset;

echo -n "Enter the maximum capacity of the media [$discDefaultSize]: ";
read discMaxSize;

if [ "$discMaxSize" == "" ] ;
then
discMaxSize=$discDefaultSize;
fi
echo $discMaxSize;

echo -n "A shell script will be output, move files now? [y/N]";
read moveFiles;

if [ "$moveFiles" == "" ];
then
moveFiles="N";
echo "Not going to move files.";
fi

echo -n "Enter the path to save the shell script [$scriptPathDef]: ";
read $scriptPath;

if [ "$scriptPath" == "" ] ;
then
scriptPath=$scriptPathDef;
fi

echo "Going to write shell script to '$scriptPath'.";

# Declare disk size array
diskSizes[0]=0;
arraySize=1;

echo "#!/bin/sh" > $scriptPath;

if [ ! -d "$diskPath" ];
then
echo "mkdir \"$diskPath\";" >> $scriptPath;
fi

if [ ! -d "$diskPath/`expr $discNum + $discNumOffSet`" ] ;
then
echo "mkdir \"$diskPath/`expr $discInitNum + $discNumOffset`\";" >> $scriptPath;
fi

for file in * ;
do

if [ "$file" != "$diskPath" -a "$file" != "$scriptPath" ] ;
then
echo "$file";

discNum=$discInitNum;

newSize=`du -s "$file" | cut -f1`;
#discSize=`du -s todisk/$discNum/ | cut -f1`;
#discSize=`expr $diskSizes[$discNum] + $newSize`;
discSize=${diskSizes[$discNum]};

echo "newSize = $newSize, discSize = $discSize";

if [ $newSize -gt $discMaxSize ] ;
then
echo "$file is larger than the disc size, skipping it.";
else
while [ `expr $discSize + $newSize` -gt $discMaxSize ]
do
echo "Won't fit in $discNum + $discNumOffset: $discSize + $newSize > $discMaxSize";

discNum=`expr $discNum + 1`;

if [ $discNum -ge $arraySize ] ;
then
#diskSizes[$diskNum]=0;
diskSizes=( ${diskSizes[@]} 0 );
arraySize=`expr $arraySize + 1`;

if [ ! -d "$diskPath/`expr $discNum + $discNumOffset`" ];
then
echo "mkdir \"$diskPath/`expr $discNum + $discNumOffset`\";" >> $scriptPath;
fi


fi

discSize=${diskSizes[$discNum]};
done

echo "Going to move $file into $discNum to make $discSize kb `expr $discSize + $newSize`";

echo "mv \"$file\" \"$diskPath/`expr $discNum + $discNumOffset`/\";" >> $scriptPath;

# Update disc size entry
diskSizes[$discNum]=`expr $discSize + $newSize`;

fi

fi

done

echo "Disk sizes:";

for DISC in ${diskSizes[@]}
do
echo "$DISC kb";
done

exit;


The next, albeit more simple script creates checksum files (for later use with md5sum -c ...) in each volume and provides the option to save a copy of the checksum file to another location.




#!/bin/sh
#
# Create checksum files for disk volumes generated by 'disk-pack'.
# These files allow the fidelity of the optical media to be
# evaluated, and allow the contents of the disk to be catalogued.
#
# This file should not change any files; only add new files.
#

CATDIRDEF="`pwd`";
echo -n "Path to save a duplicate of the MD5 checksums [$CATDIRDEF]: ";
read CATDIR;

if [ "$CATDIR" == "" ];
then
CATDIR=$CATDIRDEF;
fi

echo "Saving duplicate checksums in '$CATDIR'";

if [ ! -d "$CATDIR" ];
then
echo "Directory doesn't exist.";
exit;
fi

PREFIXDEF="disk";
echo -n "Prefix to use in checksum file names [$PREFIXDEF]: ";
read PREFIX;

if [ "$PREFIX" == "" ];
then
PREFIX=PREFIXDEF;
fi
echo "Using prefix '$PREFIX'.";

for DISK in [0-9]* ;
do

if [ "$DISK" != "." -a "$DISK" != ".." ]
then
echo "Processing volume $DISK";
cd $DISK;
find . -type f -exec md5sum {} \; | tee ../tempsums.md5;
cd ..;

if [ -e "$CATDIR/$PREFIX$DISK.md5" ];
then
echo "WARNING: Catalog file already exists, using alternate name.";
NUM="0";
while [ -e ""$CATDIR/$PREFIX$DISK-$NUM.md5 ]; do
NUM=`expr $NUM + 1`;
done
cp tempsums.md5 $CATDIR/$PREFIX$DISK-$NUM.md5;
else
cp tempsums.md5 $CATDIR/$PREFIX$DISK.md5;
fi

if [ -e "$DISK/$PREFIX$DISK.md5" ];
then
echo "WARNING: File $DISK/$PREFIX$DISK.md5 already exists, using alternate name.";
NUM="0";
while [ -e $DISK/$PREFIX$DISK-$NUM.md5 ]; do
NUM=`expr $NUM + 1`;
done
mv tempsums.md5 $DISK/$PREFIX$DISK-$NUM.md5;
else
mv tempsums.md5 $DISK/$PREFIX$DISK.md5;
fi
fi
done


Finally, you're ready to put these volumes out to optical media (since you've minimized internal fragmentation, captured a catalog of the files, and took an extra step to preserve integrity). You can do so using your favorite method, but when there are many volumes (like more than three) I prefer to take the following steps.



The following command, if you have genisoimage will create a .iso file for the directory '40' and it will have the volume name "Volume40" when you mount it.


genisoimage -o volume40.iso -J -r -V Volume40 40/


After you have a .iso file, you're almost ready to burn. Always, always, always mount the ISO image (mount -o loop -t iso9660 volume40.iso isotest/), enter it and check some of the MD5 sums to make sure you have a good .iso file! You'll have to check the man page for genisoimage and make sure you're providing the command-line options correctly if the files in the ISO seem corrupted.



If you're familiar with cdrecord, it is now provided by wodim. You need to be root. The command looks like:

wodim -v -eject speed=8 dev='/dev/scd0' volume40.iso


Then, before I delete anything, I always insert the CD, preferably into another optical drive, and run md5sum -c volume40.md5. Now you know you have an exact copy, you can put it in a case and delete the original. Note I'm assuming that if the disc fidelity decays that the files can be found again from the Internet--make sure you have even more redundancy if these are your personal files!

No comments: