Friday, January 18, 2008

Shell Script to Reorganize a Mirror to Match a Newer Mirror

This script has a very specific purpose, but you've probably had a chance where it could have come in handy. The idea is, if there are two mirrors of a bunch of files, and you reorganize (more around and/or rename) files on one mirror, then when you go to RSYNC the other mirror, it thinks the files that were only moved were added and deleted. 

To use this script, you would first move the files on one mirror, and generate an MD5 sum file, as in:
$ find ./ -type f -exec md5sum {} \; | tee checksums.md5

Then, copy this small file to the root of the other mirror, and invoke the script with the name of the checksums file:
$ md5sum-reorg.sh checksums.md5

The script will process every file under the working directory and attempt to correct its path so that it matches a file in the new 'checksums.md5'. Files not found in the checksums file are left alone. Files already in place are left alone (obviously), without even calculating their md5sum.

Script follows:


#!/bin/bash
#
# This file takes a checksum file (output from md5sum utility)
# and attempts to reorganize the files in the directory to
# match the listing in the md5 file.
#
# Files not found in the md5 input are left alone.
# Files already in the right place are left alone.
# Other files have their checksums computed and, if they are found
# in the md5 input, they are moved to the appropriate location.
#
# WARNING: It confuses duplicate files!!!
#

if [ $# -lt 1 ];
then
echo "Usage: $0 [checksums file]";
exit;
fi

declare -a SUMS
declare -a NEWPATHS

exec 10<$1
let count=0

echo "Parsing checksums and paths from input file...";

while read LINE <&10; do
SUM=`echo "$LINE" | cut -d' ' -f1`;
NEWPATH=`echo "$LINE" | cut -d' ' -f1 --complement | cut -d' ' -f1 --complement`;
SUMS[$count]=$SUM;
NEWPATHS[$count]=$NEWPATH;
((count++));
done
# close file
exec 10>&-

echo "Compiling list of files that need to be checked...";

TMPFILE=`mktemp`;
find ./ -type f -printf "%p\n" > $TMPFILE;

exec 10<$TMPFILE
while read OLDFILE <&10; do
echo "Trying to find new path for $OLDFILE";

SKIP=0;
let count=0;
while [ $count -lt ${#NEWPATHS[@]} ] ; do
if [ "${NEWPATHS[$count]}" == "$OLDFILE" ]; then
echo "File already exists at '${NEWPATHS[$count]}'";
SKIP=1;
break;
fi
((count++));
done

if [ $SKIP -eq 1 ]; then
continue; #skip the rest of this iteration
fi


echo "Computing checksum of $OLDFILE";
OLDSUM=`md5sum "$OLDFILE" | cut -d' ' -f1`;

# iterate over the pair of arrays until we might find a matching sum
let count=0;
while [ "$count" -lt ${#SUMS[@]} ]; do
SUM=${SUMS[$count]};
NEWPATH=${NEWPATHS[$count]};

if [ "$SUM" == "$OLDSUM" ];
then
if [ "$OLDFILE" != "$NEWPATH" ] ;
then
NEWPARENT=`dirname "$NEWPATH"`;
if [ ! -d "$NEWPARENT" -a "$NEWPARENT" != "." ];
then
echo "Making directory $NEWPARENT";
mkdir "-p" "$NEWPARENT";
fi
echo "Moving $OLDFILE to $NEWPATH";
mv "$OLDFILE" "$NEWPATH";
else
echo "Path hasn't changed.";
fi
break;
fi
((count++));
done

done

exec 10>&-

exit



In case it's not clear, this is offered without any warranty or guarantee whatsoever.

No comments: