Originally published May 2, 2017 @ 10:46 pm

I’ve run into an interesting challenge: I needed to migrate application data from a local filesystem to NFS without killing or restarting the processes running in the original mountpoint. Here’s a basic overview of the process. Be warned that it will not work for every application.

First, let’s start a test process in a directory on a local filesystem. This loop runs in `/tmp/local` and appends the current timestamp to a file every five seconds.

```bash
mkdir -p /tmp/local
cd /tmp/local
df -hlP /tmp/local
#Filesystem      Size  Used Avail Use% Mounted on
#tmpfs            30G   17M   30G   1% /tmp

for i in $(seq 1 1000); do date >> /tmp/local/out; sleep 5; done
```

Below is a sample procedure for pausing the processes running in the source filesystem, rsyncing its contents to an NFS share, mounting the NFS share over the original mountpoint, resuming the paused processes, and refreshing their working directory. The last step is important: although the path stays the same, the filesystem underneath it changes. A process’s working directory refers to the directory itself rather than to its path, so until the process calls `chdir()` again, relative paths keep resolving against the old local directory, which is now hidden beneath the NFS mount.
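To check whether a given process still needs that refresh, one option is to compare the device and inode of its working directory with those of the path itself. This is a sketch assuming Linux `/proc` and GNU coreutils `stat`; the helper name `cwd_is_current` and the PID in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if PID's working directory is the same
# directory (device and inode) that PATH currently resolves to.
# Assumes Linux /proc and GNU coreutils stat. Returns 2 on lookup errors.
cwd_is_current() {
  local pid=$1 path=$2
  local cwd_id path_id
  # -L follows the /proc magic link to the process's actual cwd,
  # even if that directory is now hidden under a mount.
  cwd_id=$(stat -L -c '%d:%i' "/proc/${pid}/cwd") || return 2
  path_id=$(stat -c '%d:%i' "${path}") || return 2
  [ "${cwd_id}" = "${path_id}" ]
}

# Usage (hypothetical PID):
#   cwd_is_current 12345 /tmp/local || echo "12345 still needs a chdir"
```

After the NFS share is mounted on `/tmp/local`, the path resolves to the share's root while an unrefreshed process still points at the hidden local directory, so the two identifiers differ.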

```bash
# Define source and target mountpoints
workdir=/tmp/local
tmpdir=/tmp/remote
mkdir -p "${tmpdir}"
chown --reference="${workdir}" "${tmpdir}"

# Collect the PIDs of processes using ${workdir}
# (-t prints bare PIDs; +D also matches files open below the directory)
mapfile -t pids < <(lsof -t +D "${workdir}" | sort -u)

# Pause those processes
for pid in "${pids[@]}"; do kill -STOP "${pid}"; done

# Mount the NFS share on ${tmpdir}
mount nas04:/share01 "${tmpdir}"

# Rsync the local filesystem to the NFS share
rsync -avKx --delete "${workdir}/" "${tmpdir}/"

# Move the NFS mount from ${tmpdir} to the original ${workdir}
umount "${tmpdir}"
mount nas04:/share01 "${workdir}"

# Resume the paused PIDs and point their working directory at the new mount
for pid in "${pids[@]}"; do
  kill -CONT "${pid}"
  gdb -q <<EOF
attach ${pid}
call (int) chdir("${workdir}")
detach
quit
EOF
done
```
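On a host without lsof, a rough equivalent of the PID-collection step can be built from `/proc`. This sketch is Linux-only, and `pids_using_dir` is a hypothetical name: it prints every process whose working directory or any open file descriptor resolves to a path inside the given directory. Without root, it can only inspect your own processes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs whose cwd or open file descriptors
# point at DIR or anything below it. Linux-only (reads /proc).
pids_using_dir() {
  local dir proc pid link target
  dir=$(readlink -f -- "$1") || return 1
  for proc in /proc/[0-9]*; do
    pid=${proc#/proc/}
    for link in "${proc}/cwd" "${proc}"/fd/*; do
      # Unreadable links (other users' processes, exited PIDs) are skipped.
      target=$(readlink -- "${link}" 2>/dev/null) || continue
      case "${target}" in
        "${dir}" | "${dir}"/*)
          echo "${pid}"
          break  # one match is enough for this PID
          ;;
      esac
    done
  done
}

# Usage: mapfile -t pids < <(pids_using_dir /tmp/local)
```

Matching on the path string is what you want before the cutover; afterwards the links still read `/tmp/local/...` even for the hidden local directory, so use an inode comparison instead to tell the two apart.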

Now you can tail the migrated output file and see that the original process is still writing to it:

```bash
tail -f /tmp/local/out
#Tue May  2 11:44:47 EDT 2017
#Tue May  2 11:44:52 EDT 2017
#Tue May  2 11:44:57 EDT 2017
#Tue May  2 11:45:02 EDT 2017
#Tue May  2 11:45:07 EDT 2017
# >>> note the time gap due to migration
#Tue May  2 11:45:29 EDT 2017
#Tue May  2 11:45:34 EDT 2017
#Tue May  2 11:45:39 EDT 2017
#Tue May  2 11:45:44 EDT 2017
```
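To quantify how long the writer was actually paused, a small sketch can scan the output file for the largest gap between consecutive timestamps. It assumes the default `date` format shown above (time of day in the fourth field) and a log that does not cross midnight; `max_gap` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the largest gap, in seconds, between consecutive
# `date` lines read from stdin (or from the files given as arguments).
# Assumes the format "Tue May  2 11:45:07 EDT 2017" and no midnight rollover.
max_gap() {
  awk '{
    split($4, t, ":")                    # HH:MM:SS -> t[1], t[2], t[3]
    s = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
    if (NR > 1 && s - prev > max) max = s - prev
    prev = s
  } END { print max + 0 }' "$@"
}

# Usage: max_gap /tmp/local/out
```

For the sample above, the largest gap is the one spanning the migration: 22 seconds, against the loop's normal 5.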

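As I mentioned, this may not work for more complex applications. The sample loop survives the cutover because it reopens `/tmp/local/out` on every iteration; a process that keeps a file descriptor open across the migration will go on writing to the old, now-hidden local file, since `chdir()` only changes its working directory. Still, the technique can be useful. Say you launched a script or another long-running process that writes to a local filesystem, then realized you may not have enough disk space to hold its output. This may be a way to move the output to another filesystem without relaunching the process.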
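Before moving a growing output file onto the NFS share, it may be worth confirming that the share has room for what is already there. A minimal sketch, assuming POSIX `du -sk` and `df -Pk` output in KiB; `fits_on` is a hypothetical name, and the check ignores data written after it runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the data under SRC (per du) is smaller than
# the space currently available on the filesystem holding DST (per df).
# Returns 2 if either size cannot be determined.
fits_on() {
  local src=$1 dst=$2 used avail
  # du -sk: total size of SRC in KiB (first column).
  used=$(du -sk -- "${src}" | awk '{print $1}')
  # df -Pk: POSIX single-line format; column 4 is available space in KiB.
  avail=$(df -Pk -- "${dst}" | awk 'NR == 2 {print $4}')
  [ -n "${used}" ] && [ -n "${avail}" ] || return 2
  [ "${used}" -lt "${avail}" ]
}

# Usage: fits_on /tmp/local /tmp/remote || echo "not enough space on the share"
```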

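For larger or more complex directory trees, you may want to use lsyncd instead of a one-off rsync, so that the NFS copy is kept in sync continuously in near real time. Then only a small final delta has to be copied while the processes are paused, which minimizes the downtime required to remount.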
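If lsyncd is not an option, a simpler technique aimed at the same goal is a two-pass rsync: copy the bulk of the data while the processes keep running, then pause them and run a second, much shorter pass. This is not lsyncd’s continuous syncing, just a sketch of the two-pass approach; `two_pass_sync` is a hypothetical helper, and it deliberately leaves the processes paused so the remount, `SIGCONT`, and `chdir()` steps can follow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: two-pass rsync from SRC to DST, pausing the given PIDs
# (SIGSTOP) only for the second, incremental pass.
# The PIDs are left stopped so the caller can remount, then SIGCONT them.
two_pass_sync() {
  local src=$1 dst=$2 pid
  shift 2
  # Pass 1: bulk copy while the writers are still running.
  rsync -aKx --delete "${src}/" "${dst}/" || return 1
  # Freeze the writers so the tree stops changing.
  for pid in "$@"; do kill -STOP "${pid}"; done
  # Pass 2: rsync's quick check (size and mtime) skips files unchanged
  # since pass 1, so this pass, and the pause, stays short.
  rsync -aKx --delete "${src}/" "${dst}/"
}

# Usage (hypothetical PIDs): two_pass_sync /tmp/local /tmp/remote 1234 5678
```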