From 7a1f212bda7280ec6a6fb16f1e5c1bbda2866f06 Mon Sep 17 00:00:00 2001
From: Bill
Date: Sun, 25 Apr 2021 23:37:11 -0400
Subject: Auto syncing

---
 gemini/gemlog/2021-04-26-auto-syncing.gmi | 101 ++++++++++++++++++++++++++++++
 gemini/gemlog/index.gmi                   |   1 +
 gemini/index.gmi                          |   3 -
 3 files changed, 102 insertions(+), 3 deletions(-)
 create mode 100644 gemini/gemlog/2021-04-26-auto-syncing.gmi

diff --git a/gemini/gemlog/2021-04-26-auto-syncing.gmi b/gemini/gemlog/2021-04-26-auto-syncing.gmi
new file mode 100644
index 0000000..416414a
--- /dev/null
+++ b/gemini/gemlog/2021-04-26-auto-syncing.gmi
@@ -0,0 +1,101 @@
# Auto Syncing

I have a remote server that acts as a sort of DMZ between my friends and my local server. I recently obtained two 16TB drives that I set up in my local server to act as a NAS. The only issue is - my remote server has limited space, and its directory layout doesn't match my local one closely enough to just rely on basic rsyncing. So I wrote a script that checks whether any new files exist on the remote and downloads them.

## ssync

I called the script ssync, and I set it up to run on a */1 cron (every minute).

=> https://git.senders.io/senders/ssync/tree/ssync [https] ssync (git)

```ssync
... # variable setup and prechecks

log "Fetching files"
mkdir -p "$RUN_DIR"
# List every remote file modified since the last successful run,
# as paths relative to the remote base directory
ssh -i "$KEY_FILE" "$REMOTE" \
    "find '${REMOTE_DIR}' -newermt '${PREV_RUN_DATE}' -exec realpath --relative-to '${REMOTE_DIR}' {} \;" \
    >> "$CURGET_FILE"
# Keep only the candidates never fetched before (lines unique to the first input)
comm -23 <(sort -u "$CURGET_FILE") <(sort -u "$FETCHED_FILE") > "$FETCH_FILE"
COUNT=$(wc -l < "$FETCH_FILE")

if [ "$COUNT" -gt 0 ]; then
    # Syncing
    log "Found ${COUNT} files to fetch"

    # Record the files as fetched up front so overlapping runs skip them
    cat "$FETCH_FILE" >> "$FETCHED_FILE"
    log "Wrote files to fetched files"
    log "Syncing now"
    cat "$FETCH_FILE" | xargs -n1 -P"$PARALLEL" -I '{}' rsync -e "ssh -i $KEY_FILE" \
        -av \
        "$REMOTE":"${REMOTE_DIR}"/'{}' "$SRC_DIR"
else
    log "No files to sync"
fi
echo "$NEXT_RUN_DATE" > "$LASTRAN_FILE"
log "Done syncing"
```

The script relies on 5 main utilities:
* ssh
* find
* comm
* xargs
* rsync

I use ssh to connect to the remote server, find all the files created since the last run, and pipe that list into a file on my local machine. Then I compare that list against everything I've fetched, to weed out anything that has already been fetched (or is currently being fetched).

With the new list of files to grab (the output of comm), I xargs those files into rsync with a parallelism of 5 (about what my bandwidth can manage if they're large files).

Once the sync is complete I write the script's start time to the last-ran file, for use on the next run. This means that if a sync takes 15 minutes, each per-minute run in between will still be looking for files using the date of the last successful run. I did this as a way to ensure we're not missing anything - I like to write my scripts with a margin of error, and I'd rather have a duplicate on my local machine than lose something. And since the fetched file contains a list of everything fetched - or in the process of fetching - even though the ssh listing will keep returning those files, comm weeds them out.

Everything goes into an inbox directory that I sort through periodically, cataloging the files into their proper directories.

## Joy

It is very nice to no longer have to think "did I sync this directory yet?". Since I have a limited amount of space on the remote server, it's good to know I can just delete anything more than a few days old without worry.

I've had this remote server for 8 years and never got around to setting this up. So it's a huge relief to finally have it.
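If you've never used comm, here's a toy sketch of the -23 trick the script leans on. The file names below are made up for illustration - this isn't part of ssync itself:

```comm-demo.sh
# comm compares two sorted inputs line by line and prints three columns:
# lines only in the first file, lines only in the second, lines in both.
# -23 suppresses the last two columns, leaving lines unique to the first file.
printf 'a\nb\nc\n' > candidates.txt   # everything the remote reported
printf 'b\n' > fetched.txt            # everything already fetched
comm -23 candidates.txt fetched.txt   # prints "a" and "c" - the new files
```

That's the whole dedupe: anything already present in the fetched list simply falls out of the output, no matter how many runs re-report it.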
## Pain

This entire process would've been SO MUCH simpler if I had just made the file structures of the two directories match. I could have simply run rsync relative to the two main outer directories and called it a day.

## find

I use xargs and comm all the time. But find is one of those utilities that, honestly, I never knew had an -exec option. This has been a huge lifesaver in getting everything I had missed off of this remote server.

I wanted to find any files that didn't already exist on my local machine, so on both servers I ran:

```index.sh
find . -type f -exec basename {} \; | sort > index.txt
```

This created a list of all of the files on each machine. Then I could just comm -23 the two lists to get the ones I still needed to go get. I could rely on the names not conflicting, so I just did:

```get-realpath.sh
cat remote-only.txt | xargs -I{} find . -iname '*{}' -exec realpath {} \;
```

And now I knew exactly where the missing files were, and I could figure out the best way to fetch them all.

## I spend my weekends shell scripting

I don't know about most of you - but I honestly could not compute without access to the shell. So much of what I do is simplified because I can write a line of commands that executes the actions I want as a single process (through pipelining), then make a few adjustments and run it again on a new set of data.

=> https://youtu.be/tc4ROCJYbm0?t=341 [youtube] AT&T Archives: The UNIX Operating System (ts=5:41)

I recommend you watch the entire video if you're a fan of unix-style operating systems - but around the 6 minute mark, Brian Kernighan explains pipelining and the power it provides. I use this video as a reference all the time when asked what I love about my Linux machine, and why I wouldn't want to go back to Windows full time. He breaks it down in such a clear and precise way that I find it useful for explaining - to people who are non-technical or unfamiliar with unix and the command line - why and how the command line can be so powerful.

## Time to write

This was a perfect candidate for writing a script because the process of syncing files over the internet takes time - so spending a few hours perfecting this script ultimately saves me time and creates peace of mind. But writing a script that looks at a file once it appears on the local machine and tries to discern where to move it is not worth the time. A simple mv takes seconds at most - but parsing the file name, maybe looking at some metadata, and trying to guess what directory it belongs in - honestly, there are far too many requirements to even list. When a human taking a few minutes on the weekend to just create some new dirs and move the files into them beats the alternative, that's where scripting isn't worth it (yet).

## Conclusion

I know I already go into shell scripting in my reply about the shell not being a good automation platform - and I wouldn't be surprised if I've gushed over it in a few other gemlogs too. There is something about the terminal that just clicks with how my brain wants to manage the PC, and I wouldn't want to use a computer without it.

# Links

=> /gemlog/ Gemlog
=> / Home
diff --git a/gemini/gemlog/index.gmi b/gemini/gemlog/index.gmi
index c0f81c1..e7f661f 100644
--- a/gemini/gemlog/index.gmi
+++ b/gemini/gemlog/index.gmi
@@ -4,6 +4,7 @@ Welcome to my gemlog.
 I post whenever I do something I feel is worth writing about.
 
 ## My posts
+=> 2021-04-26-auto-syncing.gmi 2021-04-26 - Auto Syncing
 => 2021-04-25-stowaway-2021.gmi 2021-04-25 - Stowaway (2021)
 => 2021-04-23-re-the-linux-shell-is-not-a-good-automation-platform.gmi 2021-04-23 - re: The Linux shell is not a good automation platform
 => 2021-04-21-vaccination.gmi 2021-04-21 - I got vaccinated! (part 1)
diff --git a/gemini/index.gmi b/gemini/index.gmi
index f4ea114..0df41ed 100644
--- a/gemini/index.gmi
+++ b/gemini/index.gmi
@@ -56,6 +56,3 @@ And if there is anything critical about this capsule/hosting/security please sen
 Thanks! And if you sub, shoot me an email and I'll happily sub back :)
 
-## P.S - Cert migration (2021-04-06)
-
-I migrated my cert! Sorry for the inconvenience!
-- 
cgit v1.2.3-54-g00ecf