Diffstat (limited to 'gemini/gemlog/2021-04-15-capsule-stats.gmi')
-rw-r--r-- | gemini/gemlog/2021-04-15-capsule-stats.gmi | 129 |
1 files changed, 129 insertions, 0 deletions
diff --git a/gemini/gemlog/2021-04-15-capsule-stats.gmi b/gemini/gemlog/2021-04-15-capsule-stats.gmi
new file mode 100644
index 0000000..e5b8fc3
--- /dev/null
+++ b/gemini/gemlog/2021-04-15-capsule-stats.gmi
@@ -0,0 +1,129 @@
+# Capsule Stats
+
+I was curious what the general traffic of my capsule was. For my web server, I tinkered with the idea of setting up some sort of ELK (Elasticsearch, Logstash, Kibana) stack to get some monitoring and metrics on the actual server. But for Gemini, where I actually HAVE traffic, I decided to just have a live look at it.
+
+## Logfile
+
+I am running my own server, whose access log uses this format:
+
+```access.log
+2021-04-15T02:41:04,899Z IN /67.86.nnn.nnn:33378 gemini://senders.io/feed/atom.xml 33
+2021-04-15T02:41:04,907Z OUT 20 application/xml; lang=en; 3452
+2021-04-15T02:41:04,950Z IN /67.86.nnn.nnn:33380 gemini://senders.io/gemlog/feed/atom.xml 40
+2021-04-15T02:41:04,951Z OUT 20 application/xml; lang=en; 3467
+```
+
+These are tab-separated lines in two categories: IN and OUT.
+
+### IN logline
+
+IN logs are requests:
+
+```access.log IN
+[timestamp] [tab] IN [tab] [IP] [tab] [URI] [tab] [SIZE]
+```
+
+### OUT logline
+
+OUT logs are responses:
+
+```access.log
+[timestamp] [tab] OUT [STATUS] [tab] [META] [tab] [SIZE]
+```
+
+## Generating stats
+
+Since the lines are tab-delimited, it is pretty easy to calculate some basic stats on incoming and outgoing messages using the wonderful world of bash scripting.
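
As a minimal sketch of why the tab layout makes this easy, the snippet here rebuilds one IN line with real tabs and pulls single fields out with `cut`. It assumes a tab also separates IN from the address, which is what `cut -f4` returning the URI in calc.sh implies; the address 192.0.2.1 is a documentation placeholder, not something taken from the log.

```shell
#!/usr/bin/env bash

# One IN line rebuilt with real tabs (placeholder address, assumed field layout).
sample=$'2021-04-15T02:41:04,899Z\tIN\t/192.0.2.1:33378\tgemini://senders.io/feed/atom.xml\t33'

# cut splits on tabs by default: field 2 is the direction, field 4 the URI.
direction=$(printf '%s\n' "$sample" | cut -f2)
uri=$(printf '%s\n' "$sample" | cut -f4)

echo "$direction $uri"
```

Once a field can be isolated like this, counting is just a matter of piping it through `sort | uniq -c`.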
+
+### calc.sh
+
+```calc.sh
+#!/usr/bin/env bash
+
+LOGFILE=$1
+OUTFILE=$2
+
+if [ $# -lt 2 ]; then
+  # Without both arguments there is nothing to read or write, so stop here.
+  echo "Usage: ./calc.sh logs/access.log gemini/stats.gmi" >&2
+  exit 1
+fi
+
+# Stats for today (anchored on the leading timestamp so dated gemlog URIs don't match)
+TODAY=$(date -Id)
+echo -e "Stats for day:\t$TODAY" > "$OUTFILE"
+echo -e " Total Reqs:\t"$(grep 'OUT' "$LOGFILE" | grep "^${TODAY}" | wc -l) >> "$OUTFILE"
+echo -e " Gemlog Reads:\t"$(grep 'IN' "$LOGFILE" | grep "^${TODAY}" | grep "gemlog" | grep "gmi" | wc -l) >> "$OUTFILE"
+echo "Top 5 Gemlogs" >> "$OUTFILE"
+echo "--------------" >> "$OUTFILE"
+grep "IN" "$LOGFILE" | grep "^${TODAY}" | cut -f4 | grep "gemlog" | grep ".gmi" | sort | uniq -c | sort -rn | head -n5 >> "$OUTFILE"
+
+# Stats total
+EARLIEST=$(head -n1 "$LOGFILE" | cut -f1)
+echo "" >> "$OUTFILE"
+echo -e " Stats since:\t$EARLIEST" >> "$OUTFILE"
+echo -e " Total Reqs:\t"$(grep 'OUT' "$LOGFILE" | wc -l) >> "$OUTFILE"
+echo -e " Gemlog Reads:\t"$(grep 'IN' "$LOGFILE" | grep "gemlog" | grep "gmi" | wc -l) >> "$OUTFILE"
+echo "Top 5 Gemlogs" >> "$OUTFILE"
+echo "--------------" >> "$OUTFILE"
+grep "IN" "$LOGFILE" | cut -f4 | grep "gemlog" | grep ".gmi" | sort | uniq -c | sort -rn | head -n5 >> "$OUTFILE"
+
+# print generating timestamp
+echo -e "\n// generated $(date -u -Is)" >> "$OUTFILE"
+```
+
+This bash script is basically a combination of grep, cut, sort, and uniq. I know that I can optimize this much further, but I wrote it so that it filters down in steps, which helps me understand what I am filtering and why.
+
+I also wrote the script so that the input and output files can be changed, but that is a relic of running it locally rather than against a fixed location on my server.
+
+### What I filter for
+
+I decided to break the information into two things: total requests, where I count basically every response line, and "gemlog reads", since the homepage and atom.xml are things I don't really care about. It's still good to see what percentage of the requests are page reads.
And I also decided to show the "from the beginning of the file" stats as well (originally I was just calculating the stats for the day).
+
+### The output
+
+```stats.txt
+Stats for day: 2021-04-14
+ Total Reqs: 301
+ Gemlog Reads: 155
+Top 5 Gemlogs
+--------------
+     53 gemini://senders.io/gemlog/2021-04-13-digital-hygiene-one-week-in.gmi
+     14 gemini://senders.io/gemlog/2021-04-09-humans-first-words.gmi
+     13 gemini://senders.io/gemlog/2021-04-12-girl-2020-land-before-time.gmi
+      7 gemini://senders.io/gemlog/2021-04-10-floc.gmi
+      7 gemini://senders.io/gemlog/2021-04-03-digital-hygiene.gmi
+
+ Stats since: 2021-04-07T00:53:38,811Z
+ Total Reqs: 3500
+ Gemlog Reads: 1852
+Top 5 Gemlogs
+--------------
+    239 gemini://senders.io/gemlog/2021-04-10-floc.gmi
+    207 gemini://senders.io/gemlog/2021-04-13-digital-hygiene-one-week-in.gmi
+    186 gemini://senders.io/gemlog/2021-04-07-devlog-4-deployed-in-production.gmi
+    173 gemini://senders.io/gemlog/2021-04-09-humans-first-words.gmi
+    138 gemini://senders.io/gemlog/2021-04-12-girl-2020-land-before-time.gmi
+
+// generated 2021-04-15T02:56:01+00:00
+```
+
+## Generating this file
+
+I run this via a cronjob because I don't have any CGI support to load these stats on demand on my server. The calc script runs every minute and writes its output to a file on my capsule:
+
+=> /stats.txt /stats.txt
+
+## Conclusion
+
+I found this a fun exercise to see how well a particular gemlog "was doing" - were people clicking into it? It's also interesting to see some traffic numbers on days (like today) where I haven't posted.
+
+## P.S. - GDPR
+
+While writing this calc process, I realized I should probably do something about the fact that I am logging IPs on my server, which is outside the EU, while I know some of you are IN the EU. I have a retention job set up via cron to wipe my logs every month, which, if I recall, should be compliant. But I might just remove the actual IP from the log and add a UUID to the IN and OUT lines so I can still match them up.
I really don't need your IP, nor would I want my IP sitting on some random server somewhere (though I am not subject to GDPR, so I probably have no recourse to ask you to remove it). I know some of you run larger sites/capsules - what do you do about access logs? If this were HTTP, it would probably make sense to keep the IP logs to monitor traffic for potentially malicious actions and ban offenders, etc. So I'm just curious, so as not to reinvent the wheel here...
+
+I thought it was neat to take a look at the general traffic on my server and share the script :)
+
+=> /gemlog/ Gemlog
+=> / Home
+
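
The P.S. idea of removing the IP from the log could be sketched as a filter over the existing file. This is only an illustration under assumptions: `scrub_ips` is a hypothetical helper name, the address is taken to be the third tab-separated field of IN lines (as `cut -f4` returning the URI in calc.sh suggests), and it covers only the removal, not the UUID pairing of IN and OUT lines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace the client address (assumed to be field 3)
# on IN lines with "-" and pass every other line through unchanged.
scrub_ips() {
  awk -F'\t' 'BEGIN { OFS = "\t" } $2 == "IN" { $3 = "-" } { print }'
}

# Example input: one IN line with a placeholder address, followed by an OUT line.
scrubbed=$(printf '2021-04-15T02:41:04,899Z\tIN\t/192.0.2.1:33378\tgemini://senders.io/feed/atom.xml\t33\n2021-04-15T02:41:04,907Z\tOUT 20\tapplication/xml; lang=en;\t3452\n' | scrub_ips)
echo "$scrubbed"
```

Because the URI stays in field 4, a scrubbed log would still work with the `cut -f4` steps in calc.sh.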