Originally published March 19, 2020 @ 10:37 pm
The result of my morbid fascination with the coronavirus situation is this quick bash script that parses the Johns Hopkins University coronavirus data and generates a report for the specified countries for the current date (or for the previous day, if today's report has not been published yet).
The plan is to add some statistical analysis to spot potential anomalies in the reported data. For now, just a simple summary for the current day.
The script is below. You can also download it from my GitHub repo. Here’s an example of how to run it:
./covid19_stats_mk2.sh -c US -c Italy -c Spain -c China -c "United Kingdom"

COUNTRY         DATE        CONFIRMED  DEATHS  RECOVERED  ACTIVE  MORTALITY  RECOVERY
US              03-19-2020  13680      200     108        13372   1.4%       .7%
Italy           03-19-2020  41035      3405    4440       33190   8.2%       10.8%
Spain           03-19-2020  17963      830     1107       16026   4.6%       6.1%
China           03-19-2020  81156      3249    70535      7372    4.0%       86.9%
United Kingdom  03-19-2020  2716       138     67         2511    5.0%       2.4%
And this is the script:
It would seem the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) has trouble keeping a consistent format for its COVID-19 data files. For unknown reasons, the columns are arranged differently in data files from different dates. There are other arbitrary changes as well, such as the country column header switching between ‘Country/Region’ and ‘Country_Region’. Well, I hope that made someone very happy.
In any case, I made a couple of changes to my script to compensate for someone’s lack of experience handling data. From a bash scripting standpoint, you may find the *_field variables interesting: they are set dynamically to the correct data column based on the exact or approximate header name. So, as long as JHU CSSE doesn’t rename “Deaths” to “Casualties” or “Confirmed” to “Verified”, we should be fine…
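The *_field lookup boils down to matching a regular expression against each header name and remembering the position of the first match. Here is a pure-bash sketch of that idea (the script itself does it with awk). column_of is a hypothetical helper, and the two header rows are assumptions modeled on the older and newer JHU CSSE daily-report layouts, not copied from a specific file.

```shell
#!/bin/bash
# Pure-bash sketch of the *_field idea: match a regex against each name in a
# CSV header row and print the 1-based position of the first match.

# column_of HEADER REGEX  (hypothetical helper, not part of the script)
column_of() {
  local -a names
  local i
  IFS=, read -r -a names <<< "$1"   # split the header on commas only
  for i in "${!names[@]}"; do
    if [[ ${names[i]} =~ $2 ]]; then
      echo $(( i + 1 ))
      return 0
    fi
  done
  return 1   # no header name matched
}

# Assumed header rows resembling the older and newer layouts
old_header='Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered'
new_header='FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key'

# "Country.Region" matches both "Country/Region" and "Country_Region"
for h in "$old_header" "$new_header"; do
  echo "country=$(column_of "$h" 'Country.Region') deaths=$(column_of "$h" Deaths)"
done
# Prints "country=2 deaths=5", then "country=4 deaths=9"
```

The regex dot is what absorbs the ‘/’ versus ‘_’ difference; an exact name such as Deaths works as long as the word itself stays the same.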
#!/bin/bash
# Collect the countries to report on: each -c option adds one
while getopts ":c:" opt
do
  case ${opt} in
    c ) countries+=("${OPTARG}") ;;
    \? ) echo "Unknown option: -$OPTARG" >&2; exit 1;;
    : ) echo "Missing option argument for -$OPTARG" >&2; exit 1;;
    * ) echo "Unimplemented option: -$OPTARG" >&2; exit 1;;
  esac
done
shift $((OPTIND -1))

url="https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
url_raw="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports"

if [ ${#countries[@]} -eq 0 ]
then
  echo "You need to specify country code. Exiting..."
  exit 1
fi

# Download the daily report for date ${e} into ${tmpfile}. GitHub's
# "404: Not Found" body is filtered out, so a missing report leaves the file empty.
curl_get() {
  curl -s0 -k "${url_raw}/${e}.csv" 2>/dev/null | grep -vE "404: Not Found" > "${tmpfile}"
}

# Draw a rule of RULE_CHARACTER (default "-") across the terminal with MESSAGE near its left end
rulem () {
  if [ $# -eq 0 ]; then
    echo "Usage: rulem MESSAGE [RULE_CHARACTER]"
    return 1
  fi
  printf -v _hr "%*s" $(tput cols) && echo -en ${_hr// /${2--}} && echo -e "\r\033[2C$1"
}

# Print the 1-based number of the first header column matching REGEX (empty if none).
# Only the header row is read, so the column order in the file does not matter.
find_field() {
  awk -F, -v re="$1" 'NR==1{for(i=1;i<=NF;i++)if($i~re){print i;exit}exit}' "${tmpfile}"
}

tmpfile="$(mktemp)"
tmpfootnotes="$(mktemp)"
# Remove the temporary files however the script exits
trap '/bin/rm -f "${tmpfile}" "${tmpfootnotes}" 2>/dev/null' EXIT

# Try today's report first; fall back to yesterday's if it is not published yet
e="$(date +'%m-%d-%Y')"
curl_get
if [ ! -s "${tmpfile}" ]
then
  e="$(date -d'-1 days' +'%m-%d-%Y')"
  curl_get
fi
if [ ! -s "${tmpfile}" ]
then
  echo "Unable to download CSV file. Exiting..."
  exit 2
fi

# Locate the data columns by (approximate) header name;
# "Country.Region" matches both "Country/Region" and "Country_Region"
country_field=$(find_field 'Country.Region')
confirmed_field=$(find_field 'Confirmed')
deaths_field=$(find_field 'Deaths')
recovered_field=$(find_field 'Recovered')

for ((i = 0; i < ${#countries[@]}; i++))
do
  c="${countries[$i]}"
  c="$(echo ${c} | sed 's/^ //g')"
  case ${c} in
    US) echo -e "* ${c} recovery rates are no longer tracked as of 2020-12-14" >> "${tmpfootnotes}" ;;
  esac
  if [ -n "${country_field}" ] && [ -n "${confirmed_field}" ] && [ -n "${deaths_field}" ] && [ -n "${recovered_field}" ]
  then
    # Sum each column over all of the country's rows (one row per province/state)
    confirmed=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$confirmed_field -F, '{s+=$field}END{print s+0}')
    deaths=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$deaths_field -F, '{s+=$field}END{print s+0}')
    recovered=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$recovered_field -F, '{s+=$field}END{print s+0}')
    # No matching rows: say so and skip the country instead of dividing by zero in bc
    if [ "${confirmed}" -eq 0 ]
    then
      echo "No data for ${c} on ${e}" >&2
      continue
    fi
    death_pct="$(echo "scale=1;(${deaths}*100)/${confirmed}"|bc -l)"
    recovery_pct="$(echo "scale=1;(${recovered}*100)/${confirmed}"|bc -l)"
    active_cases="$(echo "scale=0;${confirmed}-(${deaths}+${recovered})"|bc -l)"
    echo "${c},${e},${confirmed},${deaths},${recovered},${active_cases},${death_pct}%,${recovery_pct}%"
  fi
done | (echo "COUNTRY,DATE,CONFIRMED,DEATHS,RECOVERED,ACTIVE,MORTALITY,RECOVERY" && cat) | column -s',' -t
echo ""

if [ -s "${tmpfootnotes}" ]
then
cat << EOF
$(rulem FOOTNOTES)
$(cat "${tmpfootnotes}")
EOF
fi
Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.