Originally published June 26, 2020 @ 12:24 am

I frequently use MySQL to store data collected or processed by various shell scripts. The database comes in handy when massaging data with good old awk and sed gets too cumbersome.

This is really very basic stuff: just loading some data into MySQL and retrieving it. The usual database configuration section I include in my scripts looks something like this:

db_host=hostname.local
db_port=3306
db_user="mysqluser"
db_pass="mypass"
db_name="schema_name"
tbl_name="table_name"
MYSQL="/usr/bin/mysql --batch --skip-column-names --max_allowed_packet=100M -h${db_host} -u${db_user} -p${db_pass} ${db_name} -e"

Here’s a way to retrieve some data from the table:

$MYSQL "select count(*) from $tbl_name"

Now, let’s say your database table has four columns:

id timestamp hostname message
(primary index, auto-increment) datetime varchar long text

Let’s also imagine you have a comma-separated file $datafile that looks like this:

2020-06-25 23:23:53,host01,Linux ncc1711 2.6.32-754.30.2.el6.x86_64 #1 SMP

You would like to load this file into the database. Here you go:

${MYSQL} "LOAD DATA LOCAL INFILE '${datafile}' IGNORE INTO TABLE ${tbl_name} \
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' \
(timestamp, hostname, message);"

Here’s an example of comparing records from yesterday with those older than two weeks to find missing lines:

diff --new-line-format="" --unchanged-line-format="" \
<($MYSQL "SELECT DISTINCT message FROM ${tbl_name} WHERE hostname LIKE 'host01' AND timestamp BETWEEN CURDATE() - INTERVAL 1 DAY AND CURDATE() - INTERVAL 1 SECOND ORDER BY message ASC") \
<($MYSQL "SELECT DISTINCT message FROM ${tbl_name} WHERE hostname LIKE 'host01' AND timestamp < '$(date -d'2 weeks ago' +'%Y-%m-%d %H:%M:%S')' ORDER BY message ASC")

Here’s another example where we extract dhcpd messages from /var/log/messages, convert it to CSV format, and load it into the database table:

# Source file format:
Jun 21 04:20:42 ncc1711 dhcpd: DHCPREQUEST for 192.168.122.159 from 6c:72:20:c5:0b:83 (One) via eth1
Jun 21 04:20:42 ncc1711 dhcpd: DHCPACK on 192.168.122.159 to 6c:72:20:c5:0b:83 (One) via eth1
Jun 21 04:20:43 ncc1711 dhcpd: DHCPREQUEST for 192.168.122.146 from e0:37:bf:8e:a2:f1 (audiocast) via eth1
Jun 21 04:20:43 ncc1711 dhcpd: DHCPACK on 192.168.122.146 to e0:37:bf:8e:a2:f1 (audiocast) via eth1

# Script:
tmpfile=$(mktemp)
grep dhcpd /var/log/messages | while read line; do 
timestamp="$(date -d"$(echo $line | awk '{print $1,$2,$3}')" +'%F %T')"
host_name="$(echo $line | awk '{print $4}')"
message="$(echo $line | \
awk '{ s = ""; for (i = 5; i <= NF; i++) s = s $i " "; print s }' | \
sed 's/,//g')"
echo "${timestamp},${host_name},${message}" >> $tmpfile
done
${MYSQL} "LOAD DATA LOCAL INFILE '${tmpfile}' IGNORE INTO TABLE ${tbl_name} \
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' \
(timestamp, hostname, message);"
/bin/rm $tmpfile

This is pretty much it. The rest is just your SQL foo.