Home SysAdmin Commands & Shells Parallel MySQL Queries

Parallel MySQL Queries

January 9, 2024

339

Originally published January 7, 2018 @ 2:53 pm

Unlike Oracle, MySQL has no support for parallel SQL queries. A bummer for sure, in light of all the awesome multi-coreness we enjoy these days. There’ve been some attempts (1, 2) to get around this unfortunate limitation: both relying on running multiple queries in the background with little or no flow control.

The downside of this technique is the likelihood of overloading the server: contrary to the expected improvement in performance, you may crash your whole box. And even if you don’t, contention for system resources may slow down the query – not speed it up. The basic idea is sound, but flow control is essential to keep the system from sinking into the depths of context switching and system interrupts.

For some time now I’ve been playing with GNU Parallel and it is the perfect choice for the job. Without further ado, here’s a quick practical example. We have a text file with a list of search keywords – one per line. For each keyword we need to run a SELECT query and count the number of results. If the number of hits is one or more, then we need to display the keyword along with the count.

Here’s an example of the keyword file:

# cat search_string_one_per_line.txt

keyword1
keyword2
keyword_3
keyword 04

The common approach would be to loop through the file and execute a SELECT query for each keyword:

#!/bin/bash
db_user="dbuser"
db_pass="dbpasswd"
db_name="dbname"
tbl_name="tblname"
MYSQL="/usr/bin/mysql --batch --skip-column-names --max_allowed_packet=100M -u${db_user} -p${db_pass} ${db_name} -e"

for search_string in `cat search_string_one_per_line.txt`
do
   count=$($MYSQL "SELECT COUNT(*) FROM ${tbl_name} WHERE keyword \
   LIKE '%${search_string}%' 
   ORDER BY timestamp")

   if [ "$${count}" -gt 0 ]
   then
	echo "${search_string}:\t$${count}"
   fi
done

This should work just fine, but will probably take awhile. Annoyingly, this approach will utilize just a single core of your fancy multi-core server. It will be like cutting your hair – one at a time.

Here’s pretty much the same process, but the keyword input is slightly massaged using “sed” and piped through “parallel”. After “sed” is done with it, the list of keywords will look something like this: “keyword1\nkeyword2\nkeyword_3\nkeyword 04”. Just the way “parallel” likes it.

The GNU Parallel utility will figure out how many cores your system has and will put them all to good use. You can run “htop” in the background and watch it light up like a Christmas tree.

#!/bin/bash

select_query() {
	search_string=""
	db_user="dbuser"
	db_pass="dbpasswd"
	db_name="dbname"
	tbl_name="tblname"
	MYSQL="/usr/bin/mysql --batch --skip-column-names --max_allowed_packet=100M -u${db_user} -p${db_pass} ${db_name} -e"
	count=$($MYSQL "SELECT COUNT(*) FROM ${tbl_name} WHERE keyword LIKE '%${search_string}%' ORDER BY timestamp")
	if [ "$${count}" -gt 0 ]
	then
			echo "${search_string}:\t$${count}"
	fi
}

export -f select_query

query_run() {
	search_string=$(sed ':a;N;$!ba;s/\n/\n/g' search_string_one_per_line.txt)
	echo -e "${search_string}" | parallel --gnu --will-cite select_query "{}"
}

query_run

Will this be faster? In most cases, yes. A whole lot faster. However, there are some gotchas you need to consider. First, just because your server has 128 cores, doesn’t mean the hard drive hosting your database will be able to keep up. You may run into I/O wait scenario and the end result will be disappointing. So make sure your datastore can keep up with the demand.

Second – memory. May even be first. If your server has 128 cores, chances are, it has plenty of RAM also, but it’s worth checking. If the server starts using disk swap, your SQL query will not set any speed records.

Finally, just like rubber ducks, MySQL is not actually made out of rubber: it won’t stretch forever. At the very least you may need to check my.cnf and, perhaps, up the “max_connections” and point “tmpdir” to a filesystem with plenty of space (it defaults to /tmp, which may not be sufficient). So, have fun, but not on the production server.

Igor

Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.

Symbol	USD	% 1h	% 24h	% 7d
BTC	37,157	0.55	2.50	7.72
ETH	1,716.5	0.31	3.66	4.71
USDT	1.001	0.00	0.04	0.09
XRP	0.3813	0.14	0.63	2.13
BNB	600.73	0.49	0.05	1.31
SOL	147.93	0.13	1.23	6.13
USDC	0.9999	0.00	0.01	0.00
DOGE	0.1808	0.36	0.42	14.48
	?	---	0.00	0.00
	?	---	0.00	0.00

Bitcoin $ 37,157	Bitcoin 2.50 %
Ethereum $ 1,716.5	Ethereum 3.66 %
Litecoin $ 53.16	Litecoin 0.18 %
XRP $ 0.3813	XRP 0.63 %

Adding and Removing sshd instances on CentOS 7

Adding and Removing sshd instances on CentOS 6

Gnuplot with Bash

Multi-Dimensional Arrays in Bash

Asciinema Notes

Notes on ownCloud configuration

Removing Chef Server Installation

Curated Downloads

Exporting WordPress to Markdown

Finding Passwords in Logs and Shell History

Anatomy of Internet Bullshit

Creating a Chroot Jail for SSH Access

Late Night Rant: Facebook

Late Night Rant: College Admissions Scandal

Measure DNS Server Performance

Resizing Photos for Instagram

QNAP NAS Performance Analysis

Adding and Removing sshd instances on CentOS 7

Adding and Removing sshd instances on CentOS 6

Measure DNS Server Performance

Inventory Network Services with Nmap

Finding Duplicate Photos

Maryland Renaissance Festival

Focus Stacking with Lightroom and Photoshop

Longwood Gardens, April 2018

Parallel MySQL Queries

Using IPTables to Allow Access to Private Networks

GPT Prompt Engineering

Notes on ownCloud configuration

Migrating Filesystems with Active Processes

Nohup Without Nohup.out

Document Conversion with Unoconv