Simple database process parallelization on multi cpu database servers

Say you have a multi cpu database such as MySQL, MariaDB or Postgres SQL that can only spread one session load over one cpu.

This post aims at providing an easy & simple way to overcome this using a basic shell command : xargs !

seq 1 10 | xargs -P 4 -I {} echo "select * from {}"

This will send a sequence that will be individually processed by xargs which will fork off individual processes for each of them using 4 threads (you get it right it’s the -P parameter).

Sample output (the outcome order can be different at each execution) :

select * from 1
select * from 2
select * from 3
select * from 4
select * from 5
select * from 6
select * from 7
select * from 9
select * from 10
select * from 8

In case you don’t find this awesome you may stop reading this post right now.

In case you did, I’m sure you’re beginning to see how it can be used in so so many cases but in particular : our database processes.

Let’s take a simple postgres example :

psql -t -c "select array_agg(jobid) from myschema.myjobtable t group by mygroupcolumn" \
xargs -P 1 -I {} \
psql  -c "begin;  select * from myschema.myjob('{}')).*; commit;"

That one will retrieve a job list from database and run a myschema.myjob stored procedure taking a job id list as parameter on 1 process, looking at our database server load using top we can see only one cpu is used :

 top - 14:58:30 up 134 days,  4:20,  1 user,  load average: 0.57, 0.23, 0.08
Tasks:  38 total,   2 running,  36 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 99.0 us,  0.3 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   6291456 total,  6280152 used,    11304 free,        0 buffers
KiB Swap:  2097152 total,   229232 used,  1867920 free.  6168656 cached Mem

Now running the same job with as many processes as our database server has cpu :

psql -t -c "select array_agg(jobid) from myschema.myjobtable t group by mygroupcolumn" \
xargs -P 4 -I {} \
psql  -c "begin;  select * from myschema.myjob('{}')).*; commit;"
top - 15:00:17 up 134 days,  4:22,  1 user,  load average: 1.27, 0.52, 0.20
Tasks:  41 total,   5 running,  36 sleeping,   0 stopped,   0 zombie
%Cpu0  : 98.3 us,  1.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 99.1 us,  0.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 91.2 us,  0.4 sy,  0.0 ni,  8.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 92.2 us,  1.3 sy,  0.0 ni,  6.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   6291456 total,  6277252 used,    14204 free,        0 buffers
KiB Swap:  2097152 total,   321844 used,  1775308 free.  6070056 cached Mem

We can see the load of our database process is spread over the 4 cpu. Ain’t it cool ?!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s