Maatkit tools for database administrators

Order into Chaos

Article from ADMIN 02/2010
The Maatkit tool suite helps administrators with performance tuning and other tasks for MySQL and other databases.

Maatkit [1], which was named after Maat, the Egyptian goddess of order and justice, gives administrators and developers a comprehensive collection of command-line tools written in Perl that help extend the functionality of database management systems and facilitate management tasks. For example, you can check whether replication is working correctly, identify and repair damaged files, quickly export tables into files, parse data from files, or output the privileges assigned to individual database users. Most of the features were written for MySQL [2]; however, Maatkit does support other free database management systems including PostgreSQL [3] and Memcached [4].

Maatkit was developed in 2006 by Baron Schwartz. Since 2008 Daniel Nichter has been the main developer, although Schwartz still contributes suggestions for new tools and features. Both Schwartz and Nichter work for database specialist Percona, which has often impressed users in the past with performance-boosting patches for MySQL [5].

You can install Maatkit with just a couple of keystrokes. Many package repositories include the tool collection but, if this is not the case for your distribution, you can download the source code, or Deb and RPM packages, from the project website. To download and start using an individual tool quickly, you can also try:

wget http://www.maatkit.org/get/toolname

Maatkit currently includes 30 tools; ADMIN magazine picked a couple of them for closer inspection.

Fast Dump and Restore

Whenever you need to roll out a new system, you should test it thoroughly up front, preferably with real data. Because the production data is typically accessible, you can transfer the individual databases to the system under test. MySQL includes the mysqldump tool for creating a dump of a database: it simply copies the content of one database, or all databases, to a file, which the target system then reads back in. The mysqldump command does this reliably but slowly, because it backs up all the databases and tables sequentially. On a system with seven databases, only one of which is somewhat larger, the command:

time mysqldump --skip-opt \
   --create-options --databases datenkrake \
     wikkawiki piwik sprzblog mysql \
     limesurvey -uroot -ppassword > \
   backup.sql

will take 21 seconds to complete:

real    0m21.626s
user    0m0.000s
sys    0m0.000s

In the case of larger databases on non-replicated servers, mysqldump is practically no longer an option, because it locks the tables for the duration of the backup (to ensure consistency). Maatkit has the mk-parallel-dump command to accelerate this process by transferring the data to files in parallel – assuming you have a system with multiple CPU cores.

By default, mk-parallel-dump uses two threads to copy the databases, but you can increase this number, assuming your hardware supports this, by setting the --threads option, thus substantially speeding up the dump. On my test system, mk-parallel-dump did the same job nearly twice as quickly as mysqldump (Listing 1).
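The parallelism that mk-parallel-dump adds can be sketched in a few lines of shell: start one dump per database as a background job and wait for all of them to finish. Here, `dump_one` is a hypothetical stand-in for the real mysqldump invocation, and the database names are taken from the example above:

```shell
#!/bin/sh
# Sketch of the idea behind mk-parallel-dump: one background job per
# database instead of a sequential loop.
# dump_one is a placeholder for: mysqldump ... "$1" > "$1.sql"
dump_one() { echo "dumping $1"; }

for db in datenkrake wikkawiki piwik sprzblog mysql limesurvey; do
  dump_one "$db" &          # run each dump concurrently
done
wait                        # returns once every background dump has exited
echo "all dumps finished"
```

With real mysqldump calls, the wall-clock time approaches that of the single largest database rather than the sum of all of them, which matches the roughly twofold speedup seen in Listing 1.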

Listing 1

Parallel Dump

$ mk-parallel-dump -uroot -ppassword --databases datenkrake,wikkawiki,piwik,sprzblog,mysql,limesurvey --base-dir=db-backup/
CHUNK  TIME  EXIT  SKIPPED DATABASE.TABLE
   db 11.43     0        0 datenkrake
   db  1.34     0        0 limesurvey
   db  2.75     0        0 sprzblog
   db  2.67     0        0 mysql
   db  3.45     0        0 piwik
   db  2.50     0        0 wikkawiki
  all 13.63     0        0 -

Baron Schwartz warns database administrators not to use mk-parallel-dump as a backup solution. If you work with really large tables, you will soon see why: the tool calls mysqldump, but without necessarily enabling its Quick option, so the data must fit into the available RAM before it is written to file. If the tables are too big for this, the system will start to swap, or mk-parallel-dump will throw errors before continuing. If you miss the error message, or run mk-parallel-dump as a cron job, you might be in for a nasty surprise when you try to restore the results. The counterpart to mk-parallel-dump is mk-parallel-restore, which writes the tables you copied back to the database, again using multiple threads.
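Given that warning, any dump job run unattended should at least fail loudly. A minimal sketch of a cron-friendly wrapper, where `run_dump` is a hypothetical stand-in for the real mk-parallel-dump invocation (here it deliberately simulates a failure):

```shell
#!/bin/sh
# Check the dump's exit status instead of trusting it blindly; cron
# mails any stderr output to the administrator.
# run_dump stands in for: mk-parallel-dump ... --base-dir=db-backup/
run_dump() { false; }        # simulate a failing dump for this sketch

if run_dump; then
  status=ok
else
  status=failed
  echo "backup FAILED" >&2   # a real wrapper would also exit non-zero here
fi
echo "backup status: $status"
```

This does not make mk-parallel-dump a backup tool, but it at least ensures a broken dump does not go unnoticed until restore time.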

Stress Test for the DBMS

If you want to test the performance of your database management system, the Maatkit suite includes the mk-log-player tool. Let's see what the tool actually does. Typing:

mk-log-player --split Thread_id \
   --type genlog --only-select \
   --base-dir=stresstest \
     /var/log/mysql/mysql.log

tells Maatkit to search the MySQL general logfile for SELECT queries and write contiguous session files, organized by thread, to the stresstest directory. You can then replay these in parallel against the database management system (DBMS) (Figure 1). Tests like this make sense if you need to compare different configurations of your DBMS.
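What the --split option does can be illustrated with a few lines of awk: route each log line into a per-thread session file keyed on its thread ID. The log format below is deliberately simplified; real general-log entries carry more fields than just the thread ID and the query:

```shell
#!/bin/sh
# Toy version of --split Thread_id: the first field of each line
# selects the session file the query is appended to.
mkdir -p stresstest
awk '{ file = "stresstest/session-" $1; print $0 > file }' <<'EOF'
1 SELECT * FROM piwik_log_visit
2 SELECT * FROM wikka_pages
1 SELECT * FROM piwik_site
EOF
wc -l < stresstest/session-1   # counts the queries routed to thread 1
```

Each session file then holds one client's queries in their original order, which is what lets mk-log-player replay them concurrently without scrambling individual sessions.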

Figure 1: The mk-log-player command extracts queries from logfiles; administrators can then send them in parallel to the database.

After creating the sessions, you can type:

mk-log-player --play -uroot \
   -ppassword ./stresstest \
   --base-dir results h=localhost

to send them to the database. By default, mk-log-player issues the queries stored in the sessions in two parallel threads. It notes how long the DBMS takes to process each query and writes the results for each thread to a separate file in slow-log format, which you can then analyze with mk-query-digest.

Finding Slow Queries

Once you have a sufficient volume of logfiles, you can evaluate the files with mk-query-digest. By default, the tool will use the slow query format, although it can also interpret normal logfiles and binary logs, or – if you can't access the database server directly – evaluate traffic captured by tcpdump. Each query is fingerprinted so that you can group identical queries, such as SELECTs or UPDATEs, and evaluate them separately.
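The fingerprinting idea can be approximated with a simple substitution: strip out literal values so that structurally identical queries collapse into one group. This one-liner is a rough illustration, not mk-query-digest's actual algorithm (which also normalizes whitespace, collapses IN-lists, and more):

```shell
#!/bin/sh
# Replace string and numeric literals with "?" so identical query
# shapes produce identical fingerprints.
fingerprint() {
  printf '%s\n' "$1" | sed -E "s/'[^']*'/?/g; s/[0-9]+/?/g"
}

fingerprint "SELECT * FROM users WHERE id = 42 AND name = 'bob'"
fingerprint "SELECT * FROM users WHERE id = 7 AND name = 'alice'"
# both print: SELECT * FROM users WHERE id = ? AND name = ?
```

Grouping by such fingerprints is what lets the tool report "55 different commands" out of thousands of individual queries.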

The output from mk-query-digest for the stress test (Figure 2) shows that some 19,200 queries, comprising 55 different commands, were issued. The system, a server with 768MB of RAM and a Pentium 3 CPU, took a total of 22 seconds to answer them; the longest response took 32 milliseconds, the shortest 126 microseconds, and the mean response time per query was under one millisecond.

Figure 2: The mk-query-digest output, showing what the DBMS is capable of.

Lower down, the tool lists the 19 queries that occupied the most server time. In this example, the DBMS took longest to serve the Piwik web analysis tool (Figure 3). The overview is followed by a more detailed analysis of these 19 "top consumers," which provides an idea of where bottlenecks occur and whether you can optimize them away. The mk-query-digest command offers more than 50 command-line options that let database administrators specify, for example, that they want to examine a specific database or time of day, or see the results in a different order.

Figure 3: The mk-query-digest output shows the queries that are keeping the system busy.
