A storage engine for every use case

Data Machine

Third-Party Storage Engines

A number of storage engines come with Percona Server and MariaDB. Each fork has a slightly modified version of these engines, mainly to support MariaDB-specific features, but you always should be able to recompile the versions in Percona Server for use in MySQL.

TokuDB was initially developed by a small company called TokuTek, later acquired by Percona. It is optimized for write-intensive workloads: Its indexes have a set of buffers (called a fractal tree) that reduce write operations, flushing a portion of the indexes to disk altogether; the compression rate is very good; and in a slave, the tables can skip the tests on unique indexes and trust the checks already made by the master.

Another interesting feature is clustering indexes. Whereas indexes usually speed up filtering, ordering, and grouping because they are ordered data structures, clustering indexes contain extra columns that don't affect the order of index entries. As a consequence, they don't slow down writes when index entries are added, deleted, or modified; however, when a query only reads columns from a clustering index, a single index search will find all relevant data, which saves time because it eliminates the additional, normally necessary operation of looking for some columns in the table data.

TokuDB supports transactions, and it is usable as a general-purpose engine. Its optimizations aim to reduce I/O operations – in particular, those that hit the disk.

RocksDB , which I mentioned previously as MyRocks , the storage engine from Facebook, is present in Percona Server and MariaDB. The terminology could be a bit confusing here. Facebook uses the name MyRocks, which is not used in MariaDB or Persona Server. Instead, the name RocksDB is used, which could lead to some confusion because its name is identical to the RocksDB NoSQL database. With this storage engine, part of the RocksDB technology is made available in MySQL, and usable via a SQL interface. RocksDB (from now on we will use this term for the storage engine) is optimized for flash storage, to MySQL.

According to various articles and conference speeches from the development team, MyRocks was created because Facebook was not happy with some flaws in RocksDB, in particular so-called write amplification, wherein each time something is written in an InnoDB table, several physical writes occur, which can shorten the life of SSD devices.

RocksDB has a very good compression ratio, which reduces the number of I/O operations and the need for storage space. In general, it is designed to serve most workloads, while keeping resource usage relatively low. CPU usage is generally low, and although both InnoDB and RocksDB performance benefits by adding more memory, it seems to be less critical with RocksDB. InnoDB write I/O operations per second (IOPS) is much more stable compared with InnoDB.

RocksDB does not necessarily outperform InnoDB in all cases, though. The InnoDB user base is much wider, which means it is more battle tested in a wide variety of use cases.

Storage Engines Distributed with MariaDB

Spider is design to implement transparent sharding (i.e., row-wise partitioning). The key to understanding Spider is to understand partitioning. The feature allows you to split a table into multiple physical chunks (partitions), which is generally useful with big tables, because each partition will be smaller than the whole table and therefore faster to read and write and easier to manage.

Each Spider partition can be linked to another table located on the same server or on a remote server. This idea is not much different from that behind the FEDERATED engine; however, instead of linking one table to a remote table, you can link each individual partition to different remote tables. In this way, data can be physically distributed on multiple servers.

Although this mechanism is transparent to the user, you need to remember that each query will result in one or more remote connections to other servers, and Spider doesn't cache data locally, which adds latency to each query.

The CONNECT storage engine defines tables by data type, which allows importing and exporting data stored in different formats (e.g., CSV, JSON, XML, HTML). It can access remote tables from other MySQL or MariaDB servers, or it can access any other DBMS through the Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC) standard.

CONNECT can define a template to parse logfiles in several formats (e.g., by defining the information stored, it can read Apache logfiles), and it has special read-only table types that retrieve information from the system, such as the contents of a directory. CONNECT is not meant to be superfast. Before performing complex operations, it would be a good idea to copy data from a CONNECT table to InnoDB. Also note that transactions are not supported. However, CONNECT is a convenient way to perform complex import/export operations.

Aria is an evolution of MyISAM and was originally meant to replace InnoDB in MariaDB, because InnoDB belongs to Oracle. However, plans to support transactions fully in Aria seem to have been abandoned long ago, and this engine is simply "crash-safe." After a crash, Aria will not lose data and is the main reason it is considered an evolution of MyISAM, which is not crash-safe. However, Aria's write performance does not shine, and it never managed to reach the quality of InnoDB.

Sequence provides virtual tables that return integer sequences. Instead of creating tables, you just query an ephemeral table, keeping in mind its name pattern. For example,

SELECT seq FROM seq_1_to_10_step_2;

returns all numbers from 1 to 10, incremented by 2 (pair numbers). Slightly more complex queries can return different sequences. For example,

SELECT CHAR(s1.seq), CHAR(s2.seq) FROM seq_97_to_122 s1 JOIN seq_97_to_122 s2;

returns all combinations of lowercase characters from a to z.

In general, Sequence is a convenient way to populate a table with initial data (e.g., a sequence of numbers, character strings, or dates).

OQGRAPH (Open Query GRAPH) is a computation engine created to store hierarchical data or graphs (e.g., a directory tree or links between social network users) in a relational database, which was always a problem until MariaDB implemented the WITH syntax, which allows you to retrieve this kind of data easily.

An OQGRAPH table is built on a regular table (e.g., InnoDB) with references to the rows in the underlying table and arcs between them. The arcs can have a weight that represents the strength of the relation. Running a SELECT against the OQGRAPH table with a particular syntax allows you to find the shortest path quickly between two nodes of a graph.

SphinxSE (Sphinx storage engine) allows MySQL and MariaDB to connect to the Sphinx open source search engine to query its indexes, which is very useful to MySQL and MariaDB because their full-text functionality is not very advanced compared with technologies like Sphinx. Although Sphinx is not a trendy technology anymore, it is stable, and its full-text capabilities are battle tested. The plugin is maintained by the Sphinx team and works equally well with its fork Manticore.

Cassandra is a well-known NoSQL database. Although it now uses a subset of the SQL language, when this storage engine was written, Cassandra used the Apache Thrift protocol [2]. Although never widely used in production environments, Cassandra was an interesting proof of concept, showing that a MySQL/MariaDB plugin can allow interaction with the NoSQL world.

Using Multiple Engines

General-purpose storage engines like InnoDB, RocksDB, and TokuDB have a cache for the data and indexes accessed most often and for buffering writes to disk. Often, these caches are given as much memory as possible and a good number of background threads, which means that using different general-purpose engines on the same servers can lead to serious resource problems.

Transactions that involve multiple engines are also a problem for performance, because every engine coordinates its own transactions with different mechanisms. To make transactions consistent across different engines, MySQL has to use the binary log as a coordinator, as well as a two-phase commit internally that is, of course, slow. The general suggestion is to avoid these kinds of situations.

However, there is no reason to avoid the use of different engines on different servers. Even in the case of a master-slave replication chain, it is generally possible to use a certain storage engine on the master and a different one on the slaves – or on some of them, especially when the slaves must deal with a different workload.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs

Support Our Work

ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.

Learn More”>


		<div class=