Introduction to the WARP storage engine.


WarpSQL currently ships a beta release of the WARP storage engine. WARP is based on FastBit, a column store with word-aligned hybrid (WAH) compressed bitmap indexes. The name WARP is an acronym for word-aligned relational partitioned (storage engine). FastBit is a mature database technology that has been used in areas such as particle physics and grid computing.

Columnar storage
The WARP storage engine is a columnar engine: it stores each column separately on disk, rather than storing the entire row together. This has significant I/O advantages, especially for wide tables, because columns that are not referenced by a query are never read from disk. This can result in significant performance improvements compared to a row store.

DML operations are supported
Many columnar engines, such as ClickHouse, are append-only. The WARP engine supports INSERT, UPDATE, and DELETE operations, though they may be more expensive than comparable operations on a row store.

All MySQL data types except GEOMETRY are supported
Unlike many column stores, all MySQL data types are supported, including long text and long blob columns.

Automatic Indexing
The WARP storage engine automatically creates indexes to match the queries executed against WARP tables.

The WARP storage engine uses engine condition pushdown (ECP) to efficiently evaluate, inside the storage engine, complex expressions that MySQL cannot normally optimize. WARP supports more than 15 types of bitmap indexes, and the desired index characteristics can be selected using a column comment.
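For illustration only, a hypothetical sketch of choosing index characteristics via a column comment. The comment string shown is an assumption modeled on FastBit's index specification syntax, not the documented WARP format, so consult the WARP documentation for the real syntax:

```sql
-- HYPOTHETICAL: index characteristics selected via a FastBit-style
-- specification in the column comment. The exact comment format is an
-- assumption here; check the WARP docs before relying on it.
CREATE TABLE events (
  status TINYINT COMMENT '<binning none/><encoding equality/>',
  amount DECIMAL(10,2)
) ENGINE=WARP;
```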

Word-aligned hybrid compressed bitmap indexes
WARP features word-aligned hybrid compressed bitmap indexes. Bitmap indexes have certain advantages and disadvantages compared to traditional sorted indexes such as B*Tree indexes. Bitmap indexes can answer complex query conditions that B*Tree and hash indexes cannot. Unlike traditional storage engines, WARP is able to use more than one index to answer a query.

MySQL 8 supports hash joins.
Make sure you adjust join_buffer_size appropriately to improve the performance of hash joins.
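For example, the buffer can be raised for the current session before running a large analytic join. The 256 MB value below is purely illustrative; tune it to your workload and memory budget:

```sql
-- Raise the hash join buffer for this session only (illustrative value):
SET SESSION join_buffer_size = 256 * 1024 * 1024;

-- Or persist it for all new connections across restarts (MySQL 8+):
SET PERSIST join_buffer_size = 256 * 1024 * 1024;
```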

ACID Compliant with MVCC capabilities
WARP beta 2 supports MVCC transactions, with non-locking REPEATABLE-READ and READ-COMMITTED transaction isolation levels. Statements are atomic and the database is crash safe.

Download the WARP storage engine from Github
https://github.com/greenlion/warp

WarpSQL BETA 2 based on 8.0.21 now available for testing

I have been working very hard on WarpSQL beta 2 since announcing the first beta in July. It has been just over 60 days since beta 1 was released, and beta 2 includes a whole slew of improvements and bug fixes; a few things were also removed that will be added back in later releases. This blog post focuses mainly on what is new. It took a while to make all these changes, but future betas should come more quickly. There is a lot packed into this release.

Since the only thing currently different between WarpSQL and upstream MySQL 8 is the WARP storage engine, I will refer to WARP rather than WarpSQL from now on. That will change in follow-on betas, which will add query parallelization and materialized views.

Beta 2 has numerous on-disk format changes

Beta 1 had a number of bugs that could have resulted in data corruption, including bugs in the sparse bitmap used to mark deletions, as well as the possibility of generating duplicate ROWID values after database restarts. In addition, ROWID generation was on a per-table basis; that is, different tables could contain the same ROWID for different tuples. For these reasons, it is not possible to safely upgrade a beta 1 database to beta 2, because the data may be in a corrupted state.

The per-table delete bitmap has been replaced with a per-instance delete bitmap, and ROWID generation is now global rather than per-table, because a row's globally unique ROWID is placed in the delete bitmap when the row is deleted. If you try to start a beta 2 instance on beta 1 data, the server will fail to start. It is highly recommended that you start with a clean data directory rather than deleting the WARP tables from the old data directory.

In addition to the delete bitmap changes, there is now a "database state file" that stores WARP storage engine state, such as the highest transaction id handed out, ROWID batch information, and the on-disk version of the database, so that the data can be upgraded when a newer version of WARP is started on it. The WARP storage engine now has a framework for upgrading tables in place when on-disk format changes are made, and the version information in the state file is used to determine whether tables need upgrading.

Finally, there is the new commit state file (commits.warp), which records the transaction id of each committed transaction. This is discussed further below.

Support for explicit indexes (CREATE INDEX, or KEY clauses in CREATE TABLE) has been removed in beta 2, because MySQL's storage engine interface is not flexible enough to support range operations on some data types but not others. Range conditions are not supported by the bitmap indexes for string or decimal values, but there was no way to communicate this through the storage engine API. I have an idea for a solution to this problem, which will be implemented in a later beta. Because tables may have been defined with indexes in beta 1, this creates a further incompatibility.

Because ROW based replication requires UNIQUE or PRIMARY KEY indexes to replicate UPDATEs and DELETEs, if these operations are going to be used on WARP tables, it is recommended to use STATEMENT based replication until a future release makes explicit indexes available again.
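Switching to statement-based logging uses standard MySQL syntax; evaluate the usual statement-based replication caveats (non-deterministic functions, etc.) for your workload before changing it:

```sql
-- New sessions will write statement-based binary logs.
-- Equivalent my.cnf setting: binlog_format = STATEMENT
SET GLOBAL binlog_format = 'STATEMENT';
```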

MVCC and ACID compliance

Atomic statements and transactions
Unlike in beta 1, statements against a WARP backed table are now atomic: either an entire statement completes successfully, or it has no effect at all. Thus a DELETE, UPDATE, or INSERT that triggers an error (for example, due to violating a constraint) is completely rolled back. In beta 1, if the database crashed in the middle of an update, or CTRL-C (or KILL) was used to terminate a statement, some rows may have been changed while others were not, which was not atomic. Beta 1 did not support transactions at all.

When an explicit transaction is started, for example with START TRANSACTION (the MySQL default behavior is auto-commit), all of the changes in that transaction are atomic. If the transaction is rolled back, all of the changes made in it are undone; if it commits, all of the changes become visible as an atomic unit, sometimes referred to as a unit of work.
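A minimal sketch of an explicit transaction as a unit of work (the accounts table is hypothetical):

```sql
-- Both statements succeed or fail together as one unit of work.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;      -- make both changes visible atomically
-- ROLLBACK; -- would instead undo both changes
```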

Consistency and Durability (transaction logs)
Unlike InnoDB, which has one global circular log for all transactions, WARP maintains a separate transaction log for each transaction. If the InnoDB log files are too small, performance can suffer because the database has to stop and write changes to disk before the log file can be overwritten.

Each tuple contains the transaction id of the transaction that created the row. WARP is internally an append-only database: records are never updated in place. When a row is updated, a new version of the row is written and the ROWID of the old row is marked as deleted in the delete bitmap. Similarly, when rows are deleted, the ROWID of each deleted row is marked in the bitmap and no row data in the table is modified in any way. Currently it is necessary to use OPTIMIZE TABLE (a blocking operation) to clean up old row data. This will become a background operation in a later beta release.
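Reclaiming the space held by old row versions currently looks like this (the table name is hypothetical; remember that the operation blocks access to the table while it runs):

```sql
-- Rewrites the table, discarding row versions marked in the delete bitmap.
OPTIMIZE TABLE fact;
```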

When a WARP transaction modifies a row (DELETE or UPDATE), the transaction log is marked with a DELETE record for the ROWID that was deleted or updated. INSERT and UPDATE likewise log the ROWID of the new data, and they append the new uncommitted data to the table. WARP has a per-connection insert buffer (defaulting to 1 million rows) in which records are held in memory before being flushed to disk. When traversing rows in a scan, WARP hides any rows that should not be visible to the transaction. More on this in a moment.

Each DML statement creates an implicit savepoint in the log. If a data modification statement fails, any rows it inserted have their ROWID values placed in the delete bitmap; the transaction log is used to undo the changes by writing the new ROWID values into the delete bitmap. Nothing needs to be done for DELETE statements, or for the rows marked as deleted by an UPDATE, because those deletions are only written to the delete bitmap at commit. Any locks that were taken remain on those rows until the transaction terminates.

Transactions that have performed only INSERT statements do not have to do any extra work at commit except update the commit list and sync the commit state file to disk. This is an append operation and should normally be very fast. WARP does, however, do more work than InnoDB when committing a transaction that modified rows, because commit is when the delete bitmap is modified. Only one transaction may be committing at a time.

WARP is not designed for high-volume OLTP; it is geared toward analytic queries. That doesn't mean it can't handle the needs of many low- to medium-volume OLTP applications, but if a database is geared toward high-volume OLTP, InnoDB may be a better choice of engine.

In a later WarpSQL beta, fast-refreshable materialized views will be added to support HTAP (hybrid transactional/analytic processing), so that WARP tables can be automatically updated when underlying InnoDB data (or data in any other storage engine that writes binary logs) is modified. These materialized views can also be used with software such as Apache Calcite, or with reporting tools such as Mondrian, to automatically access pre-aggregated and de-normalized data quickly, without any additional software to keep the views in sync. It is actually possible to do this right now with Flexviews, part of the Swanhart-Toolkit, which you can obtain at https://github.com/greenlion/swanhart-tools. The built-in materialized views in WarpSQL will simply be an improvement on Flexviews.

Row level locking
When a transaction updates or deletes a row, the row is locked with an EX_LOCK so that no other transaction may modify it until the transaction that modified it commits or rolls back.

Because the current transaction and other transactions need visibility information for rows, and because the delete bitmap is not modified until commit, WARP sets special LOCK_HISTORY locks in place of UNDO information for a row. While InnoDB has to copy rows into UNDO segments when they are modified, WARP simply sets a lightweight lock. This means WARP pays less of a penalty for long-running transactions, and UPDATE and DELETE statements do not have to write a lot of extra information to disk, which is good for SSD wear endurance. There is no traversal of UNDO segments through UNDO pointers to find the proper version of a row, but old versions of rows may be traversed in the table itself to answer queries.

When the LOCK IN SHARE MODE clause is used with a SELECT statement, WARP sets shared locks on each row traversed. When FOR UPDATE is used, WARP sets WRITE-INTENTION locks, which are similar to LOCK_EX locks, except that no HISTORY lock is set and the row remains visible to the current transaction. A WRITE-INTENTION lock can be upgraded immediately to a LOCK_EX lock (which also creates a HISTORY lock) without the danger of encountering a lock conflict.
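A sketch of the two locking clauses, against a hypothetical accounts table:

```sql
-- Shared locks on each traversed row; other readers are unaffected:
SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE;

-- WRITE-INTENTION locks: upgradable to LOCK_EX without a lock conflict,
-- and the row stays visible to the current transaction:
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
```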

WARP does not currently support deadlock detection and instead relies on lock wait timeouts. The warp_lock_timeout variable controls how long a transaction waits on locks before aborting. Important note: a lock wait timeout rolls back the entire transaction, not just the current statement.
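For example (the variable name comes from the text above; the value, and the assumption that the unit is seconds, are mine to be verified against the release notes):

```sql
-- Abort lock waits after roughly 10 seconds (unit is an assumption):
SET GLOBAL warp_lock_timeout = 10;
```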

Transaction isolation levels
WARP beta 2 has transaction isolation support with MVCC semantics. MVCC (multi-version concurrency control) is how the I in ACID (atomic, consistent, isolated, durable) is implemented in most RDBMSs. MVCC support in WARP is similar to MVCC in InnoDB. Unlike some other database engines (such as Microsoft SQL Server in its default configuration), WARP does not have to take shared locks to support REPEATABLE-READ or lower transaction isolation levels.

This means that readers do not block writers except when the SERIALIZABLE isolation level is used.

This is a very important property for a transactional database, and it decreases the overhead of REPEATABLE-READ and increases the level of concurrency possible. Like InnoDB, the default WARP transaction isolation level is REPEATABLE-READ. The SERIALIZABLE isolation level takes shared locks (LOCK_SH) on each tuple traversed by statements in a SERIALIZABLE transaction.

During a REPEATABLE-READ transaction, rows deleted or updated by transactions that had not committed when the REPEATABLE-READ transaction started, or by transactions that started after it did, are not visible to reads in the REPEATABLE-READ transaction.

Every SELECT in a REPEATABLE-READ transaction acts upon a "consistent snapshot" of the data, or in MVCC terms, a consistent read view. WARP also supports READ-COMMITTED transaction isolation. In READ-COMMITTED, a SELECT statement may see rows that were committed after the transaction began, even if those rows were written by a transaction that started before the READ-COMMITTED transaction.

WARP does not support READ-UNCOMMITTED. READ-UNCOMMITTED behaves just like READ-COMMITTED, and only committed data is visible to transactions.
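Isolation levels are selected with the standard MySQL syntax, for example (the fact table is hypothetical):

```sql
-- Applies to the next transaction in this session only:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM fact;  -- may see rows committed after the transaction began
COMMIT;
```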

Fast data loading

The WARP storage engine loads data much faster than InnoDB. The following timings are on scale factor 1 of the star schema benchmark (SSB), using the MySQL Shell parallel load utility. Note that after loading, I had to create four indexes on the InnoDB lineorder table to support the SSB queries, and creating the indexes took an additional two minutes and thirty-three seconds. Thus the total load time for the lineorder table on InnoDB was 5 minutes and 35 seconds.

SSB SF1 InnoDB LINEORDER table loaded in 4 parallel threads
---------------------------------------------------------------------
File '/mnt/data/warp/storage/warp/benchmark/ssb-dbgen/sf1/lineorder.tbl' (641.12 MB) was imported in 3 min 2.2281 sec at 3.52 MB/s
Total rows affected in ssb_sf1_warp.lineorder: Records: 6001215 Deleted: 0 Skipped: 0 Warnings: 0

alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_SuppKey), add key(LO_PartKey);
Query OK, 0 rows affected (2 min 33.7517 sec)

SSB SF1 WARP LINEORDER table loaded in 4 parallel threads
---------------------------------------------------------------------
File '/mnt/data/warp/storage/warp/benchmark/ssb-dbgen/sf1/lineorder.tbl' (641.12 MB) was imported in 19.6932 sec at 32.56 MB/s
Total rows affected in ssb_sf1_warp.lineorder: Records: 6001215 Deleted: 0 Skipped: 0 Warnings: 0

It is not necessary to create secondary indexes for WARP. They are created automatically, in a few seconds, the first time a query needs them. Thus a query may run more slowly the first time it is executed than on subsequent executions, because the first access creates the indexes. Those same indexes are then used by any other queries that filter on the same columns. Thus the load time for WARP is approximately 20 seconds versus 5.5 minutes for InnoDB.

Engine condition pushdown enhancements

WARP supports automatic indexing. Any column used in a WHERE clause is automatically indexed with a WAH compressed bitmap index, so that future queries filtering on the same column can be answered faster. WARP can automatically use more than one index to answer a query.

In beta 1, a SELECT statement such as the following would not be able to use an implicit (automatic) index on the c1 column, because an expression that cannot be indexed is applied to c2. In beta 2, c1 is automatically indexed, the index is used to find rows matching c1 = 1, and the SQL layer then filters on c2 to remove any rows that do not match the lower() function call.

SELECT COUNT(*) AS cnt
  FROM t1
 WHERE c1 = 1 AND lower(c2) = 'abc'

Note that this does not help if the AND operator is changed to OR, as MySQL will have to filter all rows at the SQL layer anyway.

Bitmap index JOIN optimization
This new capability is important for queries that use JOINs, because the JOIN conditions previously caused ECP to neglect expressions that could have been satisfied using the automatic indexes. More importantly, the ECP code has been enhanced to support a bitmap index pushdown join optimization.

Consider the following query:

SELECT COUNT(*) AS cnt
  FROM fact
  JOIN dim 
    ON fact.dim_id = dim.id 
 WHERE dim.c1 = 'NEW YORK'

To answer this query, an automatic index is first constructed on the c1 column of the dim table. The id column is then projected from the dim table during ECP evaluation, and a filter is constructed on fact.dim_id so that only fact tuples whose dim_id matches a dim.id value from the 'NEW YORK' rows are read from the fact table. When the hash join to the dim table is performed, only fact tuples that match the filter are probed against the hash table, and MySQL sees only the dim rows that match the filter, because the automatic index is used when MySQL builds the hash table on the dim table.

Functionally, the query is essentially rewritten as the following, assuming there are five rows in the dimension table that match the dim.c1 filter (the id values given are just an arbitrary example):

SELECT COUNT(*) AS cnt
  FROM fact
  JOIN dim 
    ON fact.dim_id = dim.id 
   AND fact.dim_id IN (1,2,3,9,50)
 WHERE dim.c1 = 'NEW YORK'

As part of the star schema optimization, WARP is supposed to automatically make MySQL access the fact table first in the execution plan, and this filtering often improves JOIN performance by 2x to 10x or more, depending on conditions. Unfortunately, there is a bug in beta 2 that sometimes causes this optimization to fail, so the dim table appears first in the execution plan. The STRAIGHT_JOIN hint can be used to ensure the optimization works as expected. This will be fixed shortly.

The follow-up to this blog post will include detailed benchmarks, but the effect of this optimization is clear on SSB (star schema benchmark) query 1.1:

InnoDB SSB SF1 Q1.1
---------------------------------------------------------------------
select sum(lo_extendedprice*lo_discount) as revenue 
  from lineorder 
  join dim_date on lo_orderdatekey = d_datekey 
 where d_year = 1993 
   and lo_discount between 1 and 3 
   and lo_quantity < 25;
+--------------+
| revenue      |
+--------------+
| 446031203850 |
+--------------+
1 row in set (4.7032 sec)

WARP SSB SF1 Q1.1
---------------------------------------------------------------------
select STRAIGHT_JOIN sum(lo_extendedprice*lo_discount) as revenue
  from lineorder
  join dim_date on lo_orderdatekey = d_datekey
 where d_year = 1993
   and lo_discount between 1 and 3
   and lo_quantity < 25;
+--------------+
| revenue      |
+--------------+
| 446031203850 |
+--------------+
1 row in set (0.7554 sec)

Note that I added the STRAIGHT_JOIN hint to the WARP query; this will be unnecessary after the bug mentioned above is fixed. To answer this query, WARP uses automatic indexes on lo_discount, lo_quantity, lo_orderdatekey, and d_year, and pushes the d_datekey values into the automatically constructed lo_orderdatekey filter.

To be clear, the STRAIGHT_JOIN hint does not help InnoDB on this query, and it will hurt performance of most other SSB queries:

InnoDB SSB SF1 Q1.1 w/ STRAIGHT_JOIN hint
+--------------+
| revenue      |
+--------------+
| 446031203850 |
+--------------+
1 row in set (4.9589 sec)

Binaries available for download now

A tarball built on CentOS 8.2 is available here:
https://github.com/greenlion/warp/releases/tag/v8.0.21-1.0.2-beta

It is easy to compile WarpSQL from scratch for testing if you want to, or if you are using a Linux distribution that cannot run those binaries because the libc version differs, etc. I can also compile binaries upon request, but priority goes to Patreon supporters.

WarpSQL is very much a work in progress, but it offers clear advantages in many cases over InnoDB. Automatic indexing is extremely useful, as is bitmap join optimization.

Please consider supporting WarpSQL on Patreon!

I am currently doing some consulting here and there, but my primary focus is full time on WarpSQL, and I put in at least 40 hours per week on it. Please consider supporting this exciting work by supporting it on Patreon: https://patreon.com/warpsql

Higher level supporters can receive consulting and advice on using WARP or any MySQL database or fork/distribution. Supporters also have access to the WarpSQL discord channel, where questions can be asked about WarpSQL, or other versions of MySQL.