Friday, July 28, 2017

Node.js and MySQL on the Oracle Cloud

Let's explore how to deploy a node.js app with MySQL backend on the Oracle Cloud. I want to cover several aspects:
  • How to deploy and initialize a MySQL instance
  • How to adapt the source code
  • How to deploy the application
  • How to scale the application
There are different ways to configure this. I tested the easiest deployment with MySQL Cloud Service and the Application Container Cloud for the node.js part. All configurations are done via the cloud web GUI. There is also a REST interface available. But let's keep that for later.
If you don't have access to the Oracle Cloud you can get a trial access here.

How to deploy a MySQL instance

Once you logged into the Oracle cloud you can create new instances from the dashboard. The following screenshots describe the process.


On the next screen we upload the public key for admin access to the instance. Either upload your own public key or generate a new key pair. (If you generate a new key pair you need to download the private key to your local machine.)

I skipped the backup and monitoring configurations for this demo. Let's focus on the application instead. After creating the instance (approx. 10 min) you can navigate via the dashboard to this instance and get the IP address. This is needed for the next step.
To initialize the database I ran this little script that runs ssh to the instance (using the private key), switch user to "oracle" and then call the MySQL CLI to run a few SQL statements.

How to adapt the source code

The Application Container Cloud sets a few environment variables that should be used inside the application to adapt to the environment. In my case this are the following variables:
  • PORT is the port number that the application should listen on
  • MYSQLCS_USER_NAME is the MySQL user name for the database backend
  • MYSQLCS_USER_PASSWORD is the corresponding password
  • MYSQLCS_CONNECT_STRING is the hostname and port of the database backend
I could have hardcoded the database connection parameters but that is inflexible if the architecture changes. So let's use these variables. The Node.js code looks like this:


How to deploy the application

There are two simple steps needed: Creating an application and defining service bindings. In my case the only service binding is the MySQL backend. But one step after the other. First let's create the application. First you need to create a manifest.json file to describe the application. Here is mine:

Ideally you create a zip archive with the source code, the manifest.json file and all other resources that your application needs. If you want to use my zip archive, feel free. You find it on GitHub.
From the Oracle Cloud Dashboard click on "create instance -> application container" and then select "Create Instance" and "Node.js". (Java SE, Java EE, Python, Ruby and PHP are available as well.)

On the pop-up you define the application artifacts, number of application instances and the memory per instance. After you click "create" the application is deployed automatically within a few minutes.
The last step is to connect the application service with the database backend. To achieve that, click on the application in the application overview page. Here you find the URL under which your application will be available. And on the left hand side you see three tabs:

Overview, Deployments and Administration. Click on "Deployments". Here you can add the service binding as described in the following screenshot:

After modifying the service bindings you have to click "Apply changes". This will restart the application instances. Obviously needed because now the environment variables for the database backend are set correctly.
That's it. We have an application. The URL to access the new app is listed in the application overview tab. Because this URL is not so nice for offering a short url service, I registered a new domain and forwarded that to the anota application. Maybe it is still running? Check here.

How to scale the application

This is really easy. On the application overview tab you can just increase the number of instances and the memory per instance. After applying the changes, the Application Container Cloud platform will deploy new instances, stop spare instances or reconfigure the existing instances. If you use my ANOTA application, go to the report page. The last line prints the hostname of the application server. Requests are automatically load balanced between the available application instances. 

Summary

There are some minor changes to the application to run on the Oracle Cloud Platform: Reading the port variable and database connection parameters from the provided environment variables and that's it. Deployment is really easy via the GUI. And scalability is so simple now that the full Oracle Cloud Plattform is available and can be provisioned within minutes. 


Thursday, June 1, 2017

MySQL Shell - Easy scripting

With the introduction of MySQL InnoDB Cluster we also got the MySQL Shell (mysqlsh) interface. The shell offers scripting in Javascript (default), SQL or Python. This offers a lot more options for writing scripts on MySQL, for example it is much easier now to use multiple server connections in a single script.
A customer recently asked for a way to compare the transaction sets between servers. That is useful when setting up replication or identifying the server that has most transactions applied already. So I wrote this little script which can be executed from the OS shell:

 #!/usr/bin/mysqlsh -f  
 // it is important to connect to the X protocol port,  
 // usually it is the traditional port + "0"  
 //  
 var serverA="root:root@localhost:40010"  
 var serverB="root:root@localhost:50010"  
 shell.connect(serverA)  
 var gtidA=session.sql("SELECT @@global.gtid_executed").execute().fetchOne()[0]  
 shell.connect(serverB)  
 var gtidB=session.sql("SELECT @@global.gtid_executed").execute().fetchOne()[0]  
 //  
 // If you want to use pure XdevAPI the former statements should be  
 //  
 // gtid = session.getSchema("performance_schema").global_variables.select("VARIABLE_VALUE").where("VARIABLE_NAME='gtid_executed'").execute().fetchOne()[0]  
 //  
 println(" ")  
 println ("Transactions that exist only on "+serverA)  
 println (session.sql("SELECT gtid_subtract('"+gtidA+"','"+gtidB+"')").execute().fetchOne()[0])  
 println(" ")  
 println ("Transactions that exist only on "+serverB)  
 println (session.sql("SELECT gtid_subtract('"+gtidB+"','"+gtidA+"')").execute().fetchOne()[0])  
 

Wednesday, April 13, 2016

Query Rewrite Plugin and Binlog for Replication

Starting with MySQL 5.7 we introduced the Query Rewrite Plugin. That tool is really useful for changing queries. Of course the best location to modify the query is the source code of the application, but this is not always possible. Either the application is not under your control or queries are generated from a framework like Hibernate and sometimes it is hard to change the query generation.
If you are interested in details about the Query Rewrite Plugin, I recommend this blogpost from the MySQL Engineering: http://mysqlserverteam.com/the-query-rewrite-plugins/
Recently I was asked how this works in replication environments. Which query goes into the binlog?

If you are using the Rewriter plugin that comes with MySQL 5.7, the answer is easy: This plugin only supports rewriting SELECT queries. SELECT queries don't get into the binlog at all. Simple.

But you might write your own preparse or postparse plugin. In that case you can define the behavior with the server option --log-raw. See documentation here: https://dev.mysql.com/doc/refman/5.7/en/server-options.html#option_mysqld_log-raw
You can either bring the original query to the binlog or the rewritten query. So all flexibility you need. However be aware that --log-raw also affects logging of passwords in the general log file. With --log-raw passwords are written in plain text to the log files. So consider this side effect when switching --log-raw on or off.

Monday, April 4, 2016

MySQL 5.7: Optimizer finds best index by expression

The optimizer in MySQL 5.7 leverages generated columns. Generated columns will physically store data in two cases: Either the column is defined as STORED or you create an index on a virtual column. The optimizer will leverage such an index automatically if it encounters the same expression in a statement. Let's see an example:

mysql> DESC squares;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| dx    | int(10) unsigned | YES  |     | NULL    |       |
| dy    | int(10) unsigned | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM squares;
+----------+
| COUNT(*) |
+----------+
|  2097152 |
+----------+
1 row in set (0.77 sec)


We have a large table with 2 million rows. Selecting rows by the surface area of squares can hardly leverage an index on dx or dy:

mysql> EXPLAIN SELECT * FROM squares WHERE dx*dy=221\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: squares
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2092860
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Now let's add an index over a generated, virtual column that defines the area:

mysql> ALTER TABLE squares ADD COLUMN (area INT AS (dx*dy));
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE squares ADD INDEX (area);
Query OK, 0 rows affected (5.24 sec)
Records: 0  Duplicates: 0  Warnings: 0


Now we can run query again:

mysql> EXPLAIN SELECT * FROM squares WHERE dx*dy=221\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: squares
   partitions: NULL
         type: ref
possible_keys: area
          key: area

      key_len: 5
          ref: const
         rows: 18682
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

 I did not change the query! The WHERE condition is still dx*dy. Nevertheless the optimizer finds the generated column, sees the index and decides to leverage that.
So you can add complex indexes and without changing the application code you can benefit from these indexes. That makes life much easier.

One limitation though: It seems the optimizer recognizes expressions only in the WHERE clause. It will not use the generated column and index for the SELECT expression:

mysql> EXPLAIN SELECT SUM(dx*dy) FROM squares\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: squares
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL

      key_len: NULL
          ref: NULL
         rows: 2092860
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT SUM(area) FROM squares\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: squares
   partitions: NULL
         type: index
possible_keys: NULL
          key: area
      key_len: 5
          ref: NULL
         rows: 2092860
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)


CHECK constraint for MySQL - NOT NULL on generated columns

During our recent TechTour event the idea came up to implement JSON document validation not necessarily via foreign keys (as I have shown here) but to define the generated column as NOT NULL. The generation expression must be defined in a way that it returns NULL for invalid data.
DISCLAIMER: This has already been explored by yoku0825 in his blogpost. He deserves all credit!

Let's do a short test:

mysql> CREATE TABLE checker ( 
    i int, 
    i_must_be_between_7_and_12 BOOLEAN 
         AS (IF(i BETWEEN 7 AND 12, true, NULL))  
         VIRTUAL NOT NULL);
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO checker (i) VALUES (11);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO checker (i) VALUES (12);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO checker (i) VALUES (13);
ERROR 1048 (23000): Column 'i_must_be_between_7_and_12' cannot be null




As you can see I used the column name to create a meaningful error message when inserting invalid data. It is perfectly possible to add a generated validation column for each data column so that you run several check constraints.
Or you can even check a combination of columns:

mysql> CREATE TABLE squares (
     dx DOUBLE, 
     dy DOUBLE, 
     area_must_be_larger_than_10 BOOLEAN 
           AS (IF(dx*dy>10.0,true,NULL)) NOT NULL);
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO squares (dx,dy) VALUES (7,4);

Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO squares (dx,dy) VALUES (2,4);

ERROR 1048 (23000): Column 'area_must_be_larger_than_10' cannot be null

As generated columns are virtual by default this costs no extra storage. Data volume is the same. The expression is evaluated when inserting or updating data.
If you add a validation column to an already existing table and want to verify all existing rows, you could define the validation column as STORED (instead of the default VIRTUAL). This will fail if there are any invalid rows in your existing data set. However in normal operation a virtual column seems more appropriate for performance reasons. So I recommend to always use VIRTUAL validation columns and check pre-existing data separately with a small procedure.

Tuesday, February 2, 2016

Looking for the smallest possible MySQL Footprint

MySQL is known and famous for it’s simplicity and small size, especially compared to other RDBMSs. But what if you want to deploy on tiny hardware? I mean something even smaller than RaspberryPi?
I tested three steps to make the MySQL footprint as small as possible. All my tests were compiled for Oracle Linux 7 on x64 platform. I did not test any ARM cross compile. And these are the steps:
  1. Compile my own binary
  2. Remove all unnecessary tools/files
  3. Strip symbol information from binary

Let’s take a closer look at the tree steps.

Compile my own binary

MySQL is available as a source release. Using that you can configure the make process. That is documented pretty well in the Reference Manual. By switching off some options I was able to reduce the binary size from 240MB to 216MB. I switched off some performance_schema features, removed some storage engines that are irrelevant in most environments anyway (like ARCHIVE, NDB, EXAMPLE, …) and I removed all options for profiling. The final CMAKE statement is at the bottom of this post.

Remove unnecessary tools

I removed scripts and binaries from the distribution. Ted has written an interesting blog post about this. The remaining share directory contains some SQL scripts for installing additional tools. You need these at most once during setup and never again. So let’s remove these. If you are happy to live without textual error messages you can also remove the errmsg-utf8.txt file as well and all translations in the country specific subdirs. And if you can live with reduced charset support, you can even remove the rest of the share directory. You are running essentially only with a mysqld binary.

Strip symbol information from binary

All compilations are done with extended diagnosis information in the binary. These symbol data helps if you want to analyze a core dump for example. Symbols are included by default in the MySQL binaries. These take a surprisingly large amount of space. You can remove these symbols from the binary with the tool “strip(1). After stripping the binary size came down to 24MB, which is only 10% of the initial size.

More ideas

There are some more options to use either system libraries or the libraries that come with the source code. Using existing libraries from the system might help save a few bytes.

Summary

It is possible to make MySQL very lean for your (embedded) system. Despite all the functionality that we added to MySQL in the releases since MySQL 5.1 you get a full featured RDBMS with only a handful of MB. Here are my final results:

  • MySQL 5.6, minimal features: 79MB, stripped 13MB
  • MySQL 5.7, default features: 240MB, stripped 24MB
  • MySQL5.7, minimal features: 216MB, stripped 24MB (removing features brings minimal savings only)

Addendum

This is the CMAKE statement I used to compile MySQL 5.7 on Oracle Linux 7:
cmake . -DCMAKE_INSTALL_PREFIX=/home/testy/TQ/dist-mysql-5.7.10/        \
        -DDOWNLOAD_BOOST=1                                              \
        -DWITH_BOOST=/home/testy/TQ/boost/                              \
        -DDISABLE_PSI_COND=1   \
        -DDISABLE_PSI_FILE=1   \
        -DDISABLE_PSI_IDLE=1   \
        -DDISABLE_PSI_MEMORY=1 \
        -DDISABLE_PSI_METADATA=1 \
        -DDISABLE_PSI_MUTEX=1 \
        -DDISABLE_PSI_RWLOCK=1 \
        -DDISABLE_PSI_SOCKET=1 \
        -DDISABLE_PSI_SP=1     \
        -DDISABLE_PSI_STAGE=1  \
        -DDISABLE_PSI_STATEMENT=1  \
        -DDISABLE_PSI_STATEMENT_DIGEST=1    \
        -DDISABLE_PSI_TABLE=1 \
-DWITH_ARCHIVE_STORAGE_ENGINE=0 \
-DWITH_BLACKHOLE_STORAGE_ENGINE=0 \
-DWITH_EXAMPLE_STORAGE_ENGINE=0 \
-DWITH_FEDERATED_STORAGE_ENGINE=0 \
-DWITH_PARTITION_STORAGE_ENGINE=0 \
-DWITH_PERFSCHEMA_STORAGE_ENGINE=0 \
-DENABLED_PROFILING=0 \
-DENABLE_DEBUG_SYNC=0 \
-DENABLE_DTRACE=0 \
-DENABLE_GCOV=0 \
-DENABLE_GPROF=0 \
-DOPTIMIZER_TRACE=0 \
-DWITH_CLIENT_PROTOCOL_TRACING=0 \
-DWITH_DEBUG=0 \
-DWITH_INNODB_EXTRA_DEBUG=0 

Thursday, November 26, 2015

JSON memory consumption

I got some more questions on the new JSON data type and functions during our TechTours. And I like to summarize the answers in this blogpost.

Memory consumption 

The binary format of the JSON data type should consume more memory. But how much? I did a little test by comparing a freshly loaded 25,000 row dataset stored as JSON and stored as TEXT. Seven top level attributes per JSON document. Average JSON_DEPTH is 5.9 . Let's see:
mysql> DESC data_as_text;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | NO   | PRI | NULL    |       |
| doc   | text    | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*),AVG(JSON_LENGTH(doc)) FROM data_as_text;
+----------+-----------------------+
| COUNT(*) | AVG(JSON_LENGTH(doc)) |
+----------+-----------------------+
|    25359 |                7.0000 |
+----------+-----------------------+
1 row in set (0.81 sec)

mysql> DESC data_as_json;
+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
| doc   | json    | NO   |     | NULL    |                |
+-------+---------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*),AVG(JSON_LENGTH(doc)) FROM data_as_json;
+----------+-----------------------+
| COUNT(*) | AVG(JSON_LENGTH(doc)) |
+----------+-----------------------+
|    25359 |                7.0000 |
+----------+-----------------------+
1 row in set (0.08 sec)

mysql> select name,allocated_size/1024/1024 AS "size in MB" from information_schema.innodb_sys_tablespaces where name like "%temp%";
+-------------------+-------------+
| name              | size in MB  |
+-------------------+-------------+
| temp/data_as_json | 23.00390625 |
| temp/data_as_text | 22.00390625 |
+-------------------+-------------+
2 rows in set (0.00 sec)
The increased memory consumption is 1/22 in this case, which is roughly 4,5%. At the same time you see the benefit: The full table scan with some JSON operation has a 90% reduction in runtime when using JSON datatype.
Don't take this number for real. Of course it depends on the number of JSON attributes, character set and others. Just a rough indication. If you want all the details look at the JSON architecture in WL#8132.