This post is part of a series I am writing weekly while working on the MXS-2 issue for MaxScale as part of Google Summer of Code 2015.
I made a number of changes this week. The first was changing the HASHTABLE in BGD_SESSION. Previously, it mapped file names to file handles, but it needs to store much more information than that. For this purpose, I added a new structure named table_info. The commit with these changes can be seen here.
This was followed by a small tweak where my mentor (Markus) and I decided to eliminate regular expressions from the filter altogether. Since the filter expects a list of tables for which the data should be logged, we introduced a parameter named tables which expects a comma-separated list of unique table names. Here, a unique table name follows the format <db>.<table>. Only the tables specified in this list are processed; all others are ignored. The changes for this are spread over two commits which can be observed here and here. I also made a small optimization to avoid generating the data file name over and over again by storing it in a new BGD_SESSION variable named current_table_data_file. The changes can be seen here.
For extracting data from the CREATE TABLE command, I needed to store the schema and column definitions. query_classifier deals with extracting data from queries, so two new structures were introduced there, namely TableSchema and ColumnDef. Their definitions are as follows:
typedef struct column_def ColumnDef;

typedef struct table_schema
{
    char *dbname;      // database name
    char *tblname;     // table name
    int ncolumns;      // number of columns
    ColumnDef *head;   // head of list of columns
    ColumnDef *tail;   // tail of list of columns
} TableSchema;

struct column_def
{
    enum enum_field_types type;  // column's data type
    char *colname;               // column name
    void *defval;                // default value
    ColumnDef *next;             // next column
};
Currently, TableSchema stores a linked list of ColumnDef. I intend to change this next week to something that gives slightly better mapping and access. A new function in query_classifier named skygw_get_schema_from_create extracts the required data from CREATE TABLE queries and returns a TableSchema object. bgdfilter handles QUERY_OP_CREATE_TABLE in clientReply. All the changes for this can be reviewed here.
Until now, my implementation had not handled the case where a db name is specified in the query itself, e.g. CREATE TABLE test.t1(c1 int). In this case the db name should be taken from the query. So, I renamed current_db and new_db in BGD_SESSION to default_db and active_db. active_db always holds the db name for the query currently in execution, whereas default_db holds the db name specified in the mysql command on the terminal while connecting to MaxScale, or the db name set by a USE DATABASE command. The change for this can be seen here.
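The resolution rule can be sketched as follows. This is an illustrative helper under my own naming, not the filter's code: if the table reference is qualified, the db comes from the query; otherwise the session's default_db applies.

```c
#include <string.h>

/* Given a possibly qualified table reference ("test.t1" or "t1") and
 * the session's default_db, decide which database the query acts on.
 * Writes the db and table parts into caller-provided buffers. */
static void resolve_db(const char *table_ref, const char *default_db,
                       char *db, size_t dbsz, char *tbl, size_t tblsz)
{
    const char *dot = strchr(table_ref, '.');
    if (dot != NULL)
    {
        size_t n = (size_t)(dot - table_ref);
        if (n >= dbsz)
        {
            n = dbsz - 1;   /* truncate rather than overflow */
        }
        memcpy(db, table_ref, n);
        db[n] = '\0';
        strncpy(tbl, dot + 1, tblsz - 1);
        tbl[tblsz - 1] = '\0';
    }
    else
    {
        strncpy(db, default_db, dbsz - 1);
        db[dbsz - 1] = '\0';
        strncpy(tbl, table_ref, tblsz - 1);
        tbl[tblsz - 1] = '\0';
    }
}
```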
I still have a few tasks pending; they are as follows:
Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!
This post covers all the changes I made in the past two weeks.
As mentioned in the previous post, the options parameter now expects the path of the directory in which all the data files should be stored. For creating the directory and checking all the related errors, I added a function named create_dir. With this, the file pointer in BGD_INSTANCE was no longer required. A new variable named current_db has been added in BGD_SESSION, which will be required for naming the data files. All these changes can be reviewed here.
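A minimal sketch of what such a create_dir helper might do (assuming a single-level path; the real function and its error handling may differ):

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

/* Create the data directory if it does not exist yet; an already
 * existing directory (EEXIST) is treated as success.
 * Returns 0 on success, -1 on any other failure. */
static int create_dir(const char *path)
{
    if (mkdir(path, 0755) == 0 || errno == EEXIST)
    {
        return 0;
    }
    return -1;
}
```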
We have finalized the naming convention for data files. A data file will be named <db>.<table>.data. For example, if we have a database named test and a table inside it named t1, then the data file will be named test.t1.data. It will be stored in the directory provided by the options parameter.

Now we can have multiple files open in a single session. I used a HASHTABLE, added to BGD_INSTANCE, for mapping file names to their respective file descriptors. Since we do not have a closeInstance or freeInstance method, it is not possible to decide when and where to close these file descriptors. Closing them in closeSession or freeSession is not possible either, as the descriptors are shared by multiple sessions under a single instance. To manage this, a static linked list of instances is created which is freed only when MaxScale goes down. The code changes for this can be reviewed here.

To make the code a little more modular, a log_insert_data function was added. I plan to write a generic function to log the data from any query, if possible. With this I also discovered that the data should not be logged if there is an error when executing the query. So, instead of logging data from routeQuery, as was done before, it is now logged from clientReply, and only if the query executed successfully. Two new filter functions were added for this to work, namely setUpstream and clientReply. Also, the query buffer (GWBUF) is stored in BGD_SESSION so that it can be processed in clientReply. These changes can be reviewed here.
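Composing a path under this convention is a one-liner; the helper name here is hypothetical, but it shows the <db>.<table>.data shape:

```c
#include <stdio.h>

/* Compose the data-file path following the <db>.<table>.data naming
 * convention. Returns the number of characters that would be written
 * (snprintf semantics); truncates if the buffer is too small. */
static int make_data_filename(char *buf, size_t sz, const char *dir,
                              const char *db, const char *tbl)
{
    return snprintf(buf, sz, "%s/%s.%s.data", dir, db, tbl);
}
```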
To keep current_db updated, I had to handle the USE DATABASE query. Its command byte is set to MYSQL_COM_INIT_DB. A variable named new_db was added in BGD_SESSION so that current_db is updated only if the database is changed successfully. These changes can be reviewed here.
I am yet to retrieve data from INSERT queries; I ran into a lot of problems when trying to do so. This week, I am targeting to complete the following functionalities:
Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!
For the first week, my target was to write a simple filter for MaxScale which logs only INSERT queries to a file. I have named the filter bgdfilter for now, which stands for BigData Filter. Following are the filter INSTANCE and SESSION data structures:
typedef struct {
    char *format;    /* Storage format, JSON or XML (Default: JSON) */
    char *path;      /* Path to a folder in which to store all data files */
    char *match;     /* Mandatory regex to match against table names */
    regex_t re;      /* Compiled regex text */
    FILE *fp;
    char *filebase;
} BGD_INSTANCE;

typedef struct {
    DOWNSTREAM down;
    int active;
} BGD_SESSION;

A few of these fields are not used by this simple filter. Also, these are not the final data structures; I will keep updating them as features are implemented. For now, the path variable represents the log file.
I have forked MaxScale repo and created a new branch named MXS-2 from develop branch. My latest code changes can be seen here.
For checking the bgdfilter, the configuration can be written as:

[BGD]
type=filter
module=bgdfilter

[RWSplitRouter]
type=service
router=readwritesplit
servers=master,s1,s2,s3
user=maxuser
passwd=C8315EB77701CED103285274D0E022FB
max_slave_connections=100%
localhost_match_wildcard_host=1
filters=BGD
The path parameter is optional. By default the log file is created at /tmp/bgd.
Next up is retrieving data from INSERT queries and writing it into data files. I will also decide on a naming convention for the data files.
Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!
I have started working with MariaDB as part of Google Summer of Code 2015. I will be working on MaxScale issue MXS-2. The first step before starting work was to install and configure MaxScale. This post will give a general idea about the MaxScale configuration.
MaxScale works with almost all recent versions of MariaDB (5.5 and later).
All the above links give enough instructions on how to get started with MariaDB replication and set up MaxScale. Apart from these, following is some useful stuff I discovered while setting things up.
Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!
Hello All,
Yesterday was the hard 'pencils down' date for Google Summer of Code 2014. If you have been following my blog, you will know that I have been working on adding support for the OR REPLACE, IF NOT EXISTS and DROP…IF EXISTS commands into MariaDB for all objects. MDEV-5359 gives details about the project. This blog post summarizes the work I have done in the last three months.
I started by studying the MariaDB code base. I started off with Bazaar and Launchpad but quickly moved to the GitHub repo, as I am more comfortable with git. Also, my project's changes will be applied to MariaDB version 10.1, which allowed me to switch to GitHub.
After getting into the coding standards and the workflow of query execution in MariaDB in the first week, I picked up the simplest command of them all, which required very little code change (at least that is what I thought at the time): CREATE OR REPLACE DATABASE.
By the end of the second week I was all into coding. I thought of writing blog posts every week to keep track of what I did each week, and my mentor, Alexander, gave a thumbs up to that. Believe me, writing the blog was the most important decision I took; it helped me a lot. The response I got for my blog was also great. Many developers from the MariaDB mailing lists joined in and gave their input about what should and should not be in the blog. So, thanks to all of them, my blog got better and better.
Again, making use of the blog posts I have written and simplifying my job, following are the links to the work I did each week, from the first week to the last. Please visit them all to get acquainted with all the changes I have made.
I also made some changes in the last week, which can be checked in my repository. You can find my repository here.
Ending this, I would like to thank everyone for allowing me to work on such an amazing project, and everyone who helped me out on IRC, on the mailing lists, and everywhere else. :)
Thanks a lot!