Sriram Patil
Master of Technology (Computer Science) student @ IIIT Hyderabad
Follow Me on Twitter!
  • MXS-2: Week 4

    This post is part of a series of posts that I am writing weekly while fixing the MXS-2 issue for MaxScale as part of Google Summer of Code 2015.

    I made a number of changes this week. The first was changing the HASHTABLE in BGD_SESSION. Previously, it was used for mapping file names to file handles, but it needs to store much more information than that. For this purpose, I added a new structure named table_info. The commit with these changes can be seen here.

    Next came a small tweak: my mentor (Markus) and I decided to eliminate the regular expressions from the filter altogether. Since the filter expects a list of tables for which the data should be logged, we introduced a parameter named tables which expects a comma-separated list of unique table names. Here, unique table names signify the format <dbname>.<tblname>. Only the tables specified in this list are processed; others are ignored. The changes for this are spread over two commits which can be observed here and here. I also made a small optimization: instead of generating the data file name over and over again, I now store it in a new variable in BGD_SESSION named current_table_data_file. The changes can be seen here.

    For extracting data from the CREATE TABLE command, I needed to store the schema and column definitions. query_classifier deals with extracting data from queries, so two new structures were introduced there, namely TableSchema and ColumnDef. Their definitions are as follows:

    typedef struct column_def ColumnDef;
    
    typedef struct table_schema
    {
        char *dbname;           // database name
        char *tblname;          // table name
    
        int ncolumns;           // number of columns
    
        ColumnDef *head;        // head of list of columns
        ColumnDef *tail;        // tail of list of columns
    } TableSchema;
    
    struct column_def
    {
        enum enum_field_types type;     // column's data type
        char *colname;                  // column name
        void *defval;                   // default value
    
        ColumnDef *next;                // next column
    };
    

    Currently, TableSchema stores a linked list of ColumnDef. I intend to change it next week to something that gives better mapping and access. A new function in query_classifier named skygw_get_schema_from_create extracts the required data from CREATE TABLE queries and returns a TableSchema object. bgdfilter handles QUERY_OP_CREATE_TABLE in clientReply. All the changes for this can be reviewed here.

    There was a case I had not handled in my implementation until now: a db name specified in the query itself, e.g. CREATE TABLE test.t1(c1 int). The db name should be taken from the query in this case. So, I renamed current_db and new_db in BGD_SESSION to default_db and active_db. active_db always holds the db name for the query currently in execution, whereas default_db holds the db name given to the mysql command when connecting to MaxScale, or the db name set by the last USE command. The change for this can be seen here.

    I still have a few tasks pending; they are as follows:

    1. Creating a metadata file for tables
    2. Logging data from INSERT queries
    3. Making use of the “format” parameter for data logging. (JSON/XML)

    Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!

    • 8 years ago
    • #maxscale
    • #mariadb
    • #gsoc
    • #gsoc15
  • MXS-2: Week 2 and 3

    This post covers all the changes I made in the past two weeks.

    As mentioned in the previous post, the options parameter now expects the path of the directory in which all the data files should be stored. For creating the directory and checking all the related errors, I have added a function named create_dir. With this, the file pointer in BGD_INSTANCE was no longer required. A new variable named current_db has been added in BGD_SESSION, which will be required for naming the data files. All these changes can be reviewed here.

    We have finalized the naming convention for data files. A data file is named <dbname>.<tblname>.data. For example, if we have a database named test and a table inside it named t1, the data file is named test.t1.data. It is stored in the directory provided by the options parameter.

    Now we can have multiple files open in a single session. I used a HASHTABLE for mapping file names to their respective file descriptors; it is added in BGD_INSTANCE. Since we do not have a closeInstance or freeInstance method, it is not possible to decide when and where to close these file descriptors. Closing them in closeSession or freeSession is not possible either, as the descriptors are shared by multiple sessions under a single instance. To manage this, a static linked list of instances is created which is freed only when MaxScale goes down. The code changes for this can be reviewed here.

    To make the code a little more modular, a log_insert_data function was added. I plan to write a generic function to log the data from any query, if possible. With this I also discovered that the data should not be logged if there is an error when executing the query. So, instead of logging data from routeQuery, as was done before, it is logged from clientReply, and only if the query executed successfully. Two new filter functions were added for this to work, namely setUpstream and clientReply. Also, the query buffer (GWBUF) is stored in BGD_SESSION so it can be processed in clientReply. These changes can be reviewed here.

    To keep current_db updated, I had to handle the USE query, whose command byte is set to MYSQL_COM_INIT_DB. A variable named new_db is added in BGD_SESSION so that current_db is updated only if the database is changed successfully. These changes can be reviewed here.

    I am yet to retrieve data from INSERT queries; I ran into a lot of problems while trying to do so. This week, I am targeting the following functionalities:

    1. Retrieving data from INSERT query
    2. Make use of a match regex or specify table names directly. (To be decided)
    3. Create data files when CREATE TABLE is executed.

    Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!

    • 8 years ago
    • #maxscale
    • #gsoc15
    • #mariadb
    • #gsoc
  • MXS-2: Week 1

    For the first week, my target was to write a simple filter for MaxScale which logs only INSERT queries to a file. I have named the filter bgdfilter for now, which stands for BigData Filter. Following are the filter INSTANCE and SESSION data structures:

    typedef struct {
        char *format;   /* Storage format JSON or XML (Default: JSON) */
        char *path;     /* Path to a folder where to store all data files */
        char *match;    /* Mandatory regex to match against table names */
    
        regex_t re;     /*  Compiled regex text */
    
        FILE *fp;
        char *filebase;
    } BGD_INSTANCE;
    
    
    typedef struct {
        DOWNSTREAM down;
        int active;
    } BGD_SESSION;

    A few of these fields are not used in this simple filter. These are not the final data structures either; I will keep updating them as features are implemented. Also, the path variable currently represents the log file.

    I have forked MaxScale repo and created a new branch named MXS-2 from develop branch. My latest code changes can be seen here.

    For testing the bgdfilter, the configuration can be written as:

    [BGD]
    type=filter
    module=bgdfilter
    
    [RWSplitRouter]
    type=service
    router=readwritesplit
    servers=master,s1,s2,s3
    user=maxuser
    passwd=C8315EB77701CED103285274D0E022FB
    max_slave_connections=100%
    localhost_match_wildcard_host=1
    filters=BGD
    

    The path parameter is optional. By default the log file is created at /tmp/bgd.

    Next up is retrieving data from INSERT queries and writing it into data files. I will also decide a naming convention for the data files.

    Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome!

    • 8 years ago
    • #maxscale
    • #mariadb
    • #gsoc15
    • #gsoc
  • MaxScale Installation and Configuration

    I have started working with MariaDB as part of Google Summer of Code 2015. I will be working on MaxScale issue MXS-2. The first step before starting work was to install and configure MaxScale. This post will give a general idea about the MaxScale configuration.

    MaxScale works with almost all recent versions of MariaDB (>= 5.5). Setting it up involved the following steps:

    1. Running multiple MariaDB instances on a single machine. Using this blog, I configured 4 instances on my machine. It also provides a simple script for the same.
    2. Setting up master-slave replication for MariaDB. I configured one master and three slaves on a single machine. There are two resources for setting up the replication: the official MariaDB blog post and Ivan’s blog.
    3. It is better to install MaxScale from source code. The official documentation for building MaxScale from source lists the requirements for several Linux distributions. I am using Ubuntu 14.04, and apart from the packages listed in the documentation, I had to install libcurl4-openssl-dev and libpcre3-dev for cmake to run.
    4. Again, for configuring MaxScale, the official blog post and Ivan’s blog are sufficient. One can also check that it is working by executing the example commands mentioned in the blogs.

    All the above links give enough instructions on how to get started with MariaDB replication and setting up MaxScale. Apart from these, the following are some useful things I discovered while setting up.

    • To check for replication errors, use the show slave status\G command on the slave database server. The error field will be empty if the replication setup is fine.
    • Always refer to “localhost” by a loopback address (e.g. 127.0.0.1) in configuration files as well as when specifying the host to connect to in the mysql command.
    • Use localhost_match_wildcard_host=1 when specifying the router service. This allows MaxScale to match ‘%’ with ‘localhost’.
    • Turn on debugging to spot errors in the MaxScale setup. Add the log flags in the configuration file under the [maxscale] section; the flags are log_trace, log_messages and log_debug. Set all of them to 1. The log files are stored in the folder $MAXSCALE_HOME/log.
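    Putting the last tip together, the debugging section of the configuration file would look like this:

```
[maxscale]
log_trace=1
log_messages=1
log_debug=1
```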

    Feel free to point out mistakes, if any. Also, suggestions and/or reviews are welcome! 

    • 8 years ago
    • #maxscale
    • #mariadb
    • #gsoc15
    • #gsoc
  • Adding support for OR REPLACE, IF NOT EXISTS and IF EXISTS (Pencils down)

    Hello All,

    Yesterday was the hard ‘pencils down’ date for Google Summer of Code 2014. If you have been following my blog, you know that I have been working on adding support for OR REPLACE, IF NOT EXISTS and DROP…IF EXISTS commands in MariaDB for all objects. MDEV-5359 gives details about the project. This blog post summarizes the work I have done over the last three months.

    I started with studying the MariaDB code base. I started off with Bazaar and Launchpad but immediately moved to the GitHub repo, as I am more comfortable with git. Also, since my project’s changes will be applied to MariaDB version 10.1, I was able to switch to GitHub.

    After getting acquainted with the coding standards and the workflow of query execution in MariaDB in the first week, I picked up the simplest command of them all, which required very little code change (at least that is what I thought at the time): CREATE OR REPLACE DATABASE.

    By the end of the second week, I was deep into coding. I thought of writing a blog post every week to keep track of what I did, and my mentor, Alexander, gave a thumbs up to that. Believe me, writing the blog was the most important decision I took; it helped me a lot. The response I got for my blog was also great: many developers from the MariaDB mailing lists joined in and gave their input about what should and should not be in the blog. So, thanks to all of them, my blog got better and better.

    Making use of the blog posts I have written, the following are links to the work I did each week, from the first week to the last. Please visit them all to get acquainted with all the changes I have made:

    • CREATE OR REPLACE DATABASE, CREATE USER/ROLE/SERVER IF NOT EXISTS and DROP USER/ROLE IF EXISTS
    • CREATE PROCEDURE/VIEW IF NOT EXISTS
    • CREATE TRIGGER/FUNCTION(STORED and UDF) IF NOT EXISTS, DROP FUNCTION(UDF) IF EXISTS
    • All about bin logging
    • Fixed MDEV-6409
    • CREATE OR REPLACE SERVER/TRIGGER
    • CREATE OR REPLACE PROCEDURE/FUNCTION(STORED)/EVENT
    • CREATE OR REPLACE FUNCTION(UDF)/USER/ROLE, CREATE DATABASE IF NOT EXISTS and permission checks
    • Test cases and bug fixes
    • CREATE OR REPLACE INDEX, Replication test cases and bug fixes

    I have also made some changes in the last week, which can be checked in my repository. You can find my repository here.

    Ending this, I would like to thank everyone for allowing me to work on such an amazing project, and everyone who helped me out on IRC, on the mailing lists, and everywhere else. :)

    Thanks a lot!

    • 9 years ago
    • #GSoC2014
    • #MariaDB
    • #PencilsDown
© 2011–2024