10: Lexical Scanner Implementation (2)

2018/10/25

I modified some of the code that I wrote yesterday.

src/sql/symbol.rs

pub struct Symbol<'a> {
    name: &'a str,
    len: usize,
    token: Token,
    group: Group,
}

I changed Symbol.name from String to &str. A String owns a heap-allocated buffer, whereas a &str is just a borrowed reference (a pointer and a length) to string data stored elsewhere. Every name is a fixed string literal, which is compiled into the program's static data, so there is no reason to copy it to the heap; borrowing it avoids the allocation.

You may notice the <'a> syntax; it is Rust's lifetime syntax. A reference is only valid while the data it points to is alive, so I need to tell the compiler that the string borrowed by name must live at least as long as the Symbol.
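
To make the lifetime rule concrete, here is a minimal sketch. It uses a cut-down Symbol without the token and group fields, the sym constructor and the two demo functions are hypothetical names, and it assumes that len holds the length of name, which the post does not state.

```rust
/// Cut-down stand-in for the post's `Symbol` (token and group omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Symbol<'a> {
    pub name: &'a str,
    pub len: usize, // assumption: the length of `name` in bytes
}

/// Hypothetical constructor: borrows `name` instead of copying it to the heap.
pub fn sym<'a>(name: &'a str) -> Symbol<'a> {
    Symbol { name, len: name.len() }
}

/// A string literal has type `&'static str`, so the symbol built from it
/// is a `Symbol<'static>` and may be returned or stored anywhere.
pub fn add_symbol() -> Symbol<'static> {
    sym("add")
}

/// Borrowing from a local `String` also compiles, but the symbol cannot
/// outlive that `String`.
pub fn len_of_owned_name() -> usize {
    let owned = String::from("select"); // heap-allocated, dropped at the end of this function
    let s = sym(&owned); // `s` borrows from `owned`
    // Only a `usize` leaves the function. Returning `s` itself would be
    // rejected by the compiler because `owned` does not live long enough.
    s.len
}
```

Since every name in this project is a literal, each Symbol ends up with the 'static lifetime and the borrow never gets in the way.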

src/sql/symbol.rs

pub enum Group {
    DataType,
    DoubleKeyword,
    MultiKeyword,
    Function,
    Keyword,
}

I added more variants to Group: DataType covers Int, Char, etc., and DoubleKeyword means a two-word keyword.

src/sql/symbol.rs

pub enum Token {
    /* SQL Keywords */

    //...
    //...

    /* SQL Function */
    AVG,
    COUNT,
    MAX,
    MIN,
    SUM,

    /* SQL Data Type */
    Char,
    Double,
    Float,
    Int,
    Varchar,
}

I updated the Token enum. Yesterday it contained only the keywords; today I filled in the functions and data types. I plan to implement only the functions and data types listed in Token, although other DBMSs implement more.

You might wonder what the difference between Varchar and Char is. According to the MySQL documentation, the Char and Varchar types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.

The Char and Varchar types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.

The following table shows the difference.

Value       CHAR(4)   Storage Required   VARCHAR(4)   Storage Required
''          '    '    4 bytes            ''           1 byte
'ab'        'ab  '    4 bytes            'ab'         3 bytes
'abcd'      'abcd'    4 bytes            'abcd'       5 bytes
'abcdefgh'  'abcd'    4 bytes            'abcd'       5 bytes
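
The numbers in the table can be reproduced with a small sketch. The function names are hypothetical, and the code assumes a single-byte character set, a 1-byte length prefix for VARCHAR (which holds for columns of at most 255 bytes), and silent truncation of over-long values as in the last row of the table.

```rust
/// Value as stored in a CHAR(n) column: cut to `n` characters,
/// then right-padded with spaces up to exactly `n` characters.
pub fn char_stored(value: &str, n: usize) -> String {
    let truncated: String = value.chars().take(n).collect();
    format!("{:<width$}", truncated, width = n)
}

/// Value as stored in a VARCHAR(n) column: cut to `n` characters, no padding.
pub fn varchar_stored(value: &str, n: usize) -> String {
    value.chars().take(n).collect()
}

/// CHAR(n) always occupies `n` bytes (assuming one byte per character).
pub fn char_bytes(n: usize) -> usize {
    n
}

/// VARCHAR(n) occupies the data bytes plus a 1-byte length prefix
/// (assuming one byte per character and a column of at most 255 bytes).
pub fn varchar_bytes(value: &str, n: usize) -> usize {
    varchar_stored(value, n).len() + 1
}
```

For example, char_stored("ab", 4) pads the value with two spaces, while varchar_bytes("ab", 4) counts two data bytes plus the length prefix.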

src/sql/symbol.rs

lazy_static! {
    /// A static struct of token hashmap storing all tokens
    pub static ref SYMBOLS: HashMap<&'static str, Symbol<'static>> = {
        let mut m = HashMap::new();

        // The following is maintained by hand according to `Token`
        /* SQL Keywords */
        m.insert("add", sym("add", Token::Add, Group::Keyword));
        m.insert(
            "add constraint",
            sym("add constraint", Token::AddConstraint, Group::Keyword),
        );

        // ...
        // ...
        // ...

        m // return m
    };
}

I store every Symbol in the SYMBOLS hashmap for later use by the scanner.

In C++, we can define and export a list, an array, or a map with initial values. Other files can simply include the .h or .cc file to use the exported variable.

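However, in Rust a static must be initialized with a constant expression, so a HashMap that has to be filled by calling insert cannot be declared as a plain static. So I use the lazy_static crate here (a crate is a Rust library package; this one is third-party), which lets us export the hashmap just as in C++. With the lazy_static! macro (a macro generates code at compile time), it is possible to have statics that require code to be executed at runtime in order to be initialized. The initializer runs once, the first time the static is accessed.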

Note that the lifetime of SYMBOLS must be 'static, which means that it and its contents live for the whole life of the program. That is why the type is HashMap<&'static str, Symbol<'static>>.
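
The idea of a lazily initialized global table can also be sketched without any external crate. The version below swaps lazy_static for std::sync::OnceLock (available since Rust 1.70), purely so that it compiles with the standard library alone; it is not what this project uses. The symbols and lookup functions are hypothetical names, only two keywords are inserted, sym is a stand-in for the helper called in the excerpt above, and len is assumed to be the length of name.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    Add,
    AddConstraint,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Group {
    Keyword,
}

#[derive(Debug)]
pub struct Symbol<'a> {
    pub name: &'a str,
    pub len: usize, // assumption: the length of `name`
    pub token: Token,
    pub group: Group,
}

/// Stand-in for the `sym(...)` helper used in the post.
fn sym(name: &'static str, token: Token, group: Group) -> Symbol<'static> {
    Symbol { name, len: name.len(), token, group }
}

/// Returns the global table. The closure runs only on the first call;
/// later calls return the same map.
pub fn symbols() -> &'static HashMap<&'static str, Symbol<'static>> {
    static SYMBOLS: OnceLock<HashMap<&'static str, Symbol<'static>>> = OnceLock::new();
    SYMBOLS.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("add", sym("add", Token::Add, Group::Keyword));
        m.insert(
            "add constraint",
            sym("add constraint", Token::AddConstraint, Group::Keyword),
        );
        m
    })
}

/// How a scanner might look up a lowercased word (hypothetical helper).
pub fn lookup(word: &str) -> Option<&'static Symbol<'static>> {
    symbols().get(word)
}
```

Because both the keys and the Symbol values borrow only string literals, everything in the map is 'static, which is exactly what a global table needs.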

All that remains is a lot of grunt work. As you can see, I need to insert every Symbol into SYMBOLS. Doing this by hand is boring, but it still has to be done. (Of course, I wrote a simple script to make it easier.)
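
The script itself is not shown here, so the following is only a guess at what such a generator could look like, written in Rust for consistency. Both function names are hypothetical, and the CamelCase rule is assumed to fit only the keyword variants such as Add and AddConstraint; the function tokens (AVG, COUNT, and so on) are upper case and would need a different rule.

```rust
/// Derives a CamelCase variant name from a lowercase keyword,
/// e.g. "add constraint" becomes "AddConstraint".
pub fn variant_name(keyword: &str) -> String {
    keyword
        .split_whitespace()
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                // Upper-case the first character and keep the rest as-is.
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Builds one `m.insert(...)` line for the SYMBOLS initializer.
pub fn insert_line(keyword: &str, group: &str) -> String {
    format!(
        "m.insert(\"{kw}\", sym(\"{kw}\", Token::{v}, Group::{g}));",
        kw = keyword,
        v = variant_name(keyword),
        g = group
    )
}
```

Calling insert_line for each keyword in a list and printing the results yields lines that can be pasted into the lazy_static! block.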

Today's commits:

  • 74adcd01220e3b73947dcfa1d6ff83101953f843 sql: update Symbol, Group, Token. Implement SYMBOLS
  • 46a3ce29ab482197baebceb4197502e46e17b8ac travis: add cargo test
  • 1257a38bdcb0f76e76f00ac70c5d5a4773462c1b crate: add lazy_static
  • 7bf3f54b8ef0a6cfd3803d646fe7bb42a9d93307 sql: let symbol pub
