10: Lexical Scanner Implementation (2)
2018/10/25
I modify some part that I wrote yesterday.
src/sql/symbol.rs
pub struct Symbol<'a> {
name: &'a str,
len: usize,
token: Token,
group: Group,
}
I change Symbol.name
from String
to &str
type, because String
type stores data in the heap and &str
stores at the stack. There is no reason we should store name
in heap, since name
is a fixed value and put it in the stack would be faster.
You would notice the <'a>
syntax, it is the lifetime syntax in rust. The &str
usually live only in the scope, so I need to tell the compiler that name
should live as long as Symbol
.
src/sql/symbol.rs
pub enum Group {
DataType,
DoubleKeyword,
MultiKeyword,
Function,
Keyword,
}
I add more enums in Group
, where DataType
includes Int
, Char
, etc., and DoubleKeyword
means the a two-word keywords.
src/sql/symbol.rs
pub enum Token {
/* SQL Keywords */
//...
//...
/* SQL Function */
AVG,
COUNT,
MAX,
MIN,
SUM,
/* SQL Data Type */
Char,
Double,
Float,
Int,
Varchar,
}
I update struct Token
. There is only Keywords
part yesterday, and I fill up Function
and Data Type
. I just plan to implement Functions
and Data Type
listed in Token
, though other DBMS implement more.
You might wonder what is the difference between Varchar
and Char
. According to MySQL document, the Char
and Varchar
types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.
The Char
and Varchar
types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30)
can hold up to 30
characters.
The table shows the difference.
Value | CHAR(4) | Storage Required | VARCHAR(4) | Storage Required |
---|---|---|---|---|
'' | ' ' | 4 bytes | '' | 1 byte |
'ab' | 'ab ' | 4 bytes | 'ab' | 3 bytes |
'abcd' | 'abcd' | 4 bytes | 'abcd' | 5 bytes |
'abcdefgh' | 'abcd' | 4 bytes | 'abcd' | 5 bytes |
src/sql/symbol.rs
lazy_static! {
/// A static struct of token hashmap storing all tokens
pub static ref SYMBOLS: HashMap<&'static str, Symbol<'static>> = {
let mut m = HashMap::new();
// The following is maintained by hand according to `Token`
/* SQL Keywords */
m.insert("add", sym("add", Token::Add, Group::Keyword));
m.insert(
"add constraint",
sym("add constraint", Token::AddConstraint, Group::Keyword),
);
// ...
// ...
// ...
m // return m
};
}
I store all Symbol
to SYMBOLS
hashmap for later use by the scanner.
In C++, we can define and export a list, an array, or a map with value. Other file could simply include
the .h
or .cc
to use the exported variable.
However in Rust, we cannot do that without crates help (crate is a third party module). So, I use lazy_static
here, it helps us to export the hashmap just like C++. Using the macro lazy_static!
(You could see a macro as a decorator in Rust), it is possible to have statics that require code to be executed at runtime in order to be initialized.
Note that the lifetime of the SYMBOLS
should be static
, which mean it and its content should live in the whole life of the program. So, you could see the type is HashMap<&'static str, Symbol<'static>>
.
All that remains to do are lots of "dirty works". As you could see, I need to insert
all Symbol
in SYMBOLS
. This is a boring job by hand, but I still need to that. (Of course I write a simple script to make it easier)
Today commits:
74adcd01220e3b73947dcfa1d6ff83101953f843
sql: update Symbol, Group, Token. Implement SYMBOLS46a3ce29ab482197baebceb4197502e46e17b8ac
travis: add cargo test1257a38bdcb0f76e76f00ac70c5d5a4773462c1b
crate: add lazy_static7bf3f54b8ef0a6cfd3803d646fe7bb42a9d93307
sql: let symbol pub