This is my take on the "Build your own SQLite" Challenge by CodeCrafters. It's a simple command line tool as well as a library to parse a .db file in the SQLite file format and run some simple queries against the database. It's mostly for didactic purposes, and is therefore not even close to feature complete, fast, or smart about the way it runs its queries and retrieves data from the SQLite file. Still, it was a very nice introduction to the world of relational databases and how they are actually implemented behind the scenes, and it was quite a lot of fun to work on. If you want to give it a spin, you can find some instructions below.
To get started, build the tool with the following command:
cargo build -r

A few sample databases are provided to show the functionality of the tool:
- sample.db: a small database containing two tables, apples and oranges, with a few rows in them and no indices
- superheroes.db: a larger database (~1MB) containing one table, superheroes, with more rows (~7K) and no indices
- companies.db: an even larger database (~7MB) containing one table, companies, with even more rows (~56K) and one index on the country column
Note
The commands below are among the few that are supported, and even they might not work on all SQLite files, depending on the schema of the tables in the database. It doesn't take much straying off this path to run into panics and other more or less user-friendly errors. You can try the commands out on the given sample files, and have some fun adding support for a SQLite database you want to run the tool on if you feel like it :)
You can get some information about the db with the .dbinfo command:
$ sqlzero sample.db '.dbinfo'
database page size: 4096
number of tables: 2
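The page size is read straight from the 100-byte database header at the start of the file; the table count is not stored there, so it has to be derived by reading the sqlite_schema table rooted on page 1 and counting the entries whose type is table. A minimal sketch of the header read (not the actual sqlzero internals):

use std::fs::File;
use std::io::Read;

// The first 100 bytes of a SQLite file are the database header; the page size
// is a big-endian u16 at byte offset 16 (a stored value of 1 means 65536).
fn read_page_size(path: &str) -> std::io::Result<u32> {
    let mut header = [0u8; 100];
    File::open(path)?.read_exact(&mut header)?;
    assert_eq!(&header[..16], &b"SQLite format 3\0"[..], "not a SQLite database");
    let raw = u16::from_be_bytes([header[16], header[17]]);
    Ok(if raw == 1 { 65_536 } else { raw as u32 })
}

For sample.db this should come out as the 4096 shown above.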
You can also get the names of the tables or indices in the file with the .tables and .indices commands:

$ sqlzero sample.db '.tables'
apples oranges
$ sqlzero companies.db '.indices'
idx_companies_country on companies(country)

You can count all the rows in a table:
$ sqlzero companies.db 'select count(*) from companies'
55991
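Under the hood, count(*) boils down to finding the table's root page in sqlite_schema and walking its b-tree, summing the cell counts of the leaf pages. A rough sketch of that traversal, assuming the whole file has already been read into memory (again, not the actual sqlzero code):

// Table leaf pages (type 0x0d) hold one cell per row; interior pages (0x05)
// only point at children, so count(*) can be answered by summing leaf cells.
fn count_rows(db: &[u8], page_size: usize, page_no: usize) -> u64 {
    let base = (page_no - 1) * page_size;
    // Page 1 embeds the 100-byte database header before its b-tree header.
    let hdr = base + if page_no == 1 { 100 } else { 0 };
    let page_type = db[hdr];
    let n_cells = u16::from_be_bytes([db[hdr + 3], db[hdr + 4]]) as usize;
    match page_type {
        0x0d => n_cells as u64, // table leaf: one cell per row
        0x05 => {
            // Table interior: a 4-byte left child pointer at the start of each
            // cell, plus the right-most child pointer at offset 8 of the header.
            let cell_ptrs = hdr + 12;
            let mut total = 0;
            for i in 0..n_cells {
                let off = u16::from_be_bytes([db[cell_ptrs + 2 * i], db[cell_ptrs + 2 * i + 1]]) as usize;
                let child = u32::from_be_bytes([db[base + off], db[base + off + 1], db[base + off + 2], db[base + off + 3]]) as usize;
                total += count_rows(db, page_size, child);
            }
            let right = u32::from_be_bytes([db[hdr + 8], db[hdr + 9], db[hdr + 10], db[hdr + 11]]) as usize;
            total + count_rows(db, page_size, right)
        }
        t => panic!("unexpected page type {t:#x}"),
    }
}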
You can run a select * query to get all the rows in a table:

$ sqlzero sample.db 'select * from apples'
1|Granny Smith|Light Green
2|Fuji|Red
3|Honeycrisp|Blush Red
4|Golden Delicious|Yellow
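Each of those rows comes out of a leaf cell as a record: a header listing one serial type per column, followed by the values themselves, with lengths encoded as SQLite varints. Two sketch helpers for that decoding step (simplified; serial types 10 and 11 are reserved and not handled):

// A varint is 1-9 bytes: 7 bits per byte, high bit means "continue",
// and the 9th byte (if reached) contributes all 8 of its bits.
fn read_varint(buf: &[u8]) -> (u64, usize) {
    let mut value = 0u64;
    for i in 0..8 {
        value = (value << 7) | (buf[i] & 0x7f) as u64;
        if (buf[i] & 0x80) == 0 {
            return (value, i + 1);
        }
    }
    ((value << 8) | buf[8] as u64, 9)
}

// How many payload bytes a serial type occupies (and roughly what it means).
fn serial_type_len(serial_type: u64) -> usize {
    match serial_type {
        0 | 8 | 9 => 0,                 // NULL, literal 0, literal 1
        1 => 1, 2 => 2, 3 => 3, 4 => 4, // big-endian two's-complement ints
        5 => 6, 6 => 8,                 // 48- and 64-bit ints
        7 => 8,                         // IEEE 754 float
        n if n >= 13 && n % 2 == 1 => ((n - 13) / 2) as usize, // text
        n if n >= 12 => ((n - 12) / 2) as usize,               // blob
        _ => panic!("serial types 10 and 11 are reserved"),
    }
}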
This also works against the sqlite_schema table, even though it wasn't listed in the output of .tables:

$ sqlzero sample.db 'select * from sqlite_schema'
table|1|apples|2|CREATE TABLE apples
(
id integer primary key autoincrement,
name text,
color text
)
table|2|sqlite_sequence|3|CREATE TABLE sqlite_sequence(name,seq)
table|3|oranges|4|CREATE TABLE oranges
(
id integer primary key autoincrement,
name text,
description text
)

You can also project individual columns:
$ sqlzero superheroes.db 'select * from sqlite_schema'
table|8|superheroes|2|CREATE TABLE "superheroes" (id integer primary key autoincrement, name text not null, eye_color text, hair_color text, appearance_count integer, first_appearance text, first_appearance_year text)
table|9|sqlite_sequence|3|CREATE TABLE sqlite_sequence(name,seq)
$ sqlzero superheroes.db 'select name, eye_color, first_appearance, first_appearance_year from superheroes'
Batman (Bruce Wayne)|Blue Eyes|1939, May|1939
Superman (Clark Kent)|Blue Eyes|1986, October|1986
Green Lantern (Hal Jordan)|Brown Eyes|1959, October|1959
James Gordon (New Earth)|Brown Eyes|1987, February|1987
Richard Grayson (New Earth)|Blue Eyes|1940, April|1940
Wonder Woman (Diana Prince)|Blue Eyes|1941, December|1941
Aquaman (Arthur Curry)|Blue Eyes|1941, November|1941
...
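A projection like this works by parsing the CREATE TABLE statement stored in sqlite_schema to map column names to their ordinal positions, and then picking those positions out of every decoded record. A minimal sketch under those assumptions (the helper below is illustrative, not the real sqlzero API):

// `columns` is the column order from the CREATE TABLE statement, `selected`
// the columns named in the SELECT, and `rows` the decoded records.
fn project<'a>(
    columns: &[&str],
    selected: &[&str],
    rows: &'a [Vec<String>],
) -> Vec<Vec<&'a str>> {
    let positions: Vec<usize> = selected
        .iter()
        .map(|name| columns.iter().position(|c| c == name).expect("unknown column"))
        .collect();
    rows.iter()
        .map(|row| positions.iter().map(|&i| row[i].as_str()).collect())
        .collect()
}

The selected values are then joined with | for output, mirroring the sqlite3 CLI's default list mode.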
You can also filter on a single text column (mind the single quotes '):
$ sqlzero superheroes.db "select name, eye_color, first_appearance, first_appearance_year from superheroes where first_appearance_year = '1986'"
Superman (Clark Kent)|Blue Eyes|1986, October|1986
Alexander Luthor (New Earth)|Green Eyes|1986, October|1986
James Olsen (New Earth)|Green Eyes|1986, October|1986
Theodore Kord (New Earth)|Blue Eyes|1986, February|1986
Michael Jon Carter (New Earth)|Blue Eyes|1986, February|1986
Perry White (New Earth)|Blue Eyes|1986, October|1986
Kilowog (New Earth)|Red Eyes|1986, June|1986
Martha Clark (New Earth)|Blue Eyes|1986, October|1986
Jonathan Kent (New Earth)|Blue Eyes|1986, October|1986
Amanda Waller (New Earth)|Brown Eyes|1986, November|1986
...

If the column you filter on has a corresponding index, then the index will be used instead of doing a full table scan (one is much faster than the other):
$ sqlzero companies.db '.indices'
idx_companies_country on companies(country)
$ time sqlzero companies.db "select id, name from companies where name = 'unilink s.c.'"
121311|unilink s.c.
sqlzero companies.db 0.05s user 0.02s system 98% cpu 0.068 total
$ time sqlzero companies.db "select id, name from companies where country = 'eritrea'"
121311|unilink s.c.
2102438|orange asmara it solutions
6634629|asmara rental
5729848|zara mining share company
sqlzero companies.db 0.00s user 0.00s system 89% cpu 0.002 total
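When an index is available, the matching rowids come out of the much smaller, sorted index b-tree, and each one is then looked up directly in the table b-tree instead of scanning and decoding every row. The lookup itself descends the table b-tree like a search tree: interior table pages store cells of (left child page, key), where the key is an upper bound on the rowids in that child, so each step discards all the other subtrees. A sketch of that descent, reusing read_varint from the record-format sketch above (not the actual sqlzero code):

// Descend from the table's root page to the leaf page that would contain
// the given rowid.
fn find_leaf_for_rowid(db: &[u8], page_size: usize, mut page_no: usize, rowid: u64) -> usize {
    loop {
        let base = (page_no - 1) * page_size;
        let hdr = base + if page_no == 1 { 100 } else { 0 };
        if db[hdr] == 0x0d {
            return page_no; // table leaf: the row, if present, lives here
        }
        let n_cells = u16::from_be_bytes([db[hdr + 3], db[hdr + 4]]) as usize;
        // Default to the right-most child pointer at offset 8 of the header.
        let mut next = u32::from_be_bytes([db[hdr + 8], db[hdr + 9], db[hdr + 10], db[hdr + 11]]) as usize;
        for i in 0..n_cells {
            let ptr = hdr + 12 + 2 * i;
            let off = base + u16::from_be_bytes([db[ptr], db[ptr + 1]]) as usize;
            let child = u32::from_be_bytes([db[off], db[off + 1], db[off + 2], db[off + 3]]) as usize;
            let (key, _) = read_varint(&db[off + 4..]);
            // Rowids in `child` are <= key, so descend left at the first cell
            // whose key is >= the rowid we are looking for.
            if rowid <= key {
                next = child;
                break;
            }
        }
        page_no = next;
    }
}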
Lastly, a weird little flex: while this re-implementation is not feature complete, I have jerry-rigged the SQL parser and database tuple extraction code just enough to support the schema used by Anki Desktop:

$ sqlzero collection.anki2 'select * from sqlite_schema'
table|1|col|2|CREATE TABLE col (
id integer primary key,
crt integer not null,
mod integer not null,
scm integer not null,
ver integer not null,
dty integer not null,
usn integer not null,
ls integer not null,
conf text not null,
models text not null,
decks text not null,
dconf text not null,
tags text not null
)
table|2|notes|3|CREATE TABLE notes (
id integer primary key, /* 0 */
guid text not null, /* 1 */
mid integer not null, /* 2 */
mod integer not null, /* 3 */
usn integer not null, /* 4 */
tags text not null, /* 5 */
flds text not null, /* 6 */
sfld integer not null, /* 7 */
csum integer not null, /* 8 */
flags integer not null, /* 9 */
data text not null /* 10 */
)
table|3|cards|4|CREATE TABLE cards (
id integer primary key, /* 0 */
nid integer not null, /* 1 */
did integer not null, /* 2 */
ord integer not null, /* 3 */
mod integer not null, /* 4 */
usn integer not null, /* 5 */
type integer not null, /* 6 */
queue integer not null, /* 7 */
due integer not null, /* 8 */
ivl integer not null, /* 9 */
factor integer not null, /* 10 */
reps integer not null, /* 11 */
lapses integer not null, /* 12 */
left integer not null, /* 13 */
odue integer not null, /* 14 */
odid integer not null, /* 15 */
flags integer not null, /* 16 */
data text not null /* 17 */
)
table|4|revlog|5|CREATE TABLE revlog (
id integer primary key,
cid integer not null,
usn integer not null,
ease integer not null,
ivl integer not null,
lastIvl integer not null,
factor integer not null,
time integer not null,
type integer not null
)
table|5|sqlite_stat1|6|CREATE TABLE sqlite_stat1(tbl,idx,stat)
table|6|sqlite_stat4|7|CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample)
table|7|android_metadata|8|CREATE TABLE android_metadata (locale TEXT)
table|8|deck_config|9|CREATE TABLE deck_config (
id integer PRIMARY KEY NOT NULL,
name text NOT NULL COLLATE unicase,
mtime_secs integer NOT NULL,
usn integer NOT NULL,
config blob NOT NULL
)
table|9|config|10|CREATE TABLE config (
KEY text NOT NULL PRIMARY KEY,
usn integer NOT NULL,
mtime_secs integer NOT NULL,
val blob NOT NULL
) without rowid
table|10|fields|11|CREATE TABLE fields (
ntid integer NOT NULL,
ord integer NOT NULL,
name text NOT NULL COLLATE unicase,
config blob NOT NULL,
PRIMARY KEY (ntid, ord)
) without rowid
table|11|templates|12|CREATE TABLE templates (
ntid integer NOT NULL,
ord integer NOT NULL,
name text NOT NULL COLLATE unicase,
mtime_secs integer NOT NULL,
usn integer NOT NULL,
config blob NOT NULL,
PRIMARY KEY (ntid, ord)
) without rowid
table|12|notetypes|13|CREATE TABLE notetypes (
id integer NOT NULL PRIMARY KEY,
name text NOT NULL COLLATE unicase,
mtime_secs integer NOT NULL,
usn integer NOT NULL,
config blob NOT NULL
)
table|13|decks|14|CREATE TABLE decks (
id integer PRIMARY KEY NOT NULL,
name text NOT NULL COLLATE unicase,
mtime_secs integer NOT NULL,
usn integer NOT NULL,
common blob NOT NULL,
kind blob NOT NULL
)
table|14|tags|16|CREATE TABLE tags (
tag text NOT NULL PRIMARY KEY COLLATE unicase,
usn integer NOT NULL,
collapsed boolean NOT NULL,
config blob NULL
) without rowid
table|15|graves|18|CREATE TABLE graves (
oid integer NOT NULL,
type integer NOT NULL,
usn integer NOT NULL,
PRIMARY KEY (oid, type)
) WITHOUT ROWID
index|16|notes|19|CREATE INDEX ix_notes_usn on notes (usn)
index|17|cards|20|CREATE INDEX ix_cards_usn on cards (usn)
index|18|revlog|21|CREATE INDEX ix_revlog_usn on revlog (usn)
index|19|cards|22|CREATE INDEX ix_cards_nid on cards (nid)
index|20|cards|23|CREATE INDEX ix_cards_sched on cards (did, queue, due)
index|21|revlog|24|CREATE INDEX ix_revlog_cid on revlog (cid)
index|22|notes|25|CREATE INDEX ix_notes_csum on notes (csum)
index|23|fields|26|CREATE UNIQUE INDEX idx_fields_name_ntid ON fields (name, ntid)
index|24|templates|27|CREATE UNIQUE INDEX idx_templates_name_ntid ON templates (name, ntid)
index|25|templates|28|CREATE INDEX idx_templates_usn ON templates (usn)
index|26|notetypes|29|CREATE UNIQUE INDEX idx_notetypes_name ON notetypes (name)
index|27|notetypes|30|CREATE INDEX idx_notetypes_usn ON notetypes (usn)
index|28|decks|31|CREATE UNIQUE INDEX idx_decks_name ON decks (name)
index|29|notes|32|CREATE INDEX idx_notes_mid ON notes (mid)
index|30|cards|33|CREATE INDEX idx_cards_odid ON cards (odid)
WHERE odid != 0
index|31|graves|34|CREATE INDEX idx_graves_pending ON graves (usn)
$ sqlzero collection.anki2 'select count(*) from revlog'
613165
$ sqlzero collection.anki2 "SELECT * FROM notes where guid = 'mFHgaIL9ZR'"
1593961343452|mFHgaIL9ZR|1586604965499|1694931669|18732||Why can there be unused direct dependencies in a binary/library?It is related to the options used by the linker when building the binary. Although the program does not need an unused library, it was still linked and labelled as <code>NEEDED</code> in the information about the object file|Why can there be unused direct dependencies in a binary/library?|1375559563|0|

The collection used in the example was ~60MB in size and had tuples with enough data to overflow to multiple pages. It's not the most complex case, but it's one I cared particularly about, since I'm an avid user of Anki, and my tool is compatible with a real-world database I use every day :) (somewhat, at least).
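Those overflowing tuples are the interesting part: when a row's payload doesn't fit in its leaf cell, only a prefix is kept in the cell, followed by the page number of the first overflow page, and the rest of the payload is spread over a chain of overflow pages. A sketch of following that chain (assuming the whole file is in memory; not the actual sqlzero code):

// Each overflow page starts with a big-endian u32 pointing at the next page
// in the chain (0 marks the end); the rest of the page is payload.
fn read_overflow_chain(db: &[u8], page_size: usize, mut page_no: u32, mut remaining: usize) -> Vec<u8> {
    let mut payload = Vec::with_capacity(remaining);
    while page_no != 0 && remaining > 0 {
        let base = (page_no as usize - 1) * page_size;
        page_no = u32::from_be_bytes([db[base], db[base + 1], db[base + 2], db[base + 3]]);
        let take = remaining.min(page_size - 4);
        payload.extend_from_slice(&db[base + 4..base + 4 + take]);
        remaining -= take;
    }
    payload
}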