From 63847496f14c813a5d80efd5b7de0f1294ffe1e3 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 13 Apr 2024 16:07:11 +0200 Subject: Adding upstream version 3.45.1. Signed-off-by: Daniel Baumann --- www/howitworks.html | 387 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 387 insertions(+) create mode 100644 www/howitworks.html (limited to 'www/howitworks.html') diff --git a/www/howitworks.html b/www/howitworks.html new file mode 100644 index 0000000..a313691 --- /dev/null +++ b/www/howitworks.html @@ -0,0 +1,387 @@ + + + + + +How SQLite Works + + + +
+ + + +
+
+Small. Fast. Reliable.
Choose any three. +
+ + +
+
+ + + +
+
+
+ +
+
+
+How SQLite Works +
+ + +
+ + + + + +

1. Background

+ +

+SQLite is a software library +that translates high-level disk I/O requests generated by an application +into low-level I/O operations that +can be carried out by the operating system. The application constructs +high-level I/O requests using the +SQL language. +SQLite translates each high-level SQL statement into a sequence of +many low-level I/O requests (open a file, read a few bytes from a file, +write a few bytes into a file, etc.) that do the work requested by +the SQL. + + +

+An application program could do all its disk I/O by direct calls to +operating system I/O routines +or by using a key/value storage engine like +Berkeley DB or +RocksDB (to name but two). +But there are advantages to using a higher-level interface based +on the SQL language. + +

    +
  1. +SQL is a very high-level language. +A few lines of SQL can replace hundreds or thousands of lines of +procedural code. SQL thus reduces +the amount of work needed to develop and maintain the application, +and thereby helps to reduce the number of bugs in the application. + +

  2. +SQL and SQLite are +transactional. +The use of a transactional storage system makes it much easier to +reason about the behavior of the application, and to write applications +that are robust, even in the face of software bugs, hardware faults, +or power losses. + +

  3. +SQLite is often +faster than direct low-level I/O. +This is counterintuitive. One would expect that a +high-level interface such as SQLite would impose a run-time penalty. +And, theoretically, that is correct. But in practice, SQL-based +systems such as SQLite do so many behind-the-scenes optimizations +that an application developer would never have time to create and +maintain, that the SQL-based systems end up providing +a net performance gain. +

+ +

1.1. SQLite Is Different From Most Other SQL Databases

+ +

+There are many SQL-based database management systems available, besides +SQLite. Common options include MySQL, PostgreSQL, and SQL-Server. +All these systems use the SQL langauge to communicate with the +application, just like SQLite. But these other systems different +from SQLite in important respects. + +

    +
  1. +SQLite is a serverless software library, whereas the other +systems are client-server based. With MySQL, PostgreSQL, SQL-Server, +and others, the application sends a message containing some SQL +over to a separate server thread or process. That separate +thread or process performs the requested I/O, then send the results +back to the application. But there is no separate thread or +process with SQLite. SQLite runs in the same address space as +the application, using the same program counter and heap storage. +SQLite does no interprocess communication (IPC). When an application +sends an SQL statement into SQLite (by invoking a the appropriate +SQLite library subroutine), SQLite interprets the SQL +in the same thread as the caller. When an SQLite API routine +returns, it does not leave behind any background tasks that +run separately from the application. + +

  2. +An SQLite database is a single ordinary file on disk +(with a well-defined file format). With other +systems, a "database" is usually a large number of separate +files hidden away in obscure directories of the filesystem, or +even spread across multiple machines. But with SQLite, a +complete database is just an ordinary disk file. +

+ +

2. SQL Is A Programming Language

+ +

+The best way to understand how SQL database engines work is to +think of SQL as a programming language, not as a "query language". +Each SQL statement is a separate program. Applications construct +SQL program source files and send them to the database engine. +The database engine compiles the SQL source code into executable +form, runs that executable, then sends the result back to the +application. + +

+While SQL is a programming language, it is different from other +programming languages like C, Javascript, Python, or Go in that +SQL is a +declarative language +where the others are +imperative languages. +This is an important difference that has implications for the +design of the compiler used to translate program source text into an +executable format. However, those details +should not detract from the fact that +SQL is really just another programming language. + +

2.1. Programming Language Processing Steps

+ +

+All programming languages are processed in two steps: + +

    +
  1. +Translate the program source text into an executable format. +

  2. +Run the executable generated in the previous step in order to +carry out the desired action. +

+ +

+All programming languages uses those two basic steps. +The main difference is in the executable format. + +

+"Compiled" languages like C++ and Rust translate +the source text into machine code that can be directly executed +by the underlying hardware. There exist SQL database systems +that do the same with SQL - they translate each SQL statement +directly into machine code. But that approach is uncommon and +is not the approach taken by SQLite. + +

+Other languages like Java, Perl, Python, and TCL +typically translate the program source text into bytecode. +This bytecode is then run through an interpreter that +reads the bytecode and carries out the desired operations. +SQLite uses this bytecode approach. If you preceed +any SQL statement with the "EXPLAIN" keyword in SQLite, +it will show you the bytecode that is generated rather +than run the bytecode. + +

+Another approach is to translate the program source text +into a tree of objects in memory. This tree is the "executable". +An interpret runs the executable by walking the tree. +This is the technique used by MySQL, PostgreSQL, and +SQL-Server. + +

+Of course, not every language fits neatly into one of the +above categories. This applies to both SQL database engines +and more familiar imperative programming languages. Javascript +is famous for using a hybrid execution model, where the code +is initially compiled into a tree of objects, but might be +further translating (using just-in-time compilation) down into +more efficient bytecode or machine code, as a means of boosting +performance. + +

+The executable format really ends up being just an implementation +detail. The key point is that all languages have a compiler +step which translates programs into an executable format and +a run step that executes the compiled program. + + +

2.2. Compiling SQLite Programs

+ +

+When an SQL program is submitted to SQLite, the first step is +to split the source text into "tokens". A token might be: + +

    +
  • A language keyword like "SELECT" or "UPDATE". +
  • An identifier for a table or column or variable. +
  • Punctuation characters like "," or "==" or ";". +
  • Literal values: numeric or string constants. +
  • Whitespace or comments. +
+ +

+Whitespace and comment tokens are discarded. All other tokens +are fed into an +LALR(1) Parser +that analyzes the structure of the input program and generates an +Abstract Syntax Tree (AST) +for the input program. + +

+The parser forwards the AST on to the code generator. +The code generator is the heart of SQLite, and is where most of +the magic happens. +The code generator resolves symbolic names in the AST - matching +the names of columns and tables in the input SQL into actual +columns and tables of the database. The code generator also does +various transformations on the AST to "optimize" it. Finally +the code generator chooses appropriate algorithms to implement +the operations requested by the AST and constructs bytecode to +carry out those operations. + +

+The bytecode generated by the code generator is called a +"prepared statement". Translating SQL source text into a +prepared statement is analogous to converting a C++ program +into machine code by invoking gcc or clang. Human-readable +source text (SQL or C++) goes in, and a machine readable executable +(bytecode or machine code) comes out. + +

3. Further Reading

+ +
    +
  • +The Atomic Commit document describes how SQLite implements +transactions. + +

  • +The bytecode engine document has more information +on the bytecode format used by SQLite, and how to view and interpret +an SQLite prepared statement. + +

  • +The SQLite query planner and +Next Generation Query Planner documents have further details +on the algorithms SQLite uses to implement SQL statements and +how it goes above choosing an appropriate algorithm for each +individual SQL statement. +

+ + -- cgit v1.2.3