1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
|
<!DOCTYPE html>
<html><head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<link href="sqlite.css" rel="stylesheet">
<title>How SQLite Is Tested</title>
<!-- path= -->
</head>
<body>
<div class=nosearch>
<a href="index.html">
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
</a>
<div><!-- IE hack to prevent disappearing logo --></div>
<div class="tagline desktoponly">
Small. Fast. Reliable.<br>Choose any three.
</div>
<div class="menu mainmenu">
<ul>
<li><a href="index.html">Home</a>
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
<li class='wideonly'><a href='about.html'>About</a>
<li class='desktoponly'><a href="docs.html">Documentation</a>
<li class='desktoponly'><a href="download.html">Download</a>
<li class='wideonly'><a href='copyright.html'>License</a>
<li class='desktoponly'><a href="support.html">Support</a>
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
<li class='search' id='search_menubutton'>
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
</ul>
</div>
<div class="menu submenu" id="submenu">
<ul>
<li><a href='about.html'>About</a>
<li><a href='docs.html'>Documentation</a>
<li><a href='download.html'>Download</a>
<li><a href='support.html'>Support</a>
<li><a href='prosupport.html'>Purchase</a>
</ul>
</div>
<div class="searchmenu" id="searchmenu">
<form method="GET" action="search">
<select name="s" id="searchtype">
<option value="d">Search Documentation</option>
<option value="c">Search Changelog</option>
</select>
<input type="text" name="q" id="searchbox" value="">
<input type="submit" value="Go">
</form>
</div>
</div>
<script>
function toggle_div(nm) {
var w = document.getElementById(nm);
if( w.style.display=="block" ){
w.style.display = "none";
}else{
w.style.display = "block";
}
}
function toggle_search() {
var w = document.getElementById("searchmenu");
if( w.style.display=="block" ){
w.style.display = "none";
} else {
w.style.display = "block";
setTimeout(function(){
document.getElementById("searchbox").focus()
}, 30);
}
}
function div_off(nm){document.getElementById(nm).style.display="none";}
window.onbeforeunload = function(e){div_off("submenu");}
/* Disable the Search feature if we are not operating from CGI, since */
/* Search is accomplished using CGI and will not work without it. */
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
document.getElementById("search_menubutton").style.display = "none";
}
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
function hideorshow(btn,obj){
var x = document.getElementById(obj);
var b = document.getElementById(btn);
if( x.style.display!='none' ){
x.style.display = 'none';
b.innerHTML='show';
}else{
x.style.display = '';
b.innerHTML='hide';
}
return false;
}
var antiRobot = 0;
function antiRobotGo(){
if( antiRobot!=3 ) return;
antiRobot = 7;
var j = document.getElementById("mtimelink");
if(j && j.hasAttribute("data-href")) j.href=j.getAttribute("data-href");
}
function antiRobotDefense(){
document.body.onmousedown=function(){
antiRobot |= 2;
antiRobotGo();
document.body.onmousedown=null;
}
document.body.onmousemove=function(){
antiRobot |= 2;
antiRobotGo();
document.body.onmousemove=null;
}
setTimeout(function(){
antiRobot |= 1;
antiRobotGo();
}, 100)
antiRobotGo();
}
antiRobotDefense();
</script>
<div class=fancy>
<div class=nosearch>
<div class="fancy_title">
How SQLite Is Tested
</div>
<div class="fancy_toc">
<a onclick="toggle_toc()">
<span class="fancy_toc_mark" id="toc_mk">►</span>
Table Of Contents
</a>
<div id="toc_sub"><div class="fancy-toc1"><a href="#introduction">1. Introduction</a></div>
<div class="fancy-toc2"><a href="#executive_summary">1.1. Executive Summary</a></div>
<div class="fancy-toc1"><a href="#test_harnesses">2. Test Harnesses</a></div>
<div class="fancy-toc1"><a href="#anomaly_testing">3. Anomaly Testing</a></div>
<div class="fancy-toc2"><a href="#out_of_memory_testing">3.1. Out-Of-Memory Testing</a></div>
<div class="fancy-toc2"><a href="#i_o_error_testing">3.2. I/O Error Testing</a></div>
<div class="fancy-toc2"><a href="#crash_testing">3.3. Crash Testing</a></div>
<div class="fancy-toc2"><a href="#compound_failure_tests">3.4. Compound failure tests</a></div>
<div class="fancy-toc1"><a href="#fuzz_testing">4. Fuzz Testing</a></div>
<div class="fancy-toc2"><a href="#sql_fuzz">4.1. SQL Fuzz</a></div>
<div class="fancy-toc3"><a href="#sql_fuzz_using_the_american_fuzzy_lop_fuzzer">4.1.1. SQL Fuzz Using The American Fuzzy Lop Fuzzer</a></div>
<div class="fancy-toc3"><a href="#google_oss_fuzz">4.1.2. Google OSS Fuzz</a></div>
<div class="fancy-toc3"><a href="#the_dbsqlfuzz_and_jfuzz_fuzzers">4.1.3. The dbsqlfuzz and jfuzz fuzzers</a></div>
<div class="fancy-toc3"><a href="#other_third_party_fuzzers">4.1.4. Other third-party fuzzers</a></div>
<div class="fancy-toc3"><a href="#the_fuzzcheck_test_harness">4.1.5. The fuzzcheck test harness</a></div>
<div class="fancy-toc3"><a href="#tension_between_fuzz_testing_and_100_mc_dc_testing">4.1.6. Tension Between Fuzz Testing And 100% MC/DC Testing</a></div>
<div class="fancy-toc2"><a href="#malformed_database_files">4.2. Malformed Database Files</a></div>
<div class="fancy-toc2"><a href="#boundary_value_tests">4.3. Boundary Value Tests</a></div>
<div class="fancy-toc1"><a href="#regression_testing">5. Regression Testing</a></div>
<div class="fancy-toc1"><a href="#automatic_resource_leak_detection">6. Automatic Resource Leak Detection</a></div>
<div class="fancy-toc1"><a href="#test_coverage">7. Test Coverage</a></div>
<div class="fancy-toc2"><a href="#statement_versus_branch_coverage">7.1. Statement versus branch coverage</a></div>
<div class="fancy-toc2"><a href="#coverage_testing_of_defensive_code">7.2. Coverage testing of defensive code</a></div>
<div class="fancy-toc2"><a href="#forcing_coverage_of_boundary_values_and_boolean_vector_tests">7.3. Forcing coverage of boundary values and boolean vector tests</a></div>
<div class="fancy-toc2"><a href="#branch_coverage_versus_mc_dc">7.4. Branch coverage versus MC/DC</a></div>
<div class="fancy-toc2"><a href="#measuring_branch_coverage">7.5. Measuring branch coverage</a></div>
<div class="fancy-toc2"><a href="#mutation_testing">7.6. Mutation testing</a></div>
<div class="fancy-toc2"><a href="#experience_with_full_test_coverage">7.7. Experience with full test coverage</a></div>
<div class="fancy-toc1"><a href="#dynamic_analysis">8. Dynamic Analysis</a></div>
<div class="fancy-toc2"><a href="#assert">8.1. Assert</a></div>
<div class="fancy-toc2"><a href="#valgrind">8.2. Valgrind</a></div>
<div class="fancy-toc2"><a href="#memsys2">8.3. Memsys2</a></div>
<div class="fancy-toc2"><a href="#mutex_asserts">8.4. Mutex Asserts</a></div>
<div class="fancy-toc2"><a href="#journal_tests">8.5. Journal Tests</a></div>
<div class="fancy-toc2"><a href="#undefined_behavior_checks">8.6. Undefined Behavior Checks</a></div>
<div class="fancy-toc1"><a href="#disabled_optimization_tests">9. Disabled Optimization Tests</a></div>
<div class="fancy-toc1"><a href="#checklists">10. Checklists</a></div>
<div class="fancy-toc1"><a href="#static_analysis">11. Static Analysis</a></div>
<div class="fancy-toc1"><a href="#summary">12. Summary</a></div>
</div>
</div>
<script>
function toggle_toc(){
var sub = document.getElementById("toc_sub")
var mk = document.getElementById("toc_mk")
if( sub.style.display!="block" ){
sub.style.display = "block";
mk.innerHTML = "▼";
} else {
sub.style.display = "none";
mk.innerHTML = "►";
}
}
</script>
</div>
<h1 id="introduction"><span>1. </span>Introduction</h1>
<p>The reliability and robustness of SQLite is achieved in part
by thorough and careful testing.</p>
<p>As of <a href="releaselog/3_42_0.html">version 3.42.0</a> (2023-05-16),
the SQLite library consists of approximately
155.8 KSLOC of C code.
(KSLOC means thousands of "Source Lines Of Code" or, in other words,
lines of code excluding blank lines and comments.)
By comparison, the project has
590 times as much
test code and test scripts -
92053.1 KSLOC.</p>
<h2 id="executive_summary"><span>1.1. </span>Executive Summary</h2>
<ul>
<li> Four independently developed test harnesses
</li><li> 100% branch test coverage in an as-deployed configuration
</li><li> Millions and millions of test cases
</li><li> Out-of-memory tests
</li><li> I/O error tests
</li><li> Crash and power loss tests
</li><li> Fuzz tests
</li><li> Boundary value tests
</li><li> Disabled optimization tests
</li><li> Regression tests
</li><li> Malformed database tests
</li><li> Extensive use of assert() and run-time checks
</li><li> Valgrind analysis
</li><li> Undefined behavior checks
</li><li> Checklists
</li></ul>
<a name="harnesses"></a>
<h1 id="test_harnesses"><span>2. </span>Test Harnesses</h1>
<p>There are four independent test harnesses used for testing the
core SQLite library.
Each test harness is designed, maintained, and managed separately
from the others.
</p>
<ol>
<li><p>
<a name="tcl"></a>
The <b>TCL Tests</b> are the original tests for SQLite.
They are contained in the same source tree as the
SQLite core and like the SQLite core are in the public domain. The
TCL tests are the primary tests used during development.
The TCL tests are written using the
<a href="http://www.tcl-lang.org/">TCL scripting language</a>.
The TCL test harness itself consists of 27.2 KSLOC
of C code used to create the TCL interface. The test scripts are contained
in 1390 files totaling
23.2MB in size. There are
51445 distinct test cases, but many of the test
cases are parameterized and run multiple times (with different parameters)
so that on a full test run millions of
separate tests are performed.
</p>
</li>
<li><p>
The <b><a href="th3.html">TH3</a></b> test harness is a set of proprietary tests, written in
C that provide 100% branch test coverage
(and <a href="testing.html#mcdc">100% MC/DC test coverage</a>) to
the core SQLite library. The TH3 tests are designed to run
on embedded and specialized platforms that would not easily support
TCL or other workstation services. TH3 tests use only the published
SQLite interfaces. TH3 consists of about
76.9 MB or 1055.4 KSLOC
of C code implementing 50362 distinct test cases.
TH3 tests are heavily parameterized, though, so a full-coverage test runs
about 2.4 million different test
instances.
The cases that provide 100% branch test coverage constitute
a subset of the total TH3 test suite. A soak test
prior to release does about
248.5 million tests.
Additional information on TH3 is <a href="th3.html">available separately</a>.</p></li>
<li><p>
<a name="slt"></a>
The <a href="https://www.sqlite.org/sqllogictest"><b>SQL Logic Test</b></a>
or SLT test harness is used to run huge numbers
of SQL statements against both SQLite and several other SQL database engines
and verify that they all get the same answers. SLT currently compares
SQLite against PostgreSQL, MySQL, Microsoft SQL Server, and Oracle 10g.
SLT runs 7.2 million queries comprising
1.12GB of test data.
</p></li>
<li><p>
The <a href="#dbsqlfuzz"><b>dbsqlfuzz</b></a> engine is a
proprietary fuzz tester. Other <a href="testing.html#fuzztesting">fuzzers for SQLite</a>
mutate either the SQL inputs or the database file. Dbsqlfuzz mutates
both the SQL and the database file at the same time, and is thus able
to reach new error states. Dbsqlfuzz is built using the
<a href="http://llvm.org/docs/LibFuzzer.html">libFuzzer</a> framework of LLVM
with a custom mutator. There are
336 seed files. The dbsqlfuzz fuzzer
runs about one billion test mutations per day.
Dbsqlfuzz helps ensure
that SQLite is robust against attack via malicious SQL or database
inputs.
</p></li></ol>
<p>In addition to the four main test harnesses, there many other
small programs that implement specialized tests. Here are a few
examples:
</p><ol>
<li value="5">The "speedtest1.c" program
estimates the performance of SQLite under a typical workload.
</li><li>The "mptester.c" program is a stress test for multiple processes
concurrently reading and writing a single database.
</li><li>The "threadtest3.c" program is a stress test for multiple threads using
SQLite simultaneously.
</li><li>The "fuzzershell.c" program is used to
run some <a href="#fuzztesting">fuzz tests</a>.
</li><li>The "jfuzz" program is a libfuzzer-based fuzzer for
<a href="json1.html#jsonbx">JSONB</a> inputs to the <a href="json1.html">JSON SQL functions</a>.
</li></ol>
<p>All of the tests above must run successfully, on multiple platforms
and under multiple compile-time configurations,
before each release of SQLite.</p>
<p>Prior to each check-in to the SQLite source tree, developers
typically run a subset (called "veryquick") of the Tcl tests
consisting of about
304.7 thousand test cases.
The veryquick tests include most tests other than the anomaly, fuzz, and
soak tests. The idea behind the veryquick tests are that they are
sufficient to catch most errors, but also run in only a few minutes
instead of a few hours.</p>
<a name="anomaly"></a>
<h1 id="anomaly_testing"><span>3. </span>Anomaly Testing</h1>
<p>Anomaly tests are tests designed to verify the correct behavior
of SQLite when something goes wrong. It is (relatively) easy to build
an SQL database engine that behaves correctly on well-formed inputs
on a fully functional computer. It is more difficult to build a system
that responds sanely to invalid inputs and continues to function following
system malfunctions. The anomaly tests are designed to verify the latter
behavior.</p>
<a name="oomtesting"></a>
<h2 id="out_of_memory_testing"><span>3.1. </span>Out-Of-Memory Testing</h2>
<p>SQLite, like all SQL database engines, makes extensive use of
malloc() (See the separate report on
<a href="malloc.html">dynamic memory allocation in SQLite</a> for
additional detail.)
On servers and workstations, malloc() never fails in practice and so correct
handling of out-of-memory (OOM) errors is not particularly important.
But on embedded devices, OOM errors are frighteningly common and since
SQLite is frequently used on embedded devices, it is important that
SQLite be able to gracefully handle OOM errors.</p>
<p>OOM testing is accomplished by simulating OOM errors.
SQLite allows an application to substitute an alternative malloc()
implementation using the <a href="c3ref/config.html">sqlite3_config</a>(<a href="c3ref/c_config_covering_index_scan.html#sqliteconfigmalloc">SQLITE_CONFIG_MALLOC</a>,...)
interface. The TCL and TH3 test harnesses are both capable of
inserting a modified version of malloc() that can be rigged to fail
after a certain number of allocations. These instrumented mallocs
can be set to fail only once and then start working again, or to
continue failing after the first failure. OOM tests are done in a
loop. On the first iteration of the loop, the instrumented malloc
is rigged to fail on the first allocation. Then some SQLite operation
is carried out and checks are done to make sure SQLite handled the
OOM error correctly. Then the time-to-failure counter
on the instrumented malloc is increased by one and the test is
repeated. The loop continues until the entire operation runs to
completion without ever encountering a simulated OOM failure.
Tests like this are run twice, once with the instrumented malloc
set to fail only once, and again with the instrumented malloc set
to fail continuously after the first failure.</p>
<a name="ioerrtesting"></a>
<h2 id="i_o_error_testing"><span>3.2. </span>I/O Error Testing</h2>
<p>I/O error testing seeks to verify that SQLite responds sanely
to failed I/O operations. I/O errors might result from a full disk drive,
malfunctioning disk hardware, network outages when using a network
file system, system configuration or permission changes that occur in the
middle of an SQL operation, or other hardware or operating system
malfunctions. Whatever the cause, it is important that SQLite be able
to respond correctly to these errors and I/O error testing seeks to
verify that it does.</p>
<p>I/O error testing is similar in concept to OOM testing; I/O errors
are simulated and checks are made to verify that SQLite responds
correctly to the simulated errors. I/O errors are simulated in both
the TCL and TH3 test harnesses by inserting a new
<a href="c3ref/vfs.html">Virtual File System object</a> that is specially rigged
to simulate an I/O error after a set number of I/O operations.
As with OOM error testing, the I/O error simulators can be set to
fail just once, or to fail continuously after the first failure.
Tests are run in a loop, slowly increasing the point of failure until
the test case runs to completion without error. The loop is run twice,
once with the I/O error simulator set to simulate only a single failure
and a second time with it set to fail all I/O operations after the first
failure.</p>
<p>In I/O error tests, after the I/O error simulation failure mechanism
is disabled, the database is examined using
<a href="pragma.html#pragma_integrity_check">PRAGMA integrity_check</a> to make sure that the I/O error has not
introduced database corruption.</p>
<a name="crashtesting"></a>
<h2 id="crash_testing"><span>3.3. </span>Crash Testing</h2>
<p>Crash testing seeks to demonstrate that an SQLite database will not
go corrupt if the application or operating system crashes or if there
is a power failure in the middle of a database update. A separate
white-paper titled
<a href="atomiccommit.html">Atomic Commit in SQLite</a> describes the
defensive measure SQLite takes to prevent database corruption following
a crash. Crash tests strive to verify that those defensive measures
are working correctly.</p>
<p>It is impractical to do crash testing using real power failures, of
course, and so crash testing is done in simulation. An alternative
<a href="c3ref/vfs.html">Virtual File System</a> is inserted that allows the test
harness to simulate the state of the database file following a crash.</p>
<p>In the TCL test harness, the crash simulation is done in a separate
process. The main testing process spawns a child process which runs
some SQLite operation and randomly crashes somewhere in the middle of
a write operation. A special <a href="vfs.html">VFS</a> randomly reorders and corrupts
the unsynchronized
write operations to simulate the effect of buffered filesystems. After
the child dies, the original test process opens and reads the test
database and verifies that the changes attempted by the child either
completed successfully or else were completely rolled back. The
<a href="pragma.html#pragma_integrity_check">integrity_check</a> <a href="pragma.html#syntax">PRAGMA</a> is used to make sure no database corruption
occurs.</p>
<p>The TH3 test harness needs to run on embedded systems that do not
necessarily have the ability to spawn child processes, so it uses
an in-memory <a href="vfs.html">VFS</a> to simulate crashes. The in-memory <a href="vfs.html">VFS</a> can be rigged
to make a snapshot of the entire filesystem after a set number of I/O
operations. Crash tests run in a loop. On each iteration of the loop,
the point at which a snapshot is made is advanced until the SQLite
operations being tested run to completion without ever hitting a
snapshot. Within the loop, after the SQLite operation under test has
completed, the filesystem is reverted to the snapshot and random file
damage is introduced that is characteristic of the kinds of damage
one expects to see following a power loss. Then the database is opened
and checks are made to ensure that it is well-formed and that the
transaction either ran to completion or was completely rolled back.
The interior of the loop is repeated multiple times for each
snapshot with different random damage each time.</p>
<a name="multifail"></a>
<h2 id="compound_failure_tests"><span>3.4. </span>Compound failure tests</h2>
<p>The test suites for SQLite also explore the result of stacking
multiple failures. For example, tests are run to ensure correct behavior
when an I/O error or OOM fault occurs while trying to recover from a
prior crash.
<a name="fuzztesting"></a>
</p><h1 id="fuzz_testing"><span>4. </span>Fuzz Testing</h1>
<p><a href="http://en.wikipedia.org/wiki/Fuzz_testing">Fuzz testing</a>
seeks to establish that SQLite responds correctly to invalid, out-of-range,
or malformed inputs.</p>
<h2 id="sql_fuzz"><span>4.1. </span>SQL Fuzz</h2>
<p>SQL fuzz testing consists of creating syntactically correct yet
wildly nonsensical SQL statements and feeding them to SQLite to see
what it will do with them. Usually some kind of error is returned
(such as "no such table"). Sometimes, purely by chance, the SQL
statement also happens to be semantically correct. In that case, the
resulting prepared statement is run to make sure it gives a reasonable
result.</p>
<a name="aflfuzz"></a>
<h3 id="sql_fuzz_using_the_american_fuzzy_lop_fuzzer"><span>4.1.1. </span>SQL Fuzz Using The American Fuzzy Lop Fuzzer</h3>
<p>The concept of fuzz testing has been around for decades, but fuzz
testing was not an effective way to find bugs until 2014 when
Michal Zalewski invented the first practical profile-guided fuzzer,
<a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a> or "AFL".
Unlike prior fuzzers that blindly generate random inputs, AFL
instruments the program being tested (by modifying the assembly-language
output from the C compiler) and uses that instrumentation to detect when
an input causes the program to do something different - to follow
a new control path or loop a different number of times. Inputs that provoke
new behavior are retained and further mutated. In this way, AFL is able
to "discover" new behaviors of the program under test, including behaviors
that were never envisioned by the designers.
</p><p>AFL proved adept at finding arcane bugs in SQLite.
Most of the findings have been assert() statements where the conditional
was false under obscure circumstances. But AFL has also found
a fair number of crash bugs in SQLite, and even a few cases where SQLite
computed incorrect results.
</p><p>Because of its past success, AFL became a standard part of the testing
strategy for SQLite beginning with <a href="releaselog/3_8_10.html">version 3.8.10</a> (2015-05-07) until
it was superseded by better fuzzers in <a href="releaselog/3_29_0.html">version 3.29.0</a> (2019-07-10).
<a name="ossfuzz"></a>
</p><h3 id="google_oss_fuzz"><span>4.1.2. </span>Google OSS Fuzz</h3>
<p>Beginning in 2016, a team of engineers at Google started the
<a href="https://github.com/google/oss-fuzz">OSS Fuzz</a> project.
OSS Fuzz uses a AFL-style guided fuzzer running on Google's infrastructure.
The Fuzzer automatically downloads the latest check-ins for participating
projects, fuzzes them, and sends email to the developers reporting any
problems. When a fix is checked in, the fuzzer automatically detects this
and emails a confirmation to the developers.
</p><p>SQLite is one of many open-source projects that OSS Fuzz tests. The
<a href="https://www.sqlite.org/src/file/test/ossfuzz.c">test/ossfuzz.c</a> source file
in the SQLite repository is SQLite's interface to OSS fuzz.
</p><p>OSS Fuzz no longer finds historical bugs in SQLite. But it is still
running and does occasionally find issues in new development check-ins.
Examples:
<a href="https://www.sqlite.org/src/timeline?y=ci&c=c422afb507dc8757">[1]</a>
<a href="https://www.sqlite.org/src/timeline?y=ci&c=0a2eb949f8a759e5">[2]</a>
<a href="https://www.sqlite.org/src/timeline?y=ci&c=62f2235adf796c72">[3]</a>.
<a name="dbsqlfuzz"></a>
</p><h3 id="the_dbsqlfuzz_and_jfuzz_fuzzers"><span>4.1.3. </span>The dbsqlfuzz and jfuzz fuzzers</h3>
<p>Beginning in late 2018, SQLite has been fuzzed using a proprietary
fuzzer called "dbsqlfuzz". Dbsqlfuzz is built using the
<a href="http://llvm.org/docs/LibFuzzer.html">libFuzzer</a> framework of LLVM.
</p><p>The dbsqlfuzz fuzzer mutates both the SQL input and the database file
at the same time. Dbsqlfuzz uses a custom
<a href="https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md">Structure-Aware Mutator</a>
on a specialized input file that defines both an input database and SQL
text to be run against that database. Because it mutates both the input
database and the input SQL at the same time, dbsqlfuzz has been able to
find some obscure faults in SQLite that were missed by prior fuzzers that
mutated only SQL inputs or only the database file.
The SQLite developers keep dbsqlfuzz running against trunk in about
16 cores at all times. Each instance of dbsqlfuzz program is able to
evalutes about 400 test cases per second, meaning that about 500 million
cases are checked every day.</p>
<p>The dbsqlfuzz fuzzer has been very successful at hardening the
SQLite code base against malicious attack. Since dbsqlfuzz has been
added to the SQLite internal test suite, bug reports from external
fuzzers such as OSSFuzz have all but stopped.
</p><p>Note that dbsqlfuzz is <i>not</i> the Protobuf-based structure-aware
fuzzer for SQLite that is used by Chromium and described in the
<a href="https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md#user-content-example-sqlite">Structure-Aware Mutator article</a>.
There is no connection between these two fuzzers, other than the fact that they
are both based on <a href="http://llvm.org/docs/LibFuzzer.html">libFuzzer</a>
The Protobuf fuzzer for SQLite is written and maintained by the Chromium
team at Google, whereas dbsqlfuzz is written and maintained by the original
SQLite developers. Having multiple independently-developed fuzzers for SQLite
is good, as it means that obscure issues are more likely to be uncovered.
</p><p>Near the end of January 2024, a second libFuzzer-based tool called
"jfuzz" came into use. Jfuzz generates corrupt <a href="json1.html#jsonbx">JSONB</a> blobs and feeds
them into the <a href="json1.html">JSON SQL functions</a> to verify that the JSON functions
are able to safely and efficiently deal with corrupt binary inputs.
<a name="3pfuzz"></a>
</p><h3 id="other_third_party_fuzzers"><span>4.1.4. </span>Other third-party fuzzers</h3>
<p>SQLite seems to be a popular target for third-parties to fuzz.
The developers hear about many attempts to fuzz SQLite
and they do occasionally get bug reports found by independent
fuzzers. All such reports are promptly fixed, so the product is
improved and that the entire SQLite user community benefits.
This mechanism of having many independent testers is similar to
<a href="https://en.wikipedia.org/wiki/Linus%27s_law">Linus's law</a>:
"given enough eyeballs, all bugs are shallow".
</p><p>One fuzzing researcher of particular note is
<a href="https://www.manuelrigger.at/">Manuel Rigger</a>.
Most fuzzers only look for assertion faults, crashes, undefined behavior (UB),
or other easily detected anomalies. Dr. Rigger's fuzzers, on the other hand,
are able to find cases where SQLite computes an incorrect answer.
Rigger has found
<a href="https://www.sqlite.org/src/timeline?y=t&u=mrigger&n=all">many such cases</a>.
Most of these finds are obscure corner cases involving type
conversions and affinity transformations, and a good number of the finds
are against unreleased features. Nevertheless, his finds are still important
as they are real bugs,
and the SQLite developers are grateful to be able to identify and fix
the underlying problems.
<a name="fuzzcheck"></a>
</p><h3 id="the_fuzzcheck_test_harness"><span>4.1.5. </span>The fuzzcheck test harness</h3>
<p>Historical test cases from <a href="testing.html#aflfuzz">AFL</a>, <a href="testing.html#ossfuzz">OSS Fuzz</a>, and <a href="testing.html#dbsqlfuzz">dbsqlfuzz</a> are
collected in a set of database files in the main SQLite source tree
and then rerun by the "fuzzcheck" utility program whenever one runs
"make test". Fuzzcheck only runs a few thousand "interesting" cases
out of the billions of cases that the various fuzzers have
examined over the years. "Interesting" cases are cases that exhibit
previously unseen behavior. Actual bugs found by fuzzers are always
included among the interesting test cases, but most of the cases run
by fuzzcheck were never actual bugs.
<a name="tension"></a>
</p><h3 id="tension_between_fuzz_testing_and_100_mc_dc_testing"><span>4.1.6. </span>Tension Between Fuzz Testing And 100% MC/DC Testing</h3>
<p>Fuzz testing and <a href="testing.html#mcdc">100% MC/DC testing</a> are in tension with
one another.
That is to say, code tested to 100% MC/DC will tend to be
more vulnerable to problems found by fuzzing and code that performs
well during fuzz testing will tend to have (much) less than
100% MC/DC.
This is because MC/DC testing discourages <a href="testing.html#defcode">defensive code</a> with
unreachable branches, but without defensive code, a fuzzer is
more likely to find a path that causes problems. MC/DC testing
seems to work well for building code that is robust during
normal use, whereas fuzz testing is good for building code that is
robust against malicious attack.
</p><p>Of course, users would prefer code that is both robust in normal
use and resistant to malicious attack. The SQLite developers are
dedicated to providing that. The purpose of this section is merely
to point out that doing both at the same time is difficult.
</p><p>For much of its history SQLite has been focused on 100% MC/DC testing.
Resistance to fuzzing attacks only became a concern with the introduction
of AFL in 2014. For a while there, fuzzers were finding many problems
in SQLite. In more recent years, the testing strategy of SQLite has
evolved to place more emphasis on fuzz testing. We still maintain
100% MC/DC of the core SQLite code, but most testing CPU cycles are
now devoted to fuzzing.
</p><p>While fuzz testing and 100% MC/DC testing are in tension, they
are not completely at cross-purposes. The fact that the SQlite test
suite does test to 100% MC/DC means that when fuzzers do find problems,
those problems can be fixed quickly and with little risk of introducing
new errors.
</p><h2 id="malformed_database_files"><span>4.2. </span>Malformed Database Files</h2>
<p>There are numerous test cases that verify that SQLite is able to
deal with malformed database files.
These tests first build a well-formed database file, then add
corruption by changing one or more bytes in the file by some means
other than SQLite. Then SQLite is used to read the database.
In some cases, the bytes changes are in the middle of data.
This causes the content of the database to change while keeping the
database well-formed.
In other cases, unused bytes of the file are modified, which has
no effect on the integrity of the database.
The interesting cases are when bytes of the file that
define database structure get changed. The malformed database tests
verify that SQLite finds the file format errors and reports them
using the <a href="rescode.html#corrupt">SQLITE_CORRUPT</a> return code without overflowing
buffers, dereferencing NULL pointers, or performing other
unwholesome actions.</p>
<p>The <a href="testing.html#dbsqlfuzz">dbsqlfuzz</a> fuzzer also does an excellent job of verifying
that SQLite responds sanely to malformed database files.</p>
<h2 id="boundary_value_tests"><span>4.3. </span>Boundary Value Tests</h2>
<p>SQLite defines certain <a href="limits.html">limits</a> on its operation, such as the
maximum number of columns in a table, the maximum length of an
SQL statement, or the maximum value of an integer. The TCL and TH3 test
suites both contains numerous tests that push SQLite right to the edge
of its defined limits and verify that it performs correctly for
all allowed values. Additional tests go beyond the defined limits
and verify that SQLite correctly returns errors. The source code
contains <a href="testing.html#testcase">testcase macros</a> to verify that both sides of each boundary
have been tested.</p>
<a name="regressiontesting"></a>
<h1 id="regression_testing"><span>5. </span>Regression Testing</h1>
<p>Whenever a bug is reported against SQLite, that bug is not considered
fixed until new test cases that would exhibit the bug have been added
to either the TCL or TH3 test suites.
Over the years,
this has resulted in thousands and thousands of new tests.
These regression tests ensure that bugs that have
been fixed in the past are not reintroduced into future versions of
SQLite.</p>
<a name="leakcheck"></a>
<h1 id="automatic_resource_leak_detection"><span>6. </span>Automatic Resource Leak Detection</h1>
<p>Resource leak occurs when system resources
are allocated and never freed. The most troublesome resource leaks
in many applications are memory leaks - when memory is allocated using
malloc() but never released using free(). But other kinds of resources
can also be leaked: file descriptors, threads, mutexes, etc.</p>
<p>Both the TCL and TH3 test harnesses automatically track system
resources and report resource leaks on <u>every</u> test run.
No special configuration or setup is required. The test harnesses
are especially vigilant with regard to memory leaks. If a change
causes a memory leak, the test harnesses will recognize this
quickly. SQLite is designed to never leak memory, even after
an exception such as an OOM error or disk I/O error. The test
harnesses are zealous to enforce this.</p>
<a name="coverage"></a>
<h1 id="test_coverage"><span>7. </span>Test Coverage</h1>
<p>The SQLite core, including the unix <a href="vfs.html">VFS</a>,
has 100% branch test coverage under <a href="th3.html">TH3</a> in
its default configuration as measured by
<a href="http://gcc.gnu.org/onlinedocs/gcc/Gcov.html">gcov</a>.
Extensions such as FTS3 and RTree are excluded from this
analysis.</p>
<a name="stmtvbr"></a>
<h2 id="statement_versus_branch_coverage"><span>7.1. </span>Statement versus branch coverage</h2>
<p>There are many ways to measure test coverage. The most popular
metric is "statement coverage". When you hear someone say that their
program as "XX% test coverage" without further explanation, they usually
mean statement coverage. Statement coverage measures what percentage
of lines of code are executed at least once by the test suite.</p>
<p>Branch coverage is more rigorous than statement coverage. Branch
coverage measures the number of machine-code branch instructions that
are evaluated at least once on both directions.</p>
<p>To illustrate the difference between statement coverage and
branch coverage, consider the following hypothetical
line of C code:</p>
<div class="codeblock"><pre>if( a>b && c!=25 ){ d++; }
</pre></div>
<p>Such a line of C code might generate a dozen separate machine code
instructions. If any one of those instructions is ever evaluated, then
we say that the statement has been tested. So, for example, it might
be the case that the conditional expression is
always false and the "d" variable is
never incremented. Even so, statement coverage counts this line of
code as having been tested.</p>
<p>Branch coverage is more strict. With branch coverage, each test and
each subblock within the statement is considered separately. In order
to achieve 100% branch coverage in the example above, there must be at
least three test cases:</p>
<p></p><ul>
<li> a<=b
</li><li> a>b && c==25
</li><li> a>b && c!=25
</li></ul>
<p>Any one of the above test cases would provide 100% statement coverage
but all three are required for 100% branch coverage. Generally speaking,
100% branch coverage implies 100% statement coverage, but the converse is
not true. To reemphasize, the
<a href="th3.html">TH3</a> test harness for SQLite provides the stronger form of
test coverage - 100% branch test coverage.</p>
<a name="defcode"></a>
<h2 id="coverage_testing_of_defensive_code"><span>7.2. </span>Coverage testing of defensive code</h2>
<p>A well-written C program will typically contain some defensive
conditionals which in practice are always true or always false.
This leads to a
programming dilemma: Does one remove defensive code in order to obtain
100% branch coverage?</p>
<p>In SQLite, the answer to the previous question is "no".
For testing purposes, the SQLite source code defines
macros called ALWAYS() and NEVER(). The ALWAYS() macro
surrounds conditions
which are expected to always evaluate as true and NEVER() surrounds
conditions that are always evaluated to false. These macros serve as
comments to indicate that the conditions are defensive code.
In release builds, these macros are pass-throughs:</p>
<div class="codeblock"><pre>#define ALWAYS(X) (X)
#define NEVER(X) (X)
</pre></div>
<p>During most testing, however, these macros will throw an assertion
fault if their argument does not have the expected truth value. This
alerts the developers quickly to incorrect design assumptions.
</p><div class="codeblock"><pre>#define ALWAYS(X) ((X)?1:assert(0),0)
#define NEVER(X) ((X)?assert(0),1:0)
</pre></div>
<p>When measuring test coverage, these macros are defined to be constant
truth values so that they do not generate assembly language branch
instructions, and hence do not come into play when calculating the
branch coverage:</p>
<div class="codeblock"><pre>#define ALWAYS(X) (1)
#define NEVER(X) (0)
</pre></div>
<p>The test suite is designed to be run three times, once for each of
the ALWAYS() and NEVER() definitions shown above. All three test runs
should yield exactly the same result. There is a run-time test using
the <a href="c3ref/test_control.html">sqlite3_test_control</a>(<a href="c3ref/c_testctrl_always.html">SQLITE_TESTCTRL_ALWAYS</a>, ...) interface that
can be used to verify that the macros are correctly set to the first
form (the pass-through form) for deployment.</p>
<a name="testcase"></a>
<h2 id="forcing_coverage_of_boundary_values_and_boolean_vector_tests"><span>7.3. </span>Forcing coverage of boundary values and boolean vector tests</h2>
<p>Another macro used in conjunction with test coverage measurement is
the <tt>testcase()</tt> macro. The argument is a condition for which
we want test cases that evaluate to both true and false.
In non-coverage builds (that is to say, in release builds) the
<tt>testcase()</tt> macro is a no-op:</p>
<div class="codeblock"><pre>#define testcase(X)
</pre></div>
<p>But in a coverage measuring build, the <tt>testcase()</tt> macro
generates code that evaluates the conditional expression in its argument.
Then during analysis, a check
is made to ensure tests exist that evaluate the conditional to both true
and false. <tt>Testcase()</tt> macros are used, for example, to help verify
that boundary values are tested. For example:</p>
<div class="codeblock"><pre>testcase( a==b );
testcase( a==b+1 );
if( a>b && c!=25 ){ d++; }
</pre></div>
<p>Testcase macros are also used when two or more cases of a switch
statement go to the same block of code, to make sure that the code was
reached for all cases:</p>
<div class="codeblock"><pre>switch( op ){
case OP_Add:
case OP_Subtract: {
testcase( op==OP_Add );
testcase( op==OP_Subtract );
/* ... */
break;
}
/* ... */
}
</pre></div>
<p>For bitmask tests, <tt>testcase()</tt> macros are used to verify that every
bit of the bitmask affects the outcome. For example, in the following block
of code, the condition is true if the mask contains either of two bits
indicating either a MAIN_DB or a TEMP_DB is being opened.
The <tt>testcase()</tt>
macros that precede the if statement verify that both cases are tested:</p>
<div class="codeblock"><pre>testcase( mask & SQLITE_OPEN_MAIN_DB );
testcase( mask & SQLITE_OPEN_TEMP_DB );
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }
</pre></div>
<p>The SQLite source code contains 1184
uses of the <tt>testcase()</tt> macro.</p>
<a name="mcdc"></a>
<h2 id="branch_coverage_versus_mc_dc"><span>7.4. </span>Branch coverage versus MC/DC</h2>
<p>Two methods of measuring test coverage were described above:
"statement" and "branch" coverage. There are many other test coverage
metrics besides these two. Another popular metric is "Modified
Condition/Decision Coverage" or MC/DC.
<a href="http://en.wikipedia.org/wiki/Modified_Condition/Decision_Coverage">Wikipedia</a>
defines MC/DC as follows:</p>
<ul>
<li> Each decision tries every possible outcome.
</li><li> Each condition in a decision takes on every possible outcome.
</li><li> Each entry and exit point is invoked.
</li><li> Each condition in a decision is shown to independently
affect the outcome of the decision.
</li></ul>
<p>In the C programming language
where <b><tt>&&</tt></b> and <b><tt>||</tt></b>
are "short-circuit" operators, MC/DC and branch coverage are very nearly
the same thing. The primary difference is in boolean vector tests.
One can test for any of several bits in bit-vector and still obtain
100% branch test coverage even though the second element of MC/DC - the
requirement that each condition in a decision take on every possible outcome -
might not be satisfied.</p>
<p>SQLite uses <tt>testcase()</tt> macros as described in the previous
subsection to make sure that every condition in a bit-vector decision takes
on every possible outcome. In this way, SQLite also achieves 100% MC/DC
in addition to 100% branch coverage.</p>
<h2 id="measuring_branch_coverage"><span>7.5. </span>Measuring branch coverage</h2>
<p>Branch coverage in SQLite is currently measured
using <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">gcov</a> with the "-b"
option. First the test program is compiled using options
"-g -fprofile-arcs -ftest-coverage" and then the test program is run.
Then "gcov -b" is run to generate a coverage report.
The coverage report is verbose and inconvenient to read,
so the gcov-generated report is processed using
some simple scripts to put it into a more human-friendly format.
This entire process is automated using scripts, of course.
</p><p>Note that running SQLite with gcov is not a test of SQLite —
it is a test of the test suite. The gcov run does not test SQLite because
the -fprofile-args and -ftest-coverage options cause the compiler to
generate different code.
The gcov run merely verifies that the test suite provides 100% branch test
coverage. The gcov run is a test of the test - a meta-test.
</p><p>After gcov has been run to verify 100% branch test coverage,
then the test program is recompiled using delivery compiler options
(without the special -fprofile-arcs and -ftest-coverage options)
and the test program is rerun.
This second run is the actual test of SQLite.
</p><p>It is important to verify that the gcov test run
and the second real test run both give the same output. Any
differences in output indicate either the use of undefined or
indeterminate behavior in the SQLite code (and hence a bug),
or a bug in the compiler.
Note that SQLite has, over the previous decade, encountered bugs
in each of GCC, Clang, and MSVC. Compiler bugs, while rare, do happen,
which is why it is so important to test the code in an as-delivered
configuration.
<a name="mutationtests"></a>
</p><h2 id="mutation_testing"><span>7.6. </span>Mutation testing</h2>
<p>Using gcov (or similar) to show that every branch instruction is taken
at least once in both directions is good measure of test suite quality.
But even better is showing that every branch instruction makes
a difference in the output. In other words, we want to show
not only that every branch instruction both jumps and falls through but also
that every branch is doing useful work and that the test suite is able
to detect and verify that work. When a branch is found that does not
make a difference in the output, that suggests that code associated with
the branch can be removed (reducing the size of the library and perhaps
making it run faster) or that the test suite is inadequately testing the
feature that the branch implements.
</p><p>SQLite strives to verify that every branch instruction makes a difference
using <a href="https://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a>.
<a href="th3.html#muttest">A script</a>
first compiles the SQLite source code into assembly language
(using, for example, the -S option to gcc). Then the script steps through
the generated assembly language and, one by one, changes each branch
instruction into either an unconditional jump or a no-op, compiles the
result, and verifies that the test suite catches the mutation.
</p><p>
Unfortunately, SQLite contains many branch instructions that
help the code run faster without changing the output.
Such branches generate false-positives during mutation testing.
As an example, consider the following
<a href="https://www.sqlite.org/src/artifact/55b5fb474?ln=55-62">hash function</a>
used to accelerate table-name lookup:
</p><div class="codeblock"><pre>55 static unsigned int strHash(const char *z){
56 unsigned int h = 0;
57 unsigned char c;
58 while( (c = (unsigned char)*z++)!=0 ){ /*OPTIMIZATION-IF-TRUE*/
59 h = (h<<3) ^ h ^ sqlite3UpperToLower[c];
60 }
61 return h;
62 }
</pre></div>
<p>
If the branch instruction that implements the "c!=0" test on line 58
is changed into a no-op, then the while-loop will loop forever and the
test suite will fail with a time-out. But if that branch is changed
into an unconditional jump, then the hash function will always return 0.
The problem is that 0 is a valid hash. A hash function that always
returns 0 still works in the sense that SQLite still always gets the correct
answer. The table-name hash table degenerates into a linked-list
and so the table-name lookups that occur while parsing SQL statements
might be a little slower, but the end result will be the same.
</p><p>
To work around this problem, comments of the form
"<code>/*OPTIMIZATION-IF-TRUE*/</code>" and
"<code>/*OPTIMIZATION-IF-FALSE*/</code>" are inserted into the SQLite
source code to tell the mutation testing script to ignore some branch
instructions.
<a name="thoughts1"></a>
</p><h2 id="experience_with_full_test_coverage"><span>7.7. </span>Experience with full test coverage</h2>
<p>The developers of SQLite have found that full coverage testing is an
extremely effective method for locating and preventing bugs.
Because every single branch
instruction in SQLite core code is covered by test cases, the developers
can be confident that changes made in one part of the code
do not have unintended consequences in other parts of the code.
The many new features and performance improvements that have been
added to SQLite in recent years would not have been possible without
the availability of full-coverage testing.</p>
<p>Maintaining 100% MC/DC is laborious and time-consuming.
The level of effort needed to maintain full-coverage testing
is probably not cost effective for a typical application.
However, we think that full-coverage testing is justified for a
<a href="mostdeployed.html">very widely deployed</a> infrastructure library
like SQLite, and especially for a database library which by its very
nature "remembers" past mistakes.
<a name="dynamicanalysis"></a>
</p><h1 id="dynamic_analysis"><span>8. </span>Dynamic Analysis</h1>
<p>Dynamic analysis refers to internal and external checks on the
SQLite code which are performed while the code is live and running.
Dynamic analysis has proven to be a great help in maintaining the
quality of SQLite.</p>
<a name="asserts"></a>
<h2 id="assert"><span>8.1. </span>Assert</h2>
<p>The SQLite core contains 6754 <tt>assert()</tt>
statements that verify function preconditions and postconditions and
loop invariants. Assert() is a macro which is a standard part of
ANSI-C. The argument is a boolean value that is assumed to always be
true. If the assertion is false, the program prints an error message
and halts.</p>
<p>Assert() macros are disabled by compiling with the NDEBUG macro defined.
In most systems, asserts are enabled by default. But in SQLite, the
asserts are so numerous and are in such performance critical places, that
the database engine runs about three times slower when asserts are enabled.
Hence, the default (production) build of SQLite disables asserts.
Assert statements are only enabled when SQLite is compiled with the
SQLITE_DEBUG preprocessor macro defined.</p>
<p>See the <a href="assert.html">Use Of assert in SQLite</a> document
for additional information about how SQLite uses assert().</p>
<a name="valgrind"></a>
<h2 id="valgrind"><span>8.2. </span>Valgrind</h2>
<p><a href="http://valgrind.org/">Valgrind</a> is perhaps the most amazing
and useful developer tool in the world. Valgrind is a simulator - it simulates
an x86 running a Linux binary. (Ports of Valgrind for platforms other
than Linux are in development, but as of this writing, Valgrind only
works reliably on Linux, which in the opinion of the SQLite developers
means that Linux should be the preferred platform for all software development.)
As Valgrind runs a Linux binary, it looks for all kinds of interesting
errors such as array overruns, reading from uninitialized memory,
stack overflows, memory leaks, and so forth. Valgrind finds problems
that can easily slip through all of the other tests run against SQLite.
And, when Valgrind does find an error, it can dump the developer directly
into a symbolic debugger at the exact point where the error occur, to
facilitate a quick fix.</p>
<p>Because it is a simulator, running a binary in Valgrind is slower than
running it on native hardware. (To a first approximation, an application
running in Valgrind on a workstation will perform about the same as it
would running natively on a smartphone.) So it is impractical to run the full
SQLite test suite through Valgrind. However, the veryquick tests and
the coverage of the TH3 tests are run through Valgrind prior to every
release.</p>
<a name="memtesting"></a>
<h2 id="memsys2"><span>8.3. </span>Memsys2</h2>
<p>SQLite contains a pluggable
<a href="malloc.html">memory allocation subsystem</a>.
The default implementation uses system malloc() and free().
However, if SQLite is compiled with <a href="compile.html#memdebug">SQLITE_MEMDEBUG</a>, an alternative
memory allocation wrapper (<a href="malloc.html#memdebug">memsys2</a>)
is inserted that looks for memory allocation
errors at run-time. The memsys2 wrapper checks for memory leaks, of
course, but also looks for buffer overruns, uses of uninitialized memory,
and attempts to use memory after it has been freed. These same checks
are also done by valgrind (and, indeed, Valgrind does them better)
but memsys2 has the advantage of being much faster than Valgrind, which
means the checks can be done more often and for longer tests.</p>
<a name="mutextesting"></a>
<h2 id="mutex_asserts"><span>8.4. </span>Mutex Asserts</h2>
<p>SQLite contains a pluggable mutex subsystem. Depending on
compile-time options, the default mutex system contains interfaces
<a href="c3ref/mutex_held.html">sqlite3_mutex_held()</a> and <a href="c3ref/mutex_held.html">sqlite3_mutex_notheld()</a> that detect
whether or not a particular mutex is held by the calling thread.
These two interfaces are used extensively within assert() statements
in SQLite to verify mutexes are held and released at all the right
moments, in order to double-check that SQLite does work correctly
in multi-threaded applications.</p>
<a name="journaltest"></a>
<h2 id="journal_tests"><span>8.5. </span>Journal Tests</h2>
<p>One of the things that SQLite does to ensure that transactions
are atomic across system crashes and power failures is to write
all changes into the rollback journal file prior to changing the
database. The TCL test harness contains an alternative
<a href="vfs.html">OS backend</a> implementation that helps to
verify this is occurring correctly. The "journal-test VFS" monitors
all disk I/O traffic between the database file and rollback journal,
checking to make sure that nothing is written into the database
file which has not first been written and synced to the rollback journal.
If any discrepancies are found, an assertion fault is raised.</p>
<p>The journal tests are an additional double-check over and above
the crash tests to make sure that SQLite transactions will be atomic
across system crashes and power failures.</p>
<a name="intoverflow"></a>
<h2 id="undefined_behavior_checks"><span>8.6. </span>Undefined Behavior Checks</h2>
<p>In the C programming language, it is very easy to write code that
has "undefined" or "implementation defined" behavior.
That means that the code might work during development, but then give
a different answer on a different system, or when recompiled using different
compiler options.
Examples of undefined and implementation-defined behavior in
ANSI C include:
</p><ul>
<li>Signed integer overflow. (Signed integer overflow does <u>not</u>
necessarily wrap around, as most people expect.)
</li><li>Shifting an N-bit integer by more than N bits.
</li><li>Shifting by a negative amount.
</li><li>Shifting a negative number.
</li><li>Using the memcpy() function on overlapping buffers.
</li><li>The order of evaluation of function arguments.
</li><li>Whether or not "char" variables are signed or unsigned.
</li><li>And so forth....
</li></ul>
<p>Since undefined and implementation-defined behavior is non-portable
and can easily lead to incorrect answers, SQLite works very hard to avoid it.
For example,
when adding two integer column values together as part of an SQL statement,
SQLite does not simply add them together using the C-language "+" operator.
Instead, it first checks to make sure the
addition will not overflow, and if it will, it does the addition using
floating point instead.
</p><p>To help ensure that SQLite does not make use of undefined or
implementation defined behavior, the test suites are rerun using
instrumented builds that try to detect undefined behavior. For example,
test suites are run using the "-ftrapv" option of GCC. And they
are run again using the "-fsanitize=undefined" option on Clang. And
again using the "/RTC1" option in MSVC. Then the test suites are rerun
using options like "-funsigned-char" and "-fsigned-char" to make sure
that implementation differences do not matter either. Tests are then repeated
on 32-bit and 64-bit systems and on big-endian and little-endian systems,
using a variety of CPU architectures.
Furthermore, the test suites are augmented with many test cases that are
deliberately designed to provoke undefined behavior. For example:
"<b>SELECT -1*(-9223372036854775808);</b>".
<a name="disopttest"></a>
</p><h1 id="disabled_optimization_tests"><span>9. </span>Disabled Optimization Tests</h1>
<p>The <a href="c3ref/test_control.html">sqlite3_test_control</a>(<a href="c3ref/c_testctrl_always.html">SQLITE_TESTCTRL_OPTIMIZATIONS</a>, ...) interface
allows selected SQL statement optimizations to be disabled at run-time.
SQLite should always generate exactly the same answer with optimizations
enabled and with optimizations disabled; the answer simply arrives quicker
with the optimizations turned on. So in a production environment, one always
leaves the optimizations turned on (the default setting).</p>
<p>One verification technique used on SQLite is to run an entire test suite
twice, once with optimizations left on and a second time with optimizations
turned off, and verify that the same output is obtained both times. This
shows that the optimizations do not introduce errors.</p>
<p>Not all test cases can be handled this way. Some test cases check
to verify that the optimizations really are reducing the amount of
computation by counting the number of disk accesses, sort operations,
full-scan steps, or other processing steps that occur during queries.
Those test cases will appear to fail when optimizations are disabled.
But the majority of test cases simply check that the correct answer
was obtained, and all of those cases can be run successfully with and
without the optimizations, in order to show that the optimizations do not
cause malfunctions.</p>
<a name="cklist"></a>
<h1 id="checklists"><span>10. </span>Checklists</h1>
<p>The SQLite developers use an on-line checklist to coordinate testing
activity and to verify that all tests pass prior each SQLite release.
<a href="https://www.sqlite.org/checklists/index.html">Past checklists</a>
are retained for historical reference.
(The checklists are read-only for anonymous internet viewers, but
developers can log in and update checklist items in their web
browsers.)
The use of checklists for SQLite testing and other development activities
is inspired by <i>
<a href="http://atulgawande.com/book/the-checklist-manifesto/">The Checklist Manifesto</a>
</i>.</p>
<p>The latest checklists contain approximately 200 items that are
individually verified for each release. Some checklist items only take
a few seconds to verify and mark off. Others involve test suites
that run for many hours.</p>
<p>The release checklist is not automated: developers run each item on
the checklist manually. We find that it is important to keep a human in
the loop. Sometimes problems are found while running a checklist item
even though the test itself passed. It is important to have a human
reviewing the test output at the highest level, and constantly asking
"Is this really right?"</p>
<p>The release checklist is continuously evolving. As new problems or
potential problems are discovered, new checklist items are added to
make sure those problems do not appear in subsequent releases. The
release checklist has proven to be an invaluable tool in helping to
ensure that nothing is overlooked during the release process.</p>
<a name="staticanalysis"></a>
<h1 id="static_analysis"><span>11. </span>Static Analysis</h1>
<p>Static analysis means analyzing source code at compile-time to
check for correctness. Static analysis includes compiler
warning messages and more in-depth analysis engines such as the
<a href="http://clang-analyzer.llvm.org/">Clang Static Analyzer</a>.
SQLite compiles without warnings on GCC and Clang using
the -Wall and -Wextra flags on Linux and Mac and on MSVC on Windows.
No valid warnings are generated by the Clang Static Analyzer tool "scan-build"
either (though recent versions of clang seem to generate many false-positives.)
Nevertheless, some warnings might be generated by other
static analyzers. Users are encouraged not to stress over these
warnings and to instead take solace in the intense testing of SQLite
described above.
</p>
<p>Static analysis has not been helpful in finding
bugs in SQLite. Static analysis has found a few bugs in SQLite, but
those are the exceptions. More bugs have been
introduced into SQLite while trying to get it to compile without
warnings than have been found by static analysis.</p>
<a name="summary"></a>
<h1 id="summary"><span>12. </span>Summary</h1>
<p>SQLite is open source. This gives many people the idea that
it is not well tested as commercial software and is perhaps unreliable.
But that impression is false.
SQLite has exhibited very high reliability in the field and
a very low defect rate, especially considering how rapidly it is evolving.
The quality of SQLite is achieved in part by careful code design and
implementation. But extensive testing also plays a vital role in
maintaining and improving the quality of SQLite. This document has
summarized the testing procedures that every release of SQLite undergoes
with the hope of inspiring confidence that SQLite is
suitable for use in mission-critical applications.</p>
|