tdb - a trivial database system tridge@linuxcare.com December 1999 ================================== This is a simple database API. It was inspired by the realisation that in Samba we have several ad-hoc bits of code that essentially implement small databases for sharing structures between parts of Samba. As I was about to add another I realised that a generic database module was called for to replace all the ad-hoc bits. I based the interface on gdbm. I couldn't use gdbm as we need to be able to have multiple writers to the databases at one time. Compilation ----------- add HAVE_MMAP=1 to use mmap instead of read/write add NOLOCK=1 to disable locking code Testing ------- Compile tdbtest.c and link with gdbm for testing. tdbtest will perform identical operations via tdb and gdbm then make sure the result is the same Also included is tdbtool, which allows simple database manipulation on the commandline. tdbtest and tdbtool are not built as part of Samba, but are included for completeness. Interface --------- The interface is very similar to gdbm except for the following: - different open interface. The tdb_open call is more similar to a traditional open() - no tdbm_reorganise() function - no tdbm_sync() function. No operations are cached in the library anyway - added a tdb_traverse() function for traversing the whole database - added transactions support A general rule for using tdb is that the caller frees any returned TDB_DATA structures. Just call free(p.dptr) to free a TDB_DATA return value called p. This is the same as gdbm. here is a full list of tdb functions with brief descriptions. ---------------------------------------------------------------------- TDB_CONTEXT *tdb_open(char *name, int hash_size, int tdb_flags, int open_flags, mode_t mode) open the database, creating it if necessary The open_flags and mode are passed straight to the open call on the database file. A flags value of O_WRONLY is invalid The hash size is advisory, use zero for a default value. return is NULL on error possible tdb_flags are: TDB_CLEAR_IF_FIRST - clear database if we are the only one with it open TDB_INTERNAL - don't use a file, instead store the data in memory. The filename is ignored in this case. TDB_NOLOCK - don't do any locking TDB_NOMMAP - don't use mmap TDB_NOSYNC - don't synchronise transactions to disk TDB_SEQNUM - maintain a sequence number TDB_VOLATILE - activate the per-hashchain freelist, default 5 TDB_ALLOW_NESTING - allow transactions to nest TDB_DISALLOW_NESTING - disallow transactions to nest ---------------------------------------------------------------------- TDB_CONTEXT *tdb_open_ex(char *name, int hash_size, int tdb_flags, int open_flags, mode_t mode, const struct tdb_logging_context *log_ctx, tdb_hash_func hash_fn) This is like tdb_open(), but allows you to pass an initial logging and hash function. Be careful when passing a hash function - all users of the database must use the same hash function or you will get data corruption. ---------------------------------------------------------------------- char *tdb_error(TDB_CONTEXT *tdb); return a error string for the last tdb error ---------------------------------------------------------------------- int tdb_close(TDB_CONTEXT *tdb); close a database ---------------------------------------------------------------------- TDB_DATA tdb_fetch(TDB_CONTEXT *tdb, TDB_DATA key); fetch an entry in the database given a key if the return value has a null dptr then a error occurred caller must free the resulting data ---------------------------------------------------------------------- int tdb_parse_record(struct tdb_context *tdb, TDB_DATA key, int (*parser)(TDB_DATA key, TDB_DATA data, void *private_data), void *private_data); Hand a record to a parser function without allocating it. This function is meant as a fast tdb_fetch alternative for large records that are frequently read. The "key" and "data" arguments point directly into the tdb shared memory, they are not aligned at any boundary. WARNING: The parser is called while tdb holds a lock on the record. DO NOT call other tdb routines from within the parser. Also, for good performance you should make the parser fast to allow parallel operations. tdb_parse_record returns -1 if the record was not found. If the record was found, the return value of "parser" is passed up to the caller. ---------------------------------------------------------------------- int tdb_exists(TDB_CONTEXT *tdb, TDB_DATA key); check if an entry in the database exists note that 1 is returned if the key is found and 0 is returned if not found this doesn't match the conventions in the rest of this module, but is compatible with gdbm ---------------------------------------------------------------------- int tdb_traverse(TDB_CONTEXT *tdb, int (*fn)(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, void *state), void *state); traverse the entire database - calling fn(tdb, key, data, state) on each element. return -1 on error or the record count traversed if fn is NULL then it is not called a non-zero return value from fn() indicates that the traversal should stop. Traversal callbacks may not start transactions. WARNING: The data buffer given to the callback fn does NOT meet the alignment restrictions malloc gives you. ---------------------------------------------------------------------- int tdb_traverse_read(TDB_CONTEXT *tdb, int (*fn)(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, void *state), void *state); traverse the entire database - calling fn(tdb, key, data, state) on each element, but marking the database read only during the traversal, so any write operations will fail. This allows tdb to use read locks, which increases the parallelism possible during the traversal. return -1 on error or the record count traversed if fn is NULL then it is not called a non-zero return value from fn() indicates that the traversal should stop. Traversal callbacks may not start transactions. ---------------------------------------------------------------------- TDB_DATA tdb_firstkey(TDB_CONTEXT *tdb); find the first entry in the database and return its key the caller must free the returned data ---------------------------------------------------------------------- TDB_DATA tdb_nextkey(TDB_CONTEXT *tdb, TDB_DATA key); find the next entry in the database, returning its key the caller must free the returned data ---------------------------------------------------------------------- int tdb_delete(TDB_CONTEXT *tdb, TDB_DATA key); delete an entry in the database given a key ---------------------------------------------------------------------- int tdb_store(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, int flag); store an element in the database, replacing any existing element with the same key If flag==TDB_INSERT then don't overwrite an existing entry If flag==TDB_MODIFY then don't create a new entry return 0 on success, -1 on failure ---------------------------------------------------------------------- int tdb_writelock(TDB_CONTEXT *tdb); lock the database. If we already have it locked then don't do anything ---------------------------------------------------------------------- int tdb_writeunlock(TDB_CONTEXT *tdb); unlock the database ---------------------------------------------------------------------- int tdb_chainlock(TDB_CONTEXT *tdb, TDB_DATA key); lock one hash chain. This is meant to be used to reduce locking contention - it cannot guarantee how many records will be locked ---------------------------------------------------------------------- int tdb_chainunlock(TDB_CONTEXT *tdb, TDB_DATA key); unlock one hash chain ---------------------------------------------------------------------- int tdb_transaction_start(TDB_CONTEXT *tdb) start a transaction. All operations after the transaction start can either be committed with tdb_transaction_commit() or cancelled with tdb_transaction_cancel(). If you call tdb_transaction_start() again on the same tdb context while a transaction is in progress, then the same transaction buffer is re-used. The number of tdb_transaction_{commit,cancel} operations must match the number of successful tdb_transaction_start() calls. Note that transactions are by default disk synchronous, and use a recover area in the database to automatically recover the database on the next open if the system crashes during a transaction. You can disable the synchronous transaction recovery setup using the TDB_NOSYNC flag, which will greatly speed up operations at the risk of corrupting your database if the system crashes. Operations made within a transaction are not visible to other users of the database until a successful commit. ---------------------------------------------------------------------- int tdb_transaction_cancel(TDB_CONTEXT *tdb) cancel a current transaction, discarding all write and lock operations that have been made since the transaction started. ---------------------------------------------------------------------- int tdb_transaction_commit(TDB_CONTEXT *tdb) commit a current transaction, updating the database and releasing the transaction locks. ---------------------------------------------------------------------- int tdb_transaction_prepare_commit(TDB_CONTEXT *tdb) prepare to commit a current transaction, for two-phase commits. Once prepared for commit, the only allowed calls are tdb_transaction_commit() or tdb_transaction_cancel(). Preparing allocates disk space for the pending updates, so a subsequent commit should succeed (barring any hardware failures). ---------------------------------------------------------------------- int tdb_check(TDB_CONTEXT *tdb, int (*check)(TDB_DATA key, TDB_DATA data, void *private_data), void *private_data);) check the consistency of the database, calling back the check function (if non-NULL) with each record. If some consistency check fails, or the supplied check function returns -1, tdb_check returns -1, otherwise 0. Note that logging function (if set) will be called with additional information on the corruption found.