For good performance using the BDB backend, a good DB_CONFIG file in the database directory (usually /var/lib/ldap) is crucial. The following two articles should help you to determine a good configuration for your requirements. A standard DB_CONFIG is installed but it may not be adequate for your system. The current version of OpenLDAP supports putting DB_CONFIG parameters into slapd.conf instead by prefixing those options with dbconfig. See the slapd-bdb(5) man page for more information. If there is no DB_CONFIG file when slapd starts and there are dbconfig lines in slapd.conf, slapd will write out a DB_CONFIG file with those settings before initializing the database. With the current version of OpenLDAP, any changes to DB_CONFIG will take effect automatically after restarting slapd. Running db_recover is no longer required. -- Torsten Landschoff Sun, 29 May 2005 18:08:10 +0200 Russ Allbery Fri, 01 Jun 2007 23:57:33 -0700 How do I configure the BDB backend? ----------------------------------- (Taken from http://www.openldap.org/faq/data/cache/893.html, author unknown) The BDB backend ("back-bdb") uses a lot of special features of Sleepycat's Berkeley DB library, and there are a lot of details that must be set correctly to get the best results from it. Even though the LDBM backend ("back-ldbm") can use the BerkeleyDB library, the BDB and LDBM backends have some very important differences, as already noted in (Xref) What are the different backends? What are their differences?. Because back-bdb is transaction-based and uses write-ahead logging to ensure data consistency, it has much heavier I/O demands than back-ldbm. Also, the transaction log files accumulate as data is written to the directory, and these log files must be cleaned out periodically. Otherwise the log files will consume enormous amounts of disk space. The cleanup procedures are described in (Xref) How to maintain Berkeley DB (logs etc.) ?. The information needed to fully understand things and to properly configure back-bdb is divided among the slapd-bdb(5) manual page and the SleepyCat BerkeleyDB documentation (http://www.sleepycat.com/docs/). You should read the entire slapd-bdb(5) manpage before proceeding. The only mandatory keyword is "directory" for setting the location of the database files. The other keywords control tradeoffs between data reliability, performance, and memory use. To ensure that committed transactions actually get flushed to disk, you should use the "checkpoint" keyword, otherwise your data is vulnerable to loss due to system failures. See the SleepyCat documentation for more information about checkpoints. (In fact, you should read all of chapter 9 "Berkeley DB Transactional Data Store Applications" in the SleepyCat reference manual. At least, read sections 1-3 and 13-24.) The "dbnosync" keyword is provided for compatibility with back-ldbm; the preferred method of setting this is to use the BDB DB_CONFIG file option set_flags DB_TXN_NOSYNC. The "lockdetect" keyword is also deprecated, you should instead use the BDB DB_CONFIG file set_lk_detect keyword. (It's safe to leave this at the default setting.) A number of important items must be configured in the BDB DB_CONFIG file and not in slapd.conf. You should, at least, read about these items: set_cachesize The BDB library maintains its own cache separate from the back-bdb entry cache. You must set this cache to a size appropriate for your database and physical memory size. Note that this is a persistent setting - after you set it the first time, further changes will be ignored until you recreate the environment using db_recover. set_lg_dir Set the directory for storing transaction logs. For best performance, the transaction logs must be located on a different physical disk from the database files. set_lg_bsize Set the buffer size for the transaction log. Larger is better, but it doesn't have much effect unless you're also using the DB_TXN_NOSYNC option. With a default log file size of 10MB I usually set this to 2MB. The default is only 32K, which is too small for back-bdb. On a very busy system you might see error messages talking about running out of locks, lockers, or lock objects. Usually the default values are plenty, and in older versions of the BDB library the errors were more likely due to library bugs than actual system load. However, it is possible that you have actually run out of lock resources due to heavy system usage. If this happens, you should read about the set_lk_max_lockers, set_lk_max_locks, and set_lk_max_objects keywords. How do I determine the proper BDB/HDB database cache size? ---------------------------------------------------------- (Taken from http://www.openldap.org/faq/data/cache/1075.html, written by hyc@openldap.org, Kurt@OpenLDAP.org) Not having a proper database cache size will cause performance issues. (Note: in older versions of Berkeley DB, an improper database case size could also cause the server to hang.) These issues are not an indication of corruption occurring in the database. It is merely the fact that the cache is thrashing itself that causes performance/response time to slowdown. If you take the time to read and understand the Berkeley DB documentation, measure the library performance using db_stat, and tune your settings, you will avoid these problems. It is not absolutely necessary to configure a BerkeleyDB cache equal in size to your entire database. All that you need is a cache that's large enough for your "working set." That means, large enough to hold all of the most frequently accessed data, plus a few less-frequently accessed items. You should really read the BDB documentation referenced above, but let me spell out what that really means here, in detail. The discussion here is focused on back-bdb and back-hdb, but most of it also applies to back-ldbm when using BerkeleyDB as the underlying database engine. Start with the most obvious - the back-bdb database lives in two main files, dn2id.bdb and id2entry.bdb. These are B-tree databases. We have never documented the back-bdb internal layout before, because it didn't seem like something anyone should have to worry about, nor was it necessarily cast in stone. But here's how it works today, in OpenLDAP 2.1 and 2.2. (All of the database files in back-ldbm are B-trees by default.) A B-tree is a balanced tree; it stores data in its leaf nodes and bookkeeping data in its interior nodes. (If you don't know what tree data structures look like in general, Google for some references, because that's getting far too elementary for the purposes of this discussion.) For decent performance, you need enough cache memory to contain all the nodes along the path from the root of the tree down to the particular data item you're accessing. That's enough cache for a single search. For the general case, you want enough cache to contain all the internal nodes in the database. "db_stat -d" will tell you how many internal pages are present in a database. You should check this number for both dn2id and id2entry. Also note that id2entry always uses 16KB per "page", while dn2id uses whatever the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing the cache and triggering these infinite hang bugs in BDB 4.1.25, your cache must be at least as large as the number of internal pages in both the dn2id and id2entry databases, plus some extra space to accomodate the actual leaf data pages. For example, in my OpenLDAP 2.2 test database, I have an input LDIF file that's about 360MB. With the back-hdb backend this creates a dn2id.bdb that's 68MB, and an id2entry that's 800MB. db_stat tells me that dn2id uses 4KB pages, has 433 internal pages, and 6378 leaf pages. The id2entry uses 16KB pages, has 52 internal pages, and 45912 leaf pages. In order to efficiently retrieve any single entry in this database, the cache should be at least (433+1) * 4KB + (52+1) * 16KB in size: 1736KB + 848KB =~ 2.5MB. This doesn't take into account other library overhead, so this is even lower than the barest minimum. The default cache size, when nothing is configured, is only 256KB. If you tried to do much of anything with this database and only default settings, BDB 4.1.25 would lock up in an infinite loop. This 2.5MB number also doesn't take indexing into account. Each indexed attribute uses another database file of its own, using a Hash structure. (Again, in back-ldbm, the indexes also use B-trees by default, so this part of the discussion doesn't apply unless back-ldbm was explicitly compiled to use Hashes instead. Also, in OpenLDAP 2.2 onward, all of the indexes use B-trees, there are no more Hash database files. So just use the B-tree information above and ignore this Hash discussion.) Unlike the B-trees, where you only need to touch one data page to find an entry of interest, doing an index lookup generally touches multiple keys, and the point of a hash structure is that the keys are evenly distributed across the data space. That means there's no convenient compact subset of the database that you can keep in the cache to insure quick operation, you can pretty much expect references to be scattered across the whole thing. My strategy here would be to provide enough cache for at least 50% of all of the hash data. (Number of hash buckets + number of overflow pages + number of duplicate pages) * page size / 2. The objectClass index for my example database is 5.9MB and uses 3 hash buckets and 656 duplicate pages. So ( 3 + 656 ) * 4KB / 2 =~ 1.3MB. With only this index enabled, I'd figure at least a 4MB cache for this backend. (Of course you're using a single cache shared among all of the database files, so the cache pages will most likely get used for something other than what you accounted for, but this gives you a fighting chance.) With this 4MB cache I can slapcat this entire database on my 1.3GHz PIII in 1 minute, 40 seconds. With the cache doubled to 8MB, it still takes the same 1:40s. Once you've got enough cache to fit the B-tree internal pages, increasing it further won't have any effect until the cache really is large enough to hold 100% of the data pages. I don't have enough free RAM to hold all the 800MB id2entry data, so 4MB is good enough. With back-bdb and back-hdb you can use "db_stat -m" to check how well the database cache is performing. Unfortunately you can't do this with back-ldbm, as the statistics are not accessible when slapd is running, nor are they saved anywhere when slapd is stopped. (Yet another reason not to use back-ldbm.)