diff options
Diffstat (limited to 'debian/README.DB_CONFIG')
-rw-r--r-- | debian/README.DB_CONFIG | 187 |
1 files changed, 187 insertions, 0 deletions
diff --git a/debian/README.DB_CONFIG b/debian/README.DB_CONFIG new file mode 100644 index 0000000..f8ee5f1 --- /dev/null +++ b/debian/README.DB_CONFIG @@ -0,0 +1,187 @@ +For good performance using the BDB backend, a good DB_CONFIG file in the +database directory (usually /var/lib/ldap) is crucial. The following two +articles should help you to determine a good configuration for your +requirements. A standard DB_CONFIG is installed but it may not be adequate +for your system. + +The current version of OpenLDAP supports putting DB_CONFIG parameters into +slapd.conf instead by prefixing those options with dbconfig. See the +slapd-bdb(5) man page for more information. If there is no DB_CONFIG file +when slapd starts and there are dbconfig lines in slapd.conf, slapd will +write out a DB_CONFIG file with those settings before initializing the +database. + +With the current version of OpenLDAP, any changes to DB_CONFIG will take +effect automatically after restarting slapd. Running db_recover is no +longer required. + + -- Torsten Landschoff <torsten@debian.org> Sun, 29 May 2005 18:08:10 +0200 + Russ Allbery <rra@debian.org> Fri, 01 Jun 2007 23:57:33 -0700 + +How do I configure the BDB backend? +----------------------------------- +(Taken from http://www.openldap.org/faq/data/cache/893.html, author unknown) + +The BDB backend ("back-bdb") uses a lot of special features of Sleepycat's +Berkeley DB library, and there are a lot of details that must be set correctly +to get the best results from it. Even though the LDBM backend ("back-ldbm") can +use the BerkeleyDB library, the BDB and LDBM backends have some very important +differences, as already noted in (Xref) What are the different backends? What +are their differences?. + +Because back-bdb is transaction-based and uses write-ahead logging to ensure +data consistency, it has much heavier I/O demands than back-ldbm. Also, the +transaction log files accumulate as data is written to the directory, and these +log files must be cleaned out periodically. Otherwise the log files will +consume enormous amounts of disk space. The cleanup procedures are described in +(Xref) How to maintain Berkeley DB (logs etc.) ?. + +The information needed to fully understand things and to properly configure +back-bdb is divided among the slapd-bdb(5) manual page and the SleepyCat +BerkeleyDB documentation (http://www.sleepycat.com/docs/). + +You should read the entire slapd-bdb(5) manpage before proceeding. The only +mandatory keyword is "directory" for setting the location of the database +files. The other keywords control tradeoffs between data reliability, +performance, and memory use. To ensure that committed transactions actually get +flushed to disk, you should use the "checkpoint" keyword, otherwise your data +is vulnerable to loss due to system failures. See the SleepyCat documentation +for more information about checkpoints. (In fact, you should read all of +chapter 9 "Berkeley DB Transactional Data Store Applications" in the SleepyCat +reference manual. At least, read sections 1-3 and 13-24.) + +The "dbnosync" keyword is provided for compatibility with back-ldbm; the +preferred method of setting this is to use the BDB DB_CONFIG file option +set_flags DB_TXN_NOSYNC. The "lockdetect" keyword is also deprecated, you +should instead use the BDB DB_CONFIG file set_lk_detect keyword. (It's safe to +leave this at the default setting.) + +A number of important items must be configured in the BDB DB_CONFIG file and +not in slapd.conf. You should, at least, read about these items: + +set_cachesize + The BDB library maintains its own cache separate from the back-bdb entry + cache. You must set this cache to a size appropriate for your database and + physical memory size. Note that this is a persistent setting - after you + set it the first time, further changes will be ignored until you recreate + the environment using db_recover. +set_lg_dir + Set the directory for storing transaction logs. For best performance, + the transaction logs must be located on a different physical disk from + the database files. +set_lg_bsize + Set the buffer size for the transaction log. Larger is better, but it + doesn't have much effect unless you're also using the DB_TXN_NOSYNC + option. With a default log file size of 10MB I usually set this to 2MB. + The default is only 32K, which is too small for back-bdb. + +On a very busy system you might see error messages talking about running out of +locks, lockers, or lock objects. Usually the default values are plenty, and in +older versions of the BDB library the errors were more likely due to library +bugs than actual system load. However, it is possible that you have actually +run out of lock resources due to heavy system usage. If this happens, you +should read about the set_lk_max_lockers, set_lk_max_locks, and +set_lk_max_objects keywords. + +How do I determine the proper BDB/HDB database cache size? +---------------------------------------------------------- +(Taken from http://www.openldap.org/faq/data/cache/1075.html, written by +hyc@openldap.org, Kurt@OpenLDAP.org) + +Not having a proper database cache size will cause performance issues. (Note: +in older versions of Berkeley DB, an improper database case size could also +cause the server to hang.) + +These issues are not an indication of corruption occurring in the database. It +is merely the fact that the cache is thrashing itself that causes +performance/response time to slowdown. If you take the time to read and +understand the Berkeley DB documentation, measure the library performance using +db_stat, and tune your settings, you will avoid these problems. + +It is not absolutely necessary to configure a BerkeleyDB cache equal in size to +your entire database. All that you need is a cache that's large enough for your +"working set." That means, large enough to hold all of the most frequently +accessed data, plus a few less-frequently accessed items. + +You should really read the BDB documentation referenced above, but let me spell +out what that really means here, in detail. The discussion here is focused on +back-bdb and back-hdb, but most of it also applies to back-ldbm when using +BerkeleyDB as the underlying database engine. + +Start with the most obvious - the back-bdb database lives in two main files, +dn2id.bdb and id2entry.bdb. These are B-tree databases. We have never +documented the back-bdb internal layout before, because it didn't seem like +something anyone should have to worry about, nor was it necessarily cast in +stone. But here's how it works today, in OpenLDAP 2.1 and 2.2. (All of the +database files in back-ldbm are B-trees by default.) + +A B-tree is a balanced tree; it stores data in its leaf nodes and bookkeeping +data in its interior nodes. (If you don't know what tree data structures look +like in general, Google for some references, because that's getting far too +elementary for the purposes of this discussion.) + +For decent performance, you need enough cache memory to contain all the nodes +along the path from the root of the tree down to the particular data item +you're accessing. That's enough cache for a single search. For the general +case, you want enough cache to contain all the internal nodes in the database. +"db_stat -d" will tell you how many internal pages are present in a database. +You should check this number for both dn2id and id2entry. + +Also note that id2entry always uses 16KB per "page", while dn2id uses whatever +the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing the +cache and triggering these infinite hang bugs in BDB 4.1.25, your cache must be +at least as large as the number of internal pages in both the dn2id and +id2entry databases, plus some extra space to accomodate the actual leaf data +pages. + +For example, in my OpenLDAP 2.2 test database, I have an input LDIF file that's +about 360MB. With the back-hdb backend this creates a dn2id.bdb that's 68MB, +and an id2entry that's 800MB. db_stat tells me that dn2id uses 4KB pages, has +433 internal pages, and 6378 leaf pages. The id2entry uses 16KB pages, has 52 +internal pages, and 45912 leaf pages. In order to efficiently retrieve any +single entry in this database, the cache should be at least + +(433+1) * 4KB + (52+1) * 16KB in size: 1736KB + 848KB =~ 2.5MB. + +This doesn't take into account other library overhead, so this is even lower +than the barest minimum. The default cache size, when nothing is configured, is +only 256KB. If you tried to do much of anything with this database and only +default settings, BDB 4.1.25 would lock up in an infinite loop. + +This 2.5MB number also doesn't take indexing into account. Each indexed +attribute uses another database file of its own, using a Hash structure. +(Again, in back-ldbm, the indexes also use B-trees by default, so this part of +the discussion doesn't apply unless back-ldbm was explicitly compiled to use +Hashes instead. Also, in OpenLDAP 2.2 onward, all of the indexes use B-trees, +there are no more Hash database files. So just use the B-tree information above +and ignore this Hash discussion.) + +Unlike the B-trees, where you only need to touch one data page to find an entry +of interest, doing an index lookup generally touches multiple keys, and the +point of a hash structure is that the keys are evenly distributed across the +data space. That means there's no convenient compact subset of the database +that you can keep in the cache to insure quick operation, you can pretty much +expect references to be scattered across the whole thing. My strategy here +would be to provide enough cache for at least 50% of all of the hash data. +(Number of hash buckets + number of overflow pages + number of duplicate pages) +* page size / 2. + +The objectClass index for my example database is 5.9MB and uses 3 hash buckets +and 656 duplicate pages. So ( 3 + 656 ) * 4KB / 2 =~ 1.3MB. + +With only this index enabled, I'd figure at least a 4MB cache for this backend. +(Of course you're using a single cache shared among all of the database files, +so the cache pages will most likely get used for something other than what you +accounted for, but this gives you a fighting chance.) + +With this 4MB cache I can slapcat this entire database on my 1.3GHz PIII in 1 +minute, 40 seconds. With the cache doubled to 8MB, it still takes the same +1:40s. Once you've got enough cache to fit the B-tree internal pages, +increasing it further won't have any effect until the cache really is large +enough to hold 100% of the data pages. I don't have enough free RAM to hold all +the 800MB id2entry data, so 4MB is good enough. + +With back-bdb and back-hdb you can use "db_stat -m" to check how well the +database cache is performing. Unfortunately you can't do this with back-ldbm, +as the statistics are not accessible when slapd is running, nor are they saved +anywhere when slapd is stopped. (Yet another reason not to use back-ldbm.) |