548 lines
21 KiB
HTML
548 lines
21 KiB
HTML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||
<title>BTree Configuration</title>
|
||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
|
||
<link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
|
||
<link rel="up" href="dbconfig.html" title="Chapter 6. Database Configuration" />
|
||
<link rel="previous" href="cachesize.html" title="Selecting the Cache Size" />
|
||
</head>
|
||
<body>
|
||
<div class="navheader">
|
||
<table width="100%" summary="Navigation header">
|
||
<tr>
|
||
<th colspan="3" align="center">BTree Configuration</th>
|
||
</tr>
|
||
<tr>
|
||
<td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||
<th width="60%" align="center">Chapter 6. Database Configuration</th>
|
||
<td width="20%" align="right"> </td>
|
||
</tr>
|
||
</table>
|
||
<hr />
|
||
</div>
|
||
<div class="sect1" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
In going through the previous chapters in this book, you may notice that
|
||
we touch on some topics that are specific to BTree, but we do not cover
|
||
those topics in any real detail. In this section, we will discuss
|
||
configuration issues that are unique to BTree.
|
||
</p>
|
||
<p>
|
||
Specifically, in this section we describe:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Allowing duplicate records.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Setting comparator callbacks.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
BTree databases can contain duplicate records. One record is
|
||
considered to be a duplicate of another when both records use keys
|
||
that compare as equal to one another.
|
||
</p>
|
||
<p>
|
||
By default, keys are compared using a lexicographical comparison,
|
||
with shorter keys collating higher than longer keys.
|
||
You can override this default using the
|
||
|
||
<tt class="methodname">Db::set_bt_compare()</tt>
|
||
|
||
method. See the next section for details.
|
||
</p>
|
||
<p>
|
||
By default, DB databases do not allow duplicate records. As a
|
||
result, any attempt to write a record that uses a key equal to a
|
||
previously existing record results in the previously existing record
|
||
being overwritten by the new record.
|
||
</p>
|
||
<p>
|
||
Allowing duplicate records is useful if you have a database that
|
||
contains records keyed by a commonly occurring piece of information.
|
||
It is frequently necessary to allow duplicate records for secondary
|
||
databases.
|
||
</p>
|
||
<p>
|
||
For example, suppose your primary database contained records related
|
||
to automobiles. You might in this case want to be able to find all
|
||
the automobiles in the database that are of a particular color, so
|
||
you would index on the color of the automobile. However, for any
|
||
given color there will probably be multiple automobiles. Since the
|
||
index is the secondary key, this means that multiple secondary
|
||
database records will share the same key, and so the secondary
|
||
database must support duplicate records.
|
||
</p>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Duplicate records can be stored in sorted or unsorted order.
|
||
You can cause DB to automatically sort your duplicate
|
||
records by
|
||
<span>
|
||
specifying the <tt class="literal">DB_DUPSORT</tt> flag at
|
||
database creation time.
|
||
</span>
|
||
|
||
</p>
|
||
<p>
|
||
If sorted duplicates are supported, then the
|
||
<span>
|
||
sorting function specified on
|
||
|
||
<tt class="methodname">Db::set_dup_compare()</tt>
|
||
</span>
|
||
|
||
is used to determine the location of the duplicate record in its
|
||
duplicate set. If no such function is provided, then the default
|
||
lexicographical comparison is used.
|
||
</p>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
For performance reasons, BTrees should always contain sorted
|
||
records. (BTrees containing unsorted entries must potentially
|
||
spend a great deal more time locating an entry than does a BTree
|
||
that contains sorted entries). That said, DB provides support
|
||
for suppressing automatic sorting of duplicate records because it may be that
|
||
your application is inserting records that are already in a
|
||
sorted order.
|
||
</p>
|
||
<p>
|
||
That is, if the database is configured to support unsorted
|
||
duplicates, then the assumption is that your application
|
||
will manually perform the sorting. In this event,
|
||
expect to pay a significant performance penalty. Any time you
|
||
place records into the database in a sort order not know to
|
||
DB, you will pay a performance penalty
|
||
</p>
|
||
<p>
|
||
That said, this is how DB behaves when inserting records
|
||
into a database that supports non-sorted duplicates:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
If your application simply adds a duplicate record using
|
||
|
||
<span><tt class="methodname">Db::put()</tt>,</span>
|
||
|
||
then the record is inserted at the end of its sorted duplicate set.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
If a cursor is used to put the duplicate record to the database,
|
||
then the new record is placed in the duplicate set according to the
|
||
flags that are provided on the
|
||
|
||
<tt class="methodname">Dbc::put()</tt>
|
||
method. The relevant flags are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="circle">
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_AFTER</tt>
|
||
|
||
</p>
|
||
<p>
|
||
The data
|
||
<span>
|
||
provided on the call to
|
||
|
||
<tt class="methodname">Dbc::put()</tt>
|
||
</span>
|
||
is placed into the database
|
||
as a duplicate record. The key used for this operation is
|
||
the key used for the record to which the cursor currently
|
||
refers. Any key provided on the call
|
||
|
||
<span>
|
||
to
|
||
|
||
<tt class="methodname">Dbc::put()</tt>
|
||
</span>
|
||
|
||
is therefore ignored.
|
||
</p>
|
||
<p>
|
||
The duplicate record is inserted into the database
|
||
immediately after the cursor's current position in the
|
||
database.
|
||
</p>
|
||
<p>
|
||
This flag is ignored if sorted duplicates are supported for
|
||
the database.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_BEFORE</tt>
|
||
|
||
</p>
|
||
<p>
|
||
Behaves the same as
|
||
<tt class="literal">DB_AFTER</tt>
|
||
|
||
except that the new record is inserted immediately before
|
||
the cursor's current location in the database.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_KEYFIRST</tt>
|
||
|
||
</p>
|
||
<p>
|
||
If the key
|
||
<span>
|
||
provided on the call to
|
||
|
||
<tt class="methodname">Dbc::put()</tt>
|
||
</span>
|
||
already exists in the
|
||
database, and the database is configured to use duplicates
|
||
without sorting, then the new record is inserted as the first entry
|
||
in the appropriate duplicates list.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_KEYLAST</tt>
|
||
|
||
</p>
|
||
<p>
|
||
Behaves identically to
|
||
<tt class="literal">DB_KEYFIRST</tt>
|
||
|
||
except that the new duplicate record is inserted as the last
|
||
record in the duplicates list.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Duplicates support can only be configured
|
||
at database creation time. You do this by specifying the appropriate
|
||
<span>
|
||
flags to
|
||
|
||
<tt class="methodname">Db::set_flags()</tt>
|
||
</span>
|
||
|
||
before the database is opened for the first time.
|
||
</p>
|
||
<p>
|
||
The
|
||
<span>flags</span>
|
||
|
||
that you can use are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_DUP</tt>
|
||
|
||
</p>
|
||
<p>
|
||
The database supports non-sorted duplicate records.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
<tt class="literal">DB_DUPSORT</tt>
|
||
|
||
</p>
|
||
<p>
|
||
The database supports sorted duplicate records.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<p>
|
||
The following code fragment illustrates how to configure a database
|
||
to support sorted duplicate records:
|
||
</p>
|
||
<a id="cxx_btree_dupsort"></a>
|
||
<pre class="programlisting">#include <db_cxx.h>
|
||
...
|
||
|
||
Db db(NULL, 0);
|
||
const char *file_name = "myd.db";
|
||
|
||
try {
|
||
// Configure the database for sorted duplicates
|
||
db.set_flags(DB_DUPSORT);
|
||
|
||
// Now open the database
|
||
db.open(NULL, // Txn pointer
|
||
file_name, // File name
|
||
NULL, // Logical db name (unneeded)
|
||
DB_BTREE, // Database type (using btree)
|
||
DB_CREATE, // Open flags
|
||
0); // File mode. Using defaults
|
||
} catch(DbException &e) {
|
||
db.err(e.get_errno(), "Database '%s' open failed.", file_name);
|
||
} catch(std::exception &e) {
|
||
db.errx("Error opening database: %s : %s\n", file_name, e.what());
|
||
}
|
||
|
||
...
|
||
|
||
try {
|
||
db.close(0);
|
||
} catch(DbException &e) {
|
||
db.err(e.get_errno(), "Database '%s' close failed.", file_name);
|
||
} catch(std::exception &e) {
|
||
db.errx("Error closing database: %s : %s\n", file_name, e.what());
|
||
}
|
||
|
||
</pre>
|
||
</div>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
By default, DB uses a lexicographical comparison function where
|
||
shorter records collate before longer records. For the majority of
|
||
cases, this comparison works well and you do not need to manage
|
||
it in any way.
|
||
</p>
|
||
<p>
|
||
However, in some situations your application's performance can
|
||
benefit from setting a custom comparison routine. You can do this
|
||
either for database keys, or for the data if your
|
||
database supports sorted duplicate records.
|
||
</p>
|
||
<p>
|
||
Some of the reasons why you may want to provide a custom sorting
|
||
function are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Your database is keyed using strings and you want to provide
|
||
some sort of language-sensitive ordering to that data. Doing
|
||
so can help increase the locality of reference that allows
|
||
your database to perform at its best.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
You are using a little-endian system (such as x86) and you
|
||
are using integers as your database's keys. Berkeley DB
|
||
stores keys as byte strings and little-endian integers
|
||
do not sort well when viewed as byte strings. There are
|
||
several solutions to this problem, one being to provide a
|
||
custom comparison function. See
|
||
<a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a>
|
||
for more information.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
You you do not want the entire key to participate in the
|
||
comparison, for whatever reason. In
|
||
this case, you may want to provide a custom comparison
|
||
function so that only the relevant bytes are examined.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="creatingComparisonFunctions"></a>
|
||
<span>Creating Comparison Functions</span>
|
||
|
||
</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
You set a BTree's key
|
||
<span>
|
||
comparison function
|
||
</span>
|
||
|
||
using
|
||
|
||
<span><tt class="methodname">Db::set_bt_compare()</tt>.</span>
|
||
|
||
You can also set a BTree's duplicate data comparison function using
|
||
|
||
<span><tt class="methodname">Db::set_dup_compare()</tt>.</span>
|
||
|
||
|
||
</p>
|
||
<p>
|
||
<span>
|
||
You cannot use these methods after the database has been opened.
|
||
Also, if
|
||
</span>
|
||
|
||
the database already exists when it is opened, the
|
||
<span>
|
||
function
|
||
</span>
|
||
|
||
provided to these methods must be the same as
|
||
that historically used to create the database or corruption can
|
||
occur.
|
||
</p>
|
||
<p>
|
||
The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method
|
||
is a pointer to a function that has the following signature:
|
||
</p>
|
||
<pre class="programlisting">int (*function)(Db *db, const Dbt *key1, const Dbt *key2)</pre>
|
||
<p>
|
||
This function must return an integer value less than, equal to,
|
||
or greater than 0. If key1 is considered to be greater than
|
||
key2, then the function must return a value that is greater than
|
||
0. If the two are equal, then the function must return 0, and if
|
||
the first key is less than the second then the function must return
|
||
a negative value.
|
||
</p>
|
||
<p>
|
||
The function that you provide to <tt class="methodname">set_dup_compare()</tt>
|
||
works in exactly the same way, except that the
|
||
|
||
<tt class="literal">Dbt</tt>
|
||
parameters hold record data items instead of keys.
|
||
</p>
|
||
<p>
|
||
For example, an example routine that is used to sort integer
|
||
keys in the database is:
|
||
|
||
</p>
|
||
<a id="cxx_btree1"></a>
|
||
<pre class="programlisting">int
|
||
compare_int(Db *dbp, const Dbt *a, const Dbt *b)
|
||
{
|
||
int ai, bi;
|
||
|
||
// Returns:
|
||
// < 0 if a < b
|
||
// = 0 if a = b
|
||
// > 0 if a > b
|
||
memcpy(&ai, a->get_data(), sizeof(int));
|
||
memcpy(&bi, b->get_data(), sizeof(int));
|
||
return (ai - bi);
|
||
} </pre>
|
||
<p>
|
||
Note that the data must first be copied into memory that is
|
||
appropriately aligned, as Berkeley DB does not guarantee any kind of
|
||
alignment of the underlying data, including for comparison routines.
|
||
When writing comparison routines, remember that databases created on
|
||
machines of different architectures may have different integer byte
|
||
orders, for which your code may need to compensate.
|
||
</p>
|
||
<p>
|
||
To cause DB to use this comparison function:
|
||
</p>
|
||
<a id="cxx_btree2"></a>
|
||
<pre class="programlisting">#include <db_cxx.h>
|
||
#include <string.h>
|
||
|
||
...
|
||
|
||
Db db(NULL, 0);
|
||
|
||
// Set up the btree comparison function for this database
|
||
db.set_bt_compare(compare_int);
|
||
|
||
// Database open call follows sometime after this.</pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr />
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="u" href="dbconfig.html">Up</a>
|
||
</td>
|
||
<td width="40%" align="right"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">Selecting the Cache Size </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="h" href="index.html">Home</a>
|
||
</td>
|
||
<td width="40%" align="right" valign="top"> </td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|