Import BSDDB 4.7.25 (as of svn r89086)
This commit is contained in:
572
docs/gsg/C/btree.html
Normal file
572
docs/gsg/C/btree.html
Normal file
@@ -0,0 +1,572 @@
|
||||
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||||
<title>BTree Configuration</title>
|
||||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||||
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
|
||||
<link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
|
||||
<link rel="up" href="dbconfig.html" title="Chapter 6. Database Configuration" />
|
||||
<link rel="previous" href="cachesize.html" title="Selecting the Cache Size" />
|
||||
</head>
|
||||
<body>
|
||||
<div class="navheader">
|
||||
<table width="100%" summary="Navigation header">
|
||||
<tr>
|
||||
<th colspan="3" align="center">BTree Configuration</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||||
<th width="60%" align="center">Chapter 6. Database Configuration</th>
|
||||
<td width="20%" align="right"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
<hr />
|
||||
</div>
|
||||
<div class="sect1" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
In going through the previous chapters in this book, you may notice that
|
||||
we touch on some topics that are specific to BTree, but we do not cover
|
||||
those topics in any real detail. In this section, we will discuss
|
||||
configuration issues that are unique to BTree.
|
||||
</p>
|
||||
<p>
|
||||
Specifically, in this section we describe:
|
||||
</p>
|
||||
<div class="itemizedlist">
|
||||
<ul type="disc">
|
||||
<li>
|
||||
<p>
|
||||
Allowing duplicate records.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
Setting comparator callbacks.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="sect2" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
BTree databases can contain duplicate records. One record is
|
||||
considered to be a duplicate of another when both records use keys
|
||||
that compare as equal to one another.
|
||||
</p>
|
||||
<p>
|
||||
By default, keys are compared using a lexicographical comparison,
|
||||
with shorter keys collating higher than longer keys.
|
||||
You can override this default using the
|
||||
<tt class="methodname">DB->set_bt_compare()</tt>
|
||||
|
||||
|
||||
method. See the next section for details.
|
||||
</p>
|
||||
<p>
|
||||
By default, DB databases do not allow duplicate records. As a
|
||||
result, any attempt to write a record that uses a key equal to a
|
||||
previously existing record results in the previously existing record
|
||||
being overwritten by the new record.
|
||||
</p>
|
||||
<p>
|
||||
Allowing duplicate records is useful if you have a database that
|
||||
contains records keyed by a commonly occurring piece of information.
|
||||
It is frequently necessary to allow duplicate records for secondary
|
||||
databases.
|
||||
</p>
|
||||
<p>
|
||||
For example, suppose your primary database contained records related
|
||||
to automobiles. You might in this case want to be able to find all
|
||||
the automobiles in the database that are of a particular color, so
|
||||
you would index on the color of the automobile. However, for any
|
||||
given color there will probably be multiple automobiles. Since the
|
||||
index is the secondary key, this means that multiple secondary
|
||||
database records will share the same key, and so the secondary
|
||||
database must support duplicate records.
|
||||
</p>
|
||||
<div class="sect3" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
Duplicate records can be stored in sorted or unsorted order.
|
||||
You can cause DB to automatically sort your duplicate
|
||||
records by
|
||||
<span>
|
||||
specifying the <tt class="literal">DB_DUPSORT</tt> flag at
|
||||
database creation time.
|
||||
</span>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
If sorted duplicates are supported, then the
|
||||
<span>
|
||||
sorting function specified on
|
||||
<tt class="methodname">DB->set_dup_compare()</tt>
|
||||
|
||||
</span>
|
||||
|
||||
is used to determine the location of the duplicate record in its
|
||||
duplicate set. If no such function is provided, then the default
|
||||
lexicographical comparison is used.
|
||||
</p>
|
||||
</div>
|
||||
<div class="sect3" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
For performance reasons, BTrees should always contain sorted
|
||||
records. (BTrees containing unsorted entries must potentially
|
||||
spend a great deal more time locating an entry than does a BTree
|
||||
that contains sorted entries). That said, DB provides support
|
||||
for suppressing automatic sorting of duplicate records because it may be that
|
||||
your application is inserting records that are already in a
|
||||
sorted order.
|
||||
</p>
|
||||
<p>
|
||||
That is, if the database is configured to support unsorted
|
||||
duplicates, then the assumption is that your application
|
||||
will manually perform the sorting. In this event,
|
||||
expect to pay a significant performance penalty. Any time you
|
||||
place records into the database in a sort order not know to
|
||||
DB, you will pay a performance penalty
|
||||
</p>
|
||||
<p>
|
||||
That said, this is how DB behaves when inserting records
|
||||
into a database that supports non-sorted duplicates:
|
||||
</p>
|
||||
<div class="itemizedlist">
|
||||
<ul type="disc">
|
||||
<li>
|
||||
<p>
|
||||
If your application simply adds a duplicate record using
|
||||
<span><tt class="methodname">DB->put()</tt>,</span>
|
||||
|
||||
|
||||
then the record is inserted at the end of its sorted duplicate set.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
If a cursor is used to put the duplicate record to the database,
|
||||
then the new record is placed in the duplicate set according to the
|
||||
flags that are provided on the
|
||||
<tt class="methodname">DBC->put()</tt>
|
||||
|
||||
method. The relevant flags are:
|
||||
</p>
|
||||
<div class="itemizedlist">
|
||||
<ul type="circle">
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_AFTER</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
The data
|
||||
<span>
|
||||
provided on the call to
|
||||
<tt class="methodname">DBC->put()</tt>
|
||||
|
||||
</span>
|
||||
is placed into the database
|
||||
as a duplicate record. The key used for this operation is
|
||||
the key used for the record to which the cursor currently
|
||||
refers. Any key provided on the call
|
||||
|
||||
<span>
|
||||
to
|
||||
<tt class="methodname">DBC->put()</tt>
|
||||
|
||||
</span>
|
||||
|
||||
is therefore ignored.
|
||||
</p>
|
||||
<p>
|
||||
The duplicate record is inserted into the database
|
||||
immediately after the cursor's current position in the
|
||||
database.
|
||||
</p>
|
||||
<p>
|
||||
This flag is ignored if sorted duplicates are supported for
|
||||
the database.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_BEFORE</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
Behaves the same as
|
||||
<tt class="literal">DB_AFTER</tt>
|
||||
|
||||
except that the new record is inserted immediately before
|
||||
the cursor's current location in the database.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_KEYFIRST</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
If the key
|
||||
<span>
|
||||
provided on the call to
|
||||
<tt class="methodname">DBC->put()</tt>
|
||||
|
||||
</span>
|
||||
already exists in the
|
||||
database, and the database is configured to use duplicates
|
||||
without sorting, then the new record is inserted as the first entry
|
||||
in the appropriate duplicates list.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_KEYLAST</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
Behaves identically to
|
||||
<tt class="literal">DB_KEYFIRST</tt>
|
||||
|
||||
except that the new duplicate record is inserted as the last
|
||||
record in the duplicates list.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sect3" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
Duplicates support can only be configured
|
||||
at database creation time. You do this by specifying the appropriate
|
||||
<span>
|
||||
flags to
|
||||
<tt class="methodname">DB->set_flags()</tt>
|
||||
|
||||
</span>
|
||||
|
||||
before the database is opened for the first time.
|
||||
</p>
|
||||
<p>
|
||||
The
|
||||
<span>flags</span>
|
||||
|
||||
that you can use are:
|
||||
</p>
|
||||
<div class="itemizedlist">
|
||||
<ul type="disc">
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_DUP</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
The database supports non-sorted duplicate records.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
<tt class="literal">DB_DUPSORT</tt>
|
||||
|
||||
</p>
|
||||
<p>
|
||||
The database supports sorted duplicate records.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<p>
|
||||
The following code fragment illustrates how to configure a database
|
||||
to support sorted duplicate records:
|
||||
</p>
|
||||
<a id="c_btree_dupsort"></a>
|
||||
<pre class="programlisting">#include <db.h>
|
||||
...
|
||||
|
||||
DB *dbp;
|
||||
FILE *error_file_pointer;
|
||||
int ret;
|
||||
char *program_name = "my_prog";
|
||||
char *file_name = "mydb.db";
|
||||
|
||||
/* Variable assignments omitted for brevity */
|
||||
|
||||
/* Initialize the DB handle */
|
||||
ret = db_create(&dbp, NULL, 0);
|
||||
if (ret != 0) {
|
||||
fprintf(error_file_pointer, "%s: %s\n", program_name,
|
||||
db_strerror(ret));
|
||||
return(ret);
|
||||
}
|
||||
|
||||
/* Set up error handling for this database */
|
||||
dbp->set_errfile(dbp, error_file_pointer);
|
||||
dbp->set_errpfx(dbp, program_name);
|
||||
|
||||
/*
|
||||
* Configure the database for sorted duplicates
|
||||
*/
|
||||
ret = dbp->set_flags(dbp, DB_DUPSORT);
|
||||
if (ret != 0) {
|
||||
dbp->err(dbp, ret, "Attempt to set DUPSORT flag failed.");
|
||||
dbp->close(dbp, 0);
|
||||
return(ret);
|
||||
}
|
||||
|
||||
/* Now open the database */
|
||||
ret = dbp->open(dbp, /* Pointer to the database */
|
||||
NULL, /* Txn pointer */
|
||||
file_name, /* File name */
|
||||
NULL, /* Logical db name (unneeded) */
|
||||
DB_BTREE, /* Database type (using btree) */
|
||||
DB_CREATE, /* Open flags */
|
||||
0); /* File mode. Using defaults */
|
||||
if (ret != 0) {
|
||||
dbp->err(dbp, ret, "Database '%s' open failed.", file_name);
|
||||
dbp->close(dbp, 0);
|
||||
return(ret);
|
||||
} </pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sect2" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
By default, DB uses a lexicographical comparison function where
|
||||
shorter records collate before longer records. For the majority of
|
||||
cases, this comparison works well and you do not need to manage
|
||||
it in any way.
|
||||
</p>
|
||||
<p>
|
||||
However, in some situations your application's performance can
|
||||
benefit from setting a custom comparison routine. You can do this
|
||||
either for database keys, or for the data if your
|
||||
database supports sorted duplicate records.
|
||||
</p>
|
||||
<p>
|
||||
Some of the reasons why you may want to provide a custom sorting
|
||||
function are:
|
||||
</p>
|
||||
<div class="itemizedlist">
|
||||
<ul type="disc">
|
||||
<li>
|
||||
<p>
|
||||
Your database is keyed using strings and you want to provide
|
||||
some sort of language-sensitive ordering to that data. Doing
|
||||
so can help increase the locality of reference that allows
|
||||
your database to perform at its best.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
You are using a little-endian system (such as x86) and you
|
||||
are using integers as your database's keys. Berkeley DB
|
||||
stores keys as byte strings and little-endian integers
|
||||
do not sort well when viewed as byte strings. There are
|
||||
several solutions to this problem, one being to provide a
|
||||
custom comparison function. See
|
||||
<a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a>
|
||||
for more information.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
You you do not want the entire key to participate in the
|
||||
comparison, for whatever reason. In
|
||||
this case, you may want to provide a custom comparison
|
||||
function so that only the relevant bytes are examined.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="sect3" lang="en" xml:lang="en">
|
||||
<div class="titlepage">
|
||||
<div>
|
||||
<div>
|
||||
<h4 class="title"><a id="creatingComparisonFunctions"></a>
|
||||
<span>Creating Comparison Functions</span>
|
||||
|
||||
</h4>
|
||||
</div>
|
||||
</div>
|
||||
<div></div>
|
||||
</div>
|
||||
<p>
|
||||
You set a BTree's key
|
||||
<span>
|
||||
comparison function
|
||||
</span>
|
||||
|
||||
using
|
||||
<span><tt class="methodname">DB->set_bt_compare()</tt>.</span>
|
||||
|
||||
|
||||
You can also set a BTree's duplicate data comparison function using
|
||||
<span><tt class="methodname">DB->set_dup_compare()</tt>.</span>
|
||||
|
||||
|
||||
|
||||
</p>
|
||||
<p>
|
||||
<span>
|
||||
You cannot use these methods after the database has been opened.
|
||||
Also, if
|
||||
</span>
|
||||
|
||||
the database already exists when it is opened, the
|
||||
<span>
|
||||
function
|
||||
</span>
|
||||
|
||||
provided to these methods must be the same as
|
||||
that historically used to create the database or corruption can
|
||||
occur.
|
||||
</p>
|
||||
<p>
|
||||
The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method
|
||||
is a pointer to a function that has the following signature:
|
||||
</p>
|
||||
<pre class="programlisting">int (*function)(DB *db, const DBT *key1, const DBT *key2)</pre>
|
||||
<p>
|
||||
This function must return an integer value less than, equal to,
|
||||
or greater than 0. If key1 is considered to be greater than
|
||||
key2, then the function must return a value that is greater than
|
||||
0. If the two are equal, then the function must return 0, and if
|
||||
the first key is less than the second then the function must return
|
||||
a negative value.
|
||||
</p>
|
||||
<p>
|
||||
The function that you provide to <tt class="methodname">set_dup_compare()</tt>
|
||||
works in exactly the same way, except that the
|
||||
<tt class="literal">DBT</tt>
|
||||
|
||||
parameters hold record data items instead of keys.
|
||||
</p>
|
||||
<p>
|
||||
For example, an example routine that is used to sort integer
|
||||
keys in the database is:
|
||||
<span>
|
||||
|
||||
</span>
|
||||
</p>
|
||||
<a id="c_btree1"></a>
|
||||
<pre class="programlisting">int
|
||||
compare_int(DB *dbp, const DBT *a, const DBT *b)
|
||||
{
|
||||
int ai, bi;
|
||||
|
||||
/*
|
||||
* Returns:
|
||||
* < 0 if a < b
|
||||
* = 0 if a = b
|
||||
* > 0 if a > b
|
||||
*/
|
||||
memcpy(&ai, a->data, sizeof(int));
|
||||
memcpy(&bi, b->data, sizeof(int));
|
||||
return (ai - bi);
|
||||
} </pre>
|
||||
<p>
|
||||
Note that the data must first be copied into memory that is
|
||||
appropriately aligned, as Berkeley DB does not guarantee any kind of
|
||||
alignment of the underlying data, including for comparison routines.
|
||||
When writing comparison routines, remember that databases created on
|
||||
machines of different architectures may have different integer byte
|
||||
orders, for which your code may need to compensate.
|
||||
</p>
|
||||
<p>
|
||||
To cause DB to use this comparison function:
|
||||
</p>
|
||||
<a id="c_btree2"></a>
|
||||
<pre class="programlisting">#include <db.h>
|
||||
#include <string.h>
|
||||
|
||||
...
|
||||
|
||||
DB *dbp;
|
||||
int ret;
|
||||
|
||||
/* Create a database */
|
||||
ret = db_create(&dbp, NULL, 0);
|
||||
if (ret != 0) {
|
||||
fprintf(stderr, "%s: %s\n", "my_program",
|
||||
db_strerror(ret));
|
||||
return(-1);
|
||||
}
|
||||
|
||||
/* Set up the btree comparison function for this database */
|
||||
dbp->set_bt_compare(dbp, compare_int);
|
||||
|
||||
/* Database open call follows sometime after this. */ </pre>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="navfooter">
|
||||
<hr />
|
||||
<table width="100%" summary="Navigation footer">
|
||||
<tr>
|
||||
<td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||||
<td width="20%" align="center">
|
||||
<a accesskey="u" href="dbconfig.html">Up</a>
|
||||
</td>
|
||||
<td width="40%" align="right"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="40%" align="left" valign="top">Selecting the Cache Size </td>
|
||||
<td width="20%" align="center">
|
||||
<a accesskey="h" href="index.html">Home</a>
|
||||
</td>
|
||||
<td width="40%" align="right" valign="top"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
Reference in New Issue
Block a user