541 lines
21 KiB
HTML
541 lines
21 KiB
HTML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||
<title>BTree Configuration</title>
|
||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
|
||
<link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
|
||
<link rel="up" href="dbconfig.html" title="Chapter 11. Database Configuration" />
|
||
<link rel="previous" href="cachesize.html" title="Selecting the Cache Size" />
|
||
</head>
|
||
<body>
|
||
<div class="navheader">
|
||
<table width="100%" summary="Navigation header">
|
||
<tr>
|
||
<th colspan="3" align="center">BTree Configuration</th>
|
||
</tr>
|
||
<tr>
|
||
<td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||
<th width="60%" align="center">Chapter 11. Database Configuration</th>
|
||
<td width="20%" align="right"> </td>
|
||
</tr>
|
||
</table>
|
||
<hr />
|
||
</div>
|
||
<div class="sect1" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
In going through the previous chapters in this book, you may notice that
|
||
we touch on some topics that are specific to BTree, but we do not cover
|
||
those topics in any real detail. In this section, we will discuss
|
||
configuration issues that are unique to BTree.
|
||
</p>
|
||
<p>
|
||
Specifically, in this section we describe:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Allowing duplicate records.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Setting comparator callbacks.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
BTree databases can contain duplicate records. One record is
|
||
considered to be a duplicate of another when both records use keys
|
||
that compare as equal to one another.
|
||
</p>
|
||
<p>
|
||
By default, keys are compared using a lexicographical comparison,
|
||
with shorter keys collating higher than longer keys.
|
||
You can override this default using the
|
||
|
||
|
||
<tt class="methodname">DatabaseConfig.setBtreeComparator()</tt>
|
||
method. See the next section for details.
|
||
</p>
|
||
<p>
|
||
By default, DB databases do not allow duplicate records. As a
|
||
result, any attempt to write a record that uses a key equal to a
|
||
previously existing record results in the previously existing record
|
||
being overwritten by the new record.
|
||
</p>
|
||
<p>
|
||
Allowing duplicate records is useful if you have a database that
|
||
contains records keyed by a commonly occurring piece of information.
|
||
It is frequently necessary to allow duplicate records for secondary
|
||
databases.
|
||
</p>
|
||
<p>
|
||
For example, suppose your primary database contained records related
|
||
to automobiles. You might in this case want to be able to find all
|
||
the automobiles in the database that are of a particular color, so
|
||
you would index on the color of the automobile. However, for any
|
||
given color there will probably be multiple automobiles. Since the
|
||
index is the secondary key, this means that multiple secondary
|
||
database records will share the same key, and so the secondary
|
||
database must support duplicate records.
|
||
</p>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Duplicate records can be stored in sorted or unsorted order.
|
||
You can cause DB to automatically sort your duplicate
|
||
records by
|
||
|
||
<span>
|
||
setting <tt class="methodname">DatabaseConfig.setSortedDuplicates()</tt>
|
||
to <tt class="literal">true</tt>. Note that this property must be
|
||
set prior to database creation time and it cannot be changed
|
||
afterwards.
|
||
</span>
|
||
</p>
|
||
<p>
|
||
If sorted duplicates are supported, then the
|
||
|
||
<span>
|
||
<tt class="classname">java.util.Comparator</tt> implementation
|
||
identified to
|
||
<tt class="methodname">DatabaseConfig.setDuplicateComparator()</tt>
|
||
</span>
|
||
is used to determine the location of the duplicate record in its
|
||
duplicate set. If no such function is provided, then the default
|
||
lexicographical comparison is used.
|
||
</p>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
For performance reasons, BTrees should always contain sorted
|
||
records. (BTrees containing unsorted entries must potentially
|
||
spend a great deal more time locating an entry than does a BTree
|
||
that contains sorted entries). That said, DB provides support
|
||
for suppressing automatic sorting of duplicate records because it may be that
|
||
your application is inserting records that are already in a
|
||
sorted order.
|
||
</p>
|
||
<p>
|
||
That is, if the database is configured to support unsorted
|
||
duplicates, then the assumption is that your application
|
||
will manually perform the sorting. In this event,
|
||
expect to pay a significant performance penalty. Any time you
|
||
place records into the database in a sort order not know to
|
||
DB, you will pay a performance penalty
|
||
</p>
|
||
<p>
|
||
That said, this is how DB behaves when inserting records
|
||
into a database that supports non-sorted duplicates:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
If your application simply adds a duplicate record using
|
||
|
||
|
||
<span><tt class="methodname">Database.put()</tt>,</span>
|
||
then the record is inserted at the end of its sorted duplicate set.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
If a cursor is used to put the duplicate record to the database,
|
||
then the new record is placed in the duplicate set according to the
|
||
actual method used to perform the put. The relevant methods
|
||
are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="circle">
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">Cursor.putAfter()</tt>
|
||
</p>
|
||
<p>
|
||
The data
|
||
|
||
is placed into the database
|
||
as a duplicate record. The key used for this operation is
|
||
the key used for the record to which the cursor currently
|
||
refers. Any key provided on the call
|
||
|
||
|
||
|
||
is therefore ignored.
|
||
</p>
|
||
<p>
|
||
The duplicate record is inserted into the database
|
||
immediately after the cursor's current position in the
|
||
database.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">Cursor.putBefore()</tt>
|
||
</p>
|
||
<p>
|
||
Behaves the same as
|
||
|
||
<tt class="methodname">Cursor.putAfter()</tt>
|
||
except that the new record is inserted immediately before
|
||
the cursor's current location in the database.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">Cursor.putKeyFirst()</tt>
|
||
</p>
|
||
<p>
|
||
If the key
|
||
|
||
already exists in the
|
||
database, and the database is configured to use duplicates
|
||
without sorting, then the new record is inserted as the first entry
|
||
in the appropriate duplicates list.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">Cursor.putKeyLast()</tt>
|
||
</p>
|
||
<p>
|
||
Behaves identically to
|
||
|
||
<tt class="methodname">Cursor.putKeyFirst()</tt>
|
||
except that the new duplicate record is inserted as the last
|
||
record in the duplicates list.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Duplicates support can only be configured
|
||
at database creation time. You do this by specifying the appropriate
|
||
|
||
<span>
|
||
<tt class="classname">DatabaseConfig</tt> method
|
||
</span>
|
||
before the database is opened for the first time.
|
||
</p>
|
||
<p>
|
||
The
|
||
|
||
<span>methods</span>
|
||
that you can use are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">DatabaseConfig.setUnsortedDuplicates()</tt>
|
||
</p>
|
||
<p>
|
||
The database supports non-sorted duplicate records.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
|
||
<tt class="methodname">DatabaseConfig.setSortedDuplicates()</tt>
|
||
</p>
|
||
<p>
|
||
The database supports sorted duplicate records.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<p>
|
||
The following code fragment illustrates how to configure a database
|
||
to support sorted duplicate records:
|
||
</p>
|
||
<a id="java_btree_dupsort"></a>
|
||
<pre class="programlisting">package db.GettingStarted;
|
||
|
||
import java.io.FileNotFoundException;
|
||
|
||
import com.sleepycat.db.Database;
|
||
import com.sleepycat.db.DatabaseConfig;
|
||
import com.sleepycat.db.DatabaseException;
|
||
import com.sleepycat.db.DatabaseType;
|
||
|
||
...
|
||
|
||
Database myDb = null;
|
||
|
||
try {
|
||
// Typical configuration settings
|
||
DatabaseConfig myDbConfig = new DatabaseConfig();
|
||
myDbConfig.setType(DatabaseType.BTREE);
|
||
myDbConfig.setAllowCreate(true);
|
||
|
||
// Configure for sorted duplicates
|
||
myDbConfig.setSortedDuplicates(true);
|
||
|
||
// Open the database
|
||
myDb = new Database("mydb.db", null, myDbConfig);
|
||
} catch(DatabaseException dbe) {
|
||
System.err.println("MyDbs: " + dbe.toString());
|
||
System.exit(-1);
|
||
} catch(FileNotFoundException fnfe) {
|
||
System.err.println("MyDbs: " + fnfe.toString());
|
||
System.exit(-1);
|
||
} </pre>
|
||
</div>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
By default, DB uses a lexicographical comparison function where
|
||
shorter records collate before longer records. For the majority of
|
||
cases, this comparison works well and you do not need to manage
|
||
it in any way.
|
||
</p>
|
||
<p>
|
||
However, in some situations your application's performance can
|
||
benefit from setting a custom comparison routine. You can do this
|
||
either for database keys, or for the data if your
|
||
database supports sorted duplicate records.
|
||
</p>
|
||
<p>
|
||
Some of the reasons why you may want to provide a custom sorting
|
||
function are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Your database is keyed using strings and you want to provide
|
||
some sort of language-sensitive ordering to that data. Doing
|
||
so can help increase the locality of reference that allows
|
||
your database to perform at its best.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
You are using a little-endian system (such as x86) and you
|
||
are using integers as your database's keys. Berkeley DB
|
||
stores keys as byte strings and little-endian integers
|
||
do not sort well when viewed as byte strings. There are
|
||
several solutions to this problem, one being to provide a
|
||
custom comparison function. See
|
||
<a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a>
|
||
for more information.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
You you do not want the entire key to participate in the
|
||
comparison, for whatever reason. In
|
||
this case, you may want to provide a custom comparison
|
||
function so that only the relevant bytes are examined.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<div class="sect3" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h4 class="title"><a id="creatingComparisonFunctions"></a>
|
||
|
||
<span>Creating Java Comparators</span>
|
||
</h4>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
You set a BTree's key
|
||
|
||
<span>
|
||
comparator
|
||
</span>
|
||
using
|
||
|
||
|
||
<span><tt class="methodname">DatabaseConfig.setBtreeComparator()</tt>.</span>
|
||
You can also set a BTree's duplicate data comparison function using
|
||
|
||
|
||
<span><tt class="methodname">DatabaseConfig.setDuplicateComparator()</tt>.</span>
|
||
|
||
</p>
|
||
<p>
|
||
|
||
<span>
|
||
If
|
||
</span>
|
||
the database already exists when it is opened, the
|
||
|
||
<span>
|
||
comparator
|
||
</span>
|
||
provided to these methods must be the same as
|
||
that historically used to create the database or corruption can
|
||
occur.
|
||
</p>
|
||
<p>
|
||
You override the default comparison function by providing a Java
|
||
<tt class="classname">Comparator</tt> class to the database.
|
||
The Java <tt class="classname">Comparator</tt> interface requires you to implement the
|
||
<tt class="methodname">Comparator.compare()</tt> method
|
||
(see <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Comparator.html" target="_top">http://java.sun.com/j2se/1.4.2/docs/api/java/util/Comparator.html</a> for details).
|
||
</p>
|
||
<p>
|
||
DB hands your <tt class="methodname">Comparator.compare()</tt> method
|
||
the <tt class="literal">byte</tt> arrays that you stored in the database. If
|
||
you know how your data is organized in the <tt class="literal">byte</tt>
|
||
array, then you can write a comparison routine that directly examines
|
||
the contents of the arrays. Otherwise, you have to reconstruct your
|
||
original objects, and then perform the comparison.
|
||
</p>
|
||
<p>
|
||
For example, suppose you want to perform unicode lexical comparisons
|
||
instead of UTF-8 byte-by-byte comparisons. Then you could provide a
|
||
comparator that uses <tt class="methodname">String.compareTo()</tt>,
|
||
which performs a Unicode comparison of two strings (note that for
|
||
single-byte roman characters, Unicode comparison and UTF-8
|
||
byte-by-byte comparisons are identical – this is something you
|
||
would only want to do if you were using multibyte unicode characters
|
||
with DB). In this case, your comparator would look like the
|
||
following:
|
||
</p>
|
||
<a id="java_btree1"></a>
|
||
<pre class="programlisting">package db.GettingStarted;
|
||
|
||
import java.util.Comparator;
|
||
|
||
public class MyDataComparator implements Comparator {
|
||
|
||
public MyDataComparator() {}
|
||
|
||
public int compare(Object d1, Object d2) {
|
||
|
||
byte[] b1 = (byte[])d1;
|
||
byte[] b2 = (byte[])d2;
|
||
|
||
String s1 = new String(b1);
|
||
String s2 = new String(b2);
|
||
return s1.compareTo(s2);
|
||
}
|
||
} </pre>
|
||
<p>
|
||
To use this comparator:
|
||
</p>
|
||
<a id="java_btree2"></a>
|
||
<pre class="programlisting">package db.GettingStarted;
|
||
|
||
import java.io.FileNotFoundException;
|
||
import java.util.Comparator;
|
||
import com.sleepycat.db.Database;
|
||
import com.sleepycat.db.DatabaseConfig;
|
||
import com.sleepycat.db.DatabaseException;
|
||
|
||
...
|
||
|
||
Database myDatabase = null;
|
||
try {
|
||
// Get the database configuration object
|
||
DatabaseConfig myDbConfig = new DatabaseConfig();
|
||
myDbConfig.setAllowCreate(true);
|
||
|
||
// Set the duplicate comparator class
|
||
MyDataComparator mdc = new MyDataComparator();
|
||
myDbConfig.setDuplicateComparator(mdc);
|
||
|
||
// Open the database that you will use to store your data
|
||
myDbConfig.setSortedDuplicates(true);
|
||
myDatabase = new Database("myDb", null, myDbConfig);
|
||
} catch (DatabaseException dbe) {
|
||
// Exception handling goes here
|
||
} catch (FileNotFoundException fnfe) {
|
||
// Exception handling goes here
|
||
}</pre>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr />
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a> </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="u" href="dbconfig.html">Up</a>
|
||
</td>
|
||
<td width="40%" align="right"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">Selecting the Cache Size </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="h" href="index.html">Home</a>
|
||
</td>
|
||
<td width="40%" align="right" valign="top"> </td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|