<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||
<title>Chapter 6. Database Configuration</title>
|
||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
|
||
<link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
|
||
<link rel="up" href="index.html" title="Getting Started with Berkeley DB" />
|
||
<link rel="previous" href="coreindexusage.html" title="Secondary Database Example" />
|
||
<link rel="next" href="cachesize.html" title="Selecting the Cache Size" />
|
||
</head>
|
||
<body>
|
||
<div class="navheader">
|
||
<table width="100%" summary="Navigation header">
|
||
<tr>
|
||
<th colspan="3" align="center">Chapter 6. Database Configuration</th>
|
||
</tr>
|
||
<tr>
|
||
<td width="20%" align="left"><a accesskey="p" href="coreindexusage.html">Prev</a> </td>
|
||
<th width="60%" align="center"> </th>
|
||
<td width="20%" align="right"> <a accesskey="n" href="cachesize.html">Next</a></td>
|
||
</tr>
|
||
</table>
|
||
<hr />
|
||
</div>
|
||
<div class="chapter" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title"><a id="dbconfig"></a>Chapter 6. Database Configuration</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<div class="toc">
|
||
<p>
|
||
<b>Table of Contents</b>
|
||
</p>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="dbconfig.html#pagesize">Setting the Page Size</a>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="dbconfig.html#overflowpages">Overflow Pages</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="dbconfig.html#Locking">Locking</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="dbconfig.html#IOEfficiency">IO Efficiency</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="dbconfig.html#pagesizeAdvice">Page Sizing Advice</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</dd>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="cachesize.html">Selecting the Cache Size</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="btree.html">BTree Configuration</a>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="btree.html#duplicateRecords">Allowing Duplicate Records</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="btree.html#comparators">Setting Comparison Functions</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</dd>
|
||
</dl>
|
||
</div>
|
||
<p>
|
||
This chapter describes the database and cache configuration issues
that you need to consider when building your DB database.
In most cases, you need to do very little to manage your databases.
However, some configuration issues do require your attention, and these
depend largely on the access method that you choose for your database.
|
||
</p>
|
||
<p>
|
||
The examples and descriptions throughout this document have mostly focused
on the BTree access method, because the majority of DB
applications use BTree. For this reason, where configuration issues
depend on the access method in use, this chapter focuses on
BTree only. For configuration information on the other access
methods, see the <i class="citetitle">Berkeley DB Programmer's Reference Guide</i>.
|
||
</p>
|
||
<div class="sect1" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title" style="clear: both"><a id="pagesize"></a>Setting the Page Size</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Internally, DB stores database entries on pages. Page sizes are
|
||
important because they can affect your application's performance.
|
||
</p>
|
||
<p>
|
||
DB pages can be between 512 bytes and 64 KB in size. The size
that you select must be a power of 2. You set your database's
page size using
<span><tt class="methodname">DB->set_pagesize()</tt>.</span>
</p>
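<p>
The two constraints above (a power of 2, between 512 bytes and 64 KB) can be
checked before a value is ever handed to
<tt class="methodname">DB->set_pagesize()</tt>. The following is a minimal
sketch only; the helper name <tt class="literal">is_valid_page_size</tt> is
hypothetical and is not part of the DB API.
</p>

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the DB API): returns true when
 * page_size is a power of 2 in the range 512 bytes to 64 KB inclusive.
 */
bool is_valid_page_size(uint32_t page_size)
{
    if (page_size < 512 || page_size > 65536)
        return false;

    /* A power of 2 has exactly one bit set, so n & (n - 1) is zero. */
    return (page_size & (page_size - 1)) == 0;
}
```

<p>
A caller might run this check on a configured value and fall back to the
library default when the check fails.
</p>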
|
||
<p>
|
||
Note that a database's page size can only be selected at database
|
||
creation time.
|
||
</p>
|
||
<p>
|
||
When selecting a page size, you should consider the following issues:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Overflow pages.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Locking.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Disk I/O.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<p>
|
||
These topics are discussed next.
|
||
</p>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="overflowpages"></a>Overflow Pages</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Overflow pages are used to hold a key or data item
|
||
that cannot fit on a single page. You do not have to do anything to
|
||
cause overflow pages to be created, other than to store data that is
|
||
too large for your database's page size. Also, the only way you can
|
||
prevent overflow pages from being created is to be sure to select a
|
||
page size that is large enough to hold your database entries.
|
||
</p>
|
||
<p>
|
||
Because overflow pages exist outside of the normal database
|
||
structure, their use is expensive from a performance
|
||
perspective. If you select too small of a page size, then your
|
||
database will be forced to use an excessive number of overflow
|
||
pages. This will significantly harm your application's performance.
|
||
</p>
|
||
<p>
|
||
For this reason, you want to select a page size that is at
|
||
least large enough to hold multiple entries given the expected
|
||
average size of your database entries. In BTree's case, for best
|
||
results select a page size that can hold at least 4 such entries.
|
||
</p>
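<p>
The "at least 4 entries" guideline is simple arithmetic. The sketch below
illustrates it under a simplifying assumption: it ignores the per-page and
per-entry overhead that DB adds, so the real capacity of a page is somewhat
lower than the figure it computes. Both function names are hypothetical.
</p>

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Rough estimate of how many average-sized entries fit on one page.
 * Assumption: page and entry overhead is ignored, so this is an
 * upper bound rather than an exact count.
 */
uint32_t entries_per_page(uint32_t page_size, uint32_t avg_entry_size)
{
    if (avg_entry_size == 0)
        return 0;
    return page_size / avg_entry_size;
}

/* True when the page can hold the 4 entries suggested for BTree. */
bool holds_btree_minimum(uint32_t page_size, uint32_t avg_entry_size)
{
    return entries_per_page(page_size, avg_entry_size) >= 4;
}
```

<p>
For example, with entries averaging 1500 bytes, a 4096-byte page falls short
of the guideline, while an 8192-byte page satisfies it.
</p>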
|
||
<p>
|
||
You can see how many overflow pages your database is using by
<span>
calling the
<tt class="methodname">DB->stat()</tt>
method,
</span>
or by examining your database using the
<tt class="literal">db_stat</tt> command line utility.
|
||
</p>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="Locking"></a>Locking</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Locking and multi-threaded access to DB databases are built into
the product. However, in order to enable the locking subsystem and
to provide efficient sharing of the cache between
databases, you must use an <span class="emphasis"><em>environment</em></span>.
Environments and multi-threaded access are not fully described
in this manual (see the <i class="citetitle">Berkeley DB Programmer's Reference Guide</i> for
information). However, we provide some information on sizing your
pages in a multi-threaded or multi-process environment in the interest
of providing a complete discussion of the topic.
|
||
</p>
|
||
<p>
|
||
If your application is multi-threaded, or if your databases are
accessed by more than one process at a time, then page size can
influence your application's performance. The reason is that
for most access methods (Queue is the exception), DB implements
page-level locking. This means that the finest locking granularity
is the page, not the record.
|
||
</p>
|
||
<p>
|
||
In most cases, database pages contain multiple database
|
||
records. Further, in order to provide safe access to multiple
|
||
threads or processes, DB performs locking on pages as entries on
|
||
those pages are read or written.
|
||
</p>
|
||
<p>
|
||
As the size of your page increases relative to the size of your
database entries, the number of entries held on any given
page also increases. As a result, the chance that two or more
readers and/or writers will want to access entries on the same page
increases as well.
|
||
</p>
|
||
<p>
|
||
When two or more threads and/or processes want to manage data on a
|
||
page, lock contention occurs. Lock contention is resolved by one
|
||
thread (or process) waiting for another thread to give up its lock.
|
||
It is this waiting activity that is harmful to your application's
|
||
performance.
|
||
</p>
|
||
<p>
|
||
It is possible to select a page size that is so large that your
|
||
application will spend excessive, and noticeable, amounts of time
|
||
resolving lock contention. Note that this scenario is particularly
|
||
likely to occur as the amount of concurrency built into your
|
||
application increases.
|
||
</p>
|
||
<p>
|
||
On the other hand, if you select too small a page size, then
that will only make your tree deeper, which can also cause
performance penalties. The trick, therefore, is to select a
reasonable page size (one that will hold a sizeable number of
records) and then reduce the page size if you notice lock
contention.
|
||
</p>
|
||
<p>
|
||
You can see the number of lock conflicts and deadlocks occurring
in your application by examining your database environment lock
statistics. Either use the
<tt class="methodname">DB_ENV->lock_stat()</tt>
method, or use the <tt class="literal">db_stat</tt> command line utility.
The number of unavailable locks that your application waited for is
held in the lock statistic's <tt class="literal">st_lock_wait</tt> field.
|
||
|
||
</p>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="IOEfficiency"></a>IO Efficiency</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Page size can affect how efficient DB is at moving data to and
from disk. For some applications, especially those for which the
in-memory cache cannot be large enough to hold the entire working
dataset, I/O efficiency can significantly impact application performance.
|
||
</p>
|
||
<p>
|
||
Most operating systems use an internal block size to determine how much
|
||
data to move to and from disk for a single I/O operation. This block
|
||
size is usually equal to the filesystem's block size. For optimal
|
||
disk I/O efficiency, you should select a database page size that is
|
||
equal to the operating system's I/O block size.
|
||
</p>
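<p>
On POSIX systems, one way to discover the block size that the operating
system prefers for I/O on a given filesystem is the
<tt class="literal">st_blksize</tt> field filled in by
<tt class="literal">stat()</tt>. The following sketch assumes a POSIX
platform; the function name <tt class="literal">fs_block_size</tt> is
hypothetical, and other platforms need a different mechanism.
</p>

```c
#include <sys/stat.h>

/*
 * Returns the preferred I/O block size, in bytes, for the filesystem
 * that holds 'path', or -1 if the path cannot be examined.
 * Assumption: a POSIX system where struct stat provides st_blksize.
 */
long fs_block_size(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;

    return (long)sb.st_blksize;
}
```

<p>
Pass the directory that will hold your database files, since different
mounted filesystems can report different block sizes.
</p>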
|
||
<p>
|
||
Essentially, DB performs data transfers based on the database
|
||
page size. That is, it moves data to and from disk a page at a time.
|
||
For this reason, if the page size does not match the I/O block size,
|
||
then the operating system can introduce inefficiencies in how it
|
||
responds to DB's I/O requests.
|
||
</p>
|
||
<p>
|
||
For example, suppose your page size is smaller than your operating
system block size. In this case, when DB writes a page to disk
it is writing just a portion of a logical filesystem page. Any time
any application writes just a portion of a logical filesystem page, the
operating system brings in the real filesystem page, overwrites
the portion of the page that the application is writing, then writes
the filesystem page back to disk. The net result is significantly
more disk I/O than if the application had simply selected a page
size that was equal to the underlying filesystem block size.
|
||
</p>
|
||
<p>
|
||
Alternatively, if you select a page size that is larger than the
|
||
underlying filesystem block size, then the operating system may have
|
||
to read more data than is necessary to fulfill a read request.
|
||
Further, on some operating systems, requesting a single database
|
||
page may result in the operating system reading enough filesystem
|
||
blocks to satisfy the operating system's criteria for read-ahead. In
|
||
this case, the operating system will be reading significantly more
|
||
data from disk than is actually required to fulfill DB's read
|
||
request.
|
||
</p>
|
||
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||
<h3 class="title">Note</h3>
|
||
<p>
|
||
While transactions are not discussed in this manual, a page size
other than your filesystem's block size can affect transactional
guarantees. The reason is that page sizes larger than the
filesystem's block size cause DB to write pages in block
size increments. As a result, it is possible for a partial page
to be written as the result of a transactional commit. For more
information, see <a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/reclimit.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/reclimit.html</a>.
|
||
</p>
|
||
</div>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="pagesizeAdvice"></a>Page Sizing Advice</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Page sizing can be confusing at first, so here are some general
|
||
guidelines that you can use to select your page size.
|
||
</p>
|
||
<p>
|
||
In general, and given no other considerations, a page size that is equal
|
||
to your filesystem block size is the ideal situation.
|
||
</p>
|
||
<p>
|
||
If your data is designed such that 4 database entries cannot fit on a
|
||
single page (assuming BTree), then grow your page size to accommodate
|
||
your data. Once you've abandoned matching your filesystem's block
|
||
size, the general rule is that larger page sizes are better.
|
||
</p>
|
||
<p>
|
||
The exception to this rule is if you have a great deal of
|
||
concurrency occurring in your application. In this case, the closer
|
||
you can match your page size to the ideal size needed for your
|
||
application's data, the better. Doing so will allow you to avoid
|
||
unnecessary contention for page locks.
|
||
</p>
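<p>
The first two guidelines can be combined into a small selection routine:
start from the filesystem block size, then grow the page only if 4
average-sized entries would not fit. This is a sketch under stated
assumptions, not a definitive sizing algorithm. It ignores page overhead, it
does not model the concurrency exception described above, and the name
<tt class="literal">suggest_page_size</tt> is hypothetical.
</p>

```c
#include <stdint.h>

#define MIN_PAGE_SIZE 512u    /* smallest page size DB accepts */
#define MAX_PAGE_SIZE 65536u  /* largest page size DB accepts (64 KB) */

/*
 * Suggests a page size from the filesystem block size and the expected
 * average entry size (key plus data), both in bytes.
 * Assumption: page overhead is ignored. The result is capped at 64 KB,
 * so very large entries will still spill onto overflow pages.
 */
uint32_t suggest_page_size(uint32_t fs_block_size, uint32_t avg_entry_size)
{
    uint32_t page = MIN_PAGE_SIZE;
    uint64_t needed = (uint64_t)avg_entry_size * 4;

    /* Smallest legal power of 2 that covers the filesystem block size. */
    while (page < fs_block_size && page < MAX_PAGE_SIZE)
        page *= 2;

    /* Grow further only if 4 average entries would not fit. */
    while (page < needed && page < MAX_PAGE_SIZE)
        page *= 2;

    return page;
}
```

<p>
For a highly concurrent application, treat the returned value as an upper
bound and reduce it if your lock statistics show contention.
</p>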
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr />
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left"><a accesskey="p" href="coreindexusage.html">Prev</a> </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="u" href="index.html">Up</a>
|
||
</td>
|
||
<td width="40%" align="right"> <a accesskey="n" href="cachesize.html">Next</a></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">Secondary Database Example </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="h" href="index.html">Home</a>
|
||
</td>
|
||
<td width="40%" align="right" valign="top"> Selecting the Cache Size</td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|