713 lines
30 KiB
HTML
713 lines
30 KiB
HTML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||
<title>Chapter 3. The DB Replication Framework</title>
|
||
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
|
||
<link rel="home" href="index.html" title="Getting Started with Replicated Berkeley DB Applications" />
|
||
<link rel="up" href="index.html" title="Getting Started with Replicated Berkeley DB Applications" />
|
||
<link rel="previous" href="simpleprogramlisting.html" title="Program Listing" />
|
||
<link rel="next" href="repmgr_init_example_c.html" title="Adding the Replication Framework to simple_txn " />
|
||
</head>
|
||
<body>
|
||
<div class="navheader">
|
||
<table width="100%" summary="Navigation header">
|
||
<tr>
|
||
<th colspan="3" align="center">Chapter 3. The DB Replication Framework</th>
|
||
</tr>
|
||
<tr>
|
||
<td width="20%" align="left"><a accesskey="p" href="simpleprogramlisting.html">Prev</a> </td>
|
||
<th width="60%" align="center"> </th>
|
||
<td width="20%" align="right"> <a accesskey="n" href="repmgr_init_example_c.html">Next</a></td>
|
||
</tr>
|
||
</table>
|
||
<hr />
|
||
</div>
|
||
<div class="chapter" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title"><a id="repapp"></a>Chapter 3. The DB Replication Framework</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<div class="toc">
|
||
<p>
|
||
<b>Table of Contents</b>
|
||
</p>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="repapp.html#rep_init_code">
|
||
Starting and Stopping Replication
|
||
</a>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="repapp.html#election_flags">Managing Election Policies</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="repapp.html#thread_count">Selecting the Number of Threads</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</dd>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="repmgr_init_example_c.html">Adding the Replication Framework to
|
||
simple_txn
|
||
|
||
</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="fwrkpermmessage.html">Permanent Message Handling</a>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="fwrkpermmessage.html#fmwrkpermpolicy">Identifying Permanent Message Policies</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="fwrkpermmessage.html#fmwrkpermtimeout">Setting the Permanent Message Timeout</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="fwrkpermmessage.html#perm2fmwrkexample">Adding a Permanent Message Policy to
|
||
rep_mgr
|
||
|
||
|
||
</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</dd>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="electiontimes.html">Managing Election Times</a>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<dl>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="electiontimes.html#electiontimeout">Managing Election Timeouts</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect2">
|
||
<a href="electiontimes.html#electretrytime">Managing Election Retry Times</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</dd>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="fmwrkconnectretry.html">Managing Connection Retries</a>
|
||
</span>
|
||
</dt>
|
||
<dt>
|
||
<span class="sect1">
|
||
<a href="heartbeats.html">Managing Heartbeats</a>
|
||
</span>
|
||
</dt>
|
||
</dl>
|
||
</div>
|
||
<p>
|
||
The easiest way to add replication to your transactional
|
||
application is to use the replication framework. The replication framework provides a comprehensive
|
||
communications layer that enables replication. For a brief listing
|
||
of the replication framework's feature set, see
|
||
<a href="apioverview.html#repframeworkoverview">Replication Framework Overview</a>.
|
||
</p>
|
||
<p>
|
||
To use the replication framework, you make use of special methods off the
|
||
<span><tt class="classname">DB_ENV</tt> class.</span>
|
||
|
||
That is:
|
||
</p>
|
||
<div class="orderedlist">
|
||
<ol type="1">
|
||
<li>
|
||
<p>
|
||
Create an environment handle as normal.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Configure your environment handle as
|
||
needed (e.g. set the error file and
|
||
error prefix values, if desired).
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Use the replication framework replication methods to
|
||
configure the replication framework. Using these
|
||
methods causes DB to know that you
|
||
are using the replication framework.
|
||
</p>
|
||
<p>
|
||
Configuring the replication framework
|
||
entails setting its replication
|
||
priority, setting the TCP/IP address
|
||
that this replication environment will use for
|
||
incoming replication messages, identify
|
||
TCP/IP addresses of other replication
|
||
environments, setting the number of
|
||
replication environments in the
|
||
replication group, and so forth. These actions are
|
||
discussed throughout the remainder of
|
||
this chapter.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Open your environment handle. When you
|
||
do this, be sure to specify
|
||
|
||
<span><tt class="literal">DB_INIT_REP</tt> and
|
||
<tt class="literal">DB_THREAD</tt> to your
|
||
open flags. (This is in addition to the
|
||
flags that you normally use for a
|
||
single-threaded transactional
|
||
application). The first of these causes
|
||
replication to be initialized for the
|
||
application. The second causes your
|
||
environment handle to be free-threaded
|
||
(thread safe). Both flags are required
|
||
for replication framework usage.
|
||
</span>
|
||
|
||
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Start replication by calling
|
||
<span><tt class="methodname">DB_ENV->repmgr_start()</tt>.</span>
|
||
|
||
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Open your databases as needed. Masters
|
||
must open their databases for read
|
||
and write activity. Replicas can open
|
||
their databases for read-only activity, but
|
||
doing so means they must re-open the
|
||
databases if the replica ever becomes a
|
||
master. Either way, replicas should never attempt to
|
||
write to the database(s) directly.
|
||
</p>
|
||
</li>
|
||
</ol>
|
||
</div>
|
||
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||
<h3 class="title">Note</h3>
|
||
<p>
|
||
The replication framework allows you to only use one
|
||
environment handle per process.
|
||
</p>
|
||
</div>
|
||
<p>
|
||
When you are ready to shut down your application:
|
||
</p>
|
||
<div class="orderedlist">
|
||
<ol type="1">
|
||
<li>
|
||
<p>
|
||
Close your databases
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Close your environment. This causes
|
||
replication to stop as well.
|
||
</p>
|
||
</li>
|
||
</ol>
|
||
</div>
|
||
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||
<h3 class="title">Note</h3>
|
||
<p>
|
||
Before you can use the replication framework, you may have to
|
||
enable it in your DB library. This is
|
||
<span class="emphasis"><em>not</em></span> a requirement for
|
||
Microsoft Windows systems, or Unix systems that
|
||
use pthread mutexes by default. Other systems,
|
||
notably BSD and BSD-derived systems (such as
|
||
Mac OS X), must enable the replication framework when you
|
||
configure the DB build.
|
||
</p>
|
||
<p>
|
||
You do this by <span class="emphasis"><em>not</em></span>
|
||
disabling replication and by configuring the
|
||
library with POSIX threads support. In other
|
||
words, replication must be turned on in the
|
||
build (it is by default), and POSIX thread
|
||
support must be enabled if it is not already by
|
||
default. To do this, use the
|
||
<tt class="literal">--enable-pthread_api</tt> switch
|
||
on the configure script.
|
||
</p>
|
||
<p>
|
||
For example:
|
||
</p>
|
||
<pre class="programlisting">../dist/configure --enable-pthread-api</pre>
|
||
</div>
|
||
<div class="sect1" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h2 class="title" style="clear: both"><a id="rep_init_code"></a>
|
||
Starting and Stopping Replication
|
||
</h2>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
As described above, you introduce replication to an
|
||
application by starting with a transactional
|
||
application, performing some basic replication
|
||
configuration, and then starting replication using
|
||
<span><tt class="methodname">DB_ENV->repmgr_start()</tt>.</span>
|
||
|
||
|
||
</p>
|
||
<p>
|
||
You stop replication by closing your environment
|
||
cleanly, as is normal for an DB application.
|
||
</p>
|
||
<p>
|
||
For example, the following code fragment initializes, then
|
||
stops and starts replication. Note that other replication
|
||
activities are omitted for brevity.
|
||
</p>
|
||
<pre class="programlisting">#include <db.h>
|
||
|
||
/* Use a 10mb cache */
|
||
#define CACHESIZE (10 * 1024 * 1024)
|
||
|
||
...
|
||
|
||
DB_ENV *dbenv; /* Environment handle. */
|
||
const char *progname; /* Program name. */
|
||
const char *envHome; /* Environment home directory. */
|
||
const char *listen_host; /* A TCP/IP hostname. */
|
||
const char *other_host; /* A TCP/IP hostname. */
|
||
int ret; /* Error return code. */
|
||
u_int16 listen_port; /* A TCP/IP port. */
|
||
u_int16 other_port; /* A TCP/IP port. */
|
||
|
||
/* Initialize variables */
|
||
dbenv = NULL;
|
||
progname = "example_replication";
|
||
envHome = "ENVIRONMENT_HOME";
|
||
listen_host = "mymachine.sleepycat.com";
|
||
listen_port = 5001;
|
||
other_host = "anothermachine.sleepycat.com";
|
||
other_port = 4555;
|
||
ret = 0;
|
||
|
||
/* Create the environment handle */
|
||
if ((ret = db_env_create(&dbenv, 0)) != 0 ) {
|
||
fprintf(stderr, "Error creating environment handle: %s\n",
|
||
db_strerror(ret));
|
||
goto err;
|
||
}
|
||
|
||
/*
|
||
* Configure the environment handle. Here we configure asynchronous
|
||
* transactional commits for performance reasons.
|
||
*/
|
||
dbenv->set_errfile(dbenv, stderr);
|
||
dbenv->set_errpfx(dbenv, progname);
|
||
(void)dbenv->set_cachesize(dbenv, 0, CACHESIZE, 0);
|
||
(void)dbenv->set_flags(dbenv, DB_TXN_NOSYNC, 1);
|
||
|
||
|
||
/*
|
||
* Configure the local address. This is the local hostname and port
|
||
* that this replication participant will use to receive incoming
|
||
* replication messages. Note that this can be performed only once for
|
||
* the application. It is required.
|
||
*/
|
||
if ((ret = dbenv->repmgr_set_local_site(dbenv, listen_host,
|
||
listen_port, 0)) != 0) {
|
||
fprintf(stderr, "Could not set local address (%d).\n", ret);
|
||
goto err;
|
||
}
|
||
|
||
/*
|
||
* Set this application's priority. This is used for elections.
|
||
*
|
||
* Set this number to a positive integer, or 0 if you do not want this
|
||
* site to be able to become a master.
|
||
*/
|
||
dbenv->rep_set_priority(dbenv, 100);
|
||
|
||
/*
|
||
* Add a site to the list of replication environments known to this
|
||
* application.
|
||
*/
|
||
if (dbenv->repmgr_add_remote_site(dbenv, other_host, other_port,
|
||
NULL, 0) != 0) {
|
||
fprintf(stderr, "Could not add site %s.\n", other_host);
|
||
goto err;
|
||
}
|
||
|
||
/*
|
||
* Identify the number of sites in the replication group. This is
|
||
* necessary so that elections and permanent message handling can be
|
||
* performed correctly.
|
||
*/
|
||
if (dbenv->repmgr_set_nsites(dbenv, 2) != 0) {
|
||
fprintf(stderr, "Could not set the number of sites.\n";
|
||
goto err;
|
||
}
|
||
|
||
/* Open the environment handle. Note that we add DB_THREAD and
|
||
* DB_INIT_REP to the list of flags. These are required.
|
||
*/
|
||
if ((ret = dbenv->open(dbenv, home, DB_CREATE | DB_RECOVER |
|
||
DB_INIT_LOCK | DB_INIT_LOG |
|
||
DB_INIT_MPOOL | DB_INIT_TXN |
|
||
DB_THREAD | DB_INIT_REP,
|
||
0)) != 0) {
|
||
goto err;
|
||
}
|
||
|
||
/* Start the replication framework such that it uses 3 threads. */
|
||
if ((ret = dbenv->repmgr_start(dbenv, 3, DB_REP_ELECTION)) != 0)
|
||
goto err;
|
||
|
||
/* Sleep to give ourselves time to find a master */
|
||
sleep(5);
|
||
|
||
/*
|
||
**********************************************************
|
||
*** All other application code goes here, including *****
|
||
*** database opens *****
|
||
**********************************************************
|
||
*/
|
||
|
||
err: /*
|
||
* Make sure all your database handles are closed
|
||
* (omitted from this example).
|
||
*/
|
||
|
||
/* Close the environment */
|
||
if (dbenv != NULL)
|
||
(void)dbenv->close(dbenv, 0);
|
||
|
||
/* All done */
|
||
return (ret); </pre>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="election_flags"></a>Managing Election Policies</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Before continuing, it is worth taking a look at the
|
||
|
||
<span>
|
||
startup election flags accepted by
|
||
<span><tt class="methodname">DB_ENV->repgmr_start()</tt>.</span>
|
||
|
||
These flags control how your replication application will
|
||
behave when it first starts up.
|
||
</span>
|
||
|
||
|
||
</p>
|
||
<p>
|
||
In the previous example, we specified
|
||
<tt class="literal">DB_REP_ELECTION</tt>
|
||
|
||
when we started replication. This causes the
|
||
application to try to find a master upon startup. If it
|
||
cannot, it calls for an election. In the event an
|
||
election is held, the environment receiving the most number of
|
||
votes will become the master.
|
||
</p>
|
||
<p>
|
||
There's some important points to make here:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
This
|
||
<span>flag</span>
|
||
|
||
only requires that other
|
||
environments in the replication group
|
||
participate in the vote. There is no
|
||
requirement that
|
||
<span class="emphasis"><em>all</em></span> such
|
||
environments participate. In other
|
||
words, if an environment
|
||
starts up, it can call for an
|
||
election, and select a master, even
|
||
if all other environment have not yet
|
||
joined the replication group.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
It only requires a simple majority of
|
||
participating environments to elect a master. The number of
|
||
environments used to calculate the simple
|
||
majority is based on the value set for
|
||
|
||
<span><tt class="methodname">DB_ENV->rep_set_nsites()</tt>.</span>
|
||
|
||
|
||
|
||
|
||
This is always true of elections held using the replication framework.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
As always, the environment participating in the election with the most
|
||
up-to-date log files is selected as
|
||
master. If an environment with better log files
|
||
has not yet joined the replication
|
||
group, it may not become the master.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<p>
|
||
Any one of these points may be enough to cause a
|
||
less-than-optimum environment to be selected as master.
|
||
Therefore, to give you a better degree of control over
|
||
which environment becomes a master at application startup,
|
||
the replication framework offers the following start-up
|
||
<span>flags:</span>
|
||
|
||
</p>
|
||
<div class="informaltable">
|
||
<table border="1" width="80%">
|
||
<colgroup>
|
||
<col />
|
||
<col />
|
||
</colgroup>
|
||
<thead>
|
||
<tr>
|
||
<th>Flag</th>
|
||
<th>Description</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>
|
||
<tt class="literal">DB_REP_MASTER</tt>
|
||
</td>
|
||
<td>
|
||
<p>
|
||
The application starts up and declares itself to be a master
|
||
without calling for an election. It is an error for more
|
||
than one environment to start up using this flag, or for
|
||
an environment
|
||
to use this flag when a master already exists.
|
||
</p>
|
||
<p>
|
||
Note that no replication group should
|
||
<span class="emphasis"><em>ever</em></span> operate with more than
|
||
one master.
|
||
</p>
|
||
<p>
|
||
In the event that a environment attempts to become a
|
||
master when a master already exists, the
|
||
replication code will resolve the problem by
|
||
holding an election. Note, however, that there
|
||
is always a possibility of data loss in the face
|
||
of duplicate masters, because once a master is
|
||
selected, the environment that loses the election will
|
||
have to roll back any transactions committed
|
||
until it is in sync with the "real" master.
|
||
</p>
|
||
</td>
|
||
</tr>
|
||
<tr>
|
||
<td>
|
||
<tt class="literal">DB_REP_CLIENT</tt>
|
||
</td>
|
||
<td>
|
||
<p>
|
||
The application starts up and declares
|
||
itself to be a replica without calling for
|
||
an election. Note that the application
|
||
can still become a master if a subsequent
|
||
application starts up, calls for an
|
||
election, and this application is elected
|
||
master.
|
||
</p>
|
||
</td>
|
||
</tr>
|
||
<tr>
|
||
<td>
|
||
<tt class="literal">DB_REP_ELECTION</tt>
|
||
</td>
|
||
<td>
|
||
<p>
|
||
As described above, the application starts up,
|
||
looks for a master, and if one is not found calls
|
||
for an election.
|
||
</p>
|
||
</td>
|
||
</tr>
|
||
<tr>
|
||
<td>
|
||
<tt class="literal">DB_REP_FULL_ELECTION</tt>
|
||
</td>
|
||
<td>
|
||
<p>
|
||
Identical to
|
||
<tt class="literal">DB_REP_ELECTION</tt>
|
||
|
||
except that the election requires all
|
||
known members of the replication group to
|
||
participate. If a given environment has not yet
|
||
started but it is included in the
|
||
replication group count (using
|
||
<span><tt class="methodname">DB_ENV->rep_set_nsites()</tt>)</span>
|
||
|
||
|
||
then a master can not be elected.
|
||
</p>
|
||
</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
</div>
|
||
</div>
|
||
<div class="sect2" lang="en" xml:lang="en">
|
||
<div class="titlepage">
|
||
<div>
|
||
<div>
|
||
<h3 class="title"><a id="thread_count"></a>Selecting the Number of Threads</h3>
|
||
</div>
|
||
</div>
|
||
<div></div>
|
||
</div>
|
||
<p>
|
||
Under the hood, the replication framework is threaded and you can
|
||
control the number of threads used to process messages received from
|
||
other replicas. The threads that the replication framework uses are:
|
||
</p>
|
||
<div class="itemizedlist">
|
||
<ul type="disc">
|
||
<li>
|
||
<p>
|
||
Incoming message thread. This thread
|
||
receives messages from the site's
|
||
socket and passes those messages to
|
||
message processing threads (see below)
|
||
for handling.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Outgoing message thread. Outgoing
|
||
are performed in whatever thread
|
||
performed a write to the database(s).
|
||
That is, the thread that called, for
|
||
example,
|
||
<tt class="methodname">DB->put()</tt>
|
||
|
||
|
||
is the thread that writes replication messages
|
||
about that fact to the socket.
|
||
</p>
|
||
<p>
|
||
Note that if this write activity would
|
||
cause the thread to be blocked due to
|
||
some condition on the socket, the replication framework
|
||
will hand the outgoing message to the
|
||
incoming message thread, and it will
|
||
then write the message to the socket.
|
||
This prevents your database write
|
||
threads from blocking due to abnormal
|
||
network I/O conditions.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Message processing threads are
|
||
responsible for parsing and then
|
||
responding to incoming replication
|
||
messages. Typically, a response will
|
||
include write activity to your
|
||
database(s), so these threads can be
|
||
busy performing disk I/O.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
<p>
|
||
Of these threads, the only ones that you have any
|
||
configuration control over are the message processing
|
||
threads. In this case, you can determine how many
|
||
of these threads you want to run.
|
||
</p>
|
||
<p>
|
||
It is always a bit of an art to decide on a thread count,
|
||
but the short answer is you probably do not need more
|
||
than three threads here, and it is likely that one will
|
||
suffice. That said, the best thing to do is set your
|
||
thread count to a fairly low number and then increase
|
||
it if it appears that your application will benefit
|
||
from the additional threads.
|
||
</p>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr />
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left"><a accesskey="p" href="simpleprogramlisting.html">Prev</a> </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="u" href="index.html">Up</a>
|
||
</td>
|
||
<td width="40%" align="right"> <a accesskey="n" href="repmgr_init_example_c.html">Next</a></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">Program Listing </td>
|
||
<td width="20%" align="center">
|
||
<a accesskey="h" href="index.html">Home</a>
|
||
</td>
|
||
<td width="40%" align="right" valign="top"> Adding the Replication Framework to
|
||
simple_txn
|
||
|
||
</td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|