Import BSDDB 4.7.25 (as of svn r89086)
This commit is contained in:
452
rpc_server/clsrv.html
Normal file
@@ -0,0 +1,452 @@
|
||||
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<meta name="GENERATOR" content="Mozilla/4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) [Netscape]">
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<center>
|
||||
<h1>
|
||||
Client/Server Interface for Berkeley DB</h1></center>
|
||||
|
||||
<center><i>Susan LoVerso</i>
|
||||
<br><i>Rev 1.3</i>
|
||||
<br><i>1999 Nov 29</i></center>
|
||||
|
||||
<p>We provide an interface allowing client/server access to Berkeley DB.
Our goal is to provide a client and server library that lets users separate
the functionality of their applications while retaining access to the full
benefits of Berkeley DB. The interface should be seamless, requiring
only minimal modification to existing applications.
|
||||
<p>The client/server interface for Berkeley DB can be broken up into several
|
||||
layers. At the lowest level there is the transport mechanism to send
|
||||
out the messages over the network. Above that layer is the messaging
|
||||
layer to interpret what comes over the wire, and bundle/unbundle message
|
||||
contents. The next layer is Berkeley DB itself.
|
||||
<p>The transport layer uses ONC RPC (RFC 1831) and XDR (RFC 1832).
|
||||
We declare our message types and operations supported by our program and
|
||||
the RPC library and utilities pretty much take care of the rest.
|
||||
The
|
||||
<i>rpcgen</i> program generates all of the low level code needed.
|
||||
We need to define both sides of the RPC.
|
||||
<br>
|
||||
<h2>
|
||||
<a NAME="DB Modifications"></a>DB Modifications</h2>
|
||||
To achieve the goal of a seamless interface, it is necessary to impose
|
||||
a constraint on the application. That constraint is simply that all database
|
||||
access must be done through an open environment; that is, this model
does not support standalone databases. The reason for this constraint
|
||||
is so that we have an environment structure internally to store our connection
|
||||
to the server. Imposing this constraint means that we can provide
|
||||
the seamless interface just by adding a single environment method: <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>.
|
||||
<p>The planned interface for this method is:
|
||||
<pre>DBENV->set_rpc_server(dbenv,       /* DB_ENV structure */
                      hostname,    /* Host of server */
                      cl_timeout,  /* Client timeout (sec) */
                      srv_timeout, /* Server timeout (sec) */
                      flags);      /* Flags: unused */</pre>
|
||||
This new method takes the hostname of the server, establishes our connection
|
||||
and an environment on the server. If a server timeout is specified,
|
||||
then we send that to the server as well (and the server may or may not
|
||||
choose to use that value). This timeout is how long the server will
|
||||
allow the environment to remain idle before declaring it dead and releasing
|
||||
resources on the server. The pointer to the connection is stored
|
||||
on the client in the DBENV structure and is used by all other methods to
|
||||
figure out with whom to communicate. If a client timeout is specified,
|
||||
it indicates how long the client is willing to wait for a reply from the
|
||||
server. If either timeout value is 0, then a default is used for it. The flags
argument is currently unused. It exists as a placeholder, and could later be
used to specify the authentication desired (were we to provide an
authentication scheme at some point) or for other uses not thought of yet.
|
||||
<p>This client code is part of the monolithic DB library. The user
|
||||
accesses the client functions via a new flag to <a href="../docs/api_c/db_env_create.html">db_env_create()</a>.
|
||||
That flag is DB_CLIENT. By using this flag the user indicates they
|
||||
want to have the client methods rather than the standard methods for the
|
||||
environment. Also by issuing this flag, the user needs to connect
|
||||
to the server via the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
||||
method.
|
||||
<p>We need two new fields in the <i>DB_ENV</i> structure. One is
the socket descriptor to communicate to the server, the other field is
the client identifier the server gives to us. The <i>DB</i> and
<i>DBC</i> structures each need only one additional field, the client identifier. The
<i>DB_TXN</i>
structure does not need modification; we are overloading the <i>txn_id</i>
field.
|
||||
<h2>
|
||||
Issues</h2>
|
||||
We need to figure out what to do in case of client and server crashes.
|
||||
Both the client library and the server program are stateful. They
|
||||
both consume local resources during the lifetime of the connection.
|
||||
Should one end drop that connection, the other side needs to release those
|
||||
resources.
|
||||
<p>If the server crashes, then the client will get an error back.
|
||||
I have chosen to implement time-outs on the client side, using a default
|
||||
or allowing the application to specify one through the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
||||
method. Either the current operation will time-out waiting for the
|
||||
reply or the next operation called will time out (or get back some other
|
||||
kind of error regarding the server's non-existence). In any case,
|
||||
if the client application gets back such an error, it should abort any
|
||||
open transactions locally, close any databases, and close its environment.
|
||||
It may then decide to retry to connect to the server periodically or whenever
|
||||
it comes back. If the last operation a client did was a transaction
|
||||
commit that did not return or timed out from the server, the client cannot
|
||||
determine if the transaction was committed or not but must release the
|
||||
local transaction resources. Once the server is back up, recovery must
be run on the server. If the transaction commit completed on
the server before the crash, then the operation is redone; if the transaction
commit did not get to the server, the pieces of the transaction are undone
during recovery. The client can then re-establish its connection and begin
again. This is effectively like beginning over: the client
cannot use IDs from its previous connection to the server. However,
if recovery is run, then consistency is assured.
|
||||
<p>If the client crashes, the server needs to somehow figure this out.
|
||||
The server is just sitting there waiting for a request to come in.
|
||||
A server must be able to time-out a client. Similar to ftpd, if a
|
||||
connection is idle for N seconds, then the server decides the client is
|
||||
dead and releases that client's resources, aborting any open transactions,
|
||||
closing any open databases and environments. The server timing
|
||||
out a client is not a trivial issue however. The generated function
|
||||
for the server just calls <i>svc_run()</i>. The server code I write
|
||||
contains procedures to do specific things. We do not have access
|
||||
to the code calling <i>select()</i>. Timing out the select is not
|
||||
good enough even if we could do so. We want to time-out idle environments,
|
||||
not simply cause a time-out if the server is idle a while. See the
|
||||
discussion of the <a href="#The Server Program">server program</a> for
|
||||
a description of how we accomplish this.
|
||||
<p>Since rpcgen generates the main() function of the server, I do not yet
|
||||
know how we are going to have the server multi-threaded or multi-process
|
||||
without changing the generated code. The RPC book indicates that
|
||||
the only way to accomplish this is through modifying the generated code
|
||||
in the server. <b>For the moment we will ignore this issue while
|
||||
we get the core server working, as it is only a performance issue.</b>
|
||||
<p>We do not do any security or authentication. Someone could get
|
||||
the code and modify it to spoof messages, trick the server, etc.
|
||||
RPC has some amount of authentication built into it. I haven't yet
|
||||
looked into it much to know if we want to use it or just point a user at
|
||||
it. The changes to the client code would be fairly minor, as would the changes
to our server procs. We would have to add code to
|
||||
a <i>sed</i> script or <i>awk</i> script to change the generated server
|
||||
code (yet again) in the dispatch routine to perform authentication.
|
||||
<p>We will need to get an official program number from Sun. We can
|
||||
get this by sending mail to <i>rpc@sun.com</i> and presumably at some point
|
||||
they will send us back a program number that we will encode into our XDR
|
||||
description file. Until we release this we can use a program number
|
||||
in the "user defined" number space.
|
||||
<br>
|
||||
<h2>
|
||||
<a NAME="The Server Program"></a>The Server Program</h2>
|
||||
The server is a standalone program that the user builds and runs, probably
as a daemon-like process. This program is linked against the Berkeley
DB library and the RPC library (which is part of the C library on my FreeBSD
machine; others may have/need <i>-lrpclib</i>). The server basically
is a slave to the client process. All messages from the client are
|
||||
synchronous and two-way. The server handles messages one at a time,
|
||||
and sends a reply back before getting another message. There are
|
||||
no asynchronous messages generated by the server to the client.
|
||||
<p>We have made a choice to modify the generated code for the server.
|
||||
The changes will be minimal, generally just calls to functions that we write
in other source files. The first change is adding a call to our
|
||||
time-out function as described below. The second change is changing
|
||||
the name of the generated <i>main()</i> function to <i>__dbsrv_main()</i>,
|
||||
and adding our own <i>main()</i> function so that we can parse options,
|
||||
and set up other initialization we require. I have a <i>sed</i> script
|
||||
that is run from the distribution scripts that massages the generated code
|
||||
to make these minor changes.
|
||||
<p>Primarily the code needed for the server is the collection of the specified
|
||||
RPC functions. Each function receives the structure indicated, and
|
||||
our code takes out what it needs and passes the information into DB itself.
|
||||
The server needs to maintain a translation table for identifiers that we
|
||||
pass back to the client for the environment, transaction and database handles.
|
||||
<p>The table that the server maintains, assuming one client per server
process/thread, should contain for each entry: the handle to the environment, database
or transaction; a link to maintain parent/child relationships between transactions,
or between databases and cursors; this handle's identifier; a type, so that we can
return an error if the client passes us a bad id for this call; and a link to this
handle's environment entry (for time-out/activity purposes). In entries
used by environments, the table also contains a time-out value and an
activity time stamp. Their use in timing out idle clients is described below.
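<p>As a rough illustration of the table entry described above, here is a
minimal sketch in C. All names (<i>ct_entry</i>, <i>ct_type</i>,
<i>ct_lookup</i>) are hypothetical and chosen for this example only; this is
not the server's actual code. It assumes a flat array is an acceptable table
and that a lookup is the natural place to check the handle type and to stamp
the environment's activity time.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical names throughout; not the actual Berkeley DB server code. */
typedef enum { CT_ENV, CT_TXN, CT_DB, CT_CURSOR } ct_type;

typedef struct ct_entry {
    long             id;       /* identifier handed back to the client */
    ct_type          type;     /* kind of handle, checked on every lookup */
    void            *handle;   /* the real environment, txn, db or cursor */
    struct ct_entry *parent;   /* parent txn, or owning db for a cursor */
    struct ct_entry *env;      /* this handle's environment entry */
    long             timeout;  /* seconds; used by CT_ENV entries only */
    time_t           active;   /* last activity; used by CT_ENV entries only */
} ct_entry;

/*
 * Find the entry for `id`. Return NULL if the id is unknown or if it
 * names a handle of the wrong type. On success, record activity in the
 * owning environment's entry.
 */
ct_entry *ct_lookup(ct_entry *table, size_t n, long id, ct_type want,
                    time_t now)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].id != id)
            continue;
        if (table[i].type != want)
            return NULL;            /* bad id for this call */
        if (table[i].env != NULL)
            table[i].env->active = now;
        return &table[i];
    }
    return NULL;
}
```

<p>Keeping the activity stamp in the environment entry alone means every
operation on a database, transaction or cursor refreshes a single record.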
|
||||
<p>Here is how we time out clients in the server. We have to modify
|
||||
the generated server code, but only to add one line during the dispatch
|
||||
function to run the time-out function. The call is made right before
|
||||
the return of the dispatch function, after the reply is sent to the client,
|
||||
so that clients aren't kept waiting for server bookkeeping activities.
This time-out function then runs every time the server processes a request.
In the time-out function we maintain a time-out hint: the earliest time at
which any open environment is due to time out (the "youngest" time-out). If the current time is less than the hint
|
||||
we know we do not need to run through the list of open handles. If
|
||||
the hint is expired, then we go through the list of open environment handles,
|
||||
and if they are past their expiration, then we close them and clean up.
|
||||
If they are not, we set up the hint for the next time.
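<p>The hint logic above can be sketched as follows. This is only an
illustration under stated assumptions: the <i>env_entry</i> structure, the
<i>timeout_hint</i> variable and the <i>env_timeout_sweep</i> function are
invented names, an environment is "closed" by clearing a flag, and a hint of
0 is taken to mean that no hint has been computed yet.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified environment entry; not the real server table. */
typedef struct {
    int    open;      /* nonzero while the environment is open */
    long   timeout;   /* idle time-out in seconds */
    time_t active;    /* time of the last operation in this environment */
} env_entry;

static time_t timeout_hint;   /* earliest known expiration; 0 = none yet */

/*
 * Meant to be called at the end of the dispatch function, after the
 * reply has been sent. Returns the number of environments closed.
 */
int env_timeout_sweep(env_entry *envs, size_t n, time_t now)
{
    int closed = 0;
    time_t next = 0;

    /* Cheap path: nothing known can have expired before the hint. */
    if (timeout_hint != 0 && now < timeout_hint)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (!envs[i].open)
            continue;
        time_t expires = envs[i].active + envs[i].timeout;
        if (expires <= now) {
            envs[i].open = 0;   /* real code would abort txns, close handles */
            closed++;
        } else if (next == 0 || expires < next) {
            next = expires;     /* remember the earliest remaining expiration */
        }
    }
    timeout_hint = next;
    return closed;
}
```

<p>Because the hint is only recomputed during a full pass, an environment
opened after the hint was computed is not examined until that hint passes.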
|
||||
<p>Each entry in the open handle table has a pointer back to its environment's
|
||||
entry. Every operation within this environment can then update the
|
||||
single environment activity record. Every environment can have a
|
||||
different time-out. The <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server
|
||||
</a>call
|
||||
takes a server time-out value. If this value is 0 then a default
|
||||
(currently 5 minutes) is used. This time-out value is only a hint
|
||||
to the server. It may choose to disregard this value or set the time-out
|
||||
based on its own implementation.
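<p>One way a server might turn the client's hint into the value it actually
uses is sketched below. The 5 minute built-in default comes from the text
above. Everything else is an assumption made for illustration: the function
name <i>effective_timeout</i>, the idea that a server-wide default and
maximum are supplied separately (for instance from the <b>-t</b> and
<b>-T</b> options), and the policy of clamping to the maximum.

```c
/* Built-in default from the text: 5 minutes, expressed in seconds. */
#define DEF_TIMEOUT 300L

/*
 * requested: value sent by the client (0 = client did not specify one)
 * deflt:     server-wide default     (0 = use the built-in default)
 * max:       server-wide maximum     (0 = no cap)
 * The clamping policy here is a guess, not documented behavior.
 */
long effective_timeout(long requested, long deflt, long max)
{
    long t;

    if (requested != 0)
        t = requested;
    else if (deflt != 0)
        t = deflt;
    else
        t = DEF_TIMEOUT;

    if (max != 0 && t > max)
        t = max;
    return t;
}
```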
|
||||
<p>For completeness, the flaws of this time-out implementation should be
|
||||
pointed out. First, it is possible that a client could crash with
|
||||
open handles, and no other requests come in to the server. Therefore
|
||||
the time-out function never gets run and those resources are not released
|
||||
(until a request does come in). Second, this time-out is not exact.
The time-out function relies on its hint: once it computes a hint on one run,
an environment with an earlier time-out might be created before that hint expires.
The result is simply a handle that doesn't get released until that
original hint expires. To illustrate, consider that at the time that
|
||||
the time-out function is run, the youngest time-out is 5 minutes in the
|
||||
future. Soon after, a new environment is opened that has a time-out
|
||||
of 1 minute. If this environment becomes idle (and other operations
|
||||
are going on), the time-out function will not release that environment
|
||||
until the original 5 minute hint expires. This is not a problem since
|
||||
the resources will eventually be released.
|
||||
<p>On a similar note, if a client crashes during an RPC, our reply generates
|
||||
a SIGPIPE, and our server crashes unless we catch it. Using <i>signal(SIGPIPE,
|
||||
SIG_IGN) </i>we can ignore it, and the server will go on. This is
|
||||
a call in our <i>main()</i> function that we write. Eventually
|
||||
this client's handles would be timed out as described above. We need
|
||||
this only for the unfortunate window of a client crashing during the RPC.
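<p>The effect of ignoring SIGPIPE can be seen in a small self-contained
sketch. The <i>signal()</i>, <i>pipe()</i> and <i>write()</i> calls are
standard POSIX; the two function names are made up for this example, and a
pipe with its read end closed stands in for a client that has gone away.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once from our own main(), before entering the RPC service loop. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/*
 * Demonstration helper (hypothetical): write to a pipe whose read end
 * is closed. With SIGPIPE ignored, write() fails with EPIPE instead of
 * killing the process. Returns the errno seen, 0 if the write
 * unexpectedly succeeded, or -1 if the pipe could not be created.
 */
int write_to_dead_peer(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                  /* the "client" goes away */
    if (write(fds[1], "x", 1) < 0)
        err = errno;
    close(fds[1]);
    return err;
}
```

<p>With the signal ignored, the failed reply shows up as an ordinary error
return, and the server keeps running.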
|
||||
<p>The options below are primarily for control of the program itself.
|
||||
Details relating to databases and environments should be passed from the
|
||||
client to the server, since the server can serve many clients, many environments
|
||||
and many databases. Therefore it makes more sense for the client
|
||||
to set the cache size of its own environment, rather than setting a default
|
||||
cachesize on the server that applies as a blanket to any environment it
|
||||
may be called upon to open. Options are:
|
||||
<ul>
|
||||
<li>
|
||||
<b>-t </b> to set the default time-out given to an environment.</li>
|
||||
|
||||
<li>
|
||||
<b>-T</b> to set the maximum time-out allowed for the server.</li>
|
||||
|
||||
<li>
|
||||
<b>-L</b> to log the execution of the server process to a specified file.</li>
|
||||
|
||||
<li>
|
||||
<b>-v</b> to run in verbose mode.</li>
|
||||
|
||||
<li>
|
||||
<b>-M</b> to specify the maximum number of outstanding child server
|
||||
processes/threads we can have at any given time. The default is 10.
|
||||
<b>[We
|
||||
are not yet doing multiple threads/processes.]</b></li>
|
||||
</ul>
|
||||
|
||||
<h2>
|
||||
The Client Code</h2>
|
||||
The client code contains all of the supported functions and methods used
|
||||
in this model. There are several methods in the <i>__db_env
|
||||
</i>and
|
||||
<i>__db</i>
|
||||
structures that currently do not apply, such as the callbacks. Those
|
||||
fields that are not applicable to the client model point to NULL to notify
|
||||
the user of their error. Some method functions, such as the error calls,
remain unchanged as well.
|
||||
<p>The client code contains each method function that goes along with the
|
||||
<a href="#Remote Procedure Calls">RPC
|
||||
calls</a> described elsewhere. The client library also contains its
|
||||
own version of <a href="../docs/api_c/env_create.html">db_env_create()</a>,
|
||||
which does not result in any messages going over to the server (since we
|
||||
do not yet know what server we are talking to). This function sets
|
||||
up the pointers to the correct client functions.
|
||||
<p>All of the method functions that handle the messaging have a basic flow
|
||||
similar to this:
|
||||
<ul>
|
||||
<li>
|
||||
Local arg parsing that may be needed</li>
|
||||
|
||||
<li>
|
||||
Marshalling the message header and the arguments we need to send to the
|
||||
server</li>
|
||||
|
||||
<li>
|
||||
Sending the message</li>
|
||||
|
||||
<li>
|
||||
Receiving a reply</li>
|
||||
|
||||
<li>
|
||||
Unmarshalling the reply</li>
|
||||
|
||||
<li>
|
||||
Local results processing that may be needed</li>
|
||||
</ul>
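<p>The flow above can be sketched without any RPC library at all. In the
real code the marshalling and transport are handled by the stubs that
rpcgen generates; here they are replaced by hand-written stand-ins so the
example is self-contained. The operation number, the message layout, the
status values and every function name below are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire helpers: 32-bit big-endian integers, as XDR uses. */
static size_t put_u32(unsigned char *buf, size_t off, uint32_t v)
{
    buf[off]     = (unsigned char)(v >> 24);
    buf[off + 1] = (unsigned char)(v >> 16);
    buf[off + 2] = (unsigned char)(v >> 8);
    buf[off + 3] = (unsigned char)v;
    return off + 4;
}

static uint32_t get_u32(const unsigned char *buf, size_t off)
{
    return ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
}

/* Transport callback: sends req, fills reply, returns reply length or -1. */
typedef int (*transport_fn)(const unsigned char *req, size_t reqlen,
                            unsigned char *reply, size_t replymax);

#define OP_DB_CLOSE 7u   /* made-up operation number */

/* A client method following the six steps; returns the server's status. */
int client_db_close(transport_fn send_recv, uint32_t dbid, uint32_t flags)
{
    unsigned char req[12], reply[4];
    size_t len = 0;

    if (send_recv == NULL)                      /* 1. local arg checking */
        return -1;
    len = put_u32(req, len, OP_DB_CLOSE);       /* 2. marshal the header */
    len = put_u32(req, len, dbid);              /*    and the arguments  */
    len = put_u32(req, len, flags);
    if (send_recv(req, len, reply, sizeof(reply)) != 4)  /* 3, 4. send, receive */
        return -1;
    return (int)get_u32(reply, 0);              /* 5, 6. unmarshal, return */
}

/* Stand-in "server" for trying the flow without a network. */
static int fake_server(const unsigned char *req, size_t reqlen,
                       unsigned char *reply, size_t replymax)
{
    if (reqlen != 12 || replymax < 4 || get_u32(req, 0) != OP_DB_CLOSE)
        return -1;
    /* Status 0 if the database id is the one we "know", 22 otherwise. */
    put_u32(reply, 0, get_u32(req, 4) == 42u ? 0u : 22u);
    return 4;
}
```

<p>Passing the transport as a function pointer is a choice made only so the
flow can be exercised locally, for example with
<i>client_db_close(fake_server, 42, 0)</i>.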
|
||||
|
||||
<h2>
|
||||
Generated Code</h2>
|
||||
Almost all of the code is generated from a source file describing the interface
|
||||
and an <i>awk</i> script. This awk script generates six (6)
|
||||
files for us. It also modifies one. The files are:
|
||||
<ol>
|
||||
<li>
|
||||
Client file - The C source file created containing the client code.</li>
|
||||
|
||||
<li>
|
||||
Client template file - The C template source file created containing interfaces
|
||||
for handling client-local issues such as resource allocation, but with
|
||||
a consistent interface with the client code generated.</li>
|
||||
|
||||
<li>
|
||||
Server file - The C source file created containing the server code.</li>
|
||||
|
||||
<li>
|
||||
Server template file - The C template source file created containing interfaces
|
||||
for handling server-local issues such as resource allocation, calling into
|
||||
the DB library but with a consistent interface with the server code generated.</li>
|
||||
|
||||
<li>
|
||||
XDR file - The XDR message description file created.</li>
|
||||
|
||||
<li>
|
||||
Server sed file - A sed script that contains commands to apply to the server
|
||||
procedure file (i.e. the real source file that the server template file
|
||||
becomes) so that minor interface changes can be consistently and easily
|
||||
applied to the real code.</li>
|
||||
|
||||
<li>
|
||||
Server procedure file - This is the file that is modified by the sed script
|
||||
generated. It originated from the server template file.</li>
|
||||
</ol>
|
||||
The awk script reads a source file, <i>db_server/rpc.src </i>that describes
|
||||
each operation and what sorts of arguments it takes and what it returns
|
||||
from the server. The syntax of the source file describes the interface
|
||||
to that operation. There are four (4) parts to the syntax:
|
||||
<ol>
|
||||
<li>
|
||||
<b>BEGIN</b> <b><i>function version# codetype</i></b> - begins a new functional
|
||||
interface for the given <b><i>function</i></b>. Each function has
|
||||
a <b><i>version number</i></b>, currently all of them are at version number
|
||||
one (1). The <b><i>code type</i></b> indicates to the awk script
|
||||
what kind of code to generate. The choices are:</li>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<b>CODE </b>- Generate all code, and return a status value. If specified,
|
||||
the client code will simply return the status to the user upon completion
|
||||
of the RPC call.</li>
|
||||
|
||||
<li>
|
||||
<b>RETCODE </b>- Generate all code and call a return function in the client
|
||||
template file to deal with client issues or with other returned items.
|
||||
If specified, the client code generated will call a function of the form
|
||||
<i>__dbcl_&lt;name&gt;_ret()</i>, where
&lt;name&gt; is replaced with the function name given here. This function
|
||||
is placed in the template file because this indicates that something special
|
||||
must occur on return. The arguments to this function are the same
|
||||
as those for the client function, with the addition of the reply message
|
||||
structure.</li>
|
||||
|
||||
<li>
|
||||
<b>NOCLNTCODE - </b>Generate XDR and server code, but no corresponding
|
||||
client code. (This is used for functions that are not named the same thing
on both sides. The only uses of this at the moment are db_env_create
|
||||
and db_create. The environment create call to the server is actually
|
||||
called from the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
||||
method. The db_create code exists elsewhere in the library and we
|
||||
modify that code for the client call.)</li>
|
||||
</ul>
|
||||
|
||||
<li>
|
||||
<b>ARG <i>RPC-type C-type varname [list-type]</i></b>- each line of this
|
||||
describes an argument to the function. The argument is called <b><i>varname</i></b>.
|
||||
The <b><i>C-type</i></b> given is what it should look like in the C code
|
||||
generated, such as <b>DB *, u_int32_t, const char *</b>. The
|
||||
<b><i>RPC-type</i></b>
|
||||
is an indication about how the RPC request message should be constructed.
|
||||
The RPC-types allowed are described below.</li>
|
||||
|
||||
<li>
|
||||
<b>RET <i>RPC-type C-type varname [list-type]</i></b>- each line of this
|
||||
describes what the server should return from this procedure call (in addition
|
||||
to a status, which is always returned and should not be specified).
|
||||
The argument is called <b><i>varname</i></b>. The <b><i>C-type</i></b>
|
||||
given is what it should look like in the C code generated, such as <b>DB
|
||||
*, u_int32_t, const char *</b>. The <b><i>RPC-type</i></b> is an
|
||||
indication about how the RPC reply message should be constructed.
|
||||
The RPC-types are described below.</li>
|
||||
|
||||
<li>
|
||||
<b>END </b>- End the description of this function. The result is
|
||||
that when the awk script encounters the <b>END</b> tag, it now has all
|
||||
the information it needs to construct the generated code for this function.</li>
|
||||
</ol>
|
||||
The <b><i>RPC-type</i></b> must be one of the following:
|
||||
<ul>
|
||||
<li>
|
||||
<b>IGNORE </b>- This argument is not passed to the server and should be
|
||||
ignored when constructing the XDR code. <b>Only allowed for an ARG
specification.</b></li>
|
||||
|
||||
<li>
|
||||
<b>STRING</b> - This argument is a string.</li>
|
||||
|
||||
<li>
|
||||
<b>INT </b>- This argument is an integer of some sort.</li>
|
||||
|
||||
<li>
|
||||
<b>DBT </b>- This argument is a DBT, resulting in its decomposition into
|
||||
the request message.</li>
|
||||
|
||||
<li>
|
||||
<b>LIST</b> - This argument is an opaque list passed to the server (NULL-terminated).
|
||||
If an argument of this type is given, it must have a <b><i>list-type</i></b>
|
||||
specified that is one of:</li>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<b>STRING</b></li>
|
||||
|
||||
<li>
|
||||
<b>INT</b></li>
|
||||
|
||||
<li>
|
||||
<b>ID</b>.</li>
|
||||
</ul>
|
||||
|
||||
<li>
|
||||
<b>ID</b> - This argument is an identifier.</li>
|
||||
</ul>
|
||||
So, for example, the source for the DB->join RPC call looks like:
|
||||
<pre>BEGIN   dbjoin   1   RETCODE
ARG     ID       DB *        dbp
ARG     LIST     DBC **      curs    ID
ARG     IGNORE   DBC **      dbcpp
ARG     INT      u_int32_t   flags
RET     ID       long        dbcid
END</pre>
|
||||
Our first line tells us we are writing the dbjoin function. It requires
|
||||
special code on the client so we indicate that with the RETCODE.
|
||||
This method takes four arguments. For the RPC request we need the
|
||||
database ID from the dbp, we construct a NULL-terminated list of IDs for
|
||||
the cursor list, we ignore the argument to return the cursor handle to
|
||||
the user, and we pass along the flags. On the return, the reply contains
|
||||
a status, by default, and additionally, it contains the ID of the newly
|
||||
created cursor.
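<p>To make the LIST/ID handling concrete, here is a small sketch of how a
client might turn the cursor list into a list of IDs for the request. The
<i>cl_cursor</i> structure, its <i>cl_id</i> field and the function
<i>curs_to_idlist</i> are hypothetical, and the sketch assumes that 0 is
never a valid identifier so that it can serve as the list terminator.

```c
#include <stdlib.h>

/* Hypothetical client-side cursor: only the server-assigned id matters. */
typedef struct {
    long cl_id;
} cl_cursor;

/*
 * Turn a NULL-terminated array of cursor pointers into a newly
 * allocated, 0-terminated array of ids. Stores the number of ids in
 * *countp. Returns NULL on allocation failure; the caller frees the
 * result.
 */
long *curs_to_idlist(cl_cursor **curs, size_t *countp)
{
    size_t n = 0;
    long *ids;

    while (curs[n] != NULL)
        n++;
    ids = malloc((n + 1) * sizeof(*ids));
    if (ids == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        ids[i] = curs[i]->cl_id;
    ids[n] = 0;   /* terminator; assumes 0 is never a valid id */
    *countp = n;
    return ids;
}
```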
|
||||
<h2>
|
||||
Building and Installing</h2>
|
||||
I need to verify with Don Anderson, but I believe we should just build
|
||||
the server program, just like we do for db_stat, db_checkpoint, etc.
|
||||
Basically it can be treated as a utility program from the building and
|
||||
installation perspective.
|
||||
<p>As mentioned early on, in the section on <a href="#DB Modifications">DB
|
||||
Modifications</a>, we have a single library, and the user accesses
the client portion by passing a flag to <a href="../docs/api_c/env_create.html">db_env_create()</a>.
|
||||
The Makefile is modified to include the new files.
|
||||
<p>Testing is performed in two ways. First I have a new example program,
|
||||
that should become part of the example directory. It is basically
|
||||
a merging of ex_access.c and ex_env.c. This example is adequate to
test basic functionality, as it just does database put/get calls and
|
||||
appropriate open and close calls. However, in order to test the full
|
||||
set of functions a more generalized scheme is required. For the moment,
|
||||
I am going to modify the Tcl interface to accept the server information.
|
||||
Nothing else should need to change in Tcl. Then we can either write
|
||||
our own test modules or use a subset of the existing ones to test functionality
|
||||
on a regular basis.
|
||||
</body>
|
||||
</html>
|
||||