It is common for managers to want to operate systems using some available, documented set of “best practices.” Documenting that sort of thing is essential to ISO 9000, ISO 9001, and other organizational certifications.
It is worthwhile to preface a discussion of “best practices” by mentioning that each organization that uses Slony-I is unique, and there may be a need for local policies to reflect unique local operating characteristics. It is for that reason that Slony-I does not impose its own policies for such things as failover; those will need to be determined based on the overall shape of your network, of your set of database servers, and of your usage patterns for those servers.
There are, however, a number of things that early adopters of Slony-I have discovered which can at least help to suggest the sorts of policies you might want to consider.
Slony-I is a complex multi-client, multi-server system, with the result that there is an almost innumerable set of places where problems can arise.
As a natural result, maintaining a clean, consistent environment is really valuable, as any sort of environmental “messiness” can either cause unexpected problems or mask the real problem.
Numerous users have reported problems resulting from mismatches between Slony-I versions, local libraries, and PostgreSQL libraries. Details count: you need to be clear on what hosts are running what versions of what software.
This is normally a matter of being disciplined about how your software is deployed; the challenges are a natural consequence of running a distributed system composed of a large number of components that need to match.
If a slonik script does not run as expected on a first attempt, it would be foolhardy to run it again until the problem has been found and resolved.
There are a very few slonik commands, such as STORE PATH, that behave in a nearly idempotent manner; if you run STORE PATH again, it merely updates the sl_path table with the same value.
In contrast, SUBSCRIBE SET behaves in two very different ways depending on whether or not the subscription has already been activated; if initiating the subscription did not work on the first attempt, simply submitting the request again will not make it happen.
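As a hedged illustration (the cluster name, node numbers, and conninfo strings below are placeholders, not taken from this document), the first command can safely be re-run, while the second should not be blindly resubmitted:

    slonik <<'EOF'
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';

    # Safe to re-run: simply rewrites the same row in sl_path.
    store path (server = 1, client = 2, conninfo = 'dbname=mydb host=db1 user=slony');

    # Not safe to blindly re-run: if the first attempt failed, find out why first.
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    EOF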
Principle: Use an unambiguous, stable time zone such as UTC or GMT.
Users have run into problems getting slon to function properly when their system uses a time zone that PostgreSQL is unable to recognize, such as CUT0 or WST. It is necessary that you use a time zone that PostgreSQL can recognize correctly. It is furthermore preferable to use a time zone where times do not shift around due to Daylight Savings Time.
The “geographically unbiased” choice seems to be TZ=UTC or TZ=GMT, and to make sure that systems are “in sync” by using NTP to synchronize clocks throughout the environment.

See also Section 3.4, “Time Synchronization”.
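A minimal sketch of a slon start-up wrapper that pins the time zone (the cluster name and conninfo string are illustrative assumptions):

    #!/bin/sh
    # Run slon under an unambiguous time zone, regardless of the host's default.
    TZ=UTC
    export TZ
    exec slon mycluster 'dbname=mydb host=db1 user=slony'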
Principle: Long running transactions are Evil
The FAQ has an entry on the growth of pg_listener which discusses this in a fair bit of detail; the long and short of it is that long running transactions have numerous ill effects. They are particularly troublesome on an “origin” node, holding onto locks, preventing vacuums from taking effect, and the like.
In version 1.2, some of the “evils” should be lessened, because:
Events in pg_listener are only generated when replication updates are relatively infrequent, which should mean that busy systems won't generate many dead tuples in that table.

The system will periodically rotate between the two log tables, sl_log_1 and sl_log_2 (using TRUNCATE to clean out the old table), preventing unbounded growth of dead space there.
Failover policies should be planned for ahead of time.
This may simply involve thinking through the priority list of which node should fail over to which, as opposed to trying to automate it. But knowing what to do ahead of time cuts down on the number of mistakes made.
At Afilias, a variety of internal [The 3AM Unhappy DBA's Guide to...] guides have been created to provide checklists of what to do when certain “unhappy” events take place. This sort of material is highly specific to the environment and the set of applications running there, so you would need to generate your own such documents. This is one of the vital components of any disaster recovery preparations.
MOVE SET should be used for preventative maintenance, to keep problems from becoming serious enough to require failover.
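A minimal slonik sketch of shifting a set's origin to another node ahead of maintenance (set and node numbers and conninfo strings are illustrative assumptions):

    slonik <<'EOF'
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';

    # Take the lock on the current origin, then shift the origin to node 2.
    lock set (id = 1, origin = 1);
    move set (id = 1, old origin = 1, new origin = 2);
    EOF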
VACUUM policy needs to be carefully defined. As mentioned above, “long running transactions are Evil.” VACUUMs are no exception: a VACUUM on a huge table will open a long-running transaction with all the known ill effects.
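One way to keep such transactions short is to vacuum busy tables individually rather than issuing a database-wide VACUUM; a hedged sketch follows (the database name and the _mycluster namespace are placeholders):

    # Vacuum the heavily-churned Slony-I tables one at a time so that no
    # single transaction stays open for hours on a huge table.
    psql -d mydb -c 'vacuum analyze _mycluster.sl_log_1;'
    psql -d mydb -c 'vacuum analyze _mycluster.sl_seqlog;'
    psql -d mydb -c 'vacuum analyze pg_catalog.pg_listener;'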
Running all of the slon daemons on a central server for each network has proven preferable.
Each slon should run on a host on the same local network as the node that it is servicing, as it does a lot of communication with its database, and that connection needs to be as reliable as possible.
In theory, the “best” speed might be expected to come from running the slon on the database server that it is servicing.
In practice, strewing slon processes and configuration across a dozen servers turns out to be inconvenient to manage.
slon processes should run in the same “network context” as the node that each is responsible for managing so that the connection to that node is a “local” one. Do not run such links across a WAN.
A WAN outage can leave database connections “zombied”, and typical TCP/IP behaviour will allow those connections to persist, preventing a slon restart for around two hours.
It is not difficult to remedy this; you need only kill (SIGINT) the offending backend connection. But by running the slon locally, you will generally not be vulnerable to this condition.
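A hedged sketch of clearing such a zombied connection by hand (the process id is illustrative; identify the actual backend via pg_stat_activity first):

    # Find the stuck slon backend, then interrupt just that one process.
    psql -d mydb -c 'select procpid, usename, current_query from pg_stat_activity;'
    kill -INT 12345    # pid taken from the query above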
Before getting too excited about having fallen into some big problem, consider killing and restarting all the slon processes. Historically, this has frequently been able to resolve “stickiness.”
With a very few exceptions, it is generally not a big deal to kill off and restart the slon processes. Each slon connects to the one database for which it is the manager, and then connects to other databases as needed to draw in events. If you kill off a slon, all you do is interrupt those connections. If a SYNC or other event is sitting there half-processed, there's no problem: the transaction will roll back, and when the slon restarts, it will restart that event from scratch.
The exception, where it is undesirable to restart a slon, is where a COPY_SET is running on a large replication set, such that stopping the slon may discard several hours' worth of load work.
In early versions of Slony-I, it was frequently the case that connections could get a bit “deranged” which restarting slons would clean up. This has become much more rare, but it has occasionally proven useful to restart the slon. If there has been any “network derangement”, this can clear up the issue of defunct database connections.
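A hedged sketch of that kill-and-restart cycle for a single node's slon (cluster name, conninfo, and log path are illustrative; many installations wrap this in their own start/stop scripts):

    # Stop any slon managing this cluster on this host, then start it again;
    # a half-processed event simply rolls back and is re-run on restart.
    pkill -f 'slon mycluster' || true
    sleep 5
    slon mycluster 'dbname=mydb host=db1 user=slony' >> /var/log/slon-node1.log 2>&1 &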
The Database Schema Changes section outlines some practices that have been found useful for handling changes to database schemas.
Handling of Primary Keys
As discussed in the section on Replication Sets, it is ideal if each replicated table has a true primary key constraint; it is also acceptable to use a “candidate primary key.”
It is not recommended that a Slony-I-defined key (created via TABLE ADD KEY) be used to introduce a candidate primary key, as the introduced unique index makes it possible for updates to the table to fail; Slony-I would thereby be introducing a new failure mode for your application.
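As a hedged illustration (table, column, index, and cluster names are placeholders), either give the table a genuine primary key before adding it to a set, or name an existing unique, not-null index as the candidate key:

    # Preferred: give the table a genuine primary key before replicating it.
    psql -d mydb -c 'alter table public.mytable add primary key (id);'

    # Alternative: name an existing unique, NOT NULL index as the candidate key.
    slonik <<'EOF'
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    set add table (set id = 1, origin = 1, id = 10,
                   fully qualified name = 'public.mytable',
                   key = 'mytable_uniq_idx');
    EOF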
The section on Grouping tables into sets suggests strategies for determining how to group tables and sequences into replication sets.
It should be obvious that actions that can delete a lot of data should be taken with great care; the section on Dropping things from Slony-I Replication discusses the different sorts of “deletion” that Slony-I supports.
Certain Slony-I operations, notably set add table, move set, lock set, and execute script, require acquiring exclusive locks on the tables being replicated.
Depending on the kind of activity on the databases, this may or may not have the effect of requiring a (hopefully brief) database outage.
What to do about DDL.
Slony-I operates by detecting updates to table data via triggers that are attached to those tables. That means that it will not notice updates that take place via methods that do not fire those triggers. ALTER TABLE, CREATE OR REPLACE FUNCTION, and CREATE TABLE all represent SQL requests that Slony-I has no way to notice.
A philosophy underlying Slony-I's handling of this is that competent system designers do not write self-modifying code, and database schemas that get modified by the application are an instance of this. Slony-I therefore does not try hard to make it convenient to modify database schemas.
There will be cases where that is necessary, so execute script is provided, which will apply DDL changes at the same location in the transaction stream on all servers.
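A hedged sketch of applying a DDL change through execute script (the script path, set and node numbers, and conninfo are illustrative assumptions):

    slonik <<'EOF'
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';

    # Apply /tmp/ddl_change.sql at the same point in the replication
    # stream on every node subscribing to the set.
    execute script (set id = 1, filename = '/tmp/ddl_change.sql', event node = 1);
    EOF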
Unfortunately, this introduces a great deal of locking of database objects. Altering tables requires taking out an exclusive lock on them; doing so via execute script requires that Slony-I take out an exclusive lock on all replicated tables. This can prove quite inconvenient when applications are running; you run into deadlocks and such.
One particularly dogmatic position that some hold is that all schema changes should always be propagated using execute script. This guarantees that nodes will be consistent, but the costs of locking and deadlocking may be too high for some users.
At Afilias, our approach has been less dogmatic; there are sorts of changes that must be applied using execute script, but we apply others independently.
Changes that must be applied using execute script
All instances of ALTER TABLE
Changes that are not normally applied using execute script
CREATE INDEX
CREATE TABLE
Tables that are not being replicated do not require Slony-I “permission”.
CREATE OR REPLACE FUNCTION
Typically, new versions of functions may be deployed without Slony-I being “aware” of them. The obvious exception is when a new function is being deployed to accommodate a table alteration; in that case, the new version must be added in a manner synchronized with the execute script for the table alteration.
Similarly, CREATE TYPE, CREATE AGGREGATE, and such will commonly not need to be forcibly applied in a “perfectly synchronized” manner across nodes.
Security management, such as CREATE USER, CREATE ROLE, GRANT, and the like, is largely irrelevant to Slony-I, as it runs as a “superuser”.
Indeed, we have frequently found it useful to have different security arrangements on different nodes. Access to the “master” node should be restricted to applications that truly need access to it; “reporting” users commonly are restricted much more there than on subscriber nodes.
It has proven useful to define a slony user for use by Slony-I, as distinct from a generic postgres or pgsql user.
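A hedged sketch of creating such a dedicated role (PostgreSQL 8.1 or later syntax shown; run this as an existing superuser):

    # Create a dedicated superuser for Slony-I, distinct from the generic
    # postgres/pgsql account used for backups and other maintenance.
    psql -d template1 -c 'create user slony with superuser;'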
If all sorts of automatic “maintenance” activities, such as vacuuming and performing backups, are performed under the “ownership” of a single PostgreSQL user, it turns out to be pretty easy to run into deadlock problems.
For instance, a series of vacuums that unexpectedly run against a database that has a large SUBSCRIBE_SET event under way may run into a deadlock which would roll back several hours' worth of data copying work.
If, instead, different maintenance roles are performed by different users, you may, during vital operations such as SUBSCRIBE_SET, lock out other users at the pg_hba.conf level, only allowing the slony user in, which substantially reduces the risk of problems while the subscription is in progress.
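A hedged sketch of such a temporary lockout (the data directory path and network addresses are illustrative; keep a copy of the normal rules to restore once the subscription completes):

    # Keep a copy of the normal rules, then only let the slony user in
    # while the subscription is running.
    cp /var/lib/pgsql/data/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf.normal
    cat > /var/lib/pgsql/data/pg_hba.conf <<'EOF'
    host  all  slony  10.0.0.0/24  md5
    host  all  all    0.0.0.0/0    reject
    EOF
    pg_ctl reload -D /var/lib/pgsql/data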
Path configuration
The section on Path Communications discusses the issues surrounding what network connections need to be in place in order for Slony-I to function.
The section on listen paths discusses the issues surrounding the table sl_listen.
As of Slony-I 1.1, its contents are computed automatically based on the communications information available to Slony-I which should alleviate the problems found in earlier versions where this had to be configured by hand. Many seemingly inexplicable communications failures, where nodes failed to talk to one another even though they technically could, were a result of incorrect listen path configuration.
Use test_slony_state.pl to look for configuration problems.
This is a Perl script which connects to a Slony-I node and then rummages through Slony-I configuration looking for quite a variety of conditions that tend to indicate problems, including:
Bloating of some config tables
Analysis of listen paths
Analysis of event propagation and confirmation
If replication mysteriously “isn't working”, this tool can run through many of the possible problems for you.
Configuring slon
As of version 1.1, slon configuration may be drawn either from the command line or from configuration files. “Best” practices have yet to emerge from the two options:
Configuration via command line options
This approach has the merit that all the options that are active are visible in the process environment. (And if there are a lot of them, they may be a nuisance to read.)
Unfortunately, if you invoke slon from the command line, you could forget to include log shipping configuration and thereby destroy the sequence of logs for a log shipping node.
Configuration via a configuration file

Unlike when command line options are used, the active options are not visible in the process environment. They can only be inferred from the name and/or contents of the slon configuration file, and will not reflect subsequent changes to the configuration file.

On the other hand, by putting the options in a file, you won't forget to include any of them, so this approach is safer for log shipping.
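A hedged sketch of the configuration-file approach (file paths, parameter values, and the conninfo string are illustrative assumptions):

    # /etc/slon/node1.conf -- one file per node keeps the options,
    # including any log shipping settings, from being forgotten.
    cat > /etc/slon/node1.conf <<'EOF'
    cluster_name = 'mycluster'
    conn_info = 'dbname=mydb host=db1 user=slony'
    pid_file = '/var/run/slon-node1.pid'
    archive_dir = '/var/lib/slony/archive'   # log shipping spool, if used
    EOF
    slon -f /etc/slon/node1.conf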
Things to do when subscribing nodes
When a new node is running the COPY_SET event for a large replication set (e.g. one which takes several hours to subscribe), it has been found desirable to lock all users other than the slony user out of the new subscriber because:
Applications will run into partially-copied, half-baked data that is not totally consistent.
It is possible for applications (and maintenance scripts) to submit combinations of queries that will get the system into a deadlock situation, thereby terminating the COPY_SET event and requiring the subscription to start over again.
It may be worth considering turning the PostgreSQL fsync functionality off during the copying of data, as this will improve performance, and if the database “falls over” during the COPY_SET event, you will be restarting the copy of the whole replication set anyway.
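A hedged sketch of temporarily disabling fsync on the subscriber being built (the data directory path is an illustrative assumption; remember to remove the setting and reload once the copy completes):

    # Turn fsync off only on the subscriber being built; if it crashes
    # mid-copy, the whole set has to be copied again anyway.
    echo 'fsync = off' >> /var/lib/pgsql/data/postgresql.conf
    pg_ctl reload -D /var/lib/pgsql/data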
Managing use of slonik
The notes on Using Slonik describe some of the lessons learned from managing large numbers of slonik scripts.
Notable principles that have fallen out of generating many slonik scripts are that:
Using “preamble” files is highly recommended, as it means that you reuse heavily-verified preambles over and over (see the sketch after this list).
Any opportunity that you have to automatically generate configuration, whether by drawing it from a database or by using a script that generates repetitively similar elements, will help prevent human error.
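A hedged sketch combining both ideas, a single heavily-verified preamble file reused by every administrative script (file names, node numbers, and conninfo strings are illustrative):

    # preamble.slonik -- written once, reviewed carefully, reused everywhere.
    cat > preamble.slonik <<'EOF'
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';
    EOF

    # Each administrative script is just the preamble plus the commands.
    cat preamble.slonik my_change.slonik | slonik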
Handling Very Large Replication Sets
Some users have set up replication on replication sets that are tens to hundreds of gigabytes in size, which puts some added “strain” on the system, in particular where it may take several days for the COPY_SET event to complete. Here are some principles that have been observed for dealing with these sorts of situations.
Drop all indices other than the primary key index while the COPY_SET event is run.
When data is copied into a table that has indices on it, PostgreSQL builds the indices incrementally, on the fly. This is much slower than simply copying the data into the table, and then recreating each index “ex nihilo”, as the latter can take substantial advantage of sort memory.
In Slony-I version 1.1.5 and later, indices are dropped and recreated automatically, which makes this manual practice unnecessary.
If there are large numbers of updates taking place as the large set is being copied, this can lead to the subscriber being behind by some enormous number of SYNC events.
If a SYNC is generated about once per second, that leads to the subscriber “falling behind” by around 90,000 SYNCs per day, possibly for several days.
There will correspondingly be an enormous growth of sl_log_1 and sl_seqlog. Unfortunately, once the COPY_SET completes, users have found that the queries against these tables wind up reverting to Seq Scans, so that even though a particular SYNC processing event is only processing a small number of those 90,000 SYNC events, it still reads through the entire table. In such a case, you may never see replication catch up.
Several things can be done that will help, involving careful selection of slon parameters (a combined configuration sketch follows this list):
Ensure that there exists, on the “master” node, an index on sl_log_1(log_xid). If it doesn't exist because the Slony-I instance was set up before version 1.1.1, see slony1_base.sql for the exact form that the index setup should take.
In 1.2, there is a process that runs automatically to add partial indexes by origin node number, which should be the optimal form for such an index to take.
On the subscriber's slon, increase the number of SYNC events processed together, using the slon_conf_sync_group_maxsize parameter, to some value that allows it to process a significant portion of the outstanding SYNC events.
On the subscriber's slon, set desired_sync_time to 0, as the adaptive SYNC grouping system will start with small groupings that will, under these circumstances, perform poorly.
Increase the slon_conf_sync_interval on the origin's slon so that SYNC events are generated less frequently. If a SYNC is only generated once per minute instead of once per second, that will cut down the number of events by a factor of 60.
It is likely to be worthwhile to use slon_conf_vac_frequency to deactivate slon-initiated vacuuming in favor of running your own vacuum scripts, as there will be a buildup of unpurgeable data while the data is copied and the subscriber starts to catch up.
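A hedged sketch consolidating those parameter suggestions as slon configuration file fragments (the values shown are illustrative starting points only, not recommendations from this document):

    # Added to the subscriber's existing slon configuration file:
    # group many SYNCs together, disable adaptive grouping, and leave
    # vacuuming to your own scripts while catching up.
    sync_group_maxsize = 500
    desired_sync_time = 0
    vac_frequency = 0

    # Added to the origin's slon configuration file: generate SYNC events
    # less often (value in milliseconds; sync_interval_timeout may also
    # need raising to match).
    sync_interval = 60000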