The programs that actually perform Slony-I replication are the slon daemons.
You need to run one slon instance for each node in a Slony-I cluster, whether you consider that node a “master” or a “slave”. On Windows™, when running as a service, things are slightly different: one slon service is installed, and a separate configuration file is registered for each node to be serviced by that machine. The main service then manages the individual slons itself. Since a MOVE SET or FAILOVER can switch the roles of nodes, slon needs to be able to function for both providers and subscribers. It is not essential that these daemons run on any particular host, but there are some principles worth considering:
Each slon needs to be able to communicate quickly with the database whose “node controller” it is. Therefore, if a Slony-I cluster runs across some form of Wide Area Network, each slon process should run on, or near, the database it is controlling. If you break this rule, no particular disaster should ensue, but the added latency in monitoring events on the slon's “own node” will cause it to replicate in a somewhat less timely manner.
The very fastest results would be achieved by having each slon run on the database server that it is servicing. If it runs somewhere within a fast local network, performance will not be noticeably degraded.
It is an attractive idea to run many of the slon processes for a cluster on one machine, as this makes it easy to monitor them, both via log files and via the process table, from one location. This also eliminates the need to log in to several hosts in order to look at log files or to restart slon instances.
If at all possible, do not run a slon responsible for servicing a particular node across a WAN link. Any problem with that link can kill the connection whilst leaving “zombied” database connections on the node that (typically) will not die off for around two hours. This prevents starting up another slon, as described in the FAQ under multiple slon connections.
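As an illustration of running one slon per node close to its database, a minimal invocation might look like the following sketch; the cluster name, connection strings, and debug level are placeholder assumptions rather than values from any particular installation.

    # On (or near) the host serving node 1 of the assumed cluster "mycluster":
    slon -d 2 mycluster "dbname=mydb host=node1.example.com user=slony" &

    # On (or near) the host serving node 2:
    slon -d 2 mycluster "dbname=mydb host=node2.example.com user=slony" &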
Historically, slon processes have been fairly fragile, dying if they encountered just about any significant error. This behaviour made it necessary to run some form of “watchdog” to make sure that if one slon fell over, it would be replaced by another.
There are two “watchdog” scripts currently available in the Slony-I source tree:
tools/altperl/slon_watchdog - an “early” version that basically wraps a loop around the invocation of slon, restarting any time it falls over
tools/altperl/slon_watchdog2 - a somewhat more intelligent version that periodically polls the database, checking to see if a SYNC has taken place recently. We have had VPN connections that occasionally fall over without signalling the application, so that the slon stops working but doesn't actually die; this polling addresses that issue.
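As a rough sketch of what the first, simpler watchdog does (this is not the actual tools/altperl script, and the cluster name, connection string, and log path are assumptions), a restart loop can be as simple as:

    #!/bin/sh
    # Restart slon whenever it exits; all values here are placeholders.
    CLUSTER=mycluster
    CONNINFO="dbname=mydb host=node1.example.com user=slony"
    LOG=/var/log/slon_node1.log

    while true; do
        slon -d 2 "$CLUSTER" "$CONNINFO" >> "$LOG" 2>&1
        echo "`date`: slon exited, restarting in 10 seconds" >> "$LOG"
        sleep 10
    done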
The slon_watchdog2 script is usually the preferable one to run. At one point it was not advisable to run it whilst subscribing a very large replication set, where the initial COPY SET was expected to take many hours: the script would decide that, since no SYNC had taken place in two hours, something was broken and slon needed restarting, thereby restarting the COPY SET event. More recently, the script has been changed to detect a COPY SET in progress.
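The polling approach can be sketched, under the assumption of a cluster named “mycluster” (so its schema is _mycluster) and placeholder connection parameters, as a check of how long ago the last SYNC event was recorded on the node; the real slon_watchdog2 logic differs in its details:

    # How long ago was the most recent SYNC event recorded on this node?
    # Schema name, database, host, and user are assumptions.
    psql -At -h node1.example.com -U slony -d mydb -c \
      "SELECT now() - max(ev_timestamp) FROM _mycluster.sl_event WHERE ev_type = 'SYNC';"
    # If the reported age exceeds the watchdog's threshold (e.g. two hours) and
    # no COPY SET is in progress, the watchdog restarts the slon.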
In Slony-I version 1.2, the structure of the slon has been revised fairly substantially to make it much less fragile. The main process should only die off if you expressly signal it, asking it to be killed.
A new approach is available in the Section 19.3, “launch_clusters.sh” script, which uses slon configuration files and which may be invoked as part of your system startup process.
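As a sketch of what such a per-node slon configuration file might contain (the file path, cluster name, and connection parameters are assumptions, and only the two essential settings are shown):

    # Sketch of a per-node slon configuration file; path and values are assumptions.
    cat > /etc/slony1/node1.conf <<'EOF'
    # Name of the replication cluster this slon serves
    cluster_name='mycluster'
    # Connection to the node this slon acts as node controller for
    conn_info='dbname=mydb host=node1.example.com user=slony'
    EOF

    # A configuration file like this can also be passed to slon directly:
    slon -f /etc/slony1/node1.conf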