slon — Slony-I daemon
slon [option ...] [clustername] [conninfo]
slon is the daemon application that “runs” Slony-I replication. A slon instance must be run for each node in a Slony-I cluster.
-d debuglevel
The log_level specifies which Debug levels slon should display when logging its activity.
The eight levels of logging are:
Error
Warn
Config
Info
Debug1
Debug2
Debug3
Debug4
The non-debugging log levels (Error, Warn, Config, and Info) are always displayed in the logs. If log_level is set to 2 (a routine and seemingly preferable choice), output at debug levels 1 and 2 will also be displayed.
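As a minimal sketch (the cluster name and conninfo string below are placeholders), a slon might be started with extra debugging output like this:
    # run slon at log level 2, so Debug1 and Debug2 messages are also shown
    slon -d 2 mycluster "dbname=mydb host=node1.example.com user=slony"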
-s SYNC check interval
The sync_interval, measured in milliseconds, indicates how often slon should check to see if a SYNC should be introduced. Default is 10000 ms. The main loop in sync_Thread_main() sleeps for intervals of sync_interval milliseconds between iterations.
Short sync check intervals keep the origin on a “short leash”, updating its subscribers more frequently. If you have replicated sequences that are frequently updated without any tables being affected, this prevents long stretches in which only sequences have changed and therefore no SYNCs take place.
If the node is not an origin for any replication set, so no updates are coming in, it is somewhat wasteful for this value to be much less than the sync_interval_timeout value.
-t SYNC interval timeout
At the end of each sync_interval_timeout period, a SYNC will be generated on the “local” node even if no replicatable data has been updated that would have pushed out a SYNC.
Default, and maximum, is 60000 ms, so you can expect each node to “report in” with a SYNC once each minute.
Note that SYNC events are also generated on subscriber nodes. Since they are not actually generating any data to replicate to other nodes, these SYNC events are of little value.
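As an illustrative sketch of how -s and -t combine (cluster name and conninfo are placeholders), a busy origin node might check for SYNC-worthy activity every second while keeping the one-minute timeout:
    # check for SYNC-worthy activity every 1000 ms; force a SYNC at least every 60000 ms
    slon -s 1000 -t 60000 mycluster "dbname=mydb host=origin.example.com user=slony"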
-g group size
This controls the maximum SYNC group size, sync_group_maxsize; defaults to 6. Thus, if a particular node is behind by 200 SYNCs, it will try to group them together into groups of a maximum size of sync_group_maxsize. This can be expected to reduce transaction overhead due to having fewer transactions to COMMIT.
The default of 6 is probably suitable for small systems that can devote only very limited bits of memory to slon. If you have plenty of memory, it would be reasonable to increase this, as it will increase the amount of work done in each transaction, and will allow a subscriber that is behind by a lot to catch up more quickly.
Slon processes usually stay pretty small; even with a large value for this option, slon would be expected to grow to only a few MB in size.
The big advantage in increasing this parameter comes from cutting down on the number of transaction COMMITs; moving from 1 to 2 will provide considerable benefit, but the benefits will progressively fall off once the transactions being processed get to be reasonably large. There isn't likely to be a material difference in performance between 80 and 90; at that point, whether “bigger is better” will depend on whether the bigger set of SYNCs makes the LOG cursor behave badly due to consuming more memory and requiring more time to sort.
In Slony-I version 1.0, slon will always attempt to group SYNCs together to this maximum, which won't be ideal if replication has been somewhat destabilized by there being very large updates (e.g. a single transaction that updates hundreds of thousands of rows) or by SYNCs being disrupted on an origin node with the result that there are a few SYNCs that are very large. You might run into the problem that grouping together some very large SYNCs knocks over a slon process. When it picks up again, it will try to process the same large grouped set of SYNCs, and run into the same problem over and over until an administrator interrupts this and changes the -g value to break this “deadlock.”
In Slony-I version 1.1 and later, the slon instead adaptively “ramps up” from doing 1 SYNC at a time towards the maximum group size. As a result, if there are a couple of SYNCs that cause problems, the slon will (with any relevant watchdog assistance) always be able to get to the point where it processes the troublesome SYNCs one by one, hopefully making operator assistance unnecessary.
-o desired sync time
A “maximum” time planned for grouped SYNCs.
If replication is running behind, slon will gradually increase the number of SYNCs grouped together, targeting that (based on the time taken for the last group of SYNCs) they shouldn't take more than the specified desired_sync_time value.
The default value for desired_sync_time is 60000 ms, equal to one minute.
That way, you can expect (or at least hope!) that you'll get a COMMIT roughly once per minute.
It isn't totally predictable, as it is entirely possible for someone to request a very large update, all as one transaction, that can “blow up” the length of the resulting SYNC to be nearly arbitrarily long. In such a case, the heuristic will back off for the next group.
The overall effect is to improve Slony-I's ability to cope with variations in traffic. By starting with 1 SYNC, and gradually moving to more, even if there turn out to be variations large enough to cause PostgreSQL backends to crash, Slony-I will back down to starting with one sync at a time, if need be, so that if it is at all possible for replication to progress, it will.
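As a hedged sketch of how these two options interact (cluster name, conninfo, and the specific numbers are placeholders), a subscriber that regularly falls far behind could be allowed larger groups while still aiming for roughly one COMMIT per minute:
    # group up to 20 SYNCs at a time, but target no more than 60000 ms per grouped transaction
    slon -g 20 -o 60000 mycluster "dbname=mydb host=subscriber.example.com user=slony"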
-c cleanup cycles
The value vac_frequency indicates how often to VACUUM in cleanup cycles.
Set this to zero to disable slon-initiated vacuuming. If you are using something like pg_autovacuum to initiate vacuums, you may not need slon to initiate vacuums itself. If you are not, there are some tables Slony-I uses that collect a lot of dead tuples and should be vacuumed frequently, notably pg_listener.
In Slony-I version 1.1, this changes a little; the cleanup thread tracks, from iteration to iteration, the earliest transaction ID still active in the system. If this doesn't change from one iteration to the next, then an old transaction is still active, and therefore a VACUUM will do no good. The cleanup thread instead merely does an ANALYZE on these tables to update the statistics in pg_statistic.
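For instance (a sketch with placeholder cluster name and conninfo), on a node where autovacuum already handles the Slony-I tables, slon-initiated vacuuming can be switched off:
    # a vac_frequency of 0 disables VACUUMs issued by slon itself
    slon -c 0 mycluster "dbname=mydb host=node2.example.com user=slony"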
-p PID filename
pid_file
contains the filename in which the PID
(process ID) of the slon is stored.
This may make it easier to construct scripts to monitor multiple slon processes running on a single host.
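The following is only a sketch, assuming a hypothetical PID file location of /var/run/slon-mycluster.pid and placeholder cluster/conninfo values; it shows the sort of monitoring script the PID file makes easy to write:
    # start slon, recording its PID in a known location
    slon -p /var/run/slon-mycluster.pid mycluster "dbname=mydb user=slony" &

    # later, check whether that slon is still alive
    if kill -0 "$(cat /var/run/slon-mycluster.pid)" 2>/dev/null; then
        echo "slon for mycluster is running"
    else
        echo "slon for mycluster is NOT running"
    fi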
-f config file
File from which to read slon configuration.
This configuration is discussed further in Slon Run-time Configuration. If there is to be a complex set of configuration parameters, or if there are parameters you do not wish to be visible in the process environment (such as passwords), it may be convenient to draw many or all parameters from a configuration file. You might put common parameters for all slon processes in a commonly-used configuration file, allowing the command line to specify little other than the connection info. Alternatively, you might create a configuration file for each node.
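A minimal sketch, assuming a hypothetical file /etc/slon/node1.conf whose parameter names follow Slon Run-time Configuration (verify them against that section): the file might contain
    cluster_name='mycluster'
    conn_info='dbname=mydb host=node1.example.com user=slony password=secret'
    log_level=2
    sync_interval=1000
and the daemon would then be started with all parameters drawn from the file:
    slon -f /etc/slon/node1.conf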
-a archive directory
archive_dir indicates a directory in which to place a sequence of SYNC archive files for use in log shipping mode.
-x command to run on log archive
command_on_logarchive indicates a command to be run each time a SYNC file is successfully generated.
See more details on slon_conf_command_on_log_archive.
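A sketch of a log shipping setup, with placeholder paths, names, and a hypothetical shipping script; whether and how the archive file name is passed to the command is described under slon_conf_command_on_log_archive and should be checked there:
    # write SYNC archive files into /var/lib/slony/archive and run a shipping
    # script after each one is generated
    slon -a /var/lib/slony/archive -x /usr/local/bin/ship_logs.sh \
        mycluster "dbname=mydb host=node1.example.com user=slony"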
-q quit based on SYNC provider
quit_sync_provider indicates which provider's worker thread should be watched in order to terminate after a certain event. This must be used in conjunction with the -r option below...
This allows you to have a slon stop replicating after a certain point.
-r quit at event number
quit_sync_finalsync indicates the event number after which the remote worker thread for the provider above should terminate. This must be used in conjunction with the -q option above...
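As a hedged sketch (the node number, event number, and connection details are purely illustrative), the two options are used together to have a slon stop once it has processed a particular event from a particular provider:
    # stop replicating after processing event 5000017 from provider node 1
    slon -q 1 -r 5000017 mycluster "dbname=mydb host=subscriber.example.com user=slony"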
-l lag interval
lag_interval indicates an interval value such as 3 minutes or 4 hours or 2 days, specifying that this node is to lag its providers by that interval of time. This causes events to be ignored until they reach the age corresponding to the interval.
There is a concomitant downside to this lag; events that require all nodes to synchronize, as typically happens with FAILOVER and MOVE SET, will have to wait for this lagging node.
That might not be ideal behaviour at failover time, or at the time when you want to run EXECUTE SCRIPT.
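A small sketch (names are placeholders; the interval is quoted as any shell argument containing spaces would be):
    # keep this node one day behind its providers
    slon -l '1 day' mycluster "dbname=mydb host=lagged.example.com user=slony"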