librsync  2.0.2
# rdiff {#rdiff}

Introduction
============

*rdiff* is a program to compute and apply network deltas. An *rdiff
delta* is a delta between binary files, describing how a *basis* (or
*old*) file can be automatically edited to produce a *result* (or *new*)
file.

Unlike most diff programs, rdiff does not require access to both of
the files when the diff is computed. Computing a delta requires just a
short "signature" of the old file and the complete contents of the new
file. The signature contains checksums for blocks of the old file. Using
these checksums, rdiff finds matching blocks in the new file, and then
computes the delta.

rdiff deltas are usually less compact and also slower to produce than
xdeltas or regular text diffs. If it is possible to have both the old
and new files present when computing the delta,
[xdelta](http://www.xcf.berkeley.edu/~jmacd/xdelta.html) will generally
produce a much smaller file. If the files being compared are plain text,
then GNU [diff](http://www.gnu.org/software/diffutils/diffutils.html) is
usually a better choice, as the diffs can be viewed by humans and
applied as inexact matches.

rdiff comes into its own when it is not convenient to have both files
present at the same time. One example is when the two files are
on separate machines, and you want to transfer only the differences.
Another example is when one of the files has been moved to archive or
backup media, leaving only its signature.

Symbolically:

> signature(*basis-file*) -> *sig-file*
>
> delta(*sig-file*, *new-file*) -> *delta-file*
>
> patch(*basis-file*, *delta-file*) -> *recreated-file*

rdiff signatures and deltas are binary files in a format specific to
rdiff. Signatures consist of a header, followed by a list of checksums
for successive fixed-size blocks. Deltas consist of a header followed by
an instruction stream, which when executed produces the output file.
There are instructions to insert new data carried in the delta, or to
copy data from the basis file.

Unlike regular text diffs, rdiff deltas can describe sections of the
input file which have been reordered or copied.

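The signature, delta, and patch steps described above can be sketched in a few lines of Python. This is only a toy model: the block size, the use of SHA-256, the in-memory dictionary, and the `("copy", ...)` / `("literal", ...)` tuples are all assumptions made for illustration, and none of it reproduces rdiff's actual file formats or checksum scheme. It does show the two key properties: `delta()` sees only the signature and the new file, and the resulting instructions can describe reordered blocks.

```python
import hashlib

BLOCK = 4  # hypothetical block size, kept tiny so the example is easy to trace


def signature(basis, block=BLOCK):
    """Map the digest of each full fixed-size basis block to its offset."""
    sig = {}
    for off in range(0, len(basis) - block + 1, block):
        sig.setdefault(hashlib.sha256(basis[off:off + block]).digest(), off)
    return sig


def delta(sig, new, block=BLOCK):
    """Turn `new` into copy/literal instructions using only the signature."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + block]
        off = sig.get(hashlib.sha256(window).digest()) if len(window) == block else None
        if off is None:
            literal.append(new[i])  # no basis block starts here: keep the byte as new data
            i += 1
        else:
            if literal:  # flush pending new data before the copy
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", off, block))
            i += block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(basis, ops):
    """Execute the instruction stream against the basis to rebuild the new file."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += basis[off:off + length]
        else:
            out += op[1]
    return bytes(out)


basis = b"AAAABBBBCCCC"
new = b"CCCCxyAAAABBBB"             # basis blocks reordered, plus two new bytes
ops = delta(signature(basis), new)  # delta() never sees `basis` itself
assert patch(basis, ops) == new
```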
Because block checksums are used to find identical sections, rdiff
cannot find common sections smaller than one block, and it may not
exactly identify common sections near changed sections. Changes that
touch every block of the file, such as changing newlines to CRLF, are
likely to cause no blocks to match at all.

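The effect of a change that touches every block can be seen with a small experiment. The sketch below is not how rdiff searches for matches; it simply hashes every fixed-size block of a basis and counts how many windows of a new file reproduce one of them. The 64-byte block size, the sample text, and the function name `matching_offsets` are assumptions chosen for the illustration. A single local edit leaves most blocks findable, while the CRLF conversion leaves none, because every 64-byte window of the converted text contains a carriage return that no basis block has.

```python
import hashlib


def matching_offsets(basis, new, block):
    """Count offsets in `new` where a whole basis block reappears unchanged."""
    digests = {
        hashlib.sha256(basis[off:off + block]).digest()
        for off in range(0, len(basis) - block + 1, block)
    }
    return sum(
        hashlib.sha256(new[i:i + block]).digest() in digests
        for i in range(len(new) - block + 1)
    )


# 200 short lines, so every 64-byte block spans several newlines.
text = b"".join(b"line %03d of the file\n" % n for n in range(200))
one_edit = text.replace(b"line 100", b"LINE 100")  # change confined to one block
crlf = text.replace(b"\n", b"\r\n")                # change touching every block

print("matches after one local edit:", matching_offsets(text, one_edit, 64))
print("matches after CRLF conversion:", matching_offsets(text, crlf, 64))
```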
rdiff does not deal with file metadata or structure, such as filenames,
permissions, or directories. To rdiff, a file is just a stream of bytes.
Higher-level tools, such as
[rdiff-backup](http://rdiff-backup.stanford.edu/), can deal with these
issues in a way appropriate to their users.

Use patterns
============

A typical application of the rsync algorithm is to transfer a file *A2*
from a machine A to a machine B which has a similar file *A1*. This can
be done as follows:

1. B generates the rdiff signature of *A1*. Call this *S1*. B sends the
   signature to A. (The signature is usually much smaller than the file
   it describes.)
2. A computes the rdiff delta between *S1* and *A2*. Call this delta
   *D*. A sends the delta to B.
3. B applies the delta to recreate *A2*.

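The three steps above map onto the three rdiff commands described under "Invoking rdiff". As a rough sketch, the helper below lists the command line each machine would run. The file names (`A1`, `A2`, `S1`, `D`, `A2.recreated`) and the function names are hypothetical, and moving *S1* and *D* between the machines is left to whatever transport is available. `run_locally` is only a convenience for trying all three steps in one directory, and assumes rdiff is installed and that `A1` and `A2` already exist there.

```python
import subprocess


def rdiff_steps(a1="A1", a2="A2", sig="S1", delta="D", out="A2.recreated"):
    """Return (machine, argv) pairs for the three steps, in order."""
    return [
        ("B", ["rdiff", "signature", a1, sig]),     # step 1: B summarises A1
        ("A", ["rdiff", "delta", sig, a2, delta]),  # step 2: A diffs A2 against S1
        ("B", ["rdiff", "patch", a1, delta, out]),  # step 3: B rebuilds A2
    ]


def run_locally(workdir):
    """Run all three steps in one directory (needs rdiff on PATH)."""
    for _machine, argv in rdiff_steps():
        subprocess.run(argv, cwd=workdir, check=True)


if __name__ == "__main__":
    for machine, argv in rdiff_steps():
        print(f"on {machine}: {' '.join(argv)}")
```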
In cases where *A1* and *A2* contain runs of identical bytes, rdiff
should give a significant space saving.

Invoking rdiff
==============

There are three distinct modes of operation: *signature*, *delta* and
*patch*. The mode is selected by the first command argument.

signature
---------

> rdiff \[OPTIONS\] signature INPUT SIGNATURE

**rdiff signature** generates a signature file from an input file. The
signature can later be used to generate a delta relative to the old
file.

delta
-----

> rdiff \[OPTIONS\] delta SIGNATURE NEWFILE DELTA

**rdiff delta** reads in a signature describing a basis file. It then
calculates and writes a delta that transforms the basis into the
new file.

patch
-----

> rdiff \[OPTIONS\] patch BASIS DELTA OUTPUT

**rdiff patch** applies a delta to a basis file and writes out the result.

rdiff cannot update files in place: the output file must not be the same
as the input file.

rdiff does not currently check that the delta is being applied to the
correct file. If a delta is applied to the wrong basis file, the results
will be garbage.

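Since rdiff itself performs no such check, a caller can guard against patching the wrong basis by recording a hash of the basis while it is still at hand (for example, when the signature is generated) and comparing it before running the patch. The sketch below is one way to do that. The sidecar file, the choice of SHA-256, and the function names are assumptions for illustration and are not part of rdiff.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Hash a file in chunks so large basis files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record_basis(basis, sidecar):
    """Store the basis hash in a small sidecar file kept with the signature or delta."""
    Path(sidecar).write_text(file_sha256(basis) + "\n")


def basis_matches(basis, sidecar):
    """Return True only if `basis` is byte-for-byte the file that was recorded."""
    return file_sha256(basis) == Path(sidecar).read_text().strip()
```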
The basis file must allow random access. This means it must be a regular
file rather than a pipe or socket.

Global Options
--------------

These options are available for all commands.

`--version` Show program version and copyright.

`--help` Show brief help message.

`--statistics` Show counts of internal operations.

`--debug` Write debugging information to stderr.

Options must be specified before the command name.

Return Value
============

0: Successful completion.

1: Environmental problems (file not found, invalid options, I/O
error, etc.).

2: Corrupt signature or delta file.

3: Internal error or unhandled situation in librsync or rdiff.

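A script that wraps rdiff may want to turn these exit statuses into readable messages. The following is a minimal sketch: the enum, its member names, and `describe_exit` are hypothetical, and only the four numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class RdiffExit(IntEnum):
    """Exit statuses as listed above; the member names are illustrative."""
    OK = 0
    ENVIRONMENT = 1
    CORRUPT = 2
    INTERNAL = 3


DESCRIPTIONS = {
    RdiffExit.OK: "successful completion",
    RdiffExit.ENVIRONMENT: "environmental problem (file not found, invalid options, I/O error)",
    RdiffExit.CORRUPT: "corrupt signature or delta file",
    RdiffExit.INTERNAL: "internal error or unhandled situation in librsync or rdiff",
}


def describe_exit(code):
    """Translate an exit status into a message, tolerating unknown values."""
    try:
        return DESCRIPTIONS[RdiffExit(code)]
    except ValueError:
        return f"unrecognised exit status {code}"
```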
Bugs
====

Unlike text patches, rdiff deltas can only be usefully applied to the
exact basis file that they were generated from. rdiff does not protect
against trying to apply a delta to the wrong file, though this will
produce garbage output. It may be useful to store a hash of the file to
which the delta is meant to be applied.

Author
======

rdiff was written by Martin Pool. The original rsync algorithm was
discovered by Andrew Tridgell.

This program is part of the [librsync](http://librsync.sourcefrog.net/)
package.
163 This program is part of the [librsync](http://librsync.sourcefrog.net/)
164 package.