README

bup: It backs things up
=======================

bup is a program that backs things up.  It's short for "backup." Can you
believe that nobody else has named an open source program "bup" after all
this time?  Me neither.

Despite its unassuming name, bup is pretty cool.  To give you an idea of
just how cool it is, I wrote you this poem:

                             Bup is teh awesome
                          What rhymes with awesome?
                            I guess maybe possum
                           But that's irrelevant.

Hmm.  Did that help?  Maybe prose is more useful after all.


Reasons bup is awesome
----------------------

bup has a few advantages over other backup software:

 - It uses a rolling checksum algorithm (similar to rsync) to split large
   files into chunks.  The most useful result of this is you can back up huge
   virtual machine (VM) disk images, databases, and XML files incrementally,
   even though they're typically all in one huge file, and not use tons of
   disk space for multiple versions.

 - It uses the packfile format from git (the open source version control
   system), so you can access the stored data even if you don't like bup's
   user interface.

 - Unlike git, it writes packfiles *directly* (instead of having a separate
   garbage collection / repacking stage) so it's fast even with gratuitously
   huge amounts of data.  bup's improved index formats also allow you to
   track far more filenames than git (millions) and keep track of far more
   objects (hundreds or thousands of gigabytes).

 - Data is "automagically" shared between incremental backups without having
   to know which backup is based on which other one - even if the backups
   are made from two different computers that don't even know about each
   other.  You just tell bup to back stuff up, and it saves only the minimum
   amount of data needed.

 - You can back up directly to a remote bup server, without needing tons of
   temporary disk space on the computer being backed up.  And if your backup
   is interrupted halfway through, the next run will pick up where you left
   off.  And it's easy to set up a bup server: just install bup on any
   machine where you have ssh access.

 - Bup can use "par2" redundancy to recover corrupted backups even if your
   disk has undetected bad sectors.

 - Even when a backup is incremental, you don't have to worry about
   restoring the full backup, then each of the incrementals in turn; an
   incremental backup *acts* as if it's a full backup, it just takes less
   disk space.

 - You can mount your bup repository as a FUSE filesystem and access the
   content that way, and even export it over Samba.

 - It's written in python (with some C parts to make it faster) so it's easy
   for you to extend and maintain.


Reasons you might want to avoid bup
-----------------------------------

 - This is a very early version. Therefore it will most probably not work
   for you, but we don't know why.  It is also missing some
   probably-critical features.

 - It requires python >= 2.4, a C compiler, and an installed git version >=
   1.5.3.1.

 - It currently only works on Linux, MacOS X >= 10.4,
   NetBSD, Solaris, or Windows (with Cygwin).  Patches to support
   other platforms are welcome.


Getting started
===============


From source
-----------

 - Check out the bup source code using git:

        git clone git://github.com/bup/bup

 - Install the needed python libraries (including the development
   libraries).

   On Debian/Ubuntu this is usually sufficient (run as root):

            apt-get install python2.6-dev python-fuse
            apt-get install python-pyxattr python-pylibacl
            apt-get install linux-libc-dev

   Substitute python2.5-dev or python2.4-dev if you have an older
   system.  Alternately, on newer Debian/Ubuntu versions, you can try
   this:

            apt-get build-dep bup

   On CentOS (for CentOS 6, at least), this should be sufficient (run
   as root):

            yum groupinstall "Development Tools"
            yum install python python-devel
            yum install fuse-python pyxattr pylibacl
            yum install perl-Time-HiRes

   In addition to the default CentOS repositories, you may need to add
   RPMForge (for fuse-python) and EPEL (for pyxattr and pylibacl).

 - Build the python module and symlinks:

        make

 - Run the tests:

        make test

    (The tests should pass.  If they don't pass for you, stop here and send
    me an email.)

 - You can install bup via "make install", and override the default
   destination with DESTDIR and PREFIX.

   Files are normally installed to "$DESTDIR/$PREFIX" where DESTDIR is
   empty by default, and PREFIX is set to /usr.  So if you wanted to
   install bup to /opt/bup, you might do something like this:

        make install DESTDIR=/opt/bup PREFIX=''


From binary packages
--------------------

Binary packages of bup are known to be built for the following OSes:

 - Debian:
    http://packages.debian.org/search?searchon=names&keywords=bup
 - Ubuntu:
    http://packages.ubuntu.com/search?searchon=names&keywords=bup
 - pkgsrc (NetBSD, Dragonfly, and others)
    http://pkgsrc.se/sysutils/bup
    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/


Using bup
---------

 - Try making a local backup as a tar file:

        tar -cvf - /etc | bup split -n local-etc -vv

 - Try restoring your backup tarball:

        bup join local-etc | tar -tf -

 - Look at how much disk space your backup took:

        du -s ~/.bup

 - Make another backup (which should be mostly identical to the last one;
   notice that you don't have to *specify* that this backup is incremental,
   it just saves space automatically):

        tar -cvf - /etc | bup split -n local-etc -vv

 - Look how little extra space your second backup used on top of the first:

        du -s ~/.bup

 - Restore your old backup again (the ~1 is git notation for "one older than
   the most recent"):

        bup join local-etc~1 | tar -tf -

 - Get a list of your previous backups:

        GIT_DIR=~/.bup git log local-etc

 - Make a backup on a remote server (which must already have the 'bup' command
   somewhere in the server's PATH (see /etc/profile, /etc/environment,
   ~/.profile, or ~/.bashrc), and be accessible via ssh.
   Make sure to replace SERVERNAME with the actual hostname of your server):

        tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv

 - Try restoring the remote backup tarball:

        bup join -r SERVERNAME: local-etc | tar -tf -

 - Try using the new (slightly experimental) 'bup index' and 'bup save'
   style backups, which bypass 'tar' but have some missing features (see
   "Things that are stupid" below):

        bup index -uv /etc
        bup save -n local-etc /etc

 - Do it again and see how fast an incremental backup can be:

        bup index -uv /etc
        bup save -n local-etc /etc

    (You can also use the "-r SERVERNAME:" option to 'bup save', just like
     with 'bup split' and 'bup join'.  The index itself is always local,
     so you don't need -r there.)

That's all there is to it!


Notes on FreeBSD
----------------

- FreeBSD's default 'make' command doesn't like bup's Makefile. In order to
  compile the code, run tests and install bup, you need to install GNU Make
  from the port named 'gmake' and use its executable instead in the commands
  seen above. (For example, 'gmake test' runs bup's test suite.)

- Python's development headers are automatically installed with the 'python'
  port so there's no need to install them separately.

- To use the 'bup fuse' command, you need to install the fuse kernel module
  from the 'fusefs-kmod' port in the 'sysutils' section and the libraries from
  the port named 'py-fusefs' in the 'devel' section.

- The 'par2' command can be found in the port named 'par2cmdline'.

- In order to compile the documentation, you need pandoc which can be found in
  the port named 'hs-pandoc' in the 'textproc' section.


Notes on NetBSD/pkgsrc
----------------------

 - See pkgsrc/sysutils/bup, which should be the most recent stable
   release and includes man pages.  It also has a reasonable set of
   dependencies (git, par2, py-fuse-bindings).

 - The "fuse-python" package referred to is hard to locate, and is a
   separate tarball for the python language binding distributed by the
   fuse project on sourceforge.  It is available as
   pkgsrc/filesystems/py-fuse-bindings and on NetBSD 5, "bup fuse"
   works with it.

 - "bup fuse" presents every directory/file as inode 0.  The directory
   traversal code ("fts") in NetBSD's libc will interpret this as a
   cycle and error out, so "ls -R" and "find" will not work.

 - It is not clear if extended attribute and POSIX acl support does
   anything useful.


How it works
============

Basic storage:

bup stores its data in a git-formatted repository.  Unfortunately, git
itself doesn't actually behave very well for bup's use case (huge numbers of
files, files with huge sizes, and the need to retain file permissions and
ownership), so we mostly don't use git's *code* except for a few helper
programs.  For example, bup has its own git packfile writer written in
python.

Basically, 'bup split' reads the data on stdin (or from files specified on
the command line), breaks it into chunks using a rolling checksum (similar to
rsync), and saves those chunks into a new git packfile.  There is one git
packfile per backup.
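
As a rough illustration of the chunking idea, here is a minimal Python
sketch.  It is not bup's actual splitter: the additive checksum, the
64-byte window, and the 11-bit mask are all assumptions chosen to keep the
example short, and `split_chunks` is a hypothetical name.

```python
WINDOW = 64            # bytes covered by the rolling checksum (assumed value)
MASK = (1 << 11) - 1   # end a chunk when the low 11 bits are all ones (assumed)


def split_chunks(data, window=WINDOW, mask=MASK):
    """Yield content-defined chunks of `data` (a bytes object).

    The "checksum" is simply the sum of the last `window` bytes.  A chunk
    ends wherever the low bits of that sum equal `mask`, so each boundary
    depends only on the bytes just before it, not on its absolute offset.
    """
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if (rolling & mask) == mask:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing partial chunk
```

Because the checksum is never reset at a boundary, inserting a few bytes
near the start of the input shifts the later boundaries but leaves the
chunks between them byte-for-byte identical, which is what lets a later
backup reuse them.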

When deciding whether to write a particular chunk into the new packfile, bup
first checks all the other packfiles that exist to see if they already have that
chunk.  If they do, the chunk is skipped.
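
The skip-if-present check can be sketched in Python as follows.  The object
id computed here is the standard git blob id (SHA-1 over a "blob <size>"
header plus the content); everything else is an assumption for
illustration: `known_ids` is a plain set standing in for the ids listed in
the existing packfiles, and `select_new_chunks` is a hypothetical helper,
not a bup function.

```python
import hashlib


def git_blob_id(chunk):
    """Return the git object id (hex SHA-1) of `chunk` stored as a blob."""
    header = b"blob %d\0" % len(chunk)
    return hashlib.sha1(header + chunk).hexdigest()


def select_new_chunks(chunks, known_ids):
    """Return (id, chunk) pairs that are not yet stored anywhere.

    `known_ids` is updated in place, so a chunk repeated within the same
    backup is also written only once.
    """
    to_write = []
    for chunk in chunks:
        oid = git_blob_id(chunk)
        if oid in known_ids:
            continue  # already present in some packfile: skip it
        known_ids.add(oid)
        to_write.append((oid, chunk))
    return to_write
```

Running the same chunks through `select_new_chunks` a second time with the
same `known_ids` returns an empty list, which mirrors why a repeated backup
of unchanged data takes almost no extra space.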

git packs come in two parts: the pack itself (*.pack) and the index (*.idx).
The index is pretty small, and contains a list of all the objects in the
pack.  Thus, when generating a remote backup, we don't have to have a copy
of the packfiles from the remote server: the local end just downloads a copy
of the server's *index* files, and compares objects against those when
generating the new pack, which it sends directly to the server.

The "-n" option to 'bup split' and 'bup save' is the name of the backup you
want to create, but it's actually implemented as a git branch.  So you can
do cute things like check out a particular branch using git, and receive a
bunch of chunk files corresponding to the file you split.

If you use '-b' or '-t' or '-c' instead of '-n', bup split will output a
list of blobs, a tree containing that list of blobs, or a commit containing
that tree, respectively, to stdout.  You can use this to construct your own
scripts that do something with those values.

The bup index:

'bup index' walks through your filesystem and updates a file (whose name is,
by default, ~/.bup/bupindex) to contain the name, attributes, and an
optional git SHA1 (blob id) of each file and directory.

'bup save' basically just runs the equivalent of 'bup split' a whole bunch
of times, once per file in the index, and assembles a git tree
that contains all the resulting objects.  Among other things, that makes
'git diff' much more useful (compared to splitting a tarball, which is
essentially a big binary blob).  However, since bup splits large files into
smaller chunks, the resulting tree structure doesn't *exactly* correspond to
what git itself would have stored.  Also, the tree format used by 'bup save'
will probably change in the future to support storing file ownership, more
complex file permissions, and so on.

If a file has previously been written by 'bup save', then its git blob/tree
id is stored in the index.  This lets 'bup save' avoid reading that file to
produce future incremental backups, which means it can go *very* fast unless
a lot of files have changed.
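
The shortcut described above can be sketched in Python like this.  The
index here is a plain dict, not the real bupindex file format, and the
choice of modification time plus size as the "unchanged" test is an
assumption; `stat_signature`, `plan_save`, and `record_saved` are
hypothetical names.

```python
import os


def stat_signature(path):
    """Return the attributes used to decide whether a file has changed."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def plan_save(paths, index):
    """Split `paths` into (reuse, reread) lists.

    `index` maps path -> {"sig": ..., "blob_id": ...}.  A file is reused,
    without being opened, only if it has a stored id and its signature
    still matches; everything else must be read and split again.
    """
    reuse, reread = [], []
    for path in paths:
        entry = index.get(path)
        if entry and entry["blob_id"] and entry["sig"] == stat_signature(path):
            reuse.append(path)
        else:
            reread.append(path)
    return reuse, reread


def record_saved(path, blob_id, index):
    """Remember the id that was written for `path` during a save."""
    index[path] = {"sig": stat_signature(path), "blob_id": blob_id}
```

On a second run over an unchanged tree, every path lands in the reuse
list, so the cost is one stat call per file rather than a full read.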


Things that are stupid for now but which we'll fix later
--------------------------------------------------------

Help with any of these problems, or others, is very welcome.  Join the
mailing list (see below) if you'd like to help.

 - 'bup save' and 'bup restore' have immature metadata support.

    On the plus side, they actually do have support now, but it's new,
    and not remotely as well tested as tar/rsync/whatever's.  If you'd
    like to help test, please do (see t/compare-trees for one
    comparison method).

    In addition, at the moment, if any strip or graft-style options
    are specified to 'bup save', then no metadata will be written for
    the root directory.  That's obviously less than ideal.

 - 'bup index' is slower than it should be.

    It's still rather fast: it can iterate through all the filenames on my
    600,000 file filesystem in a few seconds.  But it still needs to rewrite
    the entire index file just to add a single filename, which is pretty
    nasty; it should just leave the new files in a second "extra index" file
    or something.

 - bup could use inotify for *really* efficient incremental backups.

    You could even have your system doing "continuous" backups: whenever a
    file changes, we immediately send an image of it to the server.  We could
    give the continuous-backup process a really low CPU and I/O priority so
    you wouldn't even know it was running.

 - bup currently has no features that prune away *old* backups.

    Because of the way the packfile system works, backups become "entangled"
    in weird ways and it's not actually possible to delete one pack
    (corresponding approximately to one backup) without risking screwing up
    other backups.

    git itself has lots of ways of optimizing this sort of thing, but its
    methods aren't really applicable here; bup packfiles are just too huge.
    We'll have to do it in a totally different way.  There are lots of
    options.  For now: make sure you've got lots of disk space :)

 - bup has never been tested on anything but Linux, MacOS, and Windows+Cygwin.

    There's nothing that makes it *inherently* non-portable, though, so
    that's mostly a matter of someone putting in some effort.  (For a
    "native" Windows port, the most annoying thing is the absence of ssh in
    a default Windows installation.)

 - bup needs better documentation.

    According to a recent article about bup in Linux Weekly News
    (https://lwn.net/Articles/380983/), "it's a bit short on examples and
    a user guide would be nice."  Documentation is the sort of thing that
    will never be great unless someone from outside contributes it (since
    the developers can never remember which parts are hard to understand).

 - bup is "relatively speedy" and has "pretty good" compression.

    ...according to the same LWN article.  Clearly neither of those is good
    enough.  We should have awe-inspiring speed and crazy-good compression.
    Must work on that.  Writing more parts in C might help with the speed.

 - bup has no GUI.

    Actually, that's not stupid, but you might consider it a limitation.
    There are a bunch of Linux GUI backup programs; someday I expect someone
    will adapt one of them to use bup.


More Documentation
------------------

bup has an extensive set of man pages.  Try using 'bup help' to get
started, or use 'bup help SUBCOMMAND' for any bup subcommand (like split,
join, index, save, etc.) to get details on that command.

For further technical details, please see ./DESIGN.


How you can help
================

bup is a work in progress and there are many ways it can still be improved.
If you'd like to contribute patches, ideas, or bug reports, please join the
bup mailing list.

You can find the mailing list archives here:

        http://groups.google.com/group/bup-list

and you can subscribe by sending a message to:

        bup-list+subscribe@googlegroups.com

Please see ./HACKING for additional information, such as how to submit
patches (hint: no pull requests) and how we handle branches.


Have fun,

Avery

