NameDateSize

..16-Mar-201612 KiB

.ensime29-Dec-2012219

.gitignore29-Dec-2012174

config/29-Dec-20124 KiB

doc/29-Dec-20124 KiB

LICENSE29-Dec-2012550

project/29-Dec-20124 KiB

README.markdown29-Dec-20124.2 KiB

src/22-Mar-20124 KiB

TODO29-Dec-2012873

README.markdown

1
2# FlockDB
3
4FlockDB is a distributed graph database for storing adjancency lists, with
5goals of supporting:
6
7- a high rate of add/update/remove operations
8- potientially complex set arithmetic queries
9- paging through query result sets containing millions of entries
10- ability to "archive" and later restore archived edges
11- horizontal scaling including replication
12- online data migration
13
14Non-goals include:
15
16- multi-hop queries (or graph-walking queries)
17- automatic shard migrations
18
19FlockDB is much simpler than other graph databases such as neo4j because it
20tries to solve fewer problems. It scales horizontally and is designed for
21on-line, low-latency, high throughput environments such as web-sites.
22
23Twitter uses FlockDB to store social graphs (who follows whom, who blocks
24whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster
25stores 13+ billion edges and sustains peak traffic of 20k writes/second and
26100k reads/second.
27
28
29# It does what?
30
31If, for example, you're storing a social graph (user A follows user B), and
32it's not necessarily symmetrical (A can follow B without B following A), then
33FlockDB can store that relationship as an edge: node A points to node B. It
34stores this edge with a sort position, and in both directions, so that it can
35answer the question "Who follows A?" as well as "Whom is A following?"
36
37This is called a directed graph. (Technically, FlockDB stores the adjacency
38lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit
39destination ID, a state (normal, removed, archived), and a 32-bit position
40used for sorting. The edges are stored in both a forward and backward
41direction, meaning that an edge can be queried based on either the source or
42destination ID.
43
44For example, if node 134 points to node 90, and its sort position is 5, then
45there are two rows written into the backing store:
46
47    forward: 134 -> 90 at position 5
48    backward: 90 <- 134 at position 5
49
50If you're storing a social graph, the graph might be called "following", and
51you might use the current time as the position, so that a listing of followers
52is in recency order. In that case, if user 134 is Nick, and user 90 is Robey,
53then FlockDB can store:
54
55    forward: Nick follows Robey at 9:54 today
56    backward: Robey is followed by Nick at 9:54 today
57
58The (source, destination) must be unique: only one edge can point from node A
59to node B, but the position and state may be modified at any time. Position is
60used only for sorting the results of queries, and state is used to mark edges
61that have been removed or archived (placed into cold sleep).
62
63
64# Building
65
66In theory, building is as simple as
67
68    $ sbt clean update package-dist
69
70but there are some pre-requisites. You need:
71
72- java 1.6
73- sbt 0.7.4
74- thrift 0.5.0
75
76If you haven't used sbt before, this page has a quick setup:
77[http://code.google.com/p/simple-build-tool/wiki/Setup](http://code.google.com/p/simple-build-tool/wiki/Setup).
78My `~/bin/sbt` looks like this:
79
80    #!/bin/bash
81    java -server -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -Xmx1024m -jar `dirname $0`/sbt-launch-0.7.4.jar "$@"
82
83Apache Thrift 0.5.0 is pre-requisite for building java stubs of the thrift
84IDL. It can't be installed via jar, so you'll need to install it separately
85before you build. It can be found on the apache thrift site:
86[http://thrift.apache.org/](http://thrift.apache.org/).
87You can find the download for 0.5.0 here: 
88[http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/](http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/).
89
90In addition, the tests require a local mysql instance to be running, and for
91`DB_USERNAME` and `DB_PASSWORD` env vars to contain login info for it. You can
92skip the tests if you want (but you should feel a pang of guilt):
93
94    $ NO_TESTS=1 sbt package-dist
95
96
97# Running
98
99Check out
100[the demo](http://github.com/twitter/flockdb/blob/master/doc/demo.markdown)
101for instructions on how to start up a local development instance of FlockDB.
102It also shows how to add edges, query them, etc.
103
104
105# Community
106
107- Twitter: #flockdb
108- IRC: #twinfra on freenode (irc.freenode.net)
109- Mailing list: <flockdb@googlegroups.com> [subscribe](http://groups.google.com/group/flockdb)
110
111
112# Contributors
113
114- Nick Kallen @nk
115- Robey Pointer @robey
116- John Kalucki @jkalucki
117- Ed Ceaser @asdf
118