Posted 24/May 2011 at 20:57
in Science & Tech

csync2 - keeping data and config files in sync

If you're hoping to find a comprehensive setup guide for csync2, get lost. This is not it. What this is, is a little story about how one person discovered the cluster synchronization tool a few weeks ago, battled with getting it to work last night and has grown to love it to bits over the course of less than 24 hours.

That's right. I like it. A lot ;).

So, I am an internet software developer who regularly installs web apps on servers and often needs to keep servers in sync with each other. And not just a few files, but a lot. One setup currently has 55GB of photos in 26324 files. Until a few days ago, I used rsync to keep things in sync and was pretty hooked on it. I ran it in a loop every minute so that newly uploaded images would quickly get sent to all servers in the cluster.

But this created quite some traffic, and kept the nodes rather busy as well. After all, rsync checks if every file exists on the node, compares the contents, size or last modification date and builds a list of files to be transferred based on that. And every time it needs to connect to each node. Fine for occasional updates, less fine for more regular ones.

So, I had been searching for an alternative method for a while. One that in particular handles larger amounts of files in deeper directory structures better than rsync did. And then I found csync2.

Csync2 keeps a little database which contains the state of each file. This means that whenever it gets invoked, it first updates the database - and only starts to connect to the nodes in case any files were added, modified or deleted. A massive win in the number of connections it needs to make to the nodes, as most of the time there won't be any new files. It also is a lot faster in checking.
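
To make the idea a bit more tangible, here is a minimal Python sketch of the "check the local database first" approach. It is not how csync2 is implemented; the table layout, the function name find_changes and the choice to compare modification time plus size are all my own assumptions for illustration.

```python
import os
import sqlite3


def find_changes(root, db_path):
    """Return paths under root that were added, modified or deleted since the last call."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS state "
        "(path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)"
    )
    # What the previous run saw: path -> (mtime, size).
    known = {p: (m, s) for p, m, s in db.execute("SELECT path, mtime_ns, size FROM state")}

    # What is on disk right now.
    current = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            current[path] = (st.st_mtime_ns, st.st_size)

    changed = sorted(p for p, sig in current.items() if known.get(p) != sig)
    deleted = sorted(p for p in known if p not in current)

    # Remember the new state so the next run only reports fresh changes.
    with db:
        db.execute("DELETE FROM state")
        db.executemany(
            "INSERT INTO state VALUES (?, ?, ?)",
            [(p, m, s) for p, (m, s) in current.items()],
        )
    db.close()
    return changed + deleted
```

Only when the returned list is non-empty would there be any reason to open a connection to the other nodes, which is exactly where the saving over a once-a-minute rsync loop comes from.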

The transfer itself seems a bit slower, but I'll happily accept that.

This setup, as the "official csync2 paper" also notes, implies that the master (i.e. the server where csync2 is invoked) can only push changes. Well, so be it. You can set up any number of nodes to be a master, so each one of them can take care of pushing its own changes.

The configuration is based on groups. Each group defines a key, which is shared across all member nodes, and a list of include and exclude patterns. An include pattern could for example be the full path to the application folder of your Zend Framework project, the path to your Apache configuration file and perhaps also a directory with user uploads. If you're using Subversion as your version control system, it could be a good idea to add .svn to the exclude patterns.

One thing I have grown particularly fond of, is the ability to use path-prefixes, define them per server/node (or for multiple servers at the same time using wildcards) and use them in include patterns. On the master you might have several webapps in your home directory, while the production server only has one application somewhere below /var/www/.
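
Roughly, the prefix idea boils down to a per-host lookup. The Python sketch below only illustrates that lookup and is not csync2 code: the prefix name webapp, the host names, the paths and the "first matching host pattern wins" rule are all assumptions on my part.

```python
from fnmatch import fnmatch

# Hypothetical prefix table: name -> ordered (host pattern, directory) pairs.
PREFIXES = {
    "webapp": [
        ("dev*", "/home/me/projects/shop"),  # the master, somewhere in a home directory
        ("*", "/var/www/shop"),              # every other node
    ],
}


def expand(pattern, hostname, prefixes=PREFIXES):
    """Replace each %name% in pattern with the directory defined for hostname."""
    for name, rules in prefixes.items():
        token = "%" + name + "%"
        if token not in pattern:
            continue
        # Assumption: the first host pattern that matches decides the directory.
        for host_pattern, directory in rules:
            if fnmatch(hostname, host_pattern):
                pattern = pattern.replace(token, directory)
                break
    return pattern
```

With this table, the single include pattern %webapp%/uploads would point at a different directory on the master than on a production node, while still describing the same set of files.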

If you paid attention, you would have noticed that I included the apache2 configuration file in the list of include patterns you might want to use in your synchronization group. This was meant as a carefully planned bridge to one of the features that most distinguishes csync2 from rsync. You can set up actions which are to be executed when certain files are synced. In the case of a webserver configuration file, you might want to gracefully restart the server. Ain't that neat?
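
In spirit, an action is just "if one of the synced files matches this pattern, run that command afterwards". Here is a small Python sketch of that matching step, with made-up rules; the pattern, the function name commands_for and the run-each-command-once behaviour are my assumptions, not something taken from csync2 itself.

```python
from fnmatch import fnmatch

# Hypothetical rules: (file pattern, command to run after a matching file was synced).
ACTIONS = [
    ("/etc/apache2/*.conf", "apache2ctl graceful"),
]


def commands_for(synced_files, actions=ACTIONS):
    """Return each command once if at least one synced file matches its pattern."""
    commands = []
    for pattern, command in actions:
        if command in commands:
            continue  # already scheduled by an earlier rule
        if any(fnmatch(path, pattern) for path in synced_files):
            commands.append(command)
    return commands
```

A sync that only touched uploaded photos would schedule nothing, while one that included a changed Apache config file would schedule a single graceful restart.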

Are you still with me, or have you completely lost track of the point I'm trying to make with my enthusiastic rambling? I know I have!

But it's not all greatness with csync2. Last night I almost put it on the discard pile because I simply couldn't get it to work. I kept getting Permission Denied, Error for peer XXXXX: Connection Closed, config command failed and Establishing SSL connection failed errors. And I was sure I had done everything right according to the csync2 paper, which was quite clear about leaving the Common Name field in the SSL certificate empty. But when I had finally figured it out, it turned out to be rather simple:

First, I decided to simplify things by disabling SSL. At the top of my config file, I added nossl * *, and then at least it was able to make the connection. But it still gave me the Permission Denied.

Ok, here I had been a bit clumsy and missed the part of the paper saying that the key file (defined per group) needs to be the same on every one of the nodes. Finally, I could sync.

But I wouldn't give up on SSL so easily. I generated new certificates. And again. And again. And again, sometimes throwing away the files in /var/lib/csync2/*.db to make sure it hadn't remembered the old certificates. Each time I used the server's hostname in the OU field. Except for the last one: there I just kept all the defaults and ended up with a certificate made out to Internet Widgits Pty Ltd in Some-State, Australia. On both servers. And then I figured it out. All the fields need to have the same value on every one of the nodes. A bit of a funny way to add some extra security. The easiest way to make sure you always have the same values is to leave all of them empty, for example by specifying -subj "/" on the command line of your openssl request generation.

And with the resolution of what I can only expect to be a rather common pitfall when setting up csync2, I think it's time to wrap up my enthusiastic rambling about csync2 by getting back to the point.

It Rocks!

Ps. Universal peace and a good attitude for everyone.
