Users often ask if there’s a reasonable way to run GlusterFS so that clients connect to the servers over a public front-end network while the servers connect to each other over a separate private back-end network. This is actually a really good idea, because it keeps things like self-heal and rebalance traffic off the network the clients are using (though both kinds of traffic still contend for resources on the servers, so it’s no panacea). The basic problem is that both client-server and server-server connections are made using the same names, which in a normal configuration resolve for everyone to the same addresses on the same network.
The dirty way to deal with this is to use /etc/hosts or “split horizon” DNS so that the names actually resolve differently – to the front end when queried from a client and to the back end when queried from a server. You can also use host routes. These approaches basically work, but they have some flaws.
- They require extra server configuration (i.e. outside of GlusterFS) that might be forgotten when servers are replaced or rebuilt. If that happens, you won’t get a clean or obvious failure. Instead you’re likely to get an overloaded front-end network.
- They affect all traffic between the servers, even traffic unrelated to GlusterFS.
- Inconsistent name resolution often leads to all sorts of other hard-to-debug problems.
I’m going to suggest something that doesn’t solve the first problem (in fact it’s even uglier in many ways) but might avoid the other two. Let’s say you have two servers (gfs1 and gfs2) and each has interfaces on two separate networks (front and back). If you also have a reasonably modern version of iptables you can do something like this on gfs1.
iptables -t nat -A OUTPUT -p tcp -d gfs2-front -j DNAT --to-destination gfs2-back
iptables -t nat -A POSTROUTING -p tcp -s gfs1-front -d gfs2-back -j SNAT --to-source gfs1-back
Then you do the inverse on gfs2 and voila – any connections initiated by one to the other get forced onto the back-end network even though they’re using the front-end names and addresses. You can actually generalize the POSTROUTING rule so that it uses netmasks instead of specific hosts, but you’re pretty much stuck adding a new OUTPUT rule for each peer. I told you it’s uglier than the other approaches. The real key thing that makes this different is that you can make it more specific to GlusterFS traffic, leaving other traffic alone, by adding more iptables rules for specific ports. For example:
iptables -t nat -I OUTPUT -p tcp --dport 655 -j ACCEPT
Because this is an insert instead of an append, this rule is hit first and exempts port 655 from the back-end forcing. This is actually pretty important in certain cases. For example, some readers might have recognized port 655 as the one used by tinc. I don’t actually have two physical networks, so I used tinc to create a sort of VPN with separate interfaces. If I hadn’t added the exception above, tinc’s traffic would get forced onto the VPN that tinc itself is providing. That causes an infinite loop and everything breaks, as I found many times before I came up with the “magic” iptables formula above.
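Since every peer needs its own OUTPUT rule, and the "inverse" has to be written out on each server, it can help to generate the commands rather than type them. What follows is only a rough sketch, not something from my actual setup: it assumes every server resolves as NAME-front and NAME-back like gfs1 and gfs2 above, the gen_rules function and the PORTS variable are made-up names, and it only prints the commands so you can read them before running anything.

```shell
#!/bin/sh
# Sketch: print (do not execute) the NAT rules for one server.
# Assumes every server resolves as NAME-front and NAME-back, as in the
# gfs1/gfs2 example. PORTS is a hypothetical knob, not a GlusterFS setting.

gen_rules() {
    self=$1
    shift
    match=""
    # Optionally limit the rewrite to specific TCP destination ports.
    if [ -n "${PORTS:-}" ]; then
        match=" -m multiport --dports ${PORTS}"
    fi
    for peer in "$@"; do
        # A server needs no rules for itself.
        [ "$peer" = "$self" ] && continue
        # Redirect connections aimed at the peer's front-end address.
        echo "iptables -t nat -A OUTPUT -p tcp -d ${peer}-front${match} -j DNAT --to-destination ${peer}-back"
        # Make the source address match the back-end interface.
        echo "iptables -t nat -A POSTROUTING -p tcp -s ${self}-front -d ${peer}-back${match} -j SNAT --to-source ${self}-back"
    done
}

# The "inverse" rules for gfs2 in a two-server cluster:
gen_rules gfs2 gfs1 gfs2
```

Setting PORTS (for example to 24007 for the management daemon, plus whatever brick port range your GlusterFS version uses, which you should check on your own installation) is a different way to leave other traffic alone: instead of exempting ports one at a time, only the listed ports get forced onto the back end. Whether that beats the exemption approach depends on how confident you are that you know every port GlusterFS is using.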
There’s more to using tinc than just simulating an extra network, though. It’s also a secure network. Using this trick, you can have your server-to-server connections encrypted and authenticated, without needing to change anything in GlusterFS. I imagine that tinc’s compression feature could also be useful in some cases, though in general probably not.
None of this is any substitute for having advanced networking features in GlusterFS itself. Messing around with tinc and iptables is still awkward at best, there are significant performance implications, it doesn’t do anything for RDMA because it’s all IP-based, and it still doesn’t handle even more advanced configurations such as multiple front-end networks. Still, some people might like to know that it’s possible to do these things today, without having to wait for the appearance of these features in the code itself.