Snyke's Braindump

A Braindead Braindump

Getting Git to Play Nicely With CDNs

Git is a really cool version control system. So cool, in fact, that I decided to use it to distribute the project I’m working on to several hundred PlanetLab nodes. So I went ahead and created a repository with git init --bare somewhere under the document root of my local Apache2. Using pssh we can clone and pull from the repository simply by specifying the URL of that repo.
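The setup can be sketched in a few lines; here is a minimal version scripted from Python, with temporary directories standing in for the real Apache document root (the actual paths are an assumption). One detail worth noting: serving a repository as plain static files uses Git's "dumb" HTTP transport, which needs git update-server-info to be run after each push (Git ships a sample post-update hook that does exactly this).

```python
import os
import subprocess
import tempfile

# Stand-in for the Apache document root; the real path is an assumption.
docroot = tempfile.mkdtemp()
repo = os.path.join(docroot, "project.git")

# Create the bare repository that Apache will serve as plain files.
subprocess.run(["git", "init", "--bare", repo], check=True)

# The dumb HTTP transport needs info/refs to exist; Git's sample
# post-update hook runs this after every push, here we call it directly.
subprocess.run(["git", "update-server-info"], cwd=repo, check=True)

# Clients then clone by URL; a filesystem path behaves the same for testing.
workdir = os.path.join(tempfile.mkdtemp(), "project")
subprocess.run(["git", "clone", repo, workdir], check=True)
```

In production the clone URL would of course be the http:// address of the repository under the document root rather than a local path.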

Obviously the traffic is still pretty high: every request still ends up at my machine, so I have to serve the whole repository once for each node. Then I stumbled upon CoralCDN, a free content distribution network that runs on PlanetLab. So instead of cloning directly from my machine I took the URL of the repo, appended .nyud.net to the domain and cloned from that URL instead.
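The rewrite is purely mechanical, so it is easy to script. A small sketch (the example.com URL is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit

def coralize(url: str) -> str:
    """Rewrite a URL so requests go through CoralCDN by
    appending .nyud.net to the host name."""
    parts = urlsplit(url)
    host = parts.hostname + ".nyud.net"
    if parts.port:                      # preserve an explicit port, if any
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))

print(coralize("http://example.com/project.git"))
# → http://example.com.nyud.net/project.git
```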

The drop in traffic when cloning was immediate, and I worked happily with this setup for some time. Then I noticed that having the CDN cache the contents has its drawbacks: if I want to push changes quickly one after another, say, because I noticed a typo just after issuing the update, I have to wait for the cache to time out.

To solve this problem we can set a long caching time for the object files, which never change thanks to Git’s content-addressable design, and a short caching time for the few files that do change. Placing this .htaccess file in the repository and activating mod_headers and mod_expires should do the trick:

ExpiresActive On
ExpiresDefault A300
Header append Cache-Control "public"

<FilesMatch "(info|branches|refs|HEAD)">
  ExpiresDefault A10
  Header append Cache-Control "must-revalidate"
</FilesMatch>

This sets everything to be cacheable for 5 minutes (300 seconds), except the references (info, branches, refs and HEAD), which tell Git where to find the content; those expire after just 10 seconds.

Bitcoin’s Getting Some Traction

It’s an amazing time to be part of the Bitcoin family. With the Wikileaks scandal we had some quite heated discussions on whether to promote ourselves as an alternative way for them to acquire funds, but in the end we decided not to, preferring not to be associated with an organization under investigation in several countries. However, the decision seems to have been taken for us already: as this article in PCWorld demonstrates, we are not the only ones making that connection.

Furthermore people are investing more and more resources into Bitcoin as confidence in the future of the currency grows. Currently the Bitcoin economy, containing 4’464’000 coins, is worth just short of 1 million USD (MtGox). Meanwhile the growing interest has driven the difficulty of generating blocks (the means of acquiring new coins and confirming transactions) to incredible heights, and newcomers are getting frustrated at how long it takes them to earn their first real coins. Luckily the Bitcoin Faucet and a pooled mining effort should counteract part of this problem, but the trend is quite clear: people who do not invest heavily in GPUs will have hardly any chance of accumulating large quantities just by mining. But then again, where else does a country just give you freshly printed money?

In the meantime a lot of discussion is going on about improvements to the protocol and what should be part of the Bitcoin ecosystem; specifically, an alternative DNS system is under discussion, which would piggyback on the currency’s transactions.

That should be it for now. If you’re interested, why not give Bitcoin a try, join us on the forum, or read up on the latest developer discussions?

Migrating to JRockit

I’ve often been bothered by the now famous PermGen space error while developing a web application on a local Jetty instance, and I was hoping that the problem wouldn’t prove to be that serious once deployed on a Tomcat server, but quite the opposite is the case.

The problem occurs when the JVM runs out of permanent generation heap space, which most of the time is due to classloaders not being correctly garbage collected. The permanent generation is an area the Sun JVM sets aside for class metadata, but its default size is too small if classes are loaded and unloaded often at runtime, which is exactly how most application servers load applications. So the first, quick and dirty, solution would be to enlarge the permanent generation heap space: -XX:MaxPermSize=256m. Sadly, this still doesn’t get rid of the problem, it only postpones it. Another solution is to use a completely different JVM altogether: JRockit, which has no fixed-size permanent generation at all.

JRockit, a proprietary Java Virtual Machine (JVM) from BEA Systems, became part of Oracle Fusion Middleware in 2008. Many JRE class files distributed with BEA JRockit exactly replicate those distributed by Sun. JRockit overrides class files which relate closely to the JVM, therefore retaining API compatibility while enhancing the performance of the JVM. [from Wikipedia]

I wasn’t thrilled about having to change JVMs, because JRockit isn’t available in the openSUSE repositories at all, and I wasn’t quite sure how hard the switch would be. As I found out, it’s incredibly easy.

NAT-Hole-Punching Explained

What is the difference between a server and a client? Those of you who have tried to explain this difference to non-technical people will have found it difficult: people always seem to think of servers as different, huge machines sitting in some climatized room, and they are disappointed when I tell them that even my notebook can be used as a server. For the purpose of this article, a server is simply a computer that offers some service over the network.

So why can’t every client be a server? A basic rule for servers is that they have to be reachable over the network; clients, on the other hand, don’t require this, and most of them aren’t. The trend is moving away from the classic layout, where a computer would be connected directly to the internet using a modem, and towards small home networks, using wireless routers and NAT, which impose a different layout.

Thus more and more computers on the network become unreachable from the outside; making them contactable often requires complex configuration on the NAT (Network Address Translation) router, which the average user sometimes can’t manage. This is deadly for P2P! P2P is a different approach to offering services, away from the client-server paradigm: in such a network every client is also a server, distributing the service it’s using to other clients (usually called peers, since “client” belongs to the client-server paradigm). P2P is proving stronger than the server model and is having huge success among all kinds of companies (no, I’m not only talking about file sharing); it can be used in many different applications and is cheaper than having to buy huge dedicated machines.

Ignoring all those shielded and unreachable peers is a huge waste of resources, and we absolutely have to find a way to deal with this problem.

The solution is NAT hole punching (also called UDP hole punching, though it is applicable to TCP as well): a way to reach otherwise unreachable hosts with minimal additional effort. All you need is a peer that is reachable by both peers wanting to establish a connection, to coordinate it. But let’s start from the beginning. Routers use a NAT table to decide which packets to drop and which to redirect to a host in their network. When a computer in the network behind a router wants to open a connection to another computer, a SYN packet is sent to the server through the router, and the router registers in its NAT table that all responses from that ip:port combination are to be redirected to the client. The problem arises when both peers are behind a router:

Whether Peer A or Peer B tries to open a connection, it will fail, because the other router will drop the unrequested packets. Now the idea is that both peers punch a hole in the NAT of their router (“punching” is a bit of a strong word for it; they just tell the router that they want packets to a certain port to be redirected to them). But ports for outgoing connections are assigned randomly by the operating system, so what we do is:

  1. Create a socket as we usually would in our program
  2. Get the port this socket is bound to
  3. Inform a transaction handler what our IP:port combination is
  4. The transaction handler tells the foreign host this combination, and in the same way we get the foreign host’s combination
  5. Now that we have all the required information, we start sending packets whose source IP and port are what we told the transaction handler earlier, and whose destination IP and port are what we got from the transaction handler
  6. Eventually one of the two routers will have the hole we were looking for, the packets from the other peer will finally reach their destination, and the communication is established
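The steps above can be sketched with plain UDP sockets. This is a loopback simulation, not a real traversal: there is no NAT between the two sockets, and the transaction handler is replaced by reading each socket’s address directly, so only the mechanics of the steps are shown.

```python
import socket

# Step 1: create a UDP socket as usual; binding to port 0 lets the OS
# pick the ephemeral port.  Step 2 is reading it back with getsockname().
peer_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_a.bind(("127.0.0.1", 0))
peer_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_b.bind(("127.0.0.1", 0))

# Steps 3-4: in reality a transaction handler relays each peer's public
# ip:port to the other; on loopback we simply read them directly.
addr_a = peer_a.getsockname()
addr_b = peer_b.getsockname()

# Step 5: both sides fire packets at the address they learned.  Against
# real NATs the first packets may be dropped while the holes open, so
# each side would retry; on loopback a single send suffices.
peer_a.settimeout(2)
peer_b.settimeout(2)
peer_a.sendto(b"hello from A", addr_b)
peer_b.sendto(b"hello from B", addr_a)

# Step 6: once the holes exist, the packets get through.
msg_at_b, _ = peer_b.recvfrom(1024)
msg_at_a, _ = peer_a.recvfrom(1024)
print(msg_at_b, msg_at_a)
```

Against real NATs the retry loop in step 5 is essential, since whichever peer sends first merely opens its own router’s hole while its packet is dropped by the other router.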

In the entire process the only precondition is a peer that is reachable by both peers, to act as the transaction handler. This is already given in most layouts: for example a chat where the peers are connected to a central server (MSN could act as a transaction handler too), or a BitTorrent tracker. The load on the transaction handler is minimal and does not hurt the scalability of the P2P concept, because once the connection is established the peers become completely independent of the transaction handler.

NAT hole punching does not weaken the protection that a firewall or router gives its users: for a communication to be established, an action must be taken from the inside to open the connection. It remains difficult or even impossible to open unrequested connections to the inside!

In this article we focused on routers because they are the most common problem for P2P communication, but the concepts are also applicable to most kinds of middleboxes, as is explained in more detail in the draft linked at the bottom of this article.

Interesting reading and resources: