Sunday 22nd April 2012
London Realtime - Live Router Stats
Last weekend, I attended the London Realtime hackathon.
There were over 150 attendees, and by the end of the weekend, 27 hacks had been put together.
I rocked up at White Bear Yard after work on Friday night, and discovered rapidly that there was a problem with the Wifi coverage at the event. Chris Leydon grabbed me for my skills as a Sysadmin to see if I could figure out a solution.
Basically, they'd bought a Draytek 2920w router, which was simply rebooting every 5-10 minutes. I grabbed a spare Macbook, and installed the Logstash agent. Next step was to point the Syslog target of the Draytek at the Logstash server, and grab some logfiles whilst it was shifting traffic, with the hope of seeing a pattern before a reboot.
Attempting to log traffic to a USB stick was also futile, as the router didn't always flush() before rebooting, so there was no guarantee that that session would contain any logs. Hence the need to log to something a little more reliable. Logstash was it.
I have a lot of time for Logstash, and will write on that topic soon, as I've just implemented a large centralised syslogging platform at $work.
But I digress.
We attempted a firmware upgrade, but discovered that the Draytek was running the latest "stable" release. Stable my arse.
I decided that I'd head home fairly early, and come back with some better grade routing hardware on Saturday morning.
At home, I've got a Cisco 2621XM ISR, and a handful of TP-Link Wireless APs, which despite their low cost, are surprisingly capable, and work really rather well.
So on Saturday morning, after a brief trip to $work to collect an unused 10/100 24 port switch made by a hitherto unheard of company called "Planet", I turned up at LDNRealtime for another go at overhauling the network.
The office space we were using provided our internet access on a 10.0.100.0/29 network, which was far too small for the number of clients we wanted to connect, so I set the Cisco up to present a DHCP server on 192.168.6.1/24, which gave us enough IP addresses for ~250 clients.
We shoved the router and switch into the comms cabinet on site, and started patching in the odd floor ports for connecting the Wireless APs to.
Within a couple of hours, we had good, stable multi-AP wireless coverage of the upstairs floor, at least. Which was where most of the geeks were hanging out, so that was all good.
I decided that for my hack, I'd try to use Geckoboard, a customisable web dashboard, to display live stats from the router via SNMP.
The only minor problem with this plan was that I couldn't get a public IP on the router's Fa0/0 interface, only the one from within the RFC1918 network of the office.
My basic plan was this:
Part 1 was trivial. I toyed with the net-snmp libraries for Python, and then decided that the quickest way to do it was to shell out to snmpget with the subprocess module, and shove it through a regex to grab the value of a Counter32 type.
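A minimal sketch of that approach, for illustration rather than the code that ran on the day. The function names, the "public" community string and the OID in the usage note are assumptions; the regex targets the usual net-snmp output format of "... = Counter32: 12345".

```python
import re
import subprocess

# Matches the value in net-snmp output such as:
#   IF-MIB::ifInOctets.1 = Counter32: 123456789
COUNTER32_RE = re.compile(r"Counter32:\s*(\d+)")


def parse_counter32(output):
    """Return the integer Counter32 value from snmpget output, or None."""
    match = COUNTER32_RE.search(output)
    return int(match.group(1)) if match else None


def snmp_get_counter(host, community, oid):
    """Shell out to snmpget and return the counter value as an int."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, host, oid],
        capture_output=True, text=True, check=True, timeout=10,
    )
    value = parse_counter32(result.stdout)
    if value is None:
        raise ValueError("no Counter32 in output: %r" % result.stdout)
    return value
```

Something like snmp_get_counter("192.168.6.1", "public", "IF-MIB::ifInOctets.1") would then return the raw octet count, assuming the interface of interest happens to be index 1 on the router.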
That bit worked fine, and was reliable enough without having to fiddle with low-level libraries. Sidenote: There appears to be no decent high-level interface to SNMP for Python.
Part 2 was also trivial, and basically involved calculating a delta for the amount of data sent and received on Fa0/0, which is the LAN interface on the Cisco.
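The delta calculation could look roughly like the sketch below. The function names are hypothetical, and the modulo handles the case where a 32-bit counter wraps back to zero between two polls (it assumes at most one wrap per interval).

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current):
    """Octets transferred between two Counter32 readings, allowing one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def bits_per_second(previous, current, interval_seconds):
    """Average throughput over the polling interval, in bits per second."""
    return counter_delta(previous, current) * 8 / interval_seconds
```

For example, two readings 30 seconds apart that differ by 3,750,000 octets work out to 1,000,000 bits per second.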
As I've mentioned, I wasn't able to get a public IP, so direct pull/push to the VM was out. But as Amazon Web Services were a sponsor, it seemed only fitting to fire up a Micro instance, and serve the files from there.
I could have set up some intricate service between the two, using SSH forwarding and RabbitMQ in order to deliver the data between the collector and the presentation server, but I instead opted for a far lower level solution. Fewer moving parts, so to speak.
I created a user on the EC2 node, and a user on the VM, created SSH keys, and copied the .pub across, then had the python daemon write out the JSON object to ./tmp, then shell out again with subprocess to scp, and transfer the file across to the EC2 presentation server.
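Here is a rough sketch of the write-then-scp step. The function names, paths and hostnames are placeholders, and the payload shape is left generic because it depends on the dashboard widget being fed. Writing to a temporary file and renaming it is an addition of this sketch, so that scp never picks up a half-written file.

```python
import json
import os
import subprocess
import tempfile


def write_json_atomically(payload, path):
    """Write payload as JSON to path via a temp file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, path)


def scp_command(local_path, user, host, remote_path):
    """Build the scp argument list; BatchMode stops it prompting for a password."""
    return ["scp", "-q", "-o", "BatchMode=yes", local_path,
            "%s@%s:%s" % (user, host, remote_path)]


def publish(payload, local_path, user, host, remote_path):
    """Write the JSON locally, then copy it to the presentation server."""
    write_json_atomically(payload, local_path)
    subprocess.run(scp_command(local_path, user, host, remote_path),
                   check=True, timeout=30)
```

With key-based auth already in place, calling publish() once per polling cycle is all the "transport layer" there is.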
From there, it was easy to just install Apache, throw in an
AddType application/json .json
line to the Default config, and serve the files from there.
It wasn't realtime per se, as you can't easily poll SNMP data every second (without overloading the router), so it had an effective granularity of about 30s. Which as it turns out is fine, as 30s is Geckoboard's finest refresh granularity too.
At 3AM on Sunday morning, I added Line Charts to the dashboard, showing current and historical usage (over the last 3 hours or so). To do this, I added Redis to the stack of things powering the app, and basically, every time I grabbed new data, I'd do a LPUSH to a key storing a bunch of time-series values for the data usage delta, then pull the last 100 or so, and use those to build the JSON object.
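The Redis side of that can be sketched as below. The functions take any client exposing lpush, ltrim and lrange (as redis-py's Redis class does); the key name, the trim length and the function names are assumptions, and the LTRIM is an extra not mentioned above, added so the list can't grow forever.

```python
def record_delta(client, key, delta, keep=1000):
    """Push the newest delta onto the head of the list and cap its length."""
    client.lpush(key, delta)
    client.ltrim(key, 0, keep - 1)


def recent_deltas(client, key, count=100):
    """Return up to `count` most recent deltas, oldest first, for charting."""
    raw = client.lrange(key, 0, count - 1)  # after LPUSH, index 0 is the newest
    values = [float(v.decode() if isinstance(v, bytes) else v) for v in raw]
    values.reverse()  # charts generally want oldest-to-newest
    return values
```

Because LPUSH adds to the head of the list, LRANGE 0 99 returns the newest values first, hence the reverse before building the chart data.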
This is an example of the output (I've set it to serve static files now, so it's Sunday Evening's data forever). https://ldnrealtimetrouter.geckoboard.com/dashboard/2A2BCD35F43DE96C/
Geckoboard is a lovely slick dashboard, and I really enjoyed using it in my hack. I have only 2 minor issues with it, both of which I discussed with the Geckoboard team over the weekend. Firstly, the numeric value widget defaults to "Financial" data presentation, i.e. 1,000,000,000 becomes 1B, not 1G (ish), which is irritating for data presentation use. It's also not obvious whether it's using long-scale or short-scale Billions.
The other bug comes when using the Simple Line Chart widget: you can basically only use a maximum of ~250 datapoints in your line, as it's effectively a wrapped call to Google Charts, and beyond that length, you hit restrictions on URL length.
This bug is effectively solved by using the HighCharts widget.
Another bit of antiquated network hardware from my personal collection is my Axis 205 IP Camera, which I also brought along (mostly for fun), and then proceeded to set up as a streaming webcam for Saturday and Sunday.
The Axis 205, although initially a bugger to set up, as it doesn't have a particularly intuitive setup procedure involving static ARP and pings and so on, is a pretty robust camera, and probably one of the smallest IP cameras available.
It provides a web interface to a MotionJPEG stream, which is excellent if you're viewing on the LAN, but a bit of a bitch to proxy.
You can't use a straight-off HTTP reverse proxy, like Varnish or just Apache, as it doesn't work like that. It's a bit more like having to proxy a websocket.
Whilst my previous hack had been effectively stateless communication between the Local VM and the EC2 instance, this would require a bit more ingenuity to get the traffic across the wire in one piece.
I experimented with a simple netcat pipeline, basically one netcat to listen to the MJPEG stream, and then pipe it into another on the EC2 instance, but this doesn't work, because once you've got the stream, you can't very easily present it to a bunch of people.
VLC apparently can't transcode MJPEG. Sadly.
So this Node Proxy was pretty much the only sensible solution.
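The core problem a proxy like that solves is fan-out: one connection to the camera, many viewers. As an illustration only (this is a Python sketch of the idea with made-up names, not how the Node proxy is actually implemented), each viewer gets its own bounded queue, and a slow viewer drops frames rather than holding everyone else up.

```python
import queue
import threading


class FrameBroadcaster:
    """Fan one stream of frames out to many subscribers."""

    def __init__(self, max_backlog=10):
        self._lock = threading.Lock()
        self._subscribers = set()
        self._max_backlog = max_backlog

    def subscribe(self):
        """Register a viewer and return the queue its frames will arrive on."""
        q = queue.Queue(maxsize=self._max_backlog)
        with self._lock:
            self._subscribers.add(q)
        return q

    def unsubscribe(self, q):
        """Remove a viewer, e.g. when its HTTP connection closes."""
        with self._lock:
            self._subscribers.discard(q)

    def publish(self, frame):
        """Hand a frame from the single upstream reader to every viewer."""
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            try:
                q.put_nowait(frame)
            except queue.Full:
                pass  # slow viewer: drop this frame rather than stall the rest
```

One thread would read JPEG frames from the camera and call publish(), while each client handler loops on its own queue and writes the frames out as a multipart response.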
This document was most useful in the creation of a Point-to-Point VPN between my VM and EC2, all over SSH. The advantage of this was being able to present a Layer 3 interface from the Amazon EC2 instance to the VM, without any special port forwarding, or connectivity, so it could open as many ports as it wants without any special configuration.
Many people are familiar with the use of ssh to forward ports, but few are aware that it can actually be used to create a point-to-point tunnel. Basically, you get a pair of tun0 devices, one on each end of your tunnel, assign IP addresses to them, and away you go.
I didn't even need to set up routing, as all I needed was for one side to be able to connect to the other.
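For the curious, the commands involved look roughly like what this sketch builds. It assumes OpenSSH's -w option (which needs root at both ends and "PermitTunnel yes" in the server's sshd_config) and the Linux ip tool; the function name, the /30 prefix and the example addresses in the usage note are illustrative guesses rather than the exact values used.

```python
def tunnel_commands(remote, local_ip, peer_ip, tun=0, prefix=30):
    """Return (ssh_cmd, local_setup, remote_setup) as argument lists.

    ssh -w N:N asks OpenSSH to create a tunN device on each end; the
    setup commands then give each end an address and bring it up.
    """
    device = "tun%d" % tun
    ssh_cmd = ["ssh", "-N", "-w", "%d:%d" % (tun, tun), remote]

    def setup(ip):
        return [
            ["ip", "addr", "add", "%s/%d" % (ip, prefix), "dev", device],
            ["ip", "link", "set", device, "up"],
        ]

    return ssh_cmd, setup(local_ip), setup(peer_ip)
```

For instance, tunnel_commands("root@ec2.example.com", "172.16.254.10", "172.16.254.9") gives the ssh invocation to run from the VM, plus the two pairs of ip commands to run on either side once the tun devices exist.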
I ran a local proxy on the VM, which listened to the IP Camera's MJPEG stream from 192.168.6.6, and presented it as a new stream on the tunnel interface (172.16.254.10).
On the EC2 instance, I ran another instance of the Node Proxy, to listen to 172.16.254.10, and re-broadcast the MJPEG stream on the public interface of the EC2 node. I reserved an Elastic IP for this, just to make it a little bit easier, and provide something we could point the A record (webcam.londonrealtime.co.uk) at more easily.
Could've done it with a CNAME, but Elastic IPs are more stable.
I also considered using the Amazon VPC and connecting an IPSEC tunnel to the Cisco directly, but this would've taken me all weekend to set up, as I didn't seem to have VPC enabled, and getting it enabled was taking some time. This was quicker, but potentially dirtier, and did get the job done.
So by the end of Saturday, we had the Live Router Stats working, and then by midnight on Saturday, we had a streaming webcam feed.
The webcam feed had an interesting side-effect on the router stats: as more people connected, they each got about 1.0Mbit of video streamed to them every second. By the time it came to the prize giving on Sunday afternoon, we were pushing about 55.0Mbit out of the router. The highest I saw it get to was 72Mbit out, 25Mbit in. Which is pretty damn impressive for a 10 year old Cisco.
I also found some time to help out Lawrence Job with his interesting "hardware hack", GoCubed. He'd brought along an Arduino, and a strip of 32 RGB LEDs, and was interfacing the GoSquared API (which gives you a current visitors count), with the LED Strip, for a live visitors count.
Together, we wrote a bit of VB.Net to drive the Serial port interface to the Arduino (not having an Ethernet shield), to grab the data from the GoSquared API, and present it to the Arduino in a sensible format.
You can see more of that here.
So that was London Realtime. I had a lot of fun, it was my first-ever hackathon, and I found a great niche to work in.
Many thanks again to all of the API sponsors, and the people of White Bear Yard for putting up with us all. Thanks to Chris Leydon, James Gill, Geoff Wagstaff, James Taylor, Saul Cullen and the rest of the GoSquared team for making this an insanely good weekend.
Here's some videos from Friday