
SSH over UDP: How to swallow an elephant

Thursday, December 16th, 2010

(This is a rehash of a message I sent to a private list.  Those of you who aren’t interested in theoretical network discussion, read no further.)

I just got back from a very successful network rollout (100 Mbps in 4 business days, to a government building no less) that ultimately failed because of the limitations of TCP and latency: the servers receiving the data were too far away for reasonable throughput.  After some thought, it seems that my problems could have been resolved if I were just streaming dumb packets (UDP) instead of TCP streams that have to wait on the ~150 ms ACK round trip.  I know I can’t be the first to encounter this, and in these edge cases some TCP-over-UDP method of packet delivery would seem suitable despite the hackiness of the solution.

My goal: move large packet streams over mildly (max 5%) lossy network elements with long latency (max 600 ms) but high bandwidth.  A 100 Mbps pipe is worthless if you’re trying to move a 10 GB file across a transcontinental (or longer) link with any realistic latency or even the slightest packet loss.  But if you’re prepared to re-send some data and hold things in a buffer for a few seconds, it seems like a vast speed improvement could be obtained.
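To put rough numbers on that claim, here is a back-of-the-envelope sketch.  It uses two well-known bounds: a single TCP stream cannot move more than one window per round trip, and under random loss the Mathis et al. approximation puts steady-state throughput near C * MSS / (RTT * sqrt(p)).  The 64 KiB window (no window scaling) and the 1460-byte MSS are my assumptions for illustration, not measurements from the rollout.

```python
import math

def window_limited_bps(window_bytes, rtt_s):
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes * 8 / rtt_s

def loss_limited_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Mathis et al. approximation for TCP throughput under random loss."""
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))

def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return link_bps * rtt_s / 8

if __name__ == "__main__":
    # Assumed values: 64 KiB window, 1460-byte MSS (not from the original rollout).
    print("window-limited, 150 ms RTT: %.2f Mbps"
          % (window_limited_bps(65535, 0.150) / 1e6))
    print("loss-limited, 600 ms RTT, 5%% loss: %.2f Mbps"
          % (loss_limited_bps(1460, 0.600, 0.05) / 1e6))
    print("buffer to fill 100 Mbps at 600 ms: %.1f MB"
          % (bdp_bytes(100e6, 0.600) / 1e6))
```

Under those assumptions the window bound works out to about 3.5 Mbps at 150 ms, the loss bound to roughly 0.1 Mbps in the worst case I described, and the buffer needed to keep a 100 Mbps pipe full at 600 ms to about 7.5 MB per side, which is trivial on a modern machine.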

It seems that UDP would be an ideal way to transmit large blocks of data, with a very large buffer at each end and a non-sequential retransmission strategy for lost packets.  This would not be simply re-writing TCP over UDP, with the inherent ACK path for each datagram.  It would be a larger re-write, with large blocks of datagrams collected and ACK’ed in reply packets, and with re-transmits possible anywhere within the buffer pool and not just at its very end.  This seems doable, given that most modern machines have VERY large memory capabilities and the network is typically the weak link.  This might lead to “bursty” output while waiting for blocks to complete during retransmits, but I’d think that much more data would get through over the same time period than with a comparable TCP stream.
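A minimal sketch of the block-ACK idea, as an in-memory simulation rather than real sockets.  Everything here is hypothetical and for illustration only: the function names, the 1000-byte payload, the 64-datagram block size, and the simplifying assumption that the receiver's report itself never gets lost.  It also handles one block at a time for clarity, where a real sender would keep several blocks in flight.

```python
import random

def lossy(datagrams, loss_rate, rng):
    """Model the network: drop each datagram independently."""
    return [d for d in datagrams if rng.random() >= loss_rate]

def transfer(data, payload=1000, block=64, loss_rate=0.05, seed=1):
    """Send data in blocks; the receiver reports missing sequence
    numbers once per block and only those datagrams are re-sent."""
    rng = random.Random(seed)
    chunks = [data[i:i + payload] for i in range(0, len(data), payload)]
    received = {}   # receiver-side buffer: sequence number -> payload
    sent = 0        # datagrams put on the wire, including re-sends
    reports = 0     # receiver-to-sender reports (assumed never lost)
    for start in range(0, len(chunks), block):
        pending = list(range(start, min(start + block, len(chunks))))
        while pending:
            # Blast everything still outstanding without waiting per packet.
            burst = [(seq, chunks[seq]) for seq in pending]
            sent += len(burst)
            for seq, body in lossy(burst, loss_rate, rng):
                received[seq] = body
            # One report per burst: the list of holes left in this block.
            pending = [seq for seq in pending if seq not in received]
            reports += 1
    out = b"".join(received[i] for i in range(len(chunks)))
    return out, sent, reports

if __name__ == "__main__":
    data = bytes(range(256)) * 400
    out, sent, reports = transfer(data)
    print("intact:", out == data, "datagrams:", sent, "reports:", reports)
```

The point of the sketch is the ratio: the number of reports coming back is tiny compared with the number of datagrams going out, so the sender spends its time filling the pipe instead of waiting on round trips.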

The goal here is to require no kernel access and no control over network elements in the path.  This should be 100% user-land, installable on generic UNIX-style hosts without root permissions.

Yes, I have done a bit of Googling on this, but there’s a flood of responses.  Iproxy seems to be for multicast.  bbftp is interesting with the multi-stream method, but is limited to file transfer and not generic TCP connections.  atou seems to require heavy kernel modifications on each side.  I found some ssh-over-UDP sites that are vague on details, and they seem to not be sophisticated at all.  They still block at the point where ACKs are requested back on a packet-by-packet basis, instead of blasting out huge piles of data and then selectively backfilling if the receiver reports drops.

It would seem to me that SSH would be a great place to shim this in.  The number of services that can run over SSH is growing, and the tunnel capability (both UDP and TCP) and port re-direction seem to be an already versatile set of methods that would benefit from such a shim component to increase bandwidth.  It also has the advantage of having native file transfer (scp) that is well-supported.

Does anyone know of research on this that has already been done, or shim layers that already exist to take advantage of UDP’s fill-the-pipe methodology?  It looks like some people have done experiments, but the data is behind a paywall and/or it is unclear that what I’m looking for has actually been attempted.

JT

notes:
http://www.csm.ornl.gov/~dunigan/net100/bulk.html
http://www.csm.ornl.gov/~dunigan/net100/atou.html
http://doc.in2p3.fr/bbftp/
http://horms.net/projects/iproxy/
http://code.google.com/p/udptunnel/
http://publications.lib.chalmers.se/cpl/record/index.xsql?pubid=123799

update: Several people replied to me privately, with these for-pay options which seem to do pretty much what I’m talking about.  A free (open source) variation of this embedded in SSH might be a game-changer.  This seems well within the range of a graduate CS project.

http://www.dataexpedition.com/
http://www.asperasoft.com/images/Aspera_Technology_Capabilities_2010.pdf