Free alternative to RocketStream Station? - ftp

I just tried RocketStream Station (www.rocketstream.com) as an alternative to FTP for transferring large files over long Internet distances (relatively high latency) and was blown away by its performance, which for me was over 125 times the speed of FTP. It uses UDP for the data channel, or a protocol they call "PDP".
Are there any free (or cheaper) alternatives to this application?

Kurt,
We use a product from a company called Data Expedition, Inc. Their ExpeDat software is a high-speed file transfer product that works very well for us (our average file size is 4 GB, 45 Mbps lines, between California and Europe). They are also UDP-based, and we compared them to a couple of vendors roughly 18 months ago when we were making our decision. RocketStream was one of the vendors Data Expedition beat out for our situation.
Hope this helps!

I also have to echo Jim's statement above re: ExpeDat and Data Expedition. Our particular scenario involves 100 Mbps connections between NY and the UK, with tens of GBs being moved daily. For some reason most of the UDP-based applications we tested had a hard time scaling and were only getting us ~50-60% throughput, while ExpeDat consistently gets us 90%+.
Technology aside, these guys have been really easy to work with (no BS on support or pricing, etc.) and we're pretty glad we found them. They're definitely a smaller company, but one we've recommended up and down to our clients. www.dataexpedition.com is their site. Good luck all!

Dropbox is always a good option for sharing files with people. Works very reliably and you get 2 gigs for free.

Related

LAMP stack performance under heavy traffic loads

I know the title of my question is rather vague, so I'll try to clarify as much as I can. Please feel free to moderate this question to make it more useful for the community.
Given a standard LAMP stack with more or less default settings (a bit of tuning is allowed, client-side and server-side caching turned on), running on modern hardware (16 GB RAM, 8-core CPU, unlimited disk space, etc.), deploying a reasonably complicated CMS service (a Drupal or WordPress project, for argument's sake) - what amounts of traffic, SQL queries, and user requests can I reasonably expect to accommodate before I have to start thinking about performance?
NOTE: I know the specifics will greatly depend on the details of the project, i.e. optimizing MySQL queries, indexing stuff, minimizing filesystem hits - and I'm assuming the web developers did a professional job. I'm really looking for a very rough figure in terms of visits per day, traffic during peak visiting times, how many records before (transactional) MySQL fumbles, and so on.
I know the only way to really answer my question is to run load testing on a real project, and I'm concerned that my question may be treated as partly off-topic.
I would like to get a set of figures from people with first-hand experience, e.g. "we ran such and such a set-up and it handled at least this much load [problems started surfacing after such and such]". I'm also greatly interested in any condensed (I'm short on time at the moment) reading I can do to get a better understanding of the matter.
P.S. I'm meeting a client tomorrow to talk about his project, and I want to be prepared to reason about performance if his project turns out to be akin to FourSquare.
Very tricky to answer without specifics, as you have noted. If I were tasked with what you have to do, I would take each component in turn (network interface, CPU/memory, physical IO load, SMP locking, etc.), get the maximum capacity available, and divide by a rough estimate of use per request.
For example, network I/O. You might have one 1 Gb card, which might achieve maybe 100 MB/sec (I tend to use 80% of the theoretical max). How big will a typical 'hit' be? Perhaps 3 KB on average, for HTML, images, etc. That means you can achieve about 33k requests per second before you bottleneck at the physical level. These numbers are absolute maximums; depending on tools and skills you might not get anywhere near them, but nobody can exceed them.
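If it helps to make that repeatable, here is a tiny back-of-the-envelope helper (a sketch only; the numbers are the illustrative ones above, not measurements from any real system):

    def max_requests_per_sec(capacity_bytes_per_sec, bytes_per_request, derate=0.8):
        # Theoretical ceiling for one component, derated to 80% of spec by default.
        return (capacity_bytes_per_sec * derate) / bytes_per_request

    # A 1 Gbit/s NIC is 125 MB/s raw; the 80% derate gives ~100 MB/s usable.
    nic_bytes_per_sec = 125e6
    avg_hit_bytes = 3e3    # ~3 KB per hit (HTML, small images, ...)

    print(max_requests_per_sec(nic_bytes_per_sec, avg_hit_bytes))   # ~33,000 req/s

Run the same estimate with the per-request cost of disk IO, CPU time, and so on, and the smallest result is your first bottleneck.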
Repeat the above for every component, perhaps varying your numbers a little, and you will build a quick picture of what is likely to be a concern. Then consider how you can quickly get more capacity in each component: can you just throw money at it and gain more performance (e.g. use SSDs instead of HDs)? Or will you hit a limit that cannot be moved without rearchitecting? Also take into account what resources you have available: do you have lots of skilled programmer time, DBAs, or wads of cash? If you have lots of a resource, you can usually reduce those constraints more easily and quickly as you move along the experience curve.
Do not forget external components too; firewalls may have limits that are lower than expected for sustained traffic.
Sorry I cannot give you real numbers; our workloads use custom servers, heavy in-memory caching, and other tricks, and not all the products you list. However, I would concentrate most on IO/SQL queries and possibly network IO, as these tend to be harder limits than CPU/memory, although I'm sure others will have a different opinion.
Obviously, the question is such that it does not have a "proper" answer, but I'd like to close it and give some feedback. The client meeting has taken place, performance was indeed a biggie, and their hosting platform turned out to be on the Amazon cloud :)
From research I've done independently:
Memcache is a must (see the sketch after this list);
MySQL (or whatever persistent storage instance you're running) is usually the first to go. Solutions include running multiple virtual instances and replicating data between them, distributing the load;
http://highscalability.com/ is a good read :)
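For the Memcache point, a minimal cache-aside sketch of what that usually looks like (assumptions: a local memcached on the default port, the pymemcache client, and a hypothetical render_article_from_mysql() as the slow path):

    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))    # assumes memcached on the default port

    def get_article_html(article_id):
        key = "article:%d" % article_id
        html = cache.get(key)
        if html is not None:
            return html                      # cache hit: skip MySQL entirely
        html = render_article_from_mysql(article_id)   # hypothetical slow path
        cache.set(key, html, expire=300)     # keep it warm for 5 minutes
        return html

The point is simply that repeated reads stop hitting MySQL at all, which is usually where the LAMP stack hurts first.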

Accelerated downloads with HTTP byte range headers

Has anybody got any experience of using HTTP byte ranges across multiple parallel requests to speed up downloads?
I have an app that needs to download fairly large images from a web service (1 MB+) and then send out the modified files (resized and cropped) to the browser. There are many of these images, so it is likely that caching will be ineffective - i.e. the cache may well be empty. In this case we are hit by some fairly large latency whilst waiting for the image to download, 500 ms+, which is over 60% of our app's total response time.
I am wondering if I could speed up the download of these images by using a group of parallel HTTP Range requests, e.g. each thread downloads 100kb of data and the responses are concatenated back into a full file.
Does anybody out there have any experience of this sort of thing? Would the overhead of the extra downloads negate a speed increase, or might this technique actually work? The app is written in Ruby, but experiences / examples from any language would help.
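To make the idea concrete, here's roughly what I have in mind, sketched in Python rather than Ruby (the URL is a placeholder, and it assumes the image service honours Range requests and reports Content-Length):

    import concurrent.futures
    import requests

    def fetch_range(url, start, end):
        # Ask for an inclusive byte range; a server that supports it replies 206.
        resp = requests.get(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        resp.raise_for_status()
        return start, resp.content

    def parallel_download(url, chunk_size=100 * 1024, workers=4):
        total = int(requests.head(url).headers["Content-Length"])
        ranges = [(s, min(s + chunk_size - 1, total - 1))
                  for s in range(0, total, chunk_size)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(lambda r: fetch_range(url, *r), ranges))
        return b"".join(chunk for _, chunk in sorted(parts))   # reassemble in order

    # data = parallel_download("http://images.example.com/big.jpg")   # placeholder URL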
A few specifics about the setup:
There are no bandwidth or connection restrictions on the service (it's owned by my company)
It is difficult to pre-generate all the cropped and resized images, there are millions with lots of potential permutations
It is difficult to host the app on the same hardware as the image disk boxes (political!)
Thanks
I found your post by Googling to see if someone had already written a parallel analogue of wget that does this. It's definitely possible and would be helpful for very large files over a relatively high-latency link: I've gotten >10x improvements in speed with multiple parallel TCP connections.
That said, since your organization runs both the app and the web service, I'm guessing your link is high-bandwidth and low-latency, so I suspect this approach will not help you.
Since you're transferring large numbers of small files (by modern standards), I suspect you are actually getting burned by the connection setup more than by the transfer speeds. You can test this by loading a similar page full of tiny images. In your situation you may want to go serial rather than parallel: see if your HTTP client library has an option to use persistent HTTP connections, so that the three-way handshake is done only once per page or less instead of once per image.
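For illustration, in Python the requests library gives you this as soon as you route everything through one Session object (the host is a placeholder and process() stands in for your resize/crop step):

    import requests

    session = requests.Session()    # pooled keep-alive connections, reused across requests

    # With a Session the TCP handshake is not repeated for every image.
    for name in ["a.jpg", "b.jpg", "c.jpg"]:
        img = session.get("http://images.example.com/" + name)   # placeholder URLs
        process(img.content)        # hypothetical downstream resize/crop step

Most HTTP client libraries, Ruby's included, have an equivalent way of holding a connection open.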
If you end up getting really fanatical about TCP latency, it's also possible to cheat, as certain major web services like to.
(My own problem involves the other end of the TCP performance spectrum, where a long round-trip time is really starting to drag on my bandwidth for multi-TB file transfers, so if you do turn up a parallel HTTP library, I'd love to hear about it. The only tool I found, called "puf", parallelizes by files rather than byteranges. If the above doesn't help you and you really need a parallel transfer tool, likewise get in touch: I may have given up and written it by then.)
I've written the backend and services for the sort of place you're pulling images from. Every site is different so details based on what I did might not apply to what you're trying to do.
Here's my thoughts:
If you have a service agreement with the company you're pulling images from (which you should because you have a fairly high bandwidth need), then preprocess their image catalog and store the thumbnails locally, either as database blobs or as files on disk with a database containing the paths to the files.
Doesn't that service already have the images available as thumbnails? They're not going to send a full-sized image to someone's browser either... unless they're crazy or sadistic and their users are crazy and masochistic. We preprocessed our images into three or four different thumbnail sizes so it would have been trivial to supply what you're trying to do.
If your request is something they expect then they should have an API or at least some resources (programmers) who can help you access the images in the fastest way possible. They should actually have a dedicated host for that purpose.
As a photographer I also need to mention that there could be copyright and/or terms-of-service issues with what you're doing, so make sure you're above board by consulting a lawyer AND the site you're accessing. Don't assume everything is ok, KNOW it is. Copyright laws don't fit the general public's conception of what copyrights are, so involving a lawyer up front can be really educational, plus give you a good feeling you're on solid ground. If you've already talked with one then you know what I'm saying.
I would guess that using any P2P network would be useless, as there are more permutations than frequently used files.
Downloading a few parts of a file in parallel can give an improvement only on slow networks (slower than 4-10 Mbps).
To get any improvement from parallel downloads you need to ensure there will be enough server power. From your current problem (waiting over 500 ms for a connection) I assume you already have a problem with the servers:
you should add/improve load-balancing,
you should think about changing the server software to something with better performance
And again, if 500 ms is 60% of the total response time, then your servers are overloaded; if you think they are not, you should look for the bottleneck in the connections or in server performance.

Proving that replacing hardware will improve developer performance

Now the machines we are forced to use are 2 GB RAM, Intel Core 2 Duo E6850 @ 3 GHz CPU...
The policy within the company is that everyone has the same computer no matter what, and that they are on a 3-year refresh cycle... meaning I will have this machine for the next 2 years... :S
We have been complaining like crazy, but they said they want proof that upgrading the machines will provide exactly X time saving before doing anything... and with that, they are only semi-considering giving us more RAM...
Even when you put forward that developer resources are much more expensive than hardware, they first say go away, then after a while they say prove it. As far as they are concerned, paying wages comes from a different bucket of money than the machines, so they don't care (i.e. the people who can replace the machines don't care, because paying wages doesn't come out of their pockets)...
So how can I prove that $X benefit will be gained by spending $Y on new hardware...
The stack I'm working with is as follows: VS 2008, SQL 2005/2008. As duties dictate, we are SQL admins as well as Web/WinForm/WebService developers. So it's very typical to have 2 VS sessions and at least one SQL session open at the same time.
Cheers
Anthony
Actually, the main cost for your boss is not the lost productivity. It is that his developers don't enjoy their working conditions. This leads to:
loss of motivation and productivity
more stress causing illness
external opportunities causing developers to go away
That sounds like a decent machine for your stack. Have you proven to yourself that you're going to get better performance, using real-world tests?
Check with your IT people to see if you can get the disks benchmarked, and max out the memory. Management should be more willing to take these incremental steps first.
The machine looks fine apart from the RAM.
If you want to prove this sort of thing, time all the things you wait for (typically load times and compile times), add it all up, and work out how much it costs to have you sit around. From that, make some sort of guess at how much time you'll save (it'll have to be a guess unless you can compare like with like, which is difficult if they won't upgrade your systems). You'll probably find that they'll make the money back on the RAM alone in next to no time - and that's before you even begin to factor in the loss of productivity from people's minds wandering whilst they wait for stuff to happen.
Unfortunately if they're skeptical then it's unlikely you can prove it to them in a quantitative way alone. Even if you came up with numbers, they'll probably question the methodology. I suggest you see if they're willing to watch a 10 minute demo (maybe call it a presentation), and show them the experience of switching between VS instances (while explaining why you'd need to switch and how often), show them the build process (again explaining why you'd need to create a build and how often), etc.
Ask them if you're allowed to bring your own hardware. If you're really convinced it would make you more productive, upgrade it yourself and when you start producing more ask for a raise or to be reimbursed.
Short of that though..
I have to ask: what else are you running? I'm not really that familiar with that stack, but it really shouldn't be that taxing. Are they forcing you to run some kind of system-slowing monitoring or antivirus app?
You'd probably have better luck convincing them to let you change that than getting them to roll out new hardware.
If you really must convince them, your best bet is to benchmark your machine as accurately as you can and price out exactly what you need upgraded. It's a lot easier to get them to agree to an exact (and low) dollar amount than to some open-ended upgrade.
Even discussing this with them for more than five minutes will cost more than just calling your local PC dealer and buying the RAM out of your own pocket. Ask your project lead whether they can put it on the tab of the project as another "development tool". If (s)he can't, don't bother and cough up the cash yourself.
When they come complaining, put the time of the meetings about this on their budget (since they are the ones who came crying). See how long they can take that.
When we had the same issue, my boss bought better gfx cards for the whole team out of his own pockets and went to the PC guys to get each of us a second monitor. A few days later, he went again to get each of us 2GB more RAM, too.
The main cost of slow developer machines comes from slow builds and 'context switching', i.e. the time it takes you to switch between the tasks required of you:
Firing up the second instance of VS and waiting for it to load and build
Checking out or updating a source tree
Starting up another instance of VS or checking out a clean source tree to 'have a quick look at' some bug that's been assigned
Multiple build/debug cycles to fix difficult bugs
The mental overhead in switching between different tasks, which shouldn't be underestimated
I made a case a while ago for new hardware after doing a breakdown of the amount of time that was wasted waiting for the machine to catch up. In a typical day we might need to do 2 or 3 full builds at half an hour each. The link time was around 3 minutes, and in a build/debug cycle you might do that 40 times a day. So that's 3.5 hours a day waiting for the machine. The bulk of that is in small 2 or 3 minute pockets which isn't long enough for you to context switch and do something else. It's long enough to check your mail, check stackoverflow, blow your nose and that's about it. So there's nothing else productive you can do with that time.
If you can show that a new machine will build the full project in 15 minutes and link in 1 minute then that's theoretically given you an extra 2 hours of productivity a day (or more realistically, the potential for more build cycles).
So I would get some objective timings that show how long it takes for different parts of your work cycle, then try to do comparative timings on machines with 4GB of RAM, a second drive (eg something fast like a WD Raptor), an SSD, whatever, to come up with some hard figures to support your case.
EDIT: I forgot to mention: present this as your current hardware making you lose productivity, and put a cost on the amount of time lost by multiplying it by a typical developer hourly rate. On this basis I was able to show that a new PC would pay for itself in about a month.
Take a task you do regularly that would be improved with faster hardware - ex: running the test suite, running a build, booting and shutting down a virtual machine - and measure the time it takes with current hardware and with better hardware.
Then compute the monthly, or yearly cost: how many times per month x time gained x hourly salary, and see if this is enough to make a case.
For instance, suppose you made $10,000/month and gained 5 minutes a day with a better machine. Your hourly rate is roughly $10,000 / (20 work days x 8 hours/day) = $62.50, so the loss to your company would be around (5/60 hours lost per day) x 20 work days/month x $62.50/hour = about $104/month, or roughly $1,250/year lost because of the machine (assuming I didn't mess up the math...). Now, before talking to your manager, think about whether this number is significant.
Now this is assuming that 1) you can measure the improvement, even though you don't have a better machine, and 2) while you are "wasting" your 5 minutes a day, you are not doing anything productive, which is not obvious.
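If it helps, the arithmetic packages up neatly so you can plug in your own numbers (a sketch only; the figures are the example ones from above, not claims about any particular shop):

    def monthly_cost_of_waiting(minutes_lost_per_day, monthly_salary,
                                work_days=20, hours_per_day=8):
        # Salary cost of dead time, based on a simple loaded hourly rate.
        hourly_rate = monthly_salary / (work_days * hours_per_day)
        hours_lost = (minutes_lost_per_day / 60.0) * work_days
        return hours_lost * hourly_rate

    # 5 minutes/day at $10,000/month comes to ~$104/month, ~$1,250/year.
    print(monthly_cost_of_waiting(5, 10000))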
For me, the cost of a slow machine is more psychological, but it's hard to quantify - after a few days of having to wait for things to happen on the PC, I begin to get cranky, which is both bad for my focus, and my co-workers!
It’s easy; hardware is cheap, developers are expensive. Throwing reasonable amounts of money at the machinery should be an absolute no brainer and if your management doesn’t understand that and won’t be guided by your professional opinion then you might be in the wrong job.
As for your machine, throw some more RAM at it and use a fast disk (have a look at how intensive VS is on disk IO using the resource monitor – it’s very hungry). Lots of people going towards 10,000 RPM or even SSD these days and they make a big difference to your productivity.
Try this; take the price of the hardware you need (say fast disk and more RAM), split it across a six month period (a reasonable time period in which to recoup the investment) and see what it’s worth in “developer time” each day. You’ll probably find it only needs to return you a few minutes a day to pay for itself. Once again, if your management can’t understand or support this then question if you’re in the right place.

How can I estimate ethernet performance?

I need to think about the performance limitations of 100 Mbps Ethernet (including scenarios with up to ~100 endpoints on the same subnet), and I'm wondering how best to go about estimating the capacity of the network. Are there any rules of thumb for this?
The reason I ask is that I am working on some back-of-the-envelope level calculations about performance limitations, so it doesn't need to be incredibly accurate. I just haven't been through this exercise before and was hoping to gain some insight from those who have. Mark Brackett's answer (as of 1/26) is along the lines of what I am looking for.
If you're using switches (and, honestly, who isn't these days) - then I've found 80% of capacity a reasonable estimate. Usually, it's really about 90% because of TCP overhead - but 80% accounts for occasional retransmits.
If it's a single collision domain (hubs), then you'd probably be around 30% with moderate activity on those 100 nodes. But, it'd be pretty variable based on the traffic generated. And anyone putting 100 nodes in a single CD these days would no doubt be shot - so I don't think you'll actually run into those IRL.
Edit: Note that these numbers are for a relatively healthy network - one that is generally defined as working. Extremely excessive broadcasts or other anomalous traffic patterns have been known to bring a network to its knees.
Use WANem
WANem is a Wide Area Network Emulator, meant to provide a real experience of a Wide Area Network/Internet during application development/testing over a LAN environment.
You can simulate any network scenario with it and then test your application's behaviour. It is open source and available on SourceForge.
Link: WANem - The Wide Area Network emulator
OPNET creates software for simulating network performance. I once used OPNET IT Guru Academic Edition. Maybe this application or some other software from OPNET may be of some help.
100 endpoints are not supposed to be an issue. If the network is properly configured (nothing special), the only issue is bandwidth. Fast Ethernet (100 Mbps) should be able to transfer almost 10 MB (megabytes) per second, whether to one client or to many. If you are using hubs instead of switches, or half-duplex instead of full-duplex, then you should change that (this is the rule of thumb).
Working from the title of your post, "How can I estimate Ethernet performance", see this wiki link: http://en.wikipedia.org/wiki/Ethernet_frame#Maximum_throughput
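To save anyone the arithmetic from that page, the ceiling for full-size frames works out as follows (a sketch; the fixed per-frame costs are preamble/SFD, MAC header, FCS, and the inter-frame gap):

    LINK_BPS  = 100e6              # Fast Ethernet raw bit rate
    PAYLOAD   = 1500               # max payload bytes per frame
    OVERHEAD  = 8 + 14 + 4 + 12    # preamble/SFD + MAC header + FCS + inter-frame gap
    WIRE_SIZE = PAYLOAD + OVERHEAD # 1538 bytes on the wire per full frame

    # Payload throughput: ~97.5% of the link, i.e. about 12.2 MB/s.
    print(LINK_BPS * PAYLOAD / WIRE_SIZE / 8 / 1e6)

    # Goodput after IP + TCP headers (40 bytes): about 11.9 MB/s, before ACKs,
    # retransmits, and application overhead push real numbers toward the 80% rule above.
    print(LINK_BPS * (PAYLOAD - 40) / WIRE_SIZE / 8 / 1e6)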

Average User Download Speeds

Any ideas what the average user's download speed is? I'm working on a site that streams video and am trying to figure out what the average download speed is, so I can decide what quality to offer.
I know I might be comparing apples with oranges, but I'm just looking for something to use as a basis for where to start.
Speedtest.net has a lot of stats broken down by country, region, city and ISP. Not sure about accuracy, since it's only based on the people using their "bandwidth measurement" service.
It would depend on the geography that you are targeting. For example, in India, you can safely assume it would be a number below 256kbps.
Try attacking it from the other angle. Look at streaming services that cater to the customers you want and have significant volume (maybe YouTube), and see what they're pushing. You'll find there's a pretty direct correlation between Alexa rating (popularity) and quality (minimum bitrate required). Vimeo will always have fewer users than YouTube because the user experience is poor for low-bitrate users.
There are many other factors, and this should only form one small facet of your bandwidth decision, but it's a useful comparison to make.
Keep in mind, however, that you want to degrade gracefully. As more and more sites come online you'll start bumping into ISPs that limit total transfer, so being able to tell your customers how much of their bandwidth your site is consuming is useful, as is being able to proclaim that you are a low-bandwidth site.
Further, more and more users are on portable cellular connections (iPhone) where limited bandwidth is a big deal. AT&T has oversold many markets, so being able to get useful video through a tiny link will let you capture market that Vimeo and Hulu cannot.
Quite frankly, though, the best thing to do is to degrade gracefully on the fly. Measure the bandwidth of the connection continuously and adjust the bitrate as needed for a smooth playback experience with good audio. Then you can take all users across the gamut...
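To sketch that "measure and adjust" idea (everything here is a placeholder: the bitrate ladder, the headroom factor, and the fetch_segment hook; real players do this per segment, HLS/DASH style):

    import time

    BITRATES_KBPS = [300, 700, 1500, 3000]    # hypothetical quality ladder

    def measure_kbps(fetch_segment):
        # Time one media-segment download; fetch_segment is a placeholder callable
        # that returns the segment's bytes.
        start = time.time()
        data = fetch_segment()
        elapsed = max(time.time() - start, 1e-6)
        return (len(data) * 8 / 1000.0) / elapsed

    def pick_bitrate(measured_kbps, headroom=0.8):
        # Choose the highest rung that fits comfortably under the measured rate.
        usable = measured_kbps * headroom
        fitting = [b for b in BITRATES_KBPS if b <= usable]
        return fitting[-1] if fitting else BITRATES_KBPS[0]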
-Adam
You could try looking at the lower tier offerings from AT&T and Comcast. Probably 1.5 Mbps for the basic level (which I imagine most people get).
The "test your bandwidth" sites may have some stats on this, too.
There are a lot of factors involved (server bandwidth, local ISP, the network in between, etc.) which make it difficult to give a hard answer. With my current ISP, I typically get 200-300 kB/sec, although when the planets align I've gotten as much as 2 MB/sec (the "quoted" peak downlink speed). That was with parallel streams, however; the peak bandwidth I've achieved on a single stream is 1.2 MB/sec.
The best strategy is always to give your users options. Why don't you start the stream at a low bitrate that will work for everyone and provide a "High Quality" link for those of us with FTTH connections? I believe YouTube has started doing this.
According to CWA, the average US resident has a 1.9Mbps download speed. They have data by state, so if you have money then you can probably get a more specific report for your intended audience. Keep in mind, however, that more and more people are sharing this with multiple computers, using VOIP devices, and running background processes that consume bandwidth.
-Adam
Wow.
This is so dependent on the device, connection method, connection type, ISP throttling, etc. involved in the end-to-end link.
To try and work out an average speed would be fairly impossible.
Think of a fat pipe at home (8 Gb plus) versus the bad free wireless connection at the airport (9.6 kbps), and you can start to get an idea of the range of connections you're trying to average over.
Then we move onto variations in screen sizes and device capabilities.
Maybe trawl the UA strings of incoming connections to get an idea of the capabilities of the user devices being used out there.
Maybe see if you can use some sort of geolocation solution to try and see how people are connecting to your site to get an idea of connection capabilities as well.
Are you offering the video in a fixed format, i.e. X x Y pixel size?
HTH.
cheers,
Rob
If I'm using your site, "average" doesn't matter. All I care about is MY experience, so you either need to make the site adaptive, design for a pretty low speed (iPhone 2G gets you 70-80 kbps if you're lucky, to take one common case), or be very clear about the requirements so I can decide whether or not my connection-of-the-moment will work.
What you don't want to subject your users to is unpredictably choppy, intermittent video and audio.
