Data Transfer Speeds: NFS vs HTTP

I am currently considering using REST access to Nirvanix online storage to store and download files. However, Nirvanix also offers NFS access to the network storage.
I was wondering if there are any known benchmarks or protocol-specific reasons for choosing REST over NFS?

Use whatever best fits your environment. Any difference is going to be negligible, especially over non-LAN-speed links where things like CPU usage become irrelevant as they're overwhelmed by the simple fact that the link is already saturated.
One possible exception is dealing with lots of little files. If your use case involves rapid access to a lot of little files, I'd suggest testing both and seeing if one is faster by a large enough margin to matter.
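If you want to put numbers on it, a rough comparison is easy to script. Below is a minimal, hypothetical benchmark sketch in Python: it assumes the same set of small files is reachable both through an NFS mount and over plain HTTP, and simply times reading each set. The mount point, base URL and file names are placeholders.

    import time
    import pathlib
    import urllib.request

    # Placeholders -- point these at your own NFS mount and HTTP endpoint.
    NFS_DIR = pathlib.Path("/mnt/nirvanix/smallfiles")
    HTTP_BASE = "https://storage.example.com/smallfiles/"
    FILENAMES = [f"file_{i:04d}.dat" for i in range(500)]

    def time_nfs():
        start = time.perf_counter()
        for name in FILENAMES:
            (NFS_DIR / name).read_bytes()        # read through the NFS mount
        return time.perf_counter() - start

    def time_http():
        start = time.perf_counter()
        for name in FILENAMES:
            with urllib.request.urlopen(HTTP_BASE + name) as resp:
                resp.read()                      # plain HTTP GET
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(f"NFS : {time_nfs():.2f} s for {len(FILENAMES)} files")
        print(f"HTTP: {time_http():.2f} s for {len(FILENAMES)} files")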

It's a toss-up.
NFS, with the right setup, version, and tuning, is just a tad slower than SMB/CIFS. Older versions, however, can be significantly slower.
What you do gain with NFS is:
primitive file access control (via standard Unix file permissions)
primitive share access control
user mapping
On platforms that support it, near-invisible operation: it looks just like another subdirectory...
However, if you are not working in a 100% NFS environment, you might find that it's not worth the effort.
By the way, for the record, Windows 7 Beta/RC does support NFS out of the box.

They should be almost the same, but there is one big difference: NFS traditionally runs over UDP (it can be configured to run over TCP), while HTTP runs over TCP. So if you have high packet loss, HTTP should be more stable.

NFS is not a file transfer protocol; it's a network file system protocol. Properly configured and implemented, HTTP should be able to beat it easily.
It will depend on the details of what you're trying to do. If you're just uploading and downloading entire files, then I suspect you'll be able to configure HTTP to do a lot better than NFS.
Recall also that NFS was created in an earlier time. Is NFS 2.0 still the latest version? I recall updating the code of an NFS implementation from 2 to 3. That was in 1996 or so.

Related

Implementing "extreme" bandwidth saving for web browsing with a compression proxy

I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). This extension runs an image-compression proxy on my Heroku account that accepts image URLs and returns a low-quality version of those images. This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in firefox/firefox-esr/google-chrome). This has an added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser in a virtual console such as elinks/lynx/links2 running over ssh (with zlib compression) on a VPS server. But sometimes using JavaScript becomes necessary, as sites will not render without it. Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to come back to using firefox/chrome, I find my bandwidth usage shooting up. I would like to avoid this.
I find that bandwidth is used partially to get the 'raw' html files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x when using lzma(2) compression.
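(For what it's worth, that factor is easy to reproduce with a throwaway check on any locally saved page; the file name below is just a placeholder:)

    import lzma
    import pathlib

    # Placeholder file name: any locally saved .html/.css/.js file will do.
    raw = pathlib.Path("saved_page.html").read_bytes()
    packed = lzma.compress(raw, preset=9)
    print(f"{len(raw)} -> {len(packed)} bytes "
          f"(factor {len(raw) / len(packed):.1f}x)")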
It seems to me that one way to drastically reduce bandwidth consumption would be to use the same template as the bandwidth-hero extension, i.e. run a compression proxy either on a VPS or on my Heroku account, but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser should send a request to this local proxy. This local proxy then sends a request to a back-end running on heroku/vps. The heroku/vps back-end actually fetches all the content, and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy. The local proxy decompresses the content and finally gives it to the browser.
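To make the idea concrete, here is a rough sketch of the remote half in Python (Flask, requests and the /fetch endpoint name are placeholders I made up, not an existing service): it fetches a URL, lzma-compresses the body and sends it back. The local proxy would call this endpoint and decompress the response before handing it to the browser.

    # Remote half, running on Heroku/VPS (assumes: pip install flask requests).
    import lzma
    import requests
    from flask import Flask, request, Response

    app = Flask(__name__)

    @app.route("/fetch")
    def fetch():
        url = request.args["url"]
        upstream = requests.get(url, timeout=30)
        body = lzma.compress(upstream.content, preset=6)
        return Response(
            body,
            mimetype="application/octet-stream",
            headers={"X-Original-Content-Type":
                     upstream.headers.get("Content-Type", "")},
        )

    if __name__ == "__main__":
        app.run(port=8900)

    # Local half, simplified: request a page through the remote proxy and decompress.
    # resp = requests.get("http://my-backend.example.com:8900/fetch",
    #                     params={"url": "https://www.stackoverflow.com/"})
    # html = lzma.decompress(resp.content)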
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for node.js. I am thinking of doing the same in Python.
From what Google searches show, HTTP clients can "automatically" ask for gzip versions of pages. But does this also apply to the associated files that are loaded by JavaScript, and to the CSS files? Perhaps what I am thinking about is already implemented by default?
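One way to check what a given site actually does: the client sends an Accept-Encoding header on every request, including the .js/.css requests triggered by the page, and the server decides per response whether to compress. A quick check with the requests library (which sends Accept-Encoding: gzip, deflate by default); the asset URL below is only a placeholder:

    import requests

    # Placeholder URLs -- substitute the page and assets you actually load.
    for url in ("https://www.stackoverflow.com/",
                "https://www.example.com/static/app.js"):
        r = requests.get(url)
        print(url, "->",
              r.headers.get("Content-Encoding", "not compressed"),
              f"({len(r.content)} bytes after decoding)")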
Any pointers would be welcome. I was thinking of writing the local proxy in Python, as I am reasonably fluent in it, but I know little about Heroku or the intricacies of HTTP.
thanks.
Update: I found a possible solution here: https://github.com/barnacs/compy, which does almost exactly what I need (minify + compress with brotli/gzip + transcode jpeg/gif/png). It uses Go instead of Python, but that does not really matter. It also has a Docker image here: https://hub.docker.com/r/andrewgaul/compy/. Since I'm not very familiar with Heroku, I can't figure out how to use this to run the compression proxy service on my account. The Heroku docs also weren't of much help to me. Any pointers would be welcome.

How to speed up the TYPO3 Backend?

Given: Each call to a BE module takes several seconds, even with an SSD drive. (A well-configured setup runs below 1 second for general BE tasks.)
What are likely bottlenecks?
How to check for them?
What options to speed up?
On purpose I don't give a specific configuration, but ask for a general checklist, so that the answer is suitable for many people as a first entry point.
General tips on performance tuning for TYPO3 can be found here: https://wiki.typo3.org/Performance_tuning
However, in my experience most general performance problems are due to one of a few reasons:
Bad/no caching. Usually this is a problem with one or more extensions (partly) disabling the cache. Try disabling all third-party extensions and enabling them one by one to see which causes the site to slow down the most. $GLOBALS['TSFE']->set_no_cache() will disable all caching, so you could search for that. USER_INT and COA_INT objects in TypoScript also disable the cache for anything configured inside them. (A quick way to search an installation for these strings is sketched a bit further below.)
A lot of data. Check the database for any tables containing a lot of data. How much constitutes "a lot" depends on many factors, but generally anything below a million records shouldn't be too much of a problem, unless for example you run queries with things like LIKE '%...%' on fields containing a lot of data.
Not enough resources on the server. To fix this, add more memory and/or CPU cores to the server. Or if it's a shared server, reduce the number of sites running on it.
Heavy traffic. No matter how many resources a server has, it will always have a limit to the number of requests it can process in a given time. If this is your problem you will have to look into load balancing and caching servers. If you don't (normally) have a lot of visitors, high traffic can still be caused by robots crawling your site too quickly. These are usually easy to block by IP address in your firewall or webserver configuration.
A slow backend on a server without any other traffic (you're the only one who can access it) rules out 1 (can only cause a slow backend if users are accessing the frontend and causing a high server load) and 4 (no other traffic).
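For point 1, a quick-and-dirty way to find such cache-disabling code is to scan the extension directory for the strings mentioned above. A minimal sketch (typo3conf/ext is the classic third-party extension location; adjust the path to your installation):

    import pathlib

    # Adjust to your installation; typo3conf/ext is the classic extension location.
    EXT_DIR = pathlib.Path("typo3conf/ext")
    NEEDLES = ("set_no_cache", "USER_INT", "COA_INT")

    for path in EXT_DIR.rglob("*"):
        if path.suffix not in (".php", ".ts", ".typoscript", ".txt"):
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        hits = [needle for needle in NEEDLES if needle in text]
        if hits:
            print(f"{path}: {', '.join(hits)}")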
One further aspect you could inspect: a lot of things are stored in the user record, for example the settings you used in the log module.
One setting which could consume a lot of memory (and time to serialize and deserialize) is the state of the page tree (which pages are expanded and which are not).
Cleaning the user settings could make the backend faster for this user.
If you have a large page tree and the user has to navigate through many pages, the effect will wear off again. Another drawback: you lose all settings, as there still is no selective cleaning.
I cannot comment here, but need to say: the TSFE object does absolutely nothing in the TYPO3 backend. The backend is always uncached. The TYPO3 backend is a standalone module to edit and maintain the frontend output. There are tons of Google search results that ignore this fact.
Possible performance bottlenecks are poorly written extensions that do rendering or data processing. Hooks into core functions are usually no big deal, but rendering many elements for edit forms (especially in TYPO3's Fluid template engine) can cause performance problems.
The Extbase DBAL layer can also cause massive performance problems. The reason is that the database model does not know about indexes. It's simple but stupid. An SQL join on a big table of 2000+ records will delay the output perceptibly, depending on the data model.
Also, the TYPO3 backend does not really depend on the TypoScript configuration, but because it is used to control some output or is loaded by extensions, the full parsing of the *.ts files is still needed. And this parser is very slow.
If you want to speed things up you need to know what goes wrong. The only way to debug this behaviour is to inspect the runtime with a PHP profiling tool like Xdebug, because the TYPO3 framework is very complex. It uses Doctrine and will load tons of files on every request. Thus a well-configured OPcache is a must.
The main reason the whole thing is slow is that it is poorly written. You can confirm that by inspecting the runtime.
In addition to what already has been said, put the runtime environment onto your checklist:
Memory:
If a heavy IDE and other tools are open at the same time, available memory can become an issue. To check the memory profile, you may start a tool that monitors the memory usage of the machine.
If virtualization is used, check the memory assigned to the box. Try whether assigning more memory improves behaviour.
If required and possible, give your machine more memory. This should not be a fix for poorly written code; bad code can blow up any amount of memory.
File access:
TYPO3 reads and writes thousands of files. If you work with a contemporary SSD, this is surprisingly fast. I did measure this. Loading all class files of TYPO3 takes just a fraction of a second.
However this may look different if you do not work with a standard setup. Many factors may slow you down:
USB-Sticks as storage.
Memory cards as storage.
All kinds of external storage may be limited by slow drivers.
Virtualization can become an issue. Again it's a question of drivers.
If in doubt, test by storing your files and DB on a different drive and compare the behaviour.
Routing:
The database itself may be fast, but bad routing of your requests may still slow you down. Think of firewalls, proxies etc., even on your local machine and especially if virtualisation is used.
Database connection:
A fast database connection is crucial. If the database access is slow, TYPO3 can't be fast.
Especially due to Extbase TYPO3 often queries much more data than really required and more often than really required, because a lot of relations are resolved in the PHP layer instead of the DB layer itself. Loading data structures like the root line may cause a lot of ping-pong between the PHP and the DB layer.
I can't give general advice on how to measure your DB connection; you may have to ask your admin for that. What you always can do is test and compare with another DB from a completely different environment.
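One thing you can always measure yourself is the raw round-trip latency to the database. A minimal sketch, assuming a MySQL/MariaDB server reachable with the pymysql driver and placeholder credentials; it just times a trivial query in a loop:

    import time
    import pymysql  # pip install pymysql

    # Placeholder credentials -- use the ones from your TYPO3 configuration.
    conn = pymysql.connect(host="127.0.0.1", user="typo3",
                           password="secret", database="typo3db")

    N = 200
    with conn.cursor() as cur:
        start = time.perf_counter()
        for _ in range(N):
            cur.execute("SELECT 1")
            cur.fetchone()
        elapsed = time.perf_counter() - start
    conn.close()

    print(f"{N} round trips in {elapsed * 1000:.1f} ms "
          f"({elapsed / N * 1000:.2f} ms per query)")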
The speed of the database may depend on the type of the database itself. Typically you use MySQL/Maria-DB which should be fast. It also depends on the factors mentioned above, memory, file access and routing.
Strategy:
Even without being an admin and knowing all the performance tools, you can always exchange parts of your system and check whether matters improve. With this approach you can localise the culprit without being an expert. Once you have spotted the culprit, Google may help you get more information.
When it comes to a clean and performant setup of routing or virtualisation it's still the best idea to ask an experienced admin.
Summary
This is all in addition to what others have already pointed to.
What would be really helpful is a BE plugin that analyses and measures the environment. Maybe there are some out there I don't know about.

How can I fine-tune cowboy's runtime behavior?

I'm in the process of choosing a technology for my high-throughput web server. I've created two naive implementations, one in Go and one in Elixir, using Phoenix.
I've deployed these versions on an extra large machine on AWS, and used siege to benchmark their performance.
I've managed to increase Go's performance after setting the GOMAXPROCS, but running the Elixir version seems to reach its peak performance long before it fully utilizes the machine's CPU or memory.
I couldn't seem to find any documentation or explanation on how I can fine-tune cowboy's behavior in production settings, so it will properly utilize the machine it runs on, and produce the performance everybody talks about...
I'm pretty sure that there is a simple place (file or environment variable) where I can tweak a value or two to produce much better results.
Can anyone tell me where that place may be?
Following the suggestions in the comments, I've re-implemented my project using plug instead of phoenix.
With the same functionality (parsing the POST body as JSON, calling DynamoDB, reading from an Amnesia table and formatting a JSON response) I got much better performance, with far higher resource utilization.
I guess I can still "milk" a few more requests per second (currently I get around 500 requests per second), but it is now on-par with the Go implementation of the same thing...
I don't have enough rep to comment directly, so I'll answer here. I'd love to see the numbers you got with Phoenix. Were you running in prod mode? Perf will be much slower if you were running in dev (the default), since code reloading is enabled and checked on every request. Vanilla Plug is going to be doing less work than Phoenix, but not much less. A standard Phoenix router/controller should be more or less in line with the Plug code you end up with.

How many connections/how much bandwidth can Apache handle?

This is a request for pointers to good documentation/good articles. I'm looking for information on how many connections an Apache server can reasonably handle, and potentially how to load balance between multiple servers. I've done Google searches but it's harder for beginners to judge what are good docs.
Apache 1.3 had some nasty scalability limitations, but later versions are designed to scale with the hardware and operating system, making them the bottleneck rather than the web server itself. As always, though, it comes down to how you configure and tune it if you want uber performance. Each situation has its own demands, and they're documented here:
http://httpd.apache.org/docs/2.2/misc/perf-tuning.html
The above assumes you're serving static content, which is where Apache excels. If you run webapps behind it, that's your bottleneck, not Apache.
Unfortunately you'll be disappointed.
Apache's ability to handle connections (and indeed any other web server's) is limited by what the web application sitting on top of it is doing. If you're serving static pages, you will be able to serve a lot of requests with very little hardware.
Depending on the IO workload (Apache cannot work faster than the IO subsystem - install enough ram to cache your entire content, if you can), you will be able to fill up a gigabit network on any reasonable spec modern box.
Once you've filled a gigabit network, you'll have other things to worry about.
But the reasons that you really need load balancers are because your application slows down Apache and uses up the box's resources. Your application will not be infinitely fast, nor infinitely scalable. You'll need to address those issues.
As the previous answers have pointed out it is generally not the case that Apache becomes the bottleneck, instead it is usually the application server (PHP, Mongrel, etc). However, if you are only serving static content then you will want to do some benchmarking to see how fast it can go. Of course it is unlikely to peg the exact number which Apache will be able to serve since a lot depends on how you configure it (e.g. disabling persistent connections) and the specs of the server. However to get a ballpark estimate you can use this benchmark as a reference since it is run on 1-8 cores (using one or two servers) so you should be able to find something reasonably comparable to the hardware you are considering.
Of course in order to get the most accurate results you will want to test it yourself using a load generator like ab or httperf.
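If you'd rather script a quick sanity check than set up httperf, a rough requests-per-second estimate is easy to get with a thread pool. The sketch below uses Python with the requests library against a placeholder URL; it is nowhere near as rigorous as ab or httperf, but gives a ballpark figure:

    import time
    import requests
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://test-server.example.com/static/test.html"  # placeholder
    TOTAL = 2000
    CONCURRENCY = 50

    def fetch(_):
        return requests.get(URL).status_code

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        codes = list(pool.map(fetch, range(TOTAL)))
    elapsed = time.perf_counter() - start

    print(f"{TOTAL} requests in {elapsed:.1f} s "
          f"-> {TOTAL / elapsed:.0f} req/s, {codes.count(200)} returned 200")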

Images in load balanced environment

I have a load balanced environment with over 10 web servers running IIS. All websites access a single file storage that hosts all the pictures. We currently have 200GB of pictures, stored in directories of 1000 images per directory. Right now all the images are on a single storage device (RAID 10) connected to a single server that acts as the file server. All web servers are connected to the file server on the same LAN.
I am looking to improve the architecture so that we would have no single point of failure.
I am considering two alternatives:
Replicate the file storage to all of the web servers so that they all access the data locally
Replicate the file storage to another storage so that if something happens to the current storage we would be able to switch to it.
Obviously the main operations done on the file storage are read, but there are also a lot of write operations. What do you think is the preferred method? Any other idea?
I am currently ruling out use of CDN as it will require an architecture change on the application which we cannot make right now.
Certain things I would normally consider before going for an architecture change are:
What are the issues of the current architecture?
What am I doing wrong with the current architecture? (If it has been working for a while, minor tweaks will normally solve a lot of issues.)
Will it allow me to grow easily? (There will always be an upper limit. Based on the past growth of data, you can plan it effectively.)
Reliability
Ease of maintenance / monitoring / troubleshooting
Cost
200GB is not a lot of data, so you can go for a home-grown solution or use something like a NAS, which will allow you to expand later on, and keep a hot-swappable replica of it.
Replicating the storage to all of the web servers is a very expensive setup, and as you said there are a lot of write operations, so it will have a large overhead in replicating to all the servers (which will only increase with the number of servers and growing data). There is also the issue of stale data being served by one of the other nodes. Apart from that, troubleshooting replication issues will be a mess with 10 and growing nodes.
Unless the lookup/read/write of files is very time critical, replicating to all the web servers is not a good idea. Web users will hardly notice a difference of 100-200 ms in load time.
There are some enterprise solutions for this sort of thing. But I don't doubt that they are expensive. NAS doesn’t scale well. And you have a single point of failure which is not good.
There are some ways you can write code to help with this. You could cache the images on the web servers the first time they are requested; this will reduce the load on the image server.
You could get a master slave set up, so that you have one main image server but other servers which copy from this. You could load balance these, and put some logic in your code so that if a slave doesn’t have a copy of an image, you check on the master. You could also assign these in priority order so that if the master is not available the first slave then becomes the master.
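A minimal sketch of that cache-on-first-request logic (written in Python purely for illustration, since the actual sites run on IIS; the cache directory and file server URL are placeholders): check the local copy first, otherwise fetch from the central file server and store it locally for next time.

    import pathlib
    import urllib.request

    LOCAL_CACHE = pathlib.Path("/var/cache/images")       # per-webserver cache (placeholder)
    MASTER_BASE = "http://fileserver.internal/images/"    # central file server (placeholder)

    def get_image(relative_path: str) -> bytes:
        """Return image bytes, caching them locally on first request."""
        cached = LOCAL_CACHE / relative_path
        if cached.exists():
            return cached.read_bytes()                    # hit: serve the local copy
        with urllib.request.urlopen(MASTER_BASE + relative_path) as resp:
            data = resp.read()                            # miss: fall back to the file server
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)                          # remember it for next time
        return data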
Since you have so little data in your storage, it makes sense to buy several big HDs or use the free space on your web servers to keep copies. It will take the strain off your backend storage system, and when it fails, you can still deliver content to your users. Even better, if you need to scale (more downloads), you can simply add a new server and the stress on your backend won't change much.
If I had to do this, I'd use rsync or unison to copy the image files in the exact same space on the web servers where they are on the storage device (this way, you can swap out the copy with a network file system mount any time).
Run rsync every now and then (for example after any upload, or once a night; you'll know best what schedule fits your setup).
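For reference, the rsync call itself is a one-liner; here it is wrapped in a small Python helper so it could be triggered right after an upload (hostnames and paths are placeholders; -a is archive mode, --delete removes files that were deleted on the source):

    import subprocess

    # Placeholders -- mirror the storage layout 1:1 on the web server.
    SRC = "storage.internal:/data/images/"
    DST = "/var/www/images/"

    def sync_images():
        # -a: recurse and preserve permissions/times; --delete: drop files removed on SRC.
        subprocess.run(["rsync", "-a", "--delete", SRC, DST], check=True)

    if __name__ == "__main__":
        sync_images()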
A more versatile solution would be to use a P2P protocol like BitTorrent. This way, you could publish all the changes on the storage backend to the web servers and they'd optimize the updates automatically.

Resources