Use only for downloading: FTP vs HTTP in my server? [closed] - download

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I want to get some files from my computer (text, video, images) and I'd like to download them to a folder on my Android device. I have been looking for alternatives and I think that there are two ways for doing that, but I don´t know if there is a great difference in using one or another.
Which protocol is better and why, FTP or HTTP, in my case? I don`t need uploading anything, and the size of the files is not too big. (I guess around 5M the biggest file)
I think HTTP is easier and FTP is fastest, could be? But I would like, thinking in programming, which is better.

In terms of speed, for file sizes larger than roughly 10kB both are equivalent. The difference is that FTP sends pure, raw data on its data channel without any headers so it has a slightly smaller overhead. But HTTP sends only around 12 or so lines of text as header for each file before blasting raw data onto the channel. So for files of around 10kB or less, yes HTTP overhead can be quite high - around 1% to 2% of the total bandwidth. For large files, the dozen or so lines of HTTP header becomes negligible.
FTP wastes one socket though for the control channel so for lots of users HTTP is twice more scalable. Remember, your OS has a limited number of sockets that it can open.
And finally, the most important consideration is that a lot of people access the internet through firewalls. Be it corporate, or school or dormitory or apartment building. And a lot of firewalls are configured to only allow HTTP access. You may find sometimes you can't get access to your files because of this. There are ways around this of course but it's one additional hassle you have to think about.
Additional answer:
I saw you asking about access restriction and security. The slight downside with HTTP is that you need to write your own web app to implement this. Web servers like Apache can be configured to do this just by writing a configuration file using HTTP basic authentication.
Luckily, people have had this problem before and some of them have written software to do this. Google around for "HTTP file server" and you'll find many implementations. Here's one fairly good open source web app: http://pfn.sourceforge.net/
Also, if you really want security you should set up SSL/TLS for your server regardless weather you end up using FTP or HTTP.

I would recommend HTTP. It allows you downloading file with multiple connections, you can easily share urls and you would also be able to download it in a restricted environment where all the ports except http are blocked.
FTP is more suitable if you want to control access to files on per user basis and require good amount of uploading also.
Addition:
You can implement security in http also using .htaccess files. However, it is not very scalable and not suitable for too many users with different access rights.
There are several other methods of protecting file on http. You will be able to find a lot of open source utilities on http://sourceforge.net which will let you do that. When speed is concerned, http is best. It allows you to fetch arbitrary part of the file and hence it is possible to have a multi thread download.
You will notice that most of file sharing sites use http and it is so for scalability reason.

Related

CDN-server with http/1.1 vs. webserver with http/2

I have a hosted webserver with http/2 (medium fast) and additionally I have a space on a fast CDN-Server with only http/1.1.
Is it recommended to load some ressources from the CDN or should I use only the webserver because of http/2?
Loading too many recources from the CDN could be a bottleneck due to http/1.1?
Would be kind to get some hints...
You need to test. It really depends on your app, your users and your servers.
Under HTTP/1.1 you are limited to 6 connections to a domain. So hosting content on a separate domain (e.g. static.example.com) or loading from a CDN was a way to increase that limit beyond 6. These separate domains are also often cookie-less as they are on separate domains which is good for performance and security. And finally if loading jQuery from code.jquery.com then you might benefit from the user already having downloaded it for another site so save that download completely (though with the number of versions of libraries and CDNs the chance of having a commonly used library already downloaded and in the browser cache is questionable in my opinion).
However separate domains requires setting up a separate connection. Which means a DNS lookup, a TCP connection and usually an HTTPS handshake too. This all takes time and especially if downloading just one asset (e.g. jQuery) then those can often eat up any benefits from having the assets hosted on a separate site! This is in fact why browsers limit the connections to 6 - there was a diminishing rate of return in increasing it beyond that. I've questioned the value of sharded domains for a while because of this and people shouldn't just assume that they will be faster.
HTTP/2 aims to solve the need for separate domains (aka sharded domains) by removing the need for separate connections by allowing multiplexing, thereby effectively removing the limit of 6 "connections", but without the downsides of separate connections. They also allow HTTP header compression, reducing the performance downside to sending large cookies back and forth.
So in that sense I would recommended just serving everything from your local server. Not everyone will be on HTTP/2 of course but the support is incredible strong so most users should.
However, the other benefit of a CDN is that they are usually globally distributed. So a user on the other side of the world can connect to a local CDN server, rather than come all the way back to your server. This helps with connection time (as TCP handshake and HTTPS handshake is based on shorter distances) and content can also be cached there. Though if the CDN has to refer back to the origin server for a lot of content then there is still a lag (though the benefits for the TCP and HTTPS setup are still there).
So in that sense I would advise to use a CDN. However I would say put all the content through this CDN rather than just some of it as you are suggesting, but you are right HTTP/1.1 could limit the usefulness of that. That's weird those as most commercial CDNs support HTTP/2, and you also say you have a "CDN server" (rather than a network of servers - plural) so maybe you mean a static domain, rather than a true CDN?
Either way it all comes down to testing as, as stated at the beginning of this answer it really depends on your app, your users and your servers and there is no one true, definite answer here.
Hopefully that gives you some idea of the things to consider. If you want to know more, because Stack Overflow really isn't the place for some of this and this answer is already long enough, then I've just written a book which spends large parts discussing all this: https://www.manning.com/books/http2-in-action

Optimizing File Cacheing and HTTP2

Our site is considering making the switch to http2.
My understanding is that http2 renders optimization techniques like file concatenation obsolete, since a server using http2 just sends one request.
Instead, the advice I am seeing is that it's better to keep file sizes smaller so that they are more likely to be cached by a browser.
It probably depends on the size of a website, but how small should a website's files be if its using http2 and wants to focus on caching?
In our case, our many individual js and css files fall in the 1kb to 180kb range. Jquery and bootstrap might be more. Cumulatively, a fresh download of a page on our site is usually less than 900 kb.
So I have two questions:
Are these file sizes small enough to be cached by browsers?
If they are small enough to be cached, is it good to concatenate files anyways for users who use browsers that don't support http2.
Would it hurt to have larger file sizes in this case AND use HTTP2? This way, it would benefit users running either protocol because a site could be optimized for both http and http2.
Let's clarify a few things:
My understanding is that http2 renders optimization techniques like file concatenation obsolete, since a server using http2 just sends one request.
HTTP/2 renders optimisation techniques like file concatenation somewhat obsolete since HTTP/2 allows many files to download in parallel across the same connection. Previously, in HTTP/1.1, the browser could request a file and then had to wait until that file was fully downloaded before it could request the next file. This lead to workarounds like file concatenation (to reduce the number of files required) and multiple connections (a hack to allow downloads in parallel).
However there's a counter argument that there are still overheads with multiple files, including requesting them, caching them, reading them from cache... etc. It's much reduced in HTTP/2 but not gone completely. Additionally gzipping text files works better on larger files, than gzipping lots of smaller files separately. Personally, however I think the downsides outweigh these concerns, and I think concatenation will die out once HTTP/2 is ubiquitous.
Instead, the advice I am seeing is that it's better to keep file sizes smaller so that they are more likely to be cached by a browser.
It probably depends on the size of a website, but how small should a website's files be if its using http2 and wants to focus on caching?
The file size has no bearing on whether it would be cached or not (unless we are talking about truly massive files bigger than the cache itself). The reason splitting files into smaller chunks is better for caching is so that if you make any changes, then any files which has not been touched can still be used from the cache. If you have all your javascript (for example) in one big .js file and you change one line of code then the whole file needs to be downloaded again - even if it was already in the cache.
Similarly if you have an image sprite map then that's great for reducing separate image downloads in HTTP/1.1 but requires the whole sprite file to be downloaded again if you ever need to edit it to add one extra image for example. Not to mention that the whole thing is downloaded - even for pages which just use one of those image sprites.
However, saying all that, there is a train of thought that says the benefit of long term caching is over stated. See this article and in particular the section on HTTP caching which goes to show that most people's browser cache is smaller than you think and so it's unlikely your resources will be cached for very long. That's not to say caching is not important - but more that it's useful for browsing around in that session rather than long term. So each visit to your site will likely download all your files again anyway - unless they are a very frequent visitor, have a very big cache, or don't surf the web much.
is it good to concatenate files anyways for users who use browsers that don't support http2.
Possibly. However, other than on Android, HTTP/2 browser support is actually very good so it's likely most of your visitors are already HTTP/2 enabled.
Saying that, there are no extra downsides to concatenating files under HTTP/2 that weren't there already under HTTP/1.1. Ok it could be argued that a number of small files could be downloaded in parallel over HTTP/2 whereas a larger file needs to be downloaded as one request but I don't buy that that slows it down much any. No proof of this but gut feel suggests the data still needs to be sent, so you have a bandwidth problem either way, or you don't. Additionally the overhead of requesting many resources, although much reduced in HTTP/2 is still there. Latency is still the biggest problem for most users and sites - not bandwidth. Unless your resources are truly huge I doubt you'd notice the difference between downloading 1 big resource in I've go, or the same data split into 10 little files downloaded in parallel in HTTP/2 (though you would in HTTP/1.1). Not to mention gzipping issues discussed above.
So, in my opinion, no harm to keep concatenating for a little while longer. At some point you'll need to make the call of whether the downsides outweigh the benefits given your user profile.
Would it hurt to have larger file sizes in this case AND use HTTP2? This way, it would benefit users running either protocol because a site could be optimized for both http and http2.
Absolutely wouldn't hurt at all. As mention above there are (basically) no extra downsides to concatenating files under HTTP/2 that weren't there already under HTTP/1.1. It's just not that necessary under HTTP/2 anymore and has downsides (potentially reduces caching use, requires a build step, makes debugging more difficult as deployed code isn't same as source code... etc.).
Use HTTP/2 and you'll still see big benefits for any site - except the most simplest sites which will likely see no improvement but also no negatives. And, as older browsers can stick with HTTP/1.1 there are no downsides for them. When, or if, you decide to stop implementing HTTP/1.1 performance tweaks like concatenating is a separate decision.
In fact only reason not to use HTTP/2 is that implementations are still fairly bleeding edge so you might not be comfortable running your production website on it just yet.
**** Edit August 2016 ****
This post from an image heavy, bandwidth bound, site has recently caused some interest in the HTTP/2 community as one of the first documented example of where HTTP/2 was actually slower than HTTP/1.1. This highlights the fact that HTTP/2 technology and understand is still new and will require some tweaking for some sites. There is no such thing as a free lunch it seems! Well worth a read, though worth bearing in mind that this is an extreme example and most sites are far more impacted, performance wise, by latency issues and connection limitations under HTTP/1.1 rather than bandwidth issues.

how does one identify why a website is slow? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I was asked this question once at an interview:
"Suppose you own a website where the server is at some remote location. One day, some user calls/emails you saying the site is abominably slow. How would you identify why the site is slow? Also, when you check the website yourself as any user would (using your browser), the site behaves just fine."
I could think of only one thing (which was shot down):
Check the server logs to analyse incoming traffic. Maybe a DoS attack or exceptionally high traffic. Interviewer told me to assume the server has normal traffic and no DoS.
I was kind of lost because I had never thought of this problem. I have almost no idea how running a server/website works. So if someone could highlight a few approaches, it would be nice.
While googling around, I could find only this relevant, wonderful article. That article is kind of too technical for me now, but I'm slowly breaking it down and understanding it.
Since you already said when you check the site yourself the speed is fine, this means that (at least for the pages you checked) there is nothing wrong with the server and it can serve those pages at a good speed. What you should be figuring out at this point is what the difference is between you and the user that reports your site is slow. It might be a lot of different things:
Is the user using a slow network connection (mobile for example)?
Does the user experience the same problems with other websites hosted at the same webhoster? If so, this could indicate a network problem. Normally this could also indicate a resource problem at the webserver, but in that case the site would also be slow for you.
If neither of the above leads to an answer, you could assume that the connection to the server and the server itself are fine. This means the problem must be in the users device. Find out which browser/OS he uses and try to replicate the problem. If that fails find out if he uses any antivirus or similar software that might cause problems.
This is a great tool to find the speed of web pages and tells you what makes it slow: https://developers.google.com/speed/pagespeed/insights
I think one of the important thing that is missing from above answers is the server location, which can play a vital in web performance.
When someone is saying that it is taking a longer time to open a web page that means high latency. High latency can be caused due to server location.
Let's assume as you are the owner of the web page then the server and client are co-located, so it will have a low latency.
But, now if client is across the border, then latency time will increase drastically. And hence a slow perfomance.
Another factor is caching which drastically affects the latency time.
Taking the example of facebook, they have server all over the world to reduce the latency time (and also to provide several other advantages) and they use huge caching system to cache their hot data (trending topics) whereas cold data (old data) are stored in hard disk so it takes a longer time to load an older photo or post.
So, a user might would have complained about this as they were trying load up some cold data.
I can think of these few reasons (first two are already mentioned above):
High Latency due to location of client
Server memory might need to be increased
Number of service calls from the page.
If a service could be down at the time of complaint, it could prevent page from loading.
The server load might be too high at the time of the poor experience. The server might need to increase the resources (e.g. adding another server/web server to the cluster).
Check if there was any background job running on the server at that time.
It is important to check the logs and schedules of the batch jobs to determine what all was running at that time.
Hope this help.
Normally the user takes the page loading time as a measure to find out that the site is slow. But if you really want to know that what is taking the maximum time the you can open the browser debugger by pressing f12. if your browser is chrome the click on network and see what calls your application is making and which are taking maximum time. If you are using Firefox the you need to install firebug. If you have that, then again press f12 and click on Net.
One reason could be the role of the user is different of your role. You might be having suppose an administrator privilege (some thing like super user role) and the code might be just allowing everything for such role that means it does not really do much of conditional checking to see what is allowed or not. Some times, it's a considerable over ahead to get all the privileges of the user and have the conditions checking, how course depends how how the authorization is implemented. That means, the page might be really slow for specific roles. Hence, you should find out the roles of the user and see if that is a reason.
Obviously an issue with the connection of the person connecting to your site OR it's possible it was a temporary issue and by the time you checked your site, everything was dandy. You could check your logs or ask your host if there was an issue at the time the slow down occured.
This is usually a memory issue and it can be resolved by increasing the Heap Size of the Web Server hosting the application. In case the application is running on Weblogic Server. Heap size can be increased in "setEnv" file located in Application Home.
Goodluck!
Michael Orebe
Though your question is quite clear, web site optimisation is a very extensive subject.
The majority of the popular web developing frameworks are for some reason, extremely processor inefficient.
The old fashioned way of developing n-tier web applications is still very relevant and is still considered to be best practice according the W3C. If you take a little time to read the source code structure of the most popular web developing frameworks you will see that they run much more code at the server than is necessary.
This may seem a bit of a simple answer but, the less code you run at the server and the more code you run at the client the faster your servers will work.
Sometimes contrasting framework code against the old fashioned way is the best way to get an understanding of this. Here is a link to a fully working mini web application which represents W3C best practices and runs the minimum amount of code at the server and the maximum amount of code at the client: http://developersfound.com/W3C_MVC_EX.zip this codebases is also MVC compliant.
This codebase comes with a MySQL database dump, php and client side code. To see this code in action you will need to restore the SQL dump to a MySQL instance (sql dump came from MySQL 8 Community) and add the user and schema permissions that are found in the php file (conn_include.php); setting the user to have execute permissions on the schema.
If you contrast this code base against all of the most popular web frameworks, it will really open your eyes to just how inefficient these frameworks are. The popular PHP frameworks that claim to be MVC frameworks aren’t actually MVC compliant at all. This is because they rely on embedding PHP tags inside HTML tags or visa-versa (considered very bad practice according the W3C). Also most popular node frameworks run way more code at the server than is necessary. Embedded tags also stop asynchronous calls from working properly unless the framework supports AJAX dumps such as Yii 2.
Two of the most important rules to follow with MVC compliance is: never embed server side tags (such as PHP tags) in HTML tags or visa-versa (unless there is a very good excuse such as SEO) and religiously never write code to run at the server if it can be run at the client. Also true MVC is based on tier separation, where as the MVC frameworks are based on code separation. True MVC compliance is very processor efficient. Don’t get me wrong MVC frameworks are very useful for a lot of things, but if you’re developing a site that is going to get millions of hits, they are quite useless, or at least they will drive your cloud bills so high that it will really eat into your company’s profits.
In summary frameworks don’t give much control over what code runs at the client or server and are very inefficient but you can get prototypes up and running quicker with less code.
In contrast the old fashioned way takes a bit more elbow grease but you have complete control over what runs at the server and what runs at the client.
As an additional bit of advice for optimisation avoid using pass-through queries and triggers and instead opt for stored procedures. Historically stored procedures weren’t invented at the time MVC was present as a paradigm but it definitely increases separation of concerns between the tiers and is much more processor efficient.
Hope this advice helps.

Some way to send data from one FTP server to another without intermediary?

I'm trying to move a large amount of files from one CDN to another. I know that they have a very high-speed connection to each other, so what I'd like to do is connect them directly, but the only protocol I have access to with each is FTP. Is there any way to log into the current CDN and send their files to the other FTP? It seems like it should be possible, I just have no idea how to do it.
FXP: see Wikipedia article. Powerful FTP clients are capable of doing this. From protocol point of view it's trivial.
BTW this question is probably offtopic here.

Why is p2p web hosting not widely used? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
We can see the growth of systems using peer to peer principles.
But there is an area where peer to peer is not (yet) widely used: web hosting.
Several projects are already launched, but there is no big solution which would permit users to use and to contribute to a peer to peer webhosting.
I don't mean not-open projects (like Google Web Hosting, which use Google resources, not users'), but open projects, where each user contribute to the hosting of the global web hosting by letting its resources (cpu, bandwidth) be available.
I can think of several assets of such systems:
automatic load balancing
better locality
no storage limits
free
So, why such a system is not yet widely used ?
Edit
I think that the "97.2%, plz seed!!" problem occurs because all users do not seed all the files. But if a system where all users equally contribute to all the content is built, this problem does not occur any more. Peer to peer storage systems (like Wuala) are reliable, thanks to that.
The problem of proprietary code is pertinent, as well of the fact that a user might not know which content (possibly "bad") he is hosting.
I add another problem: the latency which may be higher than with a dedicated server.
Edit 2
The confidentiality of code and data can be achieved by encryption. For example, with Wuala, all files are encrypted, and I think there is no known security breach in this system (but I might be wrong).
It's true that seeders would not have many benefits, or few. But it would prevent people from being dependent of web hosting companies. And such a decentralized way to host websites is closer of the original idea of the internet, I think.
This is what Freenet basically is,
Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.
[...]
Users contribute to the network by giving bandwidth and a portion of their hard drive (called the "data store") for storing files. Unlike other peer-to-peer file sharing networks, Freenet does not let the user control what is stored in the data store. Instead, files are kept or deleted depending on how popular they are, with the least popular being discarded to make way for newer or more popular content. Files in the data store are encrypted to reduce the likelihood of prosecution by persons wishing to censor Freenet content.
The biggest problem is that it's slow. Both in transfer speed and (mainly) latency.. Even if you can get lots of people with decent upload throughput, it'll still never be as quick a dedicated servers or two.. The speed is fine for what Freenet is (publishing data without fear of censorship), but not for hosting your website..
A bigger problem is the content has to be static files, which rules out it's use for a majority of high-traffic websites.. To serve dynamic data each peer would have to execute code (scary), and would probably have to retrieve data from a database (which would be another big delay, again because of the latency)
I think "cloud computing" is about as close to P2P web-hosting as we'll see for the time being..
P2P website hosting is not yet widely used, because the companion technology allowing higher upstream rates for individual clients is not yet widely used, and this is something I want to look into*.
What is needed for this is called Wireless Mesh Networking, which should allow the average user to utilise the full upstream speed that their router is capable of, rather than just whatever some profiteering ISP rations out to them, while they relay information between other routers so that it eventually reaches its target.
In order to host a website P2P, a sort of combination of technology is required between wireless mesh communication, multiple-redundancy RAID storage, torrent sharing, and some kind of encryption key hierarchy that allows various users different abilities to change the data that is being transmitted, allowing something dynamic such as a forum to be hosted. The system would have to be self-updating to incorporate the latter, probably by time-stamping all distributed data packets.
There may be other possible catalysts that would cause the widespread use of p2p hosting, but I think anything that returns the physical architecture of hardware actually wiring up the internet back to its original theory of web communication is a good candidate.
Of course as always, the main reason this has not been implemented yet is because there is little or no money in it. The idea will be picked up much faster if either:
Someone finds a way to largely corrupt it towards consumerism
Router manufacturers realise there is a large demand for WiMesh-ready routers
There is a global paradigm shift away from the profit motive and towards creating things only to benefit all of humanity by creating abundance and striving for optimum efficiency
*see p2pint dot darkbb dot com if you're interested in developing this concept
For our business I can think of 2 reasons not to use peer hosting:
Responsiveness. Peer hosted solutions are often reliable because of the massive number of shared resources, but they are also nutoriously unstable. So the browsing experience will be intermittent.
Proprietary data/code. If I've written custom logic for my site I don't want everyone on the network having access. You also run into privacy issues with customer data.
If I were to donate some of my PCs CPU and bandwidth to some p2p web hosting service, how could I be sure that it wouldn't end up being used to serve child porn or other similarly disgusting content?
How many times have you seen "97.2%, please seed!!" for any random torrent?
Just imagine the havoc if even a small portion of the web became unavailable in this way.
It sounds like this idea would add a lot of cost to the individual seeder (bandwidth) without a lot of benefit.

Resources