Storing Facebook photos on my system - performance

My application is joined at the hip with the Facebook application. I show the user's photos and their friends' photos from Facebook. I am debating whether to store the users' (and their friends') photos on my system. Which is better for performance: storing the photos on my system, or retrieving them from Facebook at run-time?

Unless your system has some inherent advantage (such as local storage), Facebook's server setup is likely to be more optimized than your own. For example:
They use CDNs, so unless you do too, requests to their servers will take fewer network hops than requests to yours.
They likely have servers in more geographic locations than you do, so on average a user will reach one of their servers in fewer hops than one of yours.
The best way to find out, though, is to test.
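As a rough way to "test", something like the sketch below can compare fetching a photo from Facebook's Graph picture endpoint against serving a copy you have already stored. The user id and local cache path are placeholders, and the numbers will differ by region, so run it from wherever your users actually are.

```python
# Rough benchmark: fetch a profile photo from Facebook's Graph picture
# endpoint vs. reading a previously cached copy from local disk.
# USER_ID and the cache path below are placeholders.
import time
import urllib.request
from pathlib import Path

PHOTO_URL = "https://graph.facebook.com/USER_ID/picture?type=large"  # placeholder id
CACHE_FILE = Path("cache/USER_ID.jpg")                               # hypothetical cache path

def fetch_remote() -> bytes:
    with urllib.request.urlopen(PHOTO_URL) as resp:  # follows the CDN redirect
        return resp.read()

def fetch_cached() -> bytes:
    return CACHE_FILE.read_bytes()

def timed(label, fn):
    start = time.perf_counter()
    data = fn()
    print(f"{label}: {len(data)} bytes in {(time.perf_counter() - start) * 1000:.1f} ms")

timed("remote", fetch_remote)
timed("cached", fetch_cached)
```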

Related

Launching a website to be accessed globally

I have a website that could be visited from countries on different continents. I noticed that most hosting companies have data centers in the US only, which might affect performance when people from India, for example, visit the site. AWS and Google own data centers all around the world, so would they be a better choice to address this concern? Do they use some technology that makes the website available from all of their data centers?
More about the website:
It is a dynamic website that depends heavily on the database. It is mostly text, with a little Ajax code.
It is a Q & A website.
You would use some sort of load balancer.
Such as:
AWS Elastic Load Balancing
Google Cloud Load Balancing
Cloud providers such as AWS have something called edge locations. When you deploy your website code, AWS will deploy the same content to edge locations around the world. When a user visits your website and the request reaches AWS, AWS directs the request to the edge location that is geographically closest to the user, so the request is served faster.
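To make the "closest edge location" idea concrete, here is a conceptual sketch that measures round-trip time to a few regional endpoints and picks the fastest. In practice the CDN does this for you via DNS and anycast rather than in client code, and the hostnames below are hypothetical.

```python
# Conceptual sketch of "route to the nearest location": measure the TCP
# round-trip time to a few regional endpoints and pick the fastest.
# Real CDNs do this via DNS/anycast; the hostnames are hypothetical.
import time
import socket

EDGE_HOSTS = {
    "us-east": "us-east.example.com",
    "eu-west": "eu-west.example.com",
    "ap-south": "ap-south.example.com",
}

def rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # TCP handshake completed
    return (time.perf_counter() - start) * 1000

measurements = {region: rtt_ms(host) for region, host in EDGE_HOSTS.items()}
best = min(measurements, key=measurements.get)
print("latencies:", measurements)
print("closest edge:", best)
```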
I noticed that most hosting companies have data centers in the US only, which might affect performance when people from India, for example, visit the site.
If your web site has purely or mostly static content, it usually won't matter (read about web caching), unless its traffic is large. As a typical example, I manage http://refpersys.org/ (physically hosted by OVH in France) and it is readily reachable from India: the latency is less than a few hundred milliseconds.
If your web site is extremely dynamic, it could matter (e.g. if every keystroke in a web browser in India required an AJAX call to the US-located host).
Read much more about HTTP and perhaps TCP/IP. Don't confuse the World Wide Web with the Internet (which existed before the Web).
If performance really matters to you, you would set up some distributed and load-balanced web service, with hosting on each continent. You might for instance use distributed database technologies for the data (read about database replication), e.g. with PostgreSQL as your back-end database.
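For example, once replication is configured on the database side, the application can route writes to the primary and reads to a region-local replica. A minimal sketch, assuming the third-party psycopg2 driver and hypothetical hostnames:

```python
# Minimal read/write split on top of PostgreSQL streaming replication:
# writes always go to the primary, reads go to a replica near the user.
# Hostnames are hypothetical; assumes psycopg2 is installed and
# replication is already set up on the database side.
import psycopg2

PRIMARY_DSN = "host=db-primary.example.com dbname=qa user=app"
REPLICA_DSNS = {
    "us": "host=db-replica-us.example.com dbname=qa user=app",
    "in": "host=db-replica-in.example.com dbname=qa user=app",
}

def run_write(sql: str, params=()):
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute(sql, params)

def run_read(region: str, sql: str, params=()):
    dsn = REPLICA_DSNS.get(region, PRIMARY_DSN)  # fall back to the primary
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```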
Of course, you can find web hosting in India.
And all that has some cost, mostly software development and deployment (network sysadmin skills are rare).
It is a Q & A website.
Then it is not that critical (assuming small to medium traffic), and you can afford (at first) a single host in a single location. I assume no harm is done if a given answer becomes visible worldwide only after several minutes.
Once your website is popular enough, you will have the resources (human labor and computing/hosting) to redesign it. AFAIK, StackOverflow started with a single web host and later evolved to its current state. Design your website with an agile software development mindset: the data (that is, past questions and answers typed by human users) is the most important asset, so make sure to design your database schema correctly (taking into account database normalization), and ensure that your data is backed up correctly and often enough. And web technologies are evolving too (in 2021 the Web won't be exactly the same as today in December 2019, see e.g. this question).
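As an illustration of "design your database schema correctly", here is a minimal, hypothetical normalized schema for a Q&A site, sketched with Python's built-in sqlite3 (a real deployment would use PostgreSQL or similar, but the normalization idea is the same):

```python
# Minimal normalized schema for a Q&A site: users, questions, and answers
# live in separate tables linked by foreign keys, so no user or question
# data is duplicated. Table and column names are illustrative only.
import sqlite3

conn = sqlite3.connect("qa.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS questions (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS answers (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    user_id     INTEGER NOT NULL REFERENCES users(id),
    body        TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
""")
conn.commit()
```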
If you wanted a world-wide, fault-tolerant Q&A website, you could get a PhD in designing it well enough. Global distributed database consistency is still a research topic (see e.g. this research report).

Why is it recommended to store images on a remote server?

Sorry for such a question, but I cannot find any article on the web explaining the reasons for this. I guess it is about async uploading and downloading, but that is just a guess. Is there detailed information about it somewhere?
It's mostly about specialization, data locality, and concurrency.
Servers that specialize in serving static content typically do so much faster than dynamic web servers (they are optimized for that specific use case).
You also have the advantage of storing your content in many zones to achieve better performance (the content is physically closer to the person requesting it), whereas web applications typically should be near their other dependencies, such as databases.
Lastly, browsers (for HTTP/1.x at least) only allow a fixed number of connections per server, so if your images and API calls are on separate servers, one cannot influence the other in terms of request scheduling.
There are a lot of other reasons I'm sure, but these are just off the top of my head.
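To make the "specialized static server" point concrete, here is a toy sketch of serving images from their own process (e.g. on a separate host such as img.example.com) with long-lived cache headers, while the application server stays elsewhere. It uses Python's built-in http.server purely for illustration; a real setup would use a CDN or a dedicated server like nginx.

```python
# Toy static-file server for images only, run on its own host or port so
# it never competes with API requests for the browser's per-host
# connection limit. Not production-grade.
from http.server import HTTPServer, SimpleHTTPRequestHandler

class StaticImageHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Static images rarely change, so let browsers and proxies cache
        # them aggressively.
        self.send_header("Cache-Control", "public, max-age=31536000, immutable")
        super().end_headers()

if __name__ == "__main__":
    # Serves files from the current directory on port 8080.
    HTTPServer(("0.0.0.0", 8080), StaticImageHandler).serve_forever()
```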

How are user sessions/data stored on web servers?

I'm just starting to get into web development in earnest, more than just static pages, and I keep wondering, how is data stored per user on the server? When one user submits a form post, and it gets looked up on another page, how can the server tell user1's FooForm data from user2's FooForm data? I know you can keep track of users with cookies and HTTP requests, but how does the server keep all these data sets available? Is there like a tiny virtual machine for each user session? Or some kind of paging system where the user's content is served, and then the server swaps that process out for a while like a system thread? I've always wondered how a site with a million concurrent visitors manages to serve them all and keep all the individual user sessions contained.
I'm familiar with OS multithreaded architecture and algorithms, but the idea of doing that with anywhere from one to a million separate "threads" on anywhere from one to n separate machines is mind-blowing.
Do the algorithms change as the scale does?
I'll stop before I ask too many questions and this gets too broad, but I'd love for someone with some expertise to elucidate this for me, or point me to a good resource.
Thanks for reading.
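For what it's worth, the pattern the question is circling around is usually much simpler than a VM or thread per user: the server hands the browser a random session id in a cookie and keeps a server-side map from that id to the user's data. A minimal sketch, with an in-memory dict standing in for what real sites keep in a shared store (database, Redis, ...) so that any of the n machines can look a session up:

```python
# How the server tells user1's FooForm data from user2's: a random session
# id is set in a cookie, and the server maps that id to per-user data.
# In-memory dict here for illustration only.
import secrets

sessions: dict[str, dict] = {}  # session_id -> per-user data

def create_session() -> str:
    session_id = secrets.token_urlsafe(32)  # value sent in a Set-Cookie header
    sessions[session_id] = {}
    return session_id

def save_form(session_id: str, form_name: str, data: dict) -> None:
    sessions[session_id][form_name] = data

def load_form(session_id: str, form_name: str) -> dict | None:
    return sessions.get(session_id, {}).get(form_name)

# user1 and user2 get different ids, so their FooForm data never collides.
u1, u2 = create_session(), create_session()
save_form(u1, "FooForm", {"color": "red"})
save_form(u2, "FooForm", {"color": "blue"})
print(load_form(u1, "FooForm"), load_form(u2, "FooForm"))
```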

How to implement a secure distributed social network?

I'm interested in how you would approach implementing a BitTorrent-like social network. It might have a central server, but it must be able to run in a peer-to-peer manner, without communicating with it:
If a whole region's network is disconnected from the internet, it should be able to pass updates from users inside the region to each other
However, if some computer gets the posts from the central server, it should be able to pass them around.
There is some reasonable level of identification; some computers might be disseminating incomplete/incorrect posts or performing DoS attacks. It should be possible to mark some information as coming from more trusted computers and some from less trusted ones.
It should theoretically be able to use any computer as a server, while dynamically optimizing the network so that typically only fast computers with ample bandwidth act as seeders.
The network should be able to scale to hundreds of millions of users; however, each particular person is interested in less than a thousand feeds.
It should include some Tor-like privacy features.
Purely theoretical question, though inspired by recent events :) I do hope somebody implements it.
Interesting question. With the use of existing Tor, P2P, and darknet features, and by using some public/private key infrastructure, you could possibly come up with some great things. It would be nice to see something like this in action. However, I see a major problem: not people using it for file sharing, but flooding the network with useless information. I would therefore suggest a Twitter-like approach where you can ban and subscribe to certain people, and start with a very reduced set of functions at the beginning.
Incidentally, we programmers could make a good start toward that goal by NOT saving and analyzing too much information about users, and by using safe ways of storing and accessing user-related data!
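One way the public/private key idea could back the "more trusted / less trusted" requirement is for each peer to sign its posts and for other peers to verify against the author's published public key before relaying. A minimal sketch, assuming the third-party "cryptography" package is available:

```python
# Sketch: each peer signs its posts with a private key; receiving peers
# verify the signature against the author's published public key before
# relaying, so tampered or spoofed posts can be dropped.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

author_key = Ed25519PrivateKey.generate()   # stays on the author's machine
author_pubkey = author_key.public_key()     # published to the network

post = b"status update: hello from a disconnected region"
signature = author_key.sign(post)

# A receiving peer verifies before passing the post along.
try:
    author_pubkey.verify(signature, post)
    print("signature ok, relay the post")
except InvalidSignature:
    print("bad signature, drop the post")
```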
Interestingly, the rendezvous protocol does something similar to this (it grabs "buddies" in the local network).
BitTorrent is a means of transferring static information; it's not intended to have everyone become a producer of new content. Also, BitTorrent requires that the producer act as a dedicated server until all of the clients are able to grab the information.
Diaspora claims to be one such thing.

Why is p2p web hosting not widely used? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
We can see the growth of systems using peer to peer principles.
But there is an area where peer to peer is not (yet) widely used: web hosting.
Several projects have already launched, but there is no major solution that would let users both use and contribute to a peer-to-peer web hosting service.
I don't mean closed projects (like Google Web Hosting, which uses Google's resources, not users'), but open projects, where each user contributes to the global hosting effort by making their resources (CPU, bandwidth) available.
I can think of several assets of such systems:
automatic load balancing
better locality
no storage limits
free
So, why is such a system not yet widely used?
Edit
I think that the "97.2%, plz seed!!" problem occurs because not all users seed all the files. But if a system is built where all users contribute equally to all of the content, this problem no longer occurs. Peer-to-peer storage systems (like Wuala) are reliable thanks to that.
The problem of proprietary code is pertinent, as is the fact that a user might not know which content (possibly "bad") he is hosting.
I would add another problem: the latency, which may be higher than with a dedicated server.
Edit 2
The confidentiality of code and data can be achieved by encryption. For example, with Wuala, all files are encrypted, and I think there is no known security breach in this system (but I might be wrong).
It's true that seeders would get little or no benefit. But it would keep people from being dependent on web hosting companies. And such a decentralized way of hosting websites is closer to the original idea of the internet, I think.
This is what Freenet basically is,
Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.
[...]
Users contribute to the network by giving bandwidth and a portion of their hard drive (called the "data store") for storing files. Unlike other peer-to-peer file sharing networks, Freenet does not let the user control what is stored in the data store. Instead, files are kept or deleted depending on how popular they are, with the least popular being discarded to make way for newer or more popular content. Files in the data store are encrypted to reduce the likelihood of prosecution by persons wishing to censor Freenet content.
The biggest problem is that it's slow, both in transfer speed and (mainly) latency. Even if you can get lots of people with decent upload throughput, it will still never be as quick as a dedicated server or two. The speed is fine for what Freenet is (publishing data without fear of censorship), but not for hosting your website.
A bigger problem is that the content has to be static files, which rules out its use for the majority of high-traffic websites. To serve dynamic data, each peer would have to execute code (scary) and would probably have to retrieve data from a database (another big delay, again because of the latency).
I think "cloud computing" is about as close to P2P web hosting as we'll see for the time being.
P2P website hosting is not yet widely used, because the companion technology allowing higher upstream rates for individual clients is not yet widely used, and this is something I want to look into*.
What is needed for this is called Wireless Mesh Networking, which should allow the average user to utilise the full upstream speed that their router is capable of, rather than just whatever some profiteering ISP rations out to them, while they relay information between other routers so that it eventually reaches its target.
In order to host a website P2P, a combination of technologies is required: wireless mesh communication, multiple-redundancy RAID storage, torrent-style sharing, and some kind of encryption key hierarchy that grants different users different abilities to change the data being transmitted, allowing something dynamic such as a forum to be hosted. The system would have to be self-updating to support the latter, probably by time-stamping all distributed data packets.
There may be other possible catalysts that would cause the widespread use of p2p hosting, but I think anything that returns the physical architecture of hardware actually wiring up the internet back to its original theory of web communication is a good candidate.
Of course, as always, the main reason this has not been implemented yet is that there is little or no money in it. The idea will be picked up much faster if either:
Someone finds a way to largely corrupt it towards consumerism
Router manufacturers realise there is a large demand for WiMesh-ready routers
There is a global paradigm shift away from the profit motive and towards creating things only to benefit all of humanity by creating abundance and striving for optimum efficiency
*see p2pint dot darkbb dot com if you're interested in developing this concept
For our business I can think of 2 reasons not to use peer hosting:
Responsiveness. Peer-hosted solutions are often reliable because of the massive number of shared resources, but they are also notoriously unstable, so the browsing experience will be intermittent.
Proprietary data/code. If I've written custom logic for my site, I don't want everyone on the network having access. You also run into privacy issues with customer data.
If I were to donate some of my PCs CPU and bandwidth to some p2p web hosting service, how could I be sure that it wouldn't end up being used to serve child porn or other similarly disgusting content?
How many times have you seen "97.2%, please seed!!" for any random torrent?
Just imagine the havoc if even a small portion of the web became unavailable in this way.
It sounds like this idea would add a lot of cost to the individual seeder (bandwidth) without a lot of benefit.
