Why is p2p web hosting not widely used? [closed] - hosting

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
We can see the growth of systems using peer to peer principles.
But there is an area where peer to peer is not (yet) widely used: web hosting.
Several projects are already launched, but there is no big solution which would permit users to use and to contribute to a peer to peer webhosting.
I don't mean not-open projects (like Google Web Hosting, which use Google resources, not users'), but open projects, where each user contribute to the hosting of the global web hosting by letting its resources (cpu, bandwidth) be available.
I can think of several assets of such systems:
automatic load balancing
better locality
no storage limits
free
So, why such a system is not yet widely used ?
Edit
I think that the "97.2%, plz seed!!" problem occurs because all users do not seed all the files. But if a system where all users equally contribute to all the content is built, this problem does not occur any more. Peer to peer storage systems (like Wuala) are reliable, thanks to that.
The problem of proprietary code is pertinent, as well of the fact that a user might not know which content (possibly "bad") he is hosting.
I add another problem: the latency which may be higher than with a dedicated server.
Edit 2
The confidentiality of code and data can be achieved by encryption. For example, with Wuala, all files are encrypted, and I think there is no known security breach in this system (but I might be wrong).
It's true that seeders would not have many benefits, or few. But it would prevent people from being dependent of web hosting companies. And such a decentralized way to host websites is closer of the original idea of the internet, I think.

This is what Freenet basically is,
Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.
[...]
Users contribute to the network by giving bandwidth and a portion of their hard drive (called the "data store") for storing files. Unlike other peer-to-peer file sharing networks, Freenet does not let the user control what is stored in the data store. Instead, files are kept or deleted depending on how popular they are, with the least popular being discarded to make way for newer or more popular content. Files in the data store are encrypted to reduce the likelihood of prosecution by persons wishing to censor Freenet content.
The biggest problem is that it's slow. Both in transfer speed and (mainly) latency.. Even if you can get lots of people with decent upload throughput, it'll still never be as quick a dedicated servers or two.. The speed is fine for what Freenet is (publishing data without fear of censorship), but not for hosting your website..
A bigger problem is the content has to be static files, which rules out it's use for a majority of high-traffic websites.. To serve dynamic data each peer would have to execute code (scary), and would probably have to retrieve data from a database (which would be another big delay, again because of the latency)
I think "cloud computing" is about as close to P2P web-hosting as we'll see for the time being..

P2P website hosting is not yet widely used, because the companion technology allowing higher upstream rates for individual clients is not yet widely used, and this is something I want to look into*.
What is needed for this is called Wireless Mesh Networking, which should allow the average user to utilise the full upstream speed that their router is capable of, rather than just whatever some profiteering ISP rations out to them, while they relay information between other routers so that it eventually reaches its target.
In order to host a website P2P, a sort of combination of technology is required between wireless mesh communication, multiple-redundancy RAID storage, torrent sharing, and some kind of encryption key hierarchy that allows various users different abilities to change the data that is being transmitted, allowing something dynamic such as a forum to be hosted. The system would have to be self-updating to incorporate the latter, probably by time-stamping all distributed data packets.
There may be other possible catalysts that would cause the widespread use of p2p hosting, but I think anything that returns the physical architecture of hardware actually wiring up the internet back to its original theory of web communication is a good candidate.
Of course as always, the main reason this has not been implemented yet is because there is little or no money in it. The idea will be picked up much faster if either:
Someone finds a way to largely corrupt it towards consumerism
Router manufacturers realise there is a large demand for WiMesh-ready routers
There is a global paradigm shift away from the profit motive and towards creating things only to benefit all of humanity by creating abundance and striving for optimum efficiency
*see p2pint dot darkbb dot com if you're interested in developing this concept

For our business I can think of 2 reasons not to use peer hosting:
Responsiveness. Peer hosted solutions are often reliable because of the massive number of shared resources, but they are also nutoriously unstable. So the browsing experience will be intermittent.
Proprietary data/code. If I've written custom logic for my site I don't want everyone on the network having access. You also run into privacy issues with customer data.

If I were to donate some of my PCs CPU and bandwidth to some p2p web hosting service, how could I be sure that it wouldn't end up being used to serve child porn or other similarly disgusting content?

How many times have you seen "97.2%, please seed!!" for any random torrent?
Just imagine the havoc if even a small portion of the web became unavailable in this way.

It sounds like this idea would add a lot of cost to the individual seeder (bandwidth) without a lot of benefit.

Related

Existing Event Driven Network Protocols

I am building a set of programs that consist of multiple clients and a single server.
The clients are frequently pushing small packets of data to the server, which will validate the information (returning an error if the data is invalid), and process the received information. The information may then incur the firing of events, which clients will be subscribed to, allowing for clients to be instantly (or as close as possible) notified (along with a small amount of data).
I have some ideas about how to do this, but I am trying to avoid creating a protocol of my own, mainly as I'm sure it would take forever and I would probably make a few errors. So I was wondering if there are any existing protocols that I could implement into my system that would provide such functionality.
The number of clients will initially be quite small, but will be growing over time to potentially include 1000's of clients (with their own subscriptions), and several front end servers (each one handling a subset of subscriptions) parsing the information back and forth with back end servers for improved capability.
So, if anyone knows of any existing protocols that implement these requirements and functionality, that would be fantastic.
EDIT
I am currently looking at the XMPP protocol, and the JXTA protocol suite (for reference, and implement with another language). Both seem quite good and provide the necessary connectivity, but I have not had the opportunity to test each of them out in my environment, or if they are even suitable for what I am attempting.
Additionally, some of the network clients will be outside of the local network and operating over WAN. Security is not so much of an issue, but I need to take into account the increase latency of this, and firewall rules (local to the connection that is hosting the application and ISP firewalls) that could be blocking certain ports or transport protocols (I have read some text that said that some ISPs where blocking UDP packets, but not sure of how wide this goes. I can do it at home, the office, mobile, friends houses, etc and have yet to experience it myself).
I'm sorry if the following is not exactly what you're after but I am slightly confused by your use of the word 'protocol'. I understand a protocol to be a 'communication specification' only, where the implementation is left entirely to you. If that is the case I always find the the following graphic usefull, link.
If on the other hand you are looking for a solution which allows you to easily implement the networking side of your application, helping save time, then checkout the following network libraries, which implement their own custom protocol:
NetworkComms.Net
Lidgren
ZeroMQ
Disclaimer: I'm a developer for NetworkComms.Net

How to implement a secure distributed social network?

I'm interested in how you would approach implementing a BitTorrent-like social network. It might have a central server, but it must be able to run in a peer-to-peer manner, without communication to it:
If a whole region's network is disconnected from the internet, it should be able to pass updates from users inside the region to each other
However, if some computer gets the posts from the central server, it should be able to pass them around.
There is some reasonable level of identification; some computers might be dissipating incomplete/incorrect posts or performing DOS attacks. It should be able to describe some information as coming from more trusted computers and some from less trusted.
It should be able to theoretically use any computer as a server, however, optimizing dynamically the network so that typically only fast computers with ample internet work as seeders.
The network should be able to scale to hundreds of millions of users; however, each particular person is interested in less than a thousand feeds.
It should include some Tor-like privacy features.
Purely theoretical question, though inspired by recent events :) I do hope somebody implements it.
Interesting question. With the use of already existing tor, p2p, darknet features and by using some public/private key infrastructure, you possibly could come up with some great things. It would be nice to see something like this in action. However I see a major problem. Not by some people using it for file sharing, BUT by flooding the network with useless information. I therefore would suggest using a twitter like approach where you can ban and subscribe to certain people and start with a very reduced set of functions at the beginning.
Incidentally we programmers could make a good start to accomplish that goal by NOT saving and analyzing to much information about the users and use safe ways for storing and accessing user related data!
Interesting, the rendezvous protocol does something similar to this (it grabs "buddies" in the local network)
Bittorrent is a mean of transfering static information, its not intended to have everyone become producers of new content. Also, bittorrent requires that the producer is a dedicated server until all of the clients are able to grab the information.
Diaspora claims to be such one thing.

Why would you not want to use Cloud Computing [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Our company is considering moving from hosting our own servers to EC2 and I was wondering if this was a good idea.
I have seen a lot of stuff about can cloud computing (and specifically EC2) do x, or can it do y, but my real question is why would you NOT want to use it?
If you were setting up a business, what are the reasons (outside of cost) that you would choose to go through the trouble of managing your own servers?
I know there are a lot of cost calculations you can put in regarding bandwidth, disk usage etc, but there are of course, other costs regarding maintenance of your own server. For the sake of this discussion I am willing to consider the costs roughly equal.
I seem to remember that Joel Spolsky wrote a little blur on this at one time, but I was unable to find it.
Anyone have any reasons?
Thanks!
I can think of several reasons why not use EC2 (and I am talking about EC2, not grid comp in general):
Reliability: Amazon makes no guarantee as to the availability / down time / safety of EC2
Security: Amazon does not makes any guarantee as to whom it will disclose your data
Persistence: ensuring persistence of your data (that includes, effort to set up the system) is complicated over EC2
Management: there are very few integrated management tools for a cloud deployed on EC2
Network: the virtual network that allows EC2 instances to communicates has some quite painful limitations (latency, no multicast, arbitrary topological location)
And to finish that:
Cost: on the long run, if you are not using EC2 to absorb peak traffic, it is going to be much more costly than investing into your own servers (cheapo servers like Supermicro cost just a couple of hundred bucks...)
On the other side, I still think EC2 is a great way to soak up non-sensitive peak traffic, if your architecture allows it.
Some questions to ask:
What is the expected uptime, and how does downtime affect your business? What sort of service level agreement can you get, what are the penalties for missing it, and how confident are you that the SLA uptime goals will be met? (They may be better or worse at keeping the systems up than you are.)
How sensitive is the data you're proposing to put into the cloud? Again, we get into the questions of how secure the provider promises to be, what the contractual penalties and indemnities are, and how confident you are that the provider will live up to the agreement. Further, there may be external requirements. If you deal with health-related data in the US, you are subject to very strict requirements. If you deal with credit card data, you also have responsibilities (contractual, not legal).
How easy will it be to back out of the arrangement, should service not be what was expected, or if you find a better deal elsewhere? This includes not only getting your data back, but also some version of the applications you've been using. Consider the possibilities of your provider going bankrupt (Amazon isn't going to go bankrupt any time soon, but they could split off a cloud provider which could then go bankrupt), or having an internal reorganization. Bear in mind that a company in serious trouble may not be able to live up to your expectations of service.
How much independence are you going to have? Are you going to be running their software or software you pick? How easy will it be to reconfigure?
What is the pricing scheme? Is it possible for the bills to hit unacceptable levels without adequate warning?
What is the disaster plan? Ideally, it's running your software on servers in a different location from where the disaster hit.
What does your legal department (or retained corporate attorney) think of the contract? Is there a dispute resolution mechanism, and, if so, is it fair to you?
Finally, what do you expect to get out of moving to the cloud? What are you willing to pay? What can you compromise on, and what do you need?
Highly sensitive data might be better to control yourself. And there's legislation; some privacy sensitive information, for example, might not leave the the country.
Also, except for Microsoft Azure in combination with SDS, the data stores tend to be not relational, which is a nuisance in certain cases.
Maybe concern that that big a company will more likely be approached by an Agent Smith from the government to spy on everyone that a little small provider somewhere.
Big company - more customers - more data to aggregate and recognize patterns - more resources to organize a sophisticated watch system.
Maybe it's more of a fantasy but who ever knows?
If you don't have a paranoia it doesn't mean yet that you are not being watched.
The big one is: if Amazon goes down, there's nothing you can do to bring it back up.
I'm not talking about doomsday scenarios where the company disappears. I mean that you're at the mercy of their downtime, with little recourse of your own.
Security -- you don't know what is being done to your data
Dependency -- your business is now directly intertwined with the provider
There are different kinds of cloud computing with lots of different vendors providing it. It would make me nervous to code my apps to work with a single cloud vendor. that you specifically had to code for..amazon and Microsoft I believe you need to specifically code for that platform - maybe google too.
That said, I recently jettisoned my own dedicated servers and moved to Rackspaces Mosso Cloud platform (which have no proprietary coding necessary) and I am really, really pleased with it so far. Cut my costs in half, and performance is way better than before. My sql server databases are now running on 64Bit enterprise SQL server versions with 32G of ram - that would have cost me a fortune on my previous providers infrastructure.
As far as being out of luck when the cloud is down, that was true if my dedicated server went down - it never did, but if there was a hardware crash on my dedicated server, I am not sure it would be back on-line any quicker than rackspace could bring their cloud back up.
Lack of control.
Putting your software on someone else's cloud represents handing over some control. They might institute a file upload size limit, or memory limits which could ruin your application. A security vulnerbility in their control panel could get your site hacked.
Security issues are not relevant if your application does its own encryption. Amazon is then storing encrypted data that they have no way of decrypting.
But in addition to the uptime issues, Amazon could decide to increase their prices to whatever they want. If you're dependent on them, you'll just have to pay it.
Depends how much you trust your own infrastructure in comparison to a 3rd party cloud service. In my opinion, most businesses (at least not IT related) should choose the later.
Another thing you lose with the cloud is the ability to choose exactly what operating system you want to run. For example, the latest Fedora Linux kernel available on EC2 is FC8, and the latest Windows version is Server 2003.
Besides the issues raised regarding dependability, reliability, and cost is the issue of data ownership. When you locate data on someone else's server, you no longer control who views, accesses, modifies, or uses that data. While the cloud operators can limit your access, you possess no way of limiting theirs or limiting who they give access to. Yes, you can encrypt all the data on the server but you lack any way of knowing who possesses root access to the server itself and any means to stop others from downloading your encrypted data and cracking it open. You lose control over your data; depending on what type of apps you are running and the proprietary nature of the data involved, this could engender corporate security and/or liability risks.
The other factor to consider is what would happen to your company if Amazon and/or EC2 were to suddenly vanish overnight. While a seemingly preposterous position, it could happen. Would you be able to quickly fill the hole and restore service, or would your potentially revenue generating apps languish while the IT staff scramble to obtain servers and bandwidth to get them back online? Also, what would happen to your data? The cloud hard drive holding all your information still exists, somewhere, and could pose a potential liability risk depending on the information you stored there--items such as personal information, business transaction records etc.
If I was starting my own business now, I would go through the hassle of purchasing and maintaining my own severs so I retained data ownership. I could control root access to the hardware, as well as control who can access and modify the data.
Unanswered security questions.
Really, do you want your IP out there, where you're not the one in control of it?
Most cloud computing environment are at least partially vendor specific. There's no good way to move stuff from one cloud to another without having to do a lot of rewriting. That sort of lock-in puts you at the mercy of one vendor when it comes to downtime, price increases, etc. If you rent or own your own servers, hosting providers and colos are pretty much interchangeable. You always have the option of moving somewhere else.
This may change in the future, as these things become standardized, but for now tying yourself to the cloud means tying yourself to a specific vendor.
This is kind of like the "Why would you use Linux" comment I received from management many years ago. The response I got was that it is a solution in search of a problem.
So what are your goals and objectives in moving to EC2?
I'd be interested to know if you'd still want to move to a cloud, if it was your own.
Cloud computing has brought parallel programming a little closer to the masses, but you still have to understand how best to use it - otherwise you're going to waste compute cycles and bandwidth.
Re-architecting your application for most efficient use of a cloud computing service is non-trivial.
Besides what has already been said here, we have to consider uniformity across the business. Are all of you applications going to be hosted in the cloud, or only most? Is most enough to pull the trigger on using the cloud when you still have to have personnel to handle a few special servers?
In particular, there might be special hardware that you need to communicate with such modems to accept incoming data, or voice cards that make automated phone calls. I don't know how such things could be handled in a cloud environment.

What are the advantages and disadvantages of self-hosting? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
What are the advantages and disadvantages of self-hosting something like a svn repository? All links and ideas are appreciated.
Off the top of my head:
Advantages of Self-hosting
Flexibility. On my own machine I can install whatever I want. If I would like to use a vcs like Bazaar and use Loggerhead instead of Trac, then right now there isn't really much choice beyond Launchpad, which has its warts
Save money. Costs add up over time especially for large teams
The free plans offered by sites like Assembla are not private. Anybody can have access to your code
Advantages of Paid hosting (ie: GitHub, Assembla, Google Code)
Robustness. You don't need to worry about your server catching fire because it's become somebody else's problem.
Less hassle. Don't need to be do all the system administration and tweaking of conf files. Instead you can just focus on the coding
For production you should only use self hosting if you are professional sys admin. Can you answer yes to following questions (a bit linux oriented, but you should get an idea):
Can you react to system failure in minutes (I mean you need sleep at least. Do you have somebody to look after system while you are asleep?)
Can you spot a system break?
Can you remove exploits from your system?
Can you recompile kernel. If you can't remove exploits?
Can you configure the system for optimal performance?
Are you willing to pay for UPS, backup storage and alternative internet provider?
If you can answer yes to these questions benefits are very atractive and I would go with it.
On the other hand hosting development environment can be managed by administrator of any level especially when there are such easy to use servers like Ubuntu.
You specifically asked about hosting a subversion repository, so the first disadvantage that comes to mind is data protection. I personally would never trust a third-party with my source code, except for open source code or code of an unimportant side project. Source code is a very important asset for a ISV, so trusting a third-party to protect your source code doesn't sound like a good idea.
And even if it's not about source code, outsourcing other critical parts of you business such as email, accounting/invoicing* is just asking for trouble. And it's not like you don't have to care about backups anymore when you outsource your data hosting. You still should backup your data, in case the hosting company screws up.
*) With outsourcing accounting/invoicing I mean all those new hosted invoicing apps, not getting help from an accountant of course
I find the web interface of external hosts to be hassles. Plus you can have however much space you want on your machine. Like you said, the maintenance can be a burden for self hosting though.
How big is your project? If it is not too big just get an account at http://www.beanstalkapp.com
That is what I did. I do not have to worry about any setups and can focus on the actual development.
If your situation is more complex self-hosting is worth considering. But keep in mind that you would have to take of backups too and that an update of the server screw a lot of things up.
This ties into the server catching fire, but one key advantage of external hosting is that it's (presumably) backed up automatically. Doing your own backups is a hassle, and ends up being less reliable than what you'd get from Google.
With self-hosting there comes great responsibility.
you have to backup everything
you need spareparts for your hardware
if you have stuff that is important you need redundant hardware
you don't have real holidays. if something breaks you have to fix it
In addition to what others have already mentioned, there are also benefits specific to using cloud services by companies like Amazon, Yahoo, Google, Microsoft, etc.
Despite what some might claim, self-hosting is not inherently "safer." In most cases, it's quite the opposite actually. This is because most small-to-medium-sized companies do not have the resources to provide the level of reliability and redundancy that mega-corporations like Microsoft or Amazon can. Unless you are hosting source code for a top-secret defense project or other projects where the threat of espionage is very real, the greatest threats to your code and your business are more mundane things like server/network downtime.
Redundancy: Cloud services provide levels of redundancy that most businesses simply cannot obtain on their own. This includes data redundancy (back-ups/RAID), hardware redundancy (components/equipment), and geographical redundancy (multiple server locations across the globe). If a natural disaster hits your city, is your data going to be safe?
Multi-tenancy: Each small business by itself cannot afford 24/7 support staff and multi-million dollar equipment. But pooling their resources together through a Cloud service affords them (through centralization and better resource utilization/higher efficiency) access to a much higher level of service.
Security: Related to multi-tenancy, by centralizing the data of thousands of businesses, this allows security resources to be much more tightly focused.
Lastly, it should be noted that most commercial hosting providers offer co-location and dedicated hosting. And even cloud service providers allow customers to configure their "server" however they want, and installing/running whatever applications on it they want. So you can have a great deal more freedom than that offered by $10/month web hosting.

High traffic web sites

What makes a site good for high traffic?
Does it have more to do with the hardware/infrastructure, or with how one writes the software, using Java as the example, if it matters?
I'm wondering how the software changes just because it is expected that billions of users will be on the site, if at all.
My understanding up to this point is that the code doesn't change, but that it is deployed on multiple servers, in a cluster, and a load balancer distributes the load, so really, on any one server/deployment, the application is just as any other standard application/website.
I highly recommend reading Jeff Atwood's blog on Micro-Optimization. In previous blogs he talks somewhat about how this site was created and the hardware upgrades he has had (which quickly summarized said that better hardware performs better only the extent that it is faster/better), but the real speed of a site comes from good programming, and this article seems like it should sum up some of your site programming questions quite well.
Hardware is cheap. Programming is expensive.
There are some programming techniques to make sure your code can handle multiple simultaneous views/updates. If you're using an existing framework, much of that work is (hopefully) done for you, but otherwise you're going to find stuff that worked for a few hundred hits an hour on one server isn't going to work when you're getting hundreds of thousands of hits and you have to deploy multiple load balancing machines.
Well, it is primarily an issue of hardware scaling but there are a few things to keep in mind with respect to the software involved in scaling. For example, if you are on a server farm, you'll need to work with a session management server (either via SQL Server or via a state server - which has implications in that your session variables need to be serializable).
But, in the bigger picture, there are a variety of things that you would want to do to scale to an enterprise level. For example, it becomes particularly important that you abstract out your database calls to a DAL because you may well need to adopt the use of a middleware package for high volume environments.

Resources