How to implement a secure distributed social network? - social-networking

I'm interested in how you would approach implementing a BitTorrent-like social network. It might have a central server, but it must be able to run in a peer-to-peer manner, without communication to it:
If a whole region's network is disconnected from the internet, it should be able to pass updates from users inside the region to each other
However, if some computer gets the posts from the central server, it should be able to pass them around.
There is some reasonable level of identification; some computers might be dissipating incomplete/incorrect posts or performing DOS attacks. It should be able to describe some information as coming from more trusted computers and some from less trusted.
It should be able to theoretically use any computer as a server, however, optimizing dynamically the network so that typically only fast computers with ample internet work as seeders.
The network should be able to scale to hundreds of millions of users; however, each particular person is interested in less than a thousand feeds.
It should include some Tor-like privacy features.
Purely theoretical question, though inspired by recent events :) I do hope somebody implements it.

Interesting question. With the use of already existing tor, p2p, darknet features and by using some public/private key infrastructure, you possibly could come up with some great things. It would be nice to see something like this in action. However I see a major problem. Not by some people using it for file sharing, BUT by flooding the network with useless information. I therefore would suggest using a twitter like approach where you can ban and subscribe to certain people and start with a very reduced set of functions at the beginning.
Incidentally we programmers could make a good start to accomplish that goal by NOT saving and analyzing to much information about the users and use safe ways for storing and accessing user related data!

Interesting, the rendezvous protocol does something similar to this (it grabs "buddies" in the local network)
Bittorrent is a mean of transfering static information, its not intended to have everyone become producers of new content. Also, bittorrent requires that the producer is a dedicated server until all of the clients are able to grab the information.

Diaspora claims to be such one thing.

Related

Existing Event Driven Network Protocols

I am building a set of programs that consist of multiple clients and a single server.
The clients are frequently pushing small packets of data to the server, which will validate the information (returning an error if the data is invalid), and process the received information. The information may then incur the firing of events, which clients will be subscribed to, allowing for clients to be instantly (or as close as possible) notified (along with a small amount of data).
I have some ideas about how to do this, but I am trying to avoid creating a protocol of my own, mainly as I'm sure it would take forever and I would probably make a few errors. So I was wondering if there are any existing protocols that I could implement into my system that would provide such functionality.
The number of clients will initially be quite small, but will be growing over time to potentially include 1000's of clients (with their own subscriptions), and several front end servers (each one handling a subset of subscriptions) parsing the information back and forth with back end servers for improved capability.
So, if anyone knows of any existing protocols that implement these requirements and functionality, that would be fantastic.
EDIT
I am currently looking at the XMPP protocol, and the JXTA protocol suite (for reference, and implement with another language). Both seem quite good and provide the necessary connectivity, but I have not had the opportunity to test each of them out in my environment, or if they are even suitable for what I am attempting.
Additionally, some of the network clients will be outside of the local network and operating over WAN. Security is not so much of an issue, but I need to take into account the increase latency of this, and firewall rules (local to the connection that is hosting the application and ISP firewalls) that could be blocking certain ports or transport protocols (I have read some text that said that some ISPs where blocking UDP packets, but not sure of how wide this goes. I can do it at home, the office, mobile, friends houses, etc and have yet to experience it myself).
I'm sorry if the following is not exactly what you're after but I am slightly confused by your use of the word 'protocol'. I understand a protocol to be a 'communication specification' only, where the implementation is left entirely to you. If that is the case I always find the the following graphic usefull, link.
If on the other hand you are looking for a solution which allows you to easily implement the networking side of your application, helping save time, then checkout the following network libraries, which implement their own custom protocol:
NetworkComms.Net
Lidgren
ZeroMQ
Disclaimer: I'm a developer for NetworkComms.Net

How to I block bad bots from my site without interfering with real users?

I want to keep no-good scrapers (aka. bad bots that by defintition ignores robots.txt) that steal content and consume bandwidth off my site. At the same time, I do not want to interfere with the user experience of legitimate human users, or stop well-behaved bots (such as Googlebot) from indexing the site.
The standard method for dealing with this has already been described here: Tactics for dealing with misbehaving robots. However, the solution presented and upvoted in that thread is not what I am looking for.
Some bad bots connect through tor or botnets, which means that their IP address is ephemeral and may well belong to a human being using a compromised computer.
I've therefore been thinking about how to improve the industry standard method by letting the "false positives" (i.e. humans) that has their IP blacklisted get access to my website again. One idea is to stop blocking these IPs outright, and instead asking them to pass a CAPTCHA before being allowed access. While I consider CAPTCHA to be a PITA for legitimate users, vetting suspected bad bots with a CAPTCHA seems to be a better solution than blocking access for these IPs completely. By tracking the session of users that completes the CAPTCHA, I should be able to determine whether they are human (and should have their IP removed from the blacklist), or robots smart enough to solve a CAPTCHA, placing them on an even blacker list.
However, before I go ahead and implement this idea, I want to ask the good people here if they foresee any problems or weaknesses (I am already aware that some CAPTCHAs has been broken - but I think that I shall be able to handle that).
The question I believe is whether or not there are foreseeable problems with captcha. Before I dive into that, I also want to address the point of how you plan on catching bots to challenge them with a captcha. TOR and proxy nodes change regularly so that IP list will need to be constantly updated. You can use Maxmind for a decent list of proxy addresses as your baseline. You can also find services that update the addresses of all the TOR nodes. But not all bad bots come from those two vectors, so you need find other ways of catching bots. If you add in rate limiting and spam lists then you should get to over 50% of the bad bots. Other tactics really have to be custom built around your site.
Now to talk about problems with Captchas. First, there are services like http://deathbycaptcha.com/. I dont know if I need to elaborate on that one, but it kind of renders your approach useless. Many of the other ways people get around Captcha's are using OCR software. The better the Captcha is at beating OCR, the harder it is going to be on your users. Also, many Captcha systems use client side cookies that someone can solve once and then upload to all their bots.
Most famous I think is Karl Groves's list of 28 ways to beat Captcha. http://www.karlgroves.com/2013/02/09/list-of-resources-breaking-captcha/
For full disclosure, I am a cofounder of Distil Networks, a SaaS solution to block bots. I often pitch our software as a more sophisticated system than simply using captcha and building it yourself so my opinion of the effectivity of your solution is biased.

Where to begin with SNMP agent implementation?

before I start I realise there are a few SNMP related questions here already but not many seem to have been answered - that could mean I'm asking in the wrong place but I don't know where else to go at the moment.
I've been reading up as best I can on SNMP for a couple of days but am finding it difficult to get my head around what is meant to be happening. The idea is eventually we will integrate SNMP into our Java application server which will allow the end users to incorporate it into their pre-existing Network Management Systems(NMS).
Unfortunately I'm feeling entirely confused by what is meant to be going on. From what I understood from talking to the end users (which was unfortunately before any research) was that the monitoring allows their existing NMS to give their admin guys a view of the vital statistics in a tree type display, giving them feedback regarding different parts of the system at a high level and allowing them to dig down into specific subsystems.
From reading around we would implement an 'Agent' which has several defined interfaces allowing for GET requests etc to be processed and responded to. That makes sense but I am at a loss to work out what the format of the communication is - there don't seem to be any specific examples of what any of the messages look like, how the information is encoded.
More of my confusion though is regarding Management Information Base(MIB). I had, wrongly, assumed that the interface of the agent would allow for the monitored attributes to be requested and then in turn the values for those attributes requested. Allowing any new Agent to be started and detected without any configuration on the NMS end (with the exception of authentication in v3). This, if I understand correctly, is not the case and the Agent must instead define MIBs which can be used by the NMS to determine those attributes. My confusion is increased when people start referring to thousands of existing MIBs and that they can be reused which I don't understand. Is the intention that a single MIB definition can be used to say describe how a particular attribute of a network device (something simple like internet connected on a router:yes/no) for many different devices? If so I don't believe that our software would allow the monitoring of anything common to any other device/system but should we be looking for already exising MIBs? At the moment I don't really see any good rational for such a system, surely it would be easier for the Agent to export that information - so I'd appreciate it if someone could enlighten me!
I think it would help if I was able to setup a simple SNMP agent and some sort of client, I could begin to see the process and eventually inspect the communication between the two but am finding it difficult to find anywhere that provides any information on doing such a thing. Nagios has been recommended to us as a test 'client'/NMS but their 'get started quick' section recommends downloading a 600Mb virtual machine - surely there is a quicker way to get started?
Any help or suggestions will be appreciated, I have been through the Wiki page but it doesn't seem to go into much detail about the MIBs and the having not had to deal with anything like the referenced RFCs before, while they may contain all of the information they seem completely impenetrable to me at the moment. Or if there are any books that can be recommended for an overview and implementation of v3?
Thanks for reading and even more thanks if you think you can help!
It seems to me that you read all SNMP information piece by piece in an disorganized way. This is highly not recommended and of course lead you to confusion.
What about forgetting what you have learnt so far and dive into a good book such as Essential SNMP?
http://shop.oreilly.com/product/9780596008406.do
Click the Google Preview icon to preview it please.
You could not depend on a network forum to tell you the ABCs, as that's impractical I find out.
The communications interface is SNMP. That's the protocol used for transmission (usually on top of UDP). The thing that services information requests is an SNMP Agent. The thing that sends information requests is an SNMP Manager.
The definition of what information should be made available by the Agent, and requested by the Manager, goes in a MIB. A MIB is the "glue", a directory of what sort of things any particular system can/should offer. It maps numeric codes to names and types that allow us to make sense of the data, much like how a phone directory maps phone numbers to people's names and addresses.
Generally you would create and ship and use your own MIBs that can describe aspects specific to your own product, but you are supposed to service some standard information requests as well, which are defined in existing MIBs. Yes there are thousands of other pre-existing MIBs and the likelihood that you need more than one or two of these is remote. They are typically published versions of MIBs for existing products.
The conventional way to "toy around" is to install Net-SNMP (a software suite that includes an agent implementation and allows you to "bolt on" your own logic and your own MIBs fairly easily) then examine the results using a packet capturer like Wireshark.
For a fuller implementation in production you may stick with Net-SNMP, or write your own Agent software, or do what I did and create a hybrid of the two that's a little more flexible and performant but uses Net-SNMP's backend for handling all the low-level SNMP stuff.
Your first step, though, is to read a book or some other teaching material that can clear all your misconceptions, because guesswork won't cut it.
I had success using the samples from this page. Both the shell and Perl NetSNMP code was very straightforward to implement and query.

is it reasonable to protect drm'd content client side

Update: this question is specifically about protecting (encipher / obfuscate) the content client side vs. doing it before transmission from the server. What are the pros / cons on going in an approach like itune's one - in which the files aren't ciphered / obfuscated before transmission.
As I added in my note in the original question, there are contracts in place that we need to comply to (as its the case for most services that implement drm). We push for drm free, and most content providers deals are on it, but that doesn't free us of obligations already in place.
I recently read some information regarding how itunes / fairplay approaches drm, and didn't expect to see the server actually serves the files without any protection.
The quote in this answer seems to capture the spirit of the issue.
The goal should simply be to "keep
honest people honest". If we go
further than this, only two things
happen:
We fight a battle we cannot win. Those who want to cheat will succeed.
We hurt the honest users of our product by making it more difficult to use.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
An extra bit of info: client environment is adobe air, multiple content types involved (music, video, flash apps, images).
So, is it reasonable to do like itune's fairplay and protect the media client side.
Note: I think unbreakable DRM is an unsolvable problem and as most looking for an answer to this, the need for it relates to it already being in a contract with content providers ... in the likes of reasonable best effort.
I think you might be missing something here. Users hate, hate, hate, HATE DRM. That's why no media company ever gets any traction when they try to use it.
The kicker here is that the contract says "reasonable best effort", and I haven't the faintest idea of what that will mean in a court of law.
What you want to do is make your client happy with the DRM you put on. I don't know what your client thinks DRM is, can do, costs in resources, or if your client is actually aware that DRM can be really annoying. You would have to answer that. You can try to educate the client, but that could be seen as trying to explain away substandard work.
If the client is not happy, the next fallback position is to get paid without litigation, and for that to happen, the contract has to be reasonably clear. Unfortunately, "reasonable best effort" isn't clear, so you might wind up in court. You may be able to renegotiate parts of the contract in the client's favor, or you may not.
If all else fails, you hope to win the court case.
I am not a lawyer, and this is not legal advice. I do see this as more of a question of expectations and possible legal interpretation than a technical question. I don't think we can help you here. You should consult with a lawyer who specializes in this sort of thing, and I don't even know what speciality to recommend. If you're in the US, call your local Bar Association and ask for a referral.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
Files being tied to the user requires some method of verifying that there is a user. What happens when your verification server goes down (or is discontinued, as Wal-Mart did)?
There is no level of DRM that doesn't affect at least some "honest users".
Data can be copied
As long as client hardware, standalone, can not distinguish between a "good" and a "bad" copy, you will end up limiting all general copies, and copy mechanisms. Most DRM companies deal with this fact by a telling me how much this technology sets me free. Almost as if people would start to believe when they hear the same thing often enough...
Code can't be protected on the client. Protecting code on the server is a largely solved problem. Protecting code on the client isn't. All current approaches come with stingy restrictions.
Impact works in subtle ways. At the very least, you have the additional cost of implementing client-side-DRM (and all follow-up cost, including the horde of "DMCA"-shouting lawyer gorillas) It is hard to prove that you will offset this cost with the increased revenue.
It's not just about code and crypto. Once you implement client-side DRM, you unleash a chain of events in Marketing, Public Relations and Legal. A long as they don't stop to alienate users, you don't need to bother.
To answer the question "is it reasonable", you have to be clear when you use the word "protect" what you're trying to protect against...
For example, are you trying to:
authorized users from using their downloaded content via your app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
authorized users from using their downloaded content via any app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
unauthorized users from using content received from authorized users via your app?
unauthorized users from using content received from authorized users via any app?
known users from accessing unpurchased/unauthorized content from the media library on your server via your app?
known users from accessing unpurchased/unauthorized content from the media library on your server via any app?
unknown users from accessing the media library on your server via your app?
unknown users from accessing the media library on your server via any app?
etc...
"Any app" in the above can include things like:
other player programs designed to interoperate/cooperate with your site (e.g. for flickr)
programs designed to convert content to other formats, possibly non-DRM formats
hostile programs designed to
From the article you linked, you can start to see some of the possible limitations of applying the DRM client-side...
The third, originally used in PyMusique, a Linux client for the iTunes Store, pretends to be iTunes. It requested songs from Apple's servers and then downloaded the purchased songs without locking them, as iTunes would.
The fourth, used in FairKeys, also pretends to be iTunes; it requests a user's keys from Apple's servers and then uses these keys to unlock existing purchased songs.
Neither of these approaches required breaking the DRM being applied, or even hacking any of the products involved; they could be done simply by passively observing the protocols involved, and then imitating them.
So the question becomes: are you trying to protect against these kinds of attack?
If yes, then client-applied DRM is not reasonable.
If no (for example, you're only concerned about people using your app, like Apple/iTunes does), then it might be.
(repeat this process for every situation you can think of. If the adig nswer is always either "client-applied DRM will protect me" or "I'm not trying to protect against this situation", then using client-applied DRM is resonable.)
Note that for the last four of my examples, while DRM would protect against those situations as a side-effect, it's not the best place to enforce those restrictions. Those kinds of restrictions are best applied on the server in the login/authorization process.
If the server serves the content without protection, it's because the encryption is per-client.
That being said, wireshark will foil your best-laid plans.
Encryption alone is usually just as good as sending a boolean telling you if you're allowed to use the content, since the bypass is usually just changing the input/output to one encryption API call...
You want to use heavy binary obfuscation on the client side if you want the protection to literally hold for more than 5 minutes. Using decryption on the client side, make sure the data cannot be replayed and that the only way to bypass the system is to reverse engineer the entire binary protection scheme. Properly done, this will stop all the kids.
On another note, if this is a product to be run on an operating system, don't use processor specific or operating system specific anomalies such as the Windows PEB/TEB/syscalls and processor bugs, those will only make the program even less portable than DRM already is.
Oh and to answer the question title: No. It's a waste of time and money, and will make your product not work on my hardened Linux system.

Why is p2p web hosting not widely used? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
We can see the growth of systems using peer to peer principles.
But there is an area where peer to peer is not (yet) widely used: web hosting.
Several projects are already launched, but there is no big solution which would permit users to use and to contribute to a peer to peer webhosting.
I don't mean not-open projects (like Google Web Hosting, which use Google resources, not users'), but open projects, where each user contribute to the hosting of the global web hosting by letting its resources (cpu, bandwidth) be available.
I can think of several assets of such systems:
automatic load balancing
better locality
no storage limits
free
So, why such a system is not yet widely used ?
Edit
I think that the "97.2%, plz seed!!" problem occurs because all users do not seed all the files. But if a system where all users equally contribute to all the content is built, this problem does not occur any more. Peer to peer storage systems (like Wuala) are reliable, thanks to that.
The problem of proprietary code is pertinent, as well of the fact that a user might not know which content (possibly "bad") he is hosting.
I add another problem: the latency which may be higher than with a dedicated server.
Edit 2
The confidentiality of code and data can be achieved by encryption. For example, with Wuala, all files are encrypted, and I think there is no known security breach in this system (but I might be wrong).
It's true that seeders would not have many benefits, or few. But it would prevent people from being dependent of web hosting companies. And such a decentralized way to host websites is closer of the original idea of the internet, I think.
This is what Freenet basically is,
Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.
[...]
Users contribute to the network by giving bandwidth and a portion of their hard drive (called the "data store") for storing files. Unlike other peer-to-peer file sharing networks, Freenet does not let the user control what is stored in the data store. Instead, files are kept or deleted depending on how popular they are, with the least popular being discarded to make way for newer or more popular content. Files in the data store are encrypted to reduce the likelihood of prosecution by persons wishing to censor Freenet content.
The biggest problem is that it's slow. Both in transfer speed and (mainly) latency.. Even if you can get lots of people with decent upload throughput, it'll still never be as quick a dedicated servers or two.. The speed is fine for what Freenet is (publishing data without fear of censorship), but not for hosting your website..
A bigger problem is the content has to be static files, which rules out it's use for a majority of high-traffic websites.. To serve dynamic data each peer would have to execute code (scary), and would probably have to retrieve data from a database (which would be another big delay, again because of the latency)
I think "cloud computing" is about as close to P2P web-hosting as we'll see for the time being..
P2P website hosting is not yet widely used, because the companion technology allowing higher upstream rates for individual clients is not yet widely used, and this is something I want to look into*.
What is needed for this is called Wireless Mesh Networking, which should allow the average user to utilise the full upstream speed that their router is capable of, rather than just whatever some profiteering ISP rations out to them, while they relay information between other routers so that it eventually reaches its target.
In order to host a website P2P, a sort of combination of technology is required between wireless mesh communication, multiple-redundancy RAID storage, torrent sharing, and some kind of encryption key hierarchy that allows various users different abilities to change the data that is being transmitted, allowing something dynamic such as a forum to be hosted. The system would have to be self-updating to incorporate the latter, probably by time-stamping all distributed data packets.
There may be other possible catalysts that would cause the widespread use of p2p hosting, but I think anything that returns the physical architecture of hardware actually wiring up the internet back to its original theory of web communication is a good candidate.
Of course as always, the main reason this has not been implemented yet is because there is little or no money in it. The idea will be picked up much faster if either:
Someone finds a way to largely corrupt it towards consumerism
Router manufacturers realise there is a large demand for WiMesh-ready routers
There is a global paradigm shift away from the profit motive and towards creating things only to benefit all of humanity by creating abundance and striving for optimum efficiency
*see p2pint dot darkbb dot com if you're interested in developing this concept
For our business I can think of 2 reasons not to use peer hosting:
Responsiveness. Peer hosted solutions are often reliable because of the massive number of shared resources, but they are also nutoriously unstable. So the browsing experience will be intermittent.
Proprietary data/code. If I've written custom logic for my site I don't want everyone on the network having access. You also run into privacy issues with customer data.
If I were to donate some of my PCs CPU and bandwidth to some p2p web hosting service, how could I be sure that it wouldn't end up being used to serve child porn or other similarly disgusting content?
How many times have you seen "97.2%, please seed!!" for any random torrent?
Just imagine the havoc if even a small portion of the web became unavailable in this way.
It sounds like this idea would add a lot of cost to the individual seeder (bandwidth) without a lot of benefit.

Resources