Wondering who gets to see my source/data in the cloud - heroku

Let us assume I have few ideas that I don't want stolen and I am willing to pay a reasonable price.
There are instances like this:
http://www.pcmag.com/article2/0,2817,2369188,00.asp
or the facebook story, or the early apple/xerox story.
which could happen in cloud services like ec2, heroku, linode, azure etc.
Assuming I have a good mix of interpreted/compiled code, how can I protect my source and ideas from being stolen? I know I am not really idea rich nor is there some source thief on the loose. I just have this nagging feeling about putting source code in the cloud, which I am sure some of you also have or didn't care much about.
Would disk encryption help? What are my choices (other than building my own mini data center?)

If you are hosting your code in the cloud at some point you will need to trust your host. Otherwise you'll be doing the hosting yourself. True, any host that has your code may be able to take the code and use it, but if you choose a host that seems trustworthy, has a good reputation, and many other satisfied customers you can probably wager that your code will be secure.
Encrypting things on disk might help - it depends what you're encrypted and how you use it. Do you need your data to be unencrypted at some point to work on it? Is your data only transferred from one place to another? How much damage could an adversary do if they got a hold of your data?
You may be interested in this answer, which outlines several options.

Related

Can Amazon actually read my code stored at AWS?

I don't know if this is the right place to ask, but I'm developing a web application, and I suggested using AWS. Nevertheless, my bosses are concerned about Amazon being able to read/steal our code. I don't know why Amazon would want to get my code, but it's not me the one which is worried about that.
I guess there should be some kind of encryption, or at least a legal clause at the AWS user contract where it says that Amazon won't do that or you will be able to sue them. The thing is I haven't been able to find this information so far.
Does anyone know where to find this information? I really want them to let me use AWS, since I think it is a great opportunity to learn about this technology.
Bonus: I know there are similar services, such as Heroku, or Openstack. I will also accept the kind of information resource I'm searching for any other similar services. But unless anyone can point that AWS is not the best option out there, I'd rather stick to AWS.
A) You should assume they can read your code B) you should also assume they don't care about your code.
Edit: Possibly more useful resources w/regards to AWS security
http://aws.amazon.com/articles/1697
http://aws.amazon.com/compliance/

is it reasonable to protect drm'd content client side

Update: this question is specifically about protecting (encipher / obfuscate) the content client side vs. doing it before transmission from the server. What are the pros / cons on going in an approach like itune's one - in which the files aren't ciphered / obfuscated before transmission.
As I added in my note in the original question, there are contracts in place that we need to comply to (as its the case for most services that implement drm). We push for drm free, and most content providers deals are on it, but that doesn't free us of obligations already in place.
I recently read some information regarding how itunes / fairplay approaches drm, and didn't expect to see the server actually serves the files without any protection.
The quote in this answer seems to capture the spirit of the issue.
The goal should simply be to "keep
honest people honest". If we go
further than this, only two things
happen:
We fight a battle we cannot win. Those who want to cheat will succeed.
We hurt the honest users of our product by making it more difficult to use.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
An extra bit of info: client environment is adobe air, multiple content types involved (music, video, flash apps, images).
So, is it reasonable to do like itune's fairplay and protect the media client side.
Note: I think unbreakable DRM is an unsolvable problem and as most looking for an answer to this, the need for it relates to it already being in a contract with content providers ... in the likes of reasonable best effort.
I think you might be missing something here. Users hate, hate, hate, HATE DRM. That's why no media company ever gets any traction when they try to use it.
The kicker here is that the contract says "reasonable best effort", and I haven't the faintest idea of what that will mean in a court of law.
What you want to do is make your client happy with the DRM you put on. I don't know what your client thinks DRM is, can do, costs in resources, or if your client is actually aware that DRM can be really annoying. You would have to answer that. You can try to educate the client, but that could be seen as trying to explain away substandard work.
If the client is not happy, the next fallback position is to get paid without litigation, and for that to happen, the contract has to be reasonably clear. Unfortunately, "reasonable best effort" isn't clear, so you might wind up in court. You may be able to renegotiate parts of the contract in the client's favor, or you may not.
If all else fails, you hope to win the court case.
I am not a lawyer, and this is not legal advice. I do see this as more of a question of expectations and possible legal interpretation than a technical question. I don't think we can help you here. You should consult with a lawyer who specializes in this sort of thing, and I don't even know what speciality to recommend. If you're in the US, call your local Bar Association and ask for a referral.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
Files being tied to the user requires some method of verifying that there is a user. What happens when your verification server goes down (or is discontinued, as Wal-Mart did)?
There is no level of DRM that doesn't affect at least some "honest users".
Data can be copied
As long as client hardware, standalone, can not distinguish between a "good" and a "bad" copy, you will end up limiting all general copies, and copy mechanisms. Most DRM companies deal with this fact by a telling me how much this technology sets me free. Almost as if people would start to believe when they hear the same thing often enough...
Code can't be protected on the client. Protecting code on the server is a largely solved problem. Protecting code on the client isn't. All current approaches come with stingy restrictions.
Impact works in subtle ways. At the very least, you have the additional cost of implementing client-side-DRM (and all follow-up cost, including the horde of "DMCA"-shouting lawyer gorillas) It is hard to prove that you will offset this cost with the increased revenue.
It's not just about code and crypto. Once you implement client-side DRM, you unleash a chain of events in Marketing, Public Relations and Legal. A long as they don't stop to alienate users, you don't need to bother.
To answer the question "is it reasonable", you have to be clear when you use the word "protect" what you're trying to protect against...
For example, are you trying to:
authorized users from using their downloaded content via your app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
authorized users from using their downloaded content via any app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
unauthorized users from using content received from authorized users via your app?
unauthorized users from using content received from authorized users via any app?
known users from accessing unpurchased/unauthorized content from the media library on your server via your app?
known users from accessing unpurchased/unauthorized content from the media library on your server via any app?
unknown users from accessing the media library on your server via your app?
unknown users from accessing the media library on your server via any app?
etc...
"Any app" in the above can include things like:
other player programs designed to interoperate/cooperate with your site (e.g. for flickr)
programs designed to convert content to other formats, possibly non-DRM formats
hostile programs designed to
From the article you linked, you can start to see some of the possible limitations of applying the DRM client-side...
The third, originally used in PyMusique, a Linux client for the iTunes Store, pretends to be iTunes. It requested songs from Apple's servers and then downloaded the purchased songs without locking them, as iTunes would.
The fourth, used in FairKeys, also pretends to be iTunes; it requests a user's keys from Apple's servers and then uses these keys to unlock existing purchased songs.
Neither of these approaches required breaking the DRM being applied, or even hacking any of the products involved; they could be done simply by passively observing the protocols involved, and then imitating them.
So the question becomes: are you trying to protect against these kinds of attack?
If yes, then client-applied DRM is not reasonable.
If no (for example, you're only concerned about people using your app, like Apple/iTunes does), then it might be.
(repeat this process for every situation you can think of. If the adig nswer is always either "client-applied DRM will protect me" or "I'm not trying to protect against this situation", then using client-applied DRM is resonable.)
Note that for the last four of my examples, while DRM would protect against those situations as a side-effect, it's not the best place to enforce those restrictions. Those kinds of restrictions are best applied on the server in the login/authorization process.
If the server serves the content without protection, it's because the encryption is per-client.
That being said, wireshark will foil your best-laid plans.
Encryption alone is usually just as good as sending a boolean telling you if you're allowed to use the content, since the bypass is usually just changing the input/output to one encryption API call...
You want to use heavy binary obfuscation on the client side if you want the protection to literally hold for more than 5 minutes. Using decryption on the client side, make sure the data cannot be replayed and that the only way to bypass the system is to reverse engineer the entire binary protection scheme. Properly done, this will stop all the kids.
On another note, if this is a product to be run on an operating system, don't use processor specific or operating system specific anomalies such as the Windows PEB/TEB/syscalls and processor bugs, those will only make the program even less portable than DRM already is.
Oh and to answer the question title: No. It's a waste of time and money, and will make your product not work on my hardened Linux system.

How to implement a secure distributed social network?

I'm interested in how you would approach implementing a BitTorrent-like social network. It might have a central server, but it must be able to run in a peer-to-peer manner, without communication to it:
If a whole region's network is disconnected from the internet, it should be able to pass updates from users inside the region to each other
However, if some computer gets the posts from the central server, it should be able to pass them around.
There is some reasonable level of identification; some computers might be dissipating incomplete/incorrect posts or performing DOS attacks. It should be able to describe some information as coming from more trusted computers and some from less trusted.
It should be able to theoretically use any computer as a server, however, optimizing dynamically the network so that typically only fast computers with ample internet work as seeders.
The network should be able to scale to hundreds of millions of users; however, each particular person is interested in less than a thousand feeds.
It should include some Tor-like privacy features.
Purely theoretical question, though inspired by recent events :) I do hope somebody implements it.
Interesting question. With the use of already existing tor, p2p, darknet features and by using some public/private key infrastructure, you possibly could come up with some great things. It would be nice to see something like this in action. However I see a major problem. Not by some people using it for file sharing, BUT by flooding the network with useless information. I therefore would suggest using a twitter like approach where you can ban and subscribe to certain people and start with a very reduced set of functions at the beginning.
Incidentally we programmers could make a good start to accomplish that goal by NOT saving and analyzing to much information about the users and use safe ways for storing and accessing user related data!
Interesting, the rendezvous protocol does something similar to this (it grabs "buddies" in the local network)
Bittorrent is a mean of transfering static information, its not intended to have everyone become producers of new content. Also, bittorrent requires that the producer is a dedicated server until all of the clients are able to grab the information.
Diaspora claims to be such one thing.

Why would you not want to use Cloud Computing [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Our company is considering moving from hosting our own servers to EC2 and I was wondering if this was a good idea.
I have seen a lot of stuff about can cloud computing (and specifically EC2) do x, or can it do y, but my real question is why would you NOT want to use it?
If you were setting up a business, what are the reasons (outside of cost) that you would choose to go through the trouble of managing your own servers?
I know there are a lot of cost calculations you can put in regarding bandwidth, disk usage etc, but there are of course, other costs regarding maintenance of your own server. For the sake of this discussion I am willing to consider the costs roughly equal.
I seem to remember that Joel Spolsky wrote a little blur on this at one time, but I was unable to find it.
Anyone have any reasons?
Thanks!
I can think of several reasons why not use EC2 (and I am talking about EC2, not grid comp in general):
Reliability: Amazon makes no guarantee as to the availability / down time / safety of EC2
Security: Amazon does not makes any guarantee as to whom it will disclose your data
Persistence: ensuring persistence of your data (that includes, effort to set up the system) is complicated over EC2
Management: there are very few integrated management tools for a cloud deployed on EC2
Network: the virtual network that allows EC2 instances to communicates has some quite painful limitations (latency, no multicast, arbitrary topological location)
And to finish that:
Cost: on the long run, if you are not using EC2 to absorb peak traffic, it is going to be much more costly than investing into your own servers (cheapo servers like Supermicro cost just a couple of hundred bucks...)
On the other side, I still think EC2 is a great way to soak up non-sensitive peak traffic, if your architecture allows it.
Some questions to ask:
What is the expected uptime, and how does downtime affect your business? What sort of service level agreement can you get, what are the penalties for missing it, and how confident are you that the SLA uptime goals will be met? (They may be better or worse at keeping the systems up than you are.)
How sensitive is the data you're proposing to put into the cloud? Again, we get into the questions of how secure the provider promises to be, what the contractual penalties and indemnities are, and how confident you are that the provider will live up to the agreement. Further, there may be external requirements. If you deal with health-related data in the US, you are subject to very strict requirements. If you deal with credit card data, you also have responsibilities (contractual, not legal).
How easy will it be to back out of the arrangement, should service not be what was expected, or if you find a better deal elsewhere? This includes not only getting your data back, but also some version of the applications you've been using. Consider the possibilities of your provider going bankrupt (Amazon isn't going to go bankrupt any time soon, but they could split off a cloud provider which could then go bankrupt), or having an internal reorganization. Bear in mind that a company in serious trouble may not be able to live up to your expectations of service.
How much independence are you going to have? Are you going to be running their software or software you pick? How easy will it be to reconfigure?
What is the pricing scheme? Is it possible for the bills to hit unacceptable levels without adequate warning?
What is the disaster plan? Ideally, it's running your software on servers in a different location from where the disaster hit.
What does your legal department (or retained corporate attorney) think of the contract? Is there a dispute resolution mechanism, and, if so, is it fair to you?
Finally, what do you expect to get out of moving to the cloud? What are you willing to pay? What can you compromise on, and what do you need?
Highly sensitive data might be better to control yourself. And there's legislation; some privacy sensitive information, for example, might not leave the the country.
Also, except for Microsoft Azure in combination with SDS, the data stores tend to be not relational, which is a nuisance in certain cases.
Maybe concern that that big a company will more likely be approached by an Agent Smith from the government to spy on everyone that a little small provider somewhere.
Big company - more customers - more data to aggregate and recognize patterns - more resources to organize a sophisticated watch system.
Maybe it's more of a fantasy but who ever knows?
If you don't have a paranoia it doesn't mean yet that you are not being watched.
The big one is: if Amazon goes down, there's nothing you can do to bring it back up.
I'm not talking about doomsday scenarios where the company disappears. I mean that you're at the mercy of their downtime, with little recourse of your own.
Security -- you don't know what is being done to your data
Dependency -- your business is now directly intertwined with the provider
There are different kinds of cloud computing with lots of different vendors providing it. It would make me nervous to code my apps to work with a single cloud vendor. that you specifically had to code for..amazon and Microsoft I believe you need to specifically code for that platform - maybe google too.
That said, I recently jettisoned my own dedicated servers and moved to Rackspaces Mosso Cloud platform (which have no proprietary coding necessary) and I am really, really pleased with it so far. Cut my costs in half, and performance is way better than before. My sql server databases are now running on 64Bit enterprise SQL server versions with 32G of ram - that would have cost me a fortune on my previous providers infrastructure.
As far as being out of luck when the cloud is down, that was true if my dedicated server went down - it never did, but if there was a hardware crash on my dedicated server, I am not sure it would be back on-line any quicker than rackspace could bring their cloud back up.
Lack of control.
Putting your software on someone else's cloud represents handing over some control. They might institute a file upload size limit, or memory limits which could ruin your application. A security vulnerbility in their control panel could get your site hacked.
Security issues are not relevant if your application does its own encryption. Amazon is then storing encrypted data that they have no way of decrypting.
But in addition to the uptime issues, Amazon could decide to increase their prices to whatever they want. If you're dependent on them, you'll just have to pay it.
Depends how much you trust your own infrastructure in comparison to a 3rd party cloud service. In my opinion, most businesses (at least not IT related) should choose the later.
Another thing you lose with the cloud is the ability to choose exactly what operating system you want to run. For example, the latest Fedora Linux kernel available on EC2 is FC8, and the latest Windows version is Server 2003.
Besides the issues raised regarding dependability, reliability, and cost is the issue of data ownership. When you locate data on someone else's server, you no longer control who views, accesses, modifies, or uses that data. While the cloud operators can limit your access, you possess no way of limiting theirs or limiting who they give access to. Yes, you can encrypt all the data on the server but you lack any way of knowing who possesses root access to the server itself and any means to stop others from downloading your encrypted data and cracking it open. You lose control over your data; depending on what type of apps you are running and the proprietary nature of the data involved, this could engender corporate security and/or liability risks.
The other factor to consider is what would happen to your company if Amazon and/or EC2 were to suddenly vanish overnight. While a seemingly preposterous position, it could happen. Would you be able to quickly fill the hole and restore service, or would your potentially revenue generating apps languish while the IT staff scramble to obtain servers and bandwidth to get them back online? Also, what would happen to your data? The cloud hard drive holding all your information still exists, somewhere, and could pose a potential liability risk depending on the information you stored there--items such as personal information, business transaction records etc.
If I was starting my own business now, I would go through the hassle of purchasing and maintaining my own severs so I retained data ownership. I could control root access to the hardware, as well as control who can access and modify the data.
Unanswered security questions.
Really, do you want your IP out there, where you're not the one in control of it?
Most cloud computing environment are at least partially vendor specific. There's no good way to move stuff from one cloud to another without having to do a lot of rewriting. That sort of lock-in puts you at the mercy of one vendor when it comes to downtime, price increases, etc. If you rent or own your own servers, hosting providers and colos are pretty much interchangeable. You always have the option of moving somewhere else.
This may change in the future, as these things become standardized, but for now tying yourself to the cloud means tying yourself to a specific vendor.
This is kind of like the "Why would you use Linux" comment I received from management many years ago. The response I got was that it is a solution in search of a problem.
So what are your goals and objectives in moving to EC2?
I'd be interested to know if you'd still want to move to a cloud, if it was your own.
Cloud computing has brought parallel programming a little closer to the masses, but you still have to understand how best to use it - otherwise you're going to waste compute cycles and bandwidth.
Re-architecting your application for most efficient use of a cloud computing service is non-trivial.
Besides what has already been said here, we have to consider uniformity across the business. Are all of you applications going to be hosted in the cloud, or only most? Is most enough to pull the trigger on using the cloud when you still have to have personnel to handle a few special servers?
In particular, there might be special hardware that you need to communicate with such modems to accept incoming data, or voice cards that make automated phone calls. I don't know how such things could be handled in a cloud environment.

What are the reasons for a "simple" website not to choose Cloud Based Hosting?

I have been doing some catching up lately by reading about cloud hosting.
For a client that has about the same characteristics as StackOverflow (Windows stack, same amount of visitors), I need to set up a hosting environment. Stackoverflow went from renting to buying.
The question is why didn't they choose cloud hosting?
Since Stackoverflow doesn't use any weird stuff that needs to run on a dedicated server and supposedly cloud hosting is 'the' solution, why not use it?
By getting answers to this question I hope to be able to make a weighted decision myself.
I honestly do not know why SO runs like it does, on privately owned servers.
However, I can assume why a website would prefer this:
Maintainability - when things DO go wrong, you want to be hands-on on the problem, and solve it as quickly as possible, without needing to count on some third-party. Of course the downside is that you need to be available 24/7 to handle these problems.
Scalability - Cloud hosting (or any external hosting, for that matter) is very convenient for a small to medium-sized site. And most of the hosting providers today do give you the option to start small (shared hosting for example) and grow to private servers/VPN/etc... But if you truly believe you will need that extra growth space, you might want to count only on your own infrastructure.
Full Control - with your own servers, you are never bound to any restrictions or limitations a hosting service might impose on you. Run whatever you want, hog your CPU or your RAM, whatever. It's your server. Many hosting providers do not give you this freedom (unless you pay up, of course :) )
Again, this is a cost-effectiveness issue, and each business will handle it differently.
I think this might be a big reason why:
Cloud databases are typically more
limited in functionality than their
local counterparts. App Engine returns
up to 1000 results. SimpleDB times out
within 5 seconds. Joining records from
two tables in a single query breaks
databases optimized for scale. App
Engine offers specialized storage and
query types such as geographical
coordinates.
The database layer of a cloud instance
can be abstracted as a separate
best-of-breed layer within a cloud
stack but developers are most likely
to use the local solution for both its
speed and simplicity.
From Niall Kennedy
Obviously I cannot say for StackOverflow, but I have a few clients that went the "cloud hosting" route. All of which are now frantically trying to get off of the cloud.
In a lot of cases, it just isn't 100% there yet. Limitations in user tracking (passing of requestor's IP address), fluctuating performance due to other load on the cloud, and unknown usage number are just a few of the issues that have came up.
From what I've seen (and this is just based on reading various blogged stories) most of the time the dollar-costs of cloud hosting just don't work out, especially given a little bit of planning or analysis. It's only really valuable for somebody who expects highly fluctuating traffic which defies prediction, or seasonal bursts. I guess in it's infancy it's just not quite competitive enough.
IIRC Jeff and Joel said (in one of the podcasts) that they did actually run the numbers and it didn't work out cloud-favouring.
I think Jeff said in one of the Podcasts that he wanted to learn a lot of things about hosting, and generally has fun doing it. Some headaches aside (see the SO blog), I think it's a great learning experience.
Cloud computing definitely has it's advantages as many of the other answers have noted, but sometimes you just want to be able to control every bit of your server.
I looked into it once for quite a small site. Running a small Amazon instance for a year would cost around £700 + bandwidth costs + S3 storage costs. VPS hosting with similar specs and a decent bandwidth allowance chucked in is around £500. So I think cost has a lot to do with it unless you are going to have fluctuating traffic and lots of it!
I'm sure someone from SO will answer it but "Isn't just more hassle"? Old school hosting is still cheap and unless you got big scalability problems why would you do cloud hosting?

Resources