GrapheneDB vs Graph Story on Heroku

I have no experience with graph database applications, but I'm trying to write one. I intend to host it on Heroku.
I can see there are two graph database providers with free plans, but I can't decide which one to use. They market themselves using different attributes, so I can't compare them. For example:
GrapheneDB mentions only the node and relationship count limits and the query time limit, but nothing about a storage limit.
Graph Story mentions the RAM limit, storage limit and data transfer limit.
Other properties are mentioned too, but they aren't comparable between the two providers.
Has anyone tried either of these services on Heroku and could share their experience?
EDIT: I found this page, which gives an idea of how much space Neo4j needs.

I'll take a stab at answering this question while staying as objective as possible, since I and some other frequent answerers here have good relationships with both providers.
Both have their own pros and cons, and I think looking only at the Heroku side is maybe not a good choice.
There is also one difference between the two that you need to know: Graph Story provides Neo4j Enterprise, while GrapheneDB provides Neo4j Community. That is a fact. However, I personally think that if you run Neo4j on Heroku you don't need Enterprise, because "enterprise" users of Neo4j run their own environment, with clustering on servers with "real" RAM and SSDs, which in fact both providers can manage for you with a licence and support.
You mention the storage limit. Storage depends on the number of nodes, relationships and properties in the database, so if there is a limit of 1,000 nodes, I don't think you need to worry about the storage limit.
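To put rough numbers on that point, here is a minimal sketch of a store-size estimate. The per-record byte sizes are assumptions based on commonly cited Neo4j fixed-record figures (they differ between Neo4j versions), and the function name is made up for illustration, so check the sizing page mentioned in the question's edit against your own version.

```python
# Rough Neo4j store-size estimate. All byte sizes below are ASSUMPTIONS
# (fixed record sizes differ between Neo4j versions); adjust for yours.
NODE_BYTES = 15          # assumed size of one node record
RELATIONSHIP_BYTES = 34  # assumed size of one relationship record
PROPERTY_BYTES = 41      # assumed size of one property record


def estimate_store_bytes(nodes, relationships, properties):
    """Return an approximate on-disk size in bytes (indexes and logs ignored)."""
    return (nodes * NODE_BYTES
            + relationships * RELATIONSHIP_BYTES
            + properties * PROPERTY_BYTES)


if __name__ == "__main__":
    # Hypothetical small graph: 1,000 nodes, 10,000 relationships,
    # and 5 properties per node.
    size = estimate_store_bytes(1_000, 10_000, 5_000)
    print(f"{size / 1024:.0f} KiB")
```

Under these assumed sizes, a graph capped at around a thousand nodes stays well under a megabyte, which is why the node limit is likely to bite long before any storage limit does.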
I tried both on Heroku and, apart from the node limit, there is not much difference in performance when you deploy on free dynos.
If you are a startup, running Neo4j on Heroku is great, provided you take a paid plan of course. Both providers have good support and both reward their long-term customers.
If you look only at the free dynos, you don't need to worry about the specific limitations, because it will be LIMITED either way!
Outside of Heroku, here are some points I noticed:
GrapheneDB runs on all platforms, including Azure, which is a nice touch.
Graph Story runs Enterprise, so you can benefit from the high-performance cache.
GrapheneDB has an accessible API for creating and destroying Neo4j servers on the fly.
Depending on your location, you may want support from Europe or from the US.
Basic plans on both suffer from some latency or boot time when not used for a long time.
Both have support for spatial.
Both are active in the Neo4j community and do cool stuff; you can meet them in real life :)
Now you can test them both, for free!

Yesterday I tried one CRUD application deployed as two Heroku apps: the first with Graph Story and the other with GrapheneDB.
I monitored them with New Relic and found that the Graph Story app had an average latency varying between 1 and 2 seconds, whereas the GrapheneDB service needed only 20 to 40 ms to perform the same operations.
Graph Story latency:
GrapheneDB latency:
I wanted to try a paid plan for a few minutes on Graph Story, but to do that you need to contact support and wait for an unknown amount of time. GrapheneDB, by contrast, lets you change plans yourself without any issue.
I tried to export the database in Graph Story, but the operation is not real-time: you need to wait for a link sent via email. I started the operation twice, but after 10 hours the email had still not arrived.
In GrapheneDB the export is immediate, with no anxious waiting for emails.

Graph Story offers the following features that differentiate it from other offerings:
Graph Story offers the Enterprise version of Neo4j
There are no limits on nodes or relationships on the free plan
Max query time is 30 seconds
You wouldn't want to use the free plan in production, of course, but it's excellent for proofs-of-concept, learning Neo4j, small hobby projects, etc.
(Full disclosure: I'm the CTO at Graph Story.)

Related

Amazon SimpleDB or DynamoDB

We are building a mobile app with a Rails CMS to manage it.
What does our app look like?
Every admin user of the app can set up one private channel with a very small amount of data: about 50 short strings.
Users can then download the app, register for a few different channels and fetch the data from the server to their devices. The data will be stored locally and will not be fetched again unless the admin user updates it (we assume that won't happen often). Every channel will be available to no more than 500 devices.
Users can contribute to the channel, but this data will be stored on S3 and not in the database.
Two important points:
Most of the channels will be active for about 5 months and for roughly 500 users, but most of the activity will happen within the same couple of days.
Every channel is for a small number of users (500), but we hope :) to get to hundreds of thousands of admin users.
Building the CMS with Rails, we saw that using SimpleDB is more straightforward than using DynamoDB. But, as we are not server experts, we saw the limitations of SimpleDB and we don't know whether SimpleDB could handle the amount of data transfer that we will have (if our app succeeds). Another important point is that DynamoDB costs are much higher and do not depend on usage, while SimpleDB will be much cheaper at the beginning.
The questions are:
Can SimpleDB fit our needs?
Could we migrate later to DynamoDB if our service grows in the future?
When starting a new project without really knowing what to expect from the usage, I'd say the better option is to go with SimpleDB. It doesn't sound like your usage is going to be very high, so SimpleDB should be able to handle it with no problem. The real power of DynamoDB comes in when you have a lot of load, and it seems you don't fall into that category.
If you design your application correctly, switching between SimpleDB and DynamoDB should be a simple task if you decide at some point that SimpleDB is not working out. I do these kinds of switches all the time with other components in my software. Since both databases are NoSQL, you shouldn't have a problem converting between the two. Just make sure that any features you use in SimpleDB are available in DynamoDB, and design your database for both: DynamoDB has stricter requirements around indexes, so make sure the two will be compatible.
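One way to read "design your application correctly" is to hide the database behind a small interface so the backend can be swapped later. The sketch below is only an illustration of that idea: `ChannelStore`, `InMemoryChannelStore` and `migrate` are hypothetical names, and the in-memory class stands in for real SimpleDB or DynamoDB wrappers, which are not shown.

```python
# Hypothetical storage abstraction: the app talks only to ChannelStore,
# so a SimpleDB- or DynamoDB-backed class can be swapped in later.
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class ChannelStore(ABC):
    """Minimal key/attribute interface that either service could satisfy."""

    @abstractmethod
    def put(self, channel_id: str, attributes: Dict[str, str]) -> None: ...

    @abstractmethod
    def get(self, channel_id: str) -> Optional[Dict[str, str]]: ...


class InMemoryChannelStore(ChannelStore):
    """Stand-in backend for tests; real backends would wrap an AWS client."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def put(self, channel_id: str, attributes: Dict[str, str]) -> None:
        self._items[channel_id] = dict(attributes)  # store a copy

    def get(self, channel_id: str) -> Optional[Dict[str, str]]:
        item = self._items.get(channel_id)
        return dict(item) if item is not None else None


def migrate(channel_ids: Iterable[str],
            source: ChannelStore, target: ChannelStore) -> int:
    """Copy the given channels from one backend to another; return the count."""
    copied = 0
    for channel_id in channel_ids:
        item = source.get(channel_id)
        if item is not None:
            target.put(channel_id, item)
            copied += 1
    return copied
```

Keeping the interface down to operations both services support (simple gets and puts by key) is what makes a later migration a matter of writing one new class and running a copy loop.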
That being said, plenty of people have been using SimpleDB for their applications, and I don't expect that you would see any performance problems unless your product really takes off, at which time you can invest in resources to move to DynamoDB.
Aside from all that we have the pricing, like you already mentioned. SimpleDB is the obvious solution for your use case.

How many users can a single small Windows Azure web role support?

We are creating a simple website but with heavy user logins (about 25,000 concurrent users). How can I calculate the number of instances required to support it?
Load testing and performance testing are really the only way you're going to figure out the performance metrics and instance requirements of your app. You'll need to define "concurrent users" - does that mean 25,000 concurrent transactions, or does that simply mean 25,000 active sessions? And if the latter, how frequently does a user visit web pages (e.g. think-time between pages)? Then, there's all the other moving parts: databases, Azure storage, external web services, intra-role communication, etc. All these steps in your processing pipeline could be a bottleneck.
Don't forget SLA: Assuming you CAN support 25,000 concurrent sessions (not transactions per second), what's an acceptable round-trip time? Two seconds? Five?
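The difference between sessions and transactions can be made concrete with the interactive form of Little's Law, throughput = users / (response time + think time). The 30-second think time and 2-second response time below are made-up inputs for illustration, not measurements, and the function name is hypothetical.

```python
def requests_per_second(active_users, think_time_s, response_time_s):
    """Interactive Little's Law: X = N / (R + Z)."""
    return active_users / (response_time_s + think_time_s)


if __name__ == "__main__":
    # 25,000 active sessions, ASSUMED 30 s think time and 2 s response time.
    rate = requests_per_second(25_000, think_time_s=30, response_time_s=2)
    print(f"{rate:.2f} requests per second")
```

The same 25,000 "concurrent users" can therefore mean very different request rates depending on think time, which is why the definition has to be pinned down before sizing anything.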
When thinking about instance count, you also need to consider VM size in your equation. Depending again on your processing pipeline, you might need a medium or large VM to support specific memory requirements, for instance. You might get completely different results when testing different VM sizes.
You need to have a way of performing empirical tests that are repeatable and remove edge-case errors (for instance: running tests a minimum of 3 times to get an average; and methodically ramping up load in a well-defined way and observing results while under that load for a set amount of time to allow for the chaotic behavior of adding load to stabilize). This empirical testing includes well-crafted test plans (e.g. what pages the users will hit for given usage scenarios, including possible form data). And you'll need the proper tools for monitoring the systems under test to determine when a given load creates a "knee in the curve" (meaning you've hit a bottleneck and your performance plummets).
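As a rough illustration of the "repeat, average, then look for the knee" idea, here is a minimal sketch. The function names, the sample numbers and the "latency more than 2x the baseline" threshold are all assumptions chosen for the example; a real test plan would pick its own criterion.

```python
from statistics import mean


def average_runs(runs):
    """Average latency per load level across repeated runs.

    `runs` is a list of dicts mapping load (users) -> latency in seconds.
    """
    loads = sorted(runs[0])
    return [(load, mean(run[load] for run in runs)) for load in loads]


def find_knee(points, factor=2.0):
    """Return the first load whose latency exceeds `factor` x the baseline.

    The baseline is the latency at the lowest load; `factor` is an ASSUMED
    threshold. Returns None if no load level crosses it.
    """
    baseline = points[0][1]
    for load, latency in points[1:]:
        if latency > factor * baseline:
            return load
    return None


if __name__ == "__main__":
    # Three hypothetical runs of the same ramp (users -> seconds).
    runs = [
        {100: 0.20, 200: 0.22, 400: 0.30, 800: 1.50},
        {100: 0.22, 200: 0.24, 400: 0.28, 800: 1.70},
        {100: 0.18, 200: 0.20, 400: 0.32, 800: 1.60},
    ]
    print(find_knee(average_runs(runs)))
```

Averaging at least three runs before looking for the knee keeps a single noisy run from being mistaken for a bottleneck.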
Final thought: Be sure your load-generation tool is not the bottleneck during the test! You might want to look into using Microsoft's load-testing solution with Visual Studio, or a cloud-based load-test solution such as Loadstorm (disclaimer: Loadstorm interviewed me about load/performance testing last year, but I don't work for them in any capacity).
EDIT June 21, 2013: Announced at TechEd 2013, Team Foundation Service will offer cloud-based load testing, with the Preview launching June 26, coinciding with the //build conference. The announcement is here.
No one can answer this question without a lot more information... like what technology you're using to build the website, what happens on a page load, what backend storage is used (if any), etc. It could be that for each user who logs on, you compute a million digits of pi, or it could be that for each user you serve up static content from a cache.
The best advice I have is to test your application (either in the cloud or equivalent hardware) and see how it performs.
It all depends on the architecture design, persistence technology and number of read/write operations you are performing per second (average/peak).
I would recommend looking into CQRS-based architectures for this kind of application. It fits cloud computing environments and allows for elastic scaling.
I was recently at a Cloud Summit and there were a few case studies. The one that sticks in my mind is an exam app: it has a burst load of about 80,000 users over 2 hours, for which they spin up about 300 instances.
Without knowing your load profile it's hard to add more value. Just keep in mind that concurrent and continuous are not the same thing. Remember the Stack Overflow versus Digg debacle (http://twitter.com/#!/spolsky/status/27244766467)?

Best scaling methodologies for a high-traffic web application?

We have a new project for a web app that will display banner ads on websites (as a network), and we estimate it will need to handle 20 to 40 billion impressions a month.
Our current language is ASP, but we are moving to PHP. Does PHP 5 have limits when scaling a web application? Or should I have our team invest in picking up JSP?
Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency and hence costs, but that's different from scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
what ad do we show?
serving the ad
logging the impression
Each of these will require thought and likely different tools.
The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give ad placements for a given property for a given calendar period. Or it may be complex contextual advertising like Google's. Assuming it's more the former, and that the database of placements is small, then this is the simple task of scaling database reads. You can use replication trees or alternatively a caching layer like memcached.
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding scaling strategy. More exotic options might be to use a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
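To make the sharding idea concrete, here is a minimal in-memory sketch in which each `Counter` stands in for one independent database shard. The class name, the shard count and the choice of SHA-256 for routing are assumptions for illustration, not a description of any particular product.

```python
import hashlib
from collections import Counter


class ShardedImpressionCounter:
    """In-memory stand-in for N independent database shards."""

    def __init__(self, shard_count=4):
        self.shards = [Counter() for _ in range(shard_count)]

    def _shard_for(self, ad_id):
        # Stable hash so the same ad always maps to the same shard.
        digest = hashlib.sha256(ad_id.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def log_impression(self, ad_id):
        # Each write touches exactly one shard, so shards scale independently.
        self._shard_for(ad_id)[ad_id] += 1

    def total(self, ad_id):
        return self._shard_for(ad_id)[ad_id]


if __name__ == "__main__":
    counter = ShardedImpressionCounter()
    for _ in range(3):
        counter.log_impression("ad-1")
    counter.log_impression("ad-2")
    print(counter.total("ad-1"), counter.total("ad-2"))
```

Routing by a stable hash of the ad identifier means writes for different ads spread across shards, while reads for one ad only ever need to consult a single shard.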
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking about.
You do realize that 40 billion per month is roughly 15,500 per second, right?
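The per-second figure is simple arithmetic, sketched below under the assumption of a 30-day month and perfectly even traffic (real traffic will likely peak well above the average). The function name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(impressions_per_month, days_in_month=30):
    """Average rate, ASSUMING traffic is spread evenly over the month."""
    return impressions_per_month / (days_in_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    for monthly in (20e9, 40e9):
        rate = per_second(monthly)
        print(f"{monthly:.0e} per month is about {rate:,.0f} per second")
```

Even the lower estimate of 20 billion a month works out to several thousand impressions every second on average, which is why the infrastructure, not the language, dominates the discussion.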
Scaling isn't going to be your problem - infrastructure period is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said in the form of a farm or cloud.
This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think it is not a matter of language, but it can be a matter of database speed as well as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this. You can start with two servers and add more as the app demands more processing capacity.
On another point, Oracle 10g is a very good database server; in my humble opinion, a single standalone Oracle server is all you need to handle that volume of requests. Remember that a SQL server is faster when people request more or less the same things each time, which is what happens in a web application if you plan your database schema carefully.
You should also check the existing ad server solutions, and there are some very good ones; just search Google for "Open Source AD servers".
PHP will be capable of serving your needs. However, as others have said, your first limits will be your network infrastructure.
But your second limits will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level. Things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.

What are the reasons for a "simple" website not to choose Cloud Based Hosting?

I have been doing some catching up lately by reading about cloud hosting.
For a client that has about the same characteristics as StackOverflow (Windows stack, same amount of visitors), I need to set up a hosting environment. Stackoverflow went from renting to buying.
The question is why didn't they choose cloud hosting?
Since Stackoverflow doesn't use any weird stuff that needs to run on a dedicated server and supposedly cloud hosting is 'the' solution, why not use it?
By getting answers to this question I hope to be able to make a weighted decision myself.
I honestly do not know why SO runs like it does, on privately owned servers.
However, I can assume why a website would prefer this:
Maintainability - when things DO go wrong, you want to be hands-on on the problem, and solve it as quickly as possible, without needing to count on some third-party. Of course the downside is that you need to be available 24/7 to handle these problems.
Scalability - Cloud hosting (or any external hosting, for that matter) is very convenient for a small to medium-sized site. And most of the hosting providers today do give you the option to start small (shared hosting for example) and grow to private servers/VPN/etc... But if you truly believe you will need that extra growth space, you might want to count only on your own infrastructure.
Full Control - with your own servers, you are never bound to any restrictions or limitations a hosting service might impose on you. Run whatever you want, hog your CPU or your RAM, whatever. It's your server. Many hosting providers do not give you this freedom (unless you pay up, of course :) )
Again, this is a cost-effectiveness issue, and each business will handle it differently.
I think this might be a big reason why:
Cloud databases are typically more limited in functionality than their local counterparts. App Engine returns up to 1000 results. SimpleDB times out within 5 seconds. Joining records from two tables in a single query breaks databases optimized for scale. App Engine offers specialized storage and query types such as geographical coordinates.
The database layer of a cloud instance can be abstracted as a separate best-of-breed layer within a cloud stack but developers are most likely to use the local solution for both its speed and simplicity.
From Niall Kennedy
Obviously I cannot say for StackOverflow, but I have a few clients that went the "cloud hosting" route. All of which are now frantically trying to get off of the cloud.
In a lot of cases, it just isn't 100% there yet. Limitations in user tracking (passing of the requestor's IP address), fluctuating performance due to other load on the cloud, and unknown usage numbers are just a few of the issues that have come up.
From what I've seen (and this is just based on reading various blogged stories) most of the time the dollar-costs of cloud hosting just don't work out, especially given a little bit of planning or analysis. It's only really valuable for somebody who expects highly fluctuating traffic which defies prediction, or seasonal bursts. I guess in its infancy it's just not quite competitive enough.
IIRC Jeff and Joel said (in one of the podcasts) that they did actually run the numbers and it didn't work out cloud-favouring.
I think Jeff said in one of the Podcasts that he wanted to learn a lot of things about hosting, and generally has fun doing it. Some headaches aside (see the SO blog), I think it's a great learning experience.
Cloud computing definitely has its advantages, as many of the other answers have noted, but sometimes you just want to be able to control every bit of your server.
I looked into it once for quite a small site. Running a small Amazon instance for a year would cost around £700 + bandwidth costs + S3 storage costs. VPS hosting with similar specs and a decent bandwidth allowance chucked in is around £500. So I think cost has a lot to do with it unless you are going to have fluctuating traffic and lots of it!
I'm sure someone from SO will answer this, but isn't it just more hassle? Old-school hosting is still cheap, and unless you have big scalability problems, why would you choose cloud hosting?

Mosso versus GoGrid: which is better?

I have reasonable experience managing my own server, so GoGrid-style management is not a problem. But Mosso seems a tad cheaper, except for the compute cycle terms, which are very difficult to assess. If anyone could share their experience, it would be very welcome.
Well, even though the answer currently marked as correct recommends GoGrid, I think I need to share my experience with GoGrid.
It's been several weeks since we ended our commitment with them, and I think I'm calm enough now to write up the cons.
1) Images. We were trying to use Windows 2008 images and those were pretty old. To be up to date, you need to install 80+ updates, and that takes a while. But that's not the worst thing. The worst thing is that the default image HDD size is 20 GB, which was not enough to complete Windows updating, at least automatically (not to mention installing additional software). There's no way to increase the image size, so you need all kinds of workarounds (for example, disabling virtual memory while installing).
2) Support. It's not fanatic; I would call it robotic. Although live chat works, we were unable to solve most of our problems through it, because live chat support personnel would always forward the request to an upper level that is not accessible through live chat. Another thing is that, as I understood it, the engineers who have real knowledge of and access to the infrastructure don't work at night or on weekends (I was working from Europe, so I was in a completely different time zone).
3) Service Level Agreement. You need to be careful about the small print (for example, I missed that the rule that 1 hour of downtime is compensated 100x applied only up to one month's bill), but there are also things that are not mentioned. For example, I was told that the SLA terms do not apply to cloud storage, although I don't think you will find this mentioned in the SLA.
4) Reaction time. Although the SLA says they will solve any issue in two hours, we couldn't get a solution in 10 days. The problem was clear: network speed between GoGrid server instances, and between an instance and cloud storage, was 10-15 kbps (measured using several tools, such as netio, on several instances). That wasn't because they forgot or anything; we were checking the status at various levels every day. My management talked with the VP of technology or someone similar, and he promised that the problem would be solved very soon; several days passed and no solution was proposed. And some of the emails about how they were investigating the problem made me laugh.
5) Internet speeds. Sometimes they were really good (I measured 550 Mbps download speed), but sometimes they were terrible (upload of at most 0.05 Mbps).
If someone thinks this is some kind of competitor's posting: I have chat and email logs about the issues mentioned, as well as screenshots of internet speed tests, and can provide them on request.
OK, and one good thing about their service: you can use several IP addresses on one instance (which our current hosting provider, Amazon EC2, is unable to do).
Stay away from GoGrid!
I don't have any experience with Mosso, but I do have (unfortunately) VERY bad experience with GoGrid.
As other people mentioned, their support is horrible. Most times you will get a live chat person that really is no help at all - doesn't really know their system or how it works so he can't really help with any problem beyond restarting your server.
Another issue is their performance which is at best unreliable and at worst just not there. Starting from I/O which can drop to < 1mb/s (measured by a few tools) - ranging to network connections that are very slow - load balancers which do not spread the load (2 servers on RoundRobin get 70/30)
Not to mention a very buggy portal - new server picks a free ip, which I am then told is in use...and not by me - even though I have the whole range "assigned" to me -
new cases which are saved without the text - buttons which say "upgrade to a new plan" but do nothing... etc... etc...
Their billing department which is not responsive and you have to argue about everything (why am I paying $0.5/gb traffic when the site states $0.29 ?????)
I have been using them for about a year now - and that's only because I don't have the time to move. Hopefully I will be able to get the hell out of there in a month.
As you can tell, I am very very frustrated with them. I know it's my fault I didn't run away sooner, but I really didn't expect such a low level of service and quality.
beware....
Yoav.
Mosso has way better service though, and the clients stay happy. The only issue I have experienced with them ever was installing DNN (which is a pain period) and a single client machine refused to allow for FTP access to their site... but again, Mosso techs did everything they could to get it going.
It's simple, Mosso is just like a "reseller" hosting. They provide you everything whitelabel from billing to control panel then you sell it back to customers.
If you are a developer, I recommend you choose GoGrid. Firstly, Mosso doesn't provide SSH access. Secondly, if you are a RoR/Mongrel user, you are capped to limited RAM (unless you pay extra on top of the $100). Moreover, GoGrid allows you to choose a server image (CentOS, Red Hat, Windows) with some out-of-the-box support for RoR and LAMP.
Furthermore, GoGrid provides initial credits ($50, or $95 if you use MS-WEBFWRD) so you can try it out before actually paying for it.
Mosso does not give you Admin control over the "servers" anymore...
Disclosure: I am the Technology Evangelist for GoGrid.
I wanted to address some of the points above by @Giedrius and @Yoav. I'm sorry if your experience was lower than expected. We have made and continue to make dramatic improvements and upgrades to both our product features and our service. That being said, I want to answer a few points that you listed above, specifically:
1) Images - Do note that the HD size (persistent storage) is tied to the RAM allocation. Our base image for the lowest RAM allocation (512 MB) is now 30 GB. Also, because some users experienced performance issues with low allocations of RAM on Windows servers, we have set a minimum allocation of 1 GB or higher for most Windows instances. Also, all of our Windows 2008 instances now have SP2 on them: wiki.gogrid.com/wiki/index.php/Server_Images#Windows_2008_Server
2) Support - We are always working on making our support team and processes even better. Remember that there are several public clouds that charge for support, something we don't do. Yes, it is available 24/7/365 and you are correct that there are typically more support personnel available during business hours (that is the norm for many companies). But we are here to help 24x7. Also, every GoGrid account gets a dedicated service team which consists of a variety of personnel from our organization (acct mgmt, tech support, billing, etc.)
3) SLA - We offer one of the most robust SLAs in the marketplace. Also, Cloud Storage IS in fact covered in our SLA under Section VI here: www.gogrid.com/legal/sla.php .
4) Reaction time - I do not believe that we ever state in the SLA that any issue will be "resolved" within 2 hours. I doubt that ANY hosting provider can offer that, simply because of the nature of hosting and the complexity therein. We will acknowledge and respond to tickets (as stated within the SLA) within 2 hours or 30 minutes depending on the nature of the ticket. I'm sorry if that isn't clear so please let me know where it can be better explained.
5) Internet speeds - we have multiple bandwidth providers for our datacenter. It is not typical that there is latency, jitter or slow transfer speeds. If a situation is encountered where the speeds are not what you expect, I encourage you to open a support ticket so that we can investigate.
6) I/O - recently we have been benchmarked by an independent 3rd party, CloudHarmony.com, as having the best I/O of cloud providers: http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html
7) Network Connections - see #5 above
8) Load Balancers - if you are encountering balancing issues, we encourage you to report it. Details on our LB can be found on the wiki: wiki.gogrid.com/wiki/index.php/(F5)_Load_Balancer
9) Portal - We continue to make optimizations to the web portal including recently launching a "list view" for customers with larger environments. If the portal is "misbehaving", I recommend clearing your cache and using the latest browser version (I personally use Chrome and Firefox regularly on the portal w/o issue). Alternatively, you could use the API to manage your GoGrid infrastructure.
10) Transfer Plan - A few months ago, we released some new RAM and Transfer Plans. It seems that you are still on the old Transfer plan if you have $0.50/GB instead of $0.29. We don't automatically change customers' plans without their permission. So I recommend that you upgrade your plan to enjoy the new pricing.
Hope that helps answer the questions/concerns. I didn't mean for it to be a sales pitch (as I'm not a sales guy) but I wanted to be sure that other readers had "the other side of the story."
Please contact me should you have any questions: michael[at]gogrid.com
Thanks!
-Michael
