Amazon SimpleDB or DynamoDB - ruby-on-rails-3.1

We are building a mobile app with a Rails CMS to manage it.
What does our app look like?
Every admin user of the app can set up one private channel with a very small amount of data: about 50 short strings.
Users can then download the app, register for a few different channels, and fetch the data from the server to their devices. The data will be stored locally and will not be fetched again unless the admin user updates it (which we assume won't happen often). Every channel will be available to no more than 500 devices.
Users can contribute to the channel, but that data will be stored on S3 and not in the database.
2 important points:
Most channels will be active for about 5 months and for roughly 500 users, but most of the activity will happen within the same couple of days.
Every channel is for a small number of users (500), but we hope :) to get to hundreds of thousands of admin users.
Building the CMS with Rails, we saw that using SimpleDB is more straightforward than using DynamoDB. But, as we are not server experts, we saw the limitations of SimpleDB and we don't know whether SimpleDB could handle the amount of data transfer we will have (if our app succeeds). Another important point is that DynamoDB costs are much higher and do not depend on usage, while SimpleDB will be much cheaper at the beginning.
The question is:
Can SimpleDB fit our needs?
Could we migrate to DynamoDB later if our service grows?

Starting out with a new project and not really knowing what to expect from the usage, I'd say the better option is to go with SimpleDB. It doesn't sound like your usage is going to be very high, so SimpleDB should be able to handle it with no problem. The real power of DynamoDB comes in when you have a lot of load, and it seems you don't fall into that category.
If you design your application correctly, switching between SimpleDB and DynamoDB should be a simple task if you decide at some point that SimpleDB is not working out. I do these kinds of switches all the time with other components in my software. Since both databases are NoSQL, you shouldn't have a problem converting between the two. Just make sure that any features you use in SimpleDB are also available in DynamoDB. Design your data model with both in mind: DynamoDB has stricter requirements around indexes, so make sure the two will be compatible.
That being said, plenty of people have been using SimpleDB for their applications, and I don't expect you would see any performance problems unless your product really takes off, at which point you can invest resources in moving to DynamoDB.
Aside from all that, there is the pricing, as you already mentioned. SimpleDB is the obvious solution for your use case.
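To illustrate the point above about designing the application so the database can be swapped later, here is a minimal Python sketch. Everything in it is hypothetical: `ChannelStore`, `InMemoryChannelStore`, and `migrate` are illustrative names, not part of any AWS SDK, and the in-memory class only stands in for a real SimpleDB or DynamoDB backend. The idea is that the rest of the app only ever does key-based reads and writes through one small interface, which both databases can support.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class ChannelStore(ABC):
    """Hypothetical storage interface the rest of the app codes against."""

    @abstractmethod
    def put_channel(self, channel_id: str, strings: Dict[str, str]) -> None:
        """Store the channel's short strings under its id."""

    @abstractmethod
    def get_channel(self, channel_id: str) -> Optional[Dict[str, str]]:
        """Return the channel's strings, or None if the id is unknown."""


class InMemoryChannelStore(ChannelStore):
    """Stand-in backend (assumption). A SimpleDB-backed or DynamoDB-backed
    class would implement the same two methods."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def put_channel(self, channel_id: str, strings: Dict[str, str]) -> None:
        self._items[channel_id] = dict(strings)

    def get_channel(self, channel_id: str) -> Optional[Dict[str, str]]:
        item = self._items.get(channel_id)
        return dict(item) if item is not None else None


def migrate(source: ChannelStore, target: ChannelStore,
            channel_ids: Iterable[str]) -> int:
    """Copy the given channels from one backend to another; return the count copied."""
    copied = 0
    for channel_id in channel_ids:
        item = source.get_channel(channel_id)
        if item is not None:
            target.put_channel(channel_id, item)
            copied += 1
    return copied
```

Because the interface is limited to lookups by key, it avoids depending on query features that only one of the two databases offers.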

Related

Where to store files (pictures & vids) for my website?

I'm a newbie web developer and I have a basic question regarding my Laravel-based website: where should I put my files? I know there are services like Amazon S3, but first, I don't know how to work with them, and second, they are NOT FREE.
There is going to be a fairly large amount of data, including pics and videos (around 10 GB). Where should I store them? And how should I use Laravel to allow users to upload files?
If it is going to be a bigger project, you should use a cloud service. This is the future of backend development, as it makes your project much easier and faster to maintain and run. If you want to build your own backend, it will take a long time, since you have to learn a lot of new things and be good at them. There are many key aspects you have to be aware of, like security, scaling, performance, and so on. As you suggested, you could use Amazon AWS or, much better in my opinion, Google Firebase. I think Google Firebase should be your pick because it is really easy to understand and has great documentation. Besides the storage service (Google Cloud Storage), there are several other services you could use in the future, like analytics, machine learning, or NoSQL databases. And the good thing is that you can connect them all together.
Google Firebase has a Spark plan, which is completely free with some limitations. If you scale to many users, you can upgrade to the other plans, which are not very expensive. Don't forget that your own backend would cost you time, and also money for electricity and hardware.
If you have more questions, feel free to ask me :)

What is the right way to realize a large web app

I have to build a web app that supports two main actions:
The first allows a small number of users to upload ads or posts. These "A" users can upload as many ads as they want, but their photos are limited to a five-megabyte threshold overall. The total number of ads will be approximately 10k.
The second allows a broad public (1k users per day) to search for and view the articles published by the A users. These searches can be made more precise with advanced filters.
I would like to know whether it is strictly necessary to build my app in a scalable way, or whether I could simply use an MVC approach.
This app will be developed using the Laravel framework and will be hosted on an Amazon server.
What do you recommend?
I would appreciate any advice, tips, and tricks for doing it in the best way.
Thanks in advance
The MVC approach is fine. Laravel, or any platform, can be scaled in multiple ways. The simplest is to split the DB, Laravel app, cache, and queue onto separate servers, so that each of those pieces can be scaled separately.
There is a great online set of videos about this: https://serversforhackers.com/scaling-laravel/forge.
But unless you know you will have a large amount of traffic right away, it's better to start with a simpler structure. You will save on cost, and it's not hard to scale later. I mean, start with one server for now, then separate the functions (cache, DB, etc.) onto their own servers as you find they need to be scaled.
If you want to save a little hassle, though, I do recommend Laravel Forge and Envoyer. Forge makes deploying and managing servers a lot easier, and Envoyer is great for automating deployments.

GrapheneDB vs Graph Story on Heroku

I have no experience with Graph DB applications but I'm trying to write one. I intend to host on Heroku.
I can see there are 2 Graph DB service providers with free plans, but I can't decide which one to use. They market themselves using different attributes, so I can't compare them. For example:
GrapheneDB mentions only the node and relationship count limit and the query time limit, but nothing about the storage limit.
Graph Story mentions the RAM limit, storage limit, and data transfer limit.
Other properties are mentioned too, but they aren't comparable between the two providers.
Has anyone tried either of these services on Heroku and could share their experience, please?
EDIT: I found this page, which gives an idea of how much space Neo4j needs.
I'll take a shot at answering this question while staying as objective as possible, since I and some other frequent answerers here have good relationships with both providers.
Both have their own pros and cons, and I think looking only at the Heroku side may not be a good choice.
There is also one difference between them that you need to know: GraphStory provides Neo4j Enterprise while GrapheneDB provides Neo4j Community. This is a fact. However, I personally think that if you run Neo4j on Heroku, you don't need Enterprise, because "enterprise" users of Neo4j run their own environment with clustering on servers with "real" RAM and SSDs, which in fact can be managed by both providers with a licence and support.
You mention the storage limit. Storage depends on the number of nodes, relationships, and properties in the database, so if there is a limit of 1000 nodes, I think you don't need to care about the storage limit.
I tried both on Heroku and, apart from the node limit, there is not much difference in performance when you deploy on free dynos.
If you are a startup, running Neo4j on Heroku is great, provided you take a paid plan of course. Both providers have good support and both reward their long-term customers.
If you look only at the free dynos, you don't need to care about the specific limitations, because it will simply be LIMITED either way!
Outside of Heroku, here are some points I noted:
GrapheneDB runs on all platforms, including Azure, which is nice.
GraphStory runs Enterprise, so you can benefit from the high-performance cache.
GrapheneDB has an accessible API for creating and destroying Neo4j servers on the fly.
Depending on your location, you may want support from Europe or from the US.
Basic plans on both suffer from some latency or boot time when not used for a long time.
Both have support for spatial.
Both are active in the Neo4j community with cool stuff, and you can meet them in real life :)
Now, you can test them both for free!
Yesterday I tried one CRUD application deployed as 2 Heroku apps: the first with Graph Story and the other with GrapheneDB.
I monitored both with NewRelic and found that the Graph Story app had an average latency varying from 1 to 2 seconds, whereas the GrapheneDB service needed only 20 to 40 ms to perform the same operations.
Graph Story latency:
GrapheneDB latency:
I wanted to try a paid plan for a few minutes on Graph Story, but to do that you need to contact support and wait for an unknown amount of time. GrapheneDB, on the other hand, lets you change plans yourself without any issue.
I tried to export the db from Graph Story, but the operation is not real-time: you need to wait for a link sent via email. I started the operation twice, but after 10 hours the email had still not arrived.
In GrapheneDB, by contrast, the export is immediate, with no anxious waiting for emails.
Graph Story offers the following features that differentiate it from other offerings:
Graph Story offers the Enterprise version of Neo4j
There are no limits on nodes or relationships on the free plan
Max query time is 30 seconds
You wouldn't want to use the free plan in production, of course, but it's excellent for proofs-of-concept, learning Neo4j, small hobby projects, etc.
(Full disclosure: I'm the CTO at Graph Story.)

Best scaling methodologies for a high-traffic web application?

We have a new project for a web app that will display banners ads on websites (as a network) and our estimate is for it to handle 20 to 40 billion impressions a month.
Our current language is ASP, but we are moving to PHP. Does PHP 5 have limits when scaling a web application? Or should I have our team invest in picking up JSP?
Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency and hence costs, but that's different from scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
deciding which ad to show
serving the ad
logging the impression
Each of these will require thought and likely different tools.
The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give ad placements for a given property for a given calendar period. Or it may be complex contextual advertising like Google's. Assuming it's more the former, and that the database of placements is small, then this is the simple task of scaling database reads. You can use replication trees or, alternatively, a caching layer like memcached.
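As a rough illustration of the caching-layer idea for placement reads, here is a minimal in-process read-through cache in Python. It is only a sketch under assumptions: `PlacementCache` and the `load_from_db` callback are hypothetical names, and a real deployment would more likely put the cache in a shared service such as memcached rather than in process memory.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PlacementCache:
    """Read-through cache: serve placements from memory until they expire,
    and only then go back to the database."""

    def __init__(self, load_from_db: Callable[[str], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db      # hypothetical DB lookup supplied by the caller
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, property_id: str) -> Any:
        now = self._clock()
        entry = self._entries.get(property_id)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: no database read
        placements = self._load(property_id)
        self._entries[property_id] = (now + self._ttl, placements)
        return placements
```

With a small placements table and a TTL of a minute or so, nearly all reads would be served without touching the database, at the price of placements being up to one TTL out of date.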
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding scaling strategy. More exotic options might be to use a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking.
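To make the sharding strategy for impression writes a little more concrete, here is a minimal Python sketch. It is an assumption-laden toy: `ShardedImpressionLog` is a hypothetical name, each "shard" is just an in-memory counter standing in for a separate database or counter store, and sharding by ad id is only one possible choice of key.

```python
import hashlib
from collections import Counter
from typing import List


class ShardedImpressionLog:
    """Spread impression counters over several shards by hashing the ad id."""

    def __init__(self, shard_count: int) -> None:
        # Each Counter stands in for one independent write target (assumption).
        self.shards: List[Counter] = [Counter() for _ in range(shard_count)]

    def _shard_for(self, ad_id: str) -> int:
        # Use a stable hash: Python's built-in hash() for strings varies
        # between processes, which would break routing across servers.
        digest = hashlib.sha256(ad_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def log_impression(self, ad_id: str) -> None:
        self.shards[self._shard_for(ad_id)][ad_id] += 1

    def total(self, ad_id: str) -> int:
        # All writes for one ad land on one shard, so one lookup is enough.
        return self.shards[self._shard_for(ad_id)][ad_id]
```

Keying on the ad id keeps each ad's count on a single shard, which makes reads simple but can overload one shard if a few ads dominate traffic. Sharding on another key, such as the requesting site, is an equally plausible choice.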
You do realize that 40 billion per month is roughly 15,500 per second, right?
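For reference, the arithmetic behind that figure can be checked with a few lines of Python. The 30-day month is an assumption, and the result is an average, so real peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(impressions_per_month: float, days_in_month: int = 30) -> float:
    """Average request rate implied by a monthly impression count."""
    return impressions_per_month / (days_in_month * SECONDS_PER_DAY)


low = average_per_second(20_000_000_000)   # lower end of the estimate in the question
high = average_per_second(40_000_000_000)  # upper end of the estimate in the question
print(f"{low:,.0f} to {high:,.0f} impressions per second on average")
# prints "7,716 to 15,432 impressions per second on average"
```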
Scaling isn't going to be your problem; infrastructure, period, is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware, in the form of a farm or cloud, as others have said.
This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think it is not a matter of language, but it can be a matter of database speed as well as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this. You can start with two servers and add more as the app demands more processing capacity.
On another point, Oracle 10g is a very good database server; in my humble opinion, you only need a standalone Oracle server to handle the volume of requests. Remember that a SQL server is faster when people request more or less the same things each time, which is what happens in a web application if you plan your database schema carefully.
You should also check the existing ad server solutions; there are some very good ones. Just search Google for "Open Source AD servers".
PHP will be capable of serving your needs. However, as others have said, your first limits will be your network infrastructure.
But your second limits will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level. Things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.

Performance problems with external data dependencies

I have an application that talks to several internal and external sources using SOAP, REST services or just using database stored procedures. Obviously, performance and stability is a major issue that I am dealing with. Even when the endpoints are performing at their best, for large sets of data, I easily see calls that take 10s of seconds.
So, I am trying to improve the performance of my application by prefetching the data and storing locally - so that at least the read operations are fast.
While my application is the major consumer and producer of data, some of the data can also change from outside my application, which I have no control over. If I use caching, I would never know when to invalidate the cache when such data changes from outside my application.
So I think my only option is to have a job scheduler running that continually updates the database. I could prioritize the users based on how often they log in and use the application.
I am talking about 50 thousand users and at least 10 endpoints that are terribly slow and can sometimes take a minute for a single call. Would something like Quartz give me the scale I need? And how would I get around the scheduler becoming a single point of failure?
I am just looking for something that doesn't require high maintenance and speeds up at least some of the less complicated subsystems, if not most. Any suggestions?
This does sound like you might need a data warehouse. You would update the data warehouse from the various sources, on whatever schedule was necessary. However, all the read-only transactions would come from the data warehouse, and would not require immediate calls to the various external sources.
This assumes you don't need realtime access to the most up to date data. Even if you needed data accurate to within the past hour from a particular source, that only means you would need to update from that source every hour.
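As a small sketch of that scheduling idea, the Python below picks which sources are due for a refresh based on how stale each one is allowed to get. The `Source` record and `sources_due` function are hypothetical names for illustration, not part of SSIS, Quartz, or any other scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    name: str
    max_staleness_s: float    # how old the warehouse copy may get (must be > 0)
    last_refreshed_at: float  # timestamp of the last successful refresh


def sources_due(sources: List[Source], now: float) -> List[Source]:
    """Return sources whose copy is older than allowed, most overdue first."""
    due = [s for s in sources if now - s.last_refreshed_at >= s.max_staleness_s]
    # Rank by how far past its allowance each source is, relative to that allowance.
    return sorted(due,
                  key=lambda s: (now - s.last_refreshed_at) / s.max_staleness_s,
                  reverse=True)
```

A scheduler job could call `sources_due` every few minutes and refresh only what it returns, so a source that needs hourly accuracy is polled about once an hour while slower-changing sources are left alone.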
You haven't said what platforms you're using. If you were using SQL Server 2005 or later, I would recommend SQL Server Integration Services (SSIS) for updating the data warehouse. It's made for just this sort of thing.
Of course, depending on your platform choices, there may be alternatives that are more appropriate.
Here are some resources on SSIS and data warehouses. I know you've stated you will not be using Microsoft products. I include these links as a point of reference: these are the products I was talking about above.
SSIS Overview
Typical Uses of Integration Services
SSIS Documentation Portal
Best Practices for Data Warehousing with SQL Server 2008

Resources