Twitter like Model using SQL Server/Azure or Graph DB - algorithm

Is it possible to design a Twitter-like database using SQL Server, one that ensures high scalability and fast queries?
I am building a .NET platform that requires a model similar to Twitter's (User, Follower, Tweet) and am looking into what will fit best in terms of fast queries and scalability.
Will a relational DB work, or is a graph DB much better?

SQL Server will most certainly be able to handle any load that you have. SQL Azure supports databases up to 150GB (though I hear you can get more if you ask). With Azure SQL Federation, you can scale out multiple databases on hundreds of nodes around the world.
As for choosing between a relational database like SQL Server and the "NoSQL" variants like Azure Table Storage, it depends on your needs and how structured your data is. Given that you'll probably do a lot of joins (querying for followers of users, tweets that someone should see, etc.), your best bet is to go with a relational DB. Even Facebook still uses MySQL, so you're not exactly in bad company using a relational DB.
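To make the join-heavy access pattern concrete, here is a minimal sketch of the User/Follower/Tweet model. It uses Python's built-in sqlite3 purely as a stand-in for SQL Server, and every table, column, and function name (users, follows, tweets, timeline, followers) is an assumption for illustration, not something from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE tweets (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
-- Supports "latest tweets by author", the hot path of a timeline query.
CREATE INDEX idx_tweets_author_time ON tweets (author_id, created_at);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
# alice follows bob and carol; bob follows carol.
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3)])
conn.executemany(
    "INSERT INTO tweets (author_id, body, created_at) VALUES (?, ?, ?)",
    [(2, "hello from bob", 100),
     (3, "hello from carol", 200),
     (1, "hello from alice", 300)])


def timeline(user_id, limit=20):
    """Tweets written by the people user_id follows, newest first."""
    rows = conn.execute("""
        SELECT u.name, t.body
        FROM follows f
        JOIN tweets t ON t.author_id = f.followee_id
        JOIN users u ON u.id = t.author_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?""", (user_id, limit))
    return rows.fetchall()


def followers(user_id):
    """Names of the users who follow user_id, in alphabetical order."""
    rows = conn.execute("""
        SELECT u.name
        FROM follows f
        JOIN users u ON u.id = f.follower_id
        WHERE f.followee_id = ?
        ORDER BY u.name""", (user_id,))
    return [name for (name,) in rows]
```

Both queries are plain joins over the follows table, which is the kind of access a relational engine handles well. How this behaves at real Twitter scale depends on indexing, partitioning, and caching choices that this sketch does not attempt to cover.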

Related

Big Data Platform for a healthcare application

I have to develop a web application (for the healthcare sector) in Python using a Big Data platform (a NoSQL database, like Elasticsearch for example).
I want to know: what is the best Big Data platform for this situation?
Could someone help me?
You need to do an evaluation of NoSQL databases. Write down the features you need and how much you need them, along with examples of how you want to access and fill the database, and then try out several NoSQL databases to see how they perform and how well you can work with them.
There is no "best" NoSQL database.

Azure Technology Choice for Project

There is a lot of information out there about the various Azure data storage flavors; however, I'd like to ask for some advice for my particular scenario.
I'm putting together a pet project to become more familiar with Azure technology, in particular, Service Bus/Event Hubs and data storage platforms. The system I want to create is fairly simple: accept a moderate load of events (not IoT scale), persist them, and make aggregated data available such as 'User A had N events of type X in the past day/week/month/etc.' as reports.
Given that the data will be quite structured (e.g. users, user groups, events, etc.), and I will need aggregation capabilities, it suggests that relational storage may be the best fit, although more expensive.
Another alternative I've considered is to maintain aggregated data in near real-time using something like Stream Analytics, but I'm not sure if this is overkill compared to a more data warehouse-esque solution.
Any suggestions/help would be greatly appreciated.
John
John,
Azure SQL would be a decent choice, or if that proves to be too expensive, regular SQL Server hosted on a VM. You can create an Azure Service Bus queue to hold the incoming requests, and then create competing consumers on one or more worker roles to monitor and process the messages. Each consumer can run the SQL and persist the data in a new table that is created and "pre-aggregated" for the caller, or you could persist the information to Azure Blob storage in a structured format that matches your reporting tool (e.g. JSON). Blob storage of the aggregated information will be the most cost-effective option and will relieve strain on SQL.
An alternative would be HDInsight which can aggregate the information in batch processing mode as well. I guess the choice between SQL/HDInsight depends on the native format of the base (non-aggregated) information.
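The "pre-aggregated" table idea can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain queue.Queue stands in for the Service Bus queue, in-memory SQLite stands in for Azure SQL, there is a single consumer rather than competing ones, and all names (events, daily_counts, consume, count_between) are hypothetical.

```python
import queue
import sqlite3

# Stand-ins: queue.Queue for the message queue, SQLite for Azure SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id TEXT, event_type TEXT, day TEXT);
CREATE TABLE daily_counts (
    user_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    day TEXT NOT NULL,          -- ISO date, so string order = date order
    n INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_type, day)
);
""")

incoming = queue.Queue()


def consume(q):
    """Drain the queue, storing each raw event and bumping its daily count."""
    while True:
        try:
            key = q.get_nowait()  # (user_id, event_type, day)
        except queue.Empty:
            break
        with conn:  # one transaction per message
            conn.execute("INSERT INTO events VALUES (?, ?, ?)", key)
            conn.execute(
                "INSERT OR IGNORE INTO daily_counts VALUES (?, ?, ?, 0)", key)
            conn.execute(
                "UPDATE daily_counts SET n = n + 1 "
                "WHERE user_id = ? AND event_type = ? AND day = ?", key)


def count_between(user_id, event_type, first_day, last_day):
    """Report query: reads only the small pre-aggregated table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM daily_counts "
        "WHERE user_id = ? AND event_type = ? AND day BETWEEN ? AND ?",
        (user_id, event_type, first_day, last_day)).fetchone()
    return row[0]


for event in [("A", "X", "2024-01-01"), ("A", "X", "2024-01-01"),
              ("A", "X", "2024-01-03"), ("A", "Y", "2024-01-01"),
              ("B", "X", "2024-01-02")]:
    incoming.put(event)
consume(incoming)
```

A report such as "User A had N events of type X in the past week" then becomes count_between("A", "X", first_day, last_day) over a handful of rows instead of a scan of the raw events.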
I agree with Daniel. SQL Azure may be the way to go for your relational data needs. Another option to investigate for larger workloads for streaming and analytics is Azure Data Lake (https://azure.microsoft.com/en-us/solutions/data-lake/)

Cassandra as Cache Front-end to RDBMS

We are using Oracle RDBMS in our system. To reduce database load we plan to use a caching layer.
I am looking to see if we can use Apache Cassandra as a Caching Storage frontend to Oracle db.
From what I have seen so far, Cassandra is more like a database with built-in caching features. So using it as a caching layer for Oracle would be more like using another database. I feel it would be better to use Cassandra itself as an alternative to Oracle and other RDBMSs rather than using it along with Oracle.
Has anyone used Cassandra as a caching layer in front of an RDBMS? I have not found any resources or examples for doing so. If you have, can you help me with this?
I'm not sure what you mean by a caching storage frontend.
Cassandra might be useful if you are expecting a large volume of writes that arrive at a rate faster than Oracle could handle. Cassandra can handle a high volume of writes since it can scale by adding more nodes.
You could then do some kind of data analysis and reduction on the data in Cassandra before inserting the crunched data into Oracle. You might then use Oracle for the tasks that suit it better such as financial reporting, ad hoc queries, etc.
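The absorb-then-reduce flow described above might look something like this in miniature. Nothing here talks to Cassandra or Oracle: a Python list stands in for the high-volume write store, in-memory SQLite stands in for the relational side, and the names (record, reduce_writes, flush, summary, metric) are all made up for the example.

```python
import sqlite3
from collections import defaultdict

# Stand-in for the write-heavy store (Cassandra in the answer above).
raw_writes = []

# Stand-in for the relational database (Oracle in the answer above).
rdbms = sqlite3.connect(":memory:")
rdbms.execute(
    "CREATE TABLE summary (batch INTEGER, metric TEXT, n INTEGER, total REAL)")


def record(metric, value):
    """Cheap append-only write; this is the path that takes the volume."""
    raw_writes.append((metric, value))


def reduce_writes(writes):
    """Crunch raw writes down to one (metric, count, total) row per metric."""
    totals = defaultdict(lambda: [0, 0.0])
    for metric, value in writes:
        totals[metric][0] += 1
        totals[metric][1] += value
    return [(m, n, total) for m, (n, total) in sorted(totals.items())]


def flush(batch):
    """Insert the reduced rows into the relational store; clear the buffer."""
    rows = reduce_writes(raw_writes)
    with rdbms:
        rdbms.executemany(
            "INSERT INTO summary VALUES (?, ?, ?, ?)",
            [(batch, m, n, total) for m, n, total in rows])
    raw_writes.clear()
    return len(rows)


record("a", 1.0)
record("a", 2.0)
record("b", 5.0)
flushed = flush(batch=1)  # three raw writes become two summary rows
```

The relational side only ever sees the crunched rows, which is the load-reducing effect the answer is after. Whether that counts as "caching" in the sense the question intended is a separate matter.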

Reason why OLAP in HBase is possible

OLAP directly on most NoSQL databases is not possible, but from what I have researched it is actually possible in HBase. So I was wondering what features HBase has in particular that distinguish it from the others and allow us to do this.
You will have to write lots of data processing logic in your application layer to accomplish this. HBase is a data store, not a DBMS. So yes, as long as the data goes in, you can get it out and process it in your application layer however you want.
If this proves inconvenient for you and a NoSQL platform that supports SQL for OLAP is desirable, you could try Amisa Server.

Simulated OLAP

We have a client that has Oracle Standard, and a project that would be ten times easier to address using OLAP. However, Oracle only supports OLAP in the Enterprise version.
Migration to Enterprise is not possible.
I'm thinking of doing some manual simulation of OLAP, creating relational tables to simulate the technology.
Do you know of some other way I could do this? Maybe an open-source tool for OLAP? Any ideas?
You can simulate OLAP functionality using client side tools pointed at a relational database.
Personally I think the best tool for the job is probably Tableau Desktop. This is an amazingly sophisticated front end analytics tool that will make your relational data look multidimensional without much effort, and the tool itself is really mind blowing. They have a free trial so you can take it for a spin. We use Tableau heavily for our own analysis and have been very impressed. Of course, this tool also works with multidimensional databases as well, so if you end up with some cubes at the end of the day you can continue to use the Tableau front end.
As for open source, you could try out Palo - an open source MOLAP server and Excel front end.
If you are interested in building your own reporting front end and use .NET, there are a number of components (such as the DevExpress PivotGrid or the several tools from RadarSoft) that will do the same thing, but they will require some elbow grease to get wired together.
I find that it's the schema that causes most of the issues people have with querying a database. OLAP forces you into either a flat table or a star/snowflake schema, which is easy to query and comparatively faster than the source OLTP tables. So if you ETL your source to a flat table or star schema you should get 80% of what you get from OLAP, the remaining 20% being MDX, analytic functions, and performance.
Note that you should get a perf boost with a star schema in a relational database as well, and Oracle probably has analytic functions in PL/SQL anyway.
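As a rough illustration of the star-schema approach, the sketch below builds one fact table with two dimension tables and rolls the facts up along whichever dimensions are requested. SQLite stands in for Oracle, and the sales data and all names (fact_sales, dim_product, dim_date, rollup) are invented for the example.

```python
import sqlite3

# SQLite stands in for Oracle; schema and data are purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "fruit", "apple"), (2, "fruit", "pear"),
                  (3, "dairy", "milk")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2023, 12), (2, 2024, 1)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 1, 5.0), (3, 1, 2.0),
                  (1, 2, 7.0), (3, 2, 4.0)])

# Whitelist of dimension attributes; only these ever reach the SQL text.
DIMENSIONS = {
    "category": "p.category",
    "product": "p.name",
    "year": "d.year",
    "month": "d.month",
}


def rollup(*dims):
    """Total sales grouped by the given dimensions (a poor man's cube slice)."""
    cols = [DIMENSIONS[d] for d in dims]  # KeyError on an unknown dimension
    sql = (f"SELECT {', '.join(cols + ['SUM(f.amount)'])} "
           "FROM fact_sales f "
           "JOIN dim_product p ON p.product_id = f.product_id "
           "JOIN dim_date d ON d.date_id = f.date_id")
    if cols:
        group = ", ".join(cols)
        sql += f" GROUP BY {group} ORDER BY {group}"
    return conn.execute(sql).fetchall()
```

For example, rollup("category") gives totals per category and rollup("category", "year") drills down one level further. This covers the basic slice-and-aggregate part of OLAP; it does not give you MDX or a pre-computed cube.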
Try an open-source OLAP server called 'Mondrian'. IIRC the XMLA API on this is sufficiently compatible with AS to fool Pivot Table Services, which would allow you to use it with ProClarity or Excel.
IIRC it was originally designed to work over Oracle - it is a HOLAP architecture using base tables in the underlying relational store and caching aggregates. You can also make use of materialised views and query rewrite in the underlying Oracle database to do aggregates.
A few more thoughts on this topic:
Actually, Oracle Standard does have an OLAP facility based on a descendent of Express embedded in the database engine and storing its internal data structures in BLOBs in the main tablespaces. Using this is technically possible but not necessarily advisable for the following reasons:
It uses a highly non-standard OLAP query engine with very little third-party tool support (AFAIK ArcPlan is the only third-party OLAP front-end supporting 10g+ OLAP), poor documentation for the query language and almost no third-party literature describing it. This will work with B.I. Beans if you feel like writing a JSP front-end. It is not compatible with MDX at all. As of early 2006 the best Oracle could do when asked about drillthrough (this functionality was not supported in Discoverer 'Drake') was to recommend building a JSP application using B.I. Beans.
The reason that there is no migration path from Standard to Enterprise is that Enterprise is actually what used to be Siebel Analytics. Standard is the old Oracle OLAP/Express descendant which Oracle partners recommended avoiding even before Oracle bought out Siebel. Oracle has not even attempted to support migrating.
From this point of view, Mondrian is actually the most cost-effective OLAP solution for an Oracle Standard Edition shop. You can get a supported version from an outfit called Pentaho. The next cheapest is Analysis Services, which comes with SQL Server. Following that you are into the likes of Hyperion Essbase, which will be an order of magnitude more expensive than SQL Server or any supported version of Mondrian.
Whilst MS SQL Server offers OLAP, you'll need an Enterprise licence to use a cube in a live environment that is web-facing.
You might as well give www.icCube.com a try. We're quite flexible about the data source used to populate the cube, and we are quite cost-effective compared to the big players in the market.

Resources