Indexed Storage in QLDB

What type of database underlies indexed storage in QLDB? QLDB supports storing structured, semi-structured, and nested data, and it doesn't enforce a schema, yet it supports PartiQL (SQL-compatible access).

QLDB is built on custom AWS technology, supports indexes, and is designed to handle high-volume OLTP workloads while supporting PartiQL queries, as you point out.
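To make the idea of an index over schemaless documents concrete, here is a minimal in-memory sketch. It is not QLDB's storage engine or API; the class `IndexedTable`, its methods, and the sample fields (`vin`, `owner`) are all hypothetical names chosen for illustration. It only shows how documents with different shapes can share one table while an index on a top-level field turns an equality lookup into a dictionary access instead of a full scan.

```python
from collections import defaultdict


class IndexedTable:
    """Toy schemaless table with optional indexes on top-level fields.

    Hypothetical illustration only; this is not how QLDB is implemented.
    """

    def __init__(self):
        self.docs = {}      # doc_id -> document (a plain dict, any shape)
        self.indexes = {}   # field name -> {field value -> set of doc_ids}
        self._next_id = 0

    def create_index(self, field):
        """Build an index over existing documents that have `field`.

        Indexed values must be hashable (strings, numbers, tuples).
        """
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc):
        """Store a document and keep every existing index up to date."""
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def find(self, field, value):
        """Equality lookup: use the index if present, else scan everything."""
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        return [d for d in self.docs.values() if d.get(field) == value]


table = IndexedTable()
table.create_index("vin")
table.insert({"vin": "V1", "owner": {"name": "Ann"}})   # nested document
table.insert({"vin": "V2", "tags": ["x", "y"]})         # different shape
print(table.find("vin", "V1"))
```

The lookup on `vin` goes through the index, while a lookup on an unindexed field such as `owner` falls back to scanning every document.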

Related

What does "ElasticSearch, unlike Solr, was designed from the ground up to be a distributed index" mean?

In a talk, I heard that ElasticSearch, "unlike Solr, was designed from the ground up to be a distributed index".
What does it mean for ElasticSearch to be designed from the ground up as a distributed index?
What was Solr designed to be, and how does that differ from a distributed index?
The first versions of Solr did not support clustering. Solr didn't even support more than one core inside each instance. Multicore support was introduced later, and then SolrCloud (the clustering support) and collections were introduced with Solr 4.
Before SolrCloud you did have manual clustering support (what's known as sharding) and replication support (first through external programs such as rsync, then built in through HTTP replication), but SolrCloud was the first version that supported clustering without explicit handling in your own code.
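As a rough illustration of what "explicit handling from your own code" means, the sketch below shows client-side sharding: the application itself decides which shard receives each document and fans every query out to all shards. This is not Solr or Elasticsearch code; `Shard`, `shard_for`, and `ManuallyShardedIndex` are hypothetical names, and each `Shard` merely stands in for an independent single-node index. A system designed as a distributed index does this routing and merging internally.

```python
import hashlib


class Shard:
    """Stand-in for one independent, single-node index (hypothetical)."""

    def __init__(self):
        self.docs = {}  # doc_id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        # Naive whole-word match, case-insensitive.
        term = term.lower()
        return [doc_id for doc_id, text in self.docs.items()
                if term in text.lower().split()]


def shard_for(doc_id, num_shards):
    """Route a document to a shard using a stable hash of its id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ManuallyShardedIndex:
    """The application, not the search engine, owns routing and merging."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        target = shard_for(doc_id, len(self.shards))
        self.shards[target].add(doc_id, text)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = []
        for shard in self.shards:
            hits.extend(shard.search(term))
        return sorted(hits)


index = ManuallyShardedIndex(num_shards=3)
index.add("a", "Solr supports sharding")
index.add("b", "Elasticsearch is distributed")
index.add("c", "SolrCloud arrived with Solr 4")
print(index.search("solr"))
```

Everything in `ManuallyShardedIndex` is work the caller has to write, test, and keep consistent when shards are added or removed, which is the burden that built-in clustering removes.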

Big data implementation on cloud

Could someone please explain what 'Big Data implementation over Cloud' means?
I have been using Amazon S3 to store data and Hive to query it, which I read is one such cloud implementation. I would like to know what exactly this means and what the possible ways to implement it are.
Thanks,
Sree
The following are the levels of service that a cloud provider can offer for a Big Data analytics solution:
Data platform infrastructure service, such as Hadoop as a Service, that provides pre-installed and managed infrastructures. With this level of service, you are responsible for loading, governing, and managing the data and analytics for the analytics solution.
Data management service, such as a Data Lake Service, that provides data management, catalog services, analytics development, security, and information governance services on top of one or more data platforms. With this level of service, you are responsible for defining the policies for how data is managed and for connecting data sources to the cloud solution. The data owners have direct control of how their data is loaded, secured, and used. Consumers of data are able to use the catalog to locate the data they want, request access, and make use of the data through self-service interfaces.
Insight and Data Service, such as a Customer Analytics Service, that gives you the responsibility for connecting data sources to the cloud solution. The cloud solution then provides APIs to access combinations of your data and additional data sources, both proprietary to the solution and public open data, along with analytical insight generated from this data.
For more information regarding this, read the detailed article published by IBM here: http://www.ibm.com/developerworks/cloud/library/cl-ibm-leads-building-big-data-analytics-solutions-cloud-trs/index.html
Also take a look at the services provided by Qubole, which aim to simplify, speed up, and scale big data analytics workloads against data stored on AWS, Google, or Azure clouds - https://www.qubole.com/features.
Storing and processing big volumes of data requires scalability plus availability. Cloud computing delivers all these through hardware virtualization. For the same reason, it is only logical that big data and cloud computing are two compatible concepts as cloud enables big data to be available, scalable and fault tolerant.
The implementation does not stop there: many companies now offer Big Data as a Service (BDaaS), such as Stratoscale, Cloudera, and of course Azure, among others.

Azure Technology Choice for Project

There is a lot of information out there about the various Azure data storage flavors; however, I'd like to ask for some advice for my particular scenario.
I'm putting together a pet project to become more familiar with Azure technology, in particular, Service Bus/Event Hubs and data storage platforms. The system I want to create is fairly simple: accept a moderate load of events (not IoT scale), persist them, and make aggregated data available such as 'User A had N events of type X in the past day/week/month/etc.' as reports.
Given that the data will be quite structured (e.g. users, user groups, events, etc.), and I will need aggregation capabilities, it suggests that relational storage may be the best fit, although more expensive.
Another alternative I've considered is to maintain aggregated data in near real time using something like Stream Analytics, but I'm not sure whether this is overkill compared to a more data warehouse-esque solution.
Any suggestions/help would be greatly appreciated.
John
John,
Azure SQL would be a decent choice, or if that proves to be too expensive, a regular SQL database hosted on a VM. You can create an Azure Service Bus queue to hold the incoming requests, and then create competing consumers on one or more worker roles to monitor and process the messages. Each consumer can run the SQL and persist the data in a new table that is created and "pre-aggregated" for the caller, or you could persist the information to Azure Blob storage in a structured format that matches your reporting tool (e.g. JSON). Blob storage of the aggregated information will be the most cost-effective option and will relieve strain on SQL.
An alternative would be HDInsight, which can aggregate the information in batch processing mode as well. I guess the choice between SQL and HDInsight depends on the native format of the base (non-aggregated) information.
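The competing-consumers idea with pre-aggregation can be sketched in a few lines. This is only a local simulation under stated assumptions: a standard-library `queue.Queue` stands in for the Service Bus queue, threads stand in for worker roles, and an in-memory `Counter` stands in for the pre-aggregated table. The event fields (`user`, `type`, `day`) and the function names `consume` and `aggregate` are hypothetical.

```python
import queue
import threading
from collections import Counter


def consume(events, totals, lock):
    """One competing consumer: process events until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        # Pre-aggregate: count events per (user, event type, day).
        key = (event["user"], event["type"], event["day"])
        with lock:
            totals[key] += 1


def aggregate(event_list, num_consumers=2):
    """Feed events to several consumers and return the aggregated counts."""
    events = queue.Queue()
    totals = Counter()
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consume, args=(events, totals, lock))
        for _ in range(num_consumers)
    ]
    for worker in workers:
        worker.start()
    for event in event_list:
        events.put(event)
    for _ in workers:
        events.put(None)  # one sentinel per consumer so each one stops
    for worker in workers:
        worker.join()
    return totals


sample = [
    {"user": "A", "type": "X", "day": "2024-01-01"},
    {"user": "A", "type": "X", "day": "2024-01-01"},
    {"user": "A", "type": "Y", "day": "2024-01-01"},
    {"user": "B", "type": "X", "day": "2024-01-02"},
]
totals = aggregate(sample)
print(totals[("A", "X", "2024-01-01")])
```

Because the consumers compete for messages from a single queue, more workers can be added to keep up with load without changing the producer. Daily counts like these can then be rolled up into weekly or monthly figures for the reports.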
I agree with Daniel. SQL Azure may be the way to go for your relational data needs. Another option to investigate for larger workloads for streaming and analytics is Azure Data Lake (https://azure.microsoft.com/en-us/solutions/data-lake/)

Cassandra as Cache Front-end to RDBMS

We are using Oracle RDBMS in our system. To reduce database load we plan to use a caching layer.
I am looking to see whether we can use Apache Cassandra as a caching storage front end to the Oracle database.
From what I have seen so far, Cassandra is more like a database with built-in caching features, so using it as a caching layer for Oracle would be more like using another database. I feel it would be better to use Cassandra itself as an alternative to Oracle and other RDBMSs rather than alongside Oracle.
Has anyone used Cassandra as a caching layer for an RDBMS? I have not found any resources or examples for doing so. If you have, can you help me with this?
I'm not sure what you mean by a caching storage frontend.
Cassandra might be useful if you are expecting a large volume of writes that arrive at a rate faster than Oracle could handle. Cassandra can handle a high volume of writes since it can scale by adding more nodes.
You could then do some kind of data analysis and reduction on the data in Cassandra before inserting the crunched data into Oracle. You might then use Oracle for the tasks that suit it better such as financial reporting, ad hoc queries, etc.
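The write-absorbing arrangement described above might look roughly like the following sketch. It uses no Cassandra or Oracle client at all: the hypothetical `WriteBuffer` class stands in for the high-volume store, and the rows returned by `flush` represent the reduced data that would then be inserted into the relational database. The summary columns (`count`, `total`, `max`) are an assumption about what "crunched data" could contain.

```python
from collections import defaultdict


class WriteBuffer:
    """Stand-in for a store that absorbs a high rate of raw writes."""

    def __init__(self):
        self.raw = defaultdict(list)  # key -> list of raw numeric values

    def write(self, key, value):
        """Accept a raw write cheaply; no aggregation happens here."""
        self.raw[key].append(value)

    def flush(self):
        """Reduce buffered values to one summary row per key, then clear.

        The returned rows are what a periodic job would insert into the
        relational database for reporting and ad hoc queries.
        """
        rows = [
            {"key": key, "count": len(values),
             "total": sum(values), "max": max(values)}
            for key, values in sorted(self.raw.items())
        ]
        self.raw.clear()
        return rows


buffer = WriteBuffer()
for key, value in [("sensor-1", 10), ("sensor-2", 7), ("sensor-1", 30)]:
    buffer.write(key, value)
for row in buffer.flush():
    print(row)
```

The relational database then receives one row per key per flush interval instead of every raw write, which is where the load reduction comes from.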

Is using ElasticSearch and Azure Search as regular data stores combined with search appropriate?

We are still deciding on ElasticSearch on an Azure VM or Azure Search service to act as our search repository. However, for user accounts, etc., is there any need to create a separate db (in SQL Azure or even another noSQL db)?
No, there is no need to create a separate db account in order to use Azure Search (or ElasticSearch on an Azure VM). Azure Search is a REST API-based service where you push your data to be "indexed", at which point it becomes searchable, also through this REST API. The only time I can think of that you might need a SQL account is to use our new Indexer, which will automatically ingest data (and data changes) into Azure Search from your Azure SQL database or a SQL Server database on an Azure VM.
I think what you are asking is whether you can use Elasticsearch/Azure Search as your primary store for everything in an app, not just searchable data.
You can certainly do it. There are a few aspects you need to keep in mind (I'm sure there are others besides these):
Durability: when a search index is just an index, it's sometimes fine to run with no replicas or just 1 replica. If you want strong durability, you probably want at least 3 total copies of the index to ensure availability and resilience to index corruption and things like that.
Consistency: Elasticsearch has a weak consistency model, which also surfaces in Azure Search. You need to write your application with this in mind, which can make some scenarios tricky. Other stores such as SQL and DocumentDB offer the option of strict consistency, which is easier to work with for a primary store.
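One concrete way weak consistency can surface is that a document accepted by the index is not visible to searches until the index has been refreshed. The toy class below simulates that behavior in memory. It does not call Elasticsearch or Azure Search; `NearRealTimeIndex` and its methods are hypothetical names used only to show why application code cannot assume read-your-own-writes against a search index.

```python
class NearRealTimeIndex:
    """Toy index where writes become searchable only after a refresh."""

    def __init__(self):
        self.searchable = {}  # documents visible to search
        self.pending = {}     # accepted writes not yet visible

    def index(self, doc_id, doc):
        """Accept a write; it is acknowledged but not yet searchable."""
        self.pending[doc_id] = doc

    def refresh(self):
        """Make all pending writes visible (a real engine does this periodically)."""
        self.searchable.update(self.pending)
        self.pending.clear()

    def search(self, field, value):
        """Return ids of visible documents whose field equals value."""
        return sorted(
            doc_id for doc_id, doc in self.searchable.items()
            if doc.get(field) == value
        )


idx = NearRealTimeIndex()
idx.index("u1", {"email": "ann@example.com"})
print(idx.search("email", "ann@example.com"))  # → []
idx.refresh()
print(idx.search("email", "ann@example.com"))  # → ['u1']
```

A sign-up flow that writes a user account and immediately searches for it would see the first, empty result here. A strictly consistent primary store avoids that case.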

Resources