Oracle to Apache Cassandra data migration - cassandra-2.0

I am working on an Apache Cassandra data migration. I have a couple of tables which I need to move, with their data, to a Cassandra column family - what is the best way to do this?
I have seen Apache Sqoop; will it help me? If yes, then what are the steps?

There is no silver bullet for migrating data from Oracle (or any RDBMS) to Cassandra. The way your data is modeled in Cassandra is fundamentally different from a relational database schema. Tools might help you to some degree, but you'll first have to create a new data model that matches the way you're going to read and write data in Cassandra. This article gives you a good start with Cassandra data modeling: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
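For illustration, here is a minimal sketch of what "model around your queries" can look like, using the DataStax Java driver (2.x API assumed). The shop keyspace, the orders_by_customer table and its columns are hypothetical placeholders, not anything from the question; the point is that an Oracle-style CUSTOMERS/ORDERS join becomes one pre-joined table per query pattern:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class QueryDrivenModel {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            session.execute("CREATE KEYSPACE IF NOT EXISTS shop WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            // One table per query: "all orders for a customer, newest first".
            // The relational CUSTOMERS/ORDERS join is denormalized into one partition per customer.
            session.execute("CREATE TABLE IF NOT EXISTS shop.orders_by_customer ("
                    + "customer_id text, order_date timestamp, order_id text, "
                    + "customer_name text, total decimal, "
                    + "PRIMARY KEY (customer_id, order_date, order_id)) "
                    + "WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC)");

            cluster.close();
        }
    }

Whether Sqoop helps depends on the target model: it can bulk-copy rows, but it won't do this kind of reshaping for you.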

Related

Import Neo4J data into Oracle relational database

I would like to keep the Neo4J database as the master database, and I would like to keep the Oracle relational tables synchronized with the Neo4J data, as a read-only materialized view.
I can only find links and articles explaining how to import relational data into Neo4j, not the other way around.
Conceptually, I am looking for a kind of materialized view in Oracle using a Cypher query as the source. Maybe I could write a custom merge program mapping Oracle tables to Cypher queries. Ideally I would like to run this program in Oracle (PLSQL).
Thanks in advance,
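As a rough illustration of the "custom merge program" idea, here is a sketch that runs outside the database (a standalone job rather than PL/SQL, since that is easier to show). It assumes the Neo4j Java (Bolt) driver 4.x and the Oracle JDBC driver; the Person label, the person_mv table and all column names are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Record;
    import org.neo4j.driver.Result;
    import org.neo4j.driver.Session;

    public class Neo4jToOracleSync {
        public static void main(String[] args) throws Exception {
            Driver neo4j = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "secret"));
            Connection oracle = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");
            oracle.setAutoCommit(false);

            // Upsert each Cypher result row into a plain Oracle table that acts
            // as the read-only "materialized view" of the graph
            String merge = "MERGE INTO person_mv t "
                    + "USING (SELECT ? AS id, ? AS name FROM dual) s "
                    + "ON (t.id = s.id) "
                    + "WHEN MATCHED THEN UPDATE SET t.name = s.name "
                    + "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)";

            try (Session session = neo4j.session();
                 PreparedStatement ps = oracle.prepareStatement(merge)) {
                Result rows = session.run("MATCH (p:Person) RETURN p.id AS id, p.name AS name");
                while (rows.hasNext()) {
                    Record r = rows.next();
                    ps.setLong(1, r.get("id").asLong());
                    ps.setString(2, r.get("name").asString());
                    ps.addBatch();
                }
                ps.executeBatch();
                oracle.commit();
            }

            oracle.close();
            neo4j.close();
        }
    }

Scheduled periodically, this behaves like a refreshed materialized view; doing it natively inside Oracle would mean calling Neo4j's HTTP Cypher endpoint instead of the Bolt driver.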

Is aggregating outside of Hive a better choice?

I have more of a conceptual question. I'm using Hive to pull data, and then I want to insert all the retrieved values into IBM BigSQL (basically DB2) so that aggregating the data will be easier/faster. So I want to create a view in Hive that I will use nightly to perform a CTAS, so that I can take the resulting table, migrate it to DB2, and do the rest of the aggregation there.
Is there a better practice?
I wanted to do everything, including the aggregation, in Hive, but it is extremely slow.
Thanks for your suggestions!
Considering that you are using Cloudera, is there a reason why you don't perform the aggregations in Impala? Converting the JSON data to Parquet (which I would recommend if there is not a lot of nested structure) shouldn't really be expensive. Another alternative, depending on the kind of aggregations you are doing, is to use Spark to convert the data (this will also depend a lot on your cluster size). I would like to give you more specific hints, but without knowing what aggregations you are doing it is complicated.
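If the Spark route is of interest, the JSON-to-Parquet conversion is a very small job. This is only a sketch: the input and output paths are placeholders, and it assumes Spark 2.x's SparkSession API is available on the cluster:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JsonToParquet {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("json-to-parquet")
                    .getOrCreate();

            // Read the raw JSON (path is a placeholder)
            Dataset<Row> events = spark.read().json("hdfs:///data/raw/events");

            // Write columnar Parquet that Impala or Hive can aggregate efficiently
            events.write().mode("overwrite").parquet("hdfs:///data/parquet/events");

            spark.stop();
        }
    }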

Cassandra/Hadoop WITH COMPACT STORAGE option. Why is it needed, and is it possible to add it to existing tables/CFs?

I'm working on a Hadoop / Cassandra integration and I have a couple of questions I was hoping someone could help me with.
First, I seem to require the source table/CF to have been created with the option WITH COMPACT STORAGE; otherwise I get a "can't read keyspace" error in my map/reduce code.
I was wondering if this is just how it needs to be?
And if this is the case, my second question is: is it possible to add the WITH COMPACT STORAGE option to a pre-existing table, and if so, how? Or am I going to have to re-create the tables and move the data around?
I am using Cassandra 1.2.6
thanks in advance
Gerry
I'm assuming you are using job.setInputFormatClass(ColumnFamilyInputFormat.class);
Instead, try using job.setInputFormatClass(CqlPagingInputFormat.class);
The Mapper input key and value types for this are Map<String, ByteBuffer> and Map<String, ByteBuffer>.
Similarly, if you need to write out to Cassandra, use CqlPagingOutputFormat and the appropriate output types.
See http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive for more info.
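A minimal job skeleton along those lines might look like this; the keyspace, table, host, port, page size and output path below are placeholders, and it assumes the Cassandra 1.2.x Hadoop classes in org.apache.cassandra.hadoop:

    import java.nio.ByteBuffer;
    import java.util.Map;

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Cql3ReadJob {

        // Keys and values both arrive as column-name -> value maps
        public static class RowMapper
                extends Mapper<Map<String, ByteBuffer>, Map<String, ByteBuffer>, Text, Text> {
            @Override
            protected void map(Map<String, ByteBuffer> keys,
                               Map<String, ByteBuffer> columns,
                               Context context) {
                // decode the ByteBuffers (e.g. with ByteBufferUtil) and emit whatever you need
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cql3-read");
            job.setJarByClass(Cql3ReadJob.class);
            job.setMapperClass(RowMapper.class);
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileOutputFormat.setOutputPath(job, new Path("/tmp/cql3-read-out"));

            // Read through the CQL3 layer instead of the Thrift ColumnFamilyInputFormat
            job.setInputFormatClass(CqlPagingInputFormat.class);
            ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
            ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(job.getConfiguration(), "my_keyspace", "my_table");
            CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "1000");

            job.waitForCompletion(true);
        }
    }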
#Gerry
The "WITH COMPACT STORAGE" thing is CQL3 syntax to create tables structure compatible with Thrift clients and legacy column families.
Essentially, when using this option, the table, or should I say column family, is created without using any Composite.
You should know that CQL3 tables heavily rely on composites to work.
Now to answer your questions:
I was wondering if this is just how it needs to be?
Probably because your map/reduce code cannot deal with composites. But I believe that in version 1.2.6 of Cassandra you have all the necessary code to deal with CQL3 tables. Look at the classes in the package org.apache.cassandra.hadoop.
is it possible/how do I add the WITH COMPACT STORAGE option on to a pre-exsting table?
No, it's not possible to change a table's structure in that way once it has been created. You'll need some kind of migration.
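For a small table, one way to do that migration is simply to create the compact column family up front and copy the rows across. This is only a sketch using the DataStax Java driver (2.x API shown); the table and column names are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CompactStorageMigration {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");

            // New column family created with COMPACT STORAGE from the start
            session.execute("CREATE TABLE my_table_compact "
                    + "(id text PRIMARY KEY, value text) WITH COMPACT STORAGE");

            // Naive row-by-row copy; fine for small tables, use a Hadoop job for big ones
            PreparedStatement insert = session.prepare(
                    "INSERT INTO my_table_compact (id, value) VALUES (?, ?)");
            for (Row row : session.execute("SELECT id, value FROM my_table")) {
                session.execute(insert.bind(row.getString("id"), row.getString("value")));
            }

            cluster.close();
        }
    }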

Oracle SCHEMA Based XML DB

I am using a schema-based XML DB column in one of my Oracle tables. I am using 30 nodes in the XSD (schema) and the table performance is good.
I am thinking of increasing the number of nodes to 300, or even 1000 (internally, Oracle treats each schema node as a column, so we can have only 1000 nodes in the XSD). But I am not sure what the impact will be on the performance of such a big table.
If anyone has experience or any references, please share. I am trying to understand how Oracle XML DB works with schema registration.
Regards
Hmm.. Performance is always subjective, and depends on more than just the query / data.
This might get you started:
http://www.orafaq.com/node/508
If you need references, Oracle's homepage is a really good start.
Have a look around, there is lots of information (too much sometimes, actually) :-)
Oracle XML schema basics

Move data from Oracle to Cassandra and/or MongoDB

At work we are thinking of moving from Oracle to a NoSQL database, so I have to run some tests on Cassandra and MongoDB. I have to move a lot of tables to the NoSQL database, and the idea is to keep the data synchronized between the two platforms.
So I created a simple procedure that selects from the Oracle DB and inserts into Mongo. Some of my colleagues pointed out that maybe there is an easier (and more professional) way to do it.
Has anybody had this problem before? How did you solve it?
If your goal is to copy your existing structure from Oracle to a NoSQL database, then you should probably reconsider the move in the first place. By doing that you lose the benefits one sees from going to a non-relational data store.
A good first step would be to take a long look at your existing structure and determine how it can be modified to have a positive impact on your application. Additionally, consider a hybrid system at the same time. Cassandra is great for a lot of things, but if you need a relational system and are already using a lot of Oracle functionality, it likely makes sense for most of your database to stay in Oracle, while moving the pieces that require frequent writes and would benefit from a different structure to Mongo or Cassandra.
Once you've made the decisions about your structure, I would suggest writing scripts/programs, or adding a module to your existing app, to write the data in the new format to the new data store. That will give you the most fine-grained control over every step of the process, which, in a large system-wide architectural change, is something I would want to have.
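As a rough sketch of such a program (not the poster's actual procedure), the usual shape is: read from Oracle over JDBC, reshape each row into the document or partition you actually want to query, and upsert it so the job can be re-run. The connection strings, the customers table and the field names below are made up, and the legacy MongoDB Java driver API is assumed:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;

    public class OracleToMongo {
        public static void main(String[] args) throws Exception {
            Connection oracle = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");
            MongoClient mongo = new MongoClient("localhost");
            DBCollection customers = mongo.getDB("shop").getCollection("customers");

            // Reshape each relational row into the document you want to query,
            // rather than copying the table structure one-to-one
            try (Statement st = oracle.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT customer_id, name, email FROM customers")) {
                while (rs.next()) {
                    BasicDBObject doc = new BasicDBObject("_id", rs.getLong("customer_id"))
                            .append("name", rs.getString("name"))
                            .append("email", rs.getString("email"));
                    customers.save(doc); // upsert by _id, so the sync can be re-run
                }
            }

            mongo.close();
            oracle.close();
        }
    }

The Cassandra side would look the same, except that the insert goes through the DataStax driver into a table designed around your read queries.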
You can also consider using components of the Hadoop ecosystem to perform this kind of (ETL) task. For that you need to model your Cassandra DB as per your requirements.
The steps would be to migrate your Oracle table data to HDFS (preferably using Sqoop) and then write a Map-Reduce job to transform that data and insert it into the Cassandra data model.
