Old Style Oracle Joins in Kafka

Please can someone help with a Kafka question? When Kafka queries an Oracle database, is there a way to use the old (+) style joins in the connectors rather than the newer LEFT/RIGHT JOIN style? We have a mass of old queries we would like to re-use. Ideally something in the JSON config, like an XML CDATA section, that avoids parsing and throws the query straight at the database. Thanks in advance, Dave
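In case it helps: the Confluent JDBC source connector has a query configuration property that hands the SQL to the database verbatim, so old-style (+) joins should pass straight through without the connector parsing them. A rough sketch of the connector JSON (connection details, topic, and query are made up):

    {
      "name": "oracle-legacy-joins",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "connection.user": "app_user",
        "connection.password": "secret",
        "mode": "bulk",
        "topic.prefix": "legacy-orders",
        "query": "SELECT o.order_id, c.name FROM orders o, customers c WHERE o.cust_id = c.cust_id (+)"
      }
    }

Note that when query is set, the incremental modes need a bit more care (an incrementing or timestamp column), so bulk is the simplest starting point.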

Related

Oracle to Apache Cassandra data migration

I am working on an Apache Cassandra data migration. I have a couple of tables which I need to move, along with their data, to Cassandra column families. What is the best way to do this?
I have seen Apache Sqoop; will it help me? If so, what are the steps?
There is no silver bullet for migrating data from Oracle (or any RDBMS) to Cassandra. The way your data is modeled in Cassandra is fundamentally different from a relational database schema. Tools might help you to some degree, but you'll first have to create a new data model that matches the way you're going to read and write data in Cassandra. This article gives you a good start with Cassandra data modeling: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
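To illustrate the query-first mindset with a hypothetical sketch (DataStax Java driver; all table and column names are made up): rather than porting the relational schema as-is, you create one denormalized table per query pattern.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class QueryFirstModel {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("shop");
            // In Oracle you would join orders to customers at read time.
            // In Cassandra you build a table that answers one question directly:
            // "give me the orders for a customer, newest first".
            session.execute(
                "CREATE TABLE IF NOT EXISTS orders_by_customer ("
                + " customer_id uuid,"
                + " order_date timestamp,"
                + " order_id uuid,"
                + " customer_name text,"  // denormalized from the customers table
                + " total decimal,"
                + " PRIMARY KEY (customer_id, order_date, order_id)"
                + ") WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC)");
            cluster.close();
        }
    }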

Suggestion technology/design on this BIGDATA usecase

I am new to big data technologies and design, so I am looking for help from the Java world.
I have the concepts of tags and tagcombinations.
For example, U.S.A and Pen are two tags, and if they appear together in some definition then we register a tagcombination (U.S.A-Pen) for it.
tags (U.S.A, Pen, Pencil, India, Shampoo)
tagcombinations (U.S.A-Pen, India-Pencil, U.S.A-Pencil, India-Pen, India-Pen-Shampoo)
millions of tags
billions of tagcombinations
one tagcombination generally has 2-8 tags
Every day we get lakhs (hundreds of thousands) of new tagcombinations to write
and crores (tens of millions) of queries daily to find matching combinations by a set of tags
The query needs to support:
in how many tagcombinationids does one tag, or a set of tags, appear?
If I query for Pen, India then it should return two tagcombinations (India-Pen, India-Pen-Shampoo). The query will be fired by the application in real time.
Please suggest a solution that is distributed, has a Java client, and can handle the scale of data I am looking at.
I have already tried Cassandra but was not able to conclude that it is the right match for my problem.
Thanks
Naresh
I suggest you look into the Apache Lucene project:
http://lucene.apache.org/
You won't be able to use Cassandra directly for this, but if you store your data inside Cassandra, you can use Solr to add extra indexes on top of it. DataStax has a bundled solution called DataStax Enterprise that packages Cassandra and Solr together:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
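To make the Lucene suggestion above concrete, here is a minimal sketch against a recent Lucene API (8.x/9.x); the field names and sample data are made up. Each tagcombination becomes one document with a "tag" field per member tag, and a Boolean query of MUST term clauses returns every combination containing all the requested tags:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class TagComboSearch {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory();
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                // One document per tagcombination, one "tag" field per member tag.
                writer.addDocument(combo("India-Pen", "India", "Pen"));
                writer.addDocument(combo("India-Pen-Shampoo", "India", "Pen", "Shampoo"));
                writer.addDocument(combo("U.S.A-Pencil", "U.S.A", "Pencil"));
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // "Pen AND India": every clause is a MUST.
                BooleanQuery.Builder q = new BooleanQuery.Builder();
                q.add(new TermQuery(new Term("tag", "Pen")), BooleanClause.Occur.MUST);
                q.add(new TermQuery(new Term("tag", "India")), BooleanClause.Occur.MUST);
                for (ScoreDoc hit : searcher.search(q.build(), 10).scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("id"));
                    // prints India-Pen and India-Pen-Shampoo
                }
            }
        }

        private static Document combo(String id, String... tags) {
            Document d = new Document();
            d.add(new StringField("id", id, Field.Store.YES));
            for (String tag : tags) {
                d.add(new StringField("tag", tag, Field.Store.NO));
            }
            return d;
        }
    }

Sharding the index (e.g. via Solr or Elasticsearch, which are built on Lucene) is what would get you the distributed part.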

Cassandra/Hadoop WITH COMPACT STORAGE option: why is it needed, and is it possible to add it to existing tables/CFs?

I'm working on a Hadoop / Cassandra integration and I have a couple of questions I was hoping someone could help me with.
First, I seem to require the source table/CF to have been created with the option WITH COMPACT STORAGE, otherwise I get a 'can't read keyspace' error in my map/reduce code.
I was wondering if this is just how it needs to be?
And if this is the case, my second question is: is it possible to add the WITH COMPACT STORAGE option to a pre-existing table, and how? Or am I going to have to re-create the tables and move the data around?
I am using Cassandra 1.2.6
thanks in advance
Gerry
I'm assuming you are using job.setInputFormatClass(ColumnFamilyInputFormat.class);
Instead, try using job.setInputFormatClass(CqlPagingInputFormat.class);
The Mapper input for this is Map<String, ByteBuffer> keys and Map<String, ByteBuffer> values.
Similarly, if you need to write out to Cassandra, use CqlPagingOutputFormat and the appropriate output types.
See http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive for more info.
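For reference, the job wiring looks roughly like this, assuming the 1.2-era helper classes (address, keyspace, and table names are placeholders):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CqlInputJobSetup {
        public static Job buildJob() throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "cql3-input-example");
            job.setInputFormatClass(CqlPagingInputFormat.class);
            ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
            ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(job.getConfiguration(), "my_keyspace", "my_table");
            // How many CQL rows to fetch per page from each node.
            CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "1000");
            // The mapper then receives Map<String, ByteBuffer> keys and values.
            return job;
        }
    }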
#Gerry
The "WITH COMPACT STORAGE" thing is CQL3 syntax to create tables structure compatible with Thrift clients and legacy column families.
Essentially, when using this option, the table, or should I say column family, is created without using any Composite.
You should know that CQL3 tables heavily rely on composites to work.
Now to answer your questions:
I was wondering if this is just how it needs to be?
Probably because your map/reduce code cannot deal with Composites. But I believe in version 1.2.6 of Cassandra, you have all the necessary code to deal with CQL3 tables. Look at classes in package org.apache.cassandra.hadoop.
is it possible/how do I add the WITH COMPACT STORAGE option to a pre-existing table?
No, it's not possible to change a table's structure in this way once it has been created. You'll need some kind of migration.
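A rough sketch of such a migration with the DataStax Java driver (table and column names are made up). Since COMPACT STORAGE can only be declared at creation time, you create a new table with the option and copy the rows across:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CompactStorageMigration {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");
            // COMPACT STORAGE must be set when the table is created.
            session.execute(
                "CREATE TABLE my_table_compact ("
                + " key text, column1 text, value blob,"
                + " PRIMARY KEY (key, column1)"
                + ") WITH COMPACT STORAGE");
            PreparedStatement insert = session.prepare(
                "INSERT INTO my_table_compact (key, column1, value) VALUES (?, ?, ?)");
            // Fine for small tables; for big ones, page through the data
            // or do the copy in a map/reduce job instead.
            for (Row row : session.execute("SELECT key, column1, value FROM my_table")) {
                session.execute(insert.bind(
                    row.getString("key"), row.getString("column1"), row.getBytes("value")));
            }
            cluster.close();
        }
    }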

Oracle SCHEMA Based XML DB

I am using a schema-based XML DB column in one of my Oracle tables. I am using 30 nodes in the XSD (schema) and the table's performance is good.
I am thinking of increasing the number of nodes to 300 or more, say 1000 (internally, Oracle treats each schema node as a column, so we can have only 1000 nodes in the XSD). But I am not sure what the impact will be on the performance of such a big table.
If anyone has experience or any references, please guide me. I am trying to understand how Oracle XML DB works with schema registration.
Regards
Hmm... performance is always subjective, and depends on more than the query and the data.
This might get you started:
http://www.orafaq.com/node/508
If you need references, Oracle's homepage is a really good start.
Have a look around; there is lots of information (too much sometimes, actually) :-)
See also: oracle xml schema basics
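For what it's worth, schema registration itself goes through the DBMS_XMLSCHEMA package. A minimal JDBC sketch (connection details, schema URL, and file name are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class RegisterXmlSchema {
        public static void main(String[] args) throws Exception {
            // Load the annotated XSD text from a file (placeholder name).
            String xsd = new String(Files.readAllBytes(Paths.get("order.xsd")));
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret");
                 CallableStatement cs = conn.prepareCall(
                     "BEGIN DBMS_XMLSCHEMA.registerSchema("
                     + " schemaurl => ?,"
                     + " schemadoc => ?,"
                     + " gentypes  => TRUE,"  // generate SQL object types for the nodes
                     + " gentables => TRUE); END;")) {
                cs.setString(1, "http://example.com/order.xsd");
                cs.setString(2, xsd);
                cs.execute();
            }
        }
    }

It's those generated object types, roughly one per element, that run into Oracle's column limits, which is presumably where the 1000-node ceiling you mention comes from.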

Hbase / Hadoop Query Help

I'm working on a project with a friend that will utilize HBase to store its data. Are there any good query examples? I seem to be writing a ton of Java code to iterate through lists of RowResults when, in SQL land, I could write a simple query. Am I missing something? Or is HBase missing something?
I think you, like many of us, are making the mistake of treating bigtable and HBase like just another RDBMS when it's actually a column-oriented storage model meant for efficiently storing and retrieving large sets of sparse data. This means storing, ideally, many-to-one relationships within a single row, for example. Your queries should return very few rows but contain (potentially) many datapoints.
Perhaps if you told us more about what you were trying to store, we could help you design your schema to match the bigtable/HBase way of doing things.
For a good rundown of what HBase does differently than a "traditional" RDBMS, check out this awesome article: Matching Impedance: When to use HBase by Bryan Duxbury.
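To make "many-to-one within a single row" concrete, here's a hypothetical sketch with the classic (pre-1.0) HBase Java client; the table and column names are invented. One row per customer, one column per order, so fetching a customer's orders is a single-row read rather than a multi-row query:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "customers");
            // One row per customer; each order is one column in the "orders" family.
            Put put = new Put(Bytes.toBytes("customer#42"));
            put.add(Bytes.toBytes("orders"), Bytes.toBytes("2009-06-01#1001"), Bytes.toBytes("49.99"));
            put.add(Bytes.toBytes("orders"), Bytes.toBytes("2009-06-03#1002"), Bytes.toBytes("12.50"));
            table.put(put);
            table.close();
        }
    }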
If you want to access HBase using a query language and a JDBC driver it is possible. Paul Ambrose has released a library called HBQL at hbql.com that will help you do this. I've used it for a couple of projects and it works well. You obviously won't have access to full SQL, but it does make it a little easier to use.
I looked at Hadoop and HBase and, as Sean said, I soon realised they didn't give me what I actually wanted, which was a clustered, JDBC-compliant database.
I think you could be better off using something like C-JDBC or HA-JDBC, which seem more like what I was after. (Personally, I haven't got further with either of these than reading the documentation, so I can't tell whether either of them is any good.)
I'd recommend taking a look at the Apache Hive project, which is similar to HBase (in the sense that it's a distributed database) and implements a SQL-esque language.
Thanks for the reply Sean, and sorry for my late response. I often make the mistake of treating HBase like an RDBMS. So often, in fact, that I've had to re-write code because of it! It's such a hard thing to unlearn.
Right now we have only 4 tables. Which, in this case, is very few considering my background. I was just hoping to use some RDBMS functionality while mostly sticking to the column-oriented storage model.
Glad to hear you guys are using HBase! I'm not an expert by any stretch of the imagination, but here are a couple of things that might help.
HBase is based on / inspired by BigTable, which happens to be exposed by AppEngine as their db api, so browsing their docs should help a great deal if you're working on a webapp.
If you're not working on a webapp, the kind of iterating you're describing is usually handled via map/reduce (don't emit the values you don't want). Skipping over values using iterators virtually guarantees your application will have bottlenecks with HBase-sized data sets. If you find you're still thinking in SQL, check out Cloudera's Pig tutorial and Hive tutorial.
Basically the whole HBase/SQL mental difference (for non-webapps) boils down to "Send the computation to the data, don't send the data to the computation" -- if you keep that in mind while you're coding you'll do fine :-)
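As a small illustration of that last point (same classic client API, made-up schema): instead of pulling everything back and skipping rows in Java, attach a server-side filter to the Scan so only matching rows leave the region servers:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ServerSideFilterExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "customers");
            Scan scan = new Scan();
            // Evaluated in the region servers, next to the data.
            scan.setFilter(new SingleColumnValueFilter(
                Bytes.toBytes("info"), Bytes.toBytes("country"),
                CompareFilter.CompareOp.EQUAL, Bytes.toBytes("US")));
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                System.out.println(Bytes.toString(result.getRow()));
            }
            scanner.close();
            table.close();
        }
    }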
Regards,
David
