SAP BusinessObjects Performance Issue with Impala - hadoop

We are switching from Oracle to Hadoop because of slow performance with the Oracle DB. We built a universe on Cloudera Simba ODBC connections and scheduled a report, expecting it to run faster than on Oracle, but the report took more than 2 hours. When we ran the same query in the HUE SQL editor, the result came back in less than 2 minutes.
We tested in DEV, TEST, and PROD, and also tried switching to a JDBC connection, with no difference. We suspect a network latency issue and have opened a case with SAP.
Note that our Hadoop servers and BO servers are in two different locations, NCAL and SCAL, and we have 3.5 million records to pull.
I am looking for tested advice from anyone who has already faced this issue.
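One way to sanity-check the latency theory is a back-of-the-envelope estimate: if the driver pulls rows in batches, each batch costs at least one round trip between the BO server and the Hadoop cluster. The sketch below only illustrates that arithmetic. The 3.5 million row count is from the question; the round-trip time and the batch sizes are assumed values, not measurements from this environment.

```python
import math


def fetch_round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of fetch calls needed to pull total_rows in batches of fetch_size."""
    return math.ceil(total_rows / fetch_size)


def latency_overhead_seconds(total_rows: int, fetch_size: int, rtt_ms: float) -> float:
    """Lower bound on the time spent waiting on network round trips alone."""
    return fetch_round_trips(total_rows, fetch_size) * rtt_ms / 1000.0


if __name__ == "__main__":
    total_rows = 3_500_000  # row count stated in the question
    rtt_ms = 20.0           # ASSUMED round-trip time between the two sites
    for fetch_size in (10, 100, 1_000, 10_000):  # ASSUMED batch sizes
        trips = fetch_round_trips(total_rows, fetch_size)
        secs = latency_overhead_seconds(total_rows, fetch_size, rtt_ms)
        print(f"fetch_size={fetch_size:>6}: {trips:>7} round trips, "
              f"~{secs / 60:.1f} min waiting on the network")
```

If the numbers for a small batch size land in the same range as the observed runtime, the array fetch size on the universe connection would be worth checking before blaming raw bandwidth. This is an estimate under the stated assumptions, not a diagnosis.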

Related

Data Connectivity and Performance

I have a Tableau Data Extract that refreshes on a schedule. Our Tableau Production Server is on-premises, and when I run this extract on the Tableau Production Server, it takes forever to finish. There is another server VM (let's call it VM 'X') at the same location as the Tableau Production Server. When I run the extract from this machine, it finishes in 10 minutes. Our data lake is in Oracle Exadata.
What I have tried so far:
I ran a traceroute, which didn't help much.
I thought it might be a traffic issue, as the Tableau Production Server is usually pretty busy. So I went to our Tableau Dev Server VM, which is at the same location, and ran the extract there. It takes the same time as on the production server.
Any ideas on what else I can try before reaching out to our networking team?

Oracle Golden Gate with Cassandra

I am pretty new to Oracle GoldenGate and wanted to understand whether it is possible to create a bidirectional sync between Oracle 12x and Cassandra (DSE) using Oracle GoldenGate. I searched several places on the internet, but most examples replicate data between Oracle databases. I started wondering if it is even possible. Can anyone point me to any documentation?
There is a separate module called Oracle GoldenGate for Big Data. It supports many NoSQL replication targets.
Apache Cassandra is one of the supported Big Data targets.
There is a separate manual explaining how to use it.
There is no separate module that allows you to connect Apache Cassandra as the source of your replication. If you need such replication you need to provide some intermediate step. The source of replication for Oracle GoldenGate can only be a database (Oracle, TimesTen, DB2, Informix, MySQL, MS SQL Server, NonStop SQL/MX, SAP/Sybase ASE, Teradata) or a JMS queue.

Impala streaming over JDBC is really slow

I have run several large queries using impala-shell and found the performance to be satisfactory. These queries typically write 100k-1m rows to disk. However, when I run the very same queries programmatically using JDBC, the results take much, much longer to write to disk. For example, a query which takes five minutes from impala-shell takes up to thirty minutes over JDBC.
I have tried both the Hive and Cloudera JDBC drivers but get similarly bad performance. I have tried various fetch sizes but it has not made any difference. Is Impala streaming over JDBC fundamentally slow or could I do something else to speed up the streaming?
This is on CDH 5.9.1.
This turned out to be a client-side issue. I was using curl to test a web application which was making the Impala queries. Switching from curl to a client written in Scala code removed the latency.
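To tell a slow driver apart from a slow client layer, it can help to time the query directly against the database connection, outside the web application. The following is a minimal sketch using Python's generic DB-API cursor interface. It runs against an in-memory SQLite table purely as a stand-in so that it is self-contained; the table, the helper names, and the fetch size are illustrative assumptions, and an Impala connection from a DB-API compatible driver would be swapped in for the real test.

```python
import sqlite3
import time


def time_query(conn, sql, fetch_size=10_000):
    """Return (execute_seconds, fetch_seconds, row_count) for one query."""
    cur = conn.cursor()
    t0 = time.perf_counter()
    cur.execute(sql)
    t1 = time.perf_counter()
    rows = 0
    while True:
        batch = cur.fetchmany(fetch_size)  # pull rows in batches
        if not batch:
            break
        rows += len(batch)
    t2 = time.perf_counter()
    cur.close()
    return t1 - t0, t2 - t1, rows


def make_demo_connection(n_rows=50_000):
    """Build a throwaway in-memory table (stand-in for the real data source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i, "x" * 20) for i in range(n_rows)),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    execute_s, fetch_s, n = time_query(conn, "SELECT * FROM t")
    print(f"rows={n} execute={execute_s:.3f}s fetch={fetch_s:.3f}s")
```

How the time splits between execute and fetch depends on the driver, so the total is the number to compare. If the total here is close to the impala-shell time while the end-to-end request is much slower, the extra time is being spent in the layers above the driver, which matches what was found in this case.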

Oracle to Cassandra real-time replication

We have an Oracle database that holds our tables. We would like to implement a new project, as mentioned in the title: Oracle to Cassandra real-time replication.
The new Cassandra environment would serve as a reporting service. Our in-house application inserts data into the Oracle production environment. Then a custom service (or something similar) would read the delta and insert it into Cassandra (perhaps something like GoldenGate).
In short, will Cassandra meet our needs for this scenario?
In our case, we have 20 Oracle DBs in different locations (all 20 have a similar implementation) and 1 central report DB that is refreshed daily from these 20 DBs. We use "outdated" snapshot technology: every night, our single central report DB (REPORTDB) gathers the daily delta from these 20 DBs using Oracle snapshots with the fast refresh option. We need a structure that reads data from the 20 DBs and injects it in real time into a new Cassandra database, just like REPORTDB.
Thanks to DataStax, you can now run Spark jobs on Cassandra, so yes, it can be used as a reporting tool. It's best utilized as a key-value store if your number of writes is high compared to your reads.
Reading the delta is not real time, so you should try using Oracle's AQ (Advanced Queuing). I've been doing real-time replication from Oracle to Cassandra using Oracle's AQ and Apache Storm for almost 4 years now, and it's running flawlessly.
I don't understand this architecture with Oracle and Cassandra running side by side.
Either Oracle suits your needs, in which case you should stick with it, or it doesn't and you need scalability/high availability, in which case you should switch to Cassandra.
Can you elaborate on the reasons that made you choose Cassandra for the reporting service?

Poor performance for a clean Apache Cassandra server installation

I just downloaded the latest version (2.1.7) of Apache Cassandra from the official site.
Then I started the server on localhost without any changes and created a table following the Getting Started guide.
I noticed that all queries to the Cassandra server are very slow.
For example, this trivial query takes about 250ms:
SELECT * FROM users where user_id=1745;
Is this normal performance? I see much better performance from other database systems on the same machine.
Maybe I should tweak something?
I have:
Intel Core i5 CPU 2.27GHz
8GB RAM
Windows 8.1
Edit1:
Well.. I see something strange.
The trace log looks pretty nice (6ms):
But when I execute this query in DataStax DevStudio, it shows 476ms:
It cannot be network latency, because the server is running on localhost.
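The trace reports time spent inside the server, while the figure shown by a client tool also includes whatever the client does around the request. A rough way to see how much of the client-side number is one-off cost is to run the same call several times and compare the first run with the rest. The sketch below is generic: `fake_query` is a hypothetical stand-in that simulates a slow first call, and a real test would replace it with a function that executes the SELECT through the driver.

```python
import statistics
import time


def time_calls(fn, repeats=20):
    """Call fn repeatedly and return each call's duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms):
    """Separate the first (cold) call from the median of the remaining calls."""
    rest = durations_ms[1:]
    return {
        "first_ms": durations_ms[0],
        "median_rest_ms": statistics.median(rest) if rest else durations_ms[0],
    }


if __name__ == "__main__":
    state = {"warm": False}

    def fake_query():
        # Hypothetical stand-in: slow on the first call, fast afterwards.
        if not state["warm"]:
            time.sleep(0.05)
            state["warm"] = True
        else:
            time.sleep(0.002)

    stats = summarize(time_calls(fake_query, repeats=10))
    print(f"first={stats['first_ms']:.1f} ms, "
          f"median of rest={stats['median_rest_ms']:.1f} ms")
```

If repeated calls settle near the traced figure, the large number is mostly startup or tooling overhead. If they stay high, the overhead is incurred on every request on the client side and is worth investigating there.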

Resources