How to check the connection between Cassandra and Pentaho Data Integration


I'm trying to load data from an Oracle table into a Cassandra table using Pentaho Data Integration 5.1 (Community Edition), but I can't tell whether a connection has been established between Oracle and Cassandra. I'm using Cassandra 2.2.3 and Oracle 11gR2.
I've added the following jars to the lib folder of data-integration:
--cassandra-thrift-1.0.0
--apache-cassandra-cql-1.0.0
--libthrift-0.6.jar
--guava-r08.jar
--cassandra_driver.jar
Can anyone help me figure out how to check whether the connection has been established in Pentaho?

There are several ways to check whether a connection to a database has been established. I don't know if all of them are valid for Cassandra, but I'll add one that is specific to it.
1) The test button
Click the test button on the connection edit screen.
2) Detailed logs may help
Another way to test is to run your transformation with a high log level:
sh pan.sh -file=my_cassandra_transformation.ktr -level=Rowlevel
3) The input preview
For Cassandra specifically, I would create a simple read operation using the Cassandra Input step and click the 'preview' button.
4) The controlled output test
Or try a simpler transformation first, to make sure it runs fine.
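Before testing inside Pentaho, it can help to confirm that the Cassandra port is reachable from the PDI machine at all. The following is a minimal sketch, not part of the original answer: the host 127.0.0.1 is a placeholder, and 9160 (Thrift, rpc_port) and 9042 (CQL native, native_transport_port) are Cassandra's default ports, which may differ in your cassandra.yaml. It only checks TCP reachability; it says nothing about authentication or keyspaces.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the address of your Cassandra node.
    for port in (9160, 9042):
        status = "reachable" if can_connect("127.0.0.1", port) else "not reachable"
        print(f"port {port}: {status}")
```

If the port is not reachable, the test button and the preview will fail regardless of which jars are installed, so this narrows the problem down to network or server configuration.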

Related

I'm currently trying to migrate a large table from Cassandra to Oracle SQL and can't find many solutions

I've been researching and looking for ideas, but the closest thing to a solution I've found is a case where someone used PySpark to move an Oracle table into HDFS and then from HDFS into Cassandra. I was hoping there was another, clearer solution to this data migration.

Title suggests that it is Cassandra > Oracle. Message text says Oracle > HDFS > Cassandra (i.e. the opposite direction). What exactly are you trying to do?
Suppose it is the title that is correct. If there's no tool that would do the migration for you, then from my point of view as a developer, creating a database link in my Oracle schema that points to Cassandra might be a good option. Then I'd just write some SQL code to migrate the data I need. Here's how: Access Cassandra Data as a Remote Oracle Database.
In short:
connect to Cassandra as an ODBC data source
set connection properties for compatibility with Oracle
configure the ODBC gateway, Oracle Net and Oracle database
write queries

Take Data From Oracle to Cassandra Every Day

We want to copy tables from Oracle to Cassandra every day, because the tables are updated in Oracle every day. When I searched for this, I found these options:
Extract the Oracle tables to a file, then write the file to Cassandra
Use Sqoop to get the tables from Oracle, write a MapReduce job, and insert into Cassandra
I am not sure which way is appropriate. Are there any other options?
Thank you.

Option 1
Extracting Oracle tables to a file and then writing to Cassandra manually every day can be a tiresome process unless you schedule a cron job. I have tried this before, but if the process fails, logging the failure can be an issue. If you use this approach, exporting to CSV and then writing to Cassandra, I would suggest using the cassandra bulk loader (https://github.com/brianmhess/cassandra-loader)
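As a rough illustration of the export half of Option 1 with failure logging, here is a minimal sketch. It is an assumption-laden example, not the answerer's setup: the function names and logger name are hypothetical, and the rows are plain tuples standing in for whatever an Oracle cursor would return.

```python
import csv
import logging
from typing import Iterable, Sequence

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("oracle_to_cassandra")  # hypothetical logger name


def write_csv(path: str, header: Sequence[str], rows: Iterable[Sequence[object]]) -> int:
    """Write header and rows to path as CSV; return the number of data rows written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def run_export(path: str, header: Sequence[str], rows: Iterable[Sequence[object]]) -> bool:
    """Run the export; log the row count on success or the traceback on failure."""
    try:
        n = write_csv(path, header, rows)
    except Exception:
        # log.exception records the full traceback, so a failed cron run leaves a trace.
        log.exception("export to %s failed", path)
        return False
    log.info("exported %d rows to %s", n, path)
    return True
```

A cron job could call run_export once per table and use the boolean result to decide whether to go on to the load step.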
Option 2
I haven't worked with this, so I can't comment on it.
Option 3 (I use this)
I use an open source tool, Pentaho Data Integration (Spoon) (https://community.hitachivantara.com/docs/DOC-1009855-data-integration-kettle), to solve this problem. It's a fairly simple process in Spoon. You can automate it with a Carte server (Spoon's server), which has logging capabilities and can automatically restart the process if it fails partway through.
Let me know if you found any other solution that worked for you.

How to replicate existing OracleRDB ODBC connection in Oracle's SQL Developer application?

I am new to Oracle database in general, but I'm attempting to get Oracle's SQL Developer running on a workstation that has pre-configured System DSNs created for an OracleRDB database. I've confirmed the ODBC connections are working because I can use MS Access to connect and link to the tables. The "test" options within ODBC also succeed. Now I am trying to get a similar connection created using SQL Developer so I can see the column types and write queries in a more useful editor.
Here's what I have available when examining the ODBC connection properties:
Now I'm trying to create a duplicate connection in SQL Developer, but I'm at a loss for why things don't work. I first tried using the default SQL Developer installation, but couldn't get things working. Then I discovered there's an OracleRDB extension available, so I installed that, but I keep getting this error when attempting to use similar values:
As I stated, these ODBC connections were pre-configured on the workstation I'm using, so I don't know anything more than what is provided by the Oracle ODBC driver window.
Is there something obvious I'm not seeing or doing to replicate this connection in SQL Developer? Or perhaps something else I can do to debug this to learn more?
UPDATE
On the advice of one answer I'm trying to make the connection with JDBC, but having a hard time understanding what I'm doing wrong. Here's another screenshot with the connection parameters I have available, but with the server and database names changed:
With these values (the port came from my tnsnames.ora file), if I try to make a JDBC connection I keep getting the following error from SQL Developer:
As a final attempt, I entered the proper values in the Oracle RDB tab, but when I click 'test' the Testing Connection dialog just spins and never seems to return:
So I apologize for the long post here, but I'm struggling because there's just something I am really not understanding about how this all works. I appreciate everyone who took the time to read this question.

Oracle SQL Developer is a Java Application. You'll need to get the JDBC Driver for RDB.
Once you have that, in the SQL Developer preferences, find the Third Party JDBC section, and then use that to add an entry and point to the JAR for what you just installed.
Step by step instructions here.

Working connection string for RDB Thin Driver:
RDB_DB_CONN_STR = "jdbc:rdbThin://node.myplace.com:1707/";
where node.myplace.com is the name of the OpenVMS node hosting the RDB Thin Driver, 1707 is the port number assigned to the RDB Thin Driver.

LibreOffice Base JDBC connection to Hive returns “Method not supported” when executing valid select statement

I'm trying to get LibreOffice's Base v5.1.4.2, running on Ubuntu v16.04 to connect to a Hive v1.2.1 database via JDBC. I added the following jars, downloaded from Maven Central, to LibreOffice's classpath ('Tools -> LibreOffice -> Advanced -> Class Path'):
hive-common-1.2.1.jar
hive-jdbc-1.2.1.jar
hive-metastore-1.2.1.jar
hive-service-1.2.1.jar
hadoop-common-2.6.2.jar
httpclient-4.4.jar
httpcore-4.4.jar
libthrift-0.9.2.jar
commons-logging-1.1.3.jar
slf4j-api-1.7.5.jar
I then restarted LibreOffice, opened Base, selected 'Connect to an existing database' -> 'JDBC' and set the following properties:
I entered the credentials and clicked the 'Test Connection' button, which returned a "the connection was established successfully" message. Great!
In the LibreOffice Base UI, the options under the 'Tables' panel were grayed out. The options in the queries tab were not, so I tried to connect to Hive.
The 'Use Wizard to Create Query' option prompts for a password and then returns "The field names from 'airline.on_time_performance' could not be retrieved."
The JDBC connection is able to connect to Hive and list the tables, though it seems to have problems retrieving the columns. When I try to execute a simple select statement, the 'Create Query in SQL View' option returns a somewhat cryptic "Method not supported" message:
The error message is a bit vague. I suspect that I may be missing a dependency since I am able to connect to Hive from Java using JDBC.
I'm curious to know if anyone in the community has LibreOffice Base working with Hive. If so, what am I missing?

The Apache JDBC driver reports "Method not supported" for most features, just because the Apache committers did not bother to handle the list of simple yes/no API calls. Duh.
If you want to see for yourself, just download DBVisualizer Free, configure the Apache Hive driver, open a connection, and check the Database Info tab.
Now, DBVis is quite permissive with lame drivers, but it seems that LibreOffice is not.
You can try the Cloudera Hive JDBC driver as an alternative. You just have to "register" -- i.e. leave your e-mail address -- to access the download URL; it's simpler to deploy than the Apache thing (based on the Simba SDK, all Hive-specific JARs are bundled) and it works with about any BI tool. So hopefully it works with LibreThing too.
Disclaimer: I wish the Apache distro had a proper JDBC driver, and anyone could use it instead of relying on "free" commercial software. But for now it's just a wish.

What should the approach be?

I'll try to be clear: I'm out of ideas on this problem, even though it sounds like a classic.
My application runs on a WebLogic 10.3.3 application server, and for the database I am using Oracle Database 11g. There is a table in the database, let's say "user.", with a column, let's say "columnA". This table is updated by a module of the application.
What I want is this: when the value of the column is "abc.", I have to show an alert on a console (IP). {The IP can be retrieved from the database, where it is configured. It belongs to a Linux system other than the machine where the Oracle database is installed.} The table is updated continuously by the application module. Where should I start, and what should I read? I can't work out what the approach should be. Any help is much appreciated.

A trigger on the table can call UTL_HTTP to communicate with another machine (eg call a RESTful API).
The architectural questions are:
This will happen PRIOR to the commit, so you may get false alerts if a change is rolled back.
If you wait for a response, it will slow the system down.
What do you do if you get a non-standard response (e.g. the other server isn't available)?
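The last two questions apply to whatever code sends the alert, not only to a trigger. As an illustration in Python rather than PL/SQL, here is a hedged sketch of a sender that bounds the wait with a timeout and treats every failure as "alert not delivered". It would suit a separate process that sends alerts after the commit; the URL and payload are hypothetical.

```python
import json
import urllib.error
import urllib.request


def send_alert(url: str, payload: dict, timeout: float = 2.0) -> bool:
    """POST payload as JSON; return True only if the server answers with a 2xx status."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    # Assumption: the console host is reached directly, so environment proxies are bypassed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # The timeout bounds how long the caller can be held up by a slow receiver.
        with opener.open(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, and timeouts all count as not delivered.
        return False
```

A caller could retry or queue the alert when send_alert returns False, for example send_alert("http://console-host/alert", {"columnA": "abc."}) with a hypothetical endpoint on the console machine.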
