How to write a Spark DataFrame to an Impala database via JDBC

I use the following code to write a Spark DataFrame to Impala through a JDBC connection:
df.write.mode("append").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl", table="t_author_classic_copy", properties=pro)
But I get the following error: java.sql.SQLException: No suitable driver found
Then I changed the mode:
df.write.mode("overwrite").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl", table="t_author_classic_copy", properties=pro)
but I still get an error:
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE t_author_classic_copy1 (id TEXT NOT NULL, domain_id TEXT NOT NULL, pub_num INTEGER , cited_num INTEGER , rank DOUBLE PRECISION ).
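With mode("overwrite"), Spark drops the table and recreates it using DDL built from generic JDBC type names (TEXT, INTEGER, DOUBLE PRECISION) plus NOT NULL for non-nullable columns, as the failing query shows. Impala spells these types STRING, INT and DOUBLE, so it most likely rejects that statement. One workaround, assuming you are allowed to create the table yourself, is to create it in Impala with Impala type names and then write with mode("append"). The helper below is a hypothetical sketch that builds such a statement; its type mapping is an assumption that only covers the columns in this error.

```python
# Hypothetical mapping from the generic JDBC type names in the failing DDL
# to Impala type names (assumption: covers only the types seen here).
GENERIC_TO_IMPALA = {
    "TEXT": "STRING",
    "INTEGER": "INT",
    "DOUBLE PRECISION": "DOUBLE",
}


def impala_create_table(table, columns):
    """Build an Impala CREATE TABLE statement from (name, generic_type) pairs.

    NOT NULL is left out because Impala only supports it for Kudu tables
    (assumption: the target is an ordinary HDFS-backed table). Column names
    are backtick-quoted in case one of them is a reserved word.
    """
    cols = ", ".join(
        f"`{name}` {GENERIC_TO_IMPALA[generic.upper()]}" for name, generic in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"


ddl = impala_create_table(
    "t_author_classic_copy",
    [("id", "TEXT"), ("domain_id", "TEXT"), ("pub_num", "INTEGER"),
     ("cited_num", "INTEGER"), ("rank", "DOUBLE PRECISION")],
)
print(ddl)
```

The printed statement can be run in impala-shell once, after which the Spark job only appends rows.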

This works for me:
spark-shell --driver-class-path ImpalaJDBC41.jar --jars ImpalaJDBC41.jar
val jdbcURL = s"jdbc:impala://192.168.56.101:21050;AuthMech=0"
val connectionProperties = new java.util.Properties()
import org.apache.spark.sql.SaveMode
sqlContext.sql("select * from my_users").write.mode(SaveMode.Append).jdbc(jdbcURL, "users", connectionProperties)
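The command above puts ImpalaJDBC41.jar on the driver classpath and ships it to the executors. The question's code looks like PySpark, where a common companion fix is to name the driver class in the connection properties, so Spark loads it explicitly instead of asking java.sql.DriverManager to find a driver for the URL (the lookup that raises "No suitable driver found"). A minimal sketch, assuming the class name com.cloudera.impala.jdbc41.Driver from ImpalaJDBC41.jar (check your driver's documentation); impala_jdbc_url and write_to_impala are hypothetical helpers. Note that auth=noSasl in the question's URL appears to be Hive JDBC syntax; the Cloudera Impala driver uses AuthMech=0 for no authentication, as in the answer.

```python
# Assumption: driver class shipped in ImpalaJDBC41.jar; other versions of
# the Cloudera driver may use a different class name.
IMPALA_DRIVER = "com.cloudera.impala.jdbc41.Driver"


def impala_jdbc_url(host, port=21050, database="default", auth_mech=0):
    """Build a Cloudera Impala JDBC URL; AuthMech=0 means no authentication."""
    return f"jdbc:impala://{host}:{port}/{database};AuthMech={auth_mech}"


def write_to_impala(df, url, table, mode="append"):
    """Write a PySpark DataFrame to Impala over JDBC.

    The "driver" property makes Spark load that class explicitly instead of
    relying on java.sql.DriverManager to find a driver for the URL.
    """
    df.write.jdbc(url=url, table=table, mode=mode,
                  properties={"driver": IMPALA_DRIVER})
```

For the question's server this would be called as write_to_impala(df, impala_jdbc_url("10.61.1.101", database="test"), "t_author_classic_copy"), with the jar still passed through --jars and --driver-class-path as in the spark-shell command above.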

Unable to load data into Parquet file format?

I am trying to parse log data into Parquet file format in Hive; the separator used is "||-||".
A sample row is:
"b8905bfc-dc34-463e-a6ac-879e50c2e630||-||syntrans1||-||CitBook"
After staging the data, I get this result:
"b8905bfc-dc34-463e-a6ac-879e50c2e630 syntrans1 CitBook ".
While converting the data to Parquet file format I got this error:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:137)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
... 24 more
This is what I have tried:
create table log (a String ,b String ,c String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
"field.delim"="||-||",
"collection.delim"="-",
"mapkey.delim"="#"
);
create table log_par(
a String ,
b String ,
c String
) stored as PARQUET ;
insert into log_par select * from log;
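The "field.delim"="||-||" property in the first CREATE TABLE asks MultiDelimitSerDe to split each line on the whole string "||-||" rather than on a single character, which is why that class has to be loadable whenever the log table is read. As a plain-Python illustration only (this is not Hive code), splitting the sample row that way yields the three columns a, b and c:

```python
FIELD_DELIM = "||-||"  # the multi-character separator from the question


def split_row(line, delim=FIELD_DELIM):
    """Split one log line on the full multi-character delimiter."""
    return line.split(delim)


row = "b8905bfc-dc34-463e-a6ac-879e50c2e630||-||syntrans1||-||CitBook"
print(split_row(row))

# Splitting on the single character "-" would also cut the UUID apart.
print(len(row.split("-")))
```

The second print shows 7 pieces instead of 3, which is why a single-character delimiter cannot be used for this data.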
Aman kumar,
To resolve this issue, run the hive query after adding the following jar:
hive> add jar hive-contrib.jar;
To add the jar permanently, do the following:
1. On the HiveServer2 host, create a /usr/hdp//hive/auxlib directory.
2. Copy /usr/hdp//hive/lib/hive-contrib-.jar to /usr/hdp//hive/auxlib.
3. Restart HiveServer2 (HS2).
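Steps 1 and 2 can also be scripted. A minimal Python sketch, where install_aux_jar is a hypothetical helper and hive_home is a placeholder for your own Hive directory under /usr/hdp/ (the version segment of the path is not given above, so pass the full path and the exact hive-contrib jar file name yourself). It does not restart HiveServer2, so step 3 stays manual.

```python
import shutil
from pathlib import Path


def install_aux_jar(hive_home, jar_name):
    """Copy <hive_home>/lib/<jar_name> into <hive_home>/auxlib.

    hive_home is a placeholder for your own Hive directory under /usr/hdp/.
    HiveServer2 must still be restarted afterwards (step 3).
    """
    hive_home = Path(hive_home)
    auxlib = hive_home / "auxlib"
    auxlib.mkdir(parents=True, exist_ok=True)  # step 1: create auxlib
    target = auxlib / jar_name
    shutil.copy2(hive_home / "lib" / jar_name, target)  # step 2: copy the jar
    return target
```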
For further reference, see:
https://community.hortonworks.com/content/supportkb/150175/errororgapachehadoophivecontribserde2multidelimits.html
https://community.hortonworks.com/questions/79075/loading-data-to-hive-via-pig-orgapachehadoophiveco.html
Let me know if you face any issues.

Error message when the Cassandra Python driver accepts parameters from a user GUI

I'm using Cassandra 2.1 and CQL 3.2.1. I want to let the user specify the keyspace name, replication strategy, and replication factor from the UI, and then pass these values to an INSERT CQL query, but it gives me a syntax error. I have tried a lot, but nothing works.
I create the keyspace connect and the column family keyspaces, but the insertion causes an error.
Here is my code:
from cassandra.cluster import Cluster

class Connection():
    def __init__(self, ips, keyspace, replication_strategy, replication_factor):
        self.keyspace = keyspace
        self.ips = ips
        self.replication_strategy = replication_strategy
        self.replication_factor = replication_factor
        cluster = Cluster([ips])
        session = cluster.connect()
        session.execute("CREATE keyspace IF NOT EXISTS connect with replication={ 'class' : 'SimpleStrategy', 'replication_factor' :1}")
        session.execute("CREATE TABLE IF NOT EXISTS connect.keyspaces (id int primary key , keyspaces_name text, replication_strategy text, replication_factor int)")
        session.execute("INSERT INTO connect.keyspaces(id , keyspaces_name , replication_strategy ,replication_factor ) VALUES (1 " + ',' + self.keyspace + ',' + self.replication_strategy + ',' + self.replication_factor + ")")
And the error message is:
File "cassandra/cluster.py", line 3822, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:74332)
raise self._final_exception
SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:125 no viable alternative at input ',' (...) VALUES (1 ,noon,[SimpleStrategy],...)">
This means there is a syntax error in the INSERT statement you're building. It might be easier to troubleshoot if you print the string query you've built.
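Printing the string that the question's concatenation produces shows the problem: the text values end up without the single quotes that CQL needs around text literals. A small sketch using sample values similar to those in the error message; build_insert_by_concatenation is a hypothetical helper that reproduces the question's code, and the replication factor of 1 is an assumed example value.

```python
def build_insert_by_concatenation(keyspace, strategy, factor):
    """Rebuild the INSERT the way the question does, to inspect the result."""
    # str() is added here: concatenating an int factor directly would raise TypeError.
    return ("INSERT INTO connect.keyspaces(id , keyspaces_name , replication_strategy ,"
            "replication_factor ) VALUES (1 "
            + ',' + keyspace + ',' + strategy + ',' + str(factor) + ")")


query = build_insert_by_concatenation("noon", "SimpleStrategy", 1)
# noon and SimpleStrategy appear without quotes, so CQL does not read
# them as text literals.
print(query)
```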
Alternatively, I would suggest parameterizing your query to let the driver do the formatting:
http://datastax.github.io/python-driver/getting_started.html#passing-parameters-to-cql-queries
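Following that suggestion, the INSERT can use %s placeholders with a tuple of values, so the driver quotes the strings itself. A minimal sketch, assuming an already-connected session like the one in the question and that the UI passes the keyspace name and strategy as plain strings; insert_keyspace_row is a hypothetical helper.

```python
INSERT_CQL = (
    "INSERT INTO connect.keyspaces "
    "(id, keyspaces_name, replication_strategy, replication_factor) "
    "VALUES (%s, %s, %s, %s)"
)


def insert_keyspace_row(session, row_id, keyspace, strategy, factor):
    """Insert one row, letting the driver quote and escape the values.

    int(factor) converts a replication factor typed into the UI as text,
    because the replication_factor column is an int.
    """
    session.execute(INSERT_CQL, (row_id, keyspace, strategy, int(factor)))
```

Inside the question's __init__, the call would be insert_keyspace_row(session, 1, self.keyspace, self.replication_strategy, self.replication_factor).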

Spark SQL (1.5.1): connect to Oracle and write to Avro

I am using Spark SQL to connect to an Oracle database and get the data as DataFrames. I would like to write the retrieved data to an Avro file. While writing to Avro I am seeing multiple issues; could you help?
Here is the code:
val df = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "url" -> "jdbc:oracle:thin:user/password@host/service",
    "numPartitions" -> "1",
    "dbtable" -> "(Select * from schema.table WHERE STAGE_NUM <= 39 and guid = 'I284ba1f9cdba11dea82ab9f4ee295c21')"))
  .load()
df.write.format("com.databricks.spark.avro").save("Outputfile")
Dependencies in my project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>2.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.7.7</version>
</dependency>
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.7</version>
</dependency>
Here is the exception:
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use df.write.avro("headnotes") instead, I get the following exception:
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1

Need javax.jdo.option.ConnectionURL for cassandra

Are the properties below in hive-site.xml correct for Hive access to Cassandra?
(I copied the entire hive-default.xml content and changed only the properties below.)
javax.jdo.option.ConnectionURL : cassandra://localhost:9160
javax.jdo.option.ConnectionDriverName:org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbclass: jdbc:cassandra
hive.stats.jdbcdriver: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbconnectionstring: jdbc:cassandra:;databaseName=TempStatsStore;create=true
I am running a 1-node Cassandra cluster, but later I will make it a cluster of at least 2 nodes.
When I run the table creation command below, I get an error:
CREATE EXTERNAL TABLE MyHiveTable
(m string, n string, o string, p string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "test",
"cassandra.cql3.type" = "text, text, text, text");
Error:
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I don't know about the JDO settings, but you could try this link, which is a far better option for integrating Hive with Cassandra:
https://github.com/milliondreams/hive/tree/cas-support-cql/cassandra-handler

Errors when creating table using cassandra-jdbc-1.2.1 jar

I am having some difficulty creating a column family (table) in cassandra via the cassandra-jdbc driver.
The CQL command works correctly in cqlsh, but not when using cassandra-jdbc. I suspect this has something to do with the way I have defined my connection string. Any help would be greatly appreciated.
Let me try and explain what I have done.
I have created a keyspace using cqlsh with the following command:
CREATE KEYSPACE authdb WITH
REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
This is as per the documentation at: http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_KEYSPACE#cql-create-keyspace
I am able to create a table (column family) in cqlsh using:
CREATE TABLE authdb.users(
user_name varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
birth_year bigint
);
This works correctly.
My problems start when I try to create the table using cassandra-jdbc-1.2.1.jar.
The code I use is:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public static void createColumnFamily() {
    try {
        // Register the cassandra-jdbc driver and connect to the authdb keyspace using CQL 3
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/authdb?version=3.0.0");
        String qry = "CREATE TABLE authdb.users(" +
                "user_name varchar PRIMARY KEY," +
                "password varchar," +
                "gender varchar," +
                "session_token varchar," +
                "birth_year bigint" +
                ")";
        Statement smt = con.createStatement();
        smt.executeUpdate(qry);
        con.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When using cassandra-jdbc-1.2.1.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main DEBUG jdbc.CassandraConnection - Connected to localhost:9160 in Cluster 'authdb' using Keyspace 'Test Cluster' and CQL version '3.0.0'
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Ljava/nio/ByteBuffer;Lorg/apache/cassandra/thrift/Compression;Lorg/apache/cassandra/thrift/ConsistencyLevel;)Lorg/apache/cassandra/thrift/CqlResult;
at org.apache.cassandra.cql.jdbc.CassandraConnection.execute(CassandraConnection.java:447)
Note: the cluster and keyspace are not correct here; they appear swapped (Cluster 'authdb', Keyspace 'Test Cluster').
When using cassandra-jdbc-1.1.2.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main INFO jdbc.CassandraConnection - Connected to localhost:9160 using Keyspace authdb and CQL version 3.0.0
java.sql.SQLSyntaxErrorException: Cannot execute/prepare CQL2 statement since the CQL has been set to CQL3(This might mean your client hasn't been upgraded correctly to use the new CQL3 methods introduced in Cassandra 1.2+).
Note: in this instance the cluster and keyspace appear to be correct.
The error when using the 1.2.1 jar occurs because you have an old version of the cassandra-thrift jar on the classpath. You need to keep it in sync with the cassandra-jdbc version. The cassandra-thrift jar is in the lib directory of the Cassandra binary download.
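One quick way to spot a mismatch is to compare the version numbers embedded in the jar file names on the classpath. The helpers below are a hypothetical sketch: they assume the usual name-version.jar naming and treat matching major.minor versions as "in sync", which is a simplification rather than an official compatibility rule.

```python
import re

# name-version.jar, e.g. cassandra-thrift-1.2.5.jar
JAR_VERSION = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)+)\.jar$")


def jar_version(filename):
    """Return (major, minor) parsed from a jar file name."""
    m = JAR_VERSION.match(filename)
    if m is None:
        raise ValueError(f"no version in {filename!r}")
    major, minor = m.group("version").split(".")[:2]
    return int(major), int(minor)


def jars_in_sync(jdbc_jar, thrift_jar):
    """True if both jars share the same major.minor version (assumption)."""
    return jar_version(jdbc_jar) == jar_version(thrift_jar)
```

For example, jars_in_sync("cassandra-jdbc-1.2.1.jar", "cassandra-thrift-1.1.2.jar") returns False, flagging the kind of old thrift jar described above.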
