How to write spark dataframe to impala database

How to write spark dataframe to impala database - jdbc

I use the following code to write the spark dataframe to impala through JDBC connection.
df.write.mode("append").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl",table="t_author_classic_copy", pro)
But I get the following error: java.sql.SQLException: No suitable driver found
then I change the mode:
df.write.mode("overwrite").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl",table="t_author_classic_copy", pro)
but it still get an error:
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE t_author_classic_copy1 (id TEXT NOT NULL, domain_id TEXT NOT NULL, pub_num INTEGER , cited_num INTEGER , rank DOUBLE PRECISION ).

This works for me:
spark-shell --driver-class-path ImpalaJDBC41.jar --jars ImpalaJDBC41.jar
val jdbcURL = s"jdbc:impala://192.168.56.101:21050;AuthMech=0"
val connectionProperties = new java.util.Properties()
import org.apache.spark.sql.SaveMode
sqlContext.sql("select * from my_users").write.mode(SaveMode.Append).jdbc(jdbcURL, "users", connectionProperties)

Related

Unable to load data into parquet file format?

I am trying to parse log data into parquet file format in hive , the separator used is "||-||".
The sample row is
"b8905bfc-dc34-463e-a6ac-879e50c2e630||-||syntrans1||-||CitBook"
After performing the data staging I am able to get the result
"b8905bfc-dc34-463e-a6ac-879e50c2e630 syntrans1 CitBook ".
While converting the data to parquet file format I got error :
`
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:137)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
... 24 more
This is what I have tried
create table log (a String ,b String ,c String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
"field.delim"="||-||",
"collection.delim"="-",
"mapkey.delim"="#"
);
create table log_par(
a String ,
b String ,
c String
) stored as PARQUET ;
insert into logspar select * from log_par ;
`

Aman kumar,
To resolve this issue, run the hive query after adding the following jar:
hive> add jar hive-contrib.jar;
To add the jar permanently, do the following:
1.On Hive Server host, create a /usr/hdp//hive/auxlib directory.
2.Copy /usr/hdp//hive/lib/hive-contrib-.jar to /usr/hdp//hive/auxlib.
3.Restart the HS2 server.
Please check further reference.
https://community.hortonworks.com/content/supportkb/150175/errororgapachehadoophivecontribserde2multidelimits.html.
https://community.hortonworks.com/questions/79075/loading-data-to-hive-via-pig-orgapachehadoophiveco.html
Let me know,if you face any issues

ERROR MESSAGE when cassandra python Driver accept parameters from user GUI

i'm using cassandra 2.1 and CQL 3.2.1 , i want to let user specify keyspace name ,replication strategy , replication factor from UI , and then pass these values to query to execute Insert CQL , but give me an syntax error , i try a lot but nothing go write >>
i'm create keyspace -> connect
and column family -> keyspaces
but insertion cause error
here is my code :
from cassandra.cluster import Cluster
class Connection():
def __init__ (self , ips , keyspace ,replication_strategy ,replication_factor):
self.keyspace=keyspace
self.ips =ips
self.replication_strategy=replication_strategy
self.replication_factor=replication_factor
cluster = Cluster([ips])
session = cluster.connect()
session.execute("CREATE keyspace IF NOT EXISTS connect with replication={ 'class' : 'SimpleStrategy', 'replication_factor' :1}")
session.execute("CREATE TABLE IF NOT EXISTS connect.keyspaces (id int primary key , keyspaces_name text, replication_strategy text, replication_factor int)")
session.execute("INSERT INTO connect.keyspaces(id , keyspaces_name , replication_strategy ,replication_factor ) VALUES (1 " +',' + self.keyspace + ',' + self.replication_strategy +',' + self.replication_factor + ")")
and the ERROR MESSAGE IS :
File "cassandra/cluster.py", line 3822, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:74332)
raise self._final_exception
SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:125 no viable alternative at input ',' (...) VALUES (1 ,noon,[SimpleStrategy],...)">

This means there is a syntax error in the INSERT statement you're building. It might be easier to troubleshoot if you print the string query you've built.
Alternatively I would suggest parameterizing your query to let the driver do formatting:
http://datastax.github.io/python-driver/getting_started.html#passing-parameters-to-cql-queries

SPARK SQL (1.5.1) connect to Oracle and write to Avro

I am using spark-sql to connect to oracle databse and getting data as dataframes. I would like to write this retrieved data into avro file. While writing to avro I am seeing multiple issues, could you help us.
Here is the code -
val df = sqlContext.read.format("jdbc")
.options(Map( "driver"->"oracle.jdbc.driver.OracleDriver",
"url" -> "jdbc:oracle:thin:user/password#host/service"
, "numPartitions" -> "1", "dbtable"-> "
(Select * from schema.table WHERE STAGE_NUM <=39 and
guid='I284ba1f9cdba11dea82ab9f4ee295c21')"))
.load()
df.write.format("com.databricks.spark.avro").save("Outputfile")
Dependencies that are there in my project -
<dependency><br> <groupId>org.apache.spark</groupId><br> <artifactId>spark-sql_2.10</artifactId><br> <version>1.5.1</version><br></dependency><br><dependency><br> <groupId>com.databricks</groupId><br> <artifactId>spark-avro_2.10</artifactId><br> <version>2.0.1</version><br></dependency><br><dependency><br> <groupId>org.apache.avro</groupId><br> <artifactId>avro</artifactId><br> <version>1.7.7</version><br></dependency><br><dependency><br> <groupId>org.apache.avro</groupId><br> <artifactId>avro-mapred</artifactId><br> <version>1.7.7</version><br></dependency>
Here is the exception information -
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use - df.write.avro("headnotes"), I get the following exception.
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1

Need javax.jdo.option.ConnectionURL for cassandra

Are the below properties in hive-site.xml correct for Hive access to cassandra??
(I HAVE COPIED ENTIRE HIVE-DEFAULT.XML CONTENT BUT HAVE CHANGED ONLY THE BELOW PROPERTIES)
javax.jdo.option.ConnectionURL : cassandra://localhost:9160
javax.jdo.option.ConnectionDriverName:org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbclass: jdbc:cassandra
hive.stats.jdbcdriver: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbconnectionstring: jdbc:cassandra:;databaseName=TempStatsStore;create=true
I am running 1-node Cassandra. But, later would make it a minimum 2 node cluster.
When I run the below table creation command I get an error:
CREATE EXTERNAL TABLE MyHiveTable
(m string, n string, o string, p string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "test",
"cassandra.cql3.type" = "text, text, text, text");
Error:
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

don't know about jdo settings but you could try this link which is far better option for integrating hive with cassandra -
https://github.com/milliondreams/hive/tree/cas-support-cql/cassandra-handler

Errors when creating table using cassandra-jdbc-1.2.1 jar

I am having some difficulty creating a column family (table) in cassandra via the cassandra-jdbc driver.
The cql command works correctly in cqlsh, but doesn't when using cassandra jdbc. I suspect this is something to do with the way I have defined my connection string. Any help would be greatly helpful.
Let me try and explain what I have done.
I have created a keyspace using cqlsh with the following command
CREATE KEYSPACE authdb WITH
REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
This is as per the documentation at: http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_KEYSPACE#cql-create-keyspace
I am able to create a table (column family) in cqlsh using
CREATE TABLE authdb.users(
user_name varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
birth_year bigint
);
This works correctly.
My problems start when I try to create the table using cassandra-jdbc-1.2.1.jar
The code I use is:
public static void createColumnFamily() {
try {
Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/authdb?version=3.0.0");
String qry = "CREATE TABLE authdb.users(" +
"user_name varchar PRIMARY KEY," +
"password varchar," +
"gender varchar," +
"session_token varchar," +
"birth_year bigint" +
")";
Statement smt = con.createStatement();
smt.executeUpdate(qry);
con.close();
} catch (Exception e) {
e.printStackTrace();
}
When using cassandra-jdbc-1.2.1.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main DEBUG jdbc.CassandraConnection - Connected to localhost:9160 in Cluster 'authdb' using Keyspace 'Test Cluster' and CQL version '3.0.0'
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Ljava/nio/ByteBuffer;Lorg/apache/cassandra/thrift/Compression;Lorg/apache/cassandra/thrift/ConsistencyLevel;)Lorg/apache/cassandra/thrift/CqlResult;
at org.apache.cassandra.cql.jdbc.CassandraConnection.execute(CassandraConnection.java:447)
Note: the cluster and key space are not correct
When using cassandra-jdbc-1.1.2.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main INFO jdbc.CassandraConnection - Connected to localhost:9160 using Keyspace authdb and CQL version 3.0.0
java.sql.SQLSyntaxErrorException: Cannot execute/prepare CQL2 statement since the CQL has been set to CQL3(This might mean your client hasn't been upgraded correctly to use the new CQL3 methods introduced in Cassandra 1.2+).
Note: in this instance the cluster and keyspace appear to be correct.

The error when using the 1.2.1 jar is because you have an old version of the cassandra-thrift jar. You need to keep that in sync with the cassandra-jdbc version. The cassandra-thrift jar is in the lib directory of the binary download.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to write spark dataframe to impala database - jdbc

Related

Unable to load data into parquet file format?

ERROR MESSAGE when cassandra python Driver accept parameters from user GUI

SPARK SQL (1.5.1) connect to Oracle and write to Avro

Need javax.jdo.option.ConnectionURL for cassandra

Errors when creating table using cassandra-jdbc-1.2.1 jar

Categories

Resources