I'm using Cassandra 2.1 and CQL 3.2.1. I want to let the user specify the keyspace name, replication strategy, and replication factor from the UI, then pass those values into an INSERT CQL query, but it gives me a syntax error; I've tried a lot but nothing goes right.
Creating the keyspace (connect) and the column family (keyspaces) works fine,
but the insertion causes an error.
Here is my code:
from cassandra.cluster import Cluster

class Connection():
    def __init__(self, ips, keyspace, replication_strategy, replication_factor):
        self.keyspace = keyspace
        self.ips = ips
        self.replication_strategy = replication_strategy
        self.replication_factor = replication_factor
        cluster = Cluster([ips])
        session = cluster.connect()
        session.execute("CREATE KEYSPACE IF NOT EXISTS connect WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
        session.execute("CREATE TABLE IF NOT EXISTS connect.keyspaces (id int PRIMARY KEY, keyspaces_name text, replication_strategy text, replication_factor int)")
        session.execute("INSERT INTO connect.keyspaces(id, keyspaces_name, replication_strategy, replication_factor) VALUES (1" + ',' + self.keyspace + ',' + self.replication_strategy + ',' + self.replication_factor + ")")
And the error message is:
File "cassandra/cluster.py", line 3822, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:74332)
raise self._final_exception
SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:125 no viable alternative at input ',' (...) VALUES (1 ,noon,[SimpleStrategy],...)">
This means there is a syntax error in the INSERT statement you're building. It might be easier to troubleshoot if you print the string query you've built.
Alternatively I would suggest parameterizing your query to let the driver do formatting:
http://datastax.github.io/python-driver/getting_started.html#passing-parameters-to-cql-queries
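For example, a minimal sketch of the parameterized INSERT (assuming replication_factor arrives from the UI as a string, hence the int() cast):
query = ("INSERT INTO connect.keyspaces "
         "(id, keyspaces_name, replication_strategy, replication_factor) "
         "VALUES (%s, %s, %s, %s)")
# The driver quotes the text values and converts types for you,
# which avoids the unquoted-string syntax error above:
session.execute(query, (1, self.keyspace, self.replication_strategy,
                        int(self.replication_factor)))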
I'm trying to run a test task on Airflow but I keep getting the following error:
FAILED: ParseException 2:0 cannot recognize input near 'create_import_table_fct_latest_values' '.' 'hql'
Here is my Airflow DAG file:
import airflow
from datetime import datetime, timedelta
from airflow.operators.hive_operator import HiveOperator
from airflow.models import DAG

args = {
    'owner': 'raul',
    'start_date': datetime(2018, 11, 12),
    'provide_context': True,
    'depends_on_past': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'email': ['raul.gregglino@leroymerlin.ru'],
    'email_on_failure': True,
    'email_on_retry': False
}

dag = DAG('opus_data',
          default_args=args,
          max_active_runs=6,
          schedule_interval="@daily"
)

import_lv_data = HiveOperator(
    task_id='fct_latest_values',
    hive_cli_conn_id='metastore_default',
    hql='create_import_table_fct_latest_values.hql ',
    hiveconf_jinja_translate=True,
    dag=dag
)
deps = {}

# Explicitly define the dependencies in the DAG
for downstream, upstream_list in deps.iteritems():
    for upstream in upstream_list:
        dag.set_dependency(upstream, downstream)
Here is the content of my HQL file, in case this may be the issue (I can't figure it out).
(I'm testing the connection to see whether the table gets created; then I'll try to LOAD DATA, hence the LOAD DATA is commented out.)
CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data (
id_product STRING,
id_model STRING,
id_attribute STRING,
attribute_value STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED ',';
#LOAD DATA LOCAL INPATH
#'/media/windows_share/schemas/opus/fct_latest_values_20181106.csv'
#OVERWRITE INTO TABLE opus_data.fct_latest_values_new_data;
In the HQL file it should be FIELDS TERMINATED BY ',':
CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data (
id_product STRING,
id_model STRING,
id_attribute STRING,
attribute_value STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
And comments in an HQL file should start with --, not #.
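Applied to the commented-out block from the question, that becomes:
-- LOAD DATA LOCAL INPATH
-- '/media/windows_share/schemas/opus/fct_latest_values_20181106.csv'
-- OVERWRITE INTO TABLE opus_data.fct_latest_values_new_data;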
Also, this seems incorrect and is what causes the exception: hql='create_import_table_fct_latest_values.hql ' (the trailing space means the string is not resolved as a template file, so the file name itself gets sent to Hive as a query, which matches the ParseException above).
Have a look at this example:
import os

# Create the full path for the file
hql_file_path = os.path.join(os.path.dirname(__file__), source['hql'])
print(hql_file_path)

run_hive_query = HiveOperator(
    task_id='run_hive_query',
    dag=dag,
    hql="""
    {{ local_hive_settings }}
    """ + "\n " + open(hql_file_path, 'r').read()
)
See here for more details.
Or put all HQL into hql parameter:
hql='CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data ...'
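For instance, using the corrected statement from above inline (a sketch only; the operator arguments mirror the question's task):
import_lv_data = HiveOperator(
    task_id='fct_latest_values',
    hive_cli_conn_id='metastore_default',
    # Inline HQL instead of a file path; Airflow passes it to Hive as-is:
    hql="""
        CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data (
            id_product STRING,
            id_model STRING,
            id_attribute STRING,
            attribute_value STRING
        ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    """,
    hiveconf_jinja_translate=True,
    dag=dag
)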
I managed to find the answer to my issue.
It was related to the path my HiveOperator was calling the file from. As no Variable had been defined to tell Airflow where to look, I was getting the error I mentioned in my post.
Once I defined it using the webserver interface, my DAG started to work properly.
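For reference, a Variable defined that way can be read inside a DAG roughly like this (a sketch only; the variable name 'hql_folder' is hypothetical, not from the original post):
from airflow.models import Variable

# 'hql_folder' is a hypothetical key; use whatever you defined under Admin -> Variables:
hql_folder = Variable.get("hql_folder")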
I made a change to my DAG code regarding the file location (for organization only), and this is how my HiveOperator looks now:
import_lv_data = HiveOperator(
    task_id='fct_latest_values',
    hive_cli_conn_id='metastore_default',
    hql='hql/create_import_table_fct_latest_values2.hql',
    hiveconf_jinja_translate=True,
    dag=dag
)
Thanks to @panov.st, who helped me in person to identify my issue.
I use the following code to write a Spark DataFrame to Impala through a JDBC connection:
df.write.mode("append").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl",table="t_author_classic_copy", pro)
But I get the following error: java.sql.SQLException: No suitable driver found
Then I change the mode:
df.write.mode("overwrite").jdbc(url="jdbc:impala://10.61.1.101:21050/test;auth=noSasl",table="t_author_classic_copy", pro)
but it still gets an error:
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE t_author_classic_copy1 (id TEXT NOT NULL, domain_id TEXT NOT NULL, pub_num INTEGER , cited_num INTEGER , rank DOUBLE PRECISION ).
This works for me. (With overwrite, Spark issues its own CREATE TABLE, and the TEXT columns it generates are not valid Impala syntax, as the error above shows; appending to a table that already exists avoids that.)
spark-shell --driver-class-path ImpalaJDBC41.jar --jars ImpalaJDBC41.jar
val jdbcURL = s"jdbc:impala://192.168.56.101:21050;AuthMech=0"
val connectionProperties = new java.util.Properties()
import org.apache.spark.sql.SaveMode
sqlContext.sql("select * from my_users").write.mode(SaveMode.Append).jdbc(jdbcURL, "users", connectionProperties)
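For completeness, roughly the same append write from PySpark (a sketch only, using the Spark 2.x session API; the host, table names, and empty properties mirror the Scala example above):
# Launch with the Impala JDBC jar on the classpath, e.g.:
#   pyspark --driver-class-path ImpalaJDBC41.jar --jars ImpalaJDBC41.jar
jdbc_url = "jdbc:impala://192.168.56.101:21050;AuthMech=0"

df = spark.sql("select * from my_users")
df.write.mode("append").jdbc(jdbc_url, "users", properties={})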
I am using Spark SQL to connect to an Oracle database and fetch data as DataFrames. I would like to write the retrieved data to an Avro file. While writing to Avro I am seeing multiple issues; could you help?
Here is the code:
val df = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "url" -> "jdbc:oracle:thin:user/password@host/service",
    "numPartitions" -> "1",
    "dbtable" -> "(Select * from schema.table WHERE STAGE_NUM <= 39 and guid='I284ba1f9cdba11dea82ab9f4ee295c21')"))
  .load()

df.write.format("com.databricks.spark.avro").save("Outputfile")
Dependencies in my project:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>2.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
</dependency>
Here is the exception information:
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use df.write.avro("headnotes") instead, I get the following exception:
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1
Are the below properties in hive-site.xml correct for Hive access to Cassandra?
(I have copied the entire hive-default.xml content but changed only the properties below.)
javax.jdo.option.ConnectionURL: cassandra://localhost:9160
javax.jdo.option.ConnectionDriverName: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbclass: jdbc:cassandra
hive.stats.jdbcdriver: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbconnectionstring: jdbc:cassandra:;databaseName=TempStatsStore;create=true
I am running a 1-node Cassandra setup, but would later make it at least a 2-node cluster.
When I run the below table creation command I get an error:
CREATE EXTERNAL TABLE MyHiveTable
(m string, n string, o string, p string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "test",
"cassandra.cql3.type" = "text, text, text, text");
Error:
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I don't know about the JDO settings, but you could try this link, which is a far better option for integrating Hive with Cassandra:
https://github.com/milliondreams/hive/tree/cas-support-cql/cassandra-handler
I am having some difficulty creating a column family (table) in Cassandra via the cassandra-jdbc driver.
The CQL command works correctly in cqlsh, but not when using cassandra-jdbc. I suspect this has something to do with the way I have defined my connection string. Any help would be greatly appreciated.
Let me try and explain what I have done.
I have created a keyspace using cqlsh with the following command
CREATE KEYSPACE authdb WITH
REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
This is as per the documentation at: http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_KEYSPACE#cql-create-keyspace
I am able to create a table (column family) in cqlsh using
CREATE TABLE authdb.users(
user_name varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
birth_year bigint
);
This works correctly.
My problems start when I try to create the table using cassandra-jdbc-1.2.1.jar
The code I use is:
public static void createColumnFamily() {
    try {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/authdb?version=3.0.0");
        String qry = "CREATE TABLE authdb.users(" +
                     "user_name varchar PRIMARY KEY," +
                     "password varchar," +
                     "gender varchar," +
                     "session_token varchar," +
                     "birth_year bigint" +
                     ")";
        Statement smt = con.createStatement();
        smt.executeUpdate(qry);
        con.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When using cassandra-jdbc-1.2.1.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main DEBUG jdbc.CassandraConnection - Connected to localhost:9160 in Cluster 'authdb' using Keyspace 'Test Cluster' and CQL version '3.0.0'
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Ljava/nio/ByteBuffer;Lorg/apache/cassandra/thrift/Compression;Lorg/apache/cassandra/thrift/ConsistencyLevel;)Lorg/apache/cassandra/thrift/CqlResult;
at org.apache.cassandra.cql.jdbc.CassandraConnection.execute(CassandraConnection.java:447)
Note: the cluster and keyspace in the log above are not correct (they appear swapped).
When using cassandra-jdbc-1.1.2.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main INFO jdbc.CassandraConnection - Connected to localhost:9160 using Keyspace authdb and CQL version 3.0.0
java.sql.SQLSyntaxErrorException: Cannot execute/prepare CQL2 statement since the CQL has been set to CQL3(This might mean your client hasn't been upgraded correctly to use the new CQL3 methods introduced in Cassandra 1.2+).
Note: in this instance the cluster and keyspace appear to be correct.
The error when using the 1.2.1 jar is because you have an old version of the cassandra-thrift jar; the NoSuchMethodError is on execute_cql3_query, one of the CQL3 methods introduced in Cassandra 1.2+. You need to keep that jar in sync with the cassandra-jdbc version. The cassandra-thrift jar is in the lib directory of the binary download.