I am trying to extract data from Db2 into Spark using read.jdbc, but I am unable to pass the WITH UR clause in the query.
How do I set the isolation level to UR in a Spark JDBC read?
import json
#spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/user/db2jcc4.jar').getOrCreate()
jdbcUrl = "jdbc:db2://{0}:{1}/{2}".format("db2.1234.abcd.com", "3910", "DSN")
connectionProperties = {
    "user": "user1",
    "password": "password1",
    "driver": "com.ibm.db2.jcc.DB2Driver",
    "fetchsize": "100000"
}
pushdown_query = "(SELECT T6.COLUMN1, T6.COLUMN2 ,TO_DATE('07/11/2019 10:52:24', 'MM/DD/YYYY HH24:MI:SS') AS INSERT_DATE FROM DB1.T6 WITH UR ) ALIAS"
print(jdbcUrl)
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, column="COLUMN1", lowerBound=1, upperBound=12732076, numPartitions=5, properties=connectionProperties)
This fails with the error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-199, SQLSTATE=42601, SQLERRMC=UR;;FETCH , ) OFFSET LIMIT INTERSECT ORDER GROUP WHERE HAVING JOIN, DRIVER=4.13.80
If I remove the WITH UR it works. Is there a way to pass a query with UR in a Spark JDBC read?
There is an isolationLevel connection option in the JDBC data source, but the documentation says it applies only to writing:
isolationLevel The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.
Will the below do the trick?
connectionProperties = {
    "user": "user1",
    "password": "password1",
    "driver": "com.ibm.db2.jcc.DB2Driver",
    "fetchsize": "100000",
    "isolationLevel": "READ_UNCOMMITTED"
}
According to the DB2 documentation, when connecting to DB2 we can pass defaultIsolationLevel=1 in the connection details, which means uncommitted read (UR).
See: https://www.ibm.com/support/pages/how-set-isolation-level-db2-jdbc-database-connections
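Since Spark forwards unrecognized keys in the properties dict straight through to the JDBC driver, one way to try this is to add that defaultIsolationLevel property to the existing connection properties. The sketch below is untested against a live DB2 instance; the helper name and the credential values are illustrative only.

```python
# Hedged sketch: Spark passes the keys in `properties` through to the JDBC
# driver, so the DB2 JCC-specific defaultIsolationLevel property (1 = UR)
# can ride alongside the usual credentials. Credentials are placeholders.

def with_uncommitted_read(props):
    """Return a copy of the connection properties requesting UR isolation
    via the DB2 JCC driver's defaultIsolationLevel property (1 = UR)."""
    out = dict(props)
    out["defaultIsolationLevel"] = "1"
    return out

connectionProperties = {
    "user": "user1",
    "password": "password1",
    "driver": "com.ibm.db2.jcc.DB2Driver",
    "fetchsize": "100000",
}
urProperties = with_uncommitted_read(connectionProperties)
# df = spark.read.jdbc(url=jdbcUrl, table="DB1.T6", properties=urProperties)
```

The original dict is left untouched so the same base properties can be reused for writes with a different isolation level.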
According to the documentation and to this blog, the isolationLevel option is ignored in a read action.
To be honest, I don't understand why, since java.sql.Connection's setTransactionIsolation sets a default for the whole connection, and as far as I know the read does not set the isolation level by itself.
Nevertheless, a different approach is offered there, so the following should work for you:
#spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/user/db2jcc4.jar').getOrCreate()
jdbcUrl = "jdbc:db2://{0}:{1}/{2}".format("db2.1234.abcd.com", "3910", "DSN")
connectionProperties = {
    "user": "user1",
    "password": "password1",
    "driver": "com.ibm.db2.jcc.DB2Driver",
    "fetchsize": "100000"
}
df = spark.read.jdbc(url=jdbcUrl, table="DB1.T6", predicates=["1=1 WITH UR"], properties=connectionProperties).select("COLUMN1", "COLUMN2", ...)
I used the 1=1 clause to make a valid WHERE condition.
This approach looks like there should be a cleaner way, but it works fine.
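Note that the predicates= variant replaces the column/lowerBound/upperBound partitioning, so the single "1=1 WITH UR" predicate reads everything in one partition. If the original five-way split is still wanted, each partition's range condition can be written out by hand with its own WITH UR suffix. The helper below is a hypothetical sketch, not a Spark API; the column name and bounds mirror the original read.jdbc call.

```python
# Hedged sketch: build one predicate per partition, each ending in WITH UR,
# to recover the partitioned read that predicates= otherwise replaces.
# ur_partition_predicates is an illustrative helper, not part of any API.

def ur_partition_predicates(column, lower, upper, num_partitions):
    """Build one 'col >= lo AND col < hi WITH UR' predicate per partition."""
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        # Last partition absorbs the remainder and includes the upper bound.
        hi = upper + 1 if i == num_partitions - 1 else lower + (i + 1) * stride
        preds.append(f"{column} >= {lo} AND {column} < {hi} WITH UR")
    return preds

predicates = ur_partition_predicates("COLUMN1", 1, 12732076, 5)
# df = spark.read.jdbc(url=jdbcUrl, table="DB1.T6",
#                      predicates=predicates, properties=connectionProperties)
```

Each predicate becomes the WHERE clause of one partition's query, so the WITH UR suffix lands in the same position as in the 1=1 trick above.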
I am using Logstash 7.6 with the output-jdbc plugin, but I get an error, and I understand that it is because the event sends all the fields to be indexed, including those that are part of @metadata.
I tried just putting the field name without @metadata and it works for me.
How can I get a single field from within the @metadata set?
ERROR:
ERROR logstash.outputs.jdbc - JDBC - Exception. Not retrying {:exception=>#, :statement=>"UPDATE table SET estate = 'P' WHERE codigo = ? ", :event=>"{\"properties\":{\"rangoAltura1\":null,\"rangoAltura2\":null,\"codigo\":\"DB_001\",\"rangoAltura3\":null,\"descrip\":\"CARLOS PEREZ\",\"codigo\":\"106\",\"rangoAltura5\":null,\"active\":true},\"id\":\"DB_001_555\"}"}
My .conf:
statement => ["UPDATE table SET estate = 'A' WHERE entidad = ? ","%{[@metadata][miEntidad]}"]
[@metadata][miEntidad] -----> map['entidad_temp'] = event.get('entidad')
According to the output-jdbc plugin README, you have it set correctly.
Maybe try the following as a workaround:
statement => ["UPDATE table SET estate = 'A' WHERE entidad = ? ","[@metadata][miEntidad]"]
The H2 database has a list of commands starting with SET, in particular SET DB_CLOSE_DELAY. I would like to find out the current value of DB_CLOSE_DELAY. I am using JDBC. Setting it is easy:
cx.createStatement.execute("SET DB_CLOSE_DELAY 0")
but none of the following returns the actual value of DB_CLOSE_DELAY:
cx.createStatement.executeQuery("DB_CLOSE_DELAY")
cx.createStatement.executeQuery("VALUES(#DB_CLOSE_DELAY)")
cx.createStatement.executeQuery("GET DB_CLOSE_DELAY")
cx.createStatement.executeQuery("SHOW DB_CLOSE_DELAY")
Help would be greatly appreciated.
You can access this and other settings in the INFORMATION_SCHEMA.SETTINGS table, for example:
String url = "jdbc:h2:mem:;DB_CLOSE_DELAY=3";
Connection conn = DriverManager.getConnection(url, "sa", "the password goes here");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM INFORMATION_SCHEMA.SETTINGS where name = 'DB_CLOSE_DELAY'");
while (rs.next()) {
    System.out.println(rs.getString("name"));
    System.out.println(rs.getString("value"));
}
In this test, I use an unnamed in-memory database and explicitly set the delay to 3 seconds when creating the DB.
The output from the print statements is:
DB_CLOSE_DELAY
3
Recently I have been studying Apache Calcite. So far I can use explain plan for via JDBC to view the logical plan, but I am wondering how I can view the physical SQL that is sent during plan execution. Since there may be bugs in the physical SQL generation, I need to verify its correctness.
val connection = DriverManager.getConnection("jdbc:calcite:")
val calciteConnection = connection.asInstanceOf[CalciteConnection]
val rootSchema = calciteConnection.getRootSchema()
val dsInsightUser = JdbcSchema.dataSource("jdbc:mysql://localhost:13306/insight?useSSL=false&serverTimezone=UTC", "com.mysql.jdbc.Driver", "insight_admin","xxxxxx")
val dsPerm = JdbcSchema.dataSource("jdbc:mysql://localhost:13307/permission?useSSL=false&serverTimezone=UTC", "com.mysql.jdbc.Driver", "perm_admin", "xxxxxx")
rootSchema.add("insight_user", JdbcSchema.create(rootSchema, "insight_user", dsInsightUser, null, null))
rootSchema.add("perm", JdbcSchema.create(rootSchema, "perm", dsPerm, null, null))
val stmt = connection.createStatement()
val rs = stmt.executeQuery("""explain plan for select "perm"."user_table".* from "perm"."user_table" join "insight_user"."user_tab" on "perm"."user_table"."id"="insight_user"."user_tab"."id" """)
val metaData = rs.getMetaData()
while (rs.next()) {
  for (i <- 1 to metaData.getColumnCount) printf("%s ", rs.getObject(i))
  println()
}
The result is:
EnumerableCalc(expr#0..3=[{inputs}], proj#0..2=[{exprs}])
  EnumerableHashJoin(condition=[=($0, $3)], joinType=[inner])
    JdbcToEnumerableConverter
      JdbcTableScan(table=[[perm, user_table]])
    JdbcToEnumerableConverter
      JdbcProject(id=[$0])
        JdbcTableScan(table=[[insight_user, user_tab]])
There is a Calcite hook, Hook.QUERY_PLAN, that is triggered with the JDBC query strings. From the source:
/** Called with a query that has been generated to send to a back-end system.
* The query might be a SQL string (for the JDBC adapter), a list of Mongo
* pipeline expressions (for the MongoDB adapter), et cetera. */
QUERY_PLAN;
You can register a listener to log any query strings, like this in Java:
Hook.QUERY_PLAN.add((Consumer<String>) s -> LOG.info("Query sent over JDBC:\n" + s));
It is possible to see the generated SQL query by setting the calcite.debug=true system property. The exact place where this happens is JdbcToEnumerableConverter. Since this happens during execution of the query, you will have to remove the "explain plan for" prefix from the statement passed to stmt.executeQuery.
Note that with debug mode enabled you will get a lot of other messages as well, including the generated code.
I am using a SQL SELECT to access a DB2 table with schemaname.tablename, as follows:
select 'colname' from schemaname.tablename
The table definitely has 'colname' = SERVER_POOL_NAME, yet I get the following error:
"Invalid parameter: Unknown column name SERVER_POOL_NAME . ERRORCODE=-4460, SQLSTATE=null"
I am using the DB2 v10.1 FP0 JDBC driver, version 3.63.123 (JDBC 3.0 spec).
The application is run as the DB2 administrator and also as a Windows 2008 admin.
I saw a discussion of this issue at: db2jcc4.jar Invalid parameter: Unknown column name
But I do not know where the connection parameter useJDBC4ColumnNameAndLabelSemantics should be set (to the value 2).
I saw that the parameter should appear in com.ibm.db2.jcc.DB2BaseDataSource (see: http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=%2Fcom.ibm.db2.luw.apdv.java.doc%2Fsrc%2Ftpc%2Fimjcc_r0052607.html)
But I do not find this file in my DB2 installation; maybe it is packed in a .jar file.
Any advice?
There is a link on the page you are referring to that shows the ways to set properties. Specifically, you can populate a Properties object with the desired values and supply it to the getConnection() call:
String url = "jdbc:db2://host:50000/yourdb";
Properties props = new Properties();
props.setProperty("useJDBC4ColumnNameAndLabelSemantics", "2");
// set other required properties
Connection c = DriverManager.getConnection(url, props);
Alternatively, you can embed property name/value pairs in the JDBC URL itself:
String url = "jdbc:db2://host:50000/yourdb:useJDBC4ColumnNameAndLabelSemantics=2;";
// set other required properties
Connection c = DriverManager.getConnection(url);
Note that each name/value pair must be terminated by a semicolon, even the last one.
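Following that rule, the URL with embedded properties can be composed mechanically. The sketch below is illustrative; db2_url is a hypothetical helper, and the host, port, and database values are placeholders.

```python
# Hedged sketch: compose a DB2 JDBC URL with embedded properties. In the
# DB2 JCC URL syntax, properties follow the database name after a colon,
# and every name=value pair ends with a semicolon, including the last one.

def db2_url(host, port, database, **props):
    """Compose jdbc:db2://host:port/db:key=value;... with trailing semicolons."""
    url = f"jdbc:db2://{host}:{port}/{database}"
    if props:
        url += ":" + "".join(f"{k}={v};" for k, v in props.items())
    return url

url = db2_url("host", 50000, "yourdb", useJDBC4ColumnNameAndLabelSemantics=2)
# -> jdbc:db2://host:50000/yourdb:useJDBC4ColumnNameAndLabelSemantics=2;
```

Generating the URL this way makes it harder to forget the final semicolon that the driver requires.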
For historic reasons we have some tables with a char(3) primary key in our database, for example the country table.
When I find the entity with:
String id = "D ";
Country c = em.find(Country.class, id);
afterwards I can see:
c.getId() --> "D" and not "D "
The entity is read again and again from the database; caching does not work for some reason. I guess the id exists in the cache as "D" and not as "D ".
20130423 09:15:14,495 FINEST query Execute query ReadObjectQuery(name="country" referenceClass=Country )
20130423 09:15:14,498 FINEST connection reconnecting to external connection pool
20130423 09:15:14,498 FINE sql SELECT countryID, countryNAME, countryTELCODE, countryTOPLEVELDOMAIN, countryINTTELPREFIX FROM geo.COUNTRY WHERE (countryID = ?)
bind => [D ]
20130423 09:15:14,508 FINEST query Execute query ReadObjectQuery(name="country" referenceClass=Country )
20130423 09:15:14,508 FINEST connection reconnecting to external connection pool
20130423 09:15:14,508 FINE sql SELECT countryID, countryNAME, countryTELCODE, countryTOPLEVELDOMAIN, countryINTTELPREFIX FROM geo.COUNTRY WHERE (countryID = ?)
bind => [D ]
I tried to set @Column(length=3) but it has no effect.
Does anybody have an idea why the cache does not work like it should?
Thanks, Hasan
By default, EclipseLink trims trailing spaces from CHAR fields. This can be disabled.
See:
http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_avoid_trimming_the_trailing_spaces_on_a_CHAR_field.3F