How to insert a CSV with the ClickHouse JDBC library

I am working in a Java project that uses the ClickHouse JDBC library:
<dependency>
<groupId>com.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.3.2-patch11</version>
</dependency>
I want to insert a CSV through something like this:
String url = "jdbc:ch://localhost";
ClickHouseDataSource dataSource = new ClickHouseDataSource(url, new Properties());
ClickHouseConnection conn = dataSource.getConnection("default", "password");
ClickHousePreparedStatement clickHousePreparedStatement = (ClickHousePreparedStatement) conn.prepareStatement("INSERT INTO table FORMAT CSV < ?");
I have tried to simulate something like this with:
PreparedStatement ps = conn.prepareStatement("INSERT INTO table FORMAT CSV < ?");
ps.setString(1, "simulated csv rows separated with commas");
It works, but it is not what I'm looking for. I want to know whether the library has something specific for this; I haven't found anything in the documentation.
How can I attach a CSV in the ClickHouse JDBC library?
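For reference, the "simulated" workaround above can be written out as a plain JDBC batch insert that parses the CSV client-side; this deliberately avoids any ClickHouse-specific CSV API, which is exactly what the question is trying to find. The table name my_table, its two columns, and the path data.csv are placeholders:
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.PreparedStatement;

// Plain JDBC fallback: stream the CSV line by line and batch parameterized inserts.
// "conn" is the ClickHouseConnection obtained above; table, columns and path are illustrative only.
try (BufferedReader reader = Files.newBufferedReader(Paths.get("data.csv"));
     PreparedStatement ps = conn.prepareStatement("INSERT INTO my_table (col1, col2) VALUES (?, ?)")) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] fields = line.split(",", -1); // naive split, no quoted-field handling
        ps.setString(1, fields[0]);
        ps.setString(2, fields[1]);
        ps.addBatch();
    }
    ps.executeBatch();
}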

Related

H2: show the value of DB_CLOSE_DELAY (set by SET DB_CLOSE_DELAY)

The H2 Database has a list of commands starting with SET, in particular SET DB_CLOSE_DELAY. I would like to find out what the value of DB_CLOSE_DELAY is. I am using JDBC. Setting it is easy:
cx.createStatement.execute("SET DB_CLOSE_DELAY 0")
but none of the following returns the actual value of DB_CLOSE_DELAY:
cx.createStatement.executeQuery("DB_CLOSE_DELAY")
cx.createStatement.executeQuery("VALUES(#DB_CLOSE_DELAY)")
cx.createStatement.executeQuery("GET DB_CLOSE_DELAY")
cx.createStatement.executeQuery("SHOW DB_CLOSE_DELAY")
Help would be greatly appreciated.
You can access this and other settings in the INFORMATION_SCHEMA.SETTINGS table - for example:
String url = "jdbc:h2:mem:;DB_CLOSE_DELAY=3";
Connection conn = DriverManager.getConnection(url, "sa", "the password goes here");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM INFORMATION_SCHEMA.SETTINGS where name = 'DB_CLOSE_DELAY'");
while (rs.next()) {
    System.out.println(rs.getString("name"));
    System.out.println(rs.getString("value"));
}
In this test, I use an unnamed in-memory database, and I explicitly set the delay to 3 seconds when I create the DB.
The output from the print statements is:
DB_CLOSE_DELAY
3

How to read a COMPRESS()-ed H2 blob column via JDBC?

I have a file-based H2 database (engine version 1.4.196) with a mediumblob column containing data returned by the COMPRESS() function:
create table foo (compressed_data mediumblob);
...
insert into foo (compressed_data) values (COMPRESS(STRINGTOUTF8('Test'), 'DEFLATE'));
(The table is created and filled by flyway.)
I'd like to read this data in a JDBC client without calling DECOMPRESS() first. (I want to do the decompression client-side for compatibility with another system). I've tried to read the data via an InflaterInputStream, which can uncompress DEFLATE data:
try (InputStream dbStream = rs.getBinaryStream("compressed_data");
     InflaterInputStream inflaterStream = new InflaterInputStream(dbStream)) {
    inflaterStream.read();
    ...
But this causes an error:
java.util.zip.ZipException: incorrect header check
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
...
Is there any way I can get InflaterInputStream-compatible compressed data from a column in H2?
Since you are already using H2 JDBC to access the database, you can simply retrieve the compressed data with getBytes and use the expand method of org.h2.tools.CompressTool to uncompress it:
// .java source file is Cp1252 encoded
String sql = "SELECT COMPRESS(STRINGTOUTF8('fermé'), 'DEFLATE') AS foo";
ResultSet rs = st.executeQuery(sql);
rs.next();
byte[] bytesOut = rs.getBytes(1);
byte[] expanded = org.h2.tools.CompressTool.getInstance().expand(bytesOut);
String strOut = new String(expanded, "UTF-8");
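Applied back to the foo table from the question, the same idea might look roughly like this minimal sketch; CompressTool understands H2's own compression framing, which is presumably why the raw InflaterInputStream fails with "incorrect header check":
// Reuse a ResultSet positioned on a row of "select compressed_data from foo".
byte[] compressed = rs.getBytes("compressed_data");
byte[] expanded = org.h2.tools.CompressTool.getInstance().expand(compressed);
String text = new String(expanded, "UTF-8"); // "Test" for the row inserted by Flyway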

Netezza Batch Insert is very slow even in Batch execute mode

I am referring to this documentation: http://www-01.ibm.com/support/docview.wss?uid=swg21981328. According to the article, inserts will be faster if we use the executeBatch method (the Netezza JDBC driver may detect a batch insert and, under the covers, convert it to an external table load, which is faster). I have to execute millions of insert statements, and I am getting a speed of only about 500 records per minute per connection at most. Is there a better way to load data into Netezza faster via a JDBC connection? I am using Spark and a JDBC connection to insert the records. Why is the external table load not happening even when I execute in batches? Given below is the Spark code I am using:
insertQueryDataSet.foreachPartition(partition -> {   // insertQueryDataSet is a Dataset<String> of INSERT statements
    Connection conn = NetezzaConnector.getSingletonConnection(url, userName, pwd);
    conn.setAutoCommit(false);
    int commitBatchCount = 0;
    int insertBatchCount = 0;
    Statement statement = conn.createStatement();
    //PreparedStatement preparedStmt = null;
    while (partition.hasNext()) {
        insertBatchCount++;
        //preparedStmt = conn.prepareStatement(partition.next());
        statement.addBatch(partition.next());
        commitBatchCount++;
        if (insertBatchCount % 10000 == 0) {
            LOGGER.info("Before executeBatch.");
            int[] execCount = statement.executeBatch();
            LOGGER.info("After execCount." + execCount.length);
            LOGGER.info("Before commit.");
            conn.commit();
            LOGGER.info("After commit.");
        }
    }
    // execute remaining statements
    int[] execCount = statement.executeBatch();
    LOGGER.info("After execCount." + execCount.length);
    conn.commit();
    conn.close();
});
I tried this approach (batch insert) but found it very slow, so I put all the data into CSV files and did an external table load for each CSV:
InsertReq="Insert into "+ tablename + " select * from external '"+ filepath + "' using (maxerrors 0, delimiter ',' unase 2000 encoding 'internal' remotesource 'jdbc' escapechar '\' )";
Jdbctemplate.execute(InsertReq);
Since I was using Java, the remote source is 'jdbc'; note that the CSV file path goes in single quotes.
Hope this helps.
If you find a better approach than this, don't forget to post it. :)
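In plain Java (without JdbcTemplate), a hedged sketch of that CSV-plus-external-table approach, reusing the partition iterator and connection from the question code, could look like the following; the temp-file location, a trimmed option list, and the assumption that each partition element is already a comma-separated data row (rather than a full INSERT statement) are illustrative only:
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Statement;

// Write the partition to a temporary CSV, then load it through a remotesource 'jdbc' external table.
Path csvFile = Files.createTempFile("netezza_load_", ".csv");
try (BufferedWriter writer = Files.newBufferedWriter(csvFile)) {
    while (partition.hasNext()) {
        writer.write(partition.next()); // assumed to be a comma-separated row
        writer.newLine();
    }
}
String insertReq = "Insert into " + tablename
        + " select * from external '" + csvFile.toAbsolutePath()
        + "' using (maxerrors 0, delimiter ',' encoding 'internal'"
        + " remotesource 'jdbc' escapechar '\\')";
try (Statement stmt = conn.createStatement()) {
    stmt.execute(insertReq);
    conn.commit();
} finally {
    Files.deleteIfExists(csvFile);
}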

How to save BigQuery query results to another table?

I want to save query results into a new table.
With the BigQuery online editor like bigquery.cloud.google I can easily do it with the micro-solution from Felipe Hoffa.
Results with ~150,000,000 rows are inserted within several seconds.
But how do I run a query with "Destination Table" parameters via the BigQuery API?
By using the Jobs.insert API call.
For example, in Java:
[...]
TableReference tableRef = new TableReference();
tableRef.setProjectId("<project>");
tableRef.setDatasetId("<dataset>");
tableRef.setTableId("<name>");
JobConfigurationQuery queryConfig = new JobConfigurationQuery();
queryConfig.setDestinationTable(tableRef);
queryConfig.setAllowLargeResults(true);
queryConfig.setQuery("some sql");
queryConfig.setCreateDisposition("CREATE_IF_NEEDED");
queryConfig.setWriteDisposition("WRITE_APPEND");
JobConfiguration config = new JobConfiguration().setQuery(queryConfig);
Job job = new Job();
job.setConfiguration(config);
Bigquery.Jobs.Insert insert = bigquery.jobs().insert("<projectid>", job);
JobReference jobId = insert.execute().getJobReference();
[...]
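Note that insert.execute() only creates the job; if you need to wait for the destination table to be populated before querying it, a minimal polling sketch against the same generated client (the project id placeholder and fixed sleep interval are assumptions) might be:
// Poll the job until BigQuery reports it as DONE (sketch only, no error handling).
Job polled;
do {
    Thread.sleep(1000L); // assumes the enclosing method declares InterruptedException
    polled = bigquery.jobs().get("<projectid>", jobId.getJobId()).execute();
} while (!"DONE".equals(polled.getStatus().getState()));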

Reading hive data from soapUI framework

I am new to the SoapUI framework. I am trying to use SoapUI for testing a REST API. While testing the REST API, I also need to verify data in backend databases such as Hive and Cassandra.
I was able to set up SoapUI and test a query on Cassandra using the Groovy script support that SoapUI provides. But when I searched for how to connect to Hive from SoapUI, I couldn't find any reference for it. Also, their site does not provide JDBC drivers, and Hive is not mentioned there.
So is there any option to connect to Hive from the SoapUI framework?
Should I think about using the Hive JDBC driver from SoapUI?
Thanks for your help!
I believe you should be able to connect to different databases in either of the following ways:
JDBC test step
Groovy script (where you can use almost plain Java code)
Either way, copy the driver libraries into the SOAPUI_HOME/bin/ext directory and restart SoapUI.
Here is the link for client code (in Java) to connect to Hive.
Sample connection code from the above link (so it should also work in Groovy):
String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver"; // HiveServer1 driver class matching the jdbc:hive:// URL below
try {
    Class.forName(driverName);
} catch (ClassNotFoundException e) {
    e.printStackTrace();
    System.exit(1);
}
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.executeQuery("drop table " + tableName);
ResultSet res = stmt.executeQuery("create table " + tableName + " (key int, value string)");
// show tables
String sql = "show tables '" + tableName + "'";
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
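To actually read the rows back (the snippet above stops right after executeQuery), the standard ResultSet loop applies, and near-identical code can be pasted into a SoapUI Groovy script step:
// Iterate the "show tables" result set and print each table name.
while (res.next()) {
    System.out.println(res.getString(1));
}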
