How to read a COMPRESS()-ed H2 blob column via JDBC?

I have a file-based H2 database (engine version 1.4.196) with a mediumblob column containing data returned by the COMPRESS() function:
create table foo (compressed_data mediumblob);
...
insert into foo (compressed_data) values (COMPRESS(STRINGTOUTF8('Test'), 'DEFLATE'));
(The table is created and filled by Flyway.)
I'd like to read this data in a JDBC client without calling DECOMPRESS() first. (I want to do the decompression client-side for compatibility with another system). I've tried to read the data via an InflaterInputStream, which can uncompress DEFLATE data:
try (InputStream dbStream = rs.getBinaryStream("compressed_data");
     InflaterInputStream inflaterStream = new InflaterInputStream(dbStream)) {
    inflaterStream.read();
    ...
But this causes an error:
java.util.zip.ZipException: incorrect header check
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
...
Is there any way I can get InflaterInputStream-compatible compressed data from a column in H2?

Since you are already using the H2 JDBC driver to access the database, you can simply retrieve the compressed bytes with getBytes and use the expand method of org.h2.tools.CompressTool to uncompress them (the stored value is not a bare DEFLATE stream: COMPRESS() puts a small header of its own in front of the deflated bytes, which is why InflaterInputStream reports an incorrect header check):
// .java source file is Cp1252 encoded
String sql = "SELECT COMPRESS(STRINGTOUTF8('fermé'), 'DEFLATE') AS foo";
ResultSet rs = st.executeQuery(sql);
rs.next();
byte[] bytesOut = rs.getBytes(1);
byte[] expanded = org.h2.tools.CompressTool.getInstance().expand(bytesOut);
String strOut = new String(expanded, "UTF-8");
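Applied to the table from the question, a minimal sketch (assuming an open java.sql.Connection named conn to the same database, with H2 on the classpath) would be:
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.h2.tools.CompressTool;

// ...

// Fetch the COMPRESS()-ed column as raw bytes and expand it client-side.
try (PreparedStatement ps = conn.prepareStatement("SELECT compressed_data FROM foo");
     ResultSet rs2 = ps.executeQuery()) {
    CompressTool tool = CompressTool.getInstance();
    while (rs2.next()) {
        byte[] compressed = rs2.getBytes("compressed_data");
        byte[] expanded = tool.expand(compressed);                    // reverses COMPRESS(..., 'DEFLATE')
        String value = new String(expanded, StandardCharsets.UTF_8);  // matches STRINGTOUTF8
        System.out.println(value);
    }
}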

Related

How to add a CSV in the ClickHouse JDBC library

This is in a Java project with the ClickHouse JDBC library:
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.3.2-patch11</version>
</dependency>
I want to insert a CSV through something like this:
String url = "jdbc:ch://localhost";
ClickHouseDataSource dataSource = new ClickHouseDataSource(url, new Properties());
ClickHouseConnection conn = dataSource.getConnection("default", "password");
ClickHousePreparedStatement clickHousePreparedStatement = (ClickHousePreparedStatement) conn.prepareStatement("INSERT INTO table FORMAT CSV < ?");
I have tried to simulate something like this with:
PreparedStatement ps = conn.prepareStatement("INSERT INTO table FORMAT CSV < ?");
ps.setString(1, "simulated csv rows separated with commas");
It works, but it is not what I'm looking for. I want to know whether the library has something specific for this; I have not found anything in the documentation.
How can I attach a CSV in the ClickHouse JDBC library?

Netezza Batch Insert is very slow even in Batch execute mode

I am referring to this documentation: http://www-01.ibm.com/support/docview.wss?uid=swg21981328. According to the article, inserts are faster if we use the executeBatch method (the Netezza JDBC driver may detect a batch insert and, under the covers, convert it to an external table load, which is faster). I have to execute millions of insert statements, and I am getting a maximum speed of only 500 records per minute per connection. Is there any better way to load data into Netezza faster via a JDBC connection? I am using Spark and a JDBC connection to insert the records. Why is the external table load not happening even when I am executing in batches? Below is the Spark code I am using:
// insertQueryDataSet is a Dataset<String> of INSERT statements
insertQueryDataSet.foreachPartition(partition -> {
    Connection conn = NetezzaConnector.getSingletonConnection(url, userName, pwd);
    conn.setAutoCommit(false);
    int commitBatchCount = 0;
    int insertBatchCount = 0;
    Statement statement = conn.createStatement();
    //PreparedStatement preparedStmt = null;
    while (partition.hasNext()) {
        insertBatchCount++;
        //preparedStmt = conn.prepareStatement(partition.next());
        statement.addBatch(partition.next());
        commitBatchCount++;
        if (insertBatchCount % 10000 == 0) {
            LOGGER.info("Before executeBatch.");
            int[] execCount = statement.executeBatch();
            LOGGER.info("After execCount." + execCount.length);
            LOGGER.info("Before commit.");
            conn.commit();
            LOGGER.info("After commit.");
        }
    }
    // execute the remaining statements
    int[] remainingCount = statement.executeBatch();
    LOGGER.info("After execCount." + remainingCount.length);
    conn.commit();
    conn.close();
});
I tried this approach (batch insert) but found it very slow, so I put all the data into CSV files and did an external table load for each CSV:
InsertReq = "Insert into " + tablename + " select * from external '" + filepath + "' using (maxerrors 0, delimiter ',' unase 2000 encoding 'internal' remotesource 'jdbc' escapechar '\\')";
Jdbctemplate.execute(InsertReq);
Since I was using Java, the remote source is JDBC; note that the CSV file path is in single quotes.
Hope this helps.
If you find a better approach than this, don't forget to post it. :)
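A rough Java sketch of that CSV-then-external-table flow (hypothetical helper method and variable names; the external table options are lifted from the statement above):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Writes the rows to a temporary CSV file and loads it into the target table
// through a Netezza external table with remotesource 'jdbc'.
static void loadCsv(Connection conn, String tableName, List<String> csvLines)
        throws IOException, SQLException {
    Path csvFile = Files.createTempFile("netezza-load-", ".csv");
    try {
        Files.write(csvFile, csvLines, StandardCharsets.UTF_8);
        String insertReq = "insert into " + tableName
                + " select * from external '" + csvFile.toAbsolutePath()
                + "' using (maxerrors 0, delimiter ',' encoding 'internal'"
                + " remotesource 'jdbc' escapechar '\\')";
        try (Statement st = conn.createStatement()) {
            st.execute(insertReq);
        }
        conn.commit();  // assumes auto-commit is disabled, as in the question's code
    } finally {
        Files.deleteIfExists(csvFile);
    }
}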

ParquetWriter outputs an empty Parquet file in a standalone Java program

I tried to convert an existing Avro file to Parquet, but the output Parquet file is empty. I am not sure what I did wrong...
My code snippet:
FileReader<GenericRecord> fileReader = DataFileReader.openReader(
        new File("output/users.avro"), new GenericDatumReader<GenericRecord>());
Schema avroSchema = fileReader.getSchema();

// generate the corresponding Parquet schema
MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);

// choose compression scheme
CompressionCodecName compressionCodecName = CompressionCodecName.UNCOMPRESSED;

// set Parquet file block size and page size values
int pageSize = 64 * 1024;

Path outputPath = new Path("output/users.parquet");

// create a parquet writer using builder
ParquetWriter<GenericRecord> parquetWriter = AvroParquetWriter
        .<GenericRecord>builder(outputPath)
        .withSchema(avroSchema)
        .withCompressionCodec(compressionCodecName)
        .withPageSize(pageSize)
        .build();

// read avro, write parquet
while (fileReader.hasNext()) {
    GenericRecord record = fileReader.next();
    System.out.println(record);
    parquetWriter.write(record);
}
I had the same problem and found that the data is not written to the file until the ParquetWriter is closed. You just need to add
parquetWriter.close();
after the while loop.
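Since both the Avro FileReader and the ParquetWriter implement Closeable, one way to make the close automatic even when writing fails is try-with-resources; a sketch reusing the names and settings from the snippet above:
try (FileReader<GenericRecord> reader = DataFileReader.openReader(
             new File("output/users.avro"), new GenericDatumReader<GenericRecord>());
     ParquetWriter<GenericRecord> writer = AvroParquetWriter
             .<GenericRecord>builder(new Path("output/users.parquet"))
             .withSchema(reader.getSchema())
             .withCompressionCodec(CompressionCodecName.UNCOMPRESSED)
             .withPageSize(64 * 1024)
             .build()) {
    // the writer's footer is written when close() runs at the end of this block
    while (reader.hasNext()) {
        writer.write(reader.next());
    }
}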

ODBC-JDBC Bridge Windows-1251 (Cyrillic) character encoding

I've got a dBase file with Windows-1251 encoding. I've made an ODBC data source for this file and I would like to access it through the JDBC-ODBC bridge from Java with the following code:
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
System.out.println("TEST: Русский язык");

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
Properties properties = new Properties();
properties.put("charSet", "windows-1251");
Connection conn = DriverManager.getConnection("jdbc:odbc:test_ds", properties);

Statement statement = conn.createStatement();
String sql = "SELECT * FROM table";
ResultSet resultSet = statement.executeQuery(sql);
while (resultSet.next()) {
    System.out.println(resultSet.getString("DESC"));
}
Character encoding is OK for the test println (second line), but the encoding of the data read from the dBase file is incorrect.
I'm able to open the DBF file in LibreOffice with the same encoding, and everything is correct there.
Is there any way to set the encoding correctly?
Is it possible to set the connection encoding in the connection string?
EDIT: After further investigation, I think the problem is ODBC-related. I tried to open the defined data source with Excel and the encoding was wrong. (I found no way to set the encoding for either the ODBC connection or the Excel connection.)
This works for me with Cyrillic characters:
properties.put("charSet", "Cp1251");
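In the context of the question's code, that is simply the Java charset name in place of the ODBC one (a sketch assuming the same test_ds data source):
Properties properties = new Properties();
properties.put("charSet", "Cp1251");  // Java charset name for Windows-1251
Connection conn = DriverManager.getConnection("jdbc:odbc:test_ds", properties);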

db2 Invalid parameter: Unknown column name SERVER_POOL_NAME . ERRORCODE=-4460, SQLSTATE=null

I am using a SQL SELECT to access a DB2 table with schemaname.tablename as follows:
select 'colname' from schemaname.tablename
The table definitely has a column named SERVER_POOL_NAME ('colname' in the query above), yet I get the following error:
"Invalid parameter: Unknown column name SERVER_POOL_NAME . ERRORCODE=-4460, SQLSTATE=null"
I am using the DB2 v10.1 FP0 JDBC driver, version 3.63.123 (JDBC 3.0 spec).
The application is run as the DB2 administrator and also as a Windows 2008 administrator.
I saw a discussion about this issue at: db2jcc4.jar Invalid parameter: Unknown column name
But I do not know where the connection parameter useJDBC4ColumnNameAndLabelSemantics should be set (to the value 2).
I saw that the parameter should appear in com.ibm.db2.jcc.DB2BaseDataSource (see: http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=%2Fcom.ibm.db2.luw.apdv.java.doc%2Fsrc%2Ftpc%2Fimjcc_r0052607.html)
But I cannot find this class in my DB2 installation; maybe it is packed in a .jar file.
Any advice?
There is a link on the page you're referring to, showing you the ways to set properties. Specifically, you can populate a Properties object with desired values and supply it to the getConnection() call:
String url = "jdbc:db2://host:50000/yourdb";
Properties props = new Properties();
props.setProperty("useJDBC4ColumnNameAndLabelSemantics", "2");
// set other required properties
Connection c = DriverManager.getConnection(url, props);
Alternatively, you can embed property name/value pairs in the JDBC URL itself:
String url = "jdbc:db2://host:50000/yourdb:useJDBC4ColumnNameAndLabelSemantics=2;";
// set other required properties
Connection c = DriverManager.getConnection(url);
Note that each name/value pair must be terminated by a semicolon, even the last one.
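With the property in place, the column from the question should be resolvable by name again; a minimal sketch reusing the connection c and the names from the question:
try (Statement st = c.createStatement();
     ResultSet rs = st.executeQuery("select SERVER_POOL_NAME from schemaname.tablename")) {
    while (rs.next()) {
        System.out.println(rs.getString("SERVER_POOL_NAME"));
    }
}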
