how to import huge tsv file into h2 in memory database with spring boot

I have huge TSV files that I need to import into my H2 in-memory database.
I can read them with a Scanner and insert the rows line by line, but that takes hours!
Is there a faster way to import a TSV file into an H2 in-memory database?

Use an INSERT INTO ... SELECT statement over CSVREAD (converting column types with CONVERT where needed) to import directly from the file into your H2 table.
How to read a CSV file into an H2 database:
// Requires java.sql.Connection, DriverManager, ResultSet and Statement
public static void main(String[] args) throws Exception {
    Class.forName("org.h2.Driver");
    // try-with-resources closes the statement and the connection even if a query fails
    try (Connection conn = DriverManager.getConnection("jdbc:h2:~/test", "", "");
         Statement stmt = conn.createStatement()) {
        stmt.execute("drop table if exists csvdata");
        stmt.execute("create table csvdata (id int primary key, name varchar(100), age int)");
        // CSVREAD lets the database read the file itself, so Java does no row-by-row inserts
        stmt.execute("insert into csvdata ( id, name, age ) select convert( \"id\",int ), \"name\", convert( \"age\", int) from CSVREAD( 'c:\\tmp\\sample.csv', 'id,name,age', null ) ");
        try (ResultSet rs = stmt.executeQuery("select * from csvdata")) {
            while (rs.next()) {
                System.out.println("id " + rs.getInt("id") + " name " + rs.getString("name") + " age " + rs.getInt("age"));
            }
        }
    }
}
Or
SELECT * FROM CSVREAD('test.csv');
-- Read a file containing the columns ID, NAME with
SELECT * FROM CSVREAD('test2.csv', 'ID|NAME', 'charset=UTF-8 fieldSeparator=|');
SELECT * FROM CSVREAD('data/test.csv', null, 'rowSeparator=;');
-- Read a tab-separated file
SELECT * FROM CSVREAD('data/test.tsv', null, 'fieldSeparator=' || CHAR(9));
SELECT "Last Name" FROM CSVREAD('address.csv');
SELECT "Last Name" FROM CSVREAD('classpath:/org/acme/data/address.csv');
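Source: the H2 documentation for the CSVREAD function.
NOTE: You can specify the file's field separator in the options string (the third argument) of these commands, which is how a tab-separated file is read.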
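Applied to the tab-separated case from the question, the whole import can be a single statement. The following is a minimal sketch rather than a tested recipe: the class name TsvImport, the table name csvdata and the file path are placeholders, and it assumes the target table already exists, that the file has a header line, and that its columns are in the same order as the table's.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; the names are illustrative and not part of H2 or Spring Boot.
class TsvImport {

    // Builds an INSERT ... SELECT that lets H2 read the tab-separated file itself.
    // NULL as the column list makes CSVREAD take the column names from the header line.
    // The table name is assumed to be a trusted identifier (it is not escaped here).
    static String buildImportSql(String table, String path) {
        String quotedPath = path.replace("'", "''"); // escape single quotes for the SQL literal
        return "INSERT INTO " + table
                + " SELECT * FROM CSVREAD('" + quotedPath + "', NULL, 'fieldSeparator=' || CHAR(9))";
    }

    // Runs the import on an open connection and returns the number of rows inserted.
    static int importTsv(Connection conn, String table, String path) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            return stmt.executeUpdate(buildImportSql(table, path));
        }
    }

    public static void main(String[] args) {
        // Only prints the statement; executing it needs an H2 connection.
        System.out.println(buildImportSql("csvdata", "/tmp/sample.tsv"));
    }
}
```

In a Spring Boot application the connection would typically come from the configured DataSource, for example via dataSource.getConnection(), and importTsv could be called once at startup.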

Related

JdbcPagingItemReader Spring batch skipping last element

I have a table with this structure:
CNMA_CO_PLATFORM_MESSAGE|AUDI_TI_CREATION|FIELD4|OTHER FIELDS
test-jj#2774#20210422112434957#00026129|22/04/21 11:24:34,957000000|11|..
test-jj2#2774#20210422112434957#00026129|22/04/21 11:24:34,957000000|12|..
test-jj3#2774#20210422112434957#00026129|22/04/21 11:24:34,957000000|13|..
This combination is the PRIMARY_KEY of the table:
CNMA_CO_PLATFORM_MESSAGE|AUDI_TI_CREATION
I have a JdbcPagingItemReader defined like this (the page size is 1):
@StepScope
@Bean
public JdbcPagingItemReader<PendingNotificationDTO> pendingNotificationReader(
        @Value("#{stepExecution}") StepExecution stepExecution) {
    final JdbcPagingItemReader<PendingNotificationDTO> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(daoDataSource);
    reader.setName("pendingNotificationReader");
    // Build the query
    final OraclePagingQueryProvider oraclePagingQueryProvider = new OraclePagingQueryProvider();
    oraclePagingQueryProvider.setSelectClause("SELECT " +
            " cegct.AUDI_TI_CREATION, " +
            " CNMA_CO_PLATFORM_MESSAGE, " +
            " OTHERFIELDS... ");
    oraclePagingQueryProvider.setFromClause("FROM TABLE1 cegct " +
            " JOIN TABLE1 notip ON cegct.field1 = notip.field1 " +
            " AND notip.field2 = :frSur ");
    oraclePagingQueryProvider.setWhereClause("WHERE "
            + " cegct.field3 = 0 "
            + " AND cegct.field4 in (:notifStatusList) ");
    // Specify the set of non-repeating fields so that we can paginate
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("CNMA_CO_PLATFORM_MESSAGE", Order.DESCENDING);
    sortKeys.put("AUDI_TI_CREATION", Order.DESCENDING);
    oraclePagingQueryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(oraclePagingQueryProvider);
    String frSur = stepExecution.getJobExecution().getExecutionContext().getString(Constants.FM_ROLE_SUR_ZK);
    String notifStatus = stepExecution.getJobExecution().getExecutionContext().getString(Constants.STATUS_REPORTS);
    Map<String, Object> parameters = new HashMap<>();
    parameters.put("frSur", frSur);
    parameters.put("notifStatusList", Arrays.asList(StringUtils.split(notifStatus, ",")));
    reader.setParameterValues(parameters);
    Integer initLoaded = stepExecution.getJobExecution().getExecutionContext().getInt(Constants.RECOVER_PENDING_NOT_COMMIT);
    reader.setPageSize(initLoaded);
    reader.setRowMapper(new BeanPropertyRowMapper<PendingNotificationDTO>(PendingNotificationDTO.class));
    return reader;
}
(I have hidden some irrelevant fields and table names.)
I ran a test in which all 3 of my records match the select, so with a page size of 1 they should be read one at a time. The first chunk read returns my "test-jj3#..." record and the second returns "test-jj2#...", but the third does not return any record (it should return the last one, "test-jj#...").
These are the generated SQL statements (I have hidden some sensitive, irrelevant fields).
First chunk, selects 1 record:
SELECT * FROM (
SELECT
cegct.AUDI_TI_CREATION,
CNMA_CO_PLATFORM_MESSAGE, [otherfields]
FROM [FROM]
WHERE [where]
ORDER BY CNMA_CO_PLATFORM_MESSAGE DESC, AUDI_TI_CREATION DESC
) WHERE ROWNUM <= 1;
Second chunk, selects 1 record (here the ROWNUM query also filters by the sort keys):
SELECT * FROM (
SELECT
cegct.AUDI_TI_CREATION,
CNMA_CO_PLATFORM_MESSAGE, [otherfields]
FROM [FROM]
WHERE [where]
ORDER BY CNMA_CO_PLATFORM_MESSAGE DESC, AUDI_TI_CREATION DESC
) WHERE
ROWNUM <= 1 AND (
(CNMA_CO_PLATFORM_MESSAGE < 'test-jj3#2774#20210422112434957#00026129')
OR
(CNMA_CO_PLATFORM_MESSAGE = 'test-jj3#2774#20210422112434957#00026129' AND AUDI_TI_CREATION < TO_DATE('2021-04-22 11:24:34', 'YYYY-MM-DD HH24:MI:SS'))
);
Third chunk, selects 0 records:
SELECT * FROM (
SELECT
cegct.AUDI_TI_CREATION,
CNMA_CO_PLATFORM_MESSAGE, [otherfields]
FROM [FROM]
WHERE [where]
ORDER BY CNMA_CO_PLATFORM_MESSAGE DESC, AUDI_TI_CREATION DESC
) WHERE
ROWNUM <= 1 AND (
(CNMA_CO_PLATFORM_MESSAGE < 'test-jj2#2774#20210422112434957#00026129')
OR
(CNMA_CO_PLATFORM_MESSAGE = 'test-jj2#2774#20210422112434957#00026129' AND AUDI_TI_CREATION < TO_DATE('2021-04-22 11:24:34', 'YYYY-MM-DD HH24:MI:SS'))
);
Sorry for my English; I hope you can understand my problem.
Logs for the Prepared SQL Statement
Executing prepared SQL statement [SELECT * FROM (
SELECT
cegct.AUDI_TI_CREATION,
CNMA_CO_PLATFORM_MESSAGE,
OTHERFIELDS...
FROM TABLE1 cegct
JOIN TABLE2 notip ON cegct.field1 = notip.field1
AND notip.field2 = ?
WHERE cegct.field3 = 0
AND cegct.field4 in (?, ?, ?)
ORDER BY CNMA_CO_PLATFORM_MESSAGE DESC, AUDI_TI_CREATION DESC) WHERE ROWNUM <= 1]
20221116 12:52:43.560 TRACE org.springframework.jdbc.core.StatementCreatorUtils [[ # ]] - Setting SQL statement parameter value: column index 1, parameter value [1], value class [java.lang.String], SQL type unknown
20221116 12:52:43.560 TRACE org.springframework.jdbc.core.StatementCreatorUtils [[ # ]] - Setting SQL statement parameter value: column index 2, parameter value [11], value class [java.lang.String], SQL type unknown
20221116 12:52:43.560 TRACE org.springframework.jdbc.core.StatementCreatorUtils [[ # ]] - Setting SQL statement parameter value: column index 3, parameter value [12], value class [java.lang.String], SQL type unknown
20221116 12:52:43.560 TRACE org.springframework.jdbc.core.StatementCreatorUtils [[ # ]] - Setting SQL statement parameter value: column index 4, parameter value [13], value class [java.lang.String], SQL type unknown

A bind variable is a single value; therefore when you use:
AND cegct.field4 in (:notifStatusList)
Then :notifStatusList is a single string and is NOT a list of values, so you are effectively doing the same as:
AND cegct.field4 = :notifStatusList
If the bind variable :notifStatusList is a single value then it will work; however, when you try to pass in multiple values then it will not match those multiple values but will try to match field4 to the entire delimited list (which fails and will filter out all the rows).
If you want to pass a delimited string then use:
AND ',' || :notifStatusList || ',' LIKE '%,' || cegct.field4 || ',%'
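To see why both sides are wrapped in commas, the same test can be mimicked in plain Java. This only illustrates the matching logic and does not run against Oracle; the class and method names are made up for the example.

```java
// Illustrative only: mirrors ',' || :list || ',' LIKE '%,' || field4 || ',%' in plain Java.
class DelimitedMatch {

    // True when value appears as a whole element of the comma-separated list.
    static boolean inDelimitedList(String delimitedList, String value) {
        return ("," + delimitedList + ",").contains("," + value + ",");
    }

    public static void main(String[] args) {
        System.out.println(inDelimitedList("11,12,13", "12")); // true: whole element
        System.out.println(inDelimitedList("11,12,13", "1"));  // false: only part of an element
        System.out.println("11,12,13".equals("12"));           // false: what a plain equality test sees
    }
}
```

Without the surrounding commas, a value such as 1 would also match inside 11, 12 and 13.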
Alternatively, pass the values as an array (rather than a delimited string) into an Oracle collection and then test to see if it is in that collection.

JDBC Batch Insert with Returning Clause

Is there any way to get the values of the affected rows using a RETURNING clause in Java with a JDBC batch insert? I am able to get the required values for a single affected row, but not for all rows of a batch insert.
Code :
try {
String query = "INSERT INTO temp ( "
+ "org_node_id, org_node_category_id, org_node_name, "
+ "customer_id, created_by, created_date_time, "
+ "updated_date_time, activation_Status )"
+ " VALUES (seq_org_node_id.nextval, 11527, 'Abcd', 9756, 1, sysdate, sysdate, 'AC')"
+" returning org_node_id, org_node_name INTO ?, ?";
con = DBUtils.getOASConnection();
OraclePreparedStatement ps = (OraclePreparedStatement) con.prepareStatement(query);
ps.registerReturnParameter(1, Types.INTEGER);
ps.registerReturnParameter(2, Types.VARCHAR);
ps.execute();
ResultSet rs = ps.getReturnResultSet();
rs.next();
System.out.println("Org ID : "+ rs.getInt(1));
System.out.println("Org Name : "+ rs.getString(2));
} catch (SQLException e) {
e.printStackTrace();
}

Batching INSERT .. RETURNING statements isn't supported by ojdbc, but bulk insertion can work using PL/SQL's FORALL command.
Given a table...
CREATE TABLE x (
i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
j VARCHAR2(50),
k DATE DEFAULT SYSDATE
);
...and types...
CREATE TYPE t_i AS TABLE OF NUMBER(38);
/
CREATE TYPE t_j AS TABLE OF VARCHAR2(50);
/
CREATE TYPE t_k AS TABLE OF DATE;
/
...you can work around this limitation by running a bulk insert, and bulk collecting the results (as I've shown also in this blog post) like this:
try (Connection con = DriverManager.getConnection(url, props);
CallableStatement c = con.prepareCall(
"DECLARE "
+ " v_j t_j := ?; "
+ "BEGIN "
+ " FORALL j IN 1 .. v_j.COUNT "
+ " INSERT INTO x (j) VALUES (v_j(j)) "
+ " RETURNING i, j, k "
+ " BULK COLLECT INTO ?, ?, ?; "
+ "END;")) {
// Bind input and output arrays
c.setArray(1, ((OracleConnection) con).createARRAY(
"T_J", new String[] { "a", "b", "c" })
);
c.registerOutParameter(2, Types.ARRAY, "T_I");
c.registerOutParameter(3, Types.ARRAY, "T_J");
c.registerOutParameter(4, Types.ARRAY, "T_K");
// Execute, fetch, and display output arrays
c.execute();
Object[] i = (Object[]) c.getArray(2).getArray();
Object[] j = (Object[]) c.getArray(3).getArray();
Object[] k = (Object[]) c.getArray(4).getArray();
System.out.println(Arrays.asList(i));
System.out.println(Arrays.asList(j));
System.out.println(Arrays.asList(k));
}
The results are:
[1, 2, 3]
[a, b, c]
[2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0]

How to use ampersand in JDBC?

In Oracle we can use a substitution variable, as in select * from table_name where column_name=&value. How can I do something similar to the ampersand in JDBC?
stmt = conn.createStatement();
String sql;
sql="select emp_name from employees"+" where emp_no=?";
ResultSet rs=stmt.executeQuery(sql);
while(rs.next()){
String emp_name=rs.getString("emp_name");
System.out.println(emp_name);
}
I wrote the above code, but it is not working (it shows an error).

Did you read the article I provided the link to?
You use the question mark ? to point out places in your query where you want to specify a parameter, and you have to use PreparedStatement. I can't test it, but it should be something like this:
// some code to obtain the Connection object
PreparedStatement stmt = null;
String yourQuery = " SELECT emp_name FROM employees WHERE emp_no = ? ";
try {
    stmt = conn.prepareStatement(yourQuery);
    stmt.setLong(1, 252);
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        String emp_name = rs.getString("emp_name");
        System.out.println(emp_name);
    }
} finally {
    // close the stmt etc.
}

I'd suggest using a PreparedStatement - from memory it's something like
Connection conn = getConnection();
PreparedStatement pstmnt = conn.prepareStatement("Select * from employees where emp_no =?");
pstmnt.setLong(1,emp_no);
ResultSet rs = pstmnt.executeQuery();
but the link that @Przemyslaw Kruglej highlighted above will almost certainly have a good example (I haven't read it though...)

How can I get table creation scripts on teradata with jdbc?

I want to get a table creation script from Teradata with JDBC.
I used this code, which I found on Stack Overflow:
StringBuilder sb = new StringBuilder( 1024 );
if ( columnCount > 0 ) {
    sb.append( "Create table ").append( rsmd.getTableName( 1 ) ).append( " ( " );
}
for ( int i = 1; i <= columnCount; i ++ ) {
    if ( i > 1 ) sb.append( ", " );
    String columnName = rsmd.getColumnLabel( i );
    String columnType = rsmd.getColumnTypeName( i );
    sb.append( columnName ).append( " " ).append( columnType );
    int precision = rsmd_ddl.getPrecision( i );
    if ( precision != 0 ) {
        sb.append( "( " ).append( precision ).append( " )" );
    }
} // for columns
sb.append( " ) " );
The problem is that when the type is VARCHAR the precision is 0, although in Teradata the column is VARCHAR(100). How can I find the 100?
Thanks.

getPrecision is for Decimals, you should use getColumnDisplaySize for chars.
There are lots of samples in the Teradata JDBC reference:
http://developer.teradata.com/doc/connectivity/jdbc/reference/current/frameset.html
Sample T20100JD shows how to extract metadata.

getPrecision is for Decimals, you should use getColumnDisplaySize for chars
Teradata has a flaw/bug in their JDBC driver implementation in that they are not properly implementing the contract of the interface.
The Java API documentation for interface java.sql.ResultSetMetaData explicitly defines the expected behavior for the getPrecision() method with different datatypes:
int getPrecision(int column)
throws SQLException
Get the designated column's specified column size.
For numeric data, this is the maximum precision.
For character data, this is the length in characters.
...
The Teradata JDBC driver incorrectly returns 0 when getPrecision() is called for a VARCHAR column. Therefore, when working with Teradata JDBC one must use getColumnDisplaySize().
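Putting the two answers together, the loop from the question could choose the length per column type. This is a minimal sketch, assuming that only CHAR and VARCHAR need the display-size fallback; the class and method names are hypothetical, and the exact type names the Teradata driver reports are not verified here.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper for building column definitions from result-set metadata.
class ColumnDdl {

    // Returns e.g. "NAME VARCHAR(100)". Falls back to the display size when a
    // character column reports precision 0, as described for the Teradata driver.
    static String columnDefinition(String name, String typeName, int precision, int displaySize) {
        boolean isChar = typeName.equalsIgnoreCase("CHAR") || typeName.equalsIgnoreCase("VARCHAR");
        int length = (precision == 0 && isChar) ? displaySize : precision;
        String def = name + " " + typeName;
        return length > 0 ? def + "(" + length + ")" : def;
    }

    // Builds the comma-separated column list for every column in the metadata.
    static String columnList(ResultSetMetaData rsmd) throws SQLException {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= rsmd.getColumnCount(); i++) {
            if (i > 1) sb.append(", ");
            sb.append(columnDefinition(rsmd.getColumnLabel(i), rsmd.getColumnTypeName(i),
                    rsmd.getPrecision(i), rsmd.getColumnDisplaySize(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(columnDefinition("NAME", "VARCHAR", 0, 100)); // NAME VARCHAR(100)
    }
}
```

Like the code in the question, this emits a single length, so a DECIMAL column would still lose its scale.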

Comparisons of Oracle DATE column with java.sql.timestamp via JOOQ

I am using jooq to build queries for Oracle. Everything works fine except for dates:
public static void main(String[] args) throws SQLException {
java.sql.Timestamp now = new java.sql.Timestamp(new Date().getTime());
Connection con = DriverManager.getConnection(... , ... , ...);
final Factory create = new OracleFactory(con);
Statement s = con.createStatement();
s.execute("create table test_table ( test_column DATE )");
s.execute("insert into test_table values (to_date('20111111', 'yyyymmdd'))");
// -- using to_date
ResultSet rs = s.executeQuery("select count(1) from test_table where test_column<to_date('20121212', 'yyyymmdd')");
rs.next();
System.out.println(""+rs.getInt(1));
rs.close();
// -- using a preparedstatement with java.sql.timestamp
PreparedStatement ps = con.prepareStatement("select count(1) from test_table where test_column<?");
ps.setTimestamp(1,now);
rs = ps.executeQuery();
rs.next();
System.out.println(""+rs.getInt(1));
rs.close();
// -- using jooq with java.sql.timestamp
final org.jooq.Table<org.jooq.Record> table = create.tableByName("TEST_TABLE");
final org.jooq.SelectSelectStep sss = create.select(create.count());
final org.jooq.SelectJoinStep sjs = sss.from(table);
final org.jooq.SelectConditionStep scs = sjs.where(create.fieldByName("TEST_COLUMN").lessThan(now));
System.out.println(scs.toString());
rs = s.executeQuery(scs.toString());
rs.next();
System.out.println(""+rs.getInt(1));
rs.close();
s.close();
}
Gives the following output:
1
1
select count(*) from "TEST_TABLE" where "TEST_COLUMN" < '2012-12-12 19:42:34.957'
Exception in thread "main" java.sql.SQLDataException: ORA-01861: literal does not match format string
I would have thought that JOOQ would check the type of Object in lessThan(Object)
to determine whether it can come up with a reasonable conversion, but apparently it
just does an Object.toString() in this case. I also remember that I never had issues with date queries via JOOQ in MySQL (although this is a while back). What am I doing wrong?

I suspect that this issue is due to the fact that create.fieldByName() doesn't know the type of the column (hence, Object), and coerces that unknown type on the right hand side of the comparison predicate. That should be fixed in jOOQ. I have registered #2007 for this:
https://github.com/jOOQ/jOOQ/issues/2007
In the meantime, try explicitly setting the type on your field:
create.fieldByName(Timestamp.class, "TEST_COLUMN").lessThan(now)
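For what it's worth, the quoted literal in the generated SQL from the question has exactly the shape that java.sql.Timestamp.toString() produces, which is consistent with the value being inlined as a plain string. A small check, with a made-up class name and the timestamp taken from the output in the question:

```java
import java.sql.Timestamp;

// Illustrative check only; it does not involve jOOQ or Oracle.
class TimestampLiteral {

    // Wraps the Timestamp's toString() form in single quotes, like the inlined SQL literal.
    static String quoted(Timestamp ts) {
        return "'" + ts + "'";
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2012-12-12 19:42:34.957");
        // Timestamp.toString() uses the JDBC escape format yyyy-mm-dd hh:mm:ss.fffffffff
        System.out.println(quoted(ts)); // '2012-12-12 19:42:34.957'
    }
}
```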
