How to write data from MySQL into a file using JDBC and FileWriter? - jdbc

String selectTableSQL = "select JobID, MetadataJson from raasjobs join metadata using (JobID) where JobCreatedDate > '2014-07-01';";
File file = new File("/users/t_shetd/file.txt");
try {
dbConnection = getDBConnection();
statement = dbConnection.createStatement();
System.out.println(selectTableSQL);
// execute the select SQL statement
ResultSet rs = statement.executeQuery(selectTableSQL);
if (!file.exists()) {
file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
while (rs.next()) {
String JobID = rs.getString("JobID");
String Metadata = rs.getString("MetadataJson");
bw.write(selectTableSQL);
bw.close();
System.out.println("Done");
// Right now the only output I get is "Done"

If I understand your question, then this
while (rs.next()) {
String JobID = rs.getString("JobID");
String Metadata = rs.getString("MetadataJson");
bw.write(selectTableSQL);
bw.close();
System.out.println("Done");
}
should be something like this (following Java capitalization conventions):
while (rs.next()) {
String jobId = rs.getString("JobID");
String metaData = rs.getString("MetadataJson");
bw.write(String.format("Job ID: %s, MetaData: %s", jobId, metaData));
bw.newLine(); // one row per line
}
bw.close(); // <-- finish writing first!
System.out.println("Done");
In your version, you close the output after writing the first row from the ResultSet. After that, nothing else can be written (because the writer is closed).
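Putting it together, here is a minimal self-contained sketch of the whole export using try-with-resources, so the writer and the JDBC objects get closed even if something fails (the connection URL and credentials are placeholders):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExportJobs {
    public static void main(String[] args) throws SQLException, IOException {
        String selectTableSQL = "select JobID, MetadataJson from raasjobs "
                + "join metadata using (JobID) where JobCreatedDate > '2014-07-01'";
        // try-with-resources closes everything, in reverse order, when the block exits
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/raas", "user", "password"); // placeholders
             Statement statement = con.createStatement();
             ResultSet rs = statement.executeQuery(selectTableSQL);
             BufferedWriter bw = Files.newBufferedWriter(
                     Paths.get("/users/t_shetd/file.txt"), StandardCharsets.UTF_8)) {
            while (rs.next()) {
                bw.write(String.format("Job ID: %s, MetaData: %s",
                        rs.getString("JobID"), rs.getString("MetadataJson")));
                bw.newLine(); // one row per line
            }
        }
        System.out.println("Done");
    }
}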

Related

Read packed decimal and convert to numeric in Spring Boot

All,
I am using a Spring Boot application to store data in a DB. I am getting this data from IBM MQ through a Kafka topic.
The messages arrive in EBCDIC format, so I used a COBOL copybook with the JRecord and cb2xml jars to convert them to a readable format and store them in the DB.
Now I am getting another file in the same manner, but after conversion the data looks like this:
10020REFUNDONE
10021REFUNDTWO ·" ÷/
10022REFUNDTHREE oú^ "
10023REFUNDFOUR ¨jÄ ò≈
Here is how I am converting from EBCDIC to a readable format:
AbstractLineReader reader = null;
StringBuffer finalBuffer = new StringBuffer();
try {
String copybook = "/ds_header.cbl";
reader = CustomCobolProvider.getInstance().getLineReader(copybook, Convert.FMT_MAINFRAME, new BufferedInputStream(new ByteArrayInputStream(salesData)));
AbstractLine line;
while ((line = reader.read()) != null) {
if (null != line.getFieldValue(REC_TYPE)){
finalBuffer.append(line.getFullLine());
}
}
} catch (Exception e) {
e.printStackTrace();
}
and this is my getLineReader method:
public AbstractLineReader getLineReader(String copybook, int numericType, InputStream fileStream) throws Exception {
String font = "";
if (numericType == 1) {
font = "cp037";
}
InputStream stream = CustomCobolProvider.class.getResourceAsStream(copybook);
if(stream == null ) throw new RuntimeException("Can't Load the Copybook Metadata file from Resource....");
LayoutDetail copyBook = ((ExternalRecord)this.copybookInt.loadCopyBook(stream, copybook, CopybookLoader.SPLIT_REDEFINE, 0, font, CommonBits.getDefaultCobolTextFormat(), Convert.FMT_MAINFRAME, 0, (AbsSSLogger)null).setFileStructure(Constants.IO_FIXED_LENGTH)).asLayoutDetail();
AbstractLineReader ret = LineIOProvider.getInstance().getLineReader(copyBook, (LineProvider)null);
ret.open(fileStream, copyBook);
return ret;
}
I am stuck here with the numeric conversion; I have learned that the data is coming in packed decimal.
I have no knowledge of COBOL or mainframes; I referred to a few sites and learned how to convert from EBCDIC to a readable format. Please help!
The problem is that the getFullLine() method does not do any field translation; you need to access the individual fields. You can use line.getFieldIterator(0) to get a field iterator for the line.
Also, unless you are using an ancient version of JRecord, you are better off using the JRecordInterface1 class.
Something like the following should work:
StringBuffer finalBuffer = new StringBuffer();
try {
ICobolIOBuilder iob = JRecordInterface1.COBOL.newIOBuilder(copybookName)
.setFont("cp037")
.setFileOrganization(Constants.IO_FIXED_LENGTH)
;
AbstractLineReader reader = iob.newReader(dataFile);
AbstractLine line;
while ((line = reader.read()) != null) {
String sep = "";
for (AbstractFieldValue fv : line.getFieldIterator(0)) {
finalBuffer.append(sep).append(fv);
sep = "\t";
}
finalBuffer.append("\n");
}
reader.close();
} catch (Exception e) {
// whatever ...
}
Other points
With an MQ data source you do not need to create line readers. You can create lines directly from a byte array:
ICobolIOBuilder iob = JRecordInterface1.COBOL.newIOBuilder(copybookName)
.setFont("cp037")
.setFileOrganization(Constants.IO_FIXED_LENGTH)
;
AbstractLine line = iob.newLine(byteArrayFromMq);
for (AbstractFieldValue fv : line.getFieldIterator(0)) {
// whatever
}
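If you know a particular packed-decimal field's name from the copybook, you can also convert it directly. A short sketch (the field name AMOUNT is hypothetical here, and asBigDecimal() is assumed to be available on the field value in current JRecord versions):

AbstractLine line = iob.newLine(byteArrayFromMq);
// "AMOUNT" is a hypothetical COMP-3 (packed decimal) field from the copybook
java.math.BigDecimal amount = line.getFieldValue("AMOUNT").asBigDecimal();
System.out.println(amount);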

Memory issues when running Spark job on relatively large input

I am running a Spark cluster with 50 machines. Each machine is a VM with 8 cores and 50 GB of memory (41 GB seems to be available to Spark).
I am running over several input folders; I estimate the input size to be ~250 GB gz-compressed.
Although the number and configuration of machines seems sufficient, the job fails after about 40 minutes of running, and I can see the following errors in the logs:
2558733 [Result resolver thread-2] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 345.0 in stage 1.0 (TID 345, hadoop-w-3.c.taboola-qa-01.internal): java.lang.OutOfMemoryError: Java heap space
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
java.lang.StringCoding.decode(StringCoding.java:193)
java.lang.String.<init>(String.java:416)
java.lang.String.<init>(String.java:481)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:699)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:660)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
and also:
2653545 [Result resolver thread-2] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 122.1 in stage 1.0 (TID 392, hadoop-w-22.c.taboola-qa-01.internal): java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
java.lang.StringCoding.decode(StringCoding.java:193)
java.lang.String.<init>(String.java:416)
java.lang.String.<init>(String.java:481)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:699)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:660)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
How do I go about debugging such an issue?
EDIT: I found the root cause of the problem. It is this piece of code:
private static final int MAX_FILE_SIZE = 40194304;
....
....
JavaPairRDD<String, List<String>> typedData = filePaths.mapPartitionsToPair(new PairFlatMapFunction<Iterator<String>, String, List<String>>() {
@Override
public Iterable<Tuple2<String, List<String>>> call(Iterator<String> filesIterator) throws Exception {
List<Tuple2<String, List<String>>> res = new ArrayList<>();
String fileType = null;
List<String> linesList = null;
if (filesIterator != null) {
while (filesIterator.hasNext()) {
try {
Path file = new Path(filesIterator.next());
// filter non-trc files
if (!file.getName().startsWith("1")) {
continue;
}
fileType = getType(file.getName());
Configuration conf = new Configuration();
CompressionCodecFactory compressionCodecs = new CompressionCodecFactory(conf);
CompressionCodec codec = compressionCodecs.getCodec(file);
FileSystem fs = file.getFileSystem(conf);
ContentSummary contentSummary = fs.getContentSummary(file);
long fileSize = contentSummary.getLength();
InputStream in = fs.open(file);
if (codec != null) {
in = codec.createInputStream(in);
} else {
throw new IOException();
}
byte[] buffer = new byte[MAX_FILE_SIZE];
BufferedInputStream bis = new BufferedInputStream(in, BUFFER_SIZE);
int count = 0;
int bytesRead = 0;
try {
while ((bytesRead = bis.read(buffer, count, BUFFER_SIZE)) != -1) {
count += bytesRead;
}
} catch (Exception e) {
log.error("Error reading file: " + file.getName() + ", trying to read " + BUFFER_SIZE + " bytes at offset: " + count);
throw e;
}
Iterable<String> lines = Splitter.on("\n").split(new String(buffer, "UTF-8").trim());
linesList = Lists.newArrayList(lines);
// get rid of first line in file
Iterator<String> it = linesList.iterator();
if (it.hasNext()) {
it.next();
it.remove();
}
//res.add(new Tuple2<>(fileType,linesList));
} finally {
res.add(new Tuple2<>(fileType, linesList));
}
}
}
return res;
}
});
In particular, it allocates a 40 MB buffer for each file in order to read the file's content through a BufferedInputStream. This exhausts the heap at some point.
The thing is:
If I read line by line (which does not require a buffer), the read will be very inefficient.
If I allocate one buffer and reuse it for each file read, is that possible in terms of parallelism? Or will it get overwritten by several threads?
Any suggestions are welcome...
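For what it's worth, Spark invokes a mapPartitions function's call() once per partition, in a single task thread, so a buffer allocated once inside call() (outside the file loop) is not shared between threads. A placement sketch under that assumption:

public Iterable<Tuple2<String, List<String>>> call(Iterator<String> filesIterator) throws Exception {
    List<Tuple2<String, List<String>>> res = new ArrayList<>();
    byte[] buffer = new byte[MAX_FILE_SIZE]; // allocated once per partition, reused for every file
    while (filesIterator != null && filesIterator.hasNext()) {
        // open and read the next file into 'buffer' exactly as before
    }
    return res;
}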
EDIT 2: I fixed the first memory issue by moving the byte-array allocation outside the iterator, so it gets reused by all partition elements. But the new String(buffer, "UTF-8").trim() that is created for the split is still a new object every time. I could use a StringBuffer/StringBuilder, but then how would I set the charset encoding without a String object?
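For the charset question: java.nio can decode raw bytes straight into a CharBuffer, with no intermediate String (buffer and count are the variables from the code above), e.g.:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Decode only the bytes actually read (count) directly into a CharBuffer,
// skipping the intermediate String allocation
CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(buffer, 0, count));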
Eventually I changed the code as follows:
// Transform list of files to list of all files' content in lines grouped by type
JavaPairRDD<String,List<String>> typedData = filePaths.mapToPair(new PairFunction<String, String, List<String>>() {
@Override
public Tuple2<String, List<String>> call(String filePath) throws Exception {
Tuple2<String, List<String>> tuple = null;
try {
String fileType = null;
List<String> linesList = new ArrayList<String>();
Configuration conf = new Configuration();
CompressionCodecFactory compressionCodecs = new CompressionCodecFactory(conf);
Path path = new Path(filePath);
fileType = getType(path.getName());
tuple = new Tuple2<String, List<String>>(fileType, linesList);
// filter non-trc files
if (!path.getName().startsWith("1")) {
return tuple;
}
CompressionCodec codec = compressionCodecs.getCodec(path);
FileSystem fs = path.getFileSystem(conf);
InputStream in = fs.open(path);
if (codec != null) {
in = codec.createInputStream(in);
} else {
throw new IOException();
}
BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"), BUFFER_SIZE);
// Get rid of the first line in the file
r.readLine();
// Read all lines
String line;
while ((line = r.readLine()) != null) {
linesList.add(line);
}
} catch (IOException e) { // filtering out files whose reading went wrong
log.error("Reading of the file " + filePath + " went wrong: " + e.getMessage());
}
// Return after the catch instead of from a finally block: a return inside
// finally would silently swallow any error thrown above.
return tuple;
}
});
So now I do not use a 40 MB buffer, but rather build the lines list dynamically using an ArrayList. This solved my current memory issue, but now I am getting other strange errors that fail the job. I will report those in a different question...

Unable to solve Java form of JDBC query with variables

sql3 = "select avg(eff),min(eff),max(eff) from(select ("+baseParam+"/"+denom+"))as eff from ras)as s"; This is query whose output i want.
When i execute the code i get the error stating check your mysql version for syntax. I am using string to store the name of columns. I want to find the efficienccy with respect to 2000 Job_Render i.e. efficiency for each job_render. But what i get is total efficiency of all job_render. when i use the sql syntax with their direct column names. I have commented that query too for the reference. I want to find efficiency of each job render with respect to their 2000 JOBID. Bottom line is i want to find efficiency of 2000 JOBID each whose formula is Job_Render/LC_Final+LC_Preview. I have stored Job_Render in String baseParam and sum of both LC in String Denom. Please help me out.
public class Efficiency {
static final String DB_DRIVER = "com.mysql.jdbc.Driver";
static final String DB_CONNECTION = "jdbc:mysql://localhost:3306/";
static final String DB_USER = "root";
static final String DB_PASSWORD = "root";
static final String dbName = "raas";
public static void main(String[] args) {
try{
effFunc();
}
catch (Exception q){
q.getMessage();
}
}
static void effFunc() throws ClassNotFoundException,SQLException{
Connection conn = null;
// STEP 2: Register JDBC driver
Class.forName(DB_DRIVER);
// STEP 3: Open a connection
System.out.println("Connecting to a selected database...");
conn = DriverManager.getConnection(DB_CONNECTION + dbName, DB_USER,
DB_PASSWORD);
System.out.println("Connected database successfully...");
String baseParam;
//String[] subParam ;
baseParam= "Job_Render";
String sql3="";
String denom="";
final String[] COL={ "LC_Final","LC_Preview"};
denom = "(" + COL[0] + "+" + COL[1] + ")";
Statement stmt = null;
stmt = conn.createStatement();
// sql3 = "select 'Efficiency' Field,avg(eff),min(eff),max(eff) from(select (Job_Render/(LC_Final+LC_Preview))as eff from ras)as s";
sql3 = "select avg(eff),min(eff),max(eff) from(select ("+baseParam+"/"+denom+"))as eff from ras)as s";
System.out.println(sql3);
//
try{
ResultSet res = stmt.executeQuery(sql3);
//System.out.println(res);
while(res.next()){
// String JobID = res.getString("JobID");
// System.out.println("Job ID : " + JobID);
String col1 = res.getString(1);
System.out.println(col1);
double avg = res.getDouble(2);
System.out.println("Average of eff is:"+avg);
double min = res.getDouble(3);
System.out.println("Min of eff is:"+min);
double max = res.getDouble(4);
System.out.println("Max of eff is:"+max);
}}
catch(Exception e){
e.getStackTrace();
System.out.println(e.getMessage());
}
}}
Your code runs the query sql1, which is the empty string:
String sql1="";
// sql1 = "select * from raas.jobs";
ResultSet res = stmt.executeQuery(sql1); // sql1 = ""
should be
ResultSet res = stmt.executeQuery(sql3);
Edit
Then to get the value(s)
String col1 = res.getString(1);
double avg = res.getDouble(2);
double min = res.getDouble(3);
double max = res.getDouble(4);
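As an aside, the concatenated sql3 in the question ends up with an unbalanced closing parenthesis, since denom already carries its own parentheses; that alone would produce the MySQL syntax error. A balanced version of the string would be:

sql3 = "select avg(eff), min(eff), max(eff) from (select (" + baseParam + "/" + denom + ") as eff from ras) as s";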

Getting wrong output in XLS/XLSX while converting from CSV

So I've converted a CSV to XLS/XLSX, but I'm getting one character per cell. I've used a pipe (|) as the delimiter in my CSV.
Here is one line from the CSV:
4.0|sdfa#sdf.nb|plplplp|plplpl|plplp|1988-11-11|M|asdasd#sdf.ghgh|sdfsadfasdfasdfasdfasdf|asdfasdf|3.4253242E7|234234.0|true|true|
But in Excel I'm getting:
4 . 0 | s d f a
Here's the code:
try {
String csvFileAddress = "manage_user_info.csv"; //csv file address
String xlsxFileAddress = "manage_user_info.xls"; //xls file address
HSSFWorkbook workBook = new HSSFWorkbook();
HSSFSheet sheet = workBook.createSheet("sheet1");
String currentLine=null;
int RowNum=0;
BufferedReader br = new BufferedReader(new FileReader(csvFileAddress));
while ((currentLine = br.readLine()) != null) {
String str[] = currentLine.split("|");
RowNum++;
HSSFRow currentRow=sheet.createRow(RowNum);
for(int i=0;i<str.length;i++){
currentRow.createCell(i).setCellValue(str[i]);
}
}
FileOutputStream fileOutputStream = new FileOutputStream(xlsxFileAddress);
workBook.write(fileOutputStream);
fileOutputStream.close();
System.out.println("Done");
} catch (Exception ex) {
System.out.println(ex.getMessage()+"Exception in try");
}
The pipe symbol must be escaped in a regular expression:
String str[] = currentLine.split("\\|");
It is a logical operator (quote from the Javadoc of java.util.regex.Pattern):
X|Y Either X or Y
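A quick way to see the difference (the exact tokens at the start of the unescaped split vary slightly across Java versions, so treat the first output as illustrative):

import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String line = "4.0|sdfa#sdf.nb|plplplp";
        // Unescaped "|" is an empty alternation: it matches between every character
        System.out.println(Arrays.toString(line.split("|")));
        // Escaped, it matches the literal pipe
        System.out.println(Arrays.toString(line.split("\\|")));   // [4.0, sdfa#sdf.nb, plplplp]
        // Pattern.quote does the escaping for you
        System.out.println(Arrays.toString(line.split(Pattern.quote("|"))));
    }
}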

How to append text to an Oracle CLOB

Is it possible to append text to an Oracle 9i CLOB without rereading and rewriting the whole content?
I have tried this:
PreparedStatement stmt = cnt.prepareStatement(
"select OUT from QRTZ_JOBEXEC where EXEC_ID=? "
+ "for update",
ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_UPDATABLE);
try {
stmt.setLong(1, id);
ResultSet rs = stmt.executeQuery();
if (rs.next()) {
Clob clob = rs.getClob(1);
long len = clob.length();
Writer writer = clob.setCharacterStream(len+1);
try {
PrintWriter out = new PrintWriter(writer);
out.println(line);
out.close();
} finally {
writer.close();
}
rs.updateClob(1, clob);
rs.updateRow();
}
rs.close();
} finally {
stmt.close();
}
But I'm getting an "Unsupported feature" exception on the call to setCharacterStream.
DBMS_LOB.APPEND is the key.
If you are just adding text, then you could try a simple
UPDATE qrtz_jobexec SET out = out || ? WHERE exec_id=?
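A minimal JDBC sketch of that approach, reusing the cnt, line and id variables from the question (Oracle accepts || between a CLOB and a VARCHAR2 from 9i on; for very large appends, a PL/SQL block calling DBMS_LOB.WRITEAPPEND is the safer route):

PreparedStatement upd = cnt.prepareStatement(
        "update QRTZ_JOBEXEC set OUT = OUT || ? where EXEC_ID = ?");
try {
    upd.setString(1, line + "\n");
    upd.setLong(2, id);
    upd.executeUpdate();
} finally {
    upd.close();
}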
