JDBC ResultSet to ArrayList, ArrayList to .txt file - jdbc

I have the following piece of code that retrieves all rows in a table:
String MakeTXT = "USE SRO_VT_SHARD Select * from _RefTeleLink";
pst = conn.prepareStatement(MakeTXT);
rs = pst.executeQuery();

ArrayList<String> links = new ArrayList<>();
int i = 1;
String rows = "";
while (rs.next()) {
    for (i = 1; i <= 22; i++) {
        links.add(rs.getString(i));
        if (i == 22) {
            links.add("\n");
        }
    }
}
rows = String.join("\t", links);
System.out.println(rows);
What I want to do is:
Select all rows from the table. See result: prnt.sc/egbh4o
Write all selected rows to a .txt file
.txt file has to look something like this (literally copy pasted the rows): http://prntscr.com/egbhn4
What my code currently outputs:
(screenshot of the current output)
It does this because there are 22 columns, and when the loop reaches 22, it adds a newline to the ArrayList.
What I'm actually looking for is a way to copy an entire row using the ResultSet, instead of using a for loop that runs 22 times to build a row from the 22 results.
I've looked everywhere but couldn't find anything.. :(

You do not need an ArrayList to hold the column values as they are read. I'd use a StringBuilder instead, as shown below, appending a tab after each column inside the loop and then replacing the last one with a line feed.
String MakeTXT = "USE SRO_VT_SHARD Select * from _RefTeleLink";
Statement stm = conn.createStatement();
ResultSet rs = stm.executeQuery(MakeTXT);

List<String> rows = new ArrayList<>();
StringBuilder row = new StringBuilder();
ResultSetMetaData meta = rs.getMetaData();
final int colCount = meta.getColumnCount();

while (rs.next()) {
    row.setLength(0);
    for (int c = 1; c <= colCount; c++)        // JDBC column indexes are 1-based
        row.append(rs.getString(c)).append("\t");
    row.setCharAt(row.length() - 1, '\n');     // replace the trailing tab with a newline
    rows.add(row.toString());
}
rs.close();
stm.close();
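To also write the rows to a .txt file, as asked, something like this should work (a rough sketch that reuses the rows list built above; "output.txt" is just an example path):
// Needs java.io.BufferedWriter, java.nio.charset.StandardCharsets,
// java.nio.file.Files and java.nio.file.Paths.
try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("output.txt"), StandardCharsets.UTF_8)) {
    for (String r : rows) {
        writer.write(r);    // each row already ends with '\n'
    }
}
Each element of rows is one tab-separated line ending in a newline, so writing them back to back produces one table row per line.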

Related

How to read data from an Excel file saved in a blob in Azure with the EPPlus library

I'm trying to read my Excel files saved in my Azure storage container like this:
string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("concursos");

foreach (BlobItem blobItem in containerClient.GetBlobs())
{
    BlobClient blobClient = containerClient.GetBlobClient(blobItem.Name);
    ExcelPackage.LicenseContext = LicenseContext.NonCommercial;
    using (var stream = blobClient.OpenRead(new BlobOpenReadOptions(true)))
    using (ExcelPackage package = new ExcelPackage(stream))
    {
        ExcelWorksheet worksheet = package.Workbook.Worksheets.FirstOrDefault();
        int colCount = worksheet.Dimension.End.Column;
        int rowCount = worksheet.Dimension.End.Row;
        for (int row = 1; row <= rowCount; row++)
        {
            for (int col = 1; col <= colCount; col++)
            {
                Console.WriteLine(" Row:" + row + " column:" + col + " Value:" + worksheet.Cells[row, col].Value.ToString().Trim());
            }
        }
    }
}
But the line
ExcelWorksheet worksheet = package.Workbook.Worksheets.FirstOrDefault();
throws an error: System.NullReferenceException: 'Object reference not set to an instance of an object.' (worksheet was null).
When I debug, my stream and my package both look fine.
The Excel files in the blobs are .xls files like this one.
Any idea, please?
Thanks
Please check if the worksheet is empty. This error occurs if there is an empty sheet with empty columns and rows.
I tried to reproduce the same issue.
Initially I read an Excel sheet with EPPlus where the starting columns and rows were filled and not empty, and I could execute and read it successfully using the same code as yours.
Then I made column 1 empty, stored the file in a blob, tried to read it, and got the null reference exception.
The Dimension object of the ExcelWorksheet will be null if the worksheet was just initialized and is empty.
So it throws a null reference exception. AFAIK, the only way is to check whether the files are empty, or to add content to them before accessing them, so that empty columns do not throw an exception.
worksheet.Cells[1, 1].Value = "Some text value";
In the same way, try adding a worksheet, to avoid the exception in case there are no sheets in the blob.
ExcelWorksheet worksheet = new ExcelPackage().Workbook.Worksheets.Add("Sheet1");
This code will not throw an exception, since the Dimension object is initialized by adding content to the worksheet. If the loaded ExcelWorksheet already contains data, you will not face this issue.
ExcelWorksheet worksheet = package.Workbook.Worksheets.First();
// or: ExcelWorksheet worksheet = package.Workbook.Worksheets[0];

// Add the line below to create a new sheet, if no sheets are present and a null exception is returned:
// ExcelWorksheet worksheet = new ExcelPackage().Workbook.Worksheets.Add("Sheet1");

// Add the line below to add a column and row, if the sheet is empty and a null exception is returned:
worksheet.Cells[1, 1].Value = " This is the end of worksheet";

int colCount = worksheet.Dimension.End.Column;
int rowCount = worksheet.Dimension.End.Row;
for (int row = 1; row <= rowCount; row++)
{
    for (int col = 1; col <= colCount; col++)
    {
        Console.WriteLine(" Row:" + row + " column:" + col + " Value:" + worksheet.Cells[row, col].Value.ToString().Trim());
    }
}
You can alternatively check if the value is null.
if (worksheet.Cells[row, col].Value != null)
{
    // proceed with code
}
The problem was the file extension of the Excel files in the blobs.
It only works with .xlsx, not with .xls.
Thanks

How to process a big file by comparing each line in that file with all remaining lines in the same file?

I have a CSV file with 500,000 records in it. The fields in the CSV file are as follows:
No, Name, Address
Now I want to compare the name and address of each record with the name and address of all remaining records.
I was doing it in the following way:
List<String> lines = new ArrayList<>();
BufferedReader firstbufferedReader = new BufferedReader(new FileReader(new File(pathname)));
String line;
while ((line = firstbufferedReader.readLine()) != null) {
    lines.add(line);
}
firstbufferedReader.close();

for (int i = 0; i < lines.size(); i++)
{
    csvReader = new CSVReader(new StringReader(lines.get(i)));
    csvReader = null;
    for (int j = i + 1; j < lines.size(); j++)
    {
        csvReader = new CSVReader(new StringReader(lines.get(j)));
        csvReader = null;
        application.linesToCompare(lines.get(i), lines.get(j));
    }
}
The linesToCompare function extracts the name and address from its two parameters and does the comparison. If I find records that are 80% matching (based on name and address), I mark them as duplicates.
But this approach is taking too much time to process the CSV file.
I want a faster approach, maybe some kind of map-reduce or anything similar.
Thanks in advance
It is taking a long time because it looks like you are reading the file a huge number of times.
You first read the file into the lines List, then for every entry you read it again, and then inside that you read it again! Instead of doing this, read the file once into your lines list and then use that to compare the entries against each other.
Something like this might work for you:
List<String> lines = new ArrayList<>();
BufferedReader firstbufferedReader = new BufferedReader(new FileReader(new File(pathname)));
String line;
while ((line = firstbufferedReader.readLine()) != null) {
    lines.add(line);
}
firstbufferedReader.close();

for (int i = 0; i < lines.size(); i++)
{
    for (int j = i + 1; j < lines.size(); j++)
    {
        application.linesToCompare(lines.get(i), lines.get(j));
    }
}

HBase scan with offset

Is there a way to scan an HBase table getting, for example, the first 100 results, then later get the next 100, and so on, just like we do in SQL with LIMIT and OFFSET?
My row keys are UUIDs.
You can do it multiple ways. The easiest one is a page filter. Below is the code example from HBase: The Definitive Guide, page 150.
private static final byte[] POSTFIX = new byte[] { 0x00 };

Filter filter = new PageFilter(15);
int totalRows = 0;
byte[] lastRow = null;
while (true) {
    Scan scan = new Scan();
    scan.setFilter(filter);
    if (lastRow != null) {
        byte[] startRow = Bytes.add(lastRow, POSTFIX);
        System.out.println("start row: " + Bytes.toStringBinary(startRow));
        scan.setStartRow(startRow);
    }
    ResultScanner scanner = table.getScanner(scan);
    int localRows = 0;
    Result result;
    while ((result = scanner.next()) != null) {
        System.out.println(localRows++ + ": " + result);
        totalRows++;
        lastRow = result.getRow();
    }
    scanner.close();
    if (localRows == 0) break;
}

System.out.println("total rows: " + totalRows);
Or you can set caching on the scan for the limit you want and then change the start row to the last row + 1 from the previous scan for each fetch.
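A rough sketch of that second approach, assuming an open HBase Table instance named table, an example page size of 100, and an enclosing method that throws IOException:
final int pageSize = 100;
byte[] lastRow = null;
while (true) {
    Scan scan = new Scan();
    scan.setCaching(pageSize);                                     // fetch rows from the server in batches
    if (lastRow != null) {
        scan.setStartRow(Bytes.add(lastRow, new byte[] { 0x00 })); // start just after the last row of the previous page
    }
    ResultScanner scanner = table.getScanner(scan);
    Result[] page = scanner.next(pageSize);                        // take at most pageSize rows as one "page"
    scanner.close();
    if (page.length == 0) {
        break;
    }
    for (Result result : page) {
        lastRow = result.getRow();
        // process result ...
    }
}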

Most Frequent 3 page sequence in a weblog

Given a web log consisting of the fields 'User' and 'Page url', we have to find the most frequent 3-page sequence that users take.
There is a timestamp, and it is not guaranteed that a single user's accesses are logged sequentially; it could look like: user1 Page1, user2 PageX, user1 Page2, User10 PageX, user1 Page3. Here User1's page sequence is Page1 -> Page2 -> Page3.
Assuming your log is stored in timestamp order, here's an algorithm to do what you need:
Create a hashtable 'user_visits' mapping user ID to the last two pages you observed them to visit
Create a hashtable 'visit_count' mapping 3-tuples of pages to frequency counts
For each entry (user, URL) in the log:
If 'user' exists in user_visits with two entries, increment the entry in visit_count corresponding to the 3-tuple of URLs by one
Append 'URL' to the relevant entry in user_visits, removing the oldest entry if necessary.
Sort the visit_count hashtable by value. This is your list of most popular sequences of URLs.
Here's an implementation in Python, assuming your fields are space-separated:
fh = open('log.txt', 'r')
user_visits = {}
visit_counts = {}
for row in fh:
    user, url = row.strip().split(' ')
    prev_visits = user_visits.get(user, ())
    if len(prev_visits) == 2:
        visit_tuple = prev_visits + (url,)
        visit_counts[visit_tuple] = visit_counts.get(visit_tuple, 0) + 1
    # Keep only the last two pages seen for this user.
    user_visits[user] = (prev_visits + (url,))[-2:]

popular_sequences = sorted(visit_counts.items(), key=lambda x: x[1], reverse=True)
Quick and dirty:
Build a list of URL/timestamp pairs per user
Sort each list by timestamp
Iterate over each list
For each 3-URL sequence, create or increment a counter
Find the highest count in the URL sequence count list
foreach(entry in parsedLog)
{
    users[entry.user].urls.add(entry.time, entry.url)
}

foreach(user in users)
{
    user.urls.sort()
    for(i = 0; i < user.urls.length - 2; i++)
    {
        key = createKey(user.urls[i], user.urls[i+1], user.urls[i+2])
        sequenceCounts.incrementOrCreate(key);
    }
}

sequenceCounts.sortDesc()
largestCountKey = sequenceCounts[0]
topUrlSequence = parseKey(largestCountKey)
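For reference, a minimal Java sketch of that pseudocode; the LogEntry type, its field names, and the " -> " key separator are illustrative assumptions, not part of the original answer:
import java.util.*;

class FrequentTriples {
    static class LogEntry {
        String user;
        long timestamp;
        String url;
        LogEntry(String user, long timestamp, String url) {
            this.user = user;
            this.timestamp = timestamp;
            this.url = url;
        }
    }

    // Returns the most frequent 3-URL sequence and its count, or null if the log is too short.
    static Map.Entry<String, Integer> mostFrequentTriple(List<LogEntry> parsedLog) {
        // Group visits per user.
        Map<String, List<LogEntry>> byUser = new HashMap<>();
        for (LogEntry e : parsedLog) {
            byUser.computeIfAbsent(e.user, k -> new ArrayList<>()).add(e);
        }
        // Sort each user's visits by timestamp and count every 3-URL window.
        Map<String, Integer> sequenceCounts = new HashMap<>();
        for (List<LogEntry> visits : byUser.values()) {
            visits.sort(Comparator.comparingLong((LogEntry e) -> e.timestamp));
            for (int i = 0; i + 2 < visits.size(); i++) {
                String key = visits.get(i).url + " -> " + visits.get(i + 1).url + " -> " + visits.get(i + 2).url;
                sequenceCounts.merge(key, 1, Integer::sum);
            }
        }
        // Pick the sequence with the highest count.
        return sequenceCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElse(null);
    }
}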
Here's a bit of SQL assuming you could get your log into a table such as
CREATE TABLE log (
    ord  int,
    user VARCHAR(50) NOT NULL,
    url  VARCHAR(255) NOT NULL,
    ts   datetime
) ENGINE=InnoDB;
If the data is not sorted per user, then (assuming that the ord column is the line number from the log file):
SELECT t.url, t2.url, t3.url, count(*) c
FROM
log t INNER JOIN
log t2 ON t.user = t2.user INNER JOIN
log t3 ON t2.user = t3.user
WHERE
t2.ord IN (SELECT MIN(ord)
FROM log i
WHERE i.user = t.user AND i.ord > t.ord)
AND
t3.ord IN (SELECT MIN(ord)
FROM log i
WHERE i.user = t.user AND i.ord > t2.ord)
GROUP BY t.user, t.url, t2.url, t3.url
ORDER BY c DESC
LIMIT 10;
This will give the top ten 3-stop paths per user. Alternatively, if you can get it ordered by user and time, you can join on row numbers more easily.
Source code in Mathematica
s= { {user},{page} } (* load List (log) here *)
sortedListbyUser=s[[Ordering[Transpose[{s[[All, 1]], Range[Length[s]]}]] ]]
Tally[Partition [sortedListbyUser,3,1]]
This problem is similar to Find k most frequent words from a file.
Here is how you can solve it:
Group each triplet (page1, page2, page3) into a word
Apply the algorithm mentioned here
1. Read the user page access URLs from the file line by line; each line holds the user and the URL separated by a separator, e.g.:
u1,/
u1,main
u1,detail
The separator is a comma.
2. Store each page path's visit count in the map pageVisitCounts.
3. Sort the visit count map by value in descending order.
public static Map<String, Integer> findThreeMaxPagesPathV1(String file, String separator, int depth) {
    Map<String, Integer> pageVisitCounts = new HashMap<String, Integer>();
    if (file == null || "".equals(file)) {
        return pageVisitCounts;
    }
    try {
        File f = new File(file);
        FileReader fr = new FileReader(f);
        BufferedReader bf = new BufferedReader(fr);
        Map<String, List<String>> userUrls = new HashMap<String, List<String>>();
        String currentLine = "";
        while ((currentLine = bf.readLine()) != null) {
            String[] lineArr = currentLine.split(separator);
            if (lineArr == null || lineArr.length != (depth - 1)) {
                continue;
            }
            String user = lineArr[0];
            String page = lineArr[1];
            List<String> urlLinkedList = null;
            if (userUrls.get(user) == null) {
                urlLinkedList = new LinkedList<String>();
            } else {
                urlLinkedList = userUrls.get(user);
                String pages = "";
                if (urlLinkedList.size() == (depth - 1)) {
                    pages = urlLinkedList.get(0).trim() + separator + urlLinkedList.get(1).trim() + separator + page;
                } else if (urlLinkedList.size() > (depth - 1)) {
                    urlLinkedList.remove(0);
                    pages = urlLinkedList.get(0).trim() + separator + urlLinkedList.get(1).trim() + separator + page;
                }
                if (!"".equals(pages) && null != pages) {
                    Integer count = (pageVisitCounts.get(pages) == null ? 0 : pageVisitCounts.get(pages)) + 1;
                    pageVisitCounts.put(pages, count);
                }
            }
            urlLinkedList.add(page);
            System.out.println("user:" + user + ", urlLinkedList:" + urlLinkedList);
            userUrls.put(user, urlLinkedList);
        }
        bf.close();
        fr.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return pageVisitCounts;
}

public static void main(String[] args) {
    String file = "/home/ieee754/Desktop/test-access.log";
    String separator = ",";
    Map<String, Integer> pageVisitCounts = findThreeMaxPagesPathV1(file, separator, 3);
    System.out.println(pageVisitCounts.size());
    Map<String, Integer> result = MapUtil.sortByValueDescendOrder(pageVisitCounts);
    System.out.println(result);
}

Execute sql statement via JDBC with CLOB binding

I have the following query (column log is of type CLOB):
UPDATE table SET log=? where id=?
The query above works fine when using the setAsciiStream method to put a value longer than 4000 characters into the log column.
But instead of replacing the value, I want to append it, hence my query looks like this:
UPDATE table SET log=log||?||chr(10) where id=?
The above query DOES NOT work any more and I get the following error:
java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column
It looks to me like you have to use a PL/SQL block to do what you want. The following works for me, assuming there's an entry with id 1:
import oracle.jdbc.OracleDriver;

import java.sql.*;
import java.io.ByteArrayInputStream;

public class JDBCTest {
    // How much test data to generate.
    public static final int SIZE = 8192;

    public static void main(String[] args) throws Exception {
        // Generate some test data.
        byte[] data = new byte[SIZE];
        for (int i = 0; i < SIZE; ++i) {
            data[i] = (byte) (64 + (i % 32));
        }
        ByteArrayInputStream stream = new ByteArrayInputStream(data);

        DriverManager.registerDriver(new OracleDriver());
        Connection c = DriverManager.getConnection(
                "jdbc:oracle:thin:@some_database", "user", "password");

        String sql =
                "DECLARE\n" +
                "  l_line CLOB;\n" +
                "BEGIN\n" +
                "  l_line := ?;\n" +
                "  UPDATE table SET log = log || l_line || CHR(10) WHERE id = ?;\n" +
                "END;\n";

        PreparedStatement stmt = c.prepareStatement(sql);
        stmt.setAsciiStream(1, stream, SIZE);
        stmt.setInt(2, 1);
        stmt.execute();
        stmt.close();

        c.commit();
        c.close();
    }
}
BLOBs are not mutable from SQL (well, besides setting them to NULL), so to append, you would have to download the blob first, concatenate locally, and upload the result again.
The usual solution is to write several records to the database with a common key and a sequence which tells the DB how to order the rows.
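A minimal JDBC sketch of that idea, assuming a hypothetical log_lines table with (log_id, seq, chunk) columns and existing conn, logId, nextSeq, stream and length variables:
// Append each new piece of log text as its own row instead of mutating one CLOB.
// Table and column names (log_lines, log_id, seq, chunk) are illustrative only.
String sql = "INSERT INTO log_lines (log_id, seq, chunk) VALUES (?, ?, ?)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setInt(1, logId);
    ps.setInt(2, nextSeq);                  // e.g. taken from a database sequence or a running counter
    ps.setAsciiStream(3, stream, length);   // the chunk of log text to append
    ps.executeUpdate();
}
// Reading the whole log back in order:
// SELECT chunk FROM log_lines WHERE log_id = ? ORDER BY seq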
