while(line.contains("^")) loop not breaking - writer

this is my class:
import java.io.*;
public class Test
{
public static void main(String[] args) throws FileNotFoundException, IOException
{
BufferedReader br = new BufferedReader(new FileReader("file2.txt"));
BufferedWriter bw = new BufferedWriter(new FileWriter("file.txt"));
int i = 0;
String line;
while ((line = br.readLine()) != null) {
while(line.contains("^")) {
i ++;
line = line.replaceFirst("^", Integer.toString(i));
}
bw.write(line + "\n");
}
br.close();
bw.close();
}
}
The files file2.txt and file.txt are exactly the same, and I want to make the lines that look like
<wpt lat="26.381418638" lon="-80.101236298"><ele>0</ele><time> </time><name>Waypoint #^</name><desc> </desc></wpt>
to look like
<wpt lat="26.381418638" lon="-80.101236298"><ele>0</ele><time> </time><name>Waypoint #5</name><desc> </desc></wpt>
When I run it, though, it goes into an infinite loop. Any advice will help. Thanks!

line = line.replaceFirst("^", Integer.toString(i));
replaceFirst's first argument is a regular expression, and "^" as a regular expression means "the start of the string". So this command just keeps prepending values to the start of the string, and never removes any circumflexes. Instead, you should write:
line = line.replaceFirst("\\^", Integer.toString(i));

The String.replaceFirst method takes a regular expression, which has special characters for certain operations; one of these is the ^ character. You need to escape it to look for occurrences of it. (In Java, since the backslash is also special in string literals, this would be "\\^" in the replaceFirst argument.)
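As a minimal sketch of the fix described above, the following compares the unescaped and escaped patterns and wraps the numbering loop in a helper. The class name CaretDemo and the method numberCarets are made up for illustration; Pattern.quote is used here as an alternative to writing "\\^" by hand.

```java
import java.util.regex.Pattern;

class CaretDemo {
    // Replaces each literal '^' in line with the next counter value,
    // counting upwards from startAt + 1.
    static String numberCarets(String line, int startAt) {
        int i = startAt;
        while (line.contains("^")) {
            i++;
            // Pattern.quote turns "^" into a regex that matches the literal character.
            line = line.replaceFirst(Pattern.quote("^"), Integer.toString(i));
        }
        return line;
    }

    public static void main(String[] args) {
        // Unescaped: "^" matches the empty string at the start, so the value is prepended.
        System.out.println("a^b".replaceFirst("^", "1"));   // prints 1a^b
        // Escaped: the literal caret itself is replaced.
        System.out.println("a^b".replaceFirst("\\^", "1")); // prints a1b
        System.out.println(numberCarets("Waypoint #^ and #^", 4)); // prints Waypoint #5 and #6
    }
}
```

Because the escaped pattern removes one caret per iteration, the while loop terminates once no carets are left.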

Append a String to the end of the existing String with specific position in a text file in Java

Example:
In a text file we have the following topics with some description.
#Repeat the annotation
It is the major topic for .....
#Vector analysis
It covers all the aspects of sequential....
#Cloud Computing
Create header accounts for all the users
We have to add / append new tags to the topics on specific lines.
For example:
#Repeat the annotation #Maven build
#Cloud Computing #SecondYear
File f = new File("/user/imp/value/GSTR.txt");
FileReader fr = new FileReader(f);
Object fr1;
while((fr1 = fr.read()) != null) {
if(fr1.equals("#Repeat the annotation")) {
FileWriter fw = new FileWriter(f,true);
fw.write("#Maven build");
fw.close();
}
}
#Maven build is getting added to the last line of the text file, not at the specific position next to the topic.
The output is written to the file GSTR_modified.txt. The code along with an example input file is also available here. The code in the GitHub repository reads the file "input.txt" and writes to the file "output.txt".
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
public class Main {
public static void main(String[] args) throws IOException {
// Create a list to store the file content.
ArrayList<String> list = new ArrayList<>();
// Store the file content in the list. Each line becomes an element in the list.
try (BufferedReader br = new BufferedReader(new FileReader("/user/imp/value/GSTR.txt"))) {
String line;
while ((line = br.readLine()) != null) {
list.add(line);
}
}
// Iterate the list of lines.
for (int i = 0; i < list.size(); i++) {
// line is the element at the index i.
String line = list.get(i);
// Check if a line contains "#Repeat the annotation"
if (line.contains("#Repeat the annotation")){
// Set the list element at index i to the line itself concatenated with
// the string " #Maven build".
list.set(i,line.concat(" #Maven build"));
}
// Same pattern as above.
if (line.contains("#Cloud Computing")){
list.set(i,line.concat(" #SecondYear"));
}
}
// Write the contents of the list to a file.
FileWriter writer = new FileWriter("GSTR_modified.txt");
for(String str: list) {
// Append newline character \n to each element
// and write it to file.
writer.write(str+"\n");
}
writer.close();
}
}
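If more topics need tags, the repeated if blocks can be folded into a lookup table. The following is only a sketch under that assumption: the class name TagAppender and the method appendTags are hypothetical, and it matches whole lines (after trimming) rather than using contains as the code above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagAppender {
    // Returns a copy of lines in which every line that equals a key of tags
    // (ignoring surrounding whitespace) has a space and the mapped tag appended.
    static List<String> appendTags(List<String> lines, Map<String, String> tags) {
        List<String> result = new ArrayList<>(lines.size());
        for (String line : lines) {
            String tag = tags.get(line.trim());
            result.add(tag == null ? line : line + " " + tag);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("#Repeat the annotation", "#Maven build");
        tags.put("#Cloud Computing", "#SecondYear");
        List<String> lines = Arrays.asList(
                "#Repeat the annotation",
                "It is the major topic for .....",
                "#Cloud Computing");
        // Topic lines get their tag appended; other lines are passed through unchanged.
        appendTags(lines, tags).forEach(System.out::println);
    }
}
```

In a real program the list could come from java.nio.file.Files.readAllLines and be written back with Files.write, in place of the reader and writer loops above.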

Using stanford parser to parse Chinese

Here is my code, mostly from the demo. The program runs without errors, but the result is very wrong: it did not split the words.
Thank you
public static void main(String[] args) {
LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");
demoAPI(lp);
}
public static void demoAPI(LexicalizedParser lp) {
// This option shows loading and using an explicit tokenizer
String sent2 = "我爱你";
TokenizerFactory<CoreLabel> tokenizerFactory =
PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
Tokenizer<CoreLabel> tok =
tokenizerFactory.getTokenizer(new StringReader(sent2));
List<CoreLabel> rawWords2 = tok.tokenize();
Tree parse = lp.apply(rawWords2);
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
System.out.println(tdl);
System.out.println();
// You can also use a TreePrint object to print trees and dependencies
TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
tp.printTree(parse);
}
Did you make sure to segment the words? For example try running it again with "我 爱 你." as the sentence. I believe from the command line the parser will segment automatically, however I'm not sure what it does from within Java.

Getting wrong output in xls/xlsx while converting it from csv

So I've converted a CSV to xls/xlsx, but I'm getting one character per cell. I've used the pipe (|) as the delimiter in my CSV.
Here is one line from the csv:
4.0|sdfa#sdf.nb|plplplp|plplpl|plplp|1988-11-11|M|asdasd#sdf.ghgh|sdfsadfasdfasdfasdfasdf|asdfasdf|3.4253242E7|234234.0|true|true|
But in Excel I'm getting:
4 . 0 | s d f a
Here's the code:
try {
String csvFileAddress = "manage_user_info.csv"; //csv file address
String xlsxFileAddress = "manage_user_info.xls"; //xls file address
HSSFWorkbook workBook = new HSSFWorkbook();
HSSFSheet sheet = workBook.createSheet("sheet1");
String currentLine=null;
int RowNum=0;
BufferedReader br = new BufferedReader(new FileReader(csvFileAddress));
while ((currentLine = br.readLine()) != null) {
String str[] = currentLine.split("|");
RowNum++;
HSSFRow currentRow=sheet.createRow(RowNum);
for(int i=0;i<str.length;i++){
currentRow.createCell(i).setCellValue(str[i]);
}
}
FileOutputStream fileOutputStream = new FileOutputStream(xlsxFileAddress);
workBook.write(fileOutputStream);
fileOutputStream.close();
System.out.println("Done");
} catch (Exception ex) {
System.out.println(ex.getMessage()+"Exception in try");
}
The pipe symbol must be escaped in a regular expression:
String str[] = currentLine.split("\\|");
It is a logical operator (quote from the Javadoc of java.util.regex.Pattern):
X|Y Either X or Y
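A small sketch of the difference, using a shortened version of the line from the question. The class name PipeSplitDemo and the helper splitOnPipe are made up for illustration; Pattern.quote is an alternative to escaping by hand, and the limit of -1 is an optional extra that keeps the empty field after the trailing pipe.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Splits on a literal pipe; the limit -1 keeps trailing empty fields.
    static String[] splitOnPipe(String line) {
        return line.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        String line = "4.0|plplplp|M|true|";
        // Unescaped: the regex "empty OR empty" matches between every character.
        System.out.println(Arrays.toString(line.split("|")));
        // Escaped: splits on the pipe; trailing empty strings are dropped by default.
        System.out.println(Arrays.toString(line.split("\\|"))); // prints [4.0, plplplp, M, true]
        // Quoted, with limit -1: the empty field after the last pipe is kept.
        System.out.println(splitOnPipe(line).length);           // prints 5
    }
}
```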

Problems during counting strings in the txt file

I am developing a program which reads a text file and creates a report. The content of the report is the following: the number of every string in the file, its "status", and some symbols from the beginning of every string. It works well with files up to 100 MB.
But when I run the program with input files which are bigger than 1.5 GB in size and contain more than 100000 lines, I get the following error:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOfRange(Unknown Source) at
> java.lang.String.<init>(Unknown Source) at
> java.lang.StringBuffer.toString(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:771) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:723) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:745) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528) at
> org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)
I increased VM arguments up to -Xms128m -Xmx1600m (in eclipse run configuration) but this did not help. Specialists from OTN forum advised me to read some books and improve my program's performance. Could anybody help me to improve it? Thank you.
code:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;
public class ReadFileToList {
public static void main(String[] args) throws FileNotFoundException
{
File file_out = new File ("D:\\Docs\\test_out.txt");
FileOutputStream fos = new FileOutputStream(file_out);
PrintStream ps = new PrintStream (fos);
System.setOut (ps);
// Create a file object
File file = new File("D:\\Docs\\test_in.txt");
FileReader fr = null;
LineNumberReader lnr = null;
try {
// Here we read a file, sample.txt, using FileUtils
// class of commons-io. Using FileUtils.readLines()
// we can read file content line by line and return
// the result as a List of string.
List<String> contents = FileUtils.readLines(file);
//
// Iterate the result to print each line of the file.
fr = new FileReader(file);
lnr = new LineNumberReader(fr);
for (String line : contents)
{
String begin_line = line.substring(0, 38); // return 38 chars from the string
String begin_line_without_null = begin_line.replace("\u0000", " ");
String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");
int stringlenght = line.length();
line = lnr.readLine();
int line_num = lnr.getLineNumber();
String status;
// some correct length for if
int c_u_length_f = 12;
int c_ea_length_f = 13;
int c_a_length_f = 2130;
int c_u_length_e = 3430;
int c_ea_length_e = 1331;
int c_a_length_e = 442;
int h_ext = 6;
int t_ext = 6;
if ( stringlenght == c_u_length_f ||
stringlenght == c_ea_length_f ||
stringlenght == c_a_length_f ||
stringlenght == c_u_length_e ||
stringlenght == c_ea_length_e ||
stringlenght == c_a_length_e ||
stringlenght == h_ext ||
stringlenght == t_ext)
status = "ok";
else status = "fail";
System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Also, specialists from OTN said that this program opens the input and reads it twice. Maybe there is a mistake in the "for statement"? But I can't find it.
Thank you.
You're declaring variables inside the loop and doing a lot of unneeded work, including reading the file twice, which is not good for performance either. More importantly, FileUtils.readLines loads the entire file into a list in memory, which is what exhausts the heap in your stack trace. You can use the line number reader to get the line number and the text, and reuse the line variable (declared outside the loop). Here's a shortened version that does what you need. You'll need to complete the validLength method to check all the values, since I included only the first few tests.
import java.io.*;
public class TestFile {
// Determines whether the length is valid; kept outside the method that does the reading.
private static String validLength(int length) {
if (length == 12 || length == 13 || length == 2130) // you can finish it
return "ok";
return "fail";
}
public static void main(String[] args) {
// try-with-resources closes both files even if an exception is thrown
try (LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
String line;
int length;
while (null != (line = lnr.readLine())) {
length = line.length();
// take at most 38 chars; valid lines can be shorter than that
line = line.substring(0, Math.min(38, length));
line = line.replace("\u0000", " ");
// collapse runs of spaces, as replaceAll(" +", " ") does in the question
line = line.replaceAll(" +", " ");
// separate the fields so the two ints are not added together
out.write(lnr.getLineNumber() + " " + length + " " + validLength(length) + " " + line);
out.newLine();
}
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Call this as java TestFile D:\Docs\test_in.txt D:\Docs\test_out.txt, or replace args[0] and args[1] with the file names if you want to hard-code them.
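One detail worth checking when building the report line: if a chain of + starts with two int values, Java adds them numerically before any string concatenation happens, so a line number and a length printed back to back come out as their sum. The sketch below puts the per-line formatting into a helper so it can be tested on its own. The class name ReportLine, the method format, and the single-space separator are assumptions made for illustration, not something from the question.

```java
class ReportLine {
    // Builds one report line: line number, full length, status, and up to the
    // first 38 characters with NUL characters and runs of spaces normalised.
    static String format(int lineNumber, String line, String status) {
        int length = line.length();
        // Guard against lines shorter than 38 characters.
        String prefix = line.substring(0, Math.min(38, length));
        prefix = prefix.replace('\u0000', ' ').replaceAll(" +", " ");
        // The " " separators force string concatenation instead of int addition.
        return lineNumber + " " + length + " " + status + " " + prefix;
    }

    public static void main(String[] args) {
        System.out.println(3 + 7 + "fail");                 // prints 10fail
        System.out.println(format(3, "ab   cd", "fail"));   // prints 3 7 fail ab cd
    }
}
```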

How to get a substring in some length for special chars like Chinese

For example, I can get 80 chars with ${description?substring(0, 80)} if description is in English, but for Chinese chars, I can get only about 10 chars, and there is always a garbage char at the end.
How can I get 80 chars for any language?
FreeMarker relies on String#substring to do the actual (UTF-16-chars-based?) substring calculation, which doesn't work well with Chinese characters. Instead, one should use Unicode code points. Based on this post and FreeMarker's own substring builtin, I hacked together a FreeMarker TemplateMethodModelEx implementation which operates on code points:
public class CodePointSubstring implements TemplateMethodModelEx {
@Override
public Object exec(List args) throws TemplateModelException {
int argCount = args.size(), left = 0, right = 0;
String s = "";
if (argCount != 3) {
throw new TemplateModelException(
"Error: Expecting 1 string and 2 numerical arguments here");
}
try {
TemplateScalarModel tsm = (TemplateScalarModel) args.get(0);
s = tsm.getAsString();
} catch (ClassCastException cce) {
String mess = "Error: Expecting string argument here";
throw new TemplateModelException(mess);
}
try {
TemplateNumberModel tnm = (TemplateNumberModel) args.get(1);
left = tnm.getAsNumber().intValue();
tnm = (TemplateNumberModel) args.get(2);
right = tnm.getAsNumber().intValue();
} catch (ClassCastException cce) {
String mess = "Error: Expecting numerical argument here";
throw new TemplateModelException(mess);
}
return new SimpleScalar(getSubstring(s, left, right));
}
private String getSubstring(String s, int start, int end) {
// start and end are code point indexes, not char offsets
int total = s.codePointCount(0, s.length());
// clamp the range so out-of-bounds arguments do not throw
int from = Math.max(0, Math.min(start, total));
int to = Math.max(from, Math.min(end, total));
// translate the code point indexes into char offsets
int beginIndex = s.offsetByCodePoints(0, from);
int endIndex = s.offsetByCodePoints(beginIndex, to - from);
return s.substring(beginIndex, endIndex);
}
}
You can put an instance of it into your data model root, e.g.
SimpleHash root = new SimpleHash();
root.put("substring", new CodePointSubstring());
template.process(root, ...);
and use the custom substring method in FTL:
${substring(description, 0, 80)}
I tested it with non-Chinese characters, which still worked, but so far I haven't tried it with Chinese characters. Maybe you want to give it a try.
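To try the idea outside of FreeMarker, here is a minimal plain-Java sketch of a substring that counts code points. The class name CodePointDemo and the method substringByCodePoints are made up for illustration. Note that the difference from String#substring only shows up for characters outside the Basic Multilingual Plane, which take two UTF-16 chars each; the test string uses U+20000 for that reason.

```java
class CodePointDemo {
    // Returns the code points from index start (inclusive) to end (exclusive),
    // where both indexes count code points rather than UTF-16 chars.
    static String substringByCodePoints(String s, int start, int end) {
        int total = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, total));
        int to = Math.max(from, Math.min(end, total));
        int beginIndex = s.offsetByCodePoints(0, from);
        int endIndex = s.offsetByCodePoints(beginIndex, to - from);
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        // One BMP character, one supplementary character (U+20000), one BMP character.
        String s = "我" + new String(Character.toChars(0x20000)) + "你";
        System.out.println(s.length());                       // prints 4 (UTF-16 chars)
        System.out.println(s.codePointCount(0, s.length()));  // prints 3 (code points)
        String firstTwo = substringByCodePoints(s, 0, 2);
        System.out.println(firstTwo.length());                // prints 3
    }
}
```

A plain s.substring(0, 2) on the same string would cut the surrogate pair in half and leave an unpaired surrogate at the end.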
