Reading a file with newlines as a tuple in pig

Reading a file with newlines as a tuple in pig - hadoop

Is it possible to change the record delimiter from newline to some other string so as to read a file with newlines into a single tuple in pig.

Yes.
A = LOAD '...' USING PigStorage(',') AS (...); //comma is the delimeter for fields
SET textinputformat.record.delimiter '<delimeter>'; // record delimeter, by default it is `\n`. You can change to any delimeter.

As mentioned here
You can use PigStorage
A = LOAD '/some/path/COMMA-DELIM-PREFIX*' USING PigStorage(',') AS (f1:chararray, ...);
B = LOAD '/some/path/SEMICOLON-DELIM-PREFIX*' USING PigStorage('\t') AS (f1:chararray, ...);
You can even try writing load/store UDF.
There is java code example for both load and store.
Load Functions : LoadFunc abstract class has the main methods for loading data and for most use cases it would suffice to extend it. You can read more here
Example
The loader implementation in the example is a loader for text data
with line delimiter as '\n' and '\t' as default field delimiter (which
can be overridden by passing a different field delimiter in the
constructor) - this is similar to current PigStorage loader in Pig.
The implementation uses an existing Hadoop supported Inputformat -
TextInputFormat - as the underlying InputFormat.
public class SimpleTextLoader extends LoadFunc {
protected RecordReader in = null;
private byte fieldDel = '\t';
private ArrayList<Object> mProtoTuple = null;
private TupleFactory mTupleFactory = TupleFactory.getInstance();
private static final int BUFFER_SIZE = 1024;
public SimpleTextLoader() {
}
/**
* Constructs a Pig loader that uses specified character as a field delimiter.
*
* #param delimiter
* the single byte character that is used to separate fields.
* ("\t" is the default.)
*/
public SimpleTextLoader(String delimiter) {
this();
if (delimiter.length() == 1) {
this.fieldDel = (byte)delimiter.charAt(0);
} else if (delimiter.length() > 1 & & delimiter.charAt(0) == '\\') {
switch (delimiter.charAt(1)) {
case 't':
this.fieldDel = (byte)'\t';
break;
case 'x':
fieldDel =
Integer.valueOf(delimiter.substring(2), 16).byteValue();
break;
case 'u':
this.fieldDel =
Integer.valueOf(delimiter.substring(2)).byteValue();
break;
default:
throw new RuntimeException("Unknown delimiter " + delimiter);
}
} else {
throw new RuntimeException("PigStorage delimeter must be a single character");
}
}
#Override
public Tuple getNext() throws IOException {
try {
boolean notDone = in.nextKeyValue();
if (!notDone) {
return null;
}
Text value = (Text) in.getCurrentValue();
byte[] buf = value.getBytes();
int len = value.getLength();
int start = 0;
for (int i = 0; i < len; i++) {
if (buf[i] == fieldDel) {
readField(buf, start, i);
start = i + 1;
}
}
// pick up the last field
readField(buf, start, len);
Tuple t = mTupleFactory.newTupleNoCopy(mProtoTuple);
mProtoTuple = null;
return t;
} catch (InterruptedException e) {
int errCode = 6018;
String errMsg = "Error while reading input";
throw new ExecException(errMsg, errCode,
PigException.REMOTE_ENVIRONMENT, e);
}
}
private void readField(byte[] buf, int start, int end) {
if (mProtoTuple == null) {
mProtoTuple = new ArrayList<Object>();
}
if (start == end) {
// NULL value
mProtoTuple.add(null);
} else {
mProtoTuple.add(new DataByteArray(buf, start, end));
}
}
#Override
public InputFormat getInputFormat() {
return new TextInputFormat();
}
#Override
public void prepareToRead(RecordReader reader, PigSplit split) {
in = reader;
}
#Override
public void setLocation(String location, Job job)
throws IOException {
FileInputFormat.setInputPaths(job, location);
}
}
Store Functions : StoreFunc abstract class has the main methods for storing data and for most use cases it should suffice to extend it
Example
The storer implementation in the example is a storer for text data
with line delimiter as '\n' and '\t' as default field delimiter (which
can be overridden by passing a different field delimiter in the
constructor) - this is similar to current PigStorage storer in Pig.
The implementation uses an existing Hadoop supported OutputFormat -
TextOutputFormat as the underlying OutputFormat.
public class SimpleTextStorer extends StoreFunc {
protected RecordWriter writer = null;
private byte fieldDel = '\t';
private static final int BUFFER_SIZE = 1024;
private static final String UTF8 = "UTF-8";
public PigStorage() {
}
public PigStorage(String delimiter) {
this();
if (delimiter.length() == 1) {
this.fieldDel = (byte)delimiter.charAt(0);
} else if (delimiter.length() > 1delimiter.charAt(0) == '\\') {
switch (delimiter.charAt(1)) {
case 't':
this.fieldDel = (byte)'\t';
break;
case 'x':
fieldDel =
Integer.valueOf(delimiter.substring(2), 16).byteValue();
break;
case 'u':
this.fieldDel =
Integer.valueOf(delimiter.substring(2)).byteValue();
break;
default:
throw new RuntimeException("Unknown delimiter " + delimiter);
}
} else {
throw new RuntimeException("PigStorage delimeter must be a single character");
}
}
ByteArrayOutputStream mOut = new ByteArrayOutputStream(BUFFER_SIZE);
#Override
public void putNext(Tuple f) throws IOException {
int sz = f.size();
for (int i = 0; i < sz; i++) {
Object field;
try {
field = f.get(i);
} catch (ExecException ee) {
throw ee;
}
putField(field);
if (i != sz - 1) {
mOut.write(fieldDel);
}
}
Text text = new Text(mOut.toByteArray());
try {
writer.write(null, text);
mOut.reset();
} catch (InterruptedException e) {
throw new IOException(e);
}
}
#SuppressWarnings("unchecked")
private void putField(Object field) throws IOException {
//string constants for each delimiter
String tupleBeginDelim = "(";
String tupleEndDelim = ")";
String bagBeginDelim = "{";
String bagEndDelim = "}";
String mapBeginDelim = "[";
String mapEndDelim = "]";
String fieldDelim = ",";
String mapKeyValueDelim = "#";
switch (DataType.findType(field)) {
case DataType.NULL:
break; // just leave it empty
case DataType.BOOLEAN:
mOut.write(((Boolean)field).toString().getBytes());
break;
case DataType.INTEGER:
mOut.write(((Integer)field).toString().getBytes());
break;
case DataType.LONG:
mOut.write(((Long)field).toString().getBytes());
break;
case DataType.FLOAT:
mOut.write(((Float)field).toString().getBytes());
break;
case DataType.DOUBLE:
mOut.write(((Double)field).toString().getBytes());
break;
case DataType.BYTEARRAY: {
byte[] b = ((DataByteArray)field).get();
mOut.write(b, 0, b.length);
break;
}
case DataType.CHARARRAY:
// oddly enough, writeBytes writes a string
mOut.write(((String)field).getBytes(UTF8));
break;
case DataType.MAP:
boolean mapHasNext = false;
Map<String, Object> m = (Map<String, Object>)field;
mOut.write(mapBeginDelim.getBytes(UTF8));
for(Map.Entry<String, Object> e: m.entrySet()) {
if(mapHasNext) {
mOut.write(fieldDelim.getBytes(UTF8));
} else {
mapHasNext = true;
}
putField(e.getKey());
mOut.write(mapKeyValueDelim.getBytes(UTF8));
putField(e.getValue());
}
mOut.write(mapEndDelim.getBytes(UTF8));
break;
case DataType.TUPLE:
boolean tupleHasNext = false;
Tuple t = (Tuple)field;
mOut.write(tupleBeginDelim.getBytes(UTF8));
for(int i = 0; i < t.size(); ++i) {
if(tupleHasNext) {
mOut.write(fieldDelim.getBytes(UTF8));
} else {
tupleHasNext = true;
}
try {
putField(t.get(i));
} catch (ExecException ee) {
throw ee;
}
}
mOut.write(tupleEndDelim.getBytes(UTF8));
break;
case DataType.BAG:
boolean bagHasNext = false;
mOut.write(bagBeginDelim.getBytes(UTF8));
Iterator<Tuple> tupleIter = ((DataBag)field).iterator();
while(tupleIter.hasNext()) {
if(bagHasNext) {
mOut.write(fieldDelim.getBytes(UTF8));
} else {
bagHasNext = true;
}
putField((Object)tupleIter.next());
}
mOut.write(bagEndDelim.getBytes(UTF8));
break;
default: {
int errCode = 2108;
String msg = "Could not determine data type of field: " + field;
throw new ExecException(msg, errCode, PigException.BUG);
}
}
}
#Override
public OutputFormat getOutputFormat() {
return new TextOutputFormat<WritableComparable, Text>();
}
#Override
public void prepareToWrite(RecordWriter writer) {
this.writer = writer;
}
#Override
public void setStoreLocation(String location, Job job) throws IOException {
job.getConfiguration().set("mapred.textoutputformat.separator", "");
FileOutputFormat.setOutputPath(job, new Path(location));
if (location.endsWith(".bz2")) {
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
} else if (location.endsWith(".gz")) {
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
}
}
}

Related

Are there any Dart resources that would split a command-line String into a List<String> of arguments?

Are there any Dart resources that would split a command-line String into a List<String> of arguments?
ArgsParser takes a List<String> of already split arguments usually from main(List<String>).

To answer my own question,
I've converted a Java function I liked into a Dart Converter<String, List<String>) class:
import 'dart:convert';
/// Splits a `String` into a list of command-line argument parts.
/// e.g. "command -p param" -> ["command", "-p", "param"]
///
class CommandlineConverter extends Converter<String, List<String>>
{
#override
List<String> convert(String input)
{
if (input == null || input.isEmpty)
{
//no command? no string
return [];
}
final List<String> result = new List<String>();
var current = "";
String inQuote;
bool lastTokenHasBeenQuoted = false;
for (int index = 0; index < input.length; index++)
{
final token = input[index];
if (inQuote != null)
{
if (token == inQuote)
{
lastTokenHasBeenQuoted = true;
inQuote = null;
}
else
{
current += token;
}
}
else
{
switch (token)
{
case "'": // '
case '"': // ""
inQuote = token;
continue;
case " ": // space
if (lastTokenHasBeenQuoted || current.isNotEmpty)
{
result.add(current);
current = "";
}
break;
default:
current += token;
lastTokenHasBeenQuoted = false;
}
}
}
if (lastTokenHasBeenQuoted || current.isNotEmpty)
{
result.add(current);
}
if (inQuote != null)
{
throw new Exception("Unbalanced quote $inQuote in input:\n$input");
}
return result;
}
}

how to exclude checking words in quotations using java

It's the first time I'm doing this so I didn't want to be lengthy. I'm building a cross reference from reading a java program. I'm to exclude java keywords, commented words and words in quotations. I got through with excluding the java keywords and the commented words but I'm having problems excluding those in quotes.
public class CrossReference {
static Scanner in;
static PrintWriter out;
static int currentLine = 0;
public static void main(String[] args) throws IOException {
in = new Scanner (new FileReader("keywords.txt"));
out = new PrintWriter (new FileWriter("crossreference.out"));
LinkedList keywords = new LinkedList();
while (in.hasNextLine()) {
String word = in.nextLine();
keywords.addTail(new NodeData(word));
}
in = new Scanner (new FileReader("program.txt"));
BinaryTree bst = new BinaryTree();
while(in.hasNextLine()){
String line = in.nextLine();
out.printf("%3d. %s\n", ++currentLine,line);
getWordsOnLine(line,bst,keywords);
}
out.printf("\nWords LineNumber\n\n");
bst.inOrder();
out.close();
}
public static void getWordsOnLine(String inputLine, BinaryTree bst, LinkedList keywords){
Scanner inLine = new Scanner(inputLine);
inLine.useDelimiter("[^a-zA-Z//\"*]+");
boolean b = true;
while(inLine.hasNext() && b){
String word = inLine.next().toLowerCase();
if (word.contains("/") || word.contains("\"") || word.contains("*")) {
b = false;
} //this works for the commented words but not so well for the ones in quotes as it also excludes words after those in quotes
else {
boolean key = false;
Node curr = keywords.head;
while (curr != null) {
if (curr.data.str.equals(word)) key = true;
curr = curr.next;
}
if (key == false) {
TreeNode node = bst.findOrInsert(new TreeNodeData(word));
ListNode p = new ListNode(currentLine);
p.next = node.data.firstLine;
node.data.firstLine = p;
}
}
}
}
}

Split your string by space into an array.
Iterate through the array, checking which elements start with quotes and are words
Sum it
So the code would look like:
public class HelloWorld
{
public static void main(String[] args)
{
String a = "COPY PASTE ORIGINAL HERE";
String[] arr = a.split(" ");
int count = 0;
for(String each: arr){
if(each.charAt(0) != '\"' && each.charAt(0) < '0' || each.charAt(0) > '9'){
count++;
}
}
System.out.println("words="+count);
}
}

Duplication with Chunks in Spring Batch

I have a huge file, I need to read it and dump it into DB. Any invalid records(invalid length, duplicate keys, etc), if present need to be written into a Error Report. Due to the huge size of the file we tried using the chunk-size(commit-interval) as 1000/5000/10000. In the process I found that the data was being processed redundantly due to the usage of chunks and thus my Error Report is incorrect, it not only has the actual invalid records from the input-file but also the duplicates from the chunks.
Code snippet:
#Bean
public Step readAndWriteStudentInfo() {
return stepBuilderFactory.get("readAndWriteStudentInfo")
.<Student, Student>chunk(5000).reader(studentFileReader()).faultTolerant()
.skipPolicy(skipper)..listener(listener).processor(new ItemProcessor<Student, Student>() {
#Override
public Student process(Student Student) throws Exception {
if(processedRecords.contains(Student)){
return null;
}else {
processedRecords.add(Student);
return Student;
}
}
}).writer(studentDBWriter()).build();
}
#Bean
public ItemReader<Student> studentFileReader() {
FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
reader.setResource(new FileSystemResource(studentInfoFileName));
reader.setLineMapper(new DefaultLineMapper<Student>() {
{
setLineTokenizer(new FixedLengthTokenizer() {
{
setNames(classProperties50);
setColumns(range50);
}
});
setFieldSetMapper(new BeanWrapperFieldSetMapper<Student>() {
{
setTargetType(Student.class);
}
});
}
});
reader.setSaveState(false);
reader.setLinesToSkip(1);
reader.setRecordSeparatorPolicy(new TrailerSkipper());
return reader;
}
#Bean
public ItemWriter<Student> studentDBWriter() {
JdbcBatchItemWriter<Student> writer = new JdbcBatchItemWriter<>();
writer.setSql(insertQuery);
writer.setDataSource(datSource);
writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Student>());
return writer;
}
I've tried with various chunk sizes, 10, 100, 1000, 5000. The accuracy of my error report deteriorates with the increase in chunk size. Writing to Error Report is happening from my implementation of Skip Policy, kindly do let me know if that code is required too to help me out.
How do I ensure that my writer picks up unique set of records in each chunk?
Skipper Implementation:
#Override
public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
String exception = t.getClass().getSimpleName();
if (t instanceof FileNotFoundException) {
return false;
}
switch (exception) {
case "FlatFileParseException":
FlatFileParseException ffpe = (FlatFileParseException) t;
String errorMessage = "Line no = " + ffpe.getLineNumber() + " " + ffpe.getMessage() + " Record is ["
+ ffpe.getInput() + "].\n";
writeToRecon(errorMessage);
return true;
case "SQLException":
SQLException sE = (SQLException) t;
String sqlErrorMessage = sE.getErrorCode() + " Record is [" + sE.getCause() + "].\n";
writeToRecon(sqlErrorMessage);
return true;
case "BatchUpdateException":
BatchUpdateException batchUpdateException = (BatchUpdateException) t;
String btchUpdtExceptionMsg = batchUpdateException.getMessage() + " " + batchUpdateException.getCause();
writeToRecon(btchUpdtExceptionMsg);
return true;
default:
return false;
}

JeroMQ: connection does not recover reliably

I have two applications, sending messages asynchronously in both directions. I am using sockets of type ZMQ.DEALER on both sides. The connection status is additionally controlled by heartbeating.
I have now problems to get the connection reliably recovering after connection problems (line failure or application restart on one side). When I restart the applicaton on the server side (the side doing the bind()), the client side will not always reconnect successfully and then needs to be restarted, especially when the local buffer has reached the HWM limit.
I did not find any other way to make the connection recovery reliable, other than resetting the complete ZMQ.Context in case of heartbeat failures or if send() returned false. I will then call Context.term() and will create Context and Socket again. This seemed to work fine in my tests. But now I observed occasional and hangups inside Context.term(), which are rare and hard to reproduce. I know, that creating the Context should be done just once at application startup, but as said I found no other way to re-establish a broken connection.
I am using JeroMQ 0.3.4. The source of a test application is below, ~200 lines of code.
Any hints to solve this are very much appreciated.
import java.util.Calendar;
import org.zeromq.ZMQ;
public class JeroMQTest {
public interface IMsgListener {
public void newMsg(byte[] message);
}
final static int delay = 100;
final static boolean doResetContext = true;
static JeroMQTest jeroMQTest;
static boolean isServer;
private ZMQ.Context zContext;
private ZMQ.Socket zSocket;
private String address = "tcp://localhost:9889";
private long lastHeartbeatReceived = 0;
private long lastHeartbeatReplyReceived;
private boolean sendStat = true, serverIsActive = false, receiverInterrupted = false;
private Thread receiverThread;
private IMsgListener msgListener;
public static void main(String[] args) {
isServer = args.length > 0 && args[0].equals("true");
if (isServer) {
new JeroMQTest().runServer();
}
else {
new JeroMQTest().runClient();
}
}
public void runServer() {
msgListener = new IMsgListener() {
public void newMsg(byte[] message) {
String msgReceived = new String(message);
if (msgReceived.startsWith("HEARTBEAT")) {
String msgSent = "HEARTBEAT_REP " + msgReceived.substring(10);
sendStat = zSocket.send(msgSent.getBytes());
System.out.println("heartbeat rcvd, reply sent, status:" + sendStat);
lastHeartbeatReceived = getNow();
} else {
System.out.println("msg received:" + msgReceived);
}
}
};
createJmq();
sleep(1000);
int ct = 1;
while (true) {
boolean heartbeatsOk = lastHeartbeatReceived > getNow() - delay * 4;
if (heartbeatsOk) {
serverIsActive = true;
String msg = "SERVER " + ct;
sendStat = zSocket.send(msg.getBytes());
System.out.println("msg sent:" + msg + ", status:" + sendStat);
ct++;
}
if (serverIsActive && (!heartbeatsOk || !sendStat)) {
serverIsActive = false;
if (doResetContext) {
resetContext();
}
}
sleep(delay);
}
}
public void runClient() {
msgListener = new IMsgListener() {
public void newMsg(byte[] message) {
String msgReceived = new String(message);
if (msgReceived.startsWith("HEARTBEAT_REP")) {
System.out.println("HEARTBEAT_REP received:" + msgReceived);
lastHeartbeatReplyReceived = getNow();
}
else {
System.out.println("msg received:" + msgReceived);
}
}
};
createJmq();
sleep(1000);
int ct = 1;
boolean reconnectDone = false;
while (true) {
boolean heartbeatsOK = lastHeartbeatReplyReceived > getNow() - delay * 4;
String msg = "HEARTBEAT " + (ct++);
sendStat = zSocket.send(msg.getBytes());
System.out.println("heartbeat sent:" + msg + ", status:" + sendStat);
sleep(delay / 2);
if (sendStat) {
msg = "MSG " + ct;
sendStat = zSocket.send(msg.getBytes());
System.out.println("msg sent:" + msg + ", status:" + sendStat);
reconnectDone = false;
}
if ((!heartbeatsOK && lastHeartbeatReplyReceived > 0) || (!sendStat && !reconnectDone)) {
if (doResetContext) {
resetContext();
}
lastHeartbeatReplyReceived = 0;
reconnectDone = true;
}
sleep(delay / 2);
}
}
public void resetContext() {
closeJmq();
sleep(1000);
createJmq();
System.out.println("resetContext done");
}
private void createJmq() {
zContext = ZMQ.context(1);
zSocket = zContext.socket(ZMQ.DEALER);
zSocket.setSendTimeOut(100);
zSocket.setReceiveTimeOut(100);
zSocket.setSndHWM(10);
zSocket.setRcvHWM(10);
zSocket.setLinger(100);
if (isServer) {
zSocket.bind(address);
} else {
zSocket.connect(address);
}
receiverThread = new Thread() {
public void run() {
receiverInterrupted = false;
try {
ZMQ.Poller poller = new ZMQ.Poller(1);
poller.register(zSocket, ZMQ.Poller.POLLIN);
while (!receiverInterrupted) {
if (poller.poll(100) > 0) {
byte byteArr[] = zSocket.recv(0);
msgListener.newMsg(byteArr);
}
}
poller.unregister(zSocket);
} catch (Throwable e) {
System.out.println("Exception in ReceiverThread.run:" + e.getMessage());
}
}
};
receiverThread.start();
}
public void closeJmq() {
receiverInterrupted = true;
sleep(100);
zSocket.close();
zContext.term();
}
long getNow() {
Calendar now = Calendar.getInstance();
return (long) (now.getTime().getTime());
}
private static void sleep(int mSleep) {
try {
Thread.sleep(mSleep);
} catch (InterruptedException e) {
}
}
}

What is the fastest way to read from standard input in Scala? [duplicate]

I am reading bunch of integers separated by space or newlines from the standard in using Scanner(System.in).
Is there any faster way of doing this in Java?

Is there any faster way of doing this in Java?
Yes. Scanner is fairly slow (at least according to my experience).
If you don't need to validate the input, I suggest you just wrap the stream in a BufferedInputStream and use something like String.split / Integer.parseInt.
A small comparison:
Reading 17 megabytes (4233600 numbers) using this code
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext())
sum += scanner.nextInt();
took on my machine 3.3 seconds. while this snippet
BufferedReader bi = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = bi.readLine()) != null)
for (String numStr: line.split("\\s"))
sum += Integer.parseInt(numStr);
took 0.7 seconds.
By messing up the code further (iterating over line with String.indexOf / String.substring) you can get it down to about 0.1 seconds quite easily, but I think I've answered your question and I don't want to turn this into some code golf.

I created a small InputReader class which works just like Java's Scanner but outperforms it in speed by many magnitudes, in fact, it outperforms the BufferedReader as well. Here is a bar graph which shows the performance of the InputReader class I have created reading different types of data from standard input:
Here are two different ways of finding the sum of all the numbers coming from System.in using the InputReader class:
int sum = 0;
InputReader in = new InputReader(System.in);
// Approach #1
try {
// Read all strings and then parse them to integers (this is much slower than the next method).
String strNum = null;
while( (strNum = in.nextString()) != null )
sum += Integer.parseInt(strNum);
} catch (IOException e) { }
// Approach #2
try {
// Read all the integers in the stream and stop once an IOException is thrown
while( true ) sum += in.nextInt();
} catch (IOException e) { }

If you asking from competitive programming point of view, where if the submission is not fast enough, it will be TLE.
Then you can check the following method to retrieve String from System.in.
I have taken from one of the best coder in java(competitive sites)
private String ns()
{
int b = skip();
StringBuilder sb = new StringBuilder();
while(!(isSpaceChar(b))){ // when nextLine, (isSpaceChar(b) && b != ' ')
sb.appendCodePoint(b);
b = readByte();
}
return sb.toString();
}`

You can read from System.in in a digit by digit way. Look at this answer: https://stackoverflow.com/a/2698772/3307066.
I copy the code here (barely modified). Basically, it reads integers, separated by anything that is not a digit. (Credits to the original author.)
private static int readInt() throws IOException {
int ret = 0;
boolean dig = false;
for (int c = 0; (c = System.in.read()) != -1; ) {
if (c >= '0' && c <= '9') {
dig = true;
ret = ret * 10 + c - '0';
} else if (dig) break;
}
return ret;
}
In my problem, this code was approx. 2 times faster than using StringTokenizer, which was already faster than String.split(" ").
(The problem involved reading 1 million integers of up to 1 million each.)

StringTokenizer is a much faster way of reading string input separated by tokens.
Check below example to read a string of integers separated by space and store in arraylist,
String str = input.readLine(); //read string of integers using BufferedReader e.g. "1 2 3 4"
List<Integer> list = new ArrayList<>();
StringTokenizer st = new StringTokenizer(str, " ");
while (st.hasMoreTokens()) {
list.add(Integer.parseInt(st.nextToken()));
}

In programming perspective this customized Scan and Print class is way better than Java inbuilt Scanner and BufferedReader classes.
import java.io.InputStream;
import java.util.InputMismatchException;
import java.io.IOException;
public class Scan
{
private byte[] buf = new byte[1024];
private int total;
private int index;
private InputStream in;
public Scan()
{
in = System.in;
}
public int scan() throws IOException
{
if(total < 0)
throw new InputMismatchException();
if(index >= total)
{
index = 0;
total = in.read(buf);
if(total <= 0)
return -1;
}
return buf[index++];
}
public int scanInt() throws IOException
{
int integer = 0;
int n = scan();
while(isWhiteSpace(n)) /* remove starting white spaces */
n = scan();
int neg = 1;
if(n == '-')
{
neg = -1;
n = scan();
}
while(!isWhiteSpace(n))
{
if(n >= '0' && n <= '9')
{
integer *= 10;
integer += n-'0';
n = scan();
}
else
throw new InputMismatchException();
}
return neg*integer;
}
public String scanString()throws IOException
{
StringBuilder sb = new StringBuilder();
int n = scan();
while(isWhiteSpace(n))
n = scan();
while(!isWhiteSpace(n))
{
sb.append((char)n);
n = scan();
}
return sb.toString();
}
public double scanDouble()throws IOException
{
double doub=0;
int n=scan();
while(isWhiteSpace(n))
n=scan();
int neg=1;
if(n=='-')
{
neg=-1;
n=scan();
}
while(!isWhiteSpace(n)&& n != '.')
{
if(n>='0'&&n<='9')
{
doub*=10;
doub+=n-'0';
n=scan();
}
else throw new InputMismatchException();
}
if(n=='.')
{
n=scan();
double temp=1;
while(!isWhiteSpace(n))
{
if(n>='0'&&n<='9')
{
temp/=10;
doub+=(n-'0')*temp;
n=scan();
}
else throw new InputMismatchException();
}
}
return doub*neg;
}
public boolean isWhiteSpace(int n)
{
if(n == ' ' || n == '\n' || n == '\r' || n == '\t' || n == -1)
return true;
return false;
}
public void close()throws IOException
{
in.close();
}
}
And the customized Print class can be as follows
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
public class Print
{
private BufferedWriter bw;
public Print()
{
this.bw = new BufferedWriter(new OutputStreamWriter(System.out));
}
public void print(Object object)throws IOException
{
bw.append("" + object);
}
public void println(Object object)throws IOException
{
print(object);
bw.append("\n");
}
public void close()throws IOException
{
bw.close();
}
}

You can use BufferedReader for reading data
BufferedReader inp = new BufferedReader(new InputStreamReader(System.in));
int t = Integer.parseInt(inp.readLine());
while(t-->0){
int n = Integer.parseInt(inp.readLine());
int[] arr = new int[n];
String line = inp.readLine();
String[] str = line.trim().split("\\s+");
for(int i=0;i<n;i++){
arr[i] = Integer.parseInt(str[i]);
}
And for printing use StringBuffer
StringBuffer sb = new StringBuffer();
for(int i=0;i<n;i++){
sb.append(arr[i]+" ");
}
System.out.println(sb);

Here is the full version fast reader and writer. I also used Buffering.
import java.io.*;
import java.util.*;
public class FastReader {
private static StringTokenizer st;
private static BufferedReader in;
private static PrintWriter pw;
public static void main(String[] args) throws IOException {
in = new BufferedReader(new InputStreamReader(System.in));
pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
st = new StringTokenizer("");
pw.close();
}
private static int nextInt() throws IOException {
return Integer.parseInt(next());
}
private static long nextLong() throws IOException {
return Long.parseLong(next());
}
private static double nextDouble() throws IOException {
return Double.parseDouble(next());
}
private static String next() throws IOException {
while(!st.hasMoreElements() || st == null){
st = new StringTokenizer(in.readLine());
}
return st.nextToken();
}
}

Reading from disk, again and again, makes the Scanner slow. I like to use the combination of BufferedReader and Scanner to get the best of both worlds. i.e. speed of BufferredReader and rich and easy API of the scanner.
Scanner scanner = new Scanner(new BufferedReader(new InputStreamReader(System.in)));

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Reading a file with newlines as a tuple in pig - hadoop

Is it possible to change the record delimiter from newline to some other string so as to read a file with newlines into a single tuple in pig.

Yes. A = LOAD '...' USING PigStorage(',') AS (...); //comma is the delimeter for fields SET textinputformat.record.delimiter '<delimeter>'; // record delimeter, by default it is `\n`. You can change to any delimeter.

Related

Are there any Dart resources that would split a command-line String into a List<String> of arguments?

how to exclude checking words in quotations using java

Duplication with Chunks in Spring Batch

JeroMQ: connection does not recover reliably

What is the fastest way to read from standard input in Scala? [duplicate]

Categories

Resources