How to write many arguments to the output file from reducer? - hadoop

I have a text file as below
250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163.
I'm just a beginner in Hadoop and I have run into the following problem.
How do I print the entire line to my output file? As far as I know, only a key and a value can be written to the output file. How can I write this entire line, with its many fields, to my output file? Or how do I write at least a few of its fields to the output file?

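Use TextOutputFormat and write the whole line as a Text writable in the key position, with null as the value. When the value is null, TextOutputFormat writes only the key, with no separator after it.
context.write(new Text("your output line"), null);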

Related

Difference between cat file_name | sort, sort < file_name, sort file_name in bash

Although they give the same results, I wonder whether there is some difference between them and which is the most appropriate way to sort the contents of a file.
Another thing that intrigues me is the use of delimiters. I noticed that the sort filter only works if the strings are separated by a newline. Is there any way to do this without having to write each string on a separate line?
The sort(1) command reads lines of text, analyzes and sorts them, and writes out the result. The command is intended to read lines, and lines in unix/linux are terminated by a new line.
The command takes its first non-option argument as the file to read; if there is no specification it reads standard input. So:
sort file_name
is a command line with such an argument. The other two examples, "... | sort" and "sort < ...", do not pass the file name to sort(1) directly, but feed the file to its standard input. The effect, as far as sort(1) is concerned, is the same.
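As a quick check of the point above, here is a minimal sketch that runs all three invocations on the same data and compares the results. The temporary directory, the file name fruits.txt, and its contents are made up for illustration.

```shell
# Create a small sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
file_name="$tmpdir/fruits.txt"
printf 'pear\napple\nfig\n' > "$file_name"

# 1. sort receives the file name and opens the file itself.
out_arg=$(sort "$file_name")

# 2. The shell opens the file and connects it to sort's standard input.
out_redirect=$(sort < "$file_name")

# 3. cat reads the file and pipes it to sort's standard input (one extra process).
out_pipe=$(cat "$file_name" | sort)

# All three variables hold the same sorted text.
if [ "$out_arg" = "$out_redirect" ] && [ "$out_arg" = "$out_pipe" ]; then
    echo "all three forms produce the same output"
fi

rm -r "$tmpdir"
```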
ways to do this without having to write the new strings in a separate line
Ultimately, no. But if you want, you can feed sort through another filter (a program) that reads the non-linefeed-separated file and produces lines to pass to sort. If such a program exists and is named "myparse", you can do:
myparse non-linefeed-separated-file | sort
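For simple cases, a standard tool can play the role of the hypothetical "myparse". The sketch below assumes the strings are separated by commas (an assumption, the question does not say which delimiter is used): tr turns each comma into a newline so that sort sees one string per line, and paste joins the sorted lines back with commas.

```shell
# Sample input with comma-separated strings (made-up data).
input='pear,apple,fig'

# tr converts the delimiter to newlines, sort works line by line,
# and paste -s joins the sorted lines back into one comma-separated line.
sorted=$(printf '%s\n' "$input" | tr ',' '\n' | sort | paste -s -d ',' -)

echo "$sorted"   # prints: apple,fig,pear
```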
The solution using cat creates a second process unnecessarily. This could be a performance issue if you perform many such operations in a loop.
With input redirection, the shell opens the file and connects it to the command's standard input. If the file does not exist, the shell itself reports the error.
When the file name is passed as an explicit argument, the sort process has to open the file itself and report an error if there is an accessibility problem with it.

How to remove invisible new line characters in Perl

I am writing a script in Perl that takes values from two databases and compares them. When the script finishes, it outputs a report that is supposed to be formatted this way:
Table Name Date value from Database 1 value from Database 2 Difference
The output is printed into a report file, but even when it is output to the command console it looks like this:
tablename 2017-06-20 7629628
7629628
0
Here's my code that makes the string then outputs it to the file:
$outputstring="$tablelist[0] $DATErowcount0[$output_iteration] $rowcount0[$output_iteration] $TDrowcount0[$output_iteration] $count_dif\n";
print FILE $outputstring;
There seems to be a newline character hidden after $rowcount0[$output_iteration] and before $count_dif. What do I need to do to fix this/print it all in one line?
To fill the arrays with values, values are read from files created by SQL commands.
Here's some of the code:
$num_from_TDfile = substr $r2, 16;
$date_from_TDfile = substr $r2, 0, 12;
$TDrowcount0[$TDnum_rows0] = $num_from_TDfile;
$DATETDrowcount0[$TDnum_rows0] = $date_from_TDfile;
$TDnum_rows0 = $TDnum_rows0 + 1;
Calling chomp on each of the strings read from the files, as suggested by tadman, fixed the output: everything now appears on one line rather than on three lines as in the question's example.

Inserting new line when joining files in VBScript

I have two text files that I want to combine, and I am using the code below to do that. The issue is that at the start of the second file this code inserts some weird characters, like spaces. Is there a way to insert a new line instead of using WriteLine?
Set txsOutput = FSO.CreateTextFile(strOutputPath)
Set txsInput = FSO.OpenTextFile(strInputPath,1)
txsOutput.Writeline txsInput.ReadAll
Thanks
.ReadAll() reads the trailing EOL(s) of the file. .Writeline will add a further EOL. Use .Write instead to get an exact copy of the first input file as the head of the output file.
If the "weird characters like spaces" are unwanted parts of the first file, you'll have to use string operations (InStr, Left, Replace, ...) or a RegExp to clean the data.
If they come from the second file (assuming you used .ReadAll for that too), you should check the encoding of that file and/or clean the data using the methods above.

Hadoop sort example fails with 'not a SequenceFile'. How do I set the SequenceFile?

I'm trying to run bin/hadoop jar hadoop-examples-1.0.4.jar sort input output
but get an error "java.io.IOException: hdfs://master:9000/usr/ubuntu/input/file1 not a SequenceFile"
If I run bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output, it works.
So I can't figure out how to deal with this.
The error message here is exactly right; the sort example is expecting a sequence file - a flat file of binary keys and values as input, the kind that are often generated as output from MapReduce jobs.
However, the wordcount example does not expect a sequence file as input, merely a text file, which is read with the key being the byte offset of each line in the file (not the line number) and the value being the line content.
Seeing as the input files you have are not sequence files per se, sort cannot run using them.
@Jork, if you look at the sort example in hadoop-examples-1.0.4.jar, you can change the input and output formats through command-line arguments, or you can change the program to use a text input format instead of SequenceFileInputFormat.
I had the same issue. The page at https://wiki.apache.org/hadoop/Sort says "The inputs and outputs must be Sequence files."
You should convert your input file to a Hadoop sequence file. I wish there was an easier way. I found this tutorial helpful, good luck! https://examples.javacodegeeks.com/enterprise-java/apache-hadoop/hadoop-sequence-file-example/

How do I pass a text file or a string to a Ruby script and output the result in the command line?

As an exercise to learn Ruby, I would like to create a script that will be run from the terminal. It should accept as input either a string or a text file and it should output the result of various string parsing mechanisms that I will write myself.
To get me started, would you please translate this pseudo-code into proper Ruby for me?
In terminal: ruby myscript.rb (either a string or a text file).
In myscript.rb: Retrieve input. Set my_input to the input.
Set my_output to the result of various_string_parsing_voodoo (done to my_input).
puts my_output
I intend to actually write the code myself, but if someone could supply me with a skeleton .rb file to send in "Hello World" and get "[World] is pleased by your [hello]" or something equally inane that'd be a great help.
Here are some key pieces:
ARGV is an array containing the arguments you passed when running your script from command line.
The File class contains several utilities. For example, File.exist?(path) returns true if the path exists, and File.file?(path) returns true if the path exists and is a regular file (not a directory). The older spelling File.exists? was deprecated and then removed in Ruby 3.2.
I think this may help you quite a bit.
