How to delete an entry from a MapFile in Hadoop

Is there a way to delete an entry from a MapFile in Hadoop? I am able to read and write entries to a MapFile, but I do not know how to delete or update an entry. Is there a good solution for this? Any help is appreciated. Thanks in advance.

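HDFS is built for write-once, read-many workloads such as data warehousing. You cannot modify the existing content of an HDFS file; at most you can append new content to the end of the file. Deleting or updating an entry therefore means writing a new file that leaves out or replaces that entry.
You can refer to a similar question.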

Suppose the file contains the two lines below:
hi hello world
this is fine
In the mapper, write logic that selects only the lines containing "hello" and passes them to the reducer phase.
The reducer output will then contain only "hi hello world".
If you need something other than this, please describe it with a short use case.
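Because the file cannot be edited in place, deleting or updating an entry usually comes down to copying the MapFile into a new one and skipping or replacing entries along the way. The following is a minimal, language-neutral sketch of that copy-and-filter logic in Python, not Hadoop API code: `rewrite_entries` and its parameters are hypothetical names, and the input is assumed to be an iterable of (key, value) pairs already sorted by key.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Entry = Tuple[str, str]


def rewrite_entries(
    entries: Iterable[Entry],
    deleted_keys: Set[str],
    updated_values: Dict[str, str],
) -> Iterator[Entry]:
    """Yield the entries for the replacement file, preserving key order."""
    for key, value in entries:
        if key in deleted_keys:
            continue  # "delete": do not copy the entry at all
        # "update": copy the entry with its new value if one was supplied
        yield key, updated_values.get(key, value)


# Hypothetical sample data standing in for the contents of the old file.
old_entries = [("a", "1"), ("b", "2"), ("c", "3")]
new_entries = list(rewrite_entries(old_entries, {"b"}, {"c": "30"}))
print(new_entries)
```

In a real job the yielded pairs would be appended to a new MapFile that then replaces the old one. Keeping the original order matters because a MapFile expects its keys to be appended in sorted order.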

Related

How to write a Custom Input Format

I am a newbie to Hadoop and I have a situation where only one line out of every 4 lines of the input text is relevant. Currently I am using the default TextInputFormat and conditional logic to skip the other three lines, which are irrelevant.
How can I use a custom input format to handle this? Since I am new to Hadoop, I don't know much about custom input formats. Any help would be appreciated. Thanks!
I think you can use NLineInputFormat, where you can specify how many lines go into each input split. Note that each line is still passed to the mapper as a separate record, so the mapper would still skip the irrelevant lines. This could be an easy, ready-to-use solution.
If you want to implement your own input format, you would write a custom InputFormat and RecordReader to define what makes up one record.
Below is one example:
http://deep-developers.blogspot.in/2014/06/custom-input-split-and-custom.html
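The core of such a record reader is small: walk the input in groups of four lines and emit only the relevant one. Here is a minimal sketch of that logic in plain Python rather than the Hadoop API. `relevant_lines`, `group_size`, and `keep_index` are hypothetical names, and which of the four lines is relevant is an assumption (the first one by default).

```python
from typing import Iterable, Iterator


def relevant_lines(
    lines: Iterable[str], group_size: int = 4, keep_index: int = 0
) -> Iterator[str]:
    """Yield one line per group of `group_size` lines, dropping the rest."""
    if not 0 <= keep_index < group_size:
        raise ValueError("keep_index must fall inside the group")
    for line_number, line in enumerate(lines):
        # Position of this line inside its group decides whether it is kept.
        if line_number % group_size == keep_index:
            yield line.rstrip("\n")


# Hypothetical sample input: two complete groups of four lines.
sample = ["keep 1\n", "skip\n", "skip\n", "skip\n",
          "keep 2\n", "skip\n", "skip\n", "skip\n"]
print(list(relevant_lines(sample)))
```

A real RecordReader also has to make sure a split does not begin in the middle of a four-line group, for example by treating the input file as non-splittable.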

Reading and writing to hadoop sequence file using scala

I just started using Scalding and am trying to find examples of reading a text file and writing to a Hadoop sequence file.
Any help is appreciated.
You can use com.twitter.scalding.WritableSequenceFile (please note that you have to use the fully qualified name, otherwise it picks up the Cascading one). Hope this helps.

storing a file in an already occupied location in Pig

It seems that Pig prevents us from reusing an output directory. In that case, I want to write a Pig UDF that will accept a filename as parameter, open the file within the UDF and append the contents to the already existing file at the location. Is this possible?
Thanks in advance
It may be possible, but I don't know that it's advisable. Why not just have a new output directory? For example, if ultimately you want all your results in /path/to/results, STORE the output of the first run into /path/to/results/001, the next run into /path/to/results/002, and so on. This way you can easily identify bad data from any failed jobs, and if you want all of it together, you can just do hdfs dfs -cat /path/to/results/*/*.
If you don't actually want to append but instead want to just replace the existing contents, you can use Pig's RMF shell command:
%declare output '/path/to/results'
RMF $output
STORE results INTO '$output';
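If you go with numbered output directories, the next directory name can be derived from the ones that already exist. A minimal sketch, assuming three-digit names such as 001 and 002 under the results path: `next_run_dir` is a hypothetical helper, and in practice the list of existing names would come from a directory listing.

```python
import posixpath
from typing import Iterable


def next_run_dir(base: str, existing: Iterable[str]) -> str:
    """Return the path of the next numbered run directory under `base`."""
    # Ignore anything that is not a purely numeric directory name.
    numbers = [int(name) for name in existing if name.isdigit()]
    next_number = max(numbers, default=0) + 1
    return posixpath.join(base, f"{next_number:03d}")


print(next_run_dir("/path/to/results", ["001", "002"]))
```

The resulting path could then be handed to the script as a parameter, for example with pig -param output=..., so the STORE statement never targets an existing directory.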

Pig removing parentheses when storing output

I'm new to Pig programming and am currently trying to implement my Hadoop jobs with Pig.
So far my Pig programs work. I've got some output files stored as *.txt with a semicolon as the delimiter.
My problem is that Pig adds parentheses around the tuples...
Is it possible to store the output in a file without these parentheses, storing only the values? Maybe by overriding the PigStorage method with a UDF?
Does anyone have a hint for me?
I want to read my output files into a RDBMS (Oracle) without the parentheses.
You probably need to write your own custom Storer. See: http://wiki.apache.org/pig/Pig070LoadStoreHowTo.
Shouldn't be too difficult to just write it as a plain CSV or whatever. There's also a pre-existing DBStorage class that you might be able to use to write directly to Oracle if you want.
For people who find this topic first, the question is answered here:
Remove brackets and commas in output from Pig
Use the FLATTEN operator in your script like this:
output = FOREACH [variable] GENERATE FLATTEN (($1, $2, $3));
STORE output INTO '[path]' USING PigStorage(',');
Note the second set of parentheses around the fields you want to flatten.

How to read a PS file in reverse order?

I have a PS file that has to be read in reverse order and processed accordingly. Is there a way to specify in the FD of a COBOL module that the file should be read in reverse order? Or can the same be achieved using SORT?
Note: Reading the records into a buffer (array) and using it in reverse order would be the first idea that comes to mind, but that approach doesn't work for a file with millions of records.
Your suggestions will be appreciated.
I do not believe there is a standard method for doing this in COBOL. However, if the file contains fixed-length records, you might try processing it as a relative file and just run through it by descending record number. The other option is, as you suggest, to sort it in reverse order and then process it as "normal".
If the device the file is on supports it, you can use "OPEN INPUT fname REVERSED". But the file will need to be on a tape or a device that is pretending to be a tape.
Some versions of COBOL support a READ LAST statement to get the last record on the file. Then use READ PRIOR to read the file in reverse order. Not sure what COBOL version you're working with.
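For fixed-length records, the descending-record-number idea amounts to seeking backwards through the file one record at a time, so memory use stays constant however many records there are. The sketch below shows that logic in Python rather than COBOL. `read_records_reversed` and `record_length` are hypothetical names, and it assumes every record is exactly `record_length` bytes with no separators between records.

```python
import os
import tempfile
from typing import Iterator


def read_records_reversed(path: str, record_length: int) -> Iterator[bytes]:
    """Yield fixed-length records from last to first, one at a time."""
    size = os.path.getsize(path)
    if record_length <= 0 or size % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    with open(path, "rb") as handle:
        # Walk the record numbers downwards and seek to each record's offset.
        for record_number in range(size // record_length - 1, -1, -1):
            handle.seek(record_number * record_length)
            yield handle.read(record_length)


# Hypothetical sample file holding three 4-byte records.
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "sample.dat")
    with open(sample_path, "wb") as sample:
        sample.write(b"REC1REC2REC3")
    records = list(read_records_reversed(sample_path, 4))
print(records)
```

Only one record is held in memory at any moment, which is what makes this workable for files with millions of records.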
