Run LoadIncrementalHFiles from Java client

Run LoadIncrementalHFiles from Java client - hadoop

I want to call hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/myuser/map_data/hfiles mytable method from my Java client code.
When I run the application I get the following exception:
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file webhdfs://myserver.de:50070/user/myuser/map_data/hfiles/b/b22db8e263b74a7dbd8e36f9ccf16508
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:477)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:520)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:632)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:549)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:546)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getDecompressor(Compression.java:327)
at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:422)
at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:529)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlock(HFileBlock.java:1350)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlockWithBlockType(HFileBlock.java:1356)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV3.<init>(HFileReaderV3.java:77)
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:467)
... 8 more
Running the hbase ... command above from console on my Hadoop server works perfectly. But when I try to run these from my Java Code using HBase /Hadoop client libraries it fails with the exception!
Here a code snippet:
public static void main(String[] args) {
try {
Configuration conf = loginFromKeyTab("REALM.DE", "server.de", "user", "C:/user.keytab");
conf.set("fs.webhdfs.impl", org.apache.hadoop.hdfs.web.WebHdfsFileSystem.class.getName());
conf.set("hbase.zookeeper.quorum", "server1.de,server2.de,server3.de");
conf.set("zookeeper.znode.parent", "/hbase-secure");
conf.set("hbase.master.kerberos.principal", "hbase/_HOST#REALM.DE");
conf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST#REALM.DE");
conf.set("hbase.security.authentication", "kerberos");
Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("mytable"));
RegionLocator locator = connection.getRegionLocator(table.getName());
Job job = Job.getInstance(conf, "Test Bulk Load");
//HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
//Configuration conf2 = job.getConfiguration();
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path(HDFS_PATH), connection.getAdmin(), table, locator);
} catch(Exception e) {
e.printStackTrace();
}
}
Do I need to add a dependency to my project? But how / where / which version?
I'm working with HDP 2.5 which contains HBase 1.1.2 and Hadoop 2.7.3

I found another solution for my issue: I'm using a Java program that runs a Process instance that calls the LoadIncrementalHFiles method automatically (running directly on the Hadoop node), instead of using the LoadIncrementalHFiles class itself in my code!
Here the code snippet of my solution:
TreeSet<String> subDirs = getHFileDirectories(new Path(HDFS_OUTPUT_PATH), conf); // The HDFS_OUTPUT_PATH directory contains many HFile sub-directories
for(String hFileDir : subDirs) {
String pathToReadFrom = HDFS_OUTPUT_PATH + "/" + hFileDir;
String[] execCode = {"hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles", "-Dcreate.table=no", pathToReadFrom, "mytable"}; // Important: Separate each parameter here!!!
ProcessBuilder pb = new ProcessBuilder(execCode);
pb.redirectErrorStream(true);
final Process p = pb.start();
new Thread(new Runnable() {
public void run() {
BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = null;
try {
while ((line = input.readLine()) != null)
System.out.println(line);
} catch (IOException e) {
e.printStackTrace();
}
}
}).start();
p.waitFor();
int exitCode = p.exitValue();
System.out.println(" ==> Exit Code: " + exitCode);
}
System.out.println("Finished");
If somebody has another solution (e.g. how to use the LoadIncrementalHFiles class directly in code), let me know. Thank you!

Related

How to load H20 trained model in Java UDF

I am trying to load trained xgboost model to be used in custom UDF written in Java. File is in zip format and stored in hdfs.
I have tried to read it using Path class but it's not working.
import org.apache.hadoop.fs.Path;
public EasyPredictModelWrapper loadModel(String xgBoostModelFile) {
if (model == null) {
synchronized (_lockObject) {
if (model == null) {
log.info("Model has not been loaded, loading ...");
try {
Path path = new Path(xgBoostModelFile);
model = new EasyPredictModelWrapper(MojoModel.load(path)); // Doesn't compile since MojoModel only takes string as an input.
} catch (IOException e) {
log.error("Got an exception while trying to load xgBoostModel \n", e);
}
}
}
}
return model;
}
I Want to successfully load model.zip

Got answer in H20 slack community.
FileSystem fs = FileSystem.get(new Configuration());
Path path = new Path(xgBoostModelFile);
FSDataInputStream inputStream = fs.open(path);
MojoReaderBackend mojoReaderBackend = MojoReaderBackendFactory.createReaderBackend(inputStream,CachingStrategy.MEMORY);
model = new EasyPredictModelWrapper(MojoModel.load(mojoReaderBackend));

How to call .sh script from a JavaFX button`s action handler? [duplicate]

It is quite simple to run a Unix command from Java.
Runtime.getRuntime().exec(myCommand);
But is it possible to run a Unix shell script from Java code? If yes, would it be a good practice to run a shell script from within Java code?

You should really look at Process Builder. It is really built for this kind of thing.
ProcessBuilder pb = new ProcessBuilder("myshellScript.sh", "myArg1", "myArg2");
Map<String, String> env = pb.environment();
env.put("VAR1", "myValue");
env.remove("OTHERVAR");
env.put("VAR2", env.get("VAR1") + "suffix");
pb.directory(new File("myDir"));
Process p = pb.start();

You can use Apache Commons exec library also.
Example :
package testShellScript;
import java.io.IOException;
import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteException;
public class TestScript {
int iExitValue;
String sCommandString;
public void runScript(String command){
sCommandString = command;
CommandLine oCmdLine = CommandLine.parse(sCommandString);
DefaultExecutor oDefaultExecutor = new DefaultExecutor();
oDefaultExecutor.setExitValue(0);
try {
iExitValue = oDefaultExecutor.execute(oCmdLine);
} catch (ExecuteException e) {
System.err.println("Execution failed.");
e.printStackTrace();
} catch (IOException e) {
System.err.println("permission denied.");
e.printStackTrace();
}
}
public static void main(String args[]){
TestScript testScript = new TestScript();
testScript.runScript("sh /root/Desktop/testScript.sh");
}
}
For further reference, An example is given on Apache Doc also.

I think you have answered your own question with
Runtime.getRuntime().exec(myShellScript);
As to whether it is good practice... what are you trying to do with a shell script that you cannot do with Java?

I would say that it is not in the spirit of Java to run a shell script from Java. Java is meant to be cross platform, and running a shell script would limit its use to just UNIX.
With that said, it's definitely possible to run a shell script from within Java. You'd use exactly the same syntax you listed (I haven't tried it myself, but try executing the shell script directly, and if that doesn't work, execute the shell itself, passing the script in as a command line parameter).

Yes it is possible to do so. This worked out for me.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.omg.CORBA.portable.InputStream;
public static void readBashScript() {
try {
Process proc = Runtime.getRuntime().exec("/home/destino/workspace/JavaProject/listing.sh /"); //Whatever you want to execute
BufferedReader read = new BufferedReader(new InputStreamReader(
proc.getInputStream()));
try {
proc.waitFor();
} catch (InterruptedException e) {
System.out.println(e.getMessage());
}
while (read.ready()) {
System.out.println(read.readLine());
}
} catch (IOException e) {
System.out.println(e.getMessage());
}
}

Here is my example. Hope it make sense.
public static void excuteCommand(String filePath) throws IOException{
File file = new File(filePath);
if(!file.isFile()){
throw new IllegalArgumentException("The file " + filePath + " does not exist");
}
if(isLinux()){
Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", filePath}, null);
}else if(isWindows()){
Runtime.getRuntime().exec("cmd /c start " + filePath);
}
}
public static boolean isLinux(){
String os = System.getProperty("os.name");
return os.toLowerCase().indexOf("linux") >= 0;
}
public static boolean isWindows(){
String os = System.getProperty("os.name");
return os.toLowerCase().indexOf("windows") >= 0;
}

Yes, it is possible and you have answered it! About good practises, I think it is better to launch commands from files and not directly from your code. So you have to make Java execute the list of commands (or one command) in an existing .bat, .sh , .ksh ... files.
Here is an example of executing a list of commands in a file MyFile.sh:
String[] cmd = { "sh", "MyFile.sh", "\pathOfTheFile"};
Runtime.getRuntime().exec(cmd);

To avoid having to hardcode an absolute path, you can use the following method that will find and execute your script if it is in your root directory.
public static void runScript() throws IOException, InterruptedException {
ProcessBuilder processBuilder = new ProcessBuilder("./nameOfScript.sh");
//Sets the source and destination for subprocess standard I/O to be the same as those of the current Java process.
processBuilder.inheritIO();
Process process = processBuilder.start();
int exitValue = process.waitFor();
if (exitValue != 0) {
// check for errors
new BufferedInputStream(process.getErrorStream());
throw new RuntimeException("execution of script failed!");
}
}

As for me all things must be simple.
For running script just need to execute
new ProcessBuilder("pathToYourShellScript").start();

The ZT Process Executor library is an alternative to Apache Commons Exec. It has functionality to run commands, capturing their output, setting timeouts, etc.
I have not used it yet, but it looks reasonably well-documented.
An example from the documentation: Executing a command, pumping the stderr to a logger, returning the output as UTF8 string.
String output = new ProcessExecutor().command("java", "-version")
.redirectError(Slf4jStream.of(getClass()).asInfo())
.readOutput(true).execute()
.outputUTF8();
Its documentation lists the following advantages over Commons Exec:
Improved handling of streams
Reading/writing to streams
Redirecting stderr to stdout
Improved handling of timeouts
Improved checking of exit codes
Improved API
One liners for quite complex use cases
One liners to get process output into a String
Access to the Process object available
Support for async processes ( Future )
Improved logging with SLF4J API
Support for multiple processes

This is a late answer. However, I thought of putting the struggle I had to bear to get a shell script to be executed from a Spring-Boot application for future developers.
I was working in Spring-Boot and I was not able to find the file to be executed from my Java application and it was throwing FileNotFoundFoundException. I had to keep the file in the resources directory and had to set the file to be scanned in pom.xml while the application was being started like the following.
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>**/*.xml</include>
<include>**/*.properties</include>
<include>**/*.sh</include>
</includes>
</resource>
</resources>
After that I was having trouble executing the file and it was returning error code = 13, Permission Denied. Then I had to make the file executable by running this command - chmod u+x myShellScript.sh
Finally, I could execute the file using the following code snippet.
public void runScript() {
ProcessBuilder pb = new ProcessBuilder("src/main/resources/myFile.sh");
try {
Process p;
p = pb.start();
} catch (IOException e) {
e.printStackTrace();
}
}
Hope that solves someone's problem.

Here is an example how to run an Unix bash or Windows bat/cmd script from Java. Arguments can be passed on the script and output received from the script. The method accepts arbitrary number of arguments.
public static void runScript(String path, String... args) {
try {
String[] cmd = new String[args.length + 1];
cmd[0] = path;
int count = 0;
for (String s : args) {
cmd[++count] = args[count - 1];
}
Process process = Runtime.getRuntime().exec(cmd);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
try {
process.waitFor();
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
while (bufferedReader.ready()) {
System.out.println("Received from script: " + bufferedReader.readLine());
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
System.exit(1);
}
}
When running on Unix/Linux, the path must be Unix-like (with '/' as separator), when running on Windows - use '\'. Hier is an example of a bash script (test.sh) that receives arbitrary number of arguments and doubles every argument:
#!/bin/bash
counter=0
while [ $# -gt 0 ]
do
echo argument $((counter +=1)): $1
echo doubling argument $((counter)): $(($1+$1))
shift
done
When calling
runScript("path_to_script/test.sh", "1", "2")
on Unix/Linux, the output is:
Received from script: argument 1: 1
Received from script: doubling argument 1: 2
Received from script: argument 2: 2
Received from script: doubling argument 2: 4
Hier is a simple cmd Windows script test.cmd that counts number of input arguments:
#echo off
set a=0
for %%x in (%*) do Set /A a+=1
echo %a% arguments received
When calling the script on Windows
runScript("path_to_script\\test.cmd", "1", "2", "3")
The output is
Received from script: 3 arguments received

It is possible, just exec it as any other program. Just make sure your script has the proper #! (she-bang) line as the first line of the script, and make sure there are execute permissions on the file.
For example, if it is a bash script put #!/bin/bash at the top of the script, also chmod +x .
Also as for if it's good practice, no it's not, especially for Java, but if it saves you a lot of time porting a large script over, and you're not getting paid extra to do it ;) save your time, exec the script, and put the porting to Java on your long-term todo list.

I think with
System.getProperty("os.name");
Checking the operating system on can manage the shell/bash scrips if such are supported.
if there is need to make the code portable.

String scriptName = PATH+"/myScript.sh";
String commands[] = new String[]{scriptName,"myArg1", "myArg2"};
Runtime rt = Runtime.getRuntime();
Process process = null;
try{
process = rt.exec(commands);
process.waitFor();
}catch(Exception e){
e.printStackTrace();
}

Just the same thing that Solaris 5.10 it works like this ./batchstart.sh there is a trick I don´t know if your OS accept it use \\. batchstart.sh instead. This double slash may help.

for linux use
public static void runShell(String directory, String command, String[] args, Map<String, String> environment)
{
try
{
if(directory.trim().equals(""))
directory = "/";
String[] cmd = new String[args.length + 1];
cmd[0] = command;
int count = 1;
for(String s : args)
{
cmd[count] = s;
count++;
}
ProcessBuilder pb = new ProcessBuilder(cmd);
Map<String, String> env = pb.environment();
for(String s : environment.keySet())
env.put(s, environment.get(s));
pb.directory(new File(directory));
Process process = pb.start();
BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
BufferedWriter outputReader = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
int exitValue = process.waitFor();
if(exitValue != 0) // has errors
{
while(errReader.ready())
{
LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
}
}
else
{
while(inputReader.ready())
{
LogClass.log("Shell Result : " + inputReader.readLine(), LogClass.LogMode.LogAll);
}
}
}
catch(Exception e)
{
LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
}
}
public static void runShell(String path, String command, String[] args)
{
try
{
String[] cmd = new String[args.length + 1];
if(!path.trim().isEmpty())
cmd[0] = path + "/" + command;
else
cmd[0] = command;
int count = 1;
for(String s : args)
{
cmd[count] = s;
count++;
}
Process process = Runtime.getRuntime().exec(cmd);
BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
BufferedWriter outputReader = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
int exitValue = process.waitFor();
if(exitValue != 0) // has errors
{
while(errReader.ready())
{
LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
}
}
else
{
while(inputReader.ready())
{
LogClass.log("Shell Result: " + inputReader.readLine(), LogClass.LogMode.LogAll);
}
}
}
catch(Exception e)
{
LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
}
}
and for usage;
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"});
OR
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"}, new Hashmap<>());

How to write huge data from Java to HDFS

Our Java application generates huge data (long running program), but unable to store the data efficiently.
Public class HDFSWriter {
FSDataOutputStream out = null;
FileSystem fs = null;
Configuration conf = null;
static int linescounter = 0;
void CreateHDFSFile() {
Path filePath = new Path("filename.CSV");
conf = new Configuration();
fs = FileSystem.get(conf);
out = fs.create(filePath);
}
void writeHDFSFile(String csvLine) {
out.writeBytes(csvLine);
linescounter++;
if(linescounter>=500) {
linescounter=0;
out.writeBytes(csvLine);
//out.hsync();
//out.hflush();
}
}
void close() {
fs.close();
}
}
CreateHDFSFile method is called start of the program.
writeHDFSFile method is called for each line to insert into HDFS File.
close method is called at end of the program.
Even though I invoke hsync or hflush, data is not appearing in HDFS. It's appearing only after the complete program is completed i.e, after fs.close().
How to make the data available during the HDFS file is created or at every time interval or particular number of records?

distcp java api exit the application

i need to copy files between aws s3 and our local hdfs, i tried to use distcp java api but the problem with it is at the end of distcp it called System.exit(), which stopped my app too, so if i have multiple folders/files to copy and i used multiple threads, each thread perform a distcp command, the first thread who finish the distcp will stop the app, thus stop the rest of distcp, is there any other way to avoid this, i know i can write up my own MR job to do the copied but want to know if there other options
my code:
List<Future<Void>> calls = new ArrayList<Future<Void>>();
for (String dir : s3Dirs) {
final String[] args = new String[4];
args[0] = "-log";
args[1] = LOG_DIR;
args[2] = S3_DIR;
args[3] = LOCAL_HDFS_DIR
calls.add(_exec.submit(new Callable<Void>() {
#Override
public Void call() throws Exception {
try {
DistCp.main(args); <-- Distcp command
} catch (Exception e) {
System.out.println("Failed to copy files from " + args[2] + " to " + args[3]);
}
return null;
}
}));
}
for (Future<Void> f : calls) {
try {
f.get();
} catch (Exception e) {
LOGGER.error("Error while distcp", e);
}
}
Distcp main()
public static void main(String argv[]) {
int exitCode;
try {
DistCp distCp = new DistCp();
Cleanup CLEANUP = new Cleanup(distCp);
ShutdownHookManager.get().addShutdownHook(CLEANUP,
SHUTDOWN_HOOK_PRIORITY);
exitCode = ToolRunner.run(getDefaultConf(), distCp, argv);
}
catch (Exception e) {
LOG.error("Couldn't complete DistCp operation: ", e);
exitCode = DistCpConstants.UNKNOWN_ERROR;
}
System.exit(exitCode); <--- exit here
}

I have used distcp before and never faced the System.exit() problem, even with multiple threads. Try, instead of using the Distcp like that, using the ToolRunner to invoke a distcp call(like it is used on the Distcp Test cases from the hadoop tools package). The Distcp Test cases use the ToolRunner to run distcp and it allows you to run it with multiple threads. I am copying the code snippet from the above link here:
public void testCopyFromLocalToLocal() throws Exception {
Configuration conf = new Configuration();
FileSystem localfs = FileSystem.get(LOCAL_FS, conf);
MyFile[] files = createFiles(LOCAL_FS, TEST_ROOT_DIR+"/srcdat");
ToolRunner.run(new DistCp(new Configuration()),
new String[] {"file:///"+TEST_ROOT_DIR+"/srcdat",
"file:///"+TEST_ROOT_DIR+"/destdat"});
assertTrue("Source and destination directories do not match.",
checkFiles(localfs, TEST_ROOT_DIR+"/destdat", files));
deldir(localfs, TEST_ROOT_DIR+"/destdat");
deldir(localfs, TEST_ROOT_DIR+"/srcdat");
}

Bufferreader and Bufferwriter for reading and writing hdfs files

I'm trying to read from a hdfs file line by line and then create a hdfs file and write to it line by line. The code that I use looks like this:
Path FileToRead=new Path(inputPath);
FileSystem hdfs = FileToRead.getFileSystem(new Configuration());
FSDataInputStream fis = hdfs.open(FileToRead);
BufferedReader reader = new BufferedReader(new InputStreamReader(fis));
String line;
line = reader.readLine();
while (line != null){
String[] lineElem = line.split(",");
for(int i=0;i<10;i++){
MyMatrix[i][Integer.valueOf(lineElem[0])-1] = Double.valueOf(lineElem[i+1]);
}
line=reader.readLine();
}
reader.close();
fis.close();
Path FileToWrite = new Path(outputPath+"/V");
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream fileOut = fs.create(FileToWrite);
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(fileOut));
writer.write("check");
writer.close();
fileOut.close();
When I run this code in my outputPath file V has not been created. But if I replace the part for reading with the part for writing the file will be created and check is written into it.
Can anyone please help me understand how to use them correctly to be able to read first the whole file and then write to the file line by line?
I have also tried another code for reading from one file and writing to another one but the file will be created but there is nothing written into it!
I use sth like this:
hadoop jar main.jar program2.Main input output
Then in my first job I read from arg[0] and write to a file in args[1]+"/NewV" using map reduce classes and it works.
In my other class (non map reduce)I use args[1]+"/NewV" as input path and output+"/V_0" as output path (I pass these strings to constructor). here is the code for the class :
public class Init_V {
String inputPath, outputPath;
public Init_V(String inputPath, String outputPath) throws Exception {
this.inputPath = inputPath;
this.outputPath = outputPath;
try{
FileSystem fs = FileSystem.get(new Configuration());
Path FileToWrite = new Path(outputPath+"/V.txt");
Path FileToRead=new Path(inputPath);
BufferedWriter output = new BufferedWriter
(new OutputStreamWriter(fs.create(FileToWrite,
true)));
BufferedReader reader = new
BufferedReader(new InputStreamReader(fs.open(FileToRead)));
String data;
data = reader.readLine();
while ( data != null )
{
output.write(data);
data = reader.readLine();
}
reader.close();
output.close(); }catch(Exception e){
}
}
}

I think, you need to understand how hadoop works properly. In hadoop, many thing is done by the system, you are just giving input and output path, then they are opened and created by hadoop if the paths are valid. Check the following example;
public int run (String[] args) throws Exception{
if(args.length != 3){
System.err.println("Usage: MapReduce <input path> <output path> ");
ToolRunner.printGenericCommandUsage(System.err);
}
Job job = new Job();
job.setJarByClass(MyClass.class);
job.setNumReduceTasks(5);
job.setJobName("myclass");
FileInputFormat.addInputPath(job, new Path(args[0]) );
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
return job.waitForCompletion(true) ? 0:1 ;
}
/* ----------------------main---------------------*/
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new MyClass(), args);
System.exit(exitCode);
}
As you see here, you only initialize necessary variables and reading&writing is done by hadoop.
Also, in your Mapper class you are saying context.write(key, value) inside map, and similarly in your Reduce class you are doing same, it writes for you.
If you use BufferedWriter/Reader it will write to your local file system not to HDFS. To see files in HDFS you should write hadoop fs -ls <path>, the files you are looking by ls command are in your local file system
EDIT: In order to use read/write you should know the followings: Let say you have N machine in your hadoop network. When you want to read, you will not know which mapper is reading, similarly writing. So, all mappers and reducer should have those paths not to give exception.
I dont know if you could use any other class but you can use two methods for your specific reason: startup and cleanup. These methods are used only once in each map and reduce worker. So if you want to read and write you can use that files. Reading and writing is same as normal java code. For example, you want to see something for each key, and want to write it to a txt. You can do the following:
//in reducer
BufferedReader bw ..;
void startup(...){
bw = new ....;
}
void reduce(...){
while(iter.hasNext()){ ....;
}
bw.write(key, ...);
}
void cleanup(...){
bw.close();
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Run LoadIncrementalHFiles from Java client - hadoop

Related

How to load H20 trained model in Java UDF

How to call .sh script from a JavaFX button`s action handler? [duplicate]

How to write huge data from Java to HDFS

distcp java api exit the application

Bufferreader and Bufferwriter for reading and writing hdfs files

Categories

Resources