DistCp Java API exits the application - Hadoop

I need to copy files between AWS S3 and our local HDFS. I tried the DistCp Java API, but the problem is that at the end of the copy it calls System.exit(), which stops my application as well. So if I have multiple folders/files to copy and use multiple threads, each performing a DistCp command, the first thread to finish stops the whole app and therefore the remaining copies. Is there any way to avoid this? I know I could write my own MR job to do the copying, but I want to know if there are other options.
My code:
List<Future<Void>> calls = new ArrayList<Future<Void>>();
for (String dir : s3Dirs) {
    final String[] args = new String[4];
    args[0] = "-log";
    args[1] = LOG_DIR;
    args[2] = S3_DIR;
    args[3] = LOCAL_HDFS_DIR;
    calls.add(_exec.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            try {
                DistCp.main(args); // <-- DistCp command
            } catch (Exception e) {
                System.out.println("Failed to copy files from " + args[2] + " to " + args[3]);
            }
            return null;
        }
    }));
}
for (Future<Void> f : calls) {
    try {
        f.get();
    } catch (Exception e) {
        LOGGER.error("Error while distcp", e);
    }
}
DistCp's main():
public static void main(String argv[]) {
    int exitCode;
    try {
        DistCp distCp = new DistCp();
        Cleanup CLEANUP = new Cleanup(distCp);
        ShutdownHookManager.get().addShutdownHook(CLEANUP, SHUTDOWN_HOOK_PRIORITY);
        exitCode = ToolRunner.run(getDefaultConf(), distCp, argv);
    } catch (Exception e) {
        LOG.error("Couldn't complete DistCp operation: ", e);
        exitCode = DistCpConstants.UNKNOWN_ERROR;
    }
    System.exit(exitCode); // <--- exits here
}

I have used DistCp before and never hit the System.exit() problem, even with multiple threads. Instead of calling DistCp.main() like that, try invoking DistCp through ToolRunner (as the DistCp test cases in the hadoop-tools package do). The test cases run DistCp via ToolRunner, and that lets you run it from multiple threads. I am copying the code snippet from the above link here:
public void testCopyFromLocalToLocal() throws Exception {
    Configuration conf = new Configuration();
    FileSystem localfs = FileSystem.get(LOCAL_FS, conf);
    MyFile[] files = createFiles(LOCAL_FS, TEST_ROOT_DIR + "/srcdat");
    ToolRunner.run(new DistCp(new Configuration()),
        new String[] {"file:///" + TEST_ROOT_DIR + "/srcdat",
                      "file:///" + TEST_ROOT_DIR + "/destdat"});
    assertTrue("Source and destination directories do not match.",
        checkFiles(localfs, TEST_ROOT_DIR + "/destdat", files));
    deldir(localfs, TEST_ROOT_DIR + "/destdat");
    deldir(localfs, TEST_ROOT_DIR + "/srcdat");
}
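Applied to the multi-threaded loop from the question, the same idea looks roughly like this. This is only a sketch: it assumes the LOG_DIR/S3_DIR/LOCAL_HDFS_DIR constants from the question, and it uses the DistCp(Configuration) constructor shown in the test case above (newer DistCp versions take (Configuration, DistCpOptions) instead). ToolRunner.run returns an exit code rather than calling System.exit(), so one finished copy no longer kills the JVM:
// Sketch: run DistCp through ToolRunner inside each worker thread instead of DistCp.main().
// Assumes the LOG_DIR, S3_DIR and LOCAL_HDFS_DIR constants from the question; the DistCp
// constructor shown matches the test case above and varies by Hadoop version.
calls.add(_exec.submit(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        String[] args = new String[] { "-log", LOG_DIR, S3_DIR, LOCAL_HDFS_DIR };
        int rc = ToolRunner.run(new DistCp(new Configuration()), args);
        if (rc != 0) {
            System.out.println("Failed to copy files from " + args[2] + " to " + args[3]
                    + " (exit code " + rc + ")");
        }
        return null; // no System.exit(): the thread simply completes
    }
}));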

Related

spring batch error NoSuchJobException: No such job (either in registry or in historical data)

I am new to Spring Batch. I am trying to write code to restart uncompleted jobs, but I am getting the error below:
org.springframework.batch.core.launch.NoSuchJobException: No such
job (either in registry or in historical data)
Below is the code I've tried; could anyone please tell me what went wrong?
void restartUncompletedJobs() {
    try {
        String jobName = "job1";
        Job job = jobRegistry.getJob(jobName); // HERE GETTING EXCEPTION
        List<Long> jobInstances = jobOperator.getJobInstances(job.getName(), 0, 5);
        for (Long jobInstanceId : jobInstances) {
            Set<Long> jobRunningExecutions = jobOperator.getRunningExecutions(jobName);
            if (jobRunningExecutions.size() > 0) {
                jobOperator.startNextInstance(jobName);
            } else {
                jobOperator.restart(jobInstanceId);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
You need to populate the registry yourself, or register a JobRegistryBeanPostProcessor in your application context so that all jobs are added to the registry at startup.
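A minimal sketch of the second option with Java-based configuration (the bean method name is illustrative):
// Sketch: register a JobRegistryBeanPostProcessor so that every Job bean in the
// application context is added to the JobRegistry at startup.
// Assumes Java-based Spring configuration; the bean method name is illustrative.
@Bean
public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(JobRegistry jobRegistry) {
    JobRegistryBeanPostProcessor postProcessor = new JobRegistryBeanPostProcessor();
    postProcessor.setJobRegistry(jobRegistry);
    return postProcessor;
}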

Run LoadIncrementalHFiles from Java client

I want to invoke the equivalent of the command hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/myuser/map_data/hfiles mytable from my Java client code.
When I run the application I get the following exception:
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file webhdfs://myserver.de:50070/user/myuser/map_data/hfiles/b/b22db8e263b74a7dbd8e36f9ccf16508
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:477)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:520)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:632)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:549)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:546)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getDecompressor(Compression.java:327)
at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:422)
at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:529)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlock(HFileBlock.java:1350)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlockWithBlockType(HFileBlock.java:1356)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV3.<init>(HFileReaderV3.java:77)
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:467)
... 8 more
Running the hbase ... command above from the console on my Hadoop server works perfectly, but when I try to run it from my Java code using the HBase/Hadoop client libraries it fails with the exception above.
Here is a code snippet:
public static void main(String[] args) {
    try {
        Configuration conf = loginFromKeyTab("REALM.DE", "server.de", "user", "C:/user.keytab");
        conf.set("fs.webhdfs.impl", org.apache.hadoop.hdfs.web.WebHdfsFileSystem.class.getName());
        conf.set("hbase.zookeeper.quorum", "server1.de,server2.de,server3.de");
        conf.set("zookeeper.znode.parent", "/hbase-secure");
        conf.set("hbase.master.kerberos.principal", "hbase/_HOST@REALM.DE");
        conf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@REALM.DE");
        conf.set("hbase.security.authentication", "kerberos");

        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf("mytable"));
        RegionLocator locator = connection.getRegionLocator(table.getName());
        Job job = Job.getInstance(conf, "Test Bulk Load");
        //HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        //Configuration conf2 = job.getConfiguration();

        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(HDFS_PATH), connection.getAdmin(), table, locator);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Do I need to add a dependency to my project? If so, how/where, and which version?
I'm working with HDP 2.5, which contains HBase 1.1.2 and Hadoop 2.7.3.
I found another solution to my issue: instead of using the LoadIncrementalHFiles class directly in my code, my Java program (running directly on the Hadoop node) launches a Process that invokes the LoadIncrementalHFiles tool.
Here is the code snippet of my solution:
// The HDFS_OUTPUT_PATH directory contains many HFile sub-directories
TreeSet<String> subDirs = getHFileDirectories(new Path(HDFS_OUTPUT_PATH), conf);

for (String hFileDir : subDirs) {
    String pathToReadFrom = HDFS_OUTPUT_PATH + "/" + hFileDir;
    // Important: pass each parameter as a separate array element!
    String[] execCode = {"hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        "-Dcreate.table=no", pathToReadFrom, "mytable"};
    ProcessBuilder pb = new ProcessBuilder(execCode);
    pb.redirectErrorStream(true);
    final Process p = pb.start();

    new Thread(new Runnable() {
        public void run() {
            BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line = null;
            try {
                while ((line = input.readLine()) != null)
                    System.out.println(line);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }).start();

    p.waitFor();
    int exitCode = p.exitValue();
    System.out.println(" ==> Exit Code: " + exitCode);
}
System.out.println("Finished");
If somebody has another solution (e.g. how to use the LoadIncrementalHFiles class directly in code), let me know. Thank you!

How to call a .sh script from a JavaFX button's action handler? [duplicate]

It is quite simple to run a Unix command from Java.
Runtime.getRuntime().exec(myCommand);
But is it possible to run a Unix shell script from Java code? If yes, would it be a good practice to run a shell script from within Java code?
You should really look at ProcessBuilder. It is built for exactly this kind of thing.
ProcessBuilder pb = new ProcessBuilder("myshellScript.sh", "myArg1", "myArg2");
Map<String, String> env = pb.environment();
env.put("VAR1", "myValue");
env.remove("OTHERVAR");
env.put("VAR2", env.get("VAR1") + "suffix");
pb.directory(new File("myDir"));
Process p = pb.start();
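If you also need to wait for the script to finish and check its exit status, a small follow-up sketch (waitFor() throws InterruptedException, so the enclosing method must declare or catch it):
// Sketch: wait for the process started above to finish and report its exit code.
int exitCode = p.waitFor();
System.out.println("Script finished with exit code " + exitCode);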
You can also use the Apache Commons Exec library.
Example:
package testShellScript;

import java.io.IOException;
import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteException;

public class TestScript {
    int iExitValue;
    String sCommandString;

    public void runScript(String command) {
        sCommandString = command;
        CommandLine oCmdLine = CommandLine.parse(sCommandString);
        DefaultExecutor oDefaultExecutor = new DefaultExecutor();
        oDefaultExecutor.setExitValue(0);
        try {
            iExitValue = oDefaultExecutor.execute(oCmdLine);
        } catch (ExecuteException e) {
            System.err.println("Execution failed.");
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Permission denied.");
            e.printStackTrace();
        }
    }

    public static void main(String args[]) {
        TestScript testScript = new TestScript();
        testScript.runScript("sh /root/Desktop/testScript.sh");
    }
}
For further reference, an example is also given in the Apache documentation.
I think you have answered your own question with
Runtime.getRuntime().exec(myShellScript);
As to whether it is good practice... what are you trying to do with a shell script that you cannot do with Java?
I would say that it is not in the spirit of Java to run a shell script from Java. Java is meant to be cross platform, and running a shell script would limit its use to just UNIX.
With that said, it's definitely possible to run a shell script from within Java. You'd use exactly the same syntax you listed (I haven't tried it myself, but try executing the shell script directly, and if that doesn't work, execute the shell itself, passing the script in as a command line parameter).
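Both variants look roughly like this (a minimal sketch; the script path is illustrative and exec() throws IOException):
// Sketch: two common ways to launch a shell script (the path is illustrative).
// 1) Execute the script directly (needs execute permission and a #! line):
Process direct = Runtime.getRuntime().exec(new String[] {"/path/to/myShellScript.sh"});

// 2) Execute the shell itself and pass the script as an argument:
Process viaShell = Runtime.getRuntime().exec(new String[] {"/bin/sh", "/path/to/myShellScript.sh"});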
Yes, it is possible to do so. This worked for me:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public static void readBashScript() {
    try {
        // Whatever you want to execute
        Process proc = Runtime.getRuntime().exec("/home/destino/workspace/JavaProject/listing.sh /");
        BufferedReader read = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        try {
            proc.waitFor();
        } catch (InterruptedException e) {
            System.out.println(e.getMessage());
        }
        while (read.ready()) {
            System.out.println(read.readLine());
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
Here is my example. Hope it makes sense.
public static void executeCommand(String filePath) throws IOException {
    File file = new File(filePath);
    if (!file.isFile()) {
        throw new IllegalArgumentException("The file " + filePath + " does not exist");
    }
    if (isLinux()) {
        Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", filePath}, null);
    } else if (isWindows()) {
        Runtime.getRuntime().exec("cmd /c start " + filePath);
    }
}

public static boolean isLinux() {
    String os = System.getProperty("os.name");
    return os.toLowerCase().indexOf("linux") >= 0;
}

public static boolean isWindows() {
    String os = System.getProperty("os.name");
    return os.toLowerCase().indexOf("windows") >= 0;
}
Yes, it is possible, and you have answered it yourself! Regarding good practice, I think it is better to launch commands from files rather than directly from your code, i.e. have Java execute the list of commands (or a single command) in an existing .bat, .sh, .ksh, ... file.
Here is an example of executing a list of commands in a file MyFile.sh:
String[] cmd = { "sh", "MyFile.sh", "\pathOfTheFile"};
Runtime.getRuntime().exec(cmd);
To avoid having to hardcode an absolute path, you can use the following method that will find and execute your script if it is in your root directory.
public static void runScript() throws IOException, InterruptedException {
    ProcessBuilder processBuilder = new ProcessBuilder("./nameOfScript.sh");
    // Sets the source and destination for subprocess standard I/O to be the same
    // as those of the current Java process.
    processBuilder.inheritIO();
    Process process = processBuilder.start();
    int exitValue = process.waitFor();
    if (exitValue != 0) {
        // check for errors
        new BufferedInputStream(process.getErrorStream());
        throw new RuntimeException("execution of script failed!");
    }
}
As for me, all things must be simple. To run a script, you just need to execute:
new ProcessBuilder("pathToYourShellScript").start();
The ZT Process Executor library is an alternative to Apache Commons Exec. It has functionality to run commands, capturing their output, setting timeouts, etc.
I have not used it yet, but it looks reasonably well-documented.
An example from the documentation: execute a command, pump the stderr to a logger, and return the output as a UTF-8 string.
String output = new ProcessExecutor().command("java", "-version")
.redirectError(Slf4jStream.of(getClass()).asInfo())
.readOutput(true).execute()
.outputUTF8();
Its documentation lists the following advantages over Commons Exec:
Improved handling of streams
Reading/writing to streams
Redirecting stderr to stdout
Improved handling of timeouts
Improved checking of exit codes
Improved API
One liners for quite complex use cases
One liners to get process output into a String
Access to the Process object available
Support for async processes ( Future )
Improved logging with SLF4J API
Support for multiple processes
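For the async case, the zt-exec documentation shows a pattern roughly like the following (a sketch; verify the exact API against the README of the version you use):
// Sketch of the async usage described in the zt-exec documentation
// (check the README of your version for the exact API).
Future<ProcessResult> future = new ProcessExecutor()
        .command("java", "-version")
        .readOutput(true)
        .start()
        .getFuture();
String output = future.get(60, TimeUnit.SECONDS).outputUTF8();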
This is a late answer; however, I thought I would document the struggle I went through to get a shell script executed from a Spring Boot application, for future developers.
I was working in Spring Boot, and my Java application was not able to find the file to execute, throwing a FileNotFoundException. I had to keep the file in the resources directory and configure it to be included in pom.xml, like the following, so it was available when the application started.
<resources>
    <resource>
        <directory>src/main/resources</directory>
        <filtering>true</filtering>
        <includes>
            <include>**/*.xml</include>
            <include>**/*.properties</include>
            <include>**/*.sh</include>
        </includes>
    </resource>
</resources>
After that, I was having trouble executing the file; it was returning error code 13, Permission Denied. I had to make the file executable by running this command: chmod u+x myShellScript.sh
Finally, I could execute the file using the following code snippet.
public void runScript() {
    ProcessBuilder pb = new ProcessBuilder("src/main/resources/myFile.sh");
    try {
        Process p = pb.start();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Hope that solves someone's problem.
Here is an example of how to run a Unix bash or Windows bat/cmd script from Java. Arguments can be passed to the script and output received from it. The method accepts an arbitrary number of arguments.
public static void runScript(String path, String... args) {
    try {
        String[] cmd = new String[args.length + 1];
        cmd[0] = path;
        int count = 0;
        for (String s : args) {
            cmd[++count] = s;
        }
        Process process = Runtime.getRuntime().exec(cmd);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        try {
            process.waitFor();
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
        while (bufferedReader.ready()) {
            System.out.println("Received from script: " + bufferedReader.readLine());
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
        System.exit(1);
    }
}
When running on Unix/Linux, the path must be Unix-like (with '/' as separator); when running on Windows, use '\'. Here is an example of a bash script (test.sh) that receives an arbitrary number of arguments and doubles every argument:
#!/bin/bash
counter=0
while [ $# -gt 0 ]
do
echo argument $((counter +=1)): $1
echo doubling argument $((counter)): $(($1+$1))
shift
done
When calling
runScript("path_to_script/test.sh", "1", "2")
on Unix/Linux, the output is:
Received from script: argument 1: 1
Received from script: doubling argument 1: 2
Received from script: argument 2: 2
Received from script: doubling argument 2: 4
Here is a simple Windows cmd script (test.cmd) that counts the number of input arguments:
@echo off
set a=0
for %%x in (%*) do Set /A a+=1
echo %a% arguments received
When calling the script on Windows
runScript("path_to_script\\test.cmd", "1", "2", "3")
The output is
Received from script: 3 arguments received
It is possible; just exec it as you would any other program. Just make sure your script has the proper #! (shebang) line as the first line, and that the file has execute permissions.
For example, if it is a bash script, put #!/bin/bash at the top of the script and also run chmod +x on it.
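If you prefer to handle the permission bit from Java rather than running chmod manually, something like this sketch works with the standard JDK (the script path is illustrative):
// Sketch: ensure the script has execute permission before exec'ing it
// (the path is illustrative; exec() throws IOException).
File script = new File("/path/to/myScript.sh");
if (!script.canExecute()) {
    script.setExecutable(true); // roughly the owner-level equivalent of chmod +x
}
Process p = Runtime.getRuntime().exec(script.getAbsolutePath());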
As for whether it's good practice: no, it's not, especially for Java. But if it saves you a lot of time porting a large script over, and you're not getting paid extra to do it ;) save your time, exec the script, and put porting it to Java on your long-term to-do list.
If there is a need to make the code portable, I think you can check the operating system with
System.getProperty("os.name");
and decide whether shell/bash scripts are supported.
String scriptName = PATH + "/myScript.sh";
String commands[] = new String[] {scriptName, "myArg1", "myArg2"};
Runtime rt = Runtime.getRuntime();
Process process = null;
try {
    process = rt.exec(commands);
    process.waitFor();
} catch (Exception e) {
    e.printStackTrace();
}
The same thing works on Solaris 5.10 like this: ./batchstart.sh. There is a trick, though I don't know whether your OS accepts it: use \\. batchstart.sh instead. The double slash may help.
For Linux, use:
public static void runShell(String directory, String command, String[] args, Map<String, String> environment) {
    try {
        if (directory.trim().equals(""))
            directory = "/";
        String[] cmd = new String[args.length + 1];
        cmd[0] = command;
        int count = 1;
        for (String s : args) {
            cmd[count] = s;
            count++;
        }
        ProcessBuilder pb = new ProcessBuilder(cmd);
        Map<String, String> env = pb.environment();
        for (String s : environment.keySet())
            env.put(s, environment.get(s));
        pb.directory(new File(directory));
        Process process = pb.start();
        BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        BufferedWriter outputReader = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
        int exitValue = process.waitFor();
        if (exitValue != 0) { // has errors
            while (errReader.ready()) {
                LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
            }
        } else {
            while (inputReader.ready()) {
                LogClass.log("Shell Result : " + inputReader.readLine(), LogClass.LogMode.LogAll);
            }
        }
    } catch (Exception e) {
        LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
    }
}

public static void runShell(String path, String command, String[] args) {
    try {
        String[] cmd = new String[args.length + 1];
        if (!path.trim().isEmpty())
            cmd[0] = path + "/" + command;
        else
            cmd[0] = command;
        int count = 1;
        for (String s : args) {
            cmd[count] = s;
            count++;
        }
        Process process = Runtime.getRuntime().exec(cmd);
        BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        BufferedWriter outputReader = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
        int exitValue = process.waitFor();
        if (exitValue != 0) { // has errors
            while (errReader.ready()) {
                LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
            }
        } else {
            while (inputReader.ready()) {
                LogClass.log("Shell Result: " + inputReader.readLine(), LogClass.LogMode.LogAll);
            }
        }
    } catch (Exception e) {
        LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
    }
}
And for usage:
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"});
OR
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"}, new HashMap<>());

How to write huge data from Java to HDFS

Our Java application (a long-running program) generates huge amounts of data, but we are unable to store it efficiently.
public class HDFSWriter {
    FSDataOutputStream out = null;
    FileSystem fs = null;
    Configuration conf = null;
    static int linescounter = 0;

    void CreateHDFSFile() throws IOException {
        Path filePath = new Path("filename.CSV");
        conf = new Configuration();
        fs = FileSystem.get(conf);
        out = fs.create(filePath);
    }

    void writeHDFSFile(String csvLine) throws IOException {
        out.writeBytes(csvLine);
        linescounter++;
        if (linescounter >= 500) {
            linescounter = 0;
            // intended flush point every 500 lines
            //out.hsync();
            //out.hflush();
        }
    }

    void close() throws IOException {
        fs.close();
    }
}
The CreateHDFSFile method is called at the start of the program.
The writeHDFSFile method is called for each line to insert into the HDFS file.
The close method is called at the end of the program.
Even though I invoke hsync or hflush, the data does not appear in HDFS; it appears only after the whole program has completed, i.e., after fs.close().
How can I make the data available while the HDFS file is still being written, either at a time interval or after a particular number of records?
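For reference, a minimal sketch of the flush pattern described above, with hflush() actually invoked inside the threshold check rather than left commented out (hflush makes buffered data visible to new readers before close; hsync additionally persists it to disk):
// Sketch only (not a tested fix): call hflush()/hsync() at the 500-line threshold
// so buffered data becomes visible to readers before the stream is closed.
void writeHDFSFile(String csvLine) throws IOException {
    out.writeBytes(csvLine);
    linescounter++;
    if (linescounter >= 500) {
        linescounter = 0;
        out.hflush();   // make the data visible to new readers every ~500 lines
        // out.hsync(); // stronger guarantee: also persist to the datanode disks
    }
}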

Hadoop dir/file last modification times

Is there a way to get the last-modified times of all directories and files in HDFS? I want to create a page that displays the information, but I have no clue how to go about collecting the last modification times into one .txt file.
See if this helps:
public class HdfsDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/core-site.xml"));
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Enter the directory name : ");
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        Path path = new Path(br.readLine());
        displayDirectoryContents(fs, path);
        fs.close();
    }

    private static void displayDirectoryContents(FileSystem fs, Path rootDir) {
        try {
            FileStatus[] status = fs.listStatus(rootDir);
            for (FileStatus file : status) {
                if (file.isDir()) {
                    System.out.println("DIRECTORY : " + file.getPath() + " - Last modification time : " + file.getModificationTime());
                    displayDirectoryContents(fs, file.getPath());
                } else {
                    System.out.println("FILE : " + file.getPath() + " - Last modification time : " + file.getModificationTime());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
One thing to note, though: getModificationTime() returns the modification time of the file in milliseconds since January 1, 1970 UTC.
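If the page should show a human-readable timestamp, the epoch milliseconds can be converted with standard JDK classes (a small sketch, assuming Java 8+ and the file loop variable from the snippet above):
// Sketch: convert the epoch milliseconds returned by getModificationTime()
// into a readable timestamp (assumes Java 8+ and the 'file' loop variable above).
long modTime = file.getModificationTime();
String readable = java.time.Instant.ofEpochMilli(modTime)
        .atZone(java.time.ZoneId.systemDefault())
        .format(java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
System.out.println("Last modified: " + readable);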
You probably have to iterate through the files and directories to get the status of each path. You can use the code below (just a sample), but I'm not sure how efficient that would be if you have a large set of files and directories.
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://<namenode_ip_address>:<port>");
conf.set("mapred.job.tracker", "<jobtracker_ip_address>:<port>");
conf.setBoolean("fs.hdfs.impl.disable.cache", true);
FileSystem fs = FileSystem.get(conf);
fs.getFileStatus(new Path("/your/path")).getModificationTime();
hadoop fs -stat
See the stat command in the Hadoop FileSystem shell documentation:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#stat
