I want to call the hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/myuser/map_data/hfiles mytable command from my Java client code.
When I run the application I get the following exception:
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file webhdfs://myserver.de:50070/user/myuser/map_data/hfiles/b/b22db8e263b74a7dbd8e36f9ccf16508
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:477)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:520)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:632)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:549)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:546)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getDecompressor(Compression.java:327)
at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:422)
at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:529)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlock(HFileBlock.java:1350)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlockWithBlockType(HFileBlock.java:1356)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV3.<init>(HFileReaderV3.java:77)
at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:467)
... 8 more
Running the hbase ... command above from the console on my Hadoop server works perfectly, but when I try to run it from my Java code using the HBase/Hadoop client libraries, it fails with the exception above.
Here is a code snippet:
public static void main(String[] args) {
    try {
        Configuration conf = loginFromKeyTab("REALM.DE", "server.de", "user", "C:/user.keytab");
        conf.set("fs.webhdfs.impl", org.apache.hadoop.hdfs.web.WebHdfsFileSystem.class.getName());
        conf.set("hbase.zookeeper.quorum", "server1.de,server2.de,server3.de");
        conf.set("zookeeper.znode.parent", "/hbase-secure");
        conf.set("hbase.master.kerberos.principal", "hbase/_HOST@REALM.DE");
        conf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@REALM.DE");
        conf.set("hbase.security.authentication", "kerberos");

        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf("mytable"));
        RegionLocator locator = connection.getRegionLocator(table.getName());
        Job job = Job.getInstance(conf, "Test Bulk Load");
        //HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        //Configuration conf2 = job.getConfiguration();

        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(HDFS_PATH), connection.getAdmin(), table, locator);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Do I need to add a dependency to my project? If so, how / where / which version?
I'm working with HDP 2.5, which contains HBase 1.1.2 and Hadoop 2.7.3.
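Edit: the "Caused by" part of the trace is the check in SnappyCodec.checkNativeCodeLoaded() failing on the client side. Here is a minimal diagnostic sketch (assuming the Hadoop 2.7 client jars are on the classpath) that reproduces just that check:
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.NativeCodeLoader;

public class SnappyCheck {
    public static void main(String[] args) {
        // true only if libhadoop was found and loaded by this JVM
        System.out.println("libhadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        try {
            // throws the same RuntimeException seen in the bulk-load stack trace
            // when libhadoop was built without snappy support (or is missing)
            SnappyCodec.checkNativeCodeLoaded();
            System.out.println("snappy: available");
        } catch (RuntimeException e) {
            System.out.println("snappy: NOT available - " + e.getMessage());
        }
    }
}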
I found another solution for my issue: instead of using the LoadIncrementalHFiles class itself in my code, I use a Java program (running directly on the Hadoop node) that starts a Process which calls the LoadIncrementalHFiles tool.
Here is the code snippet of my solution:
TreeSet<String> subDirs = getHFileDirectories(new Path(HDFS_OUTPUT_PATH), conf); // The HDFS_OUTPUT_PATH directory contains many HFile sub-directories

for (String hFileDir : subDirs) {
    String pathToReadFrom = HDFS_OUTPUT_PATH + "/" + hFileDir;
    // Important: pass each parameter as a separate array element!
    String[] execCode = {"hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles", "-Dcreate.table=no", pathToReadFrom, "mytable"};
    ProcessBuilder pb = new ProcessBuilder(execCode);
    pb.redirectErrorStream(true);
    final Process p = pb.start();

    new Thread(new Runnable() {
        public void run() {
            BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line = null;
            try {
                while ((line = input.readLine()) != null)
                    System.out.println(line);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }).start();

    p.waitFor();
    int exitCode = p.exitValue();
    System.out.println(" ==> Exit Code: " + exitCode);
}
System.out.println("Finished");
If somebody has another solution (e.g. how to use the LoadIncrementalHFiles class directly in code), let me know. Thank you!
It is quite simple to run a Unix command from Java.
Runtime.getRuntime().exec(myCommand);
But is it possible to run a Unix shell script from Java code? If yes, would it be a good practice to run a shell script from within Java code?
You should really look at ProcessBuilder; it is built for exactly this kind of thing.
ProcessBuilder pb = new ProcessBuilder("myshellScript.sh", "myArg1", "myArg2");
Map<String, String> env = pb.environment();
env.put("VAR1", "myValue");
env.remove("OTHERVAR");
env.put("VAR2", env.get("VAR1") + "suffix");
pb.directory(new File("myDir"));
Process p = pb.start();
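One thing this snippet doesn't show is consuming the process output; if the script prints a lot and nothing reads the stream, the child can block on a full pipe buffer. A hedged continuation (reusing p from above; calling pb.redirectErrorStream(true) before start() would merge stderr into the same stream):
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line); // or log it
}
int exitCode = p.waitFor(); // 0 conventionally means success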
You can also use the Apache Commons Exec library.
Example:
package testShellScript;

import java.io.IOException;
import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteException;

public class TestScript {
    int iExitValue;
    String sCommandString;

    public void runScript(String command) {
        sCommandString = command;
        CommandLine oCmdLine = CommandLine.parse(sCommandString);
        DefaultExecutor oDefaultExecutor = new DefaultExecutor();
        oDefaultExecutor.setExitValue(0);
        try {
            iExitValue = oDefaultExecutor.execute(oCmdLine);
        } catch (ExecuteException e) {
            System.err.println("Execution failed.");
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Permission denied.");
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        TestScript testScript = new TestScript();
        testScript.runScript("sh /root/Desktop/testScript.sh");
    }
}
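Commons Exec can also kill a runaway script via a watchdog. A small hedged sketch that would go inside runScript before the execute() call (the 60000 ms timeout is an arbitrary example value; ExecuteWatchdog lives in org.apache.commons.exec):
ExecuteWatchdog oWatchdog = new ExecuteWatchdog(60000); // kill the process after 60 seconds
oDefaultExecutor.setWatchdog(oWatchdog);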
For further reference, an example is also given in the Apache documentation.
I think you have answered your own question with
Runtime.getRuntime().exec(myShellScript);
As to whether it is good practice... what are you trying to do with a shell script that you cannot do with Java?
I would say that it is not in the spirit of Java to run a shell script from Java. Java is meant to be cross platform, and running a shell script would limit its use to just UNIX.
With that said, it's definitely possible to run a shell script from within Java. You'd use exactly the same syntax you listed (I haven't tried it myself, but try executing the shell script directly, and if that doesn't work, execute the shell itself, passing the script in as a command line parameter).
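In code, those two variants look roughly like this (a sketch; script.sh is a placeholder name):
// Variant 1: execute the script directly (requires a shebang line and execute permission)
Process direct = Runtime.getRuntime().exec(new String[] { "./script.sh" });

// Variant 2: execute the shell itself, passing the script as a parameter
Process viaShell = Runtime.getRuntime().exec(new String[] { "/bin/sh", "script.sh" });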
Yes, it is possible to do so. This worked for me.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public static void readBashScript() {
    try {
        // Whatever you want to execute
        Process proc = Runtime.getRuntime().exec("/home/destino/workspace/JavaProject/listing.sh /");
        BufferedReader read = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        // Read the output as it is produced, so the process cannot block on a full pipe.
        String line;
        while ((line = read.readLine()) != null) {
            System.out.println(line);
        }
        try {
            proc.waitFor();
        } catch (InterruptedException e) {
            System.out.println(e.getMessage());
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
Here is my example. Hope it makes sense.
public static void executeCommand(String filePath) throws IOException {
    File file = new File(filePath);
    if (!file.isFile()) {
        throw new IllegalArgumentException("The file " + filePath + " does not exist");
    }
    if (isLinux()) {
        Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", filePath}, null);
    } else if (isWindows()) {
        Runtime.getRuntime().exec("cmd /c start " + filePath);
    }
}

public static boolean isLinux() {
    String os = System.getProperty("os.name");
    return os.toLowerCase().contains("linux");
}

public static boolean isWindows() {
    String os = System.getProperty("os.name");
    return os.toLowerCase().contains("windows");
}
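Usage would then be something like this (the path is just an illustration):
executeCommand("/home/me/scripts/backup.sh");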
Yes, it is possible, and you have answered it! About good practices: I think it is better to launch commands from files rather than directly from your code, so you make Java execute the list of commands (or one command) in an existing .bat, .sh, .ksh, ... file.
Here is an example of executing a list of commands in a file MyFile.sh:
String[] cmd = { "sh", "MyFile.sh", "/pathOfTheFile" };
Runtime.getRuntime().exec(cmd);
To avoid having to hardcode an absolute path, you can use the following method that will find and execute your script if it is in your root directory.
public static void runScript() throws IOException, InterruptedException {
    ProcessBuilder processBuilder = new ProcessBuilder("./nameOfScript.sh");
    // Sets the source and destination for subprocess standard I/O to be the same as those of the current Java process.
    processBuilder.inheritIO();
    Process process = processBuilder.start();
    int exitValue = process.waitFor();
    if (exitValue != 0) {
        // with inheritIO() the error output has already gone to this process's stderr
        throw new RuntimeException("execution of script failed!");
    }
}
For me, all things must be simple.
To run a script, you just need to execute:
new ProcessBuilder("pathToYourShellScript").start();
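One caveat: if the script produces a lot of output and nothing consumes it, the child process can block on a full pipe buffer. A simple hedged variant that avoids this by reusing the parent's stdout/stderr:
Process p = new ProcessBuilder("pathToYourShellScript")
        .inheritIO() // child writes straight to this JVM's console
        .start();
p.waitFor();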
The ZT Process Executor library is an alternative to Apache Commons Exec. It has functionality for running commands, capturing their output, setting timeouts, etc.
I have not used it yet, but it looks reasonably well-documented.
An example from the documentation: executing a command, pumping the stderr to a logger, and returning the output as a UTF-8 string.
String output = new ProcessExecutor().command("java", "-version")
.redirectError(Slf4jStream.of(getClass()).asInfo())
.readOutput(true).execute()
.outputUTF8();
Its documentation lists the following advantages over Commons Exec:
Improved handling of streams
Reading/writing to streams
Redirecting stderr to stdout
Improved handling of timeouts
Improved checking of exit codes
Improved API
One liners for quite complex use cases
One liners to get process output into a String
Access to the Process object available
Support for asynchronous processes (Future)
Improved logging with SLF4J API
Support for multiple processes
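For instance, the timeout support from that list looks roughly like this (a sketch; 60 seconds is an arbitrary value, TimeUnit is java.util.concurrent.TimeUnit):
new ProcessExecutor().command("path_to_script/test.sh")
        .timeout(60, TimeUnit.SECONDS) // throws TimeoutException if the script runs too long
        .execute();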
This is a late answer; however, I thought I would document the struggle I went through to get a shell script executed from a Spring Boot application, for future developers.
I was working in Spring Boot, my Java application was not able to find the file to execute, and it was throwing a FileNotFoundException. I had to keep the file in the resources directory and mark it to be included in pom.xml, like the following:
<resources>
    <resource>
        <directory>src/main/resources</directory>
        <filtering>true</filtering>
        <includes>
            <include>**/*.xml</include>
            <include>**/*.properties</include>
            <include>**/*.sh</include>
        </includes>
    </resource>
</resources>
After that I was having trouble executing the file; it was returning error code 13, Permission Denied. I had to make the file executable by running this command: chmod u+x myShellScript.sh
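If you prefer to set the permission from Java rather than running chmod by hand, a minimal sketch using the standard java.io.File API:
File script = new File("src/main/resources/myFile.sh");
// true, true = executable for the owner only, roughly the equivalent of chmod u+x
script.setExecutable(true, true);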
Finally, I could execute the file using the following code snippet.
public void runScript() {
    ProcessBuilder pb = new ProcessBuilder("src/main/resources/myFile.sh");
    try {
        Process p = pb.start();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Hope that solves someone's problem.
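One caveat I should add (an assumption on my part, not something from the original setup): once the application is packaged as a jar, src/main/resources/myFile.sh is no longer a plain file on disk, so a common workaround is to copy the classpath resource to a temporary file first. A sketch (MyClass is a hypothetical class in the application; uses java.nio.file.Files):
InputStream in = MyClass.class.getResourceAsStream("/myFile.sh");
File tmp = File.createTempFile("myFile", ".sh");
// copy the script out of the jar so the OS can execute it
Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
tmp.setExecutable(true);
new ProcessBuilder(tmp.getAbsolutePath()).inheritIO().start().waitFor();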
Here is an example of how to run a Unix bash or Windows bat/cmd script from Java. Arguments can be passed to the script and output received from the script. The method accepts an arbitrary number of arguments.
public static void runScript(String path, String... args) {
    try {
        String[] cmd = new String[args.length + 1];
        cmd[0] = path;
        for (int i = 0; i < args.length; i++) {
            cmd[i + 1] = args[i];
        }
        Process process = Runtime.getRuntime().exec(cmd);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        // Read the output as it is produced, so none of it is missed.
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println("Received from script: " + line);
        }
        try {
            process.waitFor();
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
        System.exit(1);
    }
}
When running on Unix/Linux, the path must be Unix-like (with '/' as separator); when running on Windows, use '\'. Here is an example of a bash script (test.sh) that receives an arbitrary number of arguments and doubles every argument:
#!/bin/bash
counter=0
while [ $# -gt 0 ]
do
echo argument $((counter +=1)): $1
echo doubling argument $((counter)): $(($1+$1))
shift
done
When calling
runScript("path_to_script/test.sh", "1", "2")
on Unix/Linux, the output is:
Received from script: argument 1: 1
Received from script: doubling argument 1: 2
Received from script: argument 2: 2
Received from script: doubling argument 2: 4
Here is a simple Windows cmd script (test.cmd) that counts the number of input arguments:
@echo off
set a=0
for %%x in (%*) do Set /A a+=1
echo %a% arguments received
When calling the script on Windows
runScript("path_to_script\\test.cmd", "1", "2", "3")
The output is
Received from script: 3 arguments received
It is possible; just exec it as you would any other program. Just make sure your script has the proper #! (shebang) line as the first line of the script, and that there are execute permissions on the file.
For example, if it is a bash script, put #!/bin/bash at the top of the script, and also run chmod +x on it.
As for whether it's good practice: no, it's not, especially for Java. But if it saves you a lot of time porting a large script over, and you're not getting paid extra to do it ;) save your time, exec the script, and put the porting to Java on your long-term todo list.
I think that by checking the operating system with System.getProperty("os.name"), you can decide whether shell/bash scripts are supported, in case there is a need to make the code portable.
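Roughly, such a portable dispatch might look like this (a sketch; the script names are placeholders):
String os = System.getProperty("os.name").toLowerCase();
String[] cmd;
if (os.contains("windows")) {
    cmd = new String[] { "cmd", "/c", "myScript.bat" };
} else {
    cmd = new String[] { "/bin/sh", "myScript.sh" };
}
Runtime.getRuntime().exec(cmd);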
String scriptName = PATH + "/myScript.sh";
String[] commands = new String[] { scriptName, "myArg1", "myArg2" };
Runtime rt = Runtime.getRuntime();
Process process = null;
try {
    process = rt.exec(commands);
    process.waitFor();
} catch (Exception e) {
    e.printStackTrace();
}
One more note: on Solaris 5.10 it works like this: ./batchstart.sh. There is a trick, though I don't know whether your OS accepts it: use \\. batchstart.sh instead. The double slash may help.
For Linux, use:
public static void runShell(String directory, String command, String[] args, Map<String, String> environment) {
    try {
        if (directory.trim().equals(""))
            directory = "/";
        String[] cmd = new String[args.length + 1];
        cmd[0] = command;
        int count = 1;
        for (String s : args) {
            cmd[count] = s;
            count++;
        }
        ProcessBuilder pb = new ProcessBuilder(cmd);
        Map<String, String> env = pb.environment();
        for (String s : environment.keySet())
            env.put(s, environment.get(s));
        pb.directory(new File(directory));
        Process process = pb.start();
        BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
        int exitValue = process.waitFor();
        if (exitValue != 0) { // has errors
            while (errReader.ready()) {
                LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
            }
        } else {
            while (inputReader.ready()) {
                LogClass.log("Shell Result : " + inputReader.readLine(), LogClass.LogMode.LogAll);
            }
        }
    } catch (Exception e) {
        LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
    }
}
public static void runShell(String path, String command, String[] args) {
    try {
        String[] cmd = new String[args.length + 1];
        if (!path.trim().isEmpty())
            cmd[0] = path + "/" + command;
        else
            cmd[0] = command;
        int count = 1;
        for (String s : args) {
            cmd[count] = s;
            count++;
        }
        Process process = Runtime.getRuntime().exec(cmd);
        BufferedReader inputReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        BufferedReader errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
        int exitValue = process.waitFor();
        if (exitValue != 0) { // has errors
            while (errReader.ready()) {
                LogClass.log("ErrShell: " + errReader.readLine(), LogClass.LogMode.LogAll);
            }
        } else {
            while (inputReader.ready()) {
                LogClass.log("Shell Result: " + inputReader.readLine(), LogClass.LogMode.LogAll);
            }
        }
    } catch (Exception e) {
        LogClass.log("Err: RunShell, " + e.toString(), LogClass.LogMode.LogAll);
    }
}
And for usage:
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"});
OR
ShellAssistance.runShell("", "pg_dump", new String[]{"-U", "aliAdmin", "-f", "/home/Backup.sql", "StoresAssistanceDB"}, new HashMap<>());
Is there a way to get the last modified times of all directories and files in HDFS? I want to create a page that displays the information, but I have no clue how to go about getting the last modification times, all in one .txt file.
See if this helps:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/core-site.xml"));
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Enter the directory name : ");
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        Path path = new Path(br.readLine());
        displayDirectoryContents(fs, path);
        fs.close();
    }

    private static void displayDirectoryContents(FileSystem fs, Path rootDir) {
        try {
            FileStatus[] status = fs.listStatus(rootDir);
            for (FileStatus file : status) {
                if (file.isDir()) {
                    System.out.println("DIRECTORY : " + file.getPath() + " - Last modification time : " + file.getModificationTime());
                    displayDirectoryContents(fs, file.getPath());
                } else {
                    System.out.println("FILE : " + file.getPath() + " - Last modification time : " + file.getModificationTime());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
One thing to note, though: getModificationTime() returns the modification time of the file in milliseconds since January 1, 1970 UTC.
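So to display it as a date you still need a small conversion, for example (a minimal sketch):
long mtime = file.getModificationTime(); // epoch milliseconds
String readable = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new java.util.Date(mtime));
System.out.println(readable);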
You probably have to iterate through the files and directories to get the status of each path. You can use the code below (just a sample), but I'm not sure how efficient it would be if you have a large set of files and directories.
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://<namenode_ip_address>:<port>");
conf.set("mapred.job.tracker", "<jobtracker_ip_address>:<port>");
conf.setBoolean("fs.hdfs.impl.disable.cache", true);
FileSystem fs = FileSystem.get(conf);
fs.getFileStatus(new Path("/your/path")).getModificationTime();
You can also use the hadoop fs -stat command; see the FileSystem shell documentation:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#stat
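For example (hedged: the exact format options may vary by Hadoop version; %y prints the modification date and %n the name):
hadoop fs -stat "%y %n" /user/myuser/map_data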