I am building a Kafka consumer app which needs SASL_SSL configuration. Somehow Apache Kafka does not recognize a truststore file located on the classpath, and it looks like there is an open enhancement request for this in Kafka (KAFKA-7685).
In the meantime, what would be the best way to solve this problem? The same app needs to be deployed to PCF too, so the solution should work both during local Windows-based development and on PCF (Linux).
Any solution would be highly appreciated.
Here is the code which copies the truststore from the classpath to the Java temp dir:
// ResourceUtils is org.springframework.util.ResourceUtils; FileUtils is org.apache.commons.io.FileUtils
String tempDirPath = System.getProperty("java.io.tmpdir");
System.out.println("Temp dir : " + tempDirPath);
// Note: ResourceUtils.getFile("classpath:...") only works while the resource is a plain file (e.g. in the IDE); it fails once packaged inside a jar
File truststoreConf = ResourceUtils.getFile("classpath:Truststore.jks");
// Use the (parent, child) File constructor so the path separator is handled on both Windows and Linux
File truststoreFile = new File(tempDirPath, truststoreConf.getName());
FileUtils.copyFile(truststoreConf, truststoreFile);
System.setProperty("ssl.truststore.location", truststoreFile.getAbsolutePath());
You could use a ClassPathResource and FileCopyUtils to copy it from the jar to a file in a temporary directory in main() before creating the SpringApplication.
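A rough sketch of that approach, assuming a Spring Boot app (the ConsumerApplication class name is illustrative; the Truststore.jks name comes from the question):

import java.io.File;
import java.io.FileOutputStream;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.FileCopyUtils;

@SpringBootApplication
public class ConsumerApplication {

    public static void main(String[] args) throws Exception {
        // Copy the truststore out of the jar into java.io.tmpdir before Spring starts
        ClassPathResource truststore = new ClassPathResource("Truststore.jks");
        File target = new File(System.getProperty("java.io.tmpdir"), "Truststore.jks");
        FileCopyUtils.copy(truststore.getInputStream(), new FileOutputStream(target));

        // Make the extracted path available to the Kafka SSL configuration,
        // e.g. via a system property that application.yml/properties references
        System.setProperty("ssl.truststore.location", target.getAbsolutePath());

        SpringApplication.run(ConsumerApplication.class, args);
    }
}

This should behave the same on Windows and on PCF/Linux, since java.io.tmpdir points to a writable local directory in both cases.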
The root cause of this issue was that Maven resource filtering was enabled. During resource filtering Maven corrupts binary files such as the JKS truststore, so if you have filtering enabled, disable it (or exclude the truststore from filtering).
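If you still need filtering for other resources, the maven-resources-plugin can alternatively be told to leave binary extensions untouched; a sketch of the relevant pom.xml fragment (plugin version omitted, adjust to your build):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <configuration>
    <!-- do not run property filtering over binary truststore files -->
    <nonFilteredFileExtensions>
      <nonFilteredFileExtension>jks</nonFilteredFileExtension>
    </nonFilteredFileExtensions>
  </configuration>
</plugin>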
I am measuring the run times of a Spark job under different resource configurations and need to compare the run time of each stage. I can see them in the UI only while the job is running.
I run my job on a Hadoop cluster and use YARN as the resource manager.
Is there any way to keep each stage's run time? Is there any log for them?
UPDATE:
I read the monitoring documentation mentioned in the comment and added the following lines, but it doesn't work:
in spark-defaults.conf:
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///[nameNode]:8020/[PathToSparkEventLogDir]
spark.history.fs.logDirectory hdfs:///[nameNode]:8020/[PathTosparkLogDirectory]
in spark-env.sh:
export SPARK_PUBLIC_DNS=[nameNode]
SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true"
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$sparkHistoryDir"
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider"
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.cleaner.enabled=true"
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.cleaner.interval=7d"
The history server looks for the /tmp/spark-events/ folder; when I create it and start the history server, it doesn't show any complete or incomplete applications.
Note: I tried the logDirectory value without the port number too, but it didn't work.
I was able to run the Spark History Server and see the history of completed and incomplete applications by applying the following steps:
Set the public DNS value in conf/spark-env.sh
export SPARK_PUBLIC_DNS=NameNode-IP
Add these properties to SparkConf in my Java code:
SparkConf conf = new SparkConf()
.set("spark.eventLog.enabled", "true")
.set("spark.eventLog.dir", "hdfs:///user/[user-path]/sparkEventLog")
.set("spark.history.fs.logDirectory", "hdfs:///user/[user-path]/sparkEventLog")
Create the properties file (spark/conf/history.properties) containing the following lines:
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///user/[user-path]/sparkEventLog
spark.history.fs.logDirectory hdfs:///user/[user-path]/sparkEventLog
Start the history server:
./sbin/start-history-server.sh --properties-file ./conf/history.properties
Note: The properties spark.eventLog.dir and spark.history.fs.logDirectory should have the same value.
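Once the history server is running, per-stage timings for a finished application can also be pulled from its monitoring REST API instead of reading them off the UI. A rough Java sketch; the history-server host and the application id are placeholders, and 18080 is the default history server port:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class StageTimings {
    public static void main(String[] args) throws Exception {
        // Each stage entry in the JSON response carries timing fields such as submissionTime and completionTime
        URL stages = new URL("http://[historyServerHost]:18080/api/v1/applications/[app-id]/stages");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stages.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}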
I have an XML file containing some name and values that I want to read from in my Spark application. How do I use the Hadoop Configuration to read in these values and use them in my code?
I tried uploading the XML file to HDFS, but I'm not sure what the key is supposed to be when I use conf.get().
Maybe you forgot to include these lines in your code:
val conf = new Configuration()
conf.addResource(new Path(<path-to-file>))
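The key passed to conf.get() is simply the <name> of each <property> element in your XML file. The snippet above is Scala; an equivalent self-contained Java sketch, where the namenode address, the my-settings.xml path and the my.property.name key are all placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadXmlConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // If the XML sits on HDFS, open it through the FileSystem API and hand the stream to addResource
        FileSystem fs = FileSystem.get(new URI("hdfs://[namenode]:8020"), conf);
        conf.addResource(fs.open(new Path("/user/[user-path]/my-settings.xml")));

        // The key is whatever appears in the <name> element, e.g.
        // <property><name>my.property.name</name><value>42</value></property>
        String value = conf.get("my.property.name");
        System.out.println("my.property.name = " + value);
    }
}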
I need to build a common utility for Unix/Windows-based systems to push data into Hadoop. Users should be able to run that utility from any platform and push data into HDFS.
WebHDFS could be one option, but I am curious to know whether anything else is available.
Any suggestions?
I usually create a Maven project and add this dependency to my pom.xml file:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.1</version>
</dependency>
Then pushing data into HDFS is very easy with the Hadoop Java API; this is a simple example just to see how it works:
String namenodeLocation = "hdfs://[your-namenode-ip-address]:[hadoop:listening-port]/";
Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get( new URI( namenodeLocation ), configuration );
Path file = new Path(namenodeLocation+"/myWonderful.data");
FSDataOutputStream outStream = hdfs.create(file);
byte[] coolDataToPushToHDFS = new byte[1500];
outStream.write(coolDataToPushToHDFS);
outStream.close();
hdfs.close();
It's a really simple program. I think the steps you have to do are:
Let users choose the input/data to push
Use hadoop java api to send file/data to your cluster
Give some feedback to the user.
You can also append data to an existing file, not only create new files; see the sketch below.
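A minimal append sketch, reusing the hdfs handle and namenodeLocation from the example above (append has to be supported and enabled on the cluster):

// Append more bytes to the file created above
Path existing = new Path(namenodeLocation + "myWonderful.data");
FSDataOutputStream appendStream = hdfs.append(existing);

byte[] moreCoolData = new byte[500];
appendStream.write(moreCoolData);
appendStream.close();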
Have a look at the documentation: https://hadoop.apache.org/docs/current/api/
I am trying to write a program to connect to HBase. However, when I execute the following command:
HBaseConfiguration.create();
I get the following error:
"hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.92.1-cdh4.1.2"
When I dig deeper and debug inside HBaseConfiguration, I observe the following:
class HBaseConfiguration
private static void checkDefaultsVersion(Configuration conf) {
if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE))return;
String defaultsVersion = conf.get("hbase.defaults.for.version");
String thisVersion = VersionInfo.getVersion();
if (!thisVersion.equals(defaultsVersion)) {
throw new RuntimeException(
"hbase-default.xml file seems to be for and old version of HBase (" +
defaultsVersion + "), this version is " + thisVersion);
}
}
In my case HBase returns the defaults version as null. I am not sure why it returns null, as I checked the corresponding entry in the hbase-default.xml packaged with the HBase jar and it has the correct value.
When I try the same thing from a standalone program it works as expected.
Guyz, Please let me know if you have any questions.
Thanks in advance,
Rohit
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>hbase.defaults.for.version.skip</name>
<value>true</value>
</property>
</configuration>
Add this to an hbase-default.xml file and put the file on the classpath or in the resources folder. I got this error when I ran from within a Spring Hadoop environment. By adding the above file to the resources folder of the job jar I was able to solve it.
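To sanity-check whether hbase-default.xml is actually visible on the runtime classpath (and from which jar or folder it is being picked up), a quick diagnostic like this can help:

// Prints the URL hbase-default.xml resolves to, or null if it is not on the classpath
java.net.URL res = Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml");
System.out.println("hbase-default.xml found at: " + res);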
I finally found a workaround to this problem...
The problem is that hbase-default.xml is not included in your classpath.
I added hbase-default.xml to target/test-classes (it will vary in your case); you can just add hbase-default.xml to various folders and see what works for you.
NOTE: This is just a workaround, not the solution.
The solution would be to load the proper jars (which I haven't figured out yet).
I've been getting this error using HBase 1.1.1.
I created a simple HBase client and it worked fine. Then I built a simple RMI service, and that worked fine too. But when I put my simple HBase query code into the RMI service, I started getting this error on the HBaseConfiguration.create() call. After playing around a bit, I found that the HBaseConfiguration.create() call works OK if placed before the security manager setup in my main(); I get the error if the call is placed after the block of code containing the security manager calls...
Configuration conf = HBaseConfiguration.create(); // This works
if(System.getSecurityManager() == null)
{
System.setSecurityManager(new SecurityManager());
} // End if
// Configuration conf = HBaseConfiguration.create(); // This fails
I get the error if the create() call happens in main() after that security manager block, or in code within the class that is instantiated by main(). I don't get the error if create() is called within a static{ } block in my RMI service class (which I believe gets called before main()), or in main() before the security manager block, as shown.
BTW, the jar files that I include in my class path in order to get a minimal client to run are the following:
commons-codec-1.9.jar,
commons-collections-3.2.1.jar,
commons-configuration-1.6.jar,
commons-lang-2.6.jar,
commons-logging-1.2.jar,
guava-12.0.1.jar,
hadoop-auth-2.5.1.jar,
hadoop-common-2.5.1.jar,
hbase-client-1.1.1.jar,
hbase-common-1.1.1.jar,
hbase-hadoop2-compat-1.1.1.jar,
hbase-it-1.1.1-tests.jar,
hbase-protocol-1.1.1.jar,
htrace-core-3.1.0-incubating.jar,
log4j-1.2.17.jar,
netty-all-4.0.23.Final.jar,
protobuf-java-2.5.0.jar,
slf4j-api-1.7.7.jar,
slf4j-log4j12-1.7.5.jar
I had a similar problem where the error was:
java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (0.98.3-hadoop2), this version is Unknown
at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:70)
at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:102)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:113)
In my case I had the same set of jar files at two different levels of the classpath; I removed them from one level and it worked fine.
In my case the issue was caused by an old Java version (1.5), which was the default on the server. It works fine with 1.7.
In my code, I used this to get past the error:
val config = HBaseConfiguration.create() // throws the hbase-default.xml version error
val config = new Configuration()         // works (plain Hadoop Configuration, without the HBase defaults check)