I can run Hive on Tez, but I cannot see any job in the Tez UI, and it is driving me crazy.
The user and name fields are also null in the Timeline Server.
The configuration is below:
tez-site.xml
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<description>URL for where the Tez UI is hosted</description>
<name>tez.tez-ui.history-url.base</name>
<value>http://10.0.0.51:8080/tez-ui</value>
</property>
and yarn-site.xml
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:2191</value>
</property>
And these are the URLs:
http://10.0.0.51:8188/ws/v1/timeline/TEZ_DAG_ID
http://10.0.0.51:8188/ws/v1/timeline/TEZ_APPLICATION_ATTEMPT
http://10.0.0.51:8188/ws/v1/timeline/TEZ_APPLICATION
For all of them, I just get the same response below:
{
entities: [ ]
}
In my case I found that it was a YARN ACL problem.
So the following helped me:
yarn.acl.enable = false
or
yarn.admin.acl = activity_analyzer,yarn,dr.who,admin
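If you maintain yarn-site.xml by hand rather than through Ambari, the same two settings would look roughly like this (just a sketch; adjust the admin ACL list to your own users):
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.admin.acl</name>
<value>activity_analyzer,yarn,dr.who,admin</value>
</property>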
As a bonus, here is the full configuration I gathered for HDP 3.1 + Tez 0.9.2:
YARN configuration:
yarn.timeline-service.enabled = true
yarn.acl.enable = false
yarn.admin.acl = activity_analyzer,yarn,dr.who,admin
yarn.timeline-service.webapp.address = <host>:8188
yarn.timeline-service.version = 2.0f
yarn.timeline-service.hostname = <host>
yarn.timeline-service.http-cross-origin.enabled = true
yarn.timeline-service.http-cross-origin.allowed-origins = *
yarn.resourcemanager.system-metrics-publisher.enabled = true
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes = org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl
TEZ configuration:
yarn.timeline-service.enabled = true
tez.tez-ui.history-url.base = http://<host>/tez-ui/
tez.am.tez-ui.history-url.template = __HISTORY_URL_BASE__?viewPath=/#/tez-app/__APPLICATION_ID__
tez.history.logging.service.class = org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService
tez.dag.history.logging.enabled = true
tez.am.history.logging.enabled = true
tez.allow.disabled.timeline-domains = true
Hive configuration:
hive_timeline_logging_enabled = true
hive.exec.pre.hooks = org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.post.hooks = org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook
hive.exec.failure.hooks = org.apache.hadoop.hive.ql.hooks.ATSHook
HDFS configuration:
hadoop.http.filter.initializers = org.apache.hadoop.security.HttpCrossOriginFilterInitializer
As far as I can see, you do not have any job information in the Timeline Server UI. In the Timeline and Tez UIs you cannot see information about jobs that ran before the Timeline Server was enabled, because the Timeline Server uses LevelDB storage and the information has to be published while a job is running.
Related
H2O in Spark cluster mode is giving different predictions from Spark local mode, and H2O in Spark local mode is giving better predictions than Spark cluster mode. Why is this happening? Can you help me, and tell me whether this is expected H2O behaviour?
Two datasets are being used: one for training the model and another for scoring.
trainingData.csv: 1.8 MB (2211 rows),
testingData.csv: 1.8 MB (2211 rows),
Driver Memory : 1G,
Executors Memory: 1G,
Number Of Executors : 1
The following command is being used on the cluster:
nohup /usr/hdp/current/spark2-client/bin/spark-submit --class com.inn.sparkrunner.h2o.GradientBoostingAlgorithm --master yarn --driver-memory 1G --executor-memory 1G --num-executors 1 --deploy-mode cluster spark-runner-1.0.jar > tool.log &
1) The main method:
public static void main(String args[]) {
SparkSession sparkSession = getSparkSession();
H2OContext h2oContext = getH2oContext(sparkSession);
UnseenDataTestDRF(sparkSession, h2oContext);
}
2) The H2O context is being created:
private static H2OContext getH2oContext(SparkSession sparkSession) {
H2OConf h2oConf = new H2OConf(sparkSession.sparkContext()).setInternalClusterMode();
H2OContext orCreate = H2OContext.getOrCreate(sparkSession.sparkContext(), h2oConf);
return orCreate;
}
3) The Spark session is being created:
public static SparkSession getSparkSession() {
SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example").master("yarn")
.getOrCreate();
return spark;
}
4) Setting the GBM parameters:
private static GBMParameters getGBMParam(H2OFrame asH2OFrame) {
GBMParameters gbmParam = new GBMParameters();
gbmParam._response_column = "high";
gbmParam._train = asH2OFrame._key;
gbmParam._ntrees = 10;
gbmParam._seed = 1;
return gbmParam;
}
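A simplified sketch of the training and scoring step, assuming the standard h2o-3 GBM Java API and reusing getGBMParam from step 4, could look like the following (the method and frame names below are placeholders, not the exact code that was run):
// Hypothetical step (not in the original code): train the GBM and score the unseen frame.
// Requires: import hex.tree.gbm.GBM; import hex.tree.gbm.GBMModel;
//           import hex.tree.gbm.GBMModel.GBMParameters; import water.fvec.Frame; import water.fvec.H2OFrame;
private static Frame trainAndScore(H2OFrame trainFrame, H2OFrame testFrame) {
    GBMParameters gbmParam = getGBMParam(trainFrame);       // parameters from step 4
    GBMModel model = new GBM(gbmParam).trainModel().get();  // blocks until training finishes
    return model.score(testFrame);                          // predictions for the unseen data
}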
My pom.xml reads as:
<property>
<name>haltonfailure</name>
<value>false</value>
</property>
<property>
<name>delegateCommandSystemProperties</name>
<value>true</value>
</property>
</properties>
<reportsDirectory>test-output\${buildTag}</reportsDirectory>
<suiteXmlFiles>
<suiteXmlFile>${inputXML}</suiteXmlFile>
</suiteXmlFiles>
</configuration>
</plugin>
</plugins>
All the dynamic parameters are passed in from Jenkins. But how can I read the dynamic inputXML name in my test base class, so that I can apply some conditions based on this XML file? Whenever I try to read it, I always get ${inputXML}, but I need the value that I passed from Jenkins.
Please help here.
It's been a long time, so I'm not sure you are still looking for a solution.
But...
You will need to create a suite XML file dynamically before running your tests:
Create a Java program (it doesn't really need to be Java, though) to generate the suite XML file.
Run Maven with parameters.
Your Java program would be something like this (the class name is arbitrary; it just has to match -Dexec.mainClass below):
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.testng.xml.XmlClass;
import org.testng.xml.XmlSuite;
import org.testng.xml.XmlTest;

public class SuiteXmlGenerator {
    public static void main(String[] args) {
        String suiteName = System.getProperty("suiteName", ""); // <-- get the argument value passed to the JVM
        // XmlSuite
        XmlSuite suite1 = new XmlSuite();
        suite1.setName(suiteName);
        // XmlTest
        XmlTest test1 = new XmlTest(suite1);
        test1.setName("TmpTest1");
        Map<String, String> parameterMap = new HashMap<String, String>();
        parameterMap.put("datasheetPathListPath", "This is the parameter");
        test1.setParameters(parameterMap);
        List<XmlClass> classes1 = new ArrayList<XmlClass>();
        XmlClass xmlClass1 = new XmlClass("your.package.ClassName");
        classes1.add(0, xmlClass1);
        test1.setXmlClasses(classes1);
        // Write the generated suite XML to a file
        String fileContent = suite1.toXml();
        FileWriter fileWriter;
        try {
            fileWriter = new FileWriter("filepath/generated_testNg_file.xml");
            fileWriter.write(fileContent);
            fileWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(suite1.toXml());
    }
}
Then send the values to mvn like this:
mvn
clean
compile
test-compile
exec:java
-Dexec.cleanupDaemonThreads=false
-Dexec.classpathScope="test"
-Dexec.mainClass="package.YourJavaClassNameThatGeneratsSuiteXmlFile"
-DsuiteName=YourSuiteName
test
If your Java program is not under /src/test/, remove the -Dexec.classpathScope line.
The mvn command will compile your code before running your program. Your program will create a suite XML file. Then it will run tests with the suite XML file.
Brand new to HDFS here.
I've got this small section of code to test out appending to a file:
val path: Path = new Path("/tmp", "myFile")
val config = new Configuration()
val fileSystem: FileSystem = FileSystem.get(config)
val outputStream = fileSystem.append(path)
outputStream.writeChars("what's up")
outputStream.close()
It is failing with this message:
Not supported
java.io.IOException: Not supported
at org.apache.hadoop.fs.ChecksumFileSystem.append(ChecksumFileSystem.java:352)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
I looked at the source for ChecksumFileSystem.java, and it seems to be hardcoded to not support appending:
@Override
public FSDataOutputStream append(Path f, int bufferSize,
    Progressable progress) throws IOException {
  throw new IOException("Not supported");
}
How can I make this work? Is there some way to change the default file system to another implementation that does support append?
It turned out that I needed to actually run a real Hadoop NameNode and DataNode. I am new to Hadoop and did not realize this. Without them, it uses your local filesystem, which is a ChecksumFileSystem and does not support append. So I followed the blog post here to get a cluster up and running on my system, and now I am able to append.
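For illustration, a minimal self-contained version of that working setup in Java (the question uses Scala; the NameNode address hdfs://localhost:9000 and the file path below are placeholders) would be roughly:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pointing the client at a running NameNode makes FileSystem.get return an
        // HDFS-backed FileSystem (which supports append) instead of the local ChecksumFileSystem.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        FSDataOutputStream out = fs.append(new Path("/tmp/myFile"));
        out.writeChars("what's up");
        out.close();
        fs.close();
    }
}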
The appending itself is done through the writer wrapped around the output stream, not by a write method on the FileSystem; FileSystem.get() is just used to connect to your HDFS. First, set dfs.support.append to true in hdfs-site.xml:
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
Stop all your daemon services using stop-all.sh and restart them using start-all.sh. Then put this in your main method:
String fileuri = "hdfs/file/path";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(fileuri),conf);
FSDataOutputStream out = fs.append(new Path(fileuri));
PrintWriter writer = new PrintWriter(out);
writer.append("I am appending this to my file");
writer.close();
fs.close();
I have a single-node Hadoop 1.2.1 cluster running on a VM.
My hdfs-site.xml looks like this:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
</description>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
<description>Does HDFS allow appends to files?
</description>
</property>
</configuration>
Now, when I run the following code from Eclipse, it always returns false:
Configuration config = new Configuration();
config.set("mapred.job.tracker","10.0.0.6:54311");
config.set("fs.default.name","hdfs://10.0.0.6:54310");
FileSystem fs = FileSystem.get(config);
boolean flag = Boolean.getBoolean(fs.getConf().get("dfs.support.append"));
System.out.println("dfs.support.append is set to be " + flag);
And if I try to append to an existing file, I get the following error:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Append is not supported. Please see the dfs.support.append configuration parameter
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1781)
at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:725)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.append(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.append(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:933)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:922)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:659)
at com.vanilla.hadoop.AppendToHdfsFile.main(AppendToHdfsFile.java:29)
What is wrong? Am I missing something?
You should try a 2.x.x version or a 0.2x version, because appending to a file on HDFS is only supported after Hadoop 0.20.2. See more information here and here.
Append is not supported since 1.0.3. Anyway, if you really need the previous functionality, set the flag "dfs.support.broken.append" to true to turn the append functionality back on.
hadoop.apache.org/docs/r1.2.1/releasenotes.html
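In hdfs-site.xml that would look something like this (a sketch; double-check the exact property name against the release notes for your version):
<property>
<name>dfs.support.broken.append</name>
<value>true</value>
</property>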
Let's now start with configuring the file system:
public FileSystem configureFileSystem(String coreSitePath, String hdfsSitePath) {
FileSystem fileSystem = null;
try {
Configuration conf = new Configuration();
conf.setBoolean("dfs.support.append", true);
Path coreSite = new Path(coreSitePath);
Path hdfsSite = new Path(hdfsSitePath);
conf.addResource(coreSite);
conf.addResource(hdfsSite);
fileSystem = FileSystem.get(conf);
} catch (IOException ex) {
System.out.println("Error occurred while configuring FileSystem");
}
return fileSystem;
}
Make sure that the property dfs.support.append in hdfs-site.xml is set to true.
You can either set it manually by editing the hdfs-site.xml file or programmatically using:
conf.setBoolean("dfs.support.append", true);
Let's start with appending to a file in HDFS.
public String appendToFile(FileSystem fileSystem, String content, String dest) throws IOException {
Path destPath = new Path(dest);
if (!fileSystem.exists(destPath)) {
System.err.println("File doesn't exist");
return "Failure";
}
Boolean isAppendable = Boolean.valueOf(fileSystem.getConf().get("dfs.support.append"));
if(isAppendable) {
FSDataOutputStream fs_append = fileSystem.append(destPath);
PrintWriter writer = new PrintWriter(fs_append);
writer.append(content);
writer.flush();
fs_append.hflush();
writer.close();
fs_append.close();
return "Success";
}
else {
System.err.println("Please set the dfs.support.append property to true");
return "Failure";
}
}
To see whether the data has been correctly written to HDFS, let's write a method to read from HDFS and return the content as a String.
public String readFromHdfs(FileSystem fileSystem, String hdfsFilePath) {
Path hdfsPath = new Path(hdfsFilePath);
StringBuilder fileContent = new StringBuilder("");
try{
BufferedReader bfr=new BufferedReader(new InputStreamReader(fileSystem.open(hdfsPath)));
String str;
while ((str = bfr.readLine()) != null) {
fileContent.append(str+"\n");
}
}
catch (IOException ex){
System.out.println("----------Could not read from HDFS---------\n");
}
return fileContent.toString();
}
After that, we have successfully written and read the file in HDFS. It's time to close the file system.
public void closeFileSystem(FileSystem fileSystem){
try {
fileSystem.close();
}
catch (IOException ex){
System.out.println("----------Could not close the FileSystem----------");
}
}
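Putting the helper methods above together, a usage sketch could look like this (the enclosing class name HdfsAppendExample, the config file locations, and the HDFS path are placeholders):
// Assumes the methods above live in a class named HdfsAppendExample (the name is arbitrary).
public static void main(String[] args) throws IOException {
    HdfsAppendExample example = new HdfsAppendExample();
    FileSystem fs = example.configureFileSystem("/etc/hadoop/conf/core-site.xml",
            "/etc/hadoop/conf/hdfs-site.xml");
    example.appendToFile(fs, "one more line\n", "/user/test/myFile.txt");
    System.out.println(example.readFromHdfs(fs, "/user/test/myFile.txt"));
    example.closeFileSystem(fs);
}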
Before executing the code, you should have Hadoop running on your system.
You just need to go to HADOOP_HOME and run the following command:
./sbin/start-all.sh
For the complete reference, see https://github.com/ksimar/HDFS_AppendAPI
I wanted to read a file from the Hadoop file system, and I could do that using the code below:
String uri = theFilename;
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
InputStream in = null;
try {
in = fs.open(new Path(uri));
IOUtils.copyBytes(in, System.out, 4096, false);
} finally {
IOUtils.closeStream(in);
}
To run this, I have to use hadoop jar myjar.jar com.mycompany.cloud.CatFile /filepathin_hadoop
That works. But how can I do the same from another program, i.e. without using the hadoop jar command?
You can add your core-site.xml to that Configuration object so it knows the URI for your HDFS instance. This method requires HADOOP_HOME to be set.
Configuration conf = new Configuration();
Path coreSitePath = new Path(System.getenv("HADOOP_HOME"), "conf/core-site.xml");
conf.addResource(coreSitePath);
FileSystem hdfs = FileSystem.get(conf);
// rest of code the same
Now, without using hadoop jar you can open a connection to your HDFS instance.
Edit: You have to use conf.addResource(Path). If you use a String argument, it looks in the classpath for that filename.
There is another configuration method, set(parameterName, value). If you use this method, you don't have to specify the location of core-site.xml. This is useful for accessing HDFS from a remote location such as a web server.
Usage is as follows:
String uri = theFilename;
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://10.132.100.211:8020/");
FileSystem fs = FileSystem.get(conf);
// Rest of the code