I have installed the Livy server on Cloudera under /usr/share. I want to set LIVY_CONF_DIR so that I can manage the config files like log4j.properties.
Cloudera says this is possible, but I could not find how to define it.
https://github.com/cloudera/livy#building-livy
Snippet from the GitHub README:
Livy Configuration
Livy uses a few configuration files under the configuration directory, which by default is the conf directory under the Livy installation. An alternative configuration directory can be provided by setting the LIVY_CONF_DIR environment variable when starting Livy.
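Concretely, a sketch of what that looks like (the /usr/share/livy paths are assumptions based on the question; adjust them to your actual install layout):

```shell
# Point Livy at a custom conf directory (holding livy.conf, log4j.properties, ...)
export LIVY_CONF_DIR=/usr/share/livy/conf   # assumed location; adjust to your install
/usr/share/livy/bin/livy-server             # Livy reads LIVY_CONF_DIR at startup
```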
Related
As part of my IntelliJ environment setup, I need to connect to a remote Hadoop cluster and access the files in my local Spark code.
Is there any way to connect to hadoop remote environment without creating hadoop local instance?
A connection code snippet would be the ideal answer.
If you have a keytab file to authenticate to the cluster, this is one way I've done it:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

val conf: Configuration = new Configuration()
conf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab("user-name", "path/to/keytab/on/local/machine")
val fs = FileSystem.get(conf)
I believe that to do this, you might also need some configuration XML files, namely core-site.xml, hdfs-site.xml, and mapred-site.xml. These are usually found under /etc/hadoop/conf/ on the cluster nodes.
You would put those under a directory in your program and mark it as a Resources directory in IntelliJ.
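For illustration, a minimal core-site.xml might look like the following (the namenode host and port are hypothetical; in practice, copy the real files from the cluster rather than writing them by hand):

```xml
<!-- core-site.xml: hypothetical example; copy the real file from the cluster -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- hypothetical namenode address -->
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```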
I have installed Hadoop on Ubuntu in VirtualBox (host OS: Windows 7). I have also installed Apache Spark, configured SPARK_HOME in .bashrc, and added HADOOP_CONF_DIR to spark-env.sh. Now when I start spark-shell, it throws an error and does not initialize the Spark context or SQL context. Am I missing something in the installation? I would also like to run it on a cluster (a 3-node Hadoop cluster is set up).
I had the same issue when trying to install Spark locally on Windows 7. Please make sure the paths below are correct, and it should work for you. I answered the same question in this link, so you can follow the steps below and it will work.
Create a JAVA_HOME variable: C:\Program Files\Java\jdk1.8.0_181 (without \bin)
Add the following part to your Path: ;%JAVA_HOME%\bin
Create a SPARK_HOME variable: C:\spark-2.3.0-bin-hadoop2.7 (without \bin)
Add the following part to your Path: ;%SPARK_HOME%\bin
The most important part: the Hadoop path must include the bin folder containing winutils.exe, i.e. make sure winutils.exe is located inside C:\Hadoop\bin.
Create a HADOOP_HOME variable: C:\Hadoop
Add the following part to your Path: ;%HADOOP_HOME%\bin
Now you can open cmd, run spark-shell, and it should work.
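As a sketch, the same setup in a cmd session looks like this (the version numbers and paths come from the steps above; adjust them to your installs, and note that `set` only affects the current session):

```bat
:: Session-scoped example; use the System Properties dialog (or setx) to make these permanent.
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_181
set SPARK_HOME=C:\spark-2.3.0-bin-hadoop2.7
set HADOOP_HOME=C:\Hadoop
set PATH=%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin
spark-shell
```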
I have written an MR job and have run it in local mode with the following configuration settings:
mapred.local.dir=<<local directory having good amount of space>>
fs.default.name=file:///
mapred.job.tracker=local
on Hadoop 1.x
Now I am using Hadoop 2.x and running the same job with the same configuration settings, but I am getting the error:
Disk Out of Space
If I switch from Hadoop 1.x to 2.x (using the Hadoop 2.6 jars), do the same configuration settings for changing the tmp directory no longer work?
What are the new settings for configuring the "tmp" directory of MR1 (the mapred API) on Hadoop 2.6?
Kindly advise.
Regards
Cheers :))
Many properties in 1.x have been deprecated and replaced with new properties in 2.x.
mapred.child.tmp has been replaced by mapreduce.task.tmp.dir
mapred.local.dir has been replaced by mapreduce.cluster.local.dir
Have a look at the complete list of deprecated properties and their new equivalents at the Apache website link
It can be done by setting
mapreduce.cluster.local.dir=<<local directory having good amount of space>>
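For example, in mapred-site.xml (the directory path below is just a placeholder; pick a local directory with enough free space):

```xml
<!-- mapred-site.xml: 2.x replacement for the old mapred.local.dir -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <!-- placeholder path; use a disk with plenty of space -->
  <value>/data/mapred/local</value>
</property>
```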
I set up CDH4.
Now I can configure Hadoop through the web UI.
I want to know where CDH puts the configuration files on the local file system.
For example, I want to find core-site.xml; where is it?
By default, the installation of CDH has the conf directory located in
/etc/hadoop/
You could always use the following command to find the file:
$ sudo find / -name "core-site.xml"
I have installed the stable version of Hadoop successfully, but I am confused while installing hadoop-2.0.0.
I want to install hadoop-2.0.0-alpha on two nodes, using federation on both machines. rsi-1 and rsi-2 are the hostnames.
What should the values of the properties below be for implementing federation? Both machines are also used as datanodes.
fs.defaultFS
dfs.federation.nameservices
dfs.namenode.name.dir
dfs.datanode.data.dir
yarn.nodemanager.localizer.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.address
One more point: in the stable version of Hadoop, I have the configuration files under the conf folder in the installation directory.
But in the 2.0.0-alpha version, there is an etc/hadoop directory, and it doesn't have mapred-site.xml or hadoop-env.sh. Do I need to copy the conf folder under the share folder into the hadoop home directory, or do I need to copy these files from the share folder into the etc/hadoop directory?
Regards, Rashmi
You can run hadoop-setup-conf.sh in the sbin folder. It instructs you step by step through the configuration.
Please remember, when it asks you to input a directory path, you should use the full path,
e.g., when it asks for the conf directory, you should input /home/user/Documents/hadoop-2.0.0/etc/hadoop.
After it completes, remember to check every configuration file in etc/hadoop.
In my experience, I modified the JAVA_HOME variable in hadoop-env.sh and some properties in core-site.xml and mapred-site.xml.
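For example, the hadoop-env.sh edit is typically just pointing Hadoop at your JDK (the path below is an example, not from the original answer; use wherever your JDK is actually installed):

```shell
# etc/hadoop/hadoop-env.sh
# Example JDK location; replace with your actual JDK path.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```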
Regards