Where are the configuration files stored in CDH4? - hadoop

I set up CDH4.
Now I can configure Hadoop from the web page.
I want to know where CDH puts the configuration files on the local file system.
For example, I want to find core-site.xml, but where is it?

By default, a CDH installation keeps the conf directory under
/etc/hadoop/
You can always use the following command to find the file:
$ sudo find / -name "core-site.xml"
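Before scanning the whole file system, it can be quicker to look in a few likely directories first. The following is only a sketch: the function name find_conf_file and the default directory list are assumptions made for illustration, not something CDH ships.

```shell
#!/bin/sh
# Hypothetical helper: look for a config file in a few likely directories
# before falling back to a full "find /" scan.
# Usage: find_conf_file FILE [DIR...]
find_conf_file() {
    fcf_name=$1
    shift
    # Assumed default locations; adjust them for your installation.
    [ "$#" -gt 0 ] || set -- /etc/hadoop /etc/hadoop/conf
    for fcf_dir in "$@"; do
        if [ -f "$fcf_dir/$fcf_name" ]; then
            printf '%s\n' "$fcf_dir/$fcf_name"
            return 0
        fi
    done
    return 1
}

# Example: find_conf_file core-site.xml || sudo find / -name "core-site.xml"
```

The function prints the first match and returns non-zero when nothing is found, so the slower full scan only runs as a fallback.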


How to add a JAR file in Hive

I'm trying to add hive-contrib-0.10.0.jar in Hive using the command ADD JAR hive-contrib-0.10.0.jar, but it always says that hive-contrib-0.10.0.jar does not exist.
I'm using HDP 2.1 right now. I also uploaded this jar file to the /user/root folder using Hue and ran the command
ADD JAR hdfs:///hive-contrib-0.10.0.jar
but it gives me the same error: the jar file doesn't exist.
Is there any way to solve this problem?
Where should I keep this jar file so that it loads successfully, and which command should I use?
Upload the JAR file to an HDFS path.
Add the JAR file using the ADD JAR command and the full HDFS path.
Example:
hadoop fs -put ~/Downloads/hive.jar /lib/
Then open the Hive shell and run:
add jar hdfs:///lib/hive.jar
I see the following issues with your approach. Before adding the jar, make sure you can list the file on the local file system or on HDFS, wherever it is stored.
The jar you are trying to add is on the Hive classpath by default, because it is part of $HIVE_HOME/lib (on the local file system, wherever you have the Hive client/service installed).
As for your question about how to add jars in Hive: you can add them from the local file system or from the Hadoop Distributed File System (HDFS).
ADD JAR file:///root/hive-contrib-0.10.0.jar (given that you copied this jar to the /root directory on the local file system)
ADD JAR hdfs://<namenode_hostname>:8020/user/root/hive-contrib-0.10.0.jar (given that you copied it to root's home directory on HDFS)
If you want to add the jars permanently, do the following.
1. Edit hive-site.xml (in /etc/hive/conf):
<property>
<name>hive.aux.jars.path</name>
<value>file:///mnt1/hive-jars/hive-contrib-2.1.1.jar</value>
</property>
2. Copy hive-contrib-2.1.1.jar to the directory /mnt1/hive-jars configured in hive-site.xml.
This should ideally work after restarting hive-server2:
3. sudo stop hive-server2
4. sudo start hive-server2
But sometimes it does not work, and I am not sure why. In that case you can use the following dirty workaround.
Put your jar file in the following directory so that Hive picks it up automatically on restart:
Copy hive-contrib-2.1.1.jar to /usr/lib/hive-hcatalog/share/hcatalog
sudo stop hive-server2
sudo start hive-server2
I read the answers above, which were very useful, and combined them all into one solution:
Put the jar on the local disk and give it read/write permissions:
chmod -R 777 /tmp/json.jar
Upload it to HDFS and set the permissions there too:
hdfs dfs -put /tmp/json.jar hdfs://1.1.1.1:8020/jars/
hdfs dfs -chmod -R 777 hdfs://1.1.1.1:8020/jars/
Add the jar in the Hive session:
add jar hdfs://1.1.1.1:8020/jars/json.jar
You have to give the full path to the JAR, not only its name.
Don't guess the location. Check the file system to see that the file is there before trying to add it.
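The "check first, then add" advice can be sketched as a small shell helper for a jar on the local file system. The function name add_jar_statement is hypothetical; it only prints the statement to paste into the Hive shell and does not talk to Hive itself.

```shell
#!/bin/sh
# Hypothetical helper: print an ADD JAR statement for a local jar,
# but only if the file really exists.
# Usage: add_jar_statement /path/to/file.jar
add_jar_statement() {
    ajs_jar=$1
    if [ ! -f "$ajs_jar" ]; then
        echo "jar not found: $ajs_jar" >&2
        return 1
    fi
    # Resolve the directory to an absolute path so the statement
    # does not depend on the current working directory.
    ajs_dir=$(cd "$(dirname "$ajs_jar")" && pwd)
    printf 'ADD JAR file://%s/%s;\n' "$ajs_dir" "$(basename "$ajs_jar")"
}

# Example: add_jar_statement /tmp/json.jar
```

For a jar stored on HDFS, a comparable existence check would be hadoop fs -test -e <path> before issuing ADD JAR with the full hdfs:// path.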

Spark installed but no command 'hdfs' or 'hadoop' found

I am a new PySpark user.
I just downloaded and installed a Spark cluster ("spark-2.0.2-bin-hadoop2.7.tgz").
After installation I wanted to access the file system (upload local files to the cluster). But when I typed hadoop or hdfs on the command line, it said "no command found".
Do I have to install Hadoop/HDFS? (I thought it was built into Spark, so I don't understand.)
Thanks in advance.
You have to install Hadoop first to access HDFS.
Follow this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Choose the latest version of Hadoop from the Apache site.
Once you are done with the Hadoop setup, download Spark from http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz and extract the files. Set JAVA_HOME and HADOOP_HOME in spark-env.sh.
The hdfs and hadoop commands are not on your PATH, which is why you get the message "no command found".
If you run /yourpath/hadoop-2.7.1/bin/hdfs dfs -ls / it should work and show the contents of the root directory.
You can also add the commands in hadoop/bin (hdfs, hadoop, ...) to your PATH with something like this:
export PATH=$PATH:$HADOOP_HOME/bin
where HADOOP_HOME is an environment variable holding the path to your Hadoop installation folder (Hadoop must be downloaded and installed first).
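If the export line ends up in a shell startup file, it may run several times and keep growing PATH. A minimal sketch of a guard against that follows; the function name path_append is hypothetical, and it assumes HADOOP_HOME is already set to the Hadoop installation directory.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not there yet.
# Usage: path_append DIR
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Assumes HADOOP_HOME points at the Hadoop installation directory:
# path_append "$HADOOP_HOME/bin"
```

Calling path_append twice with the same directory leaves PATH unchanged the second time.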

Nutch 2.0 and Hadoop. How to prevent caching of conf/regex-urlfilter.txt

I have Nutch 2.x and Hadoop 1.2.1 on a single machine.
I configured seed.txt and conf/regex-urlfilter.txt and ran the command
crawl urls/seed.txt TestCrawl http://localhost:8088/solr/ 2
Then I wanted to change the rules in conf/regex-urlfilter.txt.
I changed it in 2 files:
~$ find . -name 'regex-urlfilter.txt'
./webcrawer/apache-nutch-2.2.1/conf/regex-urlfilter.txt
./webcrawer/apache-nutch-2.2.1/runtime/local/conf/regex-urlfilter.txt
Then I ran
crawl urls/seed.txt TestCrawl2 http://localhost:8088/solr/ 2
But the changes in regex-urlfilter.txt have no effect.
Hadoop reports that it uses this file:
cat /home/hadoop/data/hadoop-unjar6761544045585295068/regex-urlfilter.txt
When I look at the content of that file, I see the old version.
How can I force Hadoop to use the new config?
These settings are stored in the archive file
/home/hadoop/webcrawer/apache-nutch-2.2.1/build/apache-nutch-2.2.1.job
Run
ant clean
ant runtime
to rebuild it with the new settings, or edit the archive file /home/hadoop/webcrawer/apache-nutch-2.2.1/build/apache-nutch-2.2.1.job directly.
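To confirm whether a rebuild is needed, or whether it took effect, one option is to compare the file you edited with the copy Hadoop unpacked. This is only a sketch; the function name conf_is_stale is hypothetical and the paths in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the deployed copy differs
# from the file you edited, i.e. when the deployed copy is stale.
# Usage: conf_is_stale EDITED_FILE DEPLOYED_FILE
conf_is_stale() {
    # cmp -s prints nothing and exits non-zero when the files differ
    # or when one of them cannot be read.
    ! cmp -s "$1" "$2"
}

# Example (placeholder paths):
# if conf_is_stale conf/regex-urlfilter.txt /path/to/unpacked/regex-urlfilter.txt; then
#     ant clean && ant runtime
# fi
```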

HADOOP_HOME and hadoop streaming

Hi, I am trying to run Hadoop on a server that has Hadoop installed, but I have no idea which directory Hadoop resides in. The server was configured by the server admin.
To load Hadoop I use the use command from the dotkit package.
There may be several solutions, but I want to know where the Hadoop package was installed, how to set the $HADOOP_HOME variable, and how to properly run a Hadoop streaming job, such as $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming.jar (see http://wiki.apache.org/hadoop/HadoopStreaming).
Thanks! Any help would be greatly appreciated!
If you're using a Cloudera distribution, then it's most probably in /usr/lib/hadoop; otherwise it could be anywhere (at the discretion of your system admin).
There are some tricks you can use to try and locate it:
locate hadoop-env.sh (assuming that locate has been installed and updatedb has been run recently)
If the machine you're running this on is running a hadoop service (such as data node, job tracker, task tracker, name node), then you can perform a process list and grep for the hadoop command: ps axww | grep hadoop
Failing the above two, look for the hadoop root directory in some common locations such as: /usr/lib, /usr/local, /opt
Failing all this, and assuming your current user has the permissions: find / -name hadoop-env.sh
If you installed it with rpm, then it's most probably in /etc/hadoop.
Why don't you try:
echo $HADOOP_HOME
Obviously, this environment variable has to be set before you can run the hadoop executables from anywhere on the box.
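If $HADOOP_HOME is empty but the hadoop command is already on your PATH (for example after loading it with dotkit), the location of the executable can hint at the installation directory. The sketch below is a heuristic only: the function name guess_home_from_command is hypothetical, it assumes the usual <home>/bin/hadoop layout, and it relies on readlink -f from GNU coreutils. On packaged installs the command may be a wrapper script in /usr/bin, in which case the guess will be wrong.

```shell
#!/bin/sh
# Hypothetical helper: guess the installation directory from the location
# of an executable that is already on PATH.
# Usage: guess_home_from_command [COMMAND]   (COMMAND defaults to "hadoop")
guess_home_from_command() {
    ghc_bin=$(command -v "${1:-hadoop}") || return 1
    # Follow symlinks to the real file (readlink -f: GNU coreutils).
    ghc_bin=$(readlink -f "$ghc_bin") || return 1
    # <home>/bin/hadoop -> <home>
    dirname "$(dirname "$ghc_bin")"
}

# Example: HADOOP_HOME=$(guess_home_from_command) && export HADOOP_HOME
```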

Apache Hadoop 2.0.0-alpha installation on a full cluster using federation

I installed the stable version of Hadoop successfully, but I am confused while installing Hadoop 2.0.0.
I want to install hadoop-2.0.0-alpha on two nodes, using federation on both machines. The hostnames are rsi-1 and rsi-2.
What should the values of the properties below be to implement federation? Both machines are also used as datanodes.
fs.defaultFS
dfs.federation.nameservices
dfs.namenode.name.dir
dfs.datanode.data.dir
yarn.nodemanager.localizer.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.address
One more point: in the stable version of Hadoop, the configuration files are under the conf folder in the installation directory.
But in the 2.0.0-alpha version there is an etc/hadoop directory, and it doesn't have mapred-site.xml or hadoop-env.sh. Do I need to copy the conf folder under the share folder into the hadoop-home directory, or do I need to copy these files from the share folder into the etc/hadoop directory?
Regards, Rashmi
You can run hadoop-setup-conf.sh in the sbin folder. It guides you through the configuration step by step.
Remember that when it asks you for a directory path, you should give the full path,
e.g., when it asks for the conf directory, you should enter /home/user/Documents/hadoop-2.0.0/etc/hadoop
After it completes, remember to check every configuration file in etc/hadoop.
In my experience, I had to modify the JAVA_HOME variable in hadoop-env.sh and some properties in core-site.xml and mapred-site.xml.
Regards
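As a quick way to check the configuration directory, a small helper can list which expected files are absent. This is a sketch under assumptions: the function name missing_conf_files is hypothetical, and the file list combines the files named in this thread with hdfs-site.xml and yarn-site.xml, which are assumed here to be needed as well.

```shell
#!/bin/sh
# Hypothetical helper: list expected configuration files that are missing
# from a configuration directory.
# Usage: missing_conf_files CONF_DIR
missing_conf_files() {
    # Assumed file list; adjust it to what your setup actually requires.
    for mcf_file in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml hadoop-env.sh; do
        [ -f "$1/$mcf_file" ] || printf '%s\n' "$mcf_file"
    done
}

# Example: missing_conf_files /home/user/Documents/hadoop-2.0.0/etc/hadoop
```

An empty output means every file in the list is present.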
