Cloudera CDH4 installation - hadoop

I see the step below in the CDH4 MRv1 installation instructions at:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
Step 4: Create the MapReduce system directories:
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
But I am not able to figure out where these directories are used. Are these well-known locations, or are these directory paths configured in some config files? If so, can I use any structure I want?

These directories are used by MapReduce. They match the values specified in the default configuration, so I would not change them unless you have a good reason. But in the end, sure, you can configure a different location once you're up and running.
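The staging path is not hard-coded; it comes from configuration. A minimal sketch of overriding it in mapred-site.xml, assuming the MRv1 property name mapreduce.jobtracker.staging.root.dir (check your version's mapred-default.xml for the exact name and default value):
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <!-- assumption: any HDFS path writable by the users submitting jobs works here -->
  <value>/user</value>
</property>
If you change it, recreate the new directory with the same mkdir/chmod/chown sequence shown above and restart the JobTracker.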

Related

Kylin Sample Cube on Cloudera doesn't work properly

I'm trying to figure out what's going wrong with my sample cube, but I don't know how to find a solution.
First of all, I'm using Cloudera CDH 5.8.0 with Hadoop 2.6.0. I have Hive, HBase and so on.
I had to download the binaries for CDH from Kylin's site, and...
Problems I had that are already solved:
1) I had to set the KYLIN_HOME variable, because neither bin/check-env.sh nor bin/kylin.sh start worked properly. I set it with:
$ echo "export KYLIN_HOME=/home/cloudera/Kylin_Folder/apache_kylin" >> ~/.bashrc
$ source ~/.bashrc
2) I had problems with mkdir when creating a "/kylin" folder. I found a solution and tried the instruction below. It works.
sudo -u hdfs hadoop fs -mkdir /kylin
3) And now I am trying the sample from Kylin's site.
But my cube has no storage at all! This is what I have:
(screenshot: overall view)
When I open the build view screen, my build is stopped at "#1 Step Name: Create Intermediate Flat Hive Table".
And when I click "Log", I see this:
(screenshot: log output)
Please help me with this; I would be grateful.
OK, then. I've just found what I had to do.
Steps:
1) Download Kylin for CDH 5.7/5.8 and extract to /opt
2) Export KYLIN_HOME in .bash_profile
3) Restart CDH
4) Start the services in Cloudera in this order: ZooKeeper, HDFS, HBase, Hive, Hue, YARN
5) Add the cloudera user to the hdfs group: sudo usermod -a -G hdfs cloudera
6) Create the /kylin folder: sudo -u hdfs hadoop fs -mkdir /kylin
7) Change ownership: sudo -u hdfs hadoop fs -chown cloudera:supergroup /kylin
8) Change permissions: sudo -u hdfs hadoop fs -chmod go+w /kylin
9) Load sample: $KYLIN_HOME/bin/sample.sh
10) Start kylin: $KYLIN_HOME/bin/kylin.sh start
11) Navigate to: http://quickstart.cloudera:7070/kylin
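If a build still fails after these steps, re-running the environment check from the Kylin package before starting the server in step 10 is a quick way to confirm that KYLIN_HOME and the Hadoop, Hive and HBase clients are picked up:
$KYLIN_HOME/bin/check-env.sh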

No folders in Hadoop 2.6 after installing

I am new to Hadoop. I successfully installed Hadoop 2.6 on my Ubuntu 12.04 machine by following the link below.
Hadoop 2.6 Installation
All services are running. But when I try to load a file from the local filesystem into HDFS, it does not show any folders in HDFS at all, such as /user or /data.
hduse@vijee-Lenovo-IdeaPad-S510p:~$ jps
4163 SecondaryNameNode
4374 ResourceManager
3783 DataNode
3447 NameNode
5048 RunJar
18538 Jps
4717 NodeManager
hduse@vijee-Lenovo-IdeaPad-S510p:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
hduse@vijee-Lenovo-IdeaPad-S510p:~$ hadoop fs -ls hdfs:/
No output
If I run the above command, hadoop fs -ls hdfs:/, it does not show any folders. I installed Pig as well, and now I want to load data into Pig in MapReduce mode. Most websites just put a URI in place of the HDFS path without explaining it. Please guide me on how to create folders and load data into the HDFS path.
If you are using plain vanilla Hadoop, you will not see any directories out of the box; you have to create them yourself.
You can start by running hadoop fs -mkdir /user
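A minimal sketch of creating a home directory and loading a file into it, assuming your login user is hduse as in the prompt above (localfile.txt is just a placeholder name; run the mkdir as the hdfs superuser if permissions get in the way):
hadoop fs -mkdir -p /user/hduse             # create your HDFS home directory
hadoop fs -put localfile.txt /user/hduse/   # copy a local file into it
hadoop fs -ls /user/hduse                   # verify it shows up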

error while running any hadoop hdfs file system command

I am very new to Hadoop. I am referring to the "Hadoop For Dummies" book.
I have set up a VM with the following specs:
Hadoop version 2.0.6-alpha
Bigtop
OS: CentOS
The problem is that while running any HDFS file system command I get the following error.
Example command: hadoop hdfs dfs -ls
Error: Could not find or load main class hdfs
Please advise.
Regards,
Try running:
hadoop fs -ls
or
hdfs dfs -ls
What do they return?
fs and dfs are the same commands.
Difference between `hadoop dfs` and `hadoop fs`
Remove either hadoop or hdfs and the command should run.
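The reason the combined command fails is that the hadoop launcher treats an argument it does not recognize as a Java class name to run, so hadoop hdfs dfs -ls tries to run a class called hdfs, hence "Could not find or load main class hdfs". Using the command from the question:
hadoop hdfs dfs -ls   # wrong: 'hdfs' is taken as a class name
hadoop fs -ls         # correct
hdfs dfs -ls          # also correct, equivalent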

Hadoop filesystem reads linux filesystem instead of hdfs?

I have a strange thing happening: when I read the Hadoop filesystem it shows me the Linux filesystem, not the Hadoop one. Is anyone familiar with this issue?
Thanks,
Mika
This will happen if a valid Hadoop configuration is not found.
E.g. if you do:
hadoop fs -ls
and no configuration is found at the default location, then you will see the Linux filesystem. You can test this by pointing at an explicit configuration directory with the --config option right after the "hadoop" command, e.g.
hadoop --config <path-to-conf-dir> fs -ls
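For reference, the setting that makes the shell talk to HDFS instead of the local filesystem lives in core-site.xml. A minimal sketch, assuming a NameNode on localhost:8020 (older releases use fs.default.name instead of fs.defaultFS, and your host and port will differ):
<property>
  <name>fs.defaultFS</name>
  <!-- assumption: replace with your NameNode's host and port -->
  <value>hdfs://localhost:8020</value>
</property>
If this property is missing, the default of file:/// applies and hadoop fs -ls silently lists the local filesystem.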

Hadoop dfs -ls returns list of files in my hadoop/ dir

I've set up a single-node Hadoop configuration running via Cygwin under Win7. After starting Hadoop with bin/start-all.sh I run bin/hadoop dfs -ls, which returns a list of files in my hadoop directory. Then I run bin/hadoop datanode -format and bin/hadoop namenode -format, but -ls still returns the contents of my hadoop directory. As far as I understand, it should return nothing (an empty folder). What am I doing wrong?
Did you edit core-site.xml and mapred-site.xml under the conf folder?
It seems like your Hadoop cluster is in local (standalone) mode.
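A minimal sketch of those two files for pseudo-distributed mode on a 0.20/1.x-era release (the property names and the 9000/9001 ports follow the old conventions and are assumptions here; match them to your version and the tutorial you followed):
core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
With fs.default.name pointing at HDFS, bin/hadoop dfs -ls lists HDFS instead of your local hadoop directory.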
I know this question is quite old, but the directory structure in Hadoop has changed a bit (as of version 2.5).
Jeroen's command in the current version would be:
hdfs dfs -ls hdfs://localhost:9000/users/smalldata
Also, just for information: use of start-all.sh and stop-all.sh has been deprecated; instead one should use start-dfs.sh and start-yarn.sh.
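For completeness, the non-deprecated equivalents (these live in the sbin directory of a Hadoop 2.x install):
start-dfs.sh    # replaces the HDFS half of start-all.sh
start-yarn.sh   # replaces the JobTracker half, now YARN
stop-yarn.sh
stop-dfs.sh     # together these replace stop-all.sh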
I had the same problem and solved it by explicitly specifying the URL to the NameNode.
To list all directories in the root of your HDFS space, do the following:
./bin/hadoop dfs -ls hdfs://<ip-of-your-server>:9000/
The documentation says something about a default HDFS entry point in the configuration, but I cannot find it. If someone knows what they mean, please enlighten us.
This is where I got the info: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#Overview
Or you could just do the following:
Run stop-all.sh.
Remove the dfs data and name directories.
Format the NameNode: hadoop namenode -format.
Run start-all.sh.
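As a concrete sketch of that sequence (the dfs paths below are the old defaults under hadoop.tmp.dir and are only an assumption; check dfs.name.dir and dfs.data.dir in your hdfs-site.xml before deleting anything):
stop-all.sh
rm -rf /tmp/hadoop-$USER/dfs/name /tmp/hadoop-$USER/dfs/data   # assumed default locations
hadoop namenode -format
start-all.sh
Note that this wipes whatever was stored in HDFS, which is usually fine on a fresh single-node setup like this one.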
