Unable to read a file from HDFS using spark-shell on Ubuntu - hadoop

I have installed Spark and Hadoop in standalone mode on an Ubuntu VirtualBox VM for learning. I am able to run normal Hadoop MapReduce operations on HDFS without using Spark. But when I use the code below in spark-shell,
val file = sc.textFile("hdfs://localhost:9000/in/file")
file.count()
I get "input path does not exist." error. The core-site.xml has fs.defaultFS with value hdfs://localhost:9000. If I give localhost without the port number, I get "Connection refused" error as it is listening on default port 8020. Hostname and localhost are set to loopback addresses 127.0.0.1 and 127.0.1.1 in etc/hosts.
Kindly let me know how to resolve this issue.
Thanks in advance!

I am able to read and write to HDFS using
"hdfs://localhost:9000/user/<user-name>/..."
Thank you for your help.

Probably your configuration is alright, but the file is missing or in an unexpected location.
1) try:
sc.textFile("hdfs://in/file")
sc.textFile("hdfs:///user/<USERNAME>/in/file")
with USERNAME=hadoop, or your own username
2) try on the command line (outside spark-shell) to access that directory/file:
hdfs dfs -ls /in/file
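If the file turns out not to be in HDFS at all, copying it in first should make the original path work. A rough sketch (the local path below is only a placeholder):
hdfs dfs -mkdir -p /in
hdfs dfs -put /path/to/local/file /in/file      # /path/to/local/file is a placeholder
hdfs dfs -ls hdfs://localhost:9000/in/file      # the exact URI the Spark job reads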

Related

Hadoop installation trouble

I am installing Hadoop on my Windows PC. I was able to start YARN and DFS. I ran the command hadoop fs -mkdir /in
It displays the following error:
-mkdir: java.net.UnknownHostException: master
I am a newbie; please explain what has to be done.
The error indicates the host master cannot be resolved to an IP address. You probably need to add a mapping for it in your hosts file so it can be resolved.
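For example, the mapping could look like the line below (a sketch: the IP is a placeholder for your master node's real address; on Windows the hosts file is C:\Windows\System32\drivers\etc\hosts, on Linux it is /etc/hosts):
192.168.1.10   master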

Hortonworks sandbox connection refused

I just started to learn Hadoop with the Hortonworks sandbox.
I have HDP 2.3 on a VirtualBox, and in the settings I have a Bridged Adapter network and a NAT.
When I start the machine everything is OK: I can run some Hadoop commands and I can connect to Ambari at 127.0.0.1:8080.
But when I run the script /etc/lib/hue/tools/start_scripts/gen_hosts.sh to generate hosts with a different IP address, everything goes wrong and I can't execute a simple Hadoop command like hadoop fs -ls /user.
I get this error:
ls: Call From sandbox.hortonworks.com/10.24.244.85 to sandbox.hortonworks.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
As I said, I just started to learn Hadoop and I am not a network expert, so I would appreciate any help.
Thank you.
I found that you have to restart the services (HDFS, MapReduce, HBase, ...) from Ambari after you generate the hosts.
Hope this helps someone.
Turn on the NameNode / HDFS service and it will work.

java.net.ConnectException: Connection refused error when running Hive

I'm trying to work through a Hive tutorial in which I enter the following:
load data local inpath '/usr/local/Cellar/hive/0.11.0/libexec/examples/files/kv1.txt' overwrite into table pokes;
This results in the following error:
FAILED: RuntimeException java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
I see that there are some replies on SO about configuring my IP address and localhost, but I'm not familiar with the concepts in those answers. I'd appreciate anything you can tell me about the fundamentals of what causes this kind of error and how to fix it. Thanks!
This is because Hive is not able to contact your NameNode.
Check whether your Hadoop services have started properly.
Run the command jps to see which services are running.
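For example (jps ships with the JDK; the daemon names below are what a healthy single-node setup typically shows):
jps
# expect NameNode, DataNode and SecondaryNameNode (plus ResourceManager/NodeManager);
# if NameNode is missing, nothing is listening for HDFS calls and the connection is refused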
The reason you get this error is that Hive needs Hadoop as its base, so you need to start Hadoop first.
Here are the steps:
Step 1: download Hadoop and unzip it
Step 2: cd #your_hadoop_path
Step 3: ./bin/hadoop namenode -format
Step 4: ./sbin/start-all.sh
Then go back to #your_hive_path and start Hive again.
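A rough equivalent on recent Hadoop releases, assuming HADOOP_HOME points at the unpacked directory (start-all.sh is deprecated in favour of start-dfs.sh and start-yarn.sh):
cd $HADOOP_HOME
./bin/hdfs namenode -format     # only needed once, on a fresh install
./sbin/start-dfs.sh             # starts NameNode, DataNode, SecondaryNameNode
./sbin/start-yarn.sh            # starts ResourceManager and NodeManagers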
An easy way I found is to edit the /etc/hosts file. By default it looks like:
127.0.0.1 localhost
127.0.1.1 user_user_name
Just change 127.0.1.1 to 127.0.0.1, that's it; then restart your shell and restart your cluster with start-all.sh.
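After the edit the file would look like this (user_user_name is just a placeholder for your machine's hostname):
127.0.0.1 localhost
127.0.0.1 user_user_name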
I had the same problem when setting up Hive.
I solved it by changing my /etc/hostname.
Formerly it contained my user_machine_name;
after I changed it to localhost, everything went well.
I guess this is because Hadoop resolves your hostname using this /etc/hostname file, which pointed it to your user_machine_name while the Hadoop service was running on localhost.
I was able to resolve the issue by executing the command below:
start-all.sh
This ensures that the Hadoop services Hive depends on have started.
Then starting Hive was straightforward.
I had a similar problem with a connection timeout:
WARN DFSClient: Failed to connect to /10.165.0.27:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
The DFSClient was resolving the DataNodes by their internal IPs. Here's the fix for that:
.config("spark.hadoop.dfs.client.use.datanode.hostname", "true")

Single-node Hadoop - how to copy a local filesystem file to the Hadoop file system

I have installed Hadoop. Now I am trying to copy a local filesystem file to the Hadoop filesystem using the command below.
hadoop fs -copyFromLocal /mnt/PRO/wvdial.conf hdfs://virus/mydata/hdfs/datanode/wvdial.conf
But I am getting the error below.
copyFromLocal: Call From Virus/127.0.0.1 to Virus:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I went through the http://wiki.apache.org/hadoop/ConnectionRefused documentation and found: "Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this)."
Yes, I have my host "Virus" mapped to 127.0.0.1 in /etc/hosts. I have installed Hadoop on a single node, so I have to map my host to 127.0.0.1. What change should I make to my configuration so that I can copy my local filesystem file to HDFS?
Please find the Hadoop configuration in my installation notes at http://omsopensource.blogspot.in/search/label/Hadoop. I am using Fedora 19.
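A quick way to check which address and port the NameNode is actually configured for, and whether it is running at all, before retrying the copy (a sketch, assuming a standard single-node setup):
hdfs getconf -confKey fs.defaultFS   # the URI clients such as copyFromLocal will use
jps                                  # NameNode should appear here if HDFS is up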

hadoop conf "fs.default.name" can't be setted ip:port format directly?

Hi all,
I have set up a Hadoop cluster in fully distributed mode. First, I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in hostname:port format and changed /etc/hosts correspondingly; the cluster works successfully.
Then I tried another way: I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in ip:port format. It doesn't work.
I find
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
in the NameNode log file, and
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: slave01: slave01: Name or service not known
in the DataNode log file.
In my opinion, an IP and a hostname are equivalent. Is there something wrong in my Hadoop conf?
Maybe there is a wrongly configured hostname in /etc;
you should check hostname, /etc/hosts, /etc/HOSTNAME (RHEL/Debian) or rc.conf (Arch Linux), etc.
I got your point. This is probably because in mapred-site.xml you wrote hdfs://ip:port (starting that value with hdfs:// is wrong), while with hostname:port you did not write hdfs:// at the beginning of the value, which is the correct way. Therefore the first one did not work, but the second one did.
Fatih Haltas
I found the answer here.
It seems that HDFS uses hostnames for all of its communication and display purposes, so we cannot use an IP directly in core-site.xml and mapred-site.xml.
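A quick way to confirm that a node's hostname actually resolves, since that is what the UnknownHostException above complains about (a sketch; slave01 is the hostname taken from the log):
hostname                  # the name this node reports for itself
getent hosts slave01      # should print an IP; if it fails, add slave01 to /etc/hosts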
