Unable to open/download files in HDFS using Hadoop Web UI

I have configured a standalone single-node Hadoop environment on an external RHEL server. I am trying to view the files in HDFS using the Hadoop Web UI:
http://<host ip>:50070/explorer.html#/user. I am able to browse through the directories and even delete files in HDFS, but I am unable to download or open (preview) any files in HDFS using the web UI. I can see all the files and open them from the command line on the server.
I get the error shown in the screenshot below when I try to open a file:
A similar error is thrown while uploading files too. I am not using HUE due to some restrictions imposed on the server for the time being.
I am new to Hadoop. Can someone please help me out here?

The main reason some operations fail via the web UI is a permissions issue: the default user for the web UI is dr.who, e.g. Permission denied: user=dr.who, access=WRITE, inode="/":suyash:supergroup:drwxr-xr-x.
The workaround is to use the console for these operations if HUE is unavailable. You could also try changing the browser or clearing the cache.
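If you do want the web UI itself to work, a hedged sketch of two common fixes (the property name is the standard Hadoop one; the user name and path below are only examples taken from the error above):
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>suyash</value>
</property>
Adding that to core-site.xml (and restarting HDFS) makes the web UI act as that user instead of dr.who. Alternatively, for read-only browsing/downloading it is usually enough that the files are world-readable:
hdfs dfs -chmod -R o+rx /user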

Related

Integrate local HDFS filesystem browser with IntelliJ IDEA

I studied the MapReduce paradigm using the HDFS cluster of my university, accessing it via HUE. From HUE I am able to browse files, read/edit them, and so on.
So in that cluster I need:
a normal folder where I put the MapReduce.jar
the access to the results in the HDFS
I very much like writing MapReduce applications, so I have correctly configured a local HDFS as a personal playground, but for now I can access it only through the really time-wasting command line (such as those).
I can access "directly" to the HDFS of my thorough IntelliJ IDEA by the mean of SFTP remote host connection, following is the "user normal folder":
And here is the HDFS from HUE from which I get the results:
Obviously on my local machine the "normal user folder" is wherever I am with the shell, but I can browse HDFS to get the results only via the command line.
I wish I could do such a thing even for local HDFS. Following is the best I could do:
I know that it is possible to access HDFS via http://localhost:50070/explorer.html#/, but it is pretty terrible.
I looked for some plugins, but I did not find anything useful. In the long run, using the command line becomes tiring.
I can access "directly" to the HDFS of my thorough IntelliJ IDEA by the mean of SFTP remote host ...
Following is the best I could do...
Neither of those are HDFS.
Is the user folder of the machine you SSH'd to
Is only the NameNode data directory on your local machine
Hue uses WebHDFS, and connects through http://namenode:50070
What you would need is a plugin that can connect to the same API, which is not over SSH, or a simple file mount.
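The WebHDFS API Hue talks to is plain HTTP/REST, so anything that can issue HTTP calls can browse HDFS the same way; a minimal sketch against a local pseudo-distributed setup (host, user and path are placeholders, and dfs.webhdfs.enabled must be true):
curl -i "http://localhost:50070/webhdfs/v1/user/hduser?op=LISTSTATUS"
curl -i -L "http://localhost:50070/webhdfs/v1/user/hduser/output/part-r-00000?op=OPEN"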
If you want a file mount, you need to set up the HDFS NFS Gateway and then mount the NFS share like any other network-attached storage.
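A rough sketch of the mount step, assuming the NFS Gateway is already running locally (the options follow the Hadoop NFS Gateway documentation; the mount point is arbitrary):
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync localhost:/ /mnt/hdfs
After that, HDFS shows up under /mnt/hdfs like any local directory, and IntelliJ can browse it directly.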
In production environments, you would write your code, push it to GitHub, then Jenkins (for example) would build the code and push it to HDFS for you.

how to load text files into hdfs through oozie workflow in a cluster

I am trying to load text/CSV files into Hive scripts with Oozie and schedule them on a daily basis. The text files are on the local Unix file system.
I need to put those text files into HDFS before executing the Hive scripts in an Oozie workflow.
In a real cluster we don't know which node the job will run on; it could run on any node in the cluster.
Can anyone provide me a solution?
Thanks in advance.
Not sure I understand what you want to do.
The way I see it, it can't work:
Oozie server has access to HDFS files only (same as Hive)
your data is on a local filesystem somewhere
So why don't you load your files into HDFS beforehand? The transfer may be triggered either when the files are available (as a post-processing action in the upstream job) or at a fixed time (using Linux CRON).
You don't even need the Hadoop libraries on the Linux box if the WebHDFS service is active on your NameNode - just use curl and an HTTP upload.
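A hedged sketch of that curl upload, following the standard two-step WebHDFS CREATE flow (hostnames, user and paths are placeholders):
# step 1: ask the NameNode where to write; it answers with a 307 redirect
curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/etl/input/data.csv?op=CREATE&user.name=etl"
# step 2: PUT the actual file to the DataNode URL returned in the Location header
curl -i -X PUT -T /data/incoming/data.csv "<Location URL from step 1>"
A cron entry on the Linux box can then run a small script doing exactly this every night.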

Locating Cloudera Manager HDFS config files

I've installed a cluster via Cloudera Manager, and now I need to launch the cluster manually.
I've been using the following command:
$ sudo -u hdfs hadoop namenode / datanode / jobtracker
But then dfs.name.dir is set to /tmp. I can't seem to find where Cloudera Manager keeps the HDFS config files. The ones in /usr/lib/hadoop-02*/conf seem to be minimal. They're missing dfs.name.dir, which is what I'm looking for in particular. I'm on an RHEL 6 system, by the way. Being lazy, I thought I could just copy over Cloudera Manager's HDFS config files so I don't have to create them manually, then copy them over to 6 nodes :)
Thanks
I was facing the same problem.
I was changing configuration parameters from the Cloudera Manager UI but was clueless about where my changes were being written on the local file system.
I ran a grep command and found out that in my case the configuration was stored in the /var/run/cloudera-scm-agent/process/*-hdfs-NAMENODE directory.
So David is right: whenever we change configs from the UI and restart the service, new config settings are created in the /var/run/cloudera-scm-agent/process/ directory.
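For anyone hunting for these files, a sketch of the search (root is needed because the process directories are readable only by the service users; dfs.name.dir is the property from the question, called dfs.namenode.name.dir in newer releases):
sudo find /var/run/cloudera-scm-agent/process -name hdfs-site.xml
sudo grep -l "dfs.name.dir" /var/run/cloudera-scm-agent/process/*/hdfs-site.xml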
Using CentOS 6.5, the Cloudera Manager special files do not show up in a SEARCH FILES result because their permissions are set to hide from all but the 'hdfs' user. In addition, there are multiple versions of hdfs-site.xml on the local drive some of which have partial amounts of real settings. The actual settings file is in the DATANODE folder not the NAMENODE folder as evidenced by the lack of dfs.datanode.data.dir values in the latter.
Cloudera Manager deploys the config files each time you start the cluster, each time in a different directory. The directories are named after the process id or something like that.
The configuration is passed explicitly to each daemon as a parameter. So if you look at the command line of each Hadoop daemon, you can see where the configuration is sitting (or just grep the disk for hdfs-site.xml). The names of the config files are the same as usual.
I was in the same boat and found this answer:
To allow Hadoop client users to work with the HDFS, MapReduce, YARN
and HBase services you created, Cloudera Manager generates client
configuration files that contain the relevant configuration files with
the settings from your services. These files are deployed
automatically by Cloudera Manager based on the services you have
installed, when you add a service, or when you add a Gateway role on a
host.
You can download and distribute these client configuration files
manually to the users of a service, if necessary.
The Client Configuration URLs command on the cluster Actions menu
opens a pop-up that displays links to the client configuration zip
files created for the services installed in your cluster. You can
download these zip files by clicking the link.
See Deploying Client Configuration Files for more information on this
topic.
On our system I got there via http://your_server:7180/cmf/services/status and clicked the Actions popup under the Add Cluster button. Hope that helps.

Namenode UI - Browse File System not working in pseudo-distributed mode

I have installed Hadoop 0.20.2 in pseudo-distributed mode (all daemons on a single machine).
It's up and running and I'm able to access HDFS through command line and run the jobs and I'm able to see the output.
But I am not able to browse the file system using the UI provided by Hadoop.
http://namenode:50070/dfshealth.jsp shows the version and cluster status, but when I click on "Browse the filesystem" it doesn't show anything. Is there any issue with this?
I'm able to list the contents using HDFS shell commands, and in cluster mode it's working fine.
Only in pseudo-distributed mode am I unable to browse the file system; any inputs on this are appreciated. I have installed Hadoop 1.0.0 in pseudo-distributed mode too, and I'm facing the same problem.
try this:
vi /usr/local/hadoop/conf/core-site.xml
And change this line:
<value>hdfs://localhost:54310</value>
to
<value>hdfs://[your IP]:54310</value>
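For context, that value line sits inside the fs.default.name property (the pre-2.x name of the setting); the IP below is just a placeholder:
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.10:54310</value>
</property>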
Add the hostname and IP of the namenode to the hosts file of the system from which you are browsing the above URL. If this is not done, clicking the "Browse the filesystem" link will fail.
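That is, on the client machine add a line along these lines to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows); the IP and hostname are examples only:
192.168.1.10   namenode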
OK, I was also facing the same problem...
At first my namenode storage directory was a tmp folder, so when I restarted my machine all the data was lost.
So I changed my namenode storage directory to another location on my hard disk.
Then I was still facing the same problem: I couldn't browse my file system.
When I checked the permissions of that folder, no permissions had been granted on it and I was unable to change them.
So I copied the Hadoop folder from my tmp folder to my home folder and changed my namenode storage directory to that folder in the home directory.
That solved my problem.
Open /etc/hadoop/conf/core-site.xml
and change this
hdfs://localhost:8020
to
hdfs://(your-ip):8020
then restart the hadoop-datanode service.
Check the logs if this still doesn't work.
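A hedged sketch of that restart, assuming a package-based install (service names vary between distributions and Hadoop versions):
sudo service hadoop-hdfs-namenode restart
sudo service hadoop-hdfs-datanode restart
# with a plain tarball install the rough equivalent is:
# $HADOOP_HOME/bin/stop-dfs.sh && $HADOOP_HOME/bin/start-dfs.sh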
Here is my analysis. I'm having the same problem and I'm using AWS. The "Browse the filesystem" link points to nn_browsedfscontent.jsp.
nn_browsedfscontent.jsp typically does the following:
fetch the datanode IP address
fetch the datanode port (50075)
redirect the request to ipaddress:port
In the case of AWS, a server instance has a private DNS name (resolvable only between instances) and a public DNS name (accessible externally, from the internet).
In step #1, the address fetched is the private DNS name, not the public one.
In step #3, private dns:50075 will fail, as it is not accessible publicly.
I replaced private dns:50075 with public dns:50075 and was able to browse the filesystem contents.
My knowledge of JavaScript is very poor, so I was unable to modify nn_browsedfscontent.jsp to solve this problem. Not sure if it has already been resolved.

how to read a file from HDFS through browser

How do I provide a link to an HDFS file, so that clicking on that URL will download the HDFS file?
Please provide me some inputs.
Thanks,
MRK
Check the HDFS Proxy Guide.
There is also Hoop, which is being contributed to Hadoop by Cloudera. Currently it's targeted for the 0.24 release, but it can be built, installed and configured manually using the instructions at the Hoop site.
While HDFS Proxy supports only W, Hoop supports R/W to HDFS. The plan is to replace HDFS Proxy with Hoop.
While the above options are proxy based, another option is to directly access the NameNode without a proxy. Browse the file system (http://namenode:50070/nn_browsedfscontent.jsp) and go to the file for which the URL has to be shared.
Edit: Also check WebHDFS.
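With WebHDFS enabled (dfs.webhdfs.enabled=true in hdfs-site.xml), a shareable download link is just an OPEN call against the NameNode; a sketch with placeholder host, user and path:
http://namenode:50070/webhdfs/v1/user/mrk/results/part-00000?op=OPEN&user.name=mrk
# the same URL works from the command line:
curl -L -o part-00000 "http://namenode:50070/webhdfs/v1/user/mrk/results/part-00000?op=OPEN&user.name=mrk"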
If you have Hue installed in your cluster, you could try
http://www.namenode:8888/filebrowser
or they also have a standard Chrome extension at this link that will basically convert the HDFS link for you
http://wiki.pentaho.com/display/BAD/Simple+Chrome+Extension+to+browse+HDFS+volumes
There is also the Hue File Browser: upload/download files, list directories, change permissions, view different types of files directly... all from any client with a web browser.
