How to enable the HDFS Files View in Ambari on the Hortonworks sandbox? - hortonworks-data-platform

How do I enable the HDFS Files View in Ambari on the Hortonworks sandbox?
I logged in as the admin user and tried admin -> Manage Views, but I could not find a file system view or anything like it anywhere.

It does not directly answer your question, but you should be able to view the HDFS filesystem at [ambari.host]:50070/explorer.html#/
To enable the HDFS Files View in Ambari, the steps are well described in the Hortonworks documentation: you need to change the HDFS configuration and then add a Files View instance. Have a look here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_views_guide/content/ch_using_files_view.html
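For reference, the HDFS change the documentation refers to is usually adding proxy-user properties to core-site.xml (through Ambari's HDFS configs) so the Ambari server can impersonate end users. This is only a sketch: the user name in the property keys depends on the account your ambari-server runs as (root is just the common sandbox case).

# Hypothetical sketch of Custom core-site entries, assuming ambari-server runs as root
hadoop.proxyuser.root.hosts=*
hadoop.proxyuser.root.groups=*
# Restart HDFS after saving, then create the Files View instance under admin -> Manage Ambari -> Views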

Ambari Files View is one of the views shipped by Ambari 2.1.0 in the IOP 4.1 release. The view provides a web user interface for browsing HDFS, creating/removing directories, downloading/uploading files, etc. The cluster must have HDFS and WebHDFS deployed in order to use the Ambari Files View.
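A quick way to confirm the WebHDFS prerequisite is met is to hit the WebHDFS REST endpoint on the NameNode; the host name and port below are just the sandbox defaults and may differ in your cluster.

# List the HDFS root directory through WebHDFS (sandbox defaults assumed for host/port)
curl -s "http://sandbox.hortonworks.com:50070/webhdfs/v1/?op=LISTSTATUS"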

If you are downloading the Hortonworks Sandbox HDP after 13 June 2019, please go through this website https://www.roseindia.net/bigdata/hadoop/install-hortonworks-sandbox-on-virtualbox.shtml and make sure you download the current version of the Sandbox HDP (version 3.1 is available now) along with VirtualBox 6.0.
This worked for me.

Related

Plain vanilla Hadoop installation vs Hadoop installation using Ambari

I recently downloaded the Hadoop distribution from Apache and got it up and running quite fast: download the Hadoop tarball, untar it somewhere, and apply a few configuration settings. The point is that I can see the various configuration files (yarn-site.xml, hdfs-site.xml, etc.) and I know the Hadoop home location.
Next, I installed Hadoop (HDP) using Ambari.
Here comes the confusing part. It seems Ambari installs HDP under /usr/hdp; however, the directory structure of plain vanilla Hadoop vs Ambari is totally different, and I am not able to locate the configuration files, e.g. yarn-site.xml.
So can anyone help me demystify this?
All configuration changes must be done via the Ambari UI. There is no use editing the configuration files directly, since Ambari persists the configurations in the Ambari database.
If you still need them, they are under /etc/hadoop/conf/.
It's true that configuration changes must be made via the Ambari UI and that those configurations are stored in a database.
Why is it necessary to change these configuration properties in the Ambari UI and not directly on disk?
Every time a service is restarted and it has a stale configuration, the ambari-agent is responsible for writing the latest configuration to disk. It is written to /etc/<service-name>/conf. If you were to make changes directly to the configuration files on disk, they would get overwritten by the aforementioned process.
However the configuration files found on disk DO still have a use...
The configuration files (on disk) are used by the various hadoop daemons when they're started/running.
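To see this in practice on an HDP node, you can check where /etc/hadoop/conf points and confirm that the generated files carry the values set in Ambari. The commands below are a sketch; exact paths and property names depend on your HDP version.

# On an Ambari-managed HDP node (typical layout; adjust for your version)
readlink -f /etc/hadoop/conf                           # shows where the active config directory actually lives
grep -A1 "yarn.nodemanager.local-dirs" /etc/hadoop/conf/yarn-site.xml   # e.g. check a property written by the agent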
Basically, the benefit of using the Ambari UI in a clustered Hadoop deployment is that it gives you a central management point.
For example, take a 10-node Hadoop cluster:
Plain vanilla Hadoop: if you change any configuration, you must change it on all 10 nodes.
Ambari UI: because the configuration is stored in the database, you just make the change in the management portal and it is reflected on all nodes from that single point.
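That single point of change is also scriptable through the Ambari REST API; the cluster name, host, and credentials below are placeholders to adapt to your deployment.

# List the hdfs-site configuration versions stored for a cluster (placeholder names/credentials)
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://ambari-host:8080/api/v1/clusters/MyCluster/configurations?type=hdfs-site"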

No Access Audit found in Ranger

I am working on Apache Ranger to enable data security across my Hadoop platform, which is working fine, but I am not able to see the Access Audit on the Ranger portal.
I have enabled Audit to DB, Audit to HDFS and Audit provider summary for the respective components in Ambari.
Please help me see the Access Audit on the Ranger portal.
Check the NameNode log (normally under /var/log/hadoop/hdfs/...-namenode.log) and see whether the driver for your DB can be found or whether an exception is thrown. If the latter is the case, add the driver JAR to e.g. /usr/share/java/ to make sure the driver class is available.
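A quick way to scan for such a driver problem is to grep the NameNode log for JDBC or Ranger-related errors; the log file name pattern below is typical for HDP but may vary with your version.

# Scan the NameNode log for Ranger/JDBC-related errors (typical HDP log path)
grep -iE "ranger|jdbc|ClassNotFound" /var/log/hadoop/hdfs/*namenode*.log | tail -n 20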
I ran into the same problem.
I followed every instruction, but the HDFS plugin didn't take effect.
It was solved by upgrading Hadoop from 2.6.3 to 2.7.2.
As the official Apache Ranger site says, Ranger 0.5 only works with Hadoop 2.7+.
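Before digging further, it is worth confirming the Hadoop version on the cluster, for example:

hadoop version    # Ranger 0.5 expects Hadoop 2.7 or later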

Can't start impala after updating CDH (5.0.0 -> 5.0.2)

I wasn't able to start Impala (server, state-store, catalog) after updating to CDH 5.0.2. From what I found, the startup script expects the executables to be in /usr/lib/impala/sbin. There was no such directory; instead there were /usr/lib/impala/sbin-debug and /usr/lib/impala/sbin-retail. I could finally start Impala by creating a symlink:
ln -s /usr/lib/impala/sbin-retail /usr/lib/impala/sbin
However, I'm still puzzled about the issue. What is the correct way to start Impala? Perhaps there is some sort of config variable that lets you choose whether you want to run the "debug" or "retail" build.
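As a workaround sketch (not necessarily the intended mechanism), you can point the expected directory at the retail build and restart the three Impala services; the service names below are the standard CDH package init scripts, but verify them on your nodes.

# Point the expected sbin directory at the retail build and restart Impala (sketch)
sudo ln -sfn /usr/lib/impala/sbin-retail /usr/lib/impala/sbin
sudo service impala-state-store restart
sudo service impala-catalog restart
sudo service impala-server restart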
You can read the Cloudera Manager Installation Guide; I think it can be helpful for you.
You can try updating with Cloudera Manager:
Installing Impala after Upgrading Cloudera Manager
If you have just upgraded Cloudera Manager from a version that did not support Impala, the Impala software is not installed automatically. (Upgrading Cloudera Manager does not automatically upgrade CDH or other managed services.) You can add Impala using parcels; go to the Hosts tab, and select the Parcels tab. You should see at least one Impala parcel available for download. See Parcels for detailed instructions on using parcels to install or upgrade Impala. If you do not see any Impala parcels available, click the Edit Settings button on the Parcels page to go to the Parcel configuration settings and verify that the Impala parcel repo URL (http://archive.cloudera.com/impala/parcels/latest/) has been configured in the Parcels configuration page. See Parcel Configuration Settings for more details.
Post Installation Configuration
See The Impala Service in Managing Clusters with Cloudera Manager for instructions on configuring the Impala service.
Cloudera Manager 5.0.2 supports Impala 1.2.1 or later.
If the version of your Impala service is 1.1 or earlier, the upgrade will leave Impala unavailable, so you need to upgrade Impala to 1.2.1 or later as well.

Locating Cloudera Manager HDFS config files

I've installed a cluster via Cloudera Manager, and now I need to launch the cluster manually.
I've been using the following command:
$ sudo -u hdfs hadoop namenode / datanode / jobtracker
But then dfs.name.dir is set to /tmp. I can't seem to find where Cloudera Manager keeps the HDFS config files. The ones in /usr/lib/hadoop-02*/conf seem to be minimal. They're missing dfs.name.dir, which is what I'm looking for in particular. I'm on an RHEL 6 system, by the way. Being lazy, I thought I could just copy over Cloudera Manager's HDFS config files so I don't have to create them manually, then copy them over to the 6 nodes :)
Thanks
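For what it's worth, once you have located (or copied) a config directory, you can point the Hadoop scripts at it explicitly instead of relying on the default /tmp values; the directory path below is a placeholder.

# Start the NameNode manually against an explicit config directory (placeholder path)
sudo -u hdfs hadoop --config /path/to/your/conf namenode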
I was facing the same problem.
I was changing configuration parameters from the Cloudera Manager UI but was clueless about where my changes were being written on the local file system.
I ran a grep and found that, in my case, the configuration was stored under the /var/run/cloudera-scm-agent/process/*-hdfs-NAMENODE directory.
So David is right: whenever we change configs from the UI and restart a service, it creates new config settings under the /var/run/cloudera-scm-agent/process/ directory.
Using CentOS 6.5, the Cloudera Manager special files do not show up in a SEARCH FILES result because their permissions are set to hide them from all but the 'hdfs' user. In addition, there are multiple versions of hdfs-site.xml on the local drive, some of which contain only a subset of the real settings. The actual settings file is in the DATANODE folder, not the NAMENODE folder, as evidenced by the lack of dfs.datanode.data.dir values in the latter.
Cloudera Manager deploys the config files each time you start the cluster, each time into a different directory. The directories are named after the process id or something like that.
The configuration is passed explicitly to each daemon as a parameter, so if you look at the command line of each Hadoop daemon you can see where its configuration is sitting (or just grep the disk for hdfs-site.xml). The names of the config files are the same as usual.
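To illustrate, on a node running the Cloudera Manager agent you can list the per-process config directories and inspect the command line of the running daemon; these paths are typical for CM 5, so treat this as a sketch.

# Find the most recent NameNode process directory created by the CM agent (typical CM 5 layout)
sudo ls -t /var/run/cloudera-scm-agent/process/ | grep -i NAMENODE | head -n 1
# Check which config directory the running NameNode was actually started with
ps -ef | grep -i "[n]amenode"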
I was in the same boat and found this answer:
To allow Hadoop client users to work with the HDFS, MapReduce, YARN and HBase services you created, Cloudera Manager generates client configuration files that contain the relevant configuration files with the settings from your services. These files are deployed automatically by Cloudera Manager based on the services you have installed, when you add a service, or when you add a Gateway role on a host.
You can download and distribute these client configuration files manually to the users of a service, if necessary.
The Client Configuration URLs command on the cluster Actions menu opens a pop-up that displays links to the client configuration zip files created for the services installed in your cluster. You can download these zip files by clicking the link.
See Deploying Client Configuration Files for more information on this topic.
On our system I got there via http://your_server:7180/cmf/services/status and clicked the Actions popup under the Add Cluster button. Hope that helps.
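If you prefer to script it, the Cloudera Manager REST API can also return a service's client configuration as a zip; the API version, cluster name, service name, and credentials below are placeholders to adapt to your deployment.

# Download the HDFS client configuration zip via the CM API (placeholder names/credentials)
curl -u admin:admin -o hdfs-clientconfig.zip \
  "http://your_server:7180/api/v6/clusters/Cluster%201/services/hdfs1/clientConfig"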

How to read a file from HDFS through a browser

How do I provide a link to an HDFS file, so that clicking on that URL will download the HDFS file?
Please provide me some inputs.
Thanks
MRK
Check the HDFS Proxy Guide.
There is also Hoop, which is being contributed to Hadoop by Cloudera. Currently it's targeted for the 0.24 release, but it can be built, installed and configured manually using the instructions at the Hoop site.
While HDFS Proxy supports only W, Hoop supports R/W to HDFS. The plan is to replace HDFS Proxy with Hoop.
While the above options are proxy based, another option is to directly access the NameNode without a proxy. Browse the file system (http://namenode:50070/nn_browsedfscontent.jsp) and go to the file for which the URL has to be shared.
Edit: Also check WebHDFS.
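With WebHDFS enabled, a plain HTTP URL can serve the file directly; the host, port and path below are examples you would replace with your own.

# Download a file over WebHDFS; -L follows the redirect to the serving DataNode (example path)
curl -L -o file.txt "http://namenode:50070/webhdfs/v1/user/mrk/file.txt?op=OPEN"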
If you have Hue installed in your cluster you could try
http://www.namenode:8888/filebrowser
There is also a standard Chrome extension at the link below that will basically convert the HDFS link for you:
http://wiki.pentaho.com/display/BAD/Simple+Chrome+Extension+to+browse+HDFS+volumes
There is also the Hue File Browser: upload/download files, list directories, change permissions, view different types of file directly... all from any client with a web browser.
