How to view the Hadoop data directory structure?

I have a partitioned table in Hive, so I want to see its directory structure in HDFS.
From the documentation, I found the following command:
hadoop fs -ls /app/hadoop/tmp/dfs/data/
where /app/hadoop/tmp/dfs/data/ is my data path. But this command returns
ls: Cannot access /app/hadoop/tmp/dfs/data/: No such file or
directory.
Am I missing something there?

Unless I'm mistaken, it seems you are looking for a temporary directory that you probably defined in the property hadoop.tmp.dir. That is a local directory, but hadoop fs -ls lists files in HDFS, not on the local file system, so you won't see anything there.
Since you're looking for the Hive directories, you want the following property in your hive-site.xml:
hive.metastore.warehouse.dir
The default is /user/hive/warehouse, so if you haven't changed this property you should be able to do:
hadoop fs -ls /user/hive/warehouse
And this should show you your table directories.
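For a partitioned table, each partition appears as its own subdirectory under the table's directory. For example, assuming a hypothetical table mytable partitioned by a dt column:
hadoop fs -ls /user/hive/warehouse/mytable
# partitions show up as key=value subdirectories, e.g.:
hadoop fs -ls /user/hive/warehouse/mytable/dt=2015-01-01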

Check whether the tmp directory is correctly set in your core-site.xml and hdfs-site.xml files.
If it is not set, the operating system's temporary directory (/tmp on Ubuntu, %TEMP% on Windows) is used as the Hadoop tmp folder, which means you may lose your data after restarting your computer. Set hadoop.tmp.dir in the XML files and restart your cluster; it will work fine then.
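To see which value is actually in effect, you can query the configuration (a quick check, assuming a standard Hadoop 2.x client):
hdfs getconf -confKey hadoop.tmp.dir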
If it is still not resolved after this, please give more details about your table-partitioning code and the table data too.

Related

How to get the absolute path for a directory in Hadoop

I have created a directory in Hadoop and copied a file to that directory.
Now I want to create an external Hive table that refers to that file.
Is there a way to find out the root directory under which the prvys directory was created?
By default, hadoop fs -ls will look at /user/$(whoami)
If you echo that path and then -ls it, you should find the prvys directory. For example, hdfs:///user/liftadmin/
If you're using Kerberos, the user directory depends on the ticket you've initialized the session with.
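A quick way to check, assuming the default layout (liftadmin above is just an example user):
echo /user/$(whoami)
hadoop fs -ls /user/$(whoami)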

What does copyToLocal in the Hadoop environment return?

I have a table in HDFS at the path /apps/hive/warehouse/ratings. I tried to download it to my local file system with the copyToLocal function in Hadoop.
The call worked and showed no errors, but when I went to check, the downloaded table was just a folder containing a file of no particular type.
Do you know what the proper function call is to download the table from HDFS as a CSV file?
This is the command that I am using at the moment:
hadoop fs -copyToLocal /apps/hive/warehouse/ratings /home/maria_dev
This was to check what type of file I had.
You can try:
hadoop fs -get /apps/hive/warehouse/ratings /home/maria_dev
Hive stores a table as a directory of data files in HDFS, which is why copyToLocal gives you a folder rather than a single file. Once the files are in your local file system, you can rename them to whatever you want and add your preferred file extension.
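If you want a single local file instead, one option (a sketch, not part of the original answer) is getmerge, which concatenates every file under an HDFS directory into one local file:
hadoop fs -getmerge /apps/hive/warehouse/ratings /home/maria_dev/ratings.csv
# Note: the .csv name is only a label. A default Hive text table is
# delimited with Ctrl-A characters rather than commas, so the contents
# may still need a delimiter conversion to be a true CSV.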

How are files and directories stored in Hadoop HDFS?

I have created a file in HDFS using the command below:
hdfs dfs -touchz /hadoop/dir1/file1.txt
I can see the created file by using the command below:
hdfs dfs -ls /hadoop/dir1/
But I could not find the location itself using Linux commands (find or locate). I searched the internet and found the following link:
How to access files in Hadoop HDFS?. It says HDFS is virtual storage. In that case, how does HDFS decide which partition to use and how much of it, and where is the metadata stored?
Does it use the DataNode location I specified in hdfs-site.xml as the storage for all the data?
I looked into the DataNode location and there are files there, but I could not find anything related to the file or folder I created.
(I am using Hadoop 2.6.0.)
HDFS is a distributed storage system in which the storage location is virtual, created from the disk space of all the DataNodes. While installing Hadoop, you must have specified paths for dfs.namenode.name.dir and dfs.datanode.data.dir. These are the locations at which all the HDFS-related files are stored on the individual nodes.
While storing data in HDFS, it is stored as blocks of a specified size (128 MB by default in Hadoop 2.x). When you use hdfs dfs commands you see the complete files, but internally HDFS stores these files as blocks. If you check the above-mentioned paths on your local file system, you will see a bunch of files which correspond to the files on your HDFS. But again, you will not see them as the actual files, as they are split into blocks.
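To see which blocks back a particular file and which DataNodes hold them, you can run fsck against it (a sketch, using the file path from the question):
hdfs fsck /hadoop/dir1/file1.txt -files -blocks -locations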
Check the output of the commands below to get more details on how much space from each DataNode is used to create the virtual HDFS storage:
hdfs dfsadmin -report #Or
sudo -u hdfs hdfs dfsadmin -report
HTH
It works the same way as creating a file in the local file system. For example, create a directory in the local file system with $ mkdir MITHUN90, enter it with cd MITHUN90, and create a new file there with $ nano file1.log.
Now create a directory in HDFS, for example hdfs dfs -mkdir /mike90. Here "mike90" is the directory name. After creating the directory, send the file from the local file system to HDFS with this command:
$ hdfs dfs -copyFromLocal /home/gopalkrishna/file1.log /mike90
Here '/home/gopalkrishna/file1.log' is the path of the file in your present working directory and '/mike90' is the directory in HDFS. Running $ hdfs dfs -ls /mike90 then lists the files in that directory.

Reading files from hdfs vs local directory

I am a beginner in Hadoop. I have two doubts:
1) How do I access files stored in HDFS? Is it the same as using a FileReader from java.io and giving the local path, or is it something else?
2) I have created a folder into which I copied the file to be stored in HDFS and the jar file of the MapReduce program. When I run the following command in any directory:
${HADOOP_HOME}/bin/hadoop dfs -ls
it just shows me all the files in the current directory. So does that mean all the files got added without me explicitly adding them?
Yes, it's pretty much the same. Read this post to read files from HDFS.
You should keep in mind that HDFS is different from your local file system. With hadoop dfs you access HDFS, not the local file system. So hadoop dfs -ls /path/in/HDFS shows you the contents of the /path/in/HDFS directory in HDFS, not the local one. That's why the output is the same no matter where you run the command from.
If you want to "upload" / "download" files to/from HDFS you should use the commands:
hadoop dfs -copyFromLocal /local/path /path/in/HDFS and
hadoop dfs -copyToLocal /path/in/HDFS /local/path, respectively.
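For example, a round trip might look like this (the paths are made up for illustration):
hadoop dfs -copyFromLocal /home/me/data.txt /user/me/data.txt
hadoop dfs -ls /user/me
hadoop dfs -copyToLocal /user/me/data.txt /tmp/data.txt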

Having a file in the Hive warehouse

I have a file sample.txt and I want to place it in the Hive warehouse directory (not under a database directory such as xyz.db, but directly in an immediate subdirectory of the warehouse). Is it possible?
To answer your question: since /user/hive/warehouse is just another folder on HDFS, you can move any file to that location without actually creating a table.
From the Hadoop Shell, you can achieve it by doing:
hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
From the Hive Prompt, you can do that by giving this command:
!hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
Here the first path is the source location of your file and the second is the destination, i.e. the Hive warehouse, where you wish to move your file.
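As a quick sanity check (just an illustrative follow-up), you can list the warehouse afterwards to confirm the file landed there:
hadoop fs -ls /user/hive/warehouse/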
But such a situation does not generally occur in a real scenario.
