I have a file sample.txt and I want to place it in the Hive warehouse directory (not under the database xyz.db, but directly under the warehouse directory itself). Is this possible?
To answer your question: since /user/hive/warehouse is just another folder on HDFS, you can move any file to that location without going through Hive at all.
From the Hadoop Shell, you can achieve it by doing:
hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
From the Hive Prompt, you can do that by giving this command:
!hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
Here the first path is the source location of your file, and the second is the destination, i.e. the Hive warehouse directory into which you wish to move your file.
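You can then verify that the file landed where you expect, for example:
hadoop fs -ls /user/hive/warehouse/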
That said, placing loose files directly under the warehouse root is not something you would normally need to do in a real scenario.
My Hive version is 3.1.0 and the SQL is LOAD DATA INPATH 'filepath' OVERWRITE INTO TABLE tablename. filepath can refer to a file (in which case Hive will move the file into the table) or to a directory (in which case Hive will move all the files within that directory into the table). I would like Hive to only copy the files into the warehouse directory, not move them, because the files are also used elsewhere. What should I do?
The LOAD DATA command moves files. If you want to copy instead, use one of the following commands:
Use the copyFromLocal command:
hdfs dfs -copyFromLocal <localsrc> URI
or the put command:
hdfs dfs -put <localsrc> ... <dst>
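For instance, a minimal sketch with hypothetical paths (mydb.db/mytable is just an example table directory), copying a local file into the table's directory so the original stays where it is:
hdfs dfs -put /home/me/sample.txt /user/hive/warehouse/mydb.db/mytable/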
Alternatively, if your files are already in HDFS, you can create a table or partition on top of that directory by specifying its LOCATION, without copying anything at all. ALTER TABLE ... SET LOCATION will also work.
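As a minimal sketch, assuming the files already sit in an HDFS directory such as /data/landing/mytable (the directory, table name, and column list are made up for illustration):

-- table over the existing directory; EXTERNAL means dropping the table leaves the files in place
CREATE EXTERNAL TABLE mytable (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/landing/mytable';

-- or repoint an existing table at that directory (some Hive versions expect a full hdfs:// URI here)
ALTER TABLE mytable SET LOCATION '/data/landing/mytable';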
ROW FORMAT DELIMITED FIELDS TERMINATED BY '${database_delimiter}'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION '${database_location}/Person';
Here Person is expected to be a directory, whereas in my case Person is a part-m file and not a directory.
If I understand the question correctly, Hive will indeed fail to create a table over a file. It needs to be a directory location.
Therefore, whatever process you have needs to make said directory.
For example, whatever mapper process produced that part-m file needed an output directory; if you did not give it a dedicated one, its output ended up next to other files in some shared location. (MapReduce should normally fail with an error saying the destination directory already exists, though.)
What you could do is move all the part files into a new location:
$ hdfs dfs -mkdir -p ${database_location}/Person/
$ # create hive table using that location
$ hdfs dfs -mv ${database_location}/part-m* ${database_location}/Person/
$ # run hive query
Or, if you have raw files, you can do something similar:
$ hdfs dfs -mkdir -p ${database_location}/Person/
$ # create hive table using that location
$ hdfs dfs -put somefile ${database_location}/Person/
$ # run hive query
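For completeness, the "create hive table using that location" step could look roughly like the DDL from the question, e.g. (the person table name, the column list, and the use of EXTERNAL are assumptions for illustration):

CREATE EXTERNAL TABLE person (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '${database_delimiter}'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION '${database_location}/Person';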
Or use LOAD DATA INPATH to load data from one HDFS location into a Hive table.
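A rough sketch of that variant, assuming the table already exists and the part file sits elsewhere in HDFS (part-m-00000 is just an example name); note that LOAD DATA INPATH moves the file rather than copying it:

LOAD DATA INPATH '${database_location}/part-m-00000' INTO TABLE person;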
I have a table in HDFS with the current path of /apps/hive/warehouse/ratings. I tried to download this to my local file system with the copyToLocal command in Hadoop.
The command worked and showed no errors, but when I go to check the download, the table is just a folder containing a file with no particular file type.
Do you know what is the proper function call to download the table from HDFS as a CSV file?
This is the command that I am using at the moment
hadoop fs -copyToLocal /apps/hive/warehouse/ratings /home/maria_dev
This was to check what type of file I had.
You can try
hadoop fs -get /apps/hive/warehouse/ratings /home/maria_dev
Once the file is in your local file system, you can rename it to whatever you want and add your preferred file extension.
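A minimal sketch, assuming the table is stored as plain-text part files inside that directory (the exact part-file names are an assumption, so list the directory first); keep in mind that renaming only changes the extension, the fields will still be separated by whatever delimiter the table was stored with:

hadoop fs -get /apps/hive/warehouse/ratings /home/maria_dev
ls /home/maria_dev/ratings
mv /home/maria_dev/ratings/000000_0 /home/maria_dev/ratings.csv

Or, to concatenate all the part files straight into one local file:

hadoop fs -getmerge /apps/hive/warehouse/ratings /home/maria_dev/ratings.csv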
I am a beginner in Hadoop. I have two doubts:
1) How do I access files stored in HDFS? Is it the same as using a FileReader from java.io and giving the local path, or is it something else?
2) I have created a folder into which I have copied the file to be stored in HDFS and the JAR file of the MapReduce program. When I run the following command from any directory
${HADOOP_HOME}/bin/hadoop dfs -ls
it just shows me all the files in the current directory. So does that mean all the files got added to HDFS without me explicitly adding them?
Yes, it's pretty much the same. Read this post to read files from HDFS.
You should keep in mind that HDFS is different from your local file system. With hadoop dfs you access HDFS, not the local file system. So, hadoop dfs -ls /path/in/HDFS shows you the contents of the /path/in/HDFS directory in HDFS, not the local one. That's why the output is the same no matter where you run it from.
If you want to "upload" / "download" files to/from HDFS you should use the commads:
hadoop dfs -copyFromLocal /local/path /path/in/HDFS and
hadoop dfs -copyToLocal /path/in/HDFS /local/path, respectively.
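For example, a quick sketch with made-up paths:

hadoop dfs -copyFromLocal /home/me/sample.txt /user/me/sample.txt   # local -> HDFS
hadoop dfs -ls /user/me                                             # list the HDFS directory
hadoop dfs -cat /user/me/sample.txt                                 # print the file's contents from HDFS
hadoop dfs -copyToLocal /user/me/sample.txt /home/me/copy.txt       # HDFS -> local

(On current Hadoop versions, hdfs dfs is the preferred spelling of hadoop dfs; both do the same thing here.)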
I have a partitioned table in Hive, and I want to see its directory structure in HDFS.
From the documentation, I found the following command:
hadoop fs -ls /app/hadoop/tmp/dfs/data/
and /app/hadoop/tmp/dfs/data/ is my data path. But this command returns:
ls: Cannot access /app/hadoop/tmp/dfs/data/: No such file or directory.
Am I missing something there?
Unless I'm mistaken, it seems you are looking for a temporary directory that you probably defined in the property hadoop.tmp.dir. This is a local directory, but when you do hadoop fs -ls you are looking at what files are available in HDFS, so you won't see anything.
Since you're looking for the Hive directories, you are looking for the following property in your hive-site.xml:
hive.metastore.warehouse.dir
The default is /user/hive/warehouse, so if you haven't changed this property you should be able to do:
hadoop fs -ls /user/hive/warehouse
And this should show you your table directories.
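As a sketch, assuming a database named mydb and a table named mytable partitioned by a dt column (all hypothetical names), each partition appears as a key=value subdirectory under the table directory:

hadoop fs -ls /user/hive/warehouse/mydb.db/mytable      # lists partition directories such as dt=2018-01-01
hadoop fs -ls -R /user/hive/warehouse/mydb.db/mytable   # recursive listing also shows the data files inside each partition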
Check whether the tmp directory is correctly set in your core-site.xml and hdfs-site.xml files.
If it is not set, then the operating system's temporary directory (/tmp on Ubuntu, %TEMP% on Windows) is used as the Hadoop tmp folder, and you may lose your data after restarting your computer. Set hadoop.tmp.dir in those XML files and restart your cluster; it should work fine then.
If it is still not resolved after this, please give more details about the partitioned table DDL and the table data too.