I need to move files written by a Hive job that look like this:
/foo/0000_0
/foo/0000_1
/bar/0000_0
into a file structure that looks like this:
/foo/prefix1/prefix2-0000_0
/foo/prefix1/prefix2-0000_1
/bar/prefix1/prefix2-0000_0
before migrating the data out of the cluster (using s3distcp). I've been looking through the hadoop fs commands but I can't find anything that would let me do this. I don't want to rename the files one by one.
First, you need to create the subdirectory inside /foo. For this, use the following command:
$ hdfs dfs -mkdir /foo/prefix1
This will create a subdirectory in /foo. If you want to create more subdirectories inside prefix1, use the same command again with the deeper path (hdfs dfs -mkdir -p will create the whole path in one go). If you are using an older version of Hadoop (1.x), replace hdfs with hadoop.
Now you can move files from /foo to /foo/prefix1 using the following command. Here newfilename can be any name you want to give to your file:
$ hdfs dfs -mv /foo/filename /foo/prefix1/newfilename
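Since the goal is to avoid renaming file by file, here is a minimal sketch of a shell loop that applies the same mv to every part file under /foo in one pass. prefix1 and prefix2 are the placeholder names from the question; adjust the grep pattern if your part files are named differently.

hdfs dfs -mkdir -p /foo/prefix1
# list the part files under /foo, then move each one into the new layout with the prefix
# (assumes the file names contain no spaces)
for f in $(hdfs dfs -ls /foo | awk '{print $NF}' | grep '0000_'); do
  hdfs dfs -mv "$f" "/foo/prefix1/prefix2-$(basename "$f")"
done

The same loop can be run again for /bar, or wrapped in an outer loop over the top-level directories.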
Hope this answers your query.
Related
I'm sorry if this is a rather simple question, but I haven't found exactly this online and just needed a quick answer.
I am trying to copy files from one HDFS directory to a new directory to make a backup. I was given something like this:
hadoop fs -mkdir one/two/three/dir1_bkp
hadoop fs -cp one/two/three/dir1/* one/two/three/dir1_bkp
This should only copy all of the files in dir1 to dir1_bkp and not affect anything in dir1, correct?
Copying doesn't affect the source location, no.
Depending on the size of the data, distcp might be a better option.
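For reference, a distcp run for the same backup copy might look like this; the paths are just the ones from the question, assumed to live under your HDFS home directory:

hadoop distcp /user/$(whoami)/one/two/three/dir1 /user/$(whoami)/one/two/three/dir1_bkp

distcp runs the copy as a MapReduce job, so it parallelizes much better than fs -cp for large directories.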
I have created a directory in hadoop and copied a file to that directory.
Now I want to create an external Hive table which will refer to the above created file.
Is there a way we can find out the root dir under which the prvys dir was created?
By default, hadoop fs -ls will look at /user/$(whoami)
If you echo that path, then -ls it, you should find the prvys directory. For example, hdfs:///user/liftadmin/
If you're using Kerberos, then the user directory depends on the ticket you've initialized the session with.
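For example, listing your home directory should show prvys if it was created there:

hdfs dfs -ls /user/$(whoami)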
I have a table in HDFS with the current path of /apps/hive/warehouse/ratings. I tried to download this to my local file system with the copyToLocal function in Hadoop.
The call worked and showed no errors, but when I go to check, the downloaded table is just a folder containing a file of an unspecified type.
Do you know what is the proper function call to download the table from HDFS as a CSV file?
This is the command that I am using at the moment
hadoop fs -copyToLocal /apps/hive/warehouse/ratings /home/maria_dev
This was to check what type of file I had.
You can try
hadoop fs -get /apps/hive/warehouse/ratings /home/maria_dev
Once the file is in your local file system, you can rename it to whatever you want and add your preferred file extension.
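If you specifically need a single CSV rather than the table's folder of part files, one common approach (a sketch, assuming the table is named ratings and the hive CLI is available on the node) is to run a query and convert the tab-separated output yourself:

hive -e 'SELECT * FROM ratings' | sed 's/\t/,/g' > /home/maria_dev/ratings.csv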
I have a file sample.txt and I want to place it in the Hive warehouse directory (not under the database xyz.db but directly into an immediate subdirectory of the warehouse). Is that possible?
To answer your question: since /user/hive/warehouse is just another folder on HDFS, you can move any file to that location directly.
From the Hadoop Shell, you can achieve it by doing:
hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
From the Hive Prompt, you can do that by giving this command:
!hadoop fs -mv /user/hadoop/sample.txt /user/hive/warehouse/
Here the first URL is the source location of your file and the next URL is the destination, i.e. the Hive warehouse, where you wish to move your file.
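Once moved, a quick listing should confirm the file now sits directly under the warehouse directory:

hadoop fs -ls /user/hive/warehouse/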
But such a situation does not generally occur in a real scenario.
I have created a folder to drop the result file from a Pig process using the STORE command. It works the first time, but the second time it complains that the folder already exists. What is the best practice for this situation? Documentation is sparse on this topic.
My next step will be to rename the folder to the original file name, to reduce the impact of this. Any thoughts?
You can execute fs commands from within Pig, and should be able to delete the directory by issuing an fs -rmr command before running the STORE command:
fs -rmr dir
STORE A INTO 'dir' USING PigStorage();
The only subtlety is that the fs command doesn't expect quotes around the directory name, whereas the STORE command does.
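On current Hadoop releases -rmr is deprecated in favour of -rm -r, and adding -f keeps the first run from failing when the directory does not exist yet. The equivalent inside the Pig script would be ('dir' is still a placeholder for your output path):

fs -rm -r -f dir
STORE A INTO 'dir' USING PigStorage();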