Hadoop delete all files ending with a name in a folder

How can I delete all files whose names end with a specific string on HDFS? I'm trying hadoop fs -rm -R /path/*<end_of_file_name>, where * is meant as a wildcard, but I get a "No such file or directory" error.

The asterisk is expanded by your local shell; quote the argument so that the full pattern is passed unchanged to Hadoop, which then expands it against HDFS:
hadoop fs -rm "/path/*end"
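A local sketch (no Hadoop needed) of what the shell does with the pattern: an unquoted glob is matched against the local filesystem before hadoop is even started, while a quoted one reaches the command as a single literal argument. The temporary directory and the *_end file names are illustrative only.

```bash
#!/usr/bin/env bash
# Illustrates local glob expansion; nothing here talks to HDFS.
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/a_end" "$demo_dir/b_end"

# Unquoted *: the local shell replaces the pattern with the matching
# LOCAL paths, so a command would receive two separate arguments.
unquoted=( "$demo_dir"/*_end )

# Quoted *: the pattern stays one literal argument, which is what
# `hadoop fs` needs in order to expand it against HDFS itself.
quoted=( "$demo_dir/*_end" )

echo "unquoted -> ${#unquoted[@]} argument(s): ${unquoted[*]}"
echo "quoted   -> ${#quoted[@]} argument(s): ${quoted[*]}"
```

With Hadoop, the same rule means writing the pattern in single or double quotes, as in the command above, rather than leaving the * bare.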

Related

Hadoop/HDFS: put command fails - No such file or directory

I do not know why I cannot move a file from one directory to another. I can view the content of the file but I cannot move the same file into another directory.
WORKS FINE:
hadoop fs -cat /user/hadoopusr/project-data.txt
DOES NOT WORK:
hadoop fs -put /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
I got a No such file or directory error message. What is wrong? Please help. Thank you!

A description of the -put command reads:
This command is used to copy files from the local file system to the HDFS filesystem. This command is similar to -copyFromLocal command. This command will not work if the file already exists unless the -f flag is given to the command. This overwrites the destination if the file already exists before the copy.
This explains the No such file or directory message: -put reads its source from your local filesystem, and there is no local file at /user/hadoopusr/project-data.txt.
Since you want to move a file between directories inside HDFS, use the -mv command instead of -put, just as you would use mv on your local filesystem.
I tested this on my own HDFS as follows:
Create the source and destination directories in HDFS
hadoop fs -mkdir source_dir dest_dir
Create an empty (for the sake of the test) file under the source directory
hadoop fs -touchz source_dir/test.txt
Move the empty file to the destination directory
hadoop fs -mv source_dir/test.txt dest_dir/test.txt
(Note that the /user/<username>/ prefix is not needed in these paths, because relative HDFS paths are resolved against your HDFS home directory, /user/<username>. Here the destination is written as a full path that includes the file name; if dest_dir already exists, giving just dest_dir also moves the file into it.)
Listing the destination directory (for example with hadoop fs -ls dest_dir) shows that the empty text file has been moved there.
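A minimal sketch that wraps these steps for the paths in the original question. It assumes the hadoop client is on PATH; hdfs_move is a hypothetical helper name, not a Hadoop command. It creates the destination directory with -mkdir -p if it is missing, then moves the file with -mv using the full destination path, file name included.

```bash
#!/usr/bin/env bash
set -euo pipefail

# hdfs_move SRC DEST_DIR  (hypothetical helper, not part of Hadoop)
# Moves a file that is ALREADY in HDFS into DEST_DIR, also in HDFS.
hdfs_move() {
  local src=$1 dest_dir=$2
  # -p creates missing parent directories and does not fail if
  # DEST_DIR already exists.
  hadoop fs -mkdir -p "$dest_dir"
  # -mv works entirely inside HDFS; unlike -put, SRC is an HDFS path,
  # not a local one. basename only does string work on the path here.
  hadoop fs -mv "$src" "$dest_dir/$(basename "$src")"
}
```

For the question above this would be called as hdfs_move /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis.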

How to interpret Hadoop Grep command output

This is a very basic question concerning the output files generated by running the Grep utility on an HDFS directory. Essentially, I've put the grep command inside a simple shell script, which is supposed to search this directory for a given string, passed as a parameter to the script. The contents of the script are as follows:
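#!/bin/bash
set -e
cd "$HADOOP_HOME"
# Run the Grep example: input directory, output directory, regex
bin/hadoop org.apache.hadoop.examples.Grep \
  "hdfs://localhost:9000/user/hduser" "hdfs://localhost:9000/user/hduser/out" "$1"
# Copy the job output to the local filesystem
bin/hadoop fs -get "hdfs://localhost:9000/user/hduser/out/*" "/opt/data/out/"
bin/hadoop fs -rm -r "hdfs://localhost:9000/user/hduser/out"  # so the next run can recreate it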
The results written to the HDFS out directory are copied to a local directory in the second-to-last line. I've deliberately placed two files in this HDFS directory, only one of which contains multiple instances of the string I'm searching for. What ends up in my /opt/data/out directory are the following two files:
_SUCCESS
part-r-00000
The jobs look like they ran successfully; however, the only content I'm seeing across both files is in the part-r-00000 file, and it's literally the following:
29472 e
I suppose I was naively hoping to see the filename where the string was located, and perhaps a count of the number of times it occurred.
My question is: how and where are these values typically returned by the Hadoop Grep command? I've looked through the console output while the MapReduce jobs were running, and there's no reference to the file name where the search string is stored. Any pointers as to how I can access this information would be appreciated, as I'm unsure how to interpret "29472 e".

As I understand it:
You have some jobs' output in HDFS, which you copy to your local.
You are then trying to get the count of a string in the files.
In that case, add the following command after this line:
bin/hadoop fs -get "hdfs://localhost:9000/user/hduser/out/*" "/opt/data/out/"
grep -c "$1" /opt/data/out/*
For each file, this prints the file name followed by the number of lines in that file that match the string.
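As far as I know, the Grep example itself only reports totals: each output line is a count followed by the matched text, summed over all input files, so 29472 e reads as "the text e matched 29472 times", and the source file names are not kept. If per-file counts of the original HDFS inputs are needed, a sketch like the one below streams each file through the local grep. count_in_hdfs_files is a hypothetical helper, and the file paths are passed in explicitly rather than discovered.

```bash
#!/usr/bin/env bash
set -euo pipefail

# count_in_hdfs_files PATTERN FILE...  (hypothetical helper)
# Prints "FILE:COUNT" for each HDFS file, where COUNT is the number of
# LINES matching PATTERN (grep -c semantics, not total occurrences).
count_in_hdfs_files() {
  local pattern=$1 f count
  shift
  for f in "$@"; do
    # grep -c prints 0 but exits 1 when nothing matches; `|| true`
    # keeps set -e/pipefail from aborting (it also hides a failed cat).
    count=$(hadoop fs -cat "$f" | grep -c -- "$pattern" || true)
    printf '%s:%s\n' "$f" "$count"
  done
}
```

Called from the script above, this might look like count_in_hdfs_files "$1" hdfs://localhost:9000/user/hduser/file1.txt hdfs://localhost:9000/user/hduser/file2.txt, where file1.txt and file2.txt are placeholders for your actual input files.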

Error in Hadoop mv command for empty directory

Does the Hadoop filesystem shell support moving an empty directory?
Assume that /user/abc is an empty directory and I run:
hadoop fs -mv /user/abc/* /user/xyz/*
When I execute this command, it gives me the error
'/user/abc/*' does not exist.
However, if I put some data inside /user/abc, it executes successfully.
Does anyone know how to handle an empty directory?
Is there an alternative way to run this command without getting an error?

hadoop fs -mv "/user/abc/*" /user/xyz
The destination does not need the trailing /*.
If you want to move or rename the directory itself, you can also use:
hadoop fs -mv /user/abc /user/xyz
This works even when /user/abc is empty. Note that if /user/xyz already exists as a directory, abc is moved inside it (as /user/xyz/abc) rather than renamed to xyz.

I believe this answer is correct:
hadoop fs -mv /user/abc /user/xyz
'*' is a wildcard, so the original command looks for entries inside the folder; when nothing matches, it returns the error.
As the mv documentation notes:
When you move a file, all links to other files remain intact, except when you move it to a different file system.
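A sketch of one way to keep the glob form but avoid the error for a possibly empty directory: check whether the source has any entries first and skip the move if it does not. It assumes that hadoop fs -ls prints nothing for an empty directory and that the destination directory already exists; hdfs_move_contents is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# hdfs_move_contents SRC_DIR DEST_DIR  (hypothetical helper)
# Moves everything inside SRC_DIR into the existing DEST_DIR, and does
# nothing (successfully) when SRC_DIR is empty.
hdfs_move_contents() {
  local src=$1 dest=$2
  # Assumption: `hadoop fs -ls` on an empty directory prints no lines,
  # so an empty listing means the glob below would match nothing.
  # A missing SRC_DIR also lands here (hadoop prints its own error).
  if [ -z "$(hadoop fs -ls "$src")" ]; then
    echo "nothing to move in $src" >&2
    return 0
  fi
  # Quote the glob so HDFS expands it, and give the destination
  # directory itself, without a trailing /*.
  hadoop fs -mv "$src/*" "$dest"
}
```

For the example in the question: hdfs_move_contents /user/abc /user/xyz.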

How to remove files inside the hadoop directory at once?

I want to remove all the files contained in a Hadoop directory without removing the directory itself. I've tried using rm -r, but it removed the whole directory.

Add a wildcard character * after the folder whose contents you want to delete; this removes the contents but keeps the folder itself. For example:
hdfs dfs -rm -r '/home/user/folder/*'

Building on the previous answer: the asterisk must be quoted (single or double quotes both work) so that your local shell does not expand it:
hdfs dfs -rm -r "/home/user/folder/*"

Use the hdfs command with an asterisk to delete all files inside a specific folder. For example, for the folder /user/your_user_name:
hdfs dfs -rm -r '/user/your_user_name/*'
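A slightly more defensive sketch of the same idea, assuming the hadoop client is on PATH; hdfs_empty_dir is a hypothetical helper name. It refuses an empty path or /, checks with -test -d that the target is a directory, and then removes its contents with a quoted glob so the directory itself is kept.

```bash
#!/usr/bin/env bash
set -euo pipefail

# hdfs_empty_dir DIR  (hypothetical helper, not part of Hadoop)
# Deletes everything inside the HDFS directory DIR but keeps DIR itself.
hdfs_empty_dir() {
  local dir=$1
  # Refuse an empty argument or "/", where "$dir/*" would match every
  # top-level entry of the filesystem.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to empty '$dir'" >&2
    return 1
  fi
  # -test -d exits 0 only if the path exists and is a directory.
  if ! hadoop fs -test -d "$dir"; then
    echo "$dir is not an HDFS directory" >&2
    return 1
  fi
  # Quoted glob: HDFS expands it, so only the contents are removed.
  # If DIR is already empty, the glob matches nothing and hadoop
  # reports an error.
  hadoop fs -rm -r "$dir/*"
}
```

Usage: hdfs_empty_dir /user/your_user_name.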

Remove Pig directories matching a pattern

I read this question about loading Pig directories that match a pattern, but I want to run a job that deletes them in the same way. I have time-stamped directories, e.g. /mydir/02-03-01, /mydir/02-03-02, /mydir/02-03-03 etc., and want to delete, say, 02-03-01 through 02-03-02. I tried
rmf /mydir/02-03-{01,02}/
With and without quotes to no effect. Any ideas?

The following works for me; it should be the first command in the Pig script.
fs -rmr -skipTrash /user/root/mydir/02-03-{01,02};
-rmr is deprecated; you can use this instead:
fs -rm -r -skipTrash /user/root/mydir/02-03-{01,02,03};
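For a longer range, typing the brace list by hand gets error-prone. A small sketch that builds the pattern, assuming the directories are named PREFIX plus a two-digit day and that FIRST <= LAST; build_day_glob is a hypothetical helper. The braces are left for Hadoop's own glob matching, which supports {a,b} alternation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# build_day_glob PREFIX FIRST LAST  (hypothetical helper)
# Prints PREFIX{FIRST,...,LAST} with two-digit days, e.g. for
# FIRST=01 and LAST=03: PREFIX{01,02,03}. Assumes FIRST <= LAST.
build_day_glob() {
  local prefix=$1 first=$2 last=$3 days="" d
  # 10# forces base 10, so "08" and "09" are not parsed as octal.
  for (( d = 10#$first; d <= 10#$last; d++ )); do
    days+="$(printf '%02d' "$d"),"
  done
  # ${days%,} drops the trailing comma.
  printf '%s{%s}\n' "$prefix" "${days%,}"
}
```

From a shell this could be used as hadoop fs -rm -r -skipTrash "$(build_day_glob /user/root/mydir/02-03- 01 03)"; in a Pig script you would paste the printed pattern into the fs -rm -r -skipTrash line instead.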
