How to remove first and last folder in 'find' result output? - bash

I want to search for folders by part of their name, which I know and which is common among these kinds of folders. I used the 'find' command in a bash script like this:
find . -type d -name "*.hg"
It just prints out the whole path from the current directory down to the found folder itself; the folder name ends in '.hg'. Then I tried to use the 'sed' command, but I couldn't address the last part of the path. I decided to get the folder name ending in .hg, save it in a variable, then use 'sed' to remove the last directory from the output. I used this to get the last part and tried to save the result to a variable, with no luck:
find . -type d -name "*.hg"|sed 's/*.hg$/ /'
find . -type d -name "*.hg"|awk -F/ '{print $NF}
This just prints out the base names, i.e. the folders ending in .hg.
Then I tried a different approach:
for i in $(find . -type d -name '*.hg' );
do
$DIR = $(dirname ${i})
echo $DIR
done
This didn't work either. Can anyone give me a hint to make this work?
And yes, it's homework.

You could use parameter expansion:
d=path/to/my/dir
d="${d#*/}" # remove the first dir
d="${d%/*}" # remove the last dir
echo $d # "to/my"
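Applied to the find output from the question, a minimal sketch (assuming no newlines in the path names):
find . -type d -name '*.hg' | while IFS= read -r d; do
d="${d#*/}" # remove the leading "." component
d="${d%/*}" # remove the trailing *.hg component
echo "$d"
done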

One problem that you have is with the pattern you are using in your sed script: bash and the find command share one pattern language, and sed uses a different one.
Bash and find use a very simple glob language in which * means any number of any character and ? means any single character. sed uses a much richer regular expression language in which * means any number of the previous character and . means any single character (there's a lot more to it than that).
So to remove the last component of the path delivered by find you will need to use the following sed command: sed -e 's,/[^/]*.hg,,'
Alternatively you could use the dirname command. Pipe the output of the find command to xargs, which will run a command with its standard input passed as arguments to that command:
find . -type d -name "*.hg" | xargs -I{} dirname {}
@Pamador - that's strange; it works for me. Just to explain: the sed command needs to be quoted in single quotes to protect it against any unwanted shell expansions. The character following the 's' is a comma; what we're doing here is changing the character that sed uses to separate the two parts of the substitute command, which means we can use the slash character without having to escape it with a preceding backslash. The next part matches a slash, then any sequence of characters apart from a slash, then any single character, and then 'hg'. Honestly, I should have anchored the pattern to the end of the line with a $, but apart from that it's fine.
I tested it with
echo "./abc/xxx.hg" | sed -e 's,/[^/]\.hg$'
And it printed ./abc
Did I misunderstand what you wanted to do?
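The delimiter after the s can be any character; the same substitution with | instead of a comma also avoids having to escape the slashes:
echo "./abc/xxx.hg" | sed -e 's|/[^/]*.hg$||'
# prints ./abc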

find . -type d -name "*.hg" | awk -v m=1 -v n=1 'NR<=m{};NR>n+m{print line[NR%n]};{line[NR%n]=$0}'
awk parameters:
m = number of lines to remove from the beginning of the output
n = number of lines to remove from the end of the output
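A quick self-contained check of the m and n parameters, here dropping the first two lines and the last one:
printf '%s\n' one two three four five | awk -v m=2 -v n=1 'NR<=m{next};NR>n+m{print line[NR%n]};{line[NR%n]=$0}'
# prints: three, four (each on its own line)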
Bonus: If you wanted to remove 1 line from the end and you have GNU coreutils installed (where GNU head may be named ghead, e.g. via Homebrew on macOS), you could do this: find . -type d -name "*.hg" | ghead -n -1

Related

How to look for files that have an extra character at the end?

I have a strange situation. A group of folks asked me to look at their hacked WordPress site. When I got in, I noticed there were extra files here and there that had an extra non-printable character at the end. In Bash, it shows up as a \r.
Right next to each of these files with the weird character is the original file. I'm trying to locate all these suspicious files and delete them, but the correct Bash incantation is eluding me.
find . | grep -i \?
and
find . | grep -i '\r'
aren't working
How do I use bash to find them?
Remove all files with filename ending in \r (carriage return), recursively, in current directory:
find . -type f -name $'*\r' -exec rm -fv {} +
Use ls -lh instead of rm to view the file list without removing.
Use rm -fvi to prompt before each removal.
-name GLOB specifies a matching glob pattern for find.
$'\r' is bash syntax for C style escapes.
You said "non-printable character", but ls indicates it's specifically a carriage return. The pattern '*[^[:graph:]' matches filenames ending in any non printable character, which may be relevant.
To remove all files and directories matching $'*\r' and all contents recursively: find . -name $'*\r' -exec rm -rfv {} +.
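To see what the $'...' quoting actually produces, a quick check (od -c output shown as comments):
printf '%s' $'*\r' | od -c
# 0000000   *  \r
# 0000002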
You have to pass the carriage return character literally to grep. Use ANSI-C quoting in Bash.
find . -name $'*\r'
find . | grep $'\r'
find . | sed '/\x0d/!d'
If it is a special character:
Recursive lookup:
grep -ir $'\r'
# sample output
# empty line
Recursive lookup, printing just the file name:
grep -lir $'\r'
# sample output
file.txt
If it is not a special character (i.e. a literal backslash followed by r):
You need to escape the backslash \ with another backslash so it becomes \\
Recursive lookup:
grep -ir '\\r$'
# sample output
file.txt:file.php\r
Recursive lookup, printing just the file name:
grep -lir '\\r$'
# sample output
file.txt
help:
-i case insensitive
-r recursive mode
-l print only the file name
\\ a backslash escapes another backslash
$ match the end of the line
$'' ANSI-C quoting; the value is a special character, e.g. \r, \t
shopt -s globstar # Enable **
shopt -s dotglob # Also cover hidden files
offending_files=(**/*$'\r')
should store into the array offending_files a list of all files which are compromised in that way. Of course you could also glob for **/*$'\r'*, which searches for all files having a carriage return anywhere in the name (not necessarily at the end).
You can then log the name of those broken files (which might make sense for auditing) and remove them.
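A sketch of that log-then-remove step (the log file name is just an example):
(( ${#offending_files[@]} )) && {
printf '%s\n' "${offending_files[@]}" > compromised-files.log # audit trail
rm -v -- "${offending_files[@]}" # then delete
}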

terminal unix command find problem with setting up modes

I have trouble constructing a single find line to do the following:
find all files in the current dir and sub-dirs whose names end with ~, or start and end with '#'. I think I have made a fundamental mistake, but I'm not so sure after 2 hours of thinking.
This is what I came up with and it does not seem to work:
find -name '[#]' -a -name '[~#]'
macOSX terminal
You could use a combination of ls and grep to find all the files ending with either ~ or #:
ls -R * | grep -E ".*(\~|#)$"
ls -R * will show all files in the current dir and sub-dirs;
grep -E will search for lines matching an extended regular expression;
".*(\~|#)$" will match all lines ending with either ~ or # (the backslash before the ~ is not strictly required by the regex, but it does no harm).

Using find within a for loop to extract portion of file names as a variable (bash)

I have a number of files with a piece of useful information in their names that I want to extract as a variable and use in a subsequent step. The structure of the file names is samplename_usefulbit_junk. I'm attempting to loop through these files using a predictable portion of the file name (samplename), store the whole name in a variable, and use sed to extract the useful bit. It does not work.
samples="sample1 sample2 sample3"
for i in $samples; do
filename="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n')"
usefulbit="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n' | sed 's/.*samplename//g' | sed 's/junk.*//g')"
(More steps using $usefulbit or $(usefulbit) or ${usefulbit} or something)
done
find ./$FILE_DIR -maxdepth 1 -name 'sample1*' -printf '%f\n' and find ./$FILE_DIR -maxdepth 1 -name "sample1*" -printf '%f\n' both work, but no combination of parentheses, curly brackets, or single-, double-, or backquotes has got the loop to work. Where is this going wrong?
Try this:
for file in `ls *_*_*.*`
do
echo "Full file name is: $file"
predictable_portion_filename=${file%%_*}
echo "predictable portion in the filename is: ${predictable_portion_filename}"
echo "---"
done
PS: $variable, ${variable}, "${variable}" and "$variable" are different from $(variable): in the last case, $( ... ) creates a sub-shell and treats whatever is inside as a command, i.e. $(variable) makes the sub-shell execute a command named variable.
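Following that point, the find line from the question needs double quotes (so the variable expands) and ${i} rather than $(i); a minimal corrected sketch:
filename="$(find ./"$FILE_DIR" -maxdepth 1 -name "${i}*" -printf '%f\n')"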
In place of ls *_*_*.*, you can also use (to recursively find all files with that standard file name): ls -1R *_*_*.*
In place of using ${file%%_*} you can also use: echo ${file} | cut -d'_' -f1 to get the predictable value. You can use various other ways as well (awk, sed/ etc).
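A quick check of the cut variant with an illustrative file name:
echo "sample1_usefulbit_junk.txt" | cut -d'_' -f1
# prints: sample1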
Excuse me, I can't do it with bash; may I show you another approach? Here is a shell (lua-shell) I am developing, and a demo as a solution for your case:
wws$ ls ./demo/2
sample1_red_xx.png sample2_green_xx.png sample3_blue_xx.png
wws$ source ./demo/2.lua
sample1_red_xx.png: red
sample2_green_xx.png: green
sample3_blue_xx.png: blue
wws$
I really want to know your whole plan, unless you need bash as the only tool...
Er, I forgot to paste the script:
samples={"sample1", "sample2", "sample3"}
files = lfs.collect("./demo/2")
function get_filename(prefix)
for i, file in pairs(files) do
if string.match(file.name, prefix) then return file.name end
end
end
for i = 1, #samples do
local filename = get_filename(samples[i])
vim:set(filename)
:f_lvf_hy
print(filename ..": ".. vim:clipboard())
end
The 'get_filename()' seems a little verbose... I haven't finished the lfs component.
I'm not sure whether answering my own question with my final solution is proper Stack Overflow etiquette, but this is what ultimately worked for me:
for i in directory/*.ext; do
myfile="$i"
name="$(echo $i | sed 's!.*/!!g' | sed 's/_junk*.ext//g')"
some other steps
done
This way I start with the file name already a variable (in a variable?) and don't have to struggle with find and its strong opinions. It also spares me from having to make a list of sample names.
The first sed removes the directory/ and the second removes the end of the file name and extension, leaving a variable $name that I use as a prefix when generating other files in subsequent steps. So much simpler!
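For what it's worth, the same extraction works with parameter expansion alone, avoiding the echo/sed pipeline entirely (a sketch; adjust the suffix pattern to your real file names):
for i in directory/*.ext; do
myfile="$i"
name="${i##*/}" # strip the directory part, like the first sed
name="${name%_*}" # strip the trailing _junk part and extension, like the second sed
# some other steps
done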

using find with variables in bash

I am new to bash scripting and need help:
I need to remove specific files from a directory. My goal is to find in each subdirectory a file called "filename.A" and remove all files that start with "filename" and have extension B,
that is: "filename01.B", "filename02.B", etc.
I tried:
B_folders="$(find /someparentdirectory -type d -name "*.B" | sed 's#\(.*\)/.*#\1#' | uniq)"
A_folders="$(find "$B_folders" -type f -name "*.A")"
for FILE in "$A_folders" ; do
A="${file%.A}"
find "$FILE" -name "$A*.B" -exec rm -f {}\;
done
I started to get problems when the directory names contained spaces.
Any suggestions for the right way to do it?
EDIT:
My goal is to find, in each subdirectory (which may have spaces in its name), files of the form "filename.A".
If such a file exists:
check if "filename*.B" exists and remove it,
that is, remove "filename01.B", "filename02.B", etc.
In bash 4, it's simply
shopt -s globstar nullglob
for f in some_parent_directory/**/filename.A; do
rm -f "${f%.A}"*.B
done
If the space is the only issue you can modify the find inside the for as follows:
find "$FILE" -name "$A*.B" -print0 | xargs -0 rm
man find shows:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
and xargs's manual:
-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end-of-file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
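Putting -print0 and a null-delimited read together for the original task, a sketch that keeps names with spaces intact (directory layout assumed from the question):
find /someparentdirectory -type f -name '*.A' -print0 |
while IFS= read -r -d '' f; do
rm -f "${f%.A}"*.B
done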

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I do this: for f in "$(ls -1)"; do echo "$f"; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string, and I need the 2 lines separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} \; is the escaping for {} and ;, where {} is the placeholder for the currently processed file name and ; terminates the command. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here is launched by find and not by the shell, the spaces won't matter.
The second version will be less efficient, since find will launch a new process for each and every file found. xargs is smart enough to pass many file names to each newly launched process when it can figure out that it's safe to do so. Prefer the xargs version if you have the choice.
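As a middle ground, find's -exec ... + variant (standard in POSIX find) batches file names into as few command invocations as possible, much like xargs:
find /target/path -type f -exec your_command_here {} +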
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while read FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces $IFS delimited output.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while read FILE
do
echo 1: "$FILE"
echo 2: "$FILE"
done
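One caveat: a bare read strips leading whitespace and treats backslashes specially; IFS= read -r avoids both, so a slightly more robust form of the same loop is:
ls -1 | while IFS= read -r FILE
do
echo "$FILE"
done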
Look at the --quoting-style option of ls.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
Check out the manpage for xargs; it works like this:
ls -1 /tmp/*.jpeg | xargs rm
