using find with variables in bash

I am new to bash scripting and need help:
I need to remove specific files from a directory. My goal is to find, in each subdirectory, a file called "filename.A" and remove all files that start with "filename" and have the extension B,
that is: "filename01.B", "filename02.B", etc.
I tried:
B_folders="$(find /someparentdirectory -type d -name "*.B" | sed 's# (.*\)/.*#\1#'|uniq)"
A_folders="$(find "$B_folders" -type f -name "*.A")"
for FILE in "$A_folders" ; do
A="${file%.A}"
find "$FILE" -name "$A*.B" -exec rm -f {}\;
done
I started to get problems when the directory names contained spaces.
Any suggestions for the right way to do it?
EDIT:
My goal is to find, in each subdirectory (which may have spaces in its name), files of the form "filename.A".
If such files exist:
check whether "filename*.B" exists and remove it,
that is, remove "filename01.B", "filename02.B", etc.

In bash 4, it's simply
shopt -s globstar nullglob
for f in some_parent_directory/**/filename.A; do
rm -f "${f%.A}"*.B
done

If the space is the only issue you can modify the find inside the for as follows:
find "$FILE" -name "$A*.B" -print0 | xargs -0 rm
man find shows:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
and the xargs manual:
-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
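Putting the two answers together, a space-safe version of the original loop could look roughly like the sketch below. The function name remove_b_siblings is made up for illustration, and the sketch assumes bash plus a find that supports -print0 (GNU and BSD find both do).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original posts): for every "name.A" file
# below the given directory, delete the sibling files matching "name*.B".
remove_b_siblings() {
    local parent=$1 a_file base
    # NUL-delimited names survive spaces and even newlines in paths.
    while IFS= read -r -d '' a_file; do
        base=${a_file%.A}        # strip the ".A" extension, keep the path
        # The quoted part is taken literally; only the trailing *.B is a glob.
        rm -f -- "$base"*.B
    done < <(find "$parent" -type f -name '*.A' -print0)
}

# Example call (the path is a placeholder):
# remove_b_siblings "/some parent directory"
```

Because "$base" is quoted, spaces in directory names never reach word splitting, which is what broke the loop in the question.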

Related

bash script remove squares prefix when reading a file content [duplicate]

For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find -type f |
while read file
do
if [ "`head -c 3 -- "$file"`" == $'\xef\xbb\xbf' ]
then
echo "found BOM in: $file"
fi
done
Or, if you prefer short, unreadable one-liners:
find -type f|while read file;do [ "`head -c3 -- "$file"`" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done
It doesn't work with filenames that contain a line break,
but such files are not to be expected anyway.
Is there any shorter or more elegant solution?
Are there any interesting text editors or macros for text editors?
What about this one simple command which not just finds but clears the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i {} \;
I love "find" :)
Warning: the above will also modify binary files that happen to start with those three bytes.
If you want just to show BOM files, use this one:
grep -rl $'\xEF\xBB\xBF' .
The best and easiest way to do this on Windows:
Total Commander → go to project's root dir → find files (Alt + F7) → file types *.* → Find text "EF BB BF" → check 'Hex' checkbox → search
And you get the list :)
find . -type f -print0 | xargs -0r awk '
/^\xEF\xBB\xBF/ {print FILENAME}
{nextfile}'
Most of the solutions given above test more than the first line of the file, even if some (such as Marcus's solution) then filter the results. This solution only tests the first line of each file so it should be a bit quicker.
If you accept some false positives (in case there are non-text files, or in the unlikely case there is a ZWNBSP in the middle of a file), you can use grep:
fgrep -rl `echo -ne '\xef\xbb\xbf'` .
You can use grep to find them and Perl to strip them out like so:
grep -rl $'\xEF\xBB\xBF' . | xargs perl -i -pe 's{\xEF\xBB\xBF}{}'
I would use something like:
grep -orHbm1 "^`echo -ne '\xef\xbb\xbf'`" . | sed '/:0:/!d;s/:0:.*//'
Which will ensure that the BOM occurs starting at the first byte of the file.
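The same "BOM must sit at byte offset 0" requirement can also be checked by comparing the first three bytes of each file directly. A minimal sketch, assuming bash and the standard head, od and tr utilities; the function names has_bom and find_bom_files are illustrative only:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file starts with EF BB BF.
has_bom() {
    # od prints the first three bytes as hex; tr strips spaces and newlines.
    [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: report every regular file below a directory that has a BOM.
find_bom_files() {
    local dir=${1:-.} file
    while IFS= read -r -d '' file; do
        if has_bom "$file"; then
            printf 'found BOM in: %s\n' "$file"
        fi
    done < <(find "$dir" -type f -print0)
}
```

Reading only three bytes per file avoids scanning whole files, and the NUL-delimited loop also copes with file names that contain line breaks.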
For a Windows user, see this (good PHP script for finding the BOM in your project).
An overkill solution to this is phptags (not the vi tool with the same name), which specifically looks for PHP scripts:
phptags --warn ./
Will output something like:
./invalid.php: TRAILING whitespace ("?>\n")
./invalid.php: UTF-8 BOM alone ("\xEF\xBB\xBF")
And the --whitespace mode will automatically fix such issues (recursively, but asserts that it only rewrites .php scripts.)
I used this to correct only JavaScript files:
find . -iname '*.js' -type f -exec sed 's/^\xEF\xBB\xBF//' -i.bak {} \; -exec rm {}.bak \;
find -type f -print0 | xargs -0 grep -l `printf '^\xef\xbb\xbf'` | sed 's/^/found BOM in: /'
find -print0 puts a null \0 between each file name instead of using new lines
xargs -0 expects null separated arguments instead of line separated
grep -l lists the files which match the regex
The regex ^\xef\xbb\xbf isn't entirely correct, as it will match non-BOMed UTF-8 files if they have zero-width no-break spaces at the start of a line
If you are looking for UTF files, the file command works. It will tell you what the encoding of the file is. If there are any non ASCII characters in there it will come up with UTF.
file *.php | grep UTF
That won't work recursively though. You can probably rig up some fancy command to make it recursive, but I just searched each level individually like the following, until I ran out of levels.
file */*.php | grep UTF

How to look for files that have an extra character at the end?

I have a strange situation. A group of folks asked me to look at their hacked WordPress site. When I got in, I noticed there were extra files here and there that had an extra non-printable character at the end of their names. In Bash, it shows up as a \r.
Just next to these files with the weird character is the original file. I'm trying to locate all these suspicious files and delete them. But the correct Bash incantation is eluding me.
find . | grep -i \?
and
find . | grep -i '\r'
aren't working
How do I use bash to find them?
Remove all files with filename ending in \r (carriage return), recursively, in current directory:
find . -type f -name $'*\r' -exec rm -fv {} +
Use ls -lh instead of rm to view the file list without removing.
Use rm -fvi to prompt before each removal.
-name GLOB specifies a matching glob pattern for find.
$'\r' is bash syntax for C style escapes.
You said "non-printable character", but ls indicates it's specifically a carriage return. The pattern '*[^[:graph:]]' matches filenames ending in any non-printable character, which may be relevant.
To remove all files and directories matching $'*\r' and all contents recursively: find . -name $'*\r' -exec rm -rfv {} +.
You have to pass carriage return character literally to grep. Use ANSI-C quoting in Bash.
find . -name $'*\r'
find . | grep $'\r'
find . | sed '/\x0d/!d'
If it is a special character
Recursive look up
grep -ir $'\r'
# sample output
# empty line
Recursive look up + just print file name
grep -lir $'\r'
# sample output
file.txt
If it is not a special character
You need to escape the backslash \ with another backslash, so it becomes \\
Recursive look up
grep -ir '\\r$'
# sample output
file.txt:file.php\r
Recursive look up + just print file name
grep -lir '\\r$'
# sample output
file.txt
help:
-i case insensitive
-r recursive mode
-l print file name
\ escape another backslash
$ match the end
$'' the value is a special character e.g. \r, \t
shopt -s globstar # Enable **
shopt -s dotglob # Also cover hidden files
offending_files=(**/*$'\r')
should store into the array offending_files a list of all files which are compromised in that way. Of course you could also glob for **/*$'\r'*, which searches for all files having a carriage return anywhere in the name (not necessarily at the end).
You can then log the name of those broken files (which might make sense for auditing) and remove them.
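The log-then-remove step might be sketched as follows. This assumes bash 4 or later (for globstar) and that the function is run from the directory to be cleaned. The function name and the log file argument are made up for illustration.

```bash
#!/usr/bin/env bash
shopt -s globstar dotglob nullglob   # nullglob: no match yields an empty array

# Hypothetical helper: append the shell-quoted name of every regular file
# whose name ends in a carriage return to a log, then delete that file.
log_and_remove_cr_files() {
    local logfile=$1 f
    local -a offending_files=( **/*$'\r' )
    for f in "${offending_files[@]}"; do
        [ -f "$f" ] || continue            # skip directories and the like
        printf '%q\n' "$f" >> "$logfile"   # %q makes the trailing \r visible
        rm -f -- "$f"
    done
}

# Example call (the log path is a placeholder):
# cd /path/to/site && log_and_remove_cr_files /tmp/removed-files.log
```

Writing the names with %q keeps the audit log readable, since a raw carriage return would otherwise be invisible in most viewers.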

Find a file and delete the parent level dir

How would it be possible to delete the parent dir (only one level above) of a file found with a find command like
find . -type f -name "*.root" -size 1M
which returns
./level1/level1_chunk84/file.root
So I actually want to recursively delete the level1_chunk84 dir, for example.
thanks
You can try something like:
find . -type f -name "*.root" -size 1M -print0 | \
xargs -0 -n1 -I'{}' bash -c 'fpath={}; rm -r ${fpath%%$(basename {})}'
find + xargs combo is very common. Please refer to man find and you will find a few examples showing how to use them together.
All I did here was add the -print0 flag to your original find statement:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print
uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that
process the find output. This option corresponds to the -0 option of xargs.
Then piped out everything to xargs which serves as a helper to craft further commands:
- execute everything in bash subshell
- assign file path to a variable fpath={}
- extract dirname from your file path
${parameter%%word}
Remove matching suffix pattern. The word is expanded to produce a pattern just as in pathname expansion. If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the "%" case) or the longest matching pattern (the "%%" case) deleted. If parameter is @ or *, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
- and finally remove recursively
Also there's a little shorter version of it:
find . -type f -name "*.root" -size 1M -print0 | \
xargs -0 -n1 -I'{}' bash -c 'fpath={}; rm -r ${fpath%/*}'
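Both variants substitute the file name straight into the bash -c string, so a path containing spaces or quotes can break them. A more defensive sketch is shown below. It assumes bash, a find with -print0, and a search root given without a trailing slash. The function name remove_parent_dirs and the rule of never deleting the search root itself are assumptions added here, not part of the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete the directory that directly contains each
# matching file. The find tests are the ones from the question.
remove_parent_dirs() {
    local root=$1 file parent
    local -a matches=()
    # Collect all matches first, so find is not walking directories
    # that are being deleted underneath it.
    while IFS= read -r -d '' file; do
        matches+=("$file")
    done < <(find "$root" -type f -name '*.root' -size 1M -print0)

    for file in "${matches[@]}"; do
        parent=${file%/*}                      # drop the last path component
        [ "$parent" != "$root" ] || continue   # assumption: keep the search root
        [ -d "$parent" ] || continue           # already removed by an earlier match
        rm -r -- "$parent"
    done
}

# Example call:
# remove_parent_dirs .
```

Swapping rm -r for echo gives a dry run that only lists the directories that would be removed.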

How to overwrite the contents in the sed, without having backup file

I have a command like this:
sed -i -e '/console.log/ s/^\/*/\/\//' *.js
which comments out all console.log statements. But there are two issues:
It keeps a backup file like test.js-e, which I don't want.
Say I want to apply the same process recursively to the folder; how do I do that?
You don't need the -e option in this particular case. Dropping it should solve your first problem, since -e seems to be taken as the backup suffix for the -i option.
For the second part, you can try something like this:
find . -type f -name "*.js" -exec sed -i '/console.log/ s/^\/*/\/\//' {} +
Use find to recursively find all .js files and do the replacement.
When checking sed's help, -i takes a suffix and uses it as a backup,
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
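and the backup name seems to be the original file name plus -e, which means the argument following -i is being taken as the suffix. Joining the options as -ie would not help, since e would then become the suffix. On sed versions that always expect a suffix argument after -i (such as BSD/macOS sed), pass an explicit empty one:
sed -i '' -e '/console.log/ s/^\/*/\/\//' *.js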
As for the recursion, you could use find with -exec or xargs. Test the find command on its own before adding -exec:
find . -name '*.js' -type f -exec sed -i '/console.log/ s/^\/*/\/\//' {} \;
From your original post I presume you just want to change a leading C-style comment opener like:
/*
to a double-slash style like:
//
right?
Then you can do it with this command
find . -name "*.js" -type f -exec sed -i '/console.log/ s#^/\*#//#g' '{}' \;
Be aware that:
In sed the delimiter is normally /, but if escaping it gets annoying because your pattern or replacement contains a /, you can change the delimiter to # or | as you like. I find this a very useful trick.
If what you want is what I presumed, be sure to escape the * character, because the regex /* matches zero or more occurrences of /, so it matches every line. That is very dangerous!
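For the original goal (prefixing console.log lines with // recursively, without leaving backups), one option that avoids the differences between sed versions is to let sed write a .bak file and delete it straight away. This is only a sketch: the function name is hypothetical, and it assumes a sed that accepts a suffix attached to -i, which both GNU and BSD sed do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out console.log lines in every .js file
# below a directory, leaving no backup files behind.
comment_console_logs() {
    local dir=${1:-.}
    find "$dir" -type f -name '*.js' -exec sh -c '
        for f do
            # Edit in place with a temporary .bak backup, then drop the backup.
            sed -i.bak "/console\.log/ s|^/*|//|" "$f" && rm -f -- "$f.bak"
        done' sh {} +
}

# Example call:
# comment_console_logs ./src
```

Using | as the delimiter avoids escaping the slashes, and because ^/* also consumes slashes that are already there, running the function twice does not add a second pair.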

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I do this: for f in "$(ls -1)"; do echo $f; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string and I need the 2 lines separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} \; is the shell-escaped form of {} ;. Here {} is the placeholder for the currently processed file and ; marks the end of the command. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here will be launched by find and not by the shell, the spaces won't matter.
The second version will be less efficient since find will launch a new process for each and every file found. xargs instead passes as many file names as fit on one command line to each process it launches. Prefer the xargs version if you have the choice.
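The difference in the number of launched processes can be made visible with a small experiment. In the sketch below (the function names are hypothetical), every "got N file(s)" line corresponds to one launched process. The middle variant uses the POSIX -exec ... {} + form, which batches file names much like xargs does.

```bash
#!/usr/bin/env bash
# Hypothetical demo helpers: each launched sh prints how many file names it received.

# One process per file.
per_file() {
    find "$1" -type f -exec sh -c 'echo "got $# file(s)"' sh {} \;
}

# find batches the names itself.
batched() {
    find "$1" -type f -exec sh -c 'echo "got $# file(s)"' sh {} +
}

# xargs batches the NUL-delimited names.
via_xargs() {
    find "$1" -type f -print0 | xargs -0 sh -c 'echo "got $# file(s)"' sh
}
```

For a directory holding three files, per_file prints "got 1 file(s)" three times, while batched and via_xargs each print a single "got 3 file(s)" line, even when the names contain spaces.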
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while IFS= read -r FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces newline-delimited output.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while IFS= read -r FILE
do
echo 1: "$FILE"
echo 2: "$FILE"
done
Look at the --quoting-style option of ls.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
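Check out the manpage for xargs.
It works like this (the tr step turns newlines into NUL characters so that names with spaces stay intact):
ls -1 /tmp/*.jpeg | tr '\n' '\0' | xargs -0 rm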
