Transfer a path with spaces in bash

I'm trying to run a program on every file in a directory.
But there are spaces in the file names. For example, a file can be named «/my/good/path/MY - AWFUL, FILE.DOC»
And when I try to pass the path to my other tool (a Python script), I get an error saying «MY» is not an existing file. :(
Here is my current bash code:
#!/usr/bin/bash
for file in $(find "/my/pash" -name "*.DOC")
do
newvar=`program "$file"`
done
So… where is my problem?
Thanks everyone :)

Some correct answers, but no explanations so far:
A for loop is intended to iterate over words, not lines. The unquoted command substitution is subject to word splitting (which is what is troubling you) and filename expansion, and then you iterate over the resulting words. You could set IFS to contain only a newline, but the safest way is to use find -print0 and xargs -0, as demonstrated in Vytenis's answer:

find -name "*.DOC" -print0 | xargs -r -0 -n1 program
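To see the word splitting described above in action, here is a minimal sketch. It assumes a scratch directory created with mktemp and two made-up file names; the -exec ... {} + form used for the safe count is a standard find feature, used here instead of xargs only to keep the demo self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: a scratch directory with one awkward file name.
tmp=$(mktemp -d)
touch "$tmp/MY - AWFUL, FILE.DOC" "$tmp/plain.DOC"

# Unquoted command substitution: the shell splits find's output on
# spaces, tabs and newlines, so one file name becomes several words.
split_count=0
for word in $(find "$tmp" -name '*.DOC'); do
    split_count=$((split_count + 1))
done

# When find passes each name as a single argument, nothing is split.
safe_count=$(find "$tmp" -name '*.DOC' \
    -exec sh -c 'for f in "$@"; do [ -f "$f" ] && echo ok; done' sh {} + \
    | wc -l | tr -d ' ')

echo "words seen by the for loop: $split_count"
echo "files seen by find -exec:   $safe_count"
rm -rf "$tmp"
```

The for loop counts more words than there are files, while the second count matches the number of files actually present.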

#!/usr/bin/bash
find "/my/pash" -name "*.DOC" | while IFS= read -r file; do
newvar="$(program "$file")"
done
Note that this only fixes the case where a space or tab is in the file name. If you have a newline in the file name, it gets a little more complicated.
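For the newline case, one common approach is to have find separate names with NUL bytes and read them with read -d ''. The sketch below is bash-specific (read -d and process substitution are not POSIX sh), and the directory and file names are invented for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: one name with a space, one with a newline.
tmp=$(mktemp -d)
touch "$tmp/with space.DOC" "$tmp/with"$'\n'"newline.DOC"

count=0
# -print0 ends every name with a NUL byte; read -d '' reads up to that byte.
# IFS= keeps leading and trailing blanks, -r keeps backslashes literal.
# Process substitution keeps the loop in the current shell, so $count survives.
while IFS= read -r -d '' file; do
    [ -f "$file" ] && count=$((count + 1))
done < <(find "$tmp" -name '*.DOC' -print0)

echo "processed $count files"
rm -rf "$tmp"
```

Feeding the loop with < <(...) rather than a pipe also means variables set inside the loop are still visible after it ends.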

That is because the for loop takes every word in the output of find as an element to iterate over. for will see it as:
for file in /my/good/path/MY - AWFUL, FILE.DOC
do
echo "$file"
done
And will print:
/my/good/path/MY
-
AWFUL,
FILE.DOC
One solution to this problem is to use the xargs program to pass the results of find as arguments to your Python program:
find "/my/pash" -name "*.DOC" -print0 | xargs -0 -I {} program {}

The for loop treats blanks as delimiters, so try this one:
find "/my/pash" -name "*.DOC" | while read file; do
newvar=`program "$file"`
done

Related

bash script remove squares prefix when reading a file content [duplicate]

For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find -type f |
while read file
do
if [ "`head -c 3 -- "$file"`" == $'\xef\xbb\xbf' ]
then
echo "found BOM in: $file"
fi
done
Or, if you prefer short, unreadable one-liners:
find -type f|while read file;do [ "`head -c3 -- "$file"`" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done
It doesn't work with filenames that contain a line break,
but such files are not to be expected anyway.
Is there any shorter or more elegant solution?
Are there any interesting text editors or macros for text editors?
What about this one simple command which not just finds but clears the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i {} \;
I love "find" :)
Warning: the above will also modify binary files that begin with those three bytes.
If you want just to show BOM files, use this one:
grep -rl $'\xEF\xBB\xBF' .
The best and easiest way to do this on Windows:
Total Commander → go to project's root dir → find files (Alt + F7) → file types *.* → Find text "EF BB BF" → check 'Hex' checkbox → search
And you get the list :)
find . -type f -print0 | xargs -0r awk '
/^\xEF\xBB\xBF/ {print FILENAME}
{nextfile}'
Most of the solutions given above test more than the first line of the file, even if some (such as Marcus's solution) then filter the results. This solution only tests the first line of each file so it should be a bit quicker.
If you accept some false positives (in case there are non-text files, or in the unlikely case there is a ZWNBSP in the middle of a file), you can use grep:
fgrep -rl `echo -ne '\xef\xbb\xbf'` .
You can use grep to find them and Perl to strip them out like so:
grep -rl $'\xEF\xBB\xBF' . | xargs perl -i -pe 's{\xEF\xBB\xBF}{}'
I would use something like:
grep -orHbm1 "^`echo -ne '\xef\xbb\xbf'`" . | sed '/:0:/!d;s/:0:.*//'
Which will ensure that the BOM occurs starting at the first byte of the file.
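A different way to check only the first three bytes is to read exactly that many and compare their hex dump. This is a sketch rather than anyone's posted solution: has_bom is a hypothetical helper name, the test files are invented, and the loop uses a plain glob instead of a recursive find to stay short.

```shell
#!/usr/bin/env bash
# has_bom (hypothetical helper): succeeds if the file starts with EF BB BF.
has_bom() {
    [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical demo data: one file with a leading BOM, one with the
# same three bytes in the middle of the line.
tmp=$(mktemp -d)
printf '\357\273\277hello\n' > "$tmp/with bom.txt"
printf 'hello \357\273\277\n' > "$tmp/bom in middle.txt"

found=""
for f in "$tmp"/*; do
    if has_bom "$f"; then
        found="$found${f##*/};"
        echo "found BOM in: $f"
    fi
done
rm -rf "$tmp"
```

Because only three bytes are read per file, large binary files cost no more than small text files.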
For a Windows user, see this (good PHP script for finding the BOM in your project).
An overkill solution to this is phptags (not the vi tool with the same name), which specifically looks for PHP scripts:
phptags --warn ./
Will output something like:
./invalid.php: TRAILING whitespace ("?>\n")
./invalid.php: UTF-8 BOM alone ("\xEF\xBB\xBF")
And the --whitespace mode will automatically fix such issues (recursively, but asserts that it only rewrites .php scripts.)
I used this to correct only JavaScript files:
find . -iname '*.js' -type f -exec sed 's/^\xEF\xBB\xBF//' -i.bak {} \; -exec rm {}.bak \;
find -type f -print0 | xargs -0 grep -l `printf '^\xef\xbb\xbf'` | sed 's/^/found BOM in: /'
find -print0 puts a null \0 between each file name instead of using new lines
xargs -0 expects null separated arguments instead of line separated
grep -l lists the files which match the regex
The regex ^\xef\xbb\xbf isn't entirely correct, as it will match non-BOMed UTF-8 files if they have zero-width spaces at the start of a line
If you are looking for UTF files, the file command works. It will tell you what the encoding of the file is. If there are any non ASCII characters in there it will come up with UTF.
file *.php | grep UTF
That won't work recursively though. You can probably rig up some fancy command to make it recursive, but I just searched each level individually like the following, until I ran out of levels.
file */*.php | grep UTF

How to find a filename that starts with a letter and save it to a variable?

I need to write a line in bash which will find file that starts with letter "T" in specified folder and save it to $VAR.
Let's assume that C:/workspace/ has one file that starts with letter "T".
I need to find that file and save its name to variable.
Honestly, that's all I managed to create. I do not know how to create it. I think I need to use sed to do it.
FILE_PATH='C:/workspace/'T.*'' | sed "s/.*\///"
echo "$FILE_PATH"
you mean something like that?
var=$(find /path/to/special/folder -maxdepth 1 -type f -name "T*" -printf "%f\n")
I just used ls
TFileName=`ls T*`
and then after pipe I concat filename with sed at my discretion
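Another option that avoids parsing the output of ls or find is a plain glob. The sketch below assumes a scratch directory standing in for C:/workspace/ and made-up file names; VAR is the variable name from the question.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for C:/workspace/ with one file starting with "T".
dir=$(mktemp -d)
touch "$dir/Target file.txt" "$dir/other.txt"

VAR=""
for candidate in "$dir"/T*; do
    [ -f "$candidate" ] || continue   # skips the literal pattern if nothing matched
    VAR=${candidate##*/}              # strip the directory part, keep the file name
    break                             # the question assumes exactly one match
done

echo "$VAR"
rm -rf "$dir"
```

The glob is expanded by the shell itself, so spaces in the file name survive without any extra quoting tricks.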

Using find within a for loop to extract portion of file names as a variable (bash)

I have a number of files with a piece of useful information in their names that I want to extract as a variable and use in a subsequent step. The structure of the file names is samplename_usefulbit_junk. I'm attempting to loop through these files using a predictable portion of the file name (samplename), store the whole name in a variable, and use sed to extract the useful bit. It does not work.
samples="sample1 sample2 sample3"
for i in $samples; do
filename="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n')"
usefulbit="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n' | sed 's/.*samplename//g' | sed 's/junk.*//g')"
(More steps using $usefulbit or $(usefulbit) or ${usefulbit} or something)
done
find ./$FILE_DIR -maxdepth 1 -name 'sample1*' -printf '%f\n' and find ./$FILE_DIR -maxdepth 1 -name "sample1*" -printf '%f\n' both work, but no combination of parentheses, curly brackets, or single-, double-, or backquotes has got the loop to work. Where is this going wrong?
Try this:
for file in `ls *_*_*.*`
do
echo "Full file name is: $file"
predictable_portion_filename=${file%%_*}
echo "predictable portion in the filename is: ${predictable_portion_filename}"
echo "---"
done
PS: $variable or ${variable} or "${variable}" or "$variable" are different than $(variable) as in the last case, $( ... ) makes a sub-shell and treats anything inside as a command i.e. $(variable) will make the sub-shell to execute a command named variable
In place of ls *_*_*.*, you can also use (to recursively find all files with that standard file name): ls -1R *_*_*.*
In place of ${file%%_*}, you can also use echo ${file} | cut -d'_' -f1 to get the predictable value. There are various other ways as well (awk, sed, etc.).
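Putting the points above together, the original loop can be written without find or sed. This is a sketch assuming file names of the form samplename_usefulbit_junk; the scratch directory, the colour names and the result variable are invented for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical demo data following the samplename_usefulbit_junk pattern.
FILE_DIR=$(mktemp -d)
touch "$FILE_DIR/sample1_red_junk.txt" \
      "$FILE_DIR/sample2_green_junk.txt" \
      "$FILE_DIR/sample3_blue_junk.txt"

samples="sample1 sample2 sample3"
result=""
for i in $samples; do
    # "$i" sits inside double quotes so it is expanded; single quotes would
    # pass the text literally, and $(i) would try to run a command named i.
    for path in "$FILE_DIR/$i"_*; do
        [ -e "$path" ] || continue
        filename=${path##*/}       # drop the directory
        rest=${filename#*_}        # drop "samplename_"
        usefulbit=${rest%%_*}      # drop "_junk..." and everything after it
        result="$result$i=$usefulbit "
    done
done

echo "$result"
rm -rf "$FILE_DIR"
```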
Excuse me, I can't do it with bash; may I show you another approach? Here is a shell (lua-shell) I am developing, and a demo as a solution for your case:
wws$ ls ./demo/2
sample1_red_xx.png sample2_green_xx.png sample3_blue_xx.png
wws$ source ./demo/2.lua
sample1_red_xx.png: red
sample2_green_xx.png: green
sample3_blue_xx.png: blue
wws$
I really want to know your whole plan, unless you need bash as the only tool...
Er, I forgot to paste the script:
samples={"sample1", "sample2", "sample3"}
files = lfs.collect("./demo/2")
function get_filename(prefix)
for i, file in pairs(files) do
if string.match(file.name, prefix) then return file.name end
end
end
for i = 1, #samples do
local filename = get_filename(samples[i])
vim:set(filename)
:f_lvf_hy
print(filename ..": ".. vim:clipboard())
end
The 'get_filename()' seems a little verbose... I haven't finished the lfs component.
I'm not sure whether answering my own question with my final solution is proper stackoverflow etiquette, but this is what ultimately worked for me:
for i in directory/*.ext; do
myfile="$i"
name="$(echo $i | sed 's!.*/!!g' | sed 's/_junk*.ext//g')"
some other steps
done
This way I start with the file name already a variable (in a variable?) and don't have to struggle with find and its strong opinions. It also spares me from having to make a list of sample names.
The first sed removes the directory/ and the second removes the end of the file name and extension, leaving a variable $name that I use as a prefix when generating other files in subsequent steps. So much simpler!

using find with variables in bash

I am new to bash scripting and need help:
I need to remove specific files from a directory. My goal is to find in each subdirectory a file called "filename.A" and remove all files that start with "filename" and have extension B,
that is: "filename01.B", "filename02.B", etc.
I tried:
B_folders="$(find /someparentdirectory -type d -name "*.B" | sed 's#\(.*\)/.*#\1#'|uniq)"
A_folders="$(find "$B_folders" -type f -name "*.A")"
for FILE in "$A_folders" ; do
A="${file%.A}"
find "$FILE" -name "$A*.B" -exec rm -f {}\;
done
I started to get problems when the directory names contained spaces.
Any suggestions for the right way to do it?
EDIT:
My goal is to find in each subdirectory (may have spaces in its name), files in the form: "filename.A"
if such files exist:
check whether "filename*.B" exists and remove it,
that is, remove "filename01.B", "filename02.B", etc.
In bash 4, it's simply
shopt -s globstar nullglob
for f in some_parent_directory/**/filename.A; do
rm -f "${f%.A}"*.B
done
If the space is the only issue you can modify the find inside the for as follows:
find "$FILE" -name "$A*.B" -print0 | xargs -0 rm
man find shows:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
and the xargs manual:
-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
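For shells without globstar, the goal from the question's EDIT can also be sketched with find alone. This is one possible approach, not the asker's final code: the scratch directory, the directory names with spaces and the report* file names are all invented, and -exec sh -c ... {} + is used so that each path arrives as a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical demo tree with spaces in the directory names.
parent=$(mktemp -d)
mkdir -p "$parent/dir one" "$parent/dir two"
touch "$parent/dir one/report.A" "$parent/dir one/report01.B" \
      "$parent/dir one/report02.B" "$parent/dir one/keep.B"
touch "$parent/dir two/other01.B"    # no other.A here, so this must survive

# For every *.A file, strip the ".A" suffix and delete "<prefix>*.B"
# next to it. The quoted prefix protects spaces; the glob stays unquoted.
find "$parent" -type f -name '*.A' -exec sh -c '
    for a in "$@"; do
        rm -f -- "${a%.A}"*.B
    done
' sh {} +

remaining=$(find "$parent" -type f -name '*.B' | wc -l | tr -d ' ')
deleted_left=$(find "$parent" -type f -name 'report0*.B' | wc -l | tr -d ' ')
echo "remaining .B files: $remaining"
rm -rf "$parent"
```

Only the .B files that share a prefix with an existing .A file are removed; keep.B and other01.B are left alone.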

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I do this: for f in "$(ls -1)"; do echo $f; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string and I need the 2 lines separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} and \; are the shell-escaped forms of {} and ;. {} is the placeholder for the currently processed file, and ; marks the end of the command. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here is launched by find and not by the shell, the spaces won't matter.
The second version will be less efficient since find will launch a new process for each and every file found. xargs, by contrast, passes as many file names as will fit to each invocation of the command, so far fewer processes are started. Prefer the xargs version if you have the choice.
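The difference in process count can be observed directly. The sketch below is an illustration, not part of the original answer: it uses find's own -exec ... {} + form, which batches file names in the same way xargs does, and three made-up files in a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: three files with spaces in their names.
dir=$(mktemp -d)
touch "$dir/file A.txt" "$dir/file B.txt" "$dir/file C.txt"

# "\;" runs the command once per file, so "run" is printed three times.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "+" appends as many names as fit to one command line, so the
# command runs once for this small set.
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "invocations with \\;: $per_file"
echo "invocations with +:  $batched"
rm -rf "$dir"
```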
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while read FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces $IFS delimited output.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while read FILE
do
echo 1: "$FILE"
echo 2: "$FILE"
done
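One caveat with a bare read is that it trims leading and trailing blanks and treats backslashes as escapes. A short sketch of the difference, using an invented file name that has both:

```shell
#!/usr/bin/env bash
# Hypothetical name with two leading spaces and a literal backslash.
name='  lead\ning.txt'

# Plain read: leading blanks are trimmed and the backslash is consumed.
plain=$(printf '%s\n' "$name" | { read x; printf '%s' "$x"; })

# IFS= read -r: the line is stored exactly as it was written.
exact=$(printf '%s\n' "$name" | { IFS= read -r x; printf '%s' "$x"; })

printf 'plain read : [%s]\n' "$plain"
printf 'IFS= read -r: [%s]\n' "$exact"
```

For loops over file names, while IFS= read -r is therefore the safer habit, although names containing newlines still need NUL-separated input.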
Look at the --quoting-style option of ls.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
Check out the manpage for xargs.
It works like this:
ls -1 /tmp/*.jpeg | xargs rm
