Trying to loop through all files in a directory, check them for the existence of a string, and add it if it doesn't exist. This is what I have:
#!/bin/bash
FILES=*
for f in $FILES
do
echo "Processing $f file..."
if grep -Fxq '<?xml version="1.0" encoding="UTF-8"?>' $f
then
continue
else
echo '<?xml version="1.0" encoding="UTF-8"?>' | cat - $f > temp && mv temp $f
fi
done
… but the script stops after the first loop. Any ideas why?
A simpler solution is to use sed's in-place edit option, -i, as below:
sed -i '1{/^<?xml version="1.0" encoding="UTF-8"?>/!{
s/^/<?xml version="1.0" encoding="UTF-8"?>\n/}}' /path/to/files/*
What the command above does:
The in-place option -i makes sed write its changes back to the file itself.
With 1{} we process only the first line of the file.
The /^<?xml version="1.0" encoding="UTF-8"?>/! part selects the line only if the string is NOT (note the ! at the end) present at the beginning of the line.
If the string is indeed missing, we substitute the beginning of the line (^) with <?xml version="1.0" encoding="UTF-8"?>\n using
s/^/<?xml version="1.0" encoding="UTF-8"?>\n/
The rest is closing the curly brackets in the correct order :)
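To see the effect of this command, here is a small sketch that runs it on two throwaway files in a temporary directory. It assumes GNU sed (the -i option without a suffix and \n in the replacement are relied on here); the file names has_decl.xml and no_decl.xml are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed; the file names are hypothetical.
decl='<?xml version="1.0" encoding="UTF-8"?>'
demo_dir=$(mktemp -d)

printf '%s\n<a/>\n' "$decl" > "$demo_dir/has_decl.xml"   # already starts with the declaration
printf '<b/>\n' > "$demo_dir/no_decl.xml"                # declaration missing

# Same command as above, written on one line
sed -i '1{/^<?xml version="1.0" encoding="UTF-8"?>/!{s/^/<?xml version="1.0" encoding="UTF-8"?>\n/}}' "$demo_dir"/*

first_a=$(head -n 1 "$demo_dir/has_decl.xml")
first_b=$(head -n 1 "$demo_dir/no_decl.xml")
lines_a=$(( $(wc -l < "$demo_dir/has_decl.xml") ))
lines_b=$(( $(wc -l < "$demo_dir/no_decl.xml") ))
echo "has_decl.xml: $lines_a lines, no_decl.xml: $lines_b lines"
rm -rf "$demo_dir"
```

Both files end up with two lines: the first is left alone and the second gains the declaration. An empty file has no first line, so sed would leave it unchanged.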
That said, in your original script I see variables like FILES. Using all-uppercase names for your own variables is discouraged: by convention they are used for environment and shell-internal variables, so a clash can cause surprising behavior. Use files instead.
Also, doing
files=*
brings word splitting into play and can produce undesired results if you have unusual file names that contain spaces or even newlines. What you could do is
files=( * ) # This puts the files in an array
for file in "${files[@]}" # Double quoting the array prevents word splitting
do
# Do something with "$file" but why bother when you've a one-liner with sed? ;-)
done
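As a quick illustration of the array form, this sketch creates two made-up files, one with spaces in its name, and counts the loop iterations. The temporary directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: iterate over a glob stored in an array; names are made up for the demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/name with spaces.txt"

files=( "$demo_dir"/* )        # the glob expands here, one array element per file
count=0
for file in "${files[@]}"; do  # quoted [@] keeps each name as a single word
  printf '<<%s>>\n' "${file##*/}"
  count=$((count + 1))
done
echo "iterations: $count"
rm -rf "$demo_dir"
```

The loop runs twice, once per file, even though one name contains spaces.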
Note: see the sed manual for details.
I want to clear some things up about word splitting and filename expansion I saw here in the comments.
When using variable assignment, quoting Bash Reference Manual, only the following expansions are done: tilde expansion, parameter expansion, command substitution, arithmetic expansion. This means that there really is just an asterisk in your variable $files as there is no filename expansion taking place. So at this point you don't need to worry about newlines, spaces etc. because there are no actual files in your variable. You can see this with declare -p files.
This is the reason you don't have to quote when assigning to a variable.
var=$othervariable
is the same as:
var="$othervariable"
Now, when you use your variable $files in the for loop for f in $files (note that you cannot quote $files here because the filename expansion wouldn't take place) that variable gets expanded and undergoes word splitting. But the actual value is JUST the asterisk and word splitting won't do anything with the result! Quoting the manual again:
After word splitting, unless the -f option has been set (see The Set
Builtin), Bash scans each word for the characters ‘*’, ‘?’, and ‘[’.
If one of these characters appears, then the word is regarded as a
pattern, and replaced with an alphabetically sorted list of filenames
matching the pattern (see Pattern Matching).
This means that filename expansion is performed after variable expansion and word splitting, so the file names it produces are not split on IFS. Therefore, the following code works just fine:
#!/usr/bin/env bash
files=*
for f in $files; do
echo "<<${f}>>"
done
and correctly outputs:
<<file with many spaces>>
<<filewith* weird characters[abc]>>
<<normalfile>>
A shorter version of this is obviously to use for f in * instead of the variable $files. You also definitely want to quote any usage of $f in your loop as that expansion really does undergo the word splitting.
This being said, your loop should function properly.
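Putting these points together, one possible version of the original loop is sketched below. It is not a diagnosis of why the original stopped; it only applies the quoting advice, skips anything that is not a regular file, and builds the new content in a mktemp file outside the directory so the temporary file is never picked up by the glob. The function name add_xml_decl is made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: add_xml_decl is a hypothetical helper name.
decl='<?xml version="1.0" encoding="UTF-8"?>'

add_xml_decl() {
  local dir=$1 f tmp
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                 # skip directories and an unmatched pattern
    grep -Fxq -- "$decl" "$f" && continue   # declaration already present as a whole line
    tmp=$(mktemp) || return 1
    # Build the new content first, then copy it back so the file keeps its permissions
    { printf '%s\n' "$decl"; cat -- "$f"; } > "$tmp" && cat -- "$tmp" > "$f"
    rm -f -- "$tmp"
  done
}

demo_dir=$(mktemp -d)
printf '<a/>\n' > "$demo_dir/needs decl.xml"
printf '%s\n<b/>\n' "$decl" > "$demo_dir/ok.xml"
add_xml_decl "$demo_dir"

first_needs=$(head -n 1 "$demo_dir/needs decl.xml")
ok_lines=$(( $(wc -l < "$demo_dir/ok.xml") ))
printf 'first line now: %s\n' "$first_needs"
rm -rf "$demo_dir"
```

The file that lacked the declaration gains it, and the file that already had it is left untouched.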
In my folder I have the following files:
roi_1_Precentral_L/
roi_1_Precentral_L_both.fig
roi_1_Precentral_L_left.fig
roi_1_Precentral_L_right.fig
roi_1_Precentral_L_slice.fig
roi_2_Precentral_R/
roi_2_Precentral_R_both.fig
...
roi_116_Vermis_10/
roi_116_Vermis_10_both.fig
roi_116_Vermis_10_left.fig
roi_116_Vermis_10_right.fig
roi_116_Vermis_10_slice.fig
I use the following script to obtain the desired filename prefix for each of the 116 types:
for iroi in `seq 1 116`;
do
d=roi_${iroi}_*/
d2=${d:0:-1} # <-- THIS LINE IS IMPORTANT
echo $d2
done;
Desired output for iroi=1:
$ roi_1_Precentral_L
Actual output:
$ roi_1_Precentral_L roi_1_Precentral_L_both.fig roi_1_Precentral_L_left.fig roi_1_Precentral_L_right.fig roi_1_Precentral_L_slice.fig
How can I avoid shell expansion in the emphasized line of code to get the desired output?
If you assign to an array, the glob is expanded on the first line, at assignment time, rather than later at the echo, as was the case with your original code.
d=( "roi_${iroi}_"*/ )
d2=${d:0:-1} # Note: a negative length needs a recent bash (4.2 or newer). ${d%/} would be better.
echo "$d2"
If you expect multiple directories, "${d[@]%/}" will expand to the full list, with the trailing / removed from each:
d=( "roi_${iroi}_"*/ )
printf '%s\n' "${d[@]%/}"
With respect to avoiding unwanted expansions -- note that in the above, every expansion except for those on the right-hand side of a simple (string, not array) assignment is in double quotes. (Regular assignments implicitly inhibit string-splitting and glob expansion -- though it doesn't hurt to have quotes even then! This inhibition is why ${d:0:-1} was removing the / from the glob expression itself, not from its results).
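The difference between the two kinds of assignment can be checked directly. This sketch builds a throwaway directory with names modeled on the question; d_scalar and d_array are names invented here.

```shell
#!/usr/bin/env bash
# Sketch: compare a plain assignment with an array assignment of the same glob.
orig_dir=$PWD
demo_dir=$(mktemp -d)
mkdir "$demo_dir/roi_1_Precentral_L"
touch "$demo_dir/roi_1_Precentral_L_both.fig" "$demo_dir/roi_1_Precentral_L_left.fig"
cd "$demo_dir" || exit 1

iroi=1
d_scalar=roi_${iroi}_*/          # plain assignment: the pattern is stored unexpanded
d_array=( "roi_${iroi}_"*/ )     # array assignment: the glob expands immediately

printf 'scalar holds: %s\n' "$d_scalar"
printf 'array holds:  %s\n' "${d_array[0]}"
d2=${d_array[0]%/}               # strip the trailing slash from the expanded name
printf 'prefix:       %s\n' "$d2"

cd "$orig_dir" && rm -rf "$demo_dir"
```

The scalar still contains the literal pattern roi_1_*/, while the array holds the matching directory name, from which the trailing slash can be removed safely.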
Answer to Question
If you wanted, you could quote to avoid expansion of * in $d ...
d=roi_${iroi}_*/
d2="${d:0:-1}"
echo $d2
... but then you could directly write ...
d2="roi_${iroi}_*"
echo $d2
... and the output would still be the same as in your question.
Answer to Expected Output
You could do the expansion in an array and select the first array entry, then remove the / from that entry.
for iroi in {1..116}; do
d=(roi_"$iroi"_*/)
d2="${d[0]:0:-1}"
echo "$d2"
done
This matches only directories and prints the first directory without the trailing /.
I have been following the answers given in these questions
Shellscript Looping Through All Files in a Folder
How to iterate over files in a directory with Bash?
to write a bash script which goes over files inside a folder and processes them. So, here is the code I have:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in "$INFOLDER$YEAR*.mdb";
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I am receiving this error: basename: extra operand.
I added echo $f and I realized that f contains all the filenames separated by space. But I expected to get one at a time. What could be the problem here?
You're running into problems with quoting. In the shell, double-quotes prevent word splitting and wildcard expansion; generally, you don't want these things to happen to variables' values, so you should double-quote variable references. But when you have something that should be word-split or wildcard-expanded, it cannot be double-quoted. In your for statement, you have the entire file pattern in double-quotes:
for f in "$INFOLDER$YEAR*.mdb";
...which prevents word-splitting and wildcard expansion on the variables' values (good) but also prevents it on the * which you need expanded (that's the point of the loop). So you need to quote selectively, with the variables inside quotes and the wildcard outside them:
for f in "$INFOLDER$YEAR"*.mdb;
And then inside the loop, you should double-quote the references to $f in case any filenames contain whitespace or wildcards (which are completely legal in filenames):
echo "$f"
absname="$INFOLDER$YEAR$(basename "$f")"
(Note: the double-quotes around the assignment to absname aren't actually needed -- the right side of an assignment is one of the few places in the shell where it's safe to skip them -- but IMO it's easier and safer to just double-quote all variable references and $( ) expressions than to try to keep track of where it's safe and where it's not.)
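The effect of where the quotes end can be seen by counting loop iterations. The sketch below uses a temporary directory with two made-up .mdb files; in_dir, quoted_count, and selective_count are illustrative names, not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: fully quoted pattern versus selectively quoted pattern.
in_dir=$(mktemp -d)/
touch "${in_dir}a.mdb" "${in_dir}b c.mdb"

quoted_count=0
for f in "$in_dir*.mdb"; do      # whole pattern quoted: no expansion, one literal word
  quoted_count=$((quoted_count + 1))
done

selective_count=0
for f in "$in_dir"*.mdb; do      # variable quoted, wildcard unquoted: one word per file
  selective_count=$((selective_count + 1))
  basename "$f"                  # quoting keeps "b c.mdb" as a single argument
done

echo "quoted: $quoted_count, selective: $selective_count"
rm -rf "$in_dir"
```

The fully quoted loop runs once with the unexpanded pattern as its only value, while the selectively quoted loop runs once per matching file.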
Just quote your shell variables if they may contain strings with spaces in them.
basename "$f"
Not doing so will lead to the string being split into separate words (see WordSplitting in bash), messing up the basename command, which expects one string argument rather than several.
It would also be wise to keep the * outside the quotes, as shell globbing doesn't happen inside them (single or double).
#!/bin/bash
# good practice to lower-case variable names to distinguish them from
# shell environment variables
year="2002/"
in_folder="/local/data/datasets/Convergence/"
for file in "${in_folder}${year}"*.mdb; do
# skip the unexpanded pattern gracefully if no files are found
[ -e "$file" ] || continue
echo "$file"
# Worth noting here: $file already holds the name of the file
# with its absolute path, just as below. You don't need to
# construct it manually
absname=${in_folder}${year}$(basename "$file")
done
Just remove the double quotes from this line (this works as long as the paths contain no whitespace or wildcard characters):
for f in "$INFOLDER$YEAR*.mdb";
so it looks like this
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in $INFOLDER$YEAR*.mdb;
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I have written a script that takes strings as its first and second parameters; the remaining parameters are files. The idea is to replace the first parameter with the second in every line of every file. Here is my implementation; however, it does not change the content of the files, although it prints the correct information:
first=$1
second=$2
shift 2
for i in $*; do
if [ -f $i]; then
sed -i -e 's/$first/$second/g' $i
fi
done
You used single quotes to enclose the sed expression. Thus, the special meaning of the dollar sign (parameter expansion) is ignored and it is treated as an ordinary character.
From the bash manual:
Enclosing characters in single quotes preserves the literal value of each character within the quotes.
... Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !.
You should replace them with double quotes:
sed -i -e "s/$first/$second/g" "$i"
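For reference, here is a sketch of the whole script with double quotes around the sed expression, wrapped in a function so it can be tried safely on a temporary file. It also uses "$@" instead of $*, quotes "$i", and adds the space that [ needs before ]. Assumptions: GNU sed for -i without a suffix, and search and replacement strings that contain no /, &, backslash, or regex metacharacters, since they are pasted into the sed expression unescaped. The function name replace_in_files is invented here.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: GNU sed; replace_in_files is a hypothetical name.
replace_in_files() {
  local first=$1 second=$2 i
  shift 2
  for i in "$@"; do                          # "$@" keeps each file name intact
    if [ -f "$i" ]; then
      sed -i -e "s/$first/$second/g" "$i"    # double quotes let the shell expand both variables
    fi
  done
}

demo_file=$(mktemp)
printf 'foo bar foo\nbaz foo\n' > "$demo_file"
replace_in_files foo qux "$demo_file"
result=$(cat "$demo_file")
printf '%s\n' "$result"
rm -f "$demo_file"
```

Every occurrence of foo in the demo file is replaced by qux.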
Your script doesn't change the files because you are simply printing to stdout, not to the file. The way you did it, you would need a new variable to store the new content word by word and then echo it to the original file with redirection (>).
But you can do this simply with sed, like this:
sed -i '' 's/original/new/g' file(s)
Explanation:
sed is a stream editor
-i '' means it will edit the file in place and won't create any backup (this is the BSD/macOS form; GNU sed takes -i with no separate argument)
s/original/new/g means substitute the original word or regexp with the new word. The g means global = substitute all occurrences on each line, not just the first
file(s) are all the files in which to perform the substitution. Can be * for all files in the working directory.
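The -i '' form shown above is the BSD/macOS spelling; GNU sed expects -i with no separate argument. One way to sidestep the difference is to attach a backup suffix directly to the option and delete the backup afterwards. As far as I know this form is accepted by both GNU and BSD sed; the file name and the .bak suffix are arbitrary choices for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: in-place edit with an attached backup suffix; names are hypothetical.
demo_dir=$(mktemp -d)
printf 'original text, original again\n' > "$demo_dir/sample.txt"

sed -i.bak 's/original/new/g' "$demo_dir/sample.txt"   # edits in place, keeps sample.txt.bak
edited=$(cat "$demo_dir/sample.txt")
backup=$(cat "$demo_dir/sample.txt.bak")
rm -f "$demo_dir/sample.txt.bak"                       # drop the backup once satisfied

printf 'edited: %s\n' "$edited"
rm -rf "$demo_dir"
```

The backup holds the untouched content until it is removed, which also gives a cheap way to undo a bad substitution.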
I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, the ls command prints its output on a single line for all files/dirs (if the line is long enough to hold the output), so I expected that using double quotes would prevent my shell from splitting words onto separate lines, while omitting the quotes would split them.
Please tell me: why does using quotes (which are meant to prevent word splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, then the default is to print one file to a line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
Literal Question (Re: Quoting)
It's the very act of splitting your command into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both those arguments on the same line.
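The same behavior can be reproduced without ls at all. In this sketch a fixed two-line string stands in for the captured ls output, and the default IFS is assumed; the variable names unquoted, quoted, split_count, and whole_count are made up here.

```shell
#!/usr/bin/env bash
# Sketch: assumes the default IFS (space, tab, newline).
out=$'directory1\nfile2'          # stands in for the output of ls captured in a variable

unquoted=$(echo $out)             # split into two arguments; echo joins them with a space
quoted=$(echo "$out")             # one argument, embedded newline preserved

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

set -- $out                       # count the words produced by splitting
split_count=$#
set -- "$out"                     # quoted: a single word
whole_count=$#
echo "words when unquoted: $split_count, when quoted: $whole_count"
```

Unquoted, echo receives two arguments and prints them on one line; quoted, it receives one argument that still contains the newline.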
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results.
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#names[@]} files" # print number of filenames
printf '- %q\n' "${names[@]}"
fi
I was given a tip to use file globbing instead of ls in Bash scripts. In my code I followed the instructions and replaced array=($(ls)) with:
function list_files() {
  for f in *; do
    [[ -e $f ]] || continue
  done
}
array=($(list_files))
However, the new function doesn't return anything. Am I doing something wrong here?
Simply write this:
array=(*)
Leaving aside that your "list_files" doesn't output anything, there are still other problems with your approach.
Unquoted command substitution (in your case "$(list_files)") will still be subject to "word splitting" and "pathname expansion" (see bash(1) "EXPANSION"), which means that if there are spaces in "list_files" output, they will be used to split it into array elements, and if there are pattern characters, they will be used to attempt to match and substitute the current directory file names as separate array elements.
OTOH, if you quote the command substitution with double quotes, then the whole output will be considered a single array element.
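The three variants can be compared by counting array elements. This sketch works in a temporary directory with two made-up files, one of which has a space in its name; by_glob, by_ls, and as_one are names chosen for the demo, and the default IFS is assumed.

```shell
#!/usr/bin/env bash
# Sketch: glob into an array versus command substitution, unquoted and quoted.
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "plain" "two words"

by_glob=( * )            # one element per file name
by_ls=( $(ls) )          # ls output is word-split: "two words" becomes two elements
as_one=( "$(ls)" )       # quoted substitution: the whole listing is a single element

echo "glob: ${#by_glob[@]}, ls: ${#by_ls[@]}, quoted ls: ${#as_one[@]}"
cd "$orig_dir" && rm -rf "$demo_dir"
```

Only the plain glob yields one element per file; the unquoted substitution produces three elements and the quoted one produces a single element.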