I was given a tip to use file globbing instead of ls in Bash scripts. Following the advice, I replaced array=($(ls)) in my code with:
function list_files() { for f in *; do [[ -e $f ]] || continue; done; }
array=($(list_files))
However, the new function doesn't output anything. Am I doing something wrong here?
Simply write this:
array=(*)
Leaving aside that your "list_files" doesn't output anything, there are still other problems with your approach.
An unquoted command substitution (in your case "$(list_files)") is still subject to "word splitting" and "pathname expansion" (see the "EXPANSION" section of bash(1)). That means any whitespace in the output of "list_files" would split it into separate array elements, and any pattern characters in the output would be matched against file names in the current directory and substituted as additional elements.
OTOH, if you quote the command substitution with double quotes, then the whole output will be considered a single array element.
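A minimal sketch of the difference, using a scratch directory created with mktemp (the file names are made up for illustration):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch "one file" two          # one name contains a space

a=($(ls))   # word splitting breaks "one file" into two elements
b=(*)       # the glob keeps each filename whole
echo "ls-based: ${#a[@]} elements"   # 3
echo "glob:     ${#b[@]} elements"   # 2
```

The glob version gets one array element per file no matter what characters the names contain, which is exactly why array=(*) is the recommended form.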
Trying to loop through all files in a directory, check them for the existence of a string, and add it if it doesn't exist. This is what I have:
#!/bin/bash
FILES=*
for f in $FILES
do
echo "Processing $f file..."
if grep -Fxq '<?xml version="1.0" encoding="UTF-8"?>' $f
then
continue
else
echo '<?xml version="1.0" encoding="UTF-8"?>' | cat - $f > temp && mv temp $f
fi
done
… but the script stops after the first loop. Any ideas why?
A simpler solution would be to use the in-place edit option -i of the sed tool, like below:
sed -i '1{/^<?xml version="1.0" encoding="UTF-8"?>/!{
s/^/<?xml version="1.0" encoding="UTF-8"?>\n/}}' /path/to/files/*
What are we doing above
The in-place option -i makes sed write its changes back to the file.
By 1{} we are processing just the first line of the file
The /^<?xml version="1.0" encoding="UTF-8"?>/! part checks if the string is NOT(note the ! at the end) present in the beginning of the line.
If the declaration is indeed missing, we substitute the beginning of the line (^) with <?xml version="1.0" encoding="UTF-8"?>\n using
s/^/<?xml version="1.0" encoding="UTF-8"?>\n/
The rest is closing the curly brackets in the correct order :)
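A quick sketch of the sed command in action, run on two scratch files (assumes GNU sed, whose -i takes no argument and whose s/// replacement understands \n):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# one file already has the declaration, one is missing it
printf '<?xml version="1.0" encoding="UTF-8"?>\n<root/>\n' > "$tmp/has.xml"
printf '<root/>\n' > "$tmp/missing.xml"

sed -i '1{/^<?xml version="1.0" encoding="UTF-8"?>/!{
s/^/<?xml version="1.0" encoding="UTF-8"?>\n/}}' "$tmp"/*.xml

head -n1 "$tmp/missing.xml"   # the declaration was inserted
wc -l < "$tmp/has.xml"        # still 2 lines: nothing was duplicated
```

Running it twice is safe: the /…/! guard makes the substitution a no-op once the declaration is present.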
That said, in your original script I see variables like FILES. Using upper-case names for your own variables is discouraged, as upper case is conventionally reserved for environment and shell variables and can lead to conflicts. So use files instead.
Again, doing
file=*
is subject to word splitting and can produce undesired results if you have non-standard file names that contain spaces or even newlines. What you could do instead is
files=( * ) # This put the files in an array
for file in "${files[@]}" # Double quoting the array prevents word splitting
do
# Do something with "$file" but why bother when you've a one-liner with sed? ;-)
done
Note: see man sed for the full manual.
I want to clear some things up about word splitting and filename expansion I saw here in the comments.
When using variable assignment, quoting the Bash Reference Manual, only the following expansions are performed: tilde expansion, parameter expansion, command substitution and arithmetic expansion. This means there really is just an asterisk in your variable $files, as no filename expansion takes place. So at this point you don't need to worry about newlines, spaces etc., because there are no actual file names in your variable. You can verify this with declare -p files.
This is the reason you don't have to quote when assigning to a variable.
var=$othervariable
is the same as:
var="$othervariable"
Now, when you use your variable $files in the for loop for f in $files (note that you cannot quote $files here because the filename expansion wouldn't take place) that variable gets expanded and undergoes word splitting. But the actual value is JUST the asterisk and word splitting won't do anything with the result! Quoting the manual again:
After word splitting, unless the -f option has been set (see The Set
Builtin), Bash scans each word for the characters ‘*’, ‘?’, and ‘[’.
If one of these characters appears, then the word is regarded as a
pattern, and replaced with an alphabetically sorted list of filenames
matching the pattern (see Pattern Matching).
This means that filename expansion is performed after variable expansion and word splitting. So the file names produced by filename expansion won't be split by IFS! Therefore, the following code works just fine:
#!/usr/bin/env bash
files=*
for f in $files; do
echo "<<${f}>>"
done
and correctly outputs:
<<file with many spaces>>
<<filewith* weird characters[abc]>>
<<normalfile>>
A shorter version of this is obviously to use for f in * instead of the variable $files. You also definitely want to quote any usage of $f in your loop as that expansion really does undergo the word splitting.
This being said, your loop should function properly.
I have been following the answers given in these questions
Shellscript Looping Through All Files in a Folder
How to iterate over files in a directory with Bash?
to write a bash script which goes over files inside a folder and processes them. So, here is the code I have:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in "$INFOLDER$YEAR*.mdb";
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I am receiving this error: basename: extra operand.
I added echo $f and realized that f contains all the filenames separated by spaces. But I expected to get one at a time. What could be the problem here?
You're running into problems with quoting. In the shell, double-quotes prevent word splitting and wildcard expansion; generally, you don't want these things to happen to variable's values, so you should double-quote variable references. But when you have something that should be word-split or wildcard-expanded, it cannot be double-quoted. In your for statement, you have the entire file pattern in double-quotes:
for f in "$INFOLDER$YEAR*.mdb";
...which prevents word-splitting and wildcard expansion on the variables' values (good) but also prevents it on the * which you need expanded (that's the point of the loop). So you need to quote selectively, with the variables inside quotes and the wildcard outside them:
for f in "$INFOLDER$YEAR"*.mdb;
And then inside the loop, you should double-quote the references to $f in case any filenames contain whitespace or wildcards (which are completely legal in filenames):
echo "$f"
absname="$INFOLDER$YEAR$(basename "$f")"
(Note: the double-quotes around the assignment to absname aren't actually needed -- the right side of an assignment is one of the few places in the shell where it's safe to skip them -- but IMO it's easier and safer to just double-quote all variable references and $( ) expressions than to try to keep track of where it's safe and where it's not.)
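Here is a runnable sketch of the selective quoting, with a scratch directory standing in for the real dataset path (the file names are made up, and one deliberately contains a space):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
INFOLDER="$tmp/"
YEAR="2002/"
mkdir -p "$INFOLDER$YEAR"
touch "$INFOLDER$YEAR/a.mdb" "$INFOLDER$YEAR/b c.mdb"

count=0
# variables inside the quotes, the wildcard outside them
for f in "$INFOLDER$YEAR"*.mdb; do
  count=$((count + 1))
  basename "$f"       # quoted, so the space-containing name stays whole
done
echo "$count files"   # 2: one iteration per file, not per word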
Just quote your shell variables if they are supposed to contain strings with spaces in between.
basename "$f"
Not doing so will lead to the string being split into separate words (see WordSplitting in bash), which confuses the basename command, since it expects a single string argument rather than several.
Also, it would be wise to keep the * outside the double quotes, as shell globbing doesn't work inside them (single or double).
#!/bin/bash
# good practice to lower-case variable names to distinguish them from
# shell environment variables
year="2002/"
in_folder="/local/data/datasets/Convergence/"
for file in "${in_folder}${year}"*.mdb; do
# break the loop gracefully if no files are found
[ -e "$file" ] || continue
echo "$file"
# Worth noting here: $file already holds the name of the file
# with its full path, just as below. You don't need to
# construct it manually.
absname=${in_folder}${year}$(basename "$file")
done
Just remove the double quotes from this line
for f in "$INFOLDER$YEAR*.mdb";
so it looks like this
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in $INFOLDER$YEAR*.mdb;
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I want to list all the files in a directory, one per line, for which I am using a sample shell script as follows:
#!/bin/sh
MY_VAR="$(ls -1)"
echo "$MY_VAR"
This works as expected; however, if the same is executed using csh as follows:
#!/bin/csh
set MY_VAR = `ls -1`
echo $MY_VAR
it outputs all the files on one single line rather than printing one file per line.
Can anyone explain why ls -1 does not work as expected in csh?
From man csh, emphasis mine:
Command substitution
Command substitution is indicated by a command enclosed in ``'. The
output from such a command is broken into separate words at blanks,
tabs and newlines, and null words are discarded. The output is vari‐
able and command substituted and put in place of the original string.
Command substitutions inside double quotes (`"') retain blanks and
tabs; only newlines force new words. The single final newline does not
force a new word in any case. It is thus possible for a command sub‐
stitution to yield only part of a word, even if the command outputs a
complete line.
By default, the shell since version 6.12 replaces all newline and car‐
riage return characters in the command by spaces. If this is switched
off by unsetting csubstnonl, newlines separate commands as usual.
The entries are assigned in a list; you can access a single entry with e.g. echo $MY_VAR[2].
This is different from the Bourne shell, which doesn't have the concept of a "list" and variables are always strings.
To print one entry per line, use a foreach loop:
#!/bin/csh
set my_var = "`ls -1`"
foreach e ($my_var)
echo "$e"
end
Adding double quotes around `ls -1` is recommended, as it makes sure things work correctly for filenames containing a space (otherwise such a file would show up as two words/list entries).
I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each entry on a separate line. As we know, the ls command prints all files/dirs on a single line (if the line is big enough to contain the output), so I expected that using double quotes would prevent my shell from splitting the words onto separate lines, while omitting the quotes would split them.
Please tell me: why does using quotes (meant to prevent word splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output goes to a pipeline, a file, or similar, the default is to print one filename per line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
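You can see this directly: the same ls invocation, captured via command substitution instead of written to a terminal, produces one entry per line (scratch directory and file names are made up for the sketch):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/one" "$tmp/two" "$tmp/three"

captured=$(ls "$tmp")   # stdout is a pipe here, not a terminal
lines=$(printf '%s\n' "$captured" | wc -l)
echo "$lines lines"     # 3: one filename per line
```

Run `ls` in the same directory interactively and you will typically see all three names in columns on one line instead.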
Literal Question (Re: Quoting)
It's the very act of splitting your command's output into separate arguments that causes it to be printed on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both arguments on the same line.
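The behavior described above can be reproduced directly (the two-line value mirrors the question's ls output):

```shell
#!/usr/bin/env bash
out='directory1
file2'

unquoted=$(echo $out)    # split into two args; echo joins them with a space
quoted=$(echo "$out")    # one arg; the internal newline is preserved

echo "unquoted: $unquoted"     # unquoted: directory1 file2
printf 'quoted:\n%s\n' "$quoted"
```

Only the trailing newline is removed by the command substitutions; the newline inside the quoted value survives.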
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results.
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#files[#]} files" # print number of filenames
printf '- %q\n' "${names[#]}"
fi
Can someone explain why the output of these two commands is different?
$ echo "NewLine1\nNewLine2\n"
NewLine1
NewLine2
<-- Note 2nd newline here
$ echo "$(echo "NewLine1\nNewLine2\n")"
NewLine1
NewLine2
$ <-- No second newline
Is there any good way that I can keep the new lines at the end of the output in "$( .... )" ? I've thought about just adding a dummy letter and removing it, but I'd quite like to understand why those new lines are going away.
Because that's what POSIX specifies and has always been like that in Bourne shells:
2.6.3 Command Substitution
Command substitution allows the output of a command to be substituted
in place of the command name itself. Command substitution shall occur
when the command is enclosed as follows:
$(command)
or (backquoted version):
`command`
The shell shall expand the command substitution by executing command
in a subshell environment (see Shell Execution Environment) and
replacing the command substitution (the text of command plus the
enclosing "$()" or backquotes) with the standard output of the
command, removing sequences of one or more <newline> characters at the
end of the substitution. Embedded <newline> characters before the end
of the output shall not be removed; however, they may be treated as
field delimiters and eliminated during field splitting, depending on
the value of IFS and quoting that is in effect. If the output contains
any null bytes, the behavior is unspecified.
One way to keep the final newline(s) would be
VAR="$(command; echo x)" # Append x to keep newline(s).
VAR=${VAR%x} # Chop x.
Vis.:
$ x="$(whoami; echo x)" ; printf '<%s>\n' "$x" "${x%x}"
<user
x>
<user
>
But why remove trailing newlines? Because, more often than not, you want it that way. I also program in Perl, and I can't count the number of times I've read a line or a variable and then needed to chop the newline:
while (defined ($string = <>)) {
chop $string;
frobnitz($string);
}
Command substitution removes every trailing newline.
It makes sense to remove one. For instance:
basename foo/bar
outputs bar\n. In:
var=$(basename foo/bar)
you want $var to contain bar, not bar\n.
However in
var=$(basename $'foo/bar\n')
You would like $var to contain bar\n (after all, a newline is as valid a character as any in a file name on Unix). But all shells remove every trailing newline character. That misfeature was in the original Bourne shell, and even rc, which has fixed most of Bourne's flaws, has not fixed this one (though rc has the ``(){cmd} syntax to avoid stripping any newline character).
In POSIX shells, to work around the issue, you can do:
var=$(basename -- "$file"; echo .)
var=${var%??}
Though you're then losing the exit status of basename. Which you can fix with:
var=$(basename -- "$file" && echo .) && var=${var%??}
${var%??} removes the last two characters: the . we added above and the single newline character that basename itself appends. We remove no more than that, because any other trailing newline characters would be part of the filename whose base we want, so we need to keep them (command substitution, by contrast, would strip them all).
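A sketch of the workaround, using a made-up file name that itself ends in a newline:

```shell
#!/usr/bin/env bash
file=$'foo/bar\n'   # the trailing newline is part of the name

plain=$(basename -- "$file")   # substitution strips ALL trailing newlines

var=$(basename -- "$file" && echo .) && var=${var%??}

echo "${#plain}"   # 3: just "bar", the newline was lost
echo "${#var}"     # 4: "bar" plus the newline that belongs to the name
```

The sentinel . protects the filename's own newline; ${var%??} then strips the sentinel together with the one newline basename appended.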
In the Bourne shell which doesn't have the ${var%x} operator, you had to go a long and convoluted way to work around it.
If the newlines were not removed, then constructs like:
x="$(pwd)/filename"
would not work usefully, but the people who wrote Unix preferred useful behaviour.
Once, briefly, a very long time ago (like 1983, maybe 1984), I suffered from a shell update on a particular variant of Unix that didn't remove the trailing newline. It broke scripts all over the place. It was fixed very quickly.