BASH variables, commands affected by numeric-numbered file and folder names - bash

Issue
I have been experiencing issues with Linux commands run in folders that contain numerically named files and folders; e.g., files sequentially numbered 1, 2, 3 ...
For example, if I am in a folder that contains a file or folder with a numeric name that appears in my command, the output from that command might be mangled.
Here are some examples:
$ ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 May 7 18:34 1
drwxr-xr-x 2 victoria victoria 4096 May 7 18:14 2
-rw-r--r-- 1 victoria victoria 0 May 7 18:34 3
## fail
$ a="[CPT1A] A Selective"; echo $a
1 A Selective
$ b="[CPT2A] A Selective"; echo $b
2 A Selective
$ c="[CPT3A] A Selective"; echo $c
3 A Selective
...
## pass
$ d="[CPT4A] A Selective"; echo $d
[CPT4A] A Selective
Update/solution
... per the accepted answer: quote the bash variable when it is used.
$ a="[CPT1A] A Selective"; echo $a
1 A Selective
$ a="[CPT1A] A Selective"; echo "$a"
[CPT1A] A Selective

The problem is that you aren't quoting the variable when you use it -- that is, you're using echo $a instead of echo "$a". When a variable is referenced without quotes, it gets split into words (so "[CPT1A] A Selective" becomes "[CPT1A]" "A" "Selective"), and then each of those words that contains anything that looks like a filename wildcard gets expanded into a list of matching filenames.
Square-bracket expressions like [CPT1A] are actually valid wildcard expressions that match any single character within them, so if there are files named "A", "C", "P", "T", or "1", it would expand to the matching names. If there aren't any, the wildcard expression just gets passed through intact.
Solution: double-quote variable references to avoid surprises like this. The same goes for command substitutions with $( ) (or backticks, but don't use those). There are a few places where it's safe to leave them off, like in a direct assignment, but IMO it's safer to just use them everywhere than to try to keep track of the exceptions. For example, a=$b is ok, but so is a="$b". On the other hand, export a=$b might or might not work (depending on which shell you're using), but export a="$b" will work.
BTW, shellcheck.net is good at pointing these out (along with some other common mistakes).
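To make the failure and the fix concrete, here is a self-contained reproduction; the scratch directory and the file named 1 are assumptions, created only for the demo:

```shell
# Reproduce in a scratch directory containing a file named "1":
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 1

a="[CPT1A] A Selective"
echo $a      # unquoted: word-split, then [CPT1A] globs to the file "1"
echo "$a"    # quoted: printed verbatim

cd / && rm -rf "$tmpdir"
```

Unquoted, [CPT1A] matches any single one of the characters C, P, T, 1, A, so the existing file named 1 wins; quoted, no splitting or globbing happens at all.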

Related

Why is bash array-splitting this line incorrectly?

In the below example, why do "[ 1]" and "[20]" split fine, but "[10000]" does not?
It appears to be converting the 10000 to 1 and dropping the brackets?
#!/bin/bash
var="[ 1]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
var="[20]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
var="[10000]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
Results...
$ ./bashtest.sh
var=[ 1]
vararray[0]=[
var=[20]
vararray[0]=[20]
var=[10000]
vararray[0]=1 << what?
Presume that you have a file named 1 in your current directory. (This often happens unintentionally, e.g. if someone intends to run 2>&1 but runs 2>1 by mistake.)
[20] does not glob to 1 -- it globs only to 2 or 0.
[ 1], when run with the default IFS value, is word-split into [ and 1], neither of which is a valid glob, so expanding it unquoted doesn't perform any globbing operation at all.
However, [10000] -- just like [01] -- will glob to either 0 or 1, if a file by any of those names exists. In your example scenario, you clearly had a file named 1 in your current working directory.
Don't use unquoted expansion to split strings into arrays.
Instead, use read -r -a vararray <<<"$var", optionally after explicitly setting IFS to contain only the characters you want to split on.
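A minimal sketch of that approach, using the [10000] value from the question; the file named 1 is created only to show that no globbing occurs:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 1                          # the file that made the unquoted [10000] glob to 1

var="[10000]"
read -r -a vararray <<<"$var"    # splitting via read: no glob expansion
echo "vararray[0]=${vararray[0]}"
# prints: vararray[0]=[10000]

cd / && rm -rf "$tmpdir"
```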

Line break mystery when using quotes around variable

I wrote a simple script, the whole purpose of which is to simply create a link between two different cygwin directories. It should be very simple, but since $LOCALAPPDATA can contain spaces, it wound up being far more difficult than I originally envisioned.
Here is the script:
#!/bin/sh
echo "Unlinking any existing data link..."
unlink /usr/local/astrometry/shared_data 2>/dev/null
echo "Generating link between astrometry shared_data..."
my_dir=`cygpath -u $LOCALAPPDATA/cygwin_ansvr/usr/share/astrometry/data`
echo $my_dir
my_test=`echo $my_dir`
echo $my_test
# Note here, if I use $my_dir rather than $my_test, it introduces a LINE BREAK!
ln -s "$my_test" /usr/local/astrometry/shared_data
exit 0
So, if I run the above script, here is the output:
Unlinking any existing data link...
Generating link between astrometry shared_data...
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
And the link is formed as such:
lrwxrwxrwx 1 Dwight Towler None 84 Sep 22 02:56 shared_data -> /cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
drwx------+ 1 Dwight Towler None 0 Sep 22 02:56 .
The above is the desired link (no line break).
Now, if I replace $my_test with $my_dir in the ln -s call, I instead wind up with this:
lrwxrwxrwx 1 Dwight Towler None 84 Sep 22 02:55 shared_data -> /cygdrive/c/Users/Dwight
Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
drwx------+ 1 Dwight Towler None 0 Sep 22 02:55 .
Notice the line break? I cannot figure out where that is coming from, especially since I put quotes around the variables in the ln -s call.
It is especially puzzling since the output of the echo command seems to indicate that both variables have the same content:
echo $my_dir
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
echo $my_test
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
Any ideas on what is going on?
That difference in values between my_dir and my_test is the result of using command substitution (my_test=`echo $my_dir`) with an unquoted expansion to copy my_dir to my_test. This construct replaces any run of consecutive whitespace (newlines included) with a single space.
Per the man page, the unquoted $my_dir is split on the value of IFS (by default whitespace: spaces, tabs, and newlines) into words, and then echo prints the individual words with a single space between them. Since the original string contained newlines (or multiple spaces between words), those all get converted into single spaces.
Consider the following assignment, which results in an embedded newline (between 'first' and 'second'). Unquoted echo replaces that newline with a space:
A="first
second"
echo "NO QUOTE"
echo $A
echo "QUOTED"
echo "$A"
echo "----"
The output will be
NO QUOTE
first second
QUOTED
first
second
----
Bottom line: the newline is present in the original string (my_dir) and is replaced by a space in the echo statement, because of shell word splitting on the unquoted expansion.
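The round-trip through an unquoted echo can be reproduced without Cygwin; the shortened path here is illustrative only:

```shell
my_dir="/cygdrive/c/Users/Dwight
Towler/AppData"                 # value with an embedded newline
my_test=$(echo $my_dir)         # unquoted: the newline collapses to a space
printf '%s\n' "$my_test"        # one line
printf '%s\n' "$my_dir"         # two lines: the newline is still there
```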

Read range of numbers into a for loop

So, I am building a bash script which iterates through folders named by numbers from 1 to 9. The script depends on getting the folder names by user input. My intention is to use a for loop using read input to get a folder name or a range of folder names and then do some stuff.
Example:
Let's assume I want to make a backup with rsync -a of a certain range of folders. Usually I would do:
for p in {1..7}; do
rsync -a $p/* backup.$p
done
The above would recursively backup all content in the directories 1 2 3 4 5 6 and 7 and put them into folders named as 'backup.{index-number}'. It wouldn't catch folders/files with a leading . but that is not important right now.
Now I have a similar loop in an interactive bash script. I am using select and case statements for this task. One of the options in case is this loop and it shall somehow get a range of numbers from user input. This now becomes a problem.
Problem:
If I use read to get the range then it fails when using {1..7} as input. The input is taken literally and the output is just:
{1..7}
I really would like to know why this happens. Let me use a more descriptive example with a simple echo command.
var={1..7} # fails and just outputs {1..7}
for p in $var; do echo $p;done
read var # Same result as above. Just outputs {1..7}
for p in $var; do echo $p;done
for p in {1..7}; do echo $p;done # works fine and outputs the numbers 1-7 separated by newlines.
I've found a workaround by storing the numbers in an array. The user can then input folder names separated by a space character like this: 1 2 3 4 5 6 7
read -a var # In this case the output is similar to the 3rd loop above
for p in "${var[@]}"; do echo "$p"; done
This could be a way to go, but when backing up 40 folders ranging from 1-40, typing all the numbers one by one defeats the purpose of the script. One could solve one of the millennium problems in the same amount of time.
Is there any way to read a range of numbers like {1..9} or could there be another way to get input from terminal into the script so I can iterate through the range within a for-loop?
This sounds like a question for Google, but I am obviously using the wrong search terms to get a useful answer. Most similar-looking issues on SO refer to brace and parameter expansion problems, but that is not exactly the problem I have. However, it feels like the answer is going in a similar direction. I fail to understand why a for loop over {1..7} works, but assigning the same thing with var={1..7} doesn't. Plz help -.-
EDIT: My bash version:
$ echo $BASH_VERSION
4.2.25(1)-release
EDIT2: The versatility of a brace expansion is very important to me. A possible solution should include the ability to define as many ranges as possible. Like I would like to be able to choose between backing up just 1 folder or a fixed range between f.ex 4-22 and even multiple options like folders 1,2,5,6-7
Brace expansion is not performed on the right-hand side of a variable assignment, nor on the result of a parameter expansion. Use a C-style for loop, with the user inputting the upper end of the range if necessary.
read upper
for ((i=1; i<=$upper; i++)); do
To input both a lower and upper bound separated by whitespace
read lower upper
for ((i=$lower; i<=$upper; i++)); do
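A complete, runnable version of the two-bound loop; the bounds are fed in via a here-string only to keep the sketch self-contained:

```shell
read -r lower upper <<<"4 7"     # stand-in for interactive input
for ((i = lower; i <= upper; i++)); do
    echo "would process folder $i"
done
```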
For an arbitrary set of values, just push the burden to the user to generate the appropriate list; don't try to implement your own parser to process something like 1,2,20-22:
while read p; do
rsync -a "$p"/* "backup.$p"
done
The input is one value per line, such as
1
2
20
21
22
Even if the user is using the shell, they can call your script with something like
printf '%s\n' 1 2 {20..22} | backup.sh
It's easier for the user to generate the list than it is for you to safely parse a string describing the list.
The evil eval
$ var={1..7}
$ for i in $(eval echo $var); do echo $i; done
this also works,
$ var="1 2 {5..9}"
$ for i in $(eval echo $var); do echo $i; done
1
2
5
6
7
8
9
"Evil eval" was a joke -- eval is fine as long as you know exactly what you're evaluating.
Or, with awk
$ echo "1 2 5-9 22-25" |
awk -v RS=' ' '/-/{split($0,a,"-"); while(a[1]<=a[2]) print a[1]++; next}1'
1
2
5
6
7
8
9
22
23
24
25
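For completeness, the same expansion can be done in pure bash without eval or awk; expand_ranges is a hypothetical helper written for this sketch, not part of any standard tool:

```shell
# Expand a space-separated list of numbers and lo-hi ranges, e.g. "1 2 5-9".
expand_ranges() {
    local spec lo hi i
    for spec in $1; do                    # deliberate word splitting on spaces
        case $spec in
            *-*) lo=${spec%-*} hi=${spec#*-}
                 for ((i = lo; i <= hi; i++)); do echo "$i"; done ;;
            *)   echo "$spec" ;;
        esac
    done
}
expand_ranges "1 2 5-9"
```

Note that the input items contain no glob characters, so the deliberately unquoted $1 is safe here; for arbitrary input, validation would be needed.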

Using bash extended globs file masks in variables in find and loop

I am trying to match files using a pre-set file mask in a variable.
mat $ ls -lQ /tmp/Mat
total 0
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:32 "testfile1"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile1.gz"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile2"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile2.gz"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:38 "testfile2.gz#id=142"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:34 "testfile2test"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:34 "testfile2test.gz"
mat $ file_mask=*file2*
mat $ ls /tmp/Mat/$file_mask?(.gz)
testfile2.gz testfile2test.gz
I am trying to get: testfile2 testfile2.gz testfile2test testfile2test.gz
To summarize the outcome:
tl;dr
The OP experienced unexpected behavior due to a bug in 3.x versions of bash relating to certain extended glob patterns, i.e., with shopt -s extglob in effect.
However, even without the bug, the code doesn't work as intended, because the globbing pattern *file2*?(.gz) is effectively the same as *file2* - which would match names with any suffix, not just .gz.
To only match names containing file2 that either have no suffix at all, or, if they have [at least] one, with a [last] suffix of .gz, use *([^.])file2*([^.])?(*.gz) (this works fine in bash 3.x too). Note that, as with the OP's patterns, this requires extended globbing to be activated with shopt -s extglob.
The assumption is that the OP's intent is as follows:
Match only names containing file2 [before the 1st suffix, if any] that either have no suffix at all, or, if they have [at least] one, with a [last] suffix of .gz
E.g., match files file2 file2-a, some-file2, file2.gz, file2-a.gz, file2.tar.gz, but not file2.no (because it has a [last] suffix that is not '.gz').
While there is a bash 3.x bug that affects patterns such as *?(...) - see below - there's no good reason to use *?(...), because it is effectively the same as just *, given that * matches any sequence of characters, including suffixes.
The solution below is not affected by the bug.
You cannot use * for matching only the root of a filename (the part before the [first] suffix), because * matches any string, whether part of a suffix or not.
Thus, extended glob *([^.]) must be used, which matches a string of any length containing any character except . (a period).
Also, to account for the fact that a filename may have multiple suffixes, the optional .gz-matching part of the pattern should be ?(*.gz).
To put it together:
Note: shopt -s extglob must be in effect for the commands to work.
# Create test files; note the addition of "testfile2.tar.gz", which SHOULD
# match, and "testfile2.no", which should NOT match:
$ touch "testfile1" "testfile1.gz" "testfile2" "testfile2.gz" "testfile2.gz#id=142" "testfile2test" "testfile2test.gz" "testfile2.tar.gz" "testfile2.no"
$ ls -1 *([^.])file2*([^.])?(*.gz)
testfile2
testfile2.gz
testfile2.tar.gz
testfile2test
testfile2test.gz
# The same, using a variable:
$ file_mask=*([^.])file2*([^.]) # NO globbing here (no globbing in *assignments*).
$ file_mask+=?(*.gz) # Extend the pattern; still no globbing.
$ ls -1 $file_mask # Globbing happens here, due to unquoted use of the variable.
# Same output as before.
# Using a loop should work equally:
for f in *([^.])file2*([^.])?(*.gz); do echo "$f"; done
# Same output as before.
# Loop with a variable:
$ file_mask=*([^.])file2*([^.])
$ file_mask+=?(*.gz)
$ for f in $file_mask; do echo "$f"; done
# Same output as before.
Obscure extended-globbing bug in bash 3.x:
Note that the bug is unrelated to whether or not variables are used.
I don't know in what version the bug was fixed, but it's not present in 4.3.30, for instance.
In short, *?(...) mistakenly acts as if *+(...) had been specified.
In other words: independent simple pattern * followed by extended pattern ?(...) (match zero or 1 ... instance) effectively behaves like * followed by +(...) (match 1 or more ... instances).
Demonstration, observed in bash 3.2.57 (the current version on OSX 10.10.2; the OP uses 3.2.25):
$ touch f f.gz # create test files
$ ls -1 f?(.gz) # OK: finds files with basename root 'f', optionally suffixed with '.gz'
f
f.gz
# Now extend the glob with `*` after the basename root.
# This, in fact, is logically equivalent to `f*` and should
# match *all files starting with 'f'*.
$ ls -1 f*?(.gz)
f.gz
# ^ BUG: only matches the suffixed file.
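A quick way to check whether your own bash exhibits the bug; on a fixed bash (assumed here to be 4.x and later), both names are printed:

```shell
shopt -s extglob
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch f f.gz
printf '%s\n' f*?(.gz)    # fixed bash: f and f.gz; buggy 3.x: only f.gz
cd / && rm -rf "$tmpdir"
```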

"for loop" and $() combination limitation

ok, I'm working on a different kind of script, but the problem comes down to something like this: assume the following "for loop":
for i in $(ls -l); do echo $i; done
the problem is that the "for loop" separates values by whitespace, so each "i" equals one whitespace-separated word of the 'ls -l' output. Hence the output is something like this:
total
24
drwxrwxr-x.
2
james
james
4096
Oct
26
16:56
bg
.
.
.
but I want variable "i" on each iteration to be the ENTIRE line of 'ls -l' output instead of each word. In other words, "i" should equal the entire line
"drwxrwxr-x. 2 james james 4096 Oct 26 16:56 bg"
instead of iterating through each word. I've tried many workarounds; none of them has worked, and it's kind of freaking me out.
Is there a way to tell the "for loop" to separate on newlines instead of spaces?
P.S. The above example is just for illustration (you might argue that it's a bit pointless) but my problem is something similar to that.
Instead you can
ls -l | while IFS= read -r l ; do echo "This is it: $l" ; done
or do
IFS=$'\n'
before running your for loop (note that IFS=\\n would set IFS to the two characters backslash and n, not to a newline), but I'd avoid that due to possible side effects.
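A self-contained version of the read-loop pattern, using process substitution instead of a pipe so that variables set inside the loop survive it; printf stands in for ls -l here:

```shell
while IFS= read -r line; do
    echo "line: $line"
done < <(printf '%s\n' "first entry" "second entry")
```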
