Bug in echo command in Linux shell? [duplicate]

I noticed something weird today.
If I have files named i, o, or e
and echo any string containing one of those characters inside square brackets ([]), then only the matching characters are printed:
$ touch i e o
$ ls -lrh
total 0
-rw-r--r-- 1 root root 0 May 23 08:24 o
-rw-r--r-- 1 root root 0 May 23 08:24 i
-rw-r--r-- 1 root root 0 May 23 08:24 e
$ echo [offline]
e i o
$ echo [online]
e i o
$ echo [error]
e o
$ echo [soap]
o
If I remove the files, everything works as expected:
$ rm -f e i o
$ ls
$ echo [offline]
[offline]
$ echo [online]
[online]
$ echo [error]
[error]
$ echo [soap]
[soap]
So what is the relationship between echo and these file names?

The shell performs pathname expansion on the command line arguments. Pathname expansion looks at each unquoted argument in turn and tries to replace it with a list of matching filenames. For this purpose the following wildcards apply:
* means 0 or more characters, any characters;
? means 1 character, any character;
[<chars>] means 1 character, one of the given <chars>.
If one or more file names match, the command line argument is replaced with the list of matching file names. If no file names match, the command line argument is left as is.
So in your case:
[offline] is an unquoted command line argument, which
Includes the wildcard [...], and
The files e, i and o match the wildcard, so
The shell replaces the argument with the list of matching file names.
Moral: always quote arguments that you don't want the shell to expand. Write echo '[offline]', never echo [offline].
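The rule above can be checked in a scratch directory. This is a minimal sketch, not part of the original answer. The variable names (demo_dir, unquoted, quoted, nomatch) are illustrative, and it assumes the shell's default settings (in bash, nullglob and failglob off).

```shell
# Create a scratch directory holding the three files from the question.
demo_dir=$(mktemp -d)
touch "$demo_dir/e" "$demo_dir/i" "$demo_dir/o"

# Each command runs in a subshell so the caller's working directory is untouched.
unquoted=$(cd "$demo_dir" && echo [offline])    # pattern matches files e, i and o
quoted=$(cd "$demo_dir" && echo '[offline]')    # quotes suppress pathname expansion
nomatch=$(cd "$demo_dir" && echo [xyz])         # nothing matches: left as is

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "nomatch:  $nomatch"

rm -rf "$demo_dir"
```

Only the unquoted pattern that matches existing files is rewritten; the other two reach echo unchanged.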

Related

Line break mystery when using quotes around variable

I wrote a simple script, the whole purpose of which is to simply create a link between two different cygwin directories. It should be very simple, but since $LOCALAPPDATA can contain spaces, it wound up being far more difficult than I originally envisioned.
Here is the script:
#!/bin/sh
echo "Unlinking any existing data link..."
unlink /usr/local/astrometry/shared_data 2>/dev/null
echo "Generating link between astrometry shared_data..."
my_dir=`cygpath -u $LOCALAPPDATA/cygwin_ansvr/usr/share/astrometry/data`
echo $my_dir
my_test=`echo $my_dir`
echo $my_test
# Note here, if I use $my_dir rather than $my_test, it introduces a LINE BREAK!
ln -s "$my_test" /usr/local/astrometry/shared_data
exit 0
So, if I run the above script, here is the output:
Unlinking any existing data link...
Generating link between astrometry shared_data...
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
And the link is formed as such:
lrwxrwxrwx 1 Dwight Towler None 84 Sep 22 02:56 shared_data -> /cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
drwx------+ 1 Dwight Towler None 0 Sep 22 02:56 .
The above is the desired link (no line break).
Now, if I replace $my_test with $my_dir in the ln -s call, I instead wind up with this:
lrwxrwxrwx 1 Dwight Towler None 84 Sep 22 02:55 shared_data -> /cygdrive/c/Users/Dwight
Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
drwx------+ 1 Dwight Towler None 0 Sep 22 02:55 .
Notice the line break? I cannot figure out where that is coming from, especially since I put quotes around the variables in the ln -s call.
It is especially puzzling since the output of the echo command seems to indicate that both variables have the same content:
echo $my_dir
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
echo $my_test
/cygdrive/c/Users/Dwight Towler/AppData/Local/cygwin_ansvr/usr/share/astrometry/data
Any ideas on what is going on?
The difference between the values of my_dir and my_test comes from how my_dir was copied: my_test=`echo $my_dir` expands $my_dir without quotes inside a command substitution. As a result, every run of consecutive whitespace (newlines included) is replaced with a single space.
As the man page describes, the unquoted $my_dir is split into words on the characters in IFS (by default space, tab, and newline), and echo then prints the individual words with a single space between them. If the original string contained newlines (or multiple spaces between words), they are all converted into single spaces.
Consider the following assignment, which embeds a newline between 'first' and 'second'. An unquoted expansion passed to echo replaces the newline with a space.
A="first
second"
echo "NO QUOTE"
echo $A
echo "QUOTED"
echo "$A"
echo "----"
The output will be
NO QUOTE
first second
QUOTED
first
second
----
Bottom line: the newline is present in the original string (my_dir) and is replaced by a space in the echo statement because of shell word splitting of the unquoted expansion.
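One plausible source of such an embedded newline (an assumption; the thread does not confirm it) is an unquoted variable containing a space being passed to a program that prints one line per argument. The sketch below uses print_args, a hypothetical stand-in for such a program, and an illustrative path value.

```shell
# print_args is a hypothetical stand-in for any tool that prints
# one output line per argument it receives.
print_args() { printf '%s\n' "$@"; }

base='C:/Users/Dwight Towler'      # illustrative value containing a space

split=$(print_args $base/data)     # unquoted: split into two arguments
whole=$(print_args "$base/data")   # quoted: passed as one argument

# Count the lines in each captured value.
split_lines=$(printf '%s\n' "$split" | wc -l | tr -d ' ')
whole_lines=$(printf '%s\n' "$whole" | wc -l | tr -d ' ')
echo "unquoted: $split_lines line(s); quoted: $whole_lines line(s)"
```

If that is what happens in the script, quoting the argument where the path is first built would keep the value on one line, and the extra echo round trip would no longer be needed.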

BASH variables, commands affected by numeric-numbered file and folder names

Issue
I have been experiencing issues with Linux commands run in folders that contain numerically numbered files and folders; e.g., files sequentially numbered 1, 2, 3 ...
For example, if I am in a folder that contains a file or folder whose numeric name appears in my command, the output from that command may be truncated.
Here are some examples:
$ ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 May 7 18:34 1
drwxr-xr-x 2 victoria victoria 4096 May 7 18:14 2
-rw-r--r-- 1 victoria victoria 0 May 7 18:34 3
## fail
$ a="[CPT1A] A Selective"; echo $a
1 A Selective
$ b="[CPT2A] A Selective"; echo $b
2 A Selective
$ c="[CPT3A] A Selective"; echo $c
3 A Selective
...
## pass
$ d="[CPT4A] A Selective"; echo $d
[CPT4A] A Selective
Update/solution
... per accepted answer: quote the BASH variable, when used.
$ a="[CPT1A] A Selective"; echo $a
1 A Selective
$ a="[CPT1A] A Selective"; echo "$a"
[CPT1A] A Selective
The problem is that you aren't quoting the variable when you use it -- that is, you're using echo $a instead of echo "$a". When a variable is referenced without quotes, it gets split into words (so "[CPT1A] A Selective" becomes "[CPT1A]" "A" "Selective"), and then each of those words that contains anything that looks like a filename wildcard gets expanded into a list of matching filenames.
Square-bracket expressions like [CPT1A] are actually valid wildcard expressions that match any single character within them, so if there are files named "A", "C", "P", "T", or "1", it would expand to the matching names. If there aren't any, the wildcard expression just gets passed through intact.
Solution: double-quote variable references to avoid surprises like this. The same goes for command substitutions with $( ) (or backticks, but don't use those). There are a few places where it's safe to leave them off, like in a direct assignment, but IMO it's safer to just use them everywhere than to try to keep track of the exceptions. For example, a=$b is ok, but so is a="$b". On the other hand, export a=$b might or might not work (depending on which shell you're using), but export a="$b" will work.
BTW, shellcheck.net is good at pointing these out (along with some other common mistakes).
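The two stages described above (word splitting, then wildcard expansion) can be observed separately. This is a small sketch under stated assumptions: count_args is a hypothetical helper, and the scratch directory contains only a file named 1.

```shell
count_args() { echo "$#"; }   # helper: report how many arguments arrived

a="[CPT1A] A Selective"
demo_dir=$(mktemp -d)
touch "$demo_dir/1"           # a file whose name matches the bracket pattern

unquoted_count=$(cd "$demo_dir" && count_args $a)    # split into 3 words
quoted_count=$(cd "$demo_dir" && count_args "$a")    # kept as 1 word
unquoted_text=$(cd "$demo_dir" && echo $a)           # [CPT1A] replaced by 1
quoted_text=$(cd "$demo_dir" && echo "$a")           # printed verbatim

echo "$unquoted_count words: $unquoted_text"
echo "$quoted_count word:  $quoted_text"
rm -rf "$demo_dir"
```

The argument count shows the splitting, and the changed first word shows the wildcard expansion; double quotes prevent both.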

Extracting a (variable) substring from filenames

I have a large number of files with filenames of the format
OUTPUT_11_0.175
I want to extract the two numbers. I managed to get the second number with the following:
for file in ./dir/*; do
phi=${file##*_}
echo "$phi"
done
To get the other number 11 in this case, I tried
a=${file#*_}
but this returns everything to the right of the first underscore in the path (the directory name contains an underscore). Is there some way to get bash to read just the part between the two underscores and return '11'?
$ IFS=_ read -a foo <<< "OUTPUT_11_0.175"
$ echo "${foo[0]}"
OUTPUT
$ echo "${foo[1]}"
11
$ echo "${foo[2]}"
0.175
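As an alternative sketch that stays with the parameter expansions used in the question, the directory part can be removed first so that underscores in directory names no longer interfere. The path and the variable names name and rest are illustrative.

```shell
file=./my_dir/OUTPUT_11_0.175   # illustrative path; note the underscore in the directory

name=${file##*/}     # drop the directory part   -> OUTPUT_11_0.175
phi=${name##*_}      # text after the last "_"   -> 0.175
rest=${name%_*}      # drop the last "_" field   -> OUTPUT_11
a=${rest##*_}        # text after the last "_"   -> 11

echo "a=$a phi=$phi"
```

This relies only on POSIX parameter expansion, so it also works in shells without arrays.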

Using bash extended globs file masks in variables in find and loop

I am trying to match files using a pre-set file mask in a variable.
mat $ ls -lQ /tmp/Mat
total 0
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:32 "testfile1"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile1.gz"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile2"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:33 "testfile2.gz"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:38 "testfile2.gz#id=142"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:34 "testfile2test"
-rw-rw-r-- 1 Mat Mat 0 Mar 3 14:34 "testfile2test.gz"
mat $ file_mask=*file2*
mat $ ls /tmp/Mat/$file_mask?(.gz)
testfile2.gz testfile2test.gz
I am trying to get: testfile2 testfile2.gz testfile2test testfile2test.gz
To summarize the outcome:
tl;dr
The OP experienced unexpected behavior due to a bug in 3.x versions of bash relating to certain extended glob patterns, i.e., with shopt -s extglob in effect.
However, even without the bug, the code doesn't work as intended, because the globbing pattern *file2*?(.gz) is effectively the same as *file2*, which would match files with any suffix, not just .gz.
To only match names containing file2 that either have no suffix at all, or, if they have [at least] one, with a [last] suffix of .gz, use *([^.])file2*([^.])?(*.gz) (this works fine in bash 3.x too). Note that, as with the OP's patterns, this requires extended globbing to be activated with shopt -s extglob.
The assumption is that the OP's intent is as follows:
Match only names containing file2 [before the 1st suffix, if any] that either have no suffix at all, or, if they have [at least] one, with a [last] suffix of .gz
E.g., match files file2, file2-a, some-file2, file2.gz, file2-a.gz, file2.tar.gz, but not file2.no (because it has a [last] suffix that is not '.gz').
While there is a bash 3.x bug that affects patterns such as *?(...) - see below - there's no good reason to use *?(...), because it is effectively the same as just *, given that * matches any sequence of characters, including suffixes.
The solution below is not affected by the bug.
You cannot use * for matching only the root of a filename (the part before the [first] suffix), because * matches any string, whether part of a suffix or not.
Thus, extended glob *([^.]) must be used, which matches a string of any length containing any character except . (a period).
Also, to account for the fact that a filename may have multiple suffixes, the optional .gz-matching part of the pattern should be ?(*.gz).
To put it together:
Note: shopt -s extglob must be in effect for the commands to work.
# Create test files; note the addition of "testfile2.tar.gz", which SHOULD
# match, and "testfile2.no", which should NOT match:
$ touch "testfile1" "testfile1.gz" "testfile2" "testfile2.gz" "testfile2.gz#id=142" "testfile2test" "testfile2test.gz" "testfile2.tar.gz" "testfile2.no"
$ ls -1 *([^.])file2*([^.])?(*.gz)
testfile2
testfile2.gz
testfile2.tar.gz
testfile2test
testfile2test.gz
# The same, using a variable:
$ file_mask=*([^.])file2*([^.]) # NO globbing here (no globbing in *assignments*).
$ file_mask+=?(*.gz) # Extend the pattern; still no globbing.
$ ls -1 $file_mask # Globbing happens here, due to unquoted use of the variable.
# Same output as before.
# Using a loop should work equally:
for f in *([^.])file2*([^.])?(*.gz); do echo "$f"; done
# Same output as before.
# Loop with a variable:
$ file_mask=*([^.])file2*([^.])
$ file_mask+=?(*.gz)
$ for f in $file_mask; do echo "$f"; done
# Same output as before.
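For a self-contained check of the pattern outside an interactive session, the sketch below runs it in a scratch directory. It assumes a bash without the 3.x bug is installed as bash; the -O extglob invocation option turns extended globbing on before the command string is parsed, and LC_ALL=C only pins the sort order. The names demo_dir and matches are illustrative.

```shell
demo_dir=$(mktemp -d)
( cd "$demo_dir" && touch testfile1 testfile1.gz testfile2 testfile2.gz \
    'testfile2.gz#id=142' testfile2test testfile2test.gz \
    testfile2.tar.gz testfile2.no )

# The pattern is single-quoted here so that only the inner bash expands it.
matches=$(cd "$demo_dir" &&
    LC_ALL=C bash -O extglob -c 'printf "%s\n" *([^.])file2*([^.])?(*.gz)')

printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

The five names printed should be the same ones listed in the ls -1 output above; testfile2.no and testfile2.gz#id=142 are excluded.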
Obscure extended-globbing bug in bash 3.x:
Note that the bug is unrelated to whether or not variables are used.
I don't know in what version the bug was fixed, but it's not present in 4.3.30, for instance.
In short, *?(...) mistakenly acts as if *+(...) had been specified.
In other words: independent simple pattern * followed by extended pattern ?(...) (match zero or 1 ... instance) effectively behaves like * followed by +(...) (match 1 or more ... instances).
Demonstration, observed in bash 3.2.57 (the current version on OSX 10.10.2; the OP uses 3.2.25):
$ touch f f.gz # create test files
$ ls -1 f?(.gz) # OK: finds files with basename root 'f', optionally suffixed with '.gz'
f
f.gz
# Now extend the glob with `*` after the basename root.
# This, in fact, is logically equivalent to `f*` and should
# match *all files starting with 'f'*.
$ ls -1 f*?(.gz)
f.gz
# ^ BUG: only matches the suffixed file.

My directory names contain a number. How can I find the largest number?

Within a directory, I have sub-directories, whose names are foo$i (where $i is an integer). How can I find the largest value of $i? Thank you.
My question was marked down for "not showing any research effort, being unclear or not useful" but that's not true. Previously, I had asked this question How can I delete the directory with the highest number name? but, though I could use the solution, I didn't understand it well enough to be able to modify it. I have read somewhere in Stack Overflow that answers are not meant to be tutorials. I appreciate that. That being said, it is not always easy to figure out how to modify somebody else's solution to solve your own problem. I also thought that, rather than go through what I had tried, because it would be long-winded, I thought I would just ask a simple, clear question.
The thing is, I don't understand the answers I've been given to this question and now I have another, related question. I will try to explain it clearly and hope that I'm not marked down again!
Within a directory, I have sub-directories, whose names are integers. How can I find the largest value integer, assign it a value and use it in a GNUplot script?
So far, I can find the integer value by using the code from the solution to my previous question:
ls -pq | grep '^[0-9]*/$' | sort -n | tail -n1
Perhaps this code is too elaborate for my new problem but it works.
Let's say this integer has value $INT.
Now I want to assign this value to a line of GNUplot code:
path="path/to/directory/$INT/file_name"
Please can you tell me how to assign the value of my largest directory to a variable and pass it to the GNUplot script? Thank you.
$ ls -al
total 20
drwxr-xr-x 5 myname myname 4096 Oct 13 08:25 .
drwxrwxrwt 14 root root 4096 Oct 13 08:25 ..
drwxr-xr-x 2 myname myname 4096 Oct 13 08:25 foo1
drwxr-xr-x 2 myname myname 4096 Oct 13 08:25 foo12
drwxr-xr-x 2 myname myname 4096 Oct 13 08:25 foo2
$ find . -name "foo*" -type d| sed 's/.*foo//g'| sort -n | tail -n1
12
If the above doesn't work for you, please let me know.
You can use the -v option for ls to sort the directories by "version strings". The last file shown is the one with the highest number.
shopt -s extglob
ls -v foo+([0-9])
It's not recommended to parse the output of ls in a script, though. If your dir names are more complicated, it can break. You can just accumulate the max in a loop:
max=0
shopt -s extglob
for f in foo+([0-9]) ; do
n=${f#foo};
if (( n > max )) ; then
max=$n
fi
done
echo "foo$max"
Whatever you do, be careful: by the time you report the highest number, another subdirectory with a higher number may already exist!
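The loop above can also be written without extglob. The following is a sketch under stated assumptions: max_index is a hypothetical helper taking a directory and a name prefix, it returns -1 when nothing matches, and it strips leading zeros so that a name such as foo08 is not misread as octal.

```shell
# max_index DIR PREFIX: print the largest N among subdirectories DIR/PREFIXN.
# The body runs in a subshell, so its variables do not leak to the caller.
max_index() (
    dir=$1 prefix=$2 max=-1
    for path in "$dir/$prefix"[0-9]*/; do
        [ -d "$path" ] || continue              # pattern matched nothing
        n=${path%/}; n=${n##*/}; n=${n#"$prefix"}
        case $n in *[!0-9]*) continue ;; esac   # skip names like foo1bar
        # Strip leading zeros, keeping at least one digit.
        while [ "${n#0}" != "$n" ] && [ "${#n}" -gt 1 ]; do n=${n#0}; done
        if [ "$n" -gt "$max" ]; then max=$n; fi
    done
    echo "$max"
)

demo_dir=$(mktemp -d)
mkdir "$demo_dir/foo1" "$demo_dir/foo12" "$demo_dir/foo2" "$demo_dir/foo007"
max=$(max_index "$demo_dir" foo)
echo "largest index: $max"
rm -rf "$demo_dir"

# The value could then be handed to gnuplot, for example:
#   gnuplot -e "INT=$max" plot.gp    # plot.gp is a hypothetical script name
```

The trailing slash in the glob restricts matches to directories, so plain files with similar names are ignored.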
Assuming you have a directory: Test/ and some subdirectories with an index (e.g. run number):
Run1
Run2
Run3
Run7
Run07
Run007
and in each subdirectory the identical filename, e.g. Data.dat, but not necessarily identical content:
1 2.0
2 5.0
3 3.0
The following script will extract the indices and return the maximum. In this specially constructed example, the maximum is 7. So, depending on the string creation for the variable myLatestFile via sprintf() you can decide whether you want to have the file, e.g. from directory 7 (%d) or 007 (%03d) being plotted.
Script: (works for gnuplot>=5.0.0 under Windows and maybe with earlier versions under Linux)
### plot a file from the subdirectory with the largest index
reset
DIR = 'Test/'
FILE = "Data.dat"
SUBDIRS = system(sprintf('dir /b "%s"',DIR)) # Windows, gnuplot>=5.0.0
# SUBDIRS = system(sprintf('ls %s',DIR)) # Linux/MacOS
mySubDirPrefix = "Run"
getIdx(s) = int(substr(s,strlen(mySubDirPrefix)+1,strlen(s)))
maxIdx = 0
do for [SUBDIR in SUBDIRS] {
idx = getIdx(SUBDIR)
maxIdx = idx>maxIdx ? idx : maxIdx
}
myLatestFile = sprintf("%s%s%d/%s",DIR,mySubDirPrefix,maxIdx,FILE)
plot myLatestFile u 1:2 w lp pt 7 ti sprintf("Highest directory index: %d",maxIdx)
### end of script
Workaround for gnuplot>=4.6.0 and <5.0.0 under Windows:
The use of system() as a function, e.g. FILES = system("dir /b"), doesn't work with gnuplot 4.x under Windows.
You will get a warning:
warning: system evaluation not supported by MS-Windows 32 bit
and the variable SUBDIRS will be an empty string.
As a retro-workaround you can use a temporary file:
# workaround for gnuplot>=4.6
TEMP = "dirs.txt"
system sprintf('dir /b "%s" > %s',DIR,TEMP)
SUBDIRS = ''
stats TEMP u (SUBDIRS = SUBDIRS.' '.strcol(1)) nooutput # append all subdirs into a string
