Bash: Nested variable expansion - bash

How can I nest operations in bash? e.g I know that
$(basename $var)
will give me just the final part of the path and
${name%.*}
gives me everything before the extension.
How do I combine these two calls? I want to do something like:
${$(basename $var)%.*}

As @sid-m's answer states, you need to change the order of the two expansions because one of them (the % stuff) can only be applied to variables (by giving their name):
echo "$(basename "${var%.*}")"
Other things to mention:
You should use double quotes around every expansion, otherwise you run into trouble if you have spaces in the variable values. I already did that in my answer.
In case you know or expect a specific file extension, basename can strip that off for you as well: basename "$var" .txt (This will print foo for foo.txt in $var.)
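A minimal sketch of the same idea using only parameter expansion, with the result of the first expansion stored in an intermediate variable because the two cannot be nested. The function name stem is made up for illustration. Stripping the directory first also avoids a corner case of basename "${var%.*}": when a directory name contains a dot and the file has no extension, the % expansion cuts the path at the directory's dot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last path component without its final extension.
stem() {
  local file=${1##*/}            # drop everything up to the last slash
  printf '%s\n' "${file%.*}"     # drop the shortest trailing ".something"
}

stem /some/folder/with/file.ext   # prints: file
stem /some/folder.d/README        # prints: README
```

This sketch does not handle paths with a trailing slash, and a dotfile such as .bashrc comes out as an empty string.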

You can do it like
echo $(basename ${var%.*})
it is just the order that needs to be changed.

Assuming you want to split the file name, here is a simple pattern :
$ var=/some/folder/with/file.ext
$ basename "$var" | cut -d "." -f1
file

If you know the file extension in advance, you can tell basename to remove it, either as a second argument or via the -s option (supported by GNU and BSD basename, but not required by POSIX). Both of these yield the same result:
basename "${var}" .extension
basename -s .extension "${var}"
If you don't know the file extension in advance, you can try to grep the proper part of the string.
### grep any non-slash followed by anything ending in dot and non-slash
grep -oP '[^/]*(?=\.[^/]*$)' <<< "${var}"

Related

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
# process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
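An alternative sketch that avoids both parsing ls output and eval, assuming GNU find (for -printf) and GNU sort (for -z). Each record is a modification timestamp, a space, and the path, terminated by a NUL byte, so spaces, brackets, and even newlines in names pass through untouched. The function name collect_txt_oldest_first and the array name files are chosen here for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "files" with the *.txt files
# directly inside directory $1, oldest modification time first.
collect_txt_oldest_first() {
  local dir=$1 entry
  files=()
  while IFS= read -r -d '' entry; do
    files+=("${entry#* }")       # strip the leading "timestamp " prefix
  done < <(find "$dir" -maxdepth 1 -type f -name '*.txt' -printf '%T@ %p\0' |
           sort -z -n)
}

# Possible usage:
#   collect_txt_oldest_first .
#   (( ${#files[@]} )) && cat -- "${files[@]}"
```

Because find prefixes every path with the directory given to it, names that begin with "-" are not mistaken for options.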
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

Assign part of a file name to bash variable?

I have a file and its name looks like:
12U12345._L001_R1_001.fastq.gz
I want to assign to a variable just the 12U12345 part.
So far I have:
variable=`basename $fastq | sed {s'/_S[0-9]*_L001_R1_001.fastq.gz//'}`
Note: $fastq is a variable with the full path to the file in it.
This solution currently returns the full file name, any ideas how to get this right?
Just use the built-in parameter expansion provided by the shell, instead of spawning a separate process
fastq="12U12345._L001_R1_001.fastq.gz"
printf '%s\n' "${fastq%%.*}"
12U12345
or use printf -v to store the result in a new variable in one step
printf -v numericPart '%s' "${fastq%%.*}"
printf '%s\n' "${numericPart}"
Bash also has a built-in regular-expression match operator, =~, which you could use like this:
fastq="12U12345._L001_R1_001.fastq.gz"
regex='^([[:alnum:]]+)\.(.*)'
if [[ $fastq =~ $regex ]]; then
numericPart="${BASH_REMATCH[1]}"
printf '%s\n' "${numericPart}"
fi
You could use cut:
$> fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
$> variable=$(basename "$fastq" | cut -d '.' -f 1)
$> echo "$variable"
12U12345
Also, please note that:
It's better to wrap your variable inside quotes. Otherwise your command won't work with filenames that contain space(s).
You should use $() instead of the backticks.
Using Bash Parameter Expansion to extract the basename and then extract the portion of the filename you want:
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
file="${fastq##*/}" # gives 12U12345._L001_R1_001.fastq.gz
string="${file%%.*}" # gives 12U12345
Note that Bash doesn't allow us to nest the parameter expansion. Otherwise, we could have combined statements 2 and 3 above.

what is ~$ in linux shell scripts

I see the below statement in a shell script
if [ "$file" = "conf" ] || echo $file | grep -q '~$'; then
What is ~$? I know other dollar notations like $1 $2 $# $$ $* but never saw anything like ~$.
'~$' pattern in grep matches all lines that end with '~'.
So the if portion will be executed, if the file name ends with ~ .
Actually the entire echo $file | grep -q '~$' means:
Try to match if filename ends with ~, but don't print the matching results.
If matched, execute the if part.
The '~$' does have a special meaning to grep, i.e. "ends with ~".
~$ is a sequence of two characters and has no special meaning in bash.
After all why should you be bothered about ~$ in grep -q '~$'.
It is pretty obvious that ~$ just makes a pattern.
Regarding
what is $ then
It has special meanings
when used in the context of variable, say $var.
when used in a regex, say stuff$, which matches lines ending in stuff.
Please check [ Special Parameters ].
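For illustration, the same condition can be sketched without grep by using a bash glob pattern, where *~ means "any name ending in ~". The function name should_skip is hypothetical. In the original line the single quotes keep the shell from touching the $, so grep receives ~$ as a regular expression in which $ anchors the match to the end of the line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the name is exactly "conf" or ends in "~"
# (the suffix many editors give to backup files).
should_skip() {
  local file=$1
  [[ $file == conf || $file == *~ ]]
}

# Roughly equivalent grep form of the second test:
#   grep -q '~$' <<< "$file"
if should_skip 'notes.txt~'; then
  echo "skipping backup file"
fi
```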

bash grep variable as pattern

I don't usually work in bash but grep could be a really fast solution in this case. I have read a lot of questions on grep and variable assignment in bash yet I do not see the error. I have tried several flavours of double quotes around $pattern, used `...` or $(...) but nothing worked.
So here's what I try to do:
I have two files. The first contains several names. Each of them I want to use as a pattern for grep in order to search them in another file. Therefore I loop through the lines of the first file and assign the name to the variable pattern.
This step works as the variable is printed out properly.
But somehow grep does not recognize/interpret the variable. When I substitute "$pattern" with an actual name everything is fine as well. Therefore I don't think the variable assignment has a problem but the interpretation of "$pattern" as the string it should represent.
Any help is greatly appreciated!
#!/bin/bash
while IFS='' read -r line || [[ -n $line ]]; do
a=( $line )
pattern="${a[2]}"
echo "Text read from file: $pattern"
var=$(grep "$pattern" 9606.protein.aliases.v10.txt)
echo "Matched Line in Alias is: $var"
done < "$1"
> bash match_Uniprot_StringDB.sh ~/Chromatin_Computation/.../KDM.protein.tb
output:
Text read from file: "UBE2B"
Matched Line in Alias is:
Text read from file: "UTY"
Matched Line in Alias is:
EDIT
The solution drvtiny suggested works. It is necessary to get rid of the double quotes to match the string. Adding the following lines makes the script work.
pattern="${pattern#\"}"
pattern="${pattern%\"}"
Please, look at "-f FILE" option in man grep.
I advise that this option does exactly what you need without any bash loops or such other "hacks" :)
And yes, according to the output of your code, you read pattern including double quotes literally. In other words, you read from file ~/Chromatin_Computation/.../KDM.protein.tb this string:
"UBE2B"
But not
UBE2B
as you probably expect.
Maybe you need to remove double quotes on the boundaries of your $pattern?
Try to do this after reading pattern:
pattern=${pattern#\"}
pattern=${pattern%\"}
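Putting the pieces together, a sketch of the corrected loop might look like the following. It assumes, as in the question, that the name is the third whitespace-separated field and may be wrapped in double quotes. The function names strip_quotes and match_aliases are made up, and grep -F is used here so the name is matched as a fixed string rather than as a regular expression.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove one leading and one trailing double quote, if present.
strip_quotes() {
  local s=$1
  s=${s#\"}
  s=${s%\"}
  printf '%s\n' "$s"
}

# Hypothetical helper: for each line of $1, search the third field in file $2.
match_aliases() {
  local names_file=$1 alias_file=$2 line pattern
  local -a fields
  while IFS= read -r line || [[ -n $line ]]; do
    read -r -a fields <<< "$line"            # split the line on whitespace
    pattern=$(strip_quotes "${fields[2]}")
    [[ -n $pattern ]] || continue            # skip lines without a third field
    grep -F -- "$pattern" "$alias_file" ||
      printf 'no match for %s\n' "$pattern" >&2
  done < "$names_file"
}

# Possible usage: match_aliases "$1" 9606.protein.aliases.v10.txt
```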

for name in `ls` and filenames with spaces

The following code doesn't work because of spaces in file names. How can I fix it?
IFS = '\n'
for name in `ls `
do
number=`echo "$name" | grep -o "[0-9]\{1,2\}"`
if [[ ! -z "$number" ]]; then
mv "$name" "./$number"
fi
done
Just don't use command substitution: use for name in *.
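A sketch of what the loop from the question could look like with a glob, which hands each file name to the loop as a single word no matter what characters it contains. The function names are hypothetical, and bash's own regex match stands in for the echo | grep -o pipeline. Unlike grep -o, it yields only the first run of one or two digits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first run of one or two digits in $1,
# or fail if the name contains no digit.
first_number() {
  [[ $1 =~ [0-9]{1,2} ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

# Hypothetical helper: rename every entry in directory $1 to its first number.
rename_by_number() {
  local dir=$1 path name number
  for path in "$dir"/*; do
    [[ -e $path ]] || continue               # an unmatched glob stays literal
    name=${path##*/}
    number=$(first_number "$name") || continue
    [[ $name == "$number" ]] && continue     # already has the target name
    mv -n -- "$path" "$dir/$number"          # -n: do not overwrite a file
  done
}

# Possible usage: rename_by_number .
```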
Replace
for name in `ls`
with:
ls | while read name
Notice: bash runs each part of a pipeline in a subshell, so if you change a variable inside this loop, the change won't be visible after the loop (in my version it won't, in your version it will). In this example, it doesn't matter.
Notice 2: This works for file names with spaces, but fails for some other strange but valid file names. See Charles Duffy's comment below.
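A small sketch of the scoping issue, assuming bash with default settings (the lastpipe option off). The variable names are made up. The first loop sits on the right-hand side of a pipe and therefore runs in a subshell. The second reads from a process substitution and runs in the current shell.

```bash
#!/usr/bin/env bash
# Loop fed by a pipe: the increments happen in a subshell and are lost.
count_piped=0
printf '%s\n' a b c | while IFS= read -r name; do
  count_piped=$((count_piped + 1))
done

# Loop fed by process substitution: the increments happen in this shell.
count_redirected=0
while IFS= read -r name; do
  count_redirected=$((count_redirected + 1))
done < <(printf '%s\n' a b c)

echo "piped: $count_piped, redirected: $count_redirected"
# prints: piped: 0, redirected: 3
```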
Looks like two potential issues:
First, the IFS assignment must not have spaces around the =. Instead of
IFS = '\n' it should be IFS=$'\n'
Secondly, for name in `ls` will cause issues with filenames containing spaces and newlines. If you just wish to handle filenames with spaces, then do something like this
for name in *
I don't understand the significance of the line
number=`echo "$name" | grep -o "[0-9]\{1,2\}"`
This prints every run of one or two digits found in the file name, each on its own line. Maybe that's what you want.
In my case, I had to switch to find.
find /foo/path/ -maxdepth 1 -type f -name "*.txt" | while IFS= read -r name
do
#do your stuff with $name
done

Resources