bash grep variable as pattern

I don't usually work in bash, but grep could be a really fast solution in this case. I have read a lot of questions on grep and variable assignment in bash, yet I do not see the error. I have tried several flavours of double quotes around $pattern and used `...` or $(...), but nothing worked.
Here's what I'm trying to do:
I have two files. The first contains several names, and I want to use each of them as a grep pattern to search the second file. So I loop through the lines of the first file and assign each name to the variable pattern.
This step works, since the variable is printed out properly.
But somehow grep does not recognize the variable. When I replace "$pattern" with an actual name, everything works. Therefore I don't think the variable assignment is the problem, but rather how "$pattern" is interpreted as the string it should represent.
Any help is greatly appreciated!
#!/bin/bash
while IFS='' read -r line || [[ -n $line ]]; do
a=( $line )
pattern="${a[2]}"
echo "Text read from file: $pattern"
var=$(grep "$pattern" 9606.protein.aliases.v10.txt)
echo "Matched Line in Alias is: $var"
done < "$1"
> bash match_Uniprot_StringDB.sh ~/Chromatin_Computation/.../KDM.protein.tb
output:
Text read from file: "UBE2B"
Matched Line in Alias is:
Text read from file: "UTY"
Matched Line in Alias is:
EDIT
The solution drvtiny suggested works: the double quotes have to be removed for the string to match. Adding the following lines after the pattern assignment makes the script work.
pattern="${pattern#\"}"
pattern="${pattern%\"}"

Please look at the -f FILE option in man grep.
I believe this option does exactly what you need, without any bash loops or other "hacks" :)
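A minimal sketch of the -f approach, assuming (as the question's loop does) that each name is the third whitespace-separated field of the names file and is wrapped in literal double quotes. The helper name match_names is hypothetical, and -F (treat patterns as fixed strings) is an added choice, since the names are meant literally rather than as regexes:

```shell
#!/bin/bash
# match_names NAMES_FILE ALIAS_FILE  (hypothetical helper)
# Prints every line of ALIAS_FILE that contains one of the names.
match_names() {
  # awk takes field 3, strips its double quotes and skips empty results
  # (an empty pattern would match every line); grep reads the list,
  # one pattern per line, through process substitution.
  grep -F -f <(awk 'NF >= 3 { gsub(/"/, "", $3); if ($3 != "") print $3 }' "$1") "$2"
}
```

For example: match_names ~/Chromatin_Computation/.../KDM.protein.tb 9606.protein.aliases.v10.txt. This runs grep once for all names, but unlike the loop it no longer reports which name produced each matching line.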
And yes, according to the output of your code, you read the pattern including its literal double quotes. In other words, you read this string from the file ~/Chromatin_Computation/.../KDM.protein.tb:
"UBE2B"
and not
UBE2B
as you probably expect.
Maybe you need to remove the double quotes at both ends of your $pattern?
Try doing this after reading the pattern:
pattern=${pattern#\"}
pattern=${pattern%\"}
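Putting the pieces together, here is a sketch of the question's loop with the quote stripping applied. The function name lookup_names is hypothetical; read -a replaces the unquoted a=( $line ) so names are not glob-expanded, and grep -F treats each name as a fixed string. Both are added choices, not part of the original script:

```shell
#!/bin/bash
# lookup_names NAMES_FILE ALIAS_FILE  (hypothetical helper)
lookup_names() {
  local line pattern
  local -a a
  while IFS='' read -r line || [[ -n $line ]]; do
    read -r -a a <<< "$line"          # split into words without globbing
    pattern=${a[2]}
    pattern=${pattern#\"}             # drop a leading double quote
    pattern=${pattern%\"}             # drop a trailing double quote
    [[ -n $pattern ]] || continue     # an empty pattern would match every line
    echo "Text read from file: $pattern"
    echo "Matched Line in Alias is: $(grep -F -- "$pattern" "$2")"
  done < "$1"
}
```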

Related

How do I use sed to store a variable?

I am trying to store the output of sed in a variable. The user chooses from a list of files, each shown with a number that identifies it. They then enter the number corresponding to a name, and I need to get the name of that file. My code is:
for entry in *; do
((i++))
echo "$i) $entry: "
done
echo "What file # do you want to choose?:"
read -r filenum
fileName=$(./myscript.sh | sed -n "${filenum}p")
echo "$fileName" ###this is to see if anything goes into fileName. nothing is ever output
echo "What do you want to do with $fileName?"
Ideally I would use $() instead of backticks, but I can't seem to figure out how. I've looked at the links below but can't get those ideas to work. I believe part of the problem may be that I am trying to include the filenum variable inside my sed command.
https://www.linuxquestions.org/questions/linux-newbie-8/storing-output-of-sed-in-a-variable-in-shell-script-499997/
Store output of sed into a variable
Don't put backticks around $filenum. That will try to execute the contents of $filenum as a command. Put variables inside double quotes.
And if you do want to nest a backtick expression inside another set of backticks, you have to escape the inner ones. That's where $() becomes useful: it nests without any hassle.
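A small illustration of that difference; /path/to/file.txt is just an example string that is echoed, not a real file:

```shell
#!/bin/bash
# With backticks, the inner pair has to be escaped with backslashes.
with_backticks=`basename \`echo /path/to/file.txt\``
# With $(), the inner substitution nests as-is and can be quoted normally.
with_dollar=$(basename "$(echo /path/to/file.txt)")
printf '%s\n' "$with_backticks" "$with_dollar"   # file.txt, twice
```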
When you use sed -n, you need to use the p command to print the lines that you want to show in the output.
fileName=$(sed -n "${filenum}p" myscript.sh)
This will put the contents of line $filenum of myscript.sh in the variable.
If you actually wanted to execute myscript.sh and print the selected line of its output, you need to pipe to sed:
fileName=$(./myscript.sh | sed -n "${filenum}p")
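To check the sed part on its own, here is a sketch that swaps ./myscript.sh for a hypothetical list_entries function printing one name per line. The braces in "${filenum}p" matter: without them, the shell would look for a variable named filenump.

```shell
#!/bin/bash
# Hypothetical stand-in for ./myscript.sh: prints one entry per line.
list_entries() { printf '%s\n' alpha.txt 'beta file.txt' gamma.txt; }

filenum=2
# "${filenum}p" expands to "2p"; -n suppresses sed's default output,
# so only line 2 is printed.
fileName=$(list_entries | sed -n "${filenum}p")
echo "$fileName"    # beta file.txt
```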

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz \nquux". That's a file name containing actual single and double quotes, plus spaces, a tab, and a newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file; we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
: # process each line here
done < <(ls -rt --quoting-style=shell-escape *.txt)
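Why process substitution rather than piping ls into the loop? By default bash runs each part of a pipeline in a subshell, so variables set inside a piped while loop are gone once it ends, which matters once the loop starts filling an array. A quick sketch of the difference, with printf standing in for ls:

```shell
#!/bin/bash
count_pipe=0
printf '%s\n' a b c | while IFS= read -r line; do
  count_pipe=$((count_pipe + 1))    # increments a copy in the subshell
done

count_subst=0
while IFS= read -r line; do
  count_subst=$((count_subst + 1))  # runs in the current shell
done < <(printf '%s\n' a b c)

# With bash's default settings this prints: pipe: 0, process substitution: 3
# (shopt -s lastpipe would change the first number in a script).
echo "pipe: $count_pipe, process substitution: $count_subst"
```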
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
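To see what that eval does to a single line, here is a sketch that feeds it a hand-written line in the same form as the shell-escape output in the table above (quoted pieces, with $'\t' for the tab); the file name is made up for the example:

```shell
#!/bin/bash
# Read the quoted line literally (quoted heredoc delimiter: no expansion).
IFS= read -r line <<'EOF'
'it'\''s a'$'\t''file.txt'
EOF
# eval lets the shell interpret the quoting, leaving the real name in $file.
eval "file=$line"
printf '<%s>\n' "$file"
```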
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
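For an end-to-end check, here is a sketch that creates a few awkward names in a throwaway directory, gives them known modification times with touch -t, and runs the final loop. It assumes GNU ls (for --quoting-style); the -- before *.txt is an addition so that a name starting with "-" is not taken as an option.

```shell
#!/bin/bash
# Throwaway directory; requires GNU ls for --quoting-style.
cd "$(mktemp -d)" || exit 1
touch -t 202001010000 'b [1] - old.txt'      # oldest
touch -t 202101010000 -- '-a new one.txt'    # leading dash
touch -t 202201010000 $'c\ttab.txt'          # newest, contains a tab

files=()
while IFS= read -r line; do
  eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape -- *.txt)

# One bracketed argument per line, oldest first, to show the names survived intact.
printf '<%s>\n' "${files[@]}"
```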
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

Bash: Nested variable expansion

How can I nest operations in bash? E.g., I know that
$(basename $var)
will give me just the final part of the path and
${name%.*}
gives me everything before the extension.
How do I combine these two? I want to do something like:
${$(basename $var)%.*}
As @sid-m's answer states, you need to swap the order of the two expansions, because one of them (the % part) can only be applied to a variable (by giving its name):
echo "$(basename "${var%.*}")"
Other things to mention:
You should use double quotes around every expansion; otherwise you run into trouble if the variable values contain spaces. I already did that in my answer.
In case you know or expect a specific file extension, basename can strip that off for you as well: basename "$var" .txt (This will print foo for foo.txt in $var.)
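Since ${...%...} only works on a variable name, the other way around the limitation is to store basename's output in a variable first. A sketch comparing the two; the sample path is made up:

```shell
#!/bin/bash
var="/some/folder with spaces/file.ext"

# Order from the answer above: strip the extension from the variable,
# then take the last path component of what is left.
name1=$(basename "${var%.*}")

# Two-step variant: store basename's output in a variable first,
# then apply %.* to that variable.
base=$(basename "$var")
name2=${base%.*}

printf '%s\n' "$name1" "$name2"   # file, twice
```

The two differ when a directory contains a dot but the file has none: for var=/some/dir.d/README, the first gives dir while the second gives README.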
You can do it like
echo $(basename ${var%.*})
it is just the order that needs to be changed.
Assuming you want to split the file name, here is a simple pattern:
$ var=/some/folder/with/file.ext
$ echo $(basename $var) | cut -d "." -f1
file
If you know the file extension in advance, you can tell basename to remove it, either as a second argument or via the -s option. Both these yield the same:
basename "${var}" .extension
basename -s .extension "${var}"
If you don't know the file extension in advance, you can try to grep the proper part of the string.
### print the non-slash characters before the last dot of the final path component (-P needs GNU grep)
grep -oP '[^/]*(?=\.[^/]*$)' <<< "${var}"

Bash remove every occurrence after first in string

I'm trying to remove everything after a specific_string in a path string in Bash. I've tried using sed to no avail so far.
variable="specific_string"
input_string="/path/to/some/specific_string/specific_string.something/specific_string.something-else"
output=$(sed 's/$variable//' $input_string)
Output should be "/path/to/some/specific_string/"
Would be better if I didn't have to use commands such as sed!
The Problems
There are several problems:
Variables are not expanded inside single quotes. 's/$variable//' is treated as a literal string, which does not contain specific_string.
sed can modify text from files or STDIN, but not text given via parameters. With sed 's/...//' $input_string the /path/to/some/specific_string/.../file is opened and its content is read, instead of the path itself.
s/string// deletes only string, not the text after it.
Also remember to double quote your variables. cmd $variable is dangerous if the variable contains spaces. cmd "$variable" is safe.
Sed Solution
output="$(sed "s/$variable.*/$variable/" <<< "$input_string")"
GNU Grep Solution
output="$(grep -Po "^.*?$variable" <<< "$input_string")"
Pure Bash Solution
output="${input_string%%"$variable"*}$variable"
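A sketch that wraps the pure-bash expansion in a function (the name cut_after_first is hypothetical). It quotes "$variable" inside the pattern so glob characters in it match literally, refuses input that does not contain the string (otherwise the expansion would simply append it), and, to get the trailing slash from the expected output, keeps a "/" that directly follows the first match:

```shell
#!/bin/bash
# cut_after_first STRING NEEDLE  (hypothetical helper)
# Prints STRING up to and including the first NEEDLE, plus a directly
# following "/" if there is one. Returns 1 if NEEDLE does not occur.
cut_after_first() {
  local input=$1 needle=$2 out
  [[ $input == *"$needle"* ]] || return 1
  out="${input%%"$needle"*}$needle"          # %% strips from the first match on
  [[ ${input#*"$needle"} == /* ]] && out+=/  # # strips up to the first match
  printf '%s\n' "$out"
}

cut_after_first "/path/to/some/specific_string/specific_string.something/specific_string.something-else" specific_string
# /path/to/some/specific_string/
```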
Note that removing everything after "specific_string" also drops the "/" that follows it, as in this example:
output=$(echo "$input_string" | sed "s/${variable}.*$/${variable}/")
Try a simple sed:
variable="specific_string"
input_string="/path/to/some/specific_string/specific_string.something/specific_string.something-else"
output=$(echo "$input_string" | sed "s/\(.*$variable\/\).*/\1/")
Output of variable output will be as follows.
echo $output
/path/to/some/specific_string/

sed -- replace variable in for do loop

I want to insert a counter into a text file using sed. For example, the file has the following content:
please.add.number.00
Here is the script I'm using:
for i in $(seq 0 10)
do
sed -i 's/please.add.number.00/please.add.number.$i/' filename.txt
done
But the value ($i) in the file doesn't change. I want to substitute the value of $i in this line of filename.txt. I would appreciate any help to fix this issue!
As mentioned in comments, there is a difference between these two lines:
echo '$HOME'
echo "$HOME"
Single quotes print the literal text $HOME; double quotes print your home directory.
Based on the edits to your question, it looks like your actual problem is a misunderstanding of how sed works. The substitute command takes two parameters: a search pattern and a replacement. If the search pattern (please.add.number.00) never changes, it only matches on the first run: after that, the line no longer contains please.add.number.00, so every later iteration leaves the file unchanged.
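A sketch of a working loop, assuming the goal is one numbered copy per counter value: it reads an unchanged template each time instead of editing filename.txt in place, so the search pattern always matches. The output names numbered_$i.txt are hypothetical. The sed script is in double quotes so $i is expanded, and the dots are escaped so they match only literal dots.

```shell
#!/bin/bash
# Throwaway directory with a template containing the placeholder line.
cd "$(mktemp -d)" || exit 1
printf 'please.add.number.00\n' > filename.txt

for i in $(seq 0 10); do
  # The shell expands $i inside the double quotes before sed runs.
  sed "s/please\.add\.number\.00/please.add.number.$i/" filename.txt > "numbered_$i.txt"
done

cat numbered_3.txt   # please.add.number.3
```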

Resources