Regex - extract up until a match and not including that match

Regex - extract up until a match and not including that match - bash

I'm attempting to capture filenames removing both a file-extension and suffix, e.g.:
TEST_EXAMPLE_SUFFIX.file
Output = TEST_EXAMPLE
I want to do this on the basis of matching the _SUFFIX part and extracting all characters prior to that (not including _SUFFIX). Ordinarily I would use something like:
FILE_EXT=_SUFFIX
/.+?(?=$FILE_EXT)/
However when piping that together as part of a for-loop:
for t in $(ls *.fastq | sed -e /.+?(?=$READ1_EXT)/)
I get the error:
command substitution: line 14: syntax error near unexpected token `('
What have I done wrong?

Don't parse ls output , You could use bash parameter expansion to
achieve what you need
for t in *_SUFFIX.fastq
do
echo "${t%_SUFFIX.fastq}" #stips _SUFFIX.fastq part
done
References
See shell [ param expansion ].
On why [ not to parse ls output ].
Edit:
For working around repeated occurrences, you could do something like this :
Consider that you have two files of interest Test_R1.file & Test_R2.file and you expect Test to appear only once in the results do something like
declare -A arry # declaring an associative array
for t in Test_R*.file
do
arry["${t%_R*.file}"]=1
# stips _R(number).file part and makes it a key to arry
# Remember arry keys are unique.
# The assignment ie '=1' is not relevant here, you can assign any value
done
# We are all set to print the unique filenames
echo "${!arry[#]}"
# "${!arry[#]}" expands to the list of array indices (keys) for arry

You can do this using bash parameter expansion only, assuming persistent format of file names:
for file in *_SUFFIX.fastq; do echo "${file%_*}"; done
The for construct iterate over the .fastq files.
Example:
$ file=TEST_EXAMPLE_SUFFIX.fastq
$ echo "${file%_*}"
TEST_EXAMPLE

Related

How to remove duplicate with bash script command xargs when the string has some quotes ""?

I am a newbie in bash script.
Here is my environment:
Mac OS X Catalina
/bin/bash
I found here a mix of several commands to remove the duplicate string in a string.
I needed for my program which updates the .zhrc profile file.
Here is my code:
#!/bin/bash
a='export PATH="/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/local/bin:"'
myvariable=$(echo "$a" | tr ':' '\n' | sort | uniq | xargs)
echo "myvariable : $myvariable"
Here is the output:
xargs: unterminated quote
myvariable :
After some test, I know that the source of the issue is due to some quotes "" inside my variable '$a'.
Why am I so sure?
Because when I execute this code for example:
#!/bin/bash
a="/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home:/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home"
myvariable=$(echo "$a" | tr ':' '\n' | sort | uniq | xargs)
echo "myvariable : $myvariable"
where $a doesn't contain any quotes, I get the correct output:
myvariable : /Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home
I tried to search for a solution for "xargs: unterminated quote" but each answer found on the web is for a particular case which doesn't correspond to my problem.
As I am a newbie and this line command is using several complex commands, I was wondering if anyone know the magic trick to make it work.

Basically, you want to remove duplicates from a colon-separated list.
I don't know if this is considered cheating, but I would do this in another language and invoke it from bash. First I would write a script for this purpose in zsh: It accepts as parameter a string with colon separtors and outputs a colon-separated list with duplicates removed:
#!/bin/zsh
original=${1?Parameter missing} # Original string
# Auxiliary array, which is set up to act like a Set, i.e. without
# duplicates
typeset -aU nodups_array
# Split the original strings on the colons and store the pieces
# into the array, thereby removing duplicates. The core idea for
# this is stolen from:
# https://stackoverflow.com/questions/2930238/split-string-with-zsh-as-in-python
nodups_array=("${(#s/:/)original}")
# Join the array back with colons and write the resulting string
# to stdout.
echo ${(j':')nodups_array}
If we call this script nodups_string, you can invoke it in your bash-setting as:
#!/bin/bash
a_path="/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/local/bin:"
nodups_a_path=$(nodups_string "$a_path")
my_variable="export PATH=$nodups_a_path"
echo "myvariable : $myvariable"
The overall effect would be literally what you asked for. However, there is still an open problem I should point out: If one of the PATH components happens to contain a space, the resulting export statement can not validly be executed. This problem is also inherent into your original problem; you just didn't mention it. You could do something like
my_variable=export\ PATH='"'$nodups_a_path"'"'
to avoid this. Of course, I wonder why you take such an effort to generat a syntactically valid export command, instead of simply building the PATH by directly where it is needed.
Side note: If you would use zsh as your shell instead of bash, and only want to keep your PATH free of duplicates, a simple
typeset -iU path
would suffice, and zsh takes care of the rest.

With awk:
awk -v RS=[:\"] 'NR > 1 { pth[$0]="" } END { for (i in pth) { if (i !~ /[[:space:]]+/ && i != "" ) { printf "%s:",i } } }' <<< "$a"
Set the record separator to : and double quotes. Then when the number record is greater than one, set up an array called pth with the path as the index. At the end, loop through the array, re printing the paths separated with :

Linux script text substitutions

I want to make a few configuration files (for homeassistant) that are very similar to each other. I am aiming to use a template file as the base and put in a few substitution strings at the top of the file and use a bash script to read the substitutions and run sed with the applicable strings.
i.e.
# substitutions
# room = living_room
# switch = hallway_motion
# delay = 3
automations:
foo......
.........
entity_id: $switch
When I run the script it will look for any line beginning with a # that has a word (key) and then an = and another word (maybe string) (value) and replace anywhere that key with a $ in front is in the rest of the file.
Like what is done by esphome. https://esphome.io/guides/configuration-types.html#substitutions
I am getting stuck at finding the "keys" in the file. How can I script this so it can find all the "keys" recursively?
Or is there something that does this, or something similar, out there already?

You can do this with sed in two stages. The first stage will generate a second stage sed script to fill in your template. I'd make a small adjustment to your syntax and recommend that you require curly braces around your variable name. In other words, write your variable expansions like this:
# foo = bar
myentry: ${foo}
This makes it easier to avoid pitfalls when you have one variable name that's a prefix of another (e.g., foo and foobar).
#!/bin/bash
in="$1"
stage2=$(mktemp)
trap 'rm -f "$stage2"' EXIT
sed -n -e 's,^# \([[:alnum:]_]\+\) = \([^#]\+\),s#\${\1}#\2#g,p' "$in" > "$stage2"
sed -f "$stage2" "$in"
Provide a filename as the first argument, and it will print the filled out template on stdout.
This example code is pretty strict about white space on variable definition lines, but that can obviously be adjusted to your liking.

why does `($(hostname -I))` expand to first word of list of IPs? [duplicate]

Given the following script:
#!/bin/bash
asteriskFiles=("sip.conf" "extensions.conf")
for asteriskFile in $asteriskFiles
do
# backup current configuration file
cp somepath/${asteriskFile} test/
echo "test"
done
This gives me the output "test" only once, so the loop runs only once instead of two times (two entries in asteriskFiles array). What am I doing wrong? Thanks for any hint!

An illustration:
$ asteriskFiles=("sip.conf" "extensions.conf")
$ echo $asteriskFiles # is equivalent to echo ${asteriskFiles[0]}
sip.conf
$ echo "${asteriskFiles[#]}"
sip.conf extensions.conf
Note that the quotes are important. echo ${asteriskFiles[#]} might seem to work, but bash would wordsplit on whitespace if any of your files had whitespace in them.

Write the beginning of your loop like this
for asteriskFile in "${asteriskFiles[#]}"

The Probem
The asteriskFiles variable holds an array. If you dereference it like a scalar, you only get the first element of the array.
The Solution
You want to use the correct shell parameter expansion to access all the subscript elements. For example:
$ echo "${asteriskFiles[#]}"
sip.conf extensions.conf
The # subscript (when correctly quoted) will expand to the properly-tokenized elements of your array, which your for-loop will then be able to iterate over.

cat on a quoted variable fails

I have this code snippet:
userjobs=$(grep -rw "$USER" /my/job/dir/|awk '{print $1}'|sort|uniq|rev|cut -c 2-|rev)
for job in "${userjobs[#]}"; do
cat "$job"
done
exit 0
When I run it as is, I get the following output:
cat: /my/job/dir/45
/my/job/dir/46: No such file or directory
However, if I unquote $job, I no longer receive this behavior, and it cats each of the files as expected.
I've done some reading up on globbingand splitting to see if this is occurring, but it seems like double-quoting should prevent that from happening. Can anyone explain why the behavior is different between "$job" and $job?

This happens because your variable looks like:
userjobs='/my/job/dir/45
/my/job/dir/46'
If you expand it as an array, with "${userjobs[#]}", that it acts as an array with exactly one element -- that string. Thus, behavior is identical to:
userjobs=( [0]='/my/job/dir/45
/my/job/dir/46' )
...still exactly one string with a literal newline in it.
Thus, cat "$job" looks for a file with a literal newline in its name.
To load your result into a real array you can iterate over with "${userjobs[#]}" expanding to a distinct element per line, use:
readarray -t userjobs < <(grep ...)

userjobs needs to be an array. Put parentheses around the value when assigning it:
userjobs=($(grep -rw "$USER" /my/job/dir/|awk '{print $1}'|sort|uniq|rev|cut -c 2-|rev))

Create variable by combining text + another variable

Long story short, I'm trying to grep a value contained in the first column of a text file by using a variable.
Here's a sample of the script, with the grep command that doesn't work:
for ii in `cat list.txt`
do
grep '^$ii' >outfile.txt
done
Contents of list.txt :
123,"first product",description,20.456789
456,"second product",description,30.123456
789,"third product",description,40.123456
If I perform grep '^123' list.txt, it produces the correct output... Just the first line of list.txt.
If I try to use the variable (ie grep '^ii' list.txt) I get a "^ii command not found" error. I tried to combine text with the variable to get it to work:
VAR1= "'^"$ii"'"
but the VAR1 variable contained a carriage return after the $ii variable:
'^123
'
I've tried a laundry list of things to remove the cr/lr (ie sed & awk), but to no avail. There has to be an easier way to perform the grep command using the variable. I would prefer to stay with the grep command because it works perfectly when performing it manually.

You have things mixed in the command grep '^ii' list.txt. The character ^ is for the beginning of the line and a $ is for the value of a variable.
When you want to grep for 123 in the variable ii at the beginning of the line, use
ii="123"
grep "^$ii" list.txt
(You should use double quotes here)
Good moment for learning good habits: Continue in variable names in lowercase (well done) and use curly braces (don't harm and are needed in other cases) :
ii="123"
grep "^${ii}" list.txt
Now we both are forgetting something: Our grep will also match
1234,"4-digit product",description,11.1111. Include a , in the grep:
ii="123"
grep "^${ii}," list.txt
And how did you get the "^ii command not found" error ? I think you used backquotes (old way for nesting a command, better is echo "example: $(date)") and you wrote
grep `^ii` list.txt # wrong !

#!/bin/sh
# Read every character before the first comma into the variable ii.
while IFS=, read ii rest; do
# Echo the value of ii. If these values are what you want, you're done; no
# need for grep.
echo "ii = $ii"
# If you want to find something associated with these values in another
# file, however, you can grep the file for the values. Use double quotes so
# that the value of $ii is substituted in the argument to grep.
grep "^$ii" some_other_file.txt >outfile.txt
done <list.txt

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio