bash to extract and store text before the second underscore

I am trying to use bash to extract everything before the second _ and store it in a variable pref. I am using a loop, so the example below is not complete, but the file name structure/format is accurate.
I can extract everything before the first _ using pref=${bname%%_*}, but I can't seem to change it to the second _. Thank you :).
File to extract from:
00-0000_Last-First_base_counts_FBN1.txt
Desired output:
00-0000_Last-First
What I tried (bash):
pref=${bname%%_}; pref=${bname%%_*.txt}

Using cut with _ as the delimiter, get the 1st and 2nd fields:
s='00-0000_Last-First_base_counts_FBN1.txt'
cut -d_ -f1-2 <<< "$s"
00-0000_Last-First
To store in a variable:
pref=$(cut -d_ -f1-2 <<< "$s")
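Since the question mentions a loop, here is a sketch that avoids starting a cut process for every file by using only parameter expansion. It assumes each name contains at least two underscores; the names in the array are hypothetical sample data standing in for whatever the real loop iterates over.

```bash
#!/usr/bin/env bash
# Hypothetical sample names; a real loop might iterate over a glob such as *.txt
names=(00-0000_Last-First_base_counts_FBN1.txt 11-1111_Doe-Jane_base_counts_ABC2.txt)

for bname in "${names[@]}"; do
  rest=${bname#*_*_}        # drop the shortest prefix ending in the 2nd "_", e.g. base_counts_FBN1.txt
  pref=${bname%_"$rest"}    # drop "_" plus that remainder from the end (quoted, so matched literally)
  printf '%s\n' "$pref"
done
```

If a name has fewer than two underscores, the first expansion matches nothing and pref ends up equal to the whole name.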

With GNU sed and grep:
$ sed -r 's/([^_]+_[^_]*).*/\1/' <<<"00-0000_Last-First_base_counts_FBN1.txt"
00-0000_Last-First
$ sed 's/_[^_]*//2g' <<< "00-0000_Last-First_base_counts_FBN1.txt"
00-0000_Last-First
$ grep -o "^[^_]*_\?[^_]*" <<<"00-0000_Last-First_base_counts_FBN1.txt"
00-0000_Last-First
To store in a variable:
somevar="00-0000_Last-First_base_counts_FBN1.txt"
pref=$(sed 's/_[^_]*//2g' <<< "$somevar")

As well as with sed, awk and cut, you can achieve this with expr:
$ filename=00-0000_Last-First_base_counts_FBN1.txt
$ expr match "$filename" '^\([^_]*_[^_]*\)'
00-0000_Last-First
This prints the capture group within the \( and \) of the regular expression. expr match is a GNU extension; the portable form is expr "$filename" : '\([^_]*_[^_]*\)'.
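Bash's own [[ =~ ]] operator can do the same capture without an external command; the captured group lands in the BASH_REMATCH array. A minimal sketch, reusing the file name from above:

```bash
#!/usr/bin/env bash
filename=00-0000_Last-First_base_counts_FBN1.txt
re='^([^_]*_[^_]*)'   # keep the regex in a variable so [[ ]] needs no extra escaping

if [[ $filename =~ $re ]]; then
  pref=${BASH_REMATCH[1]}   # text matched by the first parenthesized group
  printf '%s\n' "$pref"     # 00-0000_Last-First
fi
```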

Related

How do you remove a section of a file name after an underscore, including the underscore, using bash? [duplicate]

How can I remove all text after a character, in this case a colon (":"), in bash? Can I remove the colon, too? I have no idea how to.
In Bash (and ksh, zsh, dash, etc.), you can use parameter expansion with % which will remove characters from the end of the string or # which will remove characters from the beginning of the string. If you use a single one of those characters, the smallest matching string will be removed. If you double the character, the longest will be removed.
$ a='hello:world'
$ b=${a%:*}
$ echo "$b"
hello
$ a='hello:world:of:tomorrow'
$ echo "${a%:*}"
hello:world:of
$ echo "${a%%:*}"
hello
$ echo "${a#*:}"
world:of:tomorrow
$ echo "${a##*:}"
tomorrow
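The text after % or # is a shell glob pattern, not just a single character, so the same four operators work with any delimiter. A short sketch with a dot, using a made-up file name:

```bash
#!/usr/bin/env bash
f=archive.tar.gz      # hypothetical file name
echo "${f%.*}"        # shortest ".*" suffix removed: archive.tar
echo "${f%%.*}"       # longest ".*" suffix removed:  archive
echo "${f#*.}"        # shortest "*." prefix removed: tar.gz
echo "${f##*.}"       # longest "*." prefix removed:  gz
```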
An example might have been useful, but if I understood you correctly, this would work:
echo "Hello: world" | cut -f1 -d":"
This will convert Hello: world into Hello.
$ echo 'hello:world:again' | sed 's/:.*//'
hello
I know some solutions:
# Our mock data:
A=user:mail:password
With awk and pipe:
$ echo "$A" | awk -v FS=':' '{print $1}'
user
Via bash variables:
$ echo "${A%%:*}"
user
With pipe and sed:
$ echo "$A" | sed 's#:.*##'
user
With pipe and grep:
$ echo "$A" | grep -Eo '^[^:]+'
user
With pipe and cut:
$ echo "$A" | cut -f1 -d:
user
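Another built-in option, not among the solutions above, is to let read split the string on the colon. This is a sketch assuming the same mock data; the variable names first and rest are arbitrary.

```bash
#!/usr/bin/env bash
A=user:mail:password
# IFS=: applies only to this read; the last variable receives everything left over
IFS=: read -r first rest <<< "$A"
printf '%s\n' "$first"   # user
printf '%s\n' "$rest"    # mail:password
```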
To keep everything up to and including the first ":":
grep -Eo '^[^:]*:'
To trim off everything after the last instance of ":":
grep -o '^.*:' fileListingPathsAndFiles.txt
and if you want to drop that last ":" as well:
grep -o '^.*:' file.txt | sed 's/:$//'
To do the same with paths, replace : with / (in the sed expression the slash must be escaped as \/, i.e. sed 's/\/$//').
Let's say you have a path with a file in this format:
/dirA/dirB/dirC/filename.file
Now you want only the directory part, i.e. the first four "/"-separated fields (the first field is empty because the path starts with "/"). Type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-4 -d"/"
and your output will be
/dirA/dirB/dirC
An advantage of cut is that you can also strip the last directory along with the file name (in this example), so if you type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-3 -d"/"
your output would be
/dirA/dirB
You can also select fields counting from the other side, although it makes less sense in this case, since typing
$ echo "/dirA/dirB/dirC/filename.file" | cut -f2-4 -d"/"
results in
dirA/dirB/dirC
This last form can still be useful in other cases; note that its output has no leading "/".
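For comparison, the % expansion from the first answer produces the same directory paths without cut, and dirname covers the most common case. A sketch using the example path:

```bash
#!/usr/bin/env bash
p=/dirA/dirB/dirC/filename.file
echo "${p%/*}"     # remove the file name:                 /dirA/dirB/dirC
echo "${p%/*/*}"   # remove the file and the last directory: /dirA/dirB
dirname "$p"       # same result as the first line:        /dirA/dirB/dirC
```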

sed or grep to read between a set of parentheses

I'm trying to read a version number from between a set of parentheses, from this output of some command:
Test Application version 1.3.5
card 0: A version 0x1010000 (1.0.0), 20 ch
Total known cards: 1
What I'm looking to get is 1.0.0.
I've tried variations of sed and grep:
command.sh | grep -o -P '(?<="(").*(?=")")'
command.sh | sed -e 's/(\(.*\))/\1/'
and plenty of variations. No luck :-(
Help?
You were almost there! In grep -P (Perl mode), use backslashes, not double quotes, to keep the literal meaning of the parentheses:
grep -o -P '(?<=\().*(?=\))'
With GNU grep you can also use the \K escape sequence, available in Perl mode (-P):
grep -oP '\(\K[^)]+'
\K discards what has been matched so far, so the opening ( is dropped from the match.
Alternatively you could use awk:
awk -F'[()]' 'NF>1{print $2}'
The command splits input lines using parentheses as delimiters. Once a line has been split into multiple fields (meaning the parentheses were found), the version number is the second field and gets printed.
Btw, the sed command you've shown should be:
sed -ne 's/.*(\(.*\)).*/\1/p'
There are a couple of variations that will work. First with grep and sed:
grep '(' filename | sed 's/^.*[(]\(.*\)[)].*$/\1/'
or with a short shell script:
#!/bin/sh
# Print the text between the parentheses on each line of the file given as $1
while IFS= read -r line; do
    value=$(expr "$line" : ".*(\(.*\)).*")
    if [ -n "$value" ]; then
        printf "%s\n" "$value"
    fi
done <"$1"
Both return 1.0.0 for your given input file.
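In bash itself, [[ =~ ]] can pull out the text between the parentheses without grep or sed. A sketch that reads lines one at a time; the here-doc stands in for the command output and reproduces the sample from the question.

```bash
#!/usr/bin/env bash
re='\(([^)]*)\)'   # literal "(", capture everything up to ")", then literal ")"

while IFS= read -r line; do
  if [[ $line =~ $re ]]; then
    version=${BASH_REMATCH[1]}
    printf '%s\n' "$version"   # 1.0.0 for the sample below
  fi
done <<'EOF'
Test Application version 1.3.5
card 0: A version 0x1010000 (1.0.0), 20 ch
Total known cards: 1
EOF
```

With the real command you would pipe into the loop instead (command.sh | while ...), in which case the loop runs in a subshell and version is not visible after it.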

How to compose custom command-line argument from file lines?

I know about the xargs utility, which allows me to convert lines into multiple arguments, like this:
echo -e "a\nb\nc\n" | xargs
Results in:
a b c
But I want to get:
a:b:c
The character : is used for an example. I want to be able to insert any separator between lines to get a single argument. How can I do it?
If you have a file with multiple lines that you want to turn into a single argument, replacing each newline with a single character, the paste command is what you need:
$ echo -en "a\nb\nc\n" | paste -s -d ":"
a:b:c
Then, your command becomes:
your_command "$(paste -s -d ":" your_file)"
EDIT:
If you want to insert more than a single character as a separator, you could use sed before paste:
your_command "$(sed -e '2,$s/^/<your_separator>/' your_file | paste -s -d "")"
Or use a single, more complicated sed (the separate p command also prints a one-line file, where no substitution happens):
your_command "$(sed -n -e '1h;2,$H;${x;s/\n/<your_separator>/g;p}' your_file)"
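If you would rather avoid sed, a small bash function can join lines with an arbitrary string. This is a sketch; the name join_lines is hypothetical, and the function reads from standard input.

```bash
#!/usr/bin/env bash
# join_lines SEP: hypothetical helper that joins stdin lines with the string SEP
join_lines() {
  local sep=$1 line out= first=1
  # "|| [[ -n $line ]]" also handles a final line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if (( first )); then
      out=$line
      first=0
    else
      out+=$sep$line
    fi
  done
  printf '%s\n' "$out"
}

printf '%s\n' a b c | join_lines ' :: '   # a :: b :: c
```

With a file, the call would look like your_command "$(join_lines '<your_separator>' < your_file)".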
The example you gave is not working for me. You would need:
echo -e "a\nb\nc\n" | xargs
to get a b c.
Coming back to your need, you could do this:
echo "a b c" | awk -v OFS=':' '{$1 = $1; print}'
It will change the separator from space to : (or whatever you set OFS to); the $1 = $1 assignment makes awk rebuild the line with OFS, whatever the number of fields.
You can also use sed:
echo "a b c" | sed -e 's/ /:/g'
that will output a:b:c.
After this processing, you can pipe the result to xargs to run the command you want.
Hope it helps.
You can join the lines using xargs and then replace the spaces with sed:
echo -e "a\nb\nc" | xargs | sed -e 's/ /:/g'
will result in
a:b:c
You can then pass this output as an argument to another command with another xargs:
echo -e "a\nb\nc" | xargs | sed -e 's/ /:/g' | xargs
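One caveat: xargs splits on blanks and interprets quotes, so with the approach above a line that already contains spaces would also get colons inside it. For a single-character separator, here is a bash-only sketch (bash 4 or later, for mapfile) that keeps each line intact; the sample lines are made up.

```bash
#!/usr/bin/env bash
# Read each line into one array element (-t strips the trailing newline)
mapfile -t lines < <(printf '%s\n' 'a b' c)

# "${lines[*]}" joins the elements with the first character of IFS;
# the command substitution runs in a subshell, so the IFS change stays local
joined=$(IFS=:; printf '%s\n' "${lines[*]}")
printf '%s\n' "$joined"   # a b:c
```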
