Parsing Strings in Bash w/out a Delimiter - bash

I've got a piece of a script I'm trying to figure out, so maybe it's a simple question for someone more experienced out there.
Here is the code:
#!/bin/bash
echo "obase=2;$1" | bc
Used like:
$ ./script 12
Outputs:
1100
My question is, how can I parse this 4 digit number into separate digits? (to then delimit with cut -d ' ' and input those into an array...)
I'd like to be able to get the following output:
1 1 0 0
Is this even possible in Bash? I know it's easier with other languages.

You can use sed:
echo "obase=2;$1" | bc | sed 's/./& /g'
or if you prefer longer form:
echo "obase=2;$1" | bc | sed 's/\(.\)/\1 /g'
if your sed supports -r
echo "obase=2;$1" | bc | sed -r 's/(.)/\1 /g'

To print individual digits from a string you can use fold:
s=1100
fold -w1 <<< "$s"
1
1
0
0
To create an array:
arr=( $(fold -w1 <<< "$s") )
set|grep arr
arr=([0]="1" [1]="1" [2]="0" [3]="0")
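If spawning fold is undesirable, the same array can probably be built with Bash substring expansion alone. This is a minimal sketch and not part of the answers above; the function name split_chars and the array name chars are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: split a string into one array element per character.
split_chars() {
    local s=$1 i
    chars=()                           # global result array (assumed name)
    for (( i = 0; i < ${#s}; i++ )); do
        chars+=( "${s:i:1}" )          # substring of length 1 at offset i
    done
}

split_chars 1100
echo "${chars[@]}"
```

Because each element is appended quoted, this variant does not rely on word splitting the way arr=( $(...) ) does.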

Related

Check string for first 4 or last 4 characters to match a string

I was wondering if it is possible to do this in bash with awk or sed.
I have the following sample file:
HISEQ:272:CB0A0ANXX:3:1112:15781:21284_1:N:0:CATCAC 0 ITR3p_deleted 84279 41 35= * 0 0 TTAAGGAGGCTTCCTTTTCTAAACGATTGGGTGAG JJJ0JIIIIJJJJJJJJJJJJJJJJIJJJIHJJJJ NM:i:0 AM:i:41
HISEQ:272:CB0A0ANXX:3:1115:13546:24638_1:N:0:CATCAC 16 ITR3p_deleted 84279 39 15= * 0 0 TTAAGGAGGCTTCCT BB/FFFF//FBBBBB NM:i:0 AM:i:39
HISEQ:272:CB0A0ANXX:3:1114:4292:31240_1:N:0:CATCAC 16 ITR3p_deleted 83635 45 179= * 0 0 AGATCCTATTAGATACATAGATCCTCGTCGCGATATCGCATTTTCTAACGTGATGGATATATTAA BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJIJJIJJJJJJJJ8JJJJJFFFFFFFFFFFFFFFFFFFFBFFFFFF<FFFFFFFFFFFFFFFFB<<FB<//<< NM:i:0 AM:i:45
HISEQ:272:CB0A0ANXX:3:2104:14047:17929_1:N:0:CATCAC 16 ITR3p_deleted 84274 33 5X120= * 0 0 TAAGGTTAAGGAGGCTTCCTTTTCTAATAATGATATGTATCAATCGGTGTGTAGAAAGTGTTACATCGACTCATAATATTATATTT F7/FFFFBF77///F/7FF/<</</FBF</<<F</B//<//FFFFFFB/F/FBFBF//</F/F</F<<FBBFFFFFFFFFFFF<FFFBFFFFBFF<F<FFFB/F/FBFFFFFFFFFFBFB/</<< NM:i:5 AM:i:33
And I want to check the string of the 10th column. If it starts with TTAA as in the first two examples, I want to extract those records into file-1. If it ends in TTAA such as in the third example, I would like to extract this into file-2. The fourth record would get ignored.
Can't seem to find string matches with awk.
Thanks.
Try the following:
awk '$10 ~ /^TTAA/{print > "file-1";next} $10 ~ /TTAA$/{print > "file-2"}' Input_file
This should do the trick:
cat samplefile.txt | while read line; do
if [[ $(echo "$line" | awk '{print $10}' | grep '^TTAA') ]]; then
echo "$line" >> file-1.txt
fi
if [[ $(echo "$line" | awk '{print $10}' | grep 'TTAA$') ]]; then
echo "$line" >> file-2.txt
fi
done
This might work for you (GNU sed):
sed -rne '/^(\S+\s+){9}TTAA/w file1' -e '/^(\S+\s+){9}\S+TTAA\>/w file2' file
This invokes sed's grep-like nature and writes to separate files depending on the regexp.
N.B. a single line may be written to both output files if it matches both regexps.
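To see the awk variant from this thread in action without real sequencing data, here is a small sketch. The three records and the scratch directory are invented for the example; only the two patterns on column 10 come from the answer. Note that, because of next, a record that both starts and ends with TTAA would land in the first file only.

```shell
#!/bin/bash
# Work in a scratch directory so the output files do not clutter anything.
tmp=$(mktemp -d)

# Three made-up records with 11 whitespace-separated fields each.
printf '%s\n' \
    'read1 0 ref 1 41 6= * 0 0 TTAAGG JJJJJJ' \
    'read2 16 ref 1 45 6= * 0 0 GGTTAA JJJJJJ' \
    'read3 16 ref 1 33 6= * 0 0 GGCCGG JJJJJJ' > "$tmp/Input_file"

awk -v out1="$tmp/file-1" -v out2="$tmp/file-2" '
    $10 ~ /^TTAA/ { print > out1; next }   # column 10 starts with TTAA
    $10 ~ /TTAA$/ { print > out2 }         # column 10 ends with TTAA
' "$tmp/Input_file"

cat "$tmp/file-1"
cat "$tmp/file-2"
```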

sed: interpolating variables in timestamp format

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this on a script and my two strings are variables.
The strings will be in a sort of time stamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck, tried also to replace / as delimiter with ;.
I'm pretty sure the problem is due to the / and blank in input variables, but I can't figure out the proper syntax.
The syntax to use alternate regex delimiters is:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
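Since the timestamps are really fixed strings rather than patterns, a variant of the awk approach that matches them literally may be safer. The sketch below uses awk's index() so that characters such as . or / are not treated as regex syntax; the function name between and the sample log lines are assumptions made for this example. Values passed with -v still have backslash escapes interpreted by awk, which should not matter for timestamps.

```shell
#!/bin/bash
# Hypothetical helper: print lines from the first one containing $1
# through the next one containing $2, matching both as plain substrings.
between() {
    awk -v start="$1" -v end="$2" '
        index($0, start) { p = 1 }        # open the range
        p                                 # print while inside the range
        p && index($0, end) { p = 0 }     # close the range after printing
    '
}

printf '%s\n' \
    '2014/01/01 07:59 boot' \
    '2014/01/01 08:01 login' \
    '2014/01/01 08:05 job' \
    '2014/01/02 09:00 logout' \
    '2014/01/03 10:00 idle' | between '2014/01/01 08:01' '2014/01/02'
```

With the sample input this prints the login, job, and logout lines.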
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17
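Try using double quotes instead of single ones, so that the shell expands the variables. This only works when the values contain no / characters:
sed -n "/$1/,/$2/p" "$file"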

Remove all chars that are not a digit from a string

I'm trying to make a small function that removes all the chars that are not digits.
123a45a ---> will become ---> 12345
I've come up with:
temp=$word | grep -o [[:digit:]]
echo $temp
But instead of 12345 I get 1 2 3 4 5. How do I get rid of the spaces?
Pure bash:
word=123a45a
number=${word//[^0-9]}
Here's a pure bash solution
var='123a45a'
echo ${var//[^0-9]/}
12345
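Since the question asks for a small function, the parameter expansion can be wrapped like this. It is only a sketch; the name digits_only is made up.

```shell
#!/bin/bash
# Hypothetical helper: print the argument with every non-digit removed.
digits_only() {
    local word=$1
    printf '%s\n' "${word//[^0-9]/}"   # replace each non-digit with nothing
}

digits_only 123a45a
```

The result can be captured with temp=$(digits_only "$word").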
is this what you are looking for?
kent$ echo "123a45a"|sed 's/[^0-9]//g'
12345
grep & tr
echo "123a45a"|grep -o '[0-9]'|tr -d '\n'
12345
I would recommend using sed or perl instead:
temp="$(sed -e 's/[^0-9]//g' <<< "$word")"
temp="$(perl -pe 's/\D//g' <<< "$word")"
Edited to add: If you really need to use grep, then this is the only way I can think of:
temp="$( grep -o '[0-9]' <<< "$word" \
| while IFS= read -r ; do echo -n "$REPLY" ; done
)"
. . . but there's probably a better way. (It uses grep -o, like your solution, then runs over the lines that it outputs and re-outputs them without line-breaks.)
Edited again to add: Now that you've mentioned that you can use tr instead, this is much easier:
temp="$(tr -cd 0-9 <<< "$word")"
What about using sed?
$ echo "123a45a" | sed -r 's/[^0-9]//g'
12345
As I read that you are only allowed to use grep and tr, this can do the trick:
$ echo "123a45a" | grep -o [[:digit:]] | tr -d '\n'
12345
In your case,
temp=$(echo $word | grep -o [[:digit:]] | tr -d '\n')
tr will also work:
echo "123a45a" | tr -cd '[:digit:]'
# output: 12345
Grep returns the result on different lines:
$ echo -e "$temp"
1
2
3
4
5
So you cannot remove the separators during the filtering, but you can afterwards by reassigning $temp like this:
temp=`echo $temp | tr -d ' '`
$ echo "$temp"
12345

Bash escaping and syntax

I have a small bash file which I intend to use to determine my current ping vs my average ping.
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc
This outputs my ping - 20 ms, which is the number I want. However, I also want to prepend a + if the number is positive and append "ms".
This brings me to my overarching problem: Bash syntax regarding escaping and such heavy "indenting" is kind of flaky.
While I'll be satisfied with an answer of how to do what I wanted, I'd like a link to, or explanation of how exactly bash syntax works dealing with this sort of thing.
output=($(ping -qc 1 google.com | tail -n 1))
echo "${output[3]}" | awk -F/ '{printf "%+fms\n", $1-20}'
The + modifier in printf tells it to print the sign, whether it's positive or negative.
And since we're using awk, there's no need to use cut or bc to get a field or do arithmetic.
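The formatting step can be tried without network access by feeding it a canned summary line. In this sketch the function name ping_delta, the baseline argument, and the sample values are all invented; the summary line only imitates the shape of what Linux ping prints, and %+.3f is used instead of %+f to keep three decimals.

```shell
#!/bin/bash
# Hypothetical helper: $1 = ping summary line, $2 = baseline in ms.
ping_delta() {
    local fields
    read -ra fields <<< "$1"           # split the line on whitespace
    # fields[3] looks like min/avg/max/mdev; take the first value.
    awk -F/ -v base="$2" '{ printf "%+.3fms\n", $1 - base }' <<< "${fields[3]}"
}

# Made-up sample values.
ping_delta 'rtt min/avg/max/mdev = 38.209/38.209/38.209/0.000 ms' 20
ping_delta 'rtt min/avg/max/mdev = 12.500/12.500/12.500/0.000 ms' 20
```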
Escaping is pretty awful in bash if you use the deprecated `..` style command expansion. In this case, you have to escape any backticks, which means you also have to escape any other escapes. $(..) nests a lot better, since it doesn't add another layer of escaping.
In any case, I'd just do it directly:
ping -qc 1 google.com | awk -F'[=/ ]+' '{n=$6}
END { v=(n-20); if(v>0) printf("+"); print v}'
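The difference in nesting between the two substitution styles shows up even in a toy example. The variable names legacy and modern are made up for this sketch.

```shell
#!/bin/bash
# Backtick style: the inner pair has to be escaped with backslashes.
legacy=`echo \`echo inner\`-outer`

# $(..) style: the inner substitution is written exactly like the outer one.
modern=$(echo "$(echo inner)-outer")

echo "$legacy"
echo "$modern"
```

Both lines print inner-outer; only the first needs the extra escaping.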
Here's my take on it, recognizing that the result from bc can be treated as a string:
output=($(ping -qc 1 google.com | tail -n 1))
output=$(echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc)' ms'
[[ "$output" != -* ]] && output="+$output"
echo "$output"
Bash cannot handle floating point numbers. A workaround is to use awk like this:
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc | awk '{if ($1 >= 0) printf "+%fms\n", $1; else printf "%fms\n", $1}'
Note that a negative result is printed as-is, since bc already includes the minus sign.
Output:
$ ./testping.sh
+18.209000ms

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
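The breakdown above covers ##; the three sibling operators work the same way from either end. A short sketch, with variable names other than foo chosen just for illustration:

```shell
#!/bin/bash
foo=1:2:3:4:5

last=${foo##*:}    # longest match of *: removed from the front
rest=${foo#*:}     # shortest match of *: removed from the front
init=${foo%:*}     # shortest match of :* removed from the back
first=${foo%%:*}   # longest match of :* removed from the back

echo "$last $rest $init $first"
```

This prints 5 2:3:4:5 1:2:3:4 1.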
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
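As a sketch of that idea, the reverse, cut, reverse pipeline can be wrapped so that the field position counts from the end. The function name field_from_end is made up, and rev is assumed to be installed.

```shell
#!/bin/bash
# Hypothetical helper: $1 = string, $2 = cut field list counted from the end.
field_from_end() {
    printf '%s\n' "$1" | rev | cut -d: -f"$2" | rev
}

field_from_end ab:cd:ef 1      # last field
field_from_end ab:cd:ef 2      # last but one
field_from_end ab:cd:ef 1-2    # last two fields
```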
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First, use xargs to split the string on ":"; -n1 means each output line holds only one part. Then print the last part with tail.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
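Negative subscripts such as ${fields[-1]} need a reasonably recent Bash (4.2 or later, to the best of my knowledge). A sketch that avoids them takes a one-element slice from the end instead; the function name last_field is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the last ':'-separated field of $1.
last_field() {
    local fields
    IFS=: read -ra fields <<< "$1"     # IFS is changed for this read only
    printf '%s\n' "${fields[@]: -1}"   # slice holding just the last element
}

last_field 1:2:3:4:5
last_field single
```

Setting IFS on the same line as read keeps the change local to that command, so no save and restore is needed.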
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on the answers from @mateusz-piotrowski and @user3133260:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' replaces each ':' with a space
then xargs trims and collapses the whitespace
after that, tr ' ' '\n' replaces the remaining spaces with newlines
lastly, tail -1 gets the last string
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice for solving this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (i.e. sys.stdin.read and sys.stdout.write) but requires only Python's standard library.
