extract string from another using awk - shell

I have a variable which contains a list of strings separated by spaces:
val="00:21:5D:16:F3 00:21:5D:16:F4 00:21:5D:16:F5"
I want to extract each string separated by a space " " and assign it to its own variable valN.
I used this shell code but it doesn't work:
while [ "$((i++))" != "10" ]; do
val$i=`echo $val | awk '{print $i}'`
echo "val$i=$val$i"
done
The desired result is:
val1="00:21:5D:16:F3"
val2="00:21:5D:16:F4"
val3="00:21:5D:16:F5"
val4=""
val5=""
val6=""
val7=""
val8=""
val9=""
val10=""
Any help is appreciated, even if the treatment is done with another Linux utility like cut, sed, or grep.

This awk script should be what you are looking for:
awk -F[' '=] 'BEGIN{t=1} { for (i=2;i<=11;i++) {print "val" t "=\"" $i "\""; t+=1}}' test
Here is the output:
system1:/depot/scripts/sh # awk -F[' '=] 'BEGIN{t=1} { for (i=2;i<=11;i++) {print "val" t "=\"" $i "\""; t+=1}}' test
val1="00:21:5D:16:F3"
val2="00:21:5D:16:F4"
val3="00:21:5D:16:F5"
val4=""
val5=""
val6=""
val7=""
val8=""
val9=""
val10=""
system:/depot/scripts/sh #
The test file contains:
system:/depot/scripts/sh # cat test
val=00:21:5D:16:F3 00:21:5D:16:F4 00:21:5D:16:F5
system:/depot/scripts/sh #
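One robustness note (my addition): the unquoted -F[' '=] is exposed to shell globbing if a filename in the current directory happens to match it; quoting the whole separator class is the safer spelling of the same command:
awk -F'[ =]' 'BEGIN{t=1} { for (i=2;i<=11;i++) {print "val" t "=\"" $i "\""; t+=1}}' test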

Thank you for your help. I want to share the best solution that I found:
i=0
while [ "$((i++))" != "10" ]; do
    # val$i=... on its own is parsed as a command name, so let declare do the assignment
    declare "val$i=$(echo "$val" | awk -F' ' '{print $'"$i"'}')"
    var="val$i"
    echo "$var=${!var}"    # indirect expansion reads the dynamically named variable
done
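The two bash features doing the work there, shown in isolation (a short sketch of my own): declare performs an assignment whose variable name is computed at run time, and ${!var} reads it back through indirect expansion.
i=1
declare "val$i=00:21:5D:16:F3"    # creates the variable val1
var="val$i"
echo "$var=${!var}"               # -> val1=00:21:5D:16:F3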

I know it is not what you really asked, but what about using an array to solve this?
like:
val=(00:21:5D:16:F3 00:21:5D:16:F4 00:21:5D:16:F5)
$ echo ${val[0]}
00:21:5D:16:F3
$ echo ${val[1]}
00:21:5D:16:F4
$ echo ${val[2]}
00:21:5D:16:F5
$ echo ${val[3]}

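To reproduce the exact val1 through val10 listing from the question with this approach, a minimal sketch (bash; the parts array name is my own):
val="00:21:5D:16:F3 00:21:5D:16:F4 00:21:5D:16:F5"
read -r -a parts <<< "$val"
for i in {1..10}; do
    echo "val$i=\"${parts[i-1]}\""    # prints "" once past the last element
done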
Related

How to extract phone number and Pin from each text line

Sample text from the log file:
2021/08/29 10:25:37 20210202GL1 Message Params [userid:user1] [timestamp:20210829] [from:TEST] [to:0214736848] [text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]
2021/08/27 00:03:18 20210202GL2 Message Params [userid:user1] [timestamp:20210827] [from:TEST] [to:0214736457] [text:You requested for Pin reset. Your Customer ID: 0214736457 and PIN: 6193]
2021/08/27 10:25:16 Thank you for joining our service; Your ID is 0214736849 and PIN is 5949
Other wording and formatting can change, but ID and PIN don't change.
Expected output for each line:
0214736848#4581
0214736457#6193
0214736849#5949
Below is what I have tried using bash, though currently I am only able to extract the numeric values:
while read p; do
    NUM=''
    counter=1
    text=$(echo "$p" | grep -o -E '[0-9]+')
    for line in $text
    do
        if [ "$counter" -eq 1 ]    # if counter is equal to 1
        then
            NUM+="$line"           # concatenate string
        else
            NUM+="#$line"          # concatenate string
        fi
        let counter++              # increment counter
    done
    printf "$NUM\n"
done < logfile.log
Current output, though not the expected one:
2021#08#29#00#03#18#20210202#2#1#20210826#0214736457#0214736457#6193
2021#08#27#10#25#37#20210202#1#1#20210825#0214736848#0214736848#4581
2021#08#27#10#25#16#0214736849#5949
Another variation using gawk and 2 capture groups, matching 1 or more digits per group:
awk '
match($0, /ID: ([0-9]+) and PIN: ([0-9]+)/, m) {
    print m[1] "#" m[2]
}
' file
Output
0214736848#4581
0214736457#6193
For the updated question, you can match either ":" or " is" for a more precise match; the captured values will then be in groups 2 and 4.
awk '
match($0, /ID(:| is) ([0-9]+) and PIN(:| is) ([0-9]+)/, m) {
    print m[2] "#" m[4]
}
' file
Output
0214736848#4581
0214736457#6193
0214736849#5949
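Note that match() with a third array argument is a GNU awk extension. If portability matters, here is a rough POSIX-awk sketch of the same idea using RSTART/RLENGTH (my own variant, not part of the original answer):
awk '
match($0, /ID(:| is) [0-9]+/) {
    id = substr($0, RSTART, RLENGTH); gsub(/[^0-9]/, "", id)
    if (match($0, /PIN(:| is) [0-9]+/)) {
        pin = substr($0, RSTART, RLENGTH); gsub(/[^0-9]/, "", pin)
        print id "#" pin
    }
}
' file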
Using sed capture groups you can do:
sed 's/.* Your Customer ID: \([0-9]*\) and PIN: \([0-9]*\).*/\1#\2/g' file.txt
With your shown samples, please try the following awk code; you can do it simply by using multiple field separators. A short explanation: treat "Customer ID: ", " and PIN: ", and a "]" at the end of the line as field separators, then print only the 2nd and 3rd fields joined with "#", as per the required output.
awk -v FS='Customer ID: | and PIN: |]$' '{print $2"#"$3}' Input_file
With bash and a regex:
while IFS='] ' read -r line; do
    [[ "$line" =~ ID:\ ([^\ ]+).*PIN:\ ([^\ ]+)] ]]
    echo "${BASH_REMATCH[1]}#${BASH_REMATCH[2]}"
done <file
Output:
0214736848#4581
0214736457#6193
Given the updated input in your question then using any sed in any shell on every Unix box:
$ sed 's/.* ID[: ][^0-9]*\([0-9]*\).* PIN[: ][^0-9]*\([0-9]*\).*/\1#\2/' file
0214736848#4581
0214736457#6193
0214736849#5949
Original answer:
Using any awk in any shell on every Unix box ($18 is the ID and $21 is the PIN with a trailing ], which the +0 strips by forcing a numeric conversion):
$ awk -v OFS='#' '{print $18, $21+0}' file
0214736848#4581
0214736457#6193

How do I extract the content of quoted strings from the output of a shell command

The following shell command returns an output with 3 items:
cred="$(aws sts assume-role --role-arn arn:aws:iam::01234567899:role/test --role-session-name s3-access-example --query '[Credentials.AccessKeyId, Credentials.SecretAccessKey, Credentials.SessionToken]')"
echo $cred returns the following output:
[ "ASRDTDRSIJGISGDT", "trttr435", "DF/////eraesr43" ]
How do I retrieve the value between double quotes, for example trttr435?
How can this be achieved? With a regex, or are there other options?
IFS=', ' credArray=(`echo "$cred" | tr -d '"[]'`)
As simple as that.
Testing
cred='[ "ASRDTDRSIJGISGDT", "trttr435", "DF/////eraesr43" ]'
IFS=', ' credArray=(`echo "$cred" | tr -d '"[]'`)
for i in "${credArray[@]}"; do echo "[$i]"; done
echo "2nd parameter is ${credArray[1]}"
Output
[ASRDTDRSIJGISGDT]
[trttr435]
[DF/////eraesr43]
2nd parameter is trttr435
Tested on macOS bash and CentOS bash.
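A variant of the same idea that avoids backticks, using here-strings with read -a (a sketch; same IFS trick, same test string):
IFS=', ' read -r -a credArray <<< "$(tr -d '"[]' <<< "$cred")"
echo "${credArray[1]}"    # -> trttr435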
I didn't quite catch whether the [ and ] are in $cred or not, or what your expected output is, but this will return everything between double quotes:
$ awk '{while(match($0,/"[^"]+"/)){print substr($0,RSTART+1,RLENGTH-2);$0=substr($0,RSTART+RLENGTH)}}' file
ASRDTDRSIJGISGDT
trttr435
DF/////eraesr43
You could, and probably would want to, run:
$ echo "$cred" | awk ... # add above script here
Edit: If you just want to get the quoted string from the second field ($2):
$ awk -F, '{match($2,/"[^"]+"/);print substr($2,RSTART+1,RLENGTH-2)}' file
trttr435
or even:
$ awk -F, '{gsub(/^[^"]+"|"[^"]*$/,"",$2);print $2}' file
Or use Python, because the content of cred is already a valid Python list:
#!/bin/bash
cred='[ "ASRDTDRSIJGISGDT", "trttr435", "DF/////eraesr43" ]'
python-script() {
local INDEX=$1
echo "arr=$cred"
echo "print(arr[$INDEX])"
}
item() {
local INDEX=$1
python-script "$INDEX" | python
}
echo "item1=$(item 1)"
echo "item2=$(item 2)"
Another crude but effective way of extracting the values you need would be to use awk with " as the field delimiter. The valid positions in this case would be $2, $4, and $6:
OUT="[ \"ASRDTDRSIJGISGDT\", \"trttr435\", \"DF/////eraesr43\" ]"
echo $OUT | awk -F '"' '{print $4}'
I would advise you to use python if you need to do a lot of string parsing.
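Since the cred value here is valid JSON from the aws CLI, one more option worth mentioning (assuming jq is installed; my addition, not part of the answers above) is to index the array directly:
echo "$cred" | jq -r '.[1]'    # -> trttr435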

Looping through multiline CSV rows in bash

I have the following csv file with 3 columns:
row1value1,row1value2,"row1
multi
line
value"
row2value1,row2value2,"row2
multi
line
value"
Is there a way to loop through its rows like this (the following does not work, since it reads physical lines):
while read $ROW
do
#some code that uses $ROW variable
done < file.csv
Using gnu-awk, you can do this with a custom RS and FPAT:
awk -v RS='"\n' -v FPAT='"[^"]*"|[^,]*' '{
print "Record #", NR, " =======>"
for (i=1; i<=NF; i++) {
sub(/^"/, "", $i)
printf "Field # %d, value=[%s]\n", i, $i
}
}' file.csv
Record # 1 =======>
Field # 1, value=[row1value1]
Field # 2, value=[row1value2]
Field # 3, value=[row1
multi
line
value]
Record # 2 =======>
Field # 1, value=[row2value1]
Field # 2, value=[row2value2]
Field # 3, value=[row2
multi
line
value]
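For reference, FPAT defines what a field looks like rather than what separates fields, which is what lets quoted fields contain commas; a stripped-down illustration (gawk only, my own example):
echo 'a,"b,c",d' | gawk -v FPAT='"[^"]*"|[^,]*' '{print $2}'    # -> "b,c"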
However, as I commented above, a dedicated CSV parser using PHP, Perl or Python will be more robust for this job.
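As a hedged sketch of that suggestion (assuming python3 is on the PATH; the standard csv module handles quoted multiline fields natively):
python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    print(row)    # each row is a list; newlines inside quotes survive
' < file.csv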
Here is a pure bash solution. The multiline_csv.sh script translates the multiline csv into standard csv by replacing the newline characters between quotes with some replacement string. So the usage is
./multiline_csv.sh CSVFILE SEP
I placed your example script in a file called ./multi.csv. Running the command ./multiline_csv.sh ./multi.csv "\n" yielded the following output
[ericthewry@eric-arch-pc stackoverflow]$ ./multiline_csv.sh ./multi.csv "\n"
r1c2,r1c2,"row1\nmulti\nline\nvalue"
r2c1,r2c2,"row2\nmultiline\nvalue"
This can be easily translated back to the original csv file using printf:
[ericthewry@eric-arch-pc stackoverflow]$ printf "$(./multiline_csv.sh ./multi.csv "\n")\n"
r1c2,r1c2,"row1
multi
line
value"
r2c1,r2c2,"row2
multiline
value"
This might be an Arch-specific quirk of echo/printf (I'm not sure), but you could use some other separator string like ~~~++??//NEWLINE\\??++~~~ that you could sed out if need be.
# multiline_csv.sh
open=0
line_is_open(){
    quote="$2"
    (printf "$1" | sed -e "s/\(.\)/\1\n/g") | (while read char; do
        if [[ "$char" = '"' ]]; then
            open=$((($open + 1) % 2))
        fi
    done && echo $open)
}
cat "$1" | while read ln ; do
    flatline="${ln}"
    open=$(line_is_open "${ln}" $open)
    until [[ "$open" = "0" ]]; do
        if read newln
        then
            flatline="${flatline}$2${newln}"
            open=$(line_is_open "${newln}" $open)
        else
            break
        fi
    done
    echo "${flatline}"
done
Once you've done this translation, you can proceed as you would normally via the while read ROW; do ... done method.
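For instance (a sketch that assumes the multiline_csv.sh script above):
./multiline_csv.sh ./multi.csv '\n' | while IFS= read -r ROW; do
    printf '%s\n' "$ROW"    # each ROW is now one logical CSV record
done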

currency parsing and conversion using shell commands

I'm looking for a shell one-liner that will parse the example currency string PHP10000 into $245. I need to parse the number from the string, multiply it by a preset conversion factor, and then add a "$" prefix to the result.
So far, what I have is only this:
echo PHP10000 | sed -e 's/PHP//'
which gives 10000 as the result.
Now, I'm stuck on how to do multiplication on that result.
I'm thinking awk could also give a solution to this, but I'm a beginner at shell commands.
Update:
I tried:
echo PHP10000 | expr `sed -e 's/PHP//'` \* 2
and the multiplication works properly only on whole numbers. I can't use floating point numbers as it gives me this error: expr: not a decimal number: '2.1'.
value=PHP10000
factor=40.82
# bc -l sets a non-zero scale; plain bc would truncate the division to 244
printf -v converted '$%.2f' "$(bc -l <<< "${value#PHP} / $factor")"
echo "$converted"    # => $244.98
the ${value#PHP} part is parameter expansion that removes the PHP string from the front of the $value string
the <<< part is a bash here-string, so you're passing the formula to the bc program
bash does not do floating-point arithmetic, so call bc to perform the calculation; the -l option gives bc a non-zero scale so the division keeps its decimal places
printf -v varname is the equivalent of other languages' varname = sprintf(...)
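The two expansions in isolation (a quick sketch, nothing project-specific):
value=PHP10000
echo "${value#PHP}"          # -> 10000 (strips the shortest matching prefix)
bc -l <<< "10000 / 40.82"    # -> 244.97795... (the here-string feeds bc's stdin)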
One way:
echo "PHP10000" | awk -F "PHP" '{ printf "$%d\n", $2 * .0245 }'
Results:
$245
Or to print to two decimal places:
echo "PHP10000" | awk -F "PHP" '{ printf "$%.2f\n", $2 * .0245 }'
Results:
$245.00
EDIT:
Bash doesn't support floating-point operations. Use bc instead (note that the e flag used below, which executes the pattern space as a shell command, is specific to GNU sed):
echo "PHP10000" | sed 's/PHP\([0-9]\+\)/echo "scale=2; \1*.0245\/1" | bc/e'
Results:
245.00
Something like:
echo PHP10000 | awk '/PHP/ { printf "$%.0f\n", .0245 * substr($1,4) }'
It can be easily extended to a multi-currency version that converts into one currency (known as quote currency), e.g.:
awk '
BEGIN {
    rates["PHPUSD"]=.01
    rates["GBPUSD"]=1.58
}
/[A-Z]{3}[0-9.]+/ {
    pair=substr($1,1,3) "USD"
    amount=substr($1,4)
    print "USD" amount * rates[pair]
}
' <<EOF
PHP100
GBP100
EOF
Outputs:
USD1
USD158
Yet another alternative:
$ echo "PHP10000" | awk 'sub(/PHP/,""){ print "$" $0 * .0245 }'
$245

How to add duplicate lines to a file using Unix

I want to add duplicate lines, but what I have tried does not give the desired output.
I used sed but ended up with all lines duplicated (code below):
sed 'p' Data.txt > Output.txt
I also tried a shell loop around awk but ended up with all lines duplicated (code below):
while read line; do
    commacount=`echo $line|tr ',' '\n'|wc -l`
    atcount=`echo $line|tr '#' '\n'|wc -l`
    echo $commacount,$atcount
    if [ "$commacount == '8' && $atcount == '3'" ]; then
    {
        awk '{print $0}1' Data.txt > tmp
    }
    else
    {
        awk '{print $0}' Data.txt > tmp
    }
    fi
done < Data.txt
Data.txt
2009-09-12T05:18:#00#+10:00,2303,Dump,CAM,1,1,JUNM
2009-09-12T05:24:00+10:00,2009-09-12T05:24:#00#+10:00,2303,Dump,RIV,1,1,JUNM
2009-09-12T05:25:00+10:00,2009-09-12T05:25:#00#+10:00,2303,Dump,WSN,1,1,JUNM
2009-09-12T05:27:00+10:00,2009-09-12T05:27:#00#+10:00,2303,Dump,HWL,1,1,JUNM
2009-09-12T05:29:00+10:00,2009-09-12T05:29:#00#+10:00,2303,Dump,BWD,1,1,JUNM
2009-09-12T05:31:00+10:00,2009-09-12T05:31:#00#+10:00,2303,Dump,ASH,1,1,JUNM
2009-09-12T05:33:00+10:00,,2303,Dump,ALM,1,1,JUNM
2009-09-12T05:00:#00#+10:00,2300,Up,ALM,1,1,JUNM
2009-09-12T05:01:00+10:00,2009-09-12T05:01:#00#+10:00,2300,Up,ASH,1,1,JUNM
2009-09-12T05:04:00+10:00,2009-09-12T05:04:#00#+10:00,2300,Up,BWD,1,1,JUNM
2009-09-12T05:06:00+10:00,2009-09-12T05:06:#00#+10:00,2300,Up,HWL,1,1,JUNM
2009-09-12T05:08:00+10:00,2009-09-12T05:08:#00#+10:00,2300,Up,WSN,1,1,JUNM
2009-09-12T05:10:00+10:00,2009-09-12T05:10:#00#+10:00,2300,Up,RIV,1,1,JUNM
2009-09-12T05:17:00+10:00,,2300,Up,CAM,1,1,JUNM
2009-09-12T09:25:#00#+10:00,2305,Dump,CAM,1,1,JUNM
2009-09-12T09:28:00+10:00,2009-09-12T09:28:#00#+10:00,2305,Dump,RIV,1,1,JUNM
2009-09-12T09:29:00+10:00,2009-09-12T09:29:#00#+10:00,2305,Dump,WSN,1,1,JUNM
2009-09-12T09:31:00+10:00,2009-09-12T09:31:#00#+10:00,2305,Dump,HWL,1,1,JUNM
2009-09-12T09:32:00+10:00,2009-09-12T09:32:#00#+10:00,2305,Dump,BWD,1,1,JUNM
2009-09-12T09:34:00+10:00,2009-09-12T09:34:#00#+10:00,2305,Dump,ASH,1,1,JUNM
2009-09-12T09:41:00+10:00,,2305,Dump,ALM,1,1,JUNM
,2306,Up,ALM,1,1,JUNM
,2306,Up,ASH,1,1,JUNM
,2306,Up,BWD,1,1,JUNM
,2306,Up,HWL,1,1,JUNM
,2306,Up,WSN,1,1,JUNM
,2306,Up,RIV,1,1,JUNM
,2306,Up,CAM,1,1,JUNM
2009-09-12T06:18:#00#+10:00,4505,Dump,CAR,1,1,JUNM
2009-09-12T06:21:00+10:00,2009-09-12T06:21:#00#+10:00,4505,Dump,SEA,1,1,JUNM
2009-09-12T06:24:00+10:00,2009-09-12T06:24:#00#+10:00,4505,Dump,KAN,1,1,JUNM
Output should be
2009-09-12T05:18:#00#+10:00,2303,Dump,CAM,1,1,JUNM
2009-09-12T05:24:00+10:00,2009-09-12T05:24:#00#+10:00,2303,Dump,RIV,1,1,JUNM
2009-09-12T05:24:00+10:00,2009-09-12T05:24:#00#+10:00,2303,Dump,RIV,1,1,JUNM
2009-09-12T05:25:00+10:00,2009-09-12T05:25:#00#+10:00,2303,Dump,WSN,1,1,JUNM
2009-09-12T05:25:00+10:00,2009-09-12T05:25:#00#+10:00,2303,Dump,WSN,1,1,JUNM
2009-09-12T05:27:00+10:00,2009-09-12T05:27:#00#+10:00,2303,Dump,HWL,1,1,JUNM
2009-09-12T05:27:00+10:00,2009-09-12T05:27:#00#+10:00,2303,Dump,HWL,1,1,JUNM
2009-09-12T05:29:00+10:00,2009-09-12T05:29:#00#+10:00,2303,Dump,BWD,1,1,JUNM
2009-09-12T05:29:00+10:00,2009-09-12T05:29:#00#+10:00,2303,Dump,BWD,1,1,JUNM
2009-09-12T05:31:00+10:00,2009-09-12T05:31:#00#+10:00,2303,Dump,ASH,1,1,JUNM
2009-09-12T05:31:00+10:00,2009-09-12T05:31:#00#+10:00,2303,Dump,ASH,1,1,JUNM
2009-09-12T05:33:00+10:00,,2303,Dump,ALM,1,1,JUNM
2009-09-12T05:00:#00#+10:00,2300,Up,ALM,1,1,JUNM
2009-09-12T05:01:00+10:00,2009-09-12T05:01:#00#+10:00,2300,Up,ASH,1,1,JUNM
2009-09-12T05:01:00+10:00,2009-09-12T05:01:#00#+10:00,2300,Up,ASH,1,1,JUNM
2009-09-12T05:04:00+10:00,2009-09-12T05:04:#00#+10:00,2300,Up,BWD,1,1,JUNM
2009-09-12T05:04:00+10:00,2009-09-12T05:04:#00#+10:00,2300,Up,BWD,1,1,JUNM
2009-09-12T05:06:00+10:00,2009-09-12T05:06:#00#+10:00,2300,Up,HWL,1,1,JUNM
2009-09-12T05:06:00+10:00,2009-09-12T05:06:#00#+10:00,2300,Up,HWL,1,1,JUNM
2009-09-12T05:08:00+10:00,2009-09-12T05:08:#00#+10:00,2300,Up,WSN,1,1,JUNM
2009-09-12T05:08:00+10:00,2009-09-12T05:08:#00#+10:00,2300,Up,WSN,1,1,JUNM
2009-09-12T05:10:00+10:00,2009-09-12T05:10:#00#+10:00,2300,Up,RIV,1,1,JUNM
2009-09-12T05:10:00+10:00,2009-09-12T05:10:#00#+10:00,2300,Up,RIV,1,1,JUNM
2009-09-12T05:17:00+10:00,,2300,Up,CAM,1,1,JUNM
2009-09-12T09:25:#00#+10:00,2305,Dump,CAM,1,1,JUNM
2009-09-12T09:28:00+10:00,2009-09-12T09:28:#00#+10:00,2305,Dump,RIV,1,1,JUNM
2009-09-12T09:28:00+10:00,2009-09-12T09:28:#00#+10:00,2305,Dump,RIV,1,1,JUNM
2009-09-12T09:29:00+10:00,2009-09-12T09:29:#00#+10:00,2305,Dump,WSN,1,1,JUNM
2009-09-12T09:29:00+10:00,2009-09-12T09:29:#00#+10:00,2305,Dump,WSN,1,1,JUNM
2009-09-12T09:31:00+10:00,2009-09-12T09:31:#00#+10:00,2305,Dump,HWL,1,1,JUNM
2009-09-12T09:31:00+10:00,2009-09-12T09:31:#00#+10:00,2305,Dump,HWL,1,1,JUNM
2009-09-12T09:32:00+10:00,2009-09-12T09:32:#00#+10:00,2305,Dump,BWD,1,1,JUNM
2009-09-12T09:32:00+10:00,2009-09-12T09:32:#00#+10:00,2305,Dump,BWD,1,1,JUNM
2009-09-12T09:34:00+10:00,2009-09-12T09:34:#00#+10:00,2305,Dump,ASH,1,1,JUNM
2009-09-12T09:34:00+10:00,2009-09-12T09:34:#00#+10:00,2305,Dump,ASH,1,1,JUNM
2009-09-12T09:41:00+10:00,,2305,Dump,ALM,1,1,JUNM
,2306,Up,ALM,1,1,JUNM
,2306,Up,ASH,1,1,JUNM
,2306,Up,BWD,1,1,JUNM
,2306,Up,HWL,1,1,JUNM
,2306,Up,WSN,1,1,JUNM
,2306,Up,RIV,1,1,JUNM
,2306,Up,CAM,1,1,JUNM
2009-09-12T06:18:#00#+10:00,4505,Dump,CAR,1,1,JUNM
2009-09-12T06:21:00+10:00,2009-09-12T06:21:#00#+10:00,4505,Dump,SEA,1,1,JUNM
2009-09-12T06:21:00+10:00,2009-09-12T06:21:#00#+10:00,4505,Dump,SEA,1,1,JUNM
2009-09-12T06:24:00+10:00,2009-09-12T06:24:#00#+10:00,4505,Dump,KAN,1,1,JUNM
2009-09-12T06:24:00+10:00,2009-09-12T06:24:#00#+10:00,4505,Dump,KAN,1,1,JUNM
Is there any way that I can get the above output?
I appreciate any help/suggestions.
Thanks,
Sri
Do you want to duplicate the lines which have 8 non-empty columns?
Try this:
awk -F',+' 'NF==8;1' file.txt
With -F',+' runs of commas collapse into a single separator, so only the rows carrying both timestamps have 8 fields; NF==8 prints those once via the default action, and the final 1 prints every line once more, so 8-field rows come out twice and everything else once.
Here's a sed version that duplicates a line only when its second field starts with a timestamp:
sed '/^[^,]*,[0-9-]\{10\}T/p' file.txt
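An equivalent plain-FS awk one-liner, for comparison (my own sketch; it duplicates a line only when the second comma-separated field holds a timestamp):
awk -F, '{print} $2 ~ /T[0-9]/ {print}' file.txt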
