pulling information out of a string in shell script - bash

I am having trouble pulling out the information I need from a string in my shell script. I have read and tried to come up with the correct awk or sed command to do it, but I just can't figure it out. Hopefully you guys can help.
Let's say I have a string as follows:
["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]
Now what I want to do is pull out all of these properties into individual arrays of strings. For example:
I would like to have an array of ids 2817262 2262 28182
an array of name somename somename somename
an array of hasproperty false false true
Can anyone help me come up with the commands I need to pull this out? Also keep in mind the string will likely be much longer than this, so it would be helpful if the solution is not specific to 3 cases. Thanks so much in advance.

You could use grep.
grep -oP '"ids":\K\d+' file
Example:
$ echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' | grep -oP '"ids":\K\d+'
2817262
2262
28182
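To get from that output to the arrays the question asks for, the matches can be read into bash arrays with mapfile. This is only a sketch: values_of is a hypothetical helper name, it uses plain grep -oE plus cut and tr instead of -P, it needs bash 4+ for mapfile, and it assumes no value contains a comma.

```shell
str='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'

# values_of is a hypothetical helper: print every value stored under
# key $1 in string $2, one per line (assumes values contain no commas).
values_of() {
  grep -oE "\"$1\":[^,]*" <<<"$2" | cut -d: -f2- | tr -d '"]'
}

# mapfile -t reads one array element per line of output.
mapfile -t ids < <(values_of ids "$str")
mapfile -t names < <(values_of name "$str")
mapfile -t properties < <(values_of hasproperty "$str")

echo "ids: ${ids[*]}"
echo "names: ${names[*]}"
echo "properties: ${properties[*]}"
```

Because the helper loops over whatever grep finds, it is not tied to three records.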

Since it is tagged with awk (the three-argument match() below is a GNU awk extension):
awk '{while(x=match($0,/"ids":([^,]+)/,a)){print a[1];$0=substr($0,x+RLENGTH)}}' file
This just keeps matching any ids then changing the line to contain only what is after the id.
Output
2817262
2262
28182
Could also do this (inspired by Wintermute's comment on another answer):
awk -v RS=",|]" 'sub(/^.*"ids":/,"")' file
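If GNU awk is not available, the same "match, print, chop off the front" loop can probably be written with the two-argument match() and RSTART/RLENGTH. A rough sketch, assuming the ids are plain digits; extract_ids is a made-up name:

```shell
# extract_ids is a hypothetical helper: print every number that follows "ids": on stdin.
extract_ids() {
  awk '{
    s = $0
    # match() sets RSTART and RLENGTH to the position and length of the match
    while (match(s, /"ids":[0-9]+/)) {
      # skip the 6 characters of the literal "ids": prefix and keep only the digits
      print substr(s, RSTART + 6, RLENGTH - 6)
      # continue searching in whatever follows this match
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}

echo '["ids":2817262,"isvalid":true,"ids":2262,"name":"somename","ids":28182]' | extract_ids
```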

The grep solution is beautiful. Your question was tagged awk, though, and the awk solution is ugly:
echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' \
| awk '{n=split(substr($0,2,length($0)-2),x,",");
for(i=1;i<=n;i++) {split(x[i],a,":");
if(a[1]=="\"ids\"") print a[1],a[2]}}'
Output:
"ids" 2817262
"ids" 2262
"ids" 28182
Please choose the grep solution as the correct answer.

Here is a pure bash solution (long-winded, isn't it? I tend to agree with @chepner):
str='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
#Remove [ ]
str=${str/\[/}
str=${str/\]/}
declare -a ids
declare -a names
declare -a properties
oldIFS="$IFS"
IFS=','
for record in $str
do
type=${record%%:*}
value=${record##*:}
if [[ $type == \"ids\" ]]
then
ids[ids_i++]="$value"
elif [[ $type == \"name\" ]]
then
names[names_i++]="$value"
elif [[ $type == \"hasproperty\" ]]
then
properties[properties_i++]="$value"
else
echo "Ignored type: '$type'" >&2
fi
done
IFS="$oldIFS"
echo "ids: ${ids[@]}"
echo "names: ${names[@]}"
echo "properties: ${properties[@]}"
The only thing going for it is that there are no child processes.

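awk 'BEGIN {
FS = "," # must be set before the first record is read
Index = 0
}
{
gsub( /[][]/,"")
gsub( /"[a-z]*":/, "") # modifying $0 re-splits it on FS
Field = 1 # restart at the first field of every record
while ( Field < NF) {
ThisID[ Index]=$Field
ThisName[ Index]=$(Field + 2)
ThisProperty[ Index]=$(Field + 3)
Index+=1
Field+=4
}
}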
END {
for ( Iter=0;Iter<Index;Iter+=1) printf( "%s ", ThisID[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisName[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisProperty[Iter])
printf "\n"
}' YourFile
You still need to assign the output to your arrays in the shell.

unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
eval ${i%:*}[$n]=${i#*:}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)
The above will produce 4 arrays (ids, isvalid, name, hasproperty). If you don't need isvalid, just filter it out:
unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
[ "${i%:*}" != "isvalid" ] && eval ${i/:/[$n]=}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)

Given your posted input, if all you wanted was the list of each type of item then this is all you'd need:
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^ids/{print $2}' file
2817262
2262
28182
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^name/{print $2}' file
somename
somename
somename
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^hasproperty/{print $2}' file
false
false
true
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^isvalid/{print $2}' file
true
false
true
but it's extremely unlikely that this is the right way to approach your problem. As I mentioned in a comment, edit your question to provide more information if you'd like some real help with it.

Related

Search equality in a certain field with AWK [duplicate]

I am trying to get the name out of /etc/passwd using awk to search only in the 5th field of every row, and then to cut some part of that line and print it out.
This is what I wrote but it doesn't seems to work:
for iter in "$@";
do cat /etc/passwd | awk -F ":" '$5==$iter' | cut -d":" -f6;
done;
concerning the delimiter syntax, everything should be fine I guess?
so my problem is in the $5==$iter, I assume.
How can I change that $5==$iter to - if the 5th field of that row contains my $iter var, then cut and so on..
Sorry for the ignorance, I am a beginner :)
Thanks in advance.
See How do I use shell variables in an awk script?
-v should be used to pass shell variables into awk. Also, there's no reason to use either cat or cut here:
for iter in "$@"; do
awk -F: -v iter="$iter" '$5==iter { print $6 }' </etc/passwd
done
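To see the -v mechanism in isolation, here is a small sketch that runs against made-up passwd-style lines instead of the real /etc/passwd. home_for_gecos and the sample users are hypothetical. One known caveat: awk interprets backslash escapes in -v values, so for values that may contain backslashes, exporting the variable and reading ENVIRON["name"] is the usual alternative.

```shell
# home_for_gecos is a hypothetical helper: print field 6 of every
# passwd-style line on stdin whose field 5 equals the first argument.
home_for_gecos() {
  # -v copies the shell value into the awk variable "want" before the
  # program runs, so the single-quoted awk code never sees shell syntax.
  awk -F: -v want="$1" '$5 == want { print $6 }'
}

printf '%s\n' \
  'alice:x:1000:1000:Alice A:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob B:/home/bob:/bin/sh' |
  home_for_gecos 'Bob B'
```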
As Charles Duffy commented, your code would be more efficient if it didn't need to read /etc/passwd every pass. And while this particular loop probably doesn't need to be optimized (after all, /etc/passwd is typically not that long and most OS's would cache the file anyway after the first read), it would be interesting to see an awk script read the file only once.
That said, here's another implementation where awk is only invoked once:
printf "%s\n" "$@" | awk -F: '
NR == FNR { etc_passwd[ $5 ] = $6; next }
{ print $0 , etc_passwd[ $0 ] }
' /etc/passwd /dev/stdin
The NR == FNR condition is an idiom that causes its associated command only to be executed for the first file in the list of files that follows the awk script (that is, for the reading of /etc/passwd).
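A toy version of the idiom, using two throwaway temp files so it can be run anywhere. lookup_demo and the file contents are made up for illustration. One thing to keep in mind: if the first file is empty, NR == FNR is also true while reading the second file.

```shell
# lookup_demo is a hypothetical demo: build a key->value map from the
# first file, then annotate each line of the second file with its value.
lookup_demo() {
  local map keys
  map=$(mktemp)
  keys=$(mktemp)
  printf '%s\n' 'a:1' 'b:2' > "$map"
  printf '%s\n' 'b' 'a' 'c' > "$keys"
  awk -F: '
    NR == FNR { val[$1] = $2; next }                 # first file only: fill the map
    { print $0, (($0 in val) ? val[$0] : "?") }      # second file: look each key up
  ' "$map" "$keys"
  rm -f "$map" "$keys"
}

lookup_demo
```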
You can also do everything in bash, example:
#!/bin/bash
declare -A passwd # declare an associative array
# build the associative array "passwd" with the
# 5th field as a "key" and 6th field as "value"
while IFS=$':\n' read -r -a line; do # emulate awk to extract fields
[[ -n "${line[4]}" ]] || continue # avoid blank "keys"
passwd["${line[4]}"]=${line[5]} # bash arrays start at index 0
done < /etc/passwd
for iter in "$@"; do
if [[ -n ${passwd[$iter]+x} ]]; then # true only if the key exists
echo "${passwd[$iter]}"
fi
done
(This version doesn't take into account multiple entries with the same 5th field.)
Here is a better version that can handle blank values as well, like ./script.sh '':
while IFS=$':\n' read -r -a line; do
for iter in "$@"; do
if [ "$iter" = "${line[4]}" ]; then
echo "${line[5]}"
continue
fi
done
done < /etc/passwd
A pure awk solution could be:
#!/usr/bin/awk -f
BEGIN {
FS = ":"
for ( i = 1; i < ARGC; i++ ) {
args[ARGV[i]] = 1
delete ARGV[i]
}
ARGV[1] = "/etc/passwd"
}
($5 in args) { print $6 }
and you could call it as ./script.awk 'param1' 'param2' (the shebang line already supplies -f).

Adding data to line in CSV if value exists in external file

Here is my sample data:
1,32425,New Zealand,number,21004
1,32425,New Zealand,number,20522
1,32434,Australia,number,1542
1,32434,Australia,number,986
1,32434,Fiji,number,1
Here is my expected output:
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
Basically I am trying to append the Yes/No based on if field 3 is contained in an external file. Here is what I have currently but as I understand it grep is eating all the stdin in the while loop. So I am only getting No added to the end of each line as the first value is not contained in the external file.
while IFS=, read -r type id country number volume
do
if grep $country externalfile.csv
then
echo "${country}"
sed 's/$/,Yes/' >> file2.csv
else
echo "${country}"
sed 's/$/,No/' >> file2.csv
fi
done < file1.csv
I added the echo "${country}" as I was trying to troubleshoot and that's how I discovered it was only parsing the first line.
Assuming there are no headers -
awk -F, 'NR==FNR{lookup[$1]=$1; next;}
{ if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
' externalfile.csv file2.csv
This will parse both files in one pass.
If you just prefer to do it in pure bash,
declare -A lookup
while read -r c; do lookup["$c"]="$c"; done < externalfile.csv
declare -p lookup # this is just to show you what my example loaded
# prints: declare -A lookup='([USA]="USA" [Fiji]="Fiji" )'
while IFS=, read -r a b c d; do
[[ -n "${lookup[$c]}" ]] && echo "$a,$b,$c,$d,Yes" || echo "$a,$b,$c,$d,No"
done < file2.csv
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
No grep needed.
awk -F, -v OFS=, 'NR == FNR { ++a[$1]; next } { $(++NF) = $3 in a ? "Yes" : "No" } 1' externalfile.csv file2.csv
Try this:
while IFS= read -r line
do
country=$(echo "$line" | cut -d',' -f3)
if grep -q "$country" externalfile.csv
then
echo "$line,Yes" >> file2.csv
else
echo "$line,No" >> file2.csv
fi
done < test.txt
You need to put $country inside double quotes, because some country names contain more than one word, for example New Zealand. The -q option keeps grep from printing the matching lines. You can also set the country variable more easily with the cut command.
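The "only the first line gets processed" symptom from the question can be reproduced in isolation. In the original loop the likely culprit is sed, which is given no file and therefore reads the loop's stdin. A minimal sketch, with cat standing in for any stdin-reading command and made-up function names:

```shell
# Hypothetical demo: a command inside the loop inherits the loop's stdin
# and swallows every line that "read" has not consumed yet.
demo_stdin_eaten() {
  printf '%s\n' one two three | while IFS= read -r line; do
    echo "read: $line"
    cat > /dev/null              # reads "two" and "three"; the loop then ends
  done
}

# Same loop, but the inner command gets its own stdin, so "read" sees every line.
demo_stdin_safe() {
  printf '%s\n' one two three | while IFS= read -r line; do
    echo "read: $line"
    cat < /dev/null > /dev/null
  done
}

demo_stdin_eaten
demo_stdin_safe
```

Passing the data explicitly (a file name argument, a here-string, or a pipe from echo) avoids the problem as well.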

Bash while read line loop does not print every line in condition

I have the following situation:
I have a text file I'm trying to loop so I can know if each line has a match with ".mp3" in this case which is this one:
12 Stones.mp3
randomfile.txt
Aclarion.mp3
ransomwebpage.html
Agents Of The Sun.mp3
randomvideo.mp4
So, I've written the following script to process it:
while read line || [ -n "$line" ]
do
varline=$(awk '/.mp3/{print "yes";next}{print "no"}')
echo $varline
if [ "$varline" == "yes" ]; then
some-command
else
some-command
fi
done < file.txt
The expected output would be:
yes
no
yes
no
yes
no
Instead, it seems to miss the first line and I get the following:
no
yes
no
yes
no
You really don't need Awk for a simple pattern match if that's all you used it for.
while IFS= read -r line; do
case $line in
*.mp3) some-command;;
*) some-other-command;;
esac
done <file.txt
If you are using Awk anyway for other reasons, looping the lines in a shell loop is inefficient and very often an antipattern. This doesn't really fix that, but at least avoids executing a new Awk instance on every iteration:
awk '{ print (($0 ~ /\.mp3$/) ? "yes" : "no") }' file.txt |
while IFS= read -r whether; do
case $whether in
'yes') some-command ;;
'no') some-other-command;;
esac
done
If you need the contents of "$line" too, printing that from Awk as well and reading two distinct variables is a trivial change.
I simplified the read expression on the assumption that you can make sure your input file is well-formed separately. If you can't do that, you need to put back the more-complex guard against a missing newline on the last line in the file.
Use awk
$ awk '{if ($0 ~ /mp3/) {print "yes"} else {print "no"}}' file.txt
yes
no
yes
no
yes
no
Or more concise:
$ awk '/mp3/{print "yes";next}{print "no"}' file.txt
$ awk '{print (/mp3/ ? "yes" : "no")}' file.txt
Did you forget something? Your awk has no explicit input, so it reads the rest of the loop's stdin. Change to this instead:
while IFS= read -r line || [ -n "$line" ]
do
varline=$(echo "$line" | awk '/.mp3/{print "yes";next}{print "no"}')
echo $varline
if [ "$varline" == "yes" ]; then
some-command
else
some-other-command
fi
done < file.txt
In this case, you might need to change to /\.mp3$/ or /\.mp3[[:space:]]*$/ for precise matching.
Because . matches any character, /.mp3/ will also match Exmp3but.mp4, for example.
Update: changed while read line to while IFS= read -r line, to keep each line's content intact when assigning to the variable.
And the awk part can be improved to:
awk '{print $0~/\.mp3$/ ? "yes":"no"}'
So with awk only, you can do it like this:
awk '{print $0~/\.mp3$/ ? "yes":"no"}' file.txt
Or if your purpose is just the commands in the if structure, you can just do this:
awk '/\.mp3$/{system("some-command");next}{system("some-other-command");}' file.txt
or this:
awk '{system($0~/\.mp3$/ ? "some-command" : "some-other-command")}' file.txt

for i in `cat file` read variable from AWK regex

So I have to following bash code:
for i in `cat list1.txt`; do
cat list2.txt |awk '/$i/{flag=1;next}/Flag2/{flag=0}flag'
done
Of course that the $i doesn't work because it has to be properly passed from bash to AWK, problem is: I tried multiple things, with -v and etc, but it didn't work. Thoughts?
First, c.f. this page to explain why not to use
for i in `cat list1.txt`
...ever.
Second, this for why not to use
cat list2.txt | awk ...
Sorry to harp. Now...try
while read -r val || [[ -n "$val" ]]
do awk "/$val/ { flag=1; next } /Flag2/ { flag=0 } flag" list2.txt
done < list1.txt
awk in double-quotes...not ideal.
Or, as Charles suggests, use -v (always listen to Charles & Ed...)
while read -r val || [[ -n "$val" ]]
do awk -v i="$val" '
$0 ~ i { flag=1; next }
/Flag2/ { flag=0; }
flag
' list2.txt
done < list1.txt
Still waiting for file samples. Please give us a peek at the format of these files so I can actually run a valid test.
Note the || [[ -n "$val" ]] is only needed if there's a chance the last record won't have a newline.
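Until real samples turn up, here is the flag logic on made-up input. print_between is a hypothetical wrapper around the same awk program, with both markers passed in through -v and used as dynamic regexes:

```shell
# print_between is a hypothetical helper: print the stdin lines that come
# after a line matching START ($1) and before the next line matching STOP ($2).
print_between() {
  awk -v start="$1" -v stop="$2" '
    $0 ~ start { flag = 1; next }   # start marker: switch printing on, skip the marker itself
    $0 ~ stop  { flag = 0 }         # stop marker: switch printing off
    flag                            # a bare true condition prints the current line
  '
}

printf '%s\n' before BEGIN one two Flag2 after | print_between BEGIN Flag2
```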

Trying to retrieve first 5 characters (only number & alphabet) from string in bash

I have a string like that
1-a-bc-dxyz
I'd want to get 1-a-bc-d ( first 5 characters, only number and alphabet)
Thanks
With gawk:
awk '{ for ( i=1;i<=length($0);i++) { if ( match(substr($0,i,1),/[[:alnum:]]/)) { cnt++;if ( cnt==5) { print substr($0,1,i) } } } }' <<< "1-a-bc-dxyz"
Read each character one by one and then if there is a pattern match for an alpha-numeric character (using the match function), increment a variable cnt. When cnt gets to 5, print the string we have seen so far (using the substr function)
Output:
1-a-bc-d
a='1-a-bc-dxyz'
count=0
for ((i=0;i<${#a};i++)); do
if [[ "${a:$i:1}" =~ [[:alnum:]] ]] && [[ $((++count)) -eq 5 ]]; then
echo "${a:0:$((i+1))}"
break
fi
done
You can further shrink this as;
a='1-a-bc-dxyz'
count=0
for ((i=0;i<${#a};i++)); do [[ "${a:$i:1}" =~ [[:alnum:]] ]] && [[ $((++count)) -eq 5 ]] && echo "${a:0:$((i+1))}"; done
Using GNU awk:
$ echo 1-a-bc-dxyz | \
awk -F '' '{b=i="";while(gsub(/[0-9a-z]/,"&",b)<5)b=b $(++i);print b}'
1-a-bc-d
Explained:
awk -F '' '{ # separate each char to its own field
b=i="" # if you have more than one record to process
while(gsub(/[0-9a-z]/,"&",b)<5) # using gsub for counting (adjust regex if needed)
b=b $(++i) # gather buffer
print b # print buffer
}'
GNU sed supports an option to replace the k-th occurrence and all after that.
echo "1-a-bc-dxyz" | sed 's/[^a-zA-Z0-9]*[a-zA-Z0-9]//g6'
Using Combination of sed & AWK
echo 1-a-bc-dxyz | sed 's/[-*%$#@]//g' | awk -F '' '{print $1$2$3$4$5}'
Note that this strips the separators, so it prints 1abcd rather than 1-a-bc-d. You can use a for loop to print the characters as well.
echo '1-a-bc-dxyz' | grep -Eo '^[[:print:]](-*[[:print:]]){4}'
That is pretty simple.
Neither sed nor awk.
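If this is needed in more than one place, the character-counting idea could be wrapped in a small bash function. Just a sketch: first_alnum_prefix is a made-up name, and returning the whole string when it holds fewer than N alphanumerics is my assumption, not something the question specifies.

```shell
# first_alnum_prefix STRING N (hypothetical helper): print the shortest
# prefix of STRING that contains N alphanumeric characters.
first_alnum_prefix() {
  local s=$1 want=$2 count=0 i
  for ((i = 0; i < ${#s}; i++)); do
    # count only letters and digits; separators pass through uncounted
    if [[ ${s:i:1} == [[:alnum:]] ]]; then
      if ((++count == want)); then
        printf '%s\n' "${s:0:i+1}"
        return 0
      fi
    fi
  done
  printf '%s\n' "$s"   # fewer than N alphanumerics: return everything (assumption)
}

first_alnum_prefix '1-a-bc-dxyz' 5
```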
