for i in `cat file` read variable from AWK regex - bash

So I have the following bash code:
for i in `cat list1.txt`; do
cat list2.txt |awk '/$i/{flag=1;next}/Flag2/{flag=0}flag'
done
Of course, the $i doesn't work because it has to be properly passed from bash to AWK. The problem is that I tried multiple things, with -v and so on, but none of them worked. Thoughts?

First, see this page for an explanation of why not to use
for i in `cat list1.txt`
...ever.
Second, this for why not to use
cat list2.txt | awk ...
Sorry to harp. Now...try
while read -r val || [[ -n "$val" ]]
do awk "/$val/ { flag=1; next } /Flag2/ { flag=0 } flag" list2.txt
done < list1.txt
awk in double-quotes...not ideal.
Or, as Charles suggests, use -v (always listen to Charles & Ed...)
while read -r val || [[ -n "$val" ]]
do awk -v i="$val" '
$0 ~ i { flag=1; next }
/Flag2/ { flag=0; }
flag
' list2.txt
done < list1.txt
Still waiting for file samples. Please give us a peek at the format of these files so I can actually run a valid test.
Note the || [[ -n "$val" ]] is only needed if there's a chance the last record won't have a newline.
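A minimal sketch of the -v approach, run against made-up data because the question never showed its files. The file contents and the START_A/START_B markers are assumptions for illustration only; the awk variable is named pat here instead of i.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: the real list1.txt and list2.txt were never posted.
tmp=$(mktemp -d)
printf '%s\n' 'START_A' 'START_B' > "$tmp/list1.txt"
printf '%s\n' 'START_A' 'a1' 'a2' 'Flag2' 'skipped' 'START_B' 'b1' 'Flag2' > "$tmp/list2.txt"

# For each pattern in list1.txt, awk -v copies the shell value into the awk
# variable pat, and the awk program uses it as a dynamic regex.
result=$(
  while IFS= read -r val || [ -n "$val" ]; do
    awk -v pat="$val" '
      $0 ~ pat { flag = 1; next }   # start marker: print what follows it
      /Flag2/  { flag = 0 }         # end marker: stop printing
      flag                          # print the line while the flag is set
    ' "$tmp/list2.txt"
  done < "$tmp/list1.txt"
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Keep in mind that awk processes backslash escapes in values passed with -v, so a pattern that contains backslashes may need them doubled.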

Related

Bash while read line loop does not print every line in condition

I have the following situation:
I have a text file I'm trying to loop so I can know if each line has a match with ".mp3" in this case which is this one:
12 Stones.mp3
randomfile.txt
Aclarion.mp3
ransomwebpage.html
Agents Of The Sun.mp3
randomvideo.mp4
So, I've written the following script to process it:
while read line || [ -n "$line" ]
do
varline=$(awk '/.mp3/{print "yes";next}{print "no"}')
echo $varline
if [ "$varline" == "yes" ]; then
some-command
else
some-command
fi
done < file.txt
The expected output would be:
yes
no
yes
no
yes
no
Instead, it seems to miss the first line and I get the following:
no
yes
no
yes
no
You really don't need Awk for a simple pattern match if that's all you used it for.
while IFS= read -r line; do
case $line in
*.mp3) some-command;;
*) some-other-command;;
esac
done <file.txt
If you are using Awk anyway for other reasons, looping the lines in a shell loop is inefficient and very often an antipattern. This doesn't really fix that, but at least avoids executing a new Awk instance on every iteration:
awk '{ print ($0 ~ /\.mp3$/) ? "yes" : "no" }' file.txt |
while IFS= read -r whether; do
case $whether in
'yes') some-command ;;
'no') some-other-command;;
esac
done
If you need the contents of "$line" too, printing that from Awk as well and reading two distinct variables is a trivial change.
I simplified the read expression on the assumption that you can make sure your input file is well-formed separately. If you can't do that, you need to put back the more-complex guard against a missing newline on the last line in the file.
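One way the "two distinct variables" variant could look, as a rough sketch: Awk prints the verdict followed by the original line, and read splits them apart again. The temporary file and the mp3:/other: labels are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sample input: the first three names from the question.
tmp=$(mktemp)
printf '%s\n' '12 Stones.mp3' 'randomfile.txt' 'Aclarion.mp3' > "$tmp"

# Awk prints the verdict and then the line itself. In the loop, the variable
# whether receives the first word and line receives the rest of the record.
result=$(
  awk '{ w = ($0 ~ /\.mp3$/) ? "yes" : "no"; print w, $0 }' "$tmp" |
  while read -r whether line; do
    if [ "$whether" = yes ]; then
      echo "mp3: $line"
    else
      echo "other: $line"
    fi
  done
)
printf '%s\n' "$result"
rm -f "$tmp"
```

Because read uses the default IFS here, leading and trailing whitespace in the original line is trimmed; use a different separator if that matters.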
Use awk
$ awk '{if ($0 ~ /mp3/) {print "yes"} else {print "no"}}' file.txt
yes
no
yes
no
yes
no
Or more concise:
$ awk '/mp3/{print "yes";next}{print "no"}' file.txt
$ awk '{print (/mp3/ ? "yes" : "no")}' file.txt
Have you forgotten something? Your awk has no explicit input, so it reads the rest of the loop's stdin (file.txt) itself. Change it to this instead:
while IFS= read -r line || [ -n "$line" ]
do
varline=$(echo "$line" | awk '/.mp3/{print "yes";next}{print "no"}')
echo $varline
if [ "$varline" == "yes" ]; then
some-command
else
some-other-command
fi
done < file.txt
In this case, you might want to change the pattern to /\.mp3$/ or /\.mp3[[:space:]]*$/ for precise matching, because an unescaped . matches any character: /.mp3/ also matches Exmp3but.mp4, for example.
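A small sketch to make the difference concrete. The three file names are invented for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical file names chosen to show the difference between the patterns.
names=$(printf '%s\n' 'song.mp3' 'Exmp3but.mp4' 'notes.mp3.txt')

# Unanchored, unescaped dot: any character followed by mp3, anywhere in the line.
loose=$(printf '%s\n' "$names" | awk '/.mp3/ { n++ } END { print n + 0 }')

# Escaped dot, anchored at the end of the line: only a real .mp3 suffix.
strict=$(printf '%s\n' "$names" | awk '/\.mp3$/ { n++ } END { print n + 0 }')

echo "unanchored pattern matched $loose names"
echo "anchored pattern matched $strict names"
```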
Update: changed while read line to while IFS= read -r line, to keep each line's content intact when assigning to the variable.
And the awk part can be improved to:
awk '{print $0~/\.mp3$/ ? "yes":"no"}'
So with awk only, you can do it like this:
awk '{print $0~/\.mp3$/ ? "yes":"no"}' file.txt
Or if your purpose is just the commands in the if structure, you can just do this:
awk '/\.mp3$/{system("some-command");next}{system("some-other-command");}' file.txt
or this:
awk '{system($0~/\.mp3$/ ? "some-command" : "some-other-command")}' file.txt

bash: sed: unexpected behavior: displays everything

I wrote what I thought was a quick script I could run on a bunch of machines. Instead, it prints what looks like it might be directory contents from a recursive search:
version=$(mysql Varnish -B --skip-column-names -e "SELECT value FROM sys_param WHERE param='PatchLevel'" | sed -n 's/^.*\([0-9]\.[0-9]*\).*$/\1/p')
if [[ $(echo "if($version == 6.10) { print 1; } else { print 0; }" | bc) -eq 1 ]]; then
status=$(dpkg-query -l | awk '{print $2}' | grep 'sg-status-polling');
cons=$(dpkg-query -l | awk '{print $2}' | grep 'sg-consolidated-poller');
if [[ "$status" != "" && "$cons" != "" ]]; then
echo "about to change /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm"; echo;
cp /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm.bkup;
sed -ir '184s!\x91\x93!\x91\x27--timeout=35\x27\x93!' /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm;
sed -n 183,185p /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm; echo;
else
echo "packages not found. Assumed to be not applicable";
fi
else
echo "This is 4.$version, skipping";
fi
The script is supposed to make sure Varnish is version 4.6.10 and has 2 custom .deb packages installed (not through apt-get). It then makes a backup and edits a single line in a Perl module from [] to ['--timeout=35'].
It looks like it's tripping up on the sed replace one-liner.
There are two major problems (minor ones are addressed in the comments). The first is that you used the decimal codes for [] instead of the hexadecimal ones, so you should use \x5b\x5d instead of \x91\x93. The second problem is that even if you do use the proper codes, sed will still interpret them syntactically as []. So you can't escape escaping. Here's what you should call:
sed -ri'.bkup' '184s!\[\]![\x27--timeout=35\x27]!' /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm
And this will create the backup for you (but you should double check).
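To try the substitution safely before touching the real module, here is a sketch that applies the same idea to a single string. The sample line is a made-up stand-in for line 184 of ObjectPoller2.pm, and the sed script is double-quoted so the single quotes can be written literally instead of as \x27.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for line 184 of the Perl module.
line='extra_args => [],'

# In the pattern, [ and ] must be escaped to be taken literally;
# in the replacement they are ordinary characters.
fixed=$(printf '%s\n' "$line" | sed "s|\[\]|['--timeout=35']|")
printf '%s\n' "$fixed"
```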

pulling information out of a string in shell script

I am having trouble pulling out the information I need from a string in my shell script. I have read and tried to come up with the correct awk or sed command to do it, but I just can't figure it out. Hopefully you guys can help.
Lets say I have a string as follows:
["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]
Now what I want to do is pull out all of these properties into individual arrays of strings. For example:
I would like to have an array of ids 2817262 2262 28182
an array of name somename somename somename
an array of hasproperty false false true
Can anyone help me come up with the commands I need to pull this out. Also keep in mind the string will likely be much longer than this, so if we can not make it specific to 3 cases that would be helpful. Thanks so much in advance.
You could use grep.
grep -oP '"ids":\K\d+' file
Example:
$ echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' | grep -oP '"ids":\K\d+'
2817262
2262
28182
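Since the question asks for arrays, here is a sketch that loads the extracted values into bash arrays. It assumes bash 4 or later (for mapfile) and swaps grep -oP for plain grep -o plus cut, so it does not depend on PCRE support; the helper name extract is made up for this example.

```bash
#!/usr/bin/env bash
# The string is the sample from the question.
str='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'

# extract KEY: print every value that follows "KEY": in $str, one per line.
extract() {
  grep -o "\"$1\":[^,]*" <<< "$str" | cut -d: -f2 | tr -d '"]'
}

# mapfile -t stores each output line as one array element.
mapfile -t ids < <(extract ids)
mapfile -t names < <(extract name)
mapfile -t hasproperty < <(extract hasproperty)

echo "ids: ${ids[*]}"
echo "names: ${names[*]}"
echo "hasproperty: ${hasproperty[*]}"
```

This relies on values never containing a comma or a colon, which holds for the sample but not for arbitrary data.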
Since it is tagged with awk
awk '{while(x=match($0,/"ids":([^,]+)/,a)){print a[1];$0=substr($0,x+RLENGTH)}}' file
This just keeps matching any ids then changing the line to contain only what is after the id.
Output
2817262
2262
28182
Could also do this(inspired by Wintermutes comment on another answer)
awk -v RS=",|]" 'sub(/^.*"ids":/,"")' file
The grep solution is beautiful. Your question was tagged awk. The awk solution is ugly:
echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' \
| awk '{n=split(substr($0,2,length($0)-2),x,",");
for(i=1;i<=n;i++) {split(x[i],a,":");
if(a[1]=="\"ids\"") print a[1],a[2]}}'
Output:
"ids" 2817262
"ids" 2262
"ids" 28182
Please choose the grep solution as the correct answer.
Here is a pure bash solution (long-winded, isn't it? I tend to agree with @chepner):
str='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
#Remove [ ]
str=${str/[/}
str=${str/]/}
declare -a ids
declare -a names
declare -a properties
oldIFS="$IFS"
IFS=','
for record in $str
do
type=${record%%:*}
value=${record##*:}
if [[ $type == \"ids\" ]]
then
ids[ids_i++]="$value"
elif [[ $type == \"name\" ]]
then
names[names_i++]="$value"
elif [[ $type == \"hasproperty\" ]]
then
properties[properties_i++]="$value"
else
echo "Ignored type: '$type'" >&2
fi
done
IFS="$oldIFS"
echo "ids: ${ids[@]}"
echo "names: ${names[@]}"
echo "properties: ${properties[@]}"
The only thing going for it is that there are no child processes.
awk 'BEGIN {
# FS must be set before the first record is read
FS = ","
Field = 1
Index = 0
}
{
gsub( /[][]/,"")
gsub( /"[a-z]*":/, "")
while ( Field < NF) {
ThisID[ Index]=$Field
ThisName[ Index]=$(Field + 2)
ThisProperty [ Index]=$(Field + 3)
Index+=1
Field+=4
}
}
END {
for ( Iter=0;Iter<Index;Iter+=1) printf( "%s ", ThisID[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisName[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisProperty[Iter])
printf "\n"
}' YourFile
You still have to assign the arrays to your favorite variables.
unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
eval ${i%:*}[$n]=${i#*:}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)
The above will produce 4 arrays (ids, isvalid, name, hasproperty). If you do not need isvalid, just add a test:
unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
[ "${i%:*}" != "isvalid" ] && eval ${i/:/[$n]=}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)
Given your posted input, if all you wanted was the list of each type of item then this is all you'd need:
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^ids/{print $2}' file
2817262
2262
28182
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^name/{print $2}' file
somename
somename
somename
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^hasproperty/{print $2}' file
false
false
true
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^isvalid/{print $2}' file
true
false
true
but it's extremely unlikely that this is the right way to approach your problem. As I mentioned in a comment, edit your question to provide more information if you'd like some real help with it.

Extract a certain part of a string in bash with different patterns

I have this file:
CLUSTERS=SP1,SP2,SP3
FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"
FNAME_SP2="DUMP_LOG_SP2_<GTS>.dat"
FNAME_SP3="TEST_CASE_TABLE_SP3_<GTS>.dat"
What I want to get from these are:
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
I loop through the CLUSTERS field, get the values, and use it to find the appropriate FNAME_<CLUSTERNAME> value. Basically, the CLUSTERS value are ALWAYS before the _<GTS> part of the string. Any string pattern will do, provided that the CLUSTERS value come before the _<GTS> at the end of the string.
Any suggestions? Here's a part of the script.
function loadClusters() {
for i in `echo ${!CLUSTER*}`
do
CLUSTER=`echo ${i} | grep $1`
if [[ -n ${CLUSTER} ]]; then
CLUSTER=${!i}
break;
fi
done
echo -e ${CLUSTER}
}
function loadClustersCampaign() {
for i in `echo ${!BPOINTS*}`
do
BPOINTS=`echo ${i} | grep $1`
if [[ -n ${BPOINTS} ]]; then
BPOINTS=${!i}
break;
fi
done
for i in `echo ${!FNAME*}`
do
FNAME=`echo ${i} | grep $1`
if [[ -n ${FNAME} ]]; then
FNAME=${!i}
break;
fi
done
echo -e ${BPOINTS}"|"${FNAME}
}
#get clusters
clusters=$(loadClusters $1)
for i in `echo $clusters | sed 's/,/ /g'`
do
file=$(loadClustersCampaign ${i/-/_} | awk -F"|" '{print $2}') ;
echo $file;
#then get the part of the $file variable
done
Fun with Shell Parameter Expansions
You can use matching-prefix notation and indirect expansion to get at the variables you want, and use the "remove suffix" expansion on each result to collect just the portions of the filename that you want. For example:
FNAME_SP1='REWARDS_BTS_SP1_<GTS>.dat'
FNAME_SP2='DUMP_LOG_SP2_<GTS>.dat'
FNAME_SP3='TEST_CASE_TABLE_SP3_<GTS>.dat'
for cluster in "${!FNAME_SP@}"; do
echo ${!cluster%%<GTS>*}
done
This will print out the following:
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
but you could issue any valid shell command inside the loop instead of using echo.
See Also
http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
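If the lookup has to be driven by the CLUSTERS value, as the question describes, the same expansions can be combined roughly like this. It is only a sketch; the names cluster_names, var, value and prefixes are introduced here for illustration.

```bash
#!/usr/bin/env bash
# Values copied from the question's file.
CLUSTERS=SP1,SP2,SP3
FNAME_SP1='REWARDS_BTS_SP1_<GTS>.dat'
FNAME_SP2='DUMP_LOG_SP2_<GTS>.dat'
FNAME_SP3='TEST_CASE_TABLE_SP3_<GTS>.dat'

# Split the comma-separated CLUSTERS value into an array.
IFS=, read -ra cluster_names <<< "$CLUSTERS"

prefixes=()
for name in "${cluster_names[@]}"; do
  var="FNAME_$name"                 # name of the variable to look up
  value=${!var}                     # indirect expansion: value of that variable
  prefixes+=("${value%%<GTS>*}")    # drop <GTS> and everything after it
done
printf '%s\n' "${prefixes[@]}"
```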
If you would like an awk solution for this, the following may be useful.
> echo 'FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"' | awk -F"<GTS>" '{split($1,a,"=");print substr(a[2],2)}'
REWARDS_BTS_SP1_
Further detail below:
> cat temp
CLUSTERS=SP1,SP2,SP3
FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"
FNAME_SP2="DUMP_LOG_SP2_<GTS>.dat"
FNAME_SP3="TEST_CASE_TABLE_SP3_<GTS>.dat"
> awk -F"<GTS>" '/FNAME_SP/{split($1,a,"=");print substr(a[2],2)}' temp
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
>

How to verify information using standard linux/unix filters?

I have the following data in a Tab delimited file:
_ DATA _
Col1 Col2 Col3 Col4 Col5
blah1 blah2 blah3 4 someotherText
blahA blahZ blahJ 2 someotherText1
blahB blahT blahT 7 someotherText2
blahC blahQ blahL 10 someotherText3
I want to make sure that the data in 4th column of this file is always an integer. I know how to do this in perl
Read each line, Store value of 4th column in a variable
check if that variable is an integer
if above is true, continue the loop
else break out of the loop with message saying file data not correct
But how would I do this in a shell script using standard linux/unix filter? My guess would be to use grep, but I am not sure how?
cut -f4 data | LANG=C grep -q '[^0-9]' && echo invalid
LANG=C for speed
-q to quit at first error in possible long file
If you need to strip the first line then use tail -n+2 or you could get hacky and use:
cut -f4 data | LANG=C sed -n '1b;/[^0-9]/{s/.*/invalid/p;q}'
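A sketch of the tail -n +2 variant on a small sample, under the assumption that the file has one header row. The sample rows are invented, and the check is inverted (grep -v with an anchored pattern) so that an empty fourth column is also reported.

```bash
#!/usr/bin/env bash
# Hypothetical tab-delimited sample with a header row, modeled on the question.
tmp=$(mktemp)
printf 'Col1\tCol2\tCol3\tCol4\tCol5\n' > "$tmp"
printf 'blah1\tblah2\tblah3\t4\tsomeotherText\n' >> "$tmp"
printf 'blahA\tblahZ\tblahJ\tx2\tsomeotherText1\n' >> "$tmp"

# Skip the header, keep column 4, and look for any line that is NOT all digits.
if tail -n +2 "$tmp" | cut -f4 | LC_ALL=C grep -qvE '^[0-9]+$'; then
  status=invalid
else
  status=ok
fi
echo "$status"
rm -f "$tmp"
```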
awk is the tool most naturally suited for parsing by columns:
awk '{if ($4 !~ /^[0-9]+$/) { print "Error! Column 4 is not an integer:"; print $0; exit 1}}' data.txt
As you get more complex with your error detection, you'll probably want to put the awk script in a file and invoke it with awk -f verify.awk data.txt.
Edit: in the form you'd put into verify.awk:
{
if ($4 !~/^[0-9]+$/) {
print "Error! Column 4 is not an integer:"
print $0
exit 1
}
}
Note that I've made awk exit with a non-zero code, so that you can easily check it in your calling script with something like this in bash:
if awk -f verify.awk data.txt; then
# action for success
else
# action for failure
fi
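Putting the pieces together, here is a sketch of the exit-status check against two small files. It assumes tab-separated input with a header row, which is why it adds -F'\t' and NR > 1; the function name check and the sample rows are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sample files modeled on the question: one valid, one not.
good=$(mktemp)
bad=$(mktemp)
printf 'Col1\tCol2\tCol3\tCol4\tCol5\n' > "$good"
printf 'blah1\tblah2\tblah3\t4\tsomeotherText\n' >> "$good"
printf 'Col1\tCol2\tCol3\tCol4\tCol5\n' > "$bad"
printf 'blahA\tblahZ\tblahJ\ttwo\tsomeotherText1\n' >> "$bad"

# check FILE: exit status 0 if every data row has an integer in column 4.
check() {
  awk -F'\t' 'NR > 1 && $4 !~ /^[0-9]+$/ {
    print "line " NR ": column 4 is not an integer: " $4
    exit 1
  }' "$1"
}

if check "$good"; then good_status=ok; else good_status=bad; fi
if check "$bad"; then bad_status=ok; else bad_status=bad; fi
echo "good file: $good_status, bad file: $bad_status"
rm -f "$good" "$bad"
```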
You could use grep, but it doesn't inherently recognize columns. You'd be stuck writing patterns to match the columns.
awk is what you need.
I can't upvote yet, but I would upvote Jefromi's answer if I could.
Sometimes you need it BASH only, because tr, cut & awk behave differently on Linux/Solaris/Aix/BSD/etc:
while read -r a b c d e ; do [[ "$d" =~ ^[0-9]+$ ]] || echo "$a: $d not a number" ; done < data
Edited....
#!/bin/bash
# Succeeds (returns 0) only when $1 is a non-empty string of digits.
isdigit ()
{
[ $# -eq 1 ] || return 1
case $1 in
*[!0-9]*|"") return 1;;
*) return 0;;
esac
}
while read -r line
do
col=($line)
digit=${col[3]}
if isdigit "$digit"
then
echo "hey, we got a digit $digit"
else
echo "err, no digit $digit"
fi
done
Use this in a script foo.sh and run it like ./foo.sh < data.txt
See tldp.org for more info
Pure Bash:
linenum=1; while read line; do field=($line); if ((linenum>1)); then [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] && echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer"; fi; ((linenum++)); done < data.txt
To stop at the first error, add a break:
linenum=1; while read line; do field=($line); if ((linenum>1)); then [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] && echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer" && break; fi; ((linenum++)); done < data.txt
cut -f 4 filename
will return the fourth field of each line to stdout.
Hopefully that's a good start, because it's been a long time since I had to do any major shell scripting.
Mind, this may well not be the most efficient compared to iterating through the file with something like perl.
tail -n +2 x.x | sort -n -k 4 | head -1 | cut -f 4 | egrep "^[0-9]+$"
if [ "$?" == "0" ]
then
echo "file is ok";
fi
tail -n +2 gives you all but the first line (since your sample has a header)
sort -n -k 4 sorts the file numerically on the 4th column, letters will rise to the top.
head -1 gives you the first line of the sorted output
cut -f 4 gives you the 4th column of that first line
egrep "^[0-9]+$" checks if the value is a number (integers in this case).
If egrep finds nothing, $? is 1, otherwise it's 0.
There's also:
if [ `tail -n +2 x.x | wc -l` == `tail -n +2 x.x | cut -f 4 | egrep "^[0-9]+$" | wc -l` ]; then
echo "file is ok";
fi
This will be faster, requiring two simple scans through the file, but it's not a single pipeline.
@OP, use awk
awk '$4+0<=0{print "not ok";exit}' file

Resources