I have a text file and its contents are like below,
foo.txt
firefox:
Installed: 24.0+build1-0ubuntu1
Candidate: 24.0+build1-0ubuntu1
I want to print the first field of the first line if the values of Installed and Candidate are not the same. If they are the same, empty output is enough.
I tried,
cat foo.txt | awk '$1~/^Installed:/ {var=$2;next} $1~/^Candidate:/ {var1=$2;next} NR==1 {pkg=$1} {if(var != var1) { print pkg;} }'
But it displays nothing. It would be better if you provide an awk solution.
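You were close. The comparison has to go in an END block so that it runs after both values have been read. In your version the final block only ever ran on line 1, when var and var1 were both still empty, because the Installed and Candidate rules end with next: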
$ cat foo.txt
firefox:
Installed: 24.0+build1-0foo
Candidate: 24.0+build1-0bar
$ awk '$1~/^Installed:/ {var=$2;next} $1~/^Candidate:/ {var1=$2;next} NR==1 {pkg=$1} END {if(var != var1) { print pkg;} }' foo.txt
firefox:
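If the file ever lists more than one package, the same idea can be applied per stanza instead of once in END. This is only a sketch under assumptions: it presumes every stanza has exactly the three-line layout shown in foo.txt, and the vim stanza and its version strings are made up for illustration.

```shell
# Sample input: two package stanzas in the same layout as foo.txt.
# The vim stanza is hypothetical test data, not from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
firefox:
Installed: 24.0+build1-0foo
Candidate: 24.0+build1-0bar
vim:
Installed: 7.4-1
Candidate: 7.4-1
EOF

# A line with a single field is taken to be a package name.
# The comparison runs as soon as the Candidate line of a stanza is read.
out=$(awk '
NF == 1            { pkg = $1; next }
$1 == "Installed:" { inst = $2; next }
$1 == "Candidate:" { if (inst != $2) print pkg }
' "$tmp")
echo "$out"
rm -f "$tmp"
```

Only firefox: is printed, because the two vim versions are identical.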
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. Which sort of tells me that my code is simply not matching anything or my print / pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
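To see the single-pass split end to end, here is a small self-contained run of the "any awk" variant on the sample rows. It is a sketch, not the only way to do it: the scratch directory and the dir variable are assumptions added so that the demo does not write into the current directory, and the header file is closed right after it is created so that > and >> are never in use on the same open file.

```shell
# Scratch directory for the demo (hypothetical, not part of the question).
dir=$(mktemp -d)
cat > "$dir/dataset.csv" <<'EOF'
action,action_type,Result
up,1,stringA
down,1,stringB
left,2,stringC
EOF

# One pass over the input: remember the header, then append each row to the
# file named after its action_type. The header is written (truncating any
# old file) the first time a given action_type is seen.
awk -F',' -v dir="$dir" '
NR == 1     { hdr = $0; next }
            { out = dir "/" $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out; close(out) }
            { print >> out; close(out) }
' "$dir/dataset.csv"

first=$(cat "$dir/1_dataset.csv")
second=$(cat "$dir/2_dataset.csv")
printf '%s\n--\n%s\n' "$first" "$second"
rm -rf "$dir"
```

Because the first write to each file uses >, rerunning the script every day starts each output file from scratch. A version that only ever uses >> would keep appending to the files left over from the previous run.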
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple of ways to address this. Note that the output file names must also be quoted so that awk treats them as strings, as in the corrected script below:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
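The effect of the missing field separator is easy to check on a single row. The snippet below is just an illustration; the square brackets are only there to make an empty field visible.

```shell
line='up,1,stringA'

# Default FS splits on whitespace, so the whole row is $1 and $2 is empty.
default_fs=$(printf '%s\n' "$line" | awk '{ print "[" $2 "]" }')

# With FS set to a comma, $2 is the action_type column.
comma_fs=$(printf '%s\n' "$line" | awk -F, '{ print "[" $2 "]" }')

echo "default FS: $default_fs"
echo "comma FS:   $comma_fs"
```

With the default separator the test action_type=="1" can never be true, which is why the original script matched nothing.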
Input:
"prefix_foo,prefix_bar"
Expected Output:
foo
bar
This is what I've so far.
$ echo "PREFIX_foo,PREFIX_bar" | awk '/PREFIX_/{x=gsub("PREFIX_", ""); print $0 }'
foo,bar
I'm unable to figure out how to print foo and bar separated by a newline. Thanks in advance!
EDIT:
Length of input is unknown so there can be more than 2 words separated by comma.
This question is more towards learning awk language, not alternative gnu utils.
You may not need awk for this. Here is a pure bash solution:
s="prefix_foo,prefix_bar"
s="${s//prefix_/}"
s="${s//,/$'\n'}"
echo "$s"
foo
bar
Here is a GNU sed one-liner for the same:
sed 's/prefix_//g; s/,/\n/g' <<< "$s"
foo
bar
EDIT: 2nd solution. Adding a more generic solution as per the OP's comments. This checks every field and, if it contains the prefix, prints the part after the _.
echo "prefix_foo,etc,bla,prefix_bar" |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~/prefix/){
split($i,array,"_")
val=(val?val OFS:"")array[2]
}
}
if(val){
print val
}
val=""
}'
To print each output value on its own line, try:
echo "prefix_foo,etc,bla,prefix_bar" |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~/prefix/){
split($i,array,"_")
print array[2]
}
}
}
'
1st solution: For the simple case (specific to the shown samples), could you please try the following.
awk -F'[_,]' '/prefix_/{print $2,$4}' Input_file
OR
echo "prefix_foo,prefix_bar" | awk -F'[_,]' '/prefix_/{print $2,$4}'
Just trying out awk
echo "PREFIX_foo,PREFIX_bar" | awk -F, -v OFS="\n" '{gsub(/PREFIX_/,""); $1=$1}1'
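The one-liner above relies on $1=$1 to rebuild the record with the new OFS. The same result can be spelled out with an explicit loop, which may be easier to follow while learning awk. This is a sketch: it uses the lowercase prefix_ from the stated input, and the third item prefix_baz is made up to show that the number of fields does not matter.

```shell
out=$(echo "prefix_foo,prefix_bar,prefix_baz" |
awk -F, '{
    for (i = 1; i <= NF; i++) {
        sub(/^prefix_/, "", $i)   # drop the prefix from this field only
        print $i                  # one field per output line
    }
}')
echo "$out"
```

Anchoring the pattern with ^ means only a leading prefix_ is removed, so a field such as my_prefix_x would be left alone.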
Say I have this in file, (FIX Message)
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
I want to grep and print only the values after 11= and 17=, output like this.
ABC|CC
123456|EE
How do I achieve this?
Whenever there's name=value pairs in the input I find it useful for clarity, future enhancements, etc. to create a name2value array and then use that to print the values by name:
$ cat tst.awk
BEGIN { FS="[|=]"; OFS="|" }
{
delete n2v
for (i=1; i<=NF; i+=2) {
n2v[$i] = $(i+1)
}
print n2v[11], n2v[17]
}
$ awk -f tst.awk file
ABC|CC
123456|EE
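One benefit of the name2value array is that the list of wanted tags can come from outside the script. The following is a sketch of that idea; the tags variable and the want array are names introduced here for illustration, and split("", n2v) is used to empty the array in awks that do not support deleting a whole array.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
EOF

# tags is a comma-separated list of tag numbers to print, in output order.
out=$(awk -v tags="11,17" '
BEGIN { FS = "[|=]"; OFS = "|"; n = split(tags, want, ",") }
{
    split("", n2v)                    # empty the array for this record
    for (i = 1; i < NF; i += 2)       # odd fields are names, even are values
        n2v[$i] = $(i + 1)
    line = ""
    for (j = 1; j <= n; j++)
        line = line (j > 1 ? OFS : "") n2v[want[j]]
    print line
}' "$tmp")
echo "$out"
rm -f "$tmp"
```

Changing the output to, say, tags 35 and 44 then only needs a different -v tags value, not a different script.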
Through sed,
$ sed 's/.*\b11=\([^|]*\).*\b17=\([^|]*\).*/\1|\2/g' file
ABC|CC
123456|EE
Through grep and paste.
$ grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' file | paste -d'|' - -
ABC|CC
123456|EE
Here is another awk
awk -F"11=|17=" '{for (i=2;i<NF;i++) {split($i,a,"|");printf "%s|",a[1]}split($i,a,"|");print a[1]}' file
ABC|CC
123456|EE
I would like to extract text from a file using awk. Basically it works correctly, but I would like to make it dynamic by using a variable for the search pattern.
HOW IT SHOULD WORK:
File test_input contains (btw: extract from HP DP omnimm -show_locked_devs)
Type: Device
Name/Id: Drive1
Pid: 28405
Host: Host1
Type: Cartridge
Name/Id: Lib1
Pid: 28405
Host: Host1
Location: 47
...
get "Pid" number for Drive1 => the command finds the pattern (Drive1) and displays the next line from the file test_input (28405)
cat test_input | awk 'c&&!--c;/Drive1/{c=1}'| awk '{print $2}'
28405
get "Location" number => find all "Pid" numbers and display the next 2 lines (records) for each match, then use grep to filter "Location" from the output and display the 2nd field (47)
cat test_input | awk 'c&&!--c;/28405/{c=2; print $0}'| grep Location | awk '{print $2}'
47
I have noticed that double quotes around the AWK program let the SHELL expand variables, but when I use the SAME command in a script I get the error message "awk: The statement cannot be correctly parsed."
DRIVE=Drive1;cat test_input | awk "c&&!--c;/$DRIVE/{c=1}" | awk '{print $2}'
28405
If you have any hints on how to get SHELL variables to work, please let me know.
Also, I know that my commands and redirections are probably overcomplicated, but I am not a scripting master :)
If you just need environment variables, you can use awk's built-in ENVIRON array. If you want to pass values to awk, you can use the -v option.
An example for both:
cat >inputfile <<EOT
aaa
bbbxxx
xxxccc
ddd
EOT
VAR=xxx
awk -v VAR="$VAR" '$0~VAR {print ENVIRON["USER"]":"$0}' inputfile
I added the creation of the sample inputfile.
As far as I know, some awk versions require whitespace between -v and the variable name, so -v VAR="$VAR" is the safer spelling.
I would also suggest using ' instead of " around the whole awk script. It makes life a bit easier if you use awk a lot.
Output:
myuser:bbbxxx
myuser:xxxccc
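Applied to the question's data, the -v approach could look like the sketch below. Several things here are assumptions rather than part of the question: the blank line between the two blocks of test_input, the awk variable name drive, and the choice to compare the second field exactly instead of using a regex match as the original command did.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Type: Device
Name/Id: Drive1
Pid: 28405
Host: Host1

Type: Cartridge
Name/Id: Lib1
Pid: 28405
Host: Host1
Location: 47
EOF

DRIVE=Drive1
# The program stays in single quotes; the shell value arrives through -v.
out=$(awk -v drive="$DRIVE" '
c && !--c   { print $2 }   # the line right after a match: print its value
$2 == drive { c = 1 }      # exact match on the second field
' "$tmp")
echo "$out"
rm -f "$tmp"
```

This also folds the second awk '{print $2}' of the original pipeline into the same program.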
If I understood correctly, you need to collect the names of all devices and the locations from the non-"Device" blocks. I assume each block starts with the Type tag and that the tag order is always the same. If not, please let me know. Based on these assumptions my code looks like:
awk '$1=="Type:"{dev=$2=="Device"}
dev && $1=="Name/Id:"{name=$2}
dev && $1=="Pid:"{pids[name]=$2}
!dev && $1=="Pid:"{pid=$2}
!dev && $1=="Location:"{locs[pid]=$2}
END {
for(i in pids) {
pid = pids[i];
print i"\t"(pid in locs ? locs[pid] : "None");
}
}
' inputfile
It fills the pids and locs arrays, then prints every device name found in pids together with the location belonging to its pid (if found).
Output:
Drive1 47
Of course, if the location always comes after the device block, the line could be printed as soon as the location is found, so the END part could be dropped.
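A sketch of that streaming variant follows. It assumes the same block layout as before, that each Pid belongs to a single device, and that a blank line separates the blocks in the sample (the code itself does not depend on it). The names array is introduced here for illustration.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Type: Device
Name/Id: Drive1
Pid: 28405
Host: Host1

Type: Cartridge
Name/Id: Lib1
Pid: 28405
Host: Host1
Location: 47
EOF

out=$(awk '
$1 == "Type:"            { dev = ($2 == "Device") }
dev  && $1 == "Name/Id:" { name = $2 }
dev  && $1 == "Pid:"     { names[$2] = name }   # pid -> device name
!dev && $1 == "Pid:"     { pid = $2 }
# Print as soon as a location is seen for a pid that belongs to a device.
!dev && $1 == "Location:" && (pid in names) { print names[pid] "\t" $2 }
' "$tmp")
echo "$out"
rm -f "$tmp"
```

Unlike the END version, a device that never gets a location is simply not printed, so the "None" case disappears.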
It's not clear what you want but maybe this:
$ cat tst.awk
BEGIN{ RS=""; FS="[:[:space:]]+" }
{
for (i=1;i<=NF;i+=2)
name2val[$i] = $(i+1)
}
(name2val[key] == val) && (tgt in name2val) {
print name2val[tgt]
}
$
$ awk -v key="Name/Id" -v val="Drive1" -v tgt="Pid" -f tst.awk file
28405
$
$ awk -v key="Pid" -v val="28405" -v tgt="Location" -f tst.awk file
47
I'm trying to use sed to remove the last occurrence of } from a file. So far I have this:
sed -i 's/\(.*\)}/\1/' file
But this removes as many } as there are on the end of the file. So if my file looks like this:
foo
bar
}
}
}
that command will remove all 3 of the } characters. How can I limit this to just the last occurrence?
Someone gave me this as a solution:
sed -i '1h;1!H;$!d;g;s/\(.*\)}/\1/' file
I'm just not sure it's as good as the above awk solution.
sed is an excellent tool for simple substitutions on a single line. For anything else, just use awk, e.g. with GNU awk for gensub() and multi-char RS:
$ cat file1
foo
bar
}
}
}
$
$ cat file2
foo
bar
}}}
$
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/\n?}([^}]*)$/,"\\1","")}1' file1
foo
bar
}
}
$
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/\n?}([^}]*)$/,"\\1","")}1' file2
foo
bar
}}
$
Note that the above removes the last } char AND a preceding newline if present, as I THINK that's probably what you actually want. If you want to remove ONLY the } and leave a trailing newline in those cases (as I think all of the currently posted sed solutions do), just drop \n? from the matching RE:
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/}([^}]*)$/,"\\1","")}1' file1
foo
bar
}
}
$
And if you want to change the original file without manually specifying a tmp file, just use the -i inplace argument:
$ gawk -i inplace -v RS='^$' -v ORS= '{$0=gensub(/}([^}]*)$/,"\\1","")}1' file1
$ cat file1
foo
bar
}
}
$
With a buffer you can modify the file directly:
awk 'BEGIN{file=ARGV[1]}{a[NR]=$0}/}/{skip=NR}END{for(i=1;i<=NR;++i)if(i!=skip)print a[i]>file}' file
Thanks to #jthill for the remark about the one-line file issue.
sed ':a
$ !{N
ba
}
$ s/}\([^}]*\)$/\1/' YourFile
Need to load the file in buffer first. This does not remove the new line if } is alone on a line
When I read "do something with the last ...", I think "reverse the file, do something with the first ..., re-reverse the file"
tac file | awk '!seen && /}/ {$0 = gensub(/(.*)}/, "\\1", 1); seen = 1} 1' | tac
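For awks without gensub(), the same effect can be had by buffering the lines and cutting the final } out with substr(). This is a sketch under the assumption that the file fits in memory; it writes to standard output instead of editing in place, and the line array and pos variable are names chosen here.

```shell
tmp=$(mktemp)
printf 'foo\nbar\n}}}\n' > "$tmp"

out=$(awk '
{ line[NR] = $0 }
index($0, "}") { last = NR }          # remember the last line containing }
END {
    if (last) {
        s = line[last]
        pos = length(s)
        while (substr(s, pos, 1) != "}") pos--    # position of the final }
        line[last] = substr(s, 1, pos - 1) substr(s, pos + 1)
    }
    for (i = 1; i <= NR; i++) print line[i]
}' "$tmp")
echo "$out"
rm -f "$tmp"
```

Like the sed solutions, this removes only the character itself, so a } that sits alone on a line leaves an empty line behind.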