AWK - using element on next record GETLINE? - bash

I have a problem with this basic data:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65
68;
;0
;9
;25
;25
70;
that I'd like to transform into this kind of output:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65;x
68;
;0
;9
;25
;25;x
70;
The "x" value is appended when, on the next line, $1 exists or $2 is null. From my understanding, I have to use getline, but I can't figure out how!
I've tried the following code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
print $0;
getline;
if($2="") {print $0";x"}
else {print $0}
}' $file2 > $file3
It seemed easy. I won't dwell on the result; it was totally different from what I expected.
Any clue? Is getline even necessary for this problem?
OK, I continued testing some code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
getline var
if (var ~ /.*;$/) {
print $0";x";
print var;
}
else {
print $0;
print var;
}
}' $file2 > $file3
It's somewhat better, but still, not all the lines that should be marked are... I don't get why...

alternative one pass version
$ awk -F\; 'NR>1 {printf "%s\n", (f && $1!="" ? ";x" : "")}
{f=($1==""); printf "%s", $0}
END {print ""}' file
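The same one-pass idea can also be written by buffering the previous line; this is a sketch of the same logic (sample data recreated inline), not one of the original answers:

```shell
# Sample input from the question
cat > file <<'EOF'
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65
68;
;0
;9
;25
;25
70;
EOF
# Hold each line until we've seen the next one; append ";x" when a detail
# line (empty $1) is followed by a line that starts a new group ($1 set).
awk -F';' '
  NR > 1 { print prev ((pEmpty && $1 != "") ? ";x" : "") }
  { prev = $0; pEmpty = ($1 == "") }
  END { print prev }
' file
```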

give this one-liner a try:
awk -F';' 'NR==FNR{if(p&&$1!="")a[FNR-1];p=($1=="");next}FNR in a{$0=$0";x"}7' file file
or
awk -F';' 'NR==FNR{if($1~/\S/||$2).....


Is there a way to double print a file in bash? [duplicate]

This question already has answers here:
How to redirect output to a file and stdout
(11 answers)
Closed 1 year ago.
I'm new to Linux and programming. My problem is the following: I have a file with 3 columns. I want to swap the first and the last column and print the result to the terminal AND to a new file, in one command. So far I can swap the columns and print to the terminal OR to a file.
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
This is my baseline for printing to the terminal, but it seems impossible to also write to a new file in the same command line.
Does anyone have an idea?
Here are some examples that didn't work:
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | >liste2.csv
$ printf "$(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)"
$ cat $(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)
I think you get the drift of what I'm asking.
Thanks!
Use the tee command, as mentioned in How to redirect output to a file and stdout
Or, redirect it within awk itself by adding print > "liste2.csv" in addition to the existing print for displaying on stdout
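A minimal sketch of that second option, using a hypothetical two-line liste.csv (the liste2.csv name is from the question):

```shell
# Hypothetical 3-column sample
printf 'a,b,c\nd,e,f\n' > liste.csv
# Each record goes to stdout (plain print) and to liste2.csv (print >)
awk -F, 'BEGIN { OFS = "," } { t = $1; $1 = $3; $3 = t; print; print > "liste2.csv" }' liste.csv
```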
print it to prompt AND to a new file in one line
This sounds like a task for tee. Assuming
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
does produce correct output to standard output, this
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | tee liste2.csv
should write to liste2.csv and to standard output.

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5 GB and I have 30 action_type categories. I need to do this every day, so I need to script this to run on its own, efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. This sort of tells me that my code is simply not matching anything, or that my print / redirect statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
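For a quick sanity check, the portable variant can be exercised on the sample rows from the question:

```shell
# Recreate the sample input, then split it: one output file per value of
# column 2, each file starting with the header line.
printf 'action,action_type,Result\nup,1,stringA\ndown,1,strinB\nleft,2,stringC\n' > dataset.csv
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
```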
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple of ways to address this (note that the output file names must also be quoted; unquoted, awk tries to parse 1_dataset.csv as an expression):
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv

File into table awk

I am trying to build a table by reading a file.
Here is an example of the file I am trying to process:
FHEAD|1|PRMPC|20200216020532|1037|S
TMBPE|2|MOD
TPDTL|3|72810|1995019|11049-|11049-|Dcto 20|0|5226468|20200216000001|20200222235959|2||1||||
TPGRP|4|5403307
TGLIST|5|5031472|1|||
TLITM|6|101055590
TPDSC|7|0|||-20||2|1|
TPGRP|8|5403308
TGLIST|9|5031473|0|||
TPDTL|13|10728|1995021|11049-|11049-|Dcto 30|0|5226469|20200216000001|20200222235959|2||1||||
TPGRP|14|5403310
TGLIST|15|5031475|1|||
TLITM|16|210000041
TLITM|17|101004522
TPDSC|113|0|||-30||2|1|
TPGRP|114|5403309
TGLIST|115|5031474|0|||
TLITM|116|101047933
TLITM|117|101004681
TLITM|118|101028161
TPDSC|119|0|||-25||2|1|
TPISR|214|101004225|2350|EA|20200216000000|COP|
TTAIL|1135
FTAIL|1136|1134
I tried to develop the code below, but it returns all the values on one line:
for filename in "$input"*.dat;
do
echo "$filename">>"$files"
a=`awk -F'|' '$1=="FHEAD" && $5!=""{print $5}' "$filename"`
b=`awk -F'|' '$1=="TPDTL" && $3!=""{print $3}' "$filename"`
c=`awk -F'|' '$1=="TPDTL" && $4!=""{print $4}' "$filename"`
d=`awk -F'|' '$1=="TPDTL" && $10!=""{print $10}' "$filename"`
e=`awk -F'|' '$1=="TPDTL" && $11!=""{print $11}' "$filename"`
f=`awk -F'|' '$1=="TPDSC" && $6!=""{print $6}' "$filename"`
g=`awk -F'|' '$1=="TLITM" && $3!=""{print $3}' "$filename"`
For example:
echo -e ${d}
20200216000001 20200216000001
I wanted something like the picture.
Can someone help me?
Thanks in advance.
Assuming:
The keywords like FHEAD, TPDTL, etc. do not appear with uniform frequency; the latest value is used when needed.
The number of rows should be equal to the count of TLITM.
The table rows are printed out whenever TPDSC appears.
then would you please try the following:
awk 'BEGIN {FS = "|"; OFS = ","}
$1 ~ /FHEAD/ {a = $5}
$1 ~ /TPDTL/ {b = $3; c = $4; d = $10; e = $11}
$1 ~ /TLITM/ {f[++tlitm_count] = $3}
$1 ~ /TPDSC/ {g = $6;
for (i=1; i<=tlitm_count; i++) {
print a, b, c, d, e, f[i], g
}
tlitm_count = 0;
}
' *.dat
Output:
1037,72810,1995019,20200216000001,20200222235959,101055590,-20
1037,10728,1995021,20200216000001,20200222235959,210000041,-30
1037,10728,1995021,20200216000001,20200222235959,101004522,-30
1037,10728,1995021,20200216000001,20200222235959,101047933,-25
1037,10728,1995021,20200216000001,20200222235959,101004681,-25
1037,10728,1995021,20200216000001,20200222235959,101028161,-25
If you want the output delimiter to be a whitespace, please modify the value of OFS.
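As an aside, the one-line result from echo -e ${d} in the question comes from the shell, not awk: an unquoted expansion word-splits the captured multi-line string, collapsing the newlines into spaces. A minimal illustration with hypothetical values:

```shell
# Hypothetical two-line capture; unquoted expansion joins it on one line.
d=$(printf '20200216000001\n20200222235959\n')
echo ${d}     # unquoted: newline collapses to a space
echo "${d}"   # quoted: the newline is preserved
```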

find unique lines based on one field only [duplicate]

I would like to print unique lines based on the first field, keeping the first occurrence of each line and removing the other, duplicate occurrences.
Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
Desired Output:
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
I have tried the command below, but it's incomplete:
awk 'BEGIN { FS = OFS = "," } { !seen[$1]++ } END { for ( i in seen) print $0}' Input.csv
Looking for your suggestions ...
You put your test for "seen" in the action part of the script instead of the condition part. Change it to:
awk -F, '!seen[$1]++' Input.csv
Yes, that's the whole script:
$ cat Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
$
$ awk -F, '!seen[$1]++' Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
This should give you what you want:
awk -F, '{ if (!($1 in a)) a[$1] = $0; } END '{ for (i in a) print a[i]}' input.csv
typo there in syntax.
awk '{ if (!($1 in a)) a[$1] = $0; } END { for (i in a) print a[i]}'
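One caveat: for (i in a) visits keys in an unspecified order, so the END-loop variant may reorder the lines. The !seen[$1]++ one-liner above preserves input order, because it prints each line the first time its key is seen:

```shell
# Order-preserving dedup on field 1: a line is printed only the first
# time its $1 is seen, so 10,z is dropped but 10,x keeps its position.
printf '10,x\n20,y\n10,z\n' | awk -F, '!seen[$1]++'
```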

Putting awk in a function

I have 4 awks that are really similar, so I want to put them in a function. My awk code is...
awk -v MYPATH="$MYPATH" -v FILE_EXT="$FILE_EXT" -v NAME_OF_FILE="$NAME_OF_FILE" -v DATE="$DATE" -v pattern="$STORED_PROCS_BEGIN" '
$0 ~ pattern {
rec = $1 OFS $2 OFS $4 OFS $7
for (i=9; i<=NF; i++) {
rec = rec OFS $i
if ($i ~ /\([01]\)/) {
break
}
}
print rec >> "'$MYPATH''$NAME_OF_FILE''$DATE'.'$FILE_EXT'"
}
' "$FILE_LOCATION"
So only the pattern and the regular expression differ. How can I put this awk into a function where the pattern is replaced by $1 and /\([01]\)/ by $2, given that I already use $1 and $2 inside my awk?
EDIT:
I was thinking I can do...
printFmt(){
awk -v .......
$0 ~ pattern {
rec..
for..
rec..
if($i ~ search)
break
print rec
}
then call with printFmt set?
Not sure where the problem is since you already have in your code exactly what you need to do but maybe this will help by simplifying it a bit:
$ cat tst.sh
function prtStuff() {
awk -v x="$1" 'BEGIN{ print x }'
}
prtStuff "foo"
prtStuff "---"
prtStuff "bar"
$ ./tst.sh
foo
---
bar
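Applying that shape to the original code, the varying parts can be passed as function arguments and handed to awk with -v. A sketch with simplified field handling; printFmt, pattern, and search are illustrative names, not an existing API:

```shell
# $1 = line-selection pattern, $2 = stop regex; input is read from stdin.
printFmt() {
  awk -v pattern="$1" -v search="$2" '
    $0 ~ pattern {
      rec = $1 OFS $2
      for (i = 3; i <= NF; i++) {
        rec = rec OFS $i
        if ($i ~ search) break   # stop copying fields at the stop regex
      }
      print rec
    }'
}
printf 'BEGIN a b (1) c\nSKIP x y\n' | printFmt '^BEGIN' '\\([01]\\)'
```

Each call site would then look like printFmt "$STORED_PROCS_BEGIN" '\\([01]\\)' < "$FILE_LOCATION" >> "$MYPATH$NAME_OF_FILE$DATE.$FILE_EXT", keeping the output redirection in the shell instead of building the file name inside awk.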
