I have 4 awks that are really similiar so I want to put them in a function. My awk code is...
awk -v MYPATH="$MYPATH" -v FILE_EXT="$FILE_EXT" -v NAME_OF_FILE="$NAME_OF_FILE" -v DATE="$DATE" -v pattern="$STORED_PROCS_BEGIN" '
$0 ~ pattern {
rec = $1 OFS $2 OFS $4 OFS $7
for (i=9; i<=NF; i++) {
rec = rec OFS $i
if ($i ~ /\([01]\)/) {
break
}
}
print rec >> "'$MYPATH''$NAME_OF_FILE''$DATE'.'$FILE_EXT'"
}
' "$FILE_LOCATION"
So the pattern and regular expression differ. How can I put this awk in a function where I can replace pattern with $1 and /([01])/ with $2 if I already use those in my awk?
EDIT:
I was thinking I can do...
printFmt(){
awk -v .......
$0 ~ patten {
rec..
for..
rec..
if($i ~ search)
break
print rec
then call with printFmt set?
}
Not sure where the problem is since you already have in your code exactly what you need to do but maybe this will help by simplifying it a bit:
$ cat tst.sh
function prtStuff() {
awk -v x="$1" 'BEGIN{ print x }'
}
prtStuff "foo"
prtStuff "---"
prtStuff "bar"
$ ./tst.sh
foo
---
bar
Related
This question already has answers here:
How to redirect output to a file and stdout
(11 answers)
Closed 1 year ago.
I'm new to Linux and programming. My problem is the following: I have a file listing 3 columns. I want to swap the first and the last column, print it to prompt AND to a new file in one line. So I swapped the columns and printed it to prompt OR to a file.
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
This is my base line to print it to prompt. But it seems impossible to print it to a new file in the same command line.
Does anyone have the idea?
Here are some examples that didn't work:
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | >liste2.csv
$ printf "$(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)"
$ cat $(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)
I think you catch the drift of what I ask.
Thanks!
Use tee command as mentioned in How to redirect output to a file and stdout
Or, redirect it within awk itself by adding print > "liste2.csv" in addition to the existing print for displaying on stdout
print it to prompt AND to a new file in one line
This sound like task for tee. Assuming
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
does produce correct output to standard output, this
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | tee liste2.csv
should write to liste2.csv and standard output
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected but I am basicaly travesing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this everyday, so, I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, no even errors. Which sort of tell me that my code is simply not matching anything or my print / pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
$ awk -f myfilter.awk dataset.csv
I have a file like this :
"A";"1"
"A";""
"A";""
"B";"1"
"C";"1"
"C";""
"C";""
When I have the same pattern between first part of current line and previous line, I want increment the second part of my line.
like this :
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
or if second part is empty I take the previous line and I increment it.
Do you have any idea how I can do this with a shell script or maybe with awk or sed command?
With perl:
$ perl -F';' -lane 'if ($F[1] =~ /"(\d+)"/) { $saved = $1; } else { $saved++; $F[1] = qq/"$saved"/; }
print join(";", #F)' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
With awk:
$ awk -F';' -v OFS=';' '
$2 ~ /"[0-9]+"/ { saved = substr($2, 2, length($2) - 2) }
$2 == "\"\"" { $2 = "\"" ++saved "\"" }
{ print }' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
I got some problem with this basic data:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65
68;
;0
;9
;25
;25
70;
that I'd like to transform on this kind of output:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65;x
68;
;0
;9
;25
;25;x
70;
The "x" value comes if on the next line $1 exists or if $2 is null. From my understanding, I've to use getline but I don't get the way!
I've tried the following code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
print $0;
getline;
if($2="") {print $0";x"}
else {print $0}
}' $file2 > $file3
Seemed easy. I don't mention the result, totally different from my expectation.
Some clue? Is getline necessary on this problem?
OK, I continue to test some code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
getline var
if (var ~ /.*;$/) {
print $0";x";
print var;
}
else {
print $0;
print var;
}
}' $file2 > $file3
It's quite better, but still, all lines that should be marked aren't... I don't get why...
alternative one pass version
$ awk -F\; 'NR>1 {printf "%s\n", (f && $2<0?"x":"")}
{f=$1<0; printf "%s", $0}
END {print ""}' file
give this one-liner a try:
awk -F';' 'NR==FNR{if($1>0||!$2)a[NR-1];next}FNR in a{$0=$0";x"}7' file file
or
awk -F';' 'NR==FNR{if($1~/\S/||$2).....
I need to find in file word that matches regex pattern.
So if in line, i have:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
Print only:
register GW_USER=BB
I wonna get:
register GW_USER=BB GW_CHANNEL=AA
How to print GW_USER and GW_CHANNEL columns?
Your if condition isn't looking right, you can use regex alternation:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " and " " in print as that is default field separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
Will match FW_USER against $i but will match GW_CHANNEL in whole line.
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
str = substr($0,RSTART+1,RLENGTH-1)
split(str,arr,/[ ,=]+/)
delete n2v
for (i=1; i in arr; i+=2) {
n2v[arr[i]] = arr[i+1]
}
print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
that way you trivially print or do anything else you want with any other field in future.