I'm new to Linux and programming. My problem is the following: I have a file with 3 columns. I want to swap the first and the last column, print it to prompt AND to a new file in one line. So far I have managed to swap the columns and print the result to the prompt OR to a file, but not both.
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
This is my baseline command, which prints to the prompt. But it seems impossible to also write the output to a new file in the same command line.
Does anyone have an idea?
Here are some examples that didn't work:
$ awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | >liste2.csv
$ printf "$(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)"
$ cat $(sudo awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv > liste2.csv)
I think you get the drift of what I'm asking.
Thanks!
Use the tee command, as mentioned in How to redirect output to a file and stdout.
Or redirect it within awk itself by adding print > "liste2.csv" in addition to the existing print that displays on stdout.
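A minimal sketch of the second suggestion, using a made-up two-line liste.csv (the sample data is an assumption, not the asker's file). The first print goes to stdout as before; the second print is redirected to the file from inside awk.

```shell
# Made-up sample input, only for illustration.
printf 'a,b,c\n1,2,3\n' > liste.csv

# The first print writes the swapped record to stdout,
# the second writes the same record to liste2.csv.
awk -F, '{ t = $1; $1 = $3; $3 = t; print; print > "liste2.csv" }' OFS=, liste.csv
```

Inside awk, > truncates the file only when it is first opened; later prints in the same run are added to it, so every record ends up in liste2.csv.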
print it to prompt AND to a new file in one line
This sounds like a task for tee. Assuming
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv
does produce correct output to standard output, this
awk -F, ' { t = $1; $1 = $3; $3 = t; print; } ' OFS=, liste.csv | tee liste2.csv
should write to liste2.csv and standard output
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type,Result
up,1,stringA
down,1,stringB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors, which sort of tells me that my code is simply not matching anything or that my print / redirect statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
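To check the portable variant on the sample rows from the question, here is a small sketch. It recreates dataset.csv with the three data rows shown in the question (an assumption: no spaces in the header, and stringB spelled out) and then runs the same awk program.

```shell
# Recreate the sample input from the question (illustrative data only).
printf 'action,action_type,Result\nup,1,stringA\ndown,1,stringB\nleft,2,stringC\n' > dataset.csv

awk -F',' '
NR==1 { hdr = $0; next }             # remember the header, do not route it
{ out = $2 "_dataset.csv" }          # output file name derived from column 2
!seen[$2]++ { print hdr > out }      # first row of a category: start the file with the header
{ print >> out; close(out) }         # append the row and release the file handle
' dataset.csv

cat 1_dataset.csv 2_dataset.csv
```

Closing the file after every row keeps the number of open files at one, which is what makes this work in awks that allow only a few open files; the price is one open/close per input line.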
As currently coded, the input field separator has not been defined, so awk splits each line on whitespace and $2 is never "1" or "2".
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
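There are a couple of ways to address this (in both cases the output file names in the script also need to be quoted strings, e.g. "1_dataset.csv"):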
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
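A quick way to see the effect of the missing field separator is to print the field count and $2 for one of the sample rows, once without and once with -F. This is only a sketch to illustrate the point; the brackets are there to make an empty $2 visible.

```shell
# Default FS (whitespace): the whole line is a single field, so $2 is empty.
printf 'up,1,stringA\n' | awk '{ print NF, "[" $2 "]" }'

# FS set to a comma: three fields, and $2 is the action_type.
printf 'up,1,stringA\n' | awk -F, '{ print NF, "[" $2 "]" }'
```

The first command prints 1 [] and the second prints 3 [1], which is why the comparisons against "1" and "2" never matched in the original script.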
I have a file like this:
"A";"1"
"A";""
"A";""
"B";"1"
"C";"1"
"C";""
"C";""
When the first part of the current line matches the first part of the previous line, I want to increment the second part of the line.
Like this:
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
Put differently: if the second part is empty, I take the value from the previous line and increment it.
Do you have any idea how I can do this with a shell script or maybe with awk or sed command?
With perl:
$ perl -F';' -lane 'if ($F[1] =~ /"(\d+)"/) { $saved = $1; } else { $saved++; $F[1] = qq/"$saved"/; }
print join(";", @F)' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
With awk:
$ awk -F';' -v OFS=';' '
$2 ~ /"[0-9]+"/ { saved = substr($2, 2, length($2) - 2) }
$2 == "\"\"" { $2 = "\"" ++saved "\"" }
{ print }' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
I have a problem with this basic data:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65
68;
;0
;9
;25
;25
70;
that I'd like to transform into this kind of output:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65;x
68;
;0
;9
;25
;25;x
70;
The "x" is appended if, on the next line, $1 exists or $2 is null. From my understanding, I have to use getline, but I can't work out how!
I've tried the following code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
print $0;
getline;
if($2="") {print $0";x"}
else {print $0}
}' $file2 > $file3
It seemed easy. I won't show the result; it was totally different from what I expected.
Any clue? Is getline necessary for this problem?
OK, I continue to test some code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
getline var
if (var ~ /.*;$/) {
print $0";x";
print var;
}
else {
print $0;
print var;
}
}' $file2 > $file3
It's somewhat better, but some of the lines that should be marked still aren't... I don't get why...
An alternative one-pass version:
$ awk -F\; 'NR>1 {printf "%s\n", (f && $2<0?"x":"")}
{f=$1<0; printf "%s", $0}
END {print ""}' file
Give this one-liner a try:
awk -F';' 'NR==FNR{if($1>0||!$2)a[NR-1];next}FNR in a{$0=$0";x"}7' file file
or
awk -F';' 'NR==FNR{if($1~/\S/||$2).....
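For comparison, a minimal one-pass sketch that needs neither getline nor reading the file twice: it holds each line back until the next one has been read, and appends ";x" to a held data line (empty $1) when the next line starts a new group (non-empty $1). The function name mark_last is made up for illustration, and the rule is my reading of the question, assuming $2 is empty exactly when $1 is set, as in the sample.

```shell
# mark_last FILE: hypothetical helper name, not from the question.
mark_last() {
    awk -F';' '
    NR > 1 {
        # The held line is a data row and the current line opens a new group.
        if (prev_is_data && $1 != "") prev = prev ";x"
        print prev
    }
    {
        prev = $0                              # hold the current line back
        prev_is_data = (NR > 1 && $1 == "")    # the header on line 1 is never marked
    }
    END { if (NR) print prev }                 # flush the last held line unchanged
    ' "$1"
}
```

Called as mark_last tmp.csv > fin.csv, this reproduces the expected output from the question for the sample data; a file that ends on a data row would leave that last row unmarked.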
I have 4 awks that are really similar, so I want to put them in a function. My awk code is...
awk -v MYPATH="$MYPATH" -v FILE_EXT="$FILE_EXT" -v NAME_OF_FILE="$NAME_OF_FILE" -v DATE="$DATE" -v pattern="$STORED_PROCS_BEGIN" '
$0 ~ pattern {
rec = $1 OFS $2 OFS $4 OFS $7
for (i=9; i<=NF; i++) {
rec = rec OFS $i
if ($i ~ /\([01]\)/) {
break
}
}
print rec >> "'$MYPATH''$NAME_OF_FILE''$DATE'.'$FILE_EXT'"
}
' "$FILE_LOCATION"
So the pattern and the regular expression differ. How can I put this awk in a function where I can replace pattern with $1 and /\([01]\)/ with $2 if I already use those in my awk?
EDIT:
I was thinking I can do...
printFmt(){
awk -v .......
$0 ~ pattern {
rec..
for..
rec..
if($i ~ search)
break
print rec
then call with printFmt set?
}
Not sure where the problem is since you already have in your code exactly what you need to do but maybe this will help by simplifying it a bit:
$ cat tst.sh
function prtStuff() {
awk -v x="$1" 'BEGIN{ print x }'
}
prtStuff "foo"
prtStuff "---"
prtStuff "bar"
$ ./tst.sh
foo
---
bar
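Applied to the code in the question, the same idea might look like the sketch below. The shell's $1, $2 and $3 are expanded outside the single-quoted awk program and handed over with -v, so they never clash with awk's own $1, $2 and so on. The parameter order (pattern, search regex, input file) is an assumption, and the output redirection from the question is left out to keep the sketch short.

```shell
# printFmt PATTERN SEARCH FILE  (parameter order is an assumption)
printFmt() {
    awk -v pattern="$1" -v search="$2" '
    $0 ~ pattern {
        rec = $1 OFS $2 OFS $4 OFS $7        # these are awk fields, not shell arguments
        for (i = 9; i <= NF; i++) {
            rec = rec OFS $i
            if ($i ~ search) break           # stop at the first field matching the regex
        }
        print rec
    }' "$3"
}
```

One caveat: awk processes backslash escapes in -v values, so a regex such as \([01]\) may not arrive intact. Writing it with bracket expressions, for example printFmt '^EXEC' '[(][01][)]' input.txt (both arguments here are placeholders), avoids the problem.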
I wrote some code and would now like to put it in a separate file, because it is a bit long to type, but I'm having trouble.
This is my code:
awk 'NF && $1!~/^#/ && $1!~/^@/' rmsd.xvg | awk '{for(i=1;i<=NF;i++) {sum[i] += $i; sumsq[i] += ($i)^2}}
END {for (i=2;i<=NF;i++) {
print "\n", sum[i]/NR, sqrt((sumsq[i]-sum[i]^2/NR)/NR)}}' | sort -u
How can this be done?
Create a file named script.awk, and put:
{ for(i=1;i<=NF;i++) {
sum[i] += $i; sumsq[i] += ($i)^2}
}
END {for (i=2;i<=NF;i++) {
print "\n", sum[i]/NR, sqrt((sumsq[i]-sum[i]^2/NR)/NR)
}
}
into it. Then use:
awk 'NF && $1!~/^#/ && $1!~/^@/' rmsd.xvg | awk -f script.awk | sort -u
But there's no need for two separate awk commands. Change the script to:
NF && !/^[#@]/ { n++; for(i=1;i<=NF;i++) {
sum[i] += $i; sumsq[i] += ($i)^2}
}
END {for (i=2;i<=NF;i++) {
print "\n", sum[i]/n, sqrt((sumsq[i]-sum[i]^2/n)/n)
}
}
Then:
awk -f script.awk rmsd.xvg | sort -u
You can create a shell script as
#!/bin/bash
awk 'NF && $1!~/^#/ && $1!~/^@/' rmsd.xvg | awk '{for(i=1;i<=NF;i++) {sum[i] += $i; sumsq[i] += ($i)^2}}
END {for (i=2;i<=NF;i++) {
print "\n", sum[i]/NR, sqrt((sumsq[i]-sum[i]^2/NR)/NR)}}' | sort -u
Execute the script as
$ bash fileName
Note that you have two awk commands. However, the first is just a filter and can trivially be combined with the second. The only issue is that instead of using NR in the END action, you'll need to keep count of how many records were acted upon by the first action. The two scripts combined, along with the adjustment for NR, would look like
NF && $1 !~ /^#/ && $1 !~ /^@/ {
for(i=1;i<=NF;i++) {
sum[i] += $i
sumsq[i] += ($i)^2
}
record_count++
}
END {
for (i=2;i<=NF;i++) {
print "\n", sum[i]/record_count, sqrt((sumsq[i]-sum[i]^2/record_count)/record_count)
}
}
I'm assuming that every line has the same number of fields; otherwise, the value of NF in the END action is just the value of NF on the last line, which may or may not have any meaning.
Once the above is saved in something like script.awk, run it with
awk -f script.awk rmsd.xvg | sort -u
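If the rows can have different numbers of fields, one possible variation (a sketch, not the code from the question) is to keep a separate count per column and remember the largest NF seen, so that the END block no longer depends on the last line. The function name col_stats and the "column mean stddev" output layout are made up for illustration; comment lines are assumed to start with # or @.

```shell
# col_stats FILE: prints "column mean stddev" for columns 2 and up.
col_stats() {
    awk '
    NF && $1 !~ /^[#@]/ {
        for (i = 2; i <= NF; i++) {
            sum[i] += $i
            sumsq[i] += $i * $i
            cnt[i]++                       # number of rows that actually have column i
        }
        if (NF > maxnf) maxnf = NF         # widest data row seen so far
    }
    END {
        for (i = 2; i <= maxnf; i++) {
            mean = sum[i] / cnt[i]
            var = sumsq[i] / cnt[i] - mean * mean   # population variance
            if (var < 0) var = 0                    # guard against rounding noise
            printf "%d %g %g\n", i, mean, sqrt(var)
        }
    }' "$1"
}
```

Run as col_stats rmsd.xvg. The variance here is algebraically the same as (sumsq - sum^2/N)/N in the scripts above, just computed per column with that column's own N.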