TL;DR - I have a bash variable ($temp) that holds what looks like a printf format specifier, and I need to use it with awk's printf.
So by doing this:
awk '-v foo="$temp" {....{printf foo} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Bash should see this:
awk '{.....{printf "%-5s %-6s %...\n", $1, $2, $3....} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Sample Input:
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0 834.92
.
.
.
Expected Output:
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0646sggg 834.92
.
.
.
Long Version
I am new to scripting and after over 5 hours of scouring the internet and doing what I believe is a patchwork of information, I have hit a brick wall.
Scenario:
I have multiple arbitrary tables in a directory that I need to open. I do not know anything about a given table, except that I need to format all data that is on line 4 and all lines after line 14 of the file.
I need to build a custom printf command in awk on the fly so that each column is padded by the same amount (say 5 SPACES) and the table looks pretty once I open it up.
This is what I have come up with so far:
awk '{
for (i=1;i<=NF;i++)
{
max_length=length($i);
if ( max_length > linesize[i] )
{
linesize[i]=max_length+5;
}
}}
END{
for (i = 1; i<=length(linesize); i++)
{
print linesize[i] >> "tempfile1.txt"
}
}' file1.txt
# remove all blank lines in tempfile1.txt
awk 'NF' tempfile1.txt > tmp && mv tmp tempfile1.txt
# Get number of entries in tempfile1.txt
n=`awk 'END {print NR}' tempfile1.txt`
# This for loop generates the pattern I need for the printf command
declare -a a
for((i=0;i<$n;i++))
do
a[i]=`awk -v i=$((i+1)) 'FNR == i {print}' tempfile1.txt`
temp+=%-${a[i]}s' '
temp2+='$'$((i+1))', '
#echo "${a[$i]}";
#echo "$sub"
done
temp='"'${temp::-2}'\n", '
# echo "$temp"
temp=$temp${temp2::-2}''
# echo "$temp"
awk <something here>
# Tried the one below and it gave an error
awk -v tem="$temp" '{printf {tem}}
So ideally what I would like is the awk command is to look like this by simply putting the bash variable temp in the awk command.
So by doing this:
awk '-v foo="$temp" {if(FNR >=14 || FNR == 4) {printf foo} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
Bash should see this:
awk '{if(FNR >=14 || FNR == 4) {printf "%-5s %-6s %...\n", $1, $2, $3....} else {print $_}}' tempfile1.txt > tmp && mv tmp tempfile1.txt
It sounds like this MIGHT be what you want but it's still hard to tell from your question:
$ cat tst.awk
BEGIN { OFS=" " }
NR==FNR {
for (i=1;i<=NF;i++) {
width[i] = (width[i] > length($i) ? width[i] : length($i))
}
next
}
{
for (i=1;i<=NF;i++) {
printf "%-*s%s", width[i], $i, (i<NF?OFS:ORS)
}
}
$ awk -f tst.awk file file
col1 col2 col3
aourwyo5637[dfs] 67tyd 8746.0000
tomsd - 4
o938743[34* 0 834.92
I ran it against the sample input from your question after removing all the spurious .s.
# Tried the one below and it gave an error
awk -v tem="$temp" '{printf {tem}}
' at end of line is missing
{tem} is wrong; just write tem
printf's , expr-list is missing
\n is missing
Corrected:
awk -v tem="$temp" "{printf tem\"\n\", $temp2 0}"
or
awk "{printf \"$temp\n\", $temp2 0}"
(simpler).
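A different way to get a per-column format, sketched here without splicing any shell text into the awk program: pass only the widths with -v and let awk assemble each "%-Ns" piece itself. The widths and the sample row below are made up for illustration; in the question they would come from tempfile1.txt.

```shell
# Hypothetical column widths; in the question they would be read from tempfile1.txt.
widths="8 6 10"

# Sample row standing in for one line of the table.
row="tomsd - 4"

formatted=$(printf '%s\n' "$row" | awk -v widths="$widths" '
    BEGIN { n = split(widths, w, " ") }
    {
        line = ""
        for (i = 1; i <= NF; i++) {
            # Build "%-8s", "%-6s", ... on the fly; fall back to no padding
            # when a column has no recorded width.
            fmt = (i <= n) ? ("%-" w[i] "s") : "%s"
            line = line sprintf(fmt, $i) (i < NF ? " " : "")
        }
        print line
    }
')
# Brackets make the trailing padding visible.
printf '[%s]\n' "$formatted"
```

Because the widths travel as plain data, the awk program stays in single quotes and the number of columns does not have to be known in advance.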
Input csv - new_param.csv
value like -
ID
Identity
as-uid
cp_cus_id
evs
k_n
master.csv has value like -
A, xyz, id, abc
n, xyz, as-uid, abc, B, xyz, ne, abc
q, xyz, id evs, abc
3, xyz, k_n, abc, C, xyz, ad, abc
1, xyz, zd, abc
z, xyz, ID, abc
Require Output Updated new_param.csv - true or false in 2nd column
ID,true
Identity,false
as-uid,true
cp_cus_id,false
evs,true
k_n,true
tried below code no output -
#!/bin/bash
declare -a keywords=(`cat new_param.csv`)
length=${#keywords[@]}
for (( j=0; j<length; j++ ));
do
a= LC_ALL=C awk -v kw="${keywords[$j]}" -F, '{for (i=1;i<=NF;i++) if ($i ~ kw) {print i}}' master.csv
b=0
if [ $a -gt $b ]
then
echo true $2 >> new_param.csv
else
echo false $2 >> new_param.csv
fi
done
Please help someone !
I tried the code above, but it does not work.
I get errors like:
test.sh: line 29: [: -gt: unary operator expected
test.sh: line 33: -f2: command not found
awk -v RS=', |\n' 'NR == FNR { a[$0] = 1; next }
{ gsub(/,.*/, ""); b = "" b $0 (a[$0] ? ",true" : ",false") "\n" }
END { if (FILENAME == "new_param.csv") printf "%s", b > FILENAME }' master.csv new_param.csv
Try this Shellcheck-clean pure Bash code:
#! /bin/bash -p
outputs=()
while read -r kw; do
if grep -q -E "(^|[[:space:],])$kw([[:space:],]|\$)" master.csv; then
outputs+=( "$kw,true" )
else
outputs+=( "$kw,false" )
fi
done <new_param.csv
printf '%s\n' "${outputs[@]}" >new_param.csv
You may need to tweak the regular expression used with grep -E depending on what exactly you want to count as a match.
Using grep to find exact word matches:
$ grep -owf new_param.csv master.csv | sort -u
ID
as-uid
evs
k_n
Then feed this to awk to match against new_param.csv entries:
awk '
BEGIN { OFS="," }
FNR==NR { a[$1]; next }
{ print $1, ($1 in a) ? "true" : "false" }
' <(grep -owf new_param.csv master.csv | sort -u) new_param.csv
This generates:
ID,true
Identity,false
as-uid,true
cp_cus_id,false
evs,true
k_n,true
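Once the results are confirmed as correct, OP can send the output to a temporary file and move it over new_param.csv. Redirecting straight to new_param.csv would truncate the file before awk has read it. E.g.:
awk 'BEGIN { OFS="," } FNR==NR ....' <(grep -owf ...) new_param.csv > tmp && mv tmp new_param.csv
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^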
Alternative awk option:
Use ", " (a comma followed by a space) as the field separator and concatenate the 3rd field of each record of master.csv into the variable m. Then read each record from new_param.csv and use the index function to determine whether that record occurs anywhere in the string m. Note that index does a substring search, so a keyword that is only part of a longer 3rd field also counts as a match.
awk -F", " '
FNR==NR{m=m$3}
FNR<NR{print $0 (index(m,$0) ? ",true" : ",false")}
' master.csv new_param.csv
Output:
ID,true
Identity,false
as-uid,true
cp_cus_id,false
evs,true
k_n,true
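If whole-token matches are wanted rather than substring matches, one possible variation is to store every token of master.csv as an array key and test membership with "in". This is only a sketch: it assumes tokens are separated by commas and/or spaces, that keywords contain neither, and that master.csv is not empty. The scratch directory and the temporary file name new_param.tmp are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' ID Identity as-uid cp_cus_id evs k_n > new_param.csv
cat > master.csv <<'EOF'
A, xyz, id, abc
n, xyz, as-uid, abc, B, xyz, ne, abc
q, xyz, id evs, abc
3, xyz, k_n, abc, C, xyz, ad, abc
1, xyz, zd, abc
z, xyz, ID, abc
EOF

# First file: remember every token. Second file: look each keyword up.
awk -F'[ ,]+' '
    FNR == NR { for (i = 1; i <= NF; i++) seen[$i]; next }
    { print $0 "," (($0 in seen) ? "true" : "false") }
' master.csv new_param.csv > new_param.tmp && mv new_param.tmp new_param.csv

cat new_param.csv
```

Writing to a temporary file first avoids truncating new_param.csv while awk still needs to read it.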
I am trying to use awk to extract data using a conditional statement containing an array created using another awk script.
The awk script I use for creating the array is as follows:
array=($(awk 'NR>1 { print $1 }' < file.tsv))
Then, to use this array in the other awk script
awk var="${array[@]}" 'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1" && heading[i] in var){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt
However, when I run this, the following error occurs.
awk: fatal: cannot open file 'foo' for reading (No such file or directory)
I've already looked at multiple posts on why this error occurs and on how to correctly implement a shell variable in awk, but none of these have worked so far. However, when removing the shell variable and running the script it does work.
awk 'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1"){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt
I really need that conditional statement but don't know what I am doing wrong with implementing the bash variable in awk and would appreciate some help.
Thx in advance.
That specific error message occurs because you forgot -v in front of var= (it should be awk -v var=, not just awk var=), but as others have pointed out, you can't set an array variable on the awk command line. Also note that array in your code is a shell array, not an awk array, and shell and awk are 2 completely different tools, each with its own syntax, semantics, scopes, etc.
Here's how to really do what you're trying to do:
array=( "$(awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv)" )
awk -v xyz="${array[*]}" '
BEGIN{ split(xyz,tmp,RS); for (i in tmp) var[tmp[i]] }
... now use `var` as you were trying to ...
'
For example:
$ cat file.tsv
col1 col2
a b c d e
f g h i j
$ cat -T file.tsv
col1^Icol2
a b^Ic d e
f g h^Ii j
$ awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv
a b
f g h
$ array=( "$(awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv)" )
$ awk -v xyz="${array[*]}" '
BEGIN {
split(xyz,tmp,RS)
for (i in tmp) {
var[tmp[i]]
}
for (idx in var) {
print "<" idx ">"
}
}
'
<f g h>
<a b>
It's easier and more efficient to process both files in a single awk:
edit: fixed issues in comment, thanks @EdMorton
awk '
FNR == NR {
if ( FNR > 1 )
var[$1]
next
}
FNR == 1 {
for (i = 1; i <= NF; i++)
heading[i] = $i
next
}
{
for (i = 2; i <= NF; i++)
if ( $i == "1" && heading[i] in var) {
outFile = heading[i] ".txt"
print ">kmer" (NR-1) "\n" $1 >> (outFile)
close(outFile)
}
}
' file.tsv input.txt
You might store the values as a string in a variable, then use the split function to turn that string into an array. Consider the following simple example. Let file1.txt content be
A B C
D E F
G H I
and file2.txt content be
1
3
2
then
var1=$(awk '{print $1}' file1.txt)
awk -v var1="$var1" 'BEGIN{split(var1,arr)}{print "First column value in line number",$1,"is",arr[$1]}' file2.txt
gives output
First column value in line number 1 is A
First column value in line number 3 is G
First column value in line number 2 is D
Explanation: I store the output of the 1st awk command in a shell variable, which is then used as the 1st argument to the split function in the 2nd awk command. Disclaimer: this solution assumes all files involved use a delimiter compliant with default GNU AWK behavior, i.e. one or more whitespace characters always act as the delimiter.
(tested in gawk 4.2.1)
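A variation on the same idea, offered as a sketch: hand the list to awk through the environment and read it from ENVIRON, which leaves backslashes in the values untouched (awk -v interprets escape sequences). The variable name FIRST_COLS and the scratch directory are made up for this example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'A B C' 'D E F' 'G H I' > "$workdir/file1.txt"
printf '%s\n' 1 3 2 > "$workdir/file2.txt"

# Newline-separated first column of file1.txt, as in the answer above.
var1=$(awk '{ print $1 }' "$workdir/file1.txt")

# Pass the list through the environment instead of -v, so awk does not
# interpret backslash escapes in the values.
result=$(FIRST_COLS="$var1" awk '
    BEGIN { n = split(ENVIRON["FIRST_COLS"], arr, "\n") }
    { print "First column value in line number", $1, "is", arr[$1] }
' "$workdir/file2.txt")
printf '%s\n' "$result"
```

Splitting on "\n" explicitly also keeps values that contain spaces in one piece, which the default whitespace split would not.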
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. That tells me that my code is simply not matching anything, or that my print / redirection statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
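As currently coded, the input field separator has not been defined, so awk splits on whitespace and $2 never equals "1" or "2". Separately, the output file names should be quoted strings ("1_dataset.csv"); otherwise awk does not treat them as literal file names.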
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
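A small end-to-end check of this kind of fix can be sketched as follows. It assumes a throwaway scratch directory and uses the sample rows from the question; both the field separator and the quoted output file names are set inside the script.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows from the question (comma separated, no spaces).
cat > dataset.csv <<'EOF'
action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC
EOF

# Filter script with the separator defined and the file names quoted.
cat > myfilter.awk <<'EOF'
BEGIN { FS = "," }
{
    action_type = $2
    if (action_type == "1")      print $0 >> "1_dataset.csv"
    else if (action_type == "2") print $0 >> "2_dataset.csv"
}
EOF

awk -f myfilter.awk dataset.csv
cat 1_dataset.csv
cat 2_dataset.csv
```

The header line matches neither branch, so it is not copied into either output file.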
I need to find the count of employees whose salary is less than average salary of all employees.
The file with the employee details will be given as a command line argument when your script will run
example->
Input: File:
empid;empname;salary
100;A;30000
102;B;45000
103;C;15000
104;D;40000
Output:
2
my solution->
f=`awk -v s=0 'BEGIN{FS=":"}{if(NR>1){s+=$3;row++}}END{print s/row}' $file`;
awk -v a="$f" 'BEGIN{FS=":"}{if(NR!=1 && $3<a)c++}END{print c}' $file;
This is what i have tried so far
but output comes out to be
0
This one-liner should solve the problem:
awk -F';' 'NR>1{e[$1]=$3;s+=$3}
END{avg=s/(NR-1);for(x in e)if(e[x]<avg)c++;print c}' file
If you run it with your example file, it prints:
2
explanation:
NR>1 : skip the header
e[$1]=$3;s+=$3 : build a hash table keyed by empid and sum the salaries
END{avg=s/(NR-1); : calculate the average
for(x in e)if(e[x]<avg)c++;print c : go through the hash table, count the elements whose value is < avg, and print the count.
Could you please try the following:
awk '
BEGIN{
FS=";"
}
FNR==NR{
if(FNR>1)
{
total+=$NF
count++
}
next
}
FNR==1{
avg=total/count
}
avg>$NF
' Input_file Input_file
Your script is fine except it's setting FS=":"; it should be setting FS=";" since that is what is separating your fields in the input.
avg=$(awk -F";" 'NR>1 { s+=$3;i++} END { print s/i }' f)
awk -v avg=$avg -F";" 'NR>1 && $3<avg' f
1) Ignore header and compute Average, avg
2) Ignore header and if salary is less than avg print
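The two commands above print the matching rows, while the question asks for their number. A possible single-command variant that prints the count is sketched below; it reads the file twice, and the scratch file emp.csv is just the sample data from the question.

```shell
workdir=$(mktemp -d)
file="$workdir/emp.csv"
cat > "$file" <<'EOF'
empid;empname;salary
100;A;30000
102;B;45000
103;C;15000
104;D;40000
EOF

# Pass 1 (NR == FNR): sum the salaries. Pass 2: count rows below the average.
count=$(awk -F';' '
    FNR == 1   { next }                      # skip the header in both passes
    NR == FNR  { sum += $3; n++; next }
    $3 < (sum / n) { c++ }
    END { print c + 0 }                      # prints 0 instead of an empty line
' "$file" "$file")
echo "$count"
```

With the sample data the average is 32500, so the two salaries 30000 and 15000 are counted.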
file=$1
salary=`sed "s/;/ /" $file | sed "s/;/ /" | awk '{print $3}' | tail -n+2`
sum=0
n=0
for line in $salary
do
((sum+=line))
((n++))
done
avg=$((sum / n))
count=0
for line in $salary
do
if [ $line -lt $avg ]
then
((count++))
fi
done
echo "No. of Emp : $count"
I have a little script to compare some columns inside a bunch of CSV files.
It's working fine, but there are some things that are bugging me.
Here is the code:
FILES=./*
for f in $FILES
do
cat -v $f | sed "s/\^A/,/g" > op_tmp.csv
awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' op_tmp.csv >> output.csv
rm op_tmp.csv
done
Just to explain:
I get all files in the directory, then I use cat -v and sed to replace the ^A separator with a comma.
Then I use the awk one-liner to compare the columns I need and print the result to output.csv.
But now i want to print the filename before every loop.
I tried using the cat sed and awk in the same line and printing the $FILENAME, but it doesn't work:
cat -v $f | sed "s/\^A/,/g" | awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' > output.csv
Can anyone help?
You can rewrite the whole script better, but assuming it does what you want for now just add
echo "$f" >> output.csv
before awk call.
If you want to add filename in every awk output line, you have to pass it as an argument, i.e.
awk ... -v fname="$f" '{...; print fname... etc
A rewrite:
for f in ./*; do
awk -F '\x01' -v OFS="|" '
BEGIN {
letter[1]="A"; letter[2]="C"; letter[3]="R"; letter[4]="P"; letter[5]="T"
letters["A"] = letters["C"] = letters["R"] = letters["P"] = letters["T"] = 1
}
NR == 1 {next}
$9 in letters {
count[$9,$8] += $7
seen[$8]
}
END {
print FILENAME
for (i in seen) {
sum = 0
for (j=1; j<=4; j++) {
print i, letter[j], count[letter[j],i]
sum += count[letter[j],i]
}
print i, "T", count["T",i], (count["T",i] == sum ? "ERROR" : "MATCHED")
}
}
' "$f"
done > output.csv
Notes:
your method of iterating over files will break as soon as you have a filename with a space in it
try to reduce duplication as much as possible.
newlines are free, use them to improve readability
improve your variable names i, n, etc -- here "letter" and "letters" could use improvement to hold some meaning about those symbols.
awk has a FILENAME variable (here's the actual answer to your question)
awk understands \x01 to be a Ctrl-A -- I assume that's the field separator in your input files
define an Output Field Separator that you'll actually use
If you have GNU awk (version ???) you can use the ENDFILE block and do away with the shell for loop altogether:
gawk -F '\x01' -v OFS="|" '
BEGIN {...}
FNR == 1 {next}
$9 in letters {...}
ENDFILE {
print FILENAME
for ...
# clean up the counters for the next file
delete count
delete seen
}
' ./* > output.csv
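Without GNU awk, the end of each input file can be approximated by noticing when FNR resets to 1. The following is a simplified sketch of that pattern, not the full report from the answer: it only totals the second column per key, and the file names, the data and the function name end_of_file are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny Ctrl-A separated files with a header line each (made-up data).
sep=$(printf '\001')
printf '%s\n' "key${sep}amount" "x${sep}3" "x${sep}4" > a.dat
printf '%s\n' "key${sep}amount" "y${sep}10" > b.dat

report=$(awk -F "$sep" -v OFS='|' '
    # Print the totals gathered for the file that just ended, then reset.
    function end_of_file(name,    k) {
        print name
        for (k in total) print k, total[k]
        split("", total)
    }
    FNR == 1 { if (NR > 1) end_of_file(prev); prev = FILENAME; next }
    { total[$1] += $2 }
    END { if (NR > 0) end_of_file(prev) }
' a.dat b.dat)
printf '%s\n' "$report"
```

The FNR == 1 rule both skips each header line and flushes the previous file's totals, and the END rule flushes the last file.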