Display input file without print in awk - bash

My code is in the middle of processing two input files.
awk -F'|' -v PARM_VAL="${PARM_VALUE[*]}" '
BEGIN { split(PARM_VAL,pa," ") }
FNR==NR
{
for(i=1;i<=NF;i++)
a[NR,i]=$i;
}
END {printf "second value of SPPIN : "a[2,2]", parm : "pa[2]", File val : " FILENAME "First rec of SPPOUT: " $0 ;printf "\n" } ' SPP_IN SPP_OUT
I am passing the parm array to awk and storing the first input file in an array. When I run the above command,
my first input file is displayed even though I never call print. Is there any way to suppress or avoid it?

Don't put FNR == NR and the opening { of its action on separate lines.
FNR == NR
{
Put them on the same line instead.
FNR == NR {
awk is seeing FNR==NR as a pattern without an action and using the default action of print, so every record of the first file (where FNR==NR is true) gets printed.
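A minimal sketch of the difference, using two small made-up files (in1 and in2 are hypothetical, not the asker's data). The first program splits the pattern from its action, so awk prints the first file and then runs the block for every record of both files; the second keeps pattern and brace on one line:

```shell
tmp=$(mktemp -d)
printf 'a|b\nc|d\n' > "$tmp/in1"   # hypothetical first input file
printf 'x|y\n'      > "$tmp/in2"   # hypothetical second input file

# Pattern and action on separate lines: FNR==NR becomes a pattern-only rule
# (default action: print), and the { ... } block runs for every record.
broken=$(awk -F'|' 'FNR==NR
{ n++ }
END { print "records counted:", n }' "$tmp/in1" "$tmp/in2")

# Pattern and opening brace on the same line: the block runs only for the
# first file and nothing is printed implicitly.
fixed=$(awk -F'|' 'FNR==NR { n++ }
END { print "records counted:", n }' "$tmp/in1" "$tmp/in2")

printf '%s\n---\n%s\n' "$broken" "$fixed"
rm -rf "$tmp"
```

Besides the unwanted output, the broken form also counts the records of the second file, which is usually not what was intended either.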

Related

Merge rows with same value and every 100 lines in csv file using command

I have a csv file like below:
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
...
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
...
I want to combine the csv file into a new csv file like below:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
http://www.z.com/4
...
http://www.z.com/100
",flower
"http://www.z.com/101
http://www.z.com/102
http://www.z.com/103
http://www.z.com/104
...
http://www.z.com/200
",flower
I want each cell in the first column to hold at most 100 lines of URLs.
The matching second-column value should appear next to each cell.
Is there a simple command to achieve this?
I tried the command below:
awk '{if(NR%100!=0)ORS="\t";else ORS="\n"}1' test.csv > result.csv
$ awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
{print $1; n+=1} END {print "\"," p}' test.csv
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
",flower
First set the field separator to the comma (-F,). Then:
If the second field changes ($2!=p) or if we already printed 100 lines in the current batch (n==100):
if it is not the first line, print a double quote, a comma, the previous second field and a newline,
print a double quote,
store the new second field in variable p for later comparisons,
reset line counter n.
For all lines print the first field and increment line counter n.
At the end print a double quote, a comma and the last value of second field.
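To see the batching step in action without 100 rows of input, here is a sketch of the same program with the batch size passed in as a variable. The name max and the shortened sample values are assumptions made for illustration only:

```shell
result=$(printf '%s\n' 'u1,apple' 'u2,apple' 'u3,apple' 'v1,flower' |
awk -F, -v max=2 '
$2 != p || n == max {              # new group, or the current batch is full
    if (NR != 1) print "\"," p     # close the previous cell
    printf "\""                    # open a new cell
    p = $2; n = 0
}
{ print $1; n++ }                  # every line: emit the URL, count it
END { print "\"," p }              # close the last cell
')
printf '%s\n' "$result"
```

With max=2 the three apple rows are split into a cell of two and a cell of one, each followed by ",apple.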
1st solution: With your shown samples, please try following awk code.
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
' Input_file
2nd solution: In case your Input_file is NOT sorted with 2nd column then try following sort + awk code.
sort -t, -k2 Input_file |
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
'
Output will be as follows:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3",flower
Given:
cat file
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
Here is a two pass awk to do this:
awk -F, 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
If you want to print either at the change of the $2 value or at some fixed line interval (like 100) you can do:
awk -F, -v n=100 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR || FNR%n==0{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
Either prints:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4"
,apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3"
,flower

How to assign awk result variable to an array and is it possible to use awk inside another awk in loop

I've started to learn bash and totally stuck with the task. I have a comma separated csv file with records like:
id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",, e.t.c.
I need to format it this way: name and surname must start with a capital letter
add an email record that consists of the first letter of the name and full surname in lowercase
create a new csv with records from the old csv with corrected fields.
I split csv on records using awk ( cause some fields contain fields with a comma between quotes "department1 department2, department3" ).
#!/bin/bash
input="$HOME/test.csv"
exec 0<$input
while read line; do
awk -v FPAT='"[^"]*"|[^,]*' '{
...
}' $input)
done
inside awk {...} (NF=8 for each record), I tried to use certain field values ($1 $2 $3 $4 $5 $6 $7 $8):
#it doesn't work
IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv
# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ?
# as an example:
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
$5="${name_surname[0]}' '${name_surname[1]}"
email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'
$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv
how to add field values ($1 $2 $3 $4 $5 $6 $7 $8) to array and call function join for each for loop iteration to add record to new csv file?
function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv
This may be what you're trying to do (using gawk for FPAT as you already were doing) but without more representative sample input and the expected output it's a guess:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
OFS = ","
FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
n = split($5,name,/\s*/)
$7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
print
}
' "${@:--}"
$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,
I put the awk script inside a shell script since that looks like what you want, obviously you don't need to do that you could just save the awk script in a file and invoke it with awk -f.
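The question also asked for the name and surname to start with a capital letter, which the script above leaves untouched. A sketch of how that might be done in awk follows; cap() is a helper name introduced here, and the input is reduced to the name field alone, so it would still need to be wired into the $5 handling of the real script:

```shell
result=$(printf '%s\n' 'Name surname' 'name Surname' |
awk '
# cap() is a hypothetical helper: upper-case first letter, lower-case the rest
function cap(w) { return toupper(substr(w, 1, 1)) tolower(substr(w, 2)) }
{
    n = split($0, name, / +/)
    full = ""
    for (i = 1; i <= n; i++)
        full = full (i > 1 ? " " : "") cap(name[i])
    # corrected name, then the derived email address
    print full "," tolower(substr(name[1], 1, 1) name[n]) "@example.com"
}')
printf '%s\n' "$result"
```

Both sample names come out as Name Surname with the same derived address.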
The answer by Ed Morton works completely.
In case it helps someone, I added one more check: if the CSV file contains more than one email address with the same name, an index number is appended to the email local part. The output is also sent to a file.
#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<$input
awk '
BEGIN {
OFS = ","
FPAT = "[^"OFS"]*|\"[^\"]*\""
}
(NR == 1) {print} #header of csv
(NR > 1) {
if (length($0) > 1) { #exclude empty lines
count = 0
n = split($5,name,/\s*/)
email_local_part = tolower(substr(name[1],1,1) name[n])
#array stores emails from csv file
a[i++] = email_local_part
#find amount of occurrences of the same email address
for (el in a) {
ret=match(a[el], email_local_part)
if (ret == 1) { count++ }
}
#add number of occurrence to email address
if (count == 1) { $7 = email_local_part "@abc.com" }
else { --count; $7 = email_local_part count "@abc.com" }
print
}
}
' "${@:--}" > new.csv
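The loop above rescans every stored address for each record, and match() treats its second argument as a regular expression anchored only by the ret == 1 test, so it may also count longer addresses that merely start with the same text. A possible alternative, sketched here on bare local parts rather than the full CSV, is to count exact values in an associative array (seen is a name chosen for this example):

```shell
result=$(printf '%s\n' 'nsurname' 'nsurname' 'jdoe' 'nsurname' |
awk '
{
    c = ++seen[$1]      # how many times this exact local part has occurred
    # first occurrence: no suffix; later ones: 1, 2, ... as in the script above
    print $1 (c > 1 ? c - 1 : "") "@abc.com"
}')
printf '%s\n' "$result"
```

The numbering matches the script above: the second nsurname becomes nsurname1, the third nsurname2.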

Copy one csv header to another csv with type modification

I want to copy the header of one csv file to another, one field per row, with some modifications.
Input csv
name,"Mobile Number","mobile1,mobile2",email2,Address,email21
test, 123456789,+123456767676,a@test.com,testaddr,a1@test.com
test1,7867778,8799787899898,b@test,com, test2addr,b2@test.com
The new csv should look like this, and the output file should be created. For string columns I will pass the column name, so only that column is converted to string.
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
As you can see above, each header field with its type modification should be written on a separate row.
I have tried the command below, but it only copies the first row:
sed '1!d' input.csv > output.csv
You may try this alternative gnu awk command as well:
awk -v FPAT='"[^"]+"|[^,]+' 'NR == 1 {
for (i=1; i<=NF; ++i)
print gensub(/"/, "", "g", $i) "." ($i ~ /,/ ? "string" : "auto") "()"
exit
}' file
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
Or using sed:
sed -i -e '1i 1234567890.string(),My address is test.auto(),abc3@gmail.com.auto(),120000003.auto(),abc-003.auto(),3.com.auto()' -e '1d' test.csv
EDIT: As per OP's comment to print only first line(header) please try following.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
exit
}
' Input_file > output_file
Could you please try the following, written and tested with GNU awk with the shown samples.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
next
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk -v FPAT='[^,]*|"[^"]+"' ' ##Starting awk program and setting FPAT to [^,]*|"[^"]+".
FNR==1{ ##Checking condition if this is first line then do following.
for(i=1;i<=NF;i++){ ##Running for loop from i=1 to till NF value.
if($i~/^".*,.*"$/){ ##Checking condition if current field starts from " and ends with " and having comma in between its value then do following.
gsub(/"/,"",$i) ##Substitute all occurrences of " with NULL in current field.
print $i".string()" ##Printing current field and .string() here.
}
else{ ##else do following.
print $i".auto()" ##Printing current field dot auto() string here.
}
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.

Editing text in Bash

I am trying to edit text in Bash. I got to a point where I can no longer continue and I need help.
The text I need to edit:
Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
AMGN
Amgen Inc
Medical
132,594,808
227.76
AXP
American Express Company
Finance
91,986,280
114.24
BA
Boeing Company
Aerospace
114,768,960
203.30
The text I need:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
I already tried :
sed 's/$/,/' BIPSukol.txt > BIPSukol1.txt | awk 'NR==1{print}' BIPSukol1.txt | awk '(NR-1)%5{printf "%s ", $0;next;}1' BIPSukol1.txt | sed 's/.$//'
But it doesn't quite do the job.
(BIPSukol1.txt is the name of the file I am editing)
The biggest problem you have is you do not have consistent delimiters between your fields. Some have commas, some don't and some are just a combination of 3-fields that happen to run together.
The tool you want is awk. It will allow you to treat the first line differently and then condition the output that follows with convenient counters you keep within the script. In awk you write rules (what comes between the outer {...}) and awk applies your rules in the order they are written. This allows you to "fix up" your haphazard format and arrive at the desired output.
The first rule, FNR==1, is applied to the 1st line. It loops over the fields, finds the problematic "Market Cap, $K" heading and treats it as one field, skipping beyond it to output the remaining headings. It stores n = NF - 3 (the header has 8 whitespace-separated words, but you only have 5 lines of data for each Symbol), zeros count, and skips to the next record.
When count==n the next rule is triggered, which just outputs the records stored in the a[] array, zeros count and deletes the a[] array for refilling.
The next rule is applied to every record (line) of input from the 2nd on. It simply trims leading and trailing whitespace from the record by forcing awk to rebuild it with $1 = $1 (which also squeezes runs of whitespace between fields to a single space) and then stores the record in the array, incrementing count.
The last rule, END, is a special rule that runs after all records are processed (it lets you sum final tallies or output final lines of data). Here it is used to output the records that remain in a[] when the end of the file is reached.
Putting it all together in another cut at awk:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
for (i=1;i<=n;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
delete a
count = 0
}
{
$1 = $1
a[++count] = $0
}
END {
for (i=1;i<=count;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
}
' file
Example Use/Output
Note: you can simply select-copy the script above and then middle-mouse-paste it into an xterm with the directory set so it contains file (you will need to rename file to whatever your input filename is)
$ awk '
> FNR==1 {
> for (i=1;i<=NF;i++)
> if ($i == "Market") {
> printf ",Market Cap $K"
> i = i + 2
> }
> else
> printf (i>1?",%s":"%s"), $i
> print ""
> n = NF-3
> count = 0
> next
> }
> count==n {
> for (i=1;i<=n;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> delete a
> count = 0
> }
> {
> $1 = $1
> a[++count] = $0
> }
> END {
> for (i=1;i<=count;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> }
> ' file
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
(note: it is unclear why you want the "Links" heading included since there is no information for that field -- but that is how your desired output is specified)
More Efficient No Array
You always have afterthoughts that creep in after you post an answer, no different than remembering a better way to answer a question as you are walking out of an exam, or thinking about the one additional question you wished you would have asked after you excuse a witness or rest your case at trial. (there was some song that captured it -- a little bit ironic :)
The following does essentially the same thing, but without using arrays. Instead it simply outputs the information after formatting it rather than buffer it in an array for output all at once. It was one of those type afterthoughts:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
print ""
count = 0
}
{
$1 = $1
printf (++count>1?",%s":"%s"), $0
}
END { print "" }
' file
(same output)
With your shown samples, could you please try following(written and tested in GNU awk). Considering that(by seeing OP's attempts) after header of Input_file you want to make every 5 lines into a single line.
awk '
BEGIN{
OFS=","
}
FNR==1{
NF--
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
OR if your awk doesn't support NF-- then try following.
awk '
BEGIN{
OFS=","
}
FNR==1{
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +Links( +)?$/,"",lastPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
NOTE: Looks like your header/first line needed special manipulation because we can't simply set , for all spaces, so taken care of it in this solution as per shown samples.
With GNU awk. If your first line is always the same.
echo 'Symbol,Name,Sector,Market Cap $K,Last,Links'
awk 'NR>1 && NF=5' RS='\n ' ORS='\n' FS='\n' OFS=',' file
Output:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
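If the data lines have no leading blank to use as a record separator, the same five-lines-per-row grouping can be driven by the record number instead. This is only a sketch under that assumption; the variable n and the shortened sample input are introduced here, and the header is assumed to be printed separately as in the answer above:

```shell
result=$(printf '%s\n' 'HEADER' \
    'AAPL' 'Apple Inc' 'Computers and Technology' '2,006,722,560' '118.03' \
    'BA' 'Boeing Company' 'Aerospace' '114,768,960' '203.30' |
awk -v n=5 '
NR == 1 { next }     # skip the header line, it is handled separately
# after the header, every n-th line ends a row; the others end with a comma
{ printf "%s%s", $0, ((NR - 1) % n ? "," : "\n") }
')
printf '%s\n' "$result"
```

This uses only standard awk features, so it should not depend on GNU awk.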

awk get the nextline

I'm trying to use awk to format a file that contains multiple lines.
Contents of the file:
ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15
Desired result:
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
I am using the code below :
awk -F ";" '{
ligne= ligne $0
ma_var = $1
{getline
if($1 != ma_var){
ligne= ligne "\n" $0
}
else {
ligne= ligne";"NF
}
}
}
END {
print ligne
} ' ${FILE_IN} > ${FILE_OUT}
The objective is to compare the first column of the next line to the first column of the current line; if it matches, add the last column of the next line to the current line and delete the next line, else print the next line.
Kind regards,
As with life, it's a lot easier to make decisions based on what has happened (the previous line) than on what will happen (the next line). Restate your requirements as "the objective is to compare the first column of the current line to the first column of the previous line; if it matches, add the last column of the current line to the previous line and delete the current line, else print the current line", and the code to implement it becomes relatively straightforward:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 == p1 { prev = prev OFS $NF; next }
{ if (NR>1) print prev; prev=$0; p1=$1 }
END { print prev }
$ awk -f tst.awk file
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
If you're ever tempted to use getline again, be sure you fully understand everything discussed at http://awk.freeshell.org/AllAboutGetline before making a decision.
I would take a slightly different approach than Ed:
$ awk '$1 == p { printf ";%s", $NF; next } NR > 1 { print "" } {p=$1;
printf "%s" , $0} END{print ""}' FS=\; input
At each line, check if the first column matches the previous. If it does, just print the last field. If it doesn't, print the whole line with no trailing newline.
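The rules described above can be tried on a shortened copy of the sample data. The following is the same one-liner spread over several lines with comments, fed from printf so that it is self-contained:

```shell
result=$(printf '%s\n' 'ABC;0;1' 'ABC;0;0;10' 'ABC;0;2' 'EFG;0;1;15' \
    'KLM;4;18;12' 'KLM;6;18;1200' |
awk -F';' '
$1 == p { printf ";%s", $NF; next }   # same key: append only the last field
NR > 1  { print "" }                  # new key: end the previous output line
        { p = $1; printf "%s", $0 }   # start a new output line, no newline yet
END     { print "" }                  # end the final output line
')
printf '%s\n' "$result"
```

Like the other answer, this relies on lines with the same first column being adjacent in the input.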
