How can I modify a CSV file to add a new header using a bash command? - bash

Hi, I currently have a CSV file. I want to add a new header field named budget, with the value True for all records. Here is my CSV file:
id,address1,address2,address3,address4,addressprofile,administrator,averageclickthroughrate,contactnumber,contractid,country,createdby,createdon,currency,customercontactnumber,customerid,defaultlanguage,features,internal,inventories,lastupdated,lastupdatedby,logo,name,status,testmessagecontactlist,testmessagelimit,usedefaultclickthroughrate,zipcode
d4385ff7-247f-407a-97c6-366d8128c6c7,,,,,eb0137fc-b279-11e8-8753-570ce0b5ef9b,92059277-e2ad-4cf0-a941-0f0b52bf3421,40,,,,ab4e0287-6973-4eec-bd03-cf3669c535d0,2019-01-08 08:48:36.353+0000,,,,b04265e6-c114-470c-8bb0-d10879655ec9,[],True,"[bdf7fad0-b8cd-4a9a-9c9d-48261fd5e7c7, be25104b-90d1-4076-bb4b-44c756d06d20]",2019-04-05 09:38:15.322+0000,3363a3ad-f52a-4a8b-bc52-7a069bab31d9,,OTT,ACTIVE,ca6b6808-111c-49ac-90ac-44078e8e3db0,5,True,
This is the result I am expecting:
id,address1,address2,address3,address4,addressprofile,administrator,averageclickthroughrate,budget,contactnumber,contractid,country,createdby,createdon,currency,customercontactnumber,customerid,defaultlanguage,features,internal,inventories,lastupdated,lastupdatedby,logo,name,status,testmessagecontactlist,testmessagelimit,usedefaultclickthroughrate,zipcode
d4385ff7-247f-407a-97c6-366d8128c6c7,,,,,eb0137fc-b279-11e8-8753-570ce0b5ef9b,92059277-e2ad-4cf0-a941-0f0b52bf3421,40,,True,,,ab4e0287-6973-4eec-bd03-cf3669c535d0,2019-01-08 08:48:36.353+0000,,,,b04265e6-c114-470c-8bb0-d10879655ec9,[],True,"[bdf7fad0-b8cd-4a9a-9c9d-48261fd5e7c7, be25104b-90d1-4076-bb4b-44c756d06d20]",2019-04-05 09:38:15.322+0000,3363a3ad-f52a-4a8b-bc52-7a069bab31d9,,OTT,ACTIVE,ca6b6808-111c-49ac-90ac-44078e8e3db0,5,True,
How can I do this using shell scripting?
Thank you.

awk can easily do the job, similar to the approach in kvantour's link:
awk 'BEGIN{FS = OFS = ","} {$8 = $8 FS (NR == 1 ? "budget" : "True")} 1'
where FS is the input field separator, OFS the output field separator, and NR the current record (line) number. Note the value must be "True" (capitalized) to match the expected output.
Example: https://ideone.com/oRQqhi
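To update the file itself rather than print to stdout, a minimal sketch (assuming the input is named file.csv) writes to a temporary file and moves it back over the original:
awk 'BEGIN{FS = OFS = ","} {$8 = $8 FS (NR == 1 ? "budget" : "True")} 1' file.csv > file.csv.tmp && mv file.csv.tmp file.csv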

Related

How to process file content differently for each line using shell script?

I have a file which has this data -
view:
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
tables:
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
end:
I have to read the file and store the contents in different variables.
viewData=$(sed '/view/,/tables/!d;/tables/q' $file|sed '$d')
tableData=$(sed '/tables/,/end/!d;/end/q' $file|sed '$d')
echo $viewData
view:
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
echo $tableData
tables:
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
dataArray=("$viewData" "$tableData")
I need to use a for loop over dataArray so that I get all the components in 4 different variables.
Let's say for $viewData, the loop should be able to print like this -
objType=view
schema=schema1
view=view1
fileLoc=some-path/view1.sql
objType=view
schema=schema2
view=view2
fileLoc=some-path/view2.sql
I have tried the sed and cut commands, but they are not working properly. And I need to do this using shell script only.
Any help will be appreciated. Thanks!
Remark: if you added a space character between the : and the / in the input, you would be able to use YAML-aware tools to parse it robustly.
Given your sample input, you can use this awk script to generate the expected blocks:
awk '
match($0, /[^[:space:]]+:/) {                # any line that contains "key:"
    key = substr($0, RSTART, RLENGTH - 1)    # text of the key, colon stripped
    val = substr($0, RSTART + RLENGTH)       # everything after the colon
    if (i = index(key, ".")) {               # "schema.name" keys are entries
        print "objType=" type
        print "schema=" substr(key, 1, i - 1)
        print "view=" substr(key, i + 1)
        print "fileLoc=" val
        printf "%c", 10                      # emit a blank line between blocks
    } else                                   # keys without a dot ("view:", "tables:",
        type = key                           # "end:") just set the current block type
}
' data.txt
objType=view
schema=schema1
view=view1
fileLoc=/some-path/view1.sql
objType=view
schema=schema2
view=view2
fileLoc=/some-path/view2.sql
objType=tables
schema=schema1
view=table1
fileLoc=/some-path/table1.sql
objType=tables
schema=schema2
view=table2
fileLoc=/some-path/table2.sql
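If you then need those components in shell variables, one sketch (assuming the awk script above is saved as gen.awk; the echo inside the case is only illustrative) reads each key=value pair in a while loop:
while IFS== read -r key value; do
    [ -z "$key" ] && continue      # skip the blank line between blocks
    case $key in
        objType) objType=$value ;;
        schema)  schema=$value ;;
        view)    view=$value ;;
        fileLoc) fileLoc=$value
                 echo "processing $objType $schema.$view from $fileLoc" ;;
    esac
done < <(awk -f gen.awk data.txt)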

(sed/awk) extract values text file and write to csv (no pattern)

I have (several) large text files from which I want to extract some values to create a csv file with all of these values.
My current solution is a few different calls to sed, from which I save the values, plus a python script that combines the data from the different files into a single csv file. However, this is quite slow and I want to speed it up.
The file let's call it my_file_1.txt has a structure that looks something like this
lines I don't need
start value 123
lines I don't need
epoch 1
...
lines I don't need
some epoch 18 words
stop value 234
lines I don't need
words start value 345 more words
lines I don't need
epoch 1
...
lines I don't need
epoch 72
stop value 456
...
and I would like to construct something like
file,start,stop,epoch,run
my_file_1.txt,123,234,18,1
my_file_1.txt,345,456,72,2
...
How can I get the results I want? It doesn't have to be sed or awk, as long as I don't need to install something new and it is reasonably fast.
I don't really have any experience with awk. With sed, my best guess would be:
filename=$1
echo 'file,start,stop,epoch,run' > my_data.csv
sed -n '
s/.*start value \([0-9]\+\).*/'"$filename"',\1,/
h
$!N
/.*epoch \([0-9]\+\).*\n.*stop value\([0-9]\+\)/{s/\2,\1/}
D
T
G
P
' $filename | sed -z 's/,\n/,/' >> my_data.csv
and then deal with not getting the run number. Furthermore, this is not quite correct, as the N will gobble up some "start value" lines, leading to wrong results. It feels like it could be done more easily with awk.
It is similar to 8992158 but I can't use that pattern and I know too little awk to rewrite it.
Solution (Edit)
I was not general enough in my description of the problem, so I changed it up a bit and fixed some inconsistencies.
Awk (Rusty Lemur's answer)
Here I generalised from knowing that the numbers were at the end of the line to using gensub. For this I should have specified the awk version, as gensub is not available in all variants (it is a GNU awk extension).
BEGIN {
    counter = 1
    OFS = ","    # This is the output field separator used by the print statement
    print "file", "start", "stop", "epoch", "run"    # Print the header line
}
/start value/ {
    startValue = gensub(/.*start value ([0-9]+).*/, "\\1", 1, $0)
}
/epoch/ {
    epoch = gensub(/.*epoch ([0-9]+).*/, "\\1", 1, $0)
}
/stop value/ {
    stopValue = gensub(/.*stop value ([0-9]+).*/, "\\1", 1, $0)
    # we have everything to print our line
    print FILENAME, startValue, stopValue, epoch, counter
    counter = counter + 1
    startValue = ""    # clear variables so they aren't maintained through the next iteration
    epoch = ""
}
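For awk variants without gensub, a portable sketch of the same extraction (shown for the start-value rule only; the other rules follow the same pattern) uses match and substr instead:
/start value/ {
    # 12 is the length of "start value "; RSTART/RLENGTH are set by match()
    if (match($0, /start value [0-9]+/))
        startValue = substr($0, RSTART + 12, RLENGTH - 12)
}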
I accepted this answer because it is the most understandable.
Sed (potong's answer)
sed -nE '1{x;s/^/file,start,stop,epoch,run/p;s/.*/0/;x}
/^.*start value/{:a;N;/\n.*stop value/!ba;x
s/.*/expr & + 1/e;x;G;F
s/^.*start value (\S+).*\n.*epoch (\S+)\n.*stop value (\S+).*\n(\S+)/,\1,\3,\2,\4/p}' my_file_1.txt | sed '1!N;s/\n//'
It's not clear how you'd get exactly the output you provided from the input you provided, but this may be what you're trying to do (using any awk in any shell on every Unix box):
$ cat tst.awk
BEGIN {
    OFS = ","
    print "file", "start", "stop", "epoch", "run"
}
{ f[$1] = $NF }
$1 == "stop" {
    print FILENAME, f["start"], f["stop"], f["epoch"], ++run
    delete f
}
$ awk -f tst.awk my_file_1.txt
file,start,stop,epoch,run
my_file_1.txt,123,234,N,1
my_file_1.txt,345,456,M,2
awk's basic structure is:
read a record from the input (by default a record is a line)
evaluate conditions
apply actions
The record is split into fields (by default based on whitespace as the separator).
The fields are referenced by their position, starting at 1. $1 is the first field, $2 is the second.
The last field is referenced by a variable named NF for "number of fields." $NF is the last field, $(NF-1) is the second-to-last field, etc.
A "BEGIN" section will be executed before any input file is read, and it can be used to initialize variables (which are implicitly initialized to 0).
BEGIN {
    counter = 1
    OFS = ","    # This is the output field separator used by the print statement
    print "file", "start", "stop", "epoch", "run"    # Print the header line
}
/start value/ {
    startValue = $NF    # when a line contains "start value", store the last field as startValue
}
/epoch/ {
    epoch = $NF
}
/stop value/ {
    stopValue = $NF
    # we have everything to print our line
    print FILENAME, startValue, stopValue, epoch, counter
    counter = counter + 1
    startValue = ""    # clear variables so they aren't maintained through the next iteration
    epoch = ""
}
Save that as processor.awk and invoke as:
awk -f processor.awk my_file_1.txt my_file_2.txt my_file_3.txt > output.csv
This might work for you (GNU sed):
sed -nE '1{x;s/^/file,start,stop,epoch,run/p;s/.*/0/;x}
/^start value/{:a;N;/\nstop value/!ba;x
s/.*/expr & + 1/e;x;G;F
s/^start value (\S+).*\nepoch (\S+)\nstop value (\S+).*\n(\S+)/,\1,\3,\2,\4/p}' file |
sed '1!N;s/\n//'
The solution contains two invocations of sed: the first formats all but the file name, and the second embeds the file name into the csv file.
Format the header line on the first line and prime the run number.
Gather up lines between start value and stop value.
Increment the run number, append it to the current line and output the file name. This prints two lines per record, the first is the file name and the second the remainder of the csv file.
In the second sed invocation read two lines at a time (except for the first line) and remove the newline between them, formatting the csv file.

Add column from one file to another based on multiple matches while retaining unmatched

So I am really new to this kind of stuff (seriously, sorry in advance) but I figured I would post this question since it is taking me some time to solve it and I'm sure it's a lot more difficult than I am imagining.
I have the file small.csv:
id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,15
3,jane,5,6,17
4,cindy,1,4,18
and another file big.csv:
id3,id4,name,x,y
100,{},john,2,6
101,{},bob,3,4
102,{},jane,5,6
103,{},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7
The problem with this is I am attempting to put id2 of the small.csv into the id4 column of the big.csv only if the name AND x AND y match. I have tried using different awk and join commands in Git Bash but am coming up short. Again I am sorry for the newbie perspective on all of this but any help would be awesome. Thank you in advance.
EDIT: Sorry, this is what the final desired output should look like:
id3,id4,name,x,y
100,{13},john,2,6
101,{15},bob,3,4
102,{17},jane,5,6
103,{18},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7
And one of the latest trials I did was the following:
$ join -j 1 -o 1.5,2.1,2.2,2.3,2.4,2.5 <(sort -k2 small.csv) <(sort -k2 big.csv)
But I received this error:
join: /dev/fd/63: No such file or directory
Probably not trivial to solve with join but fairly easy with awk:
awk -F, -v OFS=, '    # set input and output field separators to comma
    # create lookup table from lines of small.csv
    NR==FNR {
        # ignore header; map columns 2/3/4 to column 5
        if (NR>1) lut[$2,$3,$4] = $5
        next
    }
    # process lines of big.csv:
    # if lookup table has a mapping for columns 3/4/5, update column 2
    v = lut[$3,$4,$5] {
        $2 = "{" v "}"
    }
    # print (possibly-modified) lines of big.csv
    1
' small.csv big.csv >bignew.csv
Code assumes small.csv contains only one line for each distinct column 2/3/4.
NR==FNR { ...; next } is a way to process contents of the first file argument. (FNR is less than NR when processing lines from second and subsequent file arguments. next skips execution of the remaining awk commands.)
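A minimal standalone illustration of the idiom (hypothetical files a.txt and b.txt):
# print the lines of b.txt that also appear somewhere in a.txt
awk 'NR==FNR { seen[$0]; next } $0 in seen' a.txt b.txt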

Converting CSV file to multiline text file

I have file which looks like following:
C_DocType_ID,SOReference,DocumentNo,ProductValue,Quantity,LineDescription,C_Tax_ID,TaxAmt
1000000,1904093563U,1904093563U,5210-1,1,0,1000000,0
1000000,1904093563U,1904093563U,6511,2,0,1000000,0
1000000,1904093563U,1904093563U,5001,1,0,1000000,0
1000000,1904083291U,1904083291U,5310,4,0,1000000,0
1000000,1904083291U,1904083291U,5311,3,0,1000000,0
1000000,1904083291U,1904083291U,6101,6,0,1000000,0
1000000,1904083291U,1904083291U,6102,1,0,1000000,0
1000000,1904083291U,1904083291U,6106,6,0,1000000,0
I need to convert it to text file which looks like this:
WOH~1.0~~1904093563Utest~~~ORD~~~~
WOL~~~5210-1~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOL~~~6511~~~~~~~~2~~~~~~~~~~~~~~~~~~~~~
WOL~~~5001~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOH~1.0~~1904083291Utest~~~ORD~~~~~~
WOL~~~5310~~~~~~~~4~~~~~~~~~~~~~~~~~~~~~
WOL~~~5311~~~~~~~~3~~~~~~~~~~~~~~~~~~~~~
WOL~~~6101~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~
WOL~~~6102~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOL~~~6106~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~
The output file has a header record and line item records. The header record contains the SOReference and some hardcoded fields, and the line item records contain the ProductValue and Quantity associated with that SOReference. The input file has 2 unique SOReferences, which is why the output file contains 2 header records and their associated line item records.
I need something that can be done from the command line (awk/sed), since I have a series of files like this one which need to be converted to text.
With AWK, please try the following:
awk -F, '
    FNR==1 {next}    # skip the header line
    {
        if ($2 != prevcol2) {    # start a new header record when SOReference changes
            nl = FNR<=2 ? "" : "\n"    # suppress the newline before the 1st record
            printf("%sWOH~1.0~~%stest~~~ORD~~~~\n", nl, $2)
        }
        printf("WOL~~~%s~~~~~~~~%s~~~~~~~~~~~~~~~~~~~~~\n", $4, $5)
        prevcol2 = $2
    }' file.csv
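Since there is a series of such files, a small wrapper (a sketch; the convert.awk name and *.csv glob are assumptions) runs the conversion for each input and writes a matching .txt:
# save the awk program above (the part between the quotes) as convert.awk, then:
for f in *.csv; do
    awk -F, -f convert.awk "$f" > "${f%.csv}.txt"
done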

Extract subset of a feed file with custom delimiter and create CSV file

I get a feed file in the below format.
employee_id||034100151730105|L|
employee_cd||03410015|L|
dept_id||1730105|L|
dept_name||abc|L|
employee_firstname||pqr|L|
employee_lastname||ppp|L|
|R||L|
employee_id||034100151730108|L|
employee_cd||03410032|L|
dept_id||4230105|L|
dept_name||fdfd|L|
employee_firstname||sasas|L|
employee_lastname||dfdf|L|
|R||L|
.....
Is there an easy unix script to extract a subset of the fields and create a CSV like the one below?
employee_cd,employee_firstname,dept_name
03410015,pqr,abc
03410032,sasas,fdfd
.....
I would suggest an awk solution (considering that the dept_name item always comes before the employee_firstname item):
awk -F'|' 'BEGIN{OFS=","; print "employee_cd,employee_firstname,dept_name";}
$1~/employee_cd|employee_firstname|dept_name/{ a[++c]=$3 }
END { for(i=1;i<length(a);i+=3) print a[i],a[i+2],a[i+1] }' file
The output:
employee_cd,employee_firstname,dept_name
03410015,pqr,abc
03410032,sasas,fdfd
Solution details:
OFS="," - setting output field separator
$1~/employee_cd|employee_firstname|dept_name/ - if first column matches one of the needed items
a[++c]=$3 - capturing the item value, indexed by consecutive position
for(i=1;i<length(a);i+=3) print a[i],a[i+2],a[i+1] - outputting the item values in groups of three, reordered to match the header (note that length() on an array is a GNU awk extension)
To save the output as .csv file:
the above command > output.csv
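If the item order within a block ever varies, a hedged alternative sketch keys each value by its field name and flushes on the |R| terminator line (the field positions are assumptions based on the sample feed):
awk -F'|' '
    BEGIN { OFS = ","; print "employee_cd,employee_firstname,dept_name" }
    $1 == "employee_cd" || $1 == "employee_firstname" || $1 == "dept_name" {
        rec[$1] = $3    # the value is the 3rd |-separated field
    }
    $2 == "R" {         # a "|R||L|" line marks the end of a record
        print rec["employee_cd"], rec["employee_firstname"], rec["dept_name"]
        delete rec      # reset for the next record block
    }
' file > output.csv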
