I have this file:
2016,05,P,0002 ,CJGLOPSD8
00,BBF,BBDFTP999,051000100,GBP, , -2705248.00
00,BBF,BBDFTP999,059999998,GBP, , -3479679.38
00,BBF,BBDFTP999,061505141,GBP, , -0.40
00,BBF,BBDFTP999,061505142,GBP, , 6207621.00
00,BBF,BBDFTP999,061505405,GBP, , -0.16
00,BBF,BBDFTP999,061552000,GBP, , -0.24
00,BBF,BBDFTP999,061559010,GBP, , -0.44
00,BBF,BBDFTP999,062108021,GBP, , -0.34
00,BBF,BBDFTP999,063502007,GBP, , -0.28
I want to programmatically (in unix, or informatica if possible) grab the first two fields in the top row, concatenate them, append them to the end of each line and remove that first row.
Like so:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This is my current attempt:
awk -vvar1=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f1` -vvar2=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f2` 'BEGIN {FS=OFS=","} {print $0, var 1var2}' OF\ OPSDOWN8.CSV> OF_OPSDOWN8.csv
Any pointers? I've tried looking around the forum but can only find answers to part of my question.
Thanks for your help.
Use this awk:
awk 'BEGIN{FS=OFS=","} NR==1{val=$1$2;next} {gsub(/ */,"");print $0,val}' file
Explanation:
BEGIN{FS=OFS=","} - This block will set FS (Field Separator) and OFS (Output Field Separator) as ,.
NR==1 - Working with line number 1. Here, $1 and $2 denotes field number.
print $0,val - Printing $0 (whole line) and stored value from val.
I would use the following awk command:
awk 'NR==1{d=$1$2;next}{$(NF+1)=d;gsub(/[[:space:]]/,"")}1' FS=, OFS=, file
Explanation:
NR==1{d=$1$2;next} applies on line 1 and set's a variable d(ate) to the value of the first and the second field. The variable is being used when processing the remaining lines. next tells awk to go ahead with the next line right away without processing further instructions on this line.
{$(NF+1)=d;gsub(/[[:space:]]/,"")}1 appends a new field to the line (NF is the number of fields, assigning d to $(NF+1) effectively adds a field. gsub() is used to removing spaces. 1 at the end always evaluates to true and makes awk print the modified line.
FS=, is a command line argument. It set's the input field delimiter to ,.
OFS=, is a command line argument. It set's the output field delimiter to ,.
Output:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
With sed :
sed '1{s/\([^,]*\),\([^,]*\),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
in ERE mode :
sed -r '1{s/([^,]*),([^,]*),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
Output :
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This might work for you (GNU sed):
sed '1s/,//;1s/,.*//;1h;1d;s/ //g;G;s/\n/,/' file
For the first line only: remove the first comma, remove from the next comma to the end of the line, store the amended line in the hold space (HS) and then delete the current line (the d abruptly ends processing). For subsequent lines: remove all spaces, append the HS and replace the newline (from the G command) with a comma.
Or if you prefer:
sed '1{s/,//;s/,.*//;h;d};s/ //g;G;s/\n/,/' file
If you want to use Informatica for this, use two Source Qualifiers. Read the file twice - just one line in one SQ (filter out the rest) and in the second SQ read the whole file except the first line (skip header). Join the two on dummy port and you're done.
I work (unix, shell scripts) with txt files that are millions field separate by pipe and not separated by \n or \r.
something like this:
field1a|field2a|field3a|field4a|field5a|field6a|[...]|field1d|field2d|field3d|field4d|field5d|field6d|[...]|field1m|field2m|field3m|field4m|field5m|field6m|[...]|field1z|field2z|field3z|field4z|field5z|field6z|
All text is in the same line.
The number of fields is fixed for every file.
(in this example I have field1=name; field2=surname; field3=mobile phone; field4=email; field5=office phone; field6=skype)
When I need to find a field (ex field2), command like grep doesn't work (in the same line).
I think that a good solution can be do a script that split every 6 field with a "\n" and after do a grep. I'm right? Thank you very much!
With awk :
$ cat a
field1a|field2a|field3a|field4a|field5a|field6a|field1d|field2d|field3d|field4d|field5d|field6d|field1m|field2m|field3m|field4m|field5m|field6m|field1z|field2z|field3z|field4z|field5z|field6z|
$ awk -F"|" '{for (i=1;i<NF;i=i+6) {for (j=0; j<6; j++) printf $(i+j)"|"; printf "\n"}}' a
field1a|field2a|field3a|field4a|field5a|field6a|
field1d|field2d|field3d|field4d|field5d|field6d|
field1m|field2m|field3m|field4m|field5m|field6m|
field1z|field2z|field3z|field4z|field5z|field6z|
Here you can easily set the length of line.
Hope this helps !
you can use sed to split the line in multiple lines:
sed 's/\(\([^|]*|\)\{6\}\)/\1\n/g' input.txt > output.txt
explanation:
we have to use heavy backslash-escaping of (){} which makes the code slightly unreadable.
but in short:
the term (([^|]*|){6}) (backslashes removed for readability) between s/ and /\1, will match:
[^|]* any character but '|', repeated multiple times
| followed by a '|'
the above is obviously one column and it is grouped together with enclosing parantheses ( and )
the entire group is repeated 6 times {6}
and this is again grouped together with enclosing parantheses ( and ), to form one full set
the rest of the term is easy to read:
replace the above (the entire dataset of 6 fields) with \1\n, the part between / and /g
\1 refers to the "first" group in the sed-expression (the "first" group that is started, so it's the entire dataset of 6 fields)
\n is the newline character
so replace the entire dataset of 6 fields by itself followed by a newline
and do so repeatedly (the trailing g)
you can use sed to convert every 6th | to a newline.
In my version of tcsh I can do:
sed 's/\(\([^|]\+|\)\{6\}\)/\1\n/g' filename
consider this:
> cat bla
a1|b2|c3|d4|
> sed 's/\(\([^|]\+|\)\{6\}\)/\1\n/g' bla
a1|b2|
c3|d4|
This is how the regex works:
[^|] is any non-| character.
[^|]\+ is a sequence of at least one non-| characters.
[^|]\+| is a sequence of at least one non-| characters followed by a |.
\([^|]\+|\) is a sequence of at least one non-| characters followed by a |, grouped together
\([^|]\+|\)\{6\} is 6 consecutive such groups.
\(\([^|]\+|\)\{6\}\) is 6 consecutive such groups, grouped together.
The replacement just takes this sequence of 6 groups and adds a newline to the end.
Here is how I would do it with awk
awk -v RS="|" '{printf $0 (NR%7?RS:"\n")}' file
field1a|field2a|field3a|field4a|field5a|field6a|[...]
field1d|field2d|field3d|field4d|field5d|field6d|[...]
field1m|field2m|field3m|field4m|field5m|field6m|[...]
field1z|field2z|field3z|field4z|field5z|field6z|
Just adjust the NR%7 to number of field you to what suites you.
What about printing the lines on blocks of six?
$ awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i+=6) {print $(i), $(i+1), $(i+2), $(i+3), $(i+4), $(i+5)}}' file
field1a|field2a|field3a|field4a|field5a|field6a
field1d|field2d|field3d|field4d|field5d|field6d
field1m|field2m|field3m|field4m|field5m|field6m
field1z|field2z|field3z|field4z|field5z|field6z
Explanation
BEGIN{FS=OFS="|"} set input and output field separator as |.
{for (i=1; i<=NF; i+=6) {print $(i), $(i+1), $(i+2), $(i+3), $(i+4), $(i+5)}} loop through items on blocks of 6. Every single time, print six of them. As print end up writing a new line, then you are done.
If you want to treat the files as being in multiple lines, then make \n the field separator. For example, to get the 2nd column, just do:
tr \| \\n < input-file | sed -n 2p
To see which columns match a regex, do:
tr \| \\n < input-file | grep -n regex