awk on a text file to csv - bash

I have a massive text file that I need to convert to a CSV file so I can import it into a MySQL database.
The text file looks like this:
Original text file
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
Wanted result
I will need to convert it into a CSV file that looks like this:
type; Productnumber; Productname; Description; measurement_unit; price_unit; price_unit_txt; price; crowd; price_date; status; block_number; discount_group; manufac; type; stocked; sales_package; discount; price_type; PDF; IMAGE; baseinfo; WEIGHT; VOLUME; dimensjon; UNSPSC; va_2; va_3;
1; 1001; Productname 1; Description 1; 2; MTR; METER; 217883; 10000; 20180402; 1; 010206; &10; PRODUCER1; ; N; 10000; ; ; ; ; ; ; ; ; ; 4044773815245; 0036453;
1; 1002; Productname 2; Description 2; 2; MTR; METER; 140365; 10000; 20180402; 1; 010206; &10; PRODUCER2 ; N; 10000; ; ; ; ; ; 7500; 3249; 57x57x1000; ; 4044773452884; 0036479;
1; 1003; Productname 3; Description ABC 3; 2; MTR; METER; 1575; 10000; 20171006; 1; 010606; &10; PRODUCER3; ; N; 10000; ; ; 1003.pdf; 1003.png; http://127.0.0.1/1003/; 20; ; 0x7x0; 26121616; 7070613017149; 1000116;
Explanation of the original file
The first line of each product always starts with VL and then continues in this order:
type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;
PDF is always on a new line starting with VX;PDF;
IMAGE is always on a new line starting with VX;IMAGE;
baseinfo is always on a new line starting with VX;BASEINFO;
WEIGHT is always on a new line starting with VX;WEIGHT;
VOLUME is always on a new line starting with VX;VOLUME;
dimensjon is always on a new line starting with VX;DIMENSJON;
UNSPSC is always on a new line starting with VX;UNSPSC;
va_2 is always on a new line starting with VA;2;
va_3 is always on a new line starting with VA;3;
Hope someone can help me out with this :)

One possible approach (not the whole solution):
#!/bin/bash
awk -F';' '
# print the block collected so far, then reset the variables
function init() {
    # build the output line from the collected fields
    line = vl pdf image baseinfo weight volume dimensjon unspsc va_2 va_3
    # remove ^M (\r) in case the file has DOS line endings
    gsub( /\r/, "", line )
    # print the block
    print line
    # reset every field to an empty column
    vl = pdf = image = baseinfo = weight = volume = dimensjon = unspsc = va_2 = va_3 = ";"
}
# header line; "%12s" pads the first column to a width of 12 characters.
# There is no "\n" here: the first init() call prints an empty line, which ends the header.
BEGIN { printf ( "%12s; %s; %s; %s; %s; %s; %s; %s; %s; %s;","vl","pdf","image","baseinfo ","weight","volume","dimensjon","unspsc","va_2","va_3" ) }
/^VL/ { init(); vl = sprintf( "%12s; %s; %s; %s; ", $3, $4, $5, $6 ) }
/^VX;WEIGHT;/ { weight = sprintf( "%s; ", $3 ) }
# ... add the other conditions (VX;PDF;, VX;VOLUME;, VA;2;, and so on) the same way
END { init() }
' file.dat # > outputfile.csv
To test it, create a sample file:
cat << end > file.dat
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
end
Output:
vl; pdf; image; baseinfo ; weight; volume; dimensjon; unspsc; va_2; va_3;
1001; Productname 1; Description 1; 2; ;;;;;;;;;
1002; Productname 2; This is product decrtiption for 2 product; 2; ;;;7500; ;;;;;
1003; Productname 3; Description......; 2; ;;;20; ;;;;;
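The partial script above can be extended to cover every column. The following is only a sketch under a few assumptions: the function name vl_to_csv is made up for illustration, the header names are copied from the question, every product starts with a VL line, and plain ";" (without the padding spaces shown in the wanted result) is acceptable for the import.

```shell
#!/bin/sh
# Hypothetical helper: reads the VL/VX/VA file on stdin, writes one CSV row per product.
vl_to_csv() {
    awk -F';' '
    # Print the product collected so far (if any) and clear its VX/VA values.
    function emit_row(    i, row) {
        if (have) {
            row = vl
            for (i = 1; i <= nx; i++) row = row ";" vx[xname[i]]
            print row ";" va[2] ";" va[3]
        }
        have = 0
        split("", vx); split("", va)      # portable way to empty an array
    }
    BEGIN {
        # Order of the VX columns, as listed in the question.
        nx = split("PDF IMAGE BASEINFO WEIGHT VOLUME DIMENSJON UNSPSC", xname, " ")
        print "type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;PDF;IMAGE;baseinfo;WEIGHT;VOLUME;dimensjon;UNSPSC;va_2;va_3"
    }
    { sub(/\r$/, "") }                    # tolerate DOS line endings
    /^VL;/ { emit_row(); vl = $0; sub(/^VL;/, "", vl); have = 1; next }
    /^VX;/ { vx[$2] = $3; next }          # e.g. VX;WEIGHT;7500
    /^VA;/ { va[$2] = $3; next }          # e.g. VA;2;4044773815245;V;
    END { emit_row() }
    '
}
```

Usage would be something like vl_to_csv < file.dat > outputfile.csv. Keying the VX values by name means the lines may appear in any order, and a missing line simply leaves its column empty.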

Related

Find in Files - Number per file

In EmEditor, is there a way to get the number of occurrences of a "Find in Files" search per file? In other words, if it finds 10,000 "hits" across 25 files, I'd like to know that 1,200 were in file1, and so on.
Notepad++ does a great job of this by allowing you to collapse the results by file and showing a summary for each, but I haven't seen a way to get the information in EmEditor.
After Find in Files, you can run this macro while the results document is active. Save this code as, for instance, statistics.jsee, and then select this file from Select... in the Macros menu. Finally, do Find in Files, and select Run in the Macros menu while the results document is active.
// Creates statistics from Find in Files Results.
// 2020-06-27
Redraw = false;
sOutput = "";
y = 1;
yMax = document.GetLines();
for( ;; ) {
document.selection.SetActivePoint( eePosLogical, 1, y++ );
document.selection.Mode = eeModeStream | eeModeKeyboard;
bFound = document.selection.Find("\\(\\d+?\\)\\:",eeFindNext | eeFindReplaceCase | eeFindReplaceRegExp,0);
document.selection.Mode = eeModeStream;
if( !bFound ) {
break;
}
sFile = document.selection.Text;
n = sFile.lastIndexOf("(");
sFile = sFile.substr( 0, n );
nCount = 1;
for( ;; ) {
document.selection.SetActivePoint( eePosLogical, 1, y );
sLine = document.GetLine( y );
if( sLine.length > sFile.length && sLine.substr( 0, sFile.length ) == sFile ) {
++nCount;
++y;
}
else {
sOutput += sFile + "\t" + nCount + "\n";
break;
}
}
}
document.selection.Mode = eeModeStream;
Redraw = true;
editor.NewFile();
document.write( sOutput );
editor.ExecuteCommandByID(4471); // switch to TSV mode
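If the results document is saved to a text file, the same per-file tally can be sketched outside the editor. This is only an illustration, assuming each result line has the form path(line): text, which is what the macro's regular expression looks for; the function name count_hits_per_file is made up.

```shell
#!/bin/sh
# Hypothetical helper: reads saved Find in Files results on stdin,
# prints "file<TAB>count" in order of first appearance.
count_hits_per_file() {
    awk '
    match($0, /\([0-9]+\):/) {
        file = substr($0, 1, RSTART - 1)     # text before "(line):"
        if (!(file in hits)) order[++n] = file
        hits[file]++
    }
    END { for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], hits[order[i]] }
    '
}
```

Unlike the macro, this does not rely on hits for one file being adjacent, but a path that itself contains something like "(12):" would be cut short.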

Split and format text output with separators Xamarin Forms

I have an entry and a label, and I want to format the entry's text for the label like this:
"email#gmail.com", "email2#gmail.com", "email3#gmail.com"
this is what I enter in my entry field:
email#gmail.com /space/ email2#gmail.com /space/ email3#gmail.com or
email#gmail.com,email2#gmail.com,email3#gmail.com
The separator is a space or comma. How can I format my output to the one above?
Good question!
string entry = Entry.Text;
List<string> arrayfromEntry = new List<string>();
if (entry.Contains(" ") == true){
arrayfromEntry = entry.Split(new char[] { ' ' }).ToList();
}
else{
arrayfromEntry = entry.Split(new char[] { ',' }).ToList();
}
for (int i = 0; i < arrayfromEntry.Count(); i++){
arrayfromEntry[i] = '"' + arrayfromEntry[i] + '"';
}
// string.Join only puts ", " between the items, so there is nothing to trim afterwards.
string f = string.Join(", ", arrayfromEntry);
textToLabel = f;
Where Entry.Text is the text from your entry and textToLabel changes the text of your label, this should work.
Based on jamesfdearborn's answer, but using StringBuilder instead:
// Requires: using System.Linq; using System.Text;
string entry = "aaaa#ttttt.com,bbbb#ttttttyyy.com,tttt#errrer.com,yyyyyy#rrrttr.com,uuuuu#yuyuy.com";
var inputSeparator = ','; // comma is the input separator in this case; change it as needed
var outputSeparator = ',';
var arrayfromEntry = entry.Split(inputSeparator).ToList();
var sb = new StringBuilder();
for (int i = 0; i < arrayfromEntry.Count; i++)
{
    sb.AppendFormat("\"{0}\"{1}", arrayfromEntry[i], outputSeparator);
}
sb.Remove(sb.Length - 1, 1); // drop the trailing separator
var result = sb.ToString(); // result here
//output
//"aaaa#ttttt.com","bbbb#ttttttyyy.com","tttt#errrer.com","yyyyyy#rrrttr.com","uuuuu#yuyuy.com"
You can change the input or the output separator.

Having an issue with YACC grammar

The code I wrote seems to not be able to detect a function. I tried many edits but nothing seems to be working.
program : function-decl | decl | function-def
;
decl : kind var-list SEMICOLON
{
tok_type = "variable";
}
;
kind : KW_INT {integer = true; floatType = false;}
| KW_FLOAT {integer = false; floatType = true;}
;
var-list : ID varmany
{
tok_type = "variable";
t.check_token (tok_type, $1, line_no, bodyCheck, parameter);
}
;
varmany : /*empty*/ | varmany COMMA ID
{
tok_type = "variable";
t.check_token (tok_type, $3, line_no, bodyCheck, parameter );
}
;
function-decl : kind ID LPAR kind RPAR SEMICOLON
{
current_func = $2;
declaration = true;
parameter= false;
tok_type= "function";
t.check_token (tok_type, current_func, line_no, bodyCheck, parameter );
current_func ="";
}
;
function-def : kind ID LPAR kind ID RPAR body
{
current_func = $2;
paramName = $5;
declaration = false;
parameter= false;
tok_type= "function";
t.check_token (tok_type, current_func, line_no, bodyCheck, parameter );
tok_type = "variable";
parameter=true;
t.check_token(tok_type, paramName, line_no, bodyCheck, parameter);
current_func ="";
}
;
For example, for text input :
int main (int DUMMY) {
int x,y,z;
float p;
int main (int x){x = y;}
p = -z * (x/345+y*1.0) + - 300;
p = -z * (x/345+y*1.0) + -300;
while (p>=-(x+y)*3.45/6-z)
z = z + 3;
}
I get these error messages:
Local int variable y declared in line 3.
Local int variable z declared in line 3.
Local int variable x declared in line 3.
Local float variable p declared in line 5.
Local int variable main declared in line 6.
syntax error on line 6, matched: (
Local int variable x declared in line 6.
syntax error on line 6, matched: )
Your function-decl rule insists on exactly one parameter and does not allow a name for it.

Unable to increment last 2 digit of variable declared in file using script

I have the file given below:
elix554bx.xayybol.42> vi setup.REVISION
# Revision information
setenv RSTATE R24C01
setenv CREVISION X3
exit
I need to read RSTATE from the file, increment its last two digits, and write the result back to the same setup.REVISION file.
Can you please suggest how to do this?
If you're using vim, then you can use the sequence:
/RSTATE/
$<C-a>:x
The first line is followed by a return and searches for RSTATE. The second line jumps to the end of the line and uses Control-a (shown as <C-a> above, and in the vim documentation) to increment the number. Repeat as often as you want to increment the number. The :x is also followed by a return and saves the file.
The only tricky bit is that the leading 0 on the number makes vim think the number is in octal, not decimal. You can override that by using :set nrformats= followed by return to turn off octal and hex; the default value is nrformats=octal,hex.
You can learn an awful lot about vim from the book Practical Vim: Edit Text at the Speed of Thought by Drew Neil. This information comes from Tip 10 in chapter 2.
Here's an awk one-liner type solution:
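awk '{
    # /RSTATE/ must be a regex literal: single quotes here would end the shell quoting.
    if ( $0 ~ /RSTATE/ && match( $0, /[0-9]+$/ ) ) {
        # the third argument of substr is a length, so pass RLENGTH
        sub( /[0-9]+$/,
             sprintf( "%0" RLENGTH "d", substr( $0, RSTART, RLENGTH ) + 1 ),
             $0 );
    }
    print;
}' setup.REVISION > tmp$$ &&
mv tmp$$ setup.REVISION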
Returns:
setenv RSTATE R24C02
setenv CREVISION X3
exit
This will handle transitions from two to three to more digits appropriately.
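To see the digit handling in isolation, here is a small sketch of the same idea wrapped in a shell function that reads stdin. The name bump_rstate is made up for illustration, and it assumes the RSTATE value ends in a decimal number.

```shell
#!/bin/sh
# Hypothetical helper: increments the trailing number on any line containing RSTATE.
bump_rstate() {
    awk '
    /RSTATE/ && match($0, /[0-9]+$/) {
        n = substr($0, RSTART, RLENGTH) + 1          # "01" + 1 is 2
        # pad to the original width; a wider result simply grows
        $0 = substr($0, 1, RSTART - 1) sprintf("%0" RLENGTH "d", n)
    }
    { print }
    '
}
```

With this, R24C01 becomes R24C02 and R24C99 becomes R24C100, while lines without RSTATE, such as the CREVISION line, pass through unchanged.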
I wrote a class for you.
class Reader
{
public string ReadRs(string fileWithPath)
{
string keyword = "RSTATE";
string rs = "";
if(File.Exists(fileWithPath))
{
StreamReader reader = File.OpenText(fileWithPath);
try
{
string line = null;
bool found = false;
// ReadLine returns null at the end of the file, so test the line rather than the reader.
while (!found && (line = reader.ReadLine()) != null)
{
    if (line.Contains(keyword))
    {
        found = true;
    }
}
if (found)
{
    int index = line.IndexOf(keyword);
    // Everything after "RSTATE " up to the end of the line.
    rs = line.Substring(index + keyword.Length + 1).Trim();
}
}
catch (IOException)
{
//Error
}
finally
{
reader.Close();
}
}
return rs;
}
public int GetLastTwoDigits(string rsState)
{
int digits = -1;
try
{
int length = rsState.Length;
//Get the last two digits of the rsstate
digits = Int32.Parse(rsState.Substring(length - 2, 2));
}
catch (FormatException)
{
//Format Error
digits = -1;
}
return digits;
}
}
You can use it like this:
Reader reader = new Reader();
string rsstate = reader.ReadRs("C://test.txt");
int digits = reader.GetLastTwoDigits(rsstate);

awk syntax error: awk: line 29: syntax error at or near :

I have written an awk script and I keep getting the following error:
awk: line 29: syntax error at or near :
I do not understand why I keep getting this error.
The script is below. It is long, but the error seems to be in the top section only; I have included the whole script for completeness and flagged the line where I think the error is.
#!/bin/sh
tshark -V -r $1 > .pcap_out1_ver.txt
tshark -r $1 > .pcap_out_summ.txt
awk -F ":" '
BEGIN {
#Packet types and subtypes.
frame_id[0] = "Association Request";
frame_id[1] = "Association Response";
frame_id[2] = "Association Response";
frame_id[3] = "Reassociation Response";
frame_id[4] = "Probe Request";
frame_id[5] = "Probe Response";
frame_id[6] = "Reserved";
frame_id[7] = "Reserved";
frame_id[8] = "Beacon";
frame_id[9] = "ATIM";
frame_id[10] = "Disassociation";
frame_id[11] = "Authentication";
frame_id[12] = "Deauthentication";
frame_id[13] = "Action";
for(x=14; x<24; ++x) {
frame_id[x] = "Reserved";
}
frame_id[24] = "Block Ack Request";
frame_id[25] = "Block Ack";
frame_id[26] = "PS-Poll";
frame_id[27] = "RTS"; #******Error here****
frame_id[28] = "CTS";
frame_id[29] = "ACK";
frame_id[30] = "CF-end";
frame_id[31] = "CF-end + CF-ack";
frame_id[32] = "Data";
frame_id[33] = "Data + CF-ack";
frame_id[34] = "Data + CF-poll";
frame_id[35] = "Data + CF-ack +CF-poll";
frame_id[36] = "Null";
frame_id[37] = "CF-ack";
frame_id[38] = "CF-poll";
frame_id[39] = "CF-ack + CF-poll";
frame_id[40] = "QoS data";
frame_id[41] = "QoS data + CF-ack";
frame_id[42] = "QoS data + CF-poll";
frame_id[43] = "QoS data + CF-ack + CF-poll";
frame_id[44] = "QoS Null";
frame_id[45] = "Reserved";
frame_id[46] = "QoS + CF-poll (no data)";
frame_id[47] = "Qos + CF-ack (no data)";
packet_type[0] = "Management";
packet_type[1] = "Control";
packet_type[2] = "Data";
#Variables for storing stats.
captured_length = 0;
for(x=0; x<50; ++x) {
count[x]=0;
traffic[x]=0;
}
#Counter for Epoch Time. Avg data rates.
next_mark=0;
j=0;
first_epoch_time = 0;
cur_epoch_time = 0;
#Counter for rssi values.
k=0;
}
{
gsub(/^[ \t]+/, "", $1);
if($1=="Frame Control") {
gsub(/^[ \t]+/, "", $2);
intRep = sprintf("%d", "0x" substr($2, 4, 2));
traffic[intRep] += captured_length;
count[intRep] += 1;
} else if($1=="Capture Length") {
gsub(/^[ \t]+/, "", $2);
gsub(/ [^\0]*/,"", $2);
captured_length = $2;
} else if($1=="Epoch Time") {
gsub(/^[ \t]+/, "", $2);
gsub(/ [^\0]*/, "", $2);
if(next_mark<$2) {
if(next_mark==0) {
next_mark = $2+60;
first_epoch_time = $2;
} else {
next_mark += 60;
j++;
}
#initialization of array element before using.
traffic_per_min[j] = 0;
count_per_min[j] = 0;
data_rate[j] = 0;
}
cur_epoch_time = $2;
traffic_per_min[j] += captured_length;
count_per_min[j] += 1;
} else if($1=="SSI signal") {
gsub(/^[ \t]+/, "", $2);
print "ssi signal"
if( substr($2, 0, 1) == "-") {
rssi_v[k] = $2;
rssi_t[k] = cur_epoch_time;
print rssi_v[k];
print rssi_t[k];
k++;
}
} else if($1=="Data Rate") {
gsub(/^[ \t]+/, "", $2);
gsub(/ [^\0]*/, "", $2);
data_rate_avg[j] += $2;
data_rate[k] = $2;
}
}
END {
# print "Packet Subtype" "No. of Packets" "Amount of traffic"
for(x=0; x<48; ++x) {
if(count[x] != 0) {
print frame_id[x]":"count[x]":"traffic[x];
}
}
print "-----"
for(x=0; x<=j; ++x) {
print x traffic_per_min[x]/count_per_min[x];
}
}
' .pcap_out1_ver.txt > .parsed.txt
awk -F " \t" '
BEGIN {
for(x=0; x<6; ++x)
count[6] = 0;
protocol[0] = "HTTP";
protocol[1] = "ARP";
protocol[2] = "SMTP";
protocol[3] = "DNS";
protocol[4] = "FTP";
protocol[5] = "DHCP";
}
{
if($5==protocol[0]){
count[0] += 1;
} else if($5==protocol[1]) {
count[1] += 1;
} else if($5==protocol[2]) {
count[2] += 1;
} else if($5==protocol[3]) {
count[3] += 1;
} else if($5==protocol[4]) {
count[4] += 1;
} else if($5==protocol[5]) {
count[5] += 1;
}
}
END {
for(x=0; x<6; ++x) {
print protocol[x]:count[x]
}
}
' .pcap_out_summ.txt > .app_net.txt
You have this line in the END block:
print protocol[x]:count[x]
It should be replaced with:
print protocol[x]":"count[x]
Besides your syntax error, may I make a suggestion or two about your awk scripts:
Get rid of all those null statements (spurious trailing semicolons).
You don't seem to be grasping the power of awk's associative arrays. Take your second script, for example. It could be rewritten as just:
awk -F " \t" '
BEGIN { n=split("HTTP ARP SMTP DNS FTP DHCP",protocol,/ /) }
{ count[$5]++ }
# split() fills protocol[1] .. protocol[n], so the loop starts at 1
END { for(x=1;x<=n;++x) print protocol[x]":"count[protocol[x]]+0 }
' .pcap_out_summ.txt > .app_net.txt
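As a quick way to try the associative-array idea on its own, here is a sketch that reads from stdin. It assumes the protocol name is the fifth whitespace-separated field, which may not match the real tshark summary layout, and the function name count_protocols is made up.

```shell
#!/bin/sh
# Hypothetical helper: counts how often each listed protocol appears in field 5.
count_protocols() {
    awk '
    BEGIN { n = split("HTTP ARP SMTP DNS FTP DHCP", protocol, " ") }
    { count[$5]++ }                       # the field value itself is the array key
    END {
        for (x = 1; x <= n; x++)
            print protocol[x] ":" (count[protocol[x]] + 0)   # + 0 prints 0 for unseen keys
    }
    '
}
```

Because the array is indexed by the protocol name, no chain of if/else comparisons is needed, and adding a protocol only means extending the string passed to split().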
You might want to take a look at the book Effective Awk Programming, Third Edition By Arnold Robbins (http://www.oreilly.com/catalog/awkprog3/).
As awk tells you, this line of your second awk script is wrong:
print protocol[x]:count[x]
You probably meant to print a colon:
print protocol[x] ":" count[x]

Resources