Creating a CSV file from text - bash

Using the following text file, I would like to create a CSV file.
Input file:
time : 5/14/18 10:31:26.832 AM
dt # : 0
Shot # : 587
name : 2851
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
time : 5/14/18 10:31:23.280 AM
dt # : 0
Shot # : 974
name : 2852
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
time : 5/14/18 6:04:27.880 AM
dt # : 21
Shot # : 316
name : 2854
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
time : 5/14/18 10:12:53.932 AM
dt # : 21
Shot # : 731
name : 2849
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
I tried to use this code to transpose the rows to columns.
gawk -F'\n' -v RS= -v OFS=',' -v ORS='\n' '{$1=$1}1' file.txt
This is the output I got:
time : 5/14/18 10:31:26.832 AM,dt # : 0,Shot # : 587,name : 2851,cdn # : 2306,cdl : C5,Comment : N/A,________________________________________________________________________
time : 5/14/18 10:31:23.280 AM,dt # : 0,Shot # : 974,name : 2852,cdn # : 2306,cdl : C5,Comment : N/A,________________________________________________________________________
time : 5/14/18 6:04:27.880 AM,dt # : 21,Shot # : 316,name : 2854,cdn # : 2306,cdl : C5,Comment : N/A,________________________________________________________________________
time : 5/14/18 10:12:53.932 AM,dt # : 21,Shot # : 731,name : 2849,cdn # : 2306,cdl : C5,Comment : N/A,________________________________________________________________________
But the desired output file should look like this:
time,dt,Shot,name,cdn,cdl,Comment,
5/14/18 10:31:26.832 AM,0,587,2851,2306,C5,N/A
5/14/18 10:31:23.280 AM,0,974,2852,2306,C5,N/A
5/14/18 6:04:27.880 AM,21,316,2854,2306,C5,N/A
5/14/18 10:12:53.932 AM,21,731,2849,2306,C5,N/A
Thanks in advance.

EDIT:
awk -F" : " '!a[$1]++ && NF && !/^__/{sub(/ #/,"");heading=heading?heading OFS $1:$1} /^__/ && val{val=val ORS;next} NF{val=val?val OFS $2:$2} END{gsub(/\n,/,"\n",val);print heading ORS val}' OFS=, Input_file
The following awk may help you with the same:
awk -F" : " 'BEGIN{print "time,dt,Shot,name,cdn,cdl,Comment,"}/^__/ && val{print val;val="";next} {val=val?val OFS $2:$2}' OFS=, Input_file
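To sanity-check that one-liner end to end, it can be replayed against a copy of the sample input (a sketch; the Input_file name and the first two records from the question are reused):

```shell
#!/bin/sh
# Recreate the first two records of the question's input file.
cat > Input_file <<'EOF'
time : 5/14/18 10:31:26.832 AM
dt # : 0
Shot # : 587
name : 2851
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
time : 5/14/18 10:31:23.280 AM
dt # : 0
Shot # : 974
name : 2852
cdn # : 2306
cdl : C5
Comment : N/A
________________________________________________________________________
EOF

# Split each line on " : "; flush one CSV row per "____" separator line.
awk -F" : " '
BEGIN { print "time,dt,Shot,name,cdn,cdl,Comment," }
/^__/ && val { print val; val=""; next }
{ val = val ? val OFS $2 : $2 }
' OFS=, Input_file
```

This prints the header followed by one CSV row per record, matching the desired output above.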

Related

How can I append a string on a specific column on the lines that match a condition on a txt file using shell scripting?

I have a text file with a bunch of serial numbers that are supposed to be 16 characters long. But some of the records were damaged and are only 13 characters long. I want to add 3 zeros at the beginning of every record that is 13 characters long.
Note: the serial numbers don't start at the beginning of the line; they all start at column 15 of every line.
My file currently looks like this:
1:CCCC:CC: :C:0000000999993: :CCC: :
1:CCCC:CC: :C:0000000999994: :CCC: :
1:CCCC:CC: :C:0000000999995: :CCC: :
1:CCCC:CC: :C:0000000000170891: :CCC: :
1:CCCC:CC: :C:0000000000170892: :CCC: :
1:CCCC:CC: :C:0000000000170893: :CCC: :
And the output should be:
1:CCCC:CC: :C:0000000000999993: :CCC: :
1:CCCC:CC: :C:0000000000999994: :CCC: :
1:CCCC:CC: :C:0000000000999995: :CCC: :
1:CCCC:CC: :C:0000000000170891: :CCC: :
1:CCCC:CC: :C:0000000000170892: :CCC: :
1:CCCC:CC: :C:0000000000170893: :CCC: :
This is the code I made to get the records that are shortened:
#!/bin/bash
i=1
for OUTPUT in $(cut -c15-30 file.txt)
do
    if [[ ${#OUTPUT} -eq 13 ]]
    then
        echo "$OUTPUT"
        echo "$i"
        i=$((i+1))
    fi
done
The txt file has more than 50,000 records so I can't change them manually.
This sed one-liner should do the job:
sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' file
This assumes serial numbers consist of decimal digits only and trusts that they all start at the 15th character of the line.
Or, an awk solution:
awk 'BEGIN { FS=OFS=":" } length($6) == 13 { $6 = "000" $6 } 1 ' file
This one only checks whether the sixth field is 13 characters long and trusts that the sixth field is the serial number field.
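Both one-liners can be checked side by side on a reduced fixture (a sketch; the `file` name and the column-15 layout are taken from the question):

```shell
#!/bin/sh
# Two stub records: a 13-digit serial (needs padding) and a 16-digit one.
cat > file <<'EOF'
1:CCCC:CC: :C:0000000999993: :CCC: :
1:CCCC:CC: :C:0000000000170891: :CCC: :
EOF

# sed: after the first 14 characters, a run of exactly 13 digits gets "000".
sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' file

# awk: the 6th ":"-separated field is the serial; pad it when 13 chars long.
awk 'BEGIN { FS=OFS=":" } length($6) == 13 { $6 = "000" $6 } 1' file
```

Both commands print the first line padded to 16 digits and leave the second untouched.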
One awk idea that replaces all of OP's current code:
awk '
BEGIN { FS=OFS=":" } # set input/output field delimiter to ":"
length($6)<16 { $6=sprintf("%016d",$6) } # if length of 6th field < 16 then left-pad the field with 0's to length of 16
1 # print current line
' file.txt
This generates:
1:6822:26: :A:0000000000999993:DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994:MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995:CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
I took the liberty of tacking a : on ...
$ awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)":"}}1' file.txt
1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
If that's not what you want, use this: awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)}}1' file.txt
Another alternative
awk -v{O,}FS=: '{$6=gensub(" ", "0", "g", sprintf("%16s", gensub(" ", "", "g", $6)))}1'
Result:
1:6822:26: :A:0000000000999993:DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994:MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995:CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
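The same pad-then-zero idea can be written in portable awk without gensub, in case gawk is not available (a sketch; `sample` is a made-up one-line fixture, and `-v{O,}FS=:` above is bash brace expansion for `-vOFS=: -vFS=:`):

```shell
#!/bin/sh
cat > sample <<'EOF'
1:CCCC:CC: :C:0000000999993: :CCC: :
EOF

# Right-align the serial in a 16-character field, then turn pad spaces into 0s.
awk 'BEGIN { FS=OFS=":" } { s = sprintf("%16s", $6); gsub(/ /, "0", s); $6 = s } 1' sample
```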

Is there a way to extract a value from a log and use it to extract another value using bash

I am trying to read a value from a log file and then search for another value based on it.
Below is what my log file looks like. All I have is the customerId; the orderId is dynamically generated.
I want to capture the orderId first, based on the customerId, and store it in a variable.
Once that succeeds, I want to check the status of this orderId, which appears some 10 lines below.
Finally, print it to the console or write it to a file; either is fine.
2019-05-18 09:46:02.944 [thread-2 ] Orderprocesing: Sending order info '{
"customerName" : "jenny",
"customerId" : "JE19802501",
"customerphone" : null,
"orderId" : "8456/72530548",
"orderInfo" : {
"Item" : "comic series 2018",
"count" : "10"
}
}'
.............................................................
.............................................................
2019-05-18 09:46:02.944 [thread-2 ] Orderprocesing: Sending order info '{
"customerName" : "jenny",
"customerId" : "JE19802501",
"customerphone" : null,
"orderId" : "8456/82530548",
"orderInfo" : {
"Item" : "comic series 2019",
"count" : "10"
}
}'
.............................................................
.............................................................
2019-05-18 09:49:02.944 [thread-2 ] Orderprocesing: status for 82530548 is success
.............................................................
.............................................................
.............................................................
2019-05-18 09:50:06.872 [thread-2 ] Orderprocesing: status for 72530548 is success
I am new to bash. I managed to slice a block of lines that contains the orderId corresponding to the customerId, but couldn't extract the orderId itself and store it in a variable:
$ cat orderlog_may_18 | grep -A 15 "JE19802501"
The expected results are:
customerId : JE19802501
orderId : 72530548
status for 72530548 is success
customerId : JE19802501
orderId : 82530548
status for 82530548 is success
Two lines of bash, using sed:
ord=$(sed -n '/JE19802501/,/orderId/{/orderId/{s|.*/||;s/[^0-9]//gp;q}}' orderlog_may18)
sed -n "/status for $ord/s/.*: //p" orderlog_may18
$ord stores the digits from the first orderId line following JE19802501 (the s|.*/|| strips the "8456/" prefix and q stops after the first match).
The tail end of the matching status line is then printed.
You should be able to do the formatting you want in your bash script.
$ awk -v trgt='JE19802501' '
{ gsub(/[",]/," "); $1=$1 }
$1 == "customerId" { cust=$NF; print }
($1 == "orderId") && (cust == trgt) { sub(/.*\//,"",$NF); seen[$NF]=1; print }
/status for / { id=$(NF-2); if (id in seen) { sub(/.*: /,""); print } }
' file
customerId : JE19802501
orderId : 72530548
customerId : JE19802501
orderId : 82530548
status for 82530548 is success
status for 72530548 is success
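For completeness, this kind of extraction can be exercised end to end on a trimmed fixture (a sketch, not either answer verbatim: the grep/sed pipeline below drops the `8456/` prefix and then looks up each order's status line):

```shell
#!/bin/sh
# Trimmed log: two orders for the customer, then their status lines.
cat > orderlog <<'EOF'
"customerId" : "JE19802501",
"orderId" : "8456/72530548",
"customerId" : "JE19802501",
"orderId" : "8456/82530548",
2019-05-18 09:49:02.944 [thread-2 ] Orderprocesing: status for 82530548 is success
2019-05-18 09:50:06.872 [thread-2 ] Orderprocesing: status for 72530548 is success
EOF

# Take the line after each customerId match, keep only the digits after "/",
# then print each order and its status line.
grep -A1 '"customerId" : "JE19802501"' orderlog \
  | sed -n 's/.*orderId" : "[0-9]*\/\([0-9]*\)".*/\1/p' \
  | while read -r ord; do
      echo "orderId : $ord"
      sed -n "s/.*: \(status for $ord is .*\)/\1/p" orderlog
    done
```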

bash: count repeated words in a log file and replace them with a number

I need to keep the warnings from my script's log and prepend "LAST" to every existing line on each start, so I can tell at a glance when an alert occurred. So I added this as the first line of my script:
echo "$( cat $ALERT_LOG_FILE | grep WARNING | tail -n 2k | ts "LAST ")" > $ALERT_LOG_FILE
The script log looks like this after the first run:
WARNING : ...
WARNING : ...
WARNING : ...
WARNING : ...
When the script starts or restarts, the echo line adds "LAST" to each line, making it look like this:
LAST WARNING : ...
LAST WARNING : ...
LAST WARNING : ...
LAST WARNING : ...
The problem is that the log file ends up like this after several restarts:
LAST LAST LAST LAST WARNING : ....
LAST LAST LAST WARNING : ....
LAST LAST WARNING : ....
LAST LAST WARNING : ....
LAST WARNING : ....
WARNING:
Is there any way to make it like this instead:
LAST 4 WARNING : ....
LAST 3 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 1 WARNING : ....
WARNING:
EDIT:
Code with @Yoda's suggestion:
echo "$(cat $LOG_FILE | grep WARNING | tail -n 2k | ts "LAST " | awk '{n=gsub("LAST ",X);if(n) print "LAST",n,$0;else print}')" > $LOG_FILE
Output log after some restarts with @Yoda's suggestion:
LAST 2 2 1 WARNING : ...
LAST 2 1 WARNING : ...
LAST 1 WARNING : ...
WARNING : ...
Based on some assumptions:
$ awk '{n=gsub("LAST ",X);if(n) print "LAST",n,$0;else print}' file
LAST 4 WARNING : ....
LAST 3 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 1 WARNING : ....
WARNING:
If this is not what you are looking for, then I would suggest posting a representative sample of your log file and expected output.
Here is something that might help:
awk '
{
n = gsub("LAST ",X)
if( n )
{
for ( i = 1; i <= NF; i++ )
{
if ( $i ~ /WARNING/ )
{
sub(/^ */,X)
print "LAST",n,$0;
next
}
if ( $i ~ /^[0-9]$/ )
{
n += $i
$i = ""
}
}
}
else
print $0
}
'
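The core trick in both versions is that gsub returns the number of substitutions it performed, which is easy to check in isolation (a sketch; `log` is a made-up fixture):

```shell
#!/bin/sh
cat > log <<'EOF'
LAST LAST LAST LAST WARNING : ....
LAST LAST WARNING : ....
LAST WARNING : ....
WARNING : ....
EOF

# gsub strips every "LAST " and reports how many it removed.
awk '{ n = gsub("LAST ", ""); if (n) print "LAST", n, $0; else print }' log
```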

Use variables in awk and/or sed [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 6 years ago.
In a bash script, from the output below, I need to print the lines between "Device #0" and "Device #1"; but since this is part of a bigger script, I need to use variables for the start/stop lines.
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Block Size : 512 Bytes
Supported : Yes
Programmed Max Speed : SATA 6.0 Gb/s
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : ATA
Model :
Firmware : 003Q
Serial number : S2HTNX0H418779
World-wide name : 5002538C402805A4
Reserved Size : 265496 KB
Used Size : 897129 MB
Unused Size : 18327 MB
Total Size : 915715 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full power,Powered off
SSD : Yes
Temperature : 39 C/ 102 F
NCQ status : Enabled
----------------------------------------------------------------
Device Phy Information
----------------------------------------------------------------
Phy #0
PHY Identifier : 0
SAS Address : 30000D1701801803
Attached PHY Identifier : 3
Attached SAS Address : 50000D1701801800
----------------------------------------------------------------
Runtime Error Counters
----------------------------------------------------------------
Hardware Error Count : 0
Medium Error Count : 0
Parity Error Count : 0
Link Failure Count : 0
Aborted Command Count : 0
SMART Warning Count : 0
Model, SSD
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #1
Device is a Hard drive
State : Online
Block Size : 512 Bytes
Supported : Yes
Programmed Max Speed : SATA 6.0 Gb/s
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : ATA
Model :
Firmware : 003Q
Serial number : S2HTNX0H418779
World-wide name : 5002538C402805A4
Reserved Size : 265496 KB
Used Size : 897129 MB
Unused Size : 18327 MB
Total Size : 915715 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full power,Powered off
SSD : Yes
Temperature : 39 C/ 102 F
NCQ status : Enabled
----------------------------------------------------------------
Device Phy Information
----------------------------------------------------------------
Phy #0
PHY Identifier : 0
SAS Address : 30000D1701801803
Attached PHY Identifier : 3
Attached SAS Address : 50000D1701801800
----------------------------------------------------------------
Runtime Error Counters
----------------------------------------------------------------
Hardware Error Count : 0
Medium Error Count : 0
Parity Error Count : 0
Link Failure Count : 0
Aborted Command Count : 0
SMART Warning Count : 0
Model, SSD
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #2
Device is a Hard drive
State : Online
Block Size : 512 Bytes
Supported : Yes
Programmed Max Speed : SATA 6.0 Gb/s
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : ATA
Model :
Firmware : 003Q
Serial number : S2HTNX0H418779
World-wide name : 5002538C402805A4
Reserved Size : 265496 KB
Used Size : 897129 MB
Unused Size : 18327 MB
Total Size : 915715 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full power,Powered off
SSD : Yes
Temperature : 39 C/ 102 F
NCQ status : Enabled
----------------------------------------------------------------
Device Phy Information
----------------------------------------------------------------
Phy #0
PHY Identifier : 0
SAS Address : 30000D1701801803
Attached PHY Identifier : 3
Attached SAS Address : 50000D1701801800
----------------------------------------------------------------
Runtime Error Counters
----------------------------------------------------------------
Hardware Error Count : 0
Medium Error Count : 0
Parity Error Count : 0
Link Failure Count : 0
Aborted Command Count : 0
SMART Warning Count : 0
Model, SSD
In this case the output for Device #0 to Device #2 is the same, but it doesn't really matter for the test.
So trying with cat arcconf | awk '/Device #0/,/Device #1/' where the output above is stored in a file called arcconf works. But trying to use variables instead of 0 and 1 doesn't work at all:
MIN_INDEX=0
INDEX=1
cat arcconf | awk '/Device #"$MIN_INDEX"/,/Device #"$INDEX"/'
cat arcconf | sed -n -e "/Device #"$INDEX_MIN"$/,/Device #"$INDEX"$/{ /Device #"$INDEX_MIN"$/d; /Device #"$INDEX"$/d; p; }"
It doesn't display anything.
Could you please help?
Also as I am going to use the output from Device to Device lines several times, is it possible to store it in some new variable which I should use in the future?
Thanks,
Valentina
Bash variables are not expanded within single quotes; that's why the first command doesn't work. Replace the single quotes with double quotes:
cat arcconf | awk "/Device #$MIN_INDEX/,/Device #$INDEX/"
The second command should work, but it's unnecessarily complicated.
You don't need to drop out of the double-quoted string for the sake of the variables, this will work fine:
cat arcconf | sed -n -e "/Device #$INDEX_MIN$/,/Device #$INDEX$/{ /Device #$INDEX_MIN$/d; /Device #$INDEX$/d; p; }"
In fact it's better this way: the variables now sit inside a double-quoted string, which is a good habit, since unquoted variables containing spaces would cause problems.
You can send variables to awk via -v var=val:
awk \
-v start="Device #$MIN_INDEX" \
-v end="Device #$MAX_INDEX" \
'$0 ~ end { p=0 }
$0 ~ start { p=1 }
p' arcconf
Simply by moving the p; around, you can choose whether or not to include the start and end lines:
$0 ~ end { p=0 }; p; $0 ~ start { p=1 } # Will not include start nor end
$0 ~ end { p=0 }; $0 ~ start { p=1 }; p # Will include start and end
$0 ~ start { p=1 }; p; $0 ~ end { p=0 } # Will include start but not end
$0 ~ end { p=0 }; p; $0 ~ start { p=1 } # Will include end but not start
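A quick check of the -v approach (a sketch; the arcconf fixture here is cut down to three stub device blocks):

```shell
#!/bin/sh
cat > arcconf <<'EOF'
Device #0
State : Online
Device #1
State : Online
Device #2
State : Online
EOF

MIN_INDEX=0
INDEX=1
# The shell variables go in via -v; the p flag switches printing on and off.
awk -v start="Device #$MIN_INDEX" -v end="Device #$INDEX" '
$0 ~ end   { p=0 }
$0 ~ start { p=1 }
p' arcconf
```

Since the question also asks about reusing the extracted block several times, the whole output can be captured once, e.g. block=$(awk -v start=... -v end=... '...' arcconf), and reused later with echo "$block".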
You can try the sed command below:
$ MIN_INDEX=0
$ INDEX=1
$ sed -n "/Device #$MIN_INDEX/,/Device #$INDEX/p" kk.txt
And to store the output in a variable:
$ sed -n "/Device #$MIN_INDEX/,/Device #$INDEX/w output.txt" kk.txt
$ var=$(cat output.txt)
$ echo "$var"
Explanation
-n suppresses automatic printing, so only what the script explicitly outputs appears.
w writes the matched range to the file output.txt.
p prints the matched range. The space in the pattern needs no escaping.
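The same range can also be captured into a variable directly, without the output.txt temporary file (a sketch; kk.txt here is a two-device stub):

```shell
#!/bin/sh
cat > kk.txt <<'EOF'
Device #0
State : Online
Device #1
State : Online
EOF

MIN_INDEX=0
INDEX=1
# Variables expand inside double quotes; note the range includes the end line.
var=$(sed -n "/Device #$MIN_INDEX/,/Device #$INDEX/p" kk.txt)
echo "$var"
```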

Elasticsearch: number of search operations per second

I am looking for a way to get the number of search operations per second on a node (and/or on all nodes).
Is there a way to get this information without the Marvel plugin?
My Elasticsearch version is 0.90.11.
Marvel does it by sampling. If you write a script to repeatedly run curl http://localhost:9200/_stats/search and parse a result that looks like this:
...
"_all" : {
"primaries" : {
"search" : {
"open_contexts" : 0,
"query_total" : 51556,
"query_time_in_millis" : 2339958,
"query_current" : 0,
"fetch_total" : 8276,
"fetch_time_in_millis" : 34916,
"fetch_current" : 0
}
},
"total" : {
"search" : {
"open_contexts" : 0,
"query_total" : 73703,
"query_time_in_millis" : 2773745,
"query_current" : 0,
"fetch_total" : 10428,
"fetch_time_in_millis" : 45570,
"fetch_current" : 0
}
}
},
...
You can see the query_total values -- just repeatedly query those at some interval and then do the math.
Thanks, Alcanzar. Here is the script I created:
if [[ $1 == "" ]]
then
    echo "Usage: $0 <refresh_interval_in_seconds>"
    echo "Example: ./$0 10"
    exit 1
fi
refresh_interval=$1
while true; do
    begin=$(curl --silent http://localhost:9200/_stats/search?pretty | grep '"query_total" :' | sed -n 2p | sed 's/,$//' | awk '{print $3}')
    sleep $refresh_interval
    end=$(curl --silent http://localhost:9200/_stats/search?pretty | grep '"query_total" :' | sed -n 2p | sed 's/,$//' | awk '{print $3}')
    total=$(( (end - begin) / refresh_interval ))
    echo "$(date +"[%T]") Search ops/sec : $total"
done
For instance, to refresh every 10 seconds, execute: bash script.sh 10
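The rate arithmetic itself can be checked without a live cluster (a sketch; the two query_total samples are made-up numbers):

```shell
#!/bin/sh
# Two query_total samples taken refresh_interval seconds apart.
begin=51556
end=51756
refresh_interval=10

# Searches per second = counter delta divided by the sampling interval.
total=$(( (end - begin) / refresh_interval ))
echo "Search ops/sec : $total"
```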
