Processing multi-line logs with AWK to gather SQL statements - bash

I have the following entries in a log file:
2016-01-25 21:12:41 UTC:172.31.21.125(56665):user@production:[21439]:ERROR: bind message supplies 1 parameters, but
prepared statement "" requires 0
2016-01-25 21:12:41 UTC:172.31.21.125(56665):user@production:[21439]:STATEMENT: SELECT count(*) AS total FROM (
SELECT 1 AS count
FROM leads_search_criteria_entities
INNER JOIN entities e on entity_id = e.viq_id
LEFT JOIN companies_user cu ON cu.entity_id = e.viq_id
WHERE criterium_id = 644 AND ((
( cu.udef_type IS NULL -- if not set by user, check calculated value
AND is_university >= 50
) OR (
cu.udef_type IS NOT NULL -- if set by user, use it
AND cu.udef_type = 'university'
)
))
GROUP BY e.viq_id
ORDER BY e.viq_id
) x
2016-01-25 21:14:11 UTC::#:[2782]:LOG: checkpoint starting: time
2016-01-25 21:14:16 UTC::#:[2782]:LOG: checkpoint complete: wrote 51 buffers (0.0%); 0 transaction log file(s) added, 0 remov
ed, 0 recycled; write=5.046 s, sync=0.038 s, total=5.091 s; sync files=18, longest=0.008 s, average=0.002 s
2016-01-25 21:19:11 UTC::#:[2782]:LOG: checkpoint starting: time
I would like to capture the SQL statements, but I am not sure how I can do that with AWK.
Update:
Expected outcome:
SELECT count(*) AS total FROM ( SELECT 1 AS count FROM leads_search_criteria_entities INNER JOIN entities e on entity_id = e.viq_id LEFT JOIN companies_user cu ON cu.entity_id = e.viq_id WHERE criterium_id = 644 AND (( ( cu.udef_type IS NULL -- if not set by user, check calculated value AND is_university >= 50 ) OR ( cu.udef_type IS NOT NULL -- if set by user, use it AND cu.udef_type = 'university' ) )) GROUP BY e.viq_id ORDER BY e.viq_id ) x
My current, almost-working solution uses sed, but this is where I got stuck: it just filters the lines from the statement marker (which itself spans multiple lines) through the next line after that. Any suggestion is appreciated.
sed -n "/:STATEMENT:/,/2016/p" out

I don't recommend using sed for this. First thought for an awk solution might look like this:
/^2016/ && line ~ /:STATEMENT:/ {
  sub(/.*:STATEMENT:/, "", line)
  print line
}
/^2016/ {
  line = ""
}
{
  $1 = $1
  line = sprintf("%s %s", line, $0)
}
END {
  if (line ~ /:STATEMENT:/) {
    sub(/.*:STATEMENT:/, "", line)
    print line
  }
}
Obviously you could shrink this. I wrote and ran it (for testing) as a one-liner.
The idea here is that:
we'll append to a variable, resetting it every time our input line starts with the year. (You could replace this with a regexp matching the date if you want to run this next year without modification),
when we get to a new log line (or the end), we strip off the cruft before the SQL statement and print the result.
Note the $1=$1. The purpose of this is to change your line's whitespace, so that newlines, tabs, and multiple spaces are collapsed into single spaces. Experiment with removing it to see the impact.
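For illustration only, the same accumulate-and-reset idea can be mirrored in Python (this is a sketch, not part of the awk answer; the shortened log lines below are assumptions standing in for the real file):

```python
import re

def extract_statements(lines):
    """Accumulate lines into a buffer, resetting it at each new log entry.

    When a finished buffer contains ':STATEMENT:', strip everything up to
    that marker and yield the SQL with whitespace collapsed (like $1=$1).
    """
    def flush(buf):
        if ":STATEMENT:" in buf:
            sql = re.sub(r".*:STATEMENT:", "", buf)
            return " ".join(sql.split())  # collapse runs of whitespace
        return None

    buf = ""
    for line in lines:
        if re.match(r"^\d{4}-\d{2}-\d{2}", line):  # a new log entry starts
            stmt = flush(buf)
            if stmt:
                yield stmt
            buf = ""
        buf += " " + line
    stmt = flush(buf)  # handle a statement that runs to end of file
    if stmt:
        yield stmt

log = [
    "2016-01-25 21:12:41 UTC:...:STATEMENT: SELECT count(*) AS total FROM (",
    "  SELECT 1 AS count",
    "  FROM leads_search_criteria_entities",
    ") x",
    "2016-01-25 21:14:11 UTC::#:[2782]:LOG: checkpoint starting: time",
]
print(list(extract_statements(log)))
```

The generator yields one flattened SQL string per STATEMENT entry and ignores the checkpoint noise.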

Update
How about a combination of sed and tr?
sed 's/^[0-9][^S]*//' INPUT.txt | sed '/^[0-9a-z]/d' | tr -s ' ' | tr -d '\n'
output:
STATEMENT: SELECT count(*) AS total FROM ( SELECT 1 AS count FROM leads_search_criteria_entities INNER JOIN entities e on entity_id = e.viq_id LEFT JOIN companies_user cu ON cu.entity_id = e.viq_id WHERE criterium_id = 644 AND (( ( cu.udef_type IS NULL -- if not set by user, check calculated value AND is_university >= 50 ) OR ( cu.udef_type IS NOT NULL -- if set by user, use it AND cu.udef_type = 'university' ) )) GROUP BY e.viq_id ORDER BY e.viq_id ) x

$ cat log.awk
f && /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {f=0; print ""}
sub(/^.*:STATEMENT:[[:space:]]+/,"") {f=1}
f { $1=$1; printf "%s ", $0 }
$ awk -f log.awk log.txt
SELECT count(*) AS total FROM ( SELECT 1 AS count FROM leads_search_criteria_entities INNER JOIN entities e on entity_id = e.viq_id LEFT JOIN companies_user cu ON cu.entity_id = e.viq_id WHERE criterium_id = 644 AND (( ( cu.udef_type IS NULL -- if not set by user, check calculated value AND is_university >= 50 ) OR ( cu.udef_type IS NOT NULL -- if set by user, use it AND cu.udef_type = 'university' ) )) GROUP BY e.viq_id ORDER BY e.viq_id ) x
(2nd line) This turns on printing (f=1) when :STATEMENT: is found, and as a side-effect, removes everything up until the start of the SELECT statement.
(3rd line) Then it keeps printing until printing is turned off (see below), cleaning up by replacing sequences of multiple spaces with a single space. (EDIT: Thanks to @ghoti for suggesting the elegant $1=$1 for that.)
(1st line) Turn off printing at the start of the next log, identified by starting with a date. Print a courtesy newline to end the SELECT.

Related

How to MOVE or MERGE different fields into the subfile in ONE LINE in RPGLE

I'm stuck on how to move or display different values from different fields in one line.
My output is supposed to look like this:
Real Output
but for now, my output looks like this:
Recent Output
This is my physical file:
CUREXG file
I have three fields in the physical file, which are:
EXGDAT = date (and the key field)
EXGCOD = exchange code
EXGRAT = exchange rate
I have 2 dates, and basically I need the output to have only 2 lines: one for 31 May and the second for 1 June.
I tried to group them with an if condition, but it didn't work. How am I supposed to do this? Please help me.
Thanks in advance
//Add a logical for the table by date, exchange code
fcurexg2 if e k disk
**---------------------- start of your code
*LOVAL setll curexg
read curexg
dou %eof(curexg);
c eval ##date = exgdat
c exsr $GetVals
eval rrn = rrn + 1
write sfl01
// move to the next date
exgdat setgt curexg
read curexg
enddo
**------------------------
Begsr $GetVals; // runs for each code -- usd, eur, etc
##gcod = 'USD'
exsr $GetGrat;
move ##grat USD
##gcod = 'GBP'
exsr $GetGrat;
move ##grat GBP
##gcod = 'EUR'
exsr $GetGrat;
move ##grat EUR
##gcod = 'AUD'
exsr $GetGrat;
move ##grat AUD
##gcod = 'SGD'
exsr $GetGrat;
move ##grat SGD
Endsr;
**------------------------
Begsr $GetGrat; //find the rate for that date and code
*like define curexg ##date
*like define exgcod ##gcod
*like define exgrat ##grat
clear ##grat
Chain (##date: ##gcod) curexg2; //the new logical
if %found(curexg2);
##grat = exgrat
endif
Endsr;
**------------------------
Consider an SQL function. Here is an SQL function that returns the exchange rate for a specific exchange code and date.
CREATE or replace function curexg_exchangeRate(
inDate date,
inCurrency char(3))
returns decimal(7,2)
language sql
begin
declare Sqlcode int ;
declare vSqlcode DECIMAL(5,0) default 0 ;
declare vExgrat decimal(7,2) default 0 ;
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
SET vSqlcode = SQLCODE ;
select a.exgrat
into vExgrat
from curexg a
where a.exgdat <= inDate
and a.exgcod = inCurrency
order by a.exgdat desc
fetch first row only ;
return coalesce( vExgrat, 0 ) ;
end
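The heart of that function is an as-of-date lookup: the latest rate on or before the requested date, defaulting to 0 when no row matches. Here is a self-contained sketch of the same query using SQLite from Python (illustrative only; the column names follow the question, the sample rates are made up, and this is not DB2 code):

```python
import sqlite3

# In-memory stand-in for the CUREXG physical file (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE curexg (exgdat TEXT, exgcod TEXT, exgrat REAL)")
conn.executemany("INSERT INTO curexg VALUES (?,?,?)", [
    ("2023-05-31", "USD", 4.45),
    ("2023-06-01", "USD", 4.47),
    ("2023-05-31", "EUR", 4.78),
])

def exchange_rate(in_date, in_currency):
    """Latest rate on or before in_date, mirroring curexg_exchangeRate."""
    row = conn.execute(
        """SELECT exgrat FROM curexg
           WHERE exgdat <= ? AND exgcod = ?
           ORDER BY exgdat DESC LIMIT 1""",   # = fetch first row only
        (in_date, in_currency)).fetchone()
    return row[0] if row else 0               # = coalesce(vExgrat, 0)

print(exchange_rate("2023-06-15", "USD"))  # → 4.47
```

`LIMIT 1` plays the role of DB2's `fetch first row only`, and the `None` check replaces the CONTINUE HANDLER plus `coalesce`.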
RPG code that calls the exchangeRate sql function:
d usdRate s 7p 2
d gbpRate s 7p 2
d eurRate s 7p 2
/free
// get exchange rate, of each exchange code, as of the specified date
exec sql
set :usdRate = curexg_exchangeRate( :exgdat, 'USD' ) ;
exec sql
set :gbpRate = curexg_exchangeRate( :exgdat, 'GBP' ) ;
exec sql
set :eurRate = curexg_exchangeRate( :exgdat, 'EUR' ) ;
This code lists the exchange rates for each exchange date:
// list exchange rates for each exchange date.
exec sql
declare c1 cursor for
with t1 as (
select distinct a.exgdat
from curexg a
order by a.exgdat )
select a.exgdat,
curexg_exchangeRate( a.exgdat, 'USD' ) usdRate,
curexg_exchangeRate( a.exgdat, 'GBP' ) gbpRate,
curexg_exchangeRate( a.exgdat, 'EUR' ) eurRate
from t1 a
order by a.exgdat ;
exec sql
open c1 ;
dow '1' = '1' ; // loop until the fetch returns no row
exec sql
fetch c1
into :exgdat, :usdRate, :gbpRate, :eurRate ;
if sqlcode <> 0 ;
leave ;
endif ;
// write to subfile
sfExgdat = exgdat ;
sfUsdRate = usdRate ;
sfGbpRate = gbpRate ;
sfEurRate = eurRate ;
write sflrcd ;
enddo ;
exec sql
close c1 ;
*inlr = '1' ;
return ;
/end-free

How can I send data that has already been sent?

I am hoping someone may be able to assist, as I am lost. I send a file to a customer several times a day that contains multiple Purchase Order numbers, and each file contains a different PO# each time. I have a table called EDICUSTOUTBOUND_810_SENT that is updated with the PO# every time an order is sent, to ensure I don't send duplicates.
Once in a while, a PO# will get split between files, and this causes an issue for the customer. What I am trying to do when this happens is include all the items for the PO# from the previous file plus the new items in the second. I cannot figure out how to do this, since I update the sent table. What I have now ignores the sent table and just sends everything each run.
EDICUSTOUTBOUND_810_SENT
SELECT 'TAG' as RECORD_TAG,
'ASN' as DOC_TYPE,
'CDS' as TPID,
'RMA' as PARENT_CHAIN,
To_char(ORD.sched_date, 'MM/DD/YYYY') SCHEDULE_DATE,
CST.ship_city FACILITY_CITY,
CST.ship_state FACILITY_STATE,
ORD.po_number || '-' || ORD.CUST_NBR ASN_ID,
SD.route_seq || To_char(SD.load_date, 'DDMMYY') MANIFEST,
SD.order_number ORDER_NUMBER,
SD.ROUTE_SEQ ROUTE_SEQ,
SD.LOAD_DATE LOADDATE,
ORD.CUST_NBR CUST_NBR,
ORD.po_number PO_NUM,
REPLACE(ORD.contractor, ',', ' ') JOB_NAME,
SD.line_item LINE_ITEM,
SD.sub_item SUB_ITEM,
CASE WHEN DET.item_qty > 1 THEN '1' ELSE CAST (DET.item_qty AS VARCHAR2(20))
END QTY,
SD.barcode BARCODE,
DET.prod_line LN,
DET.prod_style ST,
SD.unit_type WINDOW_PART,
REPLACE(SD.okopt_desc, ',', ' ') DESCRIPTION
FROM (
SELECT DISTINCT manifest
FROM (SELECT manifest, order_number
FROM wsoe.shippingdata
WHERE load_date >= trunc(sysdate) AND CURPROCESSID = 210 ) sd1
JOIN wsoe.ordhead oh1 ON sd1.order_number = oh1.order_number
LEFT JOIN WSOE.EDICUSTOUTBOUND_810_SENT edi ON oh1.po_number || '-' || oh1.CUST_NBR = edi.invoice_no
WHERE edi.invoice_no IS NULL
) m
JOIN wsoe.shippingdata sd ON sd.manifest = m.manifest
JOIN wsoe.ordhead ORD
ON SD.order_number = ORD.order_number
JOIN wsoe.orddet DET
ON DET.order_number = SD.order_number
AND DET.line_item = SD.line_item
AND DET.sub_item = SD.sub_item
JOIN wsoe.customer CST
ON CST.cust_nbr = ORD.cust_nbr
WHERE Substr(CST.custflags, 60, 1) = 'Y'
ORDER BY po_num, line_item;
It sounds like your real problem is that sometimes a PO# gets split between files. The best solution would be to make sure this doesn't happen in the first place. Are you limited by file size or something? Your other code that actually dumps the records to a file to send could be changed to measure the size of the next record and figure out whether it will fit in the current file.
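That size-aware splitting idea could be sketched like this (a hypothetical record layout, not the asker's actual export code): group the outgoing lines by PO# and only roll to a new file at a PO boundary, so one PO# is never split.

```python
from itertools import groupby

def split_into_files(records, max_bytes):
    """Group (po_number, line) records into files of at most max_bytes,
    never splitting one PO# across two files. A single PO# larger than
    the limit still gets a file of its own."""
    files, current, current_size = [], [], 0
    for po, group in groupby(records, key=lambda r: r[0]):
        chunk = list(group)
        chunk_size = sum(len(line) + 1 for _, line in chunk)  # +1 for newline
        if current and current_size + chunk_size > max_bytes:
            files.append(current)          # roll over at a PO boundary only
            current, current_size = [], 0
        current.extend(chunk)
        current_size += chunk_size
    if current:
        files.append(current)
    return files

records = [(80, "item-a"), (80, "item-b"), (81, "item-c"), (82, "item-d")]
for f in split_into_files(records, max_bytes=16):
    print([po for po, _ in f])  # PO 80 stays together in the first file
```

With a 16-byte limit, both PO 80 lines land in the first file and POs 81 and 82 share the second.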
Short of something like that, if you know which PO# you need to resend, you should be able to change your inner query to get a specific PO#. Something like
...
JOIN wsoe.ordhead oh1 ON sd1.order_number = oh1.order_number
LEFT JOIN WSOE.EDICUSTOUTBOUND_810_SENT edi ON oh1.po_number || '-' || oh1.CUST_NBR = edi.invoice_no
WHERE edi.invoice_no IS NULL OR oh1.po_number = 80
...
to include PO number 80 in the next run.

How to apply a regular expression to the string below

I have a string 'MCDONALD_YYYYMMDD.TXT'. I need to use a regular expression to append '**' after the letter 'D' in the given string (i.e. in the string, at position 9, I need to append '*' based on a column value 'star_len').
If star_len = 2, the o/p = 'MCDONALD??_YYYYMMDD.TXT'
If star_len = 1, the o/p = 'MCDONALD?_YYYYMMDD.TXT'
with
inputs ( filename, position, symbol, len ) as (
select 'MCDONALD_20170812.TXT', 9, '*', 2 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select substr(filename, 1, position - 1) || rpad(symbol, len, symbol)
|| substr(filename, position) as new_str
from inputs
;
NEW_STR
-----------------------
MCDONALD**_20170812.TXT
select regexp_replace('MCDONALD_YYYYMMDD.TXT','MCDONALD','MCDONALD' ||
decode(star_len,1,'*',2,'**'))
from dual
This is how you could do it. I don't think you need it as a regular expression though if it is always going to be "MCDONALD".
EDIT: If you need to be providing the position in the string as well, I think a regular old substring should work.
select substr('MCDONALD_YYYYMMDD.TXT',1,position-1) ||
decode(star_len,1,'*',2,'**') || substr('MCDONALD_YYYYMMDD.TXT',position)
from dual
Where position and star_len are both columns in some table you provide (instead of dual).
EDIT2: Just to be more clear, here is another example using a with clause so that it runs without adding a table in.
with testing as
(select 'MCDONALD_YYYYMMDD.TXT' filename,
9 positionnum,
2 star_len
from dual)
select substr(filename,1,positionnum-1) ||
decode(star_len,1,'*',2,'**') ||
substr(filename,positionnum)
from testing
For the fun of it, here's a regexp_replace solution. I went with a star since that's what your variable was called, even though your example used a question mark. The regex captures the filename string in 2 parts, the first being from the start up to 1 character before the position value, the second being the rest of the string. The replace puts the captured parts back together with the stars in between.
with tbl(filename, position, star_len ) as (
select 'MCDONALD_20170812.TXT', 9, 2 from dual
)
select regexp_replace(filename,
'^(.{'||(position-1)||'})(.*)$', '\1'||rpad('*', star_len, '*')||'\2') as fixed
from tbl;
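The same position-based insertion is easy to sanity-check outside the database; a small Python sketch (illustrative only) mirroring the substr/rpad construction:

```python
def insert_symbols(filename, position, symbol, length):
    """Insert `symbol` repeated `length` times before 1-based `position`,
    mirroring substr(f,1,position-1) || rpad(symbol,length,symbol)
              || substr(f,position)."""
    return filename[:position - 1] + symbol * length + filename[position - 1:]

print(insert_symbols("MCDONALD_20170812.TXT", 9, "*", 2))
# → MCDONALD**_20170812.TXT
```

Python slicing is 0-based, hence the `position - 1` in both halves.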

Merge statement without affecting records where there is no change in data

I have a stored procedure that takes data from several tables and creates a new table with just the columns I want. I now want to increase performance by only attempting to insert/update rows that have at least one column of new data. For existing rows that would only receive the exact data it already has, I want to skip the update altogether for that row.
For example if a row contains the data:
ID | date | population | gdp
15 | 01-JUN-10 | 1,530,000 | $67,000,000,000
and the merge statement comes for ID 15 and date 01-JUN-10 with population 1,530,000 and gdp $67,000,000,000, then I don't want to update that row.
Here are some snippets of my code:
create or replace PROCEDURE COUNTRY (
fromDate IN DATE,
toDate IN DATE,
filterDown IN INT,
chunkSize IN INT
) AS
--cursor
cursor cc is
select c.id, cd.population_total_count, cd.evaluation_date, cf.gdp_total_dollars
from countries c
join country_demographics cd on c.id = cd.country_id
join country_financials cf on cd.country_id = cf.country_id and cf.evaluation_date = cd.evaluation_date
where cd.evaluation_date > fromDate and cd.evaluation_date < toDate
order by c.id,cd.evaluation_date;
--table
type cc_table is table of cc%rowtype;
c_table cc_table;
BEGIN
open cc;
loop -- cc loop
fetch cc bulk collect into c_table limit chunkSize; --limit by chunkSize parameter
forall j in 1..c_table.count
merge
into F_AMB_COUNTRY_INFO_16830 tgt
using (
select c_table(j).id cid,
c_table(j).evaluation_date eval_date,
c_table(j).population_total_count pop,
c_table(j).gdp_total_dollars gdp
from dual
) src
on ( cid = tgt.country_id AND eval_date = tgt.evaluation_date )
when matched then
update
set tgt.population_total_count = pop,
tgt.gdp_total_dollars = gdp
when not matched then
insert (
tgt.country_id,
tgt.evaluation_date,
tgt.population_total_count,
tgt.gdp_total_dollars )
values (
cid,
eval_date,
pop,
gdp );
exit when c_table.count = 0; --quit condition for cc loop
end loop; --end cc loop
close cc;
EXCEPTION
when ACCESS_INTO_NULL then -- catch error when table does not exist
dbms_output.put_line('Error ' || SQLCODE || ': ' || SQLERRM);
END ;
I was thinking that in the on statement, I could just say something along the lines of:
on ( cid = tgt.country_id AND eval_date = tgt.evaluation_date
AND pop != tgt.population_total_count AND gdp != tgt.gdp_total_dollars )
but surely there's a cleaner / more efficient way to do it?
The other way you could do it is to use ora_hash to get a hash of the row, so the update fires only when something actually changed. The where clause on the update could be something like:
where ora_hash(src.col1 || '|' || src.col2 || '|' || src.col3 || '|' || src.col4) <> ora_hash(tgt.col1 || '|' || tgt.col2 || '|' || tgt.col3 || '|' || tgt.col4)
The delimiter keeps different column values from concatenating into the same string.
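To see the effect of skipping no-op updates, here is a standalone sketch using SQLite from Python (not Oracle MERGE; the table layout follows the question's example row): the update's WHERE clause only matches rows whose data actually differs, so unchanged rows are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE country_info (
    id INTEGER, eval_date TEXT, pop INTEGER, gdp INTEGER,
    PRIMARY KEY (id, eval_date))""")
conn.execute("INSERT INTO country_info VALUES (15, '2010-06-01', 1530000, 67000000000)")

def upsert(cid, eval_date, pop, gdp):
    """Update only when at least one column differs; insert when missing."""
    cur = conn.execute(
        """UPDATE country_info SET pop = ?, gdp = ?
           WHERE id = ? AND eval_date = ?
             AND (pop <> ? OR gdp <> ?)""",   # skip the no-op update
        (pop, gdp, cid, eval_date, pop, gdp))
    if cur.rowcount:
        return "updated"
    exists = conn.execute(
        "SELECT 1 FROM country_info WHERE id = ? AND eval_date = ?",
        (cid, eval_date)).fetchone()
    if not exists:
        conn.execute("INSERT INTO country_info VALUES (?,?,?,?)",
                     (cid, eval_date, pop, gdp))
        return "inserted"
    return "unchanged"

print(upsert(15, '2010-06-01', 1530000, 67000000000))  # unchanged
print(upsert(15, '2010-06-01', 1540000, 67000000000))  # updated
print(upsert(16, '2010-06-01', 900000, 5000000000))    # inserted
```

The same three outcomes (skip, update, insert) are what the MERGE with a change-detecting where clause would produce.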

bash echo variable substitution

I want to make a query with this bash code.
# First I will extract the table name from the tables.txt file
table=$(head -n 1 tables.txt)
# This is where I will make the substitution in the query
result=`echo "select 'result '||segment_name||':'||MB||':'||TOTAL_MB from (
select TSEG.segment_name,round(TSEG.bytes/1024/1024) MB,l.segment_name as LSEGMENT_NAME,nvl(l.MB,0) as LOB_MB,nvl(l.MB,0)+round(TSEG.bytes/1024/1024) as TOTAL_MB
from dba_segments tseg, (select LOBS.table_name,round(bseg.bytes/1024/1024) MB,lobs.SEGMENT_NAME
from dba_lobs lobs,dba_segments bseg
where LOBS.SEGMENT_NAME=bseg.segment_name
order by bseg.bytes asc
) l
where TSEG.segment_type='TABLE'
and TSEG.segment_name='$table'
and TSEG.SEGMENT_NAME=l.table_name(+)
order by TOTAL_MB
)where rownum=1;`
My problem is with the line TSEG.segment_name='$table': I need the table name in the format 'TABLE_NAME'.
This is my actual output with a table named "AABLG":
select 'result '||segment_name||':'||MB||':'||TOTAL_MB from (
select TSEG.segment_name,round(TSEG.bytes/1024/1024) MB,l.segment_name as LSEGMENT_NAME,nvl(l.MB,0) as LOB_MB,nvl(l.MB,0)+round(TSEG.bytes/1024/1024) as TOTAL_MB
from dba_segments tseg, (select LOBS.table_name,round(bseg.bytes/1024/1024) MB,lobs.SEGMENT_NAME
from dba_lobs lobs,dba_segments bseg
where LOBS.SEGMENT_NAME=bseg.segment_name
order by bseg.bytes asc
) l
where TSEG.segment_type='TABLE'
' and TSEG.segment_name='AABLG
and TSEG.SEGMENT_NAME=l.table_name(+)
order by TOTAL_MB
)where rownum=1;
You can see that the " ' " is in the first position, and I don't know why.
Regards,
Marco
This would be much better done without echo at all. Consider, for instance:
IFS=$'\r\n ' read -r table <tables.txt
IFS= read -r -d '' result <<EOF
select 'result '||segment_name||':'||MB||':'||TOTAL_MB from (
select TSEG.segment_name,round(TSEG.bytes/1024/1024) MB,l.segment_name as LSEGMENT_NAME,nvl(l.MB,0) as LOB_MB,nvl(l.MB,0)+round(TSEG.bytes/1024/1024) as TOTAL_MB
from dba_segments tseg, (select LOBS.table_name,round(bseg.bytes/1024/1024) MB,lobs.SEGMENT_NAME
from dba_lobs lobs,dba_segments bseg
where LOBS.SEGMENT_NAME=bseg.segment_name
order by bseg.bytes asc
) l
where TSEG.segment_type='TABLE'
and TSEG.segment_name='$table'
and TSEG.SEGMENT_NAME=l.table_name(+)
order by TOTAL_MB
) where rownum=1;
EOF
This also fixes the bug observed in your question by setting IFS to a value that includes $'\r', the carriage-return character found in DOS-formatted newlines, and thus stripping such characters when they exist at the end of the first line of tables.txt.
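What that stray quote actually is can be shown in a few lines of Python (illustrative; "AABLG" stands in for the first line of tables.txt): the first line of a DOS-formatted file ends in a carriage return, which rewinds the terminal cursor when printed, so the closing quote appears at the start of the line.

```python
table = "AABLG\r"  # first line of a CRLF file, read without stripping \r
query = f"and TSEG.segment_name='{table}'"
print(repr(query))  # the carriage return sits inside the quotes

# Stripping trailing CR/LF, as the IFS-based read does, fixes it:
clean = table.rstrip("\r\n")
print(f"and TSEG.segment_name='{clean}'")
```

Printing `repr(query)` makes the invisible `\r` visible, which is usually the quickest way to diagnose this class of bug.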
