Oracle Clob holds complex XML; how to select specific data with Xquery

I'm trying to extract specific data from a complex XML data set stored in a CLOB field in a commercial app. I cannot change the XML format (namespace, etc.), and I cannot change the CLOB to XMLType.
The xml data looks like:
<?xml version="1.0" encoding="utf-8"?>
<Calculation>
<ProcessUnitModelScenario Id="1265319" EntityId="10030" EntityName="Chaco Plant" ProcessUnitId="10225" ProcessUnitName="Turbine - Unit 37" EmissionModelId="10000" EmissionModelName="Emissions" ScenarioId="10053" ScenarioName="GHG_Comb_Run_Time" EffectiveDate="1/1/2012 12:00:00 AM" EndDate="2/1/2012 12:00:00 AM" ActiveDate="1/1/2008 12:00:00 AM" ProductionUnitId="10031" ProductionUnitName="Default Production Unit - Month" ProductionScheduleId="13541" OperatingPercentage="100" LinkLevel="1">
<EmissionModel Id="10935" EffectiveDate="1/1/2012 12:00:00 AM" EndDate="2/1/2012 12:00:00 AM">
<EmissionModelMaterial Id="13250" OutputType="Air Emissions" OutputTypeId="1" Media="Vapor" MediaName="Air" MaterialId="83" EquationId="10096" EquationName="GHG Combustion: Run time" EquationUnit="lb/hr" EquationUnitName="lb/hr" EquationBaseUnit="lb/hr" EquationBaseUnitName="lb/hr" SpeciationOption="StandardSpeciation" SpeciationOptionName="Standard Speciation" UseComponentVaporPressureMethods="False" VaporPressureOptionName="Material's vapor pressure methods">
<Material Id="83" Name="Methane" EffectiveId="10082" EffectiveDate="1/1/1990 12:00:00 AM" ComponentBasis="Vapor" MolecularWeight="16.043" LiquidDensity="1.34687732957939" VaporPressureMethod="Riedels" RiedelA="39.205" RiedelB="-1324.4" RiedelC="-3.4366" RiedelD="3.1E-05" RiedelE="2" UseDefinedComposition="False">
<CalculationPeriod StartDate="1/1/2012 12:00:00 AM" EndDate="2/1/2012 12:00:00 AM">
<EquationVariable Id="11079" Name="HeatRating" Order="10" BaseUnit="BTU/sec" EquationUnit="MMBtu/hr" Type="System" TypeName="System Variable" SystemCalculationType="ProcessUnitProperty" SystemCalculationName="Process Unit Property" SystemParameterProcessPropertyId="10005" SystemParameterModelOutputTypeId="1" TimeDependent="False" Value="116" EnteredValue="116" EnteredUnit="MMBtu/hr" />
<EquationVariable Id="11077" Name="GHGEF" Order="20" BaseUnit="lb/BTU" EquationUnit="kg/MMBTU" Type="GlobalEmissionFactor" TypeName="Global Emission Factor" TimeDependent="True" Value="0.001" EnteredValue="0.001" EnteredUnit="kg/MMBTU" />
<EquationVariable Id="11078" Name="RunHrs" Order="30" BaseUnit="hr" Type="Parameter" TypeName="Parameter" ParameterLevel="ProcessUnit" ParameterLevelName="Process Unit" ParameterId="10044" ParameterName="RunHrs - " TimeDependent="True" Value="612" EnteredValue="612" EnteredUnit="hr" />
<EquationVariable Id="11080" Name="kgtolb" Order="40" BaseUnit="lb" Type="GlobalConstant" TypeName="Global Constant" GlobalConstantId="10007" TimeDependent="False" Value="2.20462" />
<EquationVariable Id="11081" Name="OpHrs" Order="45" BaseUnit="hr" EquationUnit="hr" Type="System" TypeName="System Variable" SystemCalculationType="OperatingHours" SystemCalculationName="Operating Hours" TimeDependent="True" Value="744" />
<EquationVariable Id="11082" Name="EmissionRate" Order="46" BaseUnit="lb/hr" Type="FinalResult" TypeName="Final Expression" Formula="(HeatRating*GHGEF)*RunHrs*kgtolb/OpHrs" TimeDependent="True" Value="0.210363418064516" />
<Emission EffectiveDate="1/1/2012 12:00:00 AM" EndDate="2/1/2012 12:00:00 AM" BaseUnit="lb/hr" BaseUnitName="lb/hr" EmissionAmount="0.210363418064516" Unit="lb/hr" UnitName="lb/hr" ResultValue="0.210363418064516" LinkType="Unabated" LinkTypeName="" OperatingHours="744" EmissionMass="156.51038304" EmissionMassUnit="lb" MaterialId="83" EffectiveMaterialId="10082" MaterialName="Methane" MaterialEffectiveDate="1/1/1990 12:00:00 AM" />
</CalculationPeriod>
</Material>
<Material etc...>
</Material>
</EmissionModelMaterial>
<EmissionModelMaterial etc...>
</EmissionModelMaterial>
</EmissionModel>
<EmissionModel etc...>
</EmissionModel>
<ProcessUnitModelScenario etc...>
</ProcessUnitModelScenario>
</Calculation>
I need to return certain attribute values from the elements for a specified combination of [ProcessUnitModelScenario/@ProcessUnitId], [ProcessUnitModelScenario/@ScenarioId], and [Material/@Id].
The XML data is kept in the Verbose_Xml CLOB field of the Air_Calc_Log table.
In my PL/SQL I am (mis?)using the following select:
SELECT
XMLType(l.verbose_xml).extract(
'for $scen in /Calculation/ProcessUnitModelScenario
where ($scen/@ScenarioId="10053")
return $scen/* ')
FROM air_calc_log l
WHERE l.vld_site_id = 10030 -- pVldSite
AND l.start_date = To_Date('01/01/2012','mm/dd/yyyy') -- pStartDate
AND l.End_Date = To_Date('04/01/2012','mm/dd/yyyy')
Whatever combination of XQuery/XPath using FLWOR syntax I use, I always get the following error message:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00601: Invalid token in: 'for $scen in /Calculation/ProcessUnitModelScenario
where ($scen/@ScenarioId="10053")
return $scen/* '
ORA-06512: at "SYS.XMLTYPE", line 111
Can someone help point out what I'm doing wrong?

Try it like this:
SELECT
XMLType(l.verbose_xml).extract(
'/Calculation/ProcessUnitModelScenario[@ScenarioId="10053"]')
FROM air_calc_log l
WHERE l.vld_site_id = 10030 -- pVldSite
AND l.start_date = To_Date('01/01/2012','mm/dd/yyyy') -- pStartDate
AND l.End_Date = To_Date('04/01/2012','mm/dd/yyyy')
Here is a fiddle (Note that I had to change your XML to make it well-formed)

Related

Splunk strptime returning NaN

I have an eval on a dashboard that used to work, but it stopped and I haven't been able to figure out why.
On the dashboard I'm taking the _time and turning it into a human-readable string using strftime(_time, "%m/%d/%Y %H:%M:%S %Z"), and that works great. The problem comes when I try to convert it back later to build a link to a search.
For example:
<eval token="endTimestamp">relative_time(strptime($row.Timestamp$, "%m/%d/%Y %H:%M:%S %Z"), "+30m")</eval>
This used to work and return the Unix time with 30 minutes added, but now strptime just returns NaN even though this is the right format. I've checked all the Splunk docs and everything looks right, but it is still broken.
Any idea what I could be doing wrong?
Here is the snippet for the field row I'm building:
<condition field="Search">
<eval token="startTimestamp">$row.Timestamp$</eval>
<eval token="endTimestamp">relative_time(strptime($row.Timestamp$, "%m/%d/%Y %H:%M:%S %Z"), "+30m")</eval>
<eval token="corKey">$row.Correlation Key$</eval>
<link target="_blank">search?q=(index=### OR index=###) earliest=$startTimestamp$ latest=$endTimestamp$ correlationKey=$corKey$</link>
</condition>
I have taken out everything but the $row.Timestamp$, and that returns something like 10/03/2021 07:41:27 PDT, which is the format that I put into it; I just can't do the reverse. I have copied and pasted the format from the strftime and still have no luck converting it back so I can do math on it.
Any suggestions?

I don't think it's anything you're doing wrong, but strptime/strftime in the dashboard evals don't seem to like %Z for whatever reason. (My Splunk Cloud stack is on version 8.2.2107.1.)
Doing the round trip from epoch to string and back within SPL itself seems to work fine; it's just the (JavaScript-driven) dashboard side that doesn't seem to work quite right with timezone abbreviations.
relative_time from an epoch value works fine, and str[pf]time using UTC offsets with the %z format also seems to work, so either of those could be a workaround for you.
I threw together a quick test dashboard that shows the differences between the format variations. If you (or someone from your company) have a current support entitlement, I would log a case for this. (I don't see anything related in the published known issues, at least.)
<dashboard version="1.1">
<label>Teddybear Time Drilldown Test</label>
<row>
<panel>
<table>
<search>
<query>
| makeresults
| eval epoch="1633272087", format=mvappend("%m/%d/%Y %H:%M:%S %Z","%m/%d/%Y %H:%M:%S %z","%m/%d/%Y %H:%M:%S"), Search="Go This Row", Reset="Clear"
| fields - _time
| mvexpand format
| eval Timestamp=strftime(epoch,format), roundtrip=strptime(Timestamp,format)
| table Search, Reset, *
</query>
<earliest>-1s</earliest>
<latest>now</latest>
</search>
<option name="drilldown">cell</option>
<option name="rowNumbers">true</option>
<drilldown>
<condition field="Search">
<eval token="timestamp">$row.Timestamp$</eval>
<eval token="strptime">strptime($row.Timestamp$, $row.format$)</eval>
<eval token="strftime">strftime($row.epoch$, $row.format$)</eval>
<eval token="relative_time">relative_time($row.epoch$,"-30m")</eval>
</condition>
<condition field="Reset">
<unset token="timestamp"/> <unset token="strptime"/> <unset token="strftime"/> <unset token="relative_time"/>
</condition>
</drilldown>
</table>
</panel>
</row>
<row>
<panel>
<title>timestamp</title>
<html>
<h2>$timestamp|s$</h2>
</html>
</panel>
<panel>
<title>strptime</title>
<html>
<h2>$strptime|s$</h2>
</html>
</panel>
<panel>
<title>strftime</title>
<html>
<h2>$strftime|s$</h2>
</html>
</panel>
<panel>
<title>relative_time</title>
<html>
<h2>$relative_time|s$</h2>
</html>
</panel>
</row>
</dashboard>

How to remove column header in csv output in BI publisher?

Hi, I want to create a report in BI Publisher which:
is in CSV format
uses a semicolon as the delimiter
has no column header
Note that the report is scheduled.
I always get the data like this
GL_ACCOUNT_CODE;GL_ACCOUNT_DESCRIPTION;REPORTING_CODE;REPORTING_DESCRIPTION;ACCOUNT_TYPE;START_DATE;END_DATE
208000;"SITES INTERNET";208000;"208000 desctest";Asset;;
101000;CAPITAL;;;"Owner's Equity";;
218300;"MATERIEL DE BUREAU ET INFO. ST DENIS";;;Asset;;
205000;"CONCESSIONS ET DROITS SIMILAIRES";;;Asset;;
But I just want the data, not the column headers, like this
208000;"SITES INTERNET";208000;"208000 desctest";Asset;;
101000;CAPITAL;;;"Owner's Equity";;
218300;"MATERIEL DE BUREAU ET INFO. ST DENIS";;;Asset;;
205000;"CONCESSIONS ET DROITS SIMILAIRES";;;Asset;;
I tried to use an eText template, but it only returns 0's and question marks. Can you please analyze my template? Thank you.
Format Setup:
<TEMPLATE TYPE> DELIMITER_BASED
<OUTPUT CHARACTER SET> iso-8859-1
<NEW RECORD CHARACTER> Carriage Return
Format Data Records:
<LEVEL> DATA_DS
<NEW RECORD> G_1
<MAXIMUM LENGTH> <FORMAT> <DATA> <COMMENTS>
99 Number ‘GL_ACCOUNT_CODE’
1 Alpha `;` Delimiter
99 Alpha ‘GL_ACCOUNT_DESCRIPTION’
1 Alpha `;` Delimiter
99 Alpha ‘ACCOUNT_TYPE’
1 Alpha `;` Delimiter
99 Number ‘REPORTING_DESCRIPTION’
1 Alpha `;` Delimiter
<END LEVEL> G_1
<END LEVEL> DATA_DS

You can use an eText template to achieve your requirement. See the documentation here: https://docs.oracle.com/cd/E28280_01/bi.1111/e22254/create_etext_tmpl.htm#BIPRD2908

Change the format of date from "mm/dd/yyyy" to "Month dd, yyyy" in Ruby

I am trying to extract date from XML and compare it with the date in a PDF.
I am using Nokogiri to get the date from XML and PDF-Reader to read the date from PDF.
But the date in XML is in "mm/dd/yyyy" format and the date in PDF is in "Month dd, yyyy" format.
XML Tag:
<LetterSendDate>02/29/2016</LetterSendDate>
Extracting the Date from xml using Nokogiri:
@reader = file('C:\Users\ecz560\Desktop\30004_Standard.pdf').parse_pdf
@xml = file('C:\Users\ecz560\Desktop\30004_Standard.xml').parse_xmlDoc
@LettersendDate = @xml.xpath("//Customer[RTLtr_Loancust='0163426']//RTLtr_LetterSendDate").map(&:text)
Comparing the XML date with the date in PDF:
page_index = 0
@reader.pages.each do |page|
page_index = page_index+1
if expect(page.text).to include @LettersendDate
valid_text = "Given text is present in -- #{page_index}"
puts valid_text
end
end
But expect(page.text) returns February 29, 2016,
so I get an error when comparing:
Error
if expect(page.text).to include @LettersendDate
TypeError: no implicit conversion of String into Array
How can I convert the date from "mm/dd/yyyy" format to "Month dd, yyyy" format?
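
One possible approach, sketched with Ruby's standard `date` library: parse the "mm/dd/yyyy" string with `Date.strptime` and render it with `strftime`. The helper name `long_date` and the sample PDF text are hypothetical, not from the question. Note also that `xpath(...).map(&:text)` returns an Array, so each element is converted and compared individually here.

```ruby
require 'date'

# Parse an "mm/dd/yyyy" string and render it as "Month dd, yyyy".
# long_date is a hypothetical helper name used only for this sketch.
def long_date(mdy)
  Date.strptime(mdy, '%m/%d/%Y').strftime('%B %d, %Y')
end

# Stands in for the Array returned by xpath(...).map(&:text).
xml_dates = ['02/29/2016']

# Hypothetical page text; in the question this would come from page.text.
page_text = 'Letter sent on February 29, 2016 to the customer.'

# Convert every XML date, then check each one against the page text.
expected = xml_dates.map { |d| long_date(d) }
all_present = expected.all? { |d| page_text.include?(d) }

puts expected.first
puts all_present
```

`%d` zero-pads the day ("February 05, 2016"); if the PDF prints single-digit days without the padding, `%-d` may be the better directive.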

How to extract a particular number from a long string of data using Oracle SQL?

I need the numbers extracted from the string of data contained in a column of a table.
Example string :
<strong>Customer Name</strong>: Hit - julaifnaf afbafbaf Caraballo Pichardo vs PICHARDO ALBERTO<br />
<strong>Address</strong>: NA - abdcinfainaf 42982542542 vs xx<br />
<strong>Country of citizenship</strong>: NA<br />
<strong>Country of residency</strong>: NA<br />
<strong>Date of birth</strong>: NA - xx vs Nov-72<br />
<strong>Place of birth</strong>: NA<br />
<strong>Identification Number</strong>: **1**<br />
<strong>emailDetails</strong>: <br/>
<b>Subject: </b>abcdejnfanfa <br/>
<b>Sent To: </b>abced@test.com<br/>
In the above example string, the number I need extracted is 1.
The length of the strings and the position of the record vary,
but the numbers to be extracted always come after Identification Number</strong>: and before <br /><strong>.
What function can I use to extract this data?

SELECT TO_NUMBER(
REGEXP_SUBSTR(
column_name,
'<strong>Identification Number</strong>:.*?(\d+).*?<br />',
1,
1,
NULL,
1
)
) AS id_number
FROM table_name;

Try this:
select
regexp_replace(column_name,'.*<strong>Identification Number</strong>:[^>\d]*(\d+)[^>\d]*<br\s*/>.*', '\1', 1, 0, 'inm') as id
from html;
P.S. This is not a very reliable solution, though, because regular expressions cannot parse arbitrary HTML.
Output:
ID
-----------
1
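
To try the capture-group idea outside the database, here is a minimal Ruby sketch that uses the same pattern as the REGEXP_SUBSTR answer. The sample text is a shortened copy of the question's data, and `ID_PATTERN` and `extract_id` are hypothetical names; Ruby's regex engine is not Oracle's, so this only illustrates the logic.

```ruby
# Shortened sample modeled on the question's data; the Address line
# contains digits that must NOT be picked up.
html = <<~HTML
  <strong>Address</strong>: NA - abdcinfainaf 42982542542 vs xx<br />
  <strong>Identification Number</strong>: **1**<br />
  <strong>emailDetails</strong>: <br/>
HTML

# Anchor on the label, lazily skip non-digit noise, capture the digits.
ID_PATTERN = %r{<strong>Identification Number</strong>:.*?(\d+).*?<br />}

# Returns the captured number as an Integer, or nil when the label
# or the digits are missing.
def extract_id(text)
  match = ID_PATTERN.match(text)
  match && match[1].to_i
end

id_number = extract_id(html)
puts id_number
```

Because `.` does not cross line breaks by default, the match stays on the Identification Number line, which is why the digits in the Address line are ignored.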

RSS feed not validating because of substr cutting html characters

Currently unable to get my rss feed to validate through W3C RSS Validator. It seems there's a problem with the time/date. If you click the W3C link it'll show the errors. When I comment out the date it works fine but the date is kinda crucial!!
Here's the original script:
include "db.php";
header("Expires: 0");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("cache-control: no-store, no-cache, must-revalidate");
header("Pragma: no-cache");
header("Content-type: text/xml");
print "<?xml version=\"1.0\" encoding=\"utf-8\" ?>";
?>
<rss version="2.0">
<channel>
<title>MediWales Events</title>
<description>The latest Events, updates and announcements from MediWales.</description>
<link>http://www.mediwales.com</link>
<copyright>Copyright 2011 MediWales.</copyright>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<language>en-us</language>
<lastBuildDate><? print date("D, d M Y H:i:s"); ?> 0000</lastBuildDate>
<managingEditor>info@mediwales.com</managingEditor>
<pubDate><? print date("D, d M Y H:i:s"); ?> 0000</pubDate>
<webMaster>info@mediwales.com</webMaster>
<generator>codeworks rss script (1.0.0)</generator>
<image>
<url>http://mediwales.com/login/uploaded/template/logo.png</url>
<title>MediWales Website</title>
<link>http://www.mediwales.com</link>
<description>The latest Events, updates and announcements from MediWales.</description>
<width>144</width>
<height>52</height>
</image>
<?
$latestnews = mysql_query("SELECT myevents.*, myevents_dates.datefrom from myevents, myevents_dates WHERE myevents_dates.datefrom >= CURDATE() AND myevents.id = myevents_dates.eventid order by myevents_dates.datefrom");
while ($news = mysql_fetch_assoc($latestnews)) {
$datetime = explode(" ",$news[datefrom]);
$date = explode("-",$datetime[0]);
$time = explode(":",$datetime[1]);
$news[description] = strip_tags($news[description]);
$news[description] = htmlspecialchars($news[description]);
echo "<item>";
echo "<title>".mb_convert_encoding(htmlspecialchars($news[title]),"US-ASCII")."</title>";
echo "<description>".mb_convert_encoding(substr($news[description],0, 250),"US-ASCII")."</description>";
echo "<link>http://www.mediwales.com/index.php?id=4&nid=$news[id]</link>";
echo "<pubDate>".date('D, d M Y H:i:s O', mktime($time[0],$time[1],$time[2],$date[1],$date[2],$date[0]))."</pubDate>";
echo "</item>";
}
?>
</channel>
</rss>

Notice that the only error is on line 56:
nbsp;&</description>
should be:
nbsp;&amp;</description>
The problem is that you are calling htmlspecialchars and then substr, so the last &amp; gets truncated to &, and that makes your feed invalid. To fix this, call substr first and htmlspecialchars last.
The other things ("Email address is missing real name", "item should contain a guid element") are just recommendations: you should follow them because they are good ideas, but they would not cause the feed to fail validation.
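
The ordering problem is language-independent, so here is a minimal Ruby sketch of it, using `CGI.escapeHTML` as a stand-in for PHP's htmlspecialchars and string slicing as a stand-in for substr. The sample text and the limit of 7 (in place of 250) are assumptions chosen to make the cut land inside an entity.

```ruby
require 'cgi'

raw = 'Fish & chips'   # hypothetical description text
limit = 7              # stands in for the 250-character limit

# Escape first, then truncate: the cut can land inside "&amp;".
broken = CGI.escapeHTML(raw)[0, limit]

# Truncate first, then escape: every "&" that survives is fully escaped.
safe = CGI.escapeHTML(raw[0, limit])

puts broken  # ends in a dangling "&a", which is not well-formed XML
puts safe
```

One consequence of truncating first is that the escaped output can be longer than the limit, since each escaped character expands into a multi-character entity.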

There are a number of other errors you'll need to fix (like cutting off in the middle of an HTML entity). But they provide a Help link for each one.
In specific reference to the date error, if you follow their Help link, you'll see that one of the possible reasons for this warning is that a date is in the future. The date they're complaining about is "Implausible date: Mon, 07 Mar 2011 00:00:00 +0000". Today is 1 Mar 2011, so 7 Mar 2011 is indeed in the future.
If you continue reading their Help link, they explain why this is a problem. The fix is not to include future dates in your feed.

I think they're complaining about the fact that you're using a date that's in the future.
If so, that is not, imho, a reason to declare your feed invalid. Real-world publications often have publication dates in the future.
The spec, which is the actual authority on this, doesn't say there's anything wrong with pubDates in the future.
http://cyber.law.harvard.edu/rss/rss.html
Validators can have bugs too. :-)

I've temporarily solved the problem by removing some html characters on my actual website so the feed isn't grabbing them.
I know the problem may arise when we grab the next set of feeds but too rushed to fix at the moment.
