Oracle PL/SQL regex to sanitize input from a textarea

I have a basic editor that allows the user to enter notes. I am using the https://quilljs.com/ API for the editor. The content of the editor will be saved in a database, but before persisting the data I want to sanitize the HTML content in Oracle PL/SQL to remove all possible JavaScript events. I have not been able to write a regular expression that sanitizes the HTML content before saving.
Example: <p>This is www.test.com</p><p>ffffff</p><p><br></p><p><br></p><p>Review at <a href="http://www.1159pm.com" rel="noopener noreferrer" target="_blank" onclick="alert()" ondblclick="alert()" onmouseover="alert()" onkeypress="alert()">www.1159PM.com</a> </p><p>fffffff</p>
Expected Result: <p>This is www.test.com</p><p>ffffff</p><p><br></p><p><br></p><p>Review at www.1159PM.com </p><p>fffffff</p>
All JavaScript events should be removed. Other scripts and styles should be removed as well. Please help me with the Oracle regex to solve this problem.

If your HTML is restricted to valid XHTML (i.e. it has a single root element and each of the opened tags is closed) then you can use:
INSERT INTO table_name (value) VALUES (
XMLQuery(
'copy $i := $p1
modify (delete nodes ($i//@onclick, $i//@ondblclick, $i//@onmouseover, $i//@onkeypress))
return $i'
PASSING XMLTYPE('<html><p>This is www.test.com</p><p>ffffff</p><p><br /></p><p><br /></p><p>Review at www.1159PM.com </p><p>fffffff</p></html>')
AS "p1"
RETURNING CONTENT
).getClobVal()
)
Given the table:
CREATE TABLE table_name (value CLOB);
the inserted value is:
VALUE
<html><p>This is www.test.com</p><p>ffffff</p><p><br/></p><p><br/></p><p>Review at www.1159PM.com</p><p>fffffff</p></html>
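For comparison, the same idea (parse the markup as XML and delete event-handler attributes, rather than matching them with a regex) can be sketched outside the database. This is only an illustration in Python, not part of the Oracle answer: the function name strip_event_attributes is made up, and treating every attribute whose name starts with "on" as an event handler is an assumption that goes beyond the four attributes listed above. It does not remove script or style elements, so it is not a complete sanitizer.

```python
import xml.etree.ElementTree as ET


def strip_event_attributes(xhtml: str) -> str:
    """Return well-formed XHTML with every attribute named on* removed.

    Hypothetical helper: assumes the input parses as XML with one root element.
    """
    root = ET.fromstring(xhtml)
    for element in root.iter():
        # Copy the attribute names first so we can delete while looping.
        for name in list(element.attrib):
            if name.lower().startswith("on"):
                del element.attrib[name]
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html><p>Review at <a href="http://www.1159pm.com" '
    'onclick="alert()" onmouseover="alert()">www.1159PM.com</a></p></html>'
)
cleaned = strip_event_attributes(sample)
print(cleaned)
```

The href attribute survives because only names beginning with "on" are dropped; like the XMLQuery version, this fails on HTML that is not well-formed XML (for example an unclosed <br>).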

Related

ImportXML function in Google Dynamic XML path

I am trying to import the headlines and landing page URLs from the "New + Updated" section of this page:
https://www.nytimes.com/wirecutter/
The issue is that the class "_988f698c" keeps changing as each headline is replaced with a new headline/topic.
I need a workaround that lets the IMPORTXML function dynamically capture the class of the element in that position. The current formula is:
=IMPORTXML("https://www.nytimes.com/wirecutter/","//*[@class='_988f698c']")
Here is the html tag for example. The class "_988f698c" refreshes every hour or so with new headlines coming in.
<li class="e9a6bea7">
<a class="_988f698c" href="https://www.nytimes.com/wirecutter/reviews/gir-spatula-review/">Why We Love GIR Spatulas</a>
<p class="_9d1f22a9">today
</p>
</li>
Is there a way I can do this?
Step back a little and look for an alternative path that does not depend on the randomly generated class names.
For the title, use:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a"
)
For the URL attached to the title:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a/@href"
)
For the text indicating the day of publication:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/p"
)
If you want to collect everything together, use | to combine the paths:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a |
//ul[@data-testid='new-and-updated']/li/a/@href |
//ul[@data-testid='new-and-updated']/li/p"
)
Only use this if you are absolutely sure that all of the values will always exist. If any are missing, the results will land in different rows, which breaks any formulas that depend on a value being in a fixed cell.
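To see why selecting by the data-testid attribute is more stable than selecting by class, here is a small Python sketch that runs the same kind of path against a local snippet. The snippet is an assumption: it wraps the <li> from the question in a <ul data-testid="new-and-updated">, which is what the formulas above imply but is not shown in the question. Nothing is fetched from the live page.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the <li> from the question inside the <ul> the XPath targets.
SAMPLE = """
<div>
  <ul data-testid="new-and-updated">
    <li class="e9a6bea7">
      <a class="_988f698c" href="https://www.nytimes.com/wirecutter/reviews/gir-spatula-review/">Why We Love GIR Spatulas</a>
      <p class="_9d1f22a9">today</p>
    </li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE)

# None of these paths mention the generated class names.
links = root.findall(".//ul[@data-testid='new-and-updated']/li/a")
rows = [(a.text, a.get("href")) for a in links]
dates = [p.text for p in root.findall(".//ul[@data-testid='new-and-updated']/li/p")]

print(rows)
print(dates)
```

Changing the class values in SAMPLE to anything else leaves rows and dates unchanged, which is the point of anchoring on data-testid.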

Is it possible to select the properties of a node with XPath?

I have an XML of the form:
<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<metaTitle>Hogan Lovells Training Contract</metaTitle>
<metaDescription>Find out about the Hogan Lovells Training Contract and Application Process</metaDescription>
<language>en</language>
<disableAds>false</disableAds>
<shortUrl>false</shortUrl>
<category_slug>law</category_slug>
<subcategory_slug>industry</subcategory_slug>
<updatedAt>2021-03-15T18:38:51.058+00:00</updatedAt>
<createdAt>2018-11-29T06:42:51.665+00:00</createdAt>
</articles>
</articleslist>
I'm able to select the row values with the XPATH //articles.
How can I select the child properties of articles (i.e. the column headings), so I get back a list of the form:
originalId
title
slug
etc...
Depends on your XPath version.
In XPath 2.0 it's simply //articles/*/name()
In 1.0 it's not possible because there's no such data type as a "sequence of strings". You would have to return the set of elements as //articles/*, and then extract their names in the calling program.
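As a sketch of the XPath 1.0 route, here is what "extract their names in the calling program" could look like in Python. The calling language is an assumption (the question does not say which one is used), and the XML is a shortened copy of the sample above. The standard library's ElementTree supports only a subset of XPath 1.0, which is enough for //articles/*.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question.
XML = """<articleslist>
  <articles>
    <originalId>507948</originalId>
    <title>Hogan Lovells Training Contract</title>
    <slug>hogan-lovells-training-contract</slug>
  </articles>
</articleslist>"""

root = ET.fromstring(XML)

# Select the child elements with the path, then read each element's name in code.
names = [child.tag for child in root.findall(".//articles/*")]
print(names)
```

If the document contains several <articles> rows, the names repeat once per row, so you may want to de-duplicate them while preserving order.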

Capybara - Finding a disabled button located in table row

I have a button
<button type="button" id="saveListing" class="button small save-button" data-bind="enable: saveEnabled, click: save"><i class="fa fa-save"></i> Save</button>
located in the tr of a table.
I wrote a function for testing the button status, simply using:
And(/^...the button "(.*?)" on "(.*?)" page is disabled$/) do |button_name, page|
  button_id = ConfigHelper.get_button_info(button_name, page)['id']
  button_class = ConfigHelper.get_button_info(button_name, page)['class']
  if !button_id.nil?
    find_button(button_id)[:disabled].should eq 'true'
  elsif !button_class.nil?
    find(button_class)[:disabled].should eq 'true'
  else
    button_text = ConfigHelper.get_button_info(button_name, page)['text']
    find('button', text: button_text)[:disabled].should eq "true"
  end
end
However, this block does not work for a button in a table row. I also tried checking by button id, and that did not work either. How can I implement this without taking the table id as a parameter? (I don't want to write the table id inside the feature.)
When using id, the error is:
Capybara::ElementNotFound: Unable to find css ".saveListing"
or using text:
Ambiguous match, found 4 elements matching css "button" (Capybara::Ambiguous)
Thanks.
Capybara's find_button doesn't search CSS classes at all, so unless you have overridden find_button I'm not sure why you would get that error from using it with an id. find_button searches by the id, value, text content, or title attribute of the button, and also supports a disabled filter. More stable versions of those checks (if the status of the button changes due to JS) would be:
find_button('saveListing', disabled: true).should be # Note: no # in front of the id here since it's not a CSS selector
find_button('button text', disabled: true).should be
These are more stable because they use Capybara's waiting behavior to find the disabled button, whereas the original versions find the button immediately and raise an error if it isn't disabled yet.
saveListing is the id of your button, not a class. In CSS selectors, a dot is used for classes and a hash sign is used for ids.
Therefore, you should be using either #saveListing or .save-button, but not .saveListing. This is why your first match fails.
As for why the second one fails: I guess there are 4 rows, each with one button, and Capybara doesn't know which one you are referring to. If you want to check this condition for all of them, you could use all instead of find, like this:
all('button', text: button_text).each do |button|
  button[:disabled].should eq "true"
end

extract a substring from clob in oracle

I have a clob with data
<?xml version='1.0' encoding='UTF-8'?><root available-locales="en_US" default-locale="en_US"><static-content language-id="en_US"><![CDATA[<script type="text/javascript">
function change_case()
{
alert("here...");
document.form1.type.value=document.form1.type.value.toLowerCase();
}
</script>
<form name=form1 method=post action=''''>
<input type=text name=type value=''Enter USER ID'' onBlur="change_case();">
<input type=submit value=Submit> </form>
</form>]]></static-content></root>
I want to extract the line with the onblur attribute, in this case:
<input type=text name=type value=''Enter USER ID'' onblur="change_case();">
Tom Kyte explains how to get a VARCHAR2 from a CLOB in SQL or PL/SQL code:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:367980988799
Once you have a VARCHAR2, you can use the SUBSTR or REGEXP_SUBSTR function to extract the line:
http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions147.htm#i87066
http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions116.htm
If you want to use SQL, you can write a query like this:
select col1, col2, func1(dbms_lob.substr( t.col_clob, 4000, 1 )) from table1 t
In the PL/SQL function "func1" you can then do what you want with the input string using SUBSTR or any other functions. Note that this call passes only the first 4000 characters of the CLOB to the function.
Subdivide your problem. You want to extract a line of text from your CLOB which contains a particular substring. I can think of two possible interpretations of your requirements:
Option 1.
1. Split the CLOB into a series of lines, e.g. split it by newline/carriage return characters if that's really what you meant by "line".
2. Check each line to see if it includes the substring, e.g. onblur. If it does, you have found your line.
Option 2.
If you don't actually mean the line, but you want the <script>...</script> html fragment, you can use similar logic:
1. Search for the first occurrence of <script>.
2. Search for the next occurrence of </script> after that point. Extract the substring from <script> to </script>.
3. Search the substring for onblur. If it is found, return the substring. Otherwise, find the next occurrence of <script>, go to step 2, rinse, repeat.
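Option 1 can be sketched in a few lines. The example below is in Python rather than PL/SQL purely to show the logic (split into lines, then test each line); the helper name find_lines_containing is made up, and the comparison is case-insensitive on the assumption that onBlur and onblur should both match. The sample text is the form portion of the CLOB from the question.

```python
def find_lines_containing(text: str, needle: str) -> list:
    """Return every line of text that contains needle, ignoring case.

    Hypothetical helper illustrating Option 1.
    """
    needle = needle.lower()
    # splitlines() handles \n, \r\n and \r line endings.
    return [line for line in text.splitlines() if needle in line.lower()]


# Form portion of the CLOB from the question, copied as-is.
clob_text = """<form name=form1 method=post action=''''>
<input type=text name=type value=''Enter USER ID'' onBlur="change_case();">
<input type=submit value=Submit> </form>
</form>]]></static-content></root>"""

matches = find_lines_containing(clob_text, "onblur")
print(matches)
```

In PL/SQL the equivalent building blocks would be INSTR and SUBSTR (or their DBMS_LOB counterparts for large values), looping from one line break to the next.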

use YQL with substring-before in xpath

I am trying to get the string before '--' within a paragraph in an HTML page using XPath and send it to YQL.
For example, I want to get the date from the following article:
<div>
<p>Date --- the body of the article</p>
</div>
I tried this query in yql:
select * from html where url="article url" and xpath="//div/p/text()/[substring-before(.,'--')]"
but it does not work.
How can I get the date of the article, which appears before the '--'?
You can simply use:
substring-before(//div/p,'--')
Use:
substring-before(/div/p/text(), '--')
This XPath expression evaluates to the string immediately preceding '--' in the first text node in the XML document, that is a child of a p that is a child of the div top element.
In case you want to get this value for every such text node, you have to use an expression like:
substring-before((//div/p/text())[$k], '--')
and evaluate this expression $N times, for $k = 1,2, ..., $N
where $N is count(//div/p/text())
Note: avoid the // XPath pseudo-operator whenever the structure of the XML document is statically known. Using // usually results in significant inefficiency (O(N^2)), which is felt especially painfully on large XML documents.
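The "evaluate once per text node" loop can also be done in the host language. The sketch below is Python, not YQL, and is only an illustration: substring_before is a made-up helper that mirrors the XPath function (it returns an empty string when the marker is absent), and the second <p> is invented so that there is more than one node to loop over.

```python
import xml.etree.ElementTree as ET


def substring_before(value: str, marker: str) -> str:
    """Mimic XPath substring-before(): text before marker, or "" if marker is absent."""
    before, found, _ = value.partition(marker)
    return before if found else ""


# First <p> is from the question; the second is an invented extra node.
doc = ET.fromstring(
    "<div><p>Date --- the body of the article</p><p>Other -- more text</p></div>"
)

# p.text is the first text node of each <p>, matching /div/p/text().
dates = [substring_before(p.text or "", "--") for p in doc.findall("p")]
print(dates)
```

Each result keeps the space that precedes '--', just as the XPath function would; strip it in the caller (or wrap the XPath expression in normalize-space()) if you only want the date.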
