Escaping double quotes and commas to generate CSV - apex-code

Here is the data that I need to turn into CSV.
I am building a CSV string that I will need to import into another system.
Basically, I append a comma between each field that I query from the DB.
Column1 :
Testing
Column 2:
<p class="MsoNormal" style=""><b><span style="font-size: 10.0pt; ">This is just a test, test2, test3</span></b><span style="font-size: 10.0pt; "></span></p>
Column 3:
Blah Blah
My problem is retaining the double quotes and commas (I need to keep this text in its HTML format).
I tried adding a double quote at the start and end of the Column 2 data, but that doesn't work.
Any suggestions?

String has the instance method escapeCsv; this should be what you need.
If you need something different, you could always use replace to swap any character you want to escape for escapeCharacter+originalCharacter, e.g. (" => \").
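As an illustration of what CSV escaping has to do with the Column 2 value, here is a minimal Python sketch (not Apex) using the standard csv module. The shortened HTML string and the variable names are assumptions made for the example. It follows the common CSV convention of wrapping a field in double quotes and doubling any double quote inside it, which is why simply adding a quote at each end is not enough.

```python
import csv
import io

# Shortened stand-in for the Column 2 value: contains both commas and double quotes.
html_field = '<p class="MsoNormal" style=""><b>This is just a test, test2, test3</b></p>'
row = ["Testing", html_field, "Blah Blah"]

# Write one CSV line; the writer quotes the field and doubles embedded quotes.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
csv_line = buffer.getvalue()
print(csv_line)

# Reading the line back recovers the original three fields unchanged.
parsed = next(csv.reader(io.StringIO(csv_line)))
print(parsed == row)
```

If the target system expects backslash escapes instead of doubled quotes, the replace approach mentioned above would be the one to adapt.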

Related

Finding the xpath of a class name with \n and spaces

This may be an easy question, I'm new to this.
I'm trying to get the data within this div
<div class="search-results-listings
" vocab="http://schema.org/" typeof="SearchResultsPage">
response.xpath("//div[@class='search-results-listings\n']")
and
response.xpath("//div[@class='search-results-listings\n ']")
are returning empty arrays.
You can use XPath's contains:
response.xpath("//div[contains(@class, 'search-results-listings')]")
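The idea behind contains() is to stop depending on the exact whitespace in the attribute value. A rough standard-library sketch of the same idea follows; it is not Scrapy code, the wrapping root element and the sample text are assumptions, and the filtering is done in Python because ElementTree's limited XPath has no contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first div mimics the class value that ends in a newline.
markup = (
    '<root>'
    '<div class="search-results-listings\n" typeof="SearchResultsPage">hit</div>'
    '<div class="other">miss</div>'
    '</root>'
)
root = ET.fromstring(markup)

# An exact comparison is sensitive to the stray whitespace in the attribute.
exact = root.findall(".//div[@class='search-results-listings']")

# Splitting the class value on whitespace and testing for the token is not.
matches = [
    div for div in root.iter("div")
    if "search-results-listings" in div.get("class", "").split()
]
print(len(exact), [div.text for div in matches])
```

Matching on the class token also avoids a pitfall of contains(), which would accept any class name that merely includes the substring.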

Nokogiri: How to select the value of an attribute that contains periods in its id?

I've been working with Nokogiri for a couple of days and I absolutely adore it. Everything was working brilliantly until I got a requirement to scrape a website that uses the data-reactid javascript attribute tag. The problem is that Nokogiri seems to be getting confused with the attribute id format this website is using (several periods, some dollar signs and some other invalid xml/css characters):
An example of what I need to scrape would be:
<td data-reactid=".3.3.1:$contract_23.$=1$dataRow:0.1">94.280</td>
I need the value (94.280) inside of the attribute with an id of ".3.3.1:$contract_23.$=1$dataRow:0.1"
which usually in nokogiri we would select by doing something like:
doc.css("type[attributename=attributeid]")
in my example it would be:
doc.css("td[data-reactid=.3.3.1:$contract_23.$=1$dataRow:0.1]")
but no matter what I do to escape the invalid characters, it keeps telling me there is an invalid character after my equals sign:
Error message for code above:
nokogiri-1.4.3.1/lib/nokogiri/css/parser.rb:78:in `on_error': unexpected '.3' after 'equal'
I've tried:
a) Getting my string defined as a variable and forced into a string
b) Escaping it with backslashes (.3.[...])
c) Prefixing it with a hash (#.3.3[...])
d) Escaping it using cgi escapedString
e) Placing it inside '%{ }' eg '%{.3.3[...]}'
No matter what I do, I keep getting the same message (except for option e, which gives me an altogether different error message):
: no .<digit> floating literal anymore; put 0 before dot
Can you guys help me get the right value with such an oddly-named attribute?
You didn't show how you are parsing your document, but if I parse it as HTML and then use single quotes around the attribute value in the css selector, I can get the tag:
require 'nokogiri'
html = <<END_OF_HTML
<td data-reactid="hello">10</td>
<td data-reactid=".3.3.1:$contract_23.$=1$dataRow:0.1">94.280</td>
<td data-reactid="goodbye">20</td>
END_OF_HTML
html_doc = Nokogiri::HTML(html)
html_doc.css("td[data-reactid='.3.3.1:$contract_23.$=1$dataRow:0.1']").each do |tag|
  puts tag.text
end
--output:--
94.280
Check out the Mothereffing Unquoted Attribute Value Validator via this SO post:
CSS attribute selectors: The rules on quotes (", ' or none?)
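To make the quoting rule concrete: an attribute value can go unquoted in a CSS selector only when it is a valid CSS identifier. The Python sketch below builds a selector string and quotes the value whenever it fails a simplified identifier test. The regular expression is an assumption that covers only plain ASCII identifiers (no escapes, no non-ASCII characters), and attribute_selector is a hypothetical helper, not part of Nokogiri.

```python
import re

# Simplified identifier test: ASCII only, no escape sequences (an assumption,
# not the full CSS grammar).
SIMPLE_IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*\Z")

def attribute_selector(tag, attr, value):
    """Return tag[attr=value], quoting the value unless it is a simple identifier."""
    if SIMPLE_IDENT.match(value):
        return f"{tag}[{attr}={value}]"
    # Escape backslashes and single quotes before wrapping in single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"{tag}[{attr}='{escaped}']"

print(attribute_selector("td", "data-reactid", "hello"))
print(attribute_selector("td", "data-reactid", ".3.3.1:$contract_23.$=1$dataRow:0.1"))
```

Always quoting the value is the simpler policy; the identifier check is shown only to explain why the unquoted form failed.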

Replacing <a> tags that have two pairs of double quotes

I have asked a similar question before but this one is slightly different
I have content with this sort of links in:
Professor Steve Jackson
[UPDATE]
And this is how i read it:
content = doc.xpath("/wcm:root/wcm:element[@name='Body']").inner_text
The links have two pairs of double quotes after the href=.
I am trying to strip out the tag and retrieve only the text like so:
Professor Steve Jackson
To do this I'm using the same method which works for this sort of link which has only a single pair of double quotes:
World
This returns World:
content = Nokogiri::XML.fragment(content_with_link)
content.css('a[href^="ssLINK"]')
.each{|a| a.replace("<>#{a.content}</>")}
=>World
When I try to do the same for the link that has two pairs of double quotes, it complains:
content = Nokogiri::XML.fragment(content_with_link)
content.css('a[href^=""ssLINK""]')
.each{|a| a.replace("<>#{a.content}</>")}
Error:
/var/lib/gems/1.9.1/gems/nokogiri-1.6.0/lib/nokogiri/css/parser_extras.rb:87:in
`on_error': unexpected 'ssLINK' after '[:prefix_match, "\"\""]' (Nokogiri::CSS::SyntaxError)
Anyone know how I can overcome this issue?
I can suggest two ways to do it, but it depends on whether every <a> tag has an href enclosed in two pairs of double quotes, or only the one with ssLINK.
Assume
output = []
input_text = 'Professor Steve Jackson'
1) If only the a tags with ssLINK have an href with "", then just do
Nokogiri::HTML(input_text).css('a[href=""]').each do |nokogiri_obj|
output << nokogiri_obj.text
end
# => output = ["Professor Steve Jackson"]
2) If all the a tags have an href with "", then you can try this
nokogiri_a_tag_obj = Nokogiri::HTML(input_text).css('a[href=""]')
nokogiri_a_tag_obj.each do |nokogiri_obj|
output << nokogiri_obj.text if nokogiri_obj.has_attribute?('sslink')
end
# => output = ["Professor Steve Jackson"]
With this second approach if
input_text = 'Professor Steve Jackson Some other TextSecond link'
then also the output will be ["Professor Steve Jackson"]
Your content is not XML, so any attempt to solve the problem using XML tools such as XSLT and XPath is doomed to failure. Use a regex approach, e.g. awk or Perl. However, it's not immediately obvious to me how to match
<a href="" sometext"">
without also matching
<a href="" sometext="">
so we need to know a bit more about this syntax that you are trying to parse.
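A minimal sketch of the regex approach in Python (rather than awk or Perl) is shown below. The sample markup is hypothetical, since the real href value is not shown in the question; the pattern assumes the anchor starts with href=""ssLINK and that the link text itself contains no nested a tags.

```python
import re

# Hypothetical input: the href value is invented for illustration only.
content_with_link = (
    '<p>Contact <a href=""ssLINK/steve-jackson"">Professor Steve Jackson</a> '
    'or see <a href="http://example.com">this page</a>.</p>'
)

# Match anchors whose href starts with two double quotes followed by ssLINK,
# and capture the link text so it can be kept.
SSLINK_ANCHOR = re.compile(r'<a\s+href=""ssLINK[^>]*>(.*?)</a>', re.DOTALL | re.IGNORECASE)

def strip_sslink_anchors(text):
    """Replace each matching anchor with its inner text."""
    return SSLINK_ANCHOR.sub(r"\1", text)

cleaned = strip_sslink_anchors(content_with_link)
print(cleaned)
```

Ordinary, well-formed links are left untouched, so the cleaned text could still be handed to an HTML parser afterwards.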

xpath expression to remove whitespace

I have this HTML:
<tr class="even expanded first">
<td class="score-time status">
<a href="/matches/2012/08/02/europe/uefa-cup/">
16 : 00
</a>
</td>
</tr>
I want to extract the (16 : 00) string without the extra whitespace. Is this possible?
I. Use this single XPath expression:
translate(normalize-space(/tr/td/a), ' ', '')
Explanation:
normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character.
translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.
II. Alternatively:
translate(/tr/td/a, ' &#9;&#10;&#13;', '')
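For readers who want to see what expression I computes, here is a small Python emulation of the two string functions. It is only a sketch: normalize_space and remove_spaces are hypothetical helper names, and str.split() treats more characters as whitespace than XPath's four (space, tab, NL, CR).

```python
def normalize_space(text):
    """Trim leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())

def remove_spaces(text):
    """Equivalent of translate(text, ' ', ''): drop every remaining space."""
    return text.replace(" ", "")

# Text content of the <a> element, including the surrounding line breaks.
raw = "\n      16 : 00\n    "

step1 = normalize_space(raw)
step2 = remove_spaces(step1)
print(repr(step1), repr(step2))
```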
Please try the XPath expression below:
//td[@class='score-time status']/a[normalize-space() = '16 : 00']
You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
I came across this thread when I was having my own issue similar to above.
HTML
<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
<a href="/nsomar/OAStackView/releases/tag/1.0.1">
1.0.1
</a>
XPath start command
tree.xpath('//div[@class="d-flex"]/h4/a/text()')
However this grabbed random whitespace and gave me the output of:
['\n ', '\n 1.0.1\n ']
Using normalize-space, it removed the first blank space node and left me with just what I wanted
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')
['\n 1.0.1\n ']
I could then grab the first element of the list, and use strip() to remove any further whitespace
XPath final command
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()
Which left me with exactly what I required:
1.0.1
You can check whether text() nodes are empty:
/path/text()[not(.='')]
This may be useful with axes like following-sibling:: if there are no containers, or with child::.
You can also use string() or the regular-expression functions of XPath 2.0, such as matches() and replace().
NOTE: some comments say that XPath cannot do string manipulation. Even if it is not really designed for that, you can do basic things: contains(), starts-with(), replace().
Checking for whitespace-only nodes is much harder, as you will generally have a node-list result set, and most XPath functions, like matches() or replace(), operate on only one node.
You can separate node selection from string manipulation:
use XPath to retrieve a container, or a list of text nodes, and then process it with another language (Java, PHP, Python, or Perl, for instance).
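As a sketch of that split between node selection and string handling, the Python below uses the standard-library ElementTree to select the anchors and then cleans the text in plain Python. The snippet is the HTML from the earlier answer with the closing tags added as an assumption so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# HTML from the answer above; closing </h4> and </div> added so it is well-formed.
snippet = """<div class="d-flex">
  <h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
    <a href="/nsomar/OAStackView/releases/tag/1.0.1">
      1.0.1
    </a>
  </h4>
</div>"""

root = ET.fromstring(snippet)

# Step 1: node selection with a simple path expression.
anchors = root.findall("./h4/a")

# Step 2: string handling in the host language; skip whitespace-only text.
texts = [
    piece.strip()
    for anchor in anchors
    for piece in anchor.itertext()
    if piece.strip()
]
print(texts)
```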

extract a substring from clob in oracle

I have a clob with data
<?xml version='1.0' encoding='UTF-8'?><root available-locales="en_US" default-locale="en_US"><static-content language-id="en_US"><![CDATA[<script type="text/javascript">
function change_case()
{
alert("here...");
document.form1.type.value=document.form1.type.value.toLowerCase();
}
</script>
<form name=form1 method=post action=''''>
<input type=text name=type value=''Enter USER ID'' onBlur="change_case();">
<input type=submit value=Submit> </form>
</form>]]></static-content></root>
I want to extract the line with the onblur attribute, in this case:
<input type=text name=type value=''Enter USER ID'' onblur="change_case();">
Tom Kyte explains how to get a varchar2 from a clob in SQL or PL/SQL code:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:367980988799
Once you have a varchar2, you can use the SUBSTR or REGEXP_SUBSTR function to extract the line.
http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions147.htm#i87066
http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions116.htm
If you want to use SQL code, you can write this query:
select col1, col2, func1(dbms_lob.substr( t.col_clob, 4000, 1 )) from table1 t
In the PL/SQL function "func1" you can do what you want with the input string using SUBSTR or any other functions.
Subdivide your problem. You want to extract a line of text from your CLOB which contains a particular substring. I can think of two possible interpretations of your requirements:
Option 1.
1. Split the CLOB into a series of lines, e.g. split it by newline/carriage return characters if that's really what you meant by "line".
2. Check each line to see if it includes the substring, e.g. onblur. If it does, you have found your line.
Option 2.
If you don't actually mean the line, but you want the <script>...</script> HTML fragment, you can use similar logic:
1. Search for the first occurrence of <script>.
2. Search for the next occurrence of </script> after that point. Extract the substring from <script> to </script>.
3. Search the substring for onblur. If it is found, return the substring. Otherwise, find the next occurrence of <script>, go to step 2, rinse, repeat.
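The logic of Option 1 can be sketched in a few lines of Python, shown here only to illustrate the steps rather than as Oracle code. The function name and the shortened sample text are assumptions; the comparison is case-insensitive because the stored markup spells the attribute onBlur.

```python
def find_lines_containing(text, needle):
    """Split text into lines and return those containing needle, ignoring case."""
    needle = needle.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]

# Shortened stand-in for the CLOB content from the question.
clob_text = (
    "<form name=form1 method=post action=''''>\n"
    "<input type=text name=type value=''Enter USER ID'' onBlur=\"change_case();\">\n"
    "<input type=submit value=Submit> </form>\n"
)

found = find_lines_containing(clob_text, "onblur")
print(found)
```

In PL/SQL the same steps would map onto searching for line breaks and the substring with the built-in string functions mentioned above.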

Resources