How can I add quotes to the "white list" of HtmlEncode? - asp.net-mvc-3

Hey all!
I'm using ASP.NET MVC 3 and AntiXssLibrary 4.2, and I'm trying to encode some text that contains single or double quotes. The problem is that I get &#39; and &quot; instead of ' and ", and in Hebrew these characters are very common (as in רמב"ם or צ'ק). I know they are included in the Hebrew and Default code charts passed to this method:
UnicodeCharacterEncoder.MarkAsSafe(
LowerCodeCharts.Default | LowerCodeCharts.Hebrew,
LowerMidCodeCharts.None,
MidCodeCharts.None,
UpperMidCodeCharts.None,
UpperCodeCharts.None);
I've tried all the encoding methods without getting the expected result.
EDIT:
For my second problem, I was trying to output an HTML string in my view like this:
return new HtmlString(Encoder.HtmlEncode(resFile));
and I got the raw HTML markup instead of the rendered page. The problem was that Microsoft moved the GetSafeHtml() method to the HtmlSanitizationLibrary assembly; I found this in another answer and downloaded that assembly. Now I can use it like this:
return new HtmlString(Sanitizer.GetSafeHtml(questionsAnswerString));
After that, of course, I added the using directive:
using Microsoft.Security.Application;
Now I'm stuck with those quotes. Any help?

OK, if you see &#39; and &quot; on the rendered HTML page, then it looks to me like you are running into double HTML encoding.
To replicate your situation, copy the code under Replication: into one of your views and see the problem for yourself.
HtmlString and MvcHtmlString do not encode anything; they only mark a string as already encoded so that the view does not encode it again. So in your case, either
return new HtmlString(Encoder.HtmlEncode(resFile));
or
Sanitizer.GetSafeHtml(questionsAnswerString)
is returning a string that is already HTML encoded, and your view then encodes it one more time.
This can happen if the view that renders your content uses Razor's
@alreadyHtmlEncodedString
// Razor's @ syntax HTML-encodes a plain string every time,
// whether or not it is already HTML encoded.
// (Only IHtmlString values such as HtmlString or MvcHtmlString are left alone.)
or the ASPX
<%: alreadyHtmlEncodedString %>
// ASPX's <%: %> syntax behaves the same way: it HTML-encodes a plain string
// whether or not it is already HTML encoded.
If that is the case, either use Html.Raw for the string that is already HTML encoded, or pass the unsafe, non-encoded string straight to Razor's @ syntax and let it do the encoding, whichever suits you.
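The effect is easy to reproduce outside ASP.NET. The following minimal Java sketch is a language-neutral illustration only: htmlEncode() and browserRender() are hypothetical helpers (not AntiXSS or Razor) that handle just the characters <, >, &, " and '. Encoding once renders the quotes correctly; encoding twice leaves the entities visible on the page.

```java
// Illustration of double HTML encoding. htmlEncode() is a hypothetical
// stand-in for an HTML encoder; it is NOT the AntiXSS library.
final class DoubleEncodeDemo {

    // Encodes only the five HTML-special characters < > & " '.
    static String htmlEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Decodes the same five entities once, roughly what the browser does
    // when it displays the markup. "&amp;" is handled last on purpose so that
    // "&amp;#39;" becomes the visible text "&#39;" rather than "'".
    static String browserRender(String html) {
        return html.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String text = "say \"shalom\" and it's fine";
        String once = htmlEncode(text);   // encoded once (e.g. in the controller)
        String twice = htmlEncode(once);  // encoded again by the view
        System.out.println(browserRender(once));   // say "shalom" and it's fine
        System.out.println(browserRender(twice));  // say &quot;shalom&quot; and it&#39;s fine
    }
}
```

The second line of output is exactly the symptom described in the question, which is why the fix is to encode only once.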
Replication:
Below is some code that replicates your scenario, if it helps. Put it in one of your views.
@{string quotes = @"'""";
string quotesHtmlEncoded = Html.Encode(@"'""");
string hebrew = @"like רמב""ם or צ'ק";
string hebrewHtmlEncoded = Html.Encode(@"like רמב""ם or צ'ק");
string sampleXss = "<script>alert('1')</script>";
string sampleXssHtmlEncoded = Html.Encode("<script>alert('1')</script>");
}
<table border="1">
<thead>
<tr>
<th></th>
<th>razor @@
</th>
<th>Raw
</th>
<th>MvcHtmlString.Create
</th>
</tr>
</thead>
<tbody>
<tr>
<td>quotes
</td>
<td>
@quotes
</td>
<td>
@Html.Raw(quotes)
</td>
<td>
@MvcHtmlString.Create(quotes)
</td>
</tr>
<tr>
<td>quotesHtmlEncoded
</td>
<td>
@quotesHtmlEncoded
</td>
<td>
@Html.Raw(quotesHtmlEncoded)
</td>
<td>
@MvcHtmlString.Create(quotesHtmlEncoded)
</td>
</tr>
<tr>
<td>hebrew
</td>
<td>
@hebrew
</td>
<td>
@Html.Raw(hebrew)
</td>
<td>
@MvcHtmlString.Create(hebrew)
</td>
</tr>
<tr>
<td>hebrewHtmlEncoded
</td>
<td>
@hebrewHtmlEncoded
</td>
<td>
@Html.Raw(hebrewHtmlEncoded)
</td>
<td>
@MvcHtmlString.Create(hebrewHtmlEncoded)
</td>
</tr>
<tr>
<td>sampleXss
</td>
<td>
@sampleXss
</td>
<td>
@Html.Raw(sampleXss)
</td>
<td>
@MvcHtmlString.Create(sampleXss)
</td>
</tr>
<tr>
<td>sampleXssHtmlEncoded
</td>
<td>
@sampleXssHtmlEncoded
</td>
<td>
@Html.Raw(sampleXssHtmlEncoded)
</td>
<td>
@MvcHtmlString.Create(sampleXssHtmlEncoded)
</td>
</tr>
</tbody>
</table>

Sorry for the hassle, but it's impossible to whitelist these characters.
You can see why in the Microsoft Reference Source for MarkAsSafe.
It calls ApplyHtmlSpecificValues(), which contains:
private static void ApplyHtmlSpecificValues() {
characterValues['<'] = "lt".ToCharArray();
characterValues['>'] = "gt".ToCharArray();
characterValues['&'] = "amp".ToCharArray();
characterValues['"'] = "quot".ToCharArray();
characterValues['\''] = "#39".ToCharArray();
}
These characters are always mapped to entities, so you cannot get them back as literal characters after encoding.
So the only solution I found was to always call the encoder from one place and, right after it runs, change those characters back :(
return Encoder.HtmlEncode(input).Replace("&quot;", "\"").Replace("&#39;", "'");
Thanks ;)
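For comparison, here is a minimal Java sketch of the same encode-then-restore idea. htmlEncode() is a hypothetical stand-in for Encoder.HtmlEncode that handles only the five characters listed in ApplyHtmlSpecificValues(); it is not the AntiXSS implementation. Restoring quotes is only reasonable for text placed between tags: inside a quoted attribute value, a literal quote would let the input break out of the attribute.

```java
// Sketch of the encode-then-restore-quotes workaround. htmlEncode() is a
// hypothetical stand-in for Encoder.HtmlEncode, NOT the AntiXSS library.
final class QuoteFriendlyEncoder {

    // Encodes only the five HTML-special characters < > & " '.
    static String htmlEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Encode everything, then turn the two quote entities back into literal
    // quotes. A literal "&quot;" typed by the user is encoded as "&amp;quot;",
    // which the replace() calls do not touch, so it stays escaped.
    // Only use the result as element text, never inside a quoted attribute.
    static String encodeKeepingQuotes(String input) {
        return htmlEncode(input).replace("&quot;", "\"").replace("&#39;", "'");
    }

    public static void main(String[] args) {
        // Prints: &lt;b&gt;it's "quoted"&lt;/b&gt;
        System.out.println(encodeKeepingQuotes("<b>it's \"quoted\"</b>"));
    }
}
```

Keeping the replacement in one helper mirrors the advice above to call the encoder from a single place.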

Related

How to exclude a table inside another table in XPath?

I have the following HTML file:
<table class="pd-table">
<caption> Tech </caption>
<tbody>
<tr data-group="1">
<td> Electrical </td>
<td> Design </td>
<tr data-group="1">
<td> Output </td>
<td> Function </td>
<tr data-group="7">
<td> EMC </td>
<table>
<tbody>
<tr>
<td> EN 6547 ESD </td>
<td> EN 8901 ESD </td>
<tr data-group="8">
<td> Weight [8] </td>
<td> 27.7 </td>
I can isolate EN 6547 ESD and EN 8901 ESD with the following XPath:
response.xpath('//table[@class="pd-table"]//tbody//tr//td/table//tr//td/text()').getall()
Any other way is always welcome :)
I would also like to get all the remaining data, excluding the cells isolated above.
Is there any way to do it? :)
It looks like the table tag in data-group 7 is not closed properly...
Anyway, in such cases you can anchor on the text content of the cell using contains() or text()="some exact text":
response.xpath('//td[contains(text(), "EMC")]').css('td~table tbody td::text').extract()
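Your XPath uses a lot of unnecessary double slashes.
A double slash means "any descendant", so each one makes the engine search a whole subtree instead of just the direct children.
The fewer double slashes you use, the better it will generally perform.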
So just use single slashes like this:
//table[@class="pd-table"]/tbody/tr/td/table/tbody/tr/td/text()
Another way is to select the td elements that have two table ancestors:
//td[count(ancestor::table)=2]/text()
And that leads to the answer of your second question:
//td[count(ancestor::table)=1]/text()
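To see the count(ancestor::table) trick in isolation, here is a minimal Java sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). It assumes a well-formed version of the question's markup (tags closed, nested table placed inside a <td>); Scrapy parses the real, unclosed HTML with its own parser, so the tree there may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NestedTableCells {

    // Assumed well-formed version of the question's markup.
    static final String XML =
        "<table class='pd-table'><tbody>"
        + "<tr data-group='1'><td>Electrical</td><td>Design</td></tr>"
        + "<tr data-group='7'><td>EMC</td><td><table><tbody>"
        + "<tr><td>EN 6547 ESD</td><td>EN 8901 ESD</td></tr>"
        + "</tbody></table></td></tr>"
        + "<tr data-group='8'><td>Weight [8]</td><td>27.7</td></tr>"
        + "</tbody></table>";

    // Evaluates an XPath 1.0 expression and returns the trimmed text of each match.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Cells of the nested table: two table ancestors.
        System.out.println(select("//td[count(ancestor::table)=2]/text()"));
        // Everything else: exactly one table ancestor.
        System.out.println(select("//td[count(ancestor::table)=1]/text()"));
    }
}
```

The td that wraps the nested table has no text node of its own, so it does not show up in either list.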
Another possibility would simply be:
//table[@class="pd-table"]/tbody/tr/td/text()
Or (assuming the nested table has no tr elements with @data-group):
//tr[@data-group]/td/text()
So, as you can see, many XPaths lead to Rome ;-).

XPath: find text by the last word in the string

I need to find a cell's whole text by the last word in the string. I have something like this:
<table>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind2</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind3</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
</table>
I need to find the whole text value by its last word, texttofind:
<td>text text texttofind</td>
I can't use contains(), because it would match multiple values. I need something like ends-with(), but I am using XPath 1.0.
I tried something like this, but it doesn't work and I'm not sure what is wrong:
//tr[substring(., string-length(@td)
- string-length('texttofind') + 1) = 'texttofind']
Or would it be better to use matches()?
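You're almost there. string-length(@td) asks for the length of an attribute named td, which doesn't exist, so it evaluates to 0; and the predicate sits on tr rather than on the cell. Test each td against its own text instead. Try changing your XPath expression to
//tr//td[substring(., string-length(.)
- string-length('texttofind') + 1) = 'texttofind']
and see if it works. (matches() is only available in XPath 2.0, so it won't help here.)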
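To check the corrected predicate outside the browser, here is a minimal Java sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath) on a trimmed, well-formed copy of the question's table. endsWith() is a hypothetical helper that builds the same substring()/string-length() comparison; it assumes the suffix contains no apostrophe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EndsWithXPath {

    // Trimmed-down version of the question's table (only the relevant cells).
    static final String XML = "<table>"
        + "<tr><td>any text</td><td>text text texttofind</td><td>Not Available</td></tr>"
        + "<tr><td>any text</td><td>text text texttofind2</td><td>Not Available</td></tr>"
        + "<tr><td>any text</td><td>text text texttofind3</td><td>Not Available</td></tr>"
        + "</table>";

    // XPath 1.0 emulation of ends-with(., suffix): compare the last
    // string-length(suffix) characters. Assumes suffix has no apostrophe.
    static String endsWith(String suffix) {
        return "//tr//td[substring(., string-length(.) - string-length('" + suffix
            + "') + 1) = '" + suffix + "']";
    }

    // Evaluates an XPath 1.0 expression and returns the text of each match.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first cell ends with "texttofind"; "texttofind2" does not.
        System.out.println(select(endsWith("texttofind")));
    }
}
```

Cells shorter than the suffix are safe: substring() with a start position below 1 returns the whole (shorter) string, which can never equal the suffix.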

Scraping a page with the correct XPath using Mechanize and Nokogiri

I am trying to access data contained in a table that is itself contained in a table with class='L1'.
So basically my html structure is like this:
<table class="L1">
<table>
<tr></tr>
<tr>
<td></td>
<td>data</td>
</tr>
<tr>
<td></td>
<td>data</td>
</tr>
...etc...etc
</table>
</table>
I need to get the data contained in all the <a></a> elements in the second <td> of each <tr>, but only starting with the second <tr> of the table.
So far I came up with that:
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")
But it seems to me that this doesn't express that I only want to start at the second <tr> (second <tr> included).
What would be the right code to do this?
You can use position() to select the later elements that you want.
html_body = Nokogiri::HTML(body)
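links = html_body.css('.L1').xpath(".//table/tbody/tr[position()>1]/td[2]/a[1]")
Remember that XPath counts from 1, so position()>1 skips only the first tr and keeps the second one onward. Also note the leading .//: a path that starts with // searches the whole document even when it is called on a node, while .// stays inside the .L1 table. Finally, /tbody only matches if the page source actually contains a <tbody> tag, because Nokogiri's HTML parser does not add one.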
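As a rough check of the position() predicate and the relative .// path, here is a minimal Java sketch using the JDK's javax.xml.xpath rather than Nokogiri. The sample markup follows the question's shape (no <tbody>, so the path omits it), and the link texts "skip me", "first" and "second" are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SkipFirstRow {

    // Hypothetical sample shaped like the question's markup (table inside table.L1).
    static final String XML = "<table class='L1'><table>"
        + "<tr><td></td><td><a>skip me</a></td></tr>"
        + "<tr><td></td><td><a>first</a></td></tr>"
        + "<tr><td></td><td><a>second</a></td></tr>"
        + "</table></table>";

    // Finds the L1 table first, then evaluates a path relative to it.
    static List<String> linkTexts() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node l1 = (Node) xp.evaluate("//table[@class='L1']", doc, XPathConstants.NODE);
            // ".//" keeps the search inside l1; position()>1 skips only the first row.
            NodeList links = (NodeList) xp.evaluate(
                ".//table/tr[position()>1]/td[2]/a[1]", l1, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                out.add(links.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkTexts()); // [first, second]
    }
}
```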

XPath or CSS selector to get a specific node

I need to know whether 1324 was a win or a loss in a table. How do I select the single <td> element that tells me whether it was a loss or a win?
<tr>
<td> 1323 </td>
<td> Won </td>
</tr>
<tr>
<td> 1324 </td>
<td> Loss </td>
</tr>
[...]
<tr>
<td> 1328 </td>
<td> Won </td>
</tr>
Whilst the answers to this question are correct, people are forgetting the context: Selenium. Give those XPaths to it and it'll blow up in your face.
Selenium expects XPath queries to return DOM elements, not attributes or text nodes from those elements.
You should find the element and then use Selenium to get its text. This could be .getText(), .Text or something similar in whatever language you are using (C# and Java examples below, assuming driver is a valid driver instance):
C#:
driver.FindElement(By.XPath("//td[normalize-space()='1324']/following-sibling::td")).Text;
Java:
driver.findElement(By.xpath("//td[normalize-space()='1324']/following-sibling::td")).getText();
Try this:
//td[normalize-space()='1324']/../td[2]/text()
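Selenium cannot be run here, but the selector logic can be checked with the JDK's javax.xml.xpath engine as a minimal sketch. Evaluating with XPathConstants.NODE returns the element itself, much as FindElement/findElement does, and its text is read afterwards. resultFor() is a hypothetical helper, and normalize-space() is used because the cells in the question are padded with spaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SiblingCell {

    // Rows from the question, wrapped in a table so the XML is well formed.
    static final String XML = "<table>"
        + "<tr><td> 1323 </td><td> Won </td></tr>"
        + "<tr><td> 1324 </td><td> Loss </td></tr>"
        + "<tr><td> 1328 </td><td> Won </td></tr>"
        + "</table>";

    // Returns the trimmed text of the cell after the one whose text is id,
    // or null if no row matches. Assumes id has no apostrophe.
    static String resultFor(String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            String expr = "//td[normalize-space()='" + id + "']/following-sibling::td";
            // NODE yields the first matching element, not a text node.
            Node td = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
            return td == null ? null : td.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resultFor("1324")); // Loss
    }
}
```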

Nokogiri next_element with filter

Let's say I've got an ill formed html page:
<table>
<thead>
<th class="what_I_need">Super sweet text<th>
</thead>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
In BeautifulSoup, we were able to get the <th> and then call findNext("td"). Nokogiri has the next_element call, but that might not return what I want (in this case, it would return the tr element).
Is there a way to filter the next_element call of Nokogiri? e.g. next_element("td")?
EDIT
For clarification, I'll be looking at many sites, most of them ill formed in different ways.
For instance, the next site might be:
<table>
<th class="what_I_need">Super sweet text<th>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
I can't assume any structure other than that there will be trs below the item that has the class what_I_need.
First, note that your closing th tag is malformed: <th>. It should be </th>. Fixing that helps.
One way to do it is to use XPath to navigate to it once you've found the th node:
require 'nokogiri'
html = '
<table>
<thead>
<th class="what_I_need">Super sweet text</th>
</thead>
<tr>
<td>
I also need this
</td>
</tr>
</table>
'
doc = Nokogiri::HTML(html)
th = doc.at('th.what_I_need')
th.text # => "Super sweet text"
td = th.at('../../tr/td')
td.text # => "\n I also need this\n "
This is taking advantage of Nokogiri's ability to use either CSS accessors or XPath, and to do it pretty transparently.
Once you have the <th> node, you could also navigate using some of Node's methods:
th.parent.next_element.at('td').text # => "\n I also need this\n "
One more way to go about it, is to start at the top of the table and look down:
table = doc.at('table')
th = table.at('th')
th.text # => "Super sweet text"
td = table.at('td')
td.text # => "\n I also need this\n "
If you need to access all <td> tags within a table you can iterate over them easily:
table.search('td').each do |td|
# do something with the td...
puts td.text
end
If you want the contents of all <td> by their containing <tr> iterate over the rows then the cells:
table.search('tr').each do |tr|
cells = tr.search('td').map(&:text)
# do something with all the cells
end
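XPath's following axis is a close analogue of BeautifulSoup's findNext: following::td[1] is the first td after the th in document order, and following::td is every later td, wherever it sits. Below is a minimal Java sketch with javax.xml.xpath on the question's second sample (with the closing </th> fixed so it parses as XML); in Nokogiri the same expressions should work with th.at_xpath(...) and th.xpath(...). cellsAfter() is a hypothetical helper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FollowingCells {

    // The question's second sample, with </th> fixed so it parses as XML.
    static final String XML = "<table>"
        + "<th class='what_I_need'>Super sweet text</th>"
        + "<tr><td>I also need this</td><td>and this</td></tr>"
        + "<tr><td>all tds here too</td></tr>"
        + "</table>";

    // Evaluates axisStep relative to the th and returns the text of each match.
    static List<String> cellsAfter(String axisStep) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node th = (Node) xp.evaluate("//th[@class='what_I_need']", doc, XPathConstants.NODE);
            NodeList tds = (NodeList) xp.evaluate(axisStep, th, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < tds.getLength(); i++) {
                out.add(tds.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cellsAfter("following::td[1]")); // like findNext("td")
        System.out.println(cellsAfter("following::td"));    // every later td
    }
}
```

Because the following axis does not care whether the th sits in a thead or directly in the table, the same expression covers both samples from the question.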
