I want to find and replace numbers in my HTML file, e.g. href="#1" to href="#12". Basically I want to add 10 to the existing value.
from
<p>bla bla
<p>bla bla
to
<p>bla bla
<p>bla bla
How can I do this in TextMate?
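As a hedged alternative outside TextMate, a short script can do the arithmetic. This is a minimal Python sketch using re.sub with a replacement function. It assumes every link looks exactly like href="#N" with N a plain integer. shift_anchors and its offset parameter are hypothetical names, not part of any editor.

```python
import re

# Matches href="#<digits>" and captures the number (assumes plain integer anchors).
ANCHOR = re.compile(r'href="#(\d+)"')

def shift_anchors(html: str, offset: int = 10) -> str:
    """Hypothetical helper: add `offset` to every numeric href="#N" anchor."""
    # re.sub calls the lambda once per match and uses its return value as the replacement.
    return ANCHOR.sub(lambda m: f'href="#{int(m.group(1)) + offset}"', html)

print(shift_anchors('<p>bla bla <a href="#1">x</a></p>'))  # → <p>bla bla <a href="#11">x</a></p>
```

To apply it to a file, read the file into a string, pass it through shift_anchors, and write the result back.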
I have this HTML sample:
<html>
<body>
....
<p id="book-1" class="abc">
<b>
book-1
section
</b>
"I have a lot of "
<i>different</i>
"text, and I want "
<i>all</i>
" text and we may or may not have italic surrounded text."
</p>
....
The XPath I currently have is:
@"/html[1]/body[1]/p[1]/text()"
This gives only:
I have a lot of
but I want this result:
I have a lot of different text, and I want all text and we may or may not have italic surrounded text.
Thanks for your help.
In XPath 2.0 and higher you could use string-join(/html[1]/body[1]/p[1]/b/following-sibling::node(), ''), I think. It is not entirely clear which nodes you want, but that expression selects all sibling nodes following the b child of the p and concatenates their string values into one string.
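Python's standard-library ElementTree supports only a small XPath subset (no following-sibling:: axis), so here is a hedged stdlib sketch that emulates the string-join idea by hand. It assumes a simplified, well-formed copy of the sample (the quotes around the text in the question look like browser dev-tools display and are dropped). text_after_bold is a hypothetical name. ElementTree stores the text that follows an element in that element's .tail, which is why the loop reads both itertext() and tail.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the sample (assumption: the quotes are not real content).
SAMPLE = """<html><body>
<p id="book-1" class="abc"><b>book-1 section</b>I have a lot of <i>different</i> text, and I want <i>all</i> text and we may or may not have italic surrounded text.</p>
</body></html>"""

def text_after_bold(html: str) -> str:
    """Emulate string-join(p/b/following-sibling::node(), '') with ElementTree."""
    root = ET.fromstring(html)
    p = root.find("./body/p")
    parts = []
    seen_b = False
    for child in p:
        if seen_b:
            # Text inside a sibling element that comes after <b> (e.g. the <i> content).
            parts.append("".join(child.itertext()))
        if child.tag == "b":
            seen_b = True
        if seen_b:
            # The text node right after this element, stored by ElementTree as .tail.
            parts.append(child.tail or "")
    # Collapse runs of whitespace, roughly as normalize-space() would.
    return " ".join("".join(parts).split())

print(text_after_bold(SAMPLE))
```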
<div class="jokeContent">
<h2 style="color:#369;">Can I be Frank</h2>
What did Ellen Degeneres say to Kathy Lee?
<p></p> <p>Can I be Frank with you? </p>
<p>Submitted by Calamjo</p>
<p>Edited by Curtis</p>
<div align="right" style="margin-top:10px;margin-bottom:10px;">#joke #short </div>
<div style="clear:both;"></div>
</div>
So I am trying to extract all text after the </h2> and before the <div align="right" style=...> node.
What I have tried so far:
jokes = response.xpath('//div[@class="jokeContent"]')
for joke in jokes:
    text = joke.xpath('text()[normalize-space()]').extract()
    if len(text) > 0:
        yield text
This works to some extent, but the website's HTML is inconsistent: sometimes the text is wrapped in <p>TEXT</p>, sometimes in <br>TEXT</br>, and sometimes it is just bare TEXT.
So I thought it might make sense to extract everything after the header and before the styled div, and then do the filtering afterwards.
If you are looking for a literal XPath for what you describe, it could be something like:
In [1]: sel.xpath("//h2/following-sibling::*[not(self::div) and not(preceding-sibling::div)]//text()").extract()
Out[1]: [u'Can I be Frank with you? ', u'Submitted by Calamjo', u'Edited by Curtis']
But there's probably a more logical, cleaner solution:
In [2]: sel.xpath("//h2/following-sibling::p//text()").extract()
Out[2]: [u'Can I be Frank with you? ', u'Submitted by Calamjo', u'Edited by Curtis']
This selects only paragraph tags. You said the text might sometimes be in other tags; you can match several different tags with the self::tag test:
In [3]: sel.xpath("//h2/following-sibling::*[self::p or self::br]//text()").extract()
Out[3]: [u'Can I be Frank with you? ', u'Submitted by Calamjo', u'Edited by Curtis']
Edit: apparently I missed the text directly under the div itself. This can be amended with the | (union) operator:
In [4]: sel.xpath("//h2/../text()[normalize-space(.)] | //h2/../p//text()").extract()
Out[4]:
[u'\n What did Ellen Degeneres say to Kathy Lee? \n ',
u'Can I be Frank with you? ',
u'Submitted by Calamjo',
u'Edited by Curtis']
normalize-space(.) is only there to drop text nodes that contain nothing but whitespace (e.g. ' \n').
You can union the first part of this XPath with any of the expressions above and get similar results.
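To see what that union returns, here is a stdlib-only Python sketch (not Scrapy) that emulates //h2/../text()[normalize-space(.)] | //h2/../p//text() on the sample markup with ElementTree, which has no union operator or text() selection of its own. It assumes the <div class="jokeContent"> is the parent of the <h2> and is the root of the parsed fragment; joke_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

JOKE = """<div class="jokeContent">
<h2 style="color:#369;">Can I be Frank</h2>
What did Ellen Degeneres say to Kathy Lee?
<p></p> <p>Can I be Frank with you? </p>
<p>Submitted by Calamjo</p>
<p>Edited by Curtis</p>
<div align="right" style="margin-top:10px;margin-bottom:10px;">#joke #short </div>
<div style="clear:both;"></div>
</div>"""

def joke_text(markup: str) -> list:
    """Emulate the union in document order; the root is assumed to be the <h2>'s parent."""
    div = ET.fromstring(markup)
    out = []
    # Part 1: the parent's own non-blank text nodes (div.text and each child's .tail).
    if div.text and div.text.strip():
        out.append(div.text)
    for child in div:
        # Part 2: every text node under a <p> child, emitted before that <p>'s tail.
        if child.tag == "p":
            out.extend(child.itertext())
        if child.tail and child.tail.strip():
            out.append(child.tail)
    return out

for piece in joke_text(JOKE):
    print(repr(piece))
```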
I'm testing a poorly written webpage where the first heading is an H1 on one page and an H2 on the next.
Usually I would write something like find('h1', text: 'bla bla bla') or expect(find('h1')).to have_text 'bla bla bla'
As it keeps changing between H1 and H2, is there a way to say find('h1' || 'h2', text: 'bla bla bla')?
I'd like to keep the test looking within the headers as the text sometimes exists within the body of the page too.
I am trying to fetch the numeric value after the strong tag. Because that value is not a web element of its own, I am not able to get 123456789 into a variable.
If I use Get Text xpath=//*[@id='referral-or-navinet-reference-number'], the result is "Referral #: 123456789".
How can I get only the numeric value into a variable?
HTML Code:
<td class="normal-text" id="referral-or-navinet-reference-number" align="right">
<strong>Referral #:</strong> 123456789
</td>
You can use Python's split method directly, like:
x.split(":")[1].strip()  # x is the string returned by Get Text
http://www.tutorialspoint.com/python/string_split.htm
http://www.pythonforbeginners.com/dictionary/python-split
Hope it will help you :)
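A minimal runnable version of the split idea, assuming Get Text returns exactly "Referral #: 123456789" as in the question. number_after_label is a hypothetical helper name.

```python
def number_after_label(text: str) -> str:
    """Return what follows the first colon, e.g. the number in 'Referral #: 123456789'."""
    # maxsplit=1 keeps any later colons inside the value part.
    return text.split(":", 1)[1].strip()

value = number_after_label("Referral #: 123456789")
print(value)  # → 123456789
```

If you need a number rather than a string, wrap the result in int().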
If the wanted value is the only direct text content of your td, you can use the following XPath:
//*[@id='referral-or-navinet-reference-number']/text()
This should return 123456789 (perhaps with some whitespace)
You can use this XPath:
//td[@id="referral-or-navinet-reference-number"]/text()[normalize-space()]
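Outside the browser, the same selection can be sketched with Python's ElementTree. This is a hedged stdlib approximation of text()[normalize-space()], not what Selenium or Robot Framework do internally: an element's own text nodes are its .text plus each child's .tail. direct_text is a hypothetical name, and strip() only approximates normalize-space(), which also collapses inner whitespace.

```python
import xml.etree.ElementTree as ET

CELL = """<td class="normal-text" id="referral-or-navinet-reference-number" align="right">
<strong>Referral #:</strong> 123456789
</td>"""

def direct_text(element) -> list:
    """Roughly emulate text()[normalize-space()]: the element's own non-blank text nodes."""
    nodes = [element.text] + [child.tail for child in element]
    return [n.strip() for n in nodes if n and n.strip()]

td = ET.fromstring(CELL)
print(direct_text(td))  # → ['123456789']
```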
My code looks like this:
<div>
<strong> Text1: </strong>
1234
<br>
<strong> Text2: </strong>
5678
<br>
</div>
where the numbers 1234 and 5678 are generated dynamically. When I copy the XPath of "Text2: 5678", I get something like /html/body/div[7]/div/div[2]/div/div[2]/div[2]/br[2], which does not work for me. I need an XPath for just "Text2: 5678". Any help would be appreciated. (I am using Selenium WebDriver and C# for my test script.)
I second @Anil's comment above. The text "Text2:" is retrievable because it is inside a strong element, but "5678" is a text node directly under the div, so it is not the innerHTML of either strong or br.
Hence, to retrieve "Text2: 5678", you'll have to get the innerHTML/text of the div and post-process it to extract the required text.
Below is a Java snippet that retrieves the text:
WebElement ele = driver.findElement(By.xpath("//div"));
System.out.print(ele.getText().split("\n")[1]); // Split on newlines; index 1 is the second line, "Text2: 5678".
I hope you can translate the above into C#.
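The same post-processing, sketched in Python for illustration. It assumes getText() on the div returns "Text1: 1234\nText2: 5678" (one line per br); labelled_values is a hypothetical helper. Looking a value up by its label, rather than by line index, may be more robust if the order of the lines changes.

```python
def labelled_values(rendered_text: str) -> dict:
    """Map each 'Label: value' line of the rendered div text to its value."""
    values = {}
    for line in rendered_text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            values[label.strip()] = value.strip()
    return values

# Assumption: this is what getText() returns for the <div>.
rendered = "Text1: 1234\nText2: 5678"
print(rendered.split("\n")[1])             # same idea as the Java snippet → Text2: 5678
print(labelled_values(rendered)["Text2"])  # → 5678
```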