Excluding blockquote from forum post with xpath - xpath

I'm trying to extract forum posts (message2) while getting rid of the blockquote (message1). Here is the HTML (post content modified/simplified):
<div class="cPost_contentWrap ipsPad">
<div data-controller="core.front.core.lightboxedImages" class="ipsType_normal ipsType_richText ipsContained" itemprop="text" data-role="commentContent">
<blockquote data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentid="40244" data-ipsquote-contenttype="forums" data-ipsquote-contentapp="forums" data-cite="aries_gurl" data-ipsquote-username="aries_gurl" data-ipsquote-contentcommentid="584324" class="ipsQuote" data-ipsquote="">
<div>
(message1)
</div>
</blockquote>
<p>(message2)</p>
</div>
I am trying with the following XPath query:
//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")]
For some reason, however, this query returns the posts from every matching node rather than just the current one. Taking the above as a reference, the returned results would be: message2 message2 message2 message2, and so on (N messages in total).
Is there a way I can get one message at a time? Thank you!

Is there a way I can get one message at a time?
Yes ;) use:
(//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")])[1]
for the first one, and [n] with n = 1..x for the others.
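If the goal is one message per post, another option is to select each post node first and then query relative to it. Here is a minimal sketch using only Python's standard library. The two-post markup and the "reply A"/"reply B" texts are made up for illustration, and ElementTree supports only a subset of XPath, so the indexing is done in Python rather than with (...)[n]:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page with two posts (well-formed so ElementTree can parse it).
html = """
<body>
  <div class="ipsType_normal ipsType_richText ipsContained" data-role="commentContent">
    <blockquote class="ipsQuote"><div>quoted text A</div></blockquote>
    <p>reply A</p>
  </div>
  <div class="ipsType_normal ipsType_richText ipsContained" data-role="commentContent">
    <blockquote class="ipsQuote"><div>quoted text B</div></blockquote>
    <p>reply B</p>
  </div>
</body>
"""
root = ET.fromstring(html)

# One node per post, then a query relative to that node.
posts = root.findall(".//div[@data-role='commentContent']")
messages = []
for post in posts:
    # Direct <p> children only, so text nested in <blockquote> is never reached.
    paragraphs = [p.text for p in post.findall("p")]
    messages.append(" ".join(paragraphs))

print(messages)
```

Because each inner query starts from a single post node, it cannot pick up paragraphs that belong to other posts.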

Related

XPath "and" Confusion

I recently started a new job that uses Cucumber/Gherkin along with Selenium. I was trying to create an XPath for a specific element. The XML looks roughly like this...
<p>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label">
Amp
</div>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label required-field-label">
Voltage
</div>
</p>
I want to get only the div that has required-field-label in its class and the text "Voltage". So far this kind of works...
//div[contains(text(), "Voltage")] | //*[contains(class, "required-field-label")]
However, I'm getting way too many false positives. Whenever I change the pipe to "and", I get nothing. What am I doing wrong?
Try the following expression on your actual code, and see if it works:
//div[contains(@class, "required-field-label")][contains(text(), "Voltage")]
You can match the element using "and" like this:
//div[contains(@class, 'required-field-label') and contains(text(), 'Voltage')]
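To see why the pipe gives false positives, it may help to compare "either condition" with "both conditions" on a small sample. This is only a sketch: it uses Python's standard library, which has no contains(), so the two tests are written as plain Python functions, and the third div ("Current") is a hypothetical extra added to show the difference:

```python
import xml.etree.ElementTree as ET

# The question's markup plus one made-up div that is required but is not "Voltage".
xml = """
<section>
  <div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label">Amp</div>
  <div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label required-field-label">Voltage</div>
  <div class="ncc-input-label required-field-label">Current</div>
</section>
"""
divs = ET.fromstring(xml).findall("div")

def text_matches(div):
    # Rough stand-in for contains(text(), "Voltage")
    return "Voltage" in (div.text or "")

def class_matches(div):
    # Rough stand-in for contains(@class, "required-field-label")
    return "required-field-label" in div.get("class", "")

# Union (what "|" does): a div qualifies if it satisfies either test.
either = [d.text for d in divs if text_matches(d) or class_matches(d)]
# Intersection (what "and" inside one predicate does): both tests on the same div.
both = [d.text for d in divs if text_matches(d) and class_matches(d)]

print(either)
print(both)
```

The union also picks up every other required field, which is where the extra matches come from.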

To compare selenium xpath values

I am trying to branch on XPath results in Selenium.
I want to check whether a post has comments or not.
<div class="media-body">
<a href="https://url" class="ellipsis">
<span class="pull-right count orangered">
+26 </span>
post title </a>
<div class="media-info ellipsis">
admin <i class="fa fa-clock-o"></i> date </div>
</div>
If there is a comment, span class="pull-right count orangered" is generated. If there is none, it is not produced.
xpath comment //*[@id="thema_wrapper"]/div[3]/div/div/div[3]/div/div[7]/div[2]/div[1]/div[2]/a/span
xpath nocomment //*[@id="thema_wrapper"]/div[3]/div/div/div[3]/div/div[7]/div[2]/div[1]/div[2]/a/
I think this can be handled with if/else, but I don't know how.
if
#nocomment start
else
#comment stop
I searched a lot but couldn't find anything. Please help me.
Here's an XPath example to select/click on an entry without comments. This website seems to use the same system as your sample data:
http://cineaste.co.kr/
To select the entries with no comment for the movies block ("영화이야기"), just use:
//h3[.="영화이야기"]/following::div[@class="widget-small-box"][1]//li[@class="ellipsis"][not(contains(.,"+"))]
The predicate checks for a "+" in the text of the li node, which only appears when there is a comment count, and filters those entries out.
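The if/else idea from the question can also be sketched offline, without a browser, by testing whether the link has the counter span at all. This is a rough illustration using Python's standard library on markup shaped like the question's sample; the two list items and the https://url/1 and https://url/2 addresses are placeholders, not taken from the real site:

```python
import xml.etree.ElementTree as ET

# Placeholder markup: one post with a comment counter, one without.
html = """
<ul>
  <li><div class="media-body"><a href="https://url/1" class="ellipsis"><span class="pull-right count orangered">+26</span> post with comments</a></div></li>
  <li><div class="media-body"><a href="https://url/2" class="ellipsis">post without comments</a></div></li>
</ul>
"""
root = ET.fromstring(html)

no_comment_links = []
for link in root.findall(".//div[@class='media-body']/a"):
    counter = link.find("span")
    # Compare against None explicitly: an Element without children is falsy.
    if counter is None:
        no_comment_links.append(link.get("href"))  # no counter span: keep it
    else:
        pass  # counter span present: the post has comments, skip it

print(no_comment_links)
```

With Selenium the same decision would be made on the located elements instead of a parsed string, but the branching logic stays the same.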
Oh, it's the same system. I tested it and there was an error.
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//h3[.='영화이야기']/following::div[@class='widget-small-box'][1]//li[@class='ellipsis'][not(contains(.,'+'))]"}
(Session info: chrome=81.0.4044.138)
from selenium import webdriver
import time
path = r"C:\chromed\chromedriver.exe"
driver = webdriver.Chrome(path) #path
'''
'''
driver.get("http://cineaste.co.kr/") #url
time.sleep(0.5)
postclick = driver.find_element_by_xpath("//h3[.='영화이야기']/following::div[@class='widget-small-box'][1]//li[@class='ellipsis'][not(contains(.,'+'))]") # activate the login window
postclick.click()
driver.close()
Could you make an example with the site? I want to ignore the posts with comments and just click the ones without comments.

Obtain an XPath element containing another element with a specific class

Hello I have this HTML:
<div class="_3Vhpd"><span>Your commerce Data</span>
<a class="n3G0C" href='http://www.webadress.......'><span>Some Text</span></a>
</div>
I tried to obtain the link as follows:
parser.xpath('//div[contains(@class,"_3Vhpd")]//following-sibling::*[a[@class="n3G0C"]]/@href')
but I got an empty list '[]'. Maybe because the a is not directly after the div but after a span...
First, your sample HTML doesn't have a class="n3G0C", but assuming you fix it, this XPath expression should work:
//div[contains(@class,"_3Vhpd")]//following-sibling::a/@href
Output:
http://www.webadress.......
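In the sample, the a element is a child of the div (and a sibling of the span), so a plain child step from the div also reaches it. A small sketch of that with Python's standard library follows. The href http://example.com/ is a placeholder for the elided address, and the attribute is read with .get() because ElementTree cannot select attributes with /@href:

```python
import xml.etree.ElementTree as ET

# The question's markup with the closing tags repaired and a placeholder URL.
html = """
<body>
  <div class="_3Vhpd"><span>Your commerce Data</span>
    <a class="n3G0C" href="http://example.com/"><span>Some Text</span></a>
  </div>
</body>
"""
root = ET.fromstring(html)

# Child step from the div to the link, then read the attribute in Python.
links = root.findall(".//div[@class='_3Vhpd']/a[@class='n3G0C']")
hrefs = [a.get("href") for a in links]

print(hrefs)
```

With lxml, which the question appears to use, the equivalent would presumably be a child step ending in /@href, but that is not shown here.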

How to get the whole title which consists of several spans with XPATH?

How to get the whole title:
Iphone case :) #phonecases#xmas#iphone#case
When the title does not include hashtags I can get the whole title with this XPath:
((//*[@class='pinWrapper'])[2]//span)[1]/text()
This line:
((//*[@class='pinWrapper'])[2]//span)[1]//text()[normalize-space()]
returns only the first one: Iphone case :).
And this:
((//*[@class='pinWrapper'])[2]//span)[1][string()]
returns whole xml:
<span>Iphone case :) <span class="pinHashtag">#phonecases</span> <span class="pinHashtag">#xmas</span> <span class="pinHashtag">#iphone</span> <span class="pinHashtag">#case</span></span>
If ((//*[@class='pinWrapper'])[2]//span)[1]/text() returns only the first text node, try
string(((//*[@class='pinWrapper'])[2]//span)[1])
to get the complete string.
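The difference between the first text node and the string value of the element can be reproduced on the span from the question. This sketch uses Python's standard library, where .text is the first text node and joining itertext() plays the role of string(); it is an analogy, not the XPath engine the question uses:

```python
import xml.etree.ElementTree as ET

# The span exactly as returned in the question.
span = ET.fromstring(
    '<span>Iphone case :) <span class="pinHashtag">#phonecases</span> '
    '<span class="pinHashtag">#xmas</span> <span class="pinHashtag">#iphone</span> '
    '<span class="pinHashtag">#case</span></span>'
)

# Only the text before the first child element (like .../text() taking the first node).
first_text_node = span.text

# All descendant text concatenated in document order (like string(...)).
full_title = "".join(span.itertext())

print(repr(first_text_node))
print(repr(full_title))
```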

How to check if a node is inside a form tag?

Using XPath, how do I determine if a node is within a form tag? I guess I am trying to locate the form tag of its ancestor/preceding (but I couldn't get it to work).
example 1:
<form id="doNotKnowIDofForm">
<div id="level1">
<span id="mySpan">someText</span>
</div>
</form>
example 2:
<form id="doNotKnowIDofForm">
This is a closed form.
</form>
<div id="level1">
<span id="mySpan">someText</span>
</div>
</form>
I can use xpath "//span[id='mySpan']" to locate the span node. But I would like to know if mySpan is inside a form (I do not know the id of the form). I have tried "//span[id='mySpan']/preceding::form/" and "//span[id='mySpan']/ancestor::form/"
Thanks in advance.
EDIT: I would like the XPath to select the myForm form tag in Example1 but NOT in Example2
I'm not 100% sure from your description whether you're looking to select the form element, or the span element. It seems more likely that you're going for the form, so I'll address that first.
Your XPath with the ancestor::form would have been ok if it didn't have the slash at the end, but it's more roundabout than it needs to be. I think this is a better way:
//form[.//span/@id = 'mySpan']
or this:
//form[descendant::span/@id = 'mySpan']
To produce an XPath that locates certain nodes only if they are within a form, you would put the ancestor::form inside the predicate:
//span[@id = 'mySpan' and ancestor::form]
or you can do this, which would again be more straightforward:
//form//span[@id = 'mySpan']
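The last expression can be tried against both examples with Python's standard library, which supports this form of path. This is a sketch under a few assumptions: the markup is wrapped in a body element so that the form is not the root, the stray closing tag in example 2 is dropped so it parses, and span_in_form is a helper name made up for this illustration:

```python
import xml.etree.ElementTree as ET

# Example 1: the span is nested inside the form.
inside = """
<body>
  <form id="doNotKnowIDofForm">
    <div id="level1"><span id="mySpan">someText</span></div>
  </form>
</body>
"""

# Example 2: the form is closed before the div that holds the span.
outside = """
<body>
  <form id="doNotKnowIDofForm">This is a closed form.</form>
  <div id="level1"><span id="mySpan">someText</span></div>
</body>
"""

def span_in_form(markup):
    """Return True if a span with id 'mySpan' has a form ancestor."""
    root = ET.fromstring(markup)
    # Descendant step from any form down to the span, at any depth.
    return root.find(".//form//span[@id='mySpan']") is not None

print(span_in_form(inside))
print(span_in_form(outside))
```

The form's own id never appears in the path, so this works without knowing it in advance.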
Your own attempt
//span[id='mySpan']/ancestor::form/
looks fine to me.
You can simply use,
"form//span[id='mySpan']"
