xpath help cannot extract second element - xpath

Sorry, I'm very new to XPath.
<figure id="image0" data-zoom-src="//www.XXX.com/2afb588db7c3c044a6e7594fe94f1c3b.jpg">
I am trying to extract the data-zoom-src attribute of this element using XPath. I'm not sure how to achieve it; I've tried numerous options but I'm stumped.
If anyone knows of a great XPath reference as well, please share.
Thanks for the help in advance,
Darz

If you only want to get the value of data-zoom-src you can use string():
string(figure/@data-zoom-src)
or, depending on the location of the figure element, e.g.
string(//figure/@data-zoom-src)
returns
//www.XXX.com/2afb588db7c3c044a6e7594fe94f1c3b.jpg
It may depend on the XPath parser that you use as I had to add a closing </figure> tag to get the result.
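For example, here is a minimal sketch of the same query in Python with lxml; the closing </figure> tag and the variable names are my own additions for illustration:
from lxml import etree
# The snippet from the question, with a closing tag added so it parses as XML.
html = '<figure id="image0" data-zoom-src="//www.XXX.com/2afb588db7c3c044a6e7594fe94f1c3b.jpg"></figure>'
root = etree.fromstring(html)
# string() collapses the selected attribute node to its text value.
print(root.xpath('string(//figure/@data-zoom-src)'))
# prints: //www.XXX.com/2afb588db7c3c044a6e7594fe94f1c3b.jpg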
For the question about recommendable XPath references: Questions on Stackoverflow to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic as they tend to attract opinionated answers and spam. But you can click on the xpath tag label and then on "learn more" to get to the Stackoverflow XPath tag wiki page. There you'll find some useful links and reliable information.

Related

I have doubts about two mappings in Site-Prism

I don't want to use xpath on the elements below.
element :img_login, :xpath, '//*[@id="main-wrapper"]/div/section/div/div[2]/div/div/div[1]/img'
element :msg_login_senha_invalidos, :xpath, '//*[@id="main-wrapper"]/div/section/div/div[2]/div/div/div[2]/div/p'
They are on the page as follows:
element img_login
<div class="sc-jRQAMF eRnhep">
<img src="https://quasar-flash-staging.herokuapp.com/assets/login/flashLogo-3a77796fc2a3316fe0945c6faf248b57a2545077fac44301de3ec3d8c30eba3f.png" alt="Quasar Flash">
</div>
element msg_login_senha_invalidos
<p class="MuiFormHelperText-root MuiFormHelperText-contained Mui-error MuiFormHelperText-filled">Login e/ou senha inválidos</p>
You have asked multiple questions about converting from XPath to some other type of selector when using Site-Prism. Stack Overflow is meant to be a place to come, learn, and improve your skills - not just to get someone else to do your work. It really seems you'd be better off reading up on CSS and how it can be used to select elements.
Also note that there's nothing specifically wrong with using XPath per se; it's just that the way people new to testing tend to use it (copying a fully specified selector from their browser) leads to selectors that are way too specific and therefore brittle.
A good site for learning about the different general CSS selector options is https://flukeout.github.io/ - and you can look at the built-in selector types provided by Capybara at https://github.com/teamcapybara/capybara/blob/master/lib/capybara/selector.rb#L18
In your current case the below may work; but with only the HTML you have provided, all that's possible to say is that they will match the elements shown. They may also match other elements, which would give you ambiguous element errors.
element :img_login, :css, 'img[alt="Quasar Flash"]' # CSS attribute selector
element :msg_login_senha_invalidos, :css, 'p.Mui-error', text: 'Login e/ou senha inválidos' # CSS class selector combined with Capybara text filter
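If you want to sanity-check selectors like these outside of Site-Prism, here is a quick hypothetical sketch in Python with BeautifulSoup (the library choice and the shortened src attribute are my own, not part of the question):
from bs4 import BeautifulSoup
html = '''
<div class="sc-jRQAMF eRnhep">
<img src="flashLogo.png" alt="Quasar Flash">
</div>
<p class="MuiFormHelperText-root Mui-error">Login e/ou senha inválidos</p>
'''
soup = BeautifulSoup(html, 'html.parser')
# CSS attribute selector: finds the logo by its alt text.
print(soup.select('img[alt="Quasar Flash"]'))
# CSS class selector: finds the error message by one of its classes.
print(soup.select('p.Mui-error'))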

How to remove tweet photos from twitter widget?

To display and style my tweets on my website, I use something like this code:
<a
class="twitter-timeline"
href="https://twitter.com/YourNickname"
data-widget-id="xxxxxxxxxxxxxxxxxxx" <!--you will have your own number http://stackoverflow.com/questions/16375116/what-is-data-widget-id-in-twitter-api-how-i-can-get-the-data-widget-id-->
data-chrome="noheader nofooter noborders noscrollbar transparent" <!--tweak these for the looks-->
data-tweet-limit="5"
data-link-color="#FFFFFF"
data-border-color="#FFFFFF"
lang="EN" data-theme="light" <!--light or dark-->
height="447"
width="255"
data-screen-name="yourName"
data-show-replies="false"
data-aria-polite="assertive">
Tweets by @YourName
</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
I'm looking for a way to remove images in case they are tweeted.
Looking for a solution I've found this:
https://dev.twitter.com/discussions/19073
They suggest to add a line
data-src-2x="false"
to my code.
This solution, sadly, doesn't seem to work for me.
If any of you have solved this problem, a hint would be much appreciated!
Cheers!
The following answer is no longer relevant due to recent changes by Twitter!
In the configuration options when creating the widget there is a checkbox labeled 'auto expand photos'. Uncheck this, but be aware it can take a while for the change to take effect.

xpath br (line breaks) in p (paragraph)

Below is an example XML.
<p>
Thisisgood
</p>
<p>
Thisisbad
</p>
<p>
This
<br>
is
<br>
acceptable
</p>
<p>
Thisisfine
</p>
I want the result:
Thisisgood
Thisisbad
Thisisacceptable
Thisisfine
I use the XPath //p/text() in a Google Docs spreadsheet (=ImportXML). This results in:
Thisisgood
Thisisbad
This is acceptable (appearing in different cells)
Thisisfine
What XPath would give me the result I need? Thank you.
You cannot solve this problem using XPath 1.0. Using XPath 2.0, you'd just do a
//p/string-join(text(), '')
but this is not supported by Google Spreadsheet.
I'm pretty sure you can use ARRAYFORMULA and JOIN in Google Spreadsheet, but I cannot help you with this. Better to ask a new question with appropriate tags for Google Spreadsheets so people following that tag get notified, and provide an example spreadsheet using the ImportXML function so people can work with it.
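If you can run the extraction outside of Google Spreadsheet, the join is easy to do yourself. Here is a minimal sketch in Python with lxml (my own illustration, assuming the paragraphs are wrapped in a root element so they parse as XML):
from lxml import etree
html = '<root><p>This<br/>is<br/>acceptable</p></root>'
root = etree.fromstring(html)
for p in root.xpath('//p'):
    # Concatenate all text nodes of the paragraph, skipping the <br/> tags;
    # this is the XPath 1.0 equivalent of string-join(text(), '') above.
    print(''.join(p.xpath('text()')))
# prints: Thisisacceptable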
I had the same problem. I used this formula:
=TRIM(JOIN("",L3:X3))
where L3:X3 are the cells.
//p without text() should be enough to get this: Thisisacceptable

Avoiding duplicate-content hit on Google for archive pages?

Each blog post on my site -- http://www.correlated.org -- is archived at its own permalinked URL.
On each of these archived pages, I'd like to display not only the archived post but also the 10 posts that were published before it, so that people can get a better sense of what sort of content the blog offers.
My concern is that Google and other search engines will consider those other posts to be duplicate content, since each post will appear on multiple pages.
On another blog of mine -- http://coding.pressbin.com -- I had tried to work around that by loading the earlier posts as an AJAX call, but I'm wondering if there's a simpler way.
Is there any way to signal to a search engine that a particular section of a page should not be indexed?
If not, is there an easier way than an AJAX call to do what I'm trying to do?
Caveat: this hasn't been tested in the wild, but should work based on my reading of the Google Webmaster Central blog and the schema.org docs. Anyway...
This seems like a good use case for structuring your content using microdata. This involves marking up your content as a Rich Snippet of the type Article, like so:
<div itemscope itemtype="http://schema.org/Article" class="item first">
<h3 itemprop="name">August 13's correlation</h3>
<p itemprop="description" class="stat">In general, 27 percent of people have never had any wisdom teeth extracted. But among those who describe themselves as pessimists, 38 percent haven't had wisdom teeth extracted.</p>
<p class="info">Based on a survey of 222 people who haven't had wisdom teeth extracted and 576 people in general.</p>
<p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>
</div>
Note the use of itemscope, itemtype and itemprop to define each article on the page.
Now, according to schema.org, which is supported by Google, Yahoo and Bing, the search engines should respect the canonical url described by the itemprop="url" above:
Canonical references
Typically, links are specified using the <a> element. For example, the following HTML links to the Wikipedia page for the book Catcher in the Rye.
<div itemscope itemtype="http://schema.org/Book">
<span itemprop="name">The Catcher in the Rye</span>—
by <span itemprop="author">J.D. Salinger</span>
Here is the book's <a itemprop="url" href="http://en.wikipedia.org/wiki/The_Catcher_in_the_Rye">Wikipedia page</a>.
</div>
http://schema.org/docs/gs.html#advanced_enum
So when marked up in this way, Google should be able to correctly ascribe which piece of content belongs to which canonical URL and weight it in the SERPs accordingly.
Once you've finished marking up your content, you can test it using the Rich Snippets testing tool, which should give you a good indication of what Google thinks about your pages before you roll it into production.
p.s. the most important thing you can do to avoid a duplicate content penalty is to fix the titles on your permalink pages. Currently they all read 'Correlated - Discover surprising correlations' which will cause your ranking to take a massive hit.
I'm afraid it is not possible to tell a search engine that a specific area of your web page (for example, a div in your HTML source) should not be indexed. A workaround would be to put the content you do not want search engines to index into an iframe, and then use a robots.txt file with an appropriate Disallow rule to deny access to the specific file loaded by that iframe.
You can't tell Google to ignore portions of a web page but you can serve up that content in such a way that the search engines can't find it. You can either place that content in an <iframe> or serve it up via JavaScript.
I don't like those two approaches because they're hackish. Your best bet is to completely block those pages from the search engines since all of the content is duplicated anyway. You can accomplish that a few ways:
Block your archives using robots.txt. If your archives are in their own directory then you can block the entire directory easily. You can also block individual files and use wildcards to match patterns.
Use the <META NAME="ROBOTS" CONTENT="noindex"> tag to block each page from being indexed.
Use the X-Robots-Tag: noindex HTTP header to block each page from being indexed by the search engines. This is identical in effect to using the <meta> tag, although this one can be easier to implement since you can use it in a .htaccess file and apply it to an entire directory.
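For illustration, here are minimal sketches of the first and third options; the /archives/ path is a hypothetical example, and the .htaccess part assumes Apache with mod_headers enabled:
# robots.txt - block the whole archive directory (option 1)
User-agent: *
Disallow: /archives/
# .htaccess inside the archive directory - send the noindex header on every response (option 3)
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex"
</IfModule>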

How does Cufon affect SEO and Search Bots? [closed]

I've been searching the web and can't find an answer to the question of how using Cufon affects SEO (the way bots from Google, Bing, Yahoo... read the page). I know the original text is still there, but it is inside a <cufontext> tag, inside a <cufon> tag, and is next to a <canvas> tag (instead of next to the word that should be next to it). In other words, do the search bots read "search by:" the same way they'd read the Cufon-generated HTML below?
<cufon class="cufon cufon-canvas" alt="search" style="width: 72px; height: 28.1667px;">
<canvas width="95" height="28" style="width: 95px; height: 28px; top: 0px; left: -5px;"/>
<cufontext>search</cufontext>
</cufon>
<cufon class="cufon cufon-canvas" alt=" by:" style="width: 36px; height: 28.1667px;">
<canvas width="68" height="28" style="width: 68px; height: 28px; top: 0px; left: -5px;"/>
<cufontext> by:</cufontext>
</cufon>
I really like Cufon since I'm not much of a graphics guy, but I also don't want to ruin any good SEO I've got going.
Thanks in advance for any help or advice,
Chuck Foster
Cufon does not affect SEO at all. Its rendering engine is written in Javascript, and search engines don't read Javascript.
The code snippet you posted is what HTML looks like in your browser after Cufon has done its job; the search engines will only see your original html (the one you view when you click on View > Page Source in Firefox for instance).
A handy tip I learned while reading up on Google SEO is to take a look at your page in a text-viewer to give you a sense of what's visible to Google. You can do that with this handy tool: http://www.yellowpipe.com/yis/tools/lynx/lynx_viewer.php
Notice how your cufon shows up just fine.
Theoretically Cufon shouldn't affect search rankings, as it is rendered by JavaScript after the page loads; the actual source code still contains the heading. Despite this, I found quite a few conflicting opinions about the search-friendliness of Cufon, so I've done a small study to try to get some data on whether it actually affects rankings. Here it is: Cufon SEO Effects
The study finds that Cufon doesn't have any direct effect on search rankings. You could argue that the marginal increase in page load time on a site that includes the Cufon JavaScript file could potentially affect rankings, although in my opinion this difference would be minor.
No SEO impact at all. Much better than sIFR IMO, for two reasons: 1. faster, 2. simpler.
I have found a great article which shows there is no "negative SEO" with Cufon:
http://www.aerodesigns.co.uk/blog/negative-seo-effects-of-cufon/
Thanks.
