Easiest Way to Find XPath Of An Element [closed]

I am learning Scrapy and wondering whether there is an existing tool, perhaps a Chrome or Firefox plug-in such as Web Developer, to quickly get the XPath of a web element, or whether the best way to go is to learn XPath and build the expressions yourself from scratch.

For Chrome...
There are plugins such as XPath Helper that can produce an XPath to a given element on an HTML page. You can also right-click on an element in a page and pull up its position in the Elements tab. From there, you can right-click and select Copy XPath.
And to really learn XPath, I'd recommend directly writing your own from scratch. You can select nodes directly from the console by using $x(). For example, here's how to select the search form on this page:
> $x("//form[#id='search']")
[<form id=​"search" action=​"/​search" method=​"get" autocomplete=​"on">​…​</form>​]
Note that the form element will be expandable interactively in the console.
Here's how to select all of the text nodes on this page that contain the word Thanks:
> $x("//text()[contains(.,'Thanks')]")
["Thanks a lot!", "Thanks for contributing an answer to Stack Overflow!"]
Note that you'll get more matches than I originally did if you try it on this page. Strange loop.
Here's how to select the number of votes this answer has received:
> $x("//div[#id='answer-18839594']//span[#class='vote-count-post ']/text()")
["0"]
Note the unfortunate robustness issue: the class value must include a trailing space ('vote-count-post ') to mirror the current source exactly. Note also the unfortunately low value returned by that XPath. ;-)
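If you want something less brittle, a standard trick (a sketch, not tested against the current markup) is to match the class token regardless of surrounding whitespace:
> $x("//div[@id='answer-18839594']//span[contains(concat(' ', normalize-space(@class), ' '), ' vote-count-post ')]/text()")
This matches whether or not the class attribute carries the trailing space.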

There is no such thing as "the XPath of an element". There are a variety of paths you might be interested in. The shortest machine-executable path is probably along the lines of *[3]/*[1]/*[2]. The most readable path is something like chap[3]/section[1]/para[2], but this may depend on the namespace context. For a context-free path you might want *[local-name()='chap' and namespace-uri()='...'][3]/*[local-name()='section' and namespace-uri()='...'][1]/*[local-name()='para' and namespace-uri()='...'][2]. But sometimes when people ask for "the path", they just want chap/section/para, that is, a path that selects many elements including the target element. And for some purposes, the most usable XPath expression might be id('Intro').
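To see that ambiguity concretely, here is a minimal sketch in Python (assuming lxml is installed; the sample HTML is made up for illustration) that asks lxml for one positional path to an element. Other, equally valid paths exist for the same node:

from lxml import html

# Made-up sample document, just for illustration.
doc = html.fromstring(
    "<html><body><div>"
    "<form id='search' action='/search'><input name='q'/></form>"
    "</div></body></html>"
)

form = doc.get_element_by_id("search")
# One positional path among many possible answers:
print(doc.getroottree().getpath(form))  # /html/body/div/form
# An id()-based expression such as id('search') would be shorter and more robust.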

Related

XPath selector returning an empty list instead of targeted value [closed]

I am trying to scrape some data from this table: https://sofifa.com/, but I ran into a problem when trying to extract information from the Value column. I used the Mozilla dev tools to get the XPath selectors, which worked fine for Names and Overall ratings, but in the case of Value, the browser-generated XPath only returns an empty list. I'm using Scrapy.
In [85]: value = response.xpath('/html/body/div[1]/div/div/div[1]/table/tbody/tr[1]/td[13]').extract()
In [86]: value
Out[86]: []
What can I try next?
If you take a look at the page source, you will find the players' values under the data-col="vl" attribute, so you can extract them with this XPath:
response.xpath('//*[@data-col="vl"]/text()').extract()
which gives you all the values in the table.
When crawling a page, it is better not to use the XPath that the browser's inspect-element feature generates. Browsers often insert elements (such as tbody) into the DOM that are not present in the raw HTML, which is why a copied path that goes through tbody, like the one above, can match nothing. Instead, read the page source, build an XPath expression from the elements' data, and test it in the Scrapy shell.
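If you want to check an expression like that without hitting the live site, here is a minimal sketch (the table markup below is a simplified, made-up stand-in for sofifa.com, not the real source):

from scrapy.selector import Selector

# Simplified, made-up stand-in for the real table markup.
sample = """
<table>
  <tr><td data-col="name">Player A</td><td data-col="vl">€95.5M</td></tr>
  <tr><td data-col="name">Player B</td><td data-col="vl">€58.5M</td></tr>
</table>
"""

sel = Selector(text=sample)
print(sel.xpath('//*[@data-col="vl"]/text()').extract())
# ['€95.5M', '€58.5M']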

Do Google SEO Content Keywords matter and should I remove sidebar content? [closed]

I run a Magento based store website.
At the side of every product page we have delivery information.
Because of this, Google Webmaster Tools picks up words such as 'delivery', 'orders', and 'returns' as significant keywords, rather than more relevant, industry-specific keywords.
Does it matter that it gives 'delivery' a higher significance rating?
Should I remove the delivery info from the side of each page?
Or is there a way to disavow keywords to tell Google that 'delivery' isn't relevant?
Or maybe turn the text info at the side into a graphic instead?
Many thanks!
Before SEO, you should always consider what is best for your user. If displaying shipping information in the sidebar is going to enhance the user's experience, leave it. If the information could be put on its own page and a link added to the sidebar, do that.
Having said that, I wouldn't worry about it. Unless you're trying to rank for the keywords 'return' or 'delivery', you're not likely to notice any sort of algorithm penalty that comes from having the words appear all over the website.
Furthermore, any keyword-stuffing penalty is applied to each page individually. Still, be careful about packing keywords into sidebar tags: since the sidebar repeats on every page, it raises the keyword density across the whole site.

A generic algorithm for extracting product data from web pages [closed]

Preface: this might seem like a very beginner-level question, perhaps stupid or ill-formulated. That's why I'm not asking for a definitive answer, just a hint or a point I can start from.
I am thinking of a script that would allow me to parse product pages of different online retailers, such as Amazon, for instance. The following information is to be extracted from the product page:
product image
price
availability (in stock/out of stock)
The key point is that, once implemented, the algorithm should work for any retailer and any product page; it should be pretty universal.
What techniques would allow implementation of such an algorithm? Is it even possible to write such a universal parser?
If the information on the product page is marked up in a structured, machine-readable way, e.g. using schema.org microdata, then you can just parse the page HTML into a DOM tree, traverse the tree to locate the microdata elements, and extract the data you want from them.
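As a concrete sketch (assuming Python with BeautifulSoup; the Product markup below is illustrative, and real pages often nest the price inside an Offer):

from bs4 import BeautifulSoup

# Illustrative schema.org/Product microdata, not a real retailer's page.
sample = """
<div itemscope itemtype="https://schema.org/Product">
  <img itemprop="image" src="/img/widget.jpg">
  <span itemprop="price" content="19.99">$19.99</span>
  <link itemprop="availability" href="https://schema.org/InStock">
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
product = soup.find(itemtype="https://schema.org/Product")
print(product.find(itemprop="image")["src"])          # /img/widget.jpg
print(product.find(itemprop="price")["content"])      # 19.99
print(product.find(itemprop="availability")["href"])  # https://schema.org/InStock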
Unfortunately, many sites still don't use such structured data markup — they just present the information in a human-readable form, with no consideration given for machine parsing. In such cases, you'll need to customize your data extraction code for each site, so that it knows where the information you want is located on the page. Parsing the HTML and then working with the DOM is still often a good first step, but the rest will have to be site-specific (and may need to be updated whenever the site changes its design).
You can also try to come up with heuristic methods for locating relevant data, like, say, assuming that a number following a $ sign is probably a price. Of course, such methods are also likely to occasionally produce incorrect matches (like mistaking the "$10" in "Order now and save $10!" for a price). You can adjust and refine your heuristics to be smarter about such things, but no matter how good you get at it, there will always be some new and unexpected cases you haven't anticipated.
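For instance, a naive price heuristic might look like this (the regex and the sample text are assumptions for illustration, not a robust parser):

import re

text = "Only $24.99 while supplies last. Order now and save $10!"
# Treat any $-prefixed number as a candidate price.
candidates = re.findall(r"\$\d+(?:\.\d{2})?", text)
print(candidates)  # ['$24.99', '$10'] -- the second is a false positive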

How can I index my source code? [closed]

Are there any tools out there that will index source code, client side, and provide blazing fast search results?
"How can I index our internal source code?" is related, but covers server-side tools.
Everything and Locate32 are nice indexing tools on the Windows platform. Just one problem: they only index file names.
DocFetcher is another solution. It tries to index the content of files, but it has big memory issues: it cannot index the content of larger files and simply skips them.
I'm also searching for something to index my data, and I want a tool like Locate32, which integrates very nicely with the Windows shell. But it would also need to index the content of files: plain brute-force word indexing, no magic applied to the data, just plain wildcard searches such as words starting with, ending with, or containing a given string.
But the search is still on (for an app, that is).
Install ctags.
Then run ctags -R in the root of your source tree. Many editors, including Vim, can use the resulting tags file to give near-instant search results.
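A typical workflow looks like this (assuming Exuberant/Universal Ctags and Vim are installed; the project path is made up):

$ cd ~/src/myproject
$ ctags -R .          # writes a "tags" file covering the whole tree
$ vim -t main         # opens the file containing the definition of "main"

Inside Vim, Ctrl-] jumps to the definition of the identifier under the cursor, and Ctrl-T jumps back.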
I know this is an old question, but maybe this will help someone else.
Take a look at CodeIDX: http://sourceforge.net/projects/codeidx/.
Using CodeIDX you can index multiple directories using filetype filters and search the created index.
You can open multiple searches at the same time and the results can be viewed in a preview.
Using GNU Global you can get browsable, searchable source code. You can run this locally too or use all the tools that go with it (like less to go straight to a function definition).
See http://www.tamacom.com/tour/kernel/linux/ for an example of the Linux Kernel.
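The basic usage (assuming GNU Global is installed) is, from the root of the source tree:

$ gtags               # builds the tag databases (GTAGS, GRTAGS, GPATH)
$ global -x main      # lists definitions of "main" with file and line
$ global -rx main     # lists references to "main"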

Where can I find a good template for a software application user guide? [closed]

Looking for links to resources that will help me write a user guide for a software application. I'm after something that will help me to structure my guide, give it an appropriate writing style, and ensure that it has an excellent look and feel throughout.
This link makes some strong points, each presented in clear language with inline justifications. For example:
When writing procedures, use the active voice (e.g. "Click this") and address users directly (write "you" rather than "the user").
When explaining an action, use the "command" form of the verb: "Choose an option from the menu and press [ENTER]."
http://www.klariti.com/technical-writing/User-Guides-Tutorial.shtml
Here is the complete list of topics covered in the aforementioned article:
Front Page (cover pages)
Cover and Title Page
Disclaimer
Preface
Contents
Body of the guide
Procedures
Writing procedures
Chunking text
Number your steps
Using the If-Then Approach
Reference Materials
Back Matter
Glossary
Index
Establishing Standards
Document Format
Structure Style
Technical Language
Addressing the User
Presenting your material
Special Requirements
For structure and look-and-feel, consider using a framework such as DocBook.
DocBook uses an XML markup schema that makes you think about how your document should be arranged. There are XSL transformations to convert it to common formats like HTML and PDF with a whole load of config options to make it look the way you want. And it's open-source (free). There are downsides of course: the schema's pretty big, and editing can be hard work without a good XML editor.
Examples: http://wiki.docbook.org/topic/WhoUsesDocBook
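For a feel of the markup, here is a minimal DocBook 5 skeleton (an illustrative sketch; a real guide would add a preface, glossary, index, and so on):

<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Example User Guide</title>
  <chapter>
    <title>Getting Started</title>
    <para>Install the application, then choose an option from the menu
      and press <keycap>Enter</keycap>.</para>
  </chapter>
</book>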
