I am trying to extract all text from a webpage, but without information in the side bar and all its children. I also don't want to have text in the script, style or head. For the styles and scripts the following works:
.xpath('//*[not(self::script or self::style or self::head)]/text()[normalize-space(.)]').extract()
For the side bar I've started the other way around, and I've managed to get only the sidebar information like that:
.xpath('//*/div[@class="sidebar section"]//text()[normalize-space(.)]').extract()
I've tried to combine it, but like this I still get the sidebar information, and other tries threw a syntax error:
.xpath('//*[not(self::script or self::style or self::head or div[@class="sidebar section"])]/text()[normalize-space(.)]').extract()
Any ideas how to combine these two things so that it works?
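One way to read the failing attempt: inside the predicate, div[@class="sidebar section"] tests whether the current element has such a div as a child, so elements that sit inside the sidebar still pass. Testing the ancestor-or-self axis instead, for example [not(ancestor-or-self::div[@class="sidebar section"])], is the usual XPath way to exclude a whole subtree. The sketch below illustrates the same filtering rule with Python's standard library only, not with Scrapy; the names VisibleText and visible_text and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "head"}
# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleText(HTMLParser):
    """Collect text nodes, skipping script/style/head and one div subtree."""

    def __init__(self, skip_class="sidebar section"):
        super().__init__()
        self.skip_class = skip_class
        self.stack = []      # (tag, starts_skipped_subtree) per open element
        self.skipping = 0    # number of open elements that start a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        is_sidebar = tag == "div" and dict(attrs).get("class") == self.skip_class
        skip = tag in SKIP_TAGS or is_sidebar
        self.stack.append((tag, skip))
        self.skipping += int(skip)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray end tag or void element: nothing to close
        while True:
            open_tag, skip = self.stack.pop()
            self.skipping -= int(skip)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Same idea as text()[normalize-space(.)]: ignore whitespace-only nodes.
        if self.skipping == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html, skip_class="sidebar section"):
    parser = VisibleText(skip_class)
    parser.feed(html)
    parser.close()
    return parser.chunks


SAMPLE = (
    "<html><head><title>T</title><style>p{}</style></head><body>"
    '<div class="sidebar section"><p>Side <b>bar</b></p></div>'
    '<div class="main"><p>Hello</p><script>var x = 1;</script><p>World</p></div>'
    "</body></html>"
)
print(visible_text(SAMPLE))
```

The class comparison is an exact string match, mirroring @class="sidebar section" in the XPath above; a page that adds further classes to that div would need a looser test.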
I'm trying to set the right xpath for using RSelenium, but I'm not very experienced in this area, so any help would be much appreciated.
Since I'm not allowed to post pictures yet I have tried to add a link to a screenshot of the html:
The html
I need R to scrape the dates (28-10-2020 - 13-11-2020), but so far I have not been able to set the correct xpath when using html_nodes.
I'm trying to scrape from sites like this one: https://www.boligsiden.dk/adresse/topperne-9-3-33-2620-albertslund-01650532___9__3__33
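Whatever selector ends up returning the element's text, the dates themselves can be pulled out with a pattern match on the dd-mm-yyyy format shown above. A minimal Python sketch, assuming the text has already been fetched; find_dates and the sample string are hypothetical, and the real page text may look different.

```python
import re
from datetime import datetime

# Two digits, two digits, four digits, separated by hyphens (e.g. 28-10-2020).
DATE_PATTERN = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")


def find_dates(text):
    """Return every dd-mm-yyyy date in text as a datetime.date, in order."""
    dates = []
    for match in DATE_PATTERN.finditer(text):
        try:
            dates.append(datetime.strptime(match.group(0), "%d-%m-%Y").date())
        except ValueError:
            # Looks like a date but is not a valid one (e.g. 99-99-2020): skip it.
            continue
    return dates


sample = "Listed 28-10-2020 - 13-11-2020"  # made-up text around the dates
print(find_dates(sample))
```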
I usually do this in Python rather than R.
As you can see in this image, when you right-click on the element concerned you get a drop-down menu with an XPath to the element.
Other than that, the site layout and the XPath might change, so a full XPath is only a good option in the short run. I prefer a relative expression such as:
driver.find_element_by_xpath('//button[contains(text(),"Login")]').click()
In your case that would be:
find_element_by_xpath('//*[contains(@class, "u-pb-4 u-block")]')
I hope this helps; it is mostly the same across different languages.
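One caveat with contains(@class, "u-pb-4 u-block"): it is a plain substring test, so it only matches when the two class names appear next to each other in exactly that order. A common workaround is to test each class name as a whole token. The helper below is a small sketch that only builds the XPath string; xpath_for_classes is a hypothetical name, and the result would still have to be passed to whatever find call your driver provides.

```python
def xpath_for_classes(*class_names, tag="*"):
    """Build an XPath matching elements that carry all given class tokens."""
    tests = [
        # Pad the attribute with spaces so each name matches only as a whole token.
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_names
    ]
    return f"//{tag}[{' and '.join(tests)}]"


print(xpath_for_classes("u-pb-4", "u-block"))
```

Because each class is tested separately, the expression keeps matching if the site reorders the classes or adds new ones between them.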
I am trying to capture Text from UI. However in inspect element tool the text is missing. Text is coming from some other div. When I indicate elements from UIPath it says validated but inside the element the text is not there. It's somewhere in other div. I tried editing CSS shown below, which is the exact location of the text. However the data is blank.
Modified CSS:
<webctrl css-selector='div[class*=is-cherry]h1v' idx='2' isleaf='1' parentid='root' tag='DIV' />
The page is using React. So there is no proper way to simply read the text. Instead do it with OCR.
Use a Selector
The best way for you would be to use a selector. Here it is a little tricky to find the proper selector, as the React framework hides several elements by itself. But once you have found the pattern you are good to go.
Find your value under the level root as you can see in the image:
So now you simply use the Get Text activity:
Make sure that you edit the selector in the following schema:
<webctrl tag='H1' parentclass='level-item is-blue fadeInUp' />
This is the selector for your blue value. Now if you want the red one take this one:
<webctrl tag='H1' parentclass='level-item is-cherry fadeInUp' />
I believe you now see how that page works and how the selection of the different colours works.
OCR technology (make sure you are using the Profile Scan)
I would not recommend using OCR, as you never know whether the element will move within its visual area. If it does, your process will fail.
The method of writing a code element in Asciidoc is by writing an element enclosed in the grave accent(`):
`var`
And, the method to show a link is:
link:www.awebistelink.com[var]
I am attaching an image to show these two on a website that renders Asciidoc
Image Displaying the output in an asciidoc document
When I am trying to show a link highlight of a code element inside an inline code by writing:
`link:www.awebistelink.com[var]`
It renders perfectly fine on Asciidoc. Please see it here.
But on the website it doesn't show any link, and simply shows a code element as if we had declared it as
`var`
The correct way to make a link label appear in monospace is to apply the backticks to the label itself, not the link.
Using your example, the markup should be:
link:www.awebsitelink.com[`var`]
I found the answer,
it should be
`link:www.awebistelink.com[var]`
I still don't know how it worked but now it works just fine as intended.
Reading through GitHub's help website, I've come across a very neat-looking procedure list.
From GitHub's Create A Repo Tutorial
I have read through GitHub's Markdown Tutorial as well as the Markdown Cheatsheet, but have not found any way to perfectly replicate this type of format.
The closest I have done so far was to basically put a blockquote after a number list entry:
1. > line 1
>
> line 2
2. > line 3
>
> line 4
And it looks similar, but, well, see for yourself:
Is this the closest I can get to GitHub's format?
Is there a proper Markdown syntax that I have missed?
If not, is there a way to achieve this in raw HTML?
If you look at the source of that page you can see that this is just a normal ordered list containing paragraphs:
<ol>
<li><p>In the upper-right corner of any page, click...
So to produce that HTML with markdown simply use the normal ordered list syntax:
1. First paragraph here.
2. Another paragraph here.
The particular appearance of that page is achieved by styling with CSS. It looks a little involved, using the ::before pseudo-element. If you want to replicate something similar, have a look at the page with your browser's inspector to see the styles. Experiment until you understand how it works and can create the effect you want.
If you are looking to create this effect on your own GitHub pages (e.g. a README) then I think you are out of luck, since you can't control the CSS.
My application under test has been developed by external suppliers so I have no control over the HTML structure. The application is extremely Javascript and Ajax heavy, with numerous dynamically generated buttons and auto-complete lists.
In other words, the characteristics of the pages are that they are filled with:
- Elements with no fixed IDs (IDs are generated on the fly and have numbers or other text dynamically added to them)
- The same happens with some classes
- Most of the time the buttons have no text associated with them, since they are either custom-coded 'down' arrows for lookup lists (which aren't lookup lists but hidden divs) or '+' and '-' icons to maximise or minimise portions of the content
It is therefore very difficult to identify these elements, especially the buttons.
I am trying to write a generic 'I click on the button near y' type of step so that it is not necessary to hardcode each and every button (assuming I can even get something to identify them with) into each and every test.
The thinking behind this is that normally there is a label of some sort close to the button at least.
What I want to do is find the text label, then see if there is a button inside the same scope, and if there is not, move 'back' through the parent elements and check if there is a button inside the scope of each parent level, up to 5 parents.
There might be all sorts of problems with this approach but I am just curious to see if this will work in general. I have run into some problems.
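The traversal described above can be sketched independently of Behat/Mink. The following Python version uses the standard library's ElementTree on a small, well-formed fragment; nearest_button and the sample markup are hypothetical, and on a real page the same loop would go through the driver's own element API instead.

```python
import xml.etree.ElementTree as ET


def nearest_button(root, label_text, max_levels=5):
    """Find the first button in the label's scope, widening one parent at a time."""
    # ElementTree has no parent pointer, so build a child -> parent lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    label = next(
        (el for el in root.iter() if (el.text or "").strip() == label_text),
        None,
    )
    if label is None:
        return None

    scope = label
    for _ in range(max_levels + 1):  # the label itself, then up to max_levels parents
        button = scope.find(".//button")  # any button, matched by tag name only
        if button is not None:
            return button
        scope = parent_of.get(scope)
        if scope is None:  # reached the root without finding a button
            break
    return None


SAMPLE = """
<div>
  <div id="row1">
    <div><span>Cost center</span></div>
    <button class="autocomplete_button" id="button_OM_1"> </button>
  </div>
  <div id="row2">
    <span>Other</span>
    <button class="autocomplete_button" id="button_OM_2"> </button>
  </div>
</div>
"""
root = ET.fromstring(SAMPLE)
print(nearest_button(root, "Cost center").get("id"))
```

Matching the button by tag name rather than by text or ID is what keeps the step generic; the cost is that the first button in a wide scope may not be the one visually closest to the label.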
First I tried to use Xpaths, so I got the Xpath of the parent through :
$parentelement = $element->getParent();
$parentXpath = $parentelement->getXpath();
This would give me an Xpath of : (//html//span[text()='Cost center'])[1] and moving up through the parent elements all the time, they would become successively:
(//html//span[text()='Cost center'])[1]/..[1]
(//html//span[text()='Cost center'])[1]/..[1]/..[1]
and so forth.
The actual button is located in: (//html//span[text()='Cost center'])[1]/..[1]/..[1]//button but it has to go through all the parent elements in order to get there, so it will start with (//html//span[text()='Cost center'])[1]//button and should end with (//html//span[text()='Cost center'])[1]/..[1]/..[1]//button where it should find the button.
Trying to use Xpath I used:
$button_element = $session->getPage()->find('xpath', $parentXpath."//button");
I soon saw that the 'find' command appends an //html to the front of your xpath string so the Xpath that it tried to use ended up being (for each parent Xpath, but using this one as an example):
(//html(//html//span[text()='Cost center'])[1]/..[1])
I then stripped out the brackets as well as the //html, leaving me with:
//span[text()='Cost center'][1]/..[1]
but when I tried:
$button_element = $session->getPage()->find('xpath', $strippedParentXpath."//button");
I got the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '(//html//span[text()='Cost center'][1]/..[1]//button)[1]' is not a valid XPath expression
However, Firepath can execute this expression and does not show a syntax error for it, although it does not find the actual button (since the button is actually located one level up, where Firepath DOES find it).
So my question 1 is: What is wrong with my Xpath that I can't use it in the find? It actually looks as if //span[text()='Cost center'][1]//button does not throw the same exception, since as I said, I am looping through the parent Xpaths, and it starts with //span[text()='Cost center'][1]//button. It crashes on //span[text()='Cost center'][1]/..[1]//button.
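On question 1, one thing worth checking: in the XPath 1.0 grammar the abbreviated step .. cannot carry a predicate, so /..[1] is not a valid step, whereas /parent::*[1] or a plain /.. is. Some evaluators seem to tolerate it, which might be why one tool accepts the expression while another reports a syntax error; this is a guess, not something confirmed against this application. A minimal Python sketch that generates the candidate expressions with the explicit parent axis; scoped_button_xpaths is a hypothetical helper.

```python
def scoped_button_xpaths(label_xpath, max_levels=5):
    """List button XPaths scoped to the label, then to each successive parent."""
    base = f"({label_xpath})[1]"
    return [
        # parent::* is the unabbreviated form of "..", and it may take predicates.
        base + "/parent::*" * level + "//button"
        for level in range(max_levels + 1)
    ]


for xpath in scoped_button_xpaths("//span[text()='Cost center']", max_levels=2):
    print(xpath)
```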
My second option was to get the parent element each time, starting with finding the text on the page, but then to search for a button inside the scope of the parent element using the findbutton functionality.
Looping through the parent elements (up to a maximum of 5):
$parentelement = $parentelement->getParent();
$buttonelement = $parentelement->findbutton('xxx');
In other words, find ANY button in the scope of the parent element. The problem I have is how to specify a generic 'button'.
One has to associate SOME text with the button (depicted by the 'xxx' above).
But this is a typical example of buttons in the application:
<button class="autocomplete_button" type="button" id="button_OM_1"> </button>
Where the class is used more than once, and the ID is auto-generated and not the same number all the time. There is no text associated with the button since the class specifies an image.
Question 2: So how can I use 'findbutton' to generically find a 'button' with no specific distinguishing characteristics? Please note that I actually did try findbutton("button"), taking the chance that there might be a 'button' somewhere in a button, but this did not work either. At least, it doesn't work consistently: the same test randomly seems to either find or not find the same button when I run it a couple of times.
After doing some more investigation on this issue I have found the following:
My method of trying to find the closest button to a piece of text via traversing 'up' through the scope of the divs and spans around the text (using xpath) is actually working.
What is NOT working is SAHI, which I am using as the web driver. In other words, it is not a Behat/Mink problem, it is a SAHI-specific issue.
I tried the same code using Selenium2 and it executes perfectly.
I still require an answer to question 2: how can I use findbutton() without a specific parameter such as the ID, name or value? I will see if I can find an answer to that separately, on the Behat user group, since I do think that is a Behat/Mink-specific issue.
I normally use a CSS selector, navigating through the classes and IDs of the elements that the button sits inside. I think it is easier than XPath. For example, you can use:
$this->getSession()->getPage()->find('css', '.parrent1 .parrent2 .autocomplete_button');
I think this will help, since you know which button you are going to use in each scenario.