XPath for Gmail labels returns unexpected results - xpath

I'm investigating an automation task for Gmail with a non-English interface language.
I have 2 parent labels ('Cá nhân', 'Cá thể') and 3 sub-labels in the left panel of Gmail:
Cá nhân
Private
Remotekit
Cá thể
Cá tính
I installed the XPath Helper extension in Chrome and opened it to locate the 'Cá nhân' label:
//div[@class='aio aip']//a[contains(string(),'C')]
--> Result returns 4: 'Cá nhân,Cá thể,Cá tính,Cơ quan'
Then I typed the full name:
//div[@class='aio aip']//a[contains(string(),'Cá nhân')]
--> Result returns: Null
But if I type //div[@class='aio aip']//a[contains(string(),'Ca')]
--> Result returns: 'Cá nhân'
Why are the results different for the short name 'Ca' and the full name 'Cá nhân'? Is this a UTF-8 encoding issue?
One strange thing: //div[@class='aio aip']//a[contains(string(),'Cá thể')] finds the 'Cá thể' label exactly.
Please help me figure out the issue.
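One possible explanation, offered as an assumption rather than something confirmed in this thread, is Unicode normalization: the same visible 'á' can be stored as one precomposed code point (NFC) or as 'a' followed by a combining accent (NFD). If the label text is decomposed and the typed query is precomposed, a plain contains() would match 'Ca' but not 'Cá nhân'. A minimal Python sketch of that effect (the strings and helper names are hypothetical, and which form Gmail actually stores is not known here):

```python
import unicodedata

# Assumption: the page stores the label decomposed (NFD), while the
# query typed into the XPath tool is precomposed (NFC).
label_decomposed = unicodedata.normalize("NFD", "Cá nhân")  # 'a' + U+0301
query_composed = unicodedata.normalize("NFC", "Cá nhân")    # single U+00E1


def contains(haystack: str, needle: str) -> bool:
    """Plain code-point containment, similar in spirit to XPath contains()."""
    return needle in haystack


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Bring both strings to the same normalization form before comparing."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


plain_ca = contains(label_decomposed, "Ca")                    # 'C' then bare 'a'
plain_full = contains(label_decomposed, query_composed)        # forms differ
normalized_full = contains_normalized(label_decomposed, query_composed)

print(plain_ca, plain_full, normalized_full)
```

If this is the cause, copying the label text directly from the page into the query (instead of typing it) would keep both sides in the same form.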

Issue with XSL-FO fo:footnote when generating a PDF/UA-1 document with FOP: "tagged PDF note id is missing"

I have an issue with <fo:footnote> when generating a PDF/UA-1 document with FOP.
The resulting PDF displays the footnote correctly on the page but does not pass PDF/UA validation. A severe error is raised on the PDF Note tag, "id is missing", so the document is not conformant. I'm using PAC3 for the conformance test.
In the example below I have extracted the basic <fo:footnote> element, which has a unique id.
How can I generate the missing ID attribute in the tagged PDF Note element?
Here is the really simple XSL-FO footnote. Note that I used an id to reference the footnote.
some text...
<fo:footnote id="FNE0001">
<fo:inline font-size="6pt" baseline-shift="super">E0001</fo:inline>
<fo:footnote-body>
<fo:block>
<fo:inline>E0001</fo:inline><fo:inline > JO L 139 du 29.5.2002, p. 9.</fo:inline>
</fo:block>
</fo:footnote-body>
</fo:footnote> some text...
Apache FOP has been set to generate PDF/UA through the conf file as follows:
<renderers>
<renderer mime="application/pdf">
<!-- Before setting the pdf-ua-mode, we must insert metadata Title in FO declaration -->
<pdf-ua-mode>PDF/UA-1</pdf-ua-mode>
....
PAC3 is checking against the failure conditions in the Matterhorn Protocol; latest edition at https://www.pdfa.org/resource/the-matterhorn-protocol/.
PAC3 is probably reporting on failure condition 19-003, ID entry of the tag is not present.
fo:footnote-body can have an id property. You could try adding an ID to that. In the fo:inline for the footnote marker, you might also need to add an fo:basic-link that refers to that ID.
FWIW, your footnote does not cause that error when checking PDF/UA generated by AH Formatter, and I'm only guessing at what FOP would be doing differently. (PAC3 and I generally disagree when it complains about a footnote being a possibly incorrect use of a Note tag, but that's another story: PAC3 tries to automate checking the conditions that should be checked by a human, and it doesn't always get it right.)

How to properly scrape filtered content into Google Sheets using an XPath query?

I want to pull a value from a website into Google Sheets, but I'm having difficulty understanding the class of the element that holds it.
Target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get. Picture 1: The part which I want to scrape
And this is what the code looks like in the inspector. Picture 2: The code shown in inspector
The target is inside a span element, but the span's attributes look very complicated to me, so I tried to simplify with this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span")
Picture 3: List is shown when putting the code
After some tries I was able to get the right target, but it confuses me. I'm using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span[@class='last original'][1]")
Picture 4: The right target is shown when the xpath query is more specified
As you can see in Picture 2, 'last original' is not the full name of the class. When I used 'last original ng-binding' instead, it gave me an error saying the imported content is empty.
So, please correct me if my formula is wrong. Did it work by accident, and is there a more correct way?
How about this answer?
Modified formula 1:
Since the class name can be either last original or last original ng-binding, how about the following XPath and formula?
=IMPORTXML(A1,"//span[contains(@class,'last original')][1]")
In this case, the URL https://www.cnbc.com/quotes/?symbol=XAU= is put in cell "A1".
Here, //span[contains(@class,'last original')][1] is used as the XPath. It retrieves the value of the span whose class name includes last original, so it matches both last original and last original ng-binding.
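To illustrate why an exact class test and a contains() test behave differently, here is a small Python sketch using only the standard library. The markup and the price are made up for the example (they are not the real CNBC page), and since ElementTree has no contains() function, the substring check is done in plain Python as a stand-in:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
snippet = """
<table class="quote-horizontal regular">
  <tr><td>
    <span class="last original ng-binding">1,234.5</span>
  </td></tr>
</table>
"""
root = ET.fromstring(snippet)

# [@class='...'] compares the WHOLE attribute string, so this finds
# nothing when the attribute carries an extra class name.
exact = root.findall(".//span[@class='last original']")

# Substring test on the attribute value, the same idea as
# contains(@class,'last original') in the formula above.
loose = [s for s in root.iter("span") if "last original" in s.get("class", "")]

print(len(exact), [s.text for s in loose])
```

The same reasoning applies in the other direction: if the HTML that IMPORTXML receives has only last original in the attribute, an exact test for last original ng-binding finds nothing.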
Modified formula 2:
As another XPath, how about the following formula?
=IMPORTXML(A1,"//meta[@itemprop='price']/@content")
It seems that the value is also included in the metadata, so this sample retrieves the value from the metadata.
Reference:
IMPORTXML
To complete @Tanaike's answer, two alternatives:
=IMPORTXML(B2;"//span[@class='year high']")
"Year high" seems to always be equal to the current stock index value.
Or, with the value retrieved from the script element:
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note: I'm based in Europe, so my formulas use ; as the argument separator. Depending on your locale, you may need to replace ; with , in the formulas.
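For readers unfamiliar with the substring-after / substring-before pair, this Python sketch mimics what the second formula does to the script text. The script content below is invented for illustration (the real one is not shown in this thread), and the two helper functions are hypothetical equivalents of the XPath 1.0 functions:

```python
def substring_after(text: str, marker: str) -> str:
    """Like XPath substring-after(): text following the first marker, or ''."""
    _, sep, tail = text.partition(marker)
    return tail if sep else ""


def substring_before(text: str, marker: str) -> str:
    """Like XPath substring-before(): text preceding the first marker, or ''."""
    head, sep, _ = text.partition(marker)
    return head if sep else ""


# Hypothetical script body with backslash-escaped quotes, as the formula implies.
script_text = r'var modApi = {\"symbol\":\"XAU=\",\"last\":\"1496.38\",\"change\":\"-2.10\"}'

# Step 1: keep everything after the "last\":\" marker.
after_marker = substring_after(script_text, r'"last\":\"')
# Step 2: cut at the next backslash, which ends the value.
price = substring_before(after_marker, "\\")

print(price)
```

In the Sheets formula the doubled quotes ("") are only the spreadsheet's way of writing a literal quote inside a string.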

iMacros: URL GOTO={{!COLn}} Error

I have an eight-column csv ('UTF-8, with BOM' formatted), where the eighth column is a URL. I have the following code that's simple enough and is supposed to take me to a different URL depending on the row:
VERSION BUILD=123456 RECORDER=FX
SET !DATASOURCE /Users/Ryan/iMacros/test.csv
SET !DATASOURCE_COLUMNS 8
SET !LOOP 2
SET !DATASOURCE_LINE {{!LOOP}}
TAB T=1
URL GOTO={{!COL8}}
There's more to it than that, but I essentially just need the URL in column 8 to populate before the rest of the code. Anybody know why I'm getting a crazy error message?
Error loading page http://api.mybrowserbar.com/cgi/searchp ...
google.com/, line 7 (Error code: -933)
Seems like the "https://www.google.com/" at the end of the error message is just whichever website I was currently on while trying to run the macro; the test URL currently in column 8, row 2 is http://www.facebook.com, which appears nowhere in the error. And "Error code: -933" means "Network error while file or page loading." Any ideas? I'm using the newest version of iMacros and Firefox on Mac OS X.
Well, that was simple. Instead of loading your URLs in the csv as http://www.google.com, for example, just load them in as ="http://www.google.com" and it works great.

JMeter issues with span tag in Response Assertion

I added a Response Assertion to my test to hit the home page of our local site. I added this to the "Patterns to Test" in a Response Assertion:
Email
This worked. ( To get that label, I did View Source in Firefox and copied the code including all white space. I then clicked "Add" for the Response Assertion and pasted the copied code directly into JMeter this way. ) When I run my test, my test will pass with just this label as a Pattern to Test. It shows no red errors after running it in JMeter.
However, when I add the following span tag by clicking on "Add" to get a new entry in the same Response Assertion, the test will fail.
1.7.0.147
So, to be clear, I had 2 entries for the same Response Assertion...one for the "Email" label and one for the "footerVer" span. Each of these had their own separate line under the same Response Assertion.
Also, for most tests that passed and did not pass, I had "Main Sample only", "Text Response", and "Contains" selected. I did try to change to "Matches" and "Equals" but I just ended up with different errors. So, I wanted to stay on "Contains" for now since my other entry for the "Email" label worked when I had "Contains" selected.
Under the "View Results Tree", JMeter tells me about this failure when I add the span tag:
Assertion error: false
Assertion failure: true
Assertion failure message: Test failed: text expected to contain /
1.7.0.147
/
I have also had success with other tags along the way.
Only the span tag seems to be giving me a problem right now. Any ideas?
===============================
Added config:
I am not able to add the full response since it is not my code, but the company's code. But I can try to post something here that may be useful in a different way.
This is the response dealing with the version copied verbatim from the response tab within JMeter:
<span class="footerVer">
1.7.0.147
</span>
Hope that helps
I would suggest using an XPath Assertion for multi-line HTML elements, since the page source may vary and flaky HTML can be a headache to deal with.
The following XPath expression checks whether the inner text of the span with the footerVer class equals 1.7.0.147 once surrounding whitespace is trimmed (the response shown above has line breaks around the version):
normalize-space(//span[@class='footerVer'])='1.7.0.147'
Use Substring instead of Contains for Pattern Matching rules:
http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion
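As background: in a Response Assertion, Contains and Matches treat each pattern as a regular expression, while Equals and Substring compare plain text. A multi-line pattern pasted from View Source can fail either way if its whitespace or line endings differ from the actual response. The Python sketch below shows the idea with a whitespace-tolerant regex. The exact whitespace in the response is an assumption, and Python's re module is only a stand-in for JMeter's own regex engine:

```python
import re

# Assumed response: CRLF line endings and indentation around the version.
response = '<span class="footerVer">\r\n        1.7.0.147\r\n    </span>'

# Pattern as it might look after copy/paste: LF only, no indentation.
pasted = '<span class="footerVer">\n1.7.0.147\n</span>'

# A literal comparison is sensitive to every whitespace character.
literal_hit = pasted in response

# \s* absorbs any mix of spaces, tabs, CR and LF; the dots are escaped
# so they match only a literal '.'.
tolerant = re.compile(r'<span class="footerVer">\s*1\.7\.0\.147\s*</span>')
tolerant_hit = tolerant.search(response) is not None

print(literal_hit, tolerant_hit)
```

A single-line pattern along these lines could replace the multi-line entry in "Patterns to Test" while keeping "Contains" selected.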
So, I found one way around this. Although, I do not think this is the most efficient way to verify the test. I split the span into 3 individual lines in the Response Assertion.
<span class="copyright marginLeft_100">
© Copyright 2002-2013 Turning Technologies, LLC. All Rights Reserved.
</span>
==========================
I do not really mind the first 2 lines. But, the third line is so generic it really does no good if not combined with the beginning tag
Well, for now, I can at least confirm something. Also, I left it on "Contains", even though I took a look at the other link posted above, because all of my other tags presented no problem when it was on "Contains". Hope this helps someone else also.

HtmlUnit getByXpath returns null

I am coding in Groovy; however, I don't believe this is a language-specific set of questions.
I actually have two questions
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling javascript creates a mess in my cmd prompt.
Second Question
I also want to get the image, but I'm having trouble because when I get the XPath (via Firebug) it shows up as: //*[@id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPath was off by one in the predicate filter for the div under body: it should be the 3rd div, not the 4th. It appears the site's HTML has changed since you originally grabbed the XPath using Firebug. You may need to adjust your XPath so it is less sensitive to differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPath that you listed will work. It may look odd or short (and may not be the most efficient), but // starts at the root node and looks through every node in the tree, * matches any element (including the img), and the [] predicate filter restricts the match to elements that have an id attribute whose value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]
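To make the difference between a positional path and an id-based path concrete, here is a small Python sketch with the standard library's ElementTree (not HtmlUnit). The document is a made-up miniature, not the real deviantART page, and ElementTree paths are written relative to the root html element:

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the target sits in the 3rd div under body.
doc = ET.fromstring("""
<html><body>
  <div>header</div>
  <div>nav</div>
  <div><div>
    <h1><a>Brush pack</a></h1>
    <img id="gmi-ResViewSizer_img" src="preview.png"/>
  </div></div>
</body></html>
""")

# Positional path recorded against an older layout: off by one, finds nothing.
brittle = doc.find("./body/div[4]/div/h1/a")

# Corrected index works, but breaks again if another div is added.
fixed_index = doc.find("./body/div[3]/div/h1/a")

# Looser path: any h1/a below body, regardless of div nesting.
loose = doc.find("./body//h1/a")

# id-based lookup, the same idea as //*[@id="gmi-ResViewSizer_img"].
image = doc.find(".//*[@id='gmi-ResViewSizer_img']")

print(brittle, fixed_index.text, loose.text, image.get("src"))
```

The id-based and descendant-based paths keep working when wrapper elements are added or removed, which is why they tend to be less fragile than a full positional path copied from a browser tool.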
I had the same problem. I solved it when I realized there were iframe tags on the page. Try calling
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position of the frame in the frame collection. It worked for me!
Thanks a lot.
