XPath: Forms with Unique IDs - xpath

I am trying to use XPath as part of a data scraper in order to scrape random comments from reddit for a project. The problem is, the comment forms have unique IDs that change on every page and within comment indent levels. I'm not sure how to make XPath target all of the comment fields with these different IDs.
An example is shown below:
//form[@id='form-t1_cj8cyupxa3']/div
//form[@id='form-t1_cj8e0iyx6w']/div

If the ids follow a pattern, match on the common prefix instead of the full value, e.g. //form[starts-with(@id, 'form-')]/div
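For anyone scraping from Python with only the standard library, here is a minimal sketch of the same prefix match. ElementTree's XPath subset has no starts-with(), so the prefix test is done in Python instead; the sample markup and the function name comment_divs are hypothetical, modelled only on the two ids from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the two "form-t1_..." ids come from the question.
PAGE = """
<html><body>
  <form id="form-t1_cj8cyupxa3"><div>first comment</div></form>
  <form id="form-t1_cj8e0iyx6w"><div>second comment</div></form>
  <form id="login"><div>not a comment form</div></form>
</body></html>
"""


def comment_divs(root, prefix="form-"):
    """Return the <div> children of every <form> whose id starts with prefix."""
    divs = []
    for form in root.iter("form"):
        # Python-side equivalent of the XPath test starts-with(@id, 'form-').
        if form.get("id", "").startswith(prefix):
            divs.extend(form.findall("div"))
    return divs


root = ET.fromstring(PAGE)
texts = [div.text for div in comment_divs(root)]
print(texts)
```

With a full XPath 1.0 engine the single expression from the answer does the same job in one step.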

Related

Does NetSuite load custom sublists when a record is being printed through freemarker / advanced pdf templates?

I am a NetSuite newbie, so I apologize in advance if this has a simple answer that I do not see.
We have a custom record type that among other things, has a field indicating the list/record (invoice or credit memo) that a particular custom record is related to. Using this, I have a saved search sublist on the invoice record.
When this invoice is printed or emailed using an advanced template, I can't find this sublist using freemarker syntax (I can access the normal item sublist just fine). I have also tried giving this custom record type a child / parent relationship with the invoice, but I get the same result.
Based on this SuiteAnswer and the NS help article on freemarker, it looks like it is possible.
When I load an invoice record in the UI and append &xml=t to the URL, I can see the normal items sublist, but I do not see any other sublist available.
Is there something I need to do to make a custom sublist exposed to the template engine?
You'll need to find the internal id of the sublist. Right click on the title of the sublist and select 'Inspect Element'. You'll notice a repeated string in the html similar to customsublist1.
Once you have that you should be able to access the line items using normal Freemarker sequence syntax.
<#list record.customsublist1 as item>
${item.field1}
</#list>

import.io : Inserting an Independent row into result using XPATH

I am trying to scrape this site using import.io: ScoreCard
I am able to get the batting scores successfully, but I want to add a column at the end that tells me which innings each row belongs to. So it should be relative to the name of the batsman.
I tried to use this XPATH: //*[@id="innings_1"]/div[1]/div/h4/b
but that will always return the first innings, because the ID is "innings_1".
The other IDs are innings_2, innings_3, innings_4 and so on. Is there any way in XPATH to get this element relative to the Batsman column?
Here is what I did in order to get the desired result:
I used following XPATH value.
.//a/ancestor::div/div[1]/div/h4/b
.//a was giving me the names of the batsmen. I searched its ancestors, and the path div[1]/div/h4/b was used only by the innings section, so it did the trick :)
Try using starts-with():
//*[starts-with(@id,'innings_')]/div/div/h4/b
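To illustrate the idea of pairing each batsman with the heading of the innings block that contains him, here is a rough Python sketch using only the standard library. The scorecard markup is hypothetical; only the nesting (an element with an innings_N id, then div[1]/div/h4/b for the heading, and a elements for the batsmen) follows the XPaths quoted above. import.io itself is not involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical scorecard; only the nesting mirrors the XPaths in the question.
PAGE = """
<div>
  <div id="innings_1">
    <div><div><h4><b>First Innings</b></h4></div></div>
    <table>
      <tr><td><a>Batsman A</a></td></tr>
      <tr><td><a>Batsman B</a></td></tr>
    </table>
  </div>
  <div id="innings_2">
    <div><div><h4><b>Second Innings</b></h4></div></div>
    <table>
      <tr><td><a>Batsman C</a></td></tr>
    </table>
  </div>
</div>
"""


def batsmen_with_innings(root):
    """Return (batsman, innings title) pairs, one per <a> inside an innings block."""
    rows = []
    for block in root.iter("div"):
        # Keep only the containers whose id starts with "innings_".
        if not block.get("id", "").startswith("innings_"):
            continue
        title = block.find("div[1]/div/h4/b")
        for link in block.iter("a"):
            rows.append((link.text, title.text))
    return rows


pairs = batsmen_with_innings(ET.fromstring(PAGE))
for name, innings in pairs:
    print(name, "|", innings)
```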

Extract href in table with importxml in Google spreadsheet

I am trying to pull the href for each row of each table from this website:
http://www.epa.gov/region4/superfund/sites/sites.html#KY
I can pull the table information off using =IMPORTHTML(A1,"table",1) for all 7 tables, but I need the href to the site with the detailed information.
Using =IMPORTXML(A1,"//div[@class='box']") I can pull the information needed from a site like:
http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html
but I need to extract the fedfacs/alarmyaplal.html portion for each row on the original page.
I've tried using //@href, but it is not returning any results. I think this is because the data is structured in a table, but I'm stuck on where to go from here.
I'm not sure about any of the Google Spreadsheet functionality, but here's an XPath to select all href attributes of the Kentucky sites (since your first link included the 'ky' anchor):
//body//a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href
This is very specific to the Kentucky table: following-sibling::table[1] means the first table node after, and at the same level as, a[@id='ky'].
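A small Python sketch of what that expression selects, in case it helps to see the steps spelled out. The standard library's ElementTree does not support the following-sibling axis, so the sibling walk is done by hand; the markup, the href values and the function name hrefs_after_anchor are all hypothetical, and the anchor is assumed to be a direct child of body.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two anchors, each followed by its own table of sites.
PAGE = """
<body>
  <a id="tn"/>
  <table><tbody>
    <tr><td><strong><a href="tn/site1.html">TN site</a></strong></td><td>x</td></tr>
  </tbody></table>
  <a id="ky"/>
  <table><tbody>
    <tr><td><strong><a href="ky/site1.html">KY site 1</a></strong></td><td>y</td></tr>
    <tr><td><strong><a href="ky/site2.html">KY site 2</a></strong></td><td>z</td></tr>
  </tbody></table>
</body>
"""


def hrefs_after_anchor(body, anchor_id):
    """Collect first-column hrefs from the first <table> sibling after the anchor."""
    children = list(body)
    for i, child in enumerate(children):
        if child.tag == "a" and child.get("id") == anchor_id:
            # Hand-rolled following-sibling::table[1].
            for sibling in children[i + 1:]:
                if sibling.tag == "table":
                    links = sibling.findall("tbody/tr/td[1]/strong/a")
                    return [a.get("href") for a in links]
            return []
    return []


ky_links = hrefs_after_anchor(ET.fromstring(PAGE), "ky")
print(ky_links)
```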

better selenium xpath is expecting

I'm trying to create an xpath expression that will work with selenium, using the following html snippet.
Below is a table containing various rows, each added with a uniquely generated id (in the following snippet that id is 1000).
Selenium created the following expressions when the row with id 1000 was added to the table. However, instead of using the id, I want to build the xpath from the 3rd data element in the row, which is (MyName) in the html snippet.
A possible suggestion is to not use xpath whenever possible.
http://saucelabs.com/blog/index.php/2011/05/why-css-locators-are-the-way-to-go-vs-xpath/
You need to change the places in the XPATH where it refers to the row by its ID so that they use the row's relative position in the table instead.
In all of your XPATHs, you would change tr[@id='1000'] to tr[3]
Your first example XPATH would look like this:
//tr[3]/td[1]/a[1]/img //tr[@id='1000']/td[1]/span/a/img
Your second example would follow similarly:
//tr[3]/td[1]/span/a/img
As would your third:
//tr[3]/td[1]/a[2]/img
Hopefully you are now able to change the rest of them.
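If the row position is not stable either, the row can instead be located by the text of one of its cells, which is what the question asks for. In full XPath 1.0 (as used by Selenium) that would be something like //tr[td[3]='MyName']/td[1]/a[1]/img. The sketch below checks the same idea in Python with the standard library; ElementTree only supports the looser predicate [td='MyName'] (any cell, not specifically the third), and the table markup is hypothetical because the original snippet is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the real snippet from the question is not available.
TABLE = """
<table>
  <tr id="998">
    <td><a><img alt="edit-998"/></a></td><td>x</td><td>OtherName</td>
  </tr>
  <tr id="1000">
    <td><a><img alt="edit-1000"/></a></td><td>x</td><td>MyName</td>
  </tr>
</table>
"""

root = ET.fromstring(TABLE)

# Pick the row by cell text instead of by its generated id or its position.
row = root.find(".//tr[td='MyName']")
img = root.find(".//tr[td='MyName']/td[1]/a[1]/img")

print(row.get("id"), img.get("alt"))
```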

How do I return multiple columns of data using ImportXML in Google Spreadsheets?

I'm using ImportXML in a Google Spreadsheet to access the user_timeline method in the Twitter API. I'd like to extract the created_at and text fields from the response and create a two-column display of the results.
Currently I'm doing this by calling the API twice, with
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/created_at")
in the cell at the top of one column, and
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/text")
in another.
Is there a way for me to create this display with a single call?
ImportXML supports using the xpath | separator to include as many queries as you like.
=ImportXML("http://url"; "//@author | //@catalogid | //@publisherid")
However it does not expand the results into multiple columns. You get a single column of repeating triplets (or however many attributes you've selected) as shown below in column A.
The following is deprecated
2015.06.16: continue is not available in "the new Google Sheets" (see: The Google Documentation for continue).
However you don't need to use the automatically inserted CONTINUE() function to place your results.
=CONTINUE($A$2, (ROW()-ROW($A$2)+1)*$A$1-B$1, 1)
Placed in B2 that should cleanly fill down and right to give you sane column data.
ImportXML is in A2.
A3 and below are how the CONTINUE() functions are automatically filled in.
A1 is the number of attributes.
B1:D1 are the attribute index for their columns.
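The index arithmetic in that formula can be checked outside the spreadsheet. This is a small Python sketch under two assumptions that the answer does not state: that the row argument of CONTINUE is 1-based, and that B1:D1 hold the offsets 2, 1, 0 for three attributes. The sample values are made up.

```python
def result_index(sheet_row, first_row, n_attrs, offset):
    """Mirror (ROW()-ROW($A$2)+1)*$A$1-B$1 as a 1-based index into the flat result."""
    return (sheet_row - first_row + 1) * n_attrs - offset


# Flat single-column result: repeating triplets (made-up values).
flat = ["author1", "catalog1", "publisher1", "author2", "catalog2", "publisher2"]
n_attrs = 3          # plays the role of A1
offsets = [2, 1, 0]  # assumed contents of B1:D1
first_row = 2        # the ImportXML result starts in A2

table = [
    [flat[result_index(r, first_row, n_attrs, o) - 1] for o in offsets]
    for r in range(first_row, first_row + len(flat) // n_attrs)
]

for line in table:
    print(line)
```

Each output row k picks entries k*3-2, k*3-1 and k*3 of the flat column, which is why the formula fills down and right cleanly.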
Another way to convert the rows of =CONTINUE() into columns is to use transpose():
=transpose(importxml("http://url","//a | //b | //c"))
Just concatenate your queries with "|"
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/created_at | /statuses/status/text")
I posed this question to the Google Support Forum and this was a solution that worked for me:
=ArrayFormula(QUERY(QUERY(IFERROR(IF({1,1,0},IF({1,0,0},INT((ROW(A:A)-1)/2),MOD(ROW(A:A)-1,2)),IMPORTXML("http://example.com","//td/a | //td/a/@href"))),"select min(Col3) where Col3 <> '' group by Col1 pivot Col2",0),"offset 1",0))
Replace the contents of IMPORTXML with your data and query and see if that works for you.
Apparently, this attempts to invoke the IMPORTXML function only once. It's a solution for now, at least.
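As far as I can tell, the formula labels each result with a group number, INT((ROW-1)/2), and a column number, MOD(ROW-1,2), and then pivots on those labels. The Python sketch below reproduces only that reshaping step on made-up data; it assumes the flat result alternates link text and href in that order, and the function name pivot is just for illustration.

```python
def pivot(flat, n_cols=2):
    """Reshape a flat list into rows of n_cols using the same int/mod labelling."""
    rows = {}
    for i, value in enumerate(flat):
        # group corresponds to INT((ROW-1)/n), col to MOD(ROW-1, n).
        group, col = divmod(i, n_cols)
        rows.setdefault(group, [None] * n_cols)[col] = value
    return [rows[g] for g in sorted(rows)]


# Made-up flat result, alternating link text and href.
flat = ["Link A", "/a.html", "Link B", "/b.html", "Link C", "/c.html"]
table = pivot(flat)

for line in table:
    print(line)
```

For three XPaths joined with | the same idea would apply with n_cols=3, again assuming the results come back interleaved.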
Here's the full thread.
This is the best solution (NOT MINE) posted in the comments below. To be honest, I'm not sure how it works. Perhaps @Pandora, the original poster, could provide an explanation.
=ArrayFormula(iferror(hlookup(1,{1;ARRAY},(row(A:A)+1)*2-transpose(sort(row(A1:A2)+0,1,0)))))
This is a very ugly solution and doesn't even explain how it works. I couldn't get it to work because of multiple errors, e.g. too many parameters for IF (because an array is used). A shorter solution can be found here: =ArrayFormula(iferror(hlookup(1,{1;ARRAY},(row(A:A)+1)*2-transpose(sort(row(A1:A2)+0,1,0))))) "ARRAY" can be replaced with the IMPORTXML function. This formula can be used for as many XPaths as one wants. – Pandora Mar 7 '19 at 15:51
In particular, it would be good to know how to modify the formula to accommodate more columns.
