I have the following xpaths:
//*[@id="content"]/div[2]/div[2]//following-sibling::p[1]//strong[2]
//*[@id="content"]/div[2]/div[2]/p[10]/span/strong
//*[@id="content"]/div[2]/div[2]/p[9]/span/strong
Is there a way to combine them into one XPath?
I can use "|", so it would be:
//*[@id="content"]/div[2]/div[2]//following-sibling::p[1]//strong[2]
|
//*[@id="content"]/div[2]/div[2]/p[10]/span/strong
|
//*[@id="content"]/div[2]/div[2]/p[9]/span/strong
But it's a bit long. Are there any other ways in my case?
Related
I'm not sure how to filter on two sets of values in the same column. Say I have a CSV file with all Titanic passengers and I want to extract the people who are between 20 and 29 or between 40 and 49 years old, and who speak English AND another language, say French. Since both values are in the same column, it's quite challenging.
egrep does not seem to have an AND operator, only OR, so I'm struggling to find how to do it.
What I was trying was something like this (on a comma-separated CSV):
The 3rd column is Age and the 8th is language.
(Although I know there may be easier solutions with sed/awk etc., I need it in egrep for training purposes.)
egrep "^.*,.*,[2-0][0-9],.*,.*,[eng.*]" titanic-passengers.csv
Thanks in advance.
You should use [^,]* to match a single column. .* will match across multiple columns.
To match 20-29 use 2[0-9]; to match 40-49 use 4[0-9]. You can then combine them with [24][0-9].
You don't need to put [] around the language; brackets match a single character that is any one of the characters inside them.
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng' titanic-passengers.csv
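A quick way to sanity-check the pattern without the real file is Python's re module, which treats the constructs used here (^, [^,]*, [24][0-9]) the same way grep -E does. The sample rows are hypothetical (only columns 3 and 8 carry meaning), and the last lines show why .* is risky: it can swallow commas and match a value from the wrong column.

```python
import re

# Same pattern as the grep -E command above: column 3 (age) must be 20-29 or
# 40-49 and column 8 (language) must start with "eng". Each [^,]* stays
# inside a single column.
PATTERN = re.compile(r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng")

# The question's style: .* can cross commas and spill into other columns.
loose = re.compile(r"^.*,.*,[24][0-9],")

# Hypothetical rows; the real titanic-passengers.csv layout is assumed.
rows = [
    "1,Smith,25,x,x,x,x,english",
    "2,Jones,45,x,x,x,x,english french",
    "3,Brown,35,x,x,x,x,english",
    "4,Dupont,22,x,x,x,x,french",
]
matches = [row for row in rows if PATTERN.search(row)]
print(matches)  # the Smith and Jones rows only

# Age is 35, but column 4 holds 41: the loose pattern still matches.
tricky = "5,Lee,35,41,x,x,x,english"
print(bool(loose.search(tricky)), bool(PATTERN.search(tricky)))  # True False
```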
Maybe this one?
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*' titanic-passengers.csv
@Barmar explained the other patterns well, so I'll explain the "language" part.
To make sure at least one language other than English is listed, you need to require a space before or after the word english. Alternation (OR) is expressed as (pattern1|pattern2).
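The same check in Python's re, on hypothetical rows, shows which language fields pass. A field containing only "english" fails, because neither " english" nor "english " occurs in it:

```python
import re

# Pattern from the answer above: column 8 must contain "english" with a space
# on one side, i.e. at least one other word (language) next to it.
PATTERN = re.compile(
    r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*"
)

# Hypothetical rows; only columns 3 (age) and 8 (languages) matter here.
rows = [
    "1,Smith,25,x,x,x,x,english",          # English only: no match
    "2,Jones,45,x,x,x,x,english french",   # "english " matches
    "3,Martin,28,x,x,x,x,french english",  # " english" matches
]
results = [bool(PATTERN.search(row)) for row in rows]
print(results)  # [False, True, True]
```

Note that any extra word next to "english" counts, so stray whitespace in the CSV could produce false positives.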
I have a few Xpaths as below:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p
//*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="2555ab30-bb84-11ea-9e8b-277e7f6208b2"]/div/div/div[1]/div/p
//*[@id="7e100250-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
All of the above are used to extract text from a single web page, since the text is located in different viewports, but I'd like a single XPath that extracts the text for all of them. Is it possible to use 'and' with multiple IDs to extract all of it through one XPath?
Any other suggestions would be appreciated.
You can use the or operator for the last four.
And the merge-nodes operator | to add the first one.
So to select all five expressions in one, use the following expression:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p | //*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2" or @id="2555ab30-bb84-11ea-9e8b-277e7f6208b2" or @id="7e100250-a71d-11ea-b994-53a3e91a35c2" or @id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
A shorter and more generic solution could be:
(//div/div/div[1]/div/p|//div/p)[ancestor::*[string-length(@id)=36 and substring(@id,24,1)="-"]]
The first part, in parentheses, specifies the end of the path. Since the @id attributes all have the same length (36 characters), we test that inside the predicate. We also check for a - at a specific position (the 24th character) with substring().
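A small Python sketch of the test the predicate applies to each id, handy for checking the 36/24 numbers against real ids. XPath's substring() is 1-based, so position 24 corresponds to Python index 23. The check is loose: any 36-character id with a hyphen at that position passes, not only UUIDs.

```python
def looks_like_uuid_id(value: str) -> bool:
    # Mirrors string-length(@id)=36 and substring(@id,24,1)="-".
    # XPath positions start at 1, Python indexes at 0, hence value[23].
    return len(value) == 36 and value[23] == "-"

ids = [
    "904735f0-bb82-11ea-a473-6d0f51688222",
    "729c0860-a71d-11ea-b994-53a3e91a35c2",
    "content",
]
flags = [looks_like_uuid_id(i) for i in ids]
print(flags)  # [True, True, False]
```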
I can't figure out how to combine the two expressions into one XPath. Both fall under the same "pagination" class, and I need to use the result in a loop. I tried them separately like this:
//div[@class='pagination']//a/@href
//div[@class='pagination']//a[contains(@class,'next')]/@href
Elements for the expression:
<div class="pagination"><p><span>Showing</span>1-30
of 483<span>results</span></p><ul><li><span class="disabled">1</span></li><li>2</li><li>3</li><li>4</li><li>5</li><li>Next</li></ul></div>
There are many ways you can combine two XPath expressions. One way, for example, is to use the union operator "|". Whether that's the right operator to use depends on what you want to achieve. Unfortunately you forgot to tell us what you want to achieve, so it might not be the right operator for your purposes.
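To illustrate what "|" actually returns, here is a Python sketch using the standard-library ElementTree. Its limited XPath support has no | operator, so the union is emulated by hand (union is a hypothetical helper). The `<a>` elements and href values are also hypothetical, because the markup in the question shows only the list text. The result contains every matched node once, in document order, so the "next" link is not duplicated; it also does not tell you which link is the "next" one, which is why the intended use matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup (the real <a> elements are not shown).
DOC = ET.fromstring(
    '<div class="pagination"><ul>'
    '<li><a href="/p/2">2</a></li>'
    '<li><a href="/p/3">3</a></li>'
    '<li><a class="next" href="/p/next">Next</a></li>'
    "</ul></div>"
)

def union(root, *paths):
    """Emulate XPath's |: every node matched by any path, once, in document order."""
    hits = set()
    for path in paths:
        hits.update(root.findall(path))
    return [el for el in root.iter() if el in hits]

all_links = ".//a"
next_link = ".//a[@class='next']"
hrefs = [a.get("href") for a in union(DOC, all_links, next_link)]
print(hrefs)  # ['/p/2', '/p/3', '/p/next']
```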
I'm looking to select a collection of elements based on an array of ID names. I'm currently using a giant OR statement essentially:
//*[@id='apple']|//*[@id='orange']|//*[@id='banana']
But building that string manually seems messy. Is there something like a nice SQL-esque "WHERE IN [a,b,c]" operator that I could be using?
I am using the HtmlAgilityPack for ASP.NET, which I think equates to XPath 1.0 (feel free to correct me on that).
Thanks.
First, you could simplify this by using or. This avoids repeating the //* multiple times, although you still specify the @id= part multiple times:
//*[@id='apple' or @id='orange' or @id='banana']
A more elegant solution is to check against a list of acceptable ids. If you're using XPath 1.0, you'll have to do a bit of gymnastics to get contains() to do your bidding. Notice the spaces at both ends of the first string, and that a space is concatenated to each end of @id before looking for a match. This prevents an @id of "range" from matching, for example, even though "range" is a substring of "orange".
//*[contains(' apple orange banana ', concat(' ', @id, ' '))]
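A Python sketch of the padding trick using plain string checks (not an XPath engine), showing why the spaces matter. in_id_list and naive are illustrative names:

```python
def in_id_list(id_value: str, allowed: str = " apple orange banana ") -> bool:
    # Analogue of contains(' apple orange banana ', concat(' ', @id, ' ')):
    # padding both sides with the separator forces a whole-token match.
    return (" " + id_value + " ") in allowed

def naive(id_value: str, allowed: str = "apple orange banana") -> bool:
    # Without padding, any substring matches, e.g. "range" inside "orange".
    return id_value in allowed

print(in_id_list("orange"), in_id_list("range"))  # True False
print(naive("orange"), naive("range"))            # True True
```

An empty id also fails the padded check, since the list never contains two adjacent spaces.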
If you are using XPath 2.0, the way forward is simpler thanks to the addition of sequences to the language:
//*[exists(index-of(('apple', 'orange', 'banana'), @id))]
Use:
//*[contains('|apple|banana|orange|', concat('|', @id, '|'))]
If some of the id attributes may contain the "|" character, use a different delimiter that is known not to appear in any of the id values.
An XPath 2.0 solution:
//*[@id=('apple', 'orange', 'banana')]
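For comparison, in Python's standard-library ElementTree (not HtmlAgilityPack) the XPath subset has neither or nor sequences, so a WHERE IN style filter can be sketched by selecting elements that have an id and testing set membership in Python. The document and id values below are made up:

```python
import xml.etree.ElementTree as ET

WANTED = {"apple", "orange", "banana"}

# Hypothetical document; "range" must not match, and <item/> has no id.
root = ET.fromstring(
    "<basket>"
    '<item id="apple"/><item id="range"/><item id="banana"/><item/>'
    "</basket>"
)

# Select every descendant with an id, then keep those whose id is in WANTED.
picked = [el.get("id") for el in root.findall(".//*[@id]") if el.get("id") in WANTED]
print(picked)  # ['apple', 'banana']
```

Set membership compares whole values, so no delimiter padding is needed.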
Using Hpricot, it is pretty easy to see how I can extract all elements with a given id or class using a CSS selector. Is it possible to extract elements from a document based on whether some attribute of those elements matches a regular expression?
If you mean do something like:
doc.search("//div[@id=/regex/]")
then I don't think it can be done. The alternative is to find all elements and then iterate through the results, deleting those that don't match a regex.
result = doc.search("//div")
result.delete_if { |x| x.to_s !~ /regex/ }
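A Python sketch of the same "select everything, then filter by regex" idea, using the standard-library ElementTree and re instead of Hpricot. Unlike the Ruby snippet, which matches against the element's whole markup (x.to_s), this tests only the id attribute, which is what the question asks about. The markup and the post-\d+ pattern are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup and regex, for illustration only.
root = ET.fromstring(
    '<body><div id="post-12"/><div id="sidebar"/><div id="post-7"/></body>'
)
pattern = re.compile(r"^post-\d+$")

# Step 1: take every div (like doc.search("//div")).
# Step 2: keep only those whose id matches the regex (delete_if, inverted).
matches = [d for d in root.iter("div") if pattern.search(d.get("id", ""))]
ids = [d.get("id") for d in matches]
print(ids)  # ['post-12', 'post-7']
```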
There are lots of alternative approaches. This thread has two other suggestions: Hpricot and Regular Expression.
Note that, depending on exactly what you are trying to match, you may be able to use the "Supported, but different" syntaxes listed on the Hpricot wiki, e.g.:
E[@foo$="bar"]
Matches an E element whose "foo" attribute value ends exactly with the string "bar".