XPath Selector Join Elements - xpath

I need to use XPath to select and return the values of elements "Longitude" and "Latitude" in the below XML.
<LocationsList>
<Locations>
<Id>31dbwlph0yi--1</Id>
<Latitude>45.352304</Latitude>
<Longitude>-75.724945</Longitude>
</Locations>
<Locations>
<Id>600001-0-0--1</Id>
<Latitude>45.33142</Latitude>
<Longitude>-79.96399</Longitude>
</Locations>
</LocationsList>
Here's what I need my XPath to return (where one list item returns both Latitude and Longitude):
1. 45.352304 -75.724945
2. 45.33142 -79.96399
Unfortunately, I'm having trouble. Here's what I've tried...
XPath LocationsList/Locations returns:
1. 31dbwlph0yi--1 45.352304 -75.724945
2. 600001-0-0--1 HWY 400 45.33142 -79.96399
XPath LocationsList/Locations/(Latitude | Longitude) returns:
1. 45.352304
2. -75.724945
3. 45.33142
4. -79.96399
XPath LocationsList/Locations/concat(Latitude, " ", Longitude) returns a string:
Non-standard output:
45.352304 -75.724945
45.33142 -79.96399

Prior to XPath 3.1, there's no array data type in the data model. The best you could do is return a sequence, but NB there's no such thing as a sequence of sequences; a sequence is necessarily a flat list. Let's see how this works with your sample data:
<LocationsList>
<Locations>
<Id>31dbwlph0yi--1</Id>
<Latitude>45.352304</Latitude>
<Longitude>-75.724945</Longitude>
</Locations>
<Locations>
<Id>600001-0-0--1</Id>
<Latitude>45.33142</Latitude>
<Longitude>-79.96399</Longitude>
</Locations>
</LocationsList>
Take this one of your expressions:
/LocationsList/Locations/concat(Latitude, " ", Longitude)
It produces a sequence of 2 strings which each contain a latitude and a longitude value, separated by a space:
("45.352304 -75.724945", "45.33142 -79.96399")
Or you could use an expression like this to select a sequence of either Latitude or Longitude elements and convert them to numbers:
LocationsList/Locations/(Latitude | Longitude)/number(.)
That returns a sequence of 4 numbers, which are alternately latitude and longitude values:
(45.352304,-75.724945,45.33142,-79.96399)
Finally, in XPath 3.1 you have support for an array datatype, so you can iterate over the sequence of Locations and construct an array from each one, and return a sequence of arrays:
for $location in /LocationsList/Locations
return
    [number($location/Latitude), number($location/Longitude)]
The result is this sequence of arrays, each containing 2 numbers:
([45.352304,-75.724945], [45.33142,-79.96399])
Or if you want to get fancy you could return an array of arrays:
fold-left(
    for $location in /LocationsList/Locations
    return
        [number($location/Latitude), number($location/Longitude)],
    [],
    function($array, $location) {
        Q{http://www.w3.org/2005/xpath-functions/array}append($array, $location)
    }
)
... giving:
[[45.352304,-75.724945],[45.33142,-79.96399]]
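For comparison, here is a rough Python sketch of the same three result shapes (one string per location, a flat list of numbers, and a list of pairs), built with the standard-library xml.etree.ElementTree module rather than an XPath 3.1 engine. The variable names are illustrative only and do not come from the question.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<LocationsList>
  <Locations>
    <Id>31dbwlph0yi--1</Id>
    <Latitude>45.352304</Latitude>
    <Longitude>-75.724945</Longitude>
  </Locations>
  <Locations>
    <Id>600001-0-0--1</Id>
    <Latitude>45.33142</Latitude>
    <Longitude>-79.96399</Longitude>
  </Locations>
</LocationsList>"""

root = ET.fromstring(SAMPLE)
locations = root.findall("Locations")

# Like concat(Latitude, " ", Longitude): one string per Locations element.
as_strings = [
    f"{loc.findtext('Latitude')} {loc.findtext('Longitude')}" for loc in locations
]

# Like (Latitude | Longitude)/number(.): a single flat list of numbers.
flat = [
    float(loc.findtext(tag))
    for loc in locations
    for tag in ("Latitude", "Longitude")
]

# Like the XPath 3.1 array of arrays: one [lat, lon] pair per Locations element.
pairs = [
    [float(loc.findtext("Latitude")), float(loc.findtext("Longitude"))]
    for loc in locations
]

print(as_strings)
print(flat)
print(pairs)
```

Unlike an XPath sequence, a Python list can nest, which is why the last shape needs no special array type here.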

Related

Modify query function so it can work as an arrayformula in Google Sheets

How do I modify this equation so I can use it with an array function instead of dragging it down?
SUBSTITUTE(JOIN(", ", UNIQUE(QUERY(A:D,"SELECT B WHERE C = '"&G2&"'"))), ", , ", "")
Explanation of the equation:
The function extracts and concatenates the unique values from column B of the range A:D where the value in column C matches a specific criterion. It is made up of several parts:
QUERY extracts all values from column B of the range A:D where the value in column C matches the criterion in G2.
UNIQUE removes any duplicate values from the previous step.
JOIN concatenates them into a single string separated by commas, returning a string of the unique values that match the criterion.
SUBSTITUTE replaces occurrences of ", , " with an empty string.
Can you try:
=BYROW(G2:G,LAMBDA(gx,IF(gx="",,TEXTJOIN(", ",1,IFNA(UNIQUE(FILTER(B:B,C:C=gx)))))))
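To make the row-by-row logic of that formula easier to follow, here is a small Python sketch of the same steps. It is only an analogue, not Sheets code: the function name unique_join and the sample rows are made up for illustration, and the comparison is an exact match, which may differ from how Sheets compares text.

```python
def unique_join(rows, criteria, sep=", "):
    """rows: (b_value, c_value) pairs; criteria: the lookup values from column G."""
    results = []
    for crit in criteria:
        if crit == "":
            # IF(gx="",, ...) leaves the result blank for an empty criterion.
            results.append("")
            continue
        # FILTER(B:B, C:C=gx): keep B values whose C value equals the criterion.
        matches = [b for b, c in rows if c == crit]
        # UNIQUE: drop duplicates, keeping the first occurrence of each value.
        unique = list(dict.fromkeys(matches))
        # TEXTJOIN(", ", 1, ...): join, skipping blanks. No matches gives "",
        # which stands in for IFNA turning #N/A into an empty result.
        results.append(sep.join(v for v in unique if v != ""))
    return results


rows = [("apple", "x"), ("pear", "y"), ("apple", "x"), ("fig", "x"), ("", "y")]
print(unique_join(rows, ["x", "y", "", "z"]))
```

Each entry of the returned list corresponds to one row of G2:G, which is what BYROW produces.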

Select all elements that end with a given string

Is it possible to select elements of xml tree that end with a given string? Not the elements that contain an attribute that ends with a string, but the elements themselves?
As mentioned in my comment, you can use the XPath 2.0 function ends-with to solve this. Its signatures are:
fn:ends-with($arg1 as xs:string?, $arg2 as xs:string?) as xs:boolean
fn:ends-with($arg1 as xs:string?, $arg2 as xs:string?, $collation as xs:string) as xs:boolean
Summary:
Returns an xs:boolean indicating whether or not the value of $arg1 ends with a sequence of collation units that provides a minimal match to the collation units of $arg2 according to the collation that is used.
So, to "select elements of xml tree that end with a given string" document-wide, you can use the following expression, which tests each element's string value:
//*[ends-with(.,'given string')]
To realize this in XPath 1.0, refer to this SO answer.
For example, to select all elements that end with "es", you can search for all the elements whose name contains the substring "es" starting at the position corresponding to the length of the name minus 1:
//*[substring(name(),string-length(name())-1,2) = "es"]
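The position arithmetic in that substring() call generalizes to any suffix: start at string-length minus the suffix length plus 1, and take as many characters as the suffix has. The Python sketch below checks this reasoning with a small emulation of XPath 1.0 substring() for integer arguments. The helper names and the sample document are hypothetical, and the emulation is not a full XPath implementation.

```python
import xml.etree.ElementTree as ET


def xpath_substring(s, start, length):
    """Emulate XPath 1.0 substring() for integer arguments (1-based positions)."""
    first = max(start, 1)        # positions below 1 do not exist
    last = start + length        # exclusive end, still 1-based
    return s[first - 1:max(last - 1, first - 1)]


def name_ends_with(name, suffix):
    """Same test as substring(name(), string-length(name()) - n + 1, n) = suffix."""
    start = len(name) - len(suffix) + 1
    return xpath_substring(name, start, len(suffix)) == suffix


def ends_with_xpath(suffix):
    """Build the XPath 1.0 expression for an arbitrary suffix (no quote escaping)."""
    n = len(suffix)
    return f'//*[substring(name(), string-length(name()) - {n - 1}, {n}) = "{suffix}"]'


doc = ET.fromstring("<root><boxes/><item/><classes><note/></classes></root>")
selected = [el.tag for el in doc.iter() if name_ends_with(el.tag, "es")]
print(selected)
print(ends_with_xpath("es"))
```

For the two-character suffix "es" the generated expression has the same offsets as the one shown above.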

RethinkDB: matching a substring in a list of strings

Thanks to the answer here, I managed to get all the rows that contain a given string as a substring of a specific field's value by:
r.db('my_db').table('my_table').filter(lambda row: row['some_key'].match(".*some_given_string.*"))
What if I want to have a similar result, but this time, "some_key" is a list of strings instead of a single string? Say for the following table:
[{"name": "row1", "some_key": ["str1", "str2"]}, {"name": "row2", "some_key": ["str3", "blah"]}, {"name": "row3", "some_key": ["blah", "blahblah"]}]
I want to look for ".*tr.*" and get the first two rows only, because the last one has a list under "some_key" that doesn't contain "tr" in any of its strings.
How can I do that with rethinkdb?
On a stream/array you can use contains, which behaves like an any operator when given a function.
r.db('my_db').table('my_table').filter(lambda row:
    row["some_key"].contains(lambda key:
        key.match(".*some_given_string.*")
    )
)
Short answer:
def has_match(row, regex):
    # Parentheses let the method chain span several lines.
    return (row['some_key']
            .map(lambda x: x.match(regex))
            .reduce(lambda x, y: x | y))

my_table.filter(lambda row: has_match(row, ".*tr.*"))
Longer answer:
match is a method that you can call on a string. In general in ReQL when you have an array of X and a function you want to apply to each element of the array you want to use the map command. For example if you run:
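r.expr(["foo", "boo", "bar"]).map(lambda x: x.match(".*oo"))
you'll get back a match object for each string that matches and None for the one that doesn't, which in boolean terms is:
[True, True, False]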
I'm a bit unclear from your question but I think what you want here is to get all the documents in which ANY of these strings matches regex. To see if any of them match you need to reduce the booleans together using or so it would be:
list_of_bools.reduce(lambda x,y: x | y)
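The "any element matches" logic behind both answers can be tried out without a database. The following is a plain-Python analogue using the standard re module, not ReQL: re.search stands in for match, any() for contains with a function, and functools.reduce for the map/reduce variant. The function names are made up for this sketch; the table is the one from the question.

```python
import re
from functools import reduce

table = [
    {"name": "row1", "some_key": ["str1", "str2"]},
    {"name": "row2", "some_key": ["str3", "blah"]},
    {"name": "row3", "some_key": ["blah", "blahblah"]},
]


def any_match(row, pattern):
    """Analogue of contains(lambda key: key.match(pattern))."""
    return any(re.search(pattern, s) is not None for s in row["some_key"])


def any_match_reduce(row, pattern):
    """Analogue of map(match) followed by reduce with 'or'."""
    bools = [re.search(pattern, s) is not None for s in row["some_key"]]
    # The initial value False keeps reduce from failing on an empty list.
    return reduce(lambda x, y: x | y, bools, False)


with_any = [row["name"] for row in table if any_match(row, ".*tr.*")]
with_reduce = [row["name"] for row in table if any_match_reduce(row, ".*tr.*")]
print(with_any, with_reduce)
```

Both variants select the same rows; the any() form stops at the first match, while the reduce form always evaluates every element.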

xquery- how to select value from a specific element even when that element has null values/multiple return-separated values

Please consider the following XML--
<table class="rel_patent"><tbody>
<tr><td>Name</td><td>Description</td></tr>
<tr><td>A</td><td>Type-A</td></tr>
<tr><td>B</td><td>Type-B</td></tr>
<tr><td>C</td><td>Type-C</td></tr>
<tr><td>AC</td><td>Type-C
Type-A</td></tr>
<tr><td>D</td><td></td></tr>
</tbody></table>
Now I want to select and display all values of "Name" with the corresponding values of the "Description" element, even when the Description element is empty (as for the row with Name = D). Also, when the Description element has values separated by a line break, I want those values in separate rows (as with Type-C and Type-A for the row with Name = AC).
This is the type of query I have written--
let $rows_data:= $doc//table[@class="rel_patent"]/tbody/tr[1]/following-sibling::tr
for $data_single_row in $rows_data
return
let $cited_name:= $data_single_row/td[1]
let $original_types_w_return:= $data_single_row/td[4]
let $original_types_list:= tokenize($original_types_w_return, '(\r?\n|\r)$')
for $cited_type_each at $pos2 in $original_types_list
return concat( $cited_name, '^', $original_type_each, '^', $pos2)
However, I am getting the following type of response--
A^Type-A^1
B^Type-B^1
C^Type-C^1
AC^Type-C
Type-A^1
Now, I need to get the following correct in the above code+response---
(1) The data for "AC" should be 2 separate rows with "Type-C" and "Type-A" being in each of the 2 rows along with corresp. value for last field in each row as 1 and 2 (because these are 2 values)
(2) The data for "D" is not being shown at all.
How do I correct the above code to conform with these 2 requirements?
This works:
for $data_single_row in $rows_data
return
    let $cited_name:= $data_single_row/td[1]
    let $original_types_w_return:= $data_single_row/td[2]
    let $original_types_list:= tokenize(concat($original_types_w_return, " "), '(\r?\n|\r)')
    for $cited_type_each at $pos2 in $original_types_list
    return concat( $cited_name, '^', normalize-space($cited_type_each), '^', $pos2)
(The first change was to replace $original_type_each with $cited_type_each and [4] with [2].)
The first problem can be solved by removing the $ at the end of the tokenize pattern, since in the default mode $ only matches the end of the string.
The second one is solved by appending a space to $original_types_w_return, so that it is not empty and tokenize returns something, and then removing it again with normalize-space (in XQuery 3.0 it could probably be solved by using 'allowing empty' in the for expression).
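Both fixes can be reproduced in a small Python sketch. It is only an approximation of the XQuery: xpath_tokenize is a hypothetical stand-in that mimics tokenize returning nothing for an empty string, ' '.join(token.split()) plays the role of normalize-space, and the rows are a reduced version of the table in the question.

```python
import re


def xpath_tokenize(text, pattern):
    """Rough stand-in for tokenize(): an empty input yields no tokens at all."""
    return [] if text == "" else re.split(pattern, text)


rows = [("A", "Type-A"), ("AC", "Type-C\nType-A"), ("D", "")]


def expand(rows, pattern, pad=""):
    out = []
    for name, description in rows:
        tokens = xpath_tokenize(description + pad, pattern)
        for pos, token in enumerate(tokens, start=1):
            # ' '.join(token.split()) trims and collapses whitespace.
            out.append(f"{name}^{' '.join(token.split())}^{pos}")
    return out


# Pattern anchored with $: the line break inside "AC" is never a separator,
# and the empty description of "D" produces no row.
anchored = expand(rows, r"(?:\r?\n|\r)$")

# Unanchored pattern plus a padding space: "AC" splits in two, "D" survives.
fixed = expand(rows, r"\r?\n|\r", pad=" ")

print(anchored)
print(fixed)
```

The padding space guarantees at least one token per row, and the whitespace normalization removes it again from the output.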

XPath 2.0:reference earlier context in another part of the XPath expression

In an XPath I would like to focus on certain elements and analyse them:
...
<field>aaa</field>
...
<field>bbb</field>
...
<field>aaa (1)</field>
...
<field>aaa (2)</field>
...
<field>ccc</field>
...
<field>ddd (7)</field>
I want to find the elements whose text content (apart from a possible enumeration) is unique. In the above example that would be bbb, ccc and ddd.
The following XPath gives me the unique values:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
Now I would like to extend that and perform another XPath on all the distinct values, that would be to count how many fields start with either of them and retrieve the ones whose count is bigger than 1.
These could be a field content that is equal to that particular value, or it starts with that value and is followed by " (". The problem is that in the second part of that XPath I would have to refer to the context of that part itself and to the former context at the same time.
In the following XPath I will - instead of using "." as the context- use c_outer and c_inner:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))[count(//field[(c_inner = c_outer) or starts-with(c_inner, concat(c_outer, ' ('))]) > 1]
I can't use "." for both for obvious reasons. But how could I reference a particular, or the current distinct value from the outer expression within the inner expression?
Would that even be possible?
XQuery can do it e.g.
for $s in distinct-values(
    //field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
where count(//field[(. = $s) or starts-with(., concat($s, ' ('))]) > 1
return $s
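The idea of binding the outer value to a variable so the inner test can refer to both contexts can be sketched in Python as well. The function name repeated_bases is hypothetical and the field list is taken from the question. One deliberate difference: this sketch strips the space before the parenthesis, which the substring-before(., '(') call above would keep.

```python
import re

fields = ["aaa", "bbb", "aaa (1)", "aaa (2)", "ccc", "ddd (7)"]


def repeated_bases(fields):
    # Outer step: distinct base names of the enumerated fields, first-seen order.
    bases = []
    for f in fields:
        m = re.fullmatch(r"(.*) \([0-9]\)", f.strip())
        if m and m.group(1) not in bases:
            bases.append(m.group(1))

    # Inner step: s plays the role of $s, f the role of "." inside the predicate.
    return [
        s for s in bases
        if sum(1 for f in fields if f == s or f.startswith(s + " (")) > 1
    ]


print(repeated_bases(fields))
```

Here only aaa is reported, because ddd occurs once even though it carries an enumeration.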
