Selecting multiple results from XQUERY query - xpath

I am trying to select multiple columns from a query, but so far I can only manage to select one. So I'm stuck with selecting either one column or all of them.
Here's the expression I have so far, which selects only one column:
let $y := doc("http://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders")//table[preceding-sibling::h2//span[string() = "K2"]][1]
return $y/tr/td[2]/string()
I would love some explanation of how one would go about doing this, since there's almost no documentation of this lovely language.

How would you like the result to be returned? You could construct new elements, or concatenate strings. There are many ways that this could be accomplished.
Here's one way to get comma-separated values:
return $y/tr/fn:string-join( (td[2] | td[4]), ", " )
You can try it on zorb.io.
Update
(td[2] | td[4]) selects both elements and passes them, as a sequence, to fn:string-join(). | is the XQuery union operator (the keyword union can be used in its place).
As for documentation, the functx site documents the standard library (all fn-prefixed functions) and has useful examples. The specs are also surprisingly readable.
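For readers more used to a general-purpose language, here is a rough Python sketch of what the string-join expression does per row. The sample table and the function name join_columns are made up for illustration; it does not fetch the Wikipedia page, and it is only an analogy to the XQuery, not a replacement for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the table selected by $y; not the real page content.
SAMPLE = """
<table>
  <tr><td>1</td><td>Alice</td><td>x</td><td>1954</td></tr>
  <tr><td>2</td><td>Bob</td><td>y</td><td>1986</td></tr>
</table>
"""

def join_columns(table_xml, columns=(2, 4), sep=", "):
    """Return one joined string per tr, built from the given 1-based td positions."""
    table = ET.fromstring(table_xml)
    rows = []
    for tr in table.findall("tr"):
        cells = tr.findall("td")
        # Mirrors (td[2] | td[4]): pick the cells, in document order, if they exist.
        picked = [cells[i - 1] for i in sorted(columns) if i <= len(cells)]
        rows.append(sep.join("".join(td.itertext()).strip() for td in picked))
    return rows

print(join_columns(SAMPLE))
```

As in the XQuery version, each row yields one string, so a row with no td cells at all would produce an empty string.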

Related

assistance needed constructing JSONata query

I am trying to construct a JSONata query using the try.jsonata.org Invoice data.
The query I am trying to express is: select distinct OrderID where Order.Product.Price < 50.
I have not been able to figure out how to do this using the predicate in square brackets notation ... my attempts have been thwarted when I try to get past the $.Account.Order.Product array.
Using $map and $reduce I was able to come up with this rather complex solution ... which still doesn't correctly handle duplicate OrderIDs. (I see that the issue of duplicate removal has been requested here)
Q: What is the proper way to express this query in JSONata?
I think this does what you need:
Account.Order[Product.Price.($ < 50)].OrderID
The expression in the predicate, which gets tested for each Order, will generate an array of Booleans (one for each Product.Price). The resulting predicate will evaluate to true if any of the Booleans within that array are true, due to the semantics of the $boolean function which is implicitly applied.
Overall, the expression will return the OrderID for every Order which has at least one Product whose Price is less than 50.
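To make the "true if any Boolean is true" semantics concrete, here is a small Python sketch of the same filter. The data is made up (it only mimics the Account.Order[].Product[] shape, not the real Invoice sample), and the duplicate removal is an extra step that the JSONata expression above does not perform.

```python
# Made-up data shaped like Account.Order[].Product[]; not the real Invoice sample.
account = {
    "Order": [
        {"OrderID": "order1", "Product": [{"Price": 34.45}, {"Price": 120.0}]},
        {"OrderID": "order2", "Product": [{"Price": 75.0}]},
        {"OrderID": "order1", "Product": [{"Price": 10.0}]},
    ]
}

def cheap_order_ids(account, limit=50):
    """OrderIDs of orders with at least one product priced below limit, deduplicated."""
    seen = []
    for order in account["Order"]:
        # any(...) plays the role of the implicit $boolean over the array of Booleans.
        if any(product["Price"] < limit for product in order["Product"]):
            if order["OrderID"] not in seen:  # extra step: keep IDs distinct
                seen.append(order["OrderID"])
    return seen

print(cheap_order_ids(account))
```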

Accessing elements from the end of a split array in Hive

I need to split a tag that looks something like "B1/AHU/_1/RoomTemp" or "B1/AHU/_1/109/Temp", so with a variable number of fields. I am interested in getting the last field, or sometimes the last but one. I was disappointed to find that negative indexes do not count from the right and let me select the last element of an array in Hive, as they do in Python.
select tag,split(tag,'[/]')[ -1] from sensor
I was more surprised when this did not work either:
select tag,split(tag,'[/]')[ size(split(tag,'[/]'))-1 ] from sensor
Both times giving me an error along the lines of this:
FAILED: SemanticException 1:27 Non-constant expressions for array indexes not supported.
Error encountered near token '1'
So any ideas? I am kind of new to Hive. Regex's maybe? Or is there some syntactic sugar I am not aware of?
This question is getting a lot of views (over a thousand now), so I think it needs a proper answer. In the event I solved it with this:
select tag,reverse(split(reverse(tag),'[/]')[0]) from sensor
which is not actually stated in the other suggested answers - I got the idea from a suggestion in the comments.
This:
reverses the string (so "abcd/efgh" is now "hgfe/dcba")
splits it on "/" into an array (so we have "hgfe" and "dcba")
extracts the first element (which is "hgfe")
then finally re-reverses (giving us the desired "efgh")
Also note that the second-to-last element can be retrieved by substituting 1 for the 0, and so on for the others.
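The four steps above can be sketched in Python to check the logic outside Hive. The function name field_from_end is just an illustrative helper, not anything Hive provides.

```python
def field_from_end(tag, n=0, delim="/"):
    """Return the n-th field counting from the right (0 = last), via reverse/split/reverse."""
    reversed_tag = tag[::-1]            # "abcd/efgh" -> "hgfe/dcba"
    parts = reversed_tag.split(delim)   # ["hgfe", "dcba"]
    return parts[n][::-1]               # pick element n, then re-reverse it

print(field_from_end("abcd/efgh"))
print(field_from_end("B1/AHU/_1/109/Temp", 1))
```

Note that this only mirrors the trick cleanly for a single-character delimiter; a longer delimiter would itself have to be reversed.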
There is a great library of Hive UDFs here. One of them is LastIndexUDF(). It's pretty self-explanatory: it retrieves the last element of an array. There are instructions to build and use the jar on the main page. Hope this helps.
This seems to work for me; it returns the last element of the SPLIT array:
SELECT SPLIT(INPUT__FILE__NAME,'/')[SIZE(SPLIT(INPUT__FILE__NAME,'/')) -1 ] from test_table limit 10;
After reading the LanguageManual UDF page for a while, I found that the function substring_index meets your requirement exactly and doesn't need any additional calculations.
The manual says:
substring_index(string A, string delim, int count) returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.
Use cases:
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com
See here for more information.
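If it helps to see the semantics spelled out, here is a small Python emulation of substring_index as the manual describes it. This is only a sketch: returning an empty string for a count of 0 or an empty delimiter is my assumption, so check the engine you actually run.

```python
def substring_index(s, delim, count):
    """Emulate substring_index(A, delim, count) as described in the manual text above."""
    if count == 0 or delim == "":
        return ""  # assumption: treated as "nothing selected"
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])

tag = "B1/AHU/_1/109/Temp"
print(substring_index(tag, "/", -1))                            # last field
print(substring_index(substring_index(tag, "/", -2), "/", 1))   # last but one
```

Nesting two calls, as in the last line, is one way to get the last-but-one field that the question also asks about.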

Prefix the result of a XPATH query

I use libxmljs to parse some html.
I have a xpath query which has an "or" conjunction to retrieve basically the information of two queries
Example
doc.find("//div[contains(@class,'important') or contains(@class,'overdue')]")
this returns all the divs with either important or overdue...
Can I prefix or see within my result set which comes from which condition?
The result could be an array with an index for the match: 0 for the first condition and 1 for the second... Is this possible...
Or how can I find out which result comes from which query condition...
Thanks for any help...
P.S.: this is a simplified example of a sequence of elements which have either an important or an overdue item ... both, one or none of them... So I cannot go by looking for every second entry ... etc
This is the result I want to get...
message:{},
message:{
.....
important: "some important text",
overdue: "overdue date",
.....
}
There is no way to know which clause of an or XPath query caused a particular result to be included. It's simply not information that's kept around.
You'll either need to do entirely separate queries for important and overdue, or do one large query to get the entire result set (as you are now) and then further test each result's class to find out which one it is.
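The second option (one query, then test each result's class) might look roughly like this. The sketch is in Python with the standard library rather than libxmljs, and the sample markup and the helper name classify_divs are made up; the same loop should translate to whatever attribute accessor your library offers.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only.
SAMPLE = """
<root>
  <div class="message important">pay rent</div>
  <div class="message overdue">2011-06-30</div>
  <div class="message">hello</div>
  <div class="important overdue">both</div>
</root>
"""

def classify_divs(xml_text, keywords=("important", "overdue")):
    """Return (matched keywords, text) for every div whose class contains a keyword."""
    root = ET.fromstring(xml_text)
    results = []
    for div in root.iter("div"):
        cls = div.get("class", "")
        # Substring test, mirroring XPath contains(@class, '...').
        matched = [k for k in keywords if k in cls]
        if matched:
            results.append((matched, (div.text or "").strip()))
    return results

for matched, text in classify_divs(SAMPLE):
    print(matched, text)
```

Each result carries the list of conditions it satisfied, so an element that matches both is reported once with both labels.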

ORA-00907 Error when using Analytic Function in a Query (PS/Query, Peopletools 8.51.12)

Query's throwing an ORA-00907 Error when I try to paste a list of values into a criteria.
Background: I'm not a developer, I'm just an end user that's studied enough to where I can write queries using PS/Query within Peoplesoft,
for my company's implementation. I work with Peoplesoft's FSCM module
(Financials and Supply Chain Management), currently on Version FSCM
8.90.08.024, using I think Oracle 11g as the base database.
I'm mostly self-taught, and the technical experts we have are busy
with database/application stuff, or they aren't familiar with my
section's specific data needs.
I should point out that I'm unable to directly write SQL statements to
Query the database. I have to use a built-in program called "PS/Query"
(also known as Query Manager) with a GUI that writes the SQL for you
and saves it as a Query that you can run to the database to extract
data. This is relevant to my question only in that:
1. I cannot create or alter views/tables
2. I cannot perform any type of SQL Statement except "SELECT"
3. I can embed PL/SQL, MetaSQL and plain SQL into Expressions
4. At this point, Query Manager is the only option I have.
PS/Query is my only experience with SQL so far, aside from Oracle's
documentation and sites like this. From my research, it's considered
extremely confining by "actual" SQL programmers. The restrictions on it
require you to do things in a manner that violates what seem to be
best practices of SQL coding.
Query Request: I have a query I've been requested to write that pulls out spend (on Vouchers and POs) against certain system-defined
Category Codes. What I'm trying to do is pull in Voucher IDs, sum the
merchandise amounts on them by Vendor and Category Code, and display
the results. Or in other words, for every unique combination of
Vendor/Category, add up all the Voucher Amounts that have that
Vendor/Category combination.
Using the SUM (Fieldname) OVER (PARTITION BY fieldname, fieldname)
syntax.
So the end result should look something like...
Code Vendor Amount
123-45 Acme $5000.00
123-45 Apple $4200.00
123-46 Acme $750.00
With that said, here's the SQL that Query Manager is displaying to get the result set I showed above:
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD LIKE :1
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
Underlying Issue: This works fine, but it can only prompt on one
Category Code at a time. Category Codes are 5 digits, a 3-digit
"Class" followed by a dash and then a 2-digit "subclass". I have a list
of 375 Category Codes I need to get this Query result for.
I've set up a prompt on this version that allows entry of a Wildcard
(So 123-%%), but that's still about a hundred separate runs of the
Query. Query Manager allows use of an "In List" expression type in
Criteria, but it requires you to manually enter each entry in the
list.
I'm trying to set it up to where I can paste a plaintext copy of the
Code list into an Expression, with proper quotes/commas, and have it
evaluate that to give me a combined list of all the NIGP codes
specified. The Prompt field created by Query Manager doesn't allow
pasting of lists (as far as I know).
Attempted Solution: I viewed the page at http://peoplesoft.ittoolbox.com/groups/technical-functional/peoplesoft-other-l/create-an-expression-in-psoft-90-query-to-paste-a-list-of-emplids-2808427 and I've tried some of the answers given there, but none of them worked. That page led to me trying this modified SQL (obviously the list of codes is truncated a bit for display here):
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
And the SQL above is what's giving me the ORA-00907 error. Has anyone run into this problem before? Massive wall of text, I know. My apologies. This is my first post here and I'm trying not to leave anything relevant out.
I've got the immediate problem that spurred this question fixed, but that request is just the tip of a very large iceberg, and at some point I need to figure out a way to paste plaintext lists in as criteria using Query Manager, preferably in a way that plays nicely with Analytic Grouping.
TL;DR version:
Using Peoplesoft Query Manager to do an Analytic SUM with grouping using OVER, PARTITION BY. When I try to paste a list into the criteria, it throws an ORA-00907 Error.
Any help would be greatly appreciated. Thanks!
Ok, after a bit more tweaking with this, I've found what I think is the underlying issue.
The error, in this case, is two-fold. Part of it was my fault (I didn't check for Peoplesoft mangling the quotation marks I pulled from Word), and part of it was the way Query Manager interprets some kinds of functions (you have to wrap some stuff in a Case When statement to get it to evaluate properly).
First, the "My Fault" part:
Every time I was pasting in my list of test NIGP Codes, I was doing it from a file I kept saved in Microsoft Word.
Word has the probably-handy "replace straight quotes with smart quotes" feature. Peoplesoft goes bonkers when it's presented with a "smart quote", and will display them as upside-down question marks (there's probably a technical term, I don't know it).
So when I'd test suggestions (such as fixing the quote/comma order as suggested by @Rene Nyffenegger and @WayneH) I'd start with my base test query, add in the expressions and test it, saving it as a separate query. If they didn't work, I'd go back to the base query. That way I could iterate changes and save potential tests as different versions.
My mistake was in not saving the different versions, leaving the application and going back in. It's when you save the query, leave the page, go somewhere else in Peoplesoft, then go back to open Query Manager that it actually shows you that it's doing the character conversion. You can't see it unless you do that. Even though Query Manager is doing it. So it was throwing a character Query Manager wouldn't recognize, but not showing me the character it wouldn't recognize.
I got a new work PC recently, and I've now disabled the Smart Quotes auto-replace for future use.
Second, the "Query Manager" part:
On the version of this that I got to work, I made use of wrapping the "IN" function inside a Case statement. I've found that a lot of SQL functions, when used "plain" (as I'd define them by just copy-pasting from Oracle's definitions pages and filling in the appropriate variables) tend to give PS/Query (Query Manager) heartburn. But if you wrap them inside a CASE...WHEN...END statement that evaluates the result of the function and then build a criteria that selects based on certain values of that result, the function will work and properly display a result.
So for an example, set up this expression (like in the example from @qyb2zm302). I'm using different codes from what was in my original example, but they work the same (they're all five-digit, character-typed codes consisting of three digits, a dash, then two digits).
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
And then set a criteria:
AND
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
= 'true'
It'll run to completion and return any rows that have that Category Code.
If you don't want to do that, you can use @qyb2zm302's Method 2. The only downside in Query Manager is that you have to enter the values into individual rows in the "List", and you can only copy-paste 25 at a time.
Wrapping it in a Case Statement lets you paste it directly into an Expression, which is far better for larger lists.
Solutions:
The above is the code I went with that worked. It's simplified a bit for brevity's sake, but it works.
In List works through the native Query Manager option as long as you manually-populate the list
D.CATEGORY_CD = '005-00' OR works as long as you wrap it in a Case Statement
D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') works as long as you wrap it in a Case Statement
Peoplesoft hates Smart Quotes. None of the above will work if you're copying quotation marks directly from Word, but you won't see it unless you save, leave and come back to the same query in edit mode
Formatting is important. All of the above require the proper comma/quotation formatting, as pointed out by Rene and Wayne. Meaning: ('xxx-xx', 'xxx-01','xxx-02') etc
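Since the smart quotes and the comma/quote formatting were the two things that bit me, here is a rough Python sketch of a helper that turns a pasted plaintext list into a properly quoted IN list. It is not part of Peoplesoft; the name build_in_list is made up, and it assumes the codes themselves never contain commas, spaces or quotes.

```python
import re

# Curly single and double quotes that Word's smart-quote feature produces.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"

def build_in_list(raw):
    """Turn pasted codes (comma, space or newline separated) into ('a', 'b', ...)."""
    cleaned = raw
    for ch in SMART_QUOTES:
        cleaned = cleaned.replace(ch, "'")      # normalize to straight quotes
    # Split on commas/whitespace, then drop any quotes wrapped around each code.
    codes = [c.strip().strip("'\"") for c in re.split(r"[,\s]+", cleaned)]
    codes = [c for c in codes if c]             # discard empty fragments
    return "(" + ", ".join(f"'{c}'" for c in codes) + ")"

print(build_in_list("015-00, 015-06\n015-10\n615-07"))
```

The output can then be pasted after "IN" inside the Case expression shown earlier.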
Thanks to everyone who helped on this! I don't think I've head-desked this hard before on any question, but I guess that's part of the learning process. Since all the answers posted are valid and correct (or at least a portion of the larger "correct"), I'm going to flag them all.
The
D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
part looks fishy to me
Since a '' within a string "evaluates" to a single ' the first string is
'015-00,'' '
followed by (the non-string)
015-06,
The following '' is probably the thing that the parser stumbles upon since it's pretty meaningless.
Edit: try it with D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07').
Following the link you posted, I see 2 methods for doing what you are trying to accomplish.
I also notice that you tried a 3rd method.
Method 1
Criteria > Add Criteria
Expression Type: Character
Length: 255
Expression Text: D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') AND 1
Condition Type: equal to
Constant: 1
Method 2
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: in list
Value: 015-00','015-06','015-10','615-07
Method 3 (Your Method)
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: equal to
Define Expression: '015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
Question: Does the below exactly match the text you are putting in the Expression box?
'015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
If not, what are you putting in that box?
I think the D.CATEGORY_CD criteria are giving you the problems. I changed the double quotes to single quotes, and it still looked strange to me. I then noticed the commas are inside your quotes and not between them. Try making the one criteria line look like this:
before:
OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
after:
OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
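Also, the "IN" is an implied "OR", and I am not sure whether you have parentheses around the two D.CATEGORY_CD conditions. Without them, AND binds more tightly than OR, so the WHERE clause splits into two alternatives and each one applies only part of the other criteria.
I would just put the one additional code into the IN criteria and remove the "D.CATEGORY_CD =" line: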
before:
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
after:
AND D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07', '005-00')
Of course, you are already ordering by CATEGORY_CD, you could remove this criteria and pull all categories in one run (that is unless there are too many rows for excel), and then you might also want to include either VENDOR_ID or NAME1 in the ORDER BY clause.
Hope that helps you.
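To illustrate the AND/OR point, here is a small Python sketch using the built-in sqlite3 module. It runs against SQLite rather than Oracle, and the table t and column in_range are made up stand-ins for the other criteria; both databases give AND higher precedence than OR, which is all this relies on.

```python
import sqlite3

def run_filters():
    """Compare the unparenthesized AND ... OR form with a single IN list."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (category_cd TEXT, in_range INTEGER)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("005-00", 1), ("015-00", 1), ("015-06", 0), ("999-99", 1)],
    )
    # Parsed as (in_range = 1 AND cd = '005-00') OR (cd IN (...)).
    unparenthesized = con.execute(
        "SELECT category_cd FROM t "
        "WHERE in_range = 1 AND category_cd = '005-00' "
        "OR category_cd IN ('015-00', '015-06') ORDER BY 1"
    ).fetchall()
    # One IN list: in_range = 1 applies to every code.
    single_in = con.execute(
        "SELECT category_cd FROM t "
        "WHERE in_range = 1 "
        "AND category_cd IN ('015-00', '015-06', '005-00') ORDER BY 1"
    ).fetchall()
    con.close()
    return [r[0] for r in unparenthesized], [r[0] for r in single_in]

loose, strict = run_filters()
print(loose)
print(strict)
```

The first query lets '015-06' through even though its in_range flag is 0, because the OR branch skips that condition; the single IN list does not.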

using xpath to obtain complex values

Given the following, I'd like to extract VarVal1, VarVal5 and VarText where FixedVals are, well, fixed :)
<TypeA Attr1="VarVal1">
<TypeB Attr2="FixedVal2">
<TypeC Attr3="FixedVal3">
<TypeD Attr4="FixedVal4" Attr5="VarVal5">
VarText
</TypeD>
</TypeC>
</TypeB>
</TypeA>
Notice that the big problem for me is that the context is important. I want the complete pattern. There may be other TypeA nodes, but I'm not interested in their values unless they're followed by
<TypeB Attr2="FixedVal2">
<TypeC Attr3="FixedVal3">
<TypeD Attr4="FixedVal4" Attr5="VarVal5">
VarText
</TypeD>
</TypeC>
</TypeB>
In other words, what I'm interested in is a set of triplets, each of them in the form (VarVal1, VarVal5, VarText).
These XPath expressions select the three values. For VarVal1:
//TypeA
[TypeB[@Attr2="FixedVal2"]
/TypeC[@Attr3="FixedVal3"]
/TypeD[@Attr4="FixedVal4"]]
/@Attr1
Then those already posted, for VarVal5:
//TypeA
/TypeB[@Attr2="FixedVal2"]
/TypeC[@Attr3="FixedVal3"]
/TypeD[@Attr4="FixedVal4"]
/@Attr5
And for VarText:
//TypeA
/TypeB[@Attr2="FixedVal2"]
/TypeC[@Attr3="FixedVal3"]
/TypeD[@Attr4="FixedVal4"]
You could also combine them with the | union set operator. Depending on the host language, though, you may do better to select the TypeA elements you want (the first expression without the last /@Attr1 part) and then query each of those to extract the remaining values.
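As a sketch of that last suggestion (select the TypeA elements, then query each one), here is how it might look in Python with the standard library's ElementTree, which supports only a subset of XPath. The wrapping root element, the second TypeA and the function name extract_triplets are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in a hypothetical <root> with one non-matching TypeA.
SAMPLE = """
<root>
  <TypeA Attr1="VarVal1">
    <TypeB Attr2="FixedVal2">
      <TypeC Attr3="FixedVal3">
        <TypeD Attr4="FixedVal4" Attr5="VarVal5">
          VarText
        </TypeD>
      </TypeC>
    </TypeB>
  </TypeA>
  <TypeA Attr1="Other">
    <TypeB Attr2="SomethingElse"/>
  </TypeA>
</root>
"""

# Path from a TypeA element down to the TypeD elements with the fixed values.
INNER = "TypeB[@Attr2='FixedVal2']/TypeC[@Attr3='FixedVal3']/TypeD[@Attr4='FixedVal4']"

def extract_triplets(xml_text):
    """Return (Attr1, Attr5, text) for every TypeA/TypeD pair matching the fixed pattern."""
    root = ET.fromstring(xml_text)
    triplets = []
    for type_a in root.iter("TypeA"):
        for type_d in type_a.findall(INNER):
            text = (type_d.text or "").strip()
            triplets.append((type_a.get("Attr1"), type_d.get("Attr5"), text))
    return triplets

print(extract_triplets(SAMPLE))
```

Because the inner path is evaluated relative to each TypeA, every triplet keeps the Attr1 value of the element it actually came from.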
I think you need a couple of queries for this (could be wrong though).
For VarVal1:
//TypeA/@Attr1
For VarVal5:
//TypeA
/TypeB[@Attr2="FixedVal2"]
/TypeC[@Attr3="FixedVal3"]
/TypeD[@Attr4="FixedVal4"]
/@Attr5
I think these should do the trick.
EDIT - missed VarText!
//TypeA
/TypeB[@Attr2="FixedVal2"]
/TypeC[@Attr3="FixedVal3"]
/TypeD[@Attr4="FixedVal4"]

Resources