xquery- how to select value from a specific element even when that element has null values/multiple return-separated values - xpath

Please consider the following XML--
<table class="rel_patent"><tbody>
<tr><td>Name</td><td>Description</td></tr>
<tr><td>A</td><td>Type-A</td></tr>
<tr><td>B</td><td>Type-B</td></tr>
<tr><td>C</td><td>Type-C</td></tr>
<tr><td>AC</td><td>Type-C
Type-A</td></tr>
<tr><td>D</td><td></td></tr>
</tbody></table>
Now I want to select and display all values of "Name" with the corresponding values of the "Description" element, even when the Description element is empty (as for the row with Name = D). Also, when the Description element has values separated by a line break, I want those values in separate rows (Type-C and Type-A for the row with Name = AC).
This is the type of query I have written--
let $rows_data:= $doc//table[@class="rel_patent"]/tbody/tr[1]/following-sibling::tr
for $data_single_row in $rows_data
return
let $cited_name:= $data_single_row/td[1]
let $original_types_w_return:= $data_single_row/td[4]
let $original_types_list:= tokenize($original_types_w_return, '(\r?\n|\r)$')
for $cited_type_each at $pos2 in $original_types_list
return concat( $cited_name, '^', $original_type_each, '^', $pos2)
However, I am getting the following type of response--
A^Type-A^1
B^Type-B^1
C^Type-C^1
AC^Type-C
Type-A^1
Now, I need to get the following corrected in the above code and response:
(1) The data for "AC" should be 2 separate rows, one with "Type-C" and one with "Type-A", with the last field in each row being 1 and 2 respectively (because there are 2 values).
(2) The data for "D" is not being shown at all.
How do I correct the above code to conform with these 2 requirements?

This works:
for $data_single_row in $rows_data
return
let $cited_name:= $data_single_row/td[1]
let $original_types_w_return:= $data_single_row/td[2]
let $original_types_list:= tokenize(concat($original_types_w_return, " "), '(\r?\n|\r)')
for $cited_type_each at $pos2 in $original_types_list
return concat( $cited_name, '^', normalize-space($cited_type_each), '^', $pos2)
(The first change was to replace $original_type_each with $cited_type_each, and [4] with [2].)
The first problem can be solved by removing the $ at the end of the tokenize pattern, since in the default mode $ only matches the end of the string.
The second one is solved by appending a space to $original_types_w_return, so that it is not empty and tokenize returns something, and then removing the space again with normalize-space. (In XQuery 3.0 it could probably be solved by using 'allowing empty' in the for expression.)
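The effect of both changes can be checked outside XQuery. The following is a minimal Python sketch, not XQuery: xpath_tokenize is a hypothetical stand-in that imitates only the one fn:tokenize behaviour relevant here (a zero-length input yields no tokens), and the rows list is an assumed copy of part of the table data.

```python
import re

def xpath_tokenize(text, pattern):
    """Hypothetical stand-in for fn:tokenize: empty input yields no tokens."""
    if text == "":
        return []
    return re.split(pattern, text)

# Assumed sample of the table: (Name, Description) pairs.
rows = [("A", "Type-A"), ("AC", "Type-C\nType-A"), ("D", "")]

def flatten(rows, pattern, pad=""):
    out = []
    for name, desc in rows:
        tokens = xpath_tokenize(desc + pad, pattern)
        for pos, token in enumerate(tokens, start=1):
            # " ".join(token.split()) plays the role of normalize-space
            out.append(f"{name}^{' '.join(token.split())}^{pos}")
    return out

# \Z is Python's end-of-string anchor, standing in for XPath's default-mode $.
anchored = flatten(rows, r"(?:\r?\n|\r)\Z")
# Pattern without the anchor, plus the appended space for empty cells.
fixed = flatten(rows, r"\r?\n|\r", pad=" ")

print(anchored)
print(fixed)
```

With the anchored pattern the AC cell stays in one piece and D disappears; without the anchor and with the padding, AC yields two rows and D yields one row with an empty type.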

Related

How do I identify whether a column entry starts with a letter or a number using m code in power query?

I have a column that contains either letters or numbers. I want to add a column identifying whether each cell contains a letter or a number. The problem is that there are thousands of records in this particular database.
I tried the following syntax:
= Table.AddColumn(Source, "Column2", each if [Column1] is number then "Number" else "Letters")
My problem is that when I enter this, it returns everything as "Letters" because it looks at the column type instead of the actual value in the cell. This remains the case even when I change the column type from Text to General. Either way, it still produces "Letters", as it automatically assigns text as the data type since the column contains both text and numbers.
Use this expression:
= Table.AddColumn(Source, "Column2", each if List.Contains({"0".."9"}, Text.Start([Column1], 1)) then "Numbers" else "Letters")
Note: It would have been smart to add sample data to your question so I wouldn't have to guess what your data actually looks like!
Add a custom column (Add column, Custom column) with:
= try if Value.Is(Number.From([Column1]), type number) then "number" else "not" otherwise "not"
Peter's method works if the choice is AAA/111 but this one tests for A11 and 1BC as well
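The two answers test different things: the first looks only at the first character, the second asks whether the whole value converts to a number. A rough Python sketch of that difference, not M code: the function names are made up for illustration, and float() is only an approximation of Number.From, whose parsing rules differ in detail.

```python
DIGITS = set("0123456789")

def starts_with_digit(value: str) -> bool:
    # Mirrors the first answer: inspect the first character only.
    return value[:1] in DIGITS

def parses_as_number(value: str) -> bool:
    # Approximates the second answer: does the whole value convert?
    try:
        float(value)
        return True
    except ValueError:
        return False

samples = ["AAA", "111", "A11", "1BC"]
labels = {v: (starts_with_digit(v), parses_as_number(v)) for v in samples}
print(labels)
```

The two checks agree on AAA and 111 and disagree on 1BC, which starts with a digit but is not a number as a whole.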

NIFI: Unable to extract two values from a list during each iteration over a loop

I would like to retrieve a large SQL dump between date ranges. For this, I constructed a loop over a date list, which is intended to extract adjacent fields. Unfortunately, in my case, it doesn't work as planned.
Following is my flow:
Replace Text: Takes flowfile content date list as all_first_dates
Initialize Count:
While Loop:
Get first and adjacent dates:
However, on seeing the queue, I get the first and second as this:
Whereas, I desired as 2016-01-01 and 2016-01-02 for first and second respectively on my first iteration and so on.
Check the description of the getDelimitedField function and its parameters:
Description: Parses the Subject as a delimited line of text and returns just a single field from that delimited text.
Arguments:
index: The index of the field to return. A value of 1 will return the first field, a value of 2 will return the second field, and so on.
delimiter: Optional argument that provides the character to use as a field separator. If not specified, a comma will be used. This value must be exactly 1 character.
...
You are not passing the second parameter, so a comma is used to split the subject, and you get the whole subject back as a single field.
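To see why the default delimiter returns the whole list, here is a simplified Python imitation of the behaviour described above. It is not NiFi code: it ignores quoting and escaping, the empty-string result for an out-of-range index is this sketch's own choice, and the newline-separated date list is an assumption, since the question does not show the actual separator.

```python
def get_delimited_field(subject: str, index: int, delimiter: str = ",") -> str:
    """Simplified imitation: 1-based field lookup with a 1-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly 1 character")
    fields = subject.split(delimiter)
    # Out-of-range handling is an assumption of this sketch.
    return fields[index - 1] if 1 <= index <= len(fields) else ""

# Assumed content of all_first_dates: one date per line.
all_first_dates = "2016-01-01\n2016-01-02\n2016-01-03"

default_first = get_delimited_field(all_first_dates, 1)         # comma delimiter
first = get_delimited_field(all_first_dates, 1, "\n")           # explicit delimiter
second = get_delimited_field(all_first_dates, 2, "\n")
print(repr(default_first), first, second)
```

With no comma in the subject, field 1 is the entire list; passing the real separator gives the first and second dates.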

string.IndexOf exact match

I have the following:
string text = "Select [id] AS [FROMId] FROM [TASK] ORDER BY id";
and I want to use text.IndexOf("FROM") in order to find where the FROM starts.
I want to find the position of FROM and not the position of FROMId.
LastIndexOf or FirstIndexOf are not correct answers, because the text could be anything like
string text = @"Select [id] AS [FROMId],
newId as [newFROMId] FROM [TASK] ORDER BY [FROMId]";
I need the indexof to do exact matching.
Any ideas?
Since FROM is an SQL reserved word that will generally have spaces on either side, you could search for it together with those spaces. That gives you the index of the space before the F, so add one to get the location of the F itself:
int index = text.IndexOf(" FROM ") + 1;
This may not necessarily take care of all edge cases(a) but, to do that properly, you may have to implement an SQL parser to ensure you can correctly locate the real from keyword and distinguish it from other possibilities.
(a) Such as things like:
select [a]FROM[tble] ...
select 'got data from unit #' | unit from tbl ...
and so on.
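As an alternative to padding the keyword with spaces, a word-boundary regular expression is a different technique that also skips FROMId and newFROMId. The sketch below is in Python rather than C#, index_of_word is a hypothetical helper, and like the space-based search it is not an SQL parser, so a FROM inside a string literal would still match.

```python
import re

text = ("Select [id] AS [FROMId],\n"
        "newId as [newFROMId] FROM [TASK] ORDER BY [FROMId]")

def index_of_word(haystack: str, word: str) -> int:
    """Index of the first whole-word occurrence, or -1 if there is none."""
    match = re.search(r"\b" + re.escape(word) + r"\b", haystack)
    return match.start() if match else -1

by_spaces = text.find(" FROM ") + 1      # same idea as IndexOf(" FROM ") + 1
by_boundary = index_of_word(text, "FROM")

# When nothing is found, the "+ 1" variant yields 0 rather than -1.
missing_by_spaces = "select 1".find(" FROM ") + 1
missing_by_boundary = index_of_word("select 1", "FROM")
print(by_spaces, by_boundary, missing_by_spaces, missing_by_boundary)
```

Note the not-found case: adding one to a failed search produces 0, which looks like a valid position, so the result of the search should be checked before the increment.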

Decoding with SUBSTRING and INSTRING?

I have a table with a city column in which a few records also contain a state value, separated from the city by a comma.
Other records have no state. I want to extract the state values, where present, into a separate field called state.
How can I do that? I tried the code below, and it reports "missing right parenthesis":
SELECT DECODE(ORA_CITY,
INSTR(ORA_CITY,',') > 0,
SUBSTR(ORA_CITY, INSTR(ORA_CITY, ','), LENGTH(ORA_CITY) ) ,
NULL) AS STATE
from ADDRESS
I don't know if you still need it but use CASE:
SELECT CASE
WHEN INSTR(ORA_CITY, ',') > 0 THEN
SUBSTR(ORA_CITY, INSTR(ORA_CITY, ','), LENGTH(ORA_CITY))
ELSE
NULL
END STATE
FROM ADDRESS
Clearly you have not understood decode syntax.
Try the following:
SELECT DECODE(INSTR(ORA_CITY,','),
0,
NULL,
SUBSTR(ORA_CITY, INSTR(ORA_CITY, ','), LENGTH(ORA_CITY) )) AS STATE
FROM ADDRESS
The correct syntax is:
DECODE( expression , search , result [, search , result]... [, default] ), where
expression is the value to compare.
search is the value that is compared against expression.
result is the value returned, if expression is equal to search.
default is optional. If no matches are found, the DECODE function will return default. If default is omitted, then the DECODE function will return null (if no matches are found).
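The search/result/default rules can be modelled in a few lines. This is a Python imitation for illustration only, not Oracle code: decode, instr and state_of are names made up for the sketch, and the sample city strings are assumptions.

```python
def decode(expression, *args):
    """Imitates DECODE: search/result pairs followed by an optional default."""
    if len(args) % 2:                      # odd count: last argument is the default
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expression == search:
            return result
    return default

def instr(text, needle):
    # 1-based position like INSTR, 0 when the needle is absent.
    return text.find(needle) + 1

def state_of(city):
    pos = instr(city, ",")
    # Same shape as DECODE(INSTR(c, ','), 0, NULL, SUBSTR(c, INSTR(c, ','))).
    # The substring starts AT the comma; start one position later to skip it.
    return decode(pos, 0, None, city[pos - 1:])

print(decode(2, 1, "one", 2, "two", "other"))
print(state_of("Austin, TX"), state_of("Dallas"))
```

The first argument is compared for equality against each search value, which is why a condition such as INSTR(...) > 0 cannot be used as a search value.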
SELECT REGEXP_REPLACE(ORA_CITY, '.*, *', '') AS STATE
FROM ADDRESS
WHERE ORA_CITY LIKE '%,%'
This uses a regular expression to replace everything up to and including the comma, plus any spaces that follow, with nothing. The WHERE clause restricts the result to rows that contain a comma.

XPath 2.0:reference earlier context in another part of the XPath expression

in an XPath I would like to focus on certain elements and analyse them:
...
<field>aaa</field>
...
<field>bbb</field>
...
<field>aaa (1)</field>
...
<field>aaa (2)</field>
...
<field>ccc</field>
...
<field>ddd (7)</field>
I want to find the elements whose text content (apart from a possible enumeration) is unique. In the above example that would be bbb, ccc and ddd.
The following XPath gives me the unique values:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
Now I would like to extend that and run another XPath on all the distinct values: count how many fields start with each of them and retrieve the ones whose count is bigger than 1.
A match could be a field whose content is equal to that particular value, or one that starts with that value followed by " (". The problem is that in the second part of that XPath I would have to refer to the context of that part itself and to the former context at the same time.
In the following XPath I will use c_outer and c_inner instead of "." as the context:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))[count(//field[(c_inner = c_outer) or starts-with(c_inner, concat(c_outer, ' ('))]) > 1]
I can't use "." for both for obvious reasons. But how could I reference the current distinct value from the outer expression within the inner expression?
Would that even be possible?
XQuery can do it, e.g.
(: substring-before uses ' (' so that $s carries no trailing space :)
for $s
in distinct-values(
//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., ' ('))
where count(//field[(. = $s) or starts-with(., concat($s, ' ('))]) > 1
return $s
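The same outer-value/inner-context logic, written as a Python sketch over the sample field values from the question (the fields list is an assumed flattening of the XML, and repeated_bases is a made-up name). The loop variable s plays the role of $s, and f plays the role of "." in the inner predicate. The base is cut at " (" so that it carries no trailing space.

```python
import re

# Assumed text content of the <field> elements, in document order.
fields = ["aaa", "bbb", "aaa (1)", "aaa (2)", "ccc", "ddd (7)"]

def repeated_bases(fields):
    # Outer step: distinct base values of the enumerated fields.
    bases = []
    for field in fields:
        normalized = " ".join(field.split())
        if re.search(r" \([0-9]\)$", normalized):
            base = normalized.partition(" (")[0]
            if base not in bases:
                bases.append(base)
    # Inner step: keep a base when more than one field equals it
    # or starts with it followed by " (".
    return [
        s for s in bases
        if sum(1 for f in fields if f == s or f.startswith(s + " (")) > 1
    ]

result = repeated_bases(fields)
print(result)
```

For the sample data only aaa qualifies: it appears as aaa, aaa (1) and aaa (2), while ddd occurs once.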
