Can't sort data alphabetically by the Spanish alphabet when selecting it in a query - xpath

I have a set of assets which have a property "name".
I want to get a dynamic number of those assets, sorted alphabetically by that "name" property.
I do that with this query:
type=dam:Asset
path=/content/dam/en/foobar/contacts/
orderby=@jcr:content/data/master/@name
orderby.sort=asc
p.limit=3
This works, so for a set of names:
[Paloma, Abel, José, Eduardo]
it retrieves:
Abel, Eduardo, José.
The problem is with the Spanish alphabet, in which Á is the same letter as A.
So for a set of:
[Paloma, Abel, José, Álvaro, Eduardo]
it retrieves:
Abel, Eduardo, José.
Álvaro is excluded because it is not among the first 3 elements after ordering, when it should be the second. The query should retrieve:
Abel, Álvaro, Eduardo.
So, to fix that, I've created a custom Oak Lucene index like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:rep="internal"
jcr:mixinTypes="[rep:AccessControllable]"
jcr:primaryType="nt:unstructured">
<socialLucene/>
<workflowDataLucene/>
<slingeventJob/>
<jcrLanguage/>
<versionStoreIndex/>
<repMembers/>
<cqReportsLucene/>
<commerceLucene/>
<counter/>
<authorizables/>
<enablementResourceName/>
<externalPrincipalNames/>
<cmLucene/>
<foobarCFIndexFilter
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
evaluatePathRestrictions="{Boolean}true"
includedPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
queryPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
reindex="{Boolean}false"
reindexCount="{Long}24"
seed="{Long}3850652403740003290"
type="lucene">
<analyzers jcr:primaryType="nt:unstructured">
<default jcr:primaryType="nt:unstructured">
<filters jcr:primaryType="nt:unstructured">
<Synonym
jcr:primaryType="nt:unstructured"
format="solr"
synonyms="synonyms.txt">
<synonyms.txt/>
</Synonym>
</filters>
<tokenizer
jcr:primaryType="nt:unstructured"
name="Classic"/>
</default>
</analyzers>
<indexRules jcr:primaryType="nt:unstructured">
<nt:base jcr:primaryType="nt:unstructured">
<properties jcr:primaryType="nt:unstructured">
<title
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/title"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
<date
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/date"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"/>
<sectors
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/sectors"
propertyIndex="{Boolean}true"/>
<contentFragment
jcr:primaryType="nt:unstructured"
name="jcr:content/contentFragment"
propertyIndex="{Boolean}true"/>
<model
jcr:primaryType="nt:unstructured"
name="cq:model"
propertyIndex="{Boolean}true"/>
<name
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/name"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
</properties>
</nt:base>
</indexRules>
</foobarCFIndexFilter>
<cqProjectLucene/>
<ntFolderDamLucene/>
<acPrincipalName/>
<uuid/>
<damAssetLucene/>
<rep:policy/>
<cqPayloadPath/>
<nodetypeLucene/>
<nodetype/>
<ntBaseLucene/>
<reference/>
<principalName/>
<cqTagLucene/>
<lucene/>
<repTokenIndex/>
<externalId/>
<authorizableId/>
<cqPageLucene/>
</jcr:root>
where synonyms.txt contains:
á, a
Á, A
and so on.
I also tried a charFilter with a Mapping of equivalent characters.
Using the Query Performance Diagnosis tool, I have made sure that my custom Oak index is the one my query uses.
But nothing works: after a reindex the query results are the same.
How can I solve this?
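The ordering described in the question matches what a plain code-point comparison gives, since "Á" (U+00C1) comes after every unaccented ASCII letter. Here is a minimal Python sketch of that effect, and of sorting on an accent-folded key instead. It only illustrates the comparison and is not Oak or Lucene code. The helper name fold_accents is hypothetical.

```python
import unicodedata

names = ["Paloma", "Abel", "José", "Álvaro", "Eduardo"]


def fold_accents(text: str) -> str:
    """Decompose characters (NFD), drop combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Plain sort compares code points, so "Álvaro" lands after "Paloma".
by_code_point = sorted(names)[:3]

# Sorting on the folded key treats "Á" like "A".
by_folded_key = sorted(names, key=fold_accents)[:3]

print(by_code_point)
print(by_folded_key)
```

Note that this folding also turns "ñ" into "n", which is not correct Spanish collation, so a locale-aware collator would be the more faithful choice where one is available.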

Related

XSLT: filter and retrieve different elements

<cars>
  <car>
    <name v="speedy"/>
    <type v="sport"/>
    <engine>
      <hp>300</hp>
    </engine>
  </car>
  <car>
    <name v="biggo"/>
    <type v="truck"/>
    <engine>
      <hp>190</hp>
    </engine>
  </car>
</cars>
I am having trouble building an XPath expression that gives me biggo's horsepower.
I am not sure how to filter on one element and get the value of something that is not inside the filtered element.
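In full XPath 1.0, one way to express this is to put the filter in a predicate on car and then step down to the value: /cars/car[name/@v='biggo']/engine/hp. Below is a minimal sketch in Python, assuming the sample is made well-formed (closing </car> tags). The standard library's ElementTree supports only a subset of XPath, so the sketch selects the matching name, steps up with "..", then goes down to engine/hp.

```python
import xml.etree.ElementTree as ET

CARS_XML = """<cars>
  <car>
    <name v="speedy"/>
    <type v="sport"/>
    <engine><hp>300</hp></engine>
  </car>
  <car>
    <name v="biggo"/>
    <type v="truck"/>
    <engine><hp>190</hp></engine>
  </car>
</cars>"""

root = ET.fromstring(CARS_XML)

# Find the <name> whose v attribute is "biggo", step up to its <car>
# with "..", then down to engine/hp.
hp = root.findtext("./car/name[@v='biggo']/../engine/hp")

print(hp)
```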

Xpath function to loop through repeating nodes

Which XPath function can loop through repeating XML nodes?
This is my Source XML:
<?xml version="1.0" encoding="UTF-8"?>
<Record>
<Type>V</Type>
<Address>
<Qual>A</Qual>
<ID>A1</ID>
</Address>
<Address>
<Qual>A</Qual>
<ID>B2</ID>
</Address>
<Address>
<Qual>C</Qual>
<ID>C2</ID>
</Address>
<Category>
<EL>PO</EL>
</Category>
<Category>
<EL>DP</EL>
</Category>
</Record>
I don't want to process the data if Qual = A and ID = B2, Category = DP and Type = V.
My XPath does not work because of the repeating nodes:
(concat(Xpath./Type,Xpath./Record/Address/Qual,Xpath./Record/Address/ID,Xpath./Record/Category/EL) != "VAB2DP")
So I tried:
choose((concat(Xpath./Type,Xpath./Record/Address/Qual,Xpath./Record/Address/ID,Xpath./Record/Category/EL) != "VAB2DP"),'true','false')
It still does not work.
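A likely reason the concat approach fails: in XPath 1.0, converting a node-set to a string uses only its first node, so the concatenation would yield "VAA1PO" and never see the second Address. Predicates avoid that, for example not(Type='V' and Address[Qual='A' and ID='B2'] and Category/EL='DP') evaluated relative to Record. Here is a minimal Python sketch of the same check, assuming the standard library's ElementTree. The function name should_skip is hypothetical.

```python
import xml.etree.ElementTree as ET

RECORD_XML = """<Record>
  <Type>V</Type>
  <Address><Qual>A</Qual><ID>A1</ID></Address>
  <Address><Qual>A</Qual><ID>B2</ID></Address>
  <Address><Qual>C</Qual><ID>C2</ID></Address>
  <Category><EL>PO</EL></Category>
  <Category><EL>DP</EL></Category>
</Record>"""


def should_skip(record: ET.Element) -> bool:
    """True when Type is V, some Address has Qual=A and ID=B2, and some Category has EL=DP."""
    type_is_v = record.findtext("Type") == "V"
    # Both conditions must hold on the same <Address> element.
    has_address = any(
        addr.findtext("Qual") == "A" and addr.findtext("ID") == "B2"
        for addr in record.findall("Address")
    )
    # Any one of the repeating <Category> elements may match.
    has_category = record.find("Category[EL='DP']") is not None
    return type_is_v and has_address and has_category


record = ET.fromstring(RECORD_XML)
skip = should_skip(record)
print(skip)
```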

XML sorted list of only certain nodes

I have the following XML file. I need to print a list of only the selected nodes (Total) in ascending order. I tried to use the sort function, but I made some mistakes I couldn't identify, and it returned everything, including the values of other nodes in the input file.
XML input:
<?xml version="1.0" encoding="UTF-8"?>
<Invoice>
<From>
<Name>Lucy</Name>
<Country>UK</Country>
</From>
<To>
<Name>John</Name>
<Country>US</Country>
</To>
<Items>
<Position>
<Name>Table</Name>
<Total>1</Total>
</Position>
<Position>
<Name>Chair</Name>
<Total>4</Total>
</Position>
<Position>
<Name>Cup</Name>
<Total>5</Total>
</Position>
<Position>
<Name>Box</Name>
<Total>4</Total>
</Position>
</Items>
</Invoice>
How could I get the required output using XSLT?
Any help is greatly appreciated! Thank you!
One obvious approach to generating the desired output from the given input is an xsl:for-each combined with xsl:sort:
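<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit only the Total elements, in ascending order -->
  <xsl:template match="/Invoice">
    <SortedTotalList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <!-- In the input above, Position is nested under Items -->
      <xsl:for-each select="Items/Position">
        <!-- data-type="number" sorts numerically; the default is a text sort -->
        <xsl:sort select="Total" data-type="number"/>
        <xsl:copy-of select="Total"/>
      </xsl:for-each>
    </SortedTotalList>
  </xsl:template>
</xsl:stylesheet>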
Output is:
<?xml version="1.0"?>
<SortedTotalList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Total>1</Total>
<Total>4</Total>
<Total>4</Total>
<Total>5</Total>
</SortedTotalList>
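As a cross-check of the expected list, here is a minimal Python sketch that selects only the Total elements and sorts them as numbers. It assumes the standard library's ElementTree and a well-formed copy of the input. It is not a replacement for the XSLT, only a way to confirm which nodes the path Items/Position/Total picks up.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """<Invoice>
  <From><Name>Lucy</Name><Country>UK</Country></From>
  <To><Name>John</Name><Country>US</Country></To>
  <Items>
    <Position><Name>Table</Name><Total>1</Total></Position>
    <Position><Name>Chair</Name><Total>4</Total></Position>
    <Position><Name>Cup</Name><Total>5</Total></Position>
    <Position><Name>Box</Name><Total>4</Total></Position>
  </Items>
</Invoice>"""

invoice = ET.fromstring(INVOICE_XML)

# Only <Total> under Items/Position is selected; From and To are ignored.
totals = sorted(int(t.text) for t in invoice.findall("Items/Position/Total"))
print(totals)

# Why numeric sorting matters: as text, "10" sorts before "4".
text_sorted = sorted(["4", "10"])
number_sorted = sorted(["4", "10"], key=int)
print(text_sorted, number_sorted)
```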

I am using Pig version 0.8. How can I extract specific elements of XML by using XPath()? I tried multiple ways but couldn't get it to work. Please suggest.

<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
This is the XML I am using.
I want to extract only the TITLE and COMPANY elements. Is there any way to extract them by using a regex or XPath()?
First thing you need to do is format your XML like so:
<CATALOG>
  <BOOK>
    <TITLE>Hadoop Defnitive Guide</TITLE>
    <AUTHOR>Tom White</AUTHOR>
    <COUNTRY>US</COUNTRY>
    <COMPANY>CLOUDERA</COMPANY>
    <PRICE>24.90</PRICE>
    <YEAR>2012</YEAR>
  </BOOK>
</CATALOG>
Then you can extract those elements like so:
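/CATALOG/BOOK/*[self::TITLE or self::COMPANY]
Element names in XPath are case-sensitive, so they must be written in upper case as in the document. You can find more about axes here: http://www.w3schools.com/xsl/xpath_axes.asp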
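To see which elements such a selection picks up, here is a minimal sketch in Python rather than Pig, assuming the standard library's ElementTree. ElementTree does not support the self:: axis, so the sketch takes all children of BOOK and filters on the tag name. It also shows that a lower-case name matches nothing.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<CATALOG>
  <BOOK>
    <TITLE>Hadoop Defnitive Guide</TITLE>
    <AUTHOR>Tom White</AUTHOR>
    <COUNTRY>US</COUNTRY>
    <COMPANY>CLOUDERA</COMPANY>
    <PRICE>24.90</PRICE>
    <YEAR>2012</YEAR>
  </BOOK>
</CATALOG>"""

catalog = ET.fromstring(CATALOG_XML)

# Keep only the children of <BOOK> whose tag is TITLE or COMPANY.
wanted = {"TITLE", "COMPANY"}
selected = [(el.tag, el.text) for el in catalog.findall("BOOK/*") if el.tag in wanted]
print(selected)

# Element names are case-sensitive: a lower-case path finds nothing.
lowercase_hits = catalog.findall("BOOK/title")
print(lowercase_hits)
```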

How to write an XPath expression comparing two attributes or nodes

Given the following sample:
<?xml version="1.0" encoding="UTF-8"?>
<Patients>
<patientRole>
<id extension="996-756-495" root="2.16.840.1.113883.19.5"/>
<id extension="775-756-495" root="2.16.840.1.113883.14.6"/>
<patient>
<name>
<given>Henry</given>
<family>Levin</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19320924"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="2012030111:32"/>
</patientRole>
<patientRole>
<id extension="65" root="2.16.840.1.113883.3.933"/>
<patient>
<name>
<given>Paul</given>
<family>Pappel</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19551217"/>
</patient>
<providerOrganization>
<id extension="84756-11241-283-OPTD-3322" root="1.2.3.4.5.6.1.8.9.0"/>
<name> Dr.med. Hans Topp-Glucklich</name>
</providerOrganization>
<admissionTime value="201201152200"/>
</patientRole>
<patientRole>
<id extension="800001" root="2.16.840.1.113883.19.5"/>
<patient>
<name>
<given>JEANNE</given>
<family>PETIT</family>
</name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19480105"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="20120101T22:00"/>
</patientRole>
</Patients>
How would I write an XPath expression for the following:
Family names for the male patients (gender code="M")
Any help is greatly appreciated. I am new to XML/XPath; I have tried multiple ways and none of them generates what I need.
This should work:
/Patients/patientRole/patient[administrativeGenderCode/@code='M']/name/family
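A quick way to check the result is a minimal Python sketch, assuming the standard library's ElementTree and a trimmed copy of the sample (only the name and gender elements are kept). ElementTree cannot test a child's attribute inside a predicate, so the sketch selects the matching administrativeGenderCode, steps up to patient with "..", then goes down to name/family.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample: only the elements the query touches.
PATIENTS_XML = """<Patients>
  <patientRole>
    <patient>
      <name><given>Henry</given><family>Levin</family></name>
      <administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
    </patient>
  </patientRole>
  <patientRole>
    <patient>
      <name><given>Paul</given><family>Pappel</family></name>
      <administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
    </patient>
  </patientRole>
  <patientRole>
    <patient>
      <name><given>JEANNE</given><family>PETIT</family></name>
      <administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1"/>
    </patient>
  </patientRole>
</Patients>"""

root = ET.fromstring(PATIENTS_XML)

# Match the gender element with code="M", go up to <patient>, then to name/family.
path = "./patientRole/patient/administrativeGenderCode[@code='M']/../name/family"
families = [el.text for el in root.findall(path)]
print(families)
```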

Resources