Failure of conditional combination index in eXist-db range index - exist-db

I ran into the following problem in eXist-db while configuring a range index that should only index elements carrying a particular attribute value.
<collection xmlns="http://exist-db.org/collection-config/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<index>
<range>
<create qname="tei:term">
<condition attribute="type" value="main"/>
<field name="mainTerm" type="xs:string"/>
</create>
</range>
</index></collection>
This error occurred: "/db/system/config/db/range/collection.xconf cvc-complex-type.2.4.a: Invalid content was found starting with element 'condition'. One of '{"http://exist-db.org/collection-config/1.0":field}' is expected."
Please help me.

The error you are getting is a schema validation error, triggered by the presence of the <condition> element used by the recently introduced conditional combined index feature.
I've submitted a fix. For now you can ignore the error, because the schema validation failure has no effect on the functionality.
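One thing worth checking separately from the schema warning: the configuration in the question uses the tei: prefix without declaring it. As far as I know, eXist-db resolves the prefix in qname against the namespaces declared in the configuration file, so an undeclared prefix can leave the index unused. A minimal sketch with the prefix bound, assuming tei is meant to be the TEI namespace http://www.tei-c.org/ns/1.0 (the question does not say so explicitly):

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <!-- Assumption: tei is bound to the TEI namespace; xs is declared as in the question -->
    <index xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <range>
            <create qname="tei:term">
                <condition attribute="type" value="main"/>
                <field name="mainTerm" type="xs:string"/>
            </create>
        </range>
    </index>
</collection>
```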

General Configuration Structure and Syntax
Index configuration files (collection.xconf) are standard XML documents whose elements and attributes are defined in the eXist-db namespace http://exist-db.org/collection-config/1.0. The following example shows a typical configuration:
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<!-- Full text index based on Lucene -->
<lucene>
<text qname="SPEECH">
<ignore qname="SPEAKER"/>
</text>
<text qname="TITLE"/>
</lucene>
<!-- Range indexes -->
<range>
<create qname="title" type="xs:string"/>
<create qname="author" type="xs:string"/>
<create qname="year" type="xs:integer"/>
</range>
<!-- N-gram indexes -->
<ngram qname="author"/>
<ngram qname="title"/>
</index>
</collection>
To use the new range index, wrap the range index definitions into a range element:
<collection xmlns="http://exist-db.org/collection-config/1.0">
<!--from Tamboti-->
<index xmlns:mods="http://www.loc.gov/mods/v3">
<lucene>
<text qname="mods:title"/>
</lucene>
<!-- Range indexes -->
<range>
<create qname="mods:namePart" type="xs:string" case="no"/>
<create qname="mods:dateIssued" type="xs:string"/>
<create qname="#ID" type="xs:string"/>
</range>
</index>
</collection>
Conditional combined indexes
For combined indexes, you can specify conditions to restrict the values being indexed to those contained in elements that have an attribute meeting certain criteria:
<range>
<create qname="tei:term">
<condition attribute="type" value="main"/>
<field name="mainTerm" type="xs:string"/>
</create>
</range>
This will only index the value of the tei:term element if it has an attribute named type with the value main. Multiple conditions can be specified in an index definition, in which case all conditions need to match in order for the value to be indexed.
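As a sketch of the multiple-condition case described above, the definition below adds a second condition. The subtype attribute and its value are hypothetical and only serve as an illustration; a tei:term element would have to satisfy both conditions to be indexed:

```xml
<range>
    <create qname="tei:term">
        <!-- Both conditions must match for the value to be indexed -->
        <condition attribute="type" value="main"/>
        <!-- Hypothetical second attribute, used only for illustration -->
        <condition attribute="subtype" value="preferred"/>
        <field name="mainTerm" type="xs:string"/>
    </create>
</range>
```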
Make sure your XML is valid. For further details, see the documentation: https://exist-db.org/exist/apps/doc/newrangeindex.xml

Related

XPath remove single node (via Saxon CLI)

I want to remove a node from an XML file (using SaxonHE9-8-0-11J):
<project name="Build">
<property name="src" value="src/main/resources" />
<property name="target" value="target/classes" />
<condition property="target.exists">
<available file="target" />
</condition>
</project>
Apparently there are two ways I can do this:
XPath 1.0: using a not() function
XPath 2.0: using an except clause
But both simply return the entire node-set.
With a not function:
saxonb-xquery -s:test.xml -qs:'*[not(local-name()="condition")]'
With an except clause:
saxonb-xquery -s:test.xml -qs:'* except condition'
With the -explain switch, the queries are:
<query>
<body>
<filterExpression>
<axis name="child" nodeTest="element()"/>
<operator op="ne (on empty return true())">
<functionCall name="local-name">
<dot/>
</functionCall>
<literal value="condition" type="xs:string"/>
</operator>
</filterExpression>
</body>
</query>
and
<query>
<body>
<operator op="except">
<axis name="child" nodeTest="element()"/>
<path>
<root/>
<axis name="descendant" nodeTest="element(condition, xs:anyType)"/>
</path>
</operator>
</body>
</query>
In general, XPath selects nodes from one or more input documents; it doesn't allow you to construct new ones. For that you need XSLT or XQuery. Removing the condition child of the project root, if that is what you want to achieve, is therefore a job for XSLT or XQuery. With XPath, even if you use /*/(* except condition), you get all children except the condition element, but as a sequence, not wrapped in a root element.
So with XQuery you could use
/*/element {node-name()} { * except condition }
as a compact but generic way to reconstruct any root with all child elements except the condition: https://xqueryfiddle.liberty-development.net/948Fn5b
Whether you can pass such an expression through a command-line shell is a different problem. On Windows, in both a PowerShell window and the cmd shell, it works for me to use
-qs:"/*/element {node-name()} { * except condition }"
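The answer names XSLT as the other option. A minimal sketch of that route, written as an XSLT 1.0 identity transform with one extra empty template; the match pattern /*/condition assumes, as in the sample file, that the element to drop is a direct child of the root:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Identity template: copy every attribute and node unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Empty template: a condition child of the root produces no output -->
    <xsl:template match="/*/condition"/>
</xsl:stylesheet>
```

Unlike the XQuery expression, this also keeps attributes, text, and comments of the root, because the identity template copies everything that the empty template does not match.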

How to print nested xml elements?

Sample xml:
<Root>
<Customers>
<Customer>
<CompanyName>Great Lakes Food Market</CompanyName>
<ContactName>Howard Snyder</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(503) 555-7555</Phone>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
<Region>OR</Region>
<PostalCode>97403</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
</Customers>
</Root>
In the above XML, when I use "Customer" as the root node with the XPath query "/Root/Customers/Customer", I'm unable to print the child nodes of "FullAddress". When I use "FullAddress" as the root node with the XPath query "/Root/Customers/Customer/FullAddress", I'm unable to print all the fields.
Please help me print all the XML elements, including the nested ones, in a single report.
The correct XPath query is
<queryString language="XPath">
<![CDATA[/Root/Customers/Customer]]>
</queryString>
This covers both of your nodes. To access the values inside the FullAddress node, use XPath in the fieldDescription as well when you define your field; Address, for example, is accessed through FullAddress/Address.
Example
If the field declaration of CompanyName is
<field name="CompanyName" class="java.lang.String">
<fieldDescription><![CDATA[CompanyName]]></fieldDescription>
</field>
then the field declaration of, for example, City is
<field name="City" class="java.lang.String">
<fieldDescription><![CDATA[FullAddress/City]]></fieldDescription>
</field>
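Following the same pattern, the remaining nested elements of the sample could be declared as below. This is only a sketch: the field names are chosen to mirror the element names, and java.lang.String is assumed for every field, as in the declarations above.

```xml
<!-- Paths are relative to the Customer node selected by the report query -->
<field name="Address" class="java.lang.String">
    <fieldDescription><![CDATA[FullAddress/Address]]></fieldDescription>
</field>
<field name="Region" class="java.lang.String">
    <fieldDescription><![CDATA[FullAddress/Region]]></fieldDescription>
</field>
<field name="PostalCode" class="java.lang.String">
    <fieldDescription><![CDATA[FullAddress/PostalCode]]></fieldDescription>
</field>
<field name="Country" class="java.lang.String">
    <fieldDescription><![CDATA[FullAddress/Country]]></fieldDescription>
</field>
```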

Appending multiple strings in Oracle SOA Suite in a BPEL process

I'm new to the Oracle SOA Suite and I have created a basic BPEL process that allows you to create a "User" with some basic values.
XSD:
//INPUT
<element name="User">
<complexType>
<sequence>
<element name="First_Name" type="string"/>
<element name="Last_Name" type="string"/>
<element name="Age" type="string"/>
<element name="DOB" type="string"/>
</sequence>
</complexType>
</element>
// String to output
<element name="processResponse">
<complexType>
<sequence>
<element name="Output" type="string"/>
</sequence>
</complexType>
</element>
So using this XSD I want to be able to create a user, and append all the values together and create a response/reply with my synchronous BPEL process.
I've tried simply adding all the values together using the '+' operator, but that doesn't work: it seems to try to cast the values to numbers, and the result comes out as "Jon NaN".
<copy>
<from>concat("Hello! ", $inputVariable.payload/ns1:Last_Name)</from>
<to>$outputVariable.payload</to>
</copy>
I've also tried nesting multiple concat() calls, but that gets ugly very quickly, and messy code is something I would like to avoid.
In summary, I want to take all the input values (First Name, Last Name, Age, DOB) and combine them into one string, with some padding and a few hard-coded strings, to show a nice message at the end.
Pass all the values you want to concatenate, separated by commas, to a single concat() call:
<copy>
<from>concat('Hello! ', $inputVariable.payload/ns1:Last_Name,' ',$inputVariable.payload/ns1:First_Name,' ',$inputVariable.payload/ns1:Age,' ',$inputVariable.payload/ns1:DOB)</from>
<to>$outputVariable.payload/Output</to>
</copy>
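Some background on the "NaN" result: in XPath 1.0 the + operator is numeric addition, so string operands are converted to numbers and a non-numeric string becomes NaN. concat() accepts any number of arguments (at least two), so nesting is not needed. A sketch of a more readable message follows; the label strings (" (age ", ", born ") are made up for illustration, and the variable paths are taken unchanged from the answer above:

```xml
<copy>
    <!-- One concat() call; literals and input values are simply listed in output order -->
    <from>concat('Hello! ', $inputVariable.payload/ns1:First_Name, ' ', $inputVariable.payload/ns1:Last_Name, ' (age ', $inputVariable.payload/ns1:Age, ', born ', $inputVariable.payload/ns1:DOB, ')')</from>
    <to>$outputVariable.payload/Output</to>
</copy>
```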

Ignoring diacritics when sorting facet values in Solr 4

Tl;dr: How can I get Solr 4 to ignore diacritics when sorting facet values?
I've added the following four documents to the "collection1" Solr core in the default Solr example:
<doc>
<field name="id">1</field>
<field name="cat">manuka</field>
<field name="cat">mystery</field>
</doc>
<doc>
<field name="id">2</field>
<field name="cat">mānuka</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">3</field>
<field name="cat">management</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">4</field>
<field name="cat">abc</field>
<field name="cat">stuff</field>
</doc>
The "cat" field is defined as:
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
and the "string" type is defined as:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
When I do a facet query on the "cat" field, sorted by value (http://localhost:8983/solr/collection1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=cat&facet.sort=index), I get:
....
"facet_fields":{
"cat":[
"abc",1,
"management",1,
"manuka",1,
"mystery",1,
"mānuka",1,
"stuff",3]},
....
Note that mānuka comes after mystery. I'd like to have mānuka come after manuka and before stuff, that is, I'd like the sort to ignore diacritics including the macron.
If this was a non-facet search, it looks like I could achieve what I want by setting up Collation for a separate copy field and sort by that (I can't set up collation for the field itself because the stored data will be a binary representation of the collation key). However, it looks like this approach isn't possible for facet queries since they can only be sorted by index or count.
Am I overlooking something? Is there some trick to get this working in an environment where I do need to display the value of the "cat" field?
The question is about customizing the index-order of a facet.
Your suggestion is to use Collation. You can do this and the order of your facets will be correct. The problem is that neither CollationField nor ICUCollationField overrides the indexedToReadable method.
The two classes cannot override indexedToReadable because in general the mapping from word to term is not invertible. In your case, though, you may be able to implement a subclass of ICUCollationField that overrides indexedToReadable in a sensible way.
Your starting point could be TestICUCollationField with
<fieldType name="sort_fr_t" class="solr.ICUCollationField" locale="fr" strength="primary"/>
...
<field name="sort_fr" type="sort_fr_t" indexed="true" stored="true" docValues="true" multiValued="true"/>
As you will see, in this case the facet values themselves come back in an unreadable form.

Declaring enumeration on elements in DTD [duplicate]

I'm constructing a DTD which has a fuel_system element.
I want to restrict the text inside the <fuel_system> element: it must be either carbureted or fuel-injected. How can I do this?
I'm not asking about an enumerated attribute such as type (carbureted | fuel-injected), because I want to enforce this rule on the content of <fuel_system>, not on an attribute of fuel_system.
When defining an element in a DTD, there is no way to restrict the text inside the element. You can only say which child elements it may contain and in what order, or that the element contains text, or a mixture of the two.
So, basically, you have two options for restricting <fuel-system>: either declare an attribute (<fuel-system type="fuel-injected"/>), or declare child elements <fuel-injected> and <carburated>. The choice between the two depends on what you are trying to describe and what will change depending on the type of fuel system.
(The grammar for element declarations is defined in the XML specification.)
Examples: first option, an attribute
<!ELEMENT fuel-system EMPTY>
<!ATTLIST fuel-system type (fuel-injected|carburated) #REQUIRED>
Second option, child elements
<!ELEMENT fuel-system (fuel-injected|carburated)>
<!ELEMENT fuel-injected ...>
<!ELEMENT carburated ...>
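To see the attribute option in context, here is a small self-contained document with an internal DTD subset. It is a sketch only: the attribute name type comes from the <fuel-system type="fuel-injected"/> example in the text, and the spelling of the values follows this answer rather than the question.

```xml
<?xml version="1.0"?>
<!DOCTYPE fuel-system [
    <!-- The element itself is empty; the allowed values live on the type attribute -->
    <!ELEMENT fuel-system EMPTY>
    <!ATTLIST fuel-system type (fuel-injected|carburated) #REQUIRED>
]>
<fuel-system type="fuel-injected"/>
```

A validating parser should reject this document if type is missing or carries any value outside the enumerated list.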
Does it have to be a DTD? Is XML Schema an option?
Using XML Schema you can restrict element text to an enumerated list of values:
<xs:element name="fuel-system">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="fuel-injected"/>
<xs:enumeration value="carbureted"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
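For completeness, a sketch of the same declaration wrapped in a full schema document, with example instances shown as comments. The value spelling follows the question (carbureted), and the invalid value diesel is made up for illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="fuel-system">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="fuel-injected"/>
                <xs:enumeration value="carbureted"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
    <!-- Valid instance:   <fuel-system>carbureted</fuel-system> -->
    <!-- Invalid instance: <fuel-system>diesel</fuel-system> -->
</xs:schema>
```

Note that xs:string preserves whitespace, so an instance with leading or trailing spaces or line breaks around the value would not match the enumeration. Using xs:token as the base type collapses such whitespace before the comparison, if that is the behaviour you want.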
