Cts search slow for one or two characters - performance

I have a query that fetches autosuggest entries from XML. It works fine for three or more characters, but is extremely slow for one or two characters.
Here's the XQuery I'm using; any corrections would be appreciated.
xquery version "1.0-ml";
declare default function namespace "local";
declare default collation "http://marklogic.com/collation/en/S1";
declare variable $search-term := xdmp:get-request-field("query", "b");
(: Builds two wildcard patterns from the term: "term*" and "* term". :)
declare function getWildCardedTerm($term as xs:string) as xs:string* {
  (fn:concat($term, "*"), fn:concat("* ", $term))
};
let $terms := getWildCardedTerm($search-term)
let $values := cts:search(//searchLabel,
  cts:and-query((
    cts:collection-query("/collections/autosuggest"),
    cts:element-word-query(xs:QName("suggestion"), $terms))))
let $suggestionsOrderedByLength :=
  for $value in $values
  order by fn:string-length($value//altLabel/text()), $value/altLabel/text()
  return $value
return <suggestions>{$suggestionsOrderedByLength[1 to 10]}</suggestions>
Is there a way to sort during the search?

The Search Developer's Guide recommends combining trailing-wildcard searches with a word lexicon. There are also the three character searches, two character searches, and one character searches index options, which you may not have enabled yet, but they require substantial extra disk space.
But I'm afraid that MarkLogic indexes and lexicons are not particularly optimized for this use case. You can get values in frequency order and item order directly from range indexes using cts:values, but not values ordered by length. If that were possible, you could have used it with a cts:index-order to apply sorting to your cts:search.
I would recommend taking a look at cts:value-match though, and applying it to your suggestion element. It only requires a range index, and no further settings. It also does not pull actual fragments from disk like cts:search does, and runs entirely from in-memory data. That will likely be faster.
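As a rough sketch of that suggestion, the query below uses cts:element-value-match (the element-specific sibling of cts:value-match) and sorts the matched values by length in memory. It assumes a string element range index on suggestion with the collation from the question, and it only covers the leading "term*" pattern; neither assumption is confirmed by the question itself.

```xquery
xquery version "1.0-ml";

(: Assumption: a string element range index exists on "suggestion"
   with this collation. The question does not show the index setup. :)
declare variable $collation := "http://marklogic.com/collation/en/S1";
declare variable $search-term := xdmp:get-request-field("query", "b");

(: Values come straight from the range index; no fragments are read. :)
let $matches :=
  cts:element-value-match(
    xs:QName("suggestion"),
    fn:concat($search-term, "*"),
    fn:concat("collation=", $collation),
    cts:collection-query("/collections/autosuggest")
  )
(: The lexicon cannot order by length, so sort the values in memory. :)
let $ordered :=
  for $m in $matches
  order by fn:string-length($m), $m
  return $m
return
  <suggestions>{
    for $m in $ordered[1 to 10]
    return <suggestion>{$m}</suggestion>
  }</suggestions>
```

Note that this returns the matching values rather than the searchLabel elements, so it fits best when the suggestion text is all the client needs.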
HTH!

Related

I'm having trouble with the sort order in a kbmmemtable

I have a Delphi 10.1 Berlin application that is using a kbmMemTable 7.82 table. It is indexed on two fields: lastname and firstname.
My problem is that the order is wrong when an apostrophe is encountered. It may also happen with other characters, but I haven't tested it extensively.
Does anyone have an idea what might be wrong? Any help would be appreciated.
I've tried it on both Delphi 7 and Delphi 10.1 Berlin and get the same results. I've tried having the index be case-sensitive and case-insensitive. In all four cases, I get the same results. I've also tried creating the index programmatically and through the object inspector.
Here is the generic form of how I created the index programmatically:
Table.Open;
IndexDef := Table.IndexDefs.AddIndexDef;
IndexDef.Name := IndexByName;
IndexDef.Fields := 'lastname;firstname';
IndexDef.Options := [ixCaseInsensitive];
Table.Indexes.Add(IndexDef);
Table.IndexName := IndexByName;
Table.IndexFieldNames := 'lastname;firstname';
Table.CreateIndexes;
Table.EnableIndexes := True;
Table.First;
The index puts O'HARA after OLSON, instead of before. This works the same with and without case sensitivity.
It's because the locale settings define the order.
Try setting mtifoIgnoreLocale in the index definition. Alternatively, set up a locale that provides the correct sort order.
With mtifoIgnoreLocale set, it will do a simple sort.
best regards
Kim/C4D

Compare index function with JavaScript function?

Is it possible to compare the function of an existing index with an ordinary native function in JavaScript?
For instance, I might create an index with the following code:
r.table('Table').indexCreate('index', document => document.hasFields('field'));
I might then, later, wish to determine whether the index has the same function:
document => document.hasFields('field')
Using indexStatus(), two properties may be able to help with this.
First, function is a Buffer representing the function, and can be compared with Buffers obtained from other indexStatus() objects. However, it's not clear how this could be compared with a native JavaScript function. A new index could be created, and then its Buffer compared with the Buffer of the existing index, but this would be a messy and performance-impacting workaround.
Second, query is a string containing a function resembling that which was provided to indexCreate(). However, this property seems to be undocumented, and the value is not always exactly the same as the function provided to indexCreate(), with changes to variable names and the transformation of arrow functions to function expressions. I've written some rough code which tries to work with this approach, although it's imperfect, given the opaque nature of the rules by which the query value is generated by RethinkDB.

How to ignore "stop words" while sorting in MarkLogic?

Is there any way to ignore "stop words" while sorting?
For example:
I have words like
dixit
singla
the marklogic
When sorting in descending order, the result should be
singla, the marklogic, dixit
As in the above example, "the" is ignored.
Any way to achieve this?
Update:
Stop words can occur anywhere in the text.
For example:
the MarkLogic
MarkLogic is the best
the MarkLogic is awesome
The sort should not consider any stop word in the text.
The above is just a small example to describe the problem.
In practice, I am using the search:search API.
For sorting, I am using the sort-order search option.
The element on which I have to sort is dynamic. There are approximately 30-35 elements.
Is there any way to customize the collation at this level, for example by configuring some words (stop words) to be ignored while sorting?
There is no standard collation URI that is going to do this for you (at least none that I've ever seen). You can do it dynamically, of course, by sorting on the result of a function invocation, but if you want it done efficiently at scale (and available to search:search), then you need to materialize the sortable string into your document. I've often done this as an attribute on the element:
<title sortable="Great Gatsby, The">The Great Gatsby</title>
Then you put a range index on the title/@sortable attribute.
You can also use the "envelope pattern" where materialized metadata like this is maintained in its own section of the document with the original kept in its own section. For things like this, I think it's a bit more elegant to decorate the elements directly, to keep the context.
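A minimal sketch of how such a sortable value could be computed at ingest time is shown below. Unlike the "Great Gatsby, The" example, it simply drops stop words wherever they occur, which matches the update in the question. The stop-word list and the function name local:sortable are illustrative assumptions, not part of any MarkLogic API.

```xquery
xquery version "1.0-ml";

(: Hypothetical stop-word list; adjust it to your own data. :)
declare variable $stop-words := ("a", "an", "the", "is");

(: Hypothetical helper: returns $text with every stop word removed,
   compared case-insensitively, with the remaining words joined by
   single spaces. :)
declare function local:sortable($text as xs:string) as xs:string {
  fn:string-join(
    for $word in fn:tokenize(fn:normalize-space($text), " ")
    where fn:not(fn:lower-case($word) = $stop-words)
    return $word,
    " "
  )
};

let $title := "the MarkLogic is awesome"
return <title sortable="{local:sortable($title)}">{$title}</title>
```

For the sample title this yields a sortable attribute of "MarkLogic awesome" while the element content keeps the original text. The decorated element would then be stored in the document and the range index would sort on the attribute.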
If I understand your question correctly, you're trying to get rid of the definite article when sorting your result set.
In order to do this, you need to use some additional functions and create a 'sort' criterion. My solution would look like this (I'm also including some sample documents so that you can test this just by copy-pasting):
(:
xdmp:document-insert("/peter.xml", <person><firstName>Peter</firstName><lastName>O'Toole</lastName><age>60</age></person>);
xdmp:document-insert("/john.xml", <person><firstName>John</firstName><lastName>Adams</lastName><age>18</age></person>);
xdmp:document-insert("/simon.xml", <person><firstName>Simon</firstName><lastName>Petrov</lastName><age>22</age></person>);
xdmp:document-insert("/mark.xml", <person><firstName>Mark</firstName><lastName>the Lord</lastName><age>25</age></person>);
:)
for $person in /person
let $sort := fn:reverse(fn:tokenize($person/lastName, ' '))[1]
order by $sort
(: return $person :)
return $person/lastName/text()
Notice that now the sort order is going to be
- Adams
- the Lord
- O'Toole
- Petrov
I hope this will help.

Marklogic - how to execute insert and node replace in single xquery

I have provided a sample XQuery for an insert and a node-replace.
My problem is that when I execute the query, it only inserts the XML the first time.
Only when I execute it a second time is the node replaced.
But I want a single execution to process both the insert and the update.
Sample query:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
let $a := xdmp:document-insert("/fo.xml", <a>1</a>)
let $b := xdmp:node-replace(fn:doc("/fo.xml")/a, <b>1</b>)
return ($a,$b)
You cannot do that in a single statement. You need to either:
1. use a multi-statement call (main statements separated by a semicolon), or
2. do the update in memory, and pass the updated content to the insert.
Regarding 1:
The Application Developers Guide has a section about Semi-Colon as a Statement Separator, but in short it comes down to writing your query as follows:
xquery version "1.0-ml";
xdmp:document-insert("/fo.xml", <a>1</a>)
;
xquery version "1.0-ml";
xdmp:node-replace(fn:doc("/fo.xml")/a, <b>1</b>)
Regarding 2:
There are libraries that provide methods that operate very similarly to the xdmp:node-* ones, but on content that has not yet been persisted to the database. You will need to download them first and upload them to your modules database in order to use them. The best version I am aware of is https://github.com/ryanjdew/XQuery-XML-Memory-Operations. You can download and install that using MLPM. With that library you could write something like:
import module namespace mem = "http://maxdewpoint.blogspot.com/memory-operations" at "/ext/mlpm_modules/XQuery-XML-Memory-Operations/memory-operations.xqy";
let $a := <a>1</a>
let $b := <b>1</b>
let $updated-a := mem:copy($a) ! (
mem:replace(., $a, $b),
mem:execute(.)
)
return xdmp:document-insert("/fo.xml", $updated-a)
HTH!
Based on grtjn's answer you can do the following:
xdmp:document-insert("/fo.xml", <a>1</a>);
xdmp:node-replace(fn:doc("/fo.xml")/a, <b>1</b>);
Notice the semicolon at the end of each line. Those are statement separators. To understand more about statement separators (and transactions in general), you can refer to the following resource: https://docs.marklogic.com/guide/app-dev/transactions#id_11899.

Selecting multiple results from XQUERY query

I am trying to select multiple columns from a query, but so far, I can only manage to select one. So I'm basically stuck with either selecting one, or all of them.
Here's my expression so far, which selects only one column:
let $y := doc("http://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders")//table[preceding-sibling::h2//span[string() = "K2"]][1]
return $y/tr/td[2]/string()
I would love some explanation of how one would go about doing this, since there's almost no documentation of this lovely language.
How would you like the result to be returned? You could construct new elements, or concatenate strings. There are many ways that this could be accomplished.
Here's one way to get comma-separated values:
return $y/tr/fn:string-join( (td[2] | td[4]), ", " )
You can try it on zorb.io.
Update
(td[2] | td[4]) selects both elements, and passes them, as a sequence, to fn:string-join(). | is the XQuery union operator (and can be substituted with the keyword union).
As far as documentation goes, the functx site documents the standard library (all fn-prefixed functions), and has useful examples. And the specs are surprisingly readable.
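The other option mentioned above, constructing new elements, might look roughly like the sketch below. The element names death, col2 and col4 are placeholders chosen for illustration, since the actual column meanings are not given in the question.

```xquery
let $y := doc("http://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders")//table[preceding-sibling::h2//span[string() = "K2"]][1]
(: Skip header rows, which contain th cells but no td cells. :)
for $row in $y/tr[td]
return
  (: Placeholder element names; rename them to match the real columns. :)
  <death>
    <col2>{ string($row/td[2]) }</col2>
    <col4>{ string($row/td[4]) }</col4>
  </death>
```

Each table row becomes one death element, so the selected columns stay grouped per row instead of being flattened into a single sequence of strings.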
