Implementing natural sort in XQuery on MarkLogic


I wasn't finding any readily-available natural sort implementations in XQuery and thought I'd give it a shot. Using stuff found here as a reference, this is what I came up with:
xquery version "1.0-ml";
declare namespace xf = "http://www.w3.org/2005/xpath-functions";
declare namespace nat-sort = "https://github.com/JKesMc9tqIQe9M/xquery-natural-sort";
import module namespace functx = "http://www.functx.com" at "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";
declare %public function nat-sort:nat-sort($s as item()*) as item()* {
  let $pad-length := fn:max($s ! fn:analyze-string(., '([0-9]+)')/xf:match ! fn:string-length(.)) + 1
  for $item in $s
  let $sort-key := fn:string-join(
    let $decomposed := fn:analyze-string($item, '([0-9]+)')/xf:*
    for $i in $decomposed
    return
      if ($i/self::xf:match)
      then fn:string(functx:pad-integer-to-length(fn:number($i), $pad-length))
      else fn:string($i),
    '')
  order by $sort-key
  return $item
};
It seems to work, at least with the few test cases I have constructed, but I'm wondering if (1) there's a more concise way to do it, (2) there's an existing/standard way to do it that I'm unaware of, or (3) there are obvious problems with this approach. Thanks!

The set of collations available in XQuery depends on your XQuery implementation. You can choose a collation as an option in the "order by" clause. XQuery 3.1 brings support for the Unicode Collation Algorithm, which allows you to request the sort order you are asking for using the collation URI
http://www.w3.org/2013/collation/UCA?numeric=yes
but with XQuery 1.0 or 3.0 implementations, you'll have to check the documentation for your chosen product. Saxon has an alphanumeric collation which fits the bill, but I don't know about MarkLogic.
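As a language-neutral illustration of the padding idea used in the question, here is a minimal Python sketch. It is not MarkLogic code, and the function name natural_sort and the sample file names are made up for the example. It pads every digit run to one more than the longest run found in the input, as the XQuery version does, and sorts on the padded string.

```python
import re


def natural_sort(items):
    """Sort strings so that embedded digit runs compare numerically."""
    # Collect every run of digits across all items to find the longest one.
    digit_runs = [run for s in items for run in re.findall(r"[0-9]+", s)]
    # Pad to one more than the longest run, mirroring $pad-length in the question.
    pad_length = max((len(run) for run in digit_runs), default=0) + 1

    def sort_key(s):
        # Left-pad each digit run with zeros; non-digit text is left unchanged.
        return re.sub(r"[0-9]+", lambda m: m.group(0).zfill(pad_length), s)

    return sorted(items, key=sort_key)


print(natural_sort(["file10", "file2", "file1"]))  # ['file1', 'file2', 'file10']
```

Like the XQuery version, this compares the non-digit parts as plain strings, so a collation is still likely to be the better choice where the implementation offers one.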

Related

Enumeration (enum in lua) .Want to use enum in lua5.2.4

In my Lua script I want to use an enum, for example 1 for SUCCESS and 0 for FAILURE.
I am using Lua version 5.2.4.
Can anyone please help me with how to use an enum in code like this?
elseif(cc_config_cmd == "DELETE" and file_found==1)then
api:executeString("callcenter_config queue unload " .. queue_name)
stream:write("1")
else
stream:write("0")
end

There are no enums in Lua.
Simply define variables.
SUCCESS = "1"
FAILURE = "0"
stream:write(SUCCESS)
Or put it into a table which would be quite similar to enum style syntax.
Result = {SUCCESS = "1", FAILURE = "0"}
stream:write(Result.SUCCESS)

As far as I know, there are no enums in Lua; you can use strings, as in your current code. Strings are interned inside the Lua virtual machine, so they will not be duplicated in memory.
Another option will be to use numbers in place of strings.
local COMMAND_DELETE = 1
local COMMAND_TEST_1 = 2
local COMMAND_TEST_2 = 3
Other options would be to use a third-party package such as the enum package, or maybe to go further and use a Lua preprocessor.

XQuery/Xpath referring to xml elements with no namespace, in a namespace environment

In XQuery 3.1 (under eXist-DB 4.7) I receive XML data like this, with no namespace:
<edit-request id="TC9999">
<title-collection>foocolltitle</title-collection>
<title-exempla>fooextitle</title-exempla>
<title-short>fooshorttitle</title-short>
</edit-request>
This is assigned to a variable $content and this statement:
let $collid := $content/edit-request/@id
...correctly returns: TC9999
Now, I need to actually transform all the data in $content into a TEI xml document.
I first need to get some info from an existing TEI file, so I assigned another variable:
let $oldcontent := doc(concat($globalvar:URIdata,$collid,"/",$collid,".xml"))
And then I create the new TEI document, referring to both $content and $oldcontent:
let $xml := <listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="{$collid}">
<bibl>
<idno type="old_sql_id">{$oldcontent//tei:idno[@type="old_sql_id"]/text()}</idno>
<title type="collection">{$content//title-exempla/text()}</title>
</bibl>
</listBibl>
The references to the TEI namespace in $oldcontent come through, but to my surprise the references to $content (no namespace) don't show up:
<listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="TC9999">
<bibl>
<idno type="old_sql_id">1</idno>
<title type="collection"/>
</bibl>
</listBibl>
The question is: how do I refer to the non-namespace elements in $content in the context of let $xml=...?
NB: the XQuery document has a declaration at the top (as it is the principal namespace of virtually all the documents):
declare namespace tei = "http://www.tei-c.org/ns/1.0";

In essence you are asking how to write an XPath expression to select nodes in an empty namespace in a context where the default element namespace is non-empty. One of the most direct solutions is to use the "URI plus local-name syntax" for writing QNames. Here is an example:
xquery version "3.1";
let $x := <x><y>Jbrehr</y></x>
return
<p xmlns="foo">Hey there,
{ $x/Q{}y => string() }!</p>
If instead of $x/Q{}y the example had used the more common form of the path expression, $x/y, its result would have been an empty sequence, since the local name y used to select the <y> element specifies no namespace and thus inherits the foo element namespace from its context. By using the "URI plus local-name syntax", though, we are able to specify the empty namespace we are looking for.
For more information on this, see the XPath 3.1 specification's discussion of expanded QNames: https://www.w3.org/TR/xpath-31/#doc-xpath31-EQName.

Sitecore item multilistfield XPATH builder

I'm trying to use XPath Builder in Sitecore to count the number of items that have more than 5 values in a multilist field.
I cannot find a way to count the "|" separators in the raw values, so I am stuck.
Any info will be helpful.
Thank you.

It's been a long time since I used XPath in Sitecore - so I may have forgotten something important - but:
Sadly, I don't think this is possible. XPath Builder doesn't really run proper XPath. It understands a subset of things that would evaluate correctly in a full XPath parser.
One of the things it can't do (on the v8-initial-release instance I have to hand) is process XPath that returns things that are not Sitecore Items. A query like count(/sitecore/content/*) should return a number, but if you try to run it using either the Sitecore Query syntax or the XPath syntax option, you get an error.
If you could run such a query, then your answer would be based on an expression like this, to perform the count of GUIDs referenced by a specific field:
string-length( translate(/yourNodePath/@yourFieldName, "ABCDEFabcdef0123456789{}-", "") ) + 1
(Typed from memory, as I can't run a test - so may not be entirely correct)
The translate() function replaces each character of its first argument that appears in the second argument with the character at the same position in the third; because the third argument here is empty, those characters are simply deleted. Hence (if I've typed it correctly) that expression should remove all your GUIDs and leave just the pipe-separator characters, so one plus the length of the remaining string is your answer for each Item you need to process.
But, as I say, I don't think you can actually run that from Query Builder...
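To make the counting trick concrete outside Sitecore, here is a small Python sketch of the same idea: delete every character that can occur in a GUID and count what is left. The function name count_selected and the sample GUIDs are invented for illustration, and the sketch assumes raw values are brace-wrapped GUIDs joined by "|".

```python
# Characters that can appear inside a brace-wrapped GUID (either letter case).
GUID_CHARS = "ABCDEFabcdef0123456789{}-"


def count_selected(raw_value):
    """Count pipe-separated GUIDs by deleting every GUID character."""
    if not raw_value:
        # An empty field has no selections; without this check the result would be 1.
        return 0
    separators = raw_value.translate(str.maketrans("", "", GUID_CHARS))
    # N GUIDs are joined by N - 1 pipes.
    return len(separators) + 1


# Made-up sample value with two selections.
raw = "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}|{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}"
print(count_selected(raw))  # 2
```

The empty-field check matters for the XPath expression too: as written, it would report 1 for a field with no selections.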
These days, people tend to use Sitecore PowerShell Extensions to write ad-hoc queries like this. It's much more flexible and powerful - so if you can use that, I'd recommend it.
Edited to add: This question got a bit stuck in my head - so if you are able to use PowerShell, here's how you might do it:
Assuming you have declared where you're searching, what MultiList field you're querying, and what number of selections Items must exceed:
$root = "/sitecore/content/Root"
$field = "MultiListField"
$targetNumber = 3
then the "easy to read" code might look like this:
foreach($item in Get-ChildItem $root)
{
    $currentField = Get-ItemField $item -ReturnType Field -Name $field
    if($currentField)
    {
        $count = $currentField.Value.Split('|').Count
        if($count -gt $targetNumber)
        {
            $item.Paths.Path
        }
    }
}
It iterates the children of the root item you specified, and gets the contents of your field. If that field has a value, it then splits that into GUIDs and counts them. If the result of that count is greater than your threshold, it outputs the item's path.
You can get the same answer out of a (harder to read) one-liner, which would look something like:
Get-ChildItem $root | Select-Object Paths, @{ Name="FieldCount"; Expression={ Get-ItemField $_ -ReturnType Field -Name $field | % { $_.Value.Split('|').Count } } } | Where-Object { $_.FieldCount -gt $targetNumber } | % { $_.Paths.Path }
(Not sure if that's the best way to write that - I'm no expert at PowerShell syntax - but it gives the same results as far as I can see)

MarkLogic: how to search for an XPath using cts:uris

Is it possible to search for a uri whose document contains a certain XPath using cts:uris()? I thought it may be quicker than returning uris from a cts:search. Here is what I have currently:
declare function local:xpath-search($collection) {
for $i in cts:search(//a/b, cts:and-query((cts:collection-query($collection)) ))[1] return fn:base-uri($i)
} ;
Is there a quicker way to return documents that contain a match to the XPath //a/b, using cts:uris()?

You can use cts:element-query() to construct a cts:query that functions similarly to the XPath expression //a/b, searching for documents that have a elements with b element descendants. It isn't exactly the same, and might give you some false positives, because it is really more akin to //a//b, but it might be acceptable and can be used with cts:uris().
xquery version "1.0-ml";
declare function local:xpath-search($collection) {
cts:uris("", (),
cts:and-query((
cts:collection-query($collection),
cts:element-query(xs:QName("a"),
cts:element-query(xs:QName("b"), cts:and-query(()) ) ) )) )
};

CAS registry to Pubchem cid identifier conversion in R

An annoying problem many chemists are faced with is to convert CAS registry numbers of chemical compounds (stored in some commercial database that is not readily accessible) to Pubchem identifiers (openly available). Pubchem kind of supports conversion between the two, but only through their manual web interface, and not their official PUG REST programmatic interface.
A solution in Ruby is given here, based on the e-utilities interface: http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby/
Does anybody know how this would translate into R?
EDIT: based on the answer below, the most elegant solution is:
library(XML)
library(RCurl)
CAStocids=function(query) {
xmlresponse = xmlParse( getURL(paste("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query,sep="") ) )
cids = sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
return(cids)
}
> CAStocids("64318-79-2")
[1] "6434870" "5282237"
cheers,
Tom

This is how the Ruby code does it, translated to R; it uses RCurl and XML:
> xmlresponse = xmlParse( getURL("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=64318-79-2") )
and here's how to extract the Id nodes:
> sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
[1] "6434870" "5282237"
wrap all that in a function....
convertU = function(query){
xmlresponse = xmlParse(getURL(
paste0("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query)))
sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
}
> convertU("64318-79-2")
[1] "6434870" "5282237"
> convertU("64318-79-1")
list()
> convertU("64318-78-2")
list()
> convertU("64313-78-2")
[1] "313"
maybe needs a test if not found.
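One way to handle the not-found case is to separate the parsing from the HTTP call and always return a plain list. The Python sketch below shows that idea only; the function name extract_cids is made up, and the two XML strings are abbreviated, hand-written stand-ins for an esearch response rather than real server output.

```python
import xml.etree.ElementTree as ET


def extract_cids(esearch_xml):
    """Return the text of every <Id> element, or an empty list if there are none."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.iter("Id")]


# Abbreviated stand-ins for an esearch response body (assumption, not real output).
found = "<eSearchResult><IdList><Id>6434870</Id><Id>5282237</Id></IdList></eSearchResult>"
not_found = "<eSearchResult><IdList/></eSearchResult>"

print(extract_cids(found))      # ['6434870', '5282237']
print(extract_cids(not_found))  # []
```

A caller can then test for an empty list instead of special-casing the return type.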

I think you should still be able to convert CAS numbers to PubChem IDs using PUG REST, entering the CAS number where you would normally enter the name of the compound. Of course this might not be as specific if the CAS numbers overlap. I haven't tested it.
An example with aspirin
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/50-78-2/cids/JSON
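A rough Python sketch of that lookup follows, kept offline so it can be read without network access. The helper names pug_cid_url and parse_cids are invented, and the JSON shapes it expects (an "IdentifierList" object holding a "CID" array on success, something else on failure) are an assumption about the response that should be checked against the PUG REST documentation.

```python
import json
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name"


def pug_cid_url(cas_number):
    """Build the name-lookup URL, passing the CAS number as the compound name."""
    return f"{PUG_BASE}/{quote(cas_number)}/cids/JSON"


def parse_cids(body):
    """Pull the CID list out of a response body; empty list if it is absent."""
    data = json.loads(body)
    return data.get("IdentifierList", {}).get("CID", [])


print(pug_cid_url("50-78-2"))
# Hand-written sample body in the assumed shape, not fetched from the service.
print(parse_cids('{"IdentifierList": {"CID": [2244]}}'))
```

Fetching the URL (for example with urllib.request.urlopen) and passing the decoded body to parse_cids would complete the lookup.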
