By default XSLT sorts numerically, then alphabetically.
What if I want it to sort alphabetically first, then numerically?
<root>
<item>B3</item>
<item>A1</item>
<item>C2</item>
<item>3B</item>
<item>2C</item>
<item>1A</item>
</root>
I'd want:
<root>
<item>A1</item>
<item>B3</item>
<item>C2</item>
<item>1A</item>
<item>2C</item>
<item>3B</item>
</root>
The thing is, I don't know how long the names are or how many letters and numbers they contain. It could be 1054-FS or C104-G. Also, C20-H should come before C101-H.
Is that something easy to achieve without knowing what will be pushed through?
Thanks.
I think that what you are saying is that you want a collation sequence in which letters precede digits.
If you want C20 to precede C101 then you are also looking for a collation that groups consecutive digits and sorts them as numbers.
It's incorrect to say that XSLT (always) sorts digits before letters. The default collation depends on the XSLT processor you are using; it's not defined in the language specification.
In XSLT 3.0, and in Saxon 9.6, you can use the Unicode Collation Algorithm to achieve this effect. You would write
<xsl:sort select="..."
collation="http://www.w3.org/2013/collation/UCA?numeric=yes;reorder=Latn,digit"/>
(I haven't tested this specific example).
If you're using some other processor, you'll have to check its documentation to see what collation options it provides. You can sometimes play tricks, for example by using <xsl:sort select="translate(....)"/> to compute a sort key that works the way you want.
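To make the target ordering concrete, here is a minimal Python sketch of the sort key such a collation would have to compute: the string is split into runs of digits and non-digits, letter runs rank before digit runs, and digit runs compare by numeric value. The name `letters_first_key` is made up for illustration; this only demonstrates the ordering and is not a substitute for the processor's own collation support.

```python
import re

def letters_first_key(s):
    """Sort key: non-digit runs rank before digit runs; digit runs compare as numbers."""
    key = []
    # Each match is a (digits, text) pair in which exactly one part is non-empty.
    for digits, text in re.findall(r"(\d+)|(\D+)", s):
        if digits:
            key.append((1, int(digits)))  # rank 1: digits sort after letters, by value
        else:
            key.append((0, text))         # rank 0: text compares as a plain string
    return key

items = ["B3", "A1", "C2", "3B", "2C", "1A", "C101-H", "C20-H"]
print(sorted(items, key=letters_first_key))
```

The rank in the first tuple slot is what puts letters ahead of digits; it also guarantees that a string is never compared against an integer.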
For reasons related to app functionality, we need to massage certain data incoming to a system by replacing an integer value with a fixed length decimal value
Example:
Before
<smile:ordinary code:type="Fields" code:value="25">
After
<smile:ordinary code:type="Fields" code:value="25.000000000">
I tried using sed in place with a regex group, such as the one below:
sed -i 's/\(ordinary.*"[0-9]\+\)/\1.000000000/'
This works fine, but a file watcher triggers whenever the file is modified, and if it receives a file that is already formatted, it ends up appending an extra set of zeros:
<smile:ordinary code:type="Fields" code:value="25.000000000.000000000">
I've also struggled to get this working with awk and printf. Ideally, I'd replace the value only when it is strictly an integer. I've considered using an XSL transform as well, but I'm not as well versed there as with shell commands. I'm open to all suggestions, including writing a shell script that loops through each line.
Very easily done in XSLT. It just needs a stylesheet with two rules: the standard identity template, which copies everything unchanged by default, plus a rule for the attribute:
<xsl:template match="smile:ordinary/@code:value">
<xsl:attribute name="code:value">
<xsl:value-of select="format-number(., '#.000000000')"/>
</xsl:attribute>
</xsl:template>
Plus the required namespace declarations, of course.
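If staying with a script is preferable, the double-append problem can be avoided by matching only values that consist solely of digits up to the closing quote. Below is a minimal Python sketch under that assumption. The name `pad_integer_values` is hypothetical, and the regex is text-based rather than XML-aware, so it assumes the attribute is double-quoted and sits on the same tag, as in the example.

```python
import re

# Matches code:value="<digits>" on a smile:ordinary tag. The closing quote must
# follow the digits directly, so a value such as "25.000000000" does not match.
_INT_VALUE = re.compile(r'(<smile:ordinary\b[^>]*?\bcode:value=")(\d+)(")')

def pad_integer_values(text, decimals=9):
    """Append a fixed number of decimal zeros to integer-only code:value attributes."""
    def repl(match):
        prefix, digits, quote = match.groups()
        return f"{prefix}{digits}.{'0' * decimals}{quote}"
    return _INT_VALUE.sub(repl, text)

line = '<smile:ordinary code:type="Fields" code:value="25">'
once = pad_integer_values(line)
twice = pad_integer_values(once)  # a second pass leaves the value unchanged
print(once)
print(once == twice)
```

Because an already-padded value no longer matches, the file watcher can re-run the transformation on its own output without stacking zeros.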
Is there a way to generate a random number with XPath? My input is any well-formed XML; the output should be a random integer of a given length.
I can usually do this in code or in XSLT, but I'm struggling to find a working XPath expression.
XPath 3.1 has a function fn:random-number-generator().
In earlier XPath versions you'll need to improvise.
When asking XPath questions please say which version you are using - the ancient XPath 1.0 is still in widespread use so it's impossible to make guesses.
Since this question is the top Google result, I'll add that in XPath 3.1, Mr. Kay's answer can be applied like this:
Say you want a random element. Count the elements:
<xsl:variable name="random-upper-limit" select="count(/myroot/mypath/myelement)"/>
Then get a random index number by taking the first item of a random permutation:
<xsl:variable name="my-random-number" select="random-number-generator()?permute(1 to $random-upper-limit)[1]"/>
I've got a massive file of hex-encoded MD5 values that I'm sorting with the Linux 'sort' utility. The result is that the hashes come out in sequential order (which is what I need for the next stage of processing). E.g.:
000001C35AE83CEFE245D255FFC4CE11
000003E4B110FE637E0B4172B386ACAC
000004AAD0EB3D896B654A960B0111FA
In the interest of speeding up the sort operation (and making the files smaller), I was considering encoding the data as base32 or base64.
The question is, would an alpha-sort of the base32/64 data get me the same result? My quick tests seem to indicate that it would work. For example, the above three hex strings correspond 1:1 to these base64 strings:
AAABw1roPO/iRdJV/8TOEQ==
AAAD5LEQ/mN+C0Fys4asrA==
AAAEqtDrPYlrZUqWCwER+g==
But I'm unsure as to the sort order when it comes to special characters used in Base64 like "/" and "+" and how those would be treated in the context of an alpha sort.
Note: I happen to be using the linux sort utility but the question still applies to other alpha-sorting tools. The tool used is not really part of the question.
I've since discovered that this isn't possible with the standard base32/64 alphabets. There is, however, a base32 variation called "base32hex" which preserves sort order, but there is no official "base64hex" equivalent.
That seems to leave creating a custom encoding.
EDIT:
This turned out to be trivial to solve. Encode in base64, then translate character by character using a custom table whose characters are in ascending sort order.
Map from the standard MIME base64 characters:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
To something like this:
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz|~"
Then sorting will work.
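The mapping above can be sketched in Python as follows. The function names are made up for illustration. The sketch assumes all inputs have the same length (true for MD5 digests) and that the sort compares raw bytes, because the custom table is in ascending ASCII order; with GNU sort that typically means running under the C locale.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
SORTABLE = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz|~"

# Byte-for-byte translation tables between the two alphabets.
# The '=' padding character is in neither table and passes through unchanged.
_TO_SORTABLE = bytes.maketrans(STANDARD, SORTABLE)
_TO_STANDARD = bytes.maketrans(SORTABLE, STANDARD)

def encode_sortable(raw: bytes) -> bytes:
    """Base64-encode, then remap to an alphabet in ascending ASCII order."""
    return base64.b64encode(raw).translate(_TO_SORTABLE)

def decode_sortable(encoded: bytes) -> bytes:
    """Reverse the remapping, then base64-decode."""
    return base64.b64decode(encoded.translate(_TO_STANDARD))

hashes = [
    "000004AAD0EB3D896B654A960B0111FA",
    "000001C35AE83CEFE245D255FFC4CE11",
    "000003E4B110FE637E0B4172B386ACAC",
]
encoded = [encode_sortable(bytes.fromhex(h)) for h in hashes]
for item in sorted(encoded):
    print(item.decode("ascii"), decode_sortable(item).hex().upper())
```

Sorting the encoded values bytewise then yields the same order as sorting the original digests.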
Is anyone familiar with the sorting logic in the FAST ESP engine, version 5.3? How are special characters handled, and how is sorting of Japanese and Chinese words performed?
Here are the top 8 search results, sorted in ascending order:
門
¿ c
¿ c¡a «n »c ‹e ›r § ¶~#15
¿ c¡a «n »c ‹e ›r § ¶~#44
¿ c¡a «n »c ‹e ›r § ¶~#45
§ word document4
門 他の他の
門 他の他の 2
Does this mean that the 門 character is omitted from the sort?
And these are the top 10 search results, sorted in descending order:
他の門そ他の門
の他
他の
そ他の門そ他の
そ他の門門門
そ他他そ
そ
そ他
СЌРЅРІР»гЃќд»
марцпиорыв
It appears that the last two results, which contain Cyrillic symbols, are handled correctly, but there is an inconsistency: the そ result is placed between そ他 and そ他他そ.
Sorting is alphabetical for Latin-script languages and Greek, but for CJK languages you need to set up the document configuration properly to handle them. You also need to install tokenization for those languages. Microsoft provides patches that include tokenization and a dictionary for each of those languages. I think it would be really useful to verify that the search engine and the documents in the collection are properly configured.
I am trying to write an XPath query that uses the XPath function lower-case or upper-case, but they don't seem to work in Selenium (where I test my XPath before I apply it).
Example that does NOT work:
//*[.=upper-case('some text')]
I have no problem locating the nodes I need in complex path and even using aggregated functions, as long as I don't use the upper and lower case.
Has anyone encountered this before? Does it make sense?
Thanks.
upper-case() and lower-case() are XPath 2.0 functions. Chances are your platform supports XPath 1.0 only.
Try:
translate('some text','abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')
which is the XPath 1.0 way to do it. Unfortunately, this requires knowledge of the alphabet the text uses. For plain English, the above probably works, but if you expect accented characters, make sure you add them to the list.
In most environments you are using XPath out of a host language of some sort, and can use the host language's capabilities to work around this XPath 1.0 limitation by externally providing upper- and lower-case variants of the search string to translate().
For example, in Python:
search = 'Some Text'
lc = search.lower()
uc = search.upper()
xpath = f"//p[contains(translate(., '{lc}', '{uc}'), '{uc}')]"
This would produce the following XPath expression:
//p[contains(translate(., 'some text', 'SOME TEXT'), 'SOME TEXT')]
which searches case-insensitively and works for arbitrary search text.
If you are going to need upper case in multiple places in your XSLT, you can define variables for the lower-case and upper-case alphabets and then use them in your translate() calls everywhere. It should make your XSLT much cleaner.
Example at XSL/XPATH : No upper-case function in MSXML 4.0 ?