Marklogic: Forwardslash in attribute - xpath

Some of my attributes have a forward-slash in the value. I have XQuery that attempts to match on the attribute. However, I was recently changing indexing options and now the XQuery won't match on any attributes containing the forward-slash. I don't know what index/setting that might have affected the comparison. Help!
Used to work, but not longer works:
fn:doc()//model[#id='model/books/20']
This works fine:
fn:doc()//model[#id='model1']

Looks like it was a wildcard option that was set.
UPDATE:
Due to downvotes, here is the explicit name of the setting that fixed the issue for me:
"trailing wildcard searches" set to "false"

Related

How do I escape the word "And" in Elasticsearch if I want to search by the literal "And"?

I'm trying to search over an index that includes constellation code names, and the code name for the Andromeda constellation is And.
Unfortunately, if I search using And, all results are returned. This is the only one that doesn't work, across dozens of constellation code names, and I assume it's because it's interpreted as the logical operator AND.
(constellation:(And)) returns my entire result set, regardless of the value of constellation.
Is there a way to fix this without doing tricks like indexing with an underscore in front?
Thanks!
I went for a bit of a hack, indexing the constellation as __Foo__ and then changing my search query accordingly by adding the __ prefix and suffix to the selected constellation.

Case insensitive substring LDAP search on OpenLDAP 2.4.33

The current question is not the same as this one.
I have an LDAP entry which the content "This is a SimpleTest indeed" in the "supName" field.
I need to write a filter so that when the user of my software introduces any substring of this content in any case (upper, lower or mixed case), it finds the entry. It must work even if the user does not input a complete word ("impletes", for example).
The supName field follows DirectoryString syntax. This means that the default matching rule is exact and case sensitive ("caseExactMatch"). But this syntax in theory, should allow also "caseIgnoreMatch" and "caseIgnoreSubstringsMatch" matching rules. I though I just needed to force to use the last one ("caseIgnoreSubstringsMatch"), so I tried this filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
But this does not work. I make my tests using Apache Directory Studio, and that tool refuses to accept the above filter. It complains on the asterisks, and I don't understand why, since I am using a Substring match (and thus asterisks should be allowed). If I run the filter from command line (using ldapsearch), I get this error message:
ldap_search_ext: Bad search filter (-7)
Therefore this is not an issue with Apache Directory Studio.
So my question is: What is the correct way of defining a case-insensitive substring filter on a field that is case-sensitive by default?
Futher tests:
What follows are some other filters I have tested, and the reasons they do not suit me.
Test #1 filters:
(supName=*impleTes*)
This operator (=) returns my test entry, but it's not case-insensitive. If I replace "impleTes" for "impletes" it does not return anything.
Test #2 filter:
(supName~=simpletest)
This operator (~=) works, but:
It needs a complete word. If I replace "simpletest" for "impletes" it does not return anything.
As it is an "approximate" search operator, it may return unwanted results. For example, the filter above returns also a second entry: "This is a SimpleTast indeed" (notice the "a" instead of "e" in "SimpleTast"). I don't want approximate results.
Test #3 filter:
(supName:caseIgnoreMatch:=this is a simpletest indeed)
This returns the entry I was expecting, and only that entry. It is also case-insensitive. But it forces the user to write the whole content of the field: it is not a substring search, but a case-insensitive exact-match search.
Test #4 filter:
(supName:caseIgnoreMatch:=*impletes*)
This returns a "Bad search filter (-7)" error, with is expected since I am not allowed to use substring syntax in an exact matching rule.
And finally, the Test #5 filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
Which I expected to work, but returns a "Bad search filter (-7)" error.
Additional info - Opposite example
I have found here (see the "Extensible Matching" section at the end) examples of the opposite case. In the example, the field "sn" uses the "caseIgnoreMatch" matching rule by default (making it case-insensitive). So what they want in the example is to do a case-sensitive substring search. This is the filter they use:
(sn:caseExactSubstringMatch:=*S*)
But I doubt if this example is correct, because if I try exactly the same filter on my side:
(supName:caseExactSubstringMatch:=*S*)
I get a "Bad search filter (-7)" error.
So maybe my issue is due to limitation on OpenLDAP 2.4.33 but would work with other LDAP servers, although the example comes from a guide that is supposed to cover OpenLDAP 2.x ... (?)
If I'm reading RFC 4515ยง3 correctly, an extensible match can only be done with an assertion value (read: a fixed string) and not with a substring filter. If that were permitted, I would expect your original example (supName:caseIgnoreSubstringsMatch:=*impletes*) to work.
Other option is
(Attribute~=value)
-- http://publib.boulder.ibm.com/tividd/td/IBMDS/IDSprogref52/en_US/HTML/progref.htm#HDRLDAPSRCH

Elasticsearch URI based query with AND operator

How do I specify AND operation in URI based query? I'm looking for something like:
http://localhost:9200/_search?q="profiletype:student AND username:s*"
For URI search in ElasticSearch, you can use either the AND operator profiletype:student AND username:s, as you say but without quotes:
_search?q=profiletype:student AND username:s*
You can also set the default operator to AND
_search?q=profiletype:student username:s*&default_operator=AND
Or you can use the + operator for terms that must be present, i.e. one would use +profiletype:student +username:s as query string. This doesn't work without URL encoding, though. In URL encoding + is %2Band space is %20, therefore the alternative would be
_search?q=%2Bprofiletype:student%20%2Busername:s*
According to documentation, it should work as you described it. See http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html
That said, you can also use the following:
http://localhost:9200/_search?q="+profiletype:student +username:s*"
You can try bellow line.
http://localhost:9200/_search?q=profiletype:student%20AND%20username:s*
http://localhost:9200/_search?q=profiletype:student AND username:s*
i had to combine the default operator and + sign to make it work
curl -XGET 'localhost:9200/apm/inventory/_search?default_operator=AND&q=tenant_id:t2+_all:node_1'

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
The start of each message starts with the following format
<div id="post_message_somenumber">
and I only want to get the first one
I tried xpath='//div[starts-with(#id, '"post_message_')]' in yql without success
I'm still learning this, anyone have suggestions
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div's:
//div[#id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[#id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[#id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[#id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
I tried xpath='//div[starts-with(#id,
'"post_message_')]' in yql without
success I'm still learning this,
anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying names of elements that are in a default namespace is the most FAQ in XPath. If you search for "XPath default namespace" in SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression every element name "someName" must be replaced by "x:someName.
Here is a good answer how to do this in C#.
Read the documentation of your language/xpath-engine how something similar should be done in your specific environment.
#FindBy(xpath = "//div[starts-with(#id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one gives a list of all elements on the page that share an ID of expiredUserDetails and also contains the text or the element Details

Ignoring apostrophes in sphinx indexes

In my sphinx config file, I have the following:
ignore_chars: "U+0027"
charset_table: "0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a,
U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e,
U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i [SNIP]"
(The charset_table entry is from here: http://speeple.com/unicode-maps.txt)
The expected result is that querying kyles will return all records matching kyles and/or kyle's, since I'm telling sphinx to exclude ' (single quote/apos) from the index (ab'cd -> abcd). However, in practice, this is not happening.
I believe adding it to the ignore_chars has the opposite of the desired effect. This is telling sphinx not to split on that character, but instead it will collapse the word around the characters to be ignored. So, kyle's will become kyles instead of kyle and s.
The solution I just tried for this issue that seems to have worked was to add s to my list of stopwords (might need 's in there also, can't remember). Sphinx seems to split kyle's up into the words kyle and 's. Because match all mode is on, some documents fail on the match for 's. Adding it to the stop words seems to have the desired effect.
It seems like the normal stemming should take care of this however, so maybe we're both doing something wrong...

Resources