In my index I have a multivalued doc field named "counter". It stores values like counter:[5, 12, 75, ...] and so on.
If I try to retrieve only docs where the min value of this field is greater than another value, I can't get the expected result. I tried the following filter query:
field(counter, min)\:[10 TO *]
this is the field type used and the field declaration in schema.xml:
<fieldType name="integer" class="solr.TrieIntField" omitNorms="true"/>
<field name="counter" type="integer" indexed="true" stored="true" docValues="true" multiValued="true"/>
there are no errors but not the expected result, unfortunately. thanks in advance.
You can use the frange query parser to retrieve documents that match a range of values returned by a function:
fq={!frange l=10}field(counter,min)
The FunctionRangeQParser extends the QParserPlugin and creates a range query over a function. This is also referred to as frange, as seen in the examples below.
Other parameters:
l, The lower bound, optional
u, The upper bound, optional
incl, Include the lower bound: true/false, optional, default=true
incu, Include the upper bound: true/false, optional, default=true
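As a rough illustration of how these local parameters fit together, here is a minimal Python sketch that assembles the fq string and URL-encodes it. The helper name frange_fq and the q=*:* main query are assumptions made for the example, not part of Solr's API.

```python
from urllib.parse import urlencode


def frange_fq(function, l=None, u=None, incl=True, incu=True):
    """Build an {!frange} filter query string (hypothetical helper)."""
    local_params = []
    if l is not None:
        local_params.append(f"l={l}")
    if u is not None:
        local_params.append(f"u={u}")
    # incl and incu default to true, so only emit them when disabled.
    if not incl:
        local_params.append("incl=false")
    if not incu:
        local_params.append("incu=false")
    prefix = "{!frange" + "".join(" " + p for p in local_params) + "}"
    return prefix + function


fq = frange_fq("field(counter,min)", l=10)
# URL-encode the parameters before appending them to a select URL.
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(query_string)
```

The first print shows the same filter query as in the answer; the second shows the encoded form that would go into the request URL.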
For example I have an XML element:
<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code">
Suppose I know that some attribute must contain the value otp, but I don't know which attribute it is. Is it possible to write an XPath expression like this:
.//input[(contains(*, "otp")) or (contains(*, "ode"))]
Try it like this and see if it works:
one = '//input/@*[(contains(.,"otp") or contains(.,"ode"))]/..'
print(driver.find_elements_by_xpath(one))
Edit:
The contains() function has a required cardinality of first argument of either one or zero. In plain(ish) English, it means you can check only one element at a time to see if it contains the target string.
So, the expression above goes through each attribute of input separately (/@*), checks if the attribute value of that specific attribute contains within it the target string and - if target is found - goes up to the parent of that attribute (/..) which, in the case of an attribute, is the node itself (input).
This XPath expression selects all <input> elements that have some attribute, whose string value contains "otp" or "ode". Notice that there is no need to "go up to the parent ..."
//input[@*[contains(., 'otp') or contains(., 'ode')]]
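To make the "some attribute contains the string" logic concrete, here is a small Python sketch that mirrors it with the standard library. It does not evaluate the XPath itself (ElementTree supports only a subset of XPath); the surrounding form element and the second input are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup: the first input is from the question, the rest is hypothetical.
markup = (
    '<form>'
    '<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code"/>'
    '<input id="user" type="text" name="login"/>'
    '</form>'
)
root = ET.fromstring(markup)
needles = ("otp", "ode")

# Keep every input that has at least one attribute value containing a needle.
matches = [
    el for el in root.iter("input")
    if any(needle in value for value in el.attrib.values() for needle in needles)
]
matched_ids = [el.get("id") for el in matches]
print(matched_ids)
```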
If we know that "otp" or "ode" must be the whole value of the attribute (not just a substring of the value), then this expression is stricter and more efficient to evaluate:
//input[@*[. = 'otp' or . = 'ode']]
In this latter case ("otp" or "ode" are the whole value of the attribute), if we have to compare against many values then an XPath expression of the above form will quickly become too long. There is a way to simplify such long expression and do just a single comparison:
//input[@*[contains('|s1|s2|s3|s4|s5|', concat('|', ., '|'))]]
The above expression selects all input elements in the document, that have at least one attribute whose value is one of the strings "s1", "s2", "s3", "s4" or "s5".
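The delimiter trick can be sketched in plain Python as well. This is only an illustration of the string logic, with made-up sample inputs; it is not an XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Same delimited list as in the XPath expression.
ALLOWED = "|s1|s2|s3|s4|s5|"


def has_allowed_attribute(element):
    """True if any attribute value, wrapped in '|', occurs in ALLOWED."""
    return any(f"|{value}|" in ALLOWED for value in element.attrib.values())


# Hypothetical sample document.
doc = ET.fromstring(
    '<form><input name="s3"/><input name="s33"/><input name="x" type="s5"/></form>'
)
selected = [dict(el.attrib) for el in doc.iter("input") if has_allowed_attribute(el)]
print(selected)
```

Wrapping the value in delimiters is what prevents "s33" from matching "s3". The trick assumes the delimiter never occurs inside an attribute value; a value such as "s1|s2" would match as well.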
Is it possible to search for keywords in eXist-db using XQuery?
I've tried using
//foo//@val[. &= $param]
But this returns an error because this isn't supported with my version of exist-db (1.4.2)
What is the best way to do a search over a number of nodes?
<xml>
<foo val='test1'>
<bar val='test2'>
<thunk val='test3'/>
</bar>
</foo>
</xml>
So with my example XML, how can I do
let $result :=
if //xml/foo[contains(@val,$param)] or
//xml/foo/bar[contains(@val,$param)] or
//xml/foo/bar/thunk[contains(@val,$param)]
return $result
Either of these should work:
//foo//@val[contains(.,$param)]
//foo//@val[. eq $param]
However, there are obviously issues to consider when using contains() instead of equals. Also, if the paths will always be constrained as you describe in your example, and you are only checking to see if any of those are true (as opposed to actually getting all the elements), then this should be a faster and more efficient query:
((//xml/foo[@val eq $param])[1] or (//xml/foo/bar[@val eq $param])[1] or (//xml/foo/bar/thunk[@val eq $param])[1])
Untested, but the [1] should short-circuit the xpath evaluator after it gets its first result from the query, and the ORs should short-circuit the expression when any one of them returns a value.
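The short-circuit idea (stop at the first hit instead of collecting every match) can be sketched in Python with any() over a lazy generator. This is an analogy using ElementTree, not eXist-db; the attribute is assumed to be named val and the function name val_exists is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<xml><foo val='test1'><bar val='test2'><thunk val='test3'/></bar></foo></xml>"
)


def val_exists(root, param):
    """True as soon as one element on the fixed paths has val == param."""
    paths = ("foo", "foo/bar", "foo/bar/thunk")
    # any() stops consuming the generator at the first True value.
    return any(
        el.get("val") == param
        for path in paths
        for el in root.iterfind(path)
    )


print(val_exists(doc, "test3"))
print(val_exists(doc, "missing"))
```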
I want to limit the number of results I receive from an XPath query.
For example:
$info = $xml->xpath("//*[firstname='Sheila'] LIMIT 0,100");
Note the LIMIT 0,100 part.
You should be able to use "//*[firstname='Sheila' and position() <= 100]"
Edit:
Given the following XML:
<root>
<country.php desc="country.php" language="fr|pt|en|in" editable="Yes">
<en/>
<in>
<cityList desc="cityList" language="in" editable="Yes" type="Array" index="No">
<element0>Abu</element0>
<element1>Agartala</element1>
<element2>Agra</element2>
<element3>Ahmedabad</element3>
<element4> Ahmednagar</element4>
<element5>Aizwal</element5>
<element150>abcd</element150>
</cityList>
</in>
</country.php>
</root>
You can use the following XPath to get the first three cities:
//cityList/*[position()<=3]
Results:
Node element0 Abu
Node element1 Agartala
Node element2 Agra
If you want to limit this to nodes that start with element:
//cityList/*[substring(name(), 1, 7) = 'element' and position()<=3]
Note that this latter example works because every child node of cityList is an element* node, so in this case position() limits the results as expected. If there were a mix of other node names under the cityList node, you'd get undesirable results.
For example, changing the XML as follows:
<root>
<country.php desc="country.php" language="fr|pt|en|in" editable="Yes">
<en/>
<in>
<cityList desc="cityList" language="in" editable="Yes" type="Array" index="No">
<element0>Abu</element0>
<dog>Agartala</dog>
<cat>Agra</cat>
<element3>Ahmedabad</element3>
<element4> Ahmednagar</element4>
<element5>Aizwal</element5>
<element150>abcd</element150>
</cityList>
</in>
</country.php>
</root>
and using the above XPath expression, we now get
Node element0 Abu
Note that we're losing the second and third results, because position() is counted among all child nodes of cityList, not among the nodes that pass the name test. It is the same as requesting "give me the first three nodes, and out of those give me all the nodes that start with 'element'".
I ran into the same issue myself and had a problem with Geoff's answer because, as he clearly describes, it limits the number of elements before the other parts of the query are applied.
My solution is to add position() <= 10 as an additional predicate after my other conditions have been applied, e.g.:
//ElementsIWant[./ChildElementToFilterOn='ValueToSearchFor'][position() <= 10]/.
Notice that I'm using two separate conditional blocks.
This first keeps only the elements that satisfy my condition and then takes the first 10 of those.
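The difference between one combined predicate and two chained predicates can be mimicked with plain Python lists. This is only a sketch of the position semantics, using the child names from the modified cityList example and a limit of 3.

```python
# Child node names of cityList in the modified XML above.
children = ["element0", "dog", "cat", "element3", "element4", "element5", "element150"]

# One predicate: [starts-with-element and position() <= 3]
# Position is counted among ALL children.
combined = [
    name for pos, name in enumerate(children, start=1)
    if name.startswith("element") and pos <= 3
]

# Two predicates: [starts-with-element][position() <= 3]
# Position is counted among the nodes that survived the first predicate.
survivors = [name for name in children if name.startswith("element")]
chained = [name for pos, name in enumerate(survivors, start=1) if pos <= 3]

print(combined)
print(chained)
```

The combined form yields only element0, while the chained form yields the first three element* nodes.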
I'm using the NGramFilterFactory for indexing and querying.
So if I'm searching for "overflow" it creates a query like this:
mySearchField:"ov ve ... erflow overflo verflow overflow"
But if I misspell "overflow", e.g. "owerflow", there are no matches, because of the quotes around the query:
mySearchField:"ow we ... erflow owerflo werflow owerflow"
Is it possible to tokenize the result of the NGramFilterFactory so that it creates a query like this:
mySearchField:"ow"
mySearchField:"we"
mySearchField:"erflow"
mySearchField:"owerflo"
mySearchField:"werflow"
mySearchField:"owerflow"
In this case solr would also find results, because the token "erflow" exists.
You don't need to tokenize your query like you wrote. Check if in your schema.xml you have the NGramFilterFactory applied at both index time and query time.
Then, the query parser you're using makes the difference. With LuceneQParser you'd get the result you're looking for, but not with DisMax and eDisMax.
I checked the query mySearchField:owerflow with eDisMax and debugQuery=on:
<str name="querystring">text:owerflow</str>
<str name="parsedquery">
+((text:o text:w text:e text:r text:f text:l text:o text:w text:ow text:we text:er text:rf text:fl text:lo text:ow text:owe text:wer text:erf text:rfl text:flo text:low text:ower text:werf text:erfl text:rflo text:flow text:owerf text:werfl text:erflo text:rflow text:owerfl text:werflo text:erflow text:owerflo text:werflow text:owerflow)~36)
</str>
If you look at the end of the generated query you'll see ~36 where 36 is the number of n-grams generated from your query. You don't get any results because of that ~36, but you can change it through the mm parameter, which is the minimum should match.
If you change the query to mySearchField:owerflow&mm=1 or a value lower than 25 you'll have the result you're looking for.
The difference between this answer and yours is that with EdgeNGramFilterFactory an infix query like mySearchField:werflow doesn't return any result, while it does with NGramFilterFactory.
Anyway, if you're using the NGramFilterFactory for spelling correction, I'd strongly recommend having a look at the SpellCheckComponent as well, which is made exactly for that purpose.
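To see where the 36 comes from and why a lower mm helps, here is a small Python sketch that generates all n-grams of both words and counts how many query clauses hit an indexed term. It assumes minGramSize=1 and a maxGramSize of at least 8, which is what the parsed query above suggests; the ngrams function is a stand-in written for this example, not Solr code.

```python
def ngrams(word, min_size=1, max_size=None):
    """All substrings of word with length min_size..max_size, shortest first."""
    if max_size is None:
        max_size = len(word)
    return [
        word[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(word) - size + 1)
    ]


indexed_terms = set(ngrams("overflow"))   # terms produced at index time
query_clauses = ngrams("owerflow")        # one optional clause per n-gram
matching = [gram for gram in query_clauses if gram in indexed_terms]

print(len(query_clauses))  # number of clauses in the parsed query
print(len(matching))       # clauses that hit an indexed term
```

With these assumptions the query has 36 clauses and 24 of them match, so mm=36 cannot be satisfied while any mm below 25 can.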
OK, I found a quick and easy way to solve the problem.
The fieldType has an optional attribute autoGeneratePhraseQueries (Default=true). If I set autoGeneratePhraseQueries to false, everything works fine.
Explanation:
fieldType used in schema.xml:
<fieldType name="edgytext" class="solr.TextField" autoGeneratePhraseQueries="false">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
</analyzer>
</fieldType>
If you index the word "surprise", the following tokens are in the index:
s, su, sur, surp, surpr, surpri, surpris, surprise
If you search for "surpriese" (misspelled), Solr creates the following tokens (the first six, s through surpri, match indexed tokens):
s, su, sur, surp, surpr, surpri, surprie, surpries, surpriese
The real query that is created looks like:
mySearchField:s, mySearchField:su, mySearchField:sur .. and so on
But if you set autoGeneratePhraseQueries=true, the following query is created:
mySearchField:"s su sur surp surpr surpri surprie surpries surpriese"
This is a phrase query and does not match the indexed terms.
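The token lists above can be reproduced with a few lines of Python. The edge_ngrams function is a simplified stand-in for EdgeNGramFilterFactory with minGramSize=1 and maxGramSize=25, and the "all tokens required" check is only a rough approximation of why the phrase query fails.

```python
def edge_ngrams(word, min_size=1, max_size=25):
    """Prefixes of word, from min_size up to max_size characters."""
    longest = min(len(word), max_size)
    return [word[:size] for size in range(min_size, longest + 1)]


indexed = edge_ngrams("surprise")
query = edge_ngrams("surpriese")
shared = [token for token in query if token in indexed]

# Separate term clauses: one matching token is enough.
any_token_matches = any(token in indexed for token in query)
# Rough stand-in for the phrase query: every token would have to match.
all_tokens_match = all(token in indexed for token in query)

print(shared)
print(any_token_matches, all_tokens_match)
```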
Given this xml:
<mets:techMD ID="techMD014">
<mets:mdWrap MDTYPE="PREMIS:OBJECT">
<mets:xmlData>
<premis:object
xsi:type="premis:file"
xsi:schemaLocation="info:lc/xmlns/premis-v2
http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd">
<premis:objectIdentifier>
<premis:objectIdentifierType
>filepath</premis:objectIdentifierType>
<premis:objectIdentifierValue
>bib1234_yyyymmdd_99_x_performance.xml</premis:objectIdentifierValue>
</premis:objectIdentifier>
</premis:object>
</mets:xmlData>
</mets:mdWrap>
</mets:techMD>
<mets:techMD ID="techMD015">
<mets:mdWrap MDTYPE="PREMIS:OBJECT">
<mets:xmlData>
<premis:object
xsi:type="premis:representation"
xsi:schemaLocation="info:lc/xmlns/premis-v2
http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd">
<premis:objectIdentifier>
<premis:objectIdentifierType
>local</premis:objectIdentifierType>
<premis:objectIdentifierValue
>bib1234_yyyymmdd_99_x</premis:objectIdentifierValue>
</premis:objectIdentifier>
</premis:object>
</mets:xmlData>
</mets:mdWrap>
</mets:techMD>
I would like to make an XPath query that takes both index and attribute into account.
I.e. can I combine these two into ONE query? (It's the stuff around the "object" element I'm interested in):
//techMD/mdWrap[
@MDTYPE=\'PREMIS:OBJECT\'
]/xmlData//object[1]/objectIdentifier/objectIdentifierValue
//techMD/mdWrap[
@MDTYPE=\'PREMIS:OBJECT\'
]/xmlData//object[
@xsi:type=\'premis:file\'
]/objectIdentifier/objectIdentifierValue
Thanks!
Just replace the corresponding part with:
object[@xsi:type='premis:file'][1]
if you want the first object among those that have the given xsi:type value, or
object[1][@xsi:type='premis:file']
if you want the first object, provided it has the given xsi:type value.
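The effect of predicate order can be illustrated with plain Python lists. This is a sketch of the semantics only: the sibling list and its labels are hypothetical, and in real XPath the position is counted per parent node.

```python
# Hypothetical sibling object elements as (label, xsi:type) pairs.
objects = [
    ("first", "premis:representation"),
    ("second", "premis:file"),
    ("third", "premis:file"),
]

# object[@xsi:type='premis:file'][1]: filter by type, then take the first.
filter_then_first = [obj for obj in objects if obj[1] == "premis:file"][:1]

# object[1][@xsi:type='premis:file']: take the first, then check its type.
first_then_filter = [obj for obj in objects[:1] if obj[1] == "premis:file"]

print(filter_then_first)
print(first_then_filter)
```

Here the first form returns the "second" object, while the second form returns nothing because the first sibling has a different type.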