Allowing quotes in search term with SolrNet

I've come across a problem when users try to search for terms with multiple words wrapped in quotes. I'm using SolrNet to query Solr.
For example, if the user searches for "fertilizer", I get back the correct results. However, searching for "fertilizer report" brings back no results.
The only code I can see that actually queries Solr is shown below, where term is the search term and qo are my query options, which contain a start point and number of rows.
_solr.Query(string.IsNullOrEmpty(term) ? SolrQuery.All : new SolrQuery(term), qo);
I've had a look at an article online that mentions using Quoted. Is this the correct way to go about doing this, or is this something that should be configured in my schema/Solr config files?
UPDATE 1
Further to Mauricio's comment, I've added part of my solr config xml below that shows how I'm using dismax:
<requestHandler name="select" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<int name="rows">10</int>
<str name="df">title</str>
<str name="qf">title^1000 text^1 item_type_search^0.1 category_search^0.1 topic_search^0.1</str>
<str name="pf">title^1000 text^1 item_type_search^0.1 category_search^0.1 topic_search^0.1</str>
<str name="facet">true</str>
<str name="facet.mincount">1</str>
<str name="facet.field">category_id</str>
<str name="facet.field">item_type</str>
<str name="facet.field">topic_id</str>
<str name="facet.date">sort_date</str>
<str name="f.sort_date.facet.date.start">2000-01-01T00:00:00.00Z</str>
<str name="f.sort_date.facet.date.end">2050-01-01T00:00:00.00Z</str>
<str name="f.sort_date.facet.date.gap">+1YEAR</str>
</lst>
Looking at the linked page, and given that I set parameters such as qf, df and pf, I believe I am using dismax.

Related

Lucene search on contents of several tags (eXist-db)

I have the following lucene indices in collection.xconf
<lucene>
<text qname="tei:text" />
<text qname="tei:summary"/>
<text qname="tei:placeName"/>
</lucene>
My xquery code then makes the following query:
let $query-results := ($documents[.//tei:text[ft:query(., $q)]],
$documents[.//tei:summary[ft:query(., $q)]],
$documents[.//tei:placeName[ft:query(., $q)]])
Now I want to, say, search on:
-pommern erik
Now all the documents in query-results have the word "erik" but not the
word "pommern" in the contents of the tei:text tag.
But some of the results have both "erik" and "pommern" in the
tei:summary tag.
How do I make an xpath so that the user can search across all three tags
with lucene syntax (so that the contents of the three tags appears as
one text)?
At the moment you are taking a union of the results; you need an intersection instead.
To do this you really need to query against the parent and combine the predicates with a logical AND, something like:
$documents[ft:query(.//tei:text, $q)][ft:query(.//tei:summary, $q)][ft:query(.//tei:placeName, $q)]

Trouble omitting punctuation from Solr Alphabetic Range query search string

I am trying to create an alphabetic browse of names (personal and institutional) using range queries that will sort without regard to punctuation or capitalization. Even though the analysis tool in Solr suggests that punctuation in queries should be stripped out correctly, the presence of punctuation in the query still negatively affects the results.
from schema.xml:
<fieldType name="sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=" "/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\p{Punct}¿¡「」]" replacement=""/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>
<field name="authorSort" type="sort" indexed="true" stored="true" multiValued="false" required="true"/>
from solrconfig.xml:
<requestHandler name="/authors" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">lucene</str>
<str name="echoParams">explicit</str>
<str name="fl">*</str>
<str name="df">authorSort</str>
<str name="sort">authorSort asc</str>
<str name="rows">20</str>
<str name="wt">ruby</str>
<str name="indent">true</str>
</lst>
</requestHandler>
My actual queries look like this:
http://myserver/solr/testCore/authors?q=["Search String" TO *]
When I search for q=["ACA" TO *], my top result is "ACA (Academy of Certified Archivists)", which is good. If I vary the capitalization used in "ACA" my results don't change, which is also good. If I search for the acronym with periods (q=["A.C.A." TO *]) I don't get the appropriate results at all, and my top hit is "A3 (Musical group)". In this case I suspect that it's sorting on the period rather than dropping it.
According to the analysis tool in Solr, both "ACA" and "A.C.A." should be rendered down to "aca" using the analyzer I've configured. I'm at a loss to explain why these two searches aren't effectively equivalent.
(If it makes any difference, the index-time analysis is effectively useless as my code is doing the same conversion before submitting the data to be indexed. There are reasons for that. So it's only the query-time analysis that is giving me grief.)
Edit: Here's a screenshot of how my analysis of "A.C.A." as a query should be working (according to the Solr analysis tool).
Added about four months later:
Since posting the question and not finding a resolution, I have
switched to using a custom filter factory for the analysis. This gave
me control over the analysis that would have been difficult or
impossible given the provided filters. My first attempt had the same
problem - the analysis worked in regular search but wasn't applied in
range queries. This problem was resolved by adding
implements MultiTermAwareComponent to my filter factory and overriding
getMultiTermComponent(). I have no idea what this does for a field
which is using the KeywordTokenizer and therefore never has multiple
terms in a field value... but it did fix the problem. This was for
Solr 4.2.
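Since the question notes that the indexing code already applies the same conversion before submitting data, one workaround is to apply it to the range bound on the client as well, so the query no longer depends on query-time analysis. The following is only a minimal Python sketch of that idea; the function names are hypothetical, and the accent-stripping step is a rough stand-in for ICUFoldingFilterFactory, not an exact equivalent.

```python
import re
import string
import unicodedata

# ASCII punctuation (what Java's \p{Punct} covers) plus the extra
# characters listed in the schema's second charFilter.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "¿¡「」]")


def normalize_sort_key(value: str) -> str:
    """Approximate the 'sort' field type's analysis chain on the client."""
    value = value.replace("-", " ")      # charFilter 1: hyphen -> space
    value = _PUNCT.sub("", value)        # charFilter 2: drop punctuation
    value = re.sub(r"\s+", " ", value)   # charFilter 3: collapse whitespace
    # Rough stand-in for ICU folding (assumption): strip combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    return value.casefold().strip()      # lowercase + trim


def range_query(start: str) -> str:
    """Build the open-ended range query with an already-normalized bound."""
    return '["{}" TO *]'.format(normalize_sort_key(start))
```

With this, range_query("A.C.A.") and range_query("ACA") produce the same q value, so both searches start the browse at the same point.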

How to get two or more words as suggestions from solr

I've got this problem that I can't solve. Partly because I can't explain it with the right terms. I'm new to this so sorry for this clumsy question.
Below you can see an overview of my goal.
I'm using Magento CE 1.7.0.2 & Solr 4.6.0.
My config.xml:
<searchComponent class="solr.SpellCheckComponent" name="suggester">
<lst name="spellchecker">
<str name="name">suggester</str>
<str name="field">didyoumean</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggester">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggester</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggester</str>
</arr>
</requestHandler>
It's working fine, but the problem is that it only gives single-word suggestions.
If I search for suggestions for the word "me", it gives the following:
http://127.0.0.1:8080/solr/collection1/suggester?q=me&wt=json&indent=true
{
"responseHeader":{
"status":0,
"QTime":3},
"spellcheck":{
"suggestions":[
"me",{
"numFound":6,
"startOffset":0,
"endOffset":2,
"suggestion":["men's",
"mens",
"mephisto",
"men",
"merrell",
"mesh"]},
"collation","men's"]}}
Now, when I search for suggestions with "adidas bla", it should suggest "adidas black"; that is, I also want suggestions that combine two or more words.
Please check the linked page.
There I get the products; I want to get suggestions in the same way. I tried this as well, but it doesn't return them.
Any ideas?
Check the Solr API documentation. It has a 'terms' handler that can be useful in your case. It can be used as solr_url/terms?q=abc. This will suggest all the terms related to your query. For more information, see the Solr terms component documentation: http://wiki.apache.org/solr/TermsComponent
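As a rough illustration of such a request, here is a minimal Python sketch that builds a terms URL using the TermsComponent parameters terms.fl, terms.prefix and terms.limit. It assumes a /terms handler is registered in solrconfig.xml (it is not part of the config shown in the question), and the helper name is hypothetical. Note that a multi-word prefix such as "adidas bla" can only match if the field indexes whole phrases as single terms.

```python
from urllib.parse import urlencode


def terms_url(base_url: str, field: str, prefix: str, limit: int = 10) -> str:
    """Build a TermsComponent request URL (assumes a /terms handler exists)."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # only return terms starting with this
        "terms.limit": limit,     # maximum number of terms to return
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


url = terms_url("http://127.0.0.1:8080/solr/collection1", "didyoumean", "adidas bla")
```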

Solr equivalent of SQL BETWEEN for numeric fields

Hoping for some pointers to speed up some (very) slow solr queries in version 3.4.0.
I have an index of about 6-million documents. Each document is quite small, and contains two solr.TrieDoubleFields; "start" and "end".
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
----
<field name="start" type="double" indexed="true" stored="false" />
<field name="end" type="double" indexed="true" stored="false" />
When querying I need to perform the SQL equivalent of:
WHERE #input BETWEEN Start AND End
To do this I'm writing my query as:
start:[* TO #input] AND end:[#input TO *]
The query succeeds, returning the correct document, but with a QTime of ~4,500; most other queries are well below 100.
What can be modified to improve performance?
I believe that you should try the Function Range (frange) query parser in Solr. Please see the introductory blog post - Ranges Over Functions in Solr 1.4 for more details on how to use the Function Range query parser.
(Note this should apply to all versions of Solr later than 1.4 as well.)
So I think something like the following should work...
`?q=*:*&fq={!frange l=0 u=<#input>}start
&fq={!frange l=<#input> u=<number larger than greatest value>}end`
I am not sure if you can use wildcards for the upper and lower bounds, probably not...
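To make the parameter layout concrete, here is a minimal Python sketch that assembles the two filter queries above and URL-encodes them. The function name and the min_start/max_end arguments are hypothetical; they stand in for the explicit lower and upper bounds suggested above and must be chosen to cover the real range of the data.

```python
from urllib.parse import urlencode


def between_params(value, max_end, min_start=0):
    """Return query parameters for: start <= value AND end >= value."""
    return [
        ("q", "*:*"),
        # start must lie in [min_start, value]
        ("fq", "{{!frange l={} u={}}}start".format(min_start, value)),
        # end must lie in [value, max_end]
        ("fq", "{{!frange l={} u={}}}end".format(value, max_end)),
    ]


params = between_params(5.5, max_end=1000000)
query_string = urlencode(params)  # append to the select URL after "?"
```

Passing a list of tuples rather than a dict keeps both fq parameters, since a dict cannot hold the same key twice.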

What makes a good autowarming query in Solr and how do they work?

This question is a follow up to this question about infrequent, isolated read timeouts in a solr installation.
Missing or bad autowarming queries for new searchers were identified as a possible cause.
Now I am confused about what good autowarming queries should look like.
I read up but couldn't find any good documentation on this.
Should they hit a lot of documents in the index? Or should they have matches in all distinct fields that exist in the index?
Wouldn't just *:* be the best autowarming query? If not, why not?
The example Solr config has these sample queries in it:
<lst><str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str></lst>
<lst><str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str></lst>
I changed them to:
<lst><str name="q">george</str> <str name="start">0</str> <str name="rows">10</str></lst>
Why? Because the index holds film entities with fields for titles and actors. Those are the most searched ones. And george appears in titles and actors.
I don't really know whether this makes sense. So my question is:
What would be good autowarming queries for my index and why?
What makes a good autowarming query?
This is an example document from the index. The index has about 70,000 documents and they all look like this (only different values of course):
example document:
<doc>
<arr name="actor"><str>Tommy Lee Jones</str><str>Will Smith</str><str>Rip Torn</str>
<str>Lara Flynn Boyle</str><str>Johnny Knoxville</str><str>Rosario Dawson</str><str>Tony Shalhoub</str>
<str>Patrick Warburton</str><str>Jack Kehler</str><str>David Cross</str><str>Colombe Jacobsen-Derstine</str>
<str>Peter Spellos</str><str>Michael Rivkin</str><str>Michael Bailey Smith</str><str>Lenny Venito</str>
<str>Howard Spiegel</str><str>Alpheus Merchant</str><str>Jay Johnston</str><str>Joel McKinnon Miller</str>
<str>Derek Cecil</str></arr>
<arr name="affiliate"><str>amazon</str></arr>
<arr name="aka_title"><str>Men in Black II</str><str>MIB 2</str><str>MIIB</str>
<str>Men in Black 2</str><str>Men in black II (Hombres de negro II)</str><str>Hombres de negro II</str><str>Hommes en noir II</str></arr>
<bool name="blockbuster">false</bool>
<arr name="country"><str>US</str></arr>
<str name="description">Agent J (Will Smith) muss die Erde wieder vor einigem Abschaum bewahren, denn in Gestalt des verführerischen Dessous-Models Serleena (Lara Flynn Boyle) will ein Alien den Planeten unterjochen. Dabei benötigt J die Hilfe seines alten Partners Agent K (Tommy Lee Jones). Der wurde aber bei seiner "Entlassung" geblitzdingst, und so muß J seine Erinnerung erst mal etwas auffrischen bevor es auf die Jagd gehen kann.</str>
<arr name="director"><str>Barry Sonnenfeld</str></arr>
<int name="film_id">120912</int>
<arr name="genre"><str>Action</str><str>Komödie</str><str>Science Fiction</str></arr>
<str name="id">120912</str>
<str name="image_url">/media/search/filmcovers/105x/kf/false/F6Q1XW.jpg</str>
<int name="imdb_id">120912</int>
<date name="last_modified">2011-03-01T18:51:35.903Z</date>
<str name="locale_title">Men in Black II</str>
<int name="malus">3238</int>
<int name="parent_id">0</int>
<arr name="product_dvd"><str>amazon</str></arr>
<arr name="product_type"><str>dvd</str></arr>
<int name="rating">49</int>
<str name="sort_title">meninblack</str>
<int name="type">1</int>
<str name="url">/film/Men-in-Black-II-Barry-Sonnenfeld-Tommy-Lee-Jones-F6Q1XW/</str>
<int name="year">2002</int>
</doc>
Most queries are exact match queries on actor fields with some filters in place.
Example:
INFO: [] webapp=/solr path=/select/
params={facet=true&sort=score+asc,+malus+asc,+year+desc&hl.simple.pre=starthl&hl=true&version=2.2&fl=*,score&facet.query=year:[1900+TO+1950]&facet.query=year:[1951+TO+1980]&facet.query=year:[1981+TO+1990]&facet.query=year:[1991+TO+2000]&facet.query=year:[2001+TO+2011]&bf=div(sub(10000,malus),100)^10&hl.simple.post=endhl&facet.field=genre&facet.field=country&facet.field=blockbuster&facet.field=affiliate&facet.field=product_type&qs=5&qt=dismax&hl.fragsize=200&mm=2&facet.mincount=1&qf=actor^0.1&f.blockbuster.facet.mincount=0&f.genre.facet.limit=20&hl.fl=actor&wt=json&f.affiliate.facet.mincount=1&f.country.facet.limit=20&rows=10&pf=actor^5&start=0&q="Josi+Kleinpeter"&ps=3}
hits=1 status=0 QTime=4
There are two types of warming: query cache warming and document cache warming (there are also filters, but those are similar to queries). Query cache warming can be done through a setting which will just re-run X number of recent queries from before the index was reloaded. Document cache warming is different.
The goal of document cache warming is to get a large quantity of your most frequently accessed documents into the document caches so they don't have to be read from disk. So, your queries should focus on this. You need to try and figure out what your most frequently searched documents are and load those. Preferably with a minimal number of queries. This has nothing to do with the actual content of the fields. EDIT: To clarify. When warming document caches your primary interest is the documents that turn up in search RESULTS most often, regardless of how they are queried.
Personally, I'd run searches for things like:
Loading by country, if most of your searches are for US films.
Loading by year, if most of your searches are for more recent films.
Loading by genre, if you have a short list of heavily searched genres.
A last possibility is to load them all. Your documents look small. 70,000 of them is nothing in terms of server memory nowadays. If your document cache is large enough, and you have enough memory available, go for it. As a side note, some of your biggest benefit will be from your document cache. A query cache is only beneficial for repeated queries, and the share of repeated queries can be disappointingly low. You almost always benefit from a large document cache.
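As a small illustration, the following Python sketch generates warming entries in the same <lst> format as the sample queries in the question, for the kinds of searches listed above. The concrete queries (country:US and so on) and the rows value are assumptions for illustration only, and they presume the warming queries are parsed by the standard lucene parser rather than dismax.

```python
from xml.sax.saxutils import escape


def warming_entry(query: str, rows: int = 100) -> str:
    """Render one warming query as a <lst> element for solrconfig.xml."""
    return ('<lst><str name="q">{}</str> <str name="start">0</str> '
            '<str name="rows">{}</str></lst>').format(escape(query), rows)


# Hypothetical choices; replace with whatever returns your most-viewed films.
queries = ["country:US", "year:[2001 TO 2011]", "genre:Action"]
entries = [warming_entry(q) for q in queries]
```

A larger rows value makes each warming query fetch more documents, which is what fills the document cache; the query text itself matters less than which documents come back.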
