Lucene - Searching several terms in different fields

Lucene - Searching several terms in different fields - full-text-search

I have a Lucene index which populates from a database. I store/index some fields and then add a FullText field in which I index the contents of all the other fields, so I can do a general search.
Now let's say I have a document with the following two fields:
fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"
If an user does a "Samsung phone", he probably just wants news about samsung phones, not a document with info about a samsung screen and a sony phone, but searching by the FullText field, I will get this as a valid result.
Is there a nice way to handle this?
I've thought of indexing with some separator and the doing a SpanNotQuery, so the FullText field would have this contents:
"Samsung releases a new 22'' LCD screen MYLUCENESEPARATOR Sony Ericsson phone's batteries explode" and then doing a SpanNotQuery with MYLUCENESEPARATOR as the non-spanning term.
Is this a good solution? Does it scale well with more than two terms? I fear it would be a performance killer. Is there a better way to achieve this?

If the number of fields is limited you can put the two description strings in two different fields. Then you can use MultiFieldQueryParser to search on these fields. Since these are two separate fields, the document will match only if both the terms appear in a single field with AND operator.
Let's take your example.
fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"
If these are indexed in separate fields fld1 & fld2, your query becomes
(+fld1:samsung +fld1:phone) (+fld2:samsung +fld2:phone)
Multifield query helps you to construct such queries easily so that you don't need to repeat a query for multiple fields.

Related

Elasticsearch show what is matched from a query

I'm implementing a sort of "natural language" search assistant. I have a form with a number of select fields. The list of options in each field can be pretty lengthy. So rather than having to select each item individually, I'm adding a text input box where people can just type what they're looking for and the app will suggest possible searches, based on the options in the select dropdowns.
Let's say my options are:
Color: red, blue, black, yellow, green
Size: very small, kinda medium, super large
Shape: round, square, oblong, cylindrical
Year: 2007, 2008, 2009, 2010
If you typed in "2007 very small star-spangled", the text input would suggest "Search all 2007 very small widgets for 'star-spangled'". It understood that "2007" and "very small" were select options in the form, and that "star-spangled" was not, and suggested a search where "2007" and "very small" are selected, and then left the "star-spangled" bit for a plaintext search.
What I'm working on right now is parsing the search query and picking out the bits that fit into the select fields. I have all the options in Elasticsearch. I was thinking of searching each type individually to see if it matches anything in the search query. That seems straightforward to me. I can easily find matches. However, I don't know which part of the query actually matches each type, which I need in order to find out that e.g. "star-spangled" is the part that didn't match options.
So, in the end, I need to know that only the "2007" substring matched the year, only the "very small" substring matched the size, and "star-spangled" didn't match anything.
My first thought is to split the query into word-grams (e.g. "2007", "2007 very", "2007 very small", "2007 very small star-spangled", "very", "very small", "very small star-spangled", "small", "small star-spangled", "star-spangled") and search each option for each gram. Then I would know for sure which gram matched. However, this could obviously get resource intensive pretty quickly. Also, I know Elasticsearch can do that sort of search internally much faster.
So what I really need is to be able to perform a search and, along with the results, get back which part of the original query actually matched. So if I searched, "2007 verr small" (intentional misspelling) and did a fuzzy search of sizes, passing the entire query string, and I get the "Very Small" size back as a result, it would indicate that "verr small" is the part of the query that matched that size.
Any idea of how to do that? Or possibly some other solutions?
I could do the search and parse the results to see which bits match the string. Though I could see that being resource intensive as well. And if I'm doing a fuzzy search, it wouldn't necessarily be clear which part of the query triggered a match in the result.
I was also thinking that highlighting might work for this, but I don't know enough about Elasticsearch to know for sure.
EDIT: I tested this out using highlighting. It's so close to working. The highlight field comes back with the part of the string that matches. However, it only shows the part of the result that matches. It doesn't show the part of the query that matches. So if I want to allow for fuzzy searches, the highlight field won't match the original query and I won't be able to tell which part of the query matched. For example, a query of "very smaal" will return the size "Very Small", but the highlight field will show very small, not very smaal.

There are 2 types of queries in Elasticsearch, Match Query and Filtered Query. Match query matches your term in the documents and find all the relevant documents with a relevance score. For example when you search for term: "help fixing javascript problem" you are interested in all documents which contain one or more of the search term.
On the other hand, when you are using Filtered Query, a document is either a match or not match... there is no relevance score here... as an example, you want all the products built in year "2007"... here you need to use a filtered query. All the product built in 2007 have the same score and all other years are excluded from the result.
In my opinion, your problem should be dealt with Filter Query...
When using filter query, normally each filter has its own corresponding input in the UI, consider the following screen-shot which is from ebay:
If I have understood your requirement correctly, you want to include all those filters in a single search-box. In my opinion, this is nearly impossible to implement because you have no way to parse user input and decide which word corresponds to which filter...
If you want to go down the filter path, it's better to introduce corresponding UI fields for each filter...
If you want to stick to a single search box, then don't implement the filter functionality and stick to Elasticsearch Multi-match query... you can match the input term across multiple fields but you won't be able to filter out (exclude) result instead you get a relevance score.

Elasticsearch - what is better: query several fields or single combined field?

Declaimer: Possible duplicate of this SO question, not sure...
Let's assume I have something similar to IMDB (e.g. catalog of movies) and I want to store it in Elasticsearch.
Single Movie record contains Title, Description, and Categories (strings, e.g. "Children", "Action", etc).
Let's assume that users allowed to search a free text, which can be everything: words from title, from description or from categories (e.g. "movie for children").
I wondering, from search performance perspective, what is more efficient: to query on each of the fields, or to create a special big field which is a concatenation of all of the fields and then to query only on it.

Predictive autosuggest logic

I would like to implement predictive autosuggest in my website. I have used Solr to improve search performance. But after a research of last 2 days, I understand that Solr didn't have any built in package or support to implement predictive suggestion like Amazon or flipkart search. Anybody can advice me what is the easy logic to implement predictive suggestion
OR what are the technologies supports this type of search suggestion?
Expected workflow as follows,
If user search string "samsung" our autosuggestion should show grouped suggestion as follows,
samsung in Mobile
samsung in Television
samsung in Laptop
and so on

You're describing "filtered search" (via autosuggest). You can determine which filters to offer using Solr facets.
Assuming "Mobile", "Television" and "Laptop" are all values in a Solr field called category:
Run a query for samsung with rows=0 and request a terms facet on category.
You'll get back an frequency-ordered list of categories where documents match samsung
Display these categories as filtered search options (via autosuggest) if you decide the result count is high enough.
When a suggestion is chosen, run a second query for samsung filtered by the chosen category (eg: q=samsung&fq=category:Mobile&rows=10)

Multiple field autocomplete with index type boost

What I'm trying to accomplish on a high level is an autocomplete input field which queries both customers and orders on multiple fields, with customers ranking higher for customer name searches.
It seems to me that there are various ways to approach this problem with the tools that elasticsearch provides.
The way that I have approached this is to use multi_match queries with prefix_phrase type in order to get partial queries to work across multiple fields.
For example, "bo" should return back matches for "Bob Smith" as well as "Adam Boss". I'm indexing fullname as a separate field from firstname and lastname, so that "adam boss" will return a valid prefix match as well.
In addition, I'd like to boost customer results - trying to do that with a boost param on the multi_match, but that doesn't seem to be working the way I'd expect it to.
What would be a straight forward way to tackle this problem?
One of the challenges I'm facing with the elasticsearch docs is that it's not always clear which properties and features apply to which others. For example, the multi_match documentation doesn't talk about using a custom boost, other than on a field-level.

I think the best way is using completion suggester of ES (v0.90.3+), please refer here for a real use case:
http://www.elasticsearch.org/blog/you-complete-me/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

Searching only specific fields with elasticsearch

How can I tell Elasticsearch to exclude a field when searching by a term?
I have an index of users (names, email, certifications, experience, office...), but only certain people can search for users by certification. In my current PHP Lucene implementation I have 2 separate indexes with and without that data. Is there a way I can do this with only one users index? I assume I need to apply some kind of filter [1] [2], but don't see one that will allow me to ignore a field entirely.
If there is any way specifically to do this with Elastica (PHP Client) that would be even more helpful, but native ES would be equally as helpful.
Say I have 2 users in my index
Kevin Smith
Certified in Muffin Making
Mark Smith
Certified in Motorcycle jumping.
When a normal user searched for motorcycle, nothing should be returned, but if they search for Smith both should be returned.
A user with the ability to search the certifications field will return Mark if they search for motorcycle and both if they search for Smith.

I have not tested anything, but it seems that you might be able to set "include_in_all" to false on the mapping phase. That means that your field won't be include in the "_all". Then you just have to make a query on "_all" fields.
Note that the field is still available for search and indexed, you can query it by specifying it in the field query; it's just excluded from the "_all" parameters.
Again I haven't tested anything yet.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio