retaining case in elasticsearch faceted search - elasticsearch

Is there a way to do faceted searches using the elasticsearch Search API maintaining case (as opposed to having the results be converted to lowercase).
Thanks in advance, Chuck

Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.

Related

Elastic Enterprise Search - Is it a best practice to index data of two different json schema in a single index

Hi I'm trying out Elastic Enterprise Search with Elasticsearch. I have a couple of questions on data indexing.
When referring to Elasticsearch documentation, I read that there is a limit to the number of fields that an Elasticsearch index could have. Since Elasticsearch is used with Elastic Enterprise Search I believe there is no arguing that the same applies here. In that case lets say I have multiple document types with various fields. For an example Person.json and Dog.json, they both have different properties. So when indexing I use one search engine in Elastic Enterprise Search to index both Person and Dog so that when I query using the Elastic Enterprise Search API I'll get results which are both Person and Dog depending on the search term.
Is this the way to go,or should I specify a seperate search engine for each schema type?
I am assuming that your person.json and dog.json contains different fields as your heading suggest and weather to create a separate index for these entities or have them in a single index, depends on the various use-cases you have in your application and you will not find elasticsearch marking one approach better than other and mainly will explain the pros/cons based on a particular context(like relevance, performance, management etc).
Please refer to my this SO answer, where I talked about various pros/cons of both the approach and discussion in chat to get more context why OP chose an approach based on his use-case, after knowing the pros/cons.

how can I find related keywords with elasticsearch?

I am pretty new to elasticsearch and already love it.
Right know I am interested in understanding on how I can let elasticsearch make suggestions for similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I dont need similar documents but only related / similar keywords.
So lets say I have an index of documents about movies and I start a query about "godfather". Then elasticsearch should suggest related keywords - e.g. "al pacino" or "Marlon Brando" because they are likely to occur in the same documents.
any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could possibly do, is to write a program, that will query Elastic, return matched documents, then you will get the _source data, or just retrieve it from your original datasource (like DB or file), later you will need to calculate TF-IDF for each term in the retrieved ones and somehow combine everything all together to get top K terms out of all returned terms.

Elasticsearch - Autocomplete return word/term/token suggestions instead of whole documents

I am trying to implement a simple auto completion for query terms.
There are many different approaches but most of them do return documents instead of terms
- or the authors simply stopped explaining from that point and i am not able to adapt.
A user is typing in a query - e.g. phil
What i want is to provide a list of term completion suggestions like philipp, philius, philadelphia, ...
I am able to get document matches via (edge)ngrams, phrase_prefix and so on but i am am stuck at retrieving matching terms (completion suggestions).
Can someone give me a hint?
I have documents like this {"title":"...", "description":"...", "content":"..."}
All fields have larger string values but especially the field content contains fulltext content.
I do not want to suggest the whole title of a document containing e.g. Philadelphia. Just the word "Philadelphia".
Looking for something like that, myself.
In SOLR it was relatively simple to configure (although a pain to build and keep up-to-date) using solr.SpellCheckComponent. Somehow the same underlying Lucene functionality is used differently between SOLR and ElasticSearch, and in ElasticSearch it is geared towards finding whole documents (or whole field values, if you will) or so it seems...
Despite the profusion of "elasticsearch autocomplete" articles, none appears to deal with this particular issue. Like it doesn't exist. Maybe their use case is different and ElasticSearch works for them just fine, who knows?
At this point I think that preparing the exact field values to use with ElasticSearch autocomplete (yes, that's the input field values, not analyzer tokens) maybe the only way to solve the problem. Which is terrible, because the performance is going to be very low.
Try term suggester:
The term suggester suggests terms based on edit distance. The provided
suggest text is analyzed before terms are suggested. The suggested
terms are provided per analyzed suggest text token. The term suggester
doesn’t take the query into account that is part of request.

Elasticsearch find search terms matching text

I have a scenario where i need to map each article to an entity. To do so, we are maintaining a set of keywords / search phrase (ex: (icici OR hdfc) AND bank) that may be available in each article. We want to use the power of elastic search to scan all the search phrases that may be available in the article being processed.
What i have come across yet is forward search (like full text search and so on) But now here what i need is to have a reverse search of search phrases against an article.
I was digging for a solution and hopped some genius would have already discovered the same and would help in for the same.
In Elasticsearch it's called percolator.

Elasticsearch questions: search, performance and caching

I'm new to elasticsearch, have been reading their API and some things are not clear to me
1) It is said that filters are cached. what does that mean? if i send a query with a filter on it, what gets cached? The results of that query? If i send a different query with the same filter, will the cache help me somehow?
I know the question is kinda vague, but so is ElasticSearch's documentation for this.
2) Is there a real performance difference between a query matching a term X to the "_all" field or to a specific field? As far i understand, both queries will be compared against all documents that contain X in one of their fields, and the only difference is in how many fields will be matched against X, in these documents. is that correct?
1) For your first question take a look at this link.
To quote from the post
"Filters don’t score documents – they simply include or exclude. If a document matches a filter, it is represented with a one in the BitSet; otherwise a zero. This means that Elasticsearch can store an entire segment’s filter state (“who matches this particular filter?”) in a single, compact BitSet.
The first time Elasticsearch executes a filter, it parses Lucene segment data structures to determine what matches your filter. Instead of throwing away this information, it caches it inside a BitSet.
The next time the same filter is executed, Elasticsearch can reference the compact BitSet instead of the Lucene segments. This has huge performance benefits."
2) "The idea of the _all field is that it includes the text of one or more other fields within the document indexed. It can come very handy especially for search requests, where we want to execute a search query against the content of a document, without knowing which fields to search on. This comes at the expense of CPU cycles and index size."link
So if you know what fields you are going to query use specifics fields to search on.

Resources