Find documents whose analyzed field contains slash and underscore - elasticsearch

My documents have an analyzed field url with content looking like this
http://sub.example.com/data/11/222/333/filename.txt
I would like to find all documents whose filename starts with an underscore. I've tried multiple approaches (wildcard, pattern, query_string, span queries) but I never got the right result. I expect this is because the underscore is a term separator. How can I write such a query? Is it possible at all without changing the field to not_analyzed (which I cannot do at the moment)?
It's Elasticsearch 1.5, but we'll be migrating to at least 2.4 in the foreseeable future.

You might be able to write a script that would do that, but it would be amazingly slow.
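For reference, a script filter over _source is roughly what that would look like. This is only a sketch: "my_index" and the field name "url" are taken from the question, and dynamic Groovy scripting has to be enabled (it is disabled by default on 1.5.x for security reasons).

  # Slow: the script is evaluated against _source for every candidate document.
  curl -XPOST 'localhost:9200/my_index/_search' -d '{
    "query": {
      "filtered": {
        "query": { "match_all": {} },
        "filter": {
          "script": {
            "script": "_source.url.substring(_source.url.lastIndexOf(\"/\") + 1).startsWith(\"_\")"
          }
        }
      }
    }
  }'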
Your best bet (even though you say you can't right now) is changing the field from analyzed to a multi-field. This way you could have both analyzed and not-analyzed versions to work with.
You could use the Reindex API to migrate all the data from the old version to the new version (assuming you're using ES 2.3 or greater).
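Once you can touch the mapping, the multi-field plus a wildcard query on the raw sub-field could look roughly like this. It is a sketch with illustrative index, type and field names, and existing documents have to be reindexed into an index that carries this mapping.

  # Keep the analyzed field and add a not_analyzed copy under url.raw.
  curl -XPUT 'localhost:9200/my_new_index' -d '{
    "mappings": {
      "my_type": {
        "properties": {
          "url": {
            "type": "string",
            "fields": {
              "raw": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }
  }'

  # Matches any path segment starting with an underscore; use a regexp query
  # on url.raw instead if you need to restrict the match to the last segment.
  curl -XPOST 'localhost:9200/my_new_index/_search' -d '{
    "query": {
      "wildcard": { "url.raw": "*/_*" }
    }
  }'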

Related

Does changing from the Simple analyzer to the Standard analyzer in Elasticsearch require a re-index?

I changed the analyzer on a field from the Simple analyzer to the Standard analyzer, tested it locally, and it's working fine. I didn't have to re-index all my documents in ES.
But according to this SO post and this ES doc, it looks like we need to re-index if we add or change the analyzer on a field.
I am confused because it's working fine now; re-indexing would take a considerable amount of time and I want to avoid it if it's not required.
Let me know if somebody has faced a similar situation and what they did.
Edit: I am using ES 1.7. I changed the analyzer on a field and just restarted the app; I think my app just pushed the updated mapping to ES.
If you change an analyzer, of course you need to reindex your data, or at the very least the field whose analyzer was changed.
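You can see why with the _analyze API: the two analyzers produce different terms for the same input, and documents indexed before the change still carry the terms produced by the old analyzer until they are reindexed ("my_index" below is a placeholder).

  # Compare the token streams the two analyzers produce for the same text
  # (for example, the simple analyzer drops the number, the standard one keeps it).
  curl -XGET 'localhost:9200/my_index/_analyze?analyzer=simple' -d 'The 42 quick foxes'
  curl -XGET 'localhost:9200/my_index/_analyze?analyzer=standard' -d 'The 42 quick foxes'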

Can we migrate non-stored index data in SOLR to Elasticsearch?

We are currently using SOLR for full-text search. Now we are planning to move from SOLR to Elasticsearch. While we were in this process, I read somewhere that there are plugins available that will migrate data from SOLR to Elasticsearch, but they won't be able to migrate records that are not stored in SOLR. So is there a plugin available that will migrate non-stored index data from SOLR to Elasticsearch? If so, please let me know.
Currently I am using the SOLR-to-ES plugin, but it won't migrate the non-stored index data.
Thanks
If the field is not stored, then you don't have the original value. If you have it indexed, what is in there is the value after it has gone through the analysis chain, and so is probably different from the original one (no stopwords, probably lowercased, maybe stemmed... stuff like that).
There are a couple of possibilities that might allow you to have the original content when not stored:
Indexed field: if it has been analyzed with just the keyword tokenizer, then the indexed value is the original value.
Field with docValues=true: the original value is also stored there. This feature was introduced later, so your index might not be using it.
The issue is that common plugins might not take advantage of those cases where stored=true is not strictly necessary. You need to check them.
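To check which of those cases applies, you can inspect a field definition through the Solr Schema API (available in recent Solr versions; the core and field names below are placeholders) and look at its stored and docValues attributes and at the analyzer of its field type.

  # Returns the field's properties (type, stored, docValues, ...).
  curl 'http://localhost:8983/solr/mycore/schema/fields/title'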

Elasticsearch - Autocomplete return word/term/token suggestions instead of whole documents

I am trying to implement a simple auto completion for query terms.
There are many different approaches, but most of them return documents instead of terms, or the authors simply stopped explaining at that point and I am not able to adapt their examples.
A user is typing in a query - e.g. phil
What I want is to provide a list of term completion suggestions like philipp, philius, philadelphia, ...
I am able to get document matches via (edge) ngrams, phrase_prefix and so on, but I am stuck at retrieving matching terms (completion suggestions).
Can someone give me a hint?
I have documents like this {"title":"...", "description":"...", "content":"..."}
All fields contain fairly long string values, but the content field in particular holds full-text content.
I do not want to suggest the whole title of a document containing e.g. Philadelphia. Just the word "Philadelphia".
Looking for something like that, myself.
In SOLR it was relatively simple to configure (although a pain to build and keep up-to-date) using solr.SpellCheckComponent. Somehow the same underlying Lucene functionality is used differently between SOLR and ElasticSearch, and in ElasticSearch it is geared towards finding whole documents (or whole field values, if you will) or so it seems...
Despite the profusion of "elasticsearch autocomplete" articles, none appears to deal with this particular issue. Like it doesn't exist. Maybe their use case is different and ElasticSearch works for them just fine, who knows?
At this point I think that preparing the exact field values to use with ElasticSearch autocomplete (yes, that's the input field values, not analyzer tokens) may be the only way to solve the problem. Which is terrible, because the performance is going to be very low.
Try term suggester:
The term suggester suggests terms based on edit distance. The provided
suggest text is analyzed before terms are suggested. The suggested
terms are provided per analyzed suggest text token. The term suggester
doesn’t take the query into account that is part of request.
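For reference, a term suggester request in ES 1.x/2.x looks roughly like this (index and field names are placeholders). Note that because it is based on edit distance it corrects near-misses such as "philadephia"; it does not prefix-complete a fragment like "phil".

  # Term suggester: suggests whole terms from the index that are within a small
  # edit distance of each token in "text".
  curl -XPOST 'localhost:9200/my_index/_suggest' -d '{
    "my-suggestion": {
      "text": "philadephia",
      "term": { "field": "content" }
    }
  }'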

Where do .raw fields come from when using Logstash with Elasticsearch output?

When using Logstash and Elasticsearch together, fields with .raw are appended for analyzed fields, so that when querying Elasticsearch with tools like Kibana, it's possible to use the field's value as-is without per-word splitting and what not.
I built a new installation of the ELK stack with the latest and greatest versions of everything, and noticed my .raw fields are no longer being created as they were on older versions of the stack. There are a lot of folks posting solutions that involve creating templates on Elasticsearch, but I haven't been able to find much information as to why this fixes things. In an effort to better understand the broader problem, I ask this specific question:
Where do the .raw fields come from?
I had assumed that Logstash was populating Elasticsearch with strings as-analyzed and strings as-raw when it inserted documents, but considering the fact that the fix lies in Elasticsearch templates, I question whether or not my assumption is correct.
You're correct in your assumption that the .raw fields are the result of a dynamic template for string fields contained in the default index template that Logstash creates IF manage_template: true (which it is by default).
The default template that Logstash creates (as of 2.1) can be seen here. As you can see on line 26, all string fields (except the message one) have a not_analyzed .raw sub-field created.
However, the template hasn't changed in the latest Logstash versions as can be seen in the template.json change history, so either something else must be wrong with your install or you've changed your Logstash config to use your own index template (without .raw fields) instead.
If you run curl -XGET localhost:9200/_template/logstash* you should see the template that Logstash has created.
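The part of that template responsible for the .raw fields is a dynamic template applied to every string field. Abridged, it looks roughly like this (check the output of the command above for the exact version on your cluster):

  "dynamic_templates": [
    {
      "string_fields": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
          "type": "string",
          "index": "analyzed",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
          }
        }
      }
    }
  ]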

retaining case in elasticsearch faceted search

Is there a way to do faceted searches using the Elasticsearch Search API while maintaining case (as opposed to having the results converted to lowercase)?
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.
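As a sketch (ES 1.x syntax, where facets still exist; field names are placeholders, and existing documents need to be reindexed before the new sub-field is populated): keep an untouched, not_analyzed copy of the field and run the terms facet against that copy instead of the analyzed one.

  # Add a not_analyzed sub-field alongside the analyzed one.
  curl -XPUT 'localhost:9200/my_index/_mapping/my_type' -d '{
    "properties": {
      "tag": {
        "type": "string",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }'

  # Facet on the raw copy, so entries keep their original case.
  curl -XPOST 'localhost:9200/my_index/_search' -d '{
    "query": { "match_all": {} },
    "facets": {
      "tags": { "terms": { "field": "tag.raw" } }
    }
  }'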
