Querystring query without analysis on es? - elasticsearch

I have a field "animal" that is not_analyzed. Is a document with "animal": "fox"
searchable with a querystring query if a user passes in "fo" as the querystring? Or would the user have to pass in "fox" in order to match that document?

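"fo" won't match "fox" if you're using not_analyzed.
The index mapping option in elasticsearch takes one of three values:
analyzed: the text is run through an analyzer and the resulting terms are indexed. ("fo" matches "fox" only if the analyzer emits partial terms such as "fo", e.g. an n-gram analyzer; otherwise "fox" is indexed as a whole term)
not_analyzed: the text is indexed (made searchable) exactly as it is. ("fo" doesn't match "fox", only "fox" does)
no: the field is not searchable.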

If you search for fo* it will work; otherwise you need to search for the exact term.
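To make the difference between the exact term and fo* concrete, here is a toy model in Python. It is only an illustration of the idea, not how Elasticsearch is implemented; the index contents and the function names are made up for the example.

```python
# Toy model of a not_analyzed field: each value is indexed as one exact term.
# Illustration only; the index below and the helper names are hypothetical.
from fnmatch import fnmatchcase

# term -> list of document ids
index = {"fox": [1], "dog": [2]}

def term_lookup(term):
    """Exact lookup: the query text must equal the indexed term."""
    return index.get(term, [])

def wildcard_lookup(pattern):
    """Wildcard lookup: every indexed term is tested against the pattern."""
    hits = []
    for term, doc_ids in index.items():
        if fnmatchcase(term, pattern):
            hits.extend(doc_ids)
    return sorted(hits)

print(term_lookup("fo"))       # there is no exact term "fo"
print(term_lookup("fox"))      # the whole value matches
print(wildcard_lookup("fo*"))  # the pattern is expanded against the terms
```

The wildcard version has to look at the indexed terms rather than do a single lookup, which is why wildcard and prefix searches are generally more expensive than exact term matches.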

Related

analyzed vs not_analyzed: storage size

I recently started using ElasticSearch 2. As I understand analyzed vs not_analyzed in the mapping, not_analyzed should be better for storage (https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0 and https://www.elastic.co/blog/elasticsearch-storage-the-true-story).
For testing purposes I created some indexes with all the String fields as analyzed (the default), and then I created some other indexes with all the fields as not_analyzed. To my surprise, when I checked the size of the indexes, the ones with the not_analyzed Strings were 40% bigger!! I was inserting the same documents in each index (35000 docs).
Any idea why this is happening? My documents are simple JSON documents. I have 60 String fields in each document that I want to set as not_analyzed, and I tried both setting each field as not_analyzed and creating a dynamic template.
Edit: here is the mapping, although I think it has nothing special:
{
  "mappings": {
    "my_type": {
      "_ttl": { "enabled": true, "default": "7d" },
      "properties": {
        "field1": {
          "properties": {
            "field2": {
              "type": "string", "index": "not_analyzed"
            }
            more not_analyzed String fields here
            ...
          }
        }
      }
    }
  }
}

not_analyzed fields are still indexed. They just don't have any transformations applied to them beforehand ("analysis" - in Lucene parlance).
As an example:
(Doc 1) "The quick brown fox jumped over the lazy dog"
(Doc 2) "Lazy like the fox"
Simplified postings list created by Standard Analyzer (default for analyzed string fields - tokenized, lowercased, stopwords removed):
"brown": [1]
"dog": [1]
"fox": [1,2]
"jumped": [1]
"lazy": [1,2]
"like": [2]
"over": [1]
"quick": [1]
34 characters worth of string data
Simplified postings list created by "index": "not_analyzed":
"The quick brown fox jumped over the lazy dog": [1]
"Lazy like the fox": [2]
61 characters worth of string data
Analysis causes input to get tokenized and normalized for the purpose of being able to look up documents using a term.
As a result, the unit of text is reduced to a normalized term (vs an entire field value with not_analyzed). All the redundant (normalized) terms across all documents are collapsed into a single logical list, saving you the space that would otherwise be consumed by repeated terms and stopwords.
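The comparison can be reproduced with a short Python sketch that builds both kinds of postings list for the two example documents and counts the characters of term data. The analyze function is a rough stand-in for an analyzer, and the one-word stopword list is an assumption chosen to fit this example; neither is Lucene's actual behaviour.

```python
import re

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Lazy like the fox",
}

STOPWORDS = {"the"}  # assumption: a minimal stopword list for this example

def analyze(text):
    """Rough stand-in for an analyzer: tokenize, lowercase, drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_postings(documents, analyzed):
    """Map each term to the ids of the documents that contain it."""
    postings = {}
    for doc_id, text in documents.items():
        # not_analyzed: the whole field value is the single term
        terms = analyze(text) if analyzed else [text]
        for term in terms:
            ids = postings.setdefault(term, [])
            if doc_id not in ids:
                ids.append(doc_id)
    return postings

def term_chars(postings):
    """Total characters needed to store each distinct term once."""
    return sum(len(term) for term in postings)

analyzed_postings = build_postings(docs, analyzed=True)
raw_postings = build_postings(docs, analyzed=False)

print(sorted(analyzed_postings.items()))
print(term_chars(analyzed_postings), term_chars(raw_postings))
```

Terms shared between documents ("fox", "lazy") are stored once in the analyzed case, while every distinct not_analyzed value is stored in full.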
From the documentation, it looks like not_analyzed makes the field act like a "keyword" instead of a "full-text" field -- let's compare these two!
Full text
These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed.
Keyword
Keyword fields are not_analyzed. Instead, the exact string value is added to the index as a single term.
I'm not surprised that storing an entire string as a term, rather than breaking it into a list of terms, doesn't necessarily translate to saved space. Honestly, it probably depends on the index's analyzer and the string being indexed.
As a side note, I just re-indexed about a million documents of production data and cut our index disk space usage by ~95%. The main difference I made was modifying what was actually saved in the source (AKA stored). We indexed PDFs for searching, but did not need them to be returned and so that saved us from saving this information in two different ways (analyzed and raw). There are some very real downsides to this, though, so be careful!
Doc1 {
  "name": "my name is mayank kumar"
}
Doc2 {
  "name": "mayank"
}
Doc3 {
  "name": "Mayank"
}
We have 3 documents.
If the field 'name' is 'not_analyzed' and we search for 'mayank', only the second document is returned. If we search for 'Mayank', only the third document is returned.
If the field 'name' is 'analyzed' by an analyser that tokenises and lowercases (a 'lowercase analyser', just as an example) and we search for 'mayank', all 3 documents are returned.
If we search for 'kumar', the first document is returned. This happens because in the first document the field value gets tokenised as "my" "name" "is" "mayank" "kumar".
'not_analyzed' is basically used for exact matching of whole values (and for wildcard matching), not for full-text search. It takes less time during indexing because no analysis is run; the space it needs on disk depends on how long and how repetitive the values are.
'analyzed' is basically used for full-text search, i.e. matching documents on individual words. It takes more time during indexing (analysis produces more terms) and can take more space on disk if the analyzed fields are big.
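The three documents above can be run through a small Python simulation. This is a sketch of the behaviour described, assuming an analyser that splits on whitespace and lowercases; the function names are invented for the example.

```python
docs = {1: "my name is mayank kumar", 2: "mayank", 3: "Mayank"}

def index_not_analyzed(documents):
    """Index each field value as one exact, case-sensitive term."""
    index = {}
    for doc_id, value in documents.items():
        index.setdefault(value, []).append(doc_id)
    return index

def index_lowercase(documents):
    """Assumed analyser: split on whitespace, then lowercase each token."""
    index = {}
    for doc_id, value in documents.items():
        for token in set(value.lower().split()):
            index.setdefault(token, []).append(doc_id)
    return index

def search(index, term):
    """Return the ids of documents whose indexed terms include `term`."""
    return sorted(index.get(term, []))

exact = index_not_analyzed(docs)
analysed = index_lowercase(docs)

print(search(exact, "mayank"))     # only the document whose value is exactly "mayank"
print(search(exact, "Mayank"))     # only the document whose value is exactly "Mayank"
print(search(analysed, "mayank"))  # every document containing the token "mayank"
print(search(analysed, "kumar"))
```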

Elasticsearch : Query on one of the fields given in the list

Elasticsearch has documents indexed with the following fields:
{"id":"1", "title":"test", "locale_1_title":"locale_test"}
Given a query, the following behaviour is needed for each document:
1) If the locale_1_title field is not empty (""), search only on the locale_1_title field. Do not search on the title field.
2) If the locale_1_title field is empty, search on the title field.
What would be a simple elasticsearch query to get the above behaviour?

ElasticSearch Match Multiple Prefix Terms

I am trying to give ElasticSearch a query with multiple terms and then be given matching documents where the terms specified are anywhere in the target field. The terms may be full words or word prefixes.
Example document:
{
"msg": "hello I am a text message"
}
Example query string:
"hello message"
The words "hello" and "message" appear in the text so I want the document returned. The same query should also return the document if the query string is:
"hel mes"
What is the most performant way to query ElasticSearch to achieve this goal?

How to search fields with '-' characters in elastic search

I am new to elastic search. I have the following document, where one of the fields, "eventId", has "-" in its value.
When I try to search with the complete value of eventId, I don't get any results.
Sample Document app/event
{
  "tags": {},
  "eventId": "cc98d57b-c6bc-424c-b54c-df1e3df0d942"
}
I haven't created any explicit settings for my index.
Thanks.
You should check whether the tokenizer splits your value into multiple terms. Maybe your value is indexed as 5 terms: "cc98d57b", "c6bc", "424c", "b54c" and "df1e3df0d942".
You can inspect that with the 'Kopf' Plugin (https://github.com/lmenezes/elasticsearch-kopf).
If that is your problem, you should change your field mapping so that the value is not analyzed ("index" : "not_analyzed").
For an example of how to set that mapping, see here: Elasticsearch mapping settings 'not_analyzed' and grouping by field in Java
After that, you should be able to search for your specific value.
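As a rough illustration in Python, the split suspected above can be approximated by breaking the value on anything that is not a letter or digit. The regex is only an approximation of what the default tokenizer does, and the mapping below is a sketch written as a Python dict; the type name "event" is taken from the "app/event" line in the question.

```python
import re

event_id = "cc98d57b-c6bc-424c-b54c-df1e3df0d942"

def rough_tokenize(value):
    """Approximation of an analyzed field: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

# An analyzed field would index several short terms, none equal to the full id.
print(rough_tokenize(event_id))

# Sketch of a mapping that keeps the id as a single exact term
# (string type with "index": "not_analyzed", as suggested above).
mapping = {
    "event": {
        "properties": {
            "eventId": {"type": "string", "index": "not_analyzed"}
        }
    }
}
```

A mapping change like this only affects documents indexed after it is applied, so existing documents would need to be reindexed.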

ElasticSearch: Matching multiple queries

I am using Tire (ElasticSearch Ruby gem), and want to match a few fields on the keyword "community marketing". However, I also want ElasticSearch to return me results for the keyword "communities marketing" as well. The standard analyzer does not parse/tokenize "communities" as "community" so they're separate keywords.
How do I get ElasticSearch to return me results for both "community marketing" and "communities marketing"? I prefer to do this in query time, rather than index time. I'm fine with ElasticSearch standard analyzer and prefer not to mess around with it.
fields = ["title", "popular_hash_tags"]
keyword = "communities marketing"
keyword2 = "community marketing"
s = Tire.search "articles" do
  query do
    match fields, keyword, :operator => "AND"
    #NOW I also want to match keyword2??
  end
end
I suggest digging through the query DSL of Elasticsearch. You will find a lot of interesting stuff.
For instance, the "should" clause of a bool filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
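One way to read the "should" suggestion is to wrap one clause per keyword in a bool and let a document match if either clause matches. The sketch below builds such a request body as a plain Python dict rather than through Tire, and it uses a bool query with multi_match clauses instead of the bool filter linked above; the helper name is made up for the example.

```python
import json

fields = ["title", "popular_hash_tags"]
keywords = ["communities marketing", "community marketing"]

def any_keyword_query(fields, keywords):
    """Build a bool query in which each keyword phrase is one 'should' clause."""
    should = [
        # every word of the phrase must match (operator "and") ...
        {"multi_match": {"query": kw, "fields": fields, "operator": "and"}}
        for kw in keywords
    ]
    # ... but only one of the phrases has to match
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

body = any_keyword_query(fields, keywords)
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a search request against the "articles" index.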
