Elasticsearch 5.0.2 failed to search for keyword

I am indexing a very simple item with a field defined as follows:
"comid": {
"type": "keyword",
"store": "false",
"index": "no",
"include_in_all": false,
"doc_values": false
}
I then ingested a single item where comid = "this is an id".
When I query for the item with an exact match on "this is an id", I get this error:
cannot search on field [comid] since it is not indexed
Maybe I misunderstood the documentation, but I thought keyword fields are searchable with exact matches?
I think I could work around this by changing the type from keyword to text and using the keyword analyzer (which is a no-op, if I understood it correctly), but it seems weird to do this for every keyword-type field.
I must be missing something obvious here?

If a field is not indexed, its value is only stored in the document's _source; no entry is made in the inverted index, so you cannot search on that field.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html
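For reference, a minimal sketch of a mapping that keeps comid searchable for exact matches (index and type names here are placeholders): simply leave index at its default of true, i.e. drop the "index": "no" line.

PUT my_index
{
    "mappings": {
        "my_type": {
            "properties": {
                "comid": {
                    "type": "keyword",
                    "include_in_all": false,
                    "doc_values": false
                }
            }
        }
    }
}

GET my_index/_search
{
    "query": {
        "term": { "comid": "this is an id" }
    }
}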

Related

Unexpected result using Elasticsearch when dash character is involved

I'm querying Elasticsearch 2.3 using django-haystack, and the query that is executed seems to be the following:
'imaging_telescopes:(*\\"FS\\-60\\"*)'
An object in my Elasticsearch data has the following value for its property imaging_telescopes: "Takahashi FSQ-106N".
This object matches the query, and to me this result is unexpected; I wouldn't want it to match.
My assumption is that it matches because it contains the letters FS, but in my frontend I'm just searching for "FS-60".
How can I modify the query so that it's stricter in looking for objects whose property imaging_telescopes exactly contains some text?
Thanks!
EDIT: this is the mapping of the field:
"imaging_telescopes": {
"type": "string",
"analyzer": "snowball"
}
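No answer is recorded here, but the behaviour can be inspected with the _analyze API, and a common remedy is a not_analyzed sub-field queried with a term query. A sketch using the ES 2.3 string syntax from the mapping above (the sub-field name raw is an assumption):

GET /_analyze
{
    "analyzer": "snowball",
    "text": "Takahashi FSQ-106N"
}

With a raw sub-field added to the mapping:

"imaging_telescopes": {
    "type": "string",
    "analyzer": "snowball",
    "fields": {
        "raw": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

an exact match becomes:

{
    "query": {
        "term": {
            "imaging_telescopes.raw": "Takahashi FSQ-106N"
        }
    }
}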

Non indexed field in sort clause

I have a field in my Elasticsearch index mapping that will not be used for any searching, but I require it in the sort clause of the query. Is it possible to put "index": "false" in the mapping definition?
Basically, in the mapping:
"name":{
"type": "keyword",
"index": "false"
}
And in the query:
"sort": [
    { "name": { "order": "asc" } }
]
Please read about the index option in the official Elasticsearch documentation, which says:
The index option controls whether field values are indexed. It accepts true or false and defaults to true. Fields that are not indexed are not queryable.
So, in your case, you are explicitly setting it to false, hence you will not be able to include the field in your query, and sort queries will also not work on this field.
You can easily verify this yourself, by creating one such field in your index and see if it allows you to sort on that field.
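A sketch of that verification (index and type names are placeholders; note that the outcome can also depend on whether doc_values remain enabled for the field, since sorting reads doc values rather than the inverted index):

PUT sort_test
{
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "keyword",
                    "index": false
                }
            }
        }
    }
}

PUT sort_test/my_type/1
{ "name": "alice" }

GET sort_test/_search
{
    "sort": [
        { "name": { "order": "asc" } }
    ]
}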

How to treat certain field values as null in `Elasticsearch`

I'm parsing log files which for simplicity's sake let's say will have the following format :
{"message": "hello world", "size": 100, "forward-to": 127.0.0.1}
I'm indexing these lines into an Elasticsearch index, where I've defined a custom mapping such that message, size, and forward-to are of type text, integer, and ip respectively. However, some log lines will look like this :
{"message": "hello world", "size": "-", "forward-to": ""}
This leads to parsing errors when Elasticsearch tries to index these documents. For technical reasons, it's very much nontrivial for me to pre-process these documents and change "-" and "" to null. Is there any way to define which values my mapping should treat as null? Is there perhaps an analyzer I can write that works on any field type whatsoever, which I can add to all entries in my mapping?
Basically I'm looking for somewhat of the opposite of the null_value option. Instead of telling Elasticsearch what to turn a null_value into, I'd like to tell it what it should turn into a null_value. Also acceptable would be a way to tell Elasticsearch to simply ignore fields that look a certain way but still parse the other fields in the document.
So this one's apparently easy. Add the following to your index settings:
{
    "settings": {
        "index": {
            "mapping": {
                "ignore_malformed": "true"
            }
        }
    }
}
This will still index the field (contrary to what I've understood from the documentation...) but it will be ignored during aggregations (so if you have 3 entries in an integer field that are "1", 3, and "hello world", an averaging aggregation will yield 2).
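A quick way to see that behaviour end to end (index, type, and field names here are placeholders):

PUT logs
{
    "settings": {
        "index": {
            "mapping": {
                "ignore_malformed": "true"
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "size": { "type": "integer" }
            }
        }
    }
}

PUT logs/doc/1
{ "size": "1" }

PUT logs/doc/2
{ "size": 3 }

PUT logs/doc/3
{ "size": "hello world" }

GET logs/_search
{
    "size": 0,
    "aggs": {
        "avg_size": { "avg": { "field": "size" } }
    }
}

The "hello world" value is ignored, so the average comes out to 2.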
Keep in mind that because of the way the option was implemented (and I would say this is a bug), this still fails for an object that is submitted where a concrete value is expected, and vice versa. If you'd like to get around that, you can set the field's enabled value to false, like this:
{
    "mappings": {
        "my_mapping_name": {
            "properties": {
                "my_unpredictable_field": {
                    "enabled": false
                }
            }
        }
    }
}
This comes at a price though, since it means the field won't be indexed. The values entered will still be stored, so you can still access them by finding the document through another field. This usually shouldn't be an issue, as you likely won't be filtering documents based on the value of such an unpredictable field, but that depends on your specific use case. See here for the official discussion of this issue.

Ngram Tokenizer on field, not on query

I'm having trouble finding the solution for a use case here.
Basically, it's pretty simple: I need to perform a "contains" query, like a SQL LIKE '%...%'.
I've seen there is a regexp query, which I actually managed to get working perfectly, but since it seems to scale badly, I'm trying out nGrams. Now, I've played around with them before and know "how they work", but the behaviour isn't the one I expect.
Basically, I've configured my analyzer with min_gram = 2, max_gram = 20. Say I index a user called "Christophe". I want the query "Chris" to match, which it does, since "Chris" is a 5-gram of "Christophe". The problem is that "Risotto" matches as well, because it gets broken down into nGrams too, and ultimately "is" is a 2-gram of "Christophe", so it matches.
What I need is for the analyzer to break down the indexed field into nGrams at indexing time, and to compare those against the FULL text query. "Risotto" should match "Risotto", "XXXRisottoXXX" and so on, but not "Risolo" or something where only the nGrams match.
Is there any solution?
You need to use the search_analyzer setting to have distinct index-time and search-time analyzers.
Sample from the docs:
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
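The docs sample references an autocomplete analyzer without defining it. For completeness, a sketch of a matching index definition (the analyzer and filter names are assumptions; on recent Elasticsearch versions you may also need to raise index.max_ngram_diff to allow min_gram 2 with max_gram 20):

PUT my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "autocomplete_filter" ]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}

With this, "Christophe" is indexed as nGrams, but the query "Risotto" is analyzed by the standard analyzer into the single token "risotto", so it only matches documents whose nGrams contain that full token.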

Django-Haystack elasticsearch queries

Haystack generates elasticsearch queries to get results from elasticsearch. The queries get prepended with a filter containing the following query:
"query": {
"query_string": {
"query": "django_ct:(customers.customer)"
}
}
What is the meaning of the django_ct:(..) query? Is this a function that haystack installs in Elasticsearch? Is it some caching magic? Can I get rid of this part altogether?
The reason why I'm asking is that I have to build a custom query to use an elasticsearch multi_field. In order to change the queries I want to understand first how haystack generates its own queries.
Haystack uses Django's content types to determine which model attributes to search against in Elasticsearch. This is not really best practice, but it's how it's done in HS.
Basically, the code in HS looks something like this:
from django.contrib.contenttypes.models import ContentType

# django_ct holds "<app_label>.<model_name>", e.g. "customers.customer"
app_name, model_name = django_ct.split('.')
ct = ContentType.objects.get_by_natural_key(app_name, model_name)
model = ct.model_class()
# do stuff with model
So, you really don't want to ignore it when using haystack, if you are indexing more than one model in your index.
I have a couple other answers based on elasticsearch here: index analyzer vs query analyzer in haystack - elasticsearch? and here: Django Haystack Distinct Value for Field
EDIT regarding multi-fields:
I've used Haystack and multi-fields in the past, so I'm not sure you need to write your own backend. The key is understanding how haystack creates searches. As I said in one of the other posts, everything goes into query_string, and from there it creates a Lucene-based search string. Again, not really best practice.
So let's say you have a multi-field that looks like this:
"some_field": {
"type": "multi_field",
"fields": {
"some_field_edgengram": {
"type": "string",
"index": "analyzed",
"index_analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"some_field": {
"type": "string",
"index": "not_analyzed"
}
}
},
In haystack, you can just search against some_field and some_field_edgengram directly.
For example, SearchQuerySet().filter(some_field="cat") and SearchQuerySet().filter(some_field_edgengram="cat") will both work, but the first will only match tokens that are exactly cat, and the second will match cat, cats, catlin, catch, etc., at least using my edgengram analyzers.
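Under the hood, both of those calls end up inside a query_string query together with the django_ct filter, roughly of this shape (a sketch, not Haystack's verbatim output):

{
    "query": {
        "query_string": {
            "query": "django_ct:(customers.customer) AND some_field_edgengram:(cat)"
        }
    }
}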
However, just because you use haystack for indexing and search doesn't mean you have to use it for 100% of your search solutions. In the past, I've used PYES in some areas of the app and haystack in others, because haystack lacked the support for more advanced features and the query_string parsing was losing some of the finer grained accuracy we were looking for.
In your case, you could get results from the search engine via elasticutils or python-elasticsearch directly for some more advanced searches, and use haystack for the other, more routine searches.
