Elasticsearch search query for exact match not working

I am using query_string for search. Searching works fine, but it matches records regardless of case, both lowercase and uppercase. I want an exact, case-sensitive match.
For example:
Search term: "title"
Current output:
title
Title
TITLE
I want only the first one (title). How do I resolve this issue?
My code in Java:
QueryBuilder qbString = QueryBuilders.queryString("title").field("field_name");

You need to configure your mappings / text processing so that tokens are indexed without being lowercased.
The "standard" analyzer lowercases tokens (and can be configured to remove stopwords).
Here's an example that shows how to configure an analyzer and a mapping to achieve this: https://www.found.no/play/gist/7464654
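In case that link goes away, here is a minimal sketch of such a configuration (index, type, and field names are placeholders; this uses the pre-5.x string syntax matching the question). It defines a custom analyzer that uses the standard tokenizer but applies no lowercase filter:

```json
PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_sensitive": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": []
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "field_name": {
          "type": "string",
          "analyzer": "case_sensitive"
        }
      }
    }
  }
}
```

With this mapping, the tokens "title", "Title", and "TITLE" are indexed as distinct terms, so a query for "title" only matches the lowercase form.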

With version 5+ of Elasticsearch there is no concept of analyzed vs. not analyzed for an index; it is driven by the field type!
The string data type is deprecated and replaced with text and keyword, so if your data type is text it will behave like the old analyzed string and will be analyzed and tokenized.
But if the data type is defined as keyword, it is automatically NOT analyzed and returns full exact matches.
So you should remember to map the field as keyword when you want a case-sensitive exact match.
Code example below for creating an index with this definition:
PUT testindex
{
  "mappings": {
    "original": {
      "properties": {
        "#timestamp": {
          "type": "date"
        },
        "#version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "APPLICATION": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        }
      }
    }
  }
}
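With a mapping like the one above in place, a case-sensitive exact match can then be run as a term query against the keyword sub-field (a sketch; the value "MyApp" is a placeholder):

```json
GET testindex/_search
{
  "query": {
    "term": {
      "APPLICATION.exact": "MyApp"
    }
  }
}
```

The term query bypasses analysis of the input, so only documents whose APPLICATION field is exactly "MyApp", including case, are returned.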

Related

Cannot create elasticsearch mapping or get date field to work

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ]
  },
  "status" : 400
}
I've read that the above is because mapping types are deprecated in Elasticsearch 7.x. Is that also true for data types? I mean, how am I supposed to say I want the data to be treated as a date?
My current mapping has this element:
"Time": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
I just wanted to create it again with "type": "date", but I noticed that even copy-pasting the current mapping (which works) yields an error... The index and mappings I have were generated automatically by https://github.com/jayzeng/scrapy-elasticsearch
My goal is simply to have a date field. I have all my dates in my index, but when I want to filter in Kibana I can see the field is not treated as a date, and modifying the mapping doesn't seem to be an option.
Obvious ELK noob here, please bear with me (:
The error is quite ironic, because I pasted the mapping from an existing index/mapping...
You're experiencing this because of the breaking change in 7.0 regarding named doc types.
Long story short, instead of
PUT example_index
{
  "mappings": {
    "_doc_type": {
      "properties": {
        "Time": {
          "type": "date",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
you should do
PUT example_index
{
  "mappings": {          <-- Note no top-level doc type definition
    "properties": {
      "Time": {
        "type": "date",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
I'm not familiar with your scrapy package, but at a glance it looks like it's just a wrapper around the native Python ES client, which should forward your mapping definitions to ES.
Caveat: if you have your field defined as text and you intend to change it to date (or add a field of type date), you'll get the following error:
mapper [Time] of different type, current_type [text], merged_type [date]
You basically have two options:
drop the index, set the new mapping & reindex everything (easiest, but introduces downtime)
extend the mapping with a new field, let's call it Time_as_datetime, and update your existing docs using a script (introduces a brand-new field/property -- that's somewhat verbose)
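The second option might be sketched like this (the field name Time_as_datetime follows the suggestion above; this assumes the existing Time strings are in a format the date type can parse, e.g. ISO 8601):

```json
PUT example_index/_mapping
{
  "properties": {
    "Time_as_datetime": { "type": "date" }
  }
}

POST example_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.Time_as_datetime = ctx._source.Time"
  }
}
```

After the update-by-query finishes, the new field is filterable as a date in Kibana while the original Time field stays untouched.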

I want to find the exact term, not just part of the term

I have a group of JSON documents from Wikidata (http://www.wikidata.org) to index into Elasticsearch for search.
Each has several fields. For example, a document looks like below:
{
  "eId": "Q25338",
  "eLabel": "The Little Prince, Little Prince",
  ...
}
Here, what I want is for users to search by the exact term, not a part of the term. Meaning, if a user searches for 'prince', I don't want this document to show up in the search results; only when the user types the whole term 'the little prince' or 'little prince' should this JSON be included in the search results.
Should I pre-process each comma-separated sentence (some eLabel values have tens of elements in the list), turn it into a bunch of different documents, and make the keyword term field for each?
If not, how can I write a mapping file that makes this search behave as expected?
My current Mappings.json.
"mappings": {
"entity": {
"properties": {
"eLabel": { # want to replace
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Should I pre-process each comma-separated sentence (some eLabel values have tens of elements in the list), turn it into a bunch of different documents, and make the keyword term field for each?
This is exactly what you should do. Elasticsearch can't split the comma-separated list for you; it will treat your data as just one whole string. But if you preprocess it and then make the resulting field a keyword field, that will work very well: it's exactly what the keyword field type is designed for. I'd recommend using a term query to search for exact matches. (As opposed to a match query, a term query does not analyze the incoming query string and is thus more efficient.)
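A sketch of that approach (the index name wikidata is a placeholder; the splitting on commas happens in your own code before indexing, producing an array of labels). Note that a term query on a keyword field is case-sensitive, so 'little prince' would not match 'Little Prince' unless you also add a lowercase normalizer to the field:

```json
PUT wikidata
{
  "mappings": {
    "entity": {
      "properties": {
        "eLabel": { "type": "keyword" }
      }
    }
  }
}

PUT wikidata/entity/1
{
  "eId": "Q25338",
  "eLabel": ["The Little Prince", "Little Prince"]
}

GET wikidata/entity/_search
{
  "query": {
    "term": { "eLabel": "Little Prince" }
  }
}
```

Because each array element is indexed as its own keyword term, the term query matches "Little Prince" as a whole but a query for just "Prince" returns nothing.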

Is field named "language" somehow special?

In my query I have the following filter:
"term": {
"language": "en-us"
}
And it's not returning any results, despite there being a lot of docs with "language" = "en-us" and this field being defined correctly in the mapping. When I change the filter to, for example:
"term": {
"isPublic": true
}
then it correctly filters by the "isPublic" field.
My suspicion is that a field named "language" is treated somehow specially. Maybe it's a reserved keyword in the ES query DSL? I can't find it in the docs.
ES v2.4.0
Mapping of document:
"mappings": {
"contributor": {
"_timestamp": {},
"properties": {
"createdAt": {
"type": "date",
"format": "epoch_millis||dateOptionalTime"
},
"displayName": {
"type": "string"
},
"followersCount_en_us": {
"type": "long"
},
"followersCount_zh_cn": {
"type": "long"
},
"id": {
"type": "long"
},
"isPublic": {
"type": "boolean"
},
"language": {
"type": "string"
},
"photoUrl": {
"type": "string",
"index": "not_analyzed"
},
"role": {
"type": "string",
"store": true
},
"slug": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The field language is nothing special. The behavior should all be in the mapping. Several possible causes come to mind:
query analyzer != index analyzer
the analyzer first splits "en-us" into two tokens, en and us, and then throws away short tokens, which would leave both query and index empty :-)
the field is not indexed, just stored
the - is not a normal ASCII dash in the index or the query. I have seen crazy things happen when people paste queries from a word processor: quotes are no longer straight quotes, dashes become ndash or mdash, ü is not one character but a combined character.
EDIT after mapping was added to the question:
The type string is analyzed with the Standard Analyzer, which splits text into tokens, in particular at dashes too, so the field contains two tokens, "en" and "us". Your search is a term query, which should probably be called a token query, because it queries exactly that: the token as you write it, "en-us". But this token does not exist in the field.
Two ways to remedy this:
set the field to not-analyzed and keep the query as is
change the query to a match query.
I would rather use (1), since the language field content is something like an ID and should not be analyzed.
More about the topic: "Why doesn’t the term query match my document?" on https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-term-query.html
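A minimal sketch of remedy (1) in 2.x syntax (the index name is a placeholder; changing the mapping of an existing field requires creating a new index and reindexing):

```json
PUT myindex
{
  "mappings": {
    "contributor": {
      "properties": {
        "language": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

GET myindex/_search
{
  "query": {
    "term": { "language": "en-us" }
  }
}
```

With the field not analyzed, the whole value "en-us" is indexed as a single token, so the term filter from the question matches as written.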

Request Body search in Elasticsearch

I am using Elasticsearch 5.4.1. Here is the mapping:
{
  "testi": {
    "mappings": {
      "testt": {
        "properties": {
          "last": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
When I use URI search I receive results. On the other hand, when using Request Body search the result is empty in every case.
GET testi/testt/_search
{
  "query": {
    "term": { "name": "John" }
  }
}
A couple of things are going on here:
For both last and name, you are indexing the field itself as text and then a subfield as a keyword. Is that your intention? You want to be able to do analyzed/tokenized search on the raw field and then keyword search on the subfield?
If that is your intention, you now have two ways to query each of these fields. For example, name gives you the analyzed version of the field (you assigned the type text, meaning Elasticsearch applied the standard analyzer to it: a lowercase filter, basic tokenizing, etc.) and name.keyword gives you the unaltered keyword version of this field.
Therefore, your term query expects your input string John to match a token in the field you're querying against. Since you used capitalization in your query input, you likely want to use the keyword subfield of name, so try "term" : { "name.keyword" : "John" } instead.
As a light demonstration of what happened to the original field, "term" : { "name" : "john" } should work as well, since the standard analyzer lowercased the token at index time.
You are seeing results in URI search because it just executes a match_all. If you do pass a basic text parameter, it executes against _all, which is a concatenation of all the fields in each document, so both the keyword and text versions are available.
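Putting that together, the request body that should return the document (assuming it was indexed with name "John") is:

```json
GET testi/testt/_search
{
  "query": {
    "term": { "name.keyword": "John" }
  }
}
```

The keyword subfield holds the original string unaltered, so the capitalized query input matches it exactly.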

Using both term and match query on same text field?

I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two documents:
"state": "vail"
and
"state": "eagle vail"
For one of my requirements, I need to do a term-level query, such that if I type "vail", the search results should only return states with exactly "vail" and not "eagle vail".
But another requirement, for a different search on the same index, is to do a match query for full-text search, such that if I type "vail", "eagle vail" should be returned as well.
So my question is: how do I do both term-level and full-text search on this field? For the term-level query I would have to set it as "keyword" type so that it won't be analyzed.
You can use the "multi-field" feature to achieve this. Here is a mapping:
{
  "mappings": {
    "my_type": {
      "properties": {
        "state": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
In this case state will act as a text field (tokenized), whereas state.raw will be a keyword (single token). When indexing a document you should only set state; state.raw will be populated automatically.
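Both requirements can then be served by querying different sub-fields of the same source data (a sketch; the index name is a placeholder). The first query matches only the document whose state is exactly "vail"; the second matches both "vail" and "eagle vail":

```json
GET myindex/_search
{
  "query": {
    "term": { "state.raw": "vail" }
  }
}

GET myindex/_search
{
  "query": {
    "match": { "state": "vail" }
  }
}
```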
