I want to find the exact term, not just part of the term - elasticsearch

I have a group of JSON documents from Wikidata (http://www.wikidata.org) to index into Elasticsearch for search.
Each document has several fields. For example, one looks like this:
{
  "eId": "Q25338",
  "eLabel": "The Little Prince, Little Prince",
  ...
}
Here, what I want is for the user to search by the exact term, not part of the term. Meaning, if a user searches for 'prince', I don't want this document to show up in the search results. Only when the user types the whole term, 'the little prince' or 'little prince', do I want this JSON included in the search results.
Should I pre-process each comma-separated value (some eLabel fields have tens of elements in the list), split it into a bunch of separate documents, and give each a keyword term field?
If not, how can I write a mapping file to make this search behave as expected?
My current Mappings.json:
"mappings": {
"entity": {
"properties": {
"eLabel": { # want to replace
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}

Should I pre-process each comma-separated value (some eLabel fields have tens of elements in the list), split it into a bunch of separate documents, and give each a keyword term field?
This is exactly what you should do. Elasticsearch won't split the comma-separated list for you; it will treat your data as one whole string. But if you pre-process it and make the resulting field a keyword field, that will work very well - it's exactly what the keyword field type is designed for. I'd recommend using a term query to search for exact matches. (As opposed to a match query, a term query does not analyze the incoming query text and is thus more efficient.)
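To illustrate, here is a minimal sketch of the pre-processed approach, assuming ES 5+, an index named wikidata, and one document per label value (the index name and document IDs are assumptions):

PUT wikidata
{
  "mappings": {
    "entity": {
      "properties": {
        "eId":    { "type": "keyword" },
        "eLabel": { "type": "keyword" }
      }
    }
  }
}

# "The Little Prince, Little Prince" is split into one document per label
PUT wikidata/entity/1
{ "eId": "Q25338", "eLabel": "The Little Prince" }
PUT wikidata/entity/2
{ "eId": "Q25338", "eLabel": "Little Prince" }

# A term query matches only the whole label ...
GET wikidata/_search
{ "query": { "term": { "eLabel": "Little Prince" } } }

# ... while a query for a partial term returns no hits
GET wikidata/_search
{ "query": { "term": { "eLabel": "Prince" } } }

Note that term queries on keyword fields are case sensitive; if users may type 'little prince' in lowercase, you would additionally need a lowercase normalizer on the field.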

Related

How to index a field for both numeric and text search

I want to enable both numeric and full-text search for a field.
I need two different ways to search the field for different scenarios.
How can I index the field?
You can always make use of multi-fields for such use cases. Let's say the field name is field1. Below is how you can define it so it is indexed in different ways:
"field1": {
"type": "integer",
"fields": {
"textval": {
"type": "text"
},
"keyword": {
"type": "keyword"
}
}
}
Refer to the Elasticsearch documentation on multi-fields to understand them in more depth.
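With that mapping, each sub-field can be targeted separately at query time. A quick sketch, assuming an index named my_index (the index name and values are made up):

# numeric query on the integer field
GET my_index/_search
{ "query": { "range": { "field1": { "gte": 10, "lte": 50 } } } }

# full-text query on the text sub-field
GET my_index/_search
{ "query": { "match": { "field1.textval": "42" } } }

# exact-match query on the keyword sub-field
GET my_index/_search
{ "query": { "term": { "field1.keyword": "42" } } }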

Is a field named "language" somehow special?

In my query I have the following filter:
"term": {
"language": "en-us"
}
And it's not returning any results, even though there are a lot of docs with "language" = "en-us" and this field is defined correctly in the mapping. When I change the filter to, for example:
"term": {
"isPublic": true
}
then it correctly filters by the "isPublic" field.
My suspicion is that a field named "language" is treated somehow specially. Maybe it's a reserved keyword in ES queries? I can't find it in the docs.
ES v2.4.0
Mapping of document:
"mappings": {
"contributor": {
"_timestamp": {},
"properties": {
"createdAt": {
"type": "date",
"format": "epoch_millis||dateOptionalTime"
},
"displayName": {
"type": "string"
},
"followersCount_en_us": {
"type": "long"
},
"followersCount_zh_cn": {
"type": "long"
},
"id": {
"type": "long"
},
"isPublic": {
"type": "boolean"
},
"language": {
"type": "string"
},
"photoUrl": {
"type": "string",
"index": "not_analyzed"
},
"role": {
"type": "string",
"store": true
},
"slug": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The field name language is nothing special; the behaviour is determined entirely by the mapping. Several possible causes come to mind:
- the query analyzer != the index analyzer
- the analyzer first splits the value into two tokens, en and us, and then throws away short tokens, which would leave both the query and the index empty :-)
- the field is not indexed, just stored
- the "-" is not a normal ASCII dash in the index or the query. I have seen crazy things happen when people paste queries from a word processor: quotes are no longer straight quotes, dashes become an ndash or mdash, and ü is not one character but a combined character.
EDIT after mapping was added to the question:
The type string is analyzed with the standard analyzer, which splits text into tokens, in particular at dashes, so the field contains the two tokens "en" and "us". Your search is a term query, which could just as well be called a token query, because it queries exactly that: the token as you wrote it, "en-us". But this token does not exist in the field.
Two ways to remedy this:
1. set the field to not_analyzed and keep the query as is
2. change the query to a match query
I would rather use (1), since the language field content is something like an ID and should not be analyzed.
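For option (1), the ES 2.x mapping for the field would look like the not_analyzed fields already present in your mapping; a sketch:

"language": {
  "type": "string",
  "index": "not_analyzed"
}

Note that the mapping of an existing field cannot be changed in place, so this requires creating a new index with the corrected mapping and reindexing the data.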
More about the topic: "Why doesn’t the term query match my document?" on https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-term-query.html

Using both term and match query on same text field?

I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two documents:
"state": "vail"
and
"state": "eagle vail"
For one of my requirements:
- I need to do a term-level query, such that if I type "vail", the search results should only return states with exactly "vail", not "eagle vail".
But for another requirement, a different search on the same index:
- I need to do a match query for full-text search, such that if I type "vail", "eagle vail" should be returned as well.
So my question is: how do I do both term-level and full-text search on this field? To do a term-level query, I would have to set it to the "keyword" type so that it won't be analyzed.
You can use "multi-field" feature to achieve this. Here is a mapping:
{
  "mappings": {
    "my_type": {
      "properties": {
        "state": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
In this case state will act as a text field (tokenized), whereas state.raw will be a keyword field (single token). When indexing a document you should only set state; state.raw will be populated automatically.
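Both requirements can then be served by different queries against the same documents. A sketch, assuming an index named my_index:

# term-level: matches "vail" but not "eagle vail"
GET my_index/_search
{ "query": { "term": { "state.raw": "vail" } } }

# full text: matches both "vail" and "eagle vail"
GET my_index/_search
{ "query": { "match": { "state": "vail" } } }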

Mapping in elasticsearch

Good morning. In my code I can't search for data which contains separate words; if I search on one word, all is good. I think the problem is in the mapping. I use Postman. When I send a GET request to http://192.168.1.153:9200/sport_scouts/video/_mapping I get:
{
  "sport_scouts": {
    "mappings": {
      "video": {
        "properties": {
          "hashtag": {
            "type": "string"
          },
          "id": {
            "type": "long"
          },
          "sharing_link": {
            "type": "string"
          },
          "source": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "type": {
            "type": "string"
          },
          "user_id": {
            "type": "long"
          },
          "video_preview": {
            "type": "string"
          }
        }
      }
    }
  }
}
All good: title has type string, but if I search on two or more words I get an empty array. My code in my trait:
public function search($data) {
    $this->client();
    $params['body']['query']['filtered']['filter']['or'][]['term']['title'] = $data;
    $search = $this->client->search($params)['hits']['hits'];
    dump($search);
}
Then I call it in my Controller. Can you help me with this problem?
The reason your indexed data can't be found is a mismatch between the analysis performed during indexing and the strict term filter used when querying the data.
With your mapping configuration, you are using the default analysis, which (besides many other operations) tokenizes the input. So every multi-word value you insert is split at punctuation and whitespace. If you insert, for example, "some great sentence", Elasticsearch maps the terms "some", "great", and "sentence" to your document, but not the term "great sentence". So if you do a term filter on "great sentence", or any other part of the original value containing whitespace, you will not get any results.
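You can verify this tokenization with the _analyze API; a quick sketch, using the request-body syntax available since ES 2.0:

POST _analyze
{
  "analyzer": "standard",
  "text": "some great sentence"
}

# returns the three tokens "some", "great", and "sentence"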
Please see the Elasticsearch docs on how to configure your mapping for indexing without analysis (https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#_index_2), or consider doing a match query instead of a term filter on the existing mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html).
Please be aware that if you switch to not_analyzed, you will lose much of the great fuzzy full-text query functionality. Of course you can set up a mapping that indexes a value both analyzed and not_analyzed, in different fields. Then it's up to you to decide which field to query on.
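For that last option, an ES 2.x-style multi-field mapping for title that keeps both variants could look like this (the sub-field name raw is an assumption); a term filter on title.raw then matches the whole original value, while match queries on title remain full text:

"title": {
  "type": "string",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}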

elasticsearch search query for exact match not working

I am using query_string for search. Searching is working fine, but it matches records regardless of small and capital letters. I want an exact, case-sensitive match.
For example:
Search term: "title"
Current output:
title
Title
TITLE
I want only the first one (title). How can I resolve this issue?
My code in Java:
QueryBuilder qbString = null;
qbString = QueryBuilders.queryString("title").field("field_name");
You need to configure your mappings / text processing so that tokens are indexed without being lowercased.
The "standard" analyzer lowercases (and can also be configured to remove stopwords).
Here's an example that shows how to configure an analyzer and a mapping to achieve this: https://www.found.no/play/gist/7464654
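In case that link is unavailable: a custom analyzer that tokenizes like the standard analyzer but omits the lowercase token filter could be defined like this (the index, type, and analyzer names are assumptions):

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_sensitive": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "field_name": {
          "type": "string",
          "analyzer": "case_sensitive"
        }
      }
    }
  }
}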
With version 5+ of Elasticsearch there is no concept of analyzed and not_analyzed for an index; it's driven by the type!
The string data type is deprecated and has been replaced by text and keyword. If your data type is text, it behaves like string and is analyzed and tokenized.
But if the data type is defined as keyword, then it is automatically NOT analyzed and returns full exact matches.
So you should remember to mark the type as keyword when you want to do an exact, case-sensitive match.
Here is a code example for creating an index with this definition:
PUT testindex
{
  "mappings": {
    "original": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "APPLICATION": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        }
      }
    }
  }
}
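With that mapping in place, a case-sensitive exact match targets the keyword sub-field; a sketch (the field value "MyApp" is made up):

GET testindex/_search
{
  "query": {
    "term": { "APPLICATION.exact": "MyApp" }
  }
}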
