Mapping in elasticsearch - elasticsearch

Good morning, In my code I can't search data which contain separate words. If I search on one word all good. I think problem in mapping. I use postman. When I put in URL http://192.168.1.153:9200/sport_scouts/video/_mapping and use method GET I get:
{
"sport_scouts": {
"mappings": {
"video": {
"properties": {
"hashtag": {
"type": "string"
},
"id": {
"type": "long"
},
"sharing_link": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string"
},
"type": {
"type": "string"
},
"user_id": {
"type": "long"
},
"video_preview": {
"type": "string"
}
}
}
}
}
}
All good title have type string but if I search on two or more words I get empty massive. My code in Trait:
public function search($data) {
$this->client();
$params['body']['query']['filtered']['filter']['or'][]['term']['title'] = $data;
$search = $this->client->search($params)['hits']['hits'];
dump($search);
}
Then I call it in my Controller. Can you help me with this problem?

The reason that your indexed data can't be found is caused by a mismatch of the analyzing during indexing and a strict term filter when querying the data.
With your mapping configuration, you are using the default analyzing which (besides many other operations) does a tokenizing. So every multi-word data you insert is split at punctuation or whitespaces. If you insert for example "some great sentence", elasticsearch maps the following terms to your document: "some", "great", "sentence", but not the term "great sentence". So if you do a term filter on "great sentence" or any other part of the original value containing a whitespace, you will not get any results.
Please see the elasticsearch docs on how to configure your mapping for indexing without analyzing (https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#_index_2) or consider doing a match query instead of a term filter on the existing mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html).
Please be aware that if you switch to not_analyzed you will be disabling many of the great fuzzy fulltext query functionality. Of course you can set up a mapping that does both, analyzed and not_analyzed in different fields. Then it's up on you to decide on which field you want to query on.

Related

Merging fields in Elastic Search

I am pretty new to Elastic Search. I have a dataset with multiple fields like name, product_info, description etc., So while searching a document, the search term can come from any of these fields (let us call them as "search core fields").
If I start storing the data in elastic search, should I derive a field which is a concatenated term of all the "search core fields" ? and then index this field alone ?
I came across _all mapping concept and little confused. Does it do the same ?
no, you don't need to create any new field with concatenated terms.
You can just use _all with match query to search a text from any field.
About _all, yes, it searches the text from any field
The _all field has been removed in ES 7, so it would only work in ES 6 and previous versions. The main reason for this is that it used too much storage space.
However, you can define your own all field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"copy_to": "custom_all"
},
"product_info": {
"type": "text",
"copy_to": "custom_all"
},
"description": {
"type": "text",
"copy_to": "custom_all"
},
"custom_all": {
"type": "text"
}
}
}
}
PUT my-index/_doc/1
{
"name": "XYZ",
"product_info": "ABC product",
"description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
"query": {
"match": {
"custom_all": {
"query": "ABC",
"operator": "and"
}
}
}
}

I want to find exact term of sub string, exact term not just part of the term

I have group of json documents from wikidata (http://www.wikidata.org) to index to elasticsearch for search.
It has several fields. For example, it looks like below.
{
eId:Q25338
eLabel:"The Little Prince, Little Prince",
...
}
Here, what I want to do is for user to search 'exact term', not part of the term. Meaning, if a user search 'prince', I don't want to show this document in the search result. When user types the whole term 'the little prince' or 'little prince', I want to make this json included in the search result, namely.
Should I pre-process all the comma separate sentence (some eLabel has tens of elements in the list) and make it bunch of different documents and make the keyword term field respectively?
If not, how can I make a mapping file to make this search as expected?
My current Mappings.json.
"mappings": {
"entity": {
"properties": {
"eLabel": { # want to replace
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Should I pre-process all the comma separate sentence (some eLabel has tens of elements in the list) and make it bunch of different documents and make the keyword term field respectively?
This is exactly what you should do. Elasticsearch can't process the comma-separated list for you. It will think your data is just 1 whole string. But if you preprocess it, and then make the resulting field a Keyword field, that will work very well - it's exactly what the Keyword field type is designed for. I'd recommend using a Term query to search for exact matches. (As opposed to a Match query, a Term query does not analyse the incoming query and is thus more efficient.)

Is field named "language" somehow special?

In my query I have following filter:
"term": {
"language": "en-us"
}
And it's not returning any results despite there are a lot of docs with "language" = "en-us" and this field is defined in the mapping correctly. When I change filter for example for:
"term": {
"isPublic": true
}
Then it correctly filter by "isPublic" field.
My suspicion here is that field named "language" is treated somehow special? Maybe it's reserved keyword in ES query? Can't find it in docs.
ES v2.4.0
Mapping of document:
"mappings": {
"contributor": {
"_timestamp": {},
"properties": {
"createdAt": {
"type": "date",
"format": "epoch_millis||dateOptionalTime"
},
"displayName": {
"type": "string"
},
"followersCount_en_us": {
"type": "long"
},
"followersCount_zh_cn": {
"type": "long"
},
"id": {
"type": "long"
},
"isPublic": {
"type": "boolean"
},
"language": {
"type": "string"
},
"photoUrl": {
"type": "string",
"index": "not_analyzed"
},
"role": {
"type": "string",
"store": true
},
"slug": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The field language is nothing special. It should be all in the mapping. Several possible causes come to mind:
query analyzer != index analyzer
the analyzer first splits into two tokens, en and de and then throws away short tokens, which would leave both, query and index empty:-)
the field is not indexed, just stored.
The - is not a normal ascii dash in the index or the query. I have seen crazy things happening when people paste queries from a word processor, like quotes are no longer straight quotes, dashes are ndash or mdash, ü ist not one character but a combined character.
EDIT after mapping was added to the question:
The type string is analyzed with the Standard Analyzer which splits text into tokens in particular at dashes too, so the field contains two tokens, "en" and "us". Your search is a term query, which should probably be called token-query, because it queries exactly this, the token as you write it: "en-us". But this token does not exist in the field.
Two ways to remedy this:
set the field to not-analyzed and keep the query as is
change the query to a match query.
I would rather use (1), since the language field content is something like an ID and should not be analyzed.
More about the topic: "Why doesn’t the term query match my document?" on https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-term-query.html

Using both term and match query on same text field?

I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two data.
"state": "vail"
and
"state": "eagle vail"
For one of my requirements,
- I need to do a term level query, such that if I type "vail", the search results should only return states with "vail" and not "eagle vail".
But another requirement for different search on the same index,
- I need to do a match query for full text search, such that if I type "vail", "eagle vail" should display as well.
So my question is, how do I do both term level and full text search in this field, as for doing a term level query, I would have to set it as "keyword" type such that it wont be analyzed.
You can use "multi-field" feature to achieve this. Here is a mapping:
{
"mappings": {
"my_type": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
In this case state will act as text field (tokenized) whereas state.raw will be keyword (single-token). When indexing a document you should only set state. state.raw will be created automatically.

How can I get a search term with a space to be one search term

I have an elasticsearch index, with a field called "name" with a mapping as follows:
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
Now let's say I have a record "Brooklyn Technical High School".
I would like somebody searching for "brooklyn t*" to have that show up. For example: http://myserver/_search?q=name:brooklyn+t*
It seems however to be tokening the search term, and searching for both "brooklyn" and "t", because I get back results like: "Ps 335 Granville T Woods".
I would like it to search the not_analyzed term using the whole term. Enclosing it in quotes doesn't seem to help either.
You need to use the term query -
Term query wont analyzer/tokenize the string before it apply the search.
{
"query": {
"term": {
"user": "kimchy"
}
}
}

Resources