Boost field on index in Elastic - elasticsearch

I'm using Elastic 1.7.3 and I would like to boost some fields in an index with documents like this fictional example:
{
  "title": "Mickey Mouse",
  "content": "Mickey Mouse is a fictional ...",
  "related_articles": [
    {"title": "Donald Duck"},
    {"title": "Goofy"}
  ]
}
Here, for example, title is really important, content a bit less so, and related_articles less again. My real documents have lots of fields and nested objects.
I would like to give more weight to the title field than to content, and more weight to content than to related_articles.
I have seen the title^5 syntax, but I would have to use it in every query and (I guess) list all my fields instead of querying _all.
I have searched a lot but mostly found deprecated solutions (e.g. _boost).
Having worked with Sphinx before, I'm looking for something like its field weight option, where you can give more weight to the fields of your index that are really important than to the others.

You're right that the _boost meta-field that you could use at the type level has been deprecated.
But you can still use the boost property when defining each field in your mapping, which will boost your field at indexing time.
Your mapping would look like this:
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "boost": 5
      },
      "content": {
        "type": "string",
        "boost": 4
      },
      "related_articles": {
        "type": "nested",
        "properties": {
          "title": {
            "type": "string",
            "boost": 3
          }
        }
      }
    }
  }
}
You have to be aware, though, that it's not necessarily a good idea to boost your fields at index time: once set, the boost cannot be changed unless you are willing to re-index all of your documents, whereas query-time boosting achieves the same effect and can be changed more easily.
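For reference, the query-time equivalent is to boost the fields directly in the query. Here is a minimal sketch using a multi_match query, with the weights mirroring the mapping above (index, type and search text are placeholders; the nested related_articles field would additionally need a nested query clause, which is omitted here):
POST my_index/my_type/_search
{
  "query": {
    "multi_match": {
      "query": "mickey mouse",
      "fields": ["title^5", "content^4"]
    }
  }
}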

Related

Merging fields in Elastic Search

I am pretty new to Elasticsearch. I have a dataset with multiple fields like name, product_info, description, etc. When searching for a document, the search term can come from any of these fields (let us call them the "search core fields").
If I start storing the data in Elasticsearch, should I derive a field which is a concatenation of all the "search core fields" and then index that field alone?
I came across the _all mapping concept and am a little confused. Does it do the same thing?
No, you don't need to create any new field with concatenated terms.
You can just use _all with a match query to search for text in any field.
About _all: yes, it searches the text across all fields.
The _all field has been removed in ES 7, so it would only work in ES 6 and previous versions. The main reason for this is that it used too much storage space.
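On a cluster where _all is still present and enabled (it was on by default up to 5.x), the search is simply a match query against it; a sketch with an example index name:
POST my-index/_search
{
  "query": {
    "match": {
      "_all": "ABC product"
    }
  }
}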
However, you can define your own all field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "copy_to": "custom_all"
      },
      "product_info": {
        "type": "text",
        "copy_to": "custom_all"
      },
      "description": {
        "type": "text",
        "copy_to": "custom_all"
      },
      "custom_all": {
        "type": "text"
      }
    }
  }
}
PUT my-index/_doc/1
{
  "name": "XYZ",
  "product_info": "ABC product",
  "description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
  "query": {
    "match": {
      "custom_all": {
        "query": "ABC",
        "operator": "and"
      }
    }
  }
}

Elasticsearch Nested-field vs Depth? Check for document depth via Kibana?

I'm reading about mapping in Elasticsearch and I keep seeing these two terms: nested field and depth. They seem quite equivalent to me, and I'm currently confused by them. Can anyone clear this up for me? Thank you.
And by the way, is there any way to check a document's depth via Kibana?
The source of confusion is probably that in Elasticsearch the term nested can be used in two different contexts:
"nested" as regular JSON nesting, i.e. a JSON object within a JSON object;
"nested" as the Elasticsearch nested data type.
When the mappings documentation page mentions "depth", it refers to the first meaning. The setting index.mapping.depth.limit defines how deeply nested your JSON documents can be.
How is JSON depth interpreted by Elasticsearch mapping?
Here is an example of a JSON document with depth 1:
{
  "name": "John",
  "age": 30
}
Now with depth 2:
{
  "name": "John",
  "age": 30,
  "cars": {
    "car1": "Ford",
    "car2": "BMW",
    "car3": "Fiat"
  }
}
By default (as of ES 6.3) the depth cannot exceed 20.
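The limit is a regular index setting, so you can raise it at index creation time; a minimal sketch (the index name and the value 30 are arbitrary):
PUT my-index
{
  "settings": {
    "index.mapping.depth.limit": 30
  }
}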
What is a nested data type and why isn't it the same as a document with depth>1?
The nested data type allows you to index arrays of objects and query their items individually via the nested query. This means that Elasticsearch will index a document with such fields differently (see the Nested Objects page of the Definitive Guide for more explanation).
For instance, if in the following example we do not define "user" as a nested field in the mapping, a query for user.first: John and user.last: White will return a match, which would be a mistake:
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}
If we do, Elasticsearch will index each item of the "user" list as an implicit sub-document and will therefore use more resources: more disk and more memory. This is why there is another mapping setting, index.mapping.nested_fields.limit, which regulates how many different nested fields you can declare (it defaults to 50). To customize this you can see this answer.
So, Elasticsearch documents with depth > 1 are not indexed as nested unless you explicitly ask for it, and that's the difference.
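To make the difference concrete, here is a sketch of querying the example document above once "user" is mapped as nested: the mismatched combination John + White returns no hits, while John + Smith would match (the index name is a placeholder):
POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "John" } },
            { "match": { "user.last": "White" } }
          ]
        }
      }
    }
  }
}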
Can I have nested fields inside nested?
Yes, you can! Just to put the confusion to rest: yes, you can define a nested field inside a nested field in a mapping. It will look something like this:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "cars": {
              "type": "nested",
              "properties": {
                "brand": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
But keep in mind that the number of implicit documents to be indexed multiplies accordingly, so it will simply not be that efficient.
Can I get the depth of my JSON objects from Kibana?
Most likely you can do it with scripts, check this blog post for further details: Using Painless in Kibana scripted fields.

elasticsearch - field filterable but not searchable

Using Elasticsearch 2.3.5. Is there a way to make a field filterable but not searchable? For example, I have a language field with values like en-US. By setting several filters in query->bool->filter->term, I'm able to filter the result set without affecting the score, for example restricting it to only documents that have en-US in the language field.
However, I want a query searching for the term en-US to return no results, since this field is not really indexed for searching, it's just there so I can filter.
Can I do this?
Elasticsearch uses the _all field to allow fast full-text search over entire documents. This is why searching for en-US across all fields of all documents returns the one containing 'language': 'en-US'.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can specify "include_in_all": false in the mapping to prevent a field from being included in _all.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string"
        },
        "country": {
          "type": "string"
        },
        "language": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}
In this example, searching for 'US' across all fields will only return documents containing US in title or country. But you will still be able to filter your query using the language field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
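Filtering on language keeps working regardless, for instance with a filter clause in a bool query. A sketch (a match clause is used in the filter so the analyzed en-US value still matches; a term filter also works if the field is not_analyzed):
POST my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "title": "some search terms" }
      },
      "filter": {
        "match": { "language": "en-US" }
      }
    }
  }
}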

Mapping in elasticsearch

Good morning. In my code I can't find data that contains multiple words; if I search for a single word, everything is fine. I think the problem is in the mapping. I use Postman. When I send a GET request to http://192.168.1.153:9200/sport_scouts/video/_mapping I get:
{
  "sport_scouts": {
    "mappings": {
      "video": {
        "properties": {
          "hashtag": {
            "type": "string"
          },
          "id": {
            "type": "long"
          },
          "sharing_link": {
            "type": "string"
          },
          "source": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "type": {
            "type": "string"
          },
          "user_id": {
            "type": "long"
          },
          "video_preview": {
            "type": "string"
          }
        }
      }
    }
  }
}
All good, title has type string, but if I search for two or more words I get an empty array back. My code in the Trait:
public function search($data) {
    $this->client();
    $params['body']['query']['filtered']['filter']['or'][]['term']['title'] = $data;
    $search = $this->client->search($params)['hits']['hits'];
    dump($search);
}
Then I call it in my Controller. Can you help me with this problem?
The reason your indexed data can't be found is a mismatch between the analysis performed at indexing time and the strict term filter used when querying the data.
With your mapping configuration, you are using the default analyzer, which (besides many other operations) performs tokenization. So every multi-word value you insert is split at punctuation and whitespace. If you insert, for example, "some great sentence", Elasticsearch maps the terms "some", "great" and "sentence" to your document, but not the term "great sentence". So if you run a term filter on "great sentence", or on any other part of the original value containing whitespace, you will not get any results.
Please see the Elasticsearch docs on how to configure your mapping for indexing without analysis (https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#_index_2) or consider using a match query instead of a term filter on the existing mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html).
Please be aware that if you switch to not_analyzed you will be giving up much of the great fuzzy full-text query functionality. Of course, you can set up a mapping that indexes both analyzed and not_analyzed versions in different fields; then it's up to you to decide which field to query.
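For completeness, a sketch of the suggested match query against the existing analyzed title field (the search text is just an example); with the PHP client, this JSON body is what you would build in $params['body']:
POST sport_scouts/video/_search
{
  "query": {
    "match": {
      "title": {
        "query": "two words",
        "operator": "and"
      }
    }
  }
}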

ElasticSearch what analyzer should be used for searching for both url fragment and exact url path

I want to store a URI in a mapping and I want it to be searchable in the following ways:
1. Exact match, i.e. if I stored http://stackoverflow.com/questions then searching for the term http://stackoverflow.com/questions retrieves the item.
2. A bit like a letter tokenizer: all "words" should be searchable. So searching for questions, stackoverflow or maybe com will bring back http://stackoverflow.com/questions as a hit.
3. URL fragments separated by '.' or '/' should also be searchable. So searching for stackoverflow.com will bring back http://stackoverflow.com/questions as a hit.
4. It should be case insensitive (like lowercase).
5. The http://, https://, www. etc. should be optional when searching. So searching for either http://stackoverflow.com or stackoverflow.com will bring back http://stackoverflow.com/questions as a hit.
Maybe the solution is something like chaining tokenizers. I'm quite new to ES, so this may be a trivial question.
So what kind of analyzer should I use/build to achieve this functionality?
Any help would be greatly appreciated.
You are absolutely correct. You will want to set your field type to multi_field and then create an analyzer for each scenario. At its core, you can then run a multi_match query:
=============type properties===============
{
  "fun_documents": {
    "properties": {
      "url": {
        "type": "multi_field",
        "fields": {
          "keyword": {
            "type": "string",
            "analyzer": "keyword"
          },
          "alphanum_only": {
            "type": "string",
            "analyzer": "my_custom_alpha_num_analyzer"
          },
          "etc": "etc"
        }
      }
    }
  }
}
==================query=====================
{
  "query": {
    "multi_match": {
      "query": "stackoverflow",
      "fields": [
        "url.keyword",
        "url.alphanum_only",
        "url.optional_fun"
      ]
    }
  }
}
Note that you can get fancy with multi_field aliases and reusing the same name, but this is the simple demonstration.
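The custom analyzers referenced above have to be defined in the index settings. One possible sketch of my_custom_alpha_num_analyzer, assuming a letter tokenizer plus a lowercase filter so that http://stackoverflow.com/questions is indexed as lowercase word tokens (the index name is a placeholder; a similar custom analyzer built from a keyword tokenizer plus a lowercase filter would give case-insensitive exact matching):
PUT fun_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_alpha_num_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": ["lowercase"]
        }
      }
    }
  }
}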
