How can I get term aggregation to match a total string? - elasticsearch

I have some data that I'm aggregating with Elasticsearch 1.5.2, and when I do a terms aggregation on a field like city, the buckets don't match the full strings from the field. For example, if city is St. Louis, then one bucket is St. and another is Louis. Does anyone know how to make sure the aggregation puts these documents into a single St. Louis bucket?
Note: this may be caused by the data being analyzed, which I'm pretty sure breaks up strings for comparing, searching, etc.

You're correct. You need a not_analyzed version of your city field, which you can add as a raw sub-field using this mapping:
{
  "your_type" : {
    "properties" : {
      "city" : {
        "type" : "string",
        "index" : "analyzed",
        "fields" : {
          "raw" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}
And then you can simply run your aggregation on the city.raw field (which contains the un-analyzed value, i.e. St. Louis) instead of city, which is analyzed and breaks up the content into several tokens (i.e. st and louis).
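As a minimal sketch, the aggregation request on the raw sub-field could be built like this in Python. The aggregation name `cities` and the top-level `size` of 0 are assumptions chosen for illustration; only the `city.raw` field name comes from the mapping above.

```python
import json

# Terms aggregation on the not_analyzed sub-field, so each bucket key is the
# full original string (e.g. "St. Louis") rather than a single token.
agg_request = {
    "size": 0,  # assumption: skip the search hits, return only the buckets
    "aggs": {
        "cities": {  # hypothetical aggregation name
            "terms": {"field": "city.raw"}
        }
    },
}

# This JSON is what would be sent as the body of a _search request.
print(json.dumps(agg_request, indent=2))
```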
If you know in advance that you're never going to need the analyzed field, you can simply index the field as not_analyzed like this (i.e. no need for the fields part declaring a multi-field):
{
  "your_type" : {
    "properties" : {
      "city" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  }
}
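To see why the two kinds of field produce different buckets, here is a toy model in Python. It is not Elasticsearch code: `analyzed_terms` is only a rough stand-in for an analyzer (lowercase, split on non-word characters), and the sample city values are made up for illustration.

```python
import re
from collections import Counter

def analyzed_terms(value):
    # Rough stand-in for an analyzer: lowercase and split on non-word chars.
    return re.findall(r"\w+", value.lower())

def terms_buckets(values, analyzed):
    # Count, per term, how many documents contain it (like doc_count).
    counts = Counter()
    for value in values:
        terms = set(analyzed_terms(value)) if analyzed else {value}
        counts.update(terms)
    return dict(counts)

cities = ["St. Louis", "St. Louis", "St. Paul"]  # hypothetical documents
print(terms_buckets(cities, analyzed=True))   # token buckets: st, louis, paul
print(terms_buckets(cities, analyzed=False))  # whole-string buckets
```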

Related

What does "mappings" do in Elasticsearch?

I just started learning Elasticsearch. I am trying out creating an index, adding data, deleting data, and searching data.
I also understand the settings of Elasticsearch.
When using "PUT" to apply settings:
{
  "settings": {
    "index.number_of_shards" : 1,
    "index.number_of_replicas" : 0
  }
}
When using "GET" to retrieve settings information
{
  "dsm" : {
    "settings" : {
      "index" : {
        "creation_date" : "1555487684262",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "uuid" : "qsSr69OdTuugP2DUwrMh4g",
        "version" : {
          "created" : "7000099"
        },
        "provided_name" : "dsm"
      }
    }
  }
}
However,
What does "mappings" do in Elasticsearch?
{
  "kibana_sample_data_flights" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "AvgTicketPrice" : {
          "type" : "float"
        },
        "Cancelled" : {
          "type" : "boolean"
        },
        "Carrier" : {
          "type" : "keyword"
        },
        "Dest" : {
          "type" : "keyword"
        },
        "DestAirportID" : {
          "type" : "keyword"
        },
        "DestCityName" : {
        }, // just part of data
The mapping document is a way of describing the structure of your data and defining the field types, e.g. boolean, text, keyword. These types are important as they determine how your fields are indexed and analysed.
Elasticsearch supports dynamic mapping, so it effectively makes an automatic best guess at the appropriate types, but you may wish to override these.
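The idea of a "best guess" can be pictured with a small Python sketch. This is a simplified illustration, not Elasticsearch's actual logic: the real dynamic mapping also detects dates and gives strings a keyword sub-field, and the helper names and sample values here are hypothetical.

```python
def guess_field_type(value):
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    return "object" if isinstance(value, dict) else None

def guess_mapping(document):
    # Build a mappings-style structure from one sample document.
    return {
        "properties": {
            name: {"type": guess_field_type(value)}
            for name, value in document.items()
        }
    }

# Field names from the mapping above; the values are made up.
sample = {"AvgTicketPrice": 841.27, "Cancelled": False, "Carrier": "Kibana Airlines"}
print(guess_mapping(sample))
```

Note that the guess for Carrier is text, while the mapping shown in the question declares it as keyword. That is the kind of override an explicit mapping lets you make.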
I found this to be a useful article to explain the mapping process:
https://www.elastic.co/blog/found-elasticsearch-mapping-introduction
Indexing is determined by the field type. For example, where the type is 'keyword' the search engine expects an exact match; where the type is 'text' the search engine tries to determine how well the document matches the query term, and in doing so performs a 'full text search'.
So for example:
- A search for jump should also match jumped, jumps, jumping, and perhaps even leap.
This is a great article describing exact vs full text search and is where I took the jump example: https://www.elastic.co/guide/en/elasticsearch/guide/current/_exact_values_versus_full_text.html
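The difference between the two kinds of matching can be sketched in Python. This is only a toy model: `toy_stem` is a crude suffix stripper invented for illustration, not a real Elasticsearch analyzer, and it would not match synonyms such as leap.

```python
import re

def toy_stem(token):
    # Crude suffix stripping, for illustration only.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    # Lowercase, split into words, then reduce each word to a rough stem.
    return [toy_stem(token) for token in re.findall(r"\w+", text.lower())]

def keyword_match(field_value, query):
    # keyword-style: the whole value must equal the query exactly.
    return field_value == query

def text_match(field_value, query):
    # text-style: both sides are analyzed, then compared term by term.
    return bool(set(analyze(field_value)) & set(analyze(query)))

print(keyword_match("The fox jumped", "jump"))  # exact comparison fails
print(text_match("The fox jumped", "jump"))     # analyzed terms overlap
```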
Much of the power of elasticsearch is in the mapping and analysis.
It's the mapping of the index. This means it describes the data that is stored in this index. Take a deeper look here.

How to implement fuzzy field-centric (cross_fields) query on fields with multiple analysers?

Mapping:
{
  "articles" : {
    "mappings" : {
      "data" : {
        "properties" : {
          "author" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "content" : {
            "type" : "text",
            "analyzer" : "english"
          },
          "tags" : {
            "type" : "keyword"
          },
          "title" : {
            "type" : "text",
            "analyzer" : "english"
          }
        }
      }
    }
  }
}
Example data:
{
  "author": "John Smith",
  "title": "Hello world",
  "content": "This is some example article",
  "tags": ["programming", "life"]
}
As you can see, I have a mapping with different analysers on different fields. Now I want to search across those fields in the following way:
- only documents matching all search keywords are returned (like multi_match with cross_fields as the type and and as the operator)
- the query should be fuzzy so it can tolerate some typos
- different fields should have different boost values (e.g. title more important than content)
For example, the following query should match the above document:
programing worlds john examlpe
How can I do it? According to the documentation, fuzziness won't work with cross_fields, nor with fields that have different analysers.
One way of doing it would be to implement a custom _all field and copy all values there using copy_to, but with this approach I can't assign different weights or use different analysers.

How to add default values while adding a new field in existing mapping in elasticsearch

This is my existing mapping in Elasticsearch for one of the child documents:
"sessions" : {
  "_routing" : {
    "required" : true
  },
  "properties" : {
    "operatingSystem" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "eventDate" : {
      "format" : "dateOptionalTime",
      "type" : "date"
    },
    "durations" : {
      "type" : "integer"
    },
    "manufacturer" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "deviceModel" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "applicationId" : {
      "type" : "integer"
    },
    "deviceId" : {
      "type" : "string"
    }
  },
  "_parent" : {
    "type" : "userinfo"
  }
}
In the above mapping, the "durations" field is an integer array. I need to update the existing mapping by adding a new field called "durationCount" whose default value should be the size of the durations array.
PUT sessions/_mapping
{
  "properties" : {
    "sessionCount" : {
      "type" : "integer"
    }
  }
}
Using the above mapping I am able to update the existing mapping, but I can't figure out how to assign a value (which would vary for each session document, since it should be the size of the durations array) while updating the mapping. Any ideas?
Two recommendations here:
- Instead of adding a default value, you can handle it at query time using the missing filter. Say you want to search with a match query: instead of the match query alone, use a bool query with a should clause containing both the match and the missing filter, inside a filtered query. This way, documents that do not have the field are also accounted for.
- If you absolutely need the value in that field for existing documents, you need to reindex the whole set of documents, or use the update by query plugin.
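For the reindexing route, the per-document step could look roughly like the Python sketch below. It assumes each document has already been fetched as a plain dict and will be indexed again afterwards; the helper name and the sample documents are hypothetical, and the field name durationCount follows the question's wording.

```python
def with_duration_count(doc):
    # Return a copy of the document with durationCount set to the number
    # of entries in its durations field.
    updated = dict(doc)  # leave the caller's dict untouched
    durations = doc.get("durations")
    if durations is None:
        durations = []
    elif not isinstance(durations, list):
        durations = [durations]  # a single value is a one-element array
    updated["durationCount"] = len(durations)
    return updated

# Made-up session documents, for illustration only.
sessions = [
    {"deviceId": "a1", "durations": [30, 45, 10]},
    {"deviceId": "b2", "durations": 12},
    {"deviceId": "c3"},
]
for session in sessions:
    print(with_duration_count(session))
```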

Elasticsearch phrase prefix query on multiple fields

I'm new to ES and I'm trying to build a query that uses phrase_prefix on multiple fields so I don't have to search more than once.
Here's what I've got so far:
{
  "query" : {
    "text" : {
      "first_name" : {
        "query" : "Gustavo",
        "type" : "phrase_prefix"
      }
    }
  }
}
Does anybody know how to search on more than one field, say "last_name"?
The text query that you are using was deprecated (effectively renamed) a while ago in favour of the match query. The match query supports a single field, but you can use the multi_match query, which supports the very same options and allows searching on multiple fields. Here is an example that should be helpful to you:
{
  "query" : {
    "multi_match" : {
      "fields" : ["title", "subtitle"],
      "query" : "trying out ela",
      "type" : "phrase_prefix"
    }
  }
}
You can achieve the same using the Java API like this:
QueryBuilders.multiMatchQuery("trying out ela", "title", "subtitle")
    .type(MatchQueryBuilder.Type.PHRASE_PREFIX);
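Applied to the fields from the question, the same request body might be built like this in Python. This is a sketch only: the `first_name` and `last_name` field names and the `Gustavo` search text come from the question, and nothing here talks to a cluster.

```python
import json

# multi_match with phrase_prefix across both name fields from the question.
name_query = {
    "query": {
        "multi_match": {
            "fields": ["first_name", "last_name"],
            "query": "Gustavo",
            "type": "phrase_prefix",
        }
    }
}

# This JSON is what would be sent as the body of the search request.
print(json.dumps(name_query, indent=2))
```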

Storing only selected fields and not storing _all in pyes/elasticsearch

I am trying to use pyes with Elasticsearch as a full-text search engine. I store only UUIDs and indexes of string fields; the actual data is stored in MongoDB and retrieved using the UUIDs. Unfortunately, I am unable to create a mapping that doesn't store the original data. I've tried various combinations of the "store"/"source" fields and disabling "_all", but I can still get the text of indexed fields. The pyes documentation seems misleading on this topic, as it's just a copy of the original docs.
Can anyone please provide an example of mapping that would only store some fields and not the original document JSON?
Sure, you could use something like this (with two fields, 'uuid' and 'data'):
{
  "mytype" : {
    "_source" : {
      "enabled" : false
    },
    "_all" : {
      "enabled" : false
    },
    "properties" : {
      "data" : {
        "store" : "no",
        "type" : "string"
      },
      "uuid" : {
        "store" : "yes",
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  }
}
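With _source disabled, a search has to ask for the stored uuid field explicitly. The Python sketch below shows one way this might look, assuming an older search API where the request body accepts a `fields` list and the `match` query is available; the sample response is hand-written for illustration, not real output, and `extract_uuids` is a hypothetical helper.

```python
# Search body: match on the indexed-only field, return only the stored uuid.
search_body = {
    "query": {"match": {"data": "some text"}},
    "fields": ["uuid"],  # assumption: stored fields are requested via "fields"
}

def extract_uuids(response):
    # Collect uuid values from the hits; depending on the version, a stored
    # field value may come back as a scalar or as a list.
    uuids = []
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("fields", {}).get("uuid")
        if value is None:
            continue
        if isinstance(value, list):
            uuids.extend(value)
        else:
            uuids.append(value)
    return uuids

# Hand-written sample response, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"uuid": ["u-001"]}},
            {"_id": "2", "fields": {"uuid": "u-002"}},
            {"_id": "3"},
        ]
    }
}
print(extract_uuids(sample_response))
```

The resulting UUIDs can then be used to load the actual documents from MongoDB, as described in the question.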
