Aggregating full text fields - elasticsearch

I'm trying to display the number of documents per market in an index. Each document has a field called market, and I want to aggregate the results like this:
"Advertising and sales" : 400
"Oil Industry" : 250
"Metal Industry" : 125
I know how to display these results using the query:
"aggs":{
  "group_by_market":{
    "terms":{
      "field": "market"
    }
  }
}
The problem is that the results are not displayed correctly: each market name is split into separate words, and every word gets its own bucket. For example:
"Advertising": 400
"Sales": 400
"Oil": 322
...etc
How do I make it so the markets are aggregated on the full text of the field?

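Your field is mapped as type text, so it is analyzed (tokenized) at index time and the terms aggregation runs on the individual tokens. You need to map the field as a "keyword" field (Elasticsearch version 5+); see the Mappings documentation.
In older versions, the mapping needs "index": "not_analyzed" instead; see the Mappings documentation.
The basic difference between the two is that a text field is tokenized and meant for full-text search, while a keyword (or not_analyzed) field is indexed as a single exact value and is meant for use cases like yours.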
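As a rough sketch of the keyword approach (Elasticsearch 5+), the request bodies might look like the following, written as Python dicts. The `market.keyword` sub-field name and the two helper functions are assumptions made for illustration; only `market` and `group_by_market` come from the question.

```python
import json

# Hypothetical mapping: keep "market" as analyzed text for search and add
# an un-analyzed "keyword" sub-field for aggregations (Elasticsearch 5+).
def market_mapping():
    return {
        "properties": {
            "market": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }

# Terms aggregation on the un-analyzed sub-field, so that
# "Advertising and sales" is counted as a single bucket.
def market_terms_agg(field="market.keyword"):
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {"group_by_market": {"terms": {"field": field}}},
    }

if __name__ == "__main__":
    print(json.dumps(market_mapping(), indent=2))
    print(json.dumps(market_terms_agg(), indent=2))
```

Note that the mapping of an existing field generally cannot be changed in place, so this usually means creating a new index with the mapping and reindexing the documents.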

Related

Elastic Search: get documents where value > AVG

Hi, I would like to understand how far you can go with Elasticsearch queries.
I have an index with a property "Price" and I want to extract every document where Price > AVG(Price).
For example, if I have 6 documents with these prices:
532,400,299,100,100,33
it should extract the documents priced 299, 400, and 532, because they are above the average price (244).
Can I reach this goal with a simple Elasticsearch query, or do I need something else, for example scripting (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html), a custom script in another language (Node.js, Python, .NET, etc.), or an ETL tool like Logstash?
I'm having difficulty identifying the right approach. I tried using a subquery in the Elasticsearch query, but it's not supported.
({ "query" : "select * from myIndex where Price > (select avg('Price') from myIndex) "})
I think this query will solve your problem in an efficient way:
{
  "aggs":{
    "price_gte_244":{
      "filter":{
        "range":{
          "price":{
            "gte":244
          }
        }
      },
      "aggs":{
        "avg_price":{
          "avg":{
            "field":"price"
          }
        }
      }
    }
  }
}
Here 244 can be any value or variable you want.
If you want to read more about the filter aggregation, see the Filter aggregation docs.
Depending on how your data is structured, you can get the avg value across all of your documents through an AVG aggregation. Then, use that value in a Range query to find the documents greater than this value. You can also look into the Bucket Selector Aggregation which has a lot of similarity with a SQL HAVING clause.
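The two-step idea above (an avg aggregation first, then a range query using its result) can be sketched as follows. The helper names are hypothetical, and the two round trips to the cluster are replaced by a local calculation on the prices from the question; in a real response the average would be read from `aggregations.avg_price.value`.

```python
# Step 1: request body that asks only for the average of a numeric field.
def avg_request(field):
    return {"size": 0, "aggs": {"avg_price": {"avg": {"field": field}}}}

# Step 2: range query for documents strictly above the average from step 1.
def above_avg_query(field, avg_value):
    return {"query": {"range": {field: {"gt": avg_value}}}}

# Local stand-in for the two round trips, using the prices from the question.
prices = [532, 400, 299, 100, 100, 33]
avg_value = sum(prices) / len(prices)
above = [p for p in prices if p > avg_value]

if __name__ == "__main__":
    print(avg_value)  # 244.0
    print(above)      # [532, 400, 299]
    print(above_avg_query("price", avg_value))
```

The cost of this approach is two requests instead of one, and the average can be slightly stale if documents are indexed between the two calls.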

Finding all words and their frequencies in an elasticsearch index

Elasticsearch newbie here. I have an Elasticsearch cluster with an index at http://localhost:9200/products, and each product looks like this:
{
"name": "laptop",
"description" : "Intel Laptop with 16 GB RAM",
"title" : "...."
}
I want all keywords in a field and their frequencies across all documents in an index. For example:
description : intel -> 2500, laptop -> 40000 etc. I looked at termvectors https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html but that only lets me do it for a single document. I want it across all documents for a particular field.
I wrote a plug-in for this, but it's an expensive call (depending on how many terms you want to get and the cardinality of the terms): https://github.com/nirmalc/es-termstat
Currently, there is no way to use term vectors on all documents in an index at once. You can use either the single term vectors API for one document's term frequency counts or the multi term vectors API for several documents' term frequencies. A possible workaround could be like this:
make a scan request in order to get all documents of a given type,
and for each page build a multi term vectors request (mentioned above)
to get the term vectors.
POST /products/_mtermvectors
{
  "ids" : ["1", "2"],
  "parameters": {
    "fields": [
      "description"
    ],
    "term_statistics": true
  }
}
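To turn the per-document term vectors into index-wide counts, the `term_freq` values from each page of `_mtermvectors` results have to be summed on the client. The sketch below assumes the usual response shape (`docs[].term_vectors.<field>.terms.<term>.term_freq`); the function name and the sample response are made up for illustration.

```python
from collections import Counter

def add_term_freqs(totals, mtermvectors_response, field):
    """Add each document's term_freq for `field` into the running totals."""
    for doc in mtermvectors_response.get("docs", []):
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, stats in terms.items():
            totals[term] += stats.get("term_freq", 0)
    return totals

# Hand-written sample shaped like an _mtermvectors response (not real data).
sample_page = {
    "docs": [
        {"_id": "1", "term_vectors": {"description": {"terms": {
            "intel": {"term_freq": 1},
            "laptop": {"term_freq": 1},
        }}}},
        {"_id": "2", "term_vectors": {"description": {"terms": {
            "laptop": {"term_freq": 2},
        }}}},
    ]
}

totals = add_term_freqs(Counter(), sample_page, "description")

if __name__ == "__main__":
    print(totals.most_common())  # [('laptop', 3), ('intel', 1)]
```

Calling `add_term_freqs` once per page of the scan keeps memory bounded by the number of distinct terms rather than the number of documents.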

ElasticSearch: Return the query within the response body when hits = 0

Please note that the following example is a very minified version of a real life use case, it is for the question to be easy to read and to make a point.
I have the following document structure:
{
  "date" : 1400500,
  "idc" : 1001,
  "name": "somebody"
}
I am performing an _msearch query (multiple searches at a time) based on different values (the "idc" and a "date" range).
When ES cannot find any documents for the given date range, it returns:
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
But since there are N results, I cannot tell which "idc" and which "date" range this result was for.
I would like the response to include the searched date range and "idc" when there are no results for the given query. For example, if I am searching for documents with IDC = 1001 and date between 1400100 and 1400200 and no results are found, the response should contain the query terms in its body, something like this:
"hits":{
  "total":0,
  "max_score":null,
  "query": {
    "date": {
      "gt": 1400100,
      "lte": 1400200
    },
    "idc": 1001
  }
}
That way I can tell what date range and "idc" combination has no results.
This is from the docs:
The multi search API (_msearch) response returns a responses array, which includes the search
response and status code for each search request matching its order in
the original multi search request.
Since you know the order in which you sent the requests, you can tell which request each empty response belongs to.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
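Because the responses come back in request order, the client can pair them up by position. The following is a minimal sketch, assuming the search parameters are kept in a list in the same order they were sent; the function names, the second `idc` value, and the sample response are invented for illustration.

```python
def total_hits(response):
    """Return the hit count, or None if this search returned an error."""
    hits = response.get("hits")
    if hits is None:
        return None
    total = hits["total"]
    # Newer Elasticsearch versions wrap the count as {"value": n, ...}.
    return total["value"] if isinstance(total, dict) else total

def empty_searches(requests, msearch_response):
    """Pair requests with responses by position and keep the empty ones."""
    pairs = zip(requests, msearch_response["responses"])
    return [req for req, resp in pairs if total_hits(resp) == 0]

# Search parameters in the order they were sent (second entry is made up).
requests = [
    {"idc": 1001, "date": {"gt": 1400100, "lte": 1400200}},
    {"idc": 1002, "date": {"gt": 1400100, "lte": 1400200}},
]

# Hand-written sample shaped like an _msearch response (not real data).
sample = {"responses": [
    {"hits": {"total": 0, "max_score": None, "hits": []}},
    {"hits": {"total": 1, "max_score": 1.0, "hits": [{"_id": "7"}]}},
]}

no_results = empty_searches(requests, sample)

if __name__ == "__main__":
    print(no_results)
```

Here `no_results` contains only the first request (idc 1001), which is exactly the "which idc and date range had no results" information the question asks for.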

Elastic completion suggester creating inputs

I have more than 200 000 records, so I need to create the inputs for the completion suggester automatically.
I also need to get results for words in the wrong order ("Potter Harry" instead of "Harry Potter").
Mapping for suggestion:
"title_suggest":
{
"type": "completion"
}
Indexing:
{
"title" : {$title},
"title_suggest" :
{
"input" : {...},
"output": {$title}
}
}
Examples:
The simple one:
"Harry Potter" has input {"Harry Potter", "Potter Harry"}.
But how do I create the input for long titles, e.g. "Diary of a modern couple or women are from Venus and men are a moron"? That makes 1 307 674 368 000 variants of word order.
I hope it is clear what I need.
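The figure quoted for the long title is the number of orderings of its 15 words, i.e. 15 factorial, when every word is treated as distinct. A quick check of that arithmetic:

```python
import math

title = "Diary of a modern couple or women are from Venus and men are a moron"
words = title.split()
variants = math.factorial(len(words))

if __name__ == "__main__":
    print(len(words))  # 15
    print(variants)    # 1307674368000
```

This is why enumerating every word order as a suggester input does not scale beyond very short titles.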
I changed the suggester. I'm not using Completion Suggester.
I'm using ngrams from here:
https://stackoverflow.com/a/29754468/1564987

How do I make a field have varying type in Elastic Search

I need a field, here score, to sometimes be a number and other times a string. Like:
{
  "name": "Joe",
  "score": 32.5
}
{
  "name": "Sue",
  "score": "NOT_AVAILABLE"
}
How can I express this in the index settings in Elasticsearch?
I basically want "dynamic typing" on the field. The code may not make sense to you (like: why not split it into 2 different fields), but it's necessary to be this way on my end (for consistency reasons).
I don't need/want the property to be indexed/"searchable" btw. I just need the data to be in the json response. I need something like "any object will fit here".
Finally figured it out. All I had to do was set enabled to false, and Elasticsearch will not attempt to do anything with the data, but it's still present in the JSON response.
Like so:
"score": {
"enabled": false
}
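For context, a fuller mapping body around that fragment might look like the sketch below. Spelling out `"type": "object"` and mapping `name` as a keyword are assumptions of this example, not part of the original answer.

```python
import json

# Mapping sketch: "score" is kept in _source but never parsed or indexed,
# so both numbers and strings are accepted. The explicit "object" type and
# the "name" mapping are assumptions made for this example.
mapping = {
    "properties": {
        "name": {"type": "keyword"},
        "score": {"type": "object", "enabled": False},
    }
}

# The two documents from the question; both fit the mapping above.
docs = [
    {"name": "Joe", "score": 32.5},
    {"name": "Sue", "score": "NOT_AVAILABLE"},
]

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    for doc in docs:
        print(json.dumps(doc))
```

The trade-off is that a disabled field cannot be searched, sorted, or aggregated on; it is only returned as part of the stored document.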
Just define the "score" field to be of type "string" in your mapping (in Elasticsearch 5+, the string type is replaced by text and keyword). That's it. Make sure you define the mapping before indexing any document, though. Otherwise, if you let the mapping be created dynamically and the value of "score" in the first document you index is anything but a string, you won't be able to index any later document in which "score" holds a string.
