Get unique results from Elasticsearch - Ruby

Get a single record for a given date. If there are multiple records for that date, return a different unique record on every request (if there is only a single record, it may return the same result each time).
"query": {
"bool": {
"must": [
{
"match": {
"site_name": "blog_new_post"
}
},
{
"match": {
"postdate_yyyymmdd": "20190715"
}
}
]
}
},
"size": 1
}
I tried using size, but with size the same record is sometimes returned.
{
"took": 152,
"timed_out": false,
"_shards": {
"total": 1180,
"successful": 1180,
"failed": 0
},
"hits": {
"total": 6624,
"max_score": 3.6852486,
"hits": [
{
"_index": "some-*",
"_type": "data-*",
"_id": "8a9e351e92e6b9b26c8d8fb0173cadd9",
"_score": 3.6852486,
"_source": {
"uniq_id": "8a9e351e92e6b9b26c8d8fb0173cadd9",
"postdate_yyyymmdd": "20190715"
}
}]
}
}
The record should be unique based on uniq_id; uniq_id is different for every record.

For this you have to aggregate your results on the basis of uniq_id.
As you said, you want records based on uniq_id, and I am assuming that you might have multiple records for one id in your index and want to return only one record for each uniq_id.
You can refer to the Elasticsearch aggregations documentation.
You can sort records by newest and fetch top_hits in your aggregation; specifying a top_hits size of 1 would get you one record per uniq_id.
Refer to the documentation for aggregation instructions:
Top Hits Aggregation
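A minimal sketch of what that aggregation could look like, reusing the query from the question (this assumes uniq_id is a not-analyzed/keyword field and that postdate_yyyymmdd sorts correctly as a string; adjust the field names and sizes to your mapping):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "site_name": "blog_new_post" } },
        { "match": { "postdate_yyyymmdd": "20190715" } }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "by_uniq_id": {
      "terms": { "field": "uniq_id", "size": 100 },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "postdate_yyyymmdd": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
Each bucket then carries exactly one hit in its latest.hits.hits array, which is the "one record per uniq_id" behaviour described above.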

Related

elasticsearch, multi_match and filter by date

It seems I followed every similar answer I found, but I just can't figure out what is wrong...
This is a "match all" query:
{
"query": {
"match_all": {}
}
}
...and the results:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "unittest_index_unittestdocument",
"_type": "unittestdocument",
"_id": "a.b",
"_score": 1,
"_source": {
"id": "a.b",
"docdate": "2018-01-24T09:45:44.4168345+02:00",
"primarykeywords": [
"keyword"
],
"primarytitles": [
"the title of a document"
]
}
}
]
}
}
but when I try to filter that with a date like this:
{
"query":{
"bool":{
"must":{
"multi_match":{
"type":"most_fields",
"query":"document",
"fields":[ "primarytitles","primarykeywords" ]
}
},
"filter": [
{"range":{ "docdate": { "gte":"1900-01-23T15:17:12.7313261+02:00" } } }
]
}
}
}
I have zero hits...
I tried to follow https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html and "filtering by date in elasticsearch", with no success at all.
Is there any difference that I cannot see?
Please note that when I remove the date filter and add a term filter on "primarykeywords" I get the results I want. The only problem is the range filter.
Apparently there was no error with my query; the problem was that the docdate field wasn't indexed... :/
I don't know why I initially skipped indexing that field (my mistake), but I do believe Elasticsearch should warn me that I am trying to query something that has "index": false.
The fact that Elasticsearch just returns no results without telling you what is going on is, in my opinion, a major issue. I lost a day reading everything I could find on the web, just because I didn't get proper feedback from the engine.
Fail safe died for this reason...
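For anyone hitting the same issue: the mapping has to declare docdate as an indexed date field, and since index cannot be toggled on an existing field, the index needs to be recreated (and the documents reindexed). A minimal sketch, assuming the index and type names from the example above and a pre-7.x cluster:
PUT unittest_index_unittestdocument
{
  "mappings": {
    "unittestdocument": {
      "properties": {
        "docdate": { "type": "date", "index": true }
      }
    }
  }
}
With the field indexed, the range filter from the question returns the expected hit.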

Query with `field` returns nothing

I'm new to Elasticsearch and am having trouble with my queries.
When I do a match_all I get this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [{
"_index": "stations",
"_type": "station",
"_id": "4432",
"_score": 1,
"_source": {
"SiteName": "Abborrkroksvägen",
"LastModifiedUtcDateTime": "2015-02-13 10:34:20.643",
"ExistsFromDate": "2015-02-14 00:00:00.000"
}
},
{
"_index": "stations",
"_type": "station",
"_id": "9110",
"_score": 1,
"_source": {
"SiteName": "Abrahamsberg",
"LastModifiedUtcDateTime": "2012-03-26 23:55:32.900",
"ExistsFromDate": "2012-06-23 00:00:00.000"
}
}
]
}
}
My search query looks like this:
{
"query": {
"query_string": {
"fields": ["SiteName"],
"query": "a"
}
}
}
The problem is that when I run the query above I get empty results, which is strange. I should receive both of the documents from my index, right?
What am I doing wrong? Did I index my data wrong or is my query just messed up?
Appreciate any help I can get. Thanks guys!
There is nothing wrong with either your data or your query. It seems you haven't quite seen how data gets stored in Elasticsearch!
Firstly, when you index data ("SiteName": "Abborrkroksvägen" and "SiteName": "Abrahamsberg"), the values are stored as individual analyzed terms.
When you query ES using "query": "a" (meaning you are looking for the term "a"), it checks whether any term matches a exactly; since no such term exists in the index, you get empty results.
When you query ES using "query": "a*" (meaning all terms starting with "a"), it returns the documents you expect.
Hope this clarifies your question!
You may also have a look at an article I found recently about search: https://www.timroes.de/2016/05/29/elasticsearch-kibana-queries-in-depth-tutorial/
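For reference, the wildcard version of the original query would look like this (same fields, only the query string changes):
{
  "query": {
    "query_string": {
      "fields": ["SiteName"],
      "query": "a*"
    }
  }
}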

Using scan and scroll for Elasticsearch in Sense

I am trying to iterate over several documents in Elasticsearch, and am using Sense (the Google Chrome plugin) to do so. Using scan and scroll for efficiency, I get the scroll id as:
POST _search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
The result of which is:
{
"_scroll_id": "c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20000,
"max_score": 0,
"hits": []
}
}
Then pass this to a GET as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
but I get 0 results, specifically:
{
"_index": "my_index",
"_type": "_search",
"_id": "scroll",
"found": false
}
I found the problem: I had specified the index my_index in the server box in Sense. Removing this and re-executing the POST command as:
POST /my_index/_search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
and passing the resulting scroll_id as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
worked!
This works in my Sense (of course you should replace the id with the one from your case; don't wrap it in quotes):
POST /test/_search?search_type=scan&scroll=1m
GET /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7
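If your Elasticsearch version supports it, the scroll id can also be sent in a JSON request body instead of the URL, which avoids any quoting problems with long ids; a sketch using the truncated id from above:
GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7"
}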

Change the structure of the Elasticsearch response JSON

In some cases, I don't need all of the fields in the response JSON.
For example,
// request json
{
"_source": "false",
"aggs": { ... },
"query": { ... }
}
// response json
{
"took": 123,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 123,
"max_score": 123,
"hits": [
{
"_index": "foo",
"_type": "bar",
"_id": "123",
"_score": 123
}
],
...
},
"aggregations": {
"foo": {
"buckets": [
{
"key": 123,
"doc_count": 123
},
...
]
}
}
}
Actually, I don't need the _index/_type fields every time, and when I do aggregations I don't need the hits block at all.
"_source": false or "_source": { "exclude": [ "foobar" ] } can help ignore/exclude the _source fields in the hits block.
But can I change the structure of the ES response JSON in a more general way? Thanks.
I recently needed to "slim down" the Elasticsearch response, as it was well over 1 MB of JSON, and I started using the filter_path request parameter.
This lets you include or exclude specific fields and supports several kinds of wildcards. Do read the docs, as there is quite a bit of information there.
e.g.
_search?filter_path=aggregations.**.hits._source,aggregations.**.key,aggregations.**.doc_count
In my case this reduced the response size by half without significantly increasing the search duration, so it is well worth the effort.
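For the case in the question, something along these lines would keep the aggregations and drop the per-hit _index/_type metadata (a sketch; the paths are only an example and can be adjusted):
GET /foo/_search?filter_path=took,hits.total,hits.hits._id,hits.hits._score,aggregations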
In the hits section, you will always have the _index, _type and _id fields. If you want to retrieve only some specific fields in your search results, you can use the fields parameter in the root object:
{
"query": { ... },
"aggs": { ... },
"fields":["fieldName1","fieldName2", etc...]
}
When doing aggregations, you can use the search_type parameter (documentation) with the count value like this:
GET index/type/_search?search_type=count
It won't return any documents, only the result count, and your aggregations will be computed in exactly the same way.
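On newer Elasticsearch versions where search_type=count is no longer available, setting "size": 0 in the request body has the same effect (no hits returned, aggregations still computed). A sketch, using a hypothetical terms aggregation on a field named foo:
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "foo": { "terms": { "field": "foo" } }
  }
}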

Can I use ElasticSearch Facets as an equivalent to GROUP BY and how?

I'm wondering if I can use the Elasticsearch facets feature to replace the GROUP BY feature used in relational databases or even in a Sphinx client?
If so, besides the official documentation, can someone point out a good tutorial for doing so?
EDIT:
Let's consider an SQL table products in which I have the following fields:
id
title
description
price
etc.
I omitted the other fields in the table because I don't want to put them into my ES index.
I've indexed my database with ElasticSearch.
A product is not unique in the index. We can have the same product with different price offers, and I wish to group them by price range.
Facets give you the number of docs in which a particular word is present for a particular field...
Now let's suppose you have an index named tweets, with type tweet and field "name"...
A facet query for the field "name" would be:
curl -XPOST "http://localhost:9200/tweets/tweet/_search?search_type=count" -d'
{
"facets": {
"name": {
"terms": {
"field": "name"
}
}
}
}'
Now the response you get is as below:
"hits": {
"total": 3475368,
"max_score": 0,
"hits": []
},
"facets": {
"name": {
"_type": "terms",
"total": 3539206,
"other": 3460406,
"terms": [
{
"term": "brickeyee",
"count": 9205
},
{
"term": "ken_adrian",
"count": 9160
},
{
"term": "rhizo_1",
"count": 9143
},
{
"term": "purpleinopp",
"count": 8747
}
....
....
This is called a terms facet, as it is a term-based count... There are other facets as well, which can be seen here.
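For the price-range grouping asked about in the question, the range facet from the same facets API is closer to a GROUP BY on price buckets. A sketch, assuming an index named products with type product and a numeric price field (note that facets were later replaced by aggregations, which offer an equivalent range aggregation):
curl -XPOST "http://localhost:9200/products/product/_search?search_type=count" -d'
{
  "facets": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}'
Each bucket in the response reports the document count (plus min, max and mean of price), which is effectively a GROUP BY price range.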
