Aggregate by fields in _source - elasticsearch

I have an index in elasticsearch with documents that look like this:
"hits": [
{
"_index": "my-index2",
"_type": "my-type",
"_id": "1",
"_score": 1,
"_source": {
"entities": {
"persons": [
"Kobe Bryant",
"Michael Jordan"
],
"dates": [
"Yesterday"
],
"locations": [
"Munich",
"New York"
]
},
"my_field": "Kobe Bryant was one of the best basketball players of all times. Not even Michael Jordan has ever scored 81 points in one game. Munich is really an awesome city, but New York is as well. Yesterday has been the hottest day of the year."
}
}
]
Is it possible to use an aggregation to aggregate by fields in the entities object? I tried this and it didn't work:
{
"aggs" : {
"avg_date" : {
"avg" : {
"script" : {
"source" : "doc.entities.dates"
}
}
}
}
}
The error said that my index doesn't have an entities field.
EDIT: With the following terms aggregation query:
{
"aggs" : {
"dates" : {
"terms" : { "field" : "entities.dates" }
}
}
}
I get an error saying
Fielddata is disabled on text fields by default. Set fielddata=true on [entities.dates] in order to load fielddata in memory by uninverting the inverted index.
I can set fielddata=true as the error suggests, but the documentation warns against this because it uses a lot of heap space. Is there another way to run this query?
EDIT 2: Solved this by mapping all fields in entities as keyword in the index.
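A minimal sketch of that fix, assuming the index is recreated with an explicit, typeless (7.x-style) mapping; on older versions the properties would sit under the type name. The helper names `keyword_mapping` and `terms_aggregation` and the bucket name `by_value` are made up for illustration. A terms aggregation is used because values such as "Yesterday" are strings, so an avg aggregation would not apply to them.

```python
import json

# Field names taken from the sample document in the question.
ENTITY_FIELDS = ["persons", "dates", "locations"]


def keyword_mapping(fields):
    """Build an index mapping that maps every entities.* field as keyword."""
    return {
        "mappings": {
            "properties": {
                "entities": {
                    "properties": {name: {"type": "keyword"} for name in fields}
                },
                "my_field": {"type": "text"},
            }
        }
    }


def terms_aggregation(field, size=10):
    """Build a search body that counts documents per distinct value of `field`."""
    return {
        "size": 0,  # skip the hits and return only the buckets
        "aggs": {"by_value": {"terms": {"field": field, "size": size}}},
    }


if __name__ == "__main__":
    print(json.dumps(keyword_mapping(ENTITY_FIELDS), indent=2))
    print(json.dumps(terms_aggregation("entities.dates"), indent=2))
```

The two dicts can be sent as request bodies with any HTTP client: the first when creating the index, the second to the search endpoint.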


Elasticsearch query nested object

I have this record in elastic:
{
"FirstName": "Winona",
"LastName": "Ryder",
"Notes": "<p>she is an actress</p>",
"Age": "40-50",
"Race": "Caucasian",
"Gender": "Female",
"HeightApproximation": "No",
"Armed": false,
"AgeCategory": "Adult",
"ContactInfo": [
{
"ContactPoint": "stranger#gmail.com",
"ContactType": "Email",
"Details": "Details of tv show"
}
]
}
I want to query inside the ContactInfo object. I used the query below, but I don't get any results back:
{
"query": {
"nested" : {
"path" : "ContactInfo",
"query" : {
"match" : {"ContactInfo.Details" : "Details of tv show"}
}
}
}
}
I also tried:
{
"query": {
"term" : { "ContactInfo.ContactType" : "email" }
}
}
Here is the mapping for ContactInfo:
"ContactInfo":{
"type": "object"
}
I think the issue is that the field is not mapped as nested. Is there a way to still run a nested query without changing the mapping? I'd like to avoid re-indexing the data if possible.
I'm pretty new to Elasticsearch, so any help is appreciated.
Thanks in advance.
Elasticsearch has no concept of inner objects.
Some important points from the official Elasticsearch documentation on the nested field type:
The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested datatype instead of the object data type.
Internally, nested objects index each object in the array as a separate hidden document, such that each nested object can be queried independently of the others with the nested query.
Refer to this SO answer for more details.
Below is a working example with index mapping, search query, and search result.
You have to reindex your data after changing the field to the nested data type, because the type of an existing field cannot be changed in place.
Index Mapping:
{
"mappings": {
"properties": {
"ContactInfo": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"nested" : {
"path" : "ContactInfo",
"query" : {
"match" : {"ContactInfo.Details" : "Details of tv show"}
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64269180",
"_type": "_doc",
"_id": "1",
"_score": 1.1507283,
"_source": {
"FirstName": "Winona",
"LastName": "Ryder",
"Notes": "<p>she is an actress</p>",
"Age": "40-50",
"Race": "Caucasian",
"Gender": "Female",
"HeightApproximation": "No",
"Armed": false,
"AgeCategory": "Adult",
"ContactInfo": [
{
"ContactPoint": "stranger#gmail.com",
"ContactType": "Email",
"Details": "Details of tv show"
}
]
}
}
]
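To illustrate why the object mapping loses the association between the fields of each array element, here is a rough Python model of how an array of objects is flattened at index time. `flatten_object_array` is a hypothetical helper, not an Elasticsearch API, and the second contact entry is made up for illustration.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Model how an `object` field holding an array is indexed:
    each sub-field becomes one multi-valued field, so the link
    between values of the same array element is lost."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


contacts = [
    {"ContactType": "Email", "Details": "Details of tv show"},
    {"ContactType": "Phone", "Details": "Office line"},  # made-up second entry
]

flat = flatten_object_array("ContactInfo", contacts)
print(flat)

# On the flattened form, "ContactType is Phone AND Details is 'Details of tv show'"
# looks like a match even though no single contact has both values.
cross_match = (
    "Phone" in flat["ContactInfo.ContactType"]
    and "Details of tv show" in flat["ContactInfo.Details"]
)
# Checking each contact on its own is what the nested type preserves.
real_match = any(
    c["ContactType"] == "Phone" and c["Details"] == "Details of tv show"
    for c in contacts
)
print(cross_match, real_match)
```

By the same model, a plain match query on ContactInfo.Details without the nested wrapper should work against the existing object mapping, as long as such cross-element matches are acceptable.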

Can I use Elasticsearch with data that needs authentication to view (e.g. logged-in users only)?

I want to implement search on my website. Users should be able to search for products that they have in their shop. Obviously, the products returned should only be theirs, and the same applies if customers search on their website. How can I implement this with Elasticsearch? My backend will run the query, not the front end, but how do I limit the search results to a single user? Is it only possible by filtering in my own code, or does Elasticsearch have something like WHERE in SQL? Am I going about it the wrong way? Would it be better to use full-text search in PostgreSQL?
I am using Go, by the way.
Best regards
Update: My use case, as requested:
The user is paired with an ID. He is in his dashboard and searches for a product he has in his shop. His request carries the session token cookie, and I get his ID on my server. Then I need to get the products that match his query, and only his.
In SQL it would be SELECT * FROM products WHERE shop_id=ID, for example. Is this possible with Elasticsearch? Is it more trouble than it's worth compared with implementing full-text search in PostgreSQL?
This can be easily achieved with Elasticsearch. Define shop_id as a keyword field, then use it in the filter context of the query so that you search only the products that belong to a particular shop_id.
Using shop_id in filter context also helps search performance, because no scores are calculated for filter clauses and frequently used filters are cached automatically, as explained in the official doc:
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data, e.g.
Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Sample mapping and query according to your requirement:
Index mapping
{
"mappings" :{
"properties" : {
"product" : {
"type" : "text"
},
"shop_id" :{
"type" : "keyword"
}
}
}
}
Index sample docs for two different shop IDs
{
"product" : "foo",
"shop_id" : "stackoverflow"
}
{
"product" : "foo",
"shop_id" : "opster"
}
Search for foo product where shop_id is stackoverflow
{
"query": {
"bool": {
"must": [
{
"match": {
"product": "foo"
}
}
],
"filter": [
{
"term": {
"shop_id": "stackoverflow"
}
}
]
}
}
}
Search result (note that only the foo product belonging to stackoverflow is returned)
"hits": [
{
"_index": "productshop",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"product": "foo",
"shop_id": "stackoverflow"
}
}
]
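On the backend, the filter can be attached in one place so that no search ever runs without it. The sketch below is an assumption about how that might look, written in Python for brevity even though the question mentions Go; `shop_search_body` is a hypothetical helper, and the field names match the sample mapping above.

```python
import json


def shop_search_body(shop_id, text, size=10):
    """Build a search body that scores on `text` but only within one shop."""
    if not shop_id:
        # Never fall back to an unfiltered search when the owner is unknown.
        raise ValueError("shop_id is required")
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [{"match": {"product": text}}],
                # Unscored yes/no restriction, the counterpart of SQL's WHERE.
                "filter": [{"term": {"shop_id": shop_id}}],
            }
        },
    }


if __name__ == "__main__":
    # shop_id should come from the server-side session, not from user input.
    print(json.dumps(shop_search_body("stackoverflow", "foo"), indent=2))
```

Taking shop_id from the authenticated session rather than from the request body is what keeps one user from searching another user's products.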

SuggestionBuilder with BoolQueryBuilder in Elasticsearch

I am currently using BoolQueryBuilder to build a text search. I have an issue with misspellings: when someone searches for "chiar" instead of "chair", I have to show them some suggestions.
I have gone through the documentation and observed that the SuggestionBuilder is useful to get the suggestions.
Can I send all the requests in a single query, so that I can show the suggestions if the result is zero?
There is no need to send different search terms (e.g. chair, chiar) to get suggestions. That is neither efficient nor performant, and you can't know every way a user might misspell a term.
Instead, use the fuzzy query, or the fuzziness parameter of the match query itself, which can be used inside the bool query.
Here is an example using the match query with the fuzziness parameter.
Index mapping
{
"mappings": {
"properties": {
"product": {
"type": "text"
}
}
}
}
Index sample doc
{
"product" : "chair"
}
Search query with the misspelled term chiar (tune fuzziness for your application; valid values are 0, 1, 2, or AUTO)
{
"query": {
"match" : {
"product" : {
"query" : "chiar",
"fuzziness" : "AUTO"
}
}
}
}
Search result
"hits": [
{
"_index": "so_fuzzy",
"_type": "_doc",
"_id": "1",
"_score": 0.23014566,
"_source": {
"product": "chair"
}
}
]
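To see why a small fuzziness is enough here, the sketch below computes an edit distance in which swapping two adjacent characters counts as a single edit, which is how fuzzy matching treats transpositions by default. `osa_distance` and `search_body` are hypothetical helpers, and the suggestion name `product-suggestion` is made up. The second function also shows, as an assumption about how the original question could be approached, a single request body that carries both a fuzzy bool query and a term suggester.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def search_body(text):
    """One request body with a fuzzy bool query and a term suggester."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"product": {"query": text, "fuzziness": "AUTO"}}}
                ]
            }
        },
        "suggest": {
            "product-suggestion": {"text": text, "term": {"field": "product"}}
        },
    }


print(osa_distance("chiar", "chair"))
print(search_body("chiar"))
```

Since "chiar" and "chair" differ by one swap, the distance is 1. AUTO allows one edit for terms of three to five characters, so it is sufficient for this example.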

Elasticsearch shuffle index sorting

Thanks in advance. I'll describe the situation first and give the solution at the end.
I have a collection of 2M documents with the following mapping:
{
"image": {
"properties": {
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"title": {
"type": "string"
},
"url": {
"type": "string"
}
}
}
}
I have a webpage which paginates through all the documents with the following search:
{
"from":STARTING_POSITION_NUMBER,
"size":15,
"sort" : [
{ "_id" : {"order" : "desc"}}
],
"query" : {
"match_all": {}
}
}
And a hit looks like this (note that the _id value is a hash of the URL, to prevent duplicate documents):
{
"_index": "images",
"_type": "image",
"_id": "2a750a4817bd1600",
"_score": null,
"_source": {
"url": "http://test.test/test.jpg",
"timestamp": "2014-02-13T17:01:40.442307",
"title": "Test image!"
},
"sort": [
null
]
}
This works pretty well. The only problem is that the documents appear sorted chronologically (the oldest documents appear on the first page, and the most recently indexed ones on the last page), but I want them to appear in a random order. For example, page 10 should always show the same N documents, but they don't have to be sorted by date.
I thought of something like sorting all the documents by their hash, which is pseudo-random and deterministic. How could I do it?
I've searched the docs, and the sort API only sorts the results, not the full index. If I don't find a solution, I will pick documents randomly and index them in a separate collection.
Thank you.
I solved it using the following search:
{
"from":STARTING_POSITION_NUMBER,
"size":15,
"query" : {
"function_score": {
"random_score": {
"seed" : 1
}
}
}
}
Thanks to David from the Elasticsearch mailing list for pointing out the function score with random scoring.
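The reason a fixed seed keeps pagination stable can be modelled in a few lines. This is only a model of the idea, not how Elasticsearch computes random_score internally; `seeded_rank`, `page`, and the document ids are made up for illustration.

```python
import hashlib


def seeded_rank(doc_id, seed):
    """Deterministic pseudo-random rank for a document: hash of seed and id."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def page(doc_ids, seed, start, size=15):
    """Return one page of ids in seeded pseudo-random order."""
    ordered = sorted(doc_ids, key=lambda d: seeded_rank(d, seed))
    return ordered[start:start + size]


ids = [f"doc-{n}" for n in range(100)]  # made-up ids
first = page(ids, seed=1, start=0)
again = page(ids, seed=1, start=0)
print(first == again)  # the same seed yields the same page on every request

# Walking all pages with one seed visits every document exactly once.
seen = []
for start in range(0, len(ids), 15):
    seen.extend(page(ids, seed=1, start=start))
print(sorted(seen) == sorted(ids))
```

Newer Elasticsearch versions document a field parameter alongside seed (for example _seq_no), so it is worth checking the random_score documentation for the version in use.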

Trying to search unique results with Elasticsearch and highlighting feature

I am trying to implement an Elasticsearch query that will return highlighted distinct results based on the field queried upon.
I am aware that this isn't supported out of the box in ES and that most people are satisfied with aggregation features like facets. But since I need more data than a facet can provide, and result highlighting is key to the project, I have been looking for alternative solutions.
I am using the Tire gem for a Rails project, and so far this has been my strategy:
Query Elasticsearch with the highlighted query string on one side and a terms facet on the other:
{
"query": {
"match": {
"name": {
"query": "Banana",
"analyzer": "query_analyzer",
"operator": "AND"
}
}
},
"facets": {
"group_by": {
"terms": {
"fields": [
"name"
],
"size": 10,
"all_terms": false
}
}
},
"highlight": {
"fields": {
"name": {}
},
"pre_tags": [
"<span class=\"highlight\">"
],
"post_tags": [
"</span>"
]
},
"size": 100
}
Cross-reference the unique results with the first matching element from the query result to not only retrieve the missing information but also the highlighting for the result.
The problem with this approach is that even if I limit my query results to ten times more than my initial result size, the cross-reference could end up not finding 10 unique results in the query.
Also if I disregard the query and cross-reference the facet results with my database, I will lose the highlight.
I was also thinking that maybe I could even index my data differently a second time to enforce uniqueness server-side but this has proven to be another challenge altogether.
I am running out of ideas right now so if anyone sees something I'm missing I would be very grateful for any help.
Edit:
As an example, let's say I have these documents indexed in ES:
[
{
id: 1,
name: 'Banana',
countryOfOrigin: 'Banana land'
},
{
id: 2,
name: 'Banana',
countryOfOrigin: 'Candy mountain'
},
{
id: 3,
name: 'Carrot',
countryOfOrigin: 'United Kingdom'
},
{
id: 4,
name: 'Barrel',
countryOfOrigin: 'Canada'
}
]
If I search for "Ba" in the same fashion as the query above, I would expect to find something like this:
{
"_shards":{
/* ... */
},
"hits":{
"total" : 2,
"hits" : [
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "1",
"_source" : {
"id": 1,
"name": "Banana",
"countryOfOrigin": "Banana land"
},
"highlight": {
"name": ["<span class='highlight'>Ba</span>nana"]
}
},
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "4",
"_source" : {
"id": 4,
"name": "Barrel",
"countryOfOrigin": "Canada"
},
"highlight": {
"name": ["<span class='highlight'>Ba</span>rrel"]
}
}
]
}
}
This would basically allow me to search for distinct item names in my records.
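The cross-referencing step of the strategy above can be sketched client-side as keeping the first hit, with its highlight, for each distinct name. The question uses Ruby with the Tire gem; Python is used here only for illustration, `first_hit_per_name` is a hypothetical helper, and the hits are built from the example documents rather than from a real response.

```python
def first_hit_per_name(hits, limit=10):
    """Keep the first hit (with its highlight) for each distinct name."""
    seen = set()
    unique = []
    for hit in hits:
        name = hit["_source"]["name"]
        if name in seen:
            continue  # a hit with this name was already kept
        seen.add(name)
        unique.append(hit)
        if len(unique) == limit:
            break
    return unique


hits = [
    {"_id": "1",
     "_source": {"id": 1, "name": "Banana", "countryOfOrigin": "Banana land"},
     "highlight": {"name": ["<span class='highlight'>Ba</span>nana"]}},
    {"_id": "2",
     "_source": {"id": 2, "name": "Banana", "countryOfOrigin": "Candy mountain"},
     "highlight": {"name": ["<span class='highlight'>Ba</span>nana"]}},
    {"_id": "4",
     "_source": {"id": 4, "name": "Barrel", "countryOfOrigin": "Canada"},
     "highlight": {"name": ["<span class='highlight'>Ba</span>rrel"]}},
]

unique = first_hit_per_name(hits)
print([hit["_id"] for hit in unique])
```

This keeps the highlight, but it has the limitation already described: if the fetched page holds many duplicates, fewer than `limit` distinct names may come back.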
