Open Search Server: Facet terms limited to number of documents in index - faceted-search

I'm using Open Search Server on a Linux box. Everything is running well except that the number of facet results for any search seems to be limited to the number of documents in my index, which is not correct.
I'm indexing users, and users have tags. There are currently 2 users in my database, and they are tagged with 5 different tags. However, if I run a search that returns both users, only 2 different tags are returned as facets (there should be 5). If I then add a 3rd user to my index (and the new user has 0 tags) my search will return 3 tags as facets.
As far as I can tell, this is only a problem with facets. I am able to filter successfully on any of the 5 tags, and I can search successfully on the text of all 5 tags.
My index:
user_1 | tag_1, tag_2, tag_3, tag_4
user_2 | tag_2, tag_4, tag_5
Search for "":
Results:
user_1
user_2
Facets Actually Returned:
tag_1 (1)
tag_2 (2)
Facets That Should Be Returned:
tag_1 (1)
tag_2 (2)
tag_3 (1)
tag_4 (2)
tag_5 (1)
Search for "tag_5":
Results:
user_2
Facets Actually Returned:
tag_1 (0)
tag_2 (1)
Facets That Should Be Returned:
tag_1 (0)
tag_2 (1)
tag_3 (0)
tag_4 (1)
tag_5 (1)
Has anyone encountered this before? Have suggestions?
Edit: Should have mentioned, multivalued is set to yes on the facet.

OpenSearchServer has two ways to compute facets: the "single-valued" method and the "multivalued" method.
Edit your search request and set "Multivalued" to "yes".
There are also two implementations for multivalued fields. One uses the "TermDocs" feature, the other uses "TermVectors".
https://github.com/jaeksoft/opensearchserver/blob/master/src/main/java/com/jaeksoft/searchlib/facet/Facet.java
To test the one based on TermVectors, you have to enable TermVector (set it to Yes) on your faceted field and re-index the data.
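As a reference point for what a multivalued facet should return, here is a minimal Python sketch that counts facet terms over the toy index from the question. It is not OpenSearchServer's implementation; the `INDEX` dictionary and the `facet_counts` helper are made up for illustration.

```python
from collections import Counter

# Toy index from the question: document id -> values of the multivalued tag field.
INDEX = {
    "user_1": ["tag_1", "tag_2", "tag_3", "tag_4"],
    "user_2": ["tag_2", "tag_4", "tag_5"],
}


def facet_counts(index, matching_ids):
    """Count each term once per matching document; unmatched terms stay at 0."""
    # Start every term known to the index at zero so it still shows up as a facet.
    counts = Counter({term: 0 for terms in index.values() for term in terms})
    for doc_id in matching_ids:
        # set() guards against a term repeated inside one document.
        counts.update(set(index[doc_id]))
    return dict(sorted(counts.items()))


all_docs = facet_counts(INDEX, ["user_1", "user_2"])  # search for ""
tag5_only = facet_counts(INDEX, ["user_2"])           # search for "tag_5"
print(all_docs)
print(tag5_only)
```

The number of facet terms depends on the distinct values in the field, not on the number of documents, which is why five terms are expected for two users.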

Related

Query a text/keyword field in Elasticsearch that contains at least one item not matching a set

I have documents with a "bag.contents" field (indexed as text with a .keyword derivative) that contains a comma-separated list of the items in the bag. Below are some samples:
`Apple, Apple, Apple`
`Apple, Orange`
`Car, Apple` <--
`Orange`
`Bus` <--
`Grape, Car` <--
`Car, Bus` <--
The desired query results should be all documents where there is at least one instance of something other than 'Apple', 'Orange', 'Grape', as per the arrows above.
I'm sure the DSL is a combination of must and not but after 20 or so iterations it seems very difficult to get Elasticsearch to return the correct result set short of one that doesn't contain any of those 3 things.
It is also worth noting that this field in the original document is a JSON array and Kibana shows it as a single field with the elements as a comma-separated field. I suspect this may be complicating it.
1 - If it shows up as a single field, it is probably not indexed as an array. Please make sure the document you index is formed properly, i.e., it needs to be
{ "contents": ["apple","orange","grape"]}
and not
{"contents": "apple,orange,grape"}
2 - Regarding the query: if you know all the possible terms at query time, you can build a terms_set query with every term except apple, orange and grape. A terms_set query lets you control the minimum number of matches required (1 in your case).
If you don't know all the possible terms, you could create a separate field that indexes every item other than apple, orange and grape, and query against that field.
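Both suggestions can be sketched in Python as plain dictionaries, without any client library. The field names `contents` and `other_contents`, the helper names, and the use of an `exists` query on the derived field are assumptions for illustration, not something stated in the question.

```python
ALLOWED = {"Apple", "Orange", "Grape"}


def other_items(contents, allowed=ALLOWED):
    """Return the distinct items outside the allowed set, in first-seen order."""
    seen = []
    for item in contents:
        if item not in allowed and item not in seen:
            seen.append(item)
    return seen


def with_other_field(doc):
    """Copy a document and add the hypothetical 'other_contents' field at index time."""
    enriched = dict(doc)
    enriched["other_contents"] = other_items(doc["contents"])
    return enriched


def terms_set_query(known_other_terms):
    """Option 1: terms_set over every known term except the allowed ones."""
    return {
        "query": {
            "terms_set": {
                "contents": {
                    "terms": list(known_other_terms),
                    # A constant script means at least one term has to match.
                    "minimum_should_match_script": {"source": "1"},
                }
            }
        }
    }


# Option 2: an empty array is treated as a missing field, so 'exists'
# only matches documents that hold at least one other item.
EXISTS_QUERY = {"query": {"exists": {"field": "other_contents"}}}

print(with_other_field({"contents": ["Car", "Apple"]}))
print(terms_set_query(["Car", "Bus"]))
```

With a minimum of 1 the terms_set query behaves like a plain terms query; the derived field is the more robust choice when the vocabulary is open-ended.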

Suggest Feature in Elastic Search

I am trying to implement the suggest feature (Suggest Usage | Elasticsearch .NET Client [8.4] | Elastic) to handle misspelled words in my search implementation.
My search query is executed across multiple indices, but when I try to use the suggest functionality I run into failures due to unmapped fields.
Suppose I have an index named People which has a field "name", and another index named news which has a field "title". My query is executed across both indices at the same time, and the search query has rules defined for both the name and title fields. With suggest, however, I only want to return suggestions for the name field in the People index as part of the same query. As a result, my news index returns a failure saying that no mapping was found for the field name.
Is there a workaround in the suggest functionality that lets me specify an index name for the field mentioned in suggest? Or can I ignore unmapped fields and still return search results from the other index (news), without returning any suggestions for misspelled words for that index?

Elasticsearch query to return limited amount of result (10) which will contain 2 from each specified keyword

I have articles stored in Elasticsearch and I've been wondering if there is a way to query by date but have the result contain a specific number of articles from each publisher. More specifically, I have 5 different publishers and I want to get the 10 latest articles, 2 from each publisher. I'm storing the publisher's name as a keyword field in Elasticsearch.
The only idea I've come up with is to run a separate query for each publisher, limit each result to the first 2, and then merge the results programmatically, but I think it would be more efficient if there is a way to do this in a single query.
Thanks
This sounds like a case for field collapsing.
You would collapse on the publisher field (as long as it is a keyword or a number) and then request inner_hits, the actual articles.
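A request body for this could look roughly like the sketch below, written as a Python dictionary. The field names `publisher` and `published_at`, the inner-hits name `latest_articles`, and the helper functions are assumptions; adjust them to the actual mapping.

```python
def latest_per_publisher(per_publisher=2, publishers=5):
    """Build a search body that collapses on publisher and keeps the newest articles."""
    return {
        "query": {"match_all": {}},
        "size": publishers,  # one collapsed hit per publisher
        "sort": [{"published_at": "desc"}],
        "collapse": {
            "field": "publisher",  # must be a keyword or numeric field
            "inner_hits": {
                "name": "latest_articles",
                "size": per_publisher,
                "sort": [{"published_at": "desc"}],
            },
        },
    }


def flatten_articles(response, name="latest_articles"):
    """Pull the inner hits out of a search response into one flat list."""
    articles = []
    for hit in response["hits"]["hits"]:
        for inner in hit["inner_hits"][name]["hits"]["hits"]:
            articles.append(inner["_source"])
    return articles


body = latest_per_publisher()
print(body)
```

The dictionary can be sent as the JSON body of a `_search` request; `flatten_articles` then turns the grouped response into the list of up to 10 articles.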

Elasticsearch : search for sets of items instead of items

I created a website where I log users actions: visit page, download document, log in, etc. Each action is timestamped, attached to a user and indexed in Elasticsearch
I would like to recognize predefined patterns in those actions, e.g.:
find users who visited this page, this other page and downloaded 2 documents in the last 3 weeks
find users who logged in and visited at least 5 pages in the same day
The problem is that I have always used ES to find items that match criteria, but never to find sets of items.
How would you start to solve this problem?
Thank you for your help.
For the second query I would suggest aggregations (like SQL GROUP BY): count the number of page visits aggregated per user and day.
And then add conditions on these aggregated results (like SQL HAVING)
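The GROUP BY / HAVING idea can be illustrated in plain Python, independent of Elasticsearch. The action names (`login`, `visit_page`), the record layout, and the sample data below are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical action log: alice logs in and visits 5 pages, bob only 3.
ACTIONS = (
    [{"user": "alice", "action": "login", "timestamp": "2024-01-10T09:00:00"}]
    + [{"user": "alice", "action": "visit_page", "timestamp": f"2024-01-10T09:0{i}:00"}
       for i in range(1, 6)]
    + [{"user": "bob", "action": "login", "timestamp": "2024-01-10T10:00:00"}]
    + [{"user": "bob", "action": "visit_page", "timestamp": f"2024-01-10T10:0{i}:00"}
       for i in range(1, 4)]
)


def logged_in_with_min_visits(actions, min_visits=5):
    """Users who logged in and visited at least min_visits pages on the same day."""
    # GROUP BY (user, day): count logins and page visits per bucket.
    buckets = defaultdict(lambda: {"login": 0, "visit_page": 0})
    for entry in actions:
        if entry["action"] in ("login", "visit_page"):
            day = datetime.fromisoformat(entry["timestamp"]).date()
            buckets[(entry["user"], day)][entry["action"]] += 1
    # HAVING: keep only the buckets that satisfy both conditions.
    return sorted({
        user
        for (user, _day), counts in buckets.items()
        if counts["login"] >= 1 and counts["visit_page"] >= min_visits
    })


matching_users = logged_in_with_min_visits(ACTIONS)
print(matching_users)
```

In Elasticsearch the grouping step corresponds to nested bucket aggregations per user and per day, and the filtering step to a condition applied on those buckets.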
To filter on aggregation results I found this (I have not tested it or tried to understand it):
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-bucket-selector-aggregation.html
Hope it helps

ElasticSearch position in index

I'm considering using Elasticsearch to build a ranking. If I index a list of elements that is ordered according to a score, can I query by an element name and get its position in the index?
E.g., I build an index with three elements:
"Element1", score: 8
"Element2", score: 7
"Element3", score: 10
When I query by "Element2" I would like to obtain position = 3
Elasticsearch doesn't know the position until it actually collects the results, and it collects results only to send them back to the client. So there is really no way to get the position without going through the results until you find the document you are looking for. If sending all these results to the client doesn't work for you, you can write a plugin that does it on the server side.
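The client-side version of "going through the results until you find the document" can be sketched as follows. The sketch works on an in-memory list rather than a live Elasticsearch response, and the `position_of` helper is hypothetical.

```python
ELEMENTS = [
    {"name": "Element1", "score": 8},
    {"name": "Element2", "score": 7},
    {"name": "Element3", "score": 10},
]


def position_of(name, elements):
    """Walk the results in descending score order and return the 1-based position."""
    ranked = sorted(elements, key=lambda e: e["score"], reverse=True)
    for position, element in enumerate(ranked, start=1):
        if element["name"] == name:
            return position
    return None  # the element is not in the result set


print(position_of("Element2", ELEMENTS))
```

Against a real index the sorted list would come from paging through a search sorted by score, so the cost grows with how far down the ranking the element sits.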
