How do I view synonyms indexed in a document? - elasticsearch

I have added a synonyms token filter to my index and I think it is working as planned, but I want a way to confirm the exact values that are being stored for each document (some queries aren't using the synonym values as I expect, and I need to verify if the right values were stored at the time of indexing).
Is there a standard way to figure this out?
Example:
At some point I configured a synonym for NICE and PLEASANT.
At some point I indexed a document that has the word NICE in it.
Givens
_termvectors shows my document has the term NICE in it.
_analyze for my analyzer shows NICE and PLEASANT are synonyms.
Question:
How can I tell if the indexed document is using PLEASANT as a term/synonym?
Update
Adapting the answer from user3775217 (I had to update the syntax to work with Elasticsearch 5.2):
{
"query":{
"term": { "{someFieldToFilterOn}": "{SomeFieldValue}"}
},
"script_fields":{
"terms":{
"script":{
"lang":"groovy",
"inline":"doc[field].values",
"params":{
"field":"{TheFieldIwantIndexedTermsFrom}"
}
}
}
}
}

I prepared this query a couple of years back to find the indexed values for a document. You can use it to see which values were indexed in a field for each document.
You will need the doc_id of each document and the name of the document field you want to check.
curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_id": "1770"
}
}
]
}
}
}
},
"script_fields": {
"terms": {
"script": "doc[field].values",
"params": {
"field": "input"
}
}
}
}'
Hope this helps
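For 5.x clusters, where Painless is the default scripting language, the same lookup can be sketched as a request body built in Python. This is only a sketch under assumptions: the helper name indexed_terms_body is made up for illustration, the "inline" key follows the 5.2 syntax shown in the question's update, and the field's values must be readable through doc[...] (for an analyzed text field in 5.x that usually means fielddata has to be enabled in the mapping).

```python
import json


def indexed_terms_body(doc_id, field):
    """Build a search body returning the indexed values of `field` for one document.

    Hypothetical helper: it only assembles the JSON body, it does not talk
    to Elasticsearch.
    """
    return {
        # Restrict the search to a single document by its _id.
        "query": {"term": {"_id": doc_id}},
        "script_fields": {
            "terms": {
                "script": {
                    "lang": "painless",
                    # Painless reads parameters through the `params` map.
                    "inline": "doc[params.field].values",
                    "params": {"field": field},
                }
            }
        },
    }


if __name__ == "__main__":
    # Same document ID and field name as in the curl example above.
    print(json.dumps(indexed_terms_body("1770", "input"), indent=2))
```

The printed JSON can be sent to the _search endpoint of the index; if the synonym filter ran at index time, the synonym terms should show up in the returned "terms" script field.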

Related

Escaping a hash in a parent ID for elastic search?

We were having some issues where our queries weren't returning items with specific version IDs using ElasticSearch 2.3. After some investigation, it looks like our current elasticsearch query is not behaving when there is a '#' in the version ID.
The query I am trying to perform is something like the following:
{
"query": {
"constant_score": {
"filter": {
"terms": {
"_parent": [
"faro-deployments-webservice-infrastructure|#abc123",
"faro-deployments-webservice-infrastructure|xyz321"
]
}
}
}
}
}
This works fine but excludes any results where the parent ID has a '#' character in it.
I can't seem to find it again, but I recall reading somewhere that # has a specific meaning in this context. I have tried a variety of ways to escape the #. Is there a way to support versions with a # character in them, or to perform a similar query with similar results?
The following seems to work for me. I changed the query to do something similar and did not use the "_parent" field.
{
"query": {
"has_parent": {
"type": "deck",
"query": {
"constant_score": {
"filter": {
"terms": {
"_id": [
"faro-deployments-webservice-infrastructure|#abc123",
"faro-deployments-webservice-infrastructure|xyz321"
]
}
}
}
}
}
}
}
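A small sketch of how that rewritten query could be generated from a list of parent IDs. The function name has_parent_ids_query is hypothetical, and the parent type "deck" is simply the one used in the answer above; the code only builds the request body.

```python
def has_parent_ids_query(parent_type, parent_ids):
    """Build a has_parent query matching children of the given parent IDs.

    The IDs are passed through untouched, so a '#' inside an ID is sent as
    an ordinary JSON string character.
    """
    return {
        "query": {
            "has_parent": {
                "type": parent_type,
                "query": {
                    "constant_score": {
                        "filter": {"terms": {"_id": list(parent_ids)}}
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    import json

    ids = [
        "faro-deployments-webservice-infrastructure|#abc123",
        "faro-deployments-webservice-infrastructure|xyz321",
    ]
    print(json.dumps(has_parent_ids_query("deck", ids), indent=2))
```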

Search in every field with a fixed parameter

Perhaps this is a basic question, but I need to search in every indexed field while requiring a specific fixed value for another field.
How can I do it?
Currently I have a simple: query( "aValue", array_of_models )
I tried many options without success, for example:
query({
"query": {
"bool": {
"query": "aValue",
"filter": {
"term": {
"published": "true"
}
}
}
}
})
I would prefer to avoid specifying the fields to search in, because I use the same search params for different models.
I found a solution; it may not be optimized, but it works:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": "aValue"
}
}
],
"filter": {
"term": {
"published": true
}
}
}
}
}
I am not sure I understood your intention correctly.
The _all field is enabled by default. So if you have no special mapping, every indexed field value is added as a text string to the _all field.
You can use either of these:
Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
Simple Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
A simple query like this should work for you:
GET my_index/_search
{
"query": {
"simple_query_string": {
"query": "aValue",
"fields": []
}
}
}
Both query types have parameters that should cover your use case, IMHO.
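To also pin the other field to a fixed value, the two ideas above can be combined in one bool query. This is a sketch, not a tested recipe: the helper name search_all_with_filter is made up, and it assumes the default search field (_all in the versions discussed here) is enabled, so no "fields" list is passed.

```python
def search_all_with_filter(text, filter_field, filter_value):
    """Build a bool query: free-text search in `must`, fixed value in `filter`.

    `must` makes the text match mandatory and scored; `filter` restricts the
    results without affecting the score.
    """
    return {
        "query": {
            "bool": {
                "must": [{"simple_query_string": {"query": text}}],
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(search_all_with_filter("aValue", "published", True), indent=2))
```

Because the text clause sits in "must" rather than "should", documents that pass the filter but do not match the text are not returned.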

Hide a single record in Elastic Search on a per user basis

As a logged in user, I want to be able to hide a single record that I never want to see again if I perform the same search. Is this possible with ElasticSearch?
I've read about multitenancy and filters, but I'm not quite sure what a top-level implementation might look like.
One of my ideas is to store a reference to the unwanted record in an RDB and then add those references to a filter query, but I'm not sure what reference to use, since Elasticsearch generates its own IDs, which may not stay the same when a re-index happens.
It depends. If you do not have many users and the documents are not too big, you can use a field on the document: add a dismissedBy field, and when a user dismisses a record, write an update to the document:
POST test/type1/1/_update
{
"script" : {
"inline": "ctx._source.dismissedBy.add(params.userId)",
"lang": "painless",
"params" : {
"userId" : "1"
}
}
}
And query:
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"term": {
"dismissedBy": 1
}
}
}
}
}
The problem with this approach is that if you re-index the document, these values will be overwritten, so you must keep a copy somewhere else too.
The other option, if the documents are large or you have lots of users, is the parent/child approach.
When a user hits dismiss, index a child document:
PUT /indexname/dissmisses/1?parent=dismissforid
{
"userId": 1
}
Then when you search you do
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"has_child": {
"type": "dissmisses",
"query": {
"term": {
"userId": 1
}
}
}
}
}
}
}
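The idea from the question, keeping the dismissed references outside Elasticsearch and adding them to the query, can be sketched as below. It assumes documents are indexed with explicit, stable IDs (for example the primary key from the RDB) instead of auto-generated ones, so the references survive a re-index. The function name and the sample IDs are hypothetical.

```python
def search_excluding_dismissed(user_query, dismissed_ids):
    """Wrap `user_query` so that documents the user dismissed are excluded.

    `dismissed_ids` is whatever the application loaded for the current user
    from its own store; an empty collection leaves the query unrestricted.
    """
    bool_query = {"must": [user_query]}
    if dismissed_ids:
        # The ids query matches documents by their _id.
        bool_query["must_not"] = [{"ids": {"values": sorted(dismissed_ids)}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    import json

    query = {"match": {"title": "cubilia"}}
    print(json.dumps(search_excluding_dismissed(query, {"42", "17"}), indent=2))
```

This keeps the index free of per-user state, at the cost of sending the list of dismissed IDs with every search.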

Elasticsearch terms query on array of values

I have data in an Elasticsearch index that looks like this:
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose names start with
"ling" (case insensitive) and get distinct terms with their proper casing, "Ling Deponte" rather than "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, it is like doing a full table scan; you effectively lose the benefit of the inverted index because it has to be walked entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle the results down to what you are interested in, then I use your inner aggregation to include terms based on their value.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
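The include pattern above is written out by hand for "ling". A minimal sketch of generating it for any prefix follows; the function name is made up, and it deliberately accepts only ASCII letters and digits so that no regex metacharacters need escaping. Python's re.fullmatch is used here only to illustrate which names such a pattern accepts; it is a stand-in for, not a copy of, the regex engine Elasticsearch uses.

```python
import re


def ci_word_prefix_pattern(prefix):
    """Build a pattern matching values in which some word starts with `prefix`,
    ignoring case, e.g. "ling" -> "(.* )?[lL][iI][nN][gG].*".
    """
    if not prefix or not prefix.isascii() or not prefix.isalnum():
        raise ValueError("prefix must be non-empty ASCII letters/digits")
    parts = []
    for ch in prefix:
        if ch.isalpha():
            # One character class per letter covers both cases.
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            parts.append(ch)
    # "(.* )?" allows the prefix to start a later word in the value.
    return "(.* )?" + "".join(parts) + ".*"


if __name__ == "__main__":
    pattern = ci_word_prefix_pattern("ling")
    for name in ["Ling Deponte", "Dana Madin", "Mei Ling", "Darling"]:
        print(name, bool(re.fullmatch(pattern, name)))
```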
Hi, here is a query that may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention here is to find documents whose skillNames.raw value starts with "jav".

What's the difference between simple_query_string and query_string?

I have a nested field named source in my index that looks like this:
"source": [
{
"name": "source_c","type": "type_a"
},
{
"name": "source_c","type": "type_b"
}
]
I used a query_string query and a simple_query_string query to search for type_a and got two different results.
query_string
{
"size" : 3,
"query" : {
"bool" : {
"filter" : {
"query_string" : {
"query" : "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163459 hits in 294088 docs.
simple_query_string
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163505 hits in 294088 docs.
I generated only three different types (type_a, type_b, type_c) at random, so 163459 and 163505 differ very little out of 294088 docs.
I only found one piece of information in the Elasticsearch Reference [2.1]:
Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query.
I don't think that is the reason for the difference.
I want to know what causes the slightly different results between query_string and simple_query_string.
As far as I know, nested query syntax is not supported for either query_string or simple_query_string. It is an open issue, and this is the PR regarding that issue.
Then how are you getting any results? The Explain API will help you understand what is going on. Take this query:
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
Have a look at the output and you will see:
"description": "ConstantScore(QueryWrapperFilter(_all:source _all:source.type _all:type_a))",
So what is happening here is that ES is looking for the terms source, source.type, or type_a in the _all field; it finds type_a and returns the result.
You will find something similar for query_string using the Explain API.
Also, query_string and simple_query_string have different syntax; e.g., field_name:search_text is not supported in simple_query_string.
The correct way to query nested objects is to use a nested query.
EDIT
This query will give you desired results.
{
"query": {
"nested": {
"path": "source",
"query": {
"term": {
"source.type": {
"value": "type_a"
}
}
}
}
}
}
Hope this helps!!
According to the documentation, simple_query_string is meant to be used with unsafe input,
so that users can enter anything and it will not throw an exception if the input is invalid; it simply discards the invalid parts.
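The nested query from the accepted answer can be generated for any nested path and sub-field. A minimal sketch with a hypothetical helper name; the path "source" and the value "type_a" come from the question.

```python
def nested_term_query(path, field, value):
    """Build a nested query matching documents where one nested object under
    `path` has `field` equal to `value`.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    # Nested fields are addressed by their full dotted name.
                    "term": {path + "." + field: {"value": value}}
                },
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(nested_term_query("source", "type", "type_a"), indent=2))
```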

Resources