Elasticsearch - check if Boolean keyword is in text

I'm having trouble with an Elasticsearch query.
I have about 22,000 Boolean keywords (e.g. ("stock market") ("trading"), ("ecology"|"pollution"), etc.) in an index, and I want to retrieve the documents whose keyword can be found in a given text.
EDIT:
The mappings are as follows:
{
"mapping": {
"properties": {
"id": {
"type": "long"
},
"word": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
These are examples of what this index's records look like:
"id" : 41074,
"word" : """("Technical-Vocational-Livelihood "|"TVL") ("Track")""",
"id" : 38684,
"word" : """("Special Education Program")""",
Now, I want to keep only the documents whose "word" can be found in a specific text, e.g.:
…Here, a student has the following options: prepare for college education, look for work, or start a business after graduation. K-12, which took effect in 2016, has four tracks: academic, technical-vocational-livelihood (TVL), arts and design and sports. “Before, when you drop out of high school, you are really a dropout; as you are expected to proceed to college or higher education…
Currently, I have this query:
GET client_keywords/_search
{
"query": {
"bool": {
"must": [
{"terms": {
"id": [
41074,
38684,
...
]
}
}
],
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "",
"params": {}
}
}
}
}
}
}
I was thinking of filtering this through ES's simple query API, but I don't know if it's possible to invoke the simple query API through scripting.
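To make the intended matching rule concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature and not necessarily how the asker's keywords are meant to be read: it assumes that parenthesised groups are ANDed together, that "|" inside a group means OR, and that a term matches by case-insensitive substring search. The function names are hypothetical, and terms that themselves contain parentheses are not handled.

```python
import re

def parse_boolean_keyword(expr):
    """Turn '("a"|"b") ("c")' into [["a", "b"], ["c"]] (lowercased, trimmed)."""
    groups = []
    for body in re.findall(r"\(([^()]*)\)", expr):
        terms = [t.strip().lower() for t in re.findall(r'"([^"]*)"', body)]
        terms = [t for t in terms if t]
        if terms:
            groups.append(terms)
    return groups

def keyword_in_text(expr, text):
    """True if every group has at least one term occurring in the text.

    Assumption: groups are ANDed, alternatives inside a group are ORed.
    """
    haystack = text.lower()
    groups = parse_boolean_keyword(expr)
    return bool(groups) and all(
        any(term in haystack for term in group) for group in groups
    )

text = ("K-12, which took effect in 2016, has four tracks: academic, "
        "technical-vocational-livelihood (TVL), arts and design and sports.")
records = {
    41074: '("Technical-Vocational-Livelihood "|"TVL") ("Track")',
    38684: '("Special Education Program")',
}
# Ids of the records whose Boolean keyword is satisfied by the text.
matching_ids = [doc_id for doc_id, word in records.items()
                if keyword_in_text(word, text)]
```

Because matching is by substring, "Track" is satisfied by "tracks" in the sample text; a stricter rule would need word-boundary handling.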

Related

Number of nested objects in Elasticsearch

I'm looking for a way to get the number of nested objects, for querying, sorting, etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?
I was thinking of using a runtime field, but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it perhaps possible using aggregations?
It would also be nice to know if I can sort the results (e.g. all documents with more than XX users, sorted descending by that count).
Thanks.

You cannot query this efficiently.
It is possible to use the hack below, but I would only do so for one-time fetching, not for a regular use case: it uses params._source and is therefore really slow when you have a lot of docs.
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, equal to the length of the user array, and then excludes all docs scoring below min_score from the results.
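The effect of that query can be mimicked in plain Python, which may make the score-replacement idea easier to follow. This is only a simulation under stated assumptions: rank_by_nested_count is a hypothetical helper, documents 222 and 333 are invented for illustration, and the descending order reflects the default sort by _score. For "more than XX users", min_score would be XX + 1.

```python
def rank_by_nested_count(docs, min_score):
    """Mimic function_score with boost_mode=replace plus min_score.

    Each document's score becomes the length of its user array; documents
    scoring below min_score are dropped; the rest are ordered by score,
    highest first.
    """
    scored = [(len(doc.get("user", [])), doc) for doc in docs]
    kept = [(score, doc) for score, doc in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc["some_id"], score) for score, doc in kept]

docs = [
    {"some_id": 111, "user": [{"first": "John", "last": "Smith"},
                              {"first": "Alice", "last": "White"}]},
    {"some_id": 222, "user": [{"first": "Bob", "last": "Jones"}]},  # hypothetical
    {"some_id": 333, "user": []},                                   # hypothetical
]

at_least_two = rank_by_nested_count(docs, min_score=2)
at_least_one = rank_by_nested_count(docs, min_score=1)
```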

The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user != null ? ctx._source.user.size() : 0
"""
}
}
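As a rough sketch of the index-time approach, the Python below computes userCount before a document is sent to Elasticsearch and builds the search body for "more than n users, largest first". The helper names are hypothetical, and sending the request (with whatever client you use) is left out; only the range query and the sort clause are standard query DSL.

```python
def with_user_count(doc):
    """Return a copy of the document with userCount set from the user array."""
    enriched = dict(doc)
    # A missing or null user field counts as zero users.
    enriched["userCount"] = len(doc.get("user") or [])
    return enriched

def more_than_n_users_query(n):
    """Search body: documents with more than n users, largest counts first."""
    return {
        "query": {"range": {"userCount": {"gt": n}}},
        "sort": [{"userCount": {"order": "desc"}}],
    }

doc = {
    "some_id": 111,
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}
indexed_doc = with_user_count(doc)
search_body = more_than_n_users_query(1)
```

The original document is left untouched; only the copy that gets indexed carries the extra field.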

Elasticsearch - extract values from a flattened field inside a nested type using a runtime field script

I have this mapping (a big document, actually; a lot of fields are excluded for brevity):
{
"items": {
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword"
},
"costs": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"costs_samples": {
"type": "flattened"
}
}
}
}
}
}
}
costs_samples is a flattened field holding a huge collection of possible costs (sometimes more than 10k entries), keyed by some dynamic dimensions. I have to highlight that costs_samples cannot live outside costs, since at query time some conditions on costs must be combined with should or match clauses, like (costs.country=this_value AND some_other costs_samples_condition).
I would like to be able to extract a value and inject it as a new runtime field at the costs level, and then use that field to sort, filter and aggregate.
Something like this:
{
"runtime_mappings": {
"costs.selected_cost": {
"type": "long",
"script": {
"source":
"for (def cost : doc['costs.costs_samples']) { if(cost.values!= null) {emit(Long.parseLong(cost.values.some_dynamic_identity_known_at_query_time.last))} }"
}
}
},
"query":{
"nested": {
"path": "costs",
"query": {
"bool": {
"filter": [
{
"terms": {
"costs.id": ["id-1","id-2"]
}
},
{
"term": {
"costs.selected_cost": 10
}
}
]
}
}
}
},
"fields": ["costs.selected_cost"]
}
The trouble is that selected_cost is neither created nor returned by ES, and there is no error message.
What did I do wrong? The documentation was not helpful.
It may be worth mentioning that I've also tried using two separate document types, items and costs, and then executing a kind of join operation, but the performance tests were really poor.
Thanks!

Long story short, there is no way to use a runtime field inside a nested field type.

How to boost documents matching one of the query_string

Elasticsearch newbie here. I'm trying to look up documents that have foo in their name, but I want to prioritize the ones that also have bar, i.e. those with bar should be at the top of the list. The result doesn't have the ones with bar at the top. boost doesn't seem to have any effect here; likely I'm not understanding how boost works. I'd appreciate any help.
query: {
bool: {
should: [
{
query_string: {
query: `name:foo*bar*`,
boost: 5
}
},
{
query_string: {
query: `name:*foo*`,
}
}
]
}
}
Sample document structure:
{
"name": "foos, one two three",
"type": "car",
"age": 10
}
{
"name": "foos, one two bar three",
"type": "train",
"age": 30
}
Index mapping
{
"detail": {
"mappings": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"servings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}

Try switching the order for the query like so:
query: {
bool: {
should: [
{
query_string: {
query: `name:*foo*`,
}
},
{
query_string: {
query: `name:foo*bar*`,
boost: 5
}
}
]
}
}
It should work, but if not, you might need to do a nested search.

Search against the keyword field.
If you run only the first part of the query ("query": "name:foo*bar*"), you will see that it returns nothing. It is matched against the generated tokens rather than the whole string.
The text "foos, one two bar three" generates tokens like ["foos","one","two","bar","three"], and the query looks for "foo*bar*" in each individual token, hence no result. Keyword fields are stored as is, so the search runs against the entire text.
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "name.keyword:foo*bar*",
"boost": 5
}
},
{
"query_string": {
"query": "name.keyword:*foo*"
}
}
]
}
}
}
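The token-versus-whole-string behaviour described above can be reproduced outside Elasticsearch with a small Python sketch. It is only an approximation: simple_tokens is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and fnmatchcase stands in for wildcard matching, which is close enough for patterns that only use "*".

```python
import re
from fnmatch import fnmatchcase

def simple_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase and split
    on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def matches_text_field(pattern, text):
    """A wildcard on a text field is tested against each token separately."""
    return any(fnmatchcase(token, pattern) for token in simple_tokens(text))

def matches_keyword_field(pattern, text):
    """A wildcard on a keyword field is tested against the whole string."""
    return fnmatchcase(text, pattern)

name = "foos, one two bar three"

tokens = simple_tokens(name)
text_hit = matches_text_field("foo*bar*", name)        # no single token matches
keyword_hit = matches_keyword_field("foo*bar*", name)  # the whole string matches
```

No single token starts with foo and also contains bar, which is why the boosted clause never contributes on the text field.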
Wildcard queries use a lot of memory and don't scale well, so it is better to avoid them. If foo and bar appear at the start of words, you can use a prefix query:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foo"
}
},
{
"prefix": {
"name": "bar"
}
}
]
}
}
}
You can also explore n-grams.

Elasticsearch sort by value

I have Elasticsearch 5 and I would like to sort based on a field's value. Imagine a document with a category field, e.g. genre, which can have values like sci-fi, drama and comedy. When searching, I would like to order the results so that comedies come first, then sci-fi, and drama last. Then of course I will order within each group by other criteria. Could somebody point me to how to do this?

Elasticsearch Sort Using Manual Ordering
This is possible in Elasticsearch: you can assign an order based on particular values of a field.
I've implemented what you are looking for using script-based sorting with a Painless script. You can refer to the links I've mentioned to learn more about these; the query below should do what you are looking for.
This assumes genre and movie each have a keyword sub-field (raw), as in the mapping below.
PUT sampleindex
{
"mappings": {
"_doc": {
"properties": {
"genre": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"movie": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
You can use the query below to get what you are looking for:
GET sampleindex/_search
{
"query": {
"match_all": {}
},
"sort": [{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "if(params.scores.containsKey(doc['genre.raw'].value)) { return params.scores[doc['genre.raw'].value];} return 100000;",
"params": {
"scores": {
"comedy": 0,
"sci-fi": 1,
"drama": 2
}
}
},
"order": "asc"
}
},
{ "movie.raw": { "order": "asc"}
}]
}
Note that I've included a sort for both genre and movie. The results are sorted by genre first and then, within each genre, by movie.
Hope it helps.
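For readers who want to check the ordering logic without a cluster, the same rule can be written as a plain Python sort key. This is only an analogy to the script sort above, not Elasticsearch code: the rank table and the 100000 fallback mirror the query's params, while the movie titles and the horror genre are made up for illustration.

```python
# Same ranks as the "scores" params in the query above.
GENRE_RANK = {"comedy": 0, "sci-fi": 1, "drama": 2}
UNRANKED = 100000  # fallback for genres missing from the table

def sort_key(doc):
    """Primary key: manual genre rank; secondary key: movie title."""
    return (GENRE_RANK.get(doc["genre"], UNRANKED), doc["movie"])

# Hypothetical sample data.
movies = [
    {"genre": "drama", "movie": "Heat"},
    {"genre": "comedy", "movie": "Superbad"},
    {"genre": "sci-fi", "movie": "Alien"},
    {"genre": "comedy", "movie": "Airplane!"},
    {"genre": "horror", "movie": "Psycho"},
]

ordered = [m["movie"] for m in sorted(movies, key=sort_key)]
```

Genres that are not in the table fall to the end, just as the script returns 100000 for them.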

Autocomplete functionality using elastic search

I have an elastic search index with following documents and I want to have an autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Use case:
My queries are prefixes, e.g. "sta", "star", "star w" ... "star war", etc., with an additional filter such as tags = "science fiction". The queries could also match other fields like description or actors (in the cast field; note this is nested). I also want to know which field was matched.
I investigated two ways of doing this, but neither seems to address the use case above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest" that replicates the data, which is not desirable.
2) Using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
This gives the whole document back, not the exact matching terms.
Is there a clean way of achieving this? Please advise.

Don't create the mapping separately; insert the data directly into the index and a default mapping will be created for it. Use the query below for autocomplete.
GET /netflix/movie/_search
{
"query": {
"query_string": {
"query": "sta*"
}
}
}

I think the completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.
This is a sample index (from your question, I am assuming you are using ES 1.7):
PUT netflix
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"edge_filter"
]
},
"keyword_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim"
]
}
},
"filter": {
"edge_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
}
}
},
"mappings": {
"movie":{
"properties": {
"name":{
"type": "string",
"fields": {
"prefix":{
"type":"string",
"index_analyzer" : "prefix_analyzer",
"search_analyzer" : "keyword_analyzer"
},
"raw":{
"type": "string",
"analyzer": "keyword_analyzer"
}
}
},
"tags":{
"type": "string", "index": "not_analyzed"
}
}
}
}
}
Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge n-gram filter, so that the string star wars is broken into s, st, sta, etc. When searching, however, keyword_analyzer is used so that the search query does not get broken into multiple small tokens. name.raw will be used for aggregation.
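A small Python sketch may help show what the two analyzers produce. It only imitates the settings above (keyword tokenizer, lowercase, trim, edge n-grams from 1 to 20 characters) and is not Elasticsearch code; the function names are hypothetical.

```python
def keyword_analyzer(text):
    """Search-time analysis: the whole input as one lowercased, trimmed token."""
    return [text.lower().strip()]

def prefix_analyzer(text, min_gram=1, max_gram=20):
    """Index-time analysis: every leading prefix of that single token."""
    token = keyword_analyzer(text)[0]
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

def prefix_match(query, indexed_name):
    """A match on name.prefix: the query token equals one indexed prefix."""
    return keyword_analyzer(query)[0] in prefix_analyzer(indexed_name)

index_tokens = prefix_analyzer("Star Wars")
hit = prefix_match("star w", "Star Wars")   # typed prefix of the full name
miss = prefix_match("wars", "Star Wars")    # not a prefix of the full name
```

Because the keyword tokenizer keeps the name as one token, only prefixes of the full name match; a query starting at the second word does not.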
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"tags": "sci-fi"
}
},
"query": {
"match": {
"name.prefix": "sta"
}
}
}
},
"size": 0,
"aggs": {
"unique_movie_name": {
"terms": {
"field": "name.raw",
"size": 10
}
}
}
}
Results will be something like
"aggregations": {
"unique_movie_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "star trek",
"doc_count": 1
},
{
"key": "star wars",
"doc_count": 1
}
]
}
}
UPDATE:
I think you could use highlighting for this purpose. The highlight section will give you the whole word and the field it matched. You can also use inner hits with highlighting inside them to get the nested docs as well.
{
"query": {
"query_string": {
"query": "sta*"
}
},
"_source": false,
"highlight": {
"fields": {
"*": {}
}
}
}
