Find all entries on a list within Kibana via Elasticserach Query DSL - elasticsearch

Could you please help me on this? My Kibana Database within "Discover" contains a list of trades. I know want to find all trades within this DB that have been done in specific instruments (ISIN-Number). When I add a filter manually and switch to Elasticserach Query DSL, I find the following:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"obdetails.isin": "CH0253592783"
}
},
{
"match_phrase": {
"obdetails.isin": "CH0315622966"
}
},
{
"match_phrase": {
"obdetails.isin": "CH0357659488"
}
}
],
"minimum_should_match": 1
}
}
}
Since I want to check the DB for more than 200 ISINS, this seems to be inefficient. Is there a way, in which I could just say "show me the trade if it contains one of the following 200 ISINs?".
I already googled and tried this, which did not work:
{
"query": {
"terms": {
"obdetails.isin": [ "CH0357659488", "CH0315622966"],
"boost": 1.0
}
}
}
The query works, but does not show any results.

To conclude. A field of type text is analyzed which basically converts the given data to a list of terms using given analyzers etc. rather than it being a single term.
Given behavior causes the terms query to not match these values.
Rather than changing the type of the field one may add an additional field of type keyword. That way a terms queries can be performed whilst still having the ability to match on the field.
{
"isin": {
"type" "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
The above example will add an extra field called obdetails.isin.keyword which can be used for terms. While still being able to use match queries on obdetails.isin

Related

How to search by non-tokenized field length in ElasticSearch

Say I create an index people which will take entries that will have two properties: name and friends
PUT /people
{
"mappings": {
"properties": {
"friends": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and I put two entries, each one of them has two friends.
POST /people/_doc
{
"name": "Jack",
"friends": [
"Jill", "John"
]
}
POST /people/_doc
{
"name": "Max",
"friends": [
"John", "John" # Max will have two friends, but both named John
]
}
Now I want to search for people that have multiple friends
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['friends.keyword'].length > 1"
}
}
}
]
}
}
}
This will only return Jack and ignore Max. I assume this is because we are actually traversing the inversed index, and John and John create only one token - which is 'john' so the length of the tokens is actually 1 here.
Since my index is relatively small and performance is not the key, I would like to actually traverse the source and not the inversed index
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "ctx._source.friends.length > 1"
}
}
}
]
}
}
}
But according to the https://github.com/elastic/elasticsearch/issues/20068 the source is supported only when updating, not when searching, so I cannot.
One obvious solution to this seems to take the length of the field and store it to the index. Something like friends_count: 2 and then filter based on that. But that requires reindexing and also this appears as something that should be solved in some obvious way I am missing.
Thanks a lot.
There is a new feature in ES 7.11 as runtime fields a runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
you can find more information here about runtime fields, but how you can use runtime fields you can do something like this:
Index Time:
PUT my-index/
{
"mappings": {
"runtime": {
"friends_count": {
"type": "keyword",
"script": {
"source": "doc['#friends'].size()"
}
}
},
"properties": {
"#timestamp": {"type": "date"}
}
}
}
You can also use runtime fields in search time for more information check here.
Search Time
GET my-index/_search
{
"runtime_mappings": {
"friends_count": {
"type": "keyword",
"script": {
"source": "ctx._source.friends.size()"
}
}
}
}
Update:
POST mytest/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.arrayLength = ctx._source.friends.size()"
}
}
You can update all of your document with query above and adjust your query.
For everyone wondering about the same issue, I think #Kaveh answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that source is created after the query is performed and therefore you cannot access source for the purposes of filtering query.
This leaves you with two options:
filter the result on the application level (ugly and slow solution)
actually save the filed length in a separate field. Such as friends_count
possibly there is another option I don't know about(?).

handling both exact and partial search on the same search string

I want to define the schema which can tackle the partial as well as the exact search for the same search value.
The exact search should always return the "exact match", ES should not break the search string into tokens in this case.
For partial match data type of the property should be text and for exact it should be keyword. For having the feasibility to have both partial and exact search without having to index the data to different properties you can leverage using fields. What it does is that it helps to index same data into different ways.
So, lets say you want to index name of persons, and have the ability for partial and exact search. In such case the mapping would be:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Lets index a few docs:
PUT test/_doc/1
{
"name": "Nishant Saini"
}
PUT test/_doc/2
{
"name": "Nishant Kumar"
}
For partial search we have to query name field and it is of type text.
GET test/_doc/_search
{
"query": {
"query_string": {
"query": "Nishant Saini",
"field": [
"name"
]
}
}
}
The above query will return both docs (1 and 2) because one token i.e. Nishant appears in both the document for field name.
For exact search we need to query on name.keyword. To perform exact match we can use term query as below:
{
"query": {
"term": {
"name.keyword": "Nishant Saini"
}
}
}
This would match doc 1 only.

How to store URLs in Elasticsearch for fast access?

I would like to use Elasticsearch for a website and need a fast way to retrieve documents via the page's URL strings - actually paths (e.g. /shoes/sneakers/nike). The paths are unique.
Following solutions come to my mind:
Store as string, indexed, not analyzed
Store in the _id field
Which one would be the better solution and are there maybe better methods?
Thanks!
You can store the url field as keyword datatype and use the below query to get the results.
https://www.elastic.co/guide/en/elasticsearch/reference/master/keyword.html
POST index/_search
{
"query": {
"term": {
"url": "/shoes/sneakers/nike"
}
}
}
if you store it as text data type then elasticsearch will automatically create a keyword field for you and you can use the below query to get the results
mapping created by Elasticsearch
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
query to search
POST index/_search
{
"query": {
"term": {
"url.keyword": "/shoes/sneakers/nike"
}
}
}
{
"query": {
"match_phrase": {
"url": "/shoes/sneakers/nike"
}
}
}
You need to store url field with their values. This match_phrase will solve your problem.

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I’ll tell you about my use case, because I’m not sure if this is a good approach. I’m trying to automatically index a large collection of typed data. What this means is I’m trying to generate mappings and queries on those mappings all automatically based on information about my data. A lot of my data is relational, and I’m interested in being able to search accross the relations, thus I’m also interested in using Nested data types.
However, the issue is that many of these types have on the order of 10 relations, and I’ve got a feeling its not a good idea to pass 10 identical copies of a nested query to elasticsearch just to query 10 different nested paths the same way. Thus, I’m wondering if its possible to instead pass multiple paths into a single query? Better yet, if its possible to search over all fields in the current document and in all its nested documents and their fields in a single query. I’m aware of object fields, and they’re not a good fit because I want to retrive some data of matched nested documents.
In this example, I create an index with multiple nested types and some of its own types, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested document, or is that actually a performant way to do this? Thanks
PUT /my_index
{
"mappings": {
"type1" : {
"properties" : {
"obj1" : {
"type" : "nested",
"properties": {
"name": {
"type":"text"
},
"number": {
"type":"text"
}
}
},
"obj2" : {
"type" : "nested",
"properties": {
"color": {
"type":"text"
},
"food": {
"type":"text"
}
}
},
"lul":{
"type": "text"
},
"pucci":{
"type": "text"
}
}
}
}
}
PUT /my_index/type1/1
{
"obj1": [
{ "name":"liar", "number":"deer dog"},
{ "name":"one two three", "number":"you can call on me"},
{ "name":"ricky gervais", "number":"user 123"}
],
"obj2": [
{ "color":"red green blue", "food":"meatball and spaghetti"},
{ "color":"orange", "food":"pineapple, fish, goat"},
{ "color":"none", "food":"none"}
],
"lul": "lul its me user123",
"field": "one dog"
}
POST /my_index/_search
{
"query": {
"nested": {
"path": ["obj1", "obj2"],
"query": {
"query_string": {
"query": "ricky",
"all_fields": true
}
}
}
}
}

Elastic Search 2.0/2.1 Issue with Highlighter and the Bool Query

I am having an issue with highlighting in Elastic 2.0 and 2.1 - it's returning more information than I think it should.
I am constructing a bool query (the filtered query keyword is deprecated in 2.0+ so I am trying to update my syntax). I am building a must section and a filter section within the query, followed by a request for highlighting information.
The documentation says to use the query either in a query context or a filter context, but the highlighter doesn't seem to denote such a distinction.
Here is my fully formed query:
GET /sample04/_search
{
"query": {
"bool": {
"must": [
{
"query": { "query_string": { "query": "east west" } }
}
],
"filter": [
{
"terms": {"OwnerId": ["1", "2","3"]}
}
]
}
},
"highlight": {
"fields": {
"*": { "require_field_match": "false" }
}
}
}
So this query works as expected - we are querying for terms east or west, and we are filtering documents on an Id field that is part of our security requirements, and then I ask for highlighting information.
The downside, however, is the highlighting information contains a hit every instance of every value I submitted in my filter (in this case 1, 2 or 3) that matched any value in any field in any part of my document, like this:
"highlight": {
"SomeTextField": [
"North <em>West</em>"
],
"OwnerId": [
"<em>3</em>"
],
"SerialNumber": [
"<em>3</em>-<em>3</em>"
],
"AssociatedValue": [
"<em>3</em>",
"<em>2</em>"
],
"RelatedValue": [
"<em>3</em>",
"<em>3</em>",
"<em>3</em>",
"<em>3</em>",
"<em>3</em>"
]
}
How do I get the highlighter to match my query in the must section, but ignore the filter? It is my belief that it should ignore highlighting matches that were part of the filter, notably when it's highlighting fields that contain values were requested to filter a SPECIFIC FIELD, but it's utilizing the value anywhere within my document. This seems wrong somehow, but perhaps it's my understanding.
As an FYI, if I set require_field_match to TRUE, then I ONLY get hits that match the filter, and NONE that match the query.
I cannot specify a field to generate highlighting information for, whereas we consume Elastic as a search once find anywhere model, so I don't know field my result will return from.
Can you see what I'm doing wrong? It would be greatly appreciated to understand this.
You can use highlight query for this purpose. change your highlight part to
"highlight": {
"fields": {
"*": {
"highlight_query": {
"query_string": {
"query": "east west"
}
}
}
}
}

Resources