ElasticSearch - Slop use case

I'm pretty new to Elasticsearch, and I'm trying to use the slop attribute to solve a problem.
The problem:
I have a multi_match query, and one of the fields is full of codes like:
Example 1: "AE 102 BR, V 415 A, K45863"
Example 2: "AE 100 BR, AE 101 BR, AE 103 BR, AE 104 BR"
The problem is that sometimes Example 2 will be chosen for the query "AE 102 BR", because ES finds "AE" and "BR" multiple times.
The solution I want is to boost close matches; by this I mean that a single match on 3 consecutive words should always be more relevant than 4 matches on 2 consecutive words.
What I've tried:
"multi_match": {
"fields": [ "field1^10", "field2^3",
"field3^3", "**field_code**^3", "global_compatibilities^1" ]
, "query": "AE 102 BR",
"slop": 1
}
but it doesn't work (the slop doesn't change anything in the score).
Can someone explain to me how to use slop in my case?
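Note: per the multi_match documentation, slop only applies to the phrase and phrase_prefix types, so the default best_fields query above silently ignores it. A common workaround is to keep the original query for recall and add a boosted phrase clause for closeness; a minimal sketch (the boost values and the choice of field_code for the phrase clause are assumptions, not tested):
"bool": {
  "must": {
    "multi_match": {
      "fields": [ "field1^10", "field2^3", "field3^3", "field_code^3", "global_compatibilities^1" ],
      "query": "AE 102 BR"
    }
  },
  "should": {
    "multi_match": {
      "type": "phrase",
      "fields": [ "field_code" ],
      "query": "AE 102 BR",
      "slop": 1,
      "boost": 10
    }
  }
}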

Autocomplete search for MongoDb and Spring Boot

I'm trying to build an efficient autocomplete search for my Spring boot app, but I'm not getting the proper results.
I have a cars database with multiple car models.
This is my current search aggregation
val agg = Document(
    "\$search", Document(
        "compound", Document(
            "must", searchInput.split(" ").map {
                Document(
                    "autocomplete", Document("query", it)
                        .append("path", "fullName")
                )
            }
        )
    )
)
fullName represents the full name of the car: Brand + Model + Year + Power
The search works, but not well enough yet.
For instance, if I search for "Ford GT", the first results that come up are:
Ford Mustang GT V8 5.0
Ford Mustang GT 390 Fastback
Ford Mustang GT 302 HO
and so on.
But the first results should be:
Ford GT - 655 PS
Ford GT40 Mk III
and only afterwards, the ones above.
How can I achieve this?
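One way to push exact matches like "Ford GT" above longer names is to keep the must clauses for filtering and add a should clause that rewards adjacent tokens, e.g. Atlas Search's phrase operator over the same path. A sketch of the resulting $search stage in plain JSON (the phrase clause and its slop value are assumptions, untested against this dataset; note that phrase requires fullName to also be indexed as a string type, not only as autocomplete):
{
  "$search": {
    "compound": {
      "must": [
        { "autocomplete": { "query": "Ford", "path": "fullName" } },
        { "autocomplete": { "query": "GT", "path": "fullName" } }
      ],
      "should": [
        { "phrase": { "query": "Ford GT", "path": "fullName", "slop": 1 } }
      ]
    }
  }
}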

Binance WebSocket Order Book - depths change every time

Below is a Python script that subscribes to order book information via Binance's WebSocket API (documentation here).
In both subscriptions (btcusdt@depth and btcusdt@depth@100ms), each JSON payload arrives with a varying depth.
Could someone shed light on the cause of this? Am I doing something wrong? Or do they have certain criteria for how many levels of the order book are sent?
import json
import websocket

socket = 'wss://stream.binance.com:9443/ws'

def on_open(ws):
    print("opened")
    subscribe_message = {
        "method": "SUBSCRIBE",
        "params": [
            "btcusdt@depth@100ms"
        ],
        "id": 1
    }
    ws.send(json.dumps(subscribe_message))

def on_message(ws, message):
    print("received a message")
    # depths of bid/ask
    d = json.loads(message)
    for k, v in d.items():
        if k == "b":
            print(f"bid depth : {len(v)}")
        if k == "a":
            print(f"ask depth : {len(v)}")

def on_close(ws):
    print("closed connection")

ws = websocket.WebSocketApp(socket,
                            on_open=on_open,
                            on_message=on_message,
                            on_close=on_close)
ws.run_forever()
btcusdt@depth@100ms
opened
received a message
received a message
bid depth : 3
ask depth : 12
received a message
bid depth : 14
ask depth : 12
received a message
bid depth : 17
ask depth : 24
received a message
bid depth : 14
ask depth : 16
received a message
bid depth : 3
ask depth : 5
received a message
bid depth : 16
ask depth : 6
.
.
.
btcusdt@depth
opened
received a message
received a message
bid depth : 135
ask depth : 127
received a message
bid depth : 125
ask depth : 135
received a message
bid depth : 95
ask depth : 85
received a message
bid depth : 68
ask depth : 88
received a message
bid depth : 119
ask depth : 145
received a message
bid depth : 127
ask depth : 145
.
.
.
Your code reads the length of the diff for the last 100 ms or 1000 ms (the default when you don't specify the timeframe). That is, the remote API sends just the diff, not the full list.
The varying length of the diff is expected.
Example:
An order book has 2 bids and 2 asks:
ask price 1.02, amount 10
ask price 1.01, amount 10
bid price 0.99, amount 10
bid price 0.98, amount 10
During the timeframe, one more bid is added and one ask is updated. So the message returns:
"b": [
[ // added new bid
0.97,
10
]
],
"a": [
[ // updated existing ask
1.01,
20
]
]
And your code reads this message as
bid depth: 1
ask depth: 1
During another timeframe, two bids are updated:
"b": [
  [ // updated existing bid
    0.98,
    20
  ],
  [ // updated existing bid
    0.99,
    20
  ]
],
"a": [] // no changes
So your code reads this as
bid depth: 2
ask depth: 0
"btcusdt#depth#100ms" only provides the change in the order book, not the order book itself (as mentioned by the other answer)
Use: "btcusdt#depth10#100ms" if you want to stream the book 10 best bids and 10 best asks.

Find sequences in time series data using Elasticsearch

I'm trying to find example Elasticsearch queries for returning sequences of events in a time series. My dataset is rainfall values at 10-minute intervals, and I want to find all storm events. A storm event would be considered continuous rainfall for more than 12 hours. This would equate to 72 consecutive records with a rainfall value greater than zero. I could do this in code, but to do so I'd have to page through thousands of records so I'm hoping for a query-based solution. A sample document is below.
I'm working in a University research group, so any solutions that involve premium tier licences are probably out due to budget.
Thanks!
{
  "_index": "rabt-rainfall-2021.03.11",
  "_type": "_doc",
  "_id": "fS0EIngBfhLe-LSTQn4-",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2021-03-11T16:00:07.637Z",
    "current-rain-total": 8.13,
    "rain-duration-in-mins": 10,
    "last-recorded-time": "2021-03-11 15:54:59",
    "rain-last-10-mins": 0,
    "type": "rainfall",
    "rain-rate-average": 0,
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2021-03-11T16:00:07.637Z"
    ]
  },
  "sort": [
    1615478407637
  ]
}
Update 1
Thanks to @Val, my current query is
GET /rabt-rainfall-*/_eql/search
{
  "timestamp_field": "@timestamp",
  "event_category_field": "type",
  "size": 100,
  "query": """
    sequence
      [ rainfall where "rain-last-10-mins" > 0 ]
      [ rainfall where "rain-last-10-mins" > 0 ]
    until [ rainfall where "rain-last-10-mins" == 0 ]
  """
}
Having a sequence query with only one rule causes a syntax error, hence the duplicate. The query as it is runs but doesn't return any documents.
Update 2
Results weren't being returned due to me not escaping the property names correctly. However, due to the two sequence rules I'm getting matches of length 2, not of arbitrary length until the stop clause is met.
GET /rabt-rainfall-*/_eql/search
{
  "timestamp_field": "@timestamp",
  "event_category_field": "type",
  "size": 100,
  "query": """
    sequence
      [ rainfall where `rain-last-10-mins` > 0 ]
      [ rainfall where `rain-last-10-mins` > 0 ]
    until [ rainfall where `rain-last-10-mins` == 0 ]
  """
}
This would definitely be a job for EQL, which allows you to return sequences of related data (ordered in time and matching some constraints):
GET /rabt-rainfall-2021.03.11/_eql/search?filter_path=-hits.events
{
  "timestamp_field": "@timestamp",
  "event_category_field": "type",
  "size": 100,
  "query": """
    sequence with maxspan=12h
      [ rainfall where `rain-last-10-mins` > 0 ]
    until `rain-last-10-mins` == 0
  """
}
What the above query seeks to do is basically this:
get me the sequence of events of type rainfall
with rain-last-10-mins > 0
happening within a 12h window
up until rain-last-10-mins drops to 0
The until statement makes sure that the sequence "expires" as soon as an event has rain-last-10-mins: 0 within the given time window.
In the response, you're going to get the number of matching events in hits.total.value and if that number is 72 (because the time window is limited to 12h), then you know you have a matching sequence.
So your "storm" signal here is to detect whether the above query returns hits.total.value: 72 or lower.
Disclaimer: I haven't tested this, but in theory it should work the way I described.
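For reference, the part of the response to inspect would look roughly like this (field names as in a standard EQL search response; the values are purely illustrative):
{
  "is_partial": false,
  "is_running": false,
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": {
      "value": 72,
      "relation": "eq"
    }
  }
}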

Elasticsearch analyze/ignore whitespaces in keyword

Elasticsearch version : 6.8.5
I wonder if this is actually possible.
I have some entries like this:
- 100 x 2c ABSD
- 100 x 3a DDDDD
And I want to get both results when using "100 x" or "100x" as the keyword.
So far only the keyword 100 x works.
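One approach that might work here is a shingle filter with an empty token_separator, so that adjacent tokens such as 100 and x also produce the joined token 100x at index time; both "100 x" and "100x" then match the same document. A sketch (index, field and analyzer names are made up, and this is untested against 6.8.5, although the shingle filter is available there):
PUT entries
{
  "settings": {
    "analysis": {
      "filter": {
        "join_shingles": {
          "type": "shingle",
          "max_shingle_size": 2,
          "token_separator": "",
          "output_unigrams": true
        }
      },
      "analyzer": {
        "joined": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "join_shingles" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "entry": {
          "type": "text",
          "analyzer": "joined"
        }
      }
    }
  }
}
With this analyzer, "100 x 2c ABSD" indexes both the single tokens (100, x, 2c, absd) and the joined pairs (100x, x2c, 2cabsd), so a match query for either "100 x" or "100x" should find the document.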

Search with completion suggester and german analyzer

I created a simple index with a suggest field and a completion type. I indexed some city names. For the suggest field I use a german analyzer.
PUT city_de
{
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "german"
        },
        "suggest": {
          "type": "completion",
          "analyzer": "german"
        }
      }
    }
  }
}
The analyzer works fine and searching with umlauts works well. The autocompletion is also perfect. But I ran into an issue when searching for the term wie.
Let's say I have two documents, Wiesbaden and Wien, each with its name as the suggest completion term.
If I search for wie, I expect the cities Wien and Wiesbaden in the response. But unfortunately I get no response. I suppose wie is dropped by the german analyzer, because if I search for wi or wies I get valid responses.
The same happens for the terms was, er, sie and und, which look like German stop words.
Do I need any additional configuration to also get a result when I search for wie or was?
Thanks!
The problem
Searching city names by prefix
"wie" should find "Wien" or "Wiesbaden"
Possible solution approach
For this use case I would suggest using an edge n-gram tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html) and ASCII folding (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html).
Example
wien:
token       position   start offset   end offset
w           0          0              1
wi          1          0              2
wie         2          0              3
wien        3          0              4
wiesbaden:
token       position   start offset   end offset
w           0          0              1
wi          1          0              2
wie         2          0              3
wies        3          0              4
...
wiesbaden   8          0              9
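Token listings like these can be reproduced with the _analyze API once the analyzer exists (a sketch; my_index is a hypothetical index where the autocomplete_analyzer defined further below is configured):
POST my_index/_analyze
{
  "analyzer": "autocomplete_analyzer",
  "text": "wiesbaden"
}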
Keep in mind that the system now has to work in an asymmetric way: the query should not be analyzed into n-grams (e.g. use the keyword analyzer), but the data in the index has to be.
There are two ways to achieve this:
1.) Specify the search analyzer explicitly in the query
2.) Bind the search analyzer to the field
"cities": {
"type": "text",
"fields": {
"autocomplete": {
"type": "text",
"analyzer": "autocomplete_analyzer", <-- index time analyzer
"search_analyzer": "autocomplete_search" <-- search time analyzer
}
}
}
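The two analyzers are referenced but not defined above; a possible definition in the index settings could look like this (a sketch following the answer's approach: edge n-grams plus ASCII folding at index time, plain lowercasing and folding at search time; the names are chosen to match the mapping snippet):
"settings": {
  "analysis": {
    "tokenizer": {
      "autocomplete_tokenizer": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 10,
        "token_chars": [ "letter", "digit" ]
      }
    },
    "analyzer": {
      "autocomplete_analyzer": {
        "type": "custom",
        "tokenizer": "autocomplete_tokenizer",
        "filter": [ "lowercase", "asciifolding" ]
      },
      "autocomplete_search": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [ "lowercase", "asciifolding" ]
      }
    }
  }
}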
Why does the german analyzer not work
The analyzer is designed for German text and uses a simple algorithm to remove inflection and morphology (stemming); it also removes German stop words.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#german-analyzer
Here is an example of the typical terms generated by this analyzer:
Hallo hier ist der Text über Wiesbaden und Wien. Es scheint angebracht über Wände und Wandern zu sprechen.
hallo        0    0    5
text         4    19   23
wiesbad      6    29   38
wien         8    43   47
scheint      10   52   59
angebracht   11   60   70
wand         13   76   81
wandern      15   86   93
sprech
If it works on city names, that happens just by coincidence.
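A quick way to see the stop word removal in action is the _analyze API; running the german analyzer on wie returns an empty token list, since wie is in the German stop word set:
POST _analyze
{
  "analyzer": "german",
  "text": "wie"
}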
