Grafana select max from multiple fields in Elasticsearch - elasticsearch

For context, my data in Elasticsearch looks like the following:
"data": {
  "errors": {
    "file/name/somewhere.kt": 7,
    "file/name/somewhereElse.kt": 1,
    "file/name/some.kt": 2,
    "file/name/where.kt": 4
  }
}
Now as far as I can see, in Grafana I can only filter on the exact name of the field (data.errors.file/name/somewhere.kt).
What I would like to do is a wildcard search that returns the name and value of the field with the highest number. Something like:
data.errors.*
Which would then return:
"file/name/somewhere.kt": 7
As that's the one with the highest number of errors.
Unfortunately such a wildcard does not seem to work: it returns 0 results.
Is this possible, and if so, how? Possibly with a Lucene query?
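For reference, the desired result can be sketched client-side in Python. This is only an illustration of the expected output, assuming the document has already been fetched and its data.errors object is available as a plain dict; the names `errors` and `top_error` are hypothetical and not part of Grafana or Elasticsearch.

```python
# Hypothetical copy of the data.errors object from the document above.
errors = {
    "file/name/somewhere.kt": 7,
    "file/name/somewhereElse.kt": 1,
    "file/name/some.kt": 2,
    "file/name/where.kt": 4,
}


def top_error(error_counts):
    """Return the (field name, count) pair with the highest count."""
    return max(error_counts.items(), key=lambda item: item[1])


print(top_error(errors))
```

If several files share the highest count, `max` returns the first one it encounters.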

Related

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
UserID string `fauna:"user_id"`
Username string `fauna:"username"`
Count int `fauna:"count"`
UpdatedAt time.Time `fauna:"updated_at"`
CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
"ref": Ref(Collection("streaks"), "288597420809388544"),
"ts": 1611486798180000,
"data": {
"count": 1,
"updated_at": Time("2021-01-24T11:13:17.859483176Z"),
"user_id": "276989300",
"username": "yodanparry"
}
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. Rank is simply determined by sorting on the count field. For example, let's say I have the following documents (I ignored other fields for simplicity):
user_id   count
abc       12
xyz       10
fgh       999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id, so I can query and match a document reference from this index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution was to query all documents by sorted_count index, then get the rank by iterating through the array. I think there should be a better solution for this. I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex(
{
name: "sorted_count",
source: Collection("streaks"),
values: [
{ field: ["data", "count"] }
]
}
)
Then you can query this index like so:
Count(
Paginate(
Match(Index("sorted_count")),
{ after: 10, size: 100000 }
)
)
Which will return an object like this one:
{
before: [10],
data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
1. Determine the count of the user in question using your index on user_id.
2. Query sorted_count using the user's count as described above.
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.
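The two-step process above can be sketched in plain Python. This is a minimal in-memory simulation, not Fauna driver code: the `streaks` dict stands in for the collection and `rank_of` is a hypothetical helper name. It counts the documents whose count is greater than or equal to the user's count, which is what the Count(Paginate(...)) query returns.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for the streaks collection (user_id -> count).
streaks = {"abc": 12, "xyz": 10, "fgh": 999}


def rank_of(user_id, counts_by_user):
    """Return the 1-based rank of user_id, where the highest count ranks first."""
    # Step 1: look up the user's count (the role of the user_id index).
    user_count = counts_by_user[user_id]
    # Step 2: count the entries with count >= user_count, mirroring a
    # range read on an ascending sorted_count index starting at user_count.
    sorted_counts = sorted(counts_by_user.values())
    return len(sorted_counts) - bisect_left(sorted_counts, user_count)


print(rank_of("fgh", streaks))
```

With this definition, users tied on the same count all receive the same rank, namely the number of documents at or above that count. A different tie-breaking rule would need extra logic.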

Validating my understanding of Dismax query in elasticsearch

I have tried understanding how dismax query works and I want to validate my understanding, please see if I understood it correctly.
According to documentation a dismax query is:
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking
increment for any additional matching subqueries.
Suppose, the total documents in our ES cluster be as follows:
{"FOO":"ABC"},{"FOO":"XYZ"},{"FOO":"ABC XYZ"},{"FOO":"ABC DEF"},{"FOO":"DEF"} and the dismax query is:
"dis_max": {
"queries": [
{
"match": {
"FOO": "ABC"
}
},
{
"match": {
"FOO": "XYZ"
}
}
]
}
}
So, as per the documentation, let us first find the union of the documents returned by the dismax sub-queries. The union would be {"FOO":"ABC"},{"FOO":"XYZ"},{"FOO":"ABC XYZ"},{"FOO":"ABC DEF"}. In the next step, each document is scored with the maximum score produced for it by any subquery, which would look something like this:
{"FOO":"ABC"} will be scored against {"match":{"FOO": "ABC"}} and {"match":{"FOO": "XYZ"}}, and the maximum score returned will be used.
Similarly, {"FOO":"XYZ"} will be scored against {"match":{"FOO": "ABC"}} and {"match":{"FOO": "XYZ"}}, and the maximum score returned will be used. This is done for every document in the union, and finally the documents are returned sorted by score.
Is this how dismax query works? Or did I misunderstand or miss out anything?
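The scoring rule in the quoted definition can be written out as a small Python sketch. The per-subquery scores below are made-up numbers for illustration, not real Elasticsearch relevance scores, and `dis_max_score` is a hypothetical helper name. The `tie_breaker` parameter corresponds to the "tie breaking increment" in the documentation and defaults to 0.0, in which case only the best subquery score counts.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    """Combine per-subquery scores for one document the way dis_max describes.

    subquery_scores holds one entry per subquery: a float if the subquery
    matched the document, or None if it did not match.
    """
    matching = [score for score in subquery_scores if score is not None]
    if not matching:
        return None  # the document is not in the union, so it is not returned
    best = max(matching)
    # Every other matching subquery contributes tie_breaker * its score.
    return best + tie_breaker * (sum(matching) - best)


# {"FOO": "ABC"} matches only the first subquery (hypothetical score 1.2).
print(dis_max_score([1.2, None]))
# {"FOO": "ABC XYZ"} matches both subqueries (hypothetical scores 1.0 and 0.5).
print(dis_max_score([1.0, 0.5], tie_breaker=0.3))
```

A document such as {"FOO":"DEF"} matches neither subquery, so it gets no score and is left out of the results.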

Elasticsearch query based on properties of another document

Is there a way in ES to do a single query that finds documents that are based on values "close" (whose logic I determine) to values in another document?
Example: I have a document like this:
{
"myId": 10,
"price": 200
}
Now I want to run a query that finds documents whose price is within 100 either side of the price of the above document (but I don't know the price of the document on the client; all I have is the myId).
In other words, I want to write a client method like this:
GetSimilarDocuments(int myId);
Is that possible to do in a single ES query? Or do I need two round trips? (get the document, then do another query based on the values of the document)
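For comparison, the two-round-trip variant can be sketched as two request bodies built in Python. This only constructs the query dicts and does not call any client library; the helper names and the `window` parameter are hypothetical, and the field names myId and price are taken from the example document.

```python
def lookup_query(my_id):
    """First round trip: find the source document by its myId."""
    return {"query": {"term": {"myId": my_id}}}


def similar_price_query(price, window=100):
    """Second round trip: find documents priced within +/- window of price."""
    return {
        "query": {
            "range": {
                "price": {"gte": price - window, "lte": price + window}
            }
        }
    }


# Assuming the first query returned the document with price 200:
print(similar_price_query(200))
```

Note that the range also matches the source document itself, so it may need to be filtered out afterwards.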

The boolean fuzzy query in elasticsearch is not returning expected result

I am trying to build a fuzzy bool query on first and last names in elasticsearch 7.2.0. I have a document with "asim" and "banskota" as first and last name respectively. But when I query with "asi" or "asimmm" and the exact last name, elasticsearch returns no result. However, when queried with exact first name or "asimm", it returns me the intended result from the document.
I also tried a "fuzzy" query instead of "match", and I experimented with different fuzziness parameters, but the outcome is the same. Both the first name and last name fields are analyzed, and I used the _analyze API to check how 'asim' is analyzed: with the standard analyzer, the document is indexed with 'asim' as a single token.
EDIT: It turns out that the fuzzy query works in the 'substitution' case; for example, it returns the result for 'asim' when queried with 'asmi', but not in the 'deletion' case. This is surprising to me, as the edit distance in the substitution case is greater than in the deletion case. When the string is longer, for instance with the last name 'Banskota', fuzzy matching works in the 'deletion' case as well. What should I do to make the fuzzy search work in the 'deletion' case with a string length of 4 or 5?
fuzzy_body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"FIRST_NAME_N": {"query": "asi", "fuzziness": "AUTO"}}},
                {"fuzzy": {"LAST_NAME_N": "banskota"}},
            ]
        }
    },
}
It turns out that if the name fields are indexed as keyword type, the query returns the expected results with "AUTO" fuzziness.
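To sanity-check the edit distances discussed above, here is a small Python sketch of the optimal string alignment distance (Levenshtein distance plus adjacent transpositions counted as a single edit). It is an independent illustration rather than Elasticsearch's own implementation, and `osa_distance` is a hypothetical helper name. It does not by itself explain the missing results; it only shows how far apart the strings are.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for query in ("asi", "asimm", "asimmm", "asmi"):
    print(query, osa_distance(query, "asim"))
```

Under this measure, 'asmi' is a single adjacent swap away from 'asim', not two substitutions.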

Elastic - Search across object without key specification

I have an index with hundreds of millions of docs, and each of them has an object "histogram" with values for each day:
"_source": {
"proxy": {
"histogram": {
"2017-11-20": 411,
"2017-11-21": 34,
"2017-11-22": 0,
"2017-11-23": 2,
"2017-11-24": 1,
"2017-11-25": 2692,
"2017-11-26": 11673
}
}
}
And I need one of two solutions:
1. Find docs where any value inside the histogram object is greater than XX
2. Find docs where the average of the values in the histogram object is greater than XX
For point 1 I can use a range query, but I must specify the exact name of the field (i.e. proxy.histogram.2017-11-20), and the wildcard version (proxy.histogram.*) does not work.
For point 2 I only found the average aggregation in ES, but I don't want to aggregate these fields after the query (because of the large volume of data); I only want to search for these docs.
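The two conditions can be stated precisely as plain Python predicates over a single document's histogram. This is only a sketch of the intended semantics, not an Elasticsearch query; the function names and the thresholds used in the example are hypothetical.

```python
# Sample histogram copied from the _source shown above.
histogram = {
    "2017-11-20": 411,
    "2017-11-21": 34,
    "2017-11-22": 0,
    "2017-11-23": 2,
    "2017-11-24": 1,
    "2017-11-25": 2692,
    "2017-11-26": 11673,
}


def any_value_above(hist, threshold):
    """Condition 1: at least one daily value exceeds the threshold."""
    return any(value > threshold for value in hist.values())


def average_above(hist, threshold):
    """Condition 2: the mean of the daily values exceeds the threshold."""
    if not hist:
        return False  # an empty histogram has no average
    return sum(hist.values()) / len(hist) > threshold


print(any_value_above(histogram, 10000), average_above(histogram, 2000))
```

One possible use, assuming the indexing pipeline can be changed, is to evaluate such values before indexing and store the maximum and average as ordinary fields that a range query can target by exact name.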
