Reverse mapping in Aerospike - Go

I have a few records in Aerospike with the following key-value pairs:
Key : "1234"
Value : {
"XYZ":{
"B":[1,3]
"C":[3,4]
}
}
Key : "5678"
Value : {
"XYZ":{
"B":[1,3,5]
"C":[3,4]
}
}
I want to get all the keys from the set where the field "B" in the JSON value contains, let's say, 3. Is there any way to query all such keys in Go?

Yes, you can build a secondary index on the values in map key "B" at that nested level and then run a secondary index query to get all matching records.
You can do the same in Go using the equivalent APIs.
There are many interactive Java code examples at: https://developer.aerospike.com/tutorials/java/cdt_indexing
That page shows a top-level example with string values, and another where a secondary index (SI) is built on a nested sublevel, which is the case here.
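Below is a minimal Go sketch of the nested-sublevel approach. It assumes the aerospike-client-go v6 client, a namespace/set of test/demo, and that the map from the question is stored in a bin named "data"; those names are illustrative, not from the question.

package main

import (
	"fmt"
	"log"

	as "github.com/aerospike/aerospike-client-go/v6"
)

func main() {
	client, err := as.NewClient("127.0.0.1", 3000)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// CDT context that drills into map key "XYZ", then map key "B",
	// so the index is built on the list stored under "B".
	ctx := []*as.CDTContext{
		as.CtxMapKey(as.NewValue("XYZ")),
		as.CtxMapKey(as.NewValue("B")),
	}

	// Create a numeric secondary index on the nested list elements.
	task, err := client.CreateComplexIndex(nil, "test", "demo", "idx_b_values",
		"data", as.NUMERIC, as.ICT_LIST, ctx...)
	if err != nil {
		log.Fatal(err)
	}
	if err := <-task.OnComplete(); err != nil {
		log.Fatal(err)
	}

	// Secondary index query: records whose nested list "B" contains 3.
	stmt := as.NewStatement("test", "demo")
	stmt.SetFilter(as.NewContainsFilter("data", as.ICT_LIST, 3, ctx...))

	rs, err := client.Query(nil, stmt)
	if err != nil {
		log.Fatal(err)
	}
	for res := range rs.Results() {
		if res.Err != nil {
			log.Fatal(res.Err)
		}
		// The original user key is only available here if the records
		// were written with WritePolicy.SendKey = true.
		fmt.Println(res.Record.Key)
	}
}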

Related

Kibana scripted field which loops through an array

I am trying to use the Metricbeat http module to monitor F5 pools.
I make a request to the F5 API and bring back JSON, which is saved to Kibana. But the JSON contains an array of pool members, and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array, e.g.
doc['http.f5pools.items.monitor'].value.length()
returns the following in the preview results, with the same 'Additional Field' added for comparison:
[
  {
    "_id": "rT7wdGsBXQSGm_pQoH6Y",
    "http": {
      "f5pools": {
        "items": [
          {
            "monitor": "default"
          },
          {
            "monitor": "default"
          }
        ]
      }
    },
    "pool.MemberCount": [
      7
    ]
  },
If I try
doc['http.f5pools.items']
or similar, I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? I.e., is my code or the way I'm indexing the data wrong?
If not, is there an alternative approach within Metricbeat? I don't want to have to make a whole new API to do the calculation and add a separate field.
-- update.
Weirdly, it seems that the number values in the array do return the expected results, i.e.
doc['http.f5pools.items.ratio']
returns
{
  "_id": "BT6WdWsBXQSGm_pQBbCa",
  "pool.MemberCount": [
    1,
    1
  ]
},
-- update 2
OK, so if the strings in the field have different values, then you get all the values; if they are the same, you just get one. wtf?
I'm adding another answer instead of deleting my previous one, which does not address the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further, I found this Doc Value Intro, which says that doc values are essentially an "uninverted index" useful for operations like sorting. My hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not answer why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed that it is a compression technique which actually de-duplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior, you can disable doc values.
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So, as I discovered, arrays are pre-filtered to only return distinct values (except in the case of ints, apparently?).
The solution is to use params._source instead of doc[].
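For the original goal of counting pool members that are up, here is a minimal script_fields sketch using params._source. The index pattern, the field paths, and the 'up' monitor value are assumptions based on the question, not confirmed values:

GET metricbeat-*/_search
{
  "script_fields": {
    "members_up": {
      "script": {
        "lang": "painless",
        "source": "int up = 0; for (def item : params._source.http.f5pools.items) { if (item.monitor == 'up') { up++; } } return up;"
      }
    }
  }
}

Note that _source access like this works in a search-request script field; Kibana scripted fields generally expose only doc values.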
The answer for why doc doesn't work:
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, it is important to add a null check, as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field') to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
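For instance, a guarded doc-values access (using the field name from the question) might look like:

doc.containsKey('http.f5pools.items.monitor') ? doc['http.f5pools.items.monitor'].size() : 0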
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
Responding to your comment with an example:
The keyword here is: "It cannot return JSON objects." The field doc['http.f5pools.items'] is a JSON object.
Try running below and see the mapping it creates:
PUT t5/doc/2
{
  "items": [
    {
      "monitor": "default"
    },
    {
      "monitor": "default"
    }
  ]
}

GET t5/_mapping
{
  "t5" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "items" : {
            "properties" : {
              "monitor" : {   <-- monitor is a property of the items property (an object)
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

How to update a document using index alias

I have created an index "index-000001" with primary shards = 5 and replicas = 1, and I have created two aliases:
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching purposes. When alias-write reaches its maximum capacity and I do a rollover, it creates a new "index-000002" and updates the aliases as:
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update/delete a document existing in index-000001? (What if all I know is the document id, but not which index the document resides in?)
Thanks
Updating through an index alias is not directly possible. The best solution is to run a search using the document id or a unique term to find the index the document lives in; with that index name, you can then update the document directly.
GET alias-read/{type}/{doc_id} will get the required document if doc_id is known.
If doc_id is not known, then find it using a unique id reference:
GET alias-read/_search
{
  "query" : {
    "term" : { "field" : "value" }
  }
}
In both cases, you will get a single document back in the response. Once the document is obtained, you can use its "_index" field to get the required index.
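For illustration, a hit in the search response carries the index name like this (abridged; the values are hypothetical):

{
  "_index" : "index-000001",
  "_type" : "{type}",
  "_id" : "{doc_id}",
  "_source" : { ... }
}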
Then run
POST {index_name}/{type}/{id}/_update
{
  "doc" : { "required_field" : "new_value" }
}
to update the document. (A plain PUT {index_name}/{type}/{id} with only the changed field would replace the whole document, so the _update endpoint with a partial "doc" is the safer choice here.)

Project the sum of all fields in a document that match a regular expression, in elasticsearch

In Elasticsearch, I know I can specify the fields I want to return from documents that match my query using {"fields":["fieldA", "fieldB", ..]}.
But how do I return the sum of all fields that match a particular regular expression (as a new field)?
For example, if my documents look like this:
{"documentid":1,
"documentStats":{
"foo_1_1":1,
"foo_2_1":5,
"boo_1_1:3
}
}
and I want the sum of all stats that match _1_ per document?
You can define an artificial field, called a script field, that contains a small Groovy script which will do the job for you.
So after your query, you can add a script_fields section like this:
{
  "query" : {
    ...
  },
  "script_fields" : {
    "sum" : {
      "script" : "_source.documentStats.findAll{ it.key =~ '_1_' }.collect{ it.value }.sum()"
    }
  }
}
What the script does is simply retrieve all the fields in documentStats whose name matches _1_ and sum their values; in this case, you'll get 4 (foo_1_1 and boo_1_1 match, foo_2_1 does not).
Make sure to enable dynamic scripting in elasticsearch.yml and restart your ES node before trying this out.
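On Elasticsearch 5.x and later, where Groovy scripting was removed, a rough Painless equivalent of the same script field might look like this (same document layout as above; the match_all query is just a placeholder):

{
  "query" : {
    "match_all" : {}
  },
  "script_fields" : {
    "sum" : {
      "script" : {
        "lang" : "painless",
        "source" : "int total = 0; for (def entry : params._source.documentStats.entrySet()) { if (entry.getKey().contains('_1_')) { total += entry.getValue(); } } return total;"
      }
    }
  }
}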

Sum of total tokens in array

I have a document as below:
{
  "array" : [ "Aone", "Btwo", "Aone" ]
}
I need to aggregate the sum of the number of elements in array using an aggregation.
value_count is giving me the unique tokens, but that is not what I am looking for.
First you need to make array a multi-field with a new sub-field called numOfTokens, declared with the token_count type.
You can find more about it here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
This will create an additional field called array.numOfTokens per document that will hold the number of tokens for that field.
Next, you can do a simple sum aggregation on that field, as sketched below, using: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#search-aggregations-metrics-sum-aggregation
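Here is a sketch of both steps with illustrative index and type names; the "string" type matches the Elasticsearch version of the linked docs (on 5.x+ it would be "text"). Each array element is analyzed separately, so with single-word values like "Aone" the sample document sums to 3:

PUT myindex
{
  "mappings": {
    "doc": {
      "properties": {
        "array": {
          "type": "string",
          "fields": {
            "numOfTokens": {
              "type": "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "total_tokens": {
      "sum": { "field": "array.numOfTokens" }
    }
  }
}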

MongoDB aggregation very slow when $match key not in index (does it do a table scan?)

db.collection.aggregate(
  { "$match" : { "key" : "mykey" } },
  { "$sort" : { "time" : -1 } },
  { "$limit" : 1 }
)
example document:
{
  key: "key1",
  time: ISODate("2014-07-04T20:04:46.904Z")
}
indexes
"time" : -1
"key" : 1,
"_id" : 1
when "mykey" exists in the collection the query takes 30ms, when "mykey" does not exist it takes 10s,
explain tells me indexes are used.
This is a capped collection, therefor it usually occurs that "keys" are missing.
Why does it take that long.
btw. Mongodb 2.4
further exploration:
removing the index for the sort reduces the lookup time:
explain for the aggregate with and without an index on the sort field shows that with the index, the sort is executed at the start of the pipeline; without an index on the sort field, it is executed as the last step of the pipeline.
Your query is an equality match on key and a sort on time, which means that you are using the wrong index for this (your index is, in essence, on time first, then key).
The order of fields for the query you are running should be key first, then time (as the first two fields) in order to get effective help from the index. With that index, the matched key value can be jumped to directly, and then, if there are multiple time values for that key, they are already sorted and the highest one can be fetched immediately. If key is not found in the index, you're done.
As it is, the query is forced to scan all time values in the index (the leading field) so that when it finds the first matching key it can return. When the key you are looking for doesn't exist, the query ends up scanning through the entire index before it can return.
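A sketch of the suggested index, in MongoDB 2.4 shell syntax (on 3.0+ the preferred helper is db.collection.createIndex):

// equality field first, then the sort field, so $match jumps straight
// to "mykey" (or determines immediately that it is absent) and $sort
// is satisfied by the index order instead of a scan
db.collection.ensureIndex({ "key" : 1, "time" : -1 })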
