Filtering with a complex key does not work (using startkey and endkey) - view

I created a view with this map function:
function(doc) {
    if (doc.market == "m_warehouse") {
        emit([doc.logTime, doc.dbName, doc.tableName], 1);
    }
}
I want to filter the data on more than one key field:
_design/select_data/_view/new-view/?limit=10&skip=0&include_docs=false&reduce=false&descending=true&startkey=["2018-06-19T09:16:47,527","stage"]&endkey=["2018-06-19T09:16:43,717","stage"]
but I still get:
{
"total_rows": 248133,
"offset": 248129,
"rows": [
{
"id": "01CGBPYVXVD88FPDVR3NP50VJW",
"key": [
"2018-06-19T09:16:47,527",
"ods",
"o_ad_dsp_pvlog_realtime"
],
"value": 1
},
{
"id": "01CGBQ6JMEBR8KBMB8T7Q7CZY3",
"key": [
"2018-06-19T09:16:44,824",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ4BKT8S2VDMT2RGH1FQ71",
"key": [
"2018-06-19T09:16:44,707",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ18CBHQX3F28649YH66B9",
"key": [
"2018-06-19T09:16:43,717",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
}
]
}
The row with "ods" in its key should not be in the results.
What did I do wrong?

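Your query is not a multi-key query; it is a startkey/endkey range over the whole key.
If you want results for one dbName within a time range, you need to change the emit to [doc.dbName,doc.logTime,doc.tableName]
and then query with startkey=["stage","2018-06-19T09:16:43,717"]&endkey=["stage","2018-06-19T09:16:47,527",{}] (the trailing {} keeps rows at exactly the end timestamp inside the range, because a longer array sorts after its own prefix).
(By the way, check the order of your timestamps: without descending=true the startkey must be the smaller one; with descending=true, as in your query, the two are swapped.)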
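To see why the "ods" row shows up in the original query, it may help to model how array keys are ordered. The sketch below is a simplified model, not CouchDB's actual implementation: compareKeys and inDescendingRange are hypothetical helpers, and plain string comparison stands in for CouchDB's ICU collation (which gives the same order for these lowercase ASCII values).

```javascript
// Simplified model of view key ordering: arrays are compared element by
// element, and a shorter array sorts before a longer one that shares its prefix.
// Assumption: plain string comparison stands in for CouchDB's ICU collation.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// With descending=true the range runs from startkey down to endkey.
function inDescendingRange(key, startkey, endkey) {
  return compareKeys(key, startkey) <= 0 && compareKeys(key, endkey) >= 0;
}

const startkey = ["2018-06-19T09:16:47,527", "stage"];
const endkey = ["2018-06-19T09:16:43,717", "stage"];
const odsKey = ["2018-06-19T09:16:47,527", "ods", "o_ad_dsp_pvlog_realtime"];

// The timestamps tie with startkey and "ods" sorts before "stage",
// so the "ods" row lies between the two bounds.
const odsIncluded = inDescendingRange(odsKey, startkey, endkey);
console.log(odsIncluded);
```

The second element only breaks ties on the first one, so it cannot act as an independent filter while the timestamp is the leading element.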

Because you have chosen a full date/time stamp, down to millisecond precision, as the first level of your key, there are unlikely to be any repeating values in the first level of your compound key. If you indexed just the date, say, as the first element, your data would be grouped by date, dbName and table name in a more predictable way,
e.g.
["2018-06-19","ods","o_ad_dsp_pvlog_realtime"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
With this key structure, the hierarchical grouping of keys works in your favour, i.e. all the data from "2018-06-19" is together in the index, with all the data matching ["2018-06-19","stage"] adjacent to each other.
If you need to get to millisecond precision, you could index the data as follows:
function(doc) {
    if (doc.market == "m_warehouse") {
        emit([doc.dbName, doc.logTime], 1);
    }
}
This would create an index organised by dbName, with a secondary sort on time. You can then extract the data for a specified dbName between two timestamps.
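As a rough sketch of the query against such a [dbName, logTime] index: startkey and endkey have to be JSON-encoded and then URL-encoded. The helper name rangeQuery is hypothetical, and the timestamps are the ones from the question.

```javascript
// Hypothetical helper: builds the query string for a view keyed on
// [dbName, logTime]. Ascending order, so the earlier timestamp is the startkey.
function rangeQuery(dbName, fromTime, toTime) {
  const params = {
    reduce: false,
    startkey: JSON.stringify([dbName, fromTime]),
    endkey: JSON.stringify([dbName, toTime]),
  };
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join("&");
}

const query = rangeQuery(
  "stage",
  "2018-06-19T09:16:43,717",
  "2018-06-19T09:16:47,527"
);
console.log(query);
```

If descending=true is added, the two bounds have to be swapped, since the range is then walked from the larger key to the smaller one.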

Related

How to create a HashMap with custom object as a key?

In Elasticsearch, I have an object that contains an array of objects. Each object in the array has type, id, updateTime and value fields.
My input parameter is an array that contains objects of the same type but with different values and update times. I'd like to update the objects with the new value when they exist and create new ones when they don't.
I'd like to use a Painless script to update them but keep them distinct, as some of them may overlap. The issue is that I need to use both type and id to keep them unique. So far I've done it with a brute-force approach, a nested for loop comparing the elements of both arrays, but I'm not too happy about that.
One idea is to take the array from the source, build a temporary HashMap for fast lookup, process the input and later store all objects back into the source.
Can I create a HashMap with a custom object (a class with type and id) as the key? If so, how do I do it? I can't add a class definition to the script.
Here's the mapping. All fields are 'disabled' as I use them only as intermediate state and query using other fields.
{
"properties": {
"arrayOfObjects": {
"properties": {
"typ": {
"enabled": false
},
"id": {
"enabled": false
},
"value": {
"enabled": false
},
"updated": {
"enabled": false
}
}
}
}
}
Example doc.
{
"arrayOfObjects": [
{
"typ": "a",
"id": "1",
"updated": "2020-01-02T10:10:10Z",
"value": "yes"
},
{
"typ": "a",
"id": "2",
"updated": "2020-01-02T11:11:11Z",
"value": "no"
},
{
"typ": "b",
"id": "1",
"updated": "2020-01-02T11:11:11Z"
}
]
}
And finally, part of the script in its current form. The script does some other things, too, so I've stripped them out for brevity.
if (ctx._source.arrayOfObjects == null) {
    ctx._source.arrayOfObjects = new ArrayList();
}
for (obj in params.inputObjects) {
    def found = false;
    for (existingObj in ctx._source.arrayOfObjects) {
        // Match on the composite identity (typ, id) first, so that an
        // older incoming object is not appended as a duplicate
        if (obj.typ == existingObj.typ && obj.id == existingObj.id) {
            // Overwrite only when the incoming object is newer
            if (isAfter(obj.updated, existingObj.updated)) {
                existingObj.updated = obj.updated;
                existingObj.value = obj.value;
            }
            found = true;
            break;
        }
    }
    if (!found) {
        ctx._source.arrayOfObjects.add([
            "typ": obj.typ,
            "id": obj.id,
            "value": params.inputValue,
            "updated": obj.updated
        ]);
    }
}
There's technically nothing suboptimal about your approach.
A HashMap could potentially save some time but since you're scripting, you're already bound to its innate inefficiencies... Btw here's how you initialize & work with HashMaps.
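The lookup idea can be sketched without a custom key class by concatenating typ and id into a single string key. The sketch below is in JavaScript rather than Painless, purely to illustrate the data structure; keyOf and mergeObjects are hypothetical names, and it assumes that neither typ nor id contains the "|" separator. In Painless the same idea would presumably use a HashMap keyed by the concatenated string.

```javascript
// Composite key built from both identifying fields.
// Assumption: typ and id never contain the "|" separator.
function keyOf(obj) {
  return obj.typ + "|" + obj.id;
}

// Merges incoming objects into the existing ones, keeping one entry per
// (typ, id). Existing entries are updated in place when the input is newer.
function mergeObjects(existing, incoming) {
  const byKey = new Map();
  for (const obj of existing) {
    byKey.set(keyOf(obj), obj);
  }
  for (const obj of incoming) {
    const current = byKey.get(keyOf(obj));
    if (current === undefined) {
      byKey.set(keyOf(obj), { ...obj });
    } else if (Date.parse(obj.updated) > Date.parse(current.updated)) {
      current.updated = obj.updated;
      current.value = obj.value;
    }
  }
  // A Map iterates in insertion order, so existing entries keep their position.
  return Array.from(byKey.values());
}

const existing = [
  { typ: "a", id: "1", updated: "2020-01-02T10:10:10Z", value: "yes" },
  { typ: "a", id: "2", updated: "2020-01-02T11:11:11Z", value: "no" },
  { typ: "b", id: "1", updated: "2020-01-02T11:11:11Z" },
];
const incoming = [
  { typ: "a", id: "1", updated: "2020-01-03T00:00:00Z", value: "maybe" },
  { typ: "c", id: "9", updated: "2020-01-03T00:00:00Z", value: "new" },
];
const merged = mergeObjects(existing, incoming);
console.log(merged.length);
```

Each input object then costs one map lookup instead of a scan over the whole array, which only matters once the arrays grow beyond a handful of entries.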
Another approach would be to rethink your data structure -- instead of arrays of objects use keyed objects or similar. Arrays of objects aren't great for frequent updates.
Finally, a tip: you said that these fields are only used to store some intermediate state. If that weren't the case (or won't be in the future), I'd recommend mapping the array with the nested field type, so that each object can be queried independently of the other objects in the array.

Count Unique Objects

My index looks like this:
"_source": {
    "ProductName": "Random Product Name",
    "Views": {
        "Washington": [
            { "4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z" },
            { "4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z" },
            { "4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z" }
        ]
    }
}
I am trying to count the number of unique objects in Views.Washington.
In this case, the result would be 2, since two objects have the same key name (the first and third objects in the array).
Obviously, my first thought was to use aggregations, but I am not sure how to use them with nested objects, like these.
Can this be done with normal aggregations?
Will I need to use a script?
Yes, this can be done with aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
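For comparison, the count the question expects can be reproduced client-side once the document has been fetched. This is not an aggregation, just a minimal sketch of the expected result; countUniqueKeys is a hypothetical helper name.

```javascript
// Counts the distinct key names across an array of single-key objects.
function countUniqueKeys(views) {
  const seen = new Set();
  for (const entry of views) {
    for (const key of Object.keys(entry)) {
      seen.add(key);
    }
  }
  return seen.size;
}

const washington = [
  { "4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z" },
  { "4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z" },
  { "4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z" },
];
const uniqueViews = countUniqueKeys(washington);
console.log(uniqueViews);
```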

CouchDB - complex query on a view

I am using Cloudant and want to query a view whose map function looks like this:
function (doc) {
    if (doc.name !== undefined) {
        emit([doc.name, doc.age], doc);
    }
}
What is the correct way to get a result if I have a list of names (for which I will be using the 'keys=[]' option) and a range of ages (for which startkey and endkey should be used)?
Example: I want to get the persons whose name is "john", "mark", "joseph" or "santosh" and whose age is between 20 and 30.
If I go for the list of names, the query should be keys=["john", ....]
and if I go for the age, the query should use startkey and endkey.
I want to do both :)
Thanks
Unfortunately, you can't do that. The keys parameter returns only the rows whose key exactly matches one of the specified keys. For example, you can't simply send keys=["John","Mark"]&startkey=[null,20]&endkey=[{},30]: keys matches complete keys only, so it cannot act as a filter on the name while startkey and endkey range over the age.
Your question mentions CouchDB, but if you are using Cloudant, an index query (Cloudant Query) might be of interest to you.
You could have something like this:
{
    "selector": {
        "$and": [
            {
                "name": {
                    "$in": ["Mark", "John"]
                }
            },
            {
                "age": {
                    "$gt": 20,
                    "$lt": 30
                }
            }
        ]
    },
    "fields": [
        "name",
        "age"
    ]
}
As for CouchDB, you need to either split your request (one request for the age and one for the people) or do the filtering locally.
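Both options can be sketched in a few lines. One way to split the work, assuming the view key is [name, age] as in the question, is one startkey/endkey range per name; the other is to filter already-fetched rows locally. rangesForNames and filterRows are hypothetical helper names, and the sample rows are made up for illustration.

```javascript
// One range per name: each covers [name, minAge] .. [name, maxAge].
function rangesForNames(names, minAge, maxAge) {
  return names.map((name) => ({
    startkey: [name, minAge],
    endkey: [name, maxAge],
  }));
}

// Local filtering of view rows whose key is [name, age].
function filterRows(rows, names, minAge, maxAge) {
  const wanted = new Set(names);
  return rows.filter(
    ({ key: [name, age] }) => wanted.has(name) && age >= minAge && age <= maxAge
  );
}

// Made-up rows in the shape a view response would have.
const rows = [
  { id: "1", key: ["john", 25] },
  { id: "2", key: ["john", 41] },
  { id: "3", key: ["mark", 20] },
  { id: "4", key: ["peter", 22] },
];
const ranges = rangesForNames(["john", "mark"], 20, 30);
const matches = filterRows(rows, ["john", "mark"], 20, 30);
console.log(JSON.stringify(ranges), matches.map((row) => row.id));
```

Per-name ranges keep the filtering on the server at the cost of one query per name; local filtering needs a single request but transfers rows that are then discarded.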

CouchDB pagination and sorting

So I am using this approach from the CouchDB docs to perform pagination:
Request rows_per_page + 1 rows from the view
Display rows_per_page rows, store the + 1 row as next_startkey and next_startkey_docid
As page information, keep startkey and next_startkey
Use the next_* values to create the next link, and use the others to create the previous link
One thing I don't understand is how to perform sorting with this approach, assuming each document has a last-updated timestamp and I want to sort by that field instead of by id.
First of all, sorting is always done on the keys.
Querying _all_docs is like querying a table where the key is the _id:
[
{
"key": "my_first_id",
"value": {}
},
{
"key": "my_second_id",
"value": {}
}
]
So if you want to sort on a field other than _id, you will need to use Map/Reduce (views). For example, you could create a view where the key is the updatedAt field.
This would result in something like this:
[
{
"key": "1475858068",
"value": {}
},
{
"key": "1475553268",
"value": {}
}
]
So the rows come back sorted by that key :)
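Putting the two pieces together, a minimal sketch could look like the following. The map function keys the view on the timestamp, and splitPage applies the "rows_per_page + 1" recipe to a view response. The field name updatedAt comes from the answer above; mapByUpdatedAt and splitPage are hypothetical names, and the sample rows are made up.

```javascript
// Map function for the design document: the key is the timestamp, so the
// view is sorted by it. `emit` is supplied by CouchDB when the view is built.
const mapByUpdatedAt = function (doc) {
  if (doc.updatedAt !== undefined) {
    emit(doc.updatedAt, null);
  }
};

// Given rowsPerPage + 1 rows from the view, returns the rows to display and
// the startkey / startkey_docid parameters for the next page (null on the
// last page).
function splitPage(rows, rowsPerPage) {
  const page = rows.slice(0, rowsPerPage);
  const extra = rows[rowsPerPage];
  const next = extra
    ? { startkey: extra.key, startkey_docid: extra.id }
    : null;
  return { page, next };
}

// Made-up response rows, requested with limit = rowsPerPage + 1 = 3.
const viewRows = [
  { id: "doc-a", key: "1475553268", value: null },
  { id: "doc-b", key: "1475858068", value: null },
  { id: "doc-c", key: "1475858068", value: null },
];
const { page, next } = splitPage(viewRows, 2);
console.log(page.length, next);
```

startkey_docid matters here because several documents can share the same timestamp; without it the next page could not tell which of them to start from.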

Which is the better design for this API response

I'm trying to decide upon the best format of response for my API. I need to return a reports response which provides information on the report itself and the fields contained on it. Fields can be of differing types, so there can be: SelectList; TextArea; Location etc..
They each use different properties, so "SelectList" might use "Value" to store its string value and "Location" might use "ChildItems" to hold "Longitude" "Latitude" etc.
Here's what I mean:
"ReportList": [
    {
        "Fields": [
            {
                "Id": {},
                "Label": "",
                "Value": "",
                "FieldType": "",
                "FieldBankFieldId": {},
                "ChildItems": [
                    {
                        "Item": "",
                        "Value": ""
                    }
                ]
            }
        ]
    }
]
The problem with this is I'm expecting the users to know when a value is supposed to be null. So I'm expecting a person looking to extract the value from "Location" to extract it from "ChildItems" and not "Value". The benefit to this however, is it's much easier to query for things than the alternative which is the following:
"ReportList": [
    {
        "Fields": [
            {
                "SelectList": [
                    {
                        "Id": {},
                        "Label": "",
                        "Value": ""
                    }
                ],
                "Location": [
                    {
                        "Id": {},
                        "Label": "",
                        "Latitude": "",
                        "Longitude": "",
                        "etc": ""
                    }
                ]
            }
        ]
    }
]
So this one is a list of reports, each containing a list of fields, each of which in turn contains one list per field type, for every field type I have (15 or so). This is as opposed to just having a list of reports, each with a list of fields carrying a "FieldType" enum, which I think is fairly easy to manipulate.
So the Question: Which format is best for a response? Any alternatives and comments appreciated.
EDIT:
To query all fields by field type in a report and get their values, the first way would go something like this:
foreach (var field in fields)
{
    switch (field.FieldType)
    {
        case FieldType.Location:
            var locationValue = field.ChildItems;
            break;
        case FieldType.SelectList:
            var selectListValue = field.Value;
            break;
    }
}
The second one would be like:
foreach (var field in fields)
{
    foreach (var location in field.Location)
    {
        var latitude = location.Latitude;
    }
    foreach (var selectList in field.SelectList)
    {
        var value = selectList.Value;
    }
}
I think the right answer is the first one, with the switch statement. It makes it easier to query for things like "get me the value of the field with this GUID as its id". It just means putting it through a big switch statement.
I went with the first one because it's easier to query for the most common use case. I'll expect the client code to put it into their own schema if they want to change it.
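As a rough illustration of that common use case with the first format, here is a sketch in JavaScript. The property names follow the first JSON layout above; fieldValue and valueById are hypothetical helpers, the ids are made-up strings, and only two of the field types are handled.

```javascript
// Picks the property that holds the value for a given field type.
function fieldValue(field) {
  switch (field.FieldType) {
    case "Location":
      return field.ChildItems;
    case "SelectList":
      return field.Value;
    default:
      // Assumption: remaining field types keep their value in "Value".
      return field.Value;
  }
}

// "Get me the value of the field with this id."
function valueById(report, id) {
  const field = report.Fields.find((candidate) => candidate.Id === id);
  return field === undefined ? undefined : fieldValue(field);
}

const report = {
  Fields: [
    { Id: "f-1", Label: "Colour", FieldType: "SelectList", Value: "Red", ChildItems: [] },
    {
      Id: "f-2",
      Label: "Site",
      FieldType: "Location",
      Value: "",
      ChildItems: [
        { Item: "Latitude", Value: "51.5" },
        { Item: "Longitude", Value: "-0.1" },
      ],
    },
  ],
};
console.log(valueById(report, "f-1"), valueById(report, "f-2").length);
```

The lookup itself stays a single pass over one flat list; the switch is the only place that needs to know which property a field type uses.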
