How to filter for results where a property has a value contained in array X - RethinkDB

Say I've got a dynamic array A of values [x,y,z].
I want to return all results for which property P has a value that exists in A.
I could write some recursive filter that concatenates 'or's for each value in A, but it's extremely clunky.
Any other out-of-the-box way to do this?

You can use the filter command in conjunction with the reduce and contains commands to accomplish this.
Example
Let's say you have the following documents:
{
"id": "41e352d0-f543-4731-b427-6e16a2f6fb92" ,
"property": [ 1, 2, 3 ]
}, {
"id": "a4030671-7ad9-4ab9-a21f-f77cba9bfb2a" ,
"property": [ 5, 6, 7 ]
}, {
"id": "b0694948-1fd7-4293-9e11-9e5c3327933e" ,
"property": [ 2, 3, 4 ]
}, {
"id": "4993b81b-912d-4bf7-b7e8-e46c7c825793" ,
"property": [ "b" ,"c" ]
}, {
"id": "ce441f1e-c7e9-4a7f-9654-7b91579029be" ,
"property": [ "a" , "b" , "c" ]
}
From this sequence, you want to get all documents that have either "a" or 1 in their property field. You can write a query that builds a chain of contains calls using reduce.
r.table('30510212')
// Filter documents
.filter(function (row) {
// Array of properties you want to filter for
return r.expr([ 1, 'a' ])
// Insert `false` as the first value in the array
// in order to make it the first value in the reduce's left
.insertAt(0, false)
// Chain up the `contains` statement
.reduce(function (left, right) {
return left.or(row('property').contains(right));
});
})
Update: Better way to do it
Actually, you can use two nested contains calls to execute the same query. This is shorter and probably a bit easier to understand.
r.table('30510212')
.filter(function (row) {
return row('property').contains(function (property) {
return r.expr([ 1, 'a' ]).contains(property);
})
})
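To check the logic of the nested contains without a running RethinkDB server, the same test can be mimicked in plain JavaScript. This is only an in-memory sketch, not ReQL: the names docs, wanted and matchesAny are hypothetical, and the ids are shortened.

```javascript
// In-memory stand-ins for the sample documents above (ids shortened for brevity).
const docs = [
  { id: "41e3", property: [1, 2, 3] },
  { id: "a403", property: [5, 6, 7] },
  { id: "b069", property: [2, 3, 4] },
  { id: "4993", property: ["b", "c"] },
  { id: "ce44", property: ["a", "b", "c"] },
];

// Values to look for, playing the role of r.expr([ 1, 'a' ]).
const wanted = [1, "a"];

// True when at least one element of doc.property is in `values`,
// mirroring the outer contains (over the property) and the inner contains (over the wanted values).
function matchesAny(doc, values) {
  return doc.property.some((p) => values.includes(p));
}

const matched = docs.filter((doc) => matchesAny(doc, wanted));
console.log(matched.map((doc) => doc.id)); // [ '41e3', 'ce44' ]
```

If the property held a single value instead of an array, the inner test in this sketch would reduce to values.includes(doc.property).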

Related

How to access celldata objects in sheets api

I'm working on a Google Sheets integration project where I'd like to add formatted text to cells (bold, italic). This needs to apply to only part of the cell (e.g. only some of the text in the cell is bold). I can see that this can be done through the CellData object, documented in the Sheets API here:
CellData
But I can't work out how to get an instance of these objects. I'm using the sheets service to successfully get Spreadsheet, Sheet and ValueRange objects, but I can't work out how to get through to the cell data objects themselves to use these methods.
You want to retrieve the formats when part of a cell's value has several formats.
You want to put a value with several formats into a cell.
I understand your question as above. If my understanding is correct, how about these samples?
1. Retrieve value
When part of a cell's value has several formats, the script for retrieving the value together with its formats is as follows.
Sample script:
This sample script retrieves the value from the cell "A1" of "Sheet1".
spreadsheet_id = '### spreadsheet ID ###'
ranges = ['Sheet1!A1']
fields = 'sheets(data(rowData(values(textFormatRuns,userEnteredValue))))'
response = service.get_spreadsheet(spreadsheet_id, ranges: ranges, fields: fields)
Result:
{
"sheets": [
{
"data": [
{
"rowData": [
{
"values": [
{
"userEnteredValue": {
"stringValue": "abcdefg"
},
"textFormatRuns": [
{
"format": {}
},
{
"format": {
"fontSize": 24,
"foregroundColor": {
"red": 1
},
"bold": true
},
"startIndex": 2
},
{
"format": {},
"startIndex": 5
}
]
}
]
}
]
}
]
}
]
}
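To read the result above: each entry in textFormatRuns starts at its startIndex and, as far as I understand the API, stays in effect until the next run begins. The following plain JavaScript sketch (not the Ruby client used in this answer) splits the cell text by those runs. The helper name splitByRuns is hypothetical, and treating a missing startIndex as 0 is an assumption based on the result shown.

```javascript
// Cell text and runs copied from the result above.
const text = "abcdefg";
const runs = [
  { format: {} },
  { format: { fontSize: 24, foregroundColor: { red: 1 }, bold: true }, startIndex: 2 },
  { format: {}, startIndex: 5 },
];

// Split the text into one segment per run.
// Assumption: a missing startIndex means 0, and a run lasts until the next run starts.
function splitByRuns(value, formatRuns) {
  return formatRuns.map((run, i) => {
    const start = run.startIndex || 0;
    const next = formatRuns[i + 1];
    const end = next ? next.startIndex : value.length;
    return { text: value.slice(start, end), bold: run.format.bold === true };
  });
}

const segments = splitByRuns(text, runs);
console.log(segments.map((s) => s.text)); // [ 'ab', 'cde', 'fg' ]
```

Under these assumptions only the middle segment, "cde", carries the bold, red, 24 pt format.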
2. Put value
When a value with several formats is put into a cell, the script is as follows.
Sample script:
This sample script puts the value to the cell "B1" of "Sheet1". As a sample, update_cells is used for this situation.
spreadsheet_id = '### spreadsheet ID ###'
requests = {requests: [
update_cells: {
fields: 'userEnteredValue,textFormatRuns',
range: {sheet_id: 0, start_row_index: 0, end_row_index: 1, start_column_index: 1, end_column_index: 2},
rows: [{values: [{user_entered_value: {
string_value: 'abcdefg'},
text_format_runs: [{format: {}}, {format: {font_size: 24, foreground_color: {red: 1}, bold: true}, start_index: 2}, {format:{}, start_index: 5}]
}]}]
}
]}
response = service.batch_update_spreadsheet(spreadsheet_id, requests, {})
Regarding sheet_id: 0, if you want to write to another sheet, please change it to that sheet's ID.
Note:
These sample scripts assume that your environment can use the Sheets API.
These are simple samples, so please modify them for your situation.
References:
spreadsheets.get
spreadsheets.batchUpdate
textFormatRuns
updateCells

Filter with complex key does not work (using startkey and endkey)

I create a view with Map function:
function(doc) {
if (doc.market == "m_warehouse") {
emit([doc.logTime,doc.dbName,doc.tableName], 1);
}
}
I want to filter the data with multi-keys:
_design/select_data/_view/new-view/?limit=10&skip=0&include_docs=false&reduce=false&descending=true&startkey=["2018-06-19T09:16:47,527","stage"]&endkey=["2018-06-19T09:16:43,717","stage"]
but I still got:
{
"total_rows": 248133,
"offset": 248129,
"rows": [
{
"id": "01CGBPYVXVD88FPDVR3NP50VJW",
"key": [
"2018-06-19T09:16:47,527",
"ods",
"o_ad_dsp_pvlog_realtime"
],
"value": 1
},
{
"id": "01CGBQ6JMEBR8KBMB8T7Q7CZY3",
"key": [
"2018-06-19T09:16:44,824",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ4BKT8S2VDMT2RGH1FQ71",
"key": [
"2018-06-19T09:16:44,707",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ18CBHQX3F28649YH66B9",
"key": [
"2018-06-19T09:16:43,717",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
}
]
}
The key "ods" should not be in the results.
What did I do wrong?
Your query is not a multi-key query; it is a startkey/endkey range query.
If you want results for one dbName within a given time range, you need to change the emit to [doc.dbName,doc.logTime,doc.tableName]
and then query with startkey=["stage","2018-06-19T09:16:43,717"]&endkey=["stage","2018-06-19T09:16:47,527"]
(By the way, are you sure that your timestamps are in the right order? In your example the first timestamp is later than the second. That order only works together with descending=true; without it, the startkey must be the smaller one.)
As you have chosen a full date/time stamp as the first level of your key, down to millisecond precision, there are unlikely to be any repeating values in the first level of your compound key. If you indexed just the date, say, as the first key, your data would be grouped by date, dbName and table name in a more predictable way
e.g.
["2018-06-19","ods","o_ad_dsp_pvlog_realtime"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
With this key structure, the hierarchical grouping of keys works in your favour i.e. all the data from "2018-06-19" is together in the index, with all the data matching ["2018-06-19","stage"] adjacent to each other.
If you need to get to millisecond precision, you could index the data as follows:
function(doc) {
if (doc.market == "m_warehouse") {
emit([doc.dbName,doc.logTime], 1);
}
}
This would create an index organised by dbName, with a secondary sort on time. You can then extract the data for a specified dbName between two timestamps.
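Why the key order matters can be checked with a small in-memory simulation. This plain JavaScript sketch does not talk to CouchDB: compareKeys and range are hypothetical helpers, and plain string comparison stands in for CouchDB's real collation, which is an approximation that happens to be sufficient for these particular keys.

```javascript
// Keys as emitted by the original map function: [logTime, dbName, tableName].
const keys = [
  ["2018-06-19T09:16:47,527", "ods", "o_ad_dsp_pvlog_realtime"],
  ["2018-06-19T09:16:44,824", "stage", "s_ad_ztc_realpv_base_indirect"],
  ["2018-06-19T09:16:44,707", "stage", "s_ad_ztc_realpv_base_indirect"],
  ["2018-06-19T09:16:43,717", "stage", "s_ad_ztc_realpv_base_indirect"],
];

// Compare array keys element by element; on a tie the shorter array sorts first.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Keep the keys with low <= key <= high.
function range(allKeys, low, high) {
  return allKeys.filter((k) => compareKeys(k, low) >= 0 && compareKeys(k, high) <= 0);
}

// Time first: the range is decided by the timestamp, so "ods" slips in.
const byTime = range(keys, ["2018-06-19T09:16:43,717", "stage"], ["2018-06-19T09:16:47,527", "stage"]);
console.log(byTime.map((k) => k[1])); // [ 'ods', 'stage', 'stage', 'stage' ]

// dbName first: the range is confined to "stage", then narrowed by time.
const reordered = keys.map(([logTime, dbName]) => [dbName, logTime]);
const byDb = range(reordered, ["stage", "2018-06-19T09:16:43,717"], ["stage", "2018-06-19T09:16:47,527"]);
console.log(byDb.map((k) => k[0])); // [ 'stage', 'stage', 'stage' ]
```

With the timestamp in first position, the second key element only breaks ties between identical timestamps, which is why it cannot act as a filter on dbName.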

Matching by array elements in Elasticsearch

I have to construct quite a non-trivial (as it seems to be now) query in Elasticsearch.
Suppose I have a couple of entities, each with an array element, consisting of strings:
1). ['A', 'B']
2). ['A', 'C']
3). ['A', 'E']
4). ['A']
The mapping for the array element is as follows (using dynamic templates):
{
"my_array_of_strings": {
"path_match": "stringArray*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
The JSON representation of an entity looks like this:
{
"stringArray": [
"A",
"B"
]
}
Then I have user input:
['A', 'B', 'C'].
What I want to achieve is to find entities which contain only elements specified in input - expected results are:
['A', 'B'], ['A', 'C'], ['A'] but NOT ['A', 'E'] (because 'E' is not present in user input).
Can this scenario be implemented with Elasticsearch?
UPDATE:
Apart from the script-based solution, which should work nicely but will most likely slow down the query considerably when many records match, I have devised another one. Below I will try to explain its main idea, without a code implementation.
One important condition that I failed to mention (and which might have given other users a valuable hint) is that the arrays consist of enumerated elements, i.e. there is a finite number of possible elements. This allows such an array to be flattened into separate fields of an entity.
Let's say there are 5 possible values: 'A', 'B', 'C', 'D', 'E'. Each of these values becomes a boolean field: true if it is present (i.e. the array version would contain this element) and false otherwise.
Then each of the entities could be rewritten as follows:
1).
A = true
B = true
C = false
D = false
E = false
2).
A = true
B = false
C = true
D = false
E = false
3).
A = true
B = false
C = false
D = false
E = true
4).
A = true
B = false
C = false
D = false
E = false
With the user input of ['A', 'B', 'C'] all I would need to do is:
a) take all possible values (['A', 'B', 'C', 'D', 'E']) and subtract from them user input -> result will be ['D', 'E'];
b) find records where each of resulting elements is false, i.e. 'D = false AND E = false'.
This would give records 1, 2 and 4, as expected. I am still experimenting with the code implementation of this approach, but so far it looks quite promising. It has yet to be tested, but I think this might perform faster, and be less resource demanding, than using scripts in query.
To optimize this a little bit further, it might be possible not to provide fields which will be 'false' at all, and modify the previous query to 'D = not exists AND E = not exists' - result should be the same.
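The idea of steps a) and b) can be sketched in plain JavaScript, independent of Elasticsearch. All names here (toFlags, excluded, findMatches) are hypothetical and the data is held in memory; it only illustrates the logic, not an actual query.

```javascript
// All possible values (the enumeration) and the four sample entities.
const allValues = ["A", "B", "C", "D", "E"];
const entities = [["A", "B"], ["A", "C"], ["A", "E"], ["A"]];

// Flatten an array into one boolean flag per possible value.
function toFlags(values) {
  const flags = {};
  for (const v of allValues) flags[v] = values.includes(v);
  return flags;
}

// Step a): all possible values minus the user input.
function excluded(userInput) {
  return allValues.filter((v) => !userInput.includes(v));
}

// Step b): keep the records where every excluded flag is false.
function findMatches(records, userInput) {
  const mustBeFalse = excluded(userInput);
  return records.filter((rec) => mustBeFalse.every((v) => rec.flags[v] === false));
}

// Number the records 1..4 as in the listing above.
const records = entities.map((values, i) => ({ n: i + 1, flags: toFlags(values) }));
const hits = findMatches(records, ["A", "B", "C"]);
console.log(hits.map((rec) => rec.n)); // [ 1, 2, 4 ]
```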
You can achieve this with scripting. This is how it looks:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"name": [
"A",
"B",
"C"
]
}
},
{
"script": {
"script": "if(user_input.containsAll(doc['name'].values)){return true;}",
"params": {
"user_input": [
"A",
"B",
"C"
]
}
}
}
]
}
}
}
}
}
This Groovy script checks whether the list contains anything apart from ['A', 'B', 'C'] and returns false if it does, so it won't return ['A', 'E']. It is simply checking for a sublist match. This script might take a couple of seconds. You would need to enable dynamic scripting; the syntax might also be different for ES 2.x, so let me know if it does not work.
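The combination of the two conditions can be illustrated outside Elasticsearch with a plain JavaScript sketch. The names hasAny and onlyAllowed are hypothetical, and the empty entity at the end is an added example that is not part of the question.

```javascript
const userInput = ["A", "B", "C"];
// The four entities from the question, plus a hypothetical empty one.
const entities = [["A", "B"], ["A", "C"], ["A", "E"], ["A"], []];

// Role of the terms filter: at least one value is in the user input.
const hasAny = (values) => values.some((v) => userInput.includes(v));

// Role of the script (containsAll): every value is in the user input.
const onlyAllowed = (values) => values.every((v) => userInput.includes(v));

const result = entities.filter((values) => hasAny(values) && onlyAllowed(values));
console.log(result); // [ [ 'A', 'B' ], [ 'A', 'C' ], [ 'A' ] ]
```

In this sketch the empty entity passes the subset test on its own and is rejected only by the first condition.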
EDIT 1
I have put both conditions inside filter only. First only those documents that have either A, B or C are returned, and then script is applied on only those documents, so this would be faster than the previous one. More on filter ordering
Hope this helps!!
In the same situation, I followed these steps:
First of all, I deleted the index in order to redefine the analyzer/settings, using the Sense plugin.
DELETE my_index
Then I defined a custom analyzer for my_index
PUT my_index
{
"index" : {
"analysis" : {
"tokenizer" : {
"comma" : {
"type" : "pattern",
"pattern" : ","
}
},
"analyzer" : {
"comma" : {
"type" : "custom",
"tokenizer" : "comma"
}
}
}
}
}
Then I defined the mapping properties inside my code, but you can also do that with Sense; both give the same result.
PUT /my_index/_mapping/my_type
{
"properties" : {
"conduct_days" : {
"type" : "string",
"analyzer" : "comma"
}
}
}
Then, for testing, follow the steps below:
PUT /my_index/my_type/1
{
"conduct_days" : "1,2,3"
}
PUT /my_index/my_type/2
{
"conduct_days" : "3,4"
}
PUT /my_index/my_type/3
{
"conduct_days" : "1,6"
}
GET /my_index/_search
{
"query": {"match_all": {}}
}
GET /my_index/_search
{
"filter": {
"or" : [
{
"term": {
"conduct_days": "6"
}
},
{
"term": {
"conduct_days": "3"
}
}
]
}
}

Error while accessing nested JSON object

This is a sample row in my RethinkDB table.
{
"a1": "val1" ,
"a2": "val2" ,
"a3": "val3" ,
"a4": "val4" ,
"part": [
{
"id": "reql" ,
"position": "student"
} ,
{
"id": "sdsadda" ,
"position": "officer"
}
] ,
"a5": "val5"
}
I want to access a nested JSON object, but I get the error e: Cannot perform bracket on a non-object non-sequence "string"
I need the entire row in the output for rows where id matches "reql".
This is my query
r.db('dbname').table('tablename').filter(r.row('part').contains(function(product) {
return product('id').eq("reql");
}))
This query worked before. It doesn't right now.
You'd get that error if you'd somehow ended up with an element in your part array that's a string instead of an object. Try running .filter(r.row('part').contains(function(product) { return product.typeOf().ne('OBJECT'); })), which should return all the rows that have a string in the part array.
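The same diagnostic can be tried on a handful of rows in plain JavaScript before running it against the table. This is an in-memory sketch with hypothetical names (isPlainObject, isMalformed); the second and third rows are made-up examples of what bad data might look like.

```javascript
const rows = [
  // Well-formed row, as in the question.
  { a1: "ok", part: [{ id: "reql", position: "student" }, { id: "sdsadda", position: "officer" }] },
  // Hypothetical bad row: one element of part is a string.
  { a1: "bad-element", part: [{ id: "reql", position: "student" }, "oops"] },
  // Hypothetical bad row: part itself is a string.
  { a1: "bad-part", part: "reql" },
];

// True for plain objects only (not arrays, not null).
function isPlainObject(x) {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// A row is malformed if part is not an array or contains a non-object element.
function isMalformed(row) {
  return !Array.isArray(row.part) || row.part.some((el) => !isPlainObject(el));
}

const malformed = rows.filter(isMalformed);
console.log(malformed.map((row) => row.a1)); // [ 'bad-element', 'bad-part' ]
```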
Regarding your comment @Puja, I think this should do it for you:
r.db('dbname').table('tablename').filter(function(d){
// The predicate must return its result, otherwise the filter has nothing to test
return d("part").typeOf().eq("ARRAY");
}).filter(r.row('part').contains(function(d) {
return d('id').eq("reql");
}))
Although this is less efficient than @mlucy's answer, and you should definitely just do one pass over your dataset to clean it up by fixing all the documents where part: STRING.

How can I modify array fields in place?

Let's say I have this object:
{
"id": "1a48c847-4fee-4968-8cfd-5f8369c01f64" ,
"sections": [
{
"id": 0 ,
"title": "s1"
} ,
{
"id": 1 ,
"title": "s2"
} ,
{
"id": 2 ,
"title": "s3"
}
]
}
How can I directly change the second title, "s2", to another value without loading the object and saving it again? Thanks.
Update plus the changeAt term:
r.table('blog').get("1a48c847-4fee-4968-8cfd-5f8369c01f64").update(function(row){
return {
sections: row('sections').changeAt(1,
row('sections')(1).merge({title: "s2-modified"}))
}
})
The above is good if you already know the index of the item you want to change. If you need to find the index, then update it, you can use the .offsetsOf command to look up the index of the element you want:
r.table('table').get("1a48c847-4fee-4968-8cfd-5f8369c01f64").update(function(row){
return row('sections').offsetsOf(function(x){
return x('title').eq('s2')
})(0).do(function(index){
return {
sections: row('sections').changeAt(index,
row('sections')(index).merge({title: "s2-modified"}))
}
})
})
Edit: modified answer to use changeAt
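For comparison, here is what the lookup-then-replace step does, written as plain JavaScript on an in-memory object. It is only a sketch of the logic: changeTitle is a hypothetical helper, and unlike the ReQL version it runs on the client rather than inside the database.

```javascript
const doc = {
  id: "1a48c847-4fee-4968-8cfd-5f8369c01f64",
  sections: [
    { id: 0, title: "s1" },
    { id: 1, title: "s2" },
    { id: 2, title: "s3" },
  ],
};

// Find the first section with the given title and return a copy of the
// document in which only that section's title is replaced.
function changeTitle(document, oldTitle, newTitle) {
  const index = document.sections.findIndex((s) => s.title === oldTitle);
  if (index === -1) return document; // nothing to change
  const sections = document.sections.slice();
  sections[index] = { ...sections[index], title: newTitle };
  return { ...document, sections };
}

const updated = changeTitle(doc, "s2", "s2-modified");
console.log(updated.sections.map((s) => s.title)); // [ 's1', 's2-modified', 's3' ]
```

The sketch returns the document unchanged when no section matches; the ReQL query above has no such guard.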
