How to plot non-linear data in dc.js

I have a JSON data file with a fairly complicated structure, something like this:
[
{
"patient_id": "f1ff9870",
"demographics": {
"gender": "female",
"age": 78
},
"measurements": [
{
"unit": "bpm",
"kind": "pulse",
"value": "130",
"measurementDate": "2017-05-04 03:00:00+03"
}
],
"problems": [
{
"name_title": "problem1",
"category": "Primary disease",
"startDate": "2017-05-12 03:00:00+03"
},
{
"name_title": "problem2",
"category": "Primary disease",
"startDate": "2017-05-12 03:00:00+03"
}
]
},
{
"patient_id": "c9047712",
"demographics": {
"gender": "male",
"age": 60
}
}
]
Each object is a patient, which can have several fields. Some of the fields are arrays, and not all patients have the same set of fields.
I am trying to find a way to use crossfilter to build the groups for the plots, but I am not close to a solution. I want one chart per field (e.g. problems, age, etc.), and each chart should act as a filter. So, if I select a specific problem I will be able to see how many patients have that problem, their age, measurements etc. Is there a way to work with non-linear data?

This is a broad question. Let me address this specific part of it:
if i select a specific problem i will be able to see how many patients have that problem, their age, measurements etc.
"Problems" can be thought of as a tag or array dimension.
So you could aggregate problems and age like this:
var cf = crossfilter(data);
var problemsDim = cf.dimension(d => (d.problems || []).map(p => p.category), true); // true = array (tag) dimension; patients without problems get an empty array
var ageDim = cf.dimension(d => d.demographics.age);
var problemsGroup = problemsDim.group(); // default is "reduceCount"
var ageGroup = ageDim.group();
Now when you select a specific problem, the age chart will show counts, per age, of the patients who have that problem. When you select a specific age, the problems chart will show, for each problem, the count of patients of that age.
Note that with tag dimensions, the total of all the counts will usually add up to more than the number of records.
You could do something similar for measurements. Of course some patients may not have problems or measurements, so they wouldn't show up in those charts. Also you might end up with weird results if a patient had more than one of the same measurement.
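To make the tag-style counting concrete, here is a minimal sketch in plain JavaScript. It does not use the crossfilter API; it only imitates the counting by hand. The sample records, the "Secondary disease" category, and the helper names (tagsOf, countByTag) are assumptions made up for illustration. Deduplicating tags per patient is also a choice made in this sketch to avoid counting one patient twice for the same tag.

```javascript
// Plain-JavaScript illustration of tag-style counting; this is NOT the crossfilter API.
// The records are hypothetical and only mirror the shape of the data in the question.
const patients = [
  {
    patient_id: "f1ff9870",
    demographics: { gender: "female", age: 78 },
    problems: [
      { name_title: "problem1", category: "Primary disease" },
      { name_title: "problem2", category: "Secondary disease" },
    ],
  },
  // This patient has no "problems" field at all.
  { patient_id: "c9047712", demographics: { gender: "male", age: 60 } },
  {
    patient_id: "hypothetical-3",
    demographics: { gender: "male", age: 60 },
    problems: [
      { name_title: "problem1", category: "Primary disease" },
      { name_title: "problem3", category: "Secondary disease" },
    ],
  },
];

// Distinct tags of one record; a missing array is treated as "no tags".
function tagsOf(patient) {
  const categories = (patient.problems || []).map((p) => p.category);
  return [...new Set(categories)];
}

// Count each record once for every distinct tag it carries.
function countByTag(records) {
  const counts = new Map();
  for (const record of records) {
    for (const tag of tagsOf(record)) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return counts;
}

const allCounts = countByTag(patients);
// Imitate selecting age 60 in another chart, then recount the problems.
const age60Counts = countByTag(patients.filter((p) => p.demographics.age === 60));

console.log(Object.fromEntries(allCounts));
console.log(Object.fromEntries(age60Counts));
```

In this sample the per-tag counts sum to 4 although there are only 3 patients, and the patient without a problems array contributes to no tag at all.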


ACID update of ElasticSearch Document

I'm trying to build a Tinder-like system right now. Here I need to know which cards have already been seen.
If I save the cards in ElasticSearch, and then have such a document:
{ name: David, location: {lat, lon}, seenFromUsers: [] }
I'm just wondering if it makes sense to keep a list in the object itself. It would probably hold around 2000 entries.
But if I do an update in ElasticSearch, I always have to pass all 2000 entries. If two users do this at the same time, does one update get lost? How can I simply add another ID to the array? Is that even possible?
What other solutions are there?
Another solution would be a completely different approach. Instead of creating documents like this
{
"name": "David",
"location": { "lat": ..., "lon": ...},
"seenFromUsers": ["Laura", "Simone"]
}
think in Relations like this:
{
"name": "David",
"seenBy": "Laura"
}
{
"name": "David",
"seenBy": "Simone"
}
This approach will give you simpler queries, and the ACID problem goes away, because new profile views are simply new documents.
As a benefit, you'll get rid of inner objects, and it becomes easier to add additional data to this relation:
{
"name": "David",
"seenBy": "Laura",
"timestamp": ...,
"liked": true
}
{
"name": "David",
"seenBy": "Simone",
"timestamp": ...,
"liked": false
}
And now you'll be able to do a simple query for all positive likes of a profile, or for bi-directional likes/matches.
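As a rough sketch of the relation idea, the snippet below keeps the "view" documents in a plain JavaScript array. It is an in-memory stand-in, not an Elasticsearch query, and the helper names (recordView, likersOf, isMatch) are hypothetical. It only shows that a new view is an append and that likes and matches reduce to simple filters.

```javascript
// In-memory stand-in for an index of "view" documents; NOT an Elasticsearch query.
const views = [
  { name: "David", seenBy: "Laura", liked: true },
  { name: "David", seenBy: "Simone", liked: false },
  { name: "Laura", seenBy: "David", liked: true },
];

// Recording a new view appends one document; no existing document is rewritten.
function recordView(index, name, seenBy, liked) {
  index.push({ name, seenBy, liked });
}

// Everyone who liked the given profile.
function likersOf(index, name) {
  return index.filter((v) => v.name === name && v.liked).map((v) => v.seenBy);
}

// A match exists when each person liked the other's profile.
function isMatch(index, a, b) {
  return likersOf(index, a).includes(b) && likersOf(index, b).includes(a);
}

recordView(views, "Simone", "David", true);

console.log(likersOf(views, "David"));
console.log(isMatch(views, "David", "Laura"));
console.log(isMatch(views, "David", "Simone"));
```

Because every view is its own document, two users recording a view at the same time each add a separate entry instead of overwriting a shared array.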

Is there any way to apply group and pagination using createQuery?

The query looks like this:
http://localhost:3030/dflowzdata?$skip=0&$group=uuid&$limit=2
and the dflowzdata service contains data like this:
[
{
"uuid": 123456,
"id": 1
},
{
"uuid": 123456,
"id": 2
},
{
"uuid": 7890,
"id": 3
},
{
"uuid": 123456,
"id": 4
},
{
"uuid": 4567,
"id": 5
}
]
The before find hook looks like this:
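if (query.$group !== undefined) {
  // Take $group out of the query parameters before calling createQuery
  const value = hook.params.query.$group;
  delete hook.params.query.$group;
  // Build the ReQL query from the remaining parameters, then group it
  const reqlQuery = hook.service.createQuery(hook.params.query);
  hook.params.rethinkdb = reqlQuery.group(value);
}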
It gives the correct result but without pagination: I need only two records, but it gives me all of them.
The result is:
{"total":[{"group":"123456","reduction":3},{"group":"7890","reduction":1},{"group":"4567","reduction":3}],"data":[{"group":"123456","reduction":[{"uuid":"123456","id":1},{"uuid":"123456","id":2},{"uuid":"123456","id":4}]},{"group":"7890","reduction":[{"uuid":"7890","id":3}]},{"group":"4567","reduction":[{"uuid":"4567","id":5}]}],"limit":2,"skip":0}
Can anyone help me get the correct records using $limit?
According to the documentation on data types, ReQL commands called on GROUPED_DATA operate on each group individually. For more details, read the group documentation. So limit applies inside each group rather than to the list of groups.
The group documentation says: to operate on all the groups rather than operating on each group [...], you can use ungroup to turn a grouped stream or grouped data into an array of objects representing the groups.
Hence, call ungroup before applying functions to the result of group:
r.db('db').table('table')
.group('uuid')
.ungroup()
.limit(2)
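To illustrate why ungroup makes limit act on whole groups, here is a plain JavaScript emulation run on the sample data from the question. It does not call RethinkDB; the helper names (groupThenUngroup, page) are hypothetical, and the group order here is order of first appearance, which may differ from what the database returns.

```javascript
// Plain-JavaScript emulation of group -> ungroup -> limit; it does NOT call RethinkDB.
const rows = [
  { uuid: 123456, id: 1 },
  { uuid: 123456, id: 2 },
  { uuid: 7890, id: 3 },
  { uuid: 123456, id: 4 },
  { uuid: 4567, id: 5 },
];

// Build an array of { group, reduction } objects, the shape seen in the question's output.
function groupThenUngroup(records, field) {
  const byKey = new Map();
  for (const record of records) {
    if (!byKey.has(record[field])) byKey.set(record[field], []);
    byKey.get(record[field]).push(record);
  }
  return [...byKey].map(([group, reduction]) => ({ group, reduction }));
}

// Once the groups are a plain array, skip/limit select whole groups.
function page(groups, skip, limit) {
  return groups.slice(skip, skip + limit);
}

const firstPage = page(groupThenUngroup(rows, "uuid"), 0, 2);
console.log(JSON.stringify(firstPage));
```

With $skip=0 and $limit=2 this yields two group objects, each still carrying all of its rows in reduction.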

Elasticsearch - Displaying documents in result multiple times with sorting

We use Elasticsearch 5.4. We have some documents with nested event-related data. Inside these nested documents there can be multiple events of different event types.
Let's say that a nested document about a person can store data of type birthData, deathData, etc.
We want to get ES documents sorted by [MONTH-DAY DESC], [YEAR ASC].
Filtering and sorting by a single event type is a piece of cake, but what if I want to sort by multiple event dates and display a document as many times as there are dates inside?
As we have several million docs and multiple event types, result manipulation on the application server would be a no-go.
Short example:
{
"id": "ID1",
"fullName": "John Smith",
"eventsData": [
{
"event_data_type": "BD",
"event_date": "1971-12-30T00:00:00Z"
},
{
"event_data_type": "DD",
"event_date": "2013-02-11T00:00:00Z"
}
]
},
{
"id": "ID2",
"fullName": "Jake Smith",
"eventsData": [
{
"event_data_type": "BD",
"event_date": "1965-02-02T00:00:00Z"
},
{
"event_data_type": "DD",
"event_date": "2011-12-30T00:00:00Z"
}
]
}
The printed results with relevant data would be:
ID1, BD, 1971-12-30
ID2, DD, 2011-12-30
ID1, DD, 2013-02-11
ID2, BD, 1965-02-02
So if a document has n events, then it is displayed n times.
Can you offer any Elasticsearch solution for this problem? What feature should we use?
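Just to pin down the expected ordering, here is a small in-memory sketch in plain JavaScript run on the two sample documents. It is not an Elasticsearch feature and not a proposal for doing this on the application server; it only spells out "one row per event, sorted by month-day descending, then year ascending". The variable names are made up for illustration.

```javascript
// In-memory illustration of the requested ordering only; NOT an Elasticsearch feature.
const docs = [
  {
    id: "ID1",
    eventsData: [
      { event_data_type: "BD", event_date: "1971-12-30T00:00:00Z" },
      { event_data_type: "DD", event_date: "2013-02-11T00:00:00Z" },
    ],
  },
  {
    id: "ID2",
    eventsData: [
      { event_data_type: "BD", event_date: "1965-02-02T00:00:00Z" },
      { event_data_type: "DD", event_date: "2011-12-30T00:00:00Z" },
    ],
  },
];

// One output row per nested event, so a document with n events appears n times.
const rowsByEvent = docs.flatMap((doc) =>
  doc.eventsData.map((event) => ({
    id: doc.id,
    type: event.event_data_type,
    date: event.event_date.slice(0, 10), // keep "YYYY-MM-DD"
  }))
);

// Sort by "MM-DD" descending, then by year ascending.
rowsByEvent.sort((a, b) => {
  const monthDayA = a.date.slice(5);
  const monthDayB = b.date.slice(5);
  if (monthDayA !== monthDayB) return monthDayA < monthDayB ? 1 : -1;
  return Number(a.date.slice(0, 4)) - Number(b.date.slice(0, 4));
});

const printed = rowsByEvent.map((r) => `${r.id}, ${r.type}, ${r.date}`);
console.log(printed.join("\n"));
```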

CouchDB - complex query on a view

I am using Cloudant and want to query a view which looks like this:
function (doc) {
  if (doc.name !== undefined) {
    emit([doc.name, doc.age], doc);
  }
}
What is the correct way to get a result if I have a list of names (for which I will be using the 'keys=[]' option) and a range of ages (for which startkey and endkey should be used)?
Example: I want to get persons named "john", "mark", "joseph" or "santosh" whose age is between 20 and 30.
If I go for the list of names, the query should use keys=["john", ....]
and if I go for the age, the query should use startkey and endkey.
I want to do both :)
Thanks
Unfortunately, you can't. The keys parameter only returns the rows whose key exactly matches one of the specified keys. For example, you can't simply send keys=["John","Mark"]&startkey=[null,20]&endkey=[{},30]. This query would only return the documents having the name John or Mark with a null age.
In your question you specified CouchDB, but if you are using Cloudant, an index query might be of interest to you.
You could have something like this:
{
"selector": {
"$and": [
{
"name": {
"$in":["Mark","John"]
}
},
{
"age": {
"$gt": 20,
"$lt": 30
}
}
]
},
"fields": [
"name",
"age"
]
}
As for CouchDB, you need to either split your request (one request for the age and one for the names) or do the filtering locally.
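The local filtering option could look roughly like the sketch below. The rows are hypothetical and only imitate the shape the view above would emit (key = [name, age]). The function name filterRows and the inclusive age bounds are assumptions; no CouchDB or Cloudant client call is made here.

```javascript
// Hypothetical rows shaped like the output of the view above (key = [name, age]).
const viewRows = [
  { id: "a", key: ["john", 25] },
  { id: "b", key: ["john", 41] },
  { id: "c", key: ["mark", 30] },
  { id: "d", key: ["peter", 22] },
];

// Keep rows whose name is in the list and whose age lies within [minAge, maxAge].
// Inclusive bounds are an assumption; switch to < and > if the limits are exclusive.
function filterRows(rows, names, minAge, maxAge) {
  const wanted = new Set(names);
  return rows.filter(
    ({ key: [name, age] }) => wanted.has(name) && age >= minAge && age <= maxAge
  );
}

const matches = filterRows(viewRows, ["john", "mark", "joseph", "santosh"], 20, 30);
console.log(matches.map((row) => row.id));
```

In practice the rows would come from a single keys=[...] style lookup or an age range request, with the other condition applied on the client as above.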

Which is the better design for this API response

I'm trying to decide upon the best response format for my API. I need to return a reports response which provides information on the report itself and the fields contained in it. Fields can be of differing types, for example SelectList, TextArea, Location, etc.
They each use different properties, so "SelectList" might use "Value" to store its string value and "Location" might use "ChildItems" to hold "Longitude" "Latitude" etc.
Here's what I mean:
"ReportList": [
{
"Fields": [
{
"Id": {},
"Label": "",
"Value": "",
"FieldType": "",
"FieldBankFieldId": {},
"ChildItems": [
{
"Item": "",
"Value": ""
}
]
}
]
}
]
The problem with this is that I'm expecting users to know when a value is supposed to be null. So I'm expecting a person who wants the value of "Location" to extract it from "ChildItems" and not from "Value". The benefit, however, is that it's much easier to query than the alternative, which is the following:
"ReportList": [
{
"Fields": [
{
"SelectList": [
{
"Id": {},
"Label": "",
"Value": ""
}
],
"Location": [
{
"Id": {},
"Label": "",
"Latitude": "",
"Longitude": "",
"etc": ""
}
]
}
]
}
]
So this one is a list of reports, each containing a list of fields, where each field in turn contains one list per field type I have (15 or so). This is opposed to just having a list of reports, each with a list of fields that carry a "FieldType" enum, which I think is fairly easy to manipulate.
So the Question: Which format is best for a response? Any alternatives and comments appreciated.
EDIT:
To query all fields by fieldtype in a report and get values with the first way it would go something like this:
foreach (var field in fields)
{
    switch (field.FieldType)
    {
        case FieldType.Location:
            // Location keeps its data in ChildItems
            var locationValue = field.ChildItems;
            break;
        case FieldType.SelectList:
            // SelectList keeps its data in Value
            var valueSelectList = field.Value;
            break;
    }
}
The second one would be like:
foreach (var field in fields)
{
    foreach (var location in field.Location)
    {
        var latitude = location.Latitude;
    }
    foreach (var selectList in field.SelectList)
    {
        var value = selectList.Value;
    }
}
I think the right answer is the first one, with the switch statement. It makes it easier to query for things like "get me the value of the field with this GUID as its id". It just means putting it through a big switch statement.
I went with the first one because it's easier to query for the most common use case. I'll expect the client code to map it into its own schema if they want to change it.
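For what it's worth, a client consuming the first format might look something like this JavaScript sketch. The sample report, its ids and values, and the helper names (valueOf, valueById) are all made up for illustration; only the property names (Fields, Id, Label, Value, FieldType, ChildItems, Item) come from the format above.

```javascript
// Hypothetical report in the first response format; ids and values are made up.
const report = {
  Fields: [
    { Id: "guid-1", Label: "Status", FieldType: "SelectList", Value: "Open", ChildItems: [] },
    {
      Id: "guid-2",
      Label: "Site",
      FieldType: "Location",
      Value: null,
      ChildItems: [
        { Item: "Latitude", Value: "51.5" },
        { Item: "Longitude", Value: "-0.12" },
      ],
    },
  ],
};

// Read a field's value from whichever property its FieldType uses.
function valueOf(field) {
  switch (field.FieldType) {
    case "Location":
      // Location keeps its data in ChildItems, so collect them into one object.
      return Object.fromEntries(field.ChildItems.map((c) => [c.Item, c.Value]));
    case "SelectList":
      return field.Value;
    default:
      throw new Error(`Unhandled FieldType: ${field.FieldType}`);
  }
}

// "Get me the value of the field with this id."
function valueById(rep, id) {
  const field = rep.Fields.find((f) => f.Id === id);
  return field === undefined ? undefined : valueOf(field);
}

console.log(valueById(report, "guid-1"));
console.log(valueById(report, "guid-2"));
```

The switch lives in one place on the client, so callers asking for a field by id never need to know which property holds the value.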
