Join in Couchbase Lite, how is it possible? - couchbase-lite

Consider two types of documents Company and Person:
Company has 2 fields:
name of type String
employees of type List of Person
Person has 2 fields:
name of type String
city of type String
How can I create a query where I find all the companies who have at least N employees in a given city ?
EDIT: In other words, How is it possible to do something like this with CouchBase Lite.

I think there are several ways to approach this.
One suggestion is to create a view that, when given a Company document, emits a key/value pair via the map stage. The key could be a map containing the company name and city, and the value could be anything (e.g. employee name). Then add a reduce function that sums all the index entries (that's what the first part creates) with the same key.
So the result of the reduce stage output for the view is the total number of employees keyed by company + city. You can then do queries to get your result.
Views and Queries are really powerful, but can take some thought. Focus on getting the information you need out of the View, so you can query flexibly.
Take a look at the View and Query documentation for more details.

Related

Queries in Dynamodb

I have an application written in Nodejs that needs to find ONE row based on a city name (this could just be the table's name, different cities will be categorized as different tables), and a field named "currentJobLoads" which is a number. For example, a user might want to find ONE row with the city name "Chicago" and the lowest currentJobLoads. How can I achieve this in Dynamodb without scan operations(since scan would be slower and can only read so much data before it gets terminated)? Any suggestions would be highly appreciated.
You didn't specify what your current partition key and sort key for the table are, but I'm guessing the currentJobLoads field isn't one of them. So you would need to create a Global Secondary Index on the currentJobLoads field, at which point you will be able to run query operations against that field.

HBase - What are the pros and cons of using one column with a list of values vs using one column family with a list of columns?

Say we were modeling Users and Friends, and Friends have a type.
We could model it in Oracle like:
User: id, name, sex, age
Friendship: user_id, friend_id, type
So in HBase, we could do:
(this first model is from here, which is recommended by the HBase FAQ)
Table: Users
RowKey = <user_id>
Column Family = Info; Columns = "Name", "Sex", "Age"
Column Family = Friend; Columns = "Friend:<user_id>"=type
(where "Friend:"=type could be one more more user_ids)
or
Table: Users
RowKey = <user_id>
Column Family = Info; Columns = "Name", "Sex", "Age", "Friends"
(where "Friends" is a JSON string in the form [{user_id:, type:}, ...]
However, if a friend did not have a type, the second model could simply be [user_id:<user_id>, ...]. What would the first model do if friends didn't have a type?
What are the pros and benefits of either approach?
One column with a list of values breaks normalization rules. If you don't know what those are or why they're important, please do some research.
I don't think either model is correct. A one to many relationship ought to be modeled correctly. Both your schemas break normalization rules.
It really depends on how many friends you have and what your read and write access pattern is.
In the first case, with a friend per column you can add a friend without reading all of the other friends. However, you also get a separate timestamp value per friend and thus increase the total storage requirement per friend.
Also, if you don't always read the friends when you read a user, the first case doesn't require you to load the friends. You can do a single column family scan and avoid all the extra IO.
The downside to more column families is you have more MemStores and therefore more memory is required for your regions. It also means more non-sequential disk flushing as each column family is a separate disk flush.

Efficient way to query

My app has a class that saves picture that users upload. Each object in the class has a city property that holds the name of the city that the picture was taken at, and a like property that tracks the number of likes.
I want to be able to send a query that returns one picture per city and each picture should have the highest ranking of likes in the city it belongs to. How can I do that?
One way which I first thought about is doing multiple queries by fetching the most liked picture of a city and save it in an array, and then do the same to other cities.
However, each country has more than one city, thus it's not that efficient.
Parse doesn't support the ordinary operations used in databases. Besides, I tried to use a compound query. Unfortunately, I can't set limit or ordering on the subqueries. Any good solution for this?
It would be easy using group by. Unfortunately, Parse does not support "select distinct" or "group by" features.
As you've suggested you need to fetch for each country all the cities, and for each one get the top most rated photo.
BUT, since Parse has strict restrictions on the duration time execution of a request ( 3 sec for an event listener, 7 sec for a custom function ), I suggest you to do this in a background job, saving in a new table the top rated photo for each city. In this way you can easily query the db from client. The Background jobs can be executed up to 15 minuted before parse drop them, so you could make that kind of queries without timeouts.
Hope it helps

How to properly organize search of the person?

Let's say I have list of persons in my datastore. Each person there may have the following fields:
last name (*)
first name
middle name
id (*)
driving licence id (*)
another id (*)
date of birth
region
place of birth
At least one of the fields marked with (*) must exist.
Now user provides me with the same list of fields (and again at least one of the fields marked with (*) must be provided). I should search for the person user provided. But not all fields should be matched. I should display to the user somehow how I am sure in the results of search. Something like:
if person matched by id and last name (and user provided just these 2 fields for the search), then I am sure that result is correct (100%);
if person matched by id and last name (and user provided other fields, which were found in the database, but were not matched), then I am sure that result is almost correct by 60%;
etc.
(numbers are provided just as example)
How can I organize such search? Is there any standard algorithm? I also would like to minimize number of requests to the database.
P.S. I can not provide user with the actual field values from the database.
It sounds like your logic for determining the quality of a match will be too complex to handle at the database layer. I think you'll get the best performance by retrieving all of the records that match at least one of the mandatory keys, calculating the match score for each of them in memory, and returning the best score. For example, if the user provides you with an id, last name and place of birth, your query would look something like:
SELECT * FROM users WHERE id = `the_id` OR last_name = `the_last_name`;
This could be a performance problem if you have a VERY large dataset with lots of common last names but otherwise I would expect not to see too many collisions. You can check this on your own dataset outside of GAE. You could also get better performance if all mandatory fields MUST match by changing the OR to an AND.

WCF Data Services - neither .Expand or .LoadProperty seems to do what I need

I am building a school management app where they track student tardiness and absences. I've got three entities to help me in this. A Students entity (first name, last name, ID, etc.); a SystemAbsenceTypes entity with SystemAbsenceTypeID values for Late, Absent-with-Reason, Absent-without-Reason; and a cross-reference table called StudentAbsences (matching the student IDs with the absence-type ID, plus a date, and a Notes field).
What I want to do is query my entities for a given student, and then add up the number of each kind of Absence, for a given date range. I prepare my currentStudent object without a problem, then I do this...
Me.Data.LoadProperty(currentStudent, "StudentAbsences") 'Loads the cross-ref data
lblDaysLate.Text = (From ab In currentStudent.StudentAbsences Where ab.SystemAbsenceTypes.SystemAbsenceTypeID = Common.enuStudentAbsenceTypes.Late).Count.ToString
...and this second line fails, complaining "Object reference not set to an instance of an object."
I presume the problem is that while it DOES see that there are (let's say) four absences for the currentStudent (ie, currentStudent.StudentAbsences.Count = 4) -- it can't yet "peer into" each one of the absences to look at its type. In fact, each of the four StudentAbsence objects has a property called SystemAbsenceType, which then finally has the SystemAbsenceTypeID.
How do I use .Expand or .LoadProperty to make this happen? Do I need to blindly loop through all these collections, firing off .LoadProperty on everything before I can do my query?
Is there some other technique?
When you load the student, try expanding the related properties.
var currentStudent = context.Students.Expand("StudentAbsences")
.Expand("StudentAbsences/SystemAbsenceTypes")
.Where(....).First();

Resources