mongo slow query - performance

My MongoDB currently holds 105,000 documents, and I still have to insert 500,000 more. Inserting just 1,000 documents takes more than 4 hours because of the queries for references:
Insert DocA, which has many citations (about 30).
Find the documents in the database that are cited by DocA [i.e. findBy-Doi-Or-Pmid-Or-Pmc(...)].
Each query for one of DocA's citations takes about 400 ms to complete.
The following is one of the profile entries:
Query { $or: [ { $or: [ {doi: ""}, {pmid: "10508155"} ] }, {pmc: ""} ] }
{
"ts": ISODate("2012-12-22T11:55:39.796Z"),
"op": "query",
"ns": "fyparticles.mArticle",
"query": {
"$or": {
"0": {
"$or": {
"0": {
"doi": ""
},
"1": {
"pmid": "10508155"
}
}
},
"1": {
"pmc": ""
}
}
},
"ntoreturn": NumberInt(1),
"nscanned": NumberInt(105707),
"responseLength": NumberInt(20),
"millis": NumberInt(477),
"client": "192.168.0.15",
"user": ""
}
And the index I have created:
{
"v": NumberInt(1),
"key": {
"doi": NumberInt(1),
"pmid": NumberInt(1),
"pmc": NumberInt(1)
},
"ns": "fyparticles.system.indexes",
"background": NumberInt(1),
"name": "params"
}
Please help me out here! Am I missing something or doing something wrong?

First off, you are using an $or, which in itself is not the fastest operator in the world because it needs to run multiple queries and then merge duplicates to return a result.
Second, you are using an $or with one index. Since an $or is basically one or more queries, you may need one or more indexes to cover the unique fields you have in each clause. A compound index on doi, pmid and pmc can only serve a clause that filters on its leading field, doi, so the pmid and pmc clauses cannot use it.
Third, you are using nested $ors. It is good to note that nested $ors do not use indexes: https://jira.mongodb.org/browse/SERVER-3327
So you already have three or more performance problems with your query.
First, take out that nested $or:
{ $or: [ {doi: ""}, {pmid: "10508155"}, {pmc: ""} ] }
And then you will probably need to create three indexes for this (you might be able to get one index to fit all; I haven't tested):
db.col.ensureIndex({ doi: 1 });
db.col.ensureIndex({ pmid: 1 });
db.col.ensureIndex({ pmc: 1 });
That should be the first place to start to make your query faster.
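If the query object is assembled in application code, the flattening step can be done before the query is sent. Here is a minimal sketch in plain JavaScript; flattenOr is a hypothetical helper (not part of any MongoDB driver), and it only lifts clauses that consist of nothing but a nested $or:

```javascript
// Flatten nested $or clauses into a single top-level $or.
// flattenOr is a hypothetical helper, not a MongoDB driver function.
function flattenOr(query) {
  if (!Array.isArray(query.$or)) {
    return query;
  }
  const clauses = [];
  for (const clause of query.$or) {
    const keys = Object.keys(clause);
    if (keys.length === 1 && Array.isArray(clause.$or)) {
      // The clause is itself only an $or: lift its branches one level up.
      clauses.push(...flattenOr(clause).$or);
    } else {
      clauses.push(clause);
    }
  }
  return { ...query, $or: clauses };
}

// The nested query from the profile entry in the question.
const nested = {
  $or: [{ $or: [{ doi: "" }, { pmid: "10508155" }] }, { pmc: "" }],
};
const flat = flattenOr(nested);
console.log(JSON.stringify(flat));
// prints {"$or":[{"doi":""},{"pmid":"10508155"},{"pmc":""}]}
```

The flattened object has the same shape as the rewritten query above, so each of the three clauses can be matched against its own single-field index.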

Related

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices by a particular edge using the following query (created returns label person):
software {
created {
name
}
}
Which would resolve to the following Gremlin Query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
"result": [
{
"#type": "d",
"#rid": "#24:0",
"#version": 6,
"#class": "person",
"in_knows": [
"#35:0"
],
"name": "josh",
"out_created": [
"#32:0",
"#33:0"
],
"age": 32,
"#fieldTypes": "in_knows=g,out_created=g"
}
],
"dbStats": {
...
}
}
I realize that this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using a DataLoader pattern. (I'm also hoping to do property selections, so that I'm not asking the database to return too much info.)
So I'm trying to craft a query like so:
g.V().union(
__.hasLabel('software').has('name', 'ripple').
project('parent', 'child').by('id').
by(__.in('created').fold()),
__.hasLabel('software').has('name', 'lop').
project('parent', 'child').by('id').
by(__.in('created').fold())
)
But this results in the following, where the props are missing and only the ids of the vertices I want are included:
{
"result": [
{
"parent": "ripple",
"child": [
"#24:0"
]
},
{
"parent": "lop",
"child": [
"#22:0",
"#23:0",
"#24:0"
]
}
],
"dbStats": {
...
}
}
My question is: how can I have the Gremlin query return all of the props for the found vertices and none of the other props? Should I even be doing batching this way?
For anyone else reading, the query I was trying to write wouldn't work because the TraversalSet created in the .by(__.in('created')) can't be cast from a List to an ElementMap, as the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working query duplicates the keys for each row and specifies the props needed (the query below is OK for Gremlin 3.3 as used in ODB; if you have Gremlin 3.4 or later, replace the last by step with by(elementMap('name', 'age'))):
g.V().union(
__.hasLabel('software').has('name', 'ripple').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value())),
__.hasLabel('software').has('name', 'lop').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value()))
)
So that you get a result like this:
{"data": [
{
"parent": "ripple",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
},
{
"parent": "lop",
"child": {
"id": 5709,
"name": "peter",
"age": 35
}
},
{
"parent": "lop",
"child": {
"id": 5713,
"name": "marko",
"age": 29
}
},
{
"parent": "lop",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
}
]
}
Which would allow you to create a lookup where you concat all results for "lop" and "ripple" into arrays.
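As a rough sketch of that lookup step, assuming the rows arrive as plain objects shaped like the result above: groupByParent is a hypothetical helper, and the keys array stands in for whatever keys a DataLoader-style batch function receives.

```javascript
// Group flat { parent, child } rows into a Map keyed by parent name.
// groupByParent is a hypothetical helper for a DataLoader-style batch function.
function groupByParent(rows) {
  const lookup = new Map();
  for (const { parent, child } of rows) {
    if (!lookup.has(parent)) {
      lookup.set(parent, []);
    }
    lookup.get(parent).push(child);
  }
  return lookup;
}

// Rows copied from the query result shown above.
const rows = [
  { parent: "ripple", child: { id: 5717, name: "josh", age: 32 } },
  { parent: "lop", child: { id: 5709, name: "peter", age: 35 } },
  { parent: "lop", child: { id: 5713, name: "marko", age: 29 } },
  { parent: "lop", child: { id: 5717, name: "josh", age: 32 } },
];

const lookup = groupByParent(rows);

// A batch function has to return one entry per requested key, in key order,
// so parents with no children fall back to an empty array.
const keys = ["ripple", "lop", "unknown"];
const batched = keys.map((key) => lookup.get(key) || []);
console.log(batched.map((children) => children.map((c) => c.name)));
// prints [ [ 'josh' ], [ 'peter', 'marko', 'josh' ], [] ]
```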

graphQL filter array containing ALL

I am quite new to GraphQL, and after searching the whole afternoon, I haven't found an answer to a relatively simple problem.
I have two objects in my Strapi backend:
"travels": [
{
"id": "1",
"title": "Bolivia: La Paz y Salar de Uyuni",
"travel_types": [
{
"name": "Culturales"
},
{
"name": "Aventura"
},
{
"name": "Ecoturismo"
}
]
},
{
"id": "2",
"title": "Europa clásica 2020",
"travel_types": [
{
"name": "Clasicas"
},
{
"name": "Culturales"
}
]
}
]
I am trying to get a filter where I search for travels containing ALL the user-selected travel_types.
I then wrote a query like this:
query($where: JSON){
travels (where:$where) {
id # Or _id if you are using MongoDB
title
travel_types {name}
}
}
And the parameter I pass in for testing:
{
"where":{
"travel_types.name_contains": ["Aventura"],
"travel_types.name_contains": ["Clasicas"]
}
}
This should return an empty array, because none of the travels has both the Aventura and Clasicas travel types.
But instead it returns the travel with id=2. It seems that only the second filter is applied.
I searched for a query that would behave like Array.every() in JavaScript, but I wasn't able to find one.
Does anyone have an idea how to achieve this type of filtering?
Thank you very much,
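One thing that can be checked without Strapi at all: the "where" object above uses the same key twice, and a JSON object keeps only the last value for a duplicated key, which would explain why only the Clasicas filter seems to apply. The sketch below shows that, plus the Array.every() behaviour described above as a client-side filter. hasAllTypes is a hypothetical helper, and this says nothing about whether Strapi offers a server-side equivalent.

```javascript
// Duplicate keys in a JSON object: the last one wins when parsed.
const variables = JSON.parse(`{
  "where": {
    "travel_types.name_contains": ["Aventura"],
    "travel_types.name_contains": ["Clasicas"]
  }
}`);
console.log(variables.where["travel_types.name_contains"]);
// prints [ 'Clasicas' ]

// Sample data copied from the question.
const travels = [
  {
    id: "1",
    title: "Bolivia: La Paz y Salar de Uyuni",
    travel_types: [{ name: "Culturales" }, { name: "Aventura" }, { name: "Ecoturismo" }],
  },
  {
    id: "2",
    title: "Europa clásica 2020",
    travel_types: [{ name: "Clasicas" }, { name: "Culturales" }],
  },
];

// hasAllTypes is a hypothetical client-side helper: true only when the
// travel carries every selected type.
function hasAllTypes(travel, selected) {
  const names = travel.travel_types.map((type) => type.name);
  return selected.every((name) => names.includes(name));
}

const both = travels.filter((t) => hasAllTypes(t, ["Aventura", "Clasicas"]));
const cultural = travels.filter((t) => hasAllTypes(t, ["Culturales"]));
console.log(both.map((t) => t.id), cultural.map((t) => t.id));
// prints [] [ '1', '2' ]
```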

Strapi GraphQL query: "start" argument wouldn't work

I am running into a very strange problem with my queries in Strapi (version 3.0.0-alpha.26.2). I have a users collection with 3 documents that I'm trying to fetch via GraphQL. To fetch all users the query is:
users {
firstName
}
This returns the following:
{
"data": {
"users": [
{
"firstName": "Arnold"
},
{
"firstName": "Bill"
},
{
"firstName": "Vin"
}
]
}
}
That's 3 names. Now, say I wished to retrieve only the first 2 users. For such pagination use cases, there are two arguments one can pass in a Strapi query: start (defines the index to start at) and limit (defines the number of elements to return). So now the query would be:
users(start: 0, limit: 2) {
firstName
}
This returns the first two names as expected:
{
"data": {
"users": [
{
"firstName": "Arnold"
},
{
"firstName": "Bill"
}
]
}
}
But what if I want the last 2 users here, i.e. Bill and Vin? Should be as straightforward as:
users(start: 1, limit: 2){
firstName
}
But this still returns Arnold and Bill, while you'd expect the following:
{
"data": {
"users": [
{
"firstName": "Bill"
},
{
"firstName": "Vin"
}
]
}
}
No matter what value I use for start, it always starts at the 0th item. You could do start: 200 (when there are only 3 items in the users collection) and it'd still return the exact same result! What sorcery is this??
The issue can be reproduced at https://dev.schandillia.com/graphql.

Parse Query by subfield/dot notation

tl;dr
Can ParseCloud/MongoDB filter by Pointer<class>.field? By Pointer<class>.Pointer<class>? By the existence of data in that field?
Long question:
Round is an object which will be played automatically when its time comes.
Payment is an object which indicates that a user made a payment. When a payment is spent, we set its round field.
Player links an online User with a Payment.
I need to query Player for a few conditions:
1. Player is online and has a valid payment (no round, and valid equal to 'valid')
2. Player's user equals a specific user and the player has no payment
3. Player's user equals a specific user and the player has a valid payment (no round, and valid equal to 'valid')
I got everything working except validating the Payment inside the Player query.
Here is condition 1 from the list.
var query = new Parse.Query(keys.Player);
query.skip(0);
query.limit(oneRoundMaxPlayers);
query.greaterThanOrEqualTo(keys.last_online_date, lastAllowedOnline);
// looks like no filter applied here
query.doesNotExist("payment.round");
query.exists(keys.payment);
// This line will make query return 0 elements
// query.equalTo("payment.valid", "valid");
query.include(keys.user);
query.include(keys.payment);
Here are conditions 2 OR 3:
var queryPaymentExists = new Parse.Query(keys.Player);
queryPaymentExists.skip(0);
queryPaymentExists.limit(1);
queryPaymentExists.exists(keys.payment);
//This line not filtering
queryPaymentExists.doesNotExist(keys.payment + "." + keys.round);
queryPaymentExists.equalTo(keys.user, user);
// This line makes query always return 0 elements
// queryPaymentExists.equalTo(keys.payment + "." + keys.valid, keys.payment_valid);
var queryPaymentDoesNotExist = new Parse.Query(keys.Player);
queryPaymentDoesNotExist.skip(0);
queryPaymentDoesNotExist.limit(1);
queryPaymentDoesNotExist.doesNotExist(keys.payment);
queryPaymentDoesNotExist.equalTo(keys.user, user);
var compoundQuery = Parse.Query.or(queryPaymentExists, queryPaymentDoesNotExist);
compoundQuery.include(keys.user);
compoundQuery.include(keys.payment);
compoundQuery.include(keys.payment + "." + keys.round);
I've checked the logs from Mongo, and they look like the following:
verbose: REQUEST for [GET] /classes/Player: {
"include": "user,payment,payment.round",
"where": {
"$or": [
{
"payment": {
"$exists": true
},
"payment.round": {
"$exists": false
},
"user": {
"__type": "Pointer",
"className": "_User",
"objectId": "ASPKs6UVwb"
}
},
{
"payment": {
"$exists": false
},
"user": {
"__type": "Pointer",
"className": "_User",
"objectId": "ASPKs6UVwb"
}
}
]
}
}
Here is response:
verbose: RESPONSE from [GET] /classes/Player: {
"response": {
"results": [
{
"objectId": "VHU9uwmLA7",
"last_online_date": {
"__type": "Date",
"iso": "2017-10-28T15:15:23.547Z"
},
"user": {
"objectId": "ASPKs6UVwb",
"username": "cn92Ekv5WPJcuHjkmTajmZMDW",
},
"createdAt": "2017-10-22T11:43:16.804Z",
"updatedAt": "2017-10-25T09:23:20.035Z",
"ACL": {
"*": {
"read": true
},
"ASPKs6UVwb": {
"read": true,
"write": true
}
},
"__type": "Object",
"className": "_User"
},
"createdAt": "2017-10-27T21:03:35.442Z",
"updatedAt": "2017-10-28T15:15:23.556Z",
"payment": {
"objectId": "nr7ln7U3eJ",
"payment_date": {
"__type": "Date",
"iso": "2017-10-27T23:42:50.614Z"
},
"user": {
"__type": "Pointer",
"className": "_User",
"objectId": "ASPKs6UVwb"
},
"createdAt": "2017-10-27T23:42:50.624Z",
"updatedAt": "2017-10-28T15:12:30.131Z",
"valid": "valid",
"round": {
"objectId": "jF9gqG4ndh",
"round_date": {
"__type": "Date",
"iso": "2017-10-28T15:12:00.027Z"
},
"createdAt": "2017-10-28T15:11:00.036Z",
"updatedAt": "2017-10-28T15:12:30.108Z",
,
"ACL": {
"*": {
"read": true
}
},
"__type": "Object",
"className": "Round"
},
"ACL": {
"ASPKs6UVwb": {
"read": true
}
},
"__type": "Object",
"className": "Payment"
},
"ACL": {
"ASPKs6UVwb": {
"read": true
}
}
}
]
}
}
You can see that the response contains payment.round.
My questions are the following:
Can ParseCloud/MongoDB filter by Pointer<class>.field? By Pointer<class>.Pointer<class>? By the existence of data in that field?
How can I work around this when I need to check for field presence, given that a User can have many Players and a User can have many Payments?
UPD
As far as I can tell, Mongo should support filtering by "dot notation" (see "mongodb query by sub-field").
So what am I doing wrong?
Short answer:
No
Simplify your data structure
Long answer:
Dot notation can be used to
include documents of pointers, as you already did in your code, e.g. include(keys.user)
filter for properties of fields, e.g. {propertyA: 1, propertyB: 2}. All the data is in the field, not in another document in another collection that is referenced by a Parse pointer.
Dot notation cannot be used as a filter parameter for referenced pointers in a Parse query. MongoDB does not support such filtering either: the concept of a pointer comes from Parse, not from MongoDB. In a NoSQL environment like MongoDB there are no relations between tables to be used in the query language, as it is not a relational database like an SQL database. However, Parse provides some of the comfort of SQL for simple queries with its concepts of pointer, compoundQuery and matchesKeyInQuery.
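For illustration, here is a rough sketch of condition 3 from the question, written with an inner query instead of dot notation. It uses matchesQuery from the Parse JavaScript SDK, which is a close relative of the matchesKeyInQuery helper named above rather than the same method. The class names "Player" and "Payment" and the field names are assumptions taken from the question, and the SDK object is passed in as a parameter so the snippet itself has no hard dependency.

```javascript
// Build "Player of a given user with a valid, unspent payment" using an
// inner query on Payment rather than dot notation on the pointer.
// `Parse` is the Parse JavaScript SDK object; class and field names are
// assumptions based on the question.
function buildValidPaymentPlayerQuery(Parse, user) {
  // Conditions that live on the referenced Payment object.
  const paymentQuery = new Parse.Query("Payment");
  paymentQuery.equalTo("valid", "valid");
  paymentQuery.doesNotExist("round");

  // Conditions on Player itself, plus the constraint that its payment
  // pointer must reference a Payment matched by paymentQuery.
  const playerQuery = new Parse.Query("Player");
  playerQuery.equalTo("user", user);
  playerQuery.matchesQuery("payment", paymentQuery);
  playerQuery.include("payment");
  return playerQuery;
}
```

In Cloud Code this would be called as buildValidPaymentPlayerQuery(Parse, user), with find() or first() run on the returned query. Whether it performs well enough depends on the data, which is where the schema advice below comes in.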
If that is not sufficient in your case, simply add the fields to the collection, at the expense of having the same fields and data in multiple collections but with the advantage of faster query execution time.
Finding the right data structure is one of the big topics for NoSQL as there is no general right structure. The collections and document structures are basically designed as a trade off between:
execution performance
query necessity / frequency
security (access level)
and data storage size
And they are liquid and can change over time. As your app and its queries mutate you'd also change the data structure if the long term gain is greater than the one time effort.

mgo with aggregation and grouping

I am trying to perform a query using golang mgo to effectively get distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.
Something like this:
pipe := []bson.M{
{
"$group": bson.M{
"_id": bson.M{"user": "$user"},
},
},
{
"$match": bson.M{
"_id": bson.M{"$exists": 1},
"user": bson.M{"$exists": 1},
"date_updated": bson.M{
"$gt": durationDays,
},
},
},
{
"$lookup": bson.M{
"from": "users",
"localField": "user",
"foreignField": "_id",
"as": "user_details",
},
},
{
"$lookup": bson.M{
"from": "organizations",
"localField": "organization",
"foreignField": "_id",
"as": "organization_details",
},
},
}
err := d.Pipe(pipe).All(&result)
If I comment out the $group section, the query returns the join as expected.
If I run it as is, I get NULL.
If I move the $group to the bottom of the pipe, I get an array response with null values.
Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?
The reason you're getting NULL is that your $match filter is filtering out all of the documents after the $group stage.
After your first $group stage, the documents look only like the example below:
{"_id": { "user": "foo"}},
{"_id": { "user": "bar"}},
{"_id": { "user": "baz"}}
They no longer contain the other fields, i.e. user, date_updated and organization. If you would like to keep their values, you can utilise a Group Accumulator Operator. Depending on your use case, you may also benefit from using Aggregation Expression Variables.
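To make the effect concrete, here is a small in-memory emulation in plain JavaScript (no MongoDB involved). groupByUser mimics only the {"_id": {"user": "$user"}} grouping from the question, and the sample documents and the threshold of 1 are made up for illustration.

```javascript
// Emulate {"$group": {"_id": {"user": "$user"}}} over in-memory documents.
// Only the _id survives; every other field is dropped, as in the real stage.
function groupByUser(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.user)) {
      groups.set(doc.user, { _id: { user: doc.user } });
    }
  }
  return [...groups.values()];
}

// Hypothetical sample documents.
const docs = [
  { user: "foo", organization: "acme", date_updated: 3 },
  { user: "foo", organization: "acme", date_updated: 5 },
  { user: "bar", organization: "initech", date_updated: 4 },
];

const grouped = groupByUser(docs);

// Emulate the $match conditions on "user" and "date_updated": neither field
// exists on the grouped documents any more, so nothing passes.
const matched = grouped.filter(
  (doc) => doc.user !== undefined && doc.date_updated > 1
);
console.log(JSON.stringify(grouped), matched.length);
// prints [{"_id":{"user":"foo"}},{"_id":{"user":"bar"}}] 0
```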
As an example using the mongo shell, let's use the $first operator, which basically picks the first occurrence. This may make sense for organization but not for date_updated. Please choose a more appropriate accumulator operator.
{"$group": {
"_id":"$user",
"date_updated": {"$first":"$date_updated"},
"organization": {"$first":"$organization"}
}
}
Note that the above also replaces {"_id":{"user":"$user"}} with simpler {"_id":"$user"}.
Next we'll add a $project stage to rename the _id field produced by the group operation back to user, and carry along the other fields without modification.
{"$project": {
"user": "$_id",
"date_updated": 1,
"organization": 1
}
}
Your $match stage can be simplified by listing just the date_updated filter. We can remove _id, as it's no longer relevant at this point in the pipeline. Also, if you would like to make sure that you only process documents with a user value, you should place a $match before the $group. See Aggregation Pipeline Optimization for more.
So, all of those combined will look something like the below:
[
{"$group":{
"_id": "$user",
"date_updated": {"$first": "$date_updated"},
"organization": {"$first": "$organization"}
}
},
{"$project":{
"user": "$_id",
"date_updated": 1,
"organization": 1
}
},
{"$match":{
"date_updated": {"$gt": durationDays}
}
},
{"$lookup":{
"from": "users",
"localField": "user",
"foreignField": "_id",
"as": "user_details"
}
},
{"$lookup":{
"from": "organizations",
"localField": "organization",
"foreignField": "_id",
"as": "organization_details"
}
}
]
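The pipeline above still filters after grouping. A variant that puts the $match first, as suggested earlier, could be sketched like this in JavaScript. buildPipeline is a hypothetical helper, and it assumes the date filter is meant to apply to individual documents rather than to the grouped $first value, which changes the semantics slightly.

```javascript
// Build a pipeline that filters before grouping, so $group only sees
// documents that have a user and are recent enough.
// buildPipeline is a hypothetical helper; durationDays is the cutoff value
// from the question, passed through unchanged.
function buildPipeline(durationDays) {
  return [
    {
      $match: {
        user: { $exists: true },
        date_updated: { $gt: durationDays },
      },
    },
    // One output document per distinct user, keeping the organization.
    { $group: { _id: "$user", organization: { $first: "$organization" } } },
    // Rename _id back to user so the $lookup stages can reference it.
    { $project: { _id: 0, user: "$_id", organization: 1 } },
    {
      $lookup: {
        from: "users",
        localField: "user",
        foreignField: "_id",
        as: "user_details",
      },
    },
    {
      $lookup: {
        from: "organizations",
        localField: "organization",
        foreignField: "_id",
        as: "organization_details",
      },
    },
  ];
}

const pipeline = buildPipeline(new Date("2020-01-01T00:00:00Z"));
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// prints [ '$match', '$group', '$project', '$lookup', '$lookup' ]
```

The same stage order carries over to the bson.M pipeline in the question, since each stage is just a nested map.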
Lastly (I know you're aware of it), based on the database schema above with users and organizations collections, and depending on your application's use case, you may want to reconsider embedding some values. You may find 6 Rules of Thumb for MongoDB Schema Design useful.
