MongoDB Near Query Strange Behavior - Spring

I have a near query I perform using Spring Data's NearQuery operation. Everything works fine for the most part. However, I have my code on several test machines and a production machine. The query works on some of the test machines, yet it does not return results for newly created objects on the production machine and on one of my test machines. When I drop the MongoDB collection on the machines that do not work, then use the same code to insert a new document and recreate the collection, the query begins to work again. My question is: what can cause this type of behavior? Can adding new variables to a class cause MongoDB near queries to stop working? If documents are added to a collection with different variables than the ones that already exist, can that cause problems? In production, I cannot simply drop collections to fix this. Is there something I am missing about keeping data in MongoDB collections consistent so that my Spring MongoDB code continues to work?
The mongoTemplate code:
Point point = new Point(locationAsDoubleArray[0], locationAsDoubleArray[1]);
NearQuery query = NearQuery.near(point.getX(), point.getY())
        .spherical(true)
        .maxDistance(maxDistance, Metrics.MILES) // also tried maxDistance(new Distance(radius, Metrics.MILES))
        .distanceMultiplier(Metrics.MILES)
        .query(regularQuery);
GeoResults<CalendarEvent> results = ((MongoOperations) mongoTemplate).geoNear(query, CalendarEvent.class);
The document that should be returned in JSON format:
{ "_class" : "com.eatmyfish.services.custom.CalendarEvent" , "_id" : { "$oid" : "5011c5cf51527fce6c4d2a00"} , "_keywords" : [ "test" , "search" , "function" , "test" , ""] , "address1" : "221 East 5th Street" , "address2" : "" , "allDay" : false , "categories" : [ 14] , "city" : "Saint Paul" , "clientId" : 109 , "clientProductId" : 962 , "color" : "#003666" , "createUser" : "peterson.dean" , "description" : "test" , "end" : "2012-07-26 14:00:00" , "endDate" : { "$date" : "2012-07-26T19:00:00.000Z"} , "externalLink" : "<a href='http://'>More Info</a>" , "geoLocation" : [ -93.0875195 , 44.9490055] , "latitude" : 0.0 , "location" : "221 East 5th Street Saint Paul,MN 55101 " , "locationManuallyEntered" : false , "locationName" : "My Cubicle" , "longitude" : 0.0 , "moreInfoLink" : "<a href='http://localhost:8080/posts/list/3150.page'>More Info</a>" , "note" : "" , "privateEventIn" : "N" , "restFormattedAddress" : "221+East+5th+Street+Saint+Paul,+MN+55101" , "start" : "2012-07-26 04:00:00" , "startDate" : { "$date" : "2012-07-26T09:00:00.000Z"} , "state" : "MN" , "title" : "Test Search Function" , "topicId" : 3150 , "url" : "http://localhost:8080/posts/list/3150.page" , "zip" : "55101"}
The code works differently depending on the machine it is run on. I have ensured my jar files, etc. are identical on each machine. The only thing that makes the query work again once it begins to fail is to drop the collection and start over. However, I am not sure what causes the query to stop working, or when. I do not think the code is the problem. There may be some administrative task I do not know about that would clean up the data. I have already used the repair command without any luck.

I had some old entries that had the long/lat order reversed, and that caused all my near queries to fail. It is odd that a few reversed long/lat values would cause this, but that is the cause. When I fixed the order of the long/lat values for the reversed entries, the queries started working again. To find this out I had to build and run direct MongoDB commands in Java rather than use Spring's more succinct approach. By viewing the command's return value while debugging, I could actually see the error message about the incorrect latitude values. No such errors were returned by Spring's near query operation; Spring's inadequate error messaging made this bug very hard to track down.
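Roughly, the kind of check that finally surfaced the error, written here as a shell command (the collection name and maxDistance are placeholders, and this relies on the legacy geoNear command that was available on the MongoDB versions in use at the time):
db.runCommand({
    geoNear: "calendarEvent",
    near: [-93.0875195, 44.9490055],
    spherical: true,
    maxDistance: 0.1
})
// With a document whose geoLocation is stored as [lat, lng] instead of [lng, lat],
// the command returns an error document whose errmsg complains about the out-of-range
// latitude value, which is the error that Spring's geoNear() never surfaced.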

Related

Remove a document from Mongo based on the value of a subdocument

I am very new to Mongo, but I have SQL experience, so I am trying to wrap my head around this concept. I am attempting to remove a whole document based on the value of a field inside a subdocument.
The document/row looks close to the following:
{
    "_id" : ObjectId("5a7e04e3809303035bf6437a"),
    "receivedTime" : ISODate("2018-02-09T20:30:27.118Z"),
    "status" : "NORMALIZED",
    "originalHeaders" : {
        "name" : "My Alert Name",
        "description" : null,
        "version" : 0,
        "severity" : 3
    },
    "partOfIncident" : false
}
I want to remove all documents that have the name = "My Alert Name". I have been trying something like the following by calling it from a bash script. This is the command after variable substitution has been performed:
++ mongo admin -u admin -p password --eval 'db.getSiblingDB("database_name").collection.deleteMany({originalHeaders: {name: "I ALERT EVERYTHING"} })'
After calling it, nothing is removed. Any pointers on how to accomplish my end goal would be greatly appreciated. I suppose it is possible to run a find, save all of the matching _id values, and then delete them, but that sounds terribly inefficient.
When accessing a nested field you need to use dot notation.
db.collection_name.deleteMany( { "originalHeaders.name" : "My Alert Name" } )
This will delete all documents where originalHeaders.name = "My Alert Name"
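Combined with your original bash invocation (same admin credentials, database, and collection names assumed as in your command), the call would look roughly like this:
mongo admin -u admin -p password --eval 'db.getSiblingDB("database_name").collection.deleteMany({ "originalHeaders.name" : "My Alert Name" })'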

Why does MongoDB's positional operator ($[]) fail on a Windows machine but work on a Mac?

I have the following data in a MongoDB collection named users:
{
    "_id" : ObjectId("5a3903562cdc59fad5fdc098"),
    "name" : "Ana",
    "hobbies" : [
        {
            "title" : "kissing",
            "with" : "pets"
        },
        {
            "title" : "playing",
            "with" : "pets"
        },
        {
            "title" : "sleeping",
            "with" : "pets"
        }
    ]
}
{
    "_id" : ObjectId("5a3903a32cdc59fad5fdc099"),
    "name" : "Bart",
    "hobbies" : [
        {
            "title" : "hitting",
            "with" : "pets"
        },
        {
            "title" : "beating",
            "with" : "pets"
        },
        {
            "title" : "eating",
            "with" : "pets"
        }
    ]
}
I need to replace the pets value of all the with keys with a new value like legos.
MongoDB's documentation for version 3.6 states the following:
The $[] operator can be used for queries which traverse more than one array and nested arrays.
As each of the with keys lives inside two separate arrays, using $[] should accomplish what I need. On a Mac it works perfectly, but on a Windows machine I get this error:
cannot use the part (hobbies of hobbies.$[].with) to traverse the element
Both machines are running MongoDB shell version 3.6.0. The operating system for the Mac is macOS Sierra 10.12.6 and for the Windows machine, it is Windows 10.
SO has many questions related to the positional operator and to the error I am getting specifically. But none of them address why identical operations executed on identical collections fail on Windows but are successful on Mac.
I have tried the following two commands to achieve the result I need. Both work on Mac and both fail on Windows with the same error given above.
db.users.updateMany({}, {$set: {"hobbies.$[].with": "legos"}});
and
db.users.update({}, {$set: {"hobbies.$[].with": "legos"}}, {multi: true});
You can see screen recordings of the difference here. My apologies in advance that the text in the recording on the Windows machine is on the smaller side.
Any help to understand how to resolve this on Windows is greatly appreciated.
Check your featureCompatibilityVersion. By default it may still be 3.4 (see the documentation). You need to set the feature compatibility version to 3.6:
db.adminCommand( { setFeatureCompatibilityVersion: "3.6" } )
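If you want to confirm the current setting before changing it, you can check it first (available on 3.4 and later):
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )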
I can't tell for sure, but on Windows it looks like the query uses () instead of {}.
The same issue exists on Linux too.
I'm using Ubuntu 16.04, MongoDB 3.6.1
The following is the output after the $[] update is executed:
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 0 }
P.S.: I don't have enough reputation to add a comment, hence posting this as an answer.
I checked the MongoDB reference; the minimum version that includes $[] is 3.6.

Update By Query to update multiple fields at a time in Elasticsearch

I have a use case for update by query in Elasticsearch.
I am keeping a doc like this:
{
    "a" : "1",
    "b" : "2",
    "version1" : 456,
    "c" : {
        "version2" : 123,
        "d" : "3"
    }
}
Now, depending on my use case, I either want to update fields a, b, and version1, or I want to update fields d and version2.
I found a partial answer in Update By Query in Elasticsearch using Java.
I am trying things like this:
BulkIndexByScrollResponse r = ubqrb
        .script(script)
        .script(script1)
        .script(script2)
        .script(script3)
        .script(script4)
        .filter(qb)
        .get();
However, UpdateByQueryRequestBuilder does not let me supply multiple scripts; only the last one, i.e. script4, is used and the rest are ignored.
I also tried -
Script script4 = new Script("ctx._source.a=\"abc\",b=\"xyz\"");
However this one also failed.
Any idea what can be done using update by query?
Thanks in advance.
Try combining both assignments into a single script and passing only that one to the builder:
new Script("ctx._source.a=\"abc\"; ctx._source.b=\"xyz\"");

MongoDB extremely slow at counting null values (or {$exists: false})

I have a Mongo server running on a VPS with 16GB of memory (although probably with slow IO, using magnetic disks).
I have a collection of around 35 million records which doesn't fit into main memory (db.stats() reports a size of 35GB and a storageSize of 14GB); however, the 1.7GB reported for totalIndexSize should comfortably fit there.
There is a particular field, bg, that I'm querying over, which can either be present with the value true or absent entirely (please, no discussions about whether this is the best data representation; I still think Mongo is behaving weirdly). This field is indexed with a non-sparse index with a reported size of 146MB.
I'm using the WiredTiger storage engine with a default cache size (so it should be around 8GB).
I'm trying to count the number of records missing the bg field.
Counting true values is tolerably fast (a few seconds):
> db.entities.find({bg: true}).count()
8300677
However the query for missing values is extremely slow (around 5 minutes):
> db.entities.find({bg: null}).count()
27497706
To my eyes, explain() looks ok:
> db.entities.find({bg: null}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "testdb.entities",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "bg" : {
                "$eq" : null
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "bg" : {
                    "$eq" : null
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "bg" : 1
                },
                "indexName" : "bg_1",
                "isMultiKey" : false,
                "direction" : "forward",
                "indexBounds" : {
                    "bg" : [
                        "[null, null]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "mongo01",
        "port" : 27017,
        "version" : "3.0.3",
        "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
    },
    "ok" : 1
}
However the query remains stubbornly slow, even after repeated calls. Other count queries for different values are fast:
> db.entities.find({bg: "foo"}).count()
0
> db.entities.find({}).count()
35798383
I find this kind of strange, since my understanding is that missing fields in non-sparse indexes are simply stored as null, so the count query with null should be similar to counting an actual value (or maybe up to about three times slower, since there are roughly three times as many matching documents and index entries). Indeed, this answer reports vast speed improvements over similar queries involving null values and .count(). The only point of differentiation I can think of is WiredTiger.
Can anyone explain why my query to count null values is so slow, or what I can do to fix it (apart from the obvious workaround of subtracting the true count from the total, which would work fine but wouldn't satisfy my curiosity)?
This is expected behavior; see https://jira.mongodb.org/browse/SERVER-18653. It seems like a strange call to me too, but there you go; I'm sure the programmers responsible know more about MongoDB than I do.
You will need to use a different value to mean null. I guess this will depend on what you use the field for. In my case it is a foreign reference, so I'm just going to start using false to mean null. If you are using it to store a boolean value then you may need to use "null", -1, 0, etc.
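If changing the data isn't an option, the subtraction already mentioned in the question is the practical workaround, since both of those counts are reported as fast (or at least tolerably fast):
db.entities.find({}).count() - db.entities.find({bg: true}).count()
And if you do switch to an explicit value such as false for the missing case, counting becomes an ordinary equality match that the existing bg index can satisfy:
db.entities.find({bg: false}).count()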

MongoDB complex find

I need to grab the top 3 results for each of 8 users. Currently I am looping through each user and making 8 calls to the db. Is there a way to structure the query to pull the same 8x3 dataset in a single db call?
selected_users = users.sample(8)
cur = 0
while cur <= selected_users.count - 1
  cursor = status_store.find({'user' => selected_users[cur]}, {:fields => params}).sort('score', -1).limit(3)
  # do something with the cursor
  cur += 1
end
The collection I am pulling from looks like the one below. Each user can have an unbounded number of tweets, so I have not embedded them within a user document.
{
    "_id" : ObjectId("51e92cc8e1ce7219e40003eb"),
    "id_str" : "57915476419948544",
    "score" : 904,
    "text" : "Yesterday we had a bald eagle on the show. Oddly enough, he was in the country illegally.",
    "timestamp" : "19/07/2013 08:10",
    "user" : {
        "id_str" : "115485051",
        "name" : "Conan O'Brien",
        "screen_name" : "ConanOBrien",
        "description" : "The voice of the people. Sorry, people."
    }
}
Thanks in advance.
Yes, you can do this using the aggregation framework.
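A sketch of one way to do it (field names are taken from the sample document above; selected_ids stands in for the id_str values of your 8 sampled users, and the $slice expression in $project needs MongoDB 3.2 or newer):
db.status_store.aggregate([
    { "$match": { "user.id_str": { "$in": selected_ids } } },
    { "$sort": { "score": -1 } },
    { "$group": {
        "_id": "$user.id_str",
        "tweets": { "$push": { "id_str": "$id_str", "text": "$text", "score": "$score" } }
    } },
    { "$project": { "top3": { "$slice": ["$tweets", 3] } } }
])
Each result document then holds one user's three highest-scoring tweets, so the whole 8x3 dataset comes back from a single call.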
Another way would be to keep track of the top 3 scores in the user documents themselves. Whether this is faster depends on how often you write scores versus how often you read the top scores per user.
