CouchDB performance issue reducing pair of values

Using CouchDB, I'm getting very poor performance trying to compute a "last pass" and "last fail" time over a set of automated test results.
I have a DB of ~5000 records of the form:
{
  "completionTime": "2013-06-06T17:28:09.384Z",
  "environment": "ENV1",
  "passed": true,
  "duration": 59142,
  "summary": "",
  "origin": {
    "rowId": "1",
    "worksheet": "Sheet1",
    "workbook": "book.xlsm"
  }
}
I have a view defined with map:
function(run) {
  if (run.environment && run.origin && run.origin.rowId && run.origin.worksheet && run.origin.workbook && run.completionTime) {
    var key = [run.environment, run.origin.rowId, run.origin.worksheet, run.origin.workbook];
    var completionTime = Date.parse(run.completionTime);
    if (run.passed)
      emit(key, [completionTime, null]);
    else
      emit(key, [null, completionTime]);
  }
}
And reduce:
function (key, values, rereduce) {
  var latestPass = null;
  var latestFail = null;
  for (var i = 0; i < values.length; i++) {
    latestPass = Math.max(values[i][0], latestPass);
    latestFail = Math.max(values[i][1], latestFail);
  }
  return [latestPass, latestFail];
}
When querying this view for all results (about 750), it takes anywhere from 10-50 seconds, which is significantly slower than I'd expect.
Am I doing something obviously wrong?

From my limited experience with CouchDB view tuning, I found that writing the view in Erlang significantly improved performance.
Start with this: http://wiki.apache.org/couchdb/EnableErlangViews
Then write your view in Erlang (some examples): Emit Tuples From Erlang Views In CouchDB
Getting the syntax of Erlang views right is a bit tricky, but it's fun to try, and I saw a little over a 50% increase in performance compared to JavaScript views.

I switched to MongoDB, and the same queries ran in hundreds of milliseconds, rather than tens of seconds.
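One detail worth checking in the reduce function from the question: Math.max(null, null) evaluates to 0, so a key with no failures comes back with 0 rather than null. Below is a minimal sketch of a null-preserving variant. The names laterOf and reduceLatest are made up for illustration, and this only addresses correctness; it is not claimed to explain the slow query.

```javascript
// Hypothetical helper: the later of two timestamps, where null means "no result yet".
function laterOf(a, b) {
  if (a === null) return b;
  if (b === null) return a;
  return Math.max(a, b);
}

// Same contract as the reduce in the question: each value is a [passTime, failTime] pair.
// The output has the same shape as the input, so rereduce needs no special handling.
function reduceLatest(keys, values, rereduce) {
  var latestPass = null;
  var latestFail = null;
  for (var i = 0; i < values.length; i++) {
    latestPass = laterOf(latestPass, values[i][0]);
    latestFail = laterOf(latestFail, values[i][1]);
  }
  return [latestPass, latestFail];
}

console.log(reduceLatest(null, [[5, null], [null, 3], [7, null]], false));
```

In a design document the helper would have to be defined inside the reduce function itself, since CouchDB expects a single function expression.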


Cypress - Loop looking for data and refresh if not found

I need to loop looking for an item in a table and, if it's not found, click a refresh button to reload the table. I know I can't use a simple while loop due to the asynchronous nature of Cypress. Is there another way to accomplish something like this?
I tried to tweak an example from another post but no luck. Here's my failed attempt.
let arry = []
for (let i = 0; i < 60; i++) { arry.push(i) }
cy.wrap(arry).each(() => {
  cy.get('table[class*="MyTableClass"]').then(function($lookForTheItemInTheTable) {
    if ($lookForTheItemInTheTable.find("MySearchValue")) {
      return true
    }
    else {
      cy.get('a[class*="selRefreshTable"]').click()
      cy.wait(2000)
    }
  })
})
Cypress is bundled with lodash, so instead of using a for loop you can use Cypress._.times(). However, I wouldn't recommend this for your situation, since you do not know in advance how many iterations you will need.
You'll want to use the cypress-recurse plugin instead, as in this example from its repo:
import { recurse } from 'cypress-recurse'
it('gets 7 after 50 iterations or 30 seconds', () => {
  recurse(
    () => cy.task('randomNumber'), // actions you want to iterate
    (n) => n === 7, // until this condition is satisfied
    { // options to pass along
      log: true,
      limit: 50, // max number of iterations
      timeout: 30000, // time limit in ms
      delay: 300 // delay before next iteration, ms
    },
  )
})
Even so, there may be a simpler way to solve your problem: set up your app so that the table always displays what you are looking for on the first try.

Different size for the same view on CouchDB

I have two databases with similar data (organized differently), and I've created a view for each one that returns the same response. I have noticed that the query response time differs even though the response is the same: approximately 3182 ms for one and 217 ms for the other, after querying each 5 times.
I query both using:
curl -X GET ...db1/_design/query1/_view/q1?group=true and
curl -X GET ...db2/_design/query1/_view/q1?group=true.
I have checked the data sizes of the design documents using curl -X GET ...db1/_design/query1/_info. The design data size of the first is 146073878 bytes and that of the second is 3739596 bytes.
I thought both should have the same size, because they return the same view and I haven't used any filters; both views are equal.
Can somebody explain why the same view created from different databases has different sizes?
My data is organized using two different roots. The data is the same; only the root changes:
Customer data in the root:
{
"c_customer_sk": 65836,
"c_first_name": "Frank",
"c_last_name": "White",
"store_sales": [
{
"ss_sales_price": 20.24,
"ss_ext_sales_price": 1012,
"ss_coupon_amt": 0,
"date": [
{
"d_month_seq": 1187,
"d_year": 1998
}
],
"item": [
{
"i_item_sk": 10454,
"i_item_id": "AAAAAAAAGNICAAAA",
"i_item_desc": "Results highlight as patterns; so right years show. Sometimes suitable lips move with the critics. English, old mothers ought to lift now perhaps future managers. Active, single ch",
"i_current_price": 2.88,
"i_class": "romance",
"i_category_id": 9,
"i_category": "Books"
}
]
},
{
"ss_sales_price": 225,
"ss_ext_sales_price": 1023,
"ss_coupon_amt": 0,...
View function for customer in the root:
function(doc)
{
  for each (store_sales in doc.store_sales) {
    var s=store_sales.ss_ext_sales_price;
    if(s==null){s=0}
    for each (item in store_sales.item){
      var item_id=item.i_item_id;
      var item_desc=item.i_item_desc;
      var category=item.i_category;
      var class=item.i_class;
      var price=item.i_current_price;
    }
    if(category=="Music" || category=="Home" || category=="Sports"){
      for each (date in store_sales.date){
        var g=date.d_month_seq;
      }
      if (g>=1200 && g<=1211){
        emit({item_id:item_id,item_desc:item_desc, category:category, class:class, current_price:price},s);
      }
    }
  }
}
reduce:_sum
Example of answer:
key:
{"item_id": "AAAAAAAAAAAEAAAA", "item_desc": "Rates expect probably necessary events. Circumstan", "category": "Sports", "class": "optics", "current_price": 3.99}
Value:
106079.49999999999
Item data in the root:
{
"i_item_sk": 10454,
"i_item_id": "AAAAAAAAGNICAAAA",
"i_item_desc": "Results highlight as patterns; so right years show. Sometimes suitable lips move with the critics. English, old mothers ought to lift now perhaps future managers. Active, single ch",
"i_current_price": 2.88,
"i_class": "romance",
"i_category_id": 9,
"i_category": "Books",
"store_sales": [
{
"ss_sales_price": 20.24,
"ss_ext_sales_price": 1012,
"ss_coupon_amt": 0,
"date": [
{
"d_month_seq": 1187,
"d_year": 1998
}
],
"customer": [
{
"c_customer_sk": 65836,
"c_first_name": "Frank",
"c_last_name": "White"
}
]
},
{
"ss_sales_price": 225,
"ss_ext_sales_price": 1023,
"ss_coupon_amt": 0,...
View for item on root:
function(doc)
{
  var item_id=doc.i_item_id;
  var item_desc=doc.i_item_desc;
  var category=doc.i_category;
  var class=doc.i_class;
  var price=doc.i_current_price;
  if(category=="Music" || category=="Home" || category=="Sports"){
    for each (store_sales in doc.store_sales) {
      var s=store_sales.ss_ext_sales_price;
      if(s==null){s=0}
      for each (date in store_sales.date){
        var g=date.d_month_seq;
      }
      if (g>=1200 && g<=1211){
        emit({item_id:item_id,item_desc:item_desc, category:category, class:class, current_price:price},s);
      }
    }
  }
}
reduce:_sum
Returning the same answer.
I have run cleanup and compaction on the design documents, and the database with the item data in the root still responds much faster and has a smaller data size too, but I don't know why.
Can someone explain this to me?
Could it be a difference in database compaction? When you replicate an existing database to an empty one, only the last revision of each document is sent to the new one, making it potentially much lighter. The same applies to views.

Performance of the $near operation in MongoDB (Meteor)

I am using Meteor to implement a 'near' query.
It works well, but I am wondering about its performance on the server side.
This is the code for the near query:
var geolocation = Session.get('location');
var lnglat = [0, 0];
if (geolocation) {
  lnglat = [geolocation.longitude, geolocation.latitude];
}
if (Session.get('type') === 'near') {
  return Posts.find({
    location: {
      $near: {
        $geometry: {
          type: "Point",
          coordinates: lnglat
        },
        $maxDistance: 20000 // meters
      }
    }
  });
}
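The most authoritative answer lies in the MongoDB source code on GitHub. In practice, according to the documentation, the important thing is to make sure the queried field has a geospatial index (a 2dsphere index for a $near query that uses $geometry). Link: http://docs.mongodb.org/manual/core/geospatial-indexes/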
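As a rough illustration of the index point: a $near query with a $geometry point needs a 2dsphere index on the queried field. The sketch below only builds the selector in plain JavaScript, so the coordinate order is easy to check. buildNearSelector is a hypothetical helper, and the index call shown in the comment assumes the underlying collection is named posts.

```javascript
// Assumption: in the mongo shell the index would be created once with something like
//   db.posts.createIndex({ location: "2dsphere" })
// ("posts" is a guessed collection name; the question only shows the Meteor Posts object).

// Hypothetical helper: build the selector that is passed to Posts.find(...).
// GeoJSON points are [longitude, latitude], in that order.
function buildNearSelector(geolocation, maxDistanceMeters) {
  var lnglat = [0, 0]; // same fallback as the code in the question
  if (geolocation) {
    lnglat = [geolocation.longitude, geolocation.latitude];
  }
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: lnglat },
        $maxDistance: maxDistanceMeters
      }
    }
  };
}

var selector = buildNearSelector({ longitude: 2.35, latitude: 48.85 }, 20000);
console.log(JSON.stringify(selector));
```

With the index in place, the $maxDistance bound limits how far the index scan has to look, which is generally what keeps this kind of query cheap on the server.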

How can I validate DBRefs in a MongoDB collection?

Assuming I've got a MongoDB instance with 2 collections - places and people.
A typical places document looks like:
{
  "_id": "someID",
  "name": "Broadway Center",
  "url": "bc.example.net"
}
And a people document looks like:
{
  "name": "Erin",
  "place": DBRef("places", "someID"),
  "url": "bc.example.net/Erin"
}
Is there any way to validate the places DBRef of every document in the people collection?
There's no official/built-in method to test the validity of DBRefs, so the validation must be performed manually.
I wrote a small script - validateDBRefs.js:
var returnIdFunc = function(doc) { return doc._id; };
var allPlaceIds = db.places.find({}, {_id: 1} ).map(returnIdFunc);
var peopleWithInvalidRefs = db.people.find({"place.$id": {$nin: allPlaceIds}}).map(returnIdFunc);
print("Found the following documents with invalid DBRefs");
var length = peopleWithInvalidRefs.length;
for (var i = 0; i < length; i++) {
  print(peopleWithInvalidRefs[i]);
}
When run with:
mongo DB_NAME validateDBRefs.js
it will output:
Found the following documents with invalid DBRefs
513c4c25589446268f62f487
513c4c26589446268f62f48a
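The logic of the script can also be sketched on plain arrays, which makes it easy to try without a database. The sample documents and the findInvalidRefs name are made up for illustration, and DBRefs are represented here as plain objects with $ref and $id fields.

```javascript
// Hypothetical helper mirroring validateDBRefs.js on in-memory data.
// Assumes ids are strings; real ObjectId values would need to be compared by their string form.
function findInvalidRefs(places, people) {
  // Collect every existing place _id once, for fast lookups.
  var placeIds = new Set(places.map(function (doc) { return doc._id; }));
  // Like $nin in the script, this also flags people that have no place reference at all.
  return people
    .filter(function (doc) { return !doc.place || !placeIds.has(doc.place.$id); })
    .map(function (doc) { return doc._id; });
}

var places = [{ _id: "someID", name: "Broadway Center" }];
var people = [
  { _id: "p1", name: "Erin", place: { $ref: "places", $id: "someID" } },
  { _id: "p2", name: "Sam", place: { $ref: "places", $id: "missingID" } }
];
// Only p2 points at a place that does not exist.
console.log(findInvalidRefs(places, people));
```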
You could add a stored function for that. Note, though, that the MongoDB documentation discourages the use of stored functions.
In essence, you create a function:
db.system.js.save(
  {
    _id : "myAddFunction" ,
    value : function (x, y){ return x + y; }
  }
);
Once the function is created, you can use it in your $where clauses. So you could write a function that checks whether the id in the DBRef exists.

Map reduce to count tags

I am developing a web app using Codeigniter and MongoDB.
I am trying to get the map reduce to work.
I have a file document with the structure below. I would like to run a map reduce to count how many times each tag is used and output the result to the collection files.tags.
{
  "_id": {
    "$id": "4f26f21f09ab66c1030d0000e"
  },
  "basic": {
    "name": "The filename"
  },
  "tags": [
    "lorry",
    "house",
    "car",
    "bicycle"
  ],
  "updated_at": "2012-02-09 11:08:03"
}
I tried this map reduce command but it does not count each individual tag:
$map = new MongoCode("function() {
    emit({tags: this.tags}, {count: 1});
}");
$reduce = new MongoCode("function(key, values) {
    var count = 0;
    values.forEach(function(v) {
        count += v['count'];
    });
    return {count: count};
}");
$this->mongo_db->command(array(
    "mapreduce" => "files",
    "map" => $map,
    "reduce" => $reduce,
    "out" => "files.tags"
));
Change your Map function to:
function map() {
  if (!this.tags) return;
  this.tags.forEach(function(tag) {
    emit(tag, {count: 1});
  });
}
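Yes, your map/reduce only counts documents per whole tags array, not per tag.
The MongoDB cookbook has the example you are looking for.
You have to emit each tag instead of the entire array of tags:
map = function() {
  if (!this.tags) {
    return;
  }
  // Emit one (tag, 1) pair per tag. Since the values are plain numbers here,
  // the reduce has to sum them directly (count += v) rather than v['count'].
  for (var index in this.tags) {
    emit(this.tags[index], 1);
  }
}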
You'll need to call emit once for each tag in the input documents.
The MongoDB documentation, for example, says:
"A map function calls emit(key,value) any number of times to feed data to the reducer. In most cases you will emit once per input document, but in some cases such as counting tags, a given document may have one, many, or even zero tags."
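To see why emitting once per tag gives per-tag counts, the map and reduce steps can be simulated in plain JavaScript. This is only a sketch: countTags and the sample documents are hypothetical, and the grouping object stands in for what MongoDB does between the map and reduce phases.

```javascript
// Hypothetical in-memory simulation of the tag-counting map/reduce.
function countTags(docs) {
  var grouped = Object.create(null); // tag -> list of emitted values
  function emit(key, value) {
    (grouped[key] = grouped[key] || []).push(value);
  }
  // Map step: one emit per tag; documents without tags emit nothing.
  docs.forEach(function (doc) {
    if (!doc.tags) return;
    doc.tags.forEach(function (tag) { emit(tag, { count: 1 }); });
  });
  // Reduce step: same summing logic as the reduce in the question.
  var result = {};
  Object.keys(grouped).forEach(function (tag) {
    var count = 0;
    grouped[tag].forEach(function (v) { count += v.count; });
    result[tag] = { count: count };
  });
  return result;
}

var docs = [
  { tags: ["lorry", "house", "car", "bicycle"] },
  { tags: ["car", "house"] },
  { basic: { name: "untagged" } }
];
console.log(countTags(docs));
```

Emitting { tags: this.tags } as the key, as in the question, would instead create one group per distinct array of tags.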
