Searching a MongoDB collection from the end (c#) - mongodb-.net-driver

I am looking for the most efficient way to get the last elements of a fairly large (> 1 million docs) MongoDB collection.
Specifically, it is the oplog collection and I am looking for all entries after a given timestamp. It makes no sense to search the first million or so entries for a timestamp larger than the current one, since they are all definitely older because the collection is stored in its natural order.
Is there a way to tell MongoDB to search from the end of a collection?
I tried a LINQ query with Skip(N), but it is very slow. It seems to walk through all documents from the beginning and simply not return the first N.

The most efficient way is probably using aggregation. If your collection is sorted, you can get the last Timestamp using this aggregation:
// Put every document into a single group and keep the last timeStamp seen.
var group = new BsonDocument
{
    {
        "$group", new BsonDocument
        {
            {"_id", 0},
            {"newestTimeStamp", new BsonDocument { {"$last", "$timeStamp"} } }
        }
    }
};
var pipeline = new[] {group};
var result = _dtCollection.Aggregate(pipeline);
Then you can deserialize the result into a Timestamp class. If you want to get several elements, you could create a similar expression using $match.
Also make sure to add an index to the collection on the timeStamp field. This will probably make your LINQ query faster if you decide to use that instead.
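To illustrate the reasoning in the question (entries are stored in natural order, so only the tail can be newer than a given timestamp), here is a minimal in-memory sketch in Python. It does not talk to MongoDB; the `oplog` list, the `ts` field and the `entries_after` helper are made up for the example. In MongoDB itself, the corresponding idea would probably be a descending sort on `$natural`, but check the driver documentation for the exact call.

```python
from typing import Dict, Iterator, List


def entries_after(oplog: List[Dict], last_seen_ts: int) -> Iterator[Dict]:
    """Yield entries newer than last_seen_ts, newest first, scanning from the end."""
    for entry in reversed(oplog):
        if entry["ts"] <= last_seen_ts:
            break  # everything before this entry is older still, so stop scanning
        yield entry


# Hypothetical stand-in for the oplog: timestamps increase in insertion order.
oplog = [{"ts": t, "op": "i"} for t in range(1, 1001)]

newest = list(entries_after(oplog, last_seen_ts=997))
print([e["ts"] for e in newest])  # [1000, 999, 998]
```

Only three entries are inspected before the scan stops, no matter how long the list is, which is the behaviour Skip(N) does not give you.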

Related

Getting the objects with similar secondary index in Riak?

Is there a way to get all the objects, in key/value format, that share the same secondary index value? I know we can get the list of keys for one secondary index (bucket/{{bucketName}}/index/{{index_name}}/{{index_val}}), but my requirements are such that I need the objects themselves too. I don't want to perform a separate query for each key to get the object details if there is a way around it.
I am completely new to Riak and I am totally a front-end guy, so please bear with me if something I ask is of novice level.
In Riak, it's sometimes the case that the better way is to do separate lookups for each key. Coming from other databases this seems strange, and likely inefficient, however you may find your query will be faster over an index and a bunch of single object gets, than a map/reduce for all the objects in a single go.
Try both these approaches, and see which turns out fastest for your dataset - variables that affect this are: size of data being queried; size of each document; power of your cluster; load the cluster is under etc.
Python code demonstrating the index and separate gets (if the data you're getting is large, this method can be made memory-efficient on the client, as you don't need to store all the objects in memory):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        return [v.key];
    }"""
)
results = query.run()
bucket = riak_client.bucket("bucket_name")
for key in results:
    obj = bucket.get(key)
    # .. do something with the object
Python code demonstrating a map/reduce for all objects (returns a list of {key:document} objects):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        var obj = Riak.mapValuesJson(v)[0];
        return [ {
            'key': v.key,
            'data': obj,
        } ];
    }"""
)
results = query.run()
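Since the advice above is to try both approaches and measure, here is a rough sketch of a timing harness. It runs without a Riak cluster: `store`, `index_then_gets` and `single_pass` are hypothetical in-memory stand-ins for the two queries, so the numbers it prints say nothing about Riak itself. Swap the two functions for your real calls.

```python
import time
from typing import Callable, Dict, List, Tuple


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical in-memory stand-in for a bucket with a secondary index "idx".
store: Dict[str, dict] = {f"key{i}": {"n": i, "idx": i % 3} for i in range(3000)}


def index_then_gets() -> List[dict]:
    keys = [k for k, v in store.items() if v["idx"] == 1]  # "index query": keys only
    return [store[k] for k in keys]                        # one "get" per key


def single_pass() -> List[dict]:
    # "map/reduce": objects come back in one go
    return [v for v in store.values() if v["idx"] == 1]


gets_result, gets_time = timed(index_then_gets)
mr_result, mr_time = timed(single_pass)
print(f"index + gets: {gets_time:.6f}s, single pass: {mr_time:.6f}s")
```

Run each variant several times against realistic data and cluster load before drawing conclusions; a single run is noisy.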

How do I get unique field values using rethinkdb javascript?

I have a field whose values repeat. For example, {country : 'US'} occurs multiple times in the table, and the same goes for other countries. I want to return an array that contains the distinct values of the 'country' field. I am new to creating databases, so this is likely a trivial question, but I couldn't find anything useful in the RethinkDB API.
Thanks
You can use distinct, but the distinct command was created for short sequences only.
If you have a lot of data, you can use map/reduce instead:
r.table("data").map(function(doc) {
    return r.object(doc("country"), true) // return { <country>: true }
}).reduce(function(left, right) {
    return left.merge(right)
}).keys() // return all the keys of the final document
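If the map/reduce trick is unclear, the same logic can be sketched in plain Python on an in-memory list. This is only an illustration of the idea (map each row to a one-key object, merge the objects, read the keys); `rows` is made-up sample data and nothing here uses the RethinkDB driver.

```python
from functools import reduce

# Made-up sample rows standing in for the table.
rows = [{"country": "US"}, {"country": "FR"}, {"country": "US"}, {"country": "DE"}]

# map: turn each row into {<country>: True}
mapped = [{row["country"]: True} for row in rows]

# reduce: merge the one-key dicts; duplicate keys collapse into one
merged = reduce(lambda left, right: {**left, **right}, mapped)

countries = list(merged.keys())
print(countries)  # ['US', 'FR', 'DE']
```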

Sorting a NotesDocumentCollection based on a date field in SSJS

Using server-side JavaScript, I need to sort a NotesDocumentCollection based on a field containing the date when each document was created, or on any built-in creation date field.
It would be nice if the function could take a sort option parameter so I could choose whether the result comes back in ascending or descending order.
The reason I need this is that I use database.getModifiedDocuments(), which returns an unsorted NotesDocumentCollection. I need to return the documents in descending order.
The following code is a modified snippet from openNTF which returns the collection in ascending order.
function sortColByDateItem(dc:NotesDocumentCollection, iName:String) {
    try {
        var rl:java.util.Vector = new java.util.Vector();
        // A TreeMap keeps its keys sorted (ascending by default).
        // Note: documents with an identical date overwrite each other.
        var tm:java.util.TreeMap = new java.util.TreeMap();
        var doc:NotesDocument = dc.getFirstDocument();
        while (doc != null) {
            tm.put(doc.getItemValueDateTimeArray(iName)[0].toJavaDate(), doc);
            doc = dc.getNextDocument(doc);
        }
        // Copy the documents out in key (date) order.
        var tCol:java.util.Collection = tm.values();
        var tIt:java.util.Iterator = tCol.iterator();
        while (tIt.hasNext()) {
            rl.add(tIt.next());
        }
        return rl;
    } catch(e) {
        // errors are silently swallowed here
    }
}
When you construct the TreeMap, pass a Comparator to the constructor. This lets you define custom sorting instead of the "natural" ordering, which is ascending by default. Alternatively, you can call descendingMap on the TreeMap to get a view of it in reverse order.
This is a very expensive approach if you are dealing with a large number of documents. I mostly use a NotesViewEntryCollection (always sorted according to the source view) or a view navigator.
For large databases, you can use a view sorted by the modified date and navigate through its entries back to the date your code last ran (which you have to save somewhere).
For smaller operations, Tim's method is great!
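The "sort option parameter" asked for in the question boils down to passing the direction into the sort. Here is a small Python sketch of that idea only, not SSJS: `sort_by_date`, the `created` field and the sample dicts are invented for illustration and stand in for the documents and their date item.

```python
from datetime import datetime
from typing import Dict, List


def sort_by_date(docs: List[Dict], item_name: str, descending: bool = False) -> List[Dict]:
    """Return a new list of docs ordered by the date stored under item_name."""
    return sorted(docs, key=lambda d: d[item_name], reverse=descending)


# Made-up documents; "created" plays the role of the date item.
docs = [
    {"id": "a", "created": datetime(2014, 3, 1)},
    {"id": "b", "created": datetime(2014, 1, 15)},
    {"id": "c", "created": datetime(2014, 2, 10)},
]

print([d["id"] for d in sort_by_date(docs, "created")])                   # ['b', 'c', 'a']
print([d["id"] for d in sort_by_date(docs, "created", descending=True)])  # ['a', 'c', 'b']
```

Unlike a map keyed by date, sorting a list keeps documents that share the same date.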

Rearranging active record elements in Yii

I am using a CDbCriteria with its own conditions, with & order clauses. However, the order I want to give to the elements in the array is way too complex to specify in the order clause.
The solution I have in mind consists of obtaining the active records with the defined criteria like this
$theModelsINeed = MyModel::model()->findAll($criteria);
and then rearranging the order in my PHP code. How can I do this? I mean, I know how to iterate through the elements, but I don't know if it is possible to actually reorder them.
I have been looking into this link about populating active records, but it seems quite complicated and maybe someone could have some better advice.
Thanks
There is nothing special about Yii's active records. The find family of methods will return an array of objects, and you can sort this array like any other array in PHP.
If you have complex sort criteria, this means that probably the best tool for this is usort. Since you will be dealing with objects, your user-defined comparison functions will look something like this:
function compare($x, $y)
{
    // First sort criterion: $obj->Name
    if ($x->Name != $y->Name) {
        return $x->Name < $y->Name ? -1 : 1; // this is an ascending sort
    }
    // Second sort criterion: $obj->Age
    if ($x->Age != $y->Age) {
        return $x->Age < $y->Age ? 1 : -1; // this is a descending sort
    }
    // Add more criteria here
    return 0; // if we get this far, the items are equal
}
// Sort the array returned by findAll() in place
usort($theModelsINeed, 'compare');
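For comparison, the same two-criteria ordering (name ascending, then age descending) can be sketched in Python with `functools.cmp_to_key`. The `Person` class and the sample data are made up for the example; the comparison function mirrors the PHP one above.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Person:
    name: str
    age: int


def compare(x: Person, y: Person) -> int:
    # First criterion: name, ascending
    if x.name != y.name:
        return -1 if x.name < y.name else 1
    # Second criterion: age, descending
    if x.age != y.age:
        return 1 if x.age < y.age else -1
    return 0  # the items are equal


people = [Person("Bob", 30), Person("Alice", 25), Person("Alice", 40)]
people.sort(key=cmp_to_key(compare))
print([(p.name, p.age) for p in people])  # [('Alice', 40), ('Alice', 25), ('Bob', 30)]
```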
If you do want to get an array as a result, you can use this method for fetching data that supports dbCriteria:
$model = MyModel::model()->myScope();
$model->dbCriteria->condition .= " AND date BETWEEN :d1 AND :d2";
$model->dbCriteria->order = 'field1 ASC, field2 DESC';
$model->dbCriteria->params = array(':d1'=>$d1, ':d2'=>$d2);
$theModelsINeed = $model->getCommandBuilder()
    ->createFindCommand($model->tableSchema, $model->dbCriteria)
    ->queryAll();
The above example shows using a defined scope and modifying the condition with named parameters.
If you don't need Active Record, you could also look into Query Builder, but the above method has worked pretty well for me when I want to use AR but need an array for my result.

LINQ performance

I am reading records from a database, checking some conditions and storing them in a List<Result>, where Result is a class. I then run LINQ queries on the List<Result>, such as grouping and counting. There may be at least 50,000 records in the List<Result>. In that case, is it better to use LINQ, or to reinsert the records into the database and run the queries there?
Why not store it in an IQueryable instead of a List? Using LINQ to SQL or LINQ to Entities, the actual dataset will never be pulled into memory, and the queries will go down to the database to run.
Example:
Database db = new Database(); // this is what L2E gives you...
var children = db.Person.Where(p => p.Age < 21); // no actual database query performed
// will do: "select count(*) from Person where Age < 21"
int numChildren = children.Count();
var grouped = children.GroupBy(p => p.Age); // no actual query
int youngest = children.Min(p => p.Age); // performs query
int numYoungest = children.Count(p => p.Age == youngest); // performs query
var youngestNames = children.Where(p => p.Age == youngest).Select(p => p.Name); // no query
var anArray = youngestNames.ToArray(); // performs query
string names = string.Join(", ", anArray); // no query of course
I'm currently asking the same kind of thing myself. I don't really know the exact answer either, but from what I know, LINQ is not known for being fast on in-memory objects. Also, since a List is not indexed, an advanced query on it will probably require a lot of computation to get what you asked for. And since this code is generic, execution may be slower.
The best thing, if you are able, would be to do everything in one query, or even use a stored procedure to do your processing. Another possibility, if you are always checking the same initial condition, is to create a view and run your queries directly on it (instead of reinserting from the client). I think that with more than 50,000 results, using a list is probably not a good idea (memory and performance).
This probably doesn't answer your question directly, but short of benchmarking, you won't know. It really depends on what you are doing with the data.
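As a rough idea of what such a benchmark could look like, here is a small Python sketch that groups and counts 50,000 in-memory records and times it. It is not LINQ and the numbers will not carry over to .NET; the `status` and `value` fields and the generated data are made up. It only shows the shape of the measurement you would repeat for the in-memory and database variants.

```python
import random
import time
from collections import Counter

random.seed(0)  # fixed seed so runs are repeatable

# 50,000 made-up records standing in for List<Result>.
results = [
    {"status": random.choice(["ok", "warn", "fail"]), "value": random.random()}
    for _ in range(50_000)
]

start = time.perf_counter()
counts = Counter(r["status"] for r in results)  # group by status and count
elapsed = time.perf_counter() - start

print(dict(counts))
print(f"grouped {len(results)} records in {elapsed:.4f}s")
```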

Resources