IndexedDB: sorting an index based on an integer value

I am using Dexie:
var db = new Dexie('name');
db.version(4)
  .stores({
    sentence: "&sentenceId, [sentence_authorId+sentenceChapter+sentenceNo], sentenceContent, headingContent, sentenceStarts, sentenceEnds"
  });
and I need to get the result sorted by sentenceNo:
db.sentence.where('[sentence_authorId+sentenceChapter+sentenceNo]')
.between([articleId, chapter_selected.toString(), -Infinity], [articleId, chapter_selected.toString(), '\uffff'])
.each (function (sentence) {
//............
});
But the result I get from the above query is ordered as if an integer column had been sorted as strings,
e.g. 1, 11, 12, 13, 14, ...
How can I sort sentenceNo as an integer,
e.g. 1, 2, 3, 4, ...?

Are you sure that the sentenceNo values are not stored as strings instead of numbers? IndexedDB orders the entries of an index according to the type of each key.
If you're certain that typeof sentenceNo === 'number', the only thing I can think of is that you're using Safari < 10 with IndexedDBShim. Theoretically, the shim could have this kind of bug, but that is just a theory.
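To see why the stored type matters, the following plain JavaScript sketch sorts the same sentence numbers once as strings and once as numbers. It does not touch IndexedDB or Dexie. The sample values are made up, and the array sort only mimics how string keys and number keys end up ordered.

```javascript
// Hypothetical sentence numbers, once stored as strings and once as numbers.
const asStrings = ["1", "2", "3", "11", "12"];
const asNumbers = asStrings.map(Number);

// Strings are compared character by character, so "11" sorts before "2".
const stringOrder = [...asStrings].sort();

// Numbers are compared by value.
const numberOrder = [...asNumbers].sort((a, b) => a - b);

console.log(stringOrder.join(",")); // 1,11,12,2,3
console.log(numberOrder.join(",")); // 1,2,3,11,12
```

If the stored values turn out to be strings, converting them with Number(...) before they are written should give the numeric order the question asks for.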


Slow AQL and data type conversion, how can I improve my AQL performance?

Hello ArangoDB community,
I have imported two collections from sqlite to ArangoDB with arangoimport (via a CSV).
Next, I try to run a simple AQL to cross reference these collections (with an end goal to connect them via edges).
Collection1 has 1,682,642 documents
Collection2 has 3,290 documents
The following AQL takes a whopping 30 seconds to complete:
FOR c1 IN Collection1
FOR c2 IN Collection2
FILTER c2._key == TO_STRING(c1.someField) return {"C2": c2._id, "C1": c1._id}
If I switch the conversion like so, it takes forever (I abandoned after 5 minutes):
FOR c1 IN Collection1
FOR c2 IN Collection2
FILTER TO_NUMBER(c2._key) == c1.someField return {"C2": c2._id, "C1": c1._id}
Adding an index on "someField" didn't help.
The same JOIN query in SQLite (from which the data was imported) takes less than 1 second to complete.
A few thoughts and questions:
1) How can I know the data types of the fields in a document?
2) _key is a string. I think "someField" is a number (because without the TO_STRING, no results are returned).
3) Does adding TO_STRING on "someField" effectively make the index on the field unusable?
4) Is there a way to make _key a number (preferably an integer)? I think number comparison is faster, is it not?
5) Alternatively, can I tell arangoimport to force "someField" to be a string?
6) Is there anything else I can do to make the AQL run faster?
Any input appreciated,
Elad
The supported data types follow the JSON specification. You can determine the data types by looking at a document, e.g. using the Web UI. Use the Code view mode in the document editor to see the document as JSON.
For example, "Aerosmith" is a string, 1973 is a number, genres are strings in an [ ... ] array, and each song is an { ... } object. There are also the null, true and false literals.
For a programmatic way to determine the data type of an attribute there are Type check functions, e.g. TYPENAME() to return the data type name as string. Example query to count how often the attribute someField is of which data type:
RETURN MERGE(
  FOR c1 IN Collection1
    COLLECT type = TYPENAME(c1.someField) WITH COUNT INTO count
    RETURN { [type]: count }
)
_key is indeed always a string. You can use the above query if you are unsure what type someField is. Please share this information.
If you cast a value which is only known at run-time (here: document attribute) to a different type then yes, no index can be utilized. An index lookup is only possible if you query for a value as-is. However, you may type-cast bind variables and other constant values, as they are known at query compile time.
No, the document key is always a string. There is an index on the _key attribute (the primary index), hence there is no performance penalty because it is a string instead of a numeric value.
arangoimport has an option to convert numeric strings to numbers, "null" to null and "true" / "false" to Boolean values (--convert), but there is no option to force an attribute to become a string. There is a feature request to add the ability to prescribe the desired data types.
In case you want numeric strings to stay strings, use --convert false to turn the auto-conversion off. If the values are numbers in the source file (not in quote marks), then you can adjust the file before you import it. You can also use a one-off AQL query to convert an attribute to a certain data type:
FOR c1 IN Collection1
  UPDATE c1 WITH { someField: TO_STRING(c1.someField) } IN Collection1
I assume that in SQLite the primary key was an integer value, and therefore so were the references to it (foreign keys). Because the primary key must be a string in ArangoDB, the references need to be of type string as well, so change the documents to store foreign keys as strings. Add a hash index to Collection1 on someField (the field you use for the join). Then this query should be fast and return the expected result:
FOR c1 IN Collection1
FOR c2 IN Collection2
FILTER c2._key == c1.someField
RETURN { C2: c2._id, C1: c1._id }
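The effect of matching key types can be illustrated outside the database. The sketch below is plain JavaScript, not ArangoDB code: a Map stands in for the primary index on _key, and the sample documents and the function name joinByKey are invented for the illustration. A direct lookup only finds a match when someField has the same type as _key.

```javascript
// Join two in-memory "collections" on a string key, using a Map as the index.
function joinByKey(collection1, collection2) {
  // Build the lookup table once; _key values are always strings.
  const byKey = new Map(collection2.map((c2) => [c2._key, c2]));
  const pairs = [];
  for (const c1 of collection1) {
    // Direct lookup, no conversion: works only if someField is a string too.
    const c2 = byKey.get(c1.someField);
    if (c2 !== undefined) {
      pairs.push({ C2: c2._id, C1: c1._id });
    }
  }
  return pairs;
}

// Hypothetical sample documents.
const collection2 = [
  { _id: "Collection2/1", _key: "1" },
  { _id: "Collection2/2", _key: "2" },
];
const stringRefs = [
  { _id: "Collection1/a", someField: "2" },
  { _id: "Collection1/b", someField: "7" },
];
const numberRefs = [{ _id: "Collection1/c", someField: 2 }];

console.log(joinByKey(stringRefs, collection2)); // one pair: Collection2/2 with Collection1/a
console.log(joinByKey(numberRefs, collection2)); // empty: the number 2 is not the string "2"
```

The second call mirrors the observation in the question that the join returns nothing without TO_STRING.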

Searching a MongoDB collection from the end (c#)

I am looking for the most efficient way to get the last elements of a fairly large (> 1 million docs) MongoDB collection.
Specifically, it is the oplog collection and I am looking for all entries after a given timestamp. It makes no sense to search the first million or so entries for a timestamp larger than the current one, since they are all definitely older because the collection is stored in its natural order.
Is there a way to tell MongoDB to search from the end of a collection?
I tried a LINQ query with Skip(N), but it's very slow. It seems to go through all documents from the beginning and simply not return the first N.
The most efficient way is probably using aggregation. If your collection is sorted, you can get the last Timestamp using this aggregation:
var group = new BsonDocument
{
    {
        "$group", new BsonDocument
        {
            {"_id", 0},
            // $last takes the value from the last document in the group
            {"newestTimeStamp", new BsonDocument { {"$last", "$timeStamp"} } }
        }
    }
};
var pipeline = new[] { group };
var result = _dtCollection.Aggregate(pipeline);
Then you can deserialize the result into a Timestamp class. If you want to get several elements, you could create a similar expression using $match.
Also make sure to add an index to the collection on the TimeStamp field. This will probably make your LINQ query faster if you decide to use that instead.
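The idea from the question, reading a naturally ordered log from its newest end and stopping at the first entry that is not newer than the given timestamp, can be sketched in plain JavaScript. This is an in-memory illustration only, not MongoDB driver code. The function name entriesAfter and the field name ts are assumptions made for the example.

```javascript
// Collect all entries newer than `timestamp`, scanning from the end.
// Assumes `entries` is ordered from oldest to newest.
function entriesAfter(entries, timestamp) {
  const newer = [];
  for (let i = entries.length - 1; i >= 0; i--) {
    // Everything before this point is older, so the scan can stop here.
    if (entries[i].ts <= timestamp) break;
    newer.push(entries[i]);
  }
  // Entries were collected newest first; restore oldest-to-newest order.
  return newer.reverse();
}

// Hypothetical log entries with increasing timestamps.
const log = [{ ts: 1 }, { ts: 2 }, { ts: 3 }, { ts: 4 }, { ts: 5 }];
console.log(entriesAfter(log, 3).map((e) => e.ts).join(",")); // 4,5
```

The loop touches only the entries that are returned plus one more, which is the behaviour the question hopes to get from the database.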

How do I get unique field values using rethinkdb javascript?

I have a field which has similar values. For example, {country : 'US'} occurs multiple times in the table, and the same is true for other countries. I want to return an array that contains the distinct values of the 'country' field. I am new to creating databases, so this is likely a trivial question, but I couldn't find anything useful in the RethinkDB API.
Thanks
You can use distinct, but the distinct command was created for short sequences only.
If you have a lot of data, you can use map/reduce:
r.table("data").map(function(doc) {
  return r.object(doc("country"), true) // return { <country>: true }
}).reduce(function(left, right) {
  return left.merge(right)
}).keys() // return all the keys of the final document
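To make the steps of that map/reduce easier to follow, here is the same idea on an ordinary JavaScript array. This is not ReQL and does not talk to RethinkDB. The function name distinctCountries and the sample rows are made up for the sketch.

```javascript
// Emulate the map/reduce above on a plain array of row objects.
function distinctCountries(rows) {
  const merged = rows
    // map: turn each row into { <country>: true }
    .map((doc) => ({ [doc.country]: true }))
    // reduce: merge the small objects; duplicate keys collapse into one
    .reduce((left, right) => Object.assign({}, left, right), {});
  // keys: the remaining property names are the distinct countries
  return Object.keys(merged);
}

// Hypothetical rows.
const rows = [
  { country: "US" },
  { country: "DE" },
  { country: "US" },
  { country: "FR" },
];
console.log(distinctCountries(rows).join(",")); // US,DE,FR
```

Duplicates disappear because an object can hold each key only once, which is what the merge step in the query relies on as well.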

Sorting a NotesDocumentCollection based on a date field in SSJS

Using server-side JavaScript, I need to sort a NotesDocumentCollection based on a field containing the date when each document was created, or on any built-in field that records when the document was created.
It would be nice if the function could take a sort option parameter so that I could specify whether I want the result in ascending or descending order.
The reason I need this is that I use database.getModifiedDocuments(), which returns an unsorted NotesDocumentCollection. I need to return the documents in descending order.
The following code is a modified snippet from OpenNTF that returns the collection in ascending order.
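function sortColByDateItem(dc:NotesDocumentCollection, iName:String) {
    try {
        var rl:java.util.Vector = new java.util.Vector();
        // TreeMap keeps its keys in ascending order; documents with identical
        // dates overwrite each other because the date is the map key
        var tm:java.util.TreeMap = new java.util.TreeMap();
        var doc:NotesDocument = dc.getFirstDocument();
        while (doc != null) {
            tm.put(doc.getItemValueDateTimeArray(iName)[0].toJavaDate(), doc);
            doc = dc.getNextDocument(doc);
        }
        // Copy the documents into the result vector in key order
        var tCol:java.util.Collection = tm.values();
        var tIt:java.util.Iterator = tCol.iterator();
        while (tIt.hasNext()) {
            rl.add(tIt.next());
        }
        return rl;
    } catch(e) {
        // Errors are swallowed here, so the function returns undefined on failure
    }
}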
When you construct the TreeMap, pass a Comparator to the constructor. This allows you to define custom sorting instead of "natural" sorting, which by default sorts ascending. Alternatively, you can call descendingMap against the TreeMap to return a clone in reverse order.
This is a very expensive methodology if you are dealing with a large number of documents. I mostly use a NotesViewEntryCollection (always sorted according to the source view) or a view navigator.
For large databases, you may use a view sorted by the modified date and navigate through the entries of that view back to the most recent date on which your code was executed (which you have to save somewhere).
For smaller operations, Tim's method is great!
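The sort option parameter asked for in the question can be sketched with a comparator whose sign is flipped for descending order. The example is plain JavaScript on ordinary objects, not SSJS with the Notes classes. The function name sortByDate and the field name created are assumptions for the illustration.

```javascript
// Return a sorted copy of docs, ordered by the Date stored under fieldName.
// Pass descending = true to get the newest document first.
function sortByDate(docs, fieldName, descending) {
  var direction = descending ? -1 : 1;
  return docs.slice().sort(function (a, b) {
    // Flipping the sign of the difference reverses the order.
    return direction * (a[fieldName].getTime() - b[fieldName].getTime());
  });
}

// Hypothetical documents with a creation date.
var docs = [
  { id: "b", created: new Date(2020, 5, 1) },
  { id: "c", created: new Date(2021, 0, 15) },
  { id: "a", created: new Date(2019, 2, 10) }
];
var ids = function (list) {
  return list.map(function (d) { return d.id; }).join(",");
};
console.log(ids(sortByDate(docs, "created", false))); // a,b,c
console.log(ids(sortByDate(docs, "created", true)));  // c,b,a
```

Unlike a map keyed by date, sorting a list keeps documents that share the same date value.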

Rearranging active record elements in Yii

I am using a CDbCriteria with its own condition, with, and order clauses. However, the order I want to give to the elements in the array is far too complex to specify in the order clause.
The solution I have in mind consists of obtaining the active records with the defined criteria like this
$theModelsINeed = MyModel::model()->findAll($criteria);
and then rearranging the order from my PHP code. How can I do this? I mean, I know how to iterate through its elements, but I don't know if it is possible to actually change them.
I have been looking into this link about populating active records, but it seems quite complicated and maybe someone could have some better advice.
Thanks
There is nothing special about Yii's active records. The find family of methods will return an array of objects, and you can sort this array like any other array in PHP.
If you have complex sort criteria, the best tool is probably usort. Since you will be dealing with objects, your user-defined comparison function will look something like this:
function compare($x, $y)
{
    // First sort criterion: $obj->Name
    if ($x->Name != $y->Name) {
        return $x->Name < $y->Name ? -1 : 1; // this is an ascending sort
    }
    // Second sort criterion: $obj->Age
    if ($x->Age != $y->Age) {
        return $x->Age < $y->Age ? 1 : -1; // this is a descending sort
    }
    // Add more criteria here
    return 0; // if we get this far, the items are equal
}
// Sort the array of models in place using the comparison function
usort($theModelsINeed, 'compare');
If you do want to get an array as a result, you can use this method for fetching data that supports dbCriteria:
$model = MyModel::model()->myScope();
$model->dbCriteria->condition .= " AND date BETWEEN :d1 AND :d2";
$model->dbCriteria->order = 'field1 ASC, field2 DESC';
$model->dbCriteria->params = array(':d1'=>$d1, ':d2'=>$d2);
$theModelsINeed = $model->getCommandBuilder()
->createFindCommand($model->tableSchema, $model->dbCriteria)
->queryAll();
The above example shows using a defined scope and modifying the condition with named parameters.
If you don't need Active Record, you could also look into Query Builder, but the above method has worked pretty well for me when I want to use AR but need an array for my result.
