How to append to dexie entry using a rolling buffer (to store large entries without allocating GBs of memory) - dexie

I was redirected here after emailing the author of Dexie (David Fahlander). This is my question:
Is there a way to append to an existing Dexie entry? I need to store things that are large in dexie, but I'd like to be able to fill large entries with a rolling buffer rather than allocating one huge buffer and then doing a store.
For example, I have a 2 GB file I want to store in dexie. I want to store that file by writing 32 KB at a time into the same store, without having to allocate 2 GB of memory in the browser. Is there a way to do that? The put() method seems to only overwrite entries.

Thanks for putting your question here at stackoverflow :) This helps me build up an open knowledge base for everyone to access.
There's no way in IndexedDB to update an entry without also instantiating the whole entry. Dexie adds the update() and modify() methods, but they only emulate a way to alter certain properties. In the background, the entire document will always be loaded into memory temporarily.
IndexedDB also has Blob support, but when a Blob is stored into IndexedDB, its entire content is cloned/copied into the database by specification.
So the best way to deal with this would be to dedicate a table for dynamic large content and add new entries to it.
For example, let's say you have the tables "files" and "fileChunks". You need to incrementally grow the "file", and each time you do that, you don't want to instantiate the entire file in memory. You could then add the file chunks as separate entries into the fileChunks table.
let db = new Dexie('filedb');
db.version(1).stores({
    files: '++id, name',
    fileChunks: '++id, fileId'
});

/** Returns a Promise with the ID of the created file */
function createFile (name) {
    return db.files.add({name});
}

/** Appends contents to the file */
function appendFileContent (fileId, contentToAppend) {
    return db.fileChunks.add({fileId, chunk: contentToAppend});
}

/** Read the entire file */
function readEntireFile (fileId) {
    return db.fileChunks.where('fileId').equals(fileId).toArray()
        .then(entries => {
            return entries.map(entry => entry.chunk)
                .join(''); // join assumes the chunks are strings
        });
}
Easy enough. If you want appendFileContent to act as a rolling buffer (with a max size, erasing old content), you could add a truncate method:
function deleteOldChunks (fileId, maxAllowedChunks) {
    return db.fileChunks.where('fileId').equals(fileId)
        .reverse()                // Important, so that we keep the newest chunks
        .offset(maxAllowedChunks) // offset = skip the N newest chunks
        .delete();                // Deletes all records older than the N last records
}
You'd get other benefits as well, such as the ability to tail a stored file without loading its entire content into memory:
/** Tail a file. This function is just an example of how dynamically
 *  the data is stored and how simple file tailing would be to do. */
function tailFile (fileId, maxLines) {
    let result = [], numNewlines = 0;
    return db.fileChunks.where('fileId').equals(fileId)
        .reverse()
        .until(() => numNewlines >= maxLines)
        .each(entry => {
            result.unshift(entry.chunk);
            numNewlines += (entry.chunk.match(/\n/g) || []).length;
        })
        .then(() => {
            let lines = result.join('').split('\n')
                .slice(1); // First line may be cut off
            let overflowLines = lines.length - maxLines;
            return (overflowLines > 0 ?
                lines.slice(overflowLines) :
                lines).join('\n');
        });
}
The reason I know the chunks will come back in the correct order in readEntireFile() and tailFile() is that IndexedDB queries are always returned ordered primarily by the queried column and secondarily by the primary keys, which here are auto-incremented numbers.
This pattern could be used for other cases as well, like logging etc. If the file is not string based, you would have to alter this sample a little; specifically, you couldn't reassemble the content with join('') or split it with split('\n').
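For instance, if the chunks were stored as binary data (Uint8Array or Blob values passed to appendFileContent), a minimal sketch of the read side could reassemble them with a Blob instead of string joining. readEntireBinaryFile and the mimeType parameter are just illustrative names, not Dexie API:
/** Sketch: read a file whose chunks were stored as Uint8Array or Blob values.
 *  Assumes the same 'fileChunks' schema as above. */
function readEntireBinaryFile (fileId, mimeType) {
    return db.fileChunks.where('fileId').equals(fileId).toArray()
        .then(entries => new Blob(entries.map(entry => entry.chunk),
                                  {type: mimeType}));
}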

Related

How does Apollo paginated "read" and "merge" work?

I was reading through the docs to learn pagination approaches for Apollo. This is the simple example where they explain the paginated read function:
https://www.apollographql.com/docs/react/pagination/core-api#paginated-read-functions
Here is the relevant code snippet:
const cache = new InMemoryCache({
    typePolicies: {
        Query: {
            fields: {
                feed: {
                    read(existing, { args: { offset, limit }}) {
                        // A read function should always return undefined if existing is
                        // undefined. Returning undefined signals that the field is
                        // missing from the cache, which instructs Apollo Client to
                        // fetch its value from your GraphQL server.
                        return existing && existing.slice(offset, offset + limit);
                    },
                    // The keyArgs list and merge function are the same as above.
                    keyArgs: [],
                    merge(existing, incoming, { args: { offset = 0 }}) {
                        const merged = existing ? existing.slice(0) : [];
                        for (let i = 0; i < incoming.length; ++i) {
                            merged[offset + i] = incoming[i];
                        }
                        return merged;
                    },
                },
            },
        },
    },
});
I have one major question around this snippet and more snippets from the docs that have the same "flaw" in my eyes, but I feel like I'm missing some piece.
Suppose I run a first query with offset=0 and limit=10. The server returns 10 results, which are stored in the cache after passing through the merge function.
Afterwards, I run the query with offset=5 and limit=10. Based on the approach described in the docs and the above code snippet, my understanding is that I will get only the items from 5 through 10 instead of items from 5 to 15, because Apollo will see that the existing variable is present in read (with existing holding the initial 10 items) and will slice the available 5 items for me.
My question is - what am I missing? How will Apollo know to fetch new data from the server? How will new data arrive in the cache after the initial query? Keep in mind keyArgs is set to [], so the results will always be merged into a single item in the cache.
Apollo will not slice anything automatically. You have to define a merge function that keeps the data in the correct order in the cache. One approach is to keep an array with empty slots for data not yet fetched and place incoming data at its absolute indexes. For instance, if you fetch items 30-40 out of a total of 100, your array would have 30 empty slots, then your items, then 60 empty slots. If you subsequently fetch items 70-80, those are placed at their respective indexes, and so on.
Your read function is where the decision on whether a network request is necessary is made. If you find all the data in existing, you return it and no request to the server is made. If any items are missing, you need to return undefined, which triggers a network request; your merge function then runs once the data is fetched, and finally your read function runs again, only this time the data is in the cache and it can be returned.
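A minimal sketch of that idea, assuming the same feed field as in the snippet above (the hole-checking loop in read is illustrative, not built-in Apollo behavior):
import { InMemoryCache } from '@apollo/client';

const cache = new InMemoryCache({
    typePolicies: {
        Query: {
            fields: {
                feed: {
                    keyArgs: [],
                    read(existing, { args: { offset = 0, limit = 10 }}) {
                        // If anything in the requested window is missing,
                        // return undefined so Apollo goes to the network.
                        if (!existing) return undefined;
                        for (let i = offset; i < offset + limit; ++i) {
                            if (existing[i] === undefined) return undefined;
                        }
                        return existing.slice(offset, offset + limit);
                    },
                    merge(existing, incoming, { args: { offset = 0 }}) {
                        // Keep a (possibly sparse) array and write incoming items
                        // at their absolute positions, leaving holes for ranges
                        // that haven't been fetched yet.
                        const merged = existing ? existing.slice(0) : [];
                        for (let i = 0; i < incoming.length; ++i) {
                            merged[offset + i] = incoming[i];
                        }
                        return merged;
                    },
                },
            },
        },
    },
});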
This approach is for the cache-first fetch policy, which is the default.
The logic for returning undefined from your read function is implemented by you. There is no Apollo magic under the hood.
If you use the cache-and-network policy, your read doesn't need to return undefined when data is only partially present, since a network request is made regardless.
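For reference, the fetch policy is chosen per query; a small sketch (FEED_QUERY and its fields are placeholders):
import { gql, useQuery } from '@apollo/client';

// Placeholder query; substitute your own paginated field and selection set.
const FEED_QUERY = gql`
    query Feed($offset: Int, $limit: Int) {
        feed(offset: $offset, limit: $limit) {
            id
            text
        }
    }
`;

function FeedList() {
    // 'cache-and-network' renders whatever is cached immediately while
    // also sending a request to refresh/extend the data.
    const { data, loading } = useQuery(FEED_QUERY, {
        variables: { offset: 0, limit: 10 },
        fetchPolicy: 'cache-and-network',
    });
    if (loading && !data) return 'Loading...';
    return data.feed.map(item => item.text).join('\n');
}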

Getting the objects with similar secondary index in Riak?

Is there a way to get all the objects, in key/value format, that share the same secondary index value? I know we can get the list of keys for one secondary index (bucket/{{bucketName}}/index/{{index_name}}/{{index_val}}), but my requirements are such that I'd like to get the objects themselves too. I don't want to perform a separate query for each key to get the object details if there is a way around it.
I am completely new to Riak and I am totally a front-end guy, so please bear with me if something I ask is of novice level.
In Riak, it's sometimes the case that the better way is to do separate lookups for each key. Coming from other databases this seems strange, and likely inefficient; however, you may find that an index query followed by a bunch of single object gets is faster than a map/reduce that fetches all the objects in a single go.
Try both approaches and see which turns out fastest for your dataset - the variables that affect this are: the size of the data being queried; the size of each document; the power of your cluster; the load the cluster is under; etc.
Python code demonstrating the index and separate gets (if the data you're getting is large, this method can be made memory-efficient on the client, as you don't need to store all the objects in memory):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        return [v.key];
    }"""
)
results = query.run()

bucket = riak_client.bucket("bucket_name")
for key in results:
    obj = bucket.get(key)
    # .. do something with the object
Python code demonstrating a map/reduce for all objects (returns a list of {key:document} objects):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        var obj = Riak.mapValuesJson(v)[0];
        return [ {
            'key': v.key,
            'data': obj,
        } ];
    }"""
)
results = query.run()

Insert lots of data at once using Laravel migrations?

I currently parse a CSV file to insert data into a database, but the problem is that because it has 20,000 rows, it takes a very long time. Is there a way to insert more rows at once using Laravel migrations?
This is what I am doing at the moment:
foreach ($towns as $town) {
    DB::table('town')->insert(
        array(
            // data goes here
        )
    );
}
I think maybe my question is a bit vague. I want to know what the format is to mass insert multiple items using one query, and if this will actually make a difference in speed?
You can mass insert by filling an array with your data:
foreach ($towns as $town) {
    $array[] = array(... your data goes here...);
}
And then run it just once
DB::table('town')->insert($array);
But I really don't know how much faster it can be. You can also disable query log:
DB::disableQueryLog();
It uses less memory and is usually faster.

jqGrid - After re-ordering columns, sorting maps data to original columns

EDIT: Final solution is below.
Whether I try to implement column re-ordering via header dragging or via the column chooser plugin, after re-ordering the columns, clicking on any column header to sort results in the sorted columns being loaded into their original positions in the table. Using the sortable method:
sortable: {
    update: function (perm) {
        /*
         * code to save the new colModel goes here
         */

        // the following line doesn't seem to do anything... it just returns an array identical to 'perm'
        $("#mainGrid").jqGrid("getGridParam", "remapColumns");

        // if included, the next line causes the headers to not move
        $("#mainGrid").jqGrid("remapColumns", perm, true);

        // this alternative allows them to move, but the newly sorted columns still get remapped to their original position
        $("#mainGrid").jqGrid("remapColumns", [0,1,2,3,4,5,6,7,8,9,10,11,12], true);

        /* the following allows the headers to move, and allows the sort to occur ONLY
         * if the order coming back from the database is unchanged. Note that in my real
         * code I create an array of consecutive integers to pass as the first param to
         * remapColumns()
         */
        $("#mainGrid").jqGrid("remapColumns", [0,1,2,3,4,5,6,7,8,9,10,11,12], true, false);
    }
}
When the page is reached for the first time, it creates a default column model from an xml file. When the user re-orders the headers, the new column model and column names are stored in the database as JSON strings. When the user makes another database call, the function reads the new column order from the database and creates the data array with the new ordering.
The problem seems to be that after jqGrid has remapped the columns, it still expects to see the data coming back from the server in the original order. So if the original data was
[ [A1, B1, C1], [A2, B2, C2], [A3, B3, C3] ]
after remapping the columns to the order C | A | B, jqGrid still wants the data to come back in the original order.
My final solution was to remove the code that saves the column model state from the sortable.update() function and to put it into window.onbeforeunload(). This way, the state is only saved when the user exits the page.
Hope this helps someone else.
See the edited question. Without a method to update the colModel, the best solution seems to be to put the state save function into window.onbeforeunload().
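A rough sketch of that approach, assuming jQuery and jqGrid are already loaded (saveColumnState and the '/saveGridState' endpoint are hypothetical placeholders, not jqGrid API):
// Hypothetical helper: persist the current column order/names when leaving the page.
function saveColumnState() {
    var colModel = $("#mainGrid").jqGrid("getGridParam", "colModel");
    var colNames = $("#mainGrid").jqGrid("getGridParam", "colNames");
    // '/saveGridState' is a placeholder endpoint; replace with whatever persistence you use.
    navigator.sendBeacon("/saveGridState", JSON.stringify({
        colModel: colModel,
        colNames: colNames
    }));
}

window.onbeforeunload = function () {
    saveColumnState();
    // Return nothing so the browser doesn't show a "leave this page?" prompt.
};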

LevelDB key,value from csv

I have a huge CSV file database of ~5M rows with the fields below:
start_ip,end_ip,country,city,lat,long
I am storing these in LevelDB using start_ip as the key and the rest as the value.
How can I retrieve the records for keys where
( ip_key > start_ip and ip_key < end_ip )
Any alternative solutions are also welcome.
I assume that your keys are the hash values of the IP and the hashes are 64-bit unsigned integers, but if that's not the case then just modify the code below to account for the proper keys.
void MyClass::ReadRecordRange(const uint64 startRange, const uint64 endRange)
{
    // Get the start slice and the end slice
    leveldb::Slice startSlice(static_cast<const char*>(static_cast<const void*>(&startRange)), sizeof(startRange));
    leveldb::Slice endSlice(static_cast<const char*>(static_cast<const void*>(&endRange)), sizeof(endRange));

    // Get a database iterator
    shared_ptr<leveldb::Iterator> dbIter(_database->NewIterator(leveldb::ReadOptions()));

    // Possible optimization suggested by Google engineers
    // for critical loops. Reduces memory thrash.
    for(dbIter->Seek(startSlice); dbIter->Valid() && _options.comparator->Compare(dbIter->key(), endSlice) <= 0; dbIter->Next())
    {
        // get the key
        dbIter->key().data();
        // get the value
        dbIter->value().data();
        // TODO do whatever you need to do with the key/value you read
    }
}
Note that _options are the same leveldb::Options with which you opened the database instance. You want to use the comparator specified in the options so that the order in which you read the records is the same as the order in the database.
If you're not using boost or tr1, then you can either use something else similar to the shared_ptr or just delete the leveldb::Iterator by yourself. If you don't delete the iterator, then you'll leak memory and get asserts in debug mode.
