What is the ideal way to store data in an indexeddb database?

I know how a relational database stores data. I know the basics of IndexedDB (up to and including writing a script that papers over the differences between Chrome and Firefox). I understand the principles behind using an index, and that IndexedDB stores JS objects.
I am wondering whether there is a design pattern that should be used when working with a flat database like IndexedDB. Right now I save every "row" to an objectStore, where it is then looked up by its key or an index. Would it be better to save one huge object instead of a bunch of rows?
Also, how should relationships be handled? That is, how should one bridge the gap between RDBMSs and flat databases like IndexedDB?
I did a test yesterday, and it took 11 seconds to write
    params = {
        "user_id": "4",
        "first_name": "Bob Smith",
        "phone": "1-800-555-1212"
    };
to the database 100 times. I did open a new transaction each time, but that still seems like a really long time. The test was in Firefox.
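Since the test opened a new transaction for every write, one thing worth comparing is a single transaction that covers all 100 writes. IndexedDB is not available outside the browser, so the sketch below uses Python's sqlite3 module purely as a stand-in to show the two shapes; the table name and helper names are hypothetical, and the timings in IndexedDB itself may differ.

```python
import sqlite3

# Hypothetical record, mirroring the object in the question.
PARAMS = {"user_id": "4", "first_name": "Bob Smith", "phone": "1-800-555-1212"}
INSERT_SQL = "INSERT INTO users VALUES (:user_id, :first_name, :phone)"


def make_db():
    """Create an in-memory database with one table for the records."""
    # isolation_level=None means transactions begin and end only where
    # this code issues BEGIN and COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (user_id TEXT, first_name TEXT, phone TEXT)")
    return conn


def write_one_transaction_each(conn, record, times):
    """Open and commit a separate transaction for every write."""
    for _ in range(times):
        conn.execute("BEGIN")
        conn.execute(INSERT_SQL, record)
        conn.execute("COMMIT")


def write_single_transaction(conn, record, times):
    """Perform all writes inside one transaction and commit once."""
    conn.execute("BEGIN")
    conn.executemany(INSERT_SQL, [record] * times)
    conn.execute("COMMIT")
```

Both functions store the same rows; the difference is only in how many commits the engine has to perform, which is the cost to measure when a loop of small writes feels slow.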

The results are in: it doesn't really matter.
If you store the data relationally, IndexedDB has about the same read/write times as a database that is naturally relational (WebSQL) or even plain key/value pairs (localStorage).

Related

Core Data or sqlite for fast search?

This is a description of the application I want to build and I'm not sure whether to use Core Data or Sqlite (or something else?):
Single user, desktop, not networked, only one frontend is accessing datastorage
User occasionally enters some data, no bulk data importing or large data inserts
Simple datamodel: entity with up to 20-30 attributes
User searches in data (about 50k datasets max.)
Search takes place mostly in attribute values, not looking for any keys here, but searching for text in values
Writing the data is not something I see as critical; it happens rarely and with small amounts of data. The text search in the attributes has to be blazingly fast, as a user would expect almost instant results. This is absolutely critical.
I would rather go with Core Data, but is this a scenario CD can handle?
Thanks
-Fish

Core Data can handle this scenario. But because you're looking for blazingly fast full text search, you'll have to do some extra work. Session 211 of WWDC 2013 goes into depth about how to do this (slides 117-131). You'll probably want to have a separate Entity with text search tokens: all of the findable words in your dataset.
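The idea of a separate store of search tokens can be sketched independently of Core Data. The following is a minimal in-memory illustration in Python, not the approach from the WWDC session: the class name, the tokenizer and the sample records are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class TokenIndex:
    """Maps each token to the ids of the records that contain it."""

    def __init__(self):
        self._ids_by_token = defaultdict(set)

    def add(self, record_id, attributes):
        """Index every word found in the record's attribute values."""
        for value in attributes.values():
            for token in tokenize(str(value)):
                self._ids_by_token[token].add(record_id)

    def search(self, query):
        """Return the ids of records containing every word of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._ids_by_token.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._ids_by_token.get(token, set())
        return result


index = TokenIndex()
index.add(1, {"title": "Blue whale", "notes": "Largest animal"})
index.add(2, {"title": "Blue jay", "notes": "Small bird"})
```

Here `index.search("blue")` matches both records and `index.search("blue bird")` matches only record 2. A lookup touches only the token table rather than scanning every attribute of every record, which is the reason for keeping the tokens separately.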
Although one of the FTS extensions is available in Apple's deployment of SQLite, it's not exposed in Core Data.

IndexedDB Access Speed and Efficiency

I'm developing an RPG in Dart, and I'm going to use IndexedDB for data persistence.
I will have two databases: one for read-only access and one for read-write access where save games will be stored. I was just wondering if I should read required data directly from the database or cache it in Maps. I could potentially have several hundred records that need to be pulled from the read-only database (enemies, game maps etc.) and I thought pulling everything from the database may be less efficient than using Dart's Maps.
Oh, also each database will be stored in a map. Object Stores will be nested maps inside that map.
Should I read directly from the database, or should I put everything into a Map and read from there?
EDIT: Forgot to mention, the read-only database will be initialised with data from a JSON file located on the user's machine, not through AJAX.

I am confident that hundreds of records will present you no issue in IndexedDB. IDB was designed with that kind of scale in mind, and its async APIs -- while vexing for novices -- make sure your app stays responsive by design.
I am working on a demo designed to push IDB further than it should go, and have some easy-to-reach statistics for you. These are gets on a single index in a single store on a database.
Gets are blazing fast in IndexedDB. The issue with IDB at scale is typically writes.
One thousand success callbacks, with one complete callback, finished in under a second.
Ten thousand success callbacks, with one complete callback, took about 5 seconds.
More than fifty thousand success callbacks fired in less than a minute.
Writes are much slower: bursty at first, then slow after minutes and dog slow after hours. That's with any schema, but you'd likely have multiple indexes on location (both latitude and longitude at least, I imagine), so your writes will be especially slow (more indexes mean more work to maintain them on inserts and updates).
The store layout behind the stats above is just as important as the stats themselves: make sure to design your schema according to how you need to access it.

I would go with direct database access, then monitor the performance and optimize where notable gains are to be expected. Premature optimization is seldom a good idea.
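If monitoring later shows that repeated reads are a bottleneck, one middle ground between reading everything into a Map up front and always hitting the database is a read-through cache. The sketch below is in Python rather than Dart and is only an illustration: `ReadThroughCache` and the `load_record` callback are hypothetical names, and a plain dict stands in for the read-only database.

```python
class ReadThroughCache:
    """Keeps records in a dict and loads each one from the store on first use."""

    def __init__(self, load_record):
        self._load_record = load_record  # function: key -> record
        self._records = {}
        self.misses = 0  # how many times the backing store was consulted

    def get(self, key):
        """Return the cached record, loading it from the store if needed."""
        if key not in self._records:
            self.misses += 1
            self._records[key] = self._load_record(key)
        return self._records[key]


# A dict stands in for the read-only database in this example.
enemy_store = {"goblin": {"hp": 12}, "dragon": {"hp": 300}}
enemies = ReadThroughCache(lambda key: enemy_store[key])
```

Calling `enemies.get("goblin")` twice reads the store only once, so the `misses` counter gives a cheap way to check whether the cache is earning its keep before committing to it.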

How to sort/order data?

I already have experience with MongoDB, CouchDB, Redis, Tokyo Cabinet, and other NoSQL databases. Recently I stumbled upon Riak and it looks very interesting to me. To get started with it, I decided to write a small Twitter clone, the "hello world" of the NoSQL world. To get a fully working clone, it's necessary to order the tweets chronologically. After reading the Riak docs I discovered that Map-Reduce is the right tool for this job. In my development environment it works quite well, but how is the performance in production, with hundreds of parallel queries? Are there other, maybe faster, methods for sorting data, or is it possible to store data in an ordered form (like Cassandra)?
I think I've found another solution to this problem: a simple linked list. One possible implementation is that every user gets his/her own "timeline bucket", where links to the tweet data are stored (the tweets themselves are stored separately in the "tweets" bucket). This timeline bucket must contain a key named "first", which links to the latest timeline object and is the starting point of the list. To insert a new tweet in the timeline, just insert a new item in the timeline bucket, set the "next" link of this new item to the current "first" item, and then make the new item "first".
In short: insert an item as you would in a linked list...
As with Twitter, the personal timeline just holds the 20 tweets shown to the user. To retrieve the last 20 tweets, only 2 queries are necessary. To speed things up, the first query uses the link-walking ability of Riak to get the latest 20 objects, tagged by "next". The second and last query uses the keys computed by the first query to retrieve the tweets themselves (using map/reduce).
To remove the tweets of users you've just unfollowed, I would use the secondary index ability of Riak 1.0 to retrieve the related timeline objects/tweets.
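The insert and read steps described above can be sketched in a few lines. This is an in-memory illustration in Python, not Riak client code: a dict stands in for the timeline bucket, and the `Timeline` class and the `item-N` key scheme are assumptions made for the example.

```python
class Timeline:
    """One user's timeline bucket, kept as a singly linked list.

    Each item holds the key of a tweet and a "next" link to the item
    that was newest before it was inserted.
    """

    def __init__(self):
        self.bucket = {"first": None}  # "first" points at the newest item
        self._counter = 0

    def insert(self, tweet_key):
        """Prepend a tweet: link the new item to the old head, then make it head."""
        self._counter += 1
        item_key = f"item-{self._counter}"
        self.bucket[item_key] = {"tweet": tweet_key, "next": self.bucket["first"]}
        self.bucket["first"] = item_key
        return item_key

    def latest(self, limit=20):
        """Walk the "next" links from "first" and collect up to limit tweet keys."""
        keys = []
        cursor = self.bucket["first"]
        while cursor is not None and len(keys) < limit:
            item = self.bucket[cursor]
            keys.append(item["tweet"])
            cursor = item["next"]
        return keys
```

After inserting tweets `t1` through `t25`, `latest()` yields the keys from `t25` down to `t6`, newest first. In a distributed store the two writes in `insert` are not atomic, so concurrent inserts to the same timeline would need extra care that this sketch does not attempt.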

It is not possible to store data in an ordered form in Riak without resorting to re-writing portions of the Riak core. Data is stored, roughly, in bucket + key order. The actual order depends on the backend storage mechanism that you're using for Riak.
Riak 1.0 has some features that might help you, too. There's support for secondary indexes as well as improvements to Map Reduce operations - in particular, they perform much better in highly concurrent scenarios.
Alexander Siculars wrote an article about Pagination with Riak. It outlines the problem pretty well. Yammer also makes extensive use of Riak, and two of their engineers put together a presentation about Riak at Yammer. It doesn't go into a lot of implementation details, but you can learn a lot about how they have designed their solution.
Combining secondary index queries and Map Reduce makes it possible to solve your problem very easily.

As Jeremiah says it's not possible to store the data in sorted order, but you can still make it return sorted results by using secondary indexes and map/reduce. The problem, as described, is that you can't efficiently limit the query in a sorted way.
Here is an example that uses a range query to list all keys and then sorts them using the built-in functions in riak_kv_mapreduce:
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    riakc_pb_socket:mapred(Pid,
        {index, colonel_riak:bucket(context), <<"$key">>, <<0>>, <<255>>},
        [{reduce, {modfun, riak_kv_mapreduce, reduce_sort}, none, true}]).
You can use the functions in the lists module in Erlang or the native JavaScript sort function. Reverse order can be achieved with lists:reverse/1 in Erlang.

Best strategy for retrieving large dynamically-specified tables on an ASP.NET page

Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL2008 database and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases, all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create an XPO Data Source on the fly. However, I can't find much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?

Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler), the results are often surprising.
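If you are constructing the SQL string on the fly, are you following .NET best practices, such as using a StringBuilder for repeated concatenation rather than chaining "str1" + "str2" in a loop? (A single "str1" + "str2" expression already compiles to String.Concat, so there is nothing to gain there.) Remember, multiple small gains add up to big gains.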
Have you thought about having a summary table or database that is periodically updated (say every 15 minutes; you might need to run a service to update this data automatically), so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today, I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10000 times depending on some conditions).
Just a warning if you do use LINQ: I have seen developers who decided to use LINQ write less efficient code because they did not know what they were doing (pulling 36,000 records when they needed to check for 1, for example). These things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
Cheers,
Stu

As far as I understand, you are talking about the so-called server mode, in which all data manipulations are done on the DB server instead of sending the data to the web server and processing it there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.

I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
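Showing a subset and letting the user page through it means asking the database for one page at a time. The sketch below illustrates this with Python's sqlite3 module as a stand-in for SQL Server; the `companies` table and both helper names are hypothetical. SQL Server 2008 has no OFFSET/FETCH clause, so there the same effect is usually obtained with a ROW_NUMBER()-based query instead of the LIMIT/OFFSET used here.

```python
import sqlite3


def make_demo_db(row_count):
    """Create an in-memory table with row_count sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO companies (id, name) VALUES (?, ?)",
        [(i, f"company-{i}") for i in range(1, row_count + 1)],
    )
    conn.commit()
    return conn


def fetch_page(conn, page, page_size):
    """Return one page of rows ordered by id; page numbers start at 1."""
    offset = (page - 1) * page_size
    cursor = conn.execute(
        "SELECT id, name FROM companies ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cursor.fetchall()
```

With 95 rows and a page size of 10, page 1 holds ids 1 to 10, page 10 holds the last five rows, and page 11 is empty. Only the requested rows cross the wire, which is what keeps the grid responsive regardless of the table size.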
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're retrieving only subsets at a time rather than loading 1,000 records into the page at once, then you should be OK.
But if you're only doing an ExecuteQuery() against an SQL connection to get a dataset back, I don't see how LINQ or anything else will help you. I'd say that the problem is on the DB side.
So to solve the problem with the database, you need to profile the different SELECT statements you're running against it, examine the query plans and identify the places where things are slowing down. You might want to start by using the SQL Server Profiler, but if you have a good DBA, just looking at the query plan (which you can get from Management Studio) is often enough.

Best-performing method for associating arbitrary key/value pairs with a table row in a Postgres DB?

I have an otherwise perfectly relational data schema in place for my Postgres 8.4 DB, but I need the ability to associate arbitrary key/value pairs with several of my tables, with the assigned keys varying by row. Key/value pairs are user-generated, so I have no way of predicting them ahead of time or wrangling orderly schema changes.
I have the following requirements:
Key/value pairs will be read often, written occasionally. Reads must be reasonably fast.
No (present) need to query off of the keys or values. (But it might come in handy some day.)
I see the following possible solutions:
The Entity-Attribute-Value pattern/antipattern. Annoying, but the annoyance would be generally offset by my ORM.
Storing key/value pairs as serialized JSON data on a text column. A simple solution, and again the ORM comes in handy, but I can kiss my future self's need for queries good-bye.
Storing key/value pairs in some other NoSQL db--probably a key/value or document store. ORM is no help here. I'll have to manage the separate queries (and looming data integrity issues?) myself.
I'm concerned about query performance, as I hope to have a lot of these some day. I'm also concerned about programmer performance, as I have to build, maintain, and use the darned thing. Is there an obvious best approach here? Or something I've missed?

That's precisely what the hstore datatype is for in PostgreSQL.
http://www.postgresql.org/docs/current/static/hstore.html
It's really fast (you can index it) and quite easy to handle. The only drawback is that you can only store character data, but you'd have that problem with the other solutions as well.
Indexes support the "exists" operator, so you can query quite quickly for rows where a certain key is present, or for rows where a specific attribute has a specific value.
And with 9.0 it got even better because some size restrictions were lifted.

hstore is generally a good solution for that, but personally I prefer to use plain key:value tables: one table with definitions, another table with values, a relation to bind values to a definition, and a relation to bind values to a particular record in another table.
Why am I against hstore? Because it's like a registry pattern, which is often mentioned as an example of an anti-pattern. You can put anything there, it's hard to validate whether an entry is still needed, and when loading a whole row (in an ORM especially), the whole hstore is loaded, which can contain much junk and make very little sense. Not to mention that the hstore data type has to be converted into your language's type and converted back again when saved, so you get some overhead from type conversion.
So I'm actually trying to convert all hstores in the company I'm working for into simple key:value tables. It's not that hard a task, though, because the structures kept here in hstore are huge (or at least big), and reading/writing an object creates a huge overhead of function calls. As a result, a simple task like "select * from base_product where id = 1;" makes the server sweat and hurts performance badly. I want to point out that the performance issue is not caused by the db, but by Python having to convert the results received from Postgres several times, whereas key:value tables require no such conversion.
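The definitions-plus-values layout described above could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for Postgres, and every table, column and function name in it is an assumption made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_definition (
    id INTEGER PRIMARY KEY,
    attr_key TEXT NOT NULL UNIQUE
);
CREATE TABLE attribute_value (
    product_id INTEGER NOT NULL REFERENCES product(id),
    definition_id INTEGER NOT NULL REFERENCES attribute_definition(id),
    value TEXT NOT NULL,
    PRIMARY KEY (product_id, definition_id)
);
"""


def set_attribute(conn, product_id, attr_key, value):
    """Create the definition if it is new, then insert or overwrite the value."""
    conn.execute(
        "INSERT OR IGNORE INTO attribute_definition (attr_key) VALUES (?)",
        (attr_key,),
    )
    (definition_id,) = conn.execute(
        "SELECT id FROM attribute_definition WHERE attr_key = ?", (attr_key,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO attribute_value (product_id, definition_id, value) "
        "VALUES (?, ?, ?)",
        (product_id, definition_id, value),
    )


def get_attributes(conn, product_id):
    """Return all key/value pairs attached to one product as a dict."""
    rows = conn.execute(
        "SELECT d.attr_key, v.value FROM attribute_value v "
        "JOIN attribute_definition d ON d.id = v.definition_id "
        "WHERE v.product_id = ?",
        (product_id,),
    )
    return dict(rows.fetchall())
```

The composite primary key on `attribute_value` guarantees at most one value per key per record, and the values come back as plain strings that need no further decoding on the client side.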

As you do not control data then do not try to overcomplicate this.
    create table sometable_attributes (
        sometable_id int not null references sometable(sometable_id),
        attribute_key varchar(50) not null check (length(attribute_key) > 0),
        attribute_value varchar(5000) not null,
        primary key (sometable_id, attribute_key)
    );
This is like EAV, but without attribute_keys table, which has no added value if you do not control what will be there.
For speed you should periodically do "cluster sometable_attributes using sometable_attributes_idx", so all attributes for one row will be physically close.
