In my app I have to store some data, and I'm thinking of using XML instead of a database, but I'm a little confused about which is faster. The data contains some URLs and some strings.
Please let me know: is XML or a database the better choice?
It depends on what kind of app you are trying to develop.
For something like a weather forecast app, you only need to save info for a handful of provinces/cities.
In that case I think XML is better, because it is easier to implement and maintain.
For something like a diary app, the data grows very quickly, so a DB is better, because a large XML file would hurt performance.
I think these kinds of questions are more discussion-oriented and are likely to be voted for closing.
Nevertheless, the performance depends on the size of the stored data.
While an XML file is small, it will generally perform better than the DB (considering the overhead you will need to go through while deploying it, etc.).
But when you need to store a lot of structured data, the DB will win the race after all.
And since I think that the phone is not a place for an RDBMS engine, I go with XML storage on WP7 for now.
One of the things I've experienced with WP7 and the built-in database is that there's a bit more upfront performance cost to using the database engine than there is with straight Isolated Storage and XML. It was enough of a performance hit during application startup that it was apparent to the user that there was a delay in populating their data.
I would say that for small amounts of data where you just need to read and display, XML is probably your best bet, but for data where you might have to do a lot of aggregating and grouping, it will probably wind up being easier to do with SQL, so you'll need to measure the trade-offs between performance and ease-of-coding/maintenance before you make your decision.
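To make the trade-off concrete, here is a minimal sketch in Python (standard library only, used as a stand-in since the thread is about WP7). The element names (items, item, category, url) and the table layout are hypothetical. It shows that reading and displaying a small XML file is trivial, while grouping has to be hand-written for XML but is a single query in SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: a few titled URLs, similar to what the question describes.
XML_DATA = """
<items>
  <item category="news" url="http://example.com/a">First</item>
  <item category="news" url="http://example.com/b">Second</item>
  <item category="blog" url="http://example.com/c">Third</item>
</items>
"""

def load_items(xml_text):
    """Read-and-display case: parse the XML into (title, url, category) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [(el.text, el.get("url"), el.get("category")) for el in root.iter("item")]

def count_by_category_xml(items):
    """With XML, grouping logic has to be written by hand in application code."""
    return dict(Counter(category for _, _, category in items))

def count_by_category_sql(items):
    """With a database, the same grouping is one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (title TEXT, url TEXT, category TEXT)")
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", items)
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM item GROUP BY category"
    ).fetchall()
    conn.close()
    return dict(rows)

items = load_items(XML_DATA)
```

For three rows the hand-written version is no burden; the SQL version starts to pay off once the queries involve several groupings, joins, or filters.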
Related
Our application (Java, Spring, Hibernate) uses PostgreSQL to store data.
We are looking to add an analysis engine to the application. I want to explore using a NoSQL DB to run the analysis on. This is partly an attempt to learn NoSQL and partly a way to free the main application activity from the performance penalty (as much as possible).
So I want data changes to also sync to the NoSQL DB (in addition to Postgres). Any sync mechanism will affect the performance of the main data/transaction activity.
Is it a good idea to push the data changes to a message bus and free the main transaction as early as possible? Can anyone point me to frameworks/technologies/ideas that address this issue of the same data going to two different data stores?
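The "publish the change and return early" idea from the question can be sketched as follows. This is an illustration only, in Python rather than the Java stack mentioned: the in-process queue stands in for a real message bus, the two dictionaries stand in for Postgres and the NoSQL store, and every name here is hypothetical.

```python
import queue
import threading

change_bus = queue.Queue()   # stand-in for a real message bus
primary_store = {}           # stand-in for the transactional store (Postgres)
analytics_store = {}         # stand-in for the secondary analysis store
_STOP = object()             # sentinel used to shut the consumer down

def save(key, value):
    """Write to the primary store, publish the change, and return immediately."""
    primary_store[key] = value
    change_bus.put((key, value))

def replicate():
    """Consumer: apply queued changes to the analytics store in the background."""
    while True:
        msg = change_bus.get()
        if msg is _STOP:
            break
        key, value = msg
        analytics_store[key] = value

worker = threading.Thread(target=replicate, daemon=True)
worker.start()

save("order:1", {"total": 10})
save("order:2", {"total": 25})

change_bus.put(_STOP)   # demo only: drain the queue and stop the consumer
worker.join()
```

Two consequences of this design are worth keeping in mind: the secondary store always lags the primary by some amount, and an in-memory queue loses pending changes if the process dies, so a real setup would need a durable queue.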
The simplest solution would be sending data to a Postgres read replica and running your analytics queries on that. The performance impact is minimal and this would save a lot of time compared to alternative approaches.
Unless you really know what you are doing, I would avoid NoSQL for this kind of application. If your dataset is too big for a Postgres read replica, you might want to use Redshift, which is a columnar datastore that is optimized for the types of analytics queries typically performed.
I was just wondering what the best way is to keep images in an iPhone/iPad (Xcode) application if I'm fetching them from the internet dynamically. My main concern is: if I store them in my database as binary data, will it hurt efficiency when I query the database?
In that case, is it better to store them in the application's folder?
Thanks for your responses.
The Apple dev forums have some good discussion on this. A good post can be found here. General guideline from the post: a data blob under 16 KB is OK, 100 KB is OK as well, but approaching 1 MB it is better to store it outside of Core Data or any database.
In terms of fetching performance, it will boil down to how you have normalized your data model.
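The size guideline above can be sketched as a simple rule: keep small images inline in the database and write large ones to a file, storing only the path. The sketch below uses Python and SQLite as a stand-in for an iOS app with Core Data; the 100 KB cutoff, the table layout, and the function names are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile

# Assumed cutoff, loosely based on the guideline above; tune it by measuring.
INLINE_LIMIT = 100 * 1024

def open_store(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image ("
        "name TEXT PRIMARY KEY, data BLOB, path TEXT)"
    )
    return conn

def save_image(conn, folder, name, payload):
    """Store small images as a BLOB; write large ones to disk and keep the path."""
    if len(payload) <= INLINE_LIMIT:
        conn.execute("INSERT OR REPLACE INTO image VALUES (?, ?, NULL)", (name, payload))
    else:
        path = os.path.join(folder, name)
        with open(path, "wb") as f:
            f.write(payload)
        conn.execute("INSERT OR REPLACE INTO image VALUES (?, NULL, ?)", (name, path))
    conn.commit()

def load_image(conn, name):
    """Return the image bytes regardless of where they were stored."""
    row = conn.execute("SELECT data, path FROM image WHERE name = ?", (name,)).fetchone()
    if row is None:
        return None
    data, path = row
    if data is not None:
        return bytes(data)
    with open(path, "rb") as f:
        return f.read()

folder = tempfile.mkdtemp()
conn = open_store()
save_image(conn, folder, "thumb.png", b"x" * 1024)            # stays in the DB
save_image(conn, folder, "photo.png", b"y" * (1024 * 1024))   # goes to a file
```

Keeping the large payloads out of the table means queries that only touch the metadata never have to read the image bytes.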
Does anyone have an idea?
The issue is: I am writing a high-performance application. It has a SQL database which I use for persistence. In-memory objects get updated, then the changes are queued for a disk write (which is pretty much always an insert into a versioned table). The small window of risk is accepted: in case of a crash, program code will resync local state with external systems.
Now, quite often I need to run lookups on certain values, and it would be nice to have a standard interface. Basically a bag of objects, but with the ability to run queries efficiently against an in-memory index. For example, I have a table of "instruments" which all have a unique code, and I need to look up this code about 30,000 times per second, as I get updates for every instrument.
Does anyone have an idea for a decent high-performance library for this?
You should be able to use an in-memory SQLite database (:memory:) with System.Data.SQLite.
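As a rough illustration of the in-memory approach, here is a sketch using Python's built-in sqlite3 module instead of System.Data.SQLite; the SQL is the same either way. The instrument table, its columns, and the sample codes are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # the database lives only in RAM
conn.execute(
    "CREATE TABLE instrument ("
    "code TEXT PRIMARY KEY, "        # PRIMARY KEY creates a unique index on code
    "name TEXT, last_price REAL)"
)
conn.executemany(
    "INSERT INTO instrument VALUES (?, ?, ?)",
    [("AAA1", "Hypothetical instrument A", 100.0),
     ("BBB2", "Hypothetical instrument B", 200.0)],
)

def lookup(code):
    """Indexed point lookup by the unique instrument code."""
    return conn.execute(
        "SELECT code, name, last_price FROM instrument WHERE code = ?", (code,)
    ).fetchone()

def update_price(code, price):
    """Apply an incoming update; returns True if the instrument exists."""
    cur = conn.execute(
        "UPDATE instrument SET last_price = ? WHERE code = ?", (price, code)
    )
    return cur.rowcount == 1
```

Whether this sustains the lookup rate mentioned in the question has to be measured. If lookups are only ever by code, a plain dictionary keyed on the code will be faster; what SQLite adds is the ability to run ad hoc queries over the same data.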
I've been searching for a document-oriented DB for a Windows desktop program. MongoDB seems to be the best one so far, because it's smaller (11MB) and simpler compared to CouchDB (which is another option, but it seems to be more complex and the download size is almost 50MB). Unfortunately, on 32-bit Windows the database size limit in MongoDB is 2GB, and they don't intend to fix this limit anytime soon.
Do you have any recommendation? Requirements:
Open source;
Schema-less, in BSON/JSON format;
Easy to deploy to a Windows machine.
Many thanks!
I'm just curious: why would you need a non-relational database for a desktop application? I mean, these things are designed for high-availability clusters and really large amounts of data, both of which are irrelevant for desktop apps, where you would usually have just one user at a time and a not-so-large dataset.
What I would use if I were you is an embedded database like HSQLDB or SQLite.
Now, if you want to make it schema-less for simplicity, just create your tables with only two columns: id (long) and data (varchar).
Then serialize/deserialize your objects to and from JSON yourself when accessing the data.
You can see a really easy way to do the JSON stuff here:
JSON Serializer for arbitrary HashMaps in Voldemort
Note: The question at the link above is Voldemort-specific, but the answer I received isn't, and it could be applied here as well (assuming you are using Java; if not, there has to be an easy way to do so in your language, too).
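For a language other than Java, the id-plus-data table described above could look roughly like this. It is a sketch in Python with the standard sqlite3 and json modules; the table name doc and the helper names are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")   # use a file path instead for a persistent store
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")

def save_doc(obj):
    """Serialize any JSON-compatible object and return its generated id."""
    cur = conn.execute("INSERT INTO doc (data) VALUES (?)", (json.dumps(obj),))
    conn.commit()
    return cur.lastrowid

def load_doc(doc_id):
    """Fetch and deserialize a document, or return None if the id is unknown."""
    row = conn.execute("SELECT data FROM doc WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None

# Documents with different shapes can share the same table.
first = save_doc({"title": "note", "tags": ["a", "b"]})
second = save_doc({"name": "other shape", "size": 3})
```

The cost of this layout is that the database cannot index or filter on fields inside the JSON text, so any field you need to search on is better promoted to its own column.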
I need to store a large amount of small data objects (millions of rows per month). Once they're saved they won't change. I need to:
store them securely
use them for analysis (mostly time-oriented)
retrieve some raw data occasionally
It would be nice if it could be used with JasperReports or BIRT
My first shot was Infobright Community, a column-oriented, read-only storage mechanism for MySQL.
On the other hand, people say that a NoSQL approach could be better. Hadoop+Hive looks promising, but the documentation looks poor and the version number is less than 1.0.
I heard about Hypertable, Pentaho, MongoDB...
Do you have any recommendations?
(Yes, I found some topics here, but they were from a year or two ago.)
Edit:
Other solutions: MonetDB, InfiniDB, LucidDB. What do you think?
I'm having the same problem here and did some research. There are three types of storage for BI:
Column-oriented. Free and well known: MonetDB, LucidDB, Infobright, InfiniDB
Distributed: hTable, Cassandra (also column-oriented, theoretically)
Document-oriented: MongoDB, CouchDB
The answer depends on what you really need:
If your millions of rows are loaded at once (nightly batch or so), InfiniDB or another column-oriented DB is the best choice. They have great performance and are "BI oriented". http://www.d1solutions.ch/papers/d1_2010_hauenstein_real_life_performance_database.pdf
And they won't require a setup of "nodes", "sharding" and other stuff that comes with distributed/"NoSQL" DBs.
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
If the rows are added in real time, then column-oriented DBs are bad. You can either choose to have two separate DBs (that's my choice: one NoSQL DB fed in real time by the front end and used for real-time stats, the other a column-oriented DB for BI), or turn towards something that mixes column orientation (for read requests) and distribution (for writes), like Cassandra.
Document-oriented DBs are not suited for BI; they are more useful for CRM/CMS use cases where you need frequent access to a particular row.
As for the exact choice inside a category, I'm still undecided. Cassandra for distributed, and Monet or InfiniDB for column-oriented DBs, are the leaders. Monet is reported to have problems loading very big tables because it keeps indexes in memory.
You could also consider GridSQL. Even for a single server, you can create multiple logical "nodes" to utilize multiple cores when processing queries.
GridSQL uses PostgreSQL, so you can also take advantage of partitioning tables into subtables to evaluate queries faster. You mentioned the data is time-oriented, so that would be a good candidate for creating subtables.
If you're looking for compatibility with reporting tools, something based on MySQL may be your best choice. As for what will work for you, Infobright may work. There are several other solutions as well; however, you may also want to look at plain old MySQL and the Archive table type. Each record is compressed and stored and, IIRC, it's designed for your type of workload, though I think Infobright is supposed to get better compression. I haven't really used either, so I'm not sure which will work best for you.
As for the key-value stores (e.g. NoSQL), yes, they can work as well and there are plenty of alternatives out there. I know CouchDB has "views", but I haven't had the opportunity to use any, so I don't know how well any of them work.
My only concern with your data set is that since you mentioned time, you may want to ensure that whatever solution you use will allow you to archive data past a certain time. It's a common data warehouse practice to only keep N months of data online and archive the rest. This is where partitioning, as implemented in an RDBMS, comes in very useful.
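To show why time-based partitioning makes archiving cheap, here is a sketch that emulates it by hand with one table per month, using Python and SQLite as a stand-in. This is not how PostgreSQL or MySQL implement partitioning natively; the event_YYYY_MM naming scheme and the function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_for(ts):
    """Map an ISO timestamp such as '2011-03-15T10:00:00' to a per-month table name."""
    year, month = ts[:4], ts[5:7]
    if not (year.isdigit() and month.isdigit()):
        raise ValueError("expected an ISO timestamp, got %r" % ts)
    return "event_" + year + "_" + month

def insert_event(ts, payload):
    """Route each row to the subtable for its month, creating it on first use."""
    table = table_for(ts)   # validated above, so safe to place in the SQL text
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, payload TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (ts, payload))

def online_tables():
    """List the month tables currently kept online, oldest first."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'event_*' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

def archive_older_than(cutoff_month):
    """Drop whole month tables older than cutoff_month ('YYYY_MM').

    A real system would export each table before dropping it.
    """
    dropped = [t for t in online_tables() if t[len("event_"):] < cutoff_month]
    for t in dropped:
        conn.execute(f"DROP TABLE {t}")
    return dropped

insert_event("2011-01-10T08:00:00", "a")
insert_event("2011-02-11T09:00:00", "b")
insert_event("2011-03-12T10:00:00", "c")
removed = archive_older_than("2011_03")
```

The point of the layout is that retiring a month is a single table drop instead of a large DELETE over one big table, and time-bounded queries only need to touch the months they cover.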