Data structure functioning like Database in C or C++ - data-structures

Is there a data structure which gives you functions of a database (like insert, update, delete etc)? For example:
create a struct like the database table
store data on it and query on it
selectively delete it
I know that with a hashtable you can do this (ex: uthash library). But as far as I know updating one column element only is not easy in a hash table.

Look at sqlite. Rather than a relational database system, it is essentially a connectionless, file-based database library that supports SQL. You link your program against it and it provides functions to perform SQL queries over data files.

Look At NoSQL itis The RMDBS used By FaceBook

Use C structs to represent rows of data and then trees (or maybe hashes) for indexes. There are a lot of little problems you will need to solve, specially in order to make all the operations efficient, but this forms the basis for an in-memory table.
For simple things, a tree structure may be enough.

Related

Is it OK to have multiple merge steps in an Excel Power query?

I have data from multiple sources - a combination of Excel (table and non table), csv and, sometimes, even a tsv.
I create queries for each data source and then I am bringing them together one step at a time or, actually, it's two steps: merge and then expand to bring in the fields I want for each data source.
This doesn't feel very efficient and I think that maybe I should be just joining everything together in the Data Model. The problem when I did that was that I couldn't then find a way to write a single query to access all the different fields spread across the different data sources.
If it were Access, I'd have no trouble creating a single query one I'd created all my relationships between my tables.
I feel as though I'm missing something: How can I build a single query out of the data model?
Hoping my question is clear. It feels like something that should be easy to do but I can't home in on it with a Google search.
It is never a good idea to push the heavy lifting downstream in Power Query. If you can, work with database views, not full tables, use a modular approach (several smaller queries that you then connect in the data model), filter early, remove unneeded columns etc.
The more work that has to be performed on data you don't really need, the slower the query will be. Please take a look at this article and this one, the latter one having a comprehensive list for Best Practices (you can also just do a search for that term, there are plenty).
In terms of creating a query from the data model, conceptually that makes little sense, as you could conceivably create circular references galore.

Database Schema vs. Data Structure

What's the conceptual difference between a database schema and any data structure, in general? Don't both convey the organisation of data for efficiency? Or am I mixing two completely different things?
Database schema is the logical view of the entire DB.
Data structures are the specific formats that are used to store data (File, array, trees, etc)
Another way to think of it is that a DB schema will contain various data structures.

Best-performing method for associating arbitrary key/value pairs with a table row in a Postgres DB?

I have an otherwise perfectly relational data schema in place for my Postgres 8.4 DB, but I need the ability to associate arbitrary key/value pairs with several of my tables, with the assigned keys varying by row. Key/value pairs are user-generated, so I have no way of predicting them ahead of time or wrangling orderly schema changes.
I have the following requirements:
Key/value pairs will be read often, written occasionally. Reads must be reasonably fast.
No (present) need to query off of the keys or values. (But it might come in handy some day.)
I see the following possible solutions:
The Entity-Attribute-Value pattern/antipattern. Annoying, but the annoyance would be generally offset by my ORM.
Storing key/value pairs as serialized JSON data on a text column. A simple solution, and again the ORM comes in handy, but I can kiss my future self's need for queries good-bye.
Storing key/value pairs in some other NoSQL db--probably a key/value or document store. ORM is no help here. I'll have to manage the separate queries (and looming data integrity issues?) myself.
I'm concerned about query performance, as I hope to have a lot of these some day. I'm also concerned about programmer performance, as I have to build, maintain, and use the darned thing. Is there an obvious best approach here? Or something I've missed?
That's precisely what the hstore datatype is for in PostgreSQL.
http://www.postgresql.org/docs/current/static/hstore.html
It's really fast (you can index it) and quite easy to handle. The only drawback is that you can only store character data, but you'd have that problem with the other solutions as well.
Indexes support "exists" operator, so you can query quite quickly for rows where a certain key is present, or for rows where a specific attribute has a specific value.
And with 9.0 it got even better because some size restrictions were lifted.
hstore is generally good solution for that, but personally I prefer to use plain key:value tables. One table with definitions, other table with values and relation to bind values to definition, and relation to bind values to particular record in other table.
Why I'm against hstore? Because it's like a registry pattern. Often mentioned as example of anti pattern. You can put anything there, it's hard to easy validate if it's still needed, when loading a whole row (in ORM especially), the whole hstore is loaded which can have much junk and very little sense. Not mentioning that there is need to convert hstore data type into your language type and convert back again when saved. So you get some overhead of type conversion.
So actually I'm trying to convert all hstores in company I'm working for into simple key:value tables. It's not that hard task though, because structures kept here in hstore are huge (or at least big), and reading/writing an object crates huge overhead of function calls. Thus making a simple task like that "select * from base_product where id = 1;" is making a server sweat and hits performance badly. Want to point that performance issue is not because db, but because python has to convert several times results received from postgres. While key:value is not requiring such conversion.
As you do not control data then do not try to overcomplicate this.
create table sometable_attributes (
sometable_id int not null references sometable(sometable_id),
attribute_key varchar(50) not null check (length(attribute_key>0)),
attribute_value varchar(5000) not null,
primary_key(sometable_id, attribute_key)
);
This is like EAV, but without attribute_keys table, which has no added value if you do not control what will be there.
For speed you should periodically do "cluster sometable_attributes using sometable_attributes_idx", so all attributes for one row will be physically close.

Recommended data structure for a Data Access layer

I am building a DataAccess layer to a DB, what data structure is recommended to use to pass and return a collection?
I use a list of data access objects mapped to the db tables.
I'm not sure what language you're using, but in general, there are tradeoffs of simplicity vs extensibility.
If you return the DataSet directly, you have now coupled yourself to database specific classes. This leaves little room for extension - what if you allow access to files or to other types of data sources? But, it is also very simple. This is the recordset pattern and C#/VB provide a lot of built-in support for this. The GUI layer can access the recordset and easily manipulate the data. This works well for simple applications.
On the other hand, you can wrap the datasets in a custom object, and provide gateway methods (see the Gateway pattern http://martinfowler.com/eaaCatalog/gateway.html). This method is more complex, but provides a lot more extensibility. In a larger application when you need to separate the the business logic, data logic, and GUI logic, this is a more robust way to go.
For larger enterprise applications, you can look into using Object Relational Mapping tools (ORM). They help to automatically map java objects to database tables. They hide a lot of the painful SQL details. Frameworks such as Spring provide excellent support for ORMs.
I tend to use arrays of objects, so that I can disconnect the DAO from the business logic.
You can store the data in the DAO as a dataset, for example, and give them an easy way to add to the database before doing an update, so they can pass in information to do modification operations, and then when they want to commit the changes they can do it in one shot.
I prefer that the user can't add/modify the structure themselves, as it makes it harder to determine what must be changed in the database.
By initially returning an array they can then display what is in the database.
Then, as the presentation layer makes changes, the DAO can be updated by the controller. By having a loose coupling the entire system becomes more flexible, as you can change the DAO from a dataset to something else, and the rest of the application doesn't care.
There are two choices that are the most generic.
The first way to look at a ResultSet is as a List of Maps, where each Map represents a row in the ResultSet. The keys are the columns listed in the FROM clause; the values are the database values.
The second way to look at a ResultSet is as a Map of Lists, where each List represents a column in the ResultSet. The Map keys are the columns listed in the FROM clause; the values are the List of database values.
If you don't want to do full-blown ORM, these can carry you a long way.

Serializing objects as BLOBs in Oracle

I have a HashMap that I am serializing and deserializing to an Oracle db, in a BLOB data type field.
I want to perform a query, using this field.
Example, the application will make a new HashMap, and have some key-value pairs.
I want to query the db to see if a HashMap with this data already exists in the db.
I do not know how to do this, it seems strange if i have to go to every record in the db, deserialize it, then compare, Does SQL handle comparing BLOBs, so i could have...select * from PROCESSES where foo = ?....and foo is a BLOB type, and the ? is an instance of the new HashMap?
Thanks
Here's an article for you to read: Pounding a Nail: Old Shoe or Glass Bottle
I haven't heard much about your application's underlying architecture, but I can tell you immediately that there is never a reason why you should need to use a HashMap in this way. Its a bad technique, plain and simple.
The answer to your question is not a clever Oracle query, its a redesign of your application's architecture.
For a start, you should not serialize a HashMap to a database (more generally, you shouldn't serialize anything that you need to query against). Its much easier to create a table to represent hashmaps in your application as follows:
HashMaps
--------
MapID (pk int)
Key (pk varchar)
Value
Once you have the content of your hashmaps in your database, its trivial to query the database to see if the data already exists or produce any other kind of aggregate data:
SELECT Count(*) FROM HashMaps where MapID = ? AND Key = ?
Storing serialized objects in a database is almost always a bad idea, unless you know ahead of time that you don't need to query against them.
How are you serializing the HashMap? There are lots of ways to serialize data and an object like a HashMap. Comparing two maps, especially in serialized form, is not trivial, unless your serialization technique guarantees that two equivalent maps always serialize the same way.
One way you can get around this mess is to use XML serialization for some objects that rarely need to be queried. For example, where I work we have a log table where a certain log message is stored as an XML file in a CLOB field. This xml data represents a serialized Java object. Normally we query against other columns in the record, and only read/write the blob in single atomic steps. However once or twice it was necessary to do some deep inspection of the blob, and using XML allowed this to happen (Oracle supports querying XML in varchar2 or CLOB fields as well as native XML objects). It's a useful technique if used sparingly.
Look into dbms_crypto.hash to make a hash of your blob. Store the hash alongside the blob and it will give you something to narrow down the search to something manageable. I'm not recommending storing the hash map, but this is a general technique for searching for an exact match between blobs.
See also SQL - How do you compare a CLOB
i cannot disagree, but i'm being told to do so.
i appreciate your solution, and that's sort of what i had previously.
thanks
I haven't had the need to compare BLOBs, but it appears that it's supported through the dbms_lob package.
See dbms_lob.compare() at http://www.psoug.org/reference/dbms_lob.html
Oracle can have new data types defined with Java (or .net on windows) you could define a data type for your serialized object and define how queries work on it.
Good lack if you try this...
If you serialize your data to xml, and store the data in an xml you can then use xpaths within your sql query. (Sorry as I am more of a SqlServer person, I don’t know the details of how to do this in Oracle.)
If you EVERY need to update only part of the serialized data don’t do this.
Likewise if any of the data is pointed to by other data or points to other data don’t do this.

Resources