Google Datastore bulk retrieve data using urlsafe - performance

Is there a way in Google Datastore to bulk fetch entities using their urlsafe key values?
I know about ndb.get_multi(), which takes a list of keys and retrieves the entities in bulk more efficiently. But in our case we have a webpage with a few hundred entities, embedded with the entities' urlsafe key values. At first we were only doing operations on single entities, so we were able to use the urlsafe value to retrieve the entity and do the operation without much trouble. Now we need to change multiple entities at once, and looping over them one by one does not sound like an efficient approach. Any thoughts?
Is there any advantage to using the entity's key ID directly (versus the key's urlsafe value)? get_by_id() in the documentation does not appear to support getting entities in bulk (it takes only one ID).
If the only way to retrieve entities in bulk is by their keys, yet exposing the key on the webpage is not a recommended approach, does that mean we're stuck when it comes to bulk operations on a page with a few hundred entities?

The keys and the urlsafe strings are exactly in a 1:1 relationship. When you have one you can obtain the other:
urlsafe_string = entity_key.urlsafe()
entity_key = ndb.Key(urlsafe=urlsafe_string)
So if you have a bunch of urlsafe strings you can obtain the corresponding keys and then use ndb.get_multi() with those keys to get all entities, modify them as needed then use ndb.put_multi() to save them back into the datastore.
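A minimal sketch of that flow (urlsafe_strings and the status property are placeholders for your page's values and your model's fields):
from google.appengine.ext import ndb

# urlsafe_strings would come from the submitted page; property names are made up.
keys = [ndb.Key(urlsafe=s) for s in urlsafe_strings]
entities = ndb.get_multi(keys)   # one batched RPC instead of N single gets
for entity in entities:
    if entity is not None:       # get_multi() returns None for keys that no longer exist
        entity.status = 'processed'
ndb.put_multi([e for e in entities if e is not None])   # one batched write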
As for using IDs - that only works (in a convenient manner) if you do not use entity ancestry. Otherwise, to obtain a key you need both the ID and the entity's parent key (or its entire ancestry) - that's not convenient, so better to use urlsafe strings in that case.
But for entities with no parents (aka root entities in the respective entity groups) the entity keys and their IDs are always in a 1:1 relationship and again you can obtain one if you have the other:
entity_key_id = entity_key.id()
entity_key = ndb.Key(MyModel, entity_key_id)
So again from a bunch of IDs you can obtain keys to use with ndb.get_multi() and/or ndb.put_multi().
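For instance (a minimal sketch; MyModel stands in for your kind, and the IDs are assumed numeric with no ancestors):
keys = [ndb.Key(MyModel, int(id_str)) for id_str in ids]
entities = ndb.get_multi(keys)   # then modify and ndb.put_multi() as above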
Using IDs can have a cosmetic advantage over the urlsafe strings - they are typically shorter and easier on the eyes when they appear in URLs or in the page HTML code :)
Another advantage of using IDs is the ability to split large entities or to deal in a simpler manner with entities in a 1:1 relationship. See re-using an entity's ID for other entities of different kinds - sane idea?
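As a hypothetical sketch of that 1:1 trick (Profile and ProfileStats are made-up kinds; both are root entities):
profile = Profile(name='john')
profile_key = profile.put()
stats = ProfileStats(id=profile_key.id())   # companion kind re-using the same numeric ID
stats.put()
# Later, a single ID is enough to address both entities:
the_id = profile_key.id()
profile, stats = ndb.get_multi([ndb.Key(Profile, the_id), ndb.Key(ProfileStats, the_id)])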
For more info on keys and IDs see Creating and Using Entity Keys.

Related

Supporting ABP's IEntityCache for entities with multi-column primary keys

The current implementation of IEntityCache supports a single-column primary key (int by default, though a different type such as long or string can be used).
Currently there are two approaches (that I can see) to enable caching for tables/entities whose primary key consists of two (or more) columns, such as two string columns:
1. Modify the schema of the table and simply add an auto-increment integer Id column whose sole purpose is to enable the IEntityCache to do its magic. The application logic querying the entity would remain untouched, as it would still use the two-column unique index. Modifying the schema, however, may not be an option, and another problem is that the upper application layers (Domain, AppServices) are not aware of the Id they need to query; they only know the entity's two keys.
2. Write a new implementation of IEntityCache that supports multiple columns (e.g. IEntityCache<string, string>?). I'm not sure whether this is even feasible given the cache manager's limitations.
Here's the question: would the inner workings of the ICacheManager - itself an abstraction over different cache providers (e.g. IMemoryCache, Redis, etc.) - prevent a multi-column implementation?
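Not an ABP-specific answer, but as a language-neutral illustration of approach 2: most single-key cache providers can serve multi-column keys if the key parts are flattened into one composite key. A minimal Python sketch of the idea, with a plain dict standing in for the cache provider:
# Plain-dict stand-in for a cache provider that only accepts a single key.
cache = {}

def composite_key(*parts):
    # Escape the separator so ('a|b', 'c') and ('a', 'b|c') stay distinct.
    return '|'.join(str(p).replace('\\', '\\\\').replace('|', '\\|') for p in parts)

cache[composite_key('tenant-42', 'SKU-001')] = {'price': 9.99}
print(cache[composite_key('tenant-42', 'SKU-001')])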

Elasticsearch the best way to design multiple one to many and many to many

I have two scenarios that I want to support, but I don't know the best way to design the relations in Elasticsearch. I have read the entire Elasticsearch documentation but couldn't find the best way to design types for my scenarios.
Multiple one-to-many.
Let’s assume that I have the following tables in my relational database that I want to transfer to the elasticsearch:
Transaction table: Id, User1Id, User2Id, …
User table: Id, Name
Transaction contains two references to User. As far as I know, I cannot use the parent-child relation with two parents. I need to store transactions and users in separate types because they can change separately. I need to be able to search transactions by user details and return the users connected with the matching transactions. Any idea how to design such a structure in Elasticsearch?
Many-to-many
Let’s assume that we have the following tables:
Order: Id, …
OrderLine: OrderId, UserId, Amount, …
User: Id, Name
An order line is always saved with its order, so I thought I could store the order with its order lines as a nested-object relation, but the user must live in a separate type. Is there any way I can connect the multiple users referenced from order lines with the user type? I assume I can use an application-side join, but I need to retrieve the order and its order lines together and be able to search orders by user data.
I could use grandparent and grandchild relations, but then I need to do joins in the application. Any idea how to design this in the best way?
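For what it's worth, a rough sketch of the nested-object mapping plus application-side join described above, using the Elasticsearch Python client (index names, fields, and the 7.x-style API are assumptions, not a definitive design):
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Order lines nest inside the order document; users get their own index.
es.indices.create(index='orders', body={
    'mappings': {'properties': {'lines': {'type': 'nested'}}}
})
es.index(index='users', id=7, body={'name': 'john'})
es.index(index='orders', id=1, body={'lines': [{'user_id': 7, 'amount': 100}]})
es.indices.refresh(index='users')
es.indices.refresh(index='orders')   # make the freshly indexed docs searchable

# Application-side join: first find matching users by their data...
user_hits = es.search(index='users', body={'query': {'match': {'name': 'john'}}})
user_ids = [hit['_id'] for hit in user_hits['hits']['hits']]

# ...then fetch orders whose nested lines reference any of those users.
order_hits = es.search(index='orders', body={
    'query': {'nested': {'path': 'lines',
                         'query': {'terms': {'lines.user_id': user_ids}}}}
})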

Hbase Schema Nested Entity

Does anyone have an example of how to create an HBase table with a nested entity?
Example
UserName (string)
SSN (string)
+ Books (collection)
The books collection would look like this, for example:
Books
isbn
title
etc...
I cannot find a single example of how to create a table like this. I see many people talk about it, and how it is a best practice in certain scenarios, but I cannot find an example of how to do it anywhere.
Thanks...
Nested entities aren't an official feature of HBase; it's just a name some people use for one usage pattern. In this pattern, you use the fact that "columns" in HBase are really just a big map (a bunch of key/value pairs) to model a dimension of cardinality inside the row, by adding one column per "row" of the nested entity.
Schema-wise, you don't need to do much on the table itself; when you create a table in HBase, you just specify the name & column family (and associated properties), like so (in hbase shell):
hbase:001:0> create 'UsersWithBooks', 'cf1'
Then, it's up to you what you put in it, column-wise. You could insert values like:
hbase:002:0> put 'UsersWithBooks', 'userid1234', 'cf1:username', 'my username'
hbase:003:0> put 'UsersWithBooks', 'userid1234', 'cf1:ssn', 'my ssn'
hbase:004:0> put 'UsersWithBooks', 'userid1234', 'cf1:book_id_12345', '<isbn>12345</isbn><title>mary had a little lamb</title>'
hbase:005:0> put 'UsersWithBooks', 'userid1234', 'cf1:book_id_67890', '<isbn>67890</isbn><title>the importance of being earnest</title>'
The column names are totally up to you, and there's no hard limit on how many you can have (within reason: see the HBase Reference Guide for more on this). Of course, doing this, you have to do your own legwork for putting in and getting out values (and you'd probably do it with the Java client in a more sophisticated way than I'm doing with these shell commands; they're just for explanatory purposes). And while you can efficiently scan just a portion of the columns in a table by key (using a column pagination filter), you can't do much with the contents of the cells other than pull them out and parse them elsewhere.
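To give a sense of that legwork from a client library (the answer mentions the Java client; this sketch instead uses happybase, a Python HBase client that goes through the Thrift gateway - host and values are placeholders):
import happybase

connection = happybase.Connection('localhost')  # assumes a running HBase Thrift server
table = connection.table('UsersWithBooks')

# One physical column per nested "row"; the qualifier carries the child's identity.
table.put(b'userid1234', {
    b'cf1:username': b'my username',
    b'cf1:ssn': b'my ssn',
    b'cf1:book_id_12345': b'<isbn>12345</isbn><title>mary had a little lamb</title>',
})

# Reading the row back returns the whole map of qualifiers to values; parsing
# the serialized book cells is up to the application.
row = table.row(b'userid1234')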
Why would you do this? Probably only if you wanted atomicity across all the nested rows for one parent row. It's not very common; your best bet is probably to start by modeling them as separate tables, and only move to this approach if you really understand the tradeoffs.
There are some limitations to this. First, the technique only works one level deep: your nested entities can't themselves have nested entities. You can still have multiple different nested child entities in a single parent; the column qualifier carries their identifying attributes. Second, it's not as efficient to access an individual value stored as a nested column qualifier inside a row as it is to access a row in another table.
Still, there are compelling cases where this kind of schema design is appropriate. If the only way you access the child entities is via the parent entity, and you'd like transactional protection around all children of a parent, this can be the right way to go.

Spring hibernate handling big html form

I am using Spring + Hibernate, and I will have an HTML form with 100+ fields; I must store all these values in the database, in a single table.
They are all used in one big massive calculation.
How should I handle this? I thought about creating an entity with 100 fields plus getters and setters, but is there a nicer solution?
EDIT:
Every time someone submits the form, a new row will be added, so eventually there will be tens of thousands of rows.
I believe this is not about the HTML but about the data modeling.
Think about your data: who its consumers are, and how and in which business flows you're going to query it.
In general an entity with 100 fields is not a good idea, because it would be mapped to one single table with 100 columns. It's just not maintainable.
Maybe the data should be normalized, so you can store pieces of it in different tables with foreign keys?
Hope this helps, or at least gives you some direction to think about.
I think you could use a Map in this case, because:
You only want to store the fields as key-value elements.
It is more flexible to add/remove fields in the future.
So, instead of having a table with 100 fields you will end up with a table with two fields (three if you want to include the form identifier or something like that) and 100 rows.
If many of the form fields are empty (sparse data) you could also save some storage space (it depends on the database you are using).
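As an illustration of that key-value shape (sketched with SQLAlchemy, a Python ORM, standing in for the Hibernate mapping - table and column names are made up):
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class FormField(Base):
    # One row per submitted field instead of one column per field.
    __tablename__ = 'form_fields'
    id = Column(Integer, primary_key=True)
    submission_id = Column(Integer, index=True)  # groups the ~100 rows of one submission
    name = Column(String(64))                    # the form field's name
    value = Column(String(255))                  # stored as text; cast when calculating

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(FormField(submission_id=1, name='field_1', value='42'))
    session.commit()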

How would you represent a relational entity as a single unit of retrievable data in BerkeleyDB?

BerkeleyDB is the database equivalent of a Ruby hashtable or a Python dictionary, except that you can store multiple values for a single key.
My question is: If you wanted to store a complex datatype in a storage structure like this, how could you go about it?
In a normal relational table, if you want to represent a Person, you create a table with columns of particular data types:
Person
-id:integer
-name:string
-age:integer
-gender:string
When it's written out like this, you can see how a person might be understood as a set of key/value pairs:
id=1
name="john";
age=18;
gender="male";
Decomposing the person into individual key/value pairs (name="john") is easy.
But in order to use the BerkeleyDB format to represent a Person, you would need some way of recomposing the person from its constituent key/value pairs.
For that, you would need to impose some artificial encapsulating structure to hold a Person together as a unit.
Is there a way to do this?
EDIT: As Robert Harvey's answer indicates, there is an entity persistence feature in the Java edition of BerkeleyDB. Unfortunately, because I will be connecting to BerkeleyDB from a Ruby application using Moneta, I will be using the standard edition, which I believe requires me to create a custom solution in the absence of this support.
You can always serialize (called marshalling in Ruby) the data as a string and store that instead. The serialization can be done in several ways.
With YAML (advantage: human readable, multiple implementations in different languages):
require 'yaml'; str = person.to_yaml
With Marshalling (Ruby-only, and even Ruby-version-specific):
Marshal.dump(person)
This will only work if the person's class is a self-contained entity that does not refer to other objects you don't want included. For example, references to other persons would need to be handled differently.
If your datastore is able to do so (and BerkeleyDB is, AFAICT), I'd just store one representation of the object's attributes keyed by the object's id, without splitting the attributes across different keys.
E.g. given:
Person
-id:1
-name:"john"
-age:18
-gender:"male"
I'd store the YAML representation in BerkeleyDB with the key person_1:
--- !ruby/object:Person
attributes:
id: 1
name: john
age: 18
gender: male
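The mechanics are language-independent; as an illustration, here is the same store-one-blob-per-entity idea sketched in Python with the bsddb3 binding and YAML (the file name and key are placeholders):
import bsddb3
import yaml

db = bsddb3.hashopen('people.db', 'c')  # dict-like handle onto a Berkeley DB hash file

person = {'id': 1, 'name': 'john', 'age': 18, 'gender': 'male'}
db[b'person_1'] = yaml.safe_dump(person).encode('utf-8')  # one key, whole entity

restored = yaml.safe_load(db[b'person_1'])  # recompose the Person as a unit
db.close()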
Instead, if you need to store each attribute as a key in the datastore (why?), you should make sure the key for the person record is linked to its identifying attribute - for an ActiveRecord, that's the id.
In this case you'd store these keys in BerkeleyDB:
person_1_name="john";
person_1_age=18;
person_1_gender="male";
Have a look at this documentation for an Annotation Type Entity:
http://www.oracle.com/technology/documentation/berkeley-db/je/java/com/sleepycat/persist/model/Entity.html
