Sorting dynamic localized data

Sorting dynamic localized data - sorting

I've been working on a Java EE 6 project (EJB 3.1, JSF 2, JPA 2) for some time now and I cannot figure out a good way to sort dynamic localized data. With dynamic localized data, I mean a user created object with a name in one or several languages. To be more concrete, imagine a database table called X containing just an id (There's more columns ofc, this is just to simplify it), another table containing available languages, and one table binding the two together by putting a name on X for every language.
[x]
id integer PRIMARY KEY
[lang]
id integer PRIMARY KEY
[x_names]
x_id integer FOREIGN KEY REFERENCES x.id
lang_id integer FOREIGN KEY REFERENCES lang.id
name
PRIMARY KEY (x_id, lang_id)
[X.java]
#Id private Integer id;
#OneToMany(mappedBy="x", cascade=CascadeType.ALL, orphanRemoval=true)
private List<XName> names;
Now i'd like to present these objects sorted by name (while preserving the MVC model) for a user for a language chosen by this user. If the x-object lacks a name in this language, the name of a default language should be shown. X-objects will always have a name in at least one language (although there is no way to specify this database wise).
Java sorting seems kind of out of the question since the x-entity should not know what language is currently used and thus compareTo cannot do its job, so it feels like sorting in the database makes most sense. My first idea looked something like this:
SELECT DISTINCT x FROM X x LEFT JOIN x.names n ORDER BY n.name
This however makes no sense since no language is specified it will sometimes sort on one language, sometimes another.
This is the best that i've been able to come up with:
SELECT DISTINCT x FROM X x LEFT JOIN x.names n LEFT JOIN n.language l ORDER BY l.id, cn.name
Which could work in theory, always selecting the names from the lowest language id, enabling me to work in a priority order that way. The problem here is that DISTINCT does not work, I get one x-object for every name in the system. Could I perhaps send down the chosen language to the model layer and use it somehow while still getting ALL x objects, not just the ones with a language available? Any ideas would be appreciated.
The logic of all this is for a view where a user can organize these x-objects and easily fill in a name at any and all languages, while still seeing x-objects even if they dont have a name in the users language so that they can add one.
JPA provider is EclipseLink.

I realize now that sorting in the controller layer makes more sense, and skip database sorting completely. I just realized that you can implement your own comparator with Collections.sort which can have access to what language is currently selected and sort after that.

Related

Supporting ABP's IEntityCache for entities with multi-column primary keys

The current implementation of IEntityCache supports a 1-column primary key type (default int, can be sent as a different type, e.g. long, string).
Currently there are 2 approaches (that I can see) to enable caching for tables/entities whose primary key consists of 2 (or more) columns, such as 2 string columns:
Modify the schema of the table and simply add an integer Id column (auto-increment) whose sole purpose is to enable the IEntityCache to do its magic. The application logic querying the entity would remain untouched as it would still use the 2-column unique index. Modifying the schema however may not be an option, and another problem is that the upper application layers (Domain, AppServices) are not aware of the Id they need to query, instead they are aware of the entity's 2 keys.
Rewrite a new implementation of IEntityCache to support multiple columns (e.g. IEntityCache<string, string>?). Not sure if this is even feasible given the cachemanager's limitations?
Here's the question: Would the inner workings of the ICacheManager - itself being an abstraction over different cache providers (e.g. IMemoryCache, Redis etc...) - prevent a multi-column implementation?

What would be the most appropriate data structure given these requirements?

We are building Search API in our company for some of our entities - events, leagues and sports each of which has name property and we have difficulties implementing business requirements.
TL;DR; What will be the data structure addressing these business requirements better than basic Red-Black tree does?
What we are the business requirements?
Data structure needs to be sorted so following requirements are easier for implementation therefore insertion should not break this property.
Data structure needs to hold information about it's entities, so node key(entity's name property) will be used for searching, but the node needs to hold all the entities with name property starting with node key value.
Data structure needs to support deletion by id. Id is also a property of all entities.
It needs to support index search (up to 3 characters) so if someone searches for "aaa" every node with key between "aaaa.." and "aaaz" should appear. (ex. query = "aaa", index = "aaa", "aaab", "aaaab", "aaaz", result should be "aaa", "aaab", "aaaab").
We need to search by localized node key.
What we have done so far?
We started our first iteration using built-in red-black tree (SortedSet in C#) and for nodes we had structure that holds the name property of the entity and all related events to that name property. And with one helper method we satisfied business requirements (1), (2) and (4).
As our second iteration we had to support deletion so we created a map(Dictionary) of entity id's to references to entity objects put into the SortedSet. We do that because our request for deletion is only by id and we cannot recreate entity from id, so at addition we need to create such map. (maybe augumentation can help?) With this we secured requirement (3).
Now we need to support (5) however, with every iteration (business requirement we receive) it is getting harder and harder to implement and I almost feel like we need to change our data structure in order to address business criteria better.
Whats the problem with the localization?
We can create new SortedSet and re-use the implementation, but this comes with huge trade off. Let me elaborate.
We have 100 of clients, each of which has like 7-8 supported languages, languages in our system are unique per client so translations for one customer does not interfere with another (if someone wants to call it Soccer rather than Football, fine let it be.), besides that we have base languages (global for every client) which are basically default settings for newly create languages, so we can safely say that very large portion of client specific language (lets say english) is the same as the base one. Having said all of that, if we want to have accurate search for each client and locale individually we need to have index for each client and locale individually which on the other hand introduces massive amounts of duplication.
What I have thought so far?
I am not an expert in data structures myself, but I really want to make this right. Of course everything is possible with enough coding and hardware, but thats not the point.
I thought about implementing some binary tree (could be AVL, Red-Black, 2-3-4 etc.) and augment it to meet the requirements better than built in SortedSet does. This will hopefully solve a lot of the issue and workarounds we had to make so far and as I said address future requirements better so implementation is faster and more accurate, however like I said I am not an expert in data structures myself and sadly I am unable to map these business requirements to some data structure for the time frame I have, so without further a due, do you guys have any suggestions?

My suggestion here would be for your primary data structure to be a dictionary, keyed by product id, and the value is the product data. That gives you very quick insertion, and removal by product id.
For searching, provide a separate data structure that contains the product names and associated product ids.
class IndexEntry
{
string ProductName;
string ProductId; // or int, if ProductId is an integer
}
Since you allow customer-specific names, you'll have to add all those customer names to this index. Not a problem, but when you remove something by ID, you'll also have to remove the associated items from the other data structure. This will require a sequential search of the name index data structure to ensure that you get all the names associated with a particular product. That could be expensive, even if you use a tree structure.
To speed things up, you could have a "deleted" flag for those index entries, and then rebuild the structure periodically to remove the deleted items. That way, a deletion just requires a sequential scan. That's less than ideal, but if insertions and deletions are infrequent, quite acceptable.
The key, though, is to make your primary data structure that holds the product information indexed by product id. You can then build secondary indexes any way you want.

Hibernate shows different ID than DB

I have three database views that are mapped in Hibernate as entities.
The entities are in a parent-child relationship (1 parent (A), 2 children(B & C)).
One of the children views (B) uses Oracle's dbms_utility.get_hash_value() to calculate its ID.
This is because it does a UNION over several tables that use different ID sequences and thus the IDs from there may not be unique.
I now have the very puzzling effect that a simple entityManager.find(B.class, id) cannot find the appropriate row.
When I look at the children through a loaded parent (A) entity, I can see that the ID shown in B is completely different from the one in the database. If I use this ID with entityManager.find(B.class, hibernateId), Hibernate finds the appropriate entity.
The database, on the other hand, only returns a value when using the ID shown in the ID column there (and not with the ID Hibernate shows).
Child entity C does not use the hash function and does not show this peculiar behaviour - which means the hash must be responsible.
Does anyone have an idea why?

We found the reason:
Child view B used all of its (content containing) columns as a string concatenation for the hash function.
This included date fields, which were not explicitly formatted when creating the string.
So, when Hibernate selected from the view, it obviously used another format than SQL Developer and thus produced completely different (but consistent) IDs.
Explicitly formatting the used date fields removed the problem.

How to Program a Spring with Hibernate web app?

I am Working on web application where i have 90 fields for a Person class which are divided in to family details,education details, personal details etc....
I want separate form for each, like for family details has-father name, mother name siblings etc... fields and so on for other
I want separate table for each detail with common reference id for all tables
My question is how many bean classes should i write? Is it with one bean class can i map from multiple forms to multiple tables?
class PersonRegister{
private Long iD;
private String emailID;
private String password;
.
.
}//for register.......
once logged in i need to maintain his/her details
Either
class person{
}
or
class PersonFamilyDetails{}
class PersonEducationDetails{}
etc
which way software developing standards specify to create?

Don't go overboard, I believe in your case single but very wide (i.e. with a lot of columns) table would be most efficient and simplest from maintenance perspective. Only thing to keep in mind is too query only for a necessary subset of columns/fields when loading lots of rows. Otherwise you'll be fetching kilobytes of unnecessary data, not needed for particular use case.
Unfortunately Hibernate doesn't have direct support for that, when designing a mapping for Person, you'll end up with huge class and even worse - Hibernate will always fetch all simple columns (and many-to-one relationships). You can however overcome this problem either by creating several views in the database containing only subset of columns or by having several Java classes mapping to the same table but only to subset of columns.
Splitting your database model into several tables is beneficial only if your schema is not normalized. E.g. when storing siblings first name and last name you may wish to have a separate Sibling table and next time some other family member is entered, you can reuse the same row. This makes database smaller and might be faster when searching by sibling.

Your question comes down to database normalization, as described in-depth by Boyce and Codd, see
http://en.wikipedia.org/wiki/Database_normalization.
The main advantage of database normalization is avoiding modification anomalies. In your case, if you got one table with for each person e.g. father-firstname and father-lastname, and you have multiple people with the same father, this data will be duplicated, and when you discover a typo in the father-lastname, you could modify it for one sibling, and not for the next.
In this simplified case, database design best practices would call for a first normalization into a separate table with father-id, father-firstname and father-lastname, and your person table having a one-to-many relation to it.
For one-to-one relations, e.g. person->personeducationdetails, there's some debate. In the original definition of 1st Normal Form, every optional field would be normalized by putting it's own table. This was later weakened by introducing 'null' in relational databases, see http://en.wikipedia.org/wiki/First_normal_form#cite_note-CoddRule-12. But still, if a whole set of columns could be null at the same time, you put them in a separate table with a one-to-one relation.
E.g. if you don't know a person's educationdetails, all of its related fields are null, so you better split them off in a separate table, and simply not have a personeducationdetails record for that person.

Implementing User Defined Fields

I am creating a laboratory database which analyzes a variety of samples from a variety of locations. Some locations want their own reference number (or other attributes) kept with the sample.
How should I represent the columns which only apply to a subset of my samples?
Option 1:
Create a separate table for each unique set of attributes?
SAMPLE_BOILER: sample_id (FK), tank_number, boiler_temp, lot_number
SAMPLE_ACID: sample_id (FK), vial_number
This option seems too tedious, especially as the system grows.
Option 1a: Class table inheritance (link): Tree with common fields in internal node/table
Option 1b: Concrete table inheritance (link): Tree with common fields in leaf node/table
Option 2: Put every attribute which applies to any sample into the SAMPLE table.
Most columns of each entry would most likely be NULL, however all of the fields are stored together.
Option 3: Create _VALUE_ tables for each Oracle data type used.
This option is far more complex. Getting all of the attributes for a sample requires accessing all of the tables below. However, the system can expand dynamically without separate tables for each new sample type.
SAMPLE:
sample_id*
sample_template_id (FK)
SAMPLE_TEMPLATE:
sample_template_id*
version *
status
date_created
name
SAMPLE_ATTR_OF
sample_template_id* (FK)
sample_attribute_id* (FK)
SAMPLE_ATTRIBUTE:
sample_attribute_id*
name
description
SAMPLE_NUMBER:
sample_id* (FK)
sample_attribute_id (FK)
value
SAMPLE_DATE:
sample_id* (FK)
sample_attribute_id (FK)
value
Option 4: (Add your own option)

To help with Googling, your third option looks a little like the Entity-Attribute-Value pattern, which has been discussed on StackOverflow before although often critically.
As others have suggested, if at all possible (eg: once the system is up and running, few new attributes will appear), you should use your relational database in a conventional manner with tables as types and columns as attributes - your option 1. The initial setup pain will be worth it later as your database gets to work the way it was designed to.
Another thing to consider: are you tied to Oracle? If not, there are non-relational databases out there like CouchDB that aren't constrained by up-front schemas in the same way as relational databases are.
Edit: you've asked about handling new attributes under option 1 (now 1a and 1b in the question)...
If option 1 is a suitable solution, there are sufficiently few new attributes that the overhead of altering the database schema to accommodate them is acceptable, so...
you'll be writing database scripts to alter tables and add columns, so the provision of a default value can be handled easily in these scripts.
Of the two 1 options (1a, 1b), my personal preference would be concrete table inheritance (1b):
It's the simplest thing that works;
It requires fewer joins for any given query;
Updates are simpler as you only write to one table (no FK relationship to maintain).
Although either of these first options is a better solution than the others, and there's nothing wrong with the class table inheritance method if that's what you'd prefer.
It all comes down to how often genuinely new attributes will appear.
If the answer is "rarely" then the occasional schema update can cope.
If the answer is "a lot" then the relational DB model (which has fixed schemas baked-in) isn't the best tool for the job, so solutions that incorporate it (entity-attribute-value, XML columns and so on) will always seem a little laboured.
Good luck, and let us know how you solve this problem - it's a common issue that people run into.

Option 1, except that it's not a separate table for each set of attributes: create a separate table for each sample source.
i.e. from your examples: samples from a boiler will have tank number, boiler temp, lot number; acid samples have vial number.
You say this is tedious; but I suggest that the more work you put into gathering and encoding the meaning of the data now will pay off huge dividends later - you'll save in the long term because your reports will be easier to write, understand and maintain. Those guys from the boiler room will ask "we need to know the total of X for tank grouped by this set of boiler temperature ranges" and you'll say "no prob, give me half an hour" because you've done the hard yards already.
Option 2 would be my fall-back option if Option 1 turns out to be overkill. You'll still want to analyse what fields are needed, what their datatypes and constraints are.
Option 4 is to use a combination of options 1 and 2. You may find some attributes are shared among a lot of sample types, and it might make sense for these attributes to live in the main sample table; whereas other attributes will be very specific to certain sample types.

You should really go with Option 1. Although it is more tedious to create, Option 2 and 3 will bite you back when trying to query you data. The queries will become more complex.
In fact, the most important part of storing the data, is querying it. You haven't mentioned how you are planning to use the data, and this is a big factor in the database design.
As far as I can see, the first option will be most easy to query. If you plan on using reporting tools or an ORM, they will prefer it as well, so you are keeping your options open.
In fact, if you find building the tables tedious, try using an ORM from the start. Good ORMs will help you with creating the tables from the get-go.

I would base your decision on the how you usually see the data. For instance, if you get 5-6 new attributes per day, you're never going to be able to keep up adding new columns. In this case you should create columns for 'standard' attributes and add a key/value layout similar to your 'Option 3'.
If you don't expect to see this, I'd go with Option 1 for now, and modify your design to 'Option 3' only if you get to the point that it is turning into too much work. It could end up that you have 25 attributes added in the first few weeks and then nothing for several months. In which case you'll be glad you didn't do the extra work.
As for Option 2, I generally advise against this as Null in a relational database means the value is 'Unknown', not that it 'doesn't apply' to a specific record. Though I have disagreed on this in the past with people I generally respect, so I wouldn't start any wars over it.

Whatever you do option 3 is horrible, every query will have join the data to create a SAMPLE.
It sounds like you have some generic SAMPLE fields which need to be join with more specific data for the type of sample. Have you considered some user_defined fields.
Example:
SAMPLE_BASE: sample_id(PK), version, status, date_create, name, userdata1, userdata2, userdata3
SAMPLE_BOILER: sample_id (FK), tank_number, boiler_temp, lot_number

This might be a dumb question but what do you need to do with the attribute values? If you only need to display the data then just store them in one field, perhaps in XML or some serialised format.
You could always use a template table to define a sample 'type' and the available fields you display for the purposes of a data entry form.
If you need to filter on them, the only efficient model is option 2. As everyone else is saying the entity-attribute-value style of option 3 is somewhat mental and no real fun to work with. I've tried it myself in the past and once implemented I wished I hadn't bothered.
Try to design your database around how your users need to interact with it (and thus how you need to query it), rather than just modelling the data.

If the set of sample attributes was relatively static then the pragmatic solution that would make your life easier in the long run would be option #2 - these are all attributes of a SAMPLE so they should all be in the same table.
Ok - you could put together a nice object hierarchy of base attributes with various extensions but it would be more trouble than it's worth. Keep it simple. You could always put together a few views of subsets of sample attributes.
I would only go for a variant of your option #3 if the list of sample attributes was very dynamic and you needed your users to be able to create their own fields.
In terms of implementing dynamic user-defined fields then you might first like to read through Tom Kyte's comments to this question. Now, Tom can be pretty insistent in his views but I take from his comments that you have to be very sure that you really need the flexibility for your users to add fields on the fly before you go about doing it. If you really need to do it, then don't create a table for each data type - that's going too far - just store everything in a varchar2 in a standard way and flag each attribute with an appropriate data type.
create table sample (
sample_id integer,
name varchar2(120 char),
constraint pk_sample primary key (sample_id)
);
create table attribute (
attribute_id integer,
name varchar2(120 char) not null,
data_type varchar2(30 char) not null,
constraint pk_attribute primary key (attribute_id)
);
create table sample_attribute (
sample_id integer,
attribute_id integer,
value varchar2(4000 char),
constraint pk_sample_attribute primary key (sample_id, attribute_id)
);
Now... that just looks evil doesn't it? Do you really want to go there?

I work on both a commercial and a home-made system where users have the ability to create their own fields/controls dynamically. This is a simplified version of how it works.
Tables:
Pages
Controls
Values
A page is just a container for one or more controls. It can be given a name.
Controls are linked to pages and represents user input controls.
A control contains what datatype it is (int, string etc) and how it should be represented to the user (textbox, dropdown, checkboxes etc).
Values are the actual data that the users have typed into the controls, a value contains one column for every datatype that it can represent (int, string, etc) and depending on the control type, the relevant column is set with the user input.
There is an additional column in Values which specifies which group the value belong to.
Each time a user fills in a form of controls and clicks save, the values typed into the controls are saved into the same group so that we know that they belong together (incremental counter).

CodeSpeaker,
I like you answer, it's pointing me in the right direction for a similar problem.
But how would you handle drop-downlist values?
I am thinking of a Lookup table of values so that many lookups link to one UserDefinedField.
But I also have another problem to add to the mix. Each field must have multiple linked languages so each value must link to the equivilant value for multiple languages.
Maybe I'm thinking too hard about this as I've got about 6 tables so far.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio