Spring hibernate handling big html form - spring

I am using Spring + Hibernate, and I will have an HTML form that has 100+ fields, and I must store all these values in the database in a single table.
They are all used in one big massive calculation.
How should I handle this? I thought about creating an Entity with 100 fields plus getters and setters, but is there a nicer solution?
EDIT:
Every time someone submits the form, a new row will be added, so eventually there will be tens of thousands of rows.

I believe it's not about the HTML but about the data modeling.
Think about your data: who its consumers are, and how and in which business flows you're going to query it.
In general, an entity with 100 fields is not a good idea because it would map to a single table with 100 columns. It's just not maintainable.
Maybe the data should be normalized, so that you store pieces of it in different tables linked by foreign keys?
Hope this helps, or at least gives you some direction to think about.

I think you could use a Map in this case, because:
You only want to store the fields as key-value elements.
It is more flexible to add/remove fields in the future.
So, instead of having a table with 100 columns, you will end up with a table with 2 columns (3 if you want to include the form identifier or something like that) and 100 rows per submitted form.
If many of the form fields are empty (sparse data) you could also save some storage space (it depends on the database you are using).
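The key-value layout described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` to stay self-contained rather than Hibernate, and the table name `form_value`, its columns, and the sample field names are all hypothetical. In Hibernate, a comparable mapping would likely be an `@ElementCollection` of type `Map<String, String>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (submission, field) pair.
conn.execute(
    """
    CREATE TABLE form_value (
        submission_id INTEGER NOT NULL,
        field_name    TEXT    NOT NULL,
        field_value   TEXT,
        PRIMARY KEY (submission_id, field_name)
    )
    """
)


def save_submission(submission_id, fields):
    """Store only the fields that were actually filled in (sparse data)."""
    rows = [
        (submission_id, name, str(value))
        for name, value in fields.items()
        if value not in (None, "")
    ]
    conn.executemany("INSERT INTO form_value VALUES (?, ?, ?)", rows)


def load_submission(submission_id):
    """Rebuild the map of field name to value for one submission."""
    cursor = conn.execute(
        "SELECT field_name, field_value FROM form_value WHERE submission_id = ?",
        (submission_id,),
    )
    return dict(cursor.fetchall())


save_submission(1, {"income": 52000, "age": 41, "notes": ""})
loaded = load_submission(1)
```

Note the trade-off this makes visible: every value comes back as text, so the calculation has to convert and validate types itself, which a 100-column entity would otherwise get from the schema.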

Related

How to avoid circular dependency error in multiple calculated columns when deleting all data in data model?

Context:
I have a data model in Power Pivot with three tables: tTasks, tCaseworks and tCaseworkStatus. I am attempting to create two calculated columns in tCaseworks which draw from the two data tables. All three tables are linked through the common field casework_id (see illustration below).
The data model is regularly updated with new data. The way I am doing this is as follows:
All three tables are sourced from three corresponding tables in my Excel workbook.
A VBA script deletes all records in the three Excel tables and then refreshes the data model (side note: because the data model demands that lookup tables not be empty, the VBA code adds one row per table before refreshing).
New data is then added to the Excel tables and the data model is refreshed.
This process works perfectly.
Problem:
The problem arises when I am adding calculated columns to tCaseworks and then attempting to update the data as described above. I have added two calculated columns; has_task and status_now. I am using the following DAX code:
has_task:
    has_task =
    IF (
        CONTAINS (
            RELATEDTABLE ( tTasks );
            tTasks[casework_id]; tCaseworks[casework_id]
        );
        "Yes";
        "No"
    )
status_now:
    status_now =
    VAR TableX = RELATEDTABLE ( tCaseworkStatus )
    VAR ResultX =
        IF ( CONTAINS ( TableX; tCaseworkStatus[casework_status_code]; "Completed" ); "Completed";
        IF ( CONTAINS ( TableX; tCaseworkStatus[casework_status_code]; "Dismissed" ); "Dismissed";
        IF ( CONTAINS ( TableX; tCaseworkStatus[casework_status_code]; "Begun" ); "Begun";
        IF ( CONTAINS ( TableX; tCaseworkStatus[casework_status_code]; "Created" ); "Created";
        "Find no status" ) ) ) )
    RETURN
        ResultX
Both of these calculated columns work as expected as long as I do not delete the data in the model (I do have one hiccup with both columns, as described in a separate question, but I think that is unrelated).
When the data has been deleted and I refresh the model I get the following error message:
"We cannot get the data from the data model. This is the error message we got: A circular dependency was discovered: 'tCaseworks'[status_now],'tCaseworks'[status_now],'tCaseworks'[has_task],'tCaseworks'[has_task],'tCaseworks'[status_now]."
Question:
What is creating this dependency and how can I avoid it?
My attempted solutions:
The problem only arises when both calculated columns are present. Either one works perfectly without the other upon refreshing. I know that calculated columns are prone to circular dependency problems, but unfortunately I need to use columns and not measures. I suspect that my choice of formula is creating the problem, most likely the CONTAINS function. However, I don't know of any alternative ways of building the formulas I need. Any suggestions?
Edit:
I originally only posted a portion of my data model as I wanted the question to be as concise as possible but I guess it might have been confusing. The whole model concerns five objects from a case handling system: Claims, Cases, Caseworks, Tasks and Action Points. These objects are hierarchical, one claim can have one or more cases, but one case can only have one claim. Similarly, a case can have several caseworks, a casework can have several tasks, a task can have several action points. Additionally, the latter four can have a status attribute which is changed regularly.
I attempted to organize my data model in such a way that I had a lookup table for each object with unique values. I have many attributes for each object in my data that I did not include in the example above, and my goal was to add useful attributes through calculated columns in these tables. The data tables with the changes were intended to provide insight to the lookup tables.
I think your relationship model is a bit unusual. DAX works best with something like a dimensional model of fact and dimension tables.
I would consider tCaseworkStatus a fact table, since it's like a log of the changes to your data. tTasks is a dimension, since it just adds an extra dimension to your data.
tCaseworks is not necessary, since it doesn't hold any actual data (only calculated data).
If you want your current model to work, it might fix your problem to delete the relationship between tTasks and tCaseworks and add a new one between tTasks and tCaseworkStatus.
Edit:
It just occurred to me that the reason you have it like this may be a many-to-many relationship between tTasks and tCaseworkStatus. If that is the case, you might have to create a proper many-to-many table, which is kind of what your tCaseworks is, but you can't have a relationship to the same key like you currently have.
Edit 2:
The solution seemed to be that the RELATEDTABLE function, in conjunction with the relationship model, was somehow causing the error. Using LOOKUPVALUE instead seems to have fixed the issue.

Advantage of splitting a table

My question may seem rather general, but the only answers I have found so far are from SO itself. I have a table of customer information with 47 fields, some of which are optional. I would like to split that table into two: customer_info and customer_additional_info. One of its columns stores a file in byte format. Is there any advantage to splitting the table? I saw that the JOIN will slow down query execution. Can I have more pros and cons of splitting a table into two?
I don't see much advantage in splitting the table unless some of the columns are very infrequently accessed and fairly large. There's a theoretical advantage to keeping rows small as you're going to get more of them in a cached block, and you improve the efficiency of a full table scan and of the buffer cache. Based on that I'd be wary of storing this file column in the customer table if it was more than a very small size.
Other than that, I'd keep it in a single table.
I can think of only 2 arguments in favor of splitting the table:
If all the columns in customer_additional_info are related, you could potentially get the benefit of additional declarative data integrity that you couldn't get with a single table. For instance, let's say your additional table was CustomerAddress. Your business logic may dictate that a customer address is optional, but once you have a customer Zip code, the addressL1, City and State become required fields. You could declare these columns NOT NULL if they exist in a CustomerAddress table. You couldn't do that if they existed directly in the customer table.
If you were doing some object-relational mapping, had a customer class with many subclasses, and didn't want to use Single Table Inheritance (STI). Sometimes STI creates problems when similar properties of various subclasses require different storage layouts. Because all subclasses have to use the same table, you might have name clashes. The alternative is Class Table Inheritance, where you have a table for the superclass and an additional table for each subclass. This is a similar scenario to the one you described in your question.
As for cons, the join makes things harder and slower. You also run the risk of accidentally creating a one-to-many relationship, i.e. you create 2 addresses in the CustomerAddress table and now you don't know which one is valid.
EDIT:
Let me explain the declarative ref integrity point further.
If your business rules are such that a customer address is optional, and you embed addressL1, addressL2, City, State, and Zip in your customer table, you would need to make each of these fields Nullable. That would allow someone to insert a customer with a City but no state. You could write a table level check constraint to cover this situation. But that isn't as easy as simply setting the AddressL1, City, State and Zip columns in the CustomerAddress table not nullable. To be clear, I am NOT advocating using the multi-table approach. However you asked for Pros and Cons, and I'm just pointing out this aspect falls on the pro side of the ledger.
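The declarative-integrity point can be sketched as below. This is a minimal illustration, assuming SQLite through Python's built-in `sqlite3`; the table and column names (`customer`, `customer_address`, `address_l1`, and so on) and the sample rows are hypothetical. Making `customer_id` the primary key of the address table is one way to rule out the accidental one-to-many relationship mentioned among the cons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Optional 1:1 extension: a row exists only when the address is known.
    CREATE TABLE customer_address (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        address_l1  TEXT NOT NULL,
        address_l2  TEXT,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL,
        zip         TEXT NOT NULL
    );
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Ann')")  # no address: allowed
conn.execute("INSERT INTO customer VALUES (2, 'Bob')")
conn.execute(
    "INSERT INTO customer_address "
    "VALUES (2, '1 Main St', NULL, 'Springfield', 'IL', '62701')"
)


def try_partial_address():
    """Return True if the database rejects an address with a city but no state."""
    try:
        conn.execute(
            "INSERT INTO customer_address (customer_id, address_l1, city, zip) "
            "VALUES (1, '2 Oak Ave', 'Springfield', '62701')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


partial_rejected = try_partial_address()
```

With the same columns embedded in `customer`, each of them would have to be nullable, and the "all or nothing" rule would need a table-level CHECK constraint instead.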
I second what David Aldridge said, I'd just like to add a point about the file column (presumably BLOB)...
BLOBs are stored up to approx. 4000 bytes in-line [1]. If a BLOB is used rarely, you can specify DISABLE STORAGE IN ROW to store it out-of-line, removing the "cache pollution" without the need to split the table.
But whatever you do, measure the effects on realistic amounts of data before you make the final decision.
[1] That is, in the row itself.

Hbase Schema Nested Entity

Does anyone have an example on how to create an Hbase table with a nested entity?
Example
UserName (string)
SSN (string)
+ Books (collection)
The books collection would look like this for example
Books
isbn
title
etc...
I cannot find a single example of how to create a table like this. I see many people talk about it, and how it is a best practice in certain scenarios, but I cannot find an example of how to do it anywhere.
Thanks...
Nested entities aren't an official feature of HBase; it's just a way some people talk about one usage pattern. In this pattern, you use the fact that "columns" in HBase are really just a big map (a bunch of key/value pairs) to let you model a dimension of cardinality inside the row by adding one column per "row" of the nested entity.
Schema-wise, you don't need to do much on the table itself; when you create a table in HBase, you just specify the name & column family (and associated properties), like so (in hbase shell):
hbase:001:0> create 'UsersWithBooks', 'cf1'
Then, it's up to you what you put in it, column wise. You could insert values like:
hbase:002:0> put 'UsersWithBooks', 'userid1234', 'cf1:username', 'my username'
hbase:003:0> put 'UsersWithBooks', 'userid1234', 'cf1:ssn', 'my ssn'
hbase:004:0> put 'UsersWithBooks', 'userid1234', 'cf1:book_id_12345', '<isbn>12345</isbn><title>mary had a little lamb</title>'
hbase:005:0> put 'UsersWithBooks', 'userid1234', 'cf1:book_id_67890', '<isbn>67890</isbn><title>the importance of being earnest</title>'
The column names are totally up to you, and there's no limit to how many you can have (within reason: see the HBase Reference Guide for more on this). Of course, doing this, you have to do your own legwork re: putting in and getting out values (and you'd probably do it with the Java client in a more sophisticated way than I'm doing with these shell commands; they're just for explanatory purposes). And while you can efficiently scan just a portion of the columns in a table by key (using a column pagination filter), you can't do much with the contents of the cells other than pull them and parse them elsewhere.
Why would you do this? Probably just if you wanted atomicity around all the nested rows for one parent row. It's not very common; your best bet is probably to start by modeling them as separate tables, and only move to this approach if you really understand the tradeoffs.
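To make the column layout concrete, here is a minimal sketch that models a row as a plain Python dict. It does not talk to HBase at all; the `put` and `books_for` helpers are hypothetical stand-ins, and the cell values are encoded as JSON here rather than the XML-like strings used in the shell commands, purely because JSON is easy to parse with the standard library.

```python
import json

# Plain-dict stand-in for an HBase table: row key -> {"family:qualifier": value}.
# No HBase client is involved; this only illustrates the column layout.
table = {}


def put(row_key, column, value):
    """Mimic a single-cell put: set one column of one row."""
    table.setdefault(row_key, {})[column] = value


def books_for(row_key, prefix="cf1:book_id_"):
    """Collect the nested 'rows' by scanning column names with a shared prefix."""
    row = table.get(row_key, {})
    return {
        column[len(prefix):]: json.loads(value)
        for column, value in sorted(row.items())
        if column.startswith(prefix)
    }


put("userid1234", "cf1:username", "my username")
put("userid1234", "cf1:ssn", "my ssn")
put("userid1234", "cf1:book_id_12345",
    json.dumps({"isbn": "12345", "title": "mary had a little lamb"}))
put("userid1234", "cf1:book_id_67890",
    json.dumps({"isbn": "67890", "title": "the importance of being earnest"}))

books = books_for("userid1234")
```

The parsing step inside `books_for` is the "own legwork" mentioned above: the store only hands back opaque cell values, and the application decides how to encode and decode each nested entity.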
There are some limitations to this. First, this technique only works to one level deep: your nested entities can’t themselves have nested entities. You can still have multiple different nested child entities in a single parent, and the column qualifier is their identifying attributes.
Second, it’s not as efficient to access an individual value stored as a nested column qualifier inside a row, as compared to accessing a row in another table, as you learned earlier in the chapter.
Still, there are compelling cases where this kind of schema design is appropriate. If the only way you get at the child entities is via the parent entity, and you’d like to have transactional protection around all children of a parent, this can be the right way to go.

How to Program a Spring with Hibernate web app?

I am working on a web application where I have 90 fields for a Person class, divided into family details, education details, personal details, etc.
I want a separate form for each group; for example, the family details form has father name, mother name, siblings and similar fields, and so on for the others.
I want a separate table for each group of details, with a common reference id across all tables.
My question is: how many bean classes should I write? Can I map from multiple forms to multiple tables with one bean class?
class PersonRegister {
    private Long iD;
    private String emailID;
    private String password;
    .
    .
} // for registration
Once logged in, I need to maintain his/her details.
Either
class person {
}
or
class PersonFamilyDetails {}
class PersonEducationDetails {}
etc.
Which approach do software development standards recommend?
Don't go overboard. I believe that in your case a single but very wide table (i.e. one with a lot of columns) would be the most efficient and the simplest from a maintenance perspective. The only thing to keep in mind is to query only for the necessary subset of columns/fields when loading lots of rows. Otherwise you'll be fetching kilobytes of unnecessary data that the particular use case doesn't need.
Unfortunately Hibernate doesn't have direct support for that. When designing a mapping for Person, you'll end up with a huge class and, even worse, Hibernate will always fetch all simple columns (and many-to-one relationships). You can, however, overcome this problem either by creating several views in the database, each containing only a subset of columns, or by having several Java classes map to the same table but only to a subset of columns.
Splitting your database model into several tables is beneficial only if your schema is not normalized. E.g. when storing siblings first name and last name you may wish to have a separate Sibling table and next time some other family member is entered, you can reuse the same row. This makes database smaller and might be faster when searching by sibling.
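The "several classes over one wide table" idea can be sketched independently of Hibernate. The example below is only an illustration, assuming SQLite through Python's built-in `sqlite3`; the `person` columns and the `load_family_details` helper are hypothetical, and the table is a short stand-in for the real 90-column one.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PersonFamilyDetails:
    """Maps only the family-related columns of the wide person table."""
    person_id: int
    father_name: str
    mother_name: str


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the wide table; only a few columns are shown.
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        email       TEXT,
        father_name TEXT,
        mother_name TEXT,
        school      TEXT,
        degree      TEXT
    );
    INSERT INTO person
    VALUES (1, 'a@example.com', 'John', 'Mary', 'State University', 'BSc');
    """
)


def load_family_details(person_id):
    """Fetch just the columns this form needs instead of the whole row."""
    row = conn.execute(
        "SELECT person_id, father_name, mother_name FROM person WHERE person_id = ?",
        (person_id,),
    ).fetchone()
    return PersonFamilyDetails(*row) if row else None


family = load_family_details(1)
```

Each form then gets its own small class and its own narrow query, while the storage stays in one table.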
Your question comes down to database normalization, as described in depth by Boyce and Codd; see http://en.wikipedia.org/wiki/Database_normalization.
The main advantage of database normalization is avoiding modification anomalies. In your case, if you have one table that stores, for each person, e.g. father-firstname and father-lastname, and you have multiple people with the same father, this data will be duplicated. When you discover a typo in the father-lastname, you could fix it for one sibling and forget to fix it for the next.
In this simplified case, database design best practices would call for normalizing into a separate table with father-id, father-firstname and father-lastname, with your person table referencing it through a many-to-one relation (many persons, one father).
For one-to-one relations, e.g. person->personeducationdetails, there's some debate. In the original definition of 1st Normal Form, every optional field would be normalized by putting it in its own table. This was later weakened by introducing 'null' in relational databases, see http://en.wikipedia.org/wiki/First_normal_form#cite_note-CoddRule-12. But still, if a whole set of columns could be null at the same time, you put them in a separate table with a one-to-one relation.
E.g. if you don't know a person's educationdetails, all of its related fields are null, so you'd better split them off into a separate table, and simply not have a personeducationdetails record for that person.
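Both points, the shared father row and the optional one-to-one details table, can be sketched as follows. This is a minimal illustration, assuming SQLite through Python's built-in `sqlite3`; the column names (`degree`, `school`, and so on) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE father (
        father_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        father_id  INTEGER REFERENCES father(father_id)
    );
    -- Optional 1:1 details: no row means "unknown", instead of a block of NULLs.
    CREATE TABLE person_education_details (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        degree    TEXT NOT NULL,
        school    TEXT NOT NULL
    );
    INSERT INTO father VALUES (1, 'John', 'Smtih');  -- typo on purpose
    INSERT INTO person VALUES (10, 'Alice', 1), (11, 'Bob', 1);
    INSERT INTO person_education_details VALUES (10, 'BSc', 'State University');
    """
)

# One UPDATE fixes the typo for every sibling, so the copies cannot drift apart.
conn.execute("UPDATE father SET last_name = 'Smith' WHERE father_id = 1")

rows = conn.execute(
    """
    SELECT p.first_name, f.last_name, e.degree
    FROM person p
    JOIN father f ON f.father_id = p.father_id
    LEFT JOIN person_education_details e ON e.person_id = p.person_id
    ORDER BY p.person_id
    """
).fetchall()
```

The `LEFT JOIN` is what lets a person without education details still appear in the result, with the missing details surfacing as a single absent row rather than several nullable columns.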

Best approach for populating model object(s) from a joined query?

I'm building a small financial system. Because of double-entry accounting, transactions always come in batches of two or more, so I've got a batch table and a transaction table. (The transaction table has batch_id, account_id, and amount fields, and shared data like date and description are relegated to the batch table).
I've been using basic vo-type models for each table so far. Because of this table structure, though, transactions will almost always be selected with a join on the batch table.
So should I take the selected records and splice them into two separate vo objects, or should I create a "shared" vo that contains both batch and transaction data?
There are a few cases in which batch records and/or transaction records are loaded individually, so they will each also have their associated vo class. Are there possible pitfalls down the road if I have "overlapping" vo classes like this?
The best approach is to tie models not to database tables, but to your views. E.g. if a view has a date field, then use a "shared" view object (ideally even an object specific to that view); if a view has only transaction info, use another object, and so on. It can be tedious, but the separation of concerns will be worth it. Too much duplication can be remedied by reusing or inheriting where appropriate.
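A view-specific object populated straight from the joined query might look like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` and `dataclasses`, the `LedgerLine` class and `ledger_lines` helper are hypothetical, the transaction table is named `txn` here to avoid the SQL keyword, and amounts are stored as integer minor units.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerLine:
    """View-specific object: exactly the columns one ledger screen needs."""
    batch_id: int
    date: str
    description: str
    account_id: int
    amount: int  # minor units (e.g. cents) to avoid float rounding


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE batch (
        batch_id    INTEGER PRIMARY KEY,
        date        TEXT,
        description TEXT
    );
    CREATE TABLE txn (
        batch_id   INTEGER REFERENCES batch(batch_id),
        account_id INTEGER,
        amount     INTEGER
    );
    INSERT INTO batch VALUES (1, '2024-01-05', 'Office rent');
    INSERT INTO txn VALUES (1, 100, -120000), (1, 200, 120000);
    """
)


def ledger_lines(batch_id):
    """Run the join once and map each result row onto the view object."""
    cursor = conn.execute(
        """
        SELECT b.batch_id, b.date, b.description, t.account_id, t.amount
        FROM txn t
        JOIN batch b ON b.batch_id = t.batch_id
        WHERE b.batch_id = ?
        ORDER BY t.account_id
        """,
        (batch_id,),
    )
    return [LedgerLine(*row) for row in cursor]


lines = ledger_lines(1)
```

The per-table objects can still exist alongside `LedgerLine` for the cases where a batch or a transaction is loaded on its own; the overlap is in field names only, not in responsibility.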
