In our application, we need to load large CSV files and fetch some data out of them, for example the distinct values from a CSV file. For this, we decided to go with an in-memory database such as H2, as there is no need to keep the data in persistent storage.
However, the files are dynamic: the columns may not be the same from one file to the next. I need to load each file into an H2 table that is temporary for that session.
The tech stack is Spring Boot and H2.
The examples I see on forums use a standard entity that knows which fields the table has. In my case, however, the table columns will be dynamic.
I tried the following in Spring Boot:
public interface ImportCSVRepository extends JpaRepository<Object, String>
with
@Query(value = "CREATE TABLE TEST AS SELECT * FROM CSVREAD('test.csv');", nativeQuery = true)
But this gives an unmanaged entity error. I understand why the error is thrown, but I am not sure how to achieve what I need. Also, please clarify whether I should use Spring Batch.
You can use JdbcTemplate to manually create tables and query/update the data in them.
An example of how to create a table with JdbcTemplate
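A minimal sketch of that idea, assuming the CSV header row contains plain identifiers (no spaces or quotes). It only builds the H2 statements from the question, so it runs without Spring or H2 on the classpath. The class name `CsvTableSql`, its methods, and the `CITY` column are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds H2 statements for a CSV whose columns are not known up front. */
final class CsvTableSql {
    // Table and column names are concatenated into SQL, so only plain identifiers are accepted.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private CsvTableSql() {}

    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain SQL identifier: " + name);
        }
        return name;
    }

    /** H2 takes the column list from the CSV header row, so no entity class is involved. */
    static String createTableFromCsv(String table, String csvPath) {
        String escapedPath = csvPath.replace("'", "''"); // escape quotes inside the SQL string literal
        return "CREATE TABLE " + requireIdentifier(table)
                + " AS SELECT * FROM CSVREAD('" + escapedPath + "')";
    }

    static String selectDistinct(String table, String column) {
        return "SELECT DISTINCT " + requireIdentifier(column)
                + " FROM " + requireIdentifier(table);
    }

    public static void main(String[] args) {
        System.out.println(createTableFromCsv("TEST", "test.csv"));
        System.out.println(selectDistinct("TEST", "CITY"));
    }
}
```

Assuming a configured `JdbcTemplate` instance named `jdbcTemplate`, the first string could be passed to `jdbcTemplate.execute(sql)` and the second to `jdbcTemplate.queryForList(sql, String.class)`.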
Dynamically creating tables and defining new entities (or modifying existing ones) is hardly possible with Spring Data repositories and @Entity classes. You should probably also look at some NoSQL databases such as MongoDB, where it is easier to define documents with dynamic structures (or key-value objects, as in Redis).
My application will be querying a database using Entity Framework. The problem is that the database table structure changes fairly often (a few times a year).
Back in the SQL days, we would store SQL queries in Resource files (.resx) and when any database changes occurred, we could just edit the one resource file and not have to edit any code in the app, recompile, etc.
Are there any good ways to do this with Linq-to-SQL?
Linq2SQL is innately code-based. If your schema is going to change, then the code will need to change.
The only way I can see around this, while still getting some of the benefits of LINQ, is to write everything as stored procedures, which you can then add as methods to the LINQ DataContext.
Then, as long as the Name, input parameters and output columns remain the same, you can change what the SP is doing on the database and the code can stay the same.
I am working on an ASP.NET MVC 3 web application using the database-first approach. After mapping the DB tables into entity classes with Entity Framework, I interact with those tables the same way I would under code first, by dealing with the database tables as classes and objects.
So after the mapping, code first and database first look very similar to me, except that instead of writing the entity classes from scratch (as in code first), I generated them from existing database tables, which is easier and more convenient in my case.
Are there specific cases where I will not be able to do something unless I use one approach rather than the other? So far I cannot find any.
Having dealt with many many headaches using db-1st EDMX pre EF 4.1, I am partial to code-first. But I'm not going to evangelize it.
In addition to the direct sproc mapping & function import features mentioned in Pawel's answer & comment, you won't be able to change the namespaces or any other code in the generated files when you use db-first. Afaik all of the files are nested under the .tt file. If there is a way to move them into logical folders & namespaces in your project, then I'm not aware of it.
Also if you ever want to separate your DbContext into a separate project from your entities, I recall this was possible pre-EF 4.1. But it was more cumbersome, because you had to run custom tool on both .tt files after each db change. With code-first this is pretty straightforward because you're dealing with pure OOP.
I think that the biggest limitation of CodeFirst (as compared to ModelFirst/DatabaseFirst approaches) is that you cannot map your CUD operations to stored procedures. If you are not planning to do that then you should be good to go.
To be more specific: you can invoke stored procedures using the SqlQuery method on DbSet, which causes the returned entities to be tracked, or the more general SqlQuery and ExecuteSqlCommand on the Database class (for Database.SqlQuery the returned objects do not have to be entities and are not tracked). That's about it. You cannot map Create/Update/Delete operations to stored procedures. FunctionImports are not supported either.
EDIT
It's possible to map CUD operations to stored procedures in EF6 now
If I map my domain objects to LINQ entities, will I lose the ability to track changes when saving my domain objects? That is, for any change I make in my model, once I map the object to LINQ entities for submission to the DB, will all object values be submitted to the DB by LINQ because the object goes through a mapping first? Or would object tracking still be used here?
Depends on the O/R mapper you're using. You're referring to entity framework which doesn't do any change tracking inside the entity and therefore it needs help from you when you re-attach an entity which previously was fetched from the db (so it knows it's not new).
Here's an article from Microsoft about CRUD operations in multi-tiered environments (similar issues to your domain mapping scenario).
Check out the Update - With Complete Entities for the way to do change tracking yourself.
There's another technique, where you attach the entity as unmodified, and then .Refresh() with Keep Current Values - replacing the original. This would allow you to Insert/Update/Do Nothing as appropriate at the cost of a database roundtrip.
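As a rough illustration of what doing the change tracking yourself amounts to, here is a sketch in Java rather than LINQ to SQL: the original values are kept next to the current ones and compared at save time, so only the changed columns need to be written. `ChangeSnapshot` and `changedColumns` are hypothetical names and do not correspond to any O/R mapper API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compares a snapshot of original values against the current values. */
final class ChangeSnapshot {
    private ChangeSnapshot() {}

    /** Returns the names of the columns whose value differs between the two maps. */
    static Set<String> changedColumns(Map<String, Object> original, Map<String, Object> current) {
        Set<String> allColumns = new TreeSet<>(original.keySet());
        allColumns.addAll(current.keySet());

        Set<String> changed = new TreeSet<>();
        for (String column : allColumns) {
            // A column missing on one side is read as null and therefore counts as changed.
            if (!Objects.equals(original.get(column), current.get(column))) {
                changed.add(column);
            }
        }
        return changed;
    }
}
```

An empty result corresponds to the "do nothing" case, while a non-empty one tells you which columns an update has to touch.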
I've read Rick Strahl's article on Linq to SQL DataContext Lifetime Management hoping to find some answers on how I would manage my .dbml files since they are so closely related to DataContext. Unfortunately, Rick's article seems to be focused on DataContext lifetime at runtime though, and my question is concerned with how the .dbml's should be organized at design time.
The general question of 'Best practices with .dbml's' has been asked and answered here, and the answers have focused on external tools to manage the .dbml.
I'm asking a more focused question of when and why should you not have a single .dbml file in your LINQ to SQL based project?
Please note that LINQ2SQL is intended as a simple and easy way to handle the relationship between database tables and objects.
Do not break the table relationships and the unit-of-work concept by creating multiple .dbml files.
If you ever need to create multiple .dbml files (which I don't recommend), then try to satisfy the following:
You have multiple databases with no relationships between the tables of those databases.
You want to use one of the .dbml files just to handle stored procedures.
You do not care about the unit-of-work concept.
If your database is too complex, then I would consider an ORM such as NHibernate or EF 4.
In my opinion, you can split the .dbml files so that each holds a subset of tables/procs from a DB according to function and relationship. I have not done this yet, so this is just opinion.
I have, however, created multiple .dbml files to assist with unit testing. If you work in an environment which restricts you to using stored procs in production, then you cannot use the table part of the .dbml (you can use the proc part though). So if you "unit test" (this is really integration testing) the DB layer of your code, you can call the proc wrapper and then check the results by querying the tables through the .dbml interface. In cases like this I'll split the .dbml file into just the tables that I want to query in my "unit test."
Further info: I have 2 solutions that I build. One has unit tests and is never built on the build server. The other is built on the build server and deployed to test/production.
I'd say you only ever need one .dbml file per database. If you have connections to other databases, reconsider the design or use separate .dbml files for them. Either way, one per database is enough.
This is because the .dbml maps to your tables, so why not use a single "data connector" / "data layer" for that? Using more than one seems like an odd design.
It's probably easier to control with only one as well.
This issue has been thoroughly analyzed here: http://craftycode.wordpress.com/2010/07/19/linq-to-sql-single-data-context-or-multiple/
In summary, you should create at most one data context per strongly connected group of tables, or one data context per database.
Say you have a database:
Database D contains tables A, B, C, X, Y, Z where
Table A has a foreign key relationship with tables B and C
Table X has a foreign key relationship with tables Y and Z
Table X also has a foreign key relationship with table A
Say you have 2 DBML files P and Q based on database D
DBML File P contains entities A', B' and C' where A' is connected to B' and C' via associations.
DBML File Q contains entities X', Y' and Z' where X' is connected to Y' and Z' via associations.
AFAIK, there is no way for DBML files P and Q to contain an association between entities A' and X'. This is the single biggest problem with having multiple DBML files.
To my mind, a DBML file reflects the data model represented by the tables and the constraints on those tables in a database. If some tables or constraints are missing from a set of DBML files, then the set of DBML files does not accurately reflect the underlying database.
Going back to our example, if there was no relationship between tables A and X in database D, then one would be able to create 2 DBML files.
Generically speaking, one can have multiple DBML files if each DBML file contains all entities and relationships that are connected. Note that the converse is not a problem, i.e., one can have a single DBML file containing multiple groups of entities that are not related to each other by any associations.
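The rule above can be checked mechanically: treat tables as nodes and foreign keys as undirected edges, and each connected group is the smallest unit that can live in its own DBML file. The following is a sketch in Java under that reading; `TableGroups` and `connectedGroups` are hypothetical names, and the table names are the ones from the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: groups tables that are linked by a chain of foreign keys. */
final class TableGroups {
    private TableGroups() {}

    /** Each foreign key is a two-element array {fromTable, toTable}; direction is ignored. */
    static List<Set<String>> connectedGroups(List<String> tables, List<String[]> foreignKeys) {
        Map<String, Set<String>> neighbours = new LinkedHashMap<>();
        for (String table : tables) {
            neighbours.computeIfAbsent(table, key -> new TreeSet<>());
        }
        for (String[] fk : foreignKeys) {
            neighbours.computeIfAbsent(fk[0], key -> new TreeSet<>()).add(fk[1]);
            neighbours.computeIfAbsent(fk[1], key -> new TreeSet<>()).add(fk[0]);
        }

        List<Set<String>> groups = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : neighbours.keySet()) {
            if (!seen.add(start)) {
                continue; // already placed in an earlier group
            }
            Set<String> group = new TreeSet<>();
            Deque<String> pending = new ArrayDeque<>();
            pending.push(start);
            while (!pending.isEmpty()) {
                String current = pending.pop();
                group.add(current);
                for (String next : neighbours.get(current)) {
                    if (seen.add(next)) {
                        pending.push(next);
                    }
                }
            }
            groups.add(group);
        }
        return groups;
    }
}
```

With the foreign keys A-B, A-C, X-Y and X-Z the method returns two groups, so two DBML files would be possible. Adding the X-A foreign key merges everything into a single group, which matches the conclusion that P and Q cannot be kept separate.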
The answer is tricky because it depends on what the situation requires. I try to logically separate each DBML into contexts (after all, the DBML provides the DataContext functionality). So if my app has a single context, then it doesn't make sense for me to have a separate DBML for each table. Context is king when creating your DBML files, is what I say.
Another thing to bear in mind is that LINQ uses the DataContext to track the identities of the instances of entities it creates. Therefore, an entity representing a row in a table created by one instance of the DataContext class is not the same as one created by another, even if all the properties are the same.
When one has multiple DBML files, then by necessity, there will be multiple instances of DataContexts, one for each DBML file. Therefore, entities can't be joined or shared from one DataContext to another.
This is applicable when an entity exists in both (or all) DBML files.
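The identity point can be illustrated with a small identity map. This is a sketch in Java, not LINQ to SQL: `Context` is a hypothetical stand-in for a DataContext that caches one object per primary key, and the generated name stands in for a value that would really be loaded from the database.

```java
import java.util.HashMap;
import java.util.Map;

final class IdentityMapDemo {
    /** A row materialised as an object; equality is deliberately left as reference identity. */
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a data context: hands out one object per primary key. */
    static final class Context {
        private final Map<Integer, Customer> identityMap = new HashMap<>();

        Customer find(int id) {
            // The generated name stands in for a row fetched from the database.
            return identityMap.computeIfAbsent(id, key -> new Customer(key, "Customer " + key));
        }
    }

    public static void main(String[] args) {
        Context p = new Context();
        Context q = new Context();
        System.out.println(p.find(1) == p.find(1)); // true: same context, same instance
        System.out.println(p.find(1) == q.find(1)); // false: same row, different contexts
    }
}
```

Both contexts return an object with the same property values for row 1, yet the two objects are distinct instances, which is why they cannot be mixed across contexts.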