In trying to solve:
Linq .Contains with large set causes TDS error
I think I've stumbled across a solution, and I'd like to see if it's a kosher way of approaching the problem.
(short summary) I'd like to linq-join against a list of record id's that aren't (wholly or at least easily) generated in SQL. It's a big list and frequently blows past the 2100 item limit for the TDS RPC call. So what I'd have done in SQL is thrown them in a temp table, and then joined against that when I needed them.
So I did the same in Linq.
In my MyDB.dbml file I added:
<Table Name="#temptab" Member="TempTabs">
<Type Name="TempTab">
<Column Name="recno" Type="System.Int32" DbType="Int NOT NULL"
IsPrimaryKey="true" CanBeNull="false" />
</Type>
</Table>
Opening the designer and closing it added the necessary entries there, although for completeness, I will quote from the MyDB.desginer.cs file:
[Table(Name="#temptab")]
public partial class TempTab : INotifyPropertyChanging, INotifyPropertyChanged
{
private static PropertyChangingEventArgs emptyChangingEventArgs = new PropertyChangingEventArgs(String.Empty);
private int _recno;
#region Extensibility Method Definitions
partial void OnLoaded();
partial void OnValidate(System.Data.Linq.ChangeAction action);
partial void OnCreated();
partial void OnrecnoChanging(int value);
partial void OnrecnoChanged();
#endregion
public TempTab()
{
OnCreated();
}
[Column(Storage="_recno", DbType="Int NOT NULL", IsPrimaryKey=true)]
public int recno
{
get
{
return this._recno;
}
set
{
if ((this._recno != value))
{
this.OnrecnoChanging(value);
this.SendPropertyChanging();
this._recno = value;
this.SendPropertyChanged("recno");
this.OnrecnoChanged();
}
}
}
public event PropertyChangingEventHandler PropertyChanging;
public event PropertyChangedEventHandler PropertyChanged;
protected virtual void SendPropertyChanging()
{
if ((this.PropertyChanging != null))
{
this.PropertyChanging(this, emptyChangingEventArgs);
}
}
protected virtual void SendPropertyChanged(String propertyName)
{
if ((this.PropertyChanged != null))
{
this.PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
}
}
}
Then it simply became a matter of juggling around some things in the code. Where I'd normally have had:
MyDBDataContext mydb = new MyDBDataContext();
I had to get it to share its connection with a normal SqlConnection so that I could use the connection to create the temporary table. After that it seems quite usable.
string connstring = "Data Source.... etc..";
SqlConnection conn = new SqlConnection(connstring);
conn.Open();
SqlCommand cmd = new SqlCommand("create table #temptab " +
"(recno int primary key not null)", conn);
cmd.ExecuteNonQuery();
MyDBDataContext mydb = new MyDBDataContext(conn);
// Now insert some records (1 shown for example)
TempTab tt = new TempTab();
tt.recno = 1;
mydb.TempTabs.InsertOnSubmit(tt);
mydb.SubmitChanges();
And using it:
// Through normal SqlCommands, etc...
cmd = new SqlCommand("select top 1 * from #temptab", conn);
Object o = cmd.ExecuteScalar();
// Or through Linq
var t = from tx in mydb.TempTabs
from v in mydb.v_BigTables
where tx.recno == v.recno
select tx;
Does anyone see a problem with this approach as a general-purpose solution for using temporary tables in joins in Linq?
It solved my problem wonderfully, as now I can do a straightforward join in Linq instead of having to use .Contains().
Postscript:
The one problem I do have is that mixing Linq and regular SqlCommands on the table (where one is reading/writing and so is the other) can be hazardous. Always using SqlCommands to insert on the table, and then Linq commands to read it works out fine. Apparently, Linq caches results -- there's probably a way around it, but it wasn't obviousl.
I don't see a problem with using temporary tables to solve your problem. As far as mixing SqlCommands and LINQ, you are absolutely correct about the hazard factor. It's so easy to execute your SQL statements using a DataContext, I wouldn't even worry about the SqlCommand:
private string _ConnectionString = "<your connection string>";
public void CreateTempTable()
{
using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
{
dc.ExecuteCommand("create table #temptab (recno int primary key not null)");
}
}
public void DropTempTable()
{
using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
{
dc.ExecuteCommand("DROP TABLE #TEMPTAB");
}
}
public void YourMethod()
{
CreateTempTable();
using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
{
...
... do whatever you want (within reason)
...
}
DropTempTable();
}
We have a similar situation, and while this works, the issue becomes that you aren't really dealing with Queryables, so you cannot easily use this "with" LINQ. This isn't a solution that works with method chains.
Our final solution was just to throw what we want in a stored procedure, and write selects in that procedure against the temp tables when we want those values. It is a compromise, but both are workarounds. At least with the stored proc the designer will generate the calling code for you, and you have a black boxed implementation so if you need to do further tuning you can do so strictly within the procedure, without a recompile.
In a perfect world, there will be some future support for writing Linq2Sql statements that allow you to dicate the use of temp tables within your queries, avoid the nasty sql IN statement for complex scenarios like this one.
As a "general-purpose solution", what if you code is run in more than one threads/apps? I think big-list solution is always related to the problem domain. It's better to use a regular table for the problem you are working on.
I once created a "generic" list table in database. The table was created with three columns: int, uniqueidentifier and varchar, along with other columns to manage each list. I was thinking: "it ought to be enough to handle many cases". But soon I received a task that requires a join be performed with a list on three integers. After that, I never tried to create "generic" list table again.
Also, it's better to create a SP to insert multiple items into the list table in each database call. You can easily insert ~2000 items in less than 2 db round trips. Of cause, depending on what you are doing, performance may do not matter.
EDIT: forgot it is a temporary table and temporary table is per connection, so my previous argument on multi-threads was not proper. But still, it is not a general solution, for enforcing the fixed schema.
Would the solution offered by Neil actually work? If its a temporary table, and each of the methods is creating and disposing its own data context, I dont think the temporary table would still be there after the connection was dropped.
Even if it was there, I think this would be an area where you are assuming some functionality of how queries and connections end up being rendered, and thats ome of the big issues with linq to sql - you just dont know what might happen downt he track as the engineers come up with better ways of doing things.
I'd do it in a stored proc. You can always return the result set into a pre-defined table if you wish.
Related
I am trying to see if there is a way to improve the way data is inserted and updated.
I am using ORACLE DB with JDBC.
The current way i'm doing is to update (e.g.)customer record by using a FOR loop after checking if toUpdate is true . An Example such as the sample code below, followed by calling an existing DAO update() to do so. But this would not allow for the UPSERT of multiple data together.
However, is there a better way to UPSERT multiple data together?
if (toUpdate) {
for (Customer customerRec : customerRecList)
customerRecDAO.update(customerRec);
}
Yes you can use batching:
public <T> int saveInBatch(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn){
try{
MapSqlParameterSource[] params = records.stream().map(paramFn).toArray(MapSqlParameterSource[]::new);
int rowCount = jdbcTemplate.batchUpdate(sql, params);
return Arrays.stream(rowCount).sum();
}
catch(Exception e){
//exception handling
}
}
paramFn is a lambda of function such that you can map records to their values. example could be
(record)->{
return new MapSqlParameterSource("username" ,username),Integer.class);//just example
}
why we use MapSqlParameterSource
You can call saveInBatch in such a way that you pass smaller batches or customized batches of records. Suppose you have a million records then you may want to update only 200-400 records at a time so you can do something like below:
private <T> int saveRecords(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn) throws Exception{
return Lists.partition(records, 300).stream().map(batch-> saveInBatch(batch, sql, paramFn)).mapToInt(Integer::intValue).sum();
}
Note: above is not well optimized or streams are not used to their best but this is a working code I tried ages back :).
I'm a bit puzzled with the NHibernate's IsDirty() method.
Directly after getting a (very large) complex object from my database, NHibernate's ISession.IsDirty() gives 'true'.
IFacadeDAL fd = new FacadeDAL();
// Session's not dirty
IProject proj = fd.GetByID<IProject, string>("123611-3640");
// Session is dirty
However, if i call Commit() like so:
using (ITransaction trans = Facade.Session.Transaction)
{
trans.Begin();
Facade.Session.Save(entity);
trans.Commit();
return true;
}
this results in no sql (exept for "exec sp_reset_connection").
I have read that due to 'mapping-choices' you can get "ghosts" in your session (causing the session to say it's dirty), but wouldn't it then also try to update something? Also, if this is caused e.g. by "converting" an sql bit to a c# bool i don't think i can change it... (no clue if that could be a cause for ghosts, though).
Update 2:
There are several (sql server) views and tables involved here. This is the (very) simplified class:
public class Project : IProject
{
private string id;
private List<IPlantItem> plantItems;
public Project() { }
public virtual string ID
{
get { return id; }
}
public virtual IEnumerable<IPlantItem> PlantItems
{
get { return plantItems; }
}
}
'PlantItem is being stored in a table. So i expect when i change anything in a PlantItem, IsDirty should change to 'true'.
My question is: is there a way to check if the session at that point, on flush() (or in my case on commit() for that matter) would generated actual sql statements? And if not: is there another way of (manually) storing some sort of a snapshot of the session to compare the current session to?
Update 1: I should really also mention these aspects:
that my FlushMode is set to 'None'.
that the underlying data of 'IProject'-object itself is based on a sql-view and therefore has most properties in the mapping set to update="false"
that when i actually change something in an object and use the same method for saving, sql update statements are being sent (and thus all is committed just fine)
In my experience Ghosts can be caused by the database being a nullable int and the mapping an ordinary int.
When the entity gets hydrated the nullable db int is converted to zero and hence it is now dirty.
Another way to get dirty records is by specifying a wrong type in the XML mapping, e.g.
public enum Sex
{
Unspecified,
Male,
Female
}
...
public virtual Sex Sex { get; set; }
and specify an int in the mapping.
<property name="Sex" type="int"/>
See this link to test your mappings which explains in more details.
If some of your entities is dirty - and therefore the ISession is dirty - they you have a mismatch between the properties and the database. For example, imagine you have a column in a table that is nullable, but in your code it is set as not null (an int, for example). NHibernate will consider it dirty, because its current value (0 in case of an integer) is different from the value that came from the db (null). Look for "Ghost properties NHibernate" in Google.
I have code that generates records based on my DataGridView. These records are temporary because some of them already exist in the database.
Crop_Variety v = new Crop_Variety();
v.Type_ID = currentCropType.Type_ID;
v.Variety_ID = r.Cells[0].Value.ToString();
v.Description = r.Cells[1].Value.ToString();
v.Crop = currentCrop;
v.Crop_ID = currentCrop.Crop_ID;
Unfortunately in this little bit of code, because I say that v.Crop = currentCrop,
now currentCrop.Crop_Varieties includes this temporary record. And when I go to insert the records of this grid that are new, they have a reference to the same Crop record, and therefore these temporary records that do already exist in the database show up twice causing duplicate key errors when I submit.
I have a whole system for detecting what records need to be added and what need to be deleted based on what the user has done, but its getting gummed up by this relentless tracking of references.
Is there a way I can stop Linq-To-Sql from automatically adding these temporary records to its table collections?
I would suggest revisiting the code that populates DataGridView (grid) with records.
And then revisit the code that operates on items from a GridView, keeping in mind that you can grab bound item from a grid row using the following code:
public object GridSelectedItem
{
get
{
try
{
if (_grid == null || _grid.SelectedCells.Count < 1) return null;
DataGridViewCell cell = _grid.SelectedCells[0];
DataGridViewRow row = _grid.Rows[cell.RowIndex];
if (row.DataBoundItem == null) return null;
return row.DataBoundItem;
}
catch { }
return null;
}
}
It is also hard to understand the nature of Crop_Variety code that you have posted. As the Crop_Variety seems to be a subclass of Crop. This leads to problems when the Crop is not yet bound to database and potentially lead to problems when you're adding Crop_Variety to the context.
For this type of Form application I normally have List _dataList inside form class, then the main grid is bound to that list, through ObjectBindingList or another way. That way _dataList holds all data that needs to be persisted when needed (user clicked save).
When you assign an entity object reference you are creating a link between the two objects. Here you are doing that:
v.Crop = currentCrop;
There is only one way to avoid this: Modify the generated code or generate/write your own. I would never do this.
I think you will be better off by writing a custom DTO class instead of reusing the generated entities. I have done both approaches and I like the latter one far better.
Edit: Here is some sample generated code:
[global::System.Data.Linq.Mapping.AssociationAttribute(Name="RssFeed_RssFeedItem", Storage="_RssFeed", ThisKey="RssFeedID", OtherKey="ID", IsForeignKey=true, DeleteOnNull=true, DeleteRule="CASCADE")]
public RssFeed RssFeed
{
get
{
return this._RssFeed.Entity;
}
set
{
RssFeed previousValue = this._RssFeed.Entity;
if (((previousValue != value)
|| (this._RssFeed.HasLoadedOrAssignedValue == false)))
{
this.SendPropertyChanging();
if ((previousValue != null))
{
this._RssFeed.Entity = null;
previousValue.RssFeedItems.Remove(this);
}
this._RssFeed.Entity = value;
if ((value != null))
{
value.RssFeedItems.Add(this);
this._RssFeedID = value.ID;
}
else
{
this._RssFeedID = default(int);
}
this.SendPropertyChanged("RssFeed");
}
}
}
As you can see the generated code is establishing the link by saying "value.RssFeedItems.Add(this);".
In case you have many entities for wich you would need many DTOs you could code-generate the DTO classes by using reflection.
I've only been using fluent nhibernate a few days and its been going fine until trying to deal with guid values and Oracle. I have read a good few posts on the subject but none that help me solve the problem I am seeing.
I am using Oracle 10g express edition.
I have a simple test table in oracle
CREATE TABLE test (Field RAW(16));
I have a simple class and interface for mapping to the table
public class Test : ITest
{
public virtual Guid Field { get; set; }
}
public interface ITest
{
Guid Field { get; set; }
}
Class map is simple
public class TestMap : ClassMap<Test>
{
public TestMap()
{
Id(x => x.Field);
}
}
I start trying to insert a simple easily recognised guid value
00112233445566778899AABBCCDDEEFF
Heres the code
var test = new Test {Field = new Guid("00112233445566778899AABBCCDDEEFF")};
// test.Field == 00112233445566778899AABBCCDDEEFF here.
session.Save(test);
// after save guid is changed, test.Field == 09a3f4eefebc4cdb8c239f5300edfd82
// this value is different for each run so I pressume nhibernate is assigning
// a value internally.
transaction.Commit();
IQuery query = session.CreateQuery("from Test");
// or
// IQuery query = session.CreateSQLQuery("select * from Test").AddEntity(typeof(Test));
var t in query.List<Test>().Single();
// t.Field == 8ef8a3b10e704e4dae5d9f5300e77098
// this value never changes between runs.
The value actually stored in the database differs each time also, for the run above it was
EEF4A309BCFEDB4C8C239F5300EDFD82
Truly confused....
Any help much appreciated.
EDIT: I always delete data from the table before each test run. Also using ADO directly works no problem.
EDIT: OK, my first problem was that even though I thought I was dropping the data from the table via SQL command line for oracle when I viewed the table via oracle UI it still had data and the first guid was as I should have expected 8ef8a3b10e704e4dae5d9f5300e77098.
Fnhibernate still appears to be altering the guid value on save. it alters it to the value it stores in the database but I'm still not sure why it is doing this or how\if I can control it.
If you intend on assigning the id yourself you will need to use a different id generator than the default which is Guid.comb. You should be using assigned instead. So your mapping would look something like this:
Id(x => x.Field).GeneratedBy.Assigned();
You can read more about id generators in the nhibernate documentation here:
http://www.nhforge.org/doc/nh/en/index.html#mapping-declaration-id-generator
I have the following code -
public void LoadAllContacts()
{
var db = new ContextDB();
var contacts = db.LocalContacts.ToList();
grdItems.DataSource = contacts.OrderBy(x => x.Areas.OrderBy(y => y.Name));
grdItems.DataBind();
}
I'm trying to sort the list of the contacts according to the area name that is contained within each contact. When I tried the above, I get "At least one object must implement IComparable.". Is there an easy way instead of writing a custom IComparer?
Thanks!
try this:
public void LoadAllContacts()
{
var db = new ContextDB();
var contacts = db.LocalContacts.ToList();
grdItems.DataSource = contacts.OrderBy(x => x.Areas.OrderBy(y => y.Name).First().Name);
grdItems.DataBind();
}
this will order the contacts by the first area name, after ordering the areas by name.
Hope this helps :)
Edit: fixed error in code. (.First().Name)
I was in a discussion with #AbdouMoumen but in the end I thought I'd provide my own answer :-)
His answer works, but there two performance issues in this code (both in the answer as in the original question).
First, the code loads ALL contacts in the db. This may or may not be a problem, but in general I would recommend NOT to do this. Many modern controls support paging/filtering out of the box, so you'd be better off supplying an not-yet-evaluated IQueryable<T> instead of List<T>. If however you need everything in memory, you should delay the ToList to the last possible moment.
Second, in AbdouMoumen's answer, there is a so-called 'SELECT N+1' problem. Entity Framework will by default use lazy loading to fetch additional properties. I.e. the Areas property will not be fetched from the database until it's accessed. In this case this will happen in the controls 'for loop', while it's ordering the result set by name.
Open up SQL Server Profiler to see what I mean: you will see a SELECT statement for all the contacts, and an additional SELECT statement for each contact that fetches the Areas for that contact.
A much better solution would be the following:
public void LoadAllContacts()
{
using (var db = new ContextDB())
{
// note: no ToList() yet, just defining the query
var contactsQuery = db.LocalContacts
.OrderBy(x => x.Areas
.OrderBy(y => y.Name)
.First().Name);
// fetch all the contacts, correctly ordered in the DB
grdItems.DataSource = contactsQuery.ToList();
grdItems.DataBind();
}
}
Is it one to one relation (Contact->Area)?
if yeah then try the following :
public partial class Contact
{
public string AreaName
{
get
{
if (this.Area != null)
return this.Area.Name;
return string.Empty;
}
}
}
then
grdItems.DataSource = contacts.OrderBy(x => x.AreaName);