I'm using EF 5 with an Oracle database.
I'm doing a select count on a table with a specific parameter. When I use EF, the query returns the value 31, as expected, but the result takes about 10 seconds to be returned.
using (var serv = new Aperam.SIP.PXP.Negocio.Modelos.SIP_PA())
{
var teste = (from ens in serv.PA_ENSAIOS_UM
where ens.COD_IDENT_UNMET == "FBLDY3840"
select ens).Count();
}
If I execute the simple query below, the result is the same (31), but it is returned in about 500 milliseconds.
SELECT
count(*)
FROM
PA_ENSAIOS_UM
WHERE
COD_IDENT_UNMET = 'FBLDY3840'
Is there a way to improve the performance when I'm using EF?
Note: There are 13,000,000 rows in this table.
Here are some things you can try:
Capture the query that is being generated and see if it is the same as the one you are using. Details can be found here, but essentially, you will instantiate your DbContext (let's call it "_context") and then set the Database.Log property to be the logging method. It's fine if this method doesn't actually do anything--you can just set a breakpoint in there and see what's going on.
So, as an example: define a logging function (I have a static class called "Logging" which uses nLog to write to files)
public static void LogQuery(string queryData)
{
if (string.IsNullOrWhiteSpace(queryData))
return;
var message = string.Format("{0}{1}",
queryData.Trim().Contains(Environment.NewLine) ?
Environment.NewLine : "", queryData);
_sqlLogger.Info(message);
_genLogger.Trace($"EntityFW query (len {message.Length} chars)");
}
Then when you create your context point to LogQuery:
_context.Database.Log = Logging.LogQuery;
When you do your tests, remember that often the first run is the slowest because the server has to actually do the work, but on the subsequent runs, it often uses cached data. Try running your tests 2-3 times back to back and see if they don't start to run in the same time.
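A minimal sketch of that kind of back-to-back timing, assuming a hypothetical helper called TimeRuns (it is not part of EF). The delegate you pass in would be the Count() query from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class QueryTiming
{
    // Hypothetical helper: runs the given action `runs` times and returns
    // the elapsed milliseconds of each run, in order. The first entry
    // usually includes one-off costs such as connection setup and a cold
    // server cache, so compare it against the later entries.
    public static List<long> TimeRuns(Action action, int runs)
    {
        var timings = new List<long>();
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            timings.Add(watch.ElapsedMilliseconds);
        }
        return timings;
    }
}
```

Calling QueryTiming.TimeRuns with the EF query and a run count of 3 gives three numbers to compare. If only the first one is slow, the cost is probably start-up work rather than the query itself.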
I don't know if it generates the same query or not, but try this other form (which should be functionally equivalent, but may provide better time)
var teste = serv.PA_ENSAIOS_UM.Count(ens=>ens.COD_IDENT_UNMET == "FBLDY3840");
I'm wondering if the version you have pulls data from the DB and THEN counts it. If so, this other syntax may leave all the work to be done at the server, where it belongs. Not sure, though, esp. since I haven't ever used EF with Oracle and I don't know if it behaves the same as it does with SQL Server.
Considering a Spring Boot, neo4j environment with Spring-Data-neo4j-4 I want to make a delete and get an error message when it fails to delete.
My problem is that, since Repository.delete() returns void, I have no idea whether the delete modified anything or not.
First question: is there any way to get the number of rows affected by the last query? For example, in PL/SQL I could use SQL%ROWCOUNT.
So anyway, I tried the following code:
public void deletesomething(Long somethingId) {
    somethingRepository.delete(getExistingsomething(somethingId).getId());
}
// Looks the entity up first and throws if it does not exist.
private something getExistingsomething(Long somethingId) {
    return Optional.ofNullable(somethingRepository.findOne(somethingId))
        .orElseThrow(() -> new somethingNotFoundException(somethingId));
}
In the code above I query the database to check if the value exist before I delete it.
Second question: do you recommend any different approach?
So now, just to add some complexity, I have a cluster database and db1 can only Create, Update and Delete, and db2 and db3 can only Read (this is ensured by the cluster sockets). db2 and db3 will receive the data from db1 from the replication process.
From what I have seen so far, replication can take up to 90 seconds, which means that for up to 90 seconds the replicas can be in a different state from db1.
Looking again to the code above:
public void deletesomething(Long somethingId) {
somethingRepository.delete(getExistingsomething(somethingId).getId());
}
In the debugger, that means:
getExistingsomething(somethingId).getId() // will hit db2
somethingRepository.delete(...) // will hit db1
So if replication has not yet inserted the value in db2, this code will throw the exception.
Third question: without changing those sockets, is there any way for me to delete and give the correct response?
This is not currently supported in Spring Data Neo4j. If you wish, please open a feature request.
In the meantime, perhaps the easiest workaround is to drop down to the OGM level of abstraction.
Create a class that is injected with org.neo4j.ogm.session.Session
Use the query method on Session, as in the example below.
Example: (example is in Kotlin, which was on hand)
fun deleteProfilesByColor(color : String)
{
var query = """
MATCH (n:Profile {color: {color}})
DETACH DELETE n;
"""
val params = mutableMapOf(
"color" to color
)
val result = session.query(query, params)
val statistics = result.queryStatistics() //Use these!
}
Why isn't the exception triggered? Linq's "Any()" is not considering the new entries?
MyContext db = new MyContext();
foreach (string email in new[] { "asdf@gmail.com", "asdf@gmail.com" })
{
    Person person = new Person();
    person.Email = email;
    // Any() runs against the database, not against entities added below.
    if (db.Persons.Any(p => p.Email.Equals(email)))
    {
        throw new Exception("Email already used!");
    }
    db.Persons.Add(person);
}
db.SaveChanges();
Shouldn't the exception be triggered on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an Excel worksheet of persons and I iterate over it, adding every row as a person to db.Persons and checking that their emails aren't already used in the db. The problem is when there are repeated emails in the worksheet itself (two rows with the same email).
Yes - queries (by design) are only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email)) ||
    db.Persons.Local.Any(p => p.Email.Equals(email)))
However - since YOU are in control of what's added to the store wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. It would only be there once SaveChanges() had been called, and you usually wouldn't save after every row in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent dups being entered, it should happen elsewhere - on the client or checking the C# collection before you start writing it to the db.
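As a rough sketch of checking the C# collection before anything is written to the db, the worksheet emails could be scanned for repeats first. The class and method names here are hypothetical, and comparing case-insensitively is an assumption about how you want emails matched:

```csharp
using System;
using System.Collections.Generic;

public static class EmailChecks
{
    // Hypothetical helper: returns the e-mails that occur more than once
    // in the input. HashSet.Add returns false when the value is already
    // present, which is what flags a repeat.
    // Assumption: e-mails are compared case-insensitively.
    public static List<string> FindRepeated(IEnumerable<string> emails)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var repeated = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var email in emails)
        {
            if (!seen.Add(email))
            {
                repeated.Add(email);
            }
        }
        return new List<string>(repeated);
    }
}
```

If the returned list is empty, the worksheet has no internal duplicates and the only check left is the one against the database.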
I'm using NHibernate 3.2 and I have a repository method that looks like:
public IEnumerable<MyModel> GetActiveMyModel()
{
return from m in Session.Query<MyModel>()
where m.Active == true
select m;
}
Which works as expected. However, sometimes when I use this method I want to filter it further:
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
where m.ID < 100
select new { m.Name };
Which produces the same SQL as the first one and the second filter and select must be done after the fact. I thought the whole point in LINQ is that it formed an expression tree that was unravelled when it's needed and therefore the correct SQL for the job could be created, saving my database requests.
If not, it means all of my repository methods have to return exactly what is needed and I can't make use of LINQ further down the chain without taking a penalty.
Have I got this wrong?
Updated
In response to the comment below: I omitted the line where I iterate over the results, which causes the initial SQL to be run (WHERE Active = 1) and the second filter (ID < 100) is obviously done in .NET.
Also, if I replace the second chunk of code with
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
where m.Items.Count > 0
select new { m.Name };
It generates the initial SQL to retrieve the active records and then runs a separate SQL statement for each record to find out how many Items it has, rather than writing something like I'd expect:
SELECT Name
FROM MyModel m
WHERE Active = 1
AND (SELECT COUNT(*) FROM Items WHERE MyModelID = m.ID) > 0
You are returning IEnumerable<MyModel> from the method, which will cause in-memory evaluation from that point on, even if the underlying sequence is IQueryable<MyModel>.
If you want to allow code after GetActiveMyModel to add to the SQL query, return IQueryable<MyModel> instead.
You're running IEnumerable's extension method "Where" instead of IQueryable's. It will still evaluate lazily and give the same output, however it evaluates the IQueryable on entry and you're filtering the collection in memory instead of against the database.
When you later add an extra condition on another table (the count), it has to lazily fetch each and every one of the Items collections from the database since it has already evaluated the IQueryable before it knew about the condition.
(Yes, I would also like the extension methods on IEnumerable to be virtual members instead, but, alas, they're not.)
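The overload binding described above can be seen without a database. This is only an illustrative sketch: an in-memory list wrapped with AsQueryable() stands in for the NHibernate query, and the class and method names are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverloadDemo
{
    // Stand-in for a repository: an in-memory list exposed as IQueryable.
    private static IQueryable<int> Source()
    {
        return new List<int> { 1, 2, 3 }.AsQueryable();
    }

    // Static type is IQueryable<int>, so Queryable.Where is chosen and
    // the result is still a composable query.
    public static bool StaysQueryable()
    {
        IQueryable<int> query = Source();
        object filtered = query.Where(x => x > 1);
        return filtered is IQueryable<int>;
    }

    // Same object, but typed as IEnumerable<int>: Enumerable.Where is
    // chosen and the result is a plain in-memory iterator.
    public static bool StaysQueryableWhenTypedAsEnumerable()
    {
        IEnumerable<int> sequence = Source();
        object filtered = sequence.Where(x => x > 1);
        return filtered is IQueryable<int>;
    }
}
```

StaysQueryable() returns true and StaysQueryableWhenTypedAsEnumerable() returns false, even though both start from the same underlying object. The declared return type of the repository method is what decides whether later filters can still reach the provider.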
I started using compiled queries to increase the performance of some commonly executed LINQ to Entities queries. In one scenario I boiled the query down to its most basic form and pre-compiled that, then I tack on additional where clauses based on user input.
I seem to be losing the performance benefit of compiled queries in this particular case. Can someone explain why?
Here's an example of what I'm doing...
IEnumerable<Task> tasks = compiledQuery.Invoke(context, userId);
if(status != null)
{
tasks = tasks.Where(x => x.Status == status);
}
if(category != null)
{
tasks = tasks.Where(x => x.Category == category);
}
return tasks;
I think it's important to understand how Compiled Queries in EF work.
When you execute a query, Entity Framework maps your expression tree to a SQL query with the help of your mapping file (EDMX, or your model definitions with code first). This can be a complex and performance-intensive task.
Precompiling stores the result of this mapping phase, so the next time you hit the query it has the SQL already available and only has to set the current parameters.
The problem is that a precompiled query will lose its performance benefit as soon as you modify the query. Let's say you have the following:
IQueryable<Task> query = GetCompiledQuery(); // => db.Tasks.Where(t => t.Id == myId);
var notModifiedResult = query.ToList(); // Fast
int ModifiedResult = query.Count(); // Slow
With the first query you will have all the benefits of precompiling, because EF has the SQL already generated for you and can execute it immediately.
The second query will lose the precompiling because it has to regenerate its SQL.
If you would now execute a query on notModifiedResult this will be a Linq To Objects one because you have already executed your SQL to the database and fetched all the elements in memory.
You can however chain Compiled Queries (that is, use a compiled query in another compiled query).
But your code would require a series of compiled queries:
- The default
- One where status != null
- One where category != null
- One where both status and category != null
(Note: I haven't done any EF work for ages, and then it was just pottering. This is just an informed guess, really.)
This could be the culprit:
IEnumerable<Task> tasks = compiledQuery.Invoke(context, userId);
Any further querying will have to be done within the .NET process, not in SQL. All the possible results will have to be fetched from the database and filtered locally. Try this instead:
IQueryable<Task> tasks = compiledQuery.Invoke(context, userId);
(Assuming that's valid, of course.)
The compiled query can't be changed, only the parameters can be changed. What you are doing here is actually running the query, and THEN filtering the results.
.Invoke(context, userId); // returns all the results
.Where(....) // filters on that entire collection
You can see if there is a clever way to restate your query, so that the parameters can be included in all cases, but not have any effect. I haven't worked with compiled queries, sorry about that, but does this work (using -1 as the "ignore" value)?
// bunch of code to define the compiled query part, copied from MSDN
(ctx, total) => from order in ctx.SalesOrderHeaders
where (total == -1 || order.TotalDue >= total)
select order);
In SQL, you do this either by using dynamic SQL or by passing in a default value (or null) that indicates the parameter should be ignored:
select * from table t
where
(@age = 0 or t.age = @age) and
(@weight is null or t.weight = @weight)
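The same "ignore value" idea applied to the status and category filters from the question might look like the sketch below. TaskItem and TaskFilters are hypothetical names, and an in-memory list stands in for the EF context, so this only shows the shape of the predicate. Whether a given provider translates it efficiently is not something this example verifies:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Task entity in the question.
public class TaskItem
{
    public string Status { get; set; }
    public string Category { get; set; }
}

public static class TaskFilters
{
    // A null argument means "do not filter on this field", so one query
    // shape covers all four combinations of status and category.
    public static List<TaskItem> Filter(IQueryable<TaskItem> tasks, string status, string category)
    {
        return tasks
            .Where(t => (status == null || t.Status == status)
                     && (category == null || t.Category == category))
            .ToList();
    }
}
```

Because the query shape never changes, only the parameter values do, which is the property a compiled query needs in order to be reused.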
I've been developing a webapp using Linq to NHibernate for the past few months, but haven't profiled the SQL it generates until now. Using NH Profiler, it now seems that the following chunk of code hits the DB more than 3,000 times when the Linq expression is executed.
var activeCaseList = from c in UserRepository.GetCasesByProjectManagerID(consultantId)
where c.CompletionDate == null
select new { c.PropertyID, c.Reference, c.Property.Address, DaysOld = DateTime.Now.Subtract(c.CreationDate).Days, JobValue = String.Format("£{0:0,0}", c.JobValue), c.CurrentStatus };
Where the Repository method looks like:
public IEnumerable<Case> GetCasesByProjectManagerID(int projectManagerId)
{
return from c in Session.Linq<Case>()
where c.ProjectManagerID == projectManagerId
select c;
}
It appears to run the initial Repository query first, then iterates through all of the results checking to see if the CompletionDate is null, but issuing a query to get c.Property.Address first.
So if the initial query returns 2,000 records, even if only five of them have no CompletionDate, it still fires off an SQL query to bring back the address details for the 2,000 records.
The way I had imagined this would work is that it would evaluate all of the WHERE and SELECT clauses and simply amalgamate them, so the initial query would be like:
SELECT ... WHERE ProjectManagerID = @p1 AND CompletionDate IS NULL
Which would yield 5 records, and then it could fire the further 5 queries to obtain the addresses. Am I expecting too much here, or am I simply doing something wrong?
Anthony
Change the declaration of GetCasesByProjectManagerID:
public IQueryable<Case> GetCasesByProjectManagerID(int projectManagerId)
You can't compose queries with IEnumerable<T> - they're just sequences. IQueryable<T> is specifically designed for composition like this.
Since I can't add a comment yet: Jon Skeet is right, you'll want to use IQueryable, which allows the LINQ provider to construct the SQL lazily as the query is composed. With IEnumerable, any further operators run in memory over the results of the SQL built so far.