Convert this SQL query to Linq (Not Exists + sub query) - linq

I would like this SQL to be converted to LINQ. (it shouldl select rows from input which do not exist in table production based on 3 columns. If a column in both tables contains NULL, it should be considered as having the same value)
SELECT i.* FROM INPUT AS i
WHERE NOT EXISTS
(SELECT p.Agent FROM Production AS p
WHERE ISNULL(i.CustID,'') <> ISNULL(p.CustID,'')
AND ISNULL(i.CustName,'') <> ISNULL(p.CustName,'')
AND ISNULL(i.household,'') <> ISNULL(p.Household,''))

First of all - this is not a good SQL query. Every column is wrapped in a non-sargable function which means that the engine won't be able to take advantage of any indexes on any of those columns (assuming you have any).
Let's start by rewriting this as a semi-decent SQL query:
SELECT i.*
FROM Input i
LEFT JOIN Production p
ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
AND (p.Household = i.Household OR
(p.Household IS NULL AND i.Household IS NULL))
WHERE p.CustID IS NULL
Now having said this, LEFT JOIN / IS NULL is not great for efficiency either, but we don't have much choice here because we're comparing on multiple columns. Based on your column names, I'm starting to wonder if the schema is properly normalized. A CustID should most likely be associated with one and only one CustName - the fact that you have to compare both of these seems a bit odd. And Household - I'm not sure what that is, but if it's a varchar(x)/nvarchar(x) column then I wonder if it might also have a 1:1 relationship with the customer.
If I'm speculating too much here then feel free to dismiss this paragraph; but just in case, I want to say that if this data isn't properly normalized, normalizing it would make it much easier and faster to query on:
SELECT *
FROM Input
WHERE CustID NOT IN (SELECT CustID FROM Production)
Anyway, going back to the first query, since that's what we have to work with for now. Unfortunately it's impossible to create a join on those specific conditions in Linq, so we need to rewrite the SQL query as something slightly worse (because we now have to read from Input twice):
SELECT *
FROM Input
WHERE <Primary Key> NOT IN
(
SELECT i.<Primary Key>
FROM Input i
INNER JOIN Production p
ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
AND (p.Household = i.Household OR
(p.Household IS NULL AND i.Household IS NULL))
)
Now we have something we can finally translate to Linq syntax. We still can't do the join explicitly, which would be best, but we go old-school, start from the cartesian join and toss the join conditions into the WHERE segment, and the server will still be able to sort it out:
var excluded =
from i in input
from p in production
where
((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
((p.CustName == i.CustName) ||
((p.CustName == null) && (i.CustName == null))) &&
((p.Household == i.Household) ||
((p.Household == null) && (i.Household == null)));
select i.PrimaryKey;
var results =
from i in input
where !excluded.Contains(i.PrimaryKey)
select i;
I'm assuming here that you have some sort of primary key on the table. If you don't, you've got other problems, but you can get around this particular problem using EXCEPT:
var excluded =
from i in input
from p in production
where
((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
((p.CustName == i.CustName) ||
((p.CustName == null) && (i.CustName == null))) &&
((p.Household == i.Household) ||
((p.Household == null) && (i.Household == null)));
select i;
var results = input.Except(excluded);

Related

Transform Sql to EF Core Linq query

I am trying to translate the following query from SQL to EF Core. I can easily just use a stored procedure (I already have the SQL), but am trying to learn how some of the linq queries work. Unfortunately this is not by any means an ideal database schema that I inherited and I don't have the time to convert it to something better.
DECLARE #userId INT = 3
SELECT *
FROM dbo.CardGamePairs
WHERE EXISTS (SELECT 1
FROM dbo.Users
WHERE Users.Id = CardGamePairs.player1Id
AND Users.userId = #userId)
UNION
SELECT *
FROM dbo.CardGamePairs
WHERE EXISTS (SELECT 1
FROM dbo.Users
WHERE Users.Id = TableB.player2Id
AND Users.userId = #userId)
So basically I have an id that can exist in one of two separate columns in table b and I don't know in which column it may be in, but I need all rows that have that ID in either column. The following is what I tried to make this work:
//Find data from table A where id matches (part of the subquery from above)
var userResults = _userRepository.GetAllAsQueryable(x => x.userId == userId).ToList();
//Get data from table b
var cardGamePairsResults = _cardGamePairsRepository.GetAllAsQueryable(x => userResults .Any(y => y.userId == x.player1Id || y.userId == x.player2Id));
When I run the code above I get this error message:
predicate: (y) => y.userId == x.player1Id || y.userId == x.player2Id))' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync().
Any ideas on how I can make this work? (I tried changing the column and table names to something that would actually make sense, hopefully I didn't miss any spots and make it more confusing.)
Because you are already have user id use it to query both columns.
var userResults = _userRepository
.GetAllAsQueryable(x => x.userId == userId)
.ToList();
var cardGamePairsResults = _cardGamePairsRepository
.GetAllAsQueryable(x => x.player1Id == userId || x.player2Id == userId));

why count(*) is slow even with index?

This is my query:
select count(*)
FROM TB_E2V_DOCUMENTOS_CICLO D
WHERE (D.TIPOCLIENTE = null or null is null)
AND (D.TIPODOCUMENTOCLIENTE = null or null is null)
AND (D.NUMDOCUMENTOCLIENTE = null or null is null)
AND (D.BA = null or null is null)
AND (D.FA = null or null is null)
AND (D.NOMBRECLIENTE = null or null is null)
AND (D.NUMTELEFONO = null or null is null)
AND (D.NUMSUSCRIPCION = null or null is null)
AND (D.TIPORECIBO in ('Recibo'))
AND (D.NUMRECIBO = null or null is null)
AND (TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd') BETWEEN TO_DATE('2019-5-1', 'yyyy-MM-dd') AND TO_DATE('2020-2-18', 'yyyy-MM-dd'))
AND (D.MONTORECIBO = null or null is null)
AND (D.NUMPAGINAS = 0 or 0 = 0)
AND (D.NOMBREARCHIVO = null or null is null)
AND (D.NEGOCIO = null or null is null)
AND (D.NOMBREMETADATACARGA = null or null is null)
AND (D.FECHACARGA = TO_DATE(null) or TO_DATE(null) is null);
This query returns
And when I do a Xplain For:
The cost is very high, but this query uses the index. The query lasts 10 seconds approximately.
How can I improve the performance of the query?
I'm using Oracle 12c
Notes: All of the " and ( = null or null is null)" predicates will always evaluate to true; Oracle does not define null so null does not equal null, so instead if you want to check for null then use "is null"
select * from dual where null = null; -- returns no rows
select * from dual where not (null <> null); -- returns no rows
select * from dual where null is null; -- returns 1 row
select * from dual where not(null is not null); -- returns 1 row
As far as indexing goes, you need an index that is selective (i.e. return much fewer rows) and is present in the where clause predicate. In this case it looks like a function-based index on TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd')
along with D.TIPORECIBO is in order. The INDEX SKIP SCAN is used in this case probably because D.TIPORECIBO is not the leading column; INDEX SKIP SCANs are slower then INDEX RANGE SCANs because it needs to read more index blocks.
There are a few factors involved here:
First, this query is using the second (or third) part of a composite index, resulting in the SKIP SCAN.
Take a look at all indexes on the table and see what kind of index is on TIPORECIBO.
It is likely that this isn't the leading column. You might improve the performance by creating an index with TIPORECIBO as leading column, but it is unlikely--this appears to be a "type" column that might have only a few values, and not a good candidate for an index.
The second issue is that Oracle uses the index to get a set of candidate rows, then goes to the data blocks themselves to get the rows for further filtering.
A select count(*) will perform much better if Oracle doesn't need to fetch the data blocks. This can be achieved by creating an index that contains all of the data needed for the filter.
In your case, an index on TIPORECIBO and FECHAEMISION would mean that Oracle could go to the index alone without needing to access the data blocks.
The third issue is that you are applying TO_DATE to the FECHAEMISION column. If this is a DATE datatype, then you don't need the conversion and it is causing you trouble. If you do need the conversion, an option would be a function-based index on TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd').
To tune this particular query, you can try a function-based composite index:
CREATE INDEX TB_E2V_DOCUMENTOS_CICLO_FX1 ON TB_E2V_DOCUMENTOS_CICLO(FECHAEMISION, TO_DATE(D.FECHAEMISION, 'yyyy/MM/dd'))
Finally, this query is clearly being generated from code:
lines like AND (D.BA = null or null is null) seem to be a way of excluding portions of the WHERE clause when the front-end passes a NULL. This would possibly be AND (D.BA = 'X' or 'X' is null) if a value were provided for that parameter.
As such, be careful when tuning for the current set of parameters, as any change in what generated this query will impact the effectiveness of your tuning.
If you have a way to influence how this query is generated, it would be nice to simply exclude those non-event filters when the values are not provided, though Oracle ought to be able to handle them as-is.

SQL to LINQ with JOIN and SubQuery

I have a query that I' struggling to convert to LINQ. I just can't get my head around the required nesting. Here's the query in SQL (just freehand typed):
SELECT V.* FROM V
INNER JOIN VE ON V.ID = VE.V_ID
WHERE VE.USER_ID != #USER_ID
AND V.MAX > (SELECT COUNT(ID) FROM VE
WHERE VE.V_ID = V.ID AND VE.STATUS = 'SELECTED')
The Closest I've come to is this:
var query = from vac in _database.Vacancies
join e in _database.VacancyEngagements
on vac.Id equals e.VacancyId into va
from v in va.DefaultIfEmpty()
where vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == v.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
...which correctly resolves the subquery from my SQL statement. But I want to further restrict the returned V rows to only those where the current user does not have a related VE row.
I've realised that the SQL in the question was misleading and whilst it led to technically correct answers, they weren't what I was after. That's my fault for not reviewing the SQL properly so I apologise to #Andy B and #Ivan Stoev for the misleading post. Here's the LINQ that solved the problem for me. As stated in the post I needed to show vacancy rows where no linked vacancyEngagement rows existed. The ! operator provides ability to specify this with a subquery.
var query = from vac in _database.Vacancies
where !_database.VacancyEngagements.Any(ve => (ve.VacancyId == vac.Id && ve.UserId == user.Id))
&& vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == vac.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
This should work:
var filterOutUser = <userId you want to filter out>;
var query = from vac in _database.Vacancies
join e in _database.VacancyEngagements
on vac.Id equals e.VacancyId
where (e.UserId != filterOutUser) && vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == vac.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
select vac;
I removed the join to VacancyEngagements but if you need columns from that table you can add it back in.

How to compare multiple columns under same row in a table?

Following columns of a table should not be equal in my where clause.
cd_delivery_address
cd_mail_delivery_address
cd_st_code
cd_mail_st_code
cd_zip
cd_mail_zip
Please find my code snippet to achieve this:
select * from table cd
where
(
(cd_mail_delivery_address <> cd_delivery_address or
(cd_mail_delivery_address is null and cd_delivery_address is not null) or
(cd_mail_delivery_address is not null and cd_delivery_address is null)
)
and (
cd.cd_city <> cd.cd_mail_city or
(cd.cd_city is null and cd_mail_city is not null) or
(cd_city is not null and cd_mail_city is null))
and (
cd.st_code <> cd.cd_mail_st_code or
(cd.st_code is null and cd_mail_st_code is not null) or
(st_code is not null and cd_mail_st_code is null)
)
and (
cd.cd_zip <> cd.cd_mail_zip or
(cd.cd_zip is null and cd_mail_zip is not null) or
(cd_zip is not null and cd_mail_zip is null)
)
)
All columns are varchar2 and i get correct output for this code. But is it a better way to compare multiple columns in pl sql? can i improve this code? Any suggestion would be helpful.
You could replace your null checks with NVL function something like this:
...
NVL(cd_mail_delivery_address,'_') <> NVL(cd_delivery_address,'_')
...
it's definitively more readable but I'm not sure about query efficency
I have done it for two columns using a join:
select a.cd_delivery_address,b.cd_mail_delivery_address
from cd a inner join cd b
where a.cd_delivery_address <> b.cd_mail_delivery_address and
a.cd_delivery_address = b.cd_delivery_address
Here null checking condition will be omitted and will reduce the number of conditions, but there is a performance impact since join is involved.

Simple linq select query not returning correct results

When I run the following query in SQL Server Management Studio I get 132 rows of data. This is correct.
select *
from cyc.contac con
where (con.note_turn is null) and
(con.cont_type <> '_') and
(con.cont_code <> 'D') and
(con.cont_type <> 'Q')
When I run the following Linq query from my code I get over 11000 records. This is not correct. I am wondering why my linq query is not constraining by the constraints in my where clause.
var query = from contac in context.CONTACs
orderby contac.ID
where (contac.note_turn == null) &&
(contac.cont_type != "_") &&
(contac.cont_code != "D") &&
(contac.cont_type != "Q")
select contac;

Resources