linq (right) join does not return correct number of objects - linq

I have a question about joins in LINQ. I am currently converting an Access application to .NET, where data is retrieved from two different databases on two different servers. In the old application the data could be retrieved with one query:
SELECT *, tbl2.Descr, tbl2.Form FROM tbl2 RIGHT JOIN tbl1 ON tbl2.proId2 = tbl1.proId
I found that one way to do this in .NET is to retrieve the two tables separately and then join them with LINQ. I have no experience with LINQ, so I may be completely wrong in my logic or code, because I can't seem to get it working.
First I tried a normal join (no right), but then only 18 rows are returned although the two tables contain almost 2000 rows. I also checked the data, and it should definitely result in more rows; there are not that many empty cells. So then I put together a right/left join, but this actually results in an error. When I debug it, everything is fine when the LINQ statement is executed, but when I step into the foreach an error is shown. The error is indicated in the LINQ statement and says table2 is empty. When I check table1 I also see only 22 data rows.
What am I doing wrong?
DataTable dtTarget= data1.Clone();
var dt2Columns = data2.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
where dtTarget.Columns.Contains(dc.ColumnName) == false
select dc;
dtTarget.Columns.AddRange(dt2FinalColumns.ToArray());
var results = from table1 in data1.AsEnumerable()
join table2 in data2.AsEnumerable()
on table1.Field<String>("proId") equals table2.Field<String>("proId2")
select table1.ItemArray.Concat(table2.ItemArray).ToArray();
foreach (object[] values in results)
dtTarget.Rows.Add(values);
Outer Join:
var results = from table1 in data1.AsEnumerable()
join table2 in data2.AsEnumerable() on table1.Field<String>("proId") equals table2.Field<String>("proId2") into t_join
from table2 in t_join.DefaultIfEmpty(null) select table1.ItemArray.Concat(table2.ItemArray).ToArray();

I notice you're using strings as the join keys. Perhaps the string comparison differs between the two environments (Access vs .NET). Access may use a case-insensitive compare, while .NET's default is case-sensitive.
To make .NET use a case-insensitive compare, here's the first query:
var results = data1.AsEnumerable()
.Join(
data2.AsEnumerable(),
row1 => row1.Field<String>("proId"),
row2 => row2.Field<String>("proId2"),
(row1, row2) => row1.ItemArray.Concat(row2.ItemArray).ToArray(),
StringComparer.InvariantCultureIgnoreCase); //and now caps are ignored.
and second query:
var results = data1.AsEnumerable()
.GroupJoin(
data2.AsEnumerable(),
row1 => row1.Field<String>("proId"),
row2 => row2.Field<String>("proId2"),
(row1, row2s) => new {Row1 = row1, Row2s = row2s},
StringComparer.InvariantCultureIgnoreCase)
.SelectMany(
x => x.Row2s.DefaultIfEmpty(null),
(x, row2) => row2 == null ? x.Row1.ItemArray : x.Row1.ItemArray.Concat(row2.ItemArray).ToArray()
);
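To see how much the comparer alone can change the row count, here is a minimal sketch that joins two in-memory key lists instead of DataTables. The class name JoinKeyDemo and the method CountMatches are made up for this illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only: counts how many pairs an
// inner join produces for two key lists under a given string comparer.
public static class JoinKeyDemo
{
    public static int CountMatches(
        IEnumerable<string> leftKeys,
        IEnumerable<string> rightKeys,
        IEqualityComparer<string> comparer)
    {
        return leftKeys
            .Join(rightKeys,
                  left => left,                       // key on the left side
                  right => right,                     // key on the right side
                  (left, right) => left + "=" + right,
                  comparer)                           // decides which keys count as equal
            .Count();
    }
}
```

Calling CountMatches with the keys "abc", "DEF", "ghi" on one side and "ABC", "def", "ghi" on the other finds one match under StringComparer.Ordinal and three under StringComparer.InvariantCultureIgnoreCase. If the real proId values differ only in casing, that would explain a join that returns far fewer rows than expected.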

Related

How to join two table from two different edmx using linq query

How can I join two tables from two different edmx files using a LINQ query?
Is there a way to query two different edmx files at the same time?
Thanks.
Update
As per your comment, EF wasn't able to parse a combined Expression tree across 2 different contexts.
If the total number of records in the tables is relatively small, or if you can reduce the number of records in the join to a small number of rows (say < 100 each), then you can materialize the data (e.g. .ToList() / .ToArray() / .AsEnumerable()) from both tables and use the Linq join as per below.
e.g. where yesterday is a DateTime selecting just a small set of data from both databases required for the join:
var reducedDataFromTable1 = context1.Table1
.Where(data => data.DateChanged > yesterday)
.ToList();
var reducedDataFromTable2 = context2.Table2
.Where(data => data.DateChanged > yesterday)
.ToList();
var joinedData = reducedDataFromTable1
.Join(reducedDataFromTable2,
t1 => t1.Id, // Join Key on table 1
t2 => t2.T1Id, // Join Key on table 2
(table1, table2) => ... // Projection
);
However, if the data required from both databases for the join is larger than could reasonably be expected to fit in memory, then you'll need to investigate alternatives, such as:
Can you do the cross database join in the database? If so, look at using a Sql projection such as a view to do the join, which you can then use in your edmx.
Otherwise, you are going to need to do the join by manually iterating the 2 enumerables, something like chunking - this isn't exactly trivial. Sorting the data in both tables by the same order will help.
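One way to picture the manual iteration is a merge join over two streams that are already sorted by the join key. The sketch below is only an illustration of that idea, not a tested replacement for the chunking approach. MergeJoinSketch and MergeJoin are hypothetical names, and it assumes the left key is unique (like a primary key) while the right key may repeat.

```csharp
using System;
using System.Collections.Generic;

public static class MergeJoinSketch
{
    // Inner-joins two sequences that are ALREADY sorted ascending by key.
    // Assumption: left keys are unique; right keys may repeat.
    public static IEnumerable<TResult> MergeJoin<TLeft, TRight, TKey, TResult>(
        IEnumerable<TLeft> sortedLeft,
        IEnumerable<TRight> sortedRight,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> project)
        where TKey : IComparable<TKey>
    {
        using (var left = sortedLeft.GetEnumerator())
        using (var right = sortedRight.GetEnumerator())
        {
            bool hasLeft = left.MoveNext();
            bool hasRight = right.MoveNext();
            while (hasLeft && hasRight)
            {
                int cmp = leftKey(left.Current).CompareTo(rightKey(right.Current));
                if (cmp < 0)
                {
                    hasLeft = left.MoveNext();    // left key is behind: advance left
                }
                else if (cmp > 0)
                {
                    hasRight = right.MoveNext();  // right key is behind: advance right
                }
                else
                {
                    yield return project(left.Current, right.Current);
                    hasRight = right.MoveNext();  // stay on left: the right key may repeat
                }
            }
        }
    }
}
```

Because only one row from each side is held at a time, memory use stays flat regardless of table size. Note that both queries must be ordered in a way that agrees with CompareTo; for string keys the database collation may sort differently from .NET, so numeric keys are the safer choice here.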
Original Answer
I believe you are looking for the LINQ Join extension method.
You can join any 2 IEnumerables as follows:
var joinedData = context1.Table1
.Join(context2.Table2,
t1 => t1.Id, // Join Key on table 1
t2 => t2.T1Id, // Join Key on table 2
(table1, table2) => ... // Projection
);
Where:
Join Key on table 1: e.g. the Primary Key of Table 1 or a common natural key
Join Key on table 2: e.g. a Foreign Key or a common natural key
Projection: you can project whatever you want from table1 and table2, e.g. into a new anonymous class, such as new {Name = table1.Name, Data = table2.SalesQuantity}

How this linq execute?

Data = _db.ALLOCATION_D.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10)
.ToList();
Let's say I have 100000 rows in the ALLOCATION_D table. I want to select the first 10 rows. Now I want to know how the above statement executes. I'm not sure, but I think it executes in the following way...
first it select the 100000 rows
then ordered by ALLO_ID
then Skip 10
finally select the 10 rows.
Is it right? I want to know more details.
This LINQ produces a SQL query via Entity Framework. The exact SQL then depends on your DBMS; for SQL Server 2008, this is the query it produces:
SELECT TOP (10) [Extent1].[ALLO_ID] AS [ALLO_ID]
FROM (
SELECT [Extent1].[ALLO_ID] AS [ALLO_ID]
, row_number() OVER (ORDER BY [Extent1].[ALLO_ID] ASC) AS [row_number]
FROM [dbo].[ALLOCATION_D] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[ALLO_ID] ASC
You can run this in your C# code to retrieve the query:
var linqQuery = _db.ALLOCATION_D
.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10);
var sqlQuery = ((System.Data.Objects.ObjectQuery)linqQuery).ToTraceString();
Data = linqQuery.ToList();
Second option, with LINQ to SQL:
var linqQuery = _db.ALLOCATION_D
.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10);
var sqlQuery = _db.GetCommand(linqQuery).CommandText;
Data = linqQuery.ToList();
References:
How do I view the SQL generated by the entity framework?
How to: Display Generated SQL
How to view LINQ Generated SQL statements?
Your statement reads as follows:
Select all rows (overwritten by skip/take)
Order by Allo_ID
Order by Allo_ID again
Skip first 10 rows
Take next 10 rows
If you want it to select the first ten rows, you simply do this:
Data = _db.ALLOCATION_D // You don't need to order twice
.OrderBy(a => a.ALLO_ID)
.Take(10)
.ToList();
Up to the ToList call, the calls only generate expressions. That means that the OrderBy, Skip and Take calls are bundled up as an expression that is then sent to Entity Framework to be executed in the database.
Entity Framework will make an SQL query from that expression, which returns the ten rows from the table; the ToList method then reads them and places them in a List<T>, where T is the type of the items in the ALLOCATION_D collection.
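The way the calls are bundled into one expression can be inspected without a database. The following sketch uses AsQueryable over an in-memory sequence as a stand-in for the Entity Framework set, which is an assumption made for illustration; ExpressionChainDemo and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionChainDemo
{
    // Builds the query only. No element is read until it is enumerated.
    public static IQueryable<int> BuildQuery(IEnumerable<int> source)
    {
        return source.AsQueryable()
            .OrderBy(x => x)
            .Skip(10)
            .Take(10);
    }

    // Walks the expression tree from the outermost call inwards and
    // returns the operator names in the order they were applied.
    public static List<string> OperatorChain(IQueryable query)
    {
        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0];   // the source this operator was applied to
        }
        names.Reverse();
        return names;
    }
}
```

For the query built above, OperatorChain returns OrderBy, Skip, Take: the three calls exist only as a tree describing the work. A provider such as Entity Framework translates that whole tree into one SQL statement when ToList enumerates it, rather than fetching all rows and processing them step by step.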

PL SQL - Join 2 tables and return max from right table

Trying to retrieve the MAX doc in the right table.
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
max(F431.PRDOC)
FROM PRODDTA.F43121 F431
LEFT OUTER JOIN PRODDTA.F4311 F43
ON
F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
and F43.PDLNTY = 'DC'
Group by
F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC,
F431.PRAREC/100
This query is still returning the two rows in the right table. Fairly new to SQL and struggling with the statement. Any help would be appreciated.
Without seeing your data it is difficult to tell where the problem might be, so I will offer a few suggestions that could help.
First, you are joining with a LEFT JOIN on the PRODDTA.F4311 but you have in the WHERE clause a filter for that table. You should move the F43.PDLNTY = 'DC' to the JOIN condition. This is causing the query to act like an INNER JOIN.
Second, you can try using a subquery to get the MAX(PRDOC) value. Then you can limit the columns that you are grouping on, which could eliminate the duplicates. The query would then be similar to the following:
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
F431.PRDOC
FROM PRODDTA.F43121 F431
INNER JOIN
(
-- subquery to get the max
-- then group by the distinct columns
SELECT PRKCOO, max(PRDOC) MaxPRDOC
FROM PRODDTA.F43121
WHERE PRDOCO = 401531
and PRMATC = 2
GROUP BY PRKCOO
) f2
-- join the subquery result back to the PRODDTA.F43121 table
on F431.PRDOC = f2.MaxPRDOC
AND F431.PRKCOO = f2.PRKCOO
LEFT OUTER JOIN PRODDTA.F4311 F43
ON F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
AND F43.PDLNTY = 'DC' -- move this filter to the join instead of the WHERE
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
If you provide your table structures and some sample data, it will be easier to determine the issue.

How do I write a LINQ query to combine multiple rows into one row?

I have one table, 'a', with id and timestamp. Another table, 'b', has N multiple rows referring to id, and each row has 'type', and "some other data".
I want a LINQ query to produce a single row with id, timestamp, and "some other data" x N. Like this:
1 | 4671 | 46.5 | 56.5
where 46.5 is from one row of 'b', and 56.5 is from another row; both with the same id.
I have a working query in SQLite, but I am new to LINQ. I don't know where to start - I don't think this is a JOIN at all.
SELECT
a.id as id,
a.seconds,
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=1), '') AS 'data_one',
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=2), '') AS 'data_two'
FROM a first
WHERE first.id=1
GROUP BY first.ID
You didn't mention whether you are using LINQ to SQL or LINQ to Entities. However, the following query should get you there:
(from x in a
join y in b on x.id equals y.id
select new{x.id, x.seconds, y.some_data, y.type}).GroupBy(x=>new{x.id,x.seconds}).
Select(x=>new{
id = x.Key.id,
seconds = x.Key.seconds,
data_one = x.Where(z=>z.type == 1).Select(g=>g.some_data).FirstOrDefault(),
data_two = x.Where(z=>z.type == 2).Select(g=>g.some_data).FirstOrDefault()
});
Obviously, you have to prefix your table names with datacontext or Objectcontext depending upon the underlying provider.
What you want to do is similar to pivoting, see Is it possible to Pivot data using LINQ?. The difference here is that you don't really need to aggregate (like a standard pivot), so you'll need to use Max or some similar method that can simulate selecting a single varchar field.
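A minimal in-memory sketch of that pivot-style approach follows, using Max to pick the single value per type. The BRow and Pivoted classes are hypothetical stand-ins for table 'b' and the result shape, and a real LINQ provider may translate Max over strings differently than LINQ to Objects does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PivotSketch
{
    // Hypothetical row shape standing in for table 'b' from the question.
    public class BRow
    {
        public int Id { get; set; }
        public int Type { get; set; }
        public string SomeData { get; set; }
    }

    // Hypothetical result shape: one row per id, one column per type.
    public class Pivoted
    {
        public int Id { get; set; }
        public string DataOne { get; set; }
        public string DataTwo { get; set; }
    }

    public static List<Pivoted> Pivot(IEnumerable<BRow> rows)
    {
        return rows
            .GroupBy(r => r.Id)
            .Select(g => new Pivoted
            {
                Id = g.Key,
                // Max over an empty string sequence yields null, so fall back
                // to "" like the COALESCE in the SQLite query.
                DataOne = g.Where(r => r.Type == 1).Select(r => r.SomeData).Max() ?? "",
                DataTwo = g.Where(r => r.Type == 2).Select(r => r.SomeData).Max() ?? "",
            })
            .ToList();
    }
}
```

With rows (1, type 1, "46.5") and (1, type 2, "56.5"), Pivot returns a single entry for id 1 carrying both values. Joining that result to table 'a' on id would add the timestamp column.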

Linq To Entity Framework selecting whole tables

I have the following Linq statement:
(from order in Orders.AsEnumerable()
join component in Components.AsEnumerable()
on order.ORDER_ID equals component.ORDER_ID
join detail in Detailss.AsEnumerable()
on component.RESULT_ID equals detail.RESULT_ID
where orderRestrict.ORDER_MNEMONIC == "MyOrderText"
select new
{
Mnemonic = detail.TEST_MNEMONIC,
OrderID = component.ORDER_ID,
SeqNumber = component.SEQ_NUM
}).ToList()
I expect this to put out the following query:
select *
from Orders ord (NoLock)
join Component comp (NoLock)
on ord .ORDER_ID = comp.ORDER_ID
join Details detail (NoLock)
on comp.RESULT_TEST_NUM = detail .RESULT_TEST_NUM
where res.ORDER_MNEMONIC = 'MyOrderText'
but instead I get 3 separate queries that select all rows from the tables. I am guessing that LINQ is then filtering the values, because I do get the correct values in the end.
The problem is that it takes WAY WAY too long because it is pulling down all the rows from all three tables.
Any ideas how I can fix that?
Remove the .AsEnumerable()s from the query as these are preventing the entire query being evaluated on the server.
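The reason is visible in the types: after AsEnumerable(), later operators bind to the in-memory Enumerable methods instead of the Queryable ones that build an expression for the server. Here is a small sketch of that switch, using AsQueryable over an array as a stand-in for an Entity Framework set (an assumption for illustration; AsEnumerableDemo is a hypothetical name).

```csharp
using System.Linq;

public static class AsEnumerableDemo
{
    // Where applied directly to an IQueryable stays a query expression
    // that a provider can translate.
    public static bool IsQueryableWithoutAsEnumerable(IQueryable<int> source)
    {
        object filtered = source.Where(x => x > 1);
        return filtered is IQueryable<int>;
    }

    // After AsEnumerable(), Where is the LINQ to Objects version: the
    // source is enumerated in full and filtered in memory.
    public static bool IsQueryableAfterAsEnumerable(IQueryable<int> source)
    {
        object filtered = source.AsEnumerable().Where(x => x > 1);
        return filtered is IQueryable<int>;
    }
}
```

The first method returns true and the second false. In the query from the question, each AsEnumerable() call has the same effect on its table, which is why three full-table selects are sent and the joins and filter run on the client.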
