OpenXML linq query - linq

I'm using OpenXML to open a spreadsheet and loop through its rows. I have a LINQ query that returns all cells within a row. The query was ripped straight from a demo on MSDN.
IEnumerable<String> textValues =
from cell in row.Descendants<Cell>()
where cell.CellValue != null
select (cell.DataType != null
&& cell.DataType.HasValue
&& cell.DataType == CellValues.SharedString
? sharedString.ChildElements[int.Parse(cell.CellValue.InnerText)].InnerText
: cell.CellValue.InnerText);
The LINQ query is great at returning all cells that have a value, but it doesn't return cells that don't have one, which makes it impossible to tell which cell is which. Let me explain a little more. Say we have three columns in our spreadsheet: Name, SSN, and Address. The query only returns the cells that have a value for a given row. So if a row contains "John", "", "173 Sycamore", the query returns only "John" and "173 Sycamore", and I have no way of knowing whether "173 Sycamore" is the SSN or the Address field.
To reiterate: I need all cells to be returned, not just the cells that contain a value.
I've tried to monkey with the LINQ query in every way I could think of, but had no luck whatsoever (i.e., removing the where clause isn't the trick). Any help would be appreciated. Thanks!

The OpenXML standard does not define placeholders for cells that don't have data. In other words, its underlying XML storage is sparse: a cell with no data is typically left out of the row entirely, so there is nothing for row.Descendants<Cell>() to return (which is also why removing the where clause doesn't help). You could work around this in one of two ways:
Create a list of all "available" or "possible" cells (probably by using a CROSS JOIN type of operation), then "left" join it to the row.Descendants<Cell>() collection on the cell reference to see whether each cell has a value.
Use a third-party library such as ClosedXML or EPPlus as a wrapper around the Excel data and query its interfaces, which are much more developer-friendly.
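The first option can be sketched in plain C#. This is a minimal sketch, not SDK code: SparseCell, DenseRow, ColumnIndex and Expand are hypothetical names, and in a real OpenXML loop you would feed in cell.CellReference.Value together with the text your existing query resolves. It assumes every stored cell carries its reference (the r attribute), which Excel-generated files normally do. The "cross join" reduces to the range of column indexes for one row, and GroupJoin performs the left join:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an OpenXML Cell: its reference (e.g. "B2") and its resolved text.
public record SparseCell(string Reference, string Value);

public static class DenseRow
{
    // Converts the letter part of a reference such as "AB12" to a 0-based column index
    // (A = 0, Z = 25, AA = 26).
    public static int ColumnIndex(string reference)
    {
        int index = 0;
        foreach (char c in reference.TakeWhile(char.IsLetter))
            index = index * 26 + (char.ToUpperInvariant(c) - 'A' + 1);
        return index - 1;
    }

    // "Left joins" every possible column 0..columnCount-1 to the cells that actually exist,
    // substituting "" where the row has no cell for that column.
    public static string[] Expand(IEnumerable<SparseCell> cells, int columnCount) =>
        Enumerable.Range(0, columnCount)
            .GroupJoin(cells,
                       column => column,
                       cell => ColumnIndex(cell.Reference),
                       (column, matches) => matches.Select(m => m.Value).FirstOrDefault() ?? "")
            .ToArray();

    public static void Main()
    {
        // Name / SSN / Address row where the SSN cell (B2) is absent from the XML.
        var row = new[] { new SparseCell("A2", "John"), new SparseCell("C2", "173 Sycamore") };
        Console.WriteLine(string.Join(" | ", Expand(row, 3)));
    }
}
```

Because the result is positional, index 1 is always the SSN slot, even when it is empty. The column count is an assumption you supply (for example, from the header row).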

With ClosedXML:
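using ClosedXML.Excel;

// XLWorkbook is IDisposable, so wrap it in a using block.
using (var wb = new XLWorkbook("YourWorkbook.xlsx"))
{
    var ws = wb.Worksheet("YourWorksheetName");
    var range = ws.RangeUsed();
    foreach (var row in range.Rows())
    {
        // Do something with the row...
        foreach (var cell in row.Cells())
        {
            // Cells() enumerates every cell in the used range, empty ones included,
            // so each value keeps its column position (e.g. read it with cell.GetString()).
        }
    }
}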

One way I recommend is to fill in all the missing cells with blank data so they will be returned by your LINQ statement. See this answer for how to do that.

Related

Can I use linq to join two result sets on an ordinal/ index #?

I'm trying to use linq to objects with html agility pack to join two result sets on their relative ordinal position. One set is a list of headers, the other is a set of tables, with each table corresponding to one of the header values. Each set has a count of five. I've read the post here which looks very similar, but can't get it to translate to my purposes.
Here is what I'm using to get the two html node collections:
HtmlNodeCollection ratingsChgsHdrs = htmlDoc.DocumentNode.SelectNodes("//div[@id='calendar-header']");
HtmlNodeCollection ratingsChgsTbls = htmlDoc.DocumentNode.SelectNodes("//table[@class='calendar-table']");
The collection ratingsChgsHdrs contains the headers for each of the tables in ratingsChgsTbls, within the InnerText property. The end result I'm looking for is one result set consisting of all of the rows from all five tables, with the header value added as a property to each row. I hope that is clear; any help would be great.
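This might work (the anonymous-type members need explicit names, because a method call such as ElementAt(i) cannot be used as a projection initializer):
ratingsChgsHdrs.Select((x, i) => new { Header = x, Table = ratingsChgsTbls.ElementAt(i) });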
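Since .NET 4.0, Enumerable.Zip pairs two sequences by position directly, and SelectMany can then flatten every table's rows while tagging each row with its header. A minimal sketch, with plain strings standing in for HtmlNode objects (PairByIndex and Flatten are hypothetical names); with real nodes you would pass h.InnerText as the header and something like t.SelectNodes(".//tr") as a table's rows, keeping in mind that Html Agility Pack returns null rather than an empty collection when nothing matches:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairByIndex
{
    // Pairs each header with the table at the same position, then flattens all rows,
    // attaching the header to every row of its table.
    public static List<(string Header, string Row)> Flatten(
        IEnumerable<string> headers, IEnumerable<IEnumerable<string>> tables) =>
        headers
            .Zip(tables, (header, rows) => new { header, rows })
            .SelectMany(pair => pair.rows, (pair, row) => (Header: pair.header, Row: row))
            .ToList();

    public static void Main()
    {
        // Hypothetical sample data: two headers and their tables' rows.
        var headers = new[] { "Table A", "Table B" };
        var tables = new[] { new[] { "row 1", "row 2" }, new[] { "row 3" } };
        foreach (var (header, row) in Flatten(headers, tables))
            Console.WriteLine($"{header}: {row}");
    }
}
```

Zip stops at the end of the shorter sequence, so if the header and table counts ever differ, the extra items are silently dropped rather than causing an error.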

Is it possible to detect if the selected item is the first in LINQ-to-SQL?

I wonder how I can build a query expression which understands the given item being selected is the first or not. Say I'm selecting 10 items from DB:
var query = db.Table.Take(10).Select(t => IsFirst ? t.Value1 : t.Value2);
There is an indexed variant of Select but that is not supported in LINQ-to-SQL. If it was supported my problems would be solved. Is there any other trick?
I could have used ROW_NUMBER() on T-SQL for instance, which LINQ-to-SQL uses but does not give access to.
I know I can Concat two queries and use the first expression in the first and so forth but I don't want to manipulate the rest of the query, just the select statement itself because the query is built at multiple places and this is where I want to behave differently on first row. I'll consider other options if that is not possible.
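You can use the indexed overload, but you need to use the LINQ to Objects version. AsEnumerable() leaves Take(10) to the database and runs the indexed Select in memory over those 10 rows:
var query =
    db.Table.Take(10).AsEnumerable()
        .Select((t, index) => index == 0 ? t.Value1 : t.Value2);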
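The same pipeline can be tried against an in-memory IQueryable to see where the switch happens. A minimal sketch, where TableRow and FirstRow are hypothetical stand-ins for the mapped entity and the query site; with LINQ to SQL, everything before AsEnumerable() becomes SQL and everything after it runs on the fetched rows:

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity with Value1 and Value2 columns.
public record TableRow(string Value1, string Value2);

public static class FirstRow
{
    // Value1 for the first row, Value2 for the rest, via the (element, index) Select overload.
    public static string[] Project(IQueryable<TableRow> table) =>
        table.Take(10)
             .AsEnumerable()   // from here on, LINQ to Objects evaluates the query
             .Select((t, index) => index == 0 ? t.Value1 : t.Value2)
             .ToArray();

    public static void Main()
    {
        var table = new[] { new TableRow("a1", "a2"), new TableRow("b1", "b2"), new TableRow("c1", "c2") }
            .AsQueryable();
        Console.WriteLine(string.Join(", ", Project(table)));
    }
}
```

Against a real database, add an OrderBy before Take(10) so that "first" refers to a well-defined row.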
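If Table has a primary key, you could do this:
var result =
    from t in db.Table.Take(10)
    let first = db.Table.Take(10).Select(ta => ta.PrimaryKey).First()
    select new
    {
        Value = (t.PrimaryKey == first ? t.Value1 : t.Value2)
    };
Give both Take(10) calls the same OrderBy; without one, SQL Server does not guarantee that the two subqueries see the rows in the same order.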

How do I improve the performance of this simple LINQ?

I have two tables, one parent "Point" and one child "PointValue", connected by a single foreign key "PointID", making a one-to-many relation in SQL Server 2005.
I have a LINQ query:
var points = from p in ContextDB.Points
//join v in ContextDB.PointValues on p.PointID equals v.PointID
where p.InstanceID == instanceId
orderby p.PointInTime descending
select new
{
Point = p,
Values = p.PointValues.Take(16).ToList()
};
As you can see from the commented out join and the "Values" assignment, the "Point" table has a relation to "PointValue" (called "Points" and "PointValues" by LINQ).
When iterating through the "var points" IQueryable (say, when binding it to a GridView, etc.) the initial query is very fast, however iterating through the "Values" property is very slow. SQL Profiler shows me that for each value in the "points" IQueryable another query is executed.
How do I get this to be one query?
Interestingly, the initial query becomes very slow when the join is uncommented.
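I think you want to use the DataLoadOptions.LoadWith method, described here:
http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.loadwith.aspx
In your case you would do something like the following when creating your DataContext. Call LoadWith before assigning the options (the DataContext freezes a DataLoadOptions instance once it is assigned), and assign them before running any query:
// DataLoadOptions lives in System.Data.Linq
DataLoadOptions options = new DataLoadOptions();
options.LoadWith((Point p) => p.PointValues);
ContextDB.LoadOptions = options;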
You should make sure that the PointValues table has an index on the PointID column.
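An alternative that doesn't depend on DataLoadOptions is to fetch the child rows in one batch and attach them in memory, which costs two round trips instead of one per point. For example, the points could come from ContextDB.Points.Where(p => p.InstanceID == instanceId).ToList() and the values from a single ContextDB.PointValues query filtered on the parent's InstanceID (assuming the generated association on PointValue is named Point). The in-memory half is sketched below, with PointRow, PointValueRow and PointLoader as hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Point and PointValue entities.
public record PointRow(int PointID, DateTime PointInTime);
public record PointValueRow(int PointID, double Value);

public static class PointLoader
{
    // Attaches at most perPoint values to each point from one pre-fetched batch of child rows,
    // instead of issuing a separate query per point.
    public static List<(PointRow Point, List<PointValueRow> Values)> Attach(
        IEnumerable<PointRow> points, IEnumerable<PointValueRow> values, int perPoint)
    {
        // Group all child rows by foreign key in a single pass.
        ILookup<int, PointValueRow> byPoint = values.ToLookup(v => v.PointID);

        return points
            .OrderByDescending(p => p.PointInTime)
            // A lookup returns an empty sequence for a key with no children.
            .Select(p => (Point: p, Values: byPoint[p.PointID].Take(perPoint).ToList()))
            .ToList();
    }

    public static void Main()
    {
        var points = new[] { new PointRow(1, new DateTime(2009, 1, 1)), new PointRow(2, new DateTime(2009, 1, 2)) };
        var values = new[] { new PointValueRow(1, 1.5), new PointValueRow(1, 2.5), new PointValueRow(2, 9.0) };
        foreach (var (point, vals) in Attach(points, values, 16))
            Console.WriteLine($"Point {point.PointID}: {vals.Count} value(s)");
    }
}
```

As in the original query, Take(perPoint) keeps whichever values come first; add an ordering if the 16 values must be specific ones.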
See also this SO question: Does Foreign Key improve query performance?

LINQ to SQL many to many int ID array criteria query

Ok this should be really simple, but I am doing my head in here and have read all the articles on this and tried a variety of things, but no luck.
I have 3 tables in a classic many-to-many setup.
ITEMS
ItemID
Description
ITEMFEATURES
ItemID
FeatureID
FEATURES
FeatureID
Description
Now I have a search interface where you can select any number of Features (checkboxes).
I get them all nicely as an int[] called SearchFeatures.
I simply want to find the Items which have the Features that are contained in the SearchFeatures.
E.g. something like:
return db.Items.Where(x => SearchFeatures.Contains(x.ItemFeatures.AllFeatures().FeatureID))
Inside my Items partial class I have added a custom method Features() which simply returns all Features for that Item, but I still can't seem to integrate that in any usable way into the main LINQ query.
Grr, it's gotta be simple, such a 1 second task in SQL. Many thanks.
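The following query will return the items that have any of the features in searchFeatures:
(from itemFeature in db.ItemFeatures
 where searchFeatures.Contains(itemFeature.FeatureID)
 select itemFeature.Item).Distinct();
The trick here is to start with the ItemFeatures table. Distinct() removes the duplicates you would otherwise get for an item that matches several of the selected features.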
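The "any feature" query can be checked in memory. A minimal sketch, with the link table reduced to hypothetical FeatureLink records (ItemID, FeatureID) and item IDs returned instead of Item entities:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory version of a row in the ITEMFEATURES link table.
public record FeatureLink(int ItemID, int FeatureID);

public static class AnyFeature
{
    // IDs of items linked to at least one of the searched features.
    public static int[] ItemsWithAny(FeatureLink[] links, int[] searchFeatures) =>
        (from link in links
         where searchFeatures.Contains(link.FeatureID)
         select link.ItemID)
        .Distinct()          // an item matching several features would otherwise repeat
        .OrderBy(id => id)   // stable order for display
        .ToArray();

    public static void Main()
    {
        var links = new[] { new FeatureLink(1, 10), new FeatureLink(1, 20), new FeatureLink(2, 20), new FeatureLink(3, 30) };
        Console.WriteLine(string.Join(", ", ItemsWithAny(links, new[] { 10, 20 })));
    }
}
```

Item 1 matches both 10 and 20, so without Distinct() it would appear twice.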
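It is possible to search for items that have ALL of the features, as you asked in the comments. The trick here is to build up the query dynamically, adding one filter on Items per selected feature. (Repeatedly filtering the ItemFeatures rows themselves would not work: each row has exactly one FeatureID, so with two or more features no row would survive.)
IQueryable<Item> items = db.Items;
foreach (var temp in searchFeatures)
{
    // Copy the loop variable so each lambda captures its own value
    // (needed before C# 5, where foreach reused one variable for all iterations).
    var searchFeature = temp;
    // Wrap the query with another filter: the item must also have this feature.
    items =
        from item in items
        where item.ItemFeatures.Any(itemFeature => itemFeature.FeatureID == searchFeature)
        select item;
}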
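An in-memory sketch of the ALL case, applying the extra filter per feature to the items rather than to individual link rows (a single link row carries only one FeatureID, so it can never match two different features). ItemFeatureRow and AllFeatures are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory version of a row in the ITEMFEATURES link table.
public record ItemFeatureRow(int ItemID, int FeatureID);

public static class AllFeatures
{
    // Builds the query up dynamically: one extra Where per selected feature,
    // each requiring the item to be linked to that feature.
    public static int[] ItemsWithAll(IEnumerable<int> itemIds, ItemFeatureRow[] links, int[] searchFeatures)
    {
        IEnumerable<int> items = itemIds;
        foreach (var temp in searchFeatures)
        {
            var searchFeature = temp;   // per-iteration copy for the closure (required before C# 5)
            items = items.Where(id => links.Any(l => l.ItemID == id && l.FeatureID == searchFeature));
        }
        return items.ToArray();         // deferred filters all run here
    }

    public static void Main()
    {
        var links = new[] { new ItemFeatureRow(1, 10), new ItemFeatureRow(1, 20), new ItemFeatureRow(2, 20) };
        Console.WriteLine(string.Join(", ", ItemsWithAll(new[] { 1, 2 }, links, new[] { 10, 20 })));
    }
}
```

An equivalent single expression is items.Where(id => searchFeatures.All(f => links.Any(l => l.ItemID == id && l.FeatureID == f))); the loop form is handy when the filters are assembled in several places.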

LINQ to IEnumerable<MyObj>

I have a class MyObj and a collection IEnumerable<MyObj>.
Some of the columns are wholly empty (i.e. == NULL) across all rows and therefore I want to create an IEnumerable<> of the members of MyObj which hold a non-null value.
If I could predict the members of MyObj which would be of interest I'd do something like:
var part =
from entry in iList
select new {entry.a, entry.c, entry.s};
...but I don't know which members of MyObj I'm interested in at design time - I only know that at runtime.
How can I construct my list??
Thanks,
Tamim Sadikali.
Your question does not make sense.
You're trying to create a type whose members are only known at runtime.
What would you do with the results?
You would not be able to access any properties of the result objects because they might not exist.
If you want to display the data in a grid, and you don't want to display columns which are entirely null, then you should bind the original collection to the grid, then hide some of the columns in the grid.
Wait for the release of VS2010: C# 4.0 with its 'dynamic' type should solve your problem (or maybe help you shoot yourself in the foot).
If you are doing this for UI, it is better to hide the columns that contain all nulls. For DataGridView in WinForms it may look like this:
foreach (DataGridViewColumn column in dataGridView1.Columns)
{
    // Hide the column when every row holds null in it.
    if (dataGridView1.Rows.Cast<DataGridViewRow>().All(r => r.Cells[column.Name].Value == null))
        column.Visible = false;
}
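The same all-null check can also run against the source collection itself, independent of any grid, using reflection to find the members that hold a value somewhere. A minimal sketch, assuming MyObj exposes its data as public instance properties; the a, c and s members below are hypothetical, borrowed from the question's snippet, and NonNullColumns is an illustrative name. The resulting names could then drive which grid columns to show:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical MyObj with nullable members named after the question's snippet.
public class MyObj
{
    public string a { get; set; }
    public int? c { get; set; }
    public string s { get; set; }
}

public static class NonNullColumns
{
    // Names of the public properties of T that are non-null in at least one element.
    public static string[] Find<T>(IEnumerable<T> items)
    {
        List<T> rows = items.ToList();   // enumerate the source only once
        return typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetIndexParameters().Length == 0)   // skip indexers
            .Where(p => rows.Any(row => p.GetValue(row) != null))
            .Select(p => p.Name)
            .ToArray();
    }

    public static void Main()
    {
        var list = new[] { new MyObj { a = "x" }, new MyObj { s = "y" } };   // c is null everywhere
        Console.WriteLine(string.Join(", ", Find(list)));
    }
}
```

Reflection does not guarantee property order, so sort the names if the column order matters.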
