Group lines with Numeric and AlphaNumeric (LINQ) - linq

I have a dataset with a column that holds both numeric and alphanumeric order numbers. The data is already grouped on several columns, which merges matching rows together. I now want to group lines on whether the order number is numeric or alphanumeric as well, so I need to evaluate each string to determine which of the two it is. The following statement groups the data:
var groupLinesETAFile = (from g in strLinesETAFile
                         group g by new
                         {
                             g.PONo,
                             g.POLineNo,
                             g.POSubLineNo,
                             g.AllocatedPartNo
                         } into grouped
                         select new
                         {
                             PONo = grouped.Key.PONo,
                             POLineNo = grouped.Key.POLineNo,
                             POSubLineNo = grouped.Key.POSubLineNo,
                             AllocatedPartNo = grouped.Key.AllocatedPartNo,
                             ETAQty = grouped.Sum(x => decimal.Parse(x.ETAQty))
                         }).ToList();
Now I want to add a new column, "SupplierOrderNo", to this statement. It holds numeric and alphanumeric values, and I want to group on those values as well. How can I achieve this with LINQ?
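One possible approach, sketched here with LINQ to Objects, is to classify each SupplierOrderNo as numeric when every character is a digit and to add that flag to the grouping key. The question leaves open whether rows should be grouped per order number or only per category (numeric vs. alphanumeric); this sketch assumes the latter. The `EtaLine` record, the `OrderGrouping` class, and the `IsNumericOrderNo` name are hypothetical stand-ins, and the other key columns are omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the elements of strLinesETAFile.
public record EtaLine(string PONo, string SupplierOrderNo, string ETAQty);

public static class OrderGrouping
{
    // Treats a value as numeric only when it is non-empty and all digits.
    public static bool IsNumeric(string value) =>
        !string.IsNullOrEmpty(value) && value.All(char.IsDigit);

    // Groups by PONo plus the numeric/alphanumeric flag and sums ETAQty.
    public static List<(string PONo, bool IsNumericOrderNo, decimal ETAQty)>
        GroupLines(IEnumerable<EtaLine> lines) =>
        (from g in lines
         group g by new
         {
             g.PONo,
             IsNumericOrderNo = IsNumeric(g.SupplierOrderNo)
         } into grouped
         select (grouped.Key.PONo,
                 grouped.Key.IsNumericOrderNo,
                 grouped.Sum(x => decimal.Parse(x.ETAQty)))).ToList();
}
```

To group per order number instead, add `g.SupplierOrderNo` to the anonymous key alongside the flag.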

Related

Determine column on which invoke custom function

I am trying to invoke a function on an added column that concatenates two columns. The catch is that I can't use the column-name shorthand, because the column names are determined dynamically from string parameters.
As a result, I get each column as a whole List repeated on every row, rather than the concatenated value for that specific row as intended.
(func as text) =>
let
    Source = Excel.CurrentWorkbook(){[Name="DataTBL"]}[Content],
    // This is the string extraction process for the parameter
    funcTrig = Text.Start(func, 1),
    columnA = "" & Text.BetweenDelimiters(func,"_","_") & "",
    columnB = "" & Text.AfterDelimiter(func,"_",1) & "",
    // Converting the string to column data
    Convert2ColA = Table.Column(Source,columnA),
    Convert2ColB = Table.Column(Source,columnB),
    // Function to concatenate the column A value at a specific row with the column B value at the same row
    concat = StraightForward(Convert2ColA, Convert2ColB)
in
    concat
I have marked the process and the desired result with comments. The attached picture shows the result of "Convert2ColA"; the desired result is 1999 in row one, and so on.

Filter inner bag in Pig

The data looks like this:
22678, {(112),(110),(2)}
656565, {(110), (109)}
6676, {(2),(112)}
This is the data structure:
(id:chararray, event_list:{innertuple:(innerfield:chararray)})
I want to filter those rows where event_list contains 2. I thought initially to flatten the data and then filter those rows that have 2. Somehow flatten doesn't work on this dataset.
Can anyone please help?
There might be a simpler way of doing this, such as a bag lookup. Otherwise, one way of achieving this with basic Pig is:
data = load 'data.txt' AS (id:chararray, event_list:bag{});
-- flatten the bag so that each element becomes a separate row.
flattened = foreach data generate id, flatten(event_list);
-- keep only those rows where the value is 2.
filtered = filter flattened by (int) $1 == 2;
-- keep only distinct ids.
dist = distinct (foreach filtered generate $0 as (id:chararray));
-- join the distinct ids back to the original relation.
jnd = join data by id, dist by id;
-- remove the extra fields, keeping the original ones.
result = foreach jnd generate data::id, data::event_list;
dump result;
(22678,{(112),(110),(2)})
(6676,{(2),(112)})
You can filter the bag inside a nested FOREACH and project a boolean that says whether 2 is present in the bag. Then filter the rows on that boolean:
raw = LOAD 'data.txt' AS (id:chararray, event_list:bag{t:(val_0:chararray)});
raw_filt = FOREACH raw {
    bag_filter = FILTER event_list BY (val_0 matches '2');
    GENERATE
        id,
        event_list,
        (IsEmpty(bag_filter.$0) ? false : true) AS is_2_present:boolean;
};
matched = FILTER raw_filt BY is_2_present;

Linq to dataset on dynamic columns and dynamic group by fields

My function takes four parameters: a dataset, an array of expressions (aggregate functions), an array of column names to apply the expressions to, and an array of columns to group by.
My problem is how to handle dynamic columns or fields for the expressions and the group by, since their number varies with the array contents.
I have written code for a static query, but I need a generic way.
This is my code:
public void ExpressionManipulation(DataSet dsExprEvaluate, string[] strExpressions, string[] colName, string[] groupbyFields)
{
    int groupByLength = groupbyFields.Length;
    // AsEnumerable() is defined on DataTable, so query the first table of the DataSet.
    var groupByQueryEvaluate = from table in dsExprEvaluate.Tables[0].AsEnumerable()
                               group table by new { column1 = table["DataSourceType"], column2 = table["Polarity"] }
                               into groupedTable
                               select new
                               {
                                   x = groupedTable.Key, // Each Key contains column1 and column2
                                   y = groupedTable.Count(),
                                   //z = groupedTable.Max(column1),
                                   z = groupedTable.Sum(table => Convert.ToInt32(table["Polarity"]))
                               };
}
As shown above, the group by can have any number of fields (for now I have taken only two, DataSourceType and Polarity), and likewise there can be any number of expressions (sum, count, etc.), passed in as an array parameter.
Please help me through this, it is driving me crazy.
Thanks in advance.
I figured it out myself. The solution I ended up with is:
var objGroupSumCountkey = dt.AsEnumerable()
    .AsQueryable()
    .GroupBy("new ( it[\"DataSourceType\"] as GroupByColumnName1,it[\"Polarity\"] as GroupByColumnName2)", "it")
    .Select("new ( Sum(Convert.ToDouble(it[\"Polarity\"].ToString())) as SumValue,Count() as TotalCount,it.key)");
In the above query, all the parameters to GroupBy and Select are supplied as strings. These string-based overloads come from the Dynamic LINQ library, not from standard LINQ.
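If pulling in a string-based query library is not an option, a variable number of group-by columns can also be handled with standard LINQ by building a composite key at run time. This is a different technique from the one in the answer above, shown only as a rough sketch: the `DynamicGrouping` class and `GroupAndSum` method are hypothetical names, only one aggregate column is supported, and the key values are assumed not to contain the `|` separator.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DynamicGrouping
{
    // Groups rows by an arbitrary list of column names and returns, per group,
    // the joined key values, the row count, and the sum of one numeric column.
    public static List<(string Key, int Count, double Sum)> GroupAndSum(
        DataTable table, string[] groupByFields, string sumColumn)
    {
        return table.AsEnumerable()
            // Assumption: no key value contains '|', so joined keys stay unique.
            .GroupBy(row => string.Join("|",
                groupByFields.Select(f => Convert.ToString(row[f]))))
            .Select(g => (g.Key,
                          g.Count(),
                          g.Sum(r => Convert.ToDouble(r[sumColumn]))))
            .ToList();
    }
}
```

Calling `GroupAndSum(dt, new[] { "DataSourceType", "Polarity" }, "Polarity")` would correspond to the static query in the question, with the two key columns joined into one string.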

LINQ return records where string[] values match Comma Delimited String Field

I am trying to select some records using LINQ to Entities (EF4 Code First).
I have a table called Monitoring with a field called AnimalType which has values such as
"Lion,Tiger,Goat"
"Snake,Lion,Horse"
"Rattlesnake"
"Mountain Lion"
I want to pass in some values in a string array (animalValues) and have the rows returned from the Monitorings table where one or more values in the field AnimalType match the one or more values from the animalValues. The following code ALMOST works as I wanted but I've discovered a major flaw with the approach I've taken.
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    var result = from m in db.Monitorings
                 where animalValues.Any(c => m.AnimalType.Contains(c))
                 select m;
    return result;
}
To explain the problem, if I pass in animalValues = { "Lion", "Tiger" } I find that three rows are selected due to the fact that the 4th record "Mountain Lion" contains the word "Lion" which it regards as a match.
This isn't what I wanted to happen. I need "Lion" to only match "Lion" and not "Mountain Lion".
Another example is if I pass in "Snake" I get rows which include "Rattlesnake". I'm hoping somebody has a better bit of LINQ code that will allow for matches that match the exact comma delimited value and not just a part of it as in "Snake" matching "Rattlesnake".
This is a kind of hack that will do the work:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    var values = animalValues.Select(x => "," + x + ",");
    var result = from m in db.Monitorings
                 where values.Any(c => ("," + m.AnimalType + ",").Contains(c))
                 select m;
    return result;
}
This way, the values being searched become
",Lion,Tiger,Goat,"
",Snake,Lion,Horse,"
",Rattlesnake,"
",Mountain Lion,"
A check for ",Lion," then no longer matches "Mountain Lion".
It's dirty, I know.
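The effect of wrapping both sides in commas can be checked in memory with plain LINQ to Objects. This is only a sketch of the matching logic, not of how Entity Framework translates the query; the `DelimitedMatch` class and `WholeEntryMatches` method are hypothetical names, and the field values are assumed to have no spaces around the commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimitedMatch
{
    // Returns the rows whose comma-delimited field contains at least one
    // of the wanted values as a whole entry, not as part of a longer entry.
    public static List<string> WholeEntryMatches(
        IEnumerable<string> rows, string[] wanted)
    {
        // Wrap each wanted value in commas: "Lion" becomes ",Lion,".
        var wrapped = wanted.Select(x => "," + x + ",").ToList();
        return rows
            // Wrap the field the same way so first and last entries match too.
            .Where(r => wrapped.Any(w => ("," + r + ",").Contains(w)))
            .ToList();
    }
}
```

With the four sample rows from the question and `{ "Lion", "Tiger" }` as input, only the first two rows are returned; "Mountain Lion" is skipped because the character before "Lion" is a space rather than a comma.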
Because the data in your field is comma delimited you really need to break those entries up individually. Since SQL doesn't really support a way to split strings, the option that I've come up with is to execute two queries.
The first query uses the code you started with to at least get you in the ballpark and minimize the amount of data you're retrieving. It converts it to a List<> to actually execute the query and bring the results into memory which will allow access to more extension methods like Split().
The second query uses the subset of data in memory and joins it with your database table to then pull out the exact matches:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
    // execute a query that is greedy in its matches, but at least
    // it's still only a subset of data. The ToList()
    // brings the data into memory, so to speak
    var subsetData = (from m in db.Monitorings
                      where animalValues.Any(c => m.AnimalType.Contains(c))
                      select m).ToList();
    // given that subset of data in the List<>, join it against the DB again
    // and get the exact matches this time
    var result = from data in subsetData
                 join m in db.Monitorings on data.ID equals m.ID
                 where data.AnimalType.Split(',').Intersect(animalValues).Any()
                 select m;
    // the query above runs in memory, so convert it to match the return type
    return result.AsQueryable();
}

Data comparing in dataset

I had to write a method that does the following:
There is a DataSet, let's say CarDataSet, with one table Car that contains the primary key Id and one more column, ColorId. There is also a string of Ids separated by commas, for example "5,6,7,8" (of arbitrary length). The task is to check whether the ColorIds of all the cars with the given Ids are identical.
For example:
String ids = "5,6,7,8"
If the ColorIds of the cars with Ids 5,6,7,8 are, for example, 3,3,3,3, then return true.
In other words, check whether all cars with the given Ids are the same color. I don't have my code anymore, but I did this using 3 foreach loops and 3 LINQ expressions. Is there any simpler way to do this?
If all cars must have the same color, then all of them must have the same color as the first one:
// first find the cars with the given ids
var selectedCars = Cars.Where(x => ids.Contains(x.ID.ToString()));
// select one of them as the comparer:
var firstCar = selectedCars.FirstOrDefault();
if (firstCar == null)
    return true;
// check that all of them have the same color as the first one:
return selectedCars.All(x => x.ColorID == firstCar.ColorID);
Edit: Or, if you have no problem with an exception being thrown when there is no car with the given ids, you can use two queries in lambda syntax:
var selectedCars = Cars.Where(x => ids.Contains(x.ID.ToString()));
return selectedCars.All(x => x.ColorID == selectedCars.First().ColorID);
You could also do this by performing a Distinct and asserting that the count is 1:
var count = Cars.Where(x => ids.Contains(x.ID.ToString()))
    .Select(x => x.ColorID)
    .Distinct().Count();
return count == 1;
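One caveat with the snippets above: if `ids` is still the raw string "5,6,7,8", then `ids.Contains(x.ID.ToString())` is a substring test, so with ids "15,16" a car with Id 5 would also be selected. A minimal sketch that parses the string into a set first is shown here. The `Car` record and `CarColors` class are hypothetical stand-ins for the DataSet rows, and the method returns true when no car matches, mirroring the `firstCar == null` branch of the first answer rather than the `count == 1` check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for a row of the Car table.
public record Car(int ID, int ColorID);

public static class CarColors
{
    public static bool AllSameColor(IEnumerable<Car> cars, string ids)
    {
        // Parse "5,6,7,8" into a set of ints so matching is exact.
        var wanted = ids.Split(',', StringSplitOptions.RemoveEmptyEntries)
                        .Select(s => int.Parse(s.Trim()))
                        .ToHashSet();
        // At most one distinct color among the selected cars.
        return cars.Where(c => wanted.Contains(c.ID))
                   .Select(c => c.ColorID)
                   .Distinct()
                   .Count() <= 1;
    }
}
```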
