I am trying to write a LINQ query, but unfortunately I have no idea how to solve my problem.
I would like to get the latest entry for each customer and, from this result, only 5 entries sorted by date.
ID - Date - ID_Customer
1 - 01.01.2014 - 1
2 - 02.01.2014 - 2
3 - 02.01.2014 - 1
4 - 03.01.2014 - 1 --> this value
5 - 04.01.2014 - 3
6 - 05.01.2014 - 3 --> this value
7 - 05.01.2014 - 4
8 - 06.01.2014 - 4 --> this value
9 - 08.01.2014 - 5 --> this value
10 - 09.01.2014 - 6 --> this value
I tried it with this query:
var query = from g in context.Geraete
where g.Online && g.AltGeraet == false
select g;
query.GroupBy(g => g.ID_Anbieter).Select(g => g.Last());
but it doesn't work.

You should assign the result of selecting the last item from each group back to the query variable:
query = query.GroupBy(g => g.ID_Anbieter).Select(g => g.Last());
var result = query.Take(5);
Keep in mind that the Last() operator is not supported by LINQ to Entities. I also think you should add ordering, both when selecting the latest item from each group and when selecting the 5 latest items:
var query = from g in context.Geraete
where g.Online && !g.AltGeraet
group g by g.ID_Anbieter into grp
select grp.OrderByDescending(g => g.Date).First();
var result = query.OrderByDescending(x => x.Date).Take(5);
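To make the two steps of that query easier to follow (latest entry per group, then the newest 5 overall), here is a minimal in-memory sketch in TypeScript. It is not LINQ and does not touch the database; the `Entry` shape, the `latestPerCustomer` helper and the ISO date strings are assumptions made for illustration, with the data taken from the table in the question.

```typescript
// Hypothetical in-memory model of the question's table; not the Geraete entity.
interface Entry {
  id: number;
  date: string; // ISO yyyy-mm-dd, so string comparison matches date order
  customerId: number;
}

function latestPerCustomer(entries: Entry[], take: number): Entry[] {
  // Step 1: keep only the latest entry for each customer.
  const latest = new Map<number, Entry>();
  for (const entry of entries) {
    const current = latest.get(entry.customerId);
    if (current === undefined || entry.date > current.date) {
      latest.set(entry.customerId, entry);
    }
  }
  // Step 2: order those entries by date, newest first, and keep the first `take`.
  return Array.from(latest.values())
    .sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0))
    .slice(0, take);
}

const sampleEntries: Entry[] = [
  { id: 1, date: '2014-01-01', customerId: 1 },
  { id: 2, date: '2014-01-02', customerId: 2 },
  { id: 3, date: '2014-01-02', customerId: 1 },
  { id: 4, date: '2014-01-03', customerId: 1 },
  { id: 5, date: '2014-01-04', customerId: 3 },
  { id: 6, date: '2014-01-05', customerId: 3 },
  { id: 7, date: '2014-01-05', customerId: 4 },
  { id: 8, date: '2014-01-06', customerId: 4 },
  { id: 9, date: '2014-01-08', customerId: 5 },
  { id: 10, date: '2014-01-09', customerId: 6 },
];

const topFive = latestPerCustomer(sampleEntries, 5);
console.log(topFive.map(e => e.id).join(', ')); // 10, 9, 8, 6, 4
```

The ids printed are the rows marked "this value" in the question, newest first.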


Multi-level counter iteration

I'm stuck on creating an algorithm as follows. I know this shouldn't be too difficult, but I simply can't get my head around it, and can't find the right description of this kind of pattern.
Basically I need a multi-level counter: when a combination exists in the database, the next value is tried by incrementing from the right.
1 1 1 - Start position. Does this exist in database? YES -> Extract this and go to next
1 1 2 - Does this exist in database? YES -> Extract this and go to next
1 1 3 - Does this exist in database? YES -> Extract this and go to next
1 1 4 - Does this exist in database? NO -> Reset level 1, move to level 2
1 2 1 - Does this exist in database? YES -> Extract this and go to next
1 2 2 - Does this exist in database? NO -> Reset level 2 and 1, move to level 3
2 1 1 - Does this exist in database? YES -> Extract this and go to next
2 1 2 - Does this exist in database? YES -> Extract this and go to next
2 1 3 - Does this exist in database? NO -> Reset level 1 and increment level 2
2 2 1 - Does this exist in database? YES -> Extract this and go to next
2 2 2 - Does this exist in database? YES -> Extract this and go to next
2 2 3 - Does this exist in database? YES -> Extract this and go to next
2 3 1 - Does this exist in database? NO -> Reset level 2 and 1, increment level 3
3 1 1 - Does this exist in database? NO -> Reset level 1 and increment level 2
3 2 1 - Does this exist in database? NO -> End, as all increments tried
There could be more than three levels, though.
In practice, each value like 1, 2, etc is actually a $value1, $value2, etc. containing a runtime string being matched against an XML document. So it's not just a case of pulling out every combination already existing in the database.
Assuming the length of the DB key is known upfront, here is one way to implement it. I'm using TypeScript, but similar code can be written in your favorite language.
First, I declare some type definitions for convenience.
export type Digits = number[];
export type DbRecord = number;
Then I initialize the fakeDb object, which works as a mock data source. The function I wrote runs against this object. Its keys represent the database records' keys (of type string). The values are simple numbers (intentionally sequential); they represent the database records themselves.
export const fakeDb: { [ dbRecordKey: string ]: DbRecord } = {
  '111': 1,
  '112': 2,
  '113': 3,
  '211': 4,
  '212': 5,
  '221': 6,
  '311': 7,
};
Next comes the fun part: the function that uses the counterDigits array of "digits" and increments it depending on whether a record is present or absent.
Please do NOT treat this as production-ready code: (a) the console.log() invocations exist only for demo purposes; (b) rather than reading a large number of DbRecords from the database into memory, it is better to use yield/return or some kind of buffer or stream.
export function readDbRecordsViaCounter(): DbRecord[] {
  const foundDbRecords: DbRecord[] = [];
  const counterDigits: Digits = [1, 1, 1];
  let currentDigitIndex = counterDigits.length - 1;
  do {
    if (recordExistsFor(counterDigits)) {
      // Record found: extract it, then keep incrementing the rightmost digit.
      foundDbRecords.push(extract(counterDigits));
      currentDigitIndex = counterDigits.length - 1;
      counterDigits[currentDigitIndex] += 1;
    } else {
      // Record missing: move one digit to the left and reset everything to its right.
      currentDigitIndex -= 1;
      for (let priorDigitIndex = currentDigitIndex + 1; priorDigitIndex < counterDigits.length; priorDigitIndex++) {
        counterDigits[priorDigitIndex] = 1;
      }
      if (currentDigitIndex < 0) {
        console.log(`------- (no more records expected -- ran out of counter's range)`);
        return foundDbRecords;
      }
      counterDigits[currentDigitIndex] += 1;
    }
    console.log(`next key to try: ${ getKey(counterDigits) }`);
  } while (true);
}
The rest are "helper" functions for constructing a string key from a digits array and for accessing the fake database.
export function recordExistsFor(digits: Digits): boolean {
  const keyToSearch = getKey(digits);
  const result = Object.getOwnPropertyNames(fakeDb).some(key => key === keyToSearch);
  console.log(`key=${ keyToSearch } => recordExists=${ result }`);
  return result;
}
export function extract(digits: Digits): DbRecord {
  const keyToSearch = getKey(digits);
  const result = fakeDb[keyToSearch];
  console.log(`key=${ keyToSearch } => extractedValue=${ result }`);
  return result;
}
export function getKey(digits: Digits): string {
  return digits.join('');
}
Now, if you run the function like this:
const dbRecords = readDbRecordsViaCounter();
console.log(`\n\nDb Record List: ${ dbRecords }`);
you should see the following output that tells you about the iteration steps; as well as reports the final result in the very end.
key=111 => recordExists=true
key=111 => extractedValue=1
next key to try: 112
key=112 => recordExists=true
key=112 => extractedValue=2
next key to try: 113
key=113 => recordExists=true
key=113 => extractedValue=3
next key to try: 114
key=114 => recordExists=false
next key to try: 121
key=121 => recordExists=false
next key to try: 211
key=211 => recordExists=true
key=211 => extractedValue=4
next key to try: 212
key=212 => recordExists=true
key=212 => extractedValue=5
next key to try: 213
key=213 => recordExists=false
next key to try: 221
key=221 => recordExists=true
key=221 => extractedValue=6
next key to try: 222
key=222 => recordExists=false
next key to try: 231
key=231 => recordExists=false
next key to try: 311
key=311 => recordExists=true
key=311 => extractedValue=7
next key to try: 312
key=312 => recordExists=false
next key to try: 321
key=321 => recordExists=false
next key to try: 411
key=411 => recordExists=false
------- (no more records expected -- ran out of counter's range)
Db Record List: 1,2,3,4,5,6,7
I strongly recommend reading the code. If you want me to describe the approach or any specific detail, let me know. Hope it helps.
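As a follow-up to the remark about yield and to the question's note that there may be more than three levels, here is a hedged sketch of the same counter written as a generator with a configurable number of levels. The names `iterateCounter`, `levels` and `exists` are assumptions introduced for this example, and the `Set` simply stands in for the real database or XML lookup.

```typescript
// Sketch only: `exists` is a hypothetical lookup callback, `levels` the key length.
// Keys are built with join(''), so this assumes every level stays a single digit.
function* iterateCounter(
  levels: number,
  exists: (key: string) => boolean
): Generator<string> {
  const digits: number[] = new Array(levels).fill(1);
  let index = levels - 1;
  while (true) {
    const key = digits.join('');
    if (exists(key)) {
      yield key;              // hand the hit to the caller one at a time
      index = levels - 1;     // keep incrementing the rightmost level
    } else {
      index -= 1;             // miss: move one level to the left
      if (index < 0) return;  // ran out of levels, nothing more to try
      for (let i = index + 1; i < levels; i++) digits[i] = 1; // reset levels to the right
    }
    digits[index] += 1;
  }
}

const knownKeys = new Set(['111', '112', '113', '211', '212', '221', '311']);
const foundKeys = Array.from(iterateCounter(3, key => knownKeys.has(key)));
console.log(foundKeys.join(',')); // 111,112,113,211,212,221,311
```

Because the generator yields each key as soon as it is found, the caller can process records one by one instead of collecting them all in an array first.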

Validating against a variable number of columns in Spark

I have a bunch of codes indicating the stages a person has been in, laid out horizontally in my data as shown below.
Name code1 code2 code3 code4
A 2 3 4 Null
B 2 5 4 7
C 1 3 4 5
D 0 9 Null Null
I have another file which has all the valid codes.
ID Value
1 3
2 4
3 5
4 6
5 7
What I would like to do is validate all the columns cell by cell against this lookup and indicate 0 if they are valid and null if they are not valid.
I'm using Apache Spark 1.5.2 and I would like to do this efficiently. I've tried a bunch of combinations, and the closest I've come to what I want is to concat the cells, explode the result into a normalized table, and then perform the lookups.
You can do this very simply with a single pass through the data, without any joins or explode by code-generating a validation expression:
// Simulate the data
case class Record(Name: String, code1: Option[Int], code2: Option[Int])
// registerTempTable returns Unit, which is why dfData is reported as Unit below
val dfData = sc.parallelize(Seq(
  Record("A", Some(3), Some(4)),
  Record("B", Some(3), None)
)).toDF.registerTempTable("my_data")
// Simulate the lookup table
val dfLookup = sc.parallelize(Seq((1,3), (2,4))).toDF("ID", "Value")
// Build a validation expression: one "codeN = value" clause per lookup row
val validationExpression = dfLookup.collect.map { row =>
  s"code${row.getInt(0)} = ${row.getInt(1)}"
}.mkString(" and ")
// Add an is_valid column to the data
sql(s"select *, nvl($validationExpression, false) as is_valid from my_data").show
This produces:
defined class Record
dfData: Unit = ()
dfLookup: org.apache.spark.sql.DataFrame = [ID: int, Value: int]
validationExpression: String = code1 = 3 and code2 = 4
| A| 3| 4| true|
| B| 3| null| false|
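The answer above checks each code column against a single expected value. If what you need is the cell-by-cell check described in the question (0 when the code appears among the lookup values, null otherwise), the logic looks roughly like the following in-memory sketch. It is plain TypeScript rather than Spark, `validateRow` is a hypothetical helper, and it assumes the lookup's Value column (3 to 7) is the list of valid codes.

```typescript
type Code = number | null;

// Returns 0 for a code that appears in the lookup values and null otherwise.
// A missing (null) code is treated as not valid.
function validateRow(codes: Code[], validValues: Set<number>): Code[] {
  return codes.map(code => (code !== null && validValues.has(code) ? 0 : null));
}

// Assumption: the "Value" column of the lookup file holds the valid codes.
const validValues = new Set<number>([3, 4, 5, 6, 7]);

const people: { name: string; codes: Code[] }[] = [
  { name: 'A', codes: [2, 3, 4, null] },
  { name: 'B', codes: [2, 5, 4, 7] },
  { name: 'C', codes: [1, 3, 4, 5] },
  { name: 'D', codes: [0, 9, null, null] },
];

const validated = people.map(p => ({
  name: p.name,
  codes: validateRow(p.codes, validValues),
}));
console.log(JSON.stringify(validated[0])); // {"name":"A","codes":[null,0,0,null]}
```

In Spark the same per-cell rule would be applied to every code column in one pass, which is what generating the expression from the lookup table achieves without a join.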

LINQ - Previous Record

Let's say I have the following table:
RevisionID, Project_ID, Count, Changed_Date
1 2 4 01/01/2016: 01:02:01
2 2 7 01/01/2016: 01:03:01
3 2 8 01/01/2016: 01:04:01
4 2 3 01/01/2016: 01:05:01
5 2 15 01/01/2016: 01:06:01
I am ordering the records by Changed_Date. A user comes to my site and edits a record (RevisionID = 3). For various reasons, using LINQ (with Entity Framework), I need to get the previous record in the table, which would be RevisionID = 2, so that I can perform calculations on "Count". If the user edited the record with RevisionID = 4, I would need to select RevisionID = 3.
I currently have the following:
var x = _db.RevisionHistory
.Where(t => t.Project_ID == input.Project_ID)
.OrderBy(t => t.Changed_Date);
This works in finding the records based on the Project_ID, but how then do I select the record before?
I am trying to do the following, but in one LINQ statement, if possible.
var itemList = from t in _db.RevisionHistory
               where t.Project_ID == input.Project_ID
               orderby t.Changed_Date
               select t;
int h = 0;
foreach (var entry in itemList)
{
    // Stop at the edited record; h then holds the ID of the record before it
    if (entry.Revision_ID == input.Revision_ID)
        break;
    h = entry.Revision_ID;
}
var previousEntry = _db.RevisionHistory.Find(h);
Here is the correct single-query equivalent of your code:
var previousEntry = (
    from r1 in db.RevisionHistory
    where r1.Project_ID == input.Project_ID && r1.Revision_ID == input.Revision_ID
    from r2 in db.RevisionHistory
    where r2.Project_ID == r1.Project_ID && r2.Changed_Date < r1.Changed_Date
    orderby r2.Changed_Date descending
    select r2
).FirstOrDefault();
which generates the following SQL query:
SELECT TOP (1)
    [Project1].[Revision_ID] AS [Revision_ID],
    [Project1].[Project_ID] AS [Project_ID],
    [Project1].[Count] AS [Count],
    [Project1].[Changed_Date] AS [Changed_Date]
    FROM ( SELECT
        [Extent2].[Revision_ID] AS [Revision_ID],
        [Extent2].[Project_ID] AS [Project_ID],
        [Extent2].[Count] AS [Count],
        [Extent2].[Changed_Date] AS [Changed_Date]
        FROM [dbo].[RevisionHistories] AS [Extent1]
        INNER JOIN [dbo].[RevisionHistories] AS [Extent2] ON [Extent2].[Project_ID] = [Extent1].[Project_ID]
        WHERE ([Extent1].[Project_ID] = @p__linq__0) AND ([Extent1].[Revision_ID] = @p__linq__1) AND ([Extent2].[Changed_Date] < [Extent1].[Changed_Date])
    ) AS [Project1]
    ORDER BY [Project1].[Changed_Date] DESC
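I hope I understood what you want:
var x = _db.RevisionHistory
    .FirstOrDefault(t => t.Project_ID == input.Project_ID && t.Revision_ID == input.Revision_ID - 1);
Or, based on what you wrote, but edited:
var x = _db.RevisionHistory
    .Where(t => t.Project_ID == input.Project_ID)
    .OrderBy(t => t.Changed_Date)
    .AsEnumerable() // TakeWhile is not translated by LINQ to Entities, so it runs in memory
    .TakeWhile(t => t.Revision_ID != input.Revision_ID)
    .LastOrDefault(); // the last record before the edited one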
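For readers who want to see the "previous record by date" rule outside of LINQ, here is a small in-memory sketch in TypeScript. It mirrors the self-join idea (same project, strictly earlier Changed_Date, newest of those); the `Revision` shape and the `previousRevision` helper are assumptions for illustration, and the rows come from the table in the question with dates rewritten as ISO strings.

```typescript
// Hypothetical in-memory shape of a RevisionHistory row.
interface Revision {
  revisionId: number;
  projectId: number;
  count: number;
  changedDate: string; // ISO timestamp, so string comparison matches time order
}

function previousRevision(
  history: Revision[],
  projectId: number,
  revisionId: number
): Revision | undefined {
  const current = history.find(
    r => r.projectId === projectId && r.revisionId === revisionId
  );
  if (current === undefined) return undefined;
  let previous: Revision | undefined;
  for (const r of history) {
    // Only rows of the same project that changed strictly before the current one.
    if (r.projectId !== projectId || r.changedDate >= current.changedDate) continue;
    // Keep the newest of those rows.
    if (previous === undefined || r.changedDate > previous.changedDate) previous = r;
  }
  return previous;
}

const revisionHistory: Revision[] = [
  { revisionId: 1, projectId: 2, count: 4, changedDate: '2016-01-01T01:02:01' },
  { revisionId: 2, projectId: 2, count: 7, changedDate: '2016-01-01T01:03:01' },
  { revisionId: 3, projectId: 2, count: 8, changedDate: '2016-01-01T01:04:01' },
  { revisionId: 4, projectId: 2, count: 3, changedDate: '2016-01-01T01:05:01' },
  { revisionId: 5, projectId: 2, count: 15, changedDate: '2016-01-01T01:06:01' },
];

const beforeThree = previousRevision(revisionHistory, 2, 3);
console.log(beforeThree ? beforeThree.revisionId : 'none'); // 2
```

Unlike the `Revision_ID - 1` shortcut, this does not assume that revision IDs are contiguous within a project.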

Pig: Group By, Average, and Order By

I am new to Pig and I have a text file in which each line contains a record in the following format:
name, year, count, uniquecount
For example:
Zverkov winced_VERB 2004 8 8
Zverkov winced_VERB 2008 4 4
Zverkov winced_VERB 2009 1 1
zvlastni _ADV_ 1913 1 1
zvlastni _ADV_ 1928 2 2
zvlastni _ADV_ 1929 3 2
I want to group all the records by their unique names, then for each unique name calculate count/uniquecount, and finally sort the output by this calculated value.
Here is what I have been trying:
bigrams = LOAD 'input/bigram/zv.gz' AS (bigram:chararray, year:int, count:float, books:float);
group_bigrams = GROUP bigrams BY bigram;
average_bigrams = FOREACH group_bigrams GENERATE group, SUM(bigrams.count) / SUM(bigrams.books) AS average;
sorted_bigrams = ORDER average_bigrams BY average;
It seems my original code does produce the desired output with one minor change:
bigrams = LOAD 'input/bigram/zv.gz' AS (bigram:chararray, year:int, count:float, books:float);
group_bigrams = GROUP bigrams BY bigram;
average_bigrams = FOREACH group_bigrams GENERATE group, SUM(bigrams.count)/SUM(bigrams.books) AS average;
sorted_bigrams = ORDER average_bigrams BY average DESC, group ASC;
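To check what the script should produce on the sample lines, here is an in-memory TypeScript sketch of the same three steps: group by bigram, divide the summed count by the summed books, and sort by that ratio descending with the bigram name as a tie-breaker. The `BigramRow` shape and `averagePerBigram` helper are assumptions for illustration only.

```typescript
interface BigramRow {
  bigram: string;
  year: number;
  count: number;
  books: number;
}

function averagePerBigram(rows: BigramRow[]): { bigram: string; average: number }[] {
  // GROUP BY bigram, accumulating SUM(count) and SUM(books).
  const totals = new Map<string, { count: number; books: number }>();
  for (const row of rows) {
    const t = totals.get(row.bigram) || { count: 0, books: 0 };
    t.count += row.count;
    t.books += row.books;
    totals.set(row.bigram, t);
  }
  // ORDER BY average DESC, group ASC.
  return Array.from(totals.entries())
    .map(([bigram, t]) => ({ bigram, average: t.count / t.books }))
    .sort((a, b) =>
      b.average - a.average || (a.bigram < b.bigram ? -1 : a.bigram > b.bigram ? 1 : 0)
    );
}

const sampleRows: BigramRow[] = [
  { bigram: 'Zverkov winced_VERB', year: 2004, count: 8, books: 8 },
  { bigram: 'Zverkov winced_VERB', year: 2008, count: 4, books: 4 },
  { bigram: 'Zverkov winced_VERB', year: 2009, count: 1, books: 1 },
  { bigram: 'zvlastni _ADV_', year: 1913, count: 1, books: 1 },
  { bigram: 'zvlastni _ADV_', year: 1928, count: 2, books: 2 },
  { bigram: 'zvlastni _ADV_', year: 1929, count: 3, books: 2 },
];

const averages = averagePerBigram(sampleRows);
for (const a of averages) console.log(`${a.bigram}\t${a.average}`);
```

On this sample, "zvlastni _ADV_" (6 / 5) sorts ahead of "Zverkov winced_VERB" (13 / 13).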

rowcount in Forloop does not update when row is deleted

I am trying to run this code in QTP, which deletes records from a grid:
dgRows = SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").RowCount
For i = 1 To dgRows
    SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").SelectCell i-1,0
    'row of data grid begins with 0, hence i -1
    SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfButton("DELETE").Click
    Swfwindow("PWC - [PWC]").SwfWindow("Region Master").SwfWindow("RegionMaster").SwfButton("Insert").Click
    deleteCode = closePrompt()
    If deleteCode = 15 Then 'closePrompt returns 15 when record is successfully deleted
        i = i - 1 'As record is deleted, grid has lost one record and for loop will exit early by one record or throw error.
        dgoRows = SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").RowCount
    End If
Next
This piece of code runs from 1 to the number of rows in the grid (dgRows).
If there are 3 rows, it will run three times and delete records if possible. If 1 record is deleted, the grid loses a record. Hence I am trying to adjust the values of i and dgRows with this code:
i = i - 1 'As record is deleted, grid has lost one record and for loop will exit early by one record or throw error.
dgoRows = SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").RowCount
'Updating new value of dgRows so that QTP does not click on a row for value of i that does not
I try to illustrate the issues I am facing with this piece of code
dgRows = SwfWindow("PWC - [PWC]").SwfWindow("Region
After a row is deleted, the number of rows in the grid is not re-read when the For loop iterates. Hence i reaches 3, but the grid actually contains only one row because 2 records have been deleted, so QTP tries to click on a cell in row 3, doesn't find it, and throws an error.
Can anyone tell me why ("dgMaster").RowCount doesn't update itself, or how to update it when the For loop runs its next iteration?
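In VBScript, a For loop evaluates its limit only once, so changing the limit variable inside the loop does not change the number of iterations. This loop still runs four times:
limit = 4
For i = 1 to limit
    limit = limit - 1
    document.write(i & " - " & limit & "<br>")
Next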
1 - 3
2 - 2
3 - 1
4 - 0
You should use a While loop instead:
dgRows = SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").RowCount
i = 1 ' I would use 0 but I'm keeping the code as close to original as possible
While i <= dgRows
    SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").SelectCell i-1,0
    'row of data grid begins with 0, hence i -1
    SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfButton("DELETE").Click
    Swfwindow("PWC - [PWC]").SwfWindow("Region Master").SwfWindow("RegionMaster").SwfButton("Insert").Click
    deleteCode = closePrompt()
    If deleteCode = 15 Then 'closePrompt returns 15 when record is successfully deleted
        i = i - 1 'As record is deleted, the grid has lost one record, so stay on the same row
        dgRows = SwfWindow("PWC - [PWC]").SwfWindow("Region Master").SwfTable("dgMaster").RowCount 'Re-read the count so the While condition sees it
    End If
    i = i + 1
Wend
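The same pattern (re-check the row count on every pass, and only advance the index when nothing was deleted) can be sketched in TypeScript. This is an illustration of the loop structure, not QTP code: the array stands in for the grid, and the hypothetical `canDelete` callback plays the role of closePrompt() returning 15. Note that TypeScript loops re-evaluate their condition on every iteration anyway, unlike a VBScript For loop.

```typescript
// Sketch under stated assumptions: `rows` stands in for the grid and
// `canDelete` for "the delete succeeded" (closePrompt() returned 15).
function deleteDeletableRows<T>(rows: T[], canDelete: (row: T) => boolean): T[] {
  let i = 0;
  while (i < rows.length) {   // the current length is checked on every pass
    if (canDelete(rows[i])) {
      rows.splice(i, 1);      // row removed: the next row slides into index i
    } else {
      i += 1;                 // row kept: move on to the next index
    }
  }
  return rows;
}

const gridRows = ['North', 'South (locked)', 'East', 'West'];
deleteDeletableRows(gridRows, row => row.indexOf('locked') === -1);
console.log(gridRows.join(', ')); // South (locked)
```

Advancing the index only when a row is kept has the same effect as the `i = i - 1` adjustment in the While loop above.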
