Being pretty new to Power Query, I'm faced with a problem I'd like to solve.
I have a table, TableA, with these columns, for example:
Key | Sprint | Index
-------------------------
A | PI1-I1 | 1
A | PI1-I2 | 2
B | PI1-I3 | 1
C | PI1-I1 | 1
I want to end up with a set looking like this:
Key | Sprint | Index | HasSpillOver
-------------------------
A | PI1-I1 | 1 | Yes
A | PI1-I2 | 2 | No
B | PI1-I3 | 1 | No
C | PI1-I1 | 1 | No
I thought I could do a nested join of TableA on itself, compare the indices to strip out the unwanted rows, and then count the rows in each nested table, as outlined below.
TableA=Key, Sprint, Index
// TableA Nested joined on itself (Key, Sprint, Index, Nested)
TableB = Table.NestedJoin(#"TableA", "Key", #"TableA", "Key", "Nested", JoinKind.Inner)
TableC= Table.TransformColumns(#"TableB", {"Nested", (x)=>Table.SelectRows(x, each [Index] <x[Index])} )
...and then do the count. However, this throws an error:
Can not apply operator < on types List and Number.
Any suggestions how to approach this problem? Possibly (probably) in a different way.
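You did not clearly define what "spillover" means, but this should get you most of the way.
My version adds another index column; you could use your existing Index instead if it is relevant. (Your error occurs because the function in Table.TransformColumns receives the nested table itself, so x[Index] is the whole Index column, i.e. a list.)
The code then counts the number of rows where the (second) index is higher and the [Key] field matches. You could add a condition so that the Sprint field must match as well, if relevant.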
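let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Add a 0-based row index so that "later" rows can be identified
    #"Added Index" = Table.AddIndexColumn(Source, "Index.1", 0, 1),
    // For each row i, count the later rows (higher Index.1) that have the same Key
    #"Added Custom" = Table.AddColumn(#"Added Index", "Count", (i) => Table.RowCount(Table.SelectRows(#"Added Index", each [Key] = i[Key] and [Index.1] > i[Index.1])))
in
    #"Added Custom"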
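To make the counting idea concrete outside Power Query, here is a rough Java sketch: for each row, count the later rows (by position) with the same Key, then map a count above zero to "Yes". Treating "spillover" as "the same Key appears again in a later row" is an assumption, and the Row type is hypothetical. Like the M version, it is quadratic, because every row scans the rest of the table.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "count later rows with the same Key"; the Yes/No rule (count > 0)
// is an assumed definition of spillover, not one given in the question.
class Spillover {
    record Row(String key, String sprint, int index) {}

    // For each row i, count rows j > i (by position) that share row i's key,
    // mirroring SelectRows(each [Key] = i[Key] and [Index.1] > i[Index.1]).
    static List<String> hasSpillOver(List<Row> rows) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            int count = 0;
            for (int j = i + 1; j < rows.size(); j++) {
                if (rows.get(j).key().equals(rows.get(i).key())) {
                    count++;
                }
            }
            result.add(count > 0 ? "Yes" : "No");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A", "PI1-I1", 1),
            new Row("A", "PI1-I2", 2),
            new Row("B", "PI1-I3", 1),
            new Row("C", "PI1-I1", 1));
        System.out.println(hasSpillOver(rows)); // [Yes, No, No, No]
    }
}
```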
I have the following data:
+-----------+-----------+-----------+
| start     | stop      | status    |
+-----------+-----------+-----------+
| 09:01:10  | 09:01:40  | active    |
| 09:02:30  | 09:04:50  | active    |
| 09:10:01  | 09:11:50  | active    |
+-----------+-----------+-----------+
I want to fill in the gaps with "passive" rows:
+-----------+-----------+-----------+
| start     | stop      | status    |
+-----------+-----------+-----------+
| 09:01:10  | 09:01:40  | active    |
| 09:01:40  | 09:02:30  | passive   |
| 09:02:30  | 09:04:50  | active    |
| 09:04:50  | 09:10:01  | passive   |
| 09:10:01  | 09:11:50  | active    |
+-----------+-----------+-----------+
How can I do this in M Query language?
You could try something like the code below (my first two steps, someTable and changedTypes, just re-create your sample data on my end):
let
    // Re-create the sample data
    someTable = Table.FromColumns({{"09:01:10", "09:02:30", "09:10:01"}, {"09:01:40", "09:04:50", "09:11:50"}, {"active", "active", "active"}}, {"start","stop","status"}),
    changedTypes = Table.TransformColumnTypes(someTable, {{"start", type duration}, {"stop", type duration}, {"status", type text}}),
    listOfRecords = Table.ToRecords(changedTypes),
    // Walk the rows from the second one onwards, seeded with the first row
    transformList = List.Accumulate(List.Skip(List.Positions(listOfRecords)), {listOfRecords{0}}, (listState, currentIndex) =>
        let
            previousRecord = listOfRecords{currentIndex-1},
            currentRecord = listOfRecords{currentIndex},
            // A gap exists when this row does not start where the previous row stopped
            thereIsAGap = currentRecord[start] <> previousRecord[stop],
            // Insert a "passive" row covering the gap before the current row
            recordsToAdd = if thereIsAGap then {[start=previousRecord[stop], stop=currentRecord[start], status="passive"], currentRecord} else {currentRecord},
            append = listState & recordsToAdd
        in
            append
    ),
    backToTable = Table.FromRecords(transformList, type table [start=duration, stop=duration, status=text])
in
    backToTable
This is what I start off with (at the changedTypes step):
This is what I end up with:
To integrate this with your existing M code, you'll probably need to:
remove someTable and changedTypes from my code (and replace them with your existing query), and
change changedTypes in the listOfRecords step to the name of your last step (otherwise you'll get an error, because your code has no changedTypes expression).
Edit:
Further to my answer, what I would suggest is:
Try changing this line in the code above:
listOfRecords = Table.ToRecords(changedTypes),
to
listOfRecords = List.Buffer(Table.ToRecords(changedTypes)),
I found that buffering the list in memory reduced my refresh time significantly (roughly 90%, if I had to quantify it). I imagine there are limits and drawbacks (e.g. if the list doesn't fit in memory), but it might be fine for your use case.
Do you see similar behaviour? Unfortunately, my basic timing graph also suggests that the code's overall complexity is non-linear.
Final note: generating and processing 100k rows resulted in a stack overflow while refreshing the query (this may have been caused by generating the input rows rather than by inserting the new rows; I don't know). So this approach clearly has limits.
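For comparison, here is a hedged Java analogue of the List.Accumulate loop above, assuming the rows are already sorted by start. It seeds the output with the first row, then for each later row inserts a "passive" row whenever the current start differs from the previous stop. The Interval record and the hms helper are hypothetical names for illustration. Because an ArrayList append is amortized constant time and the loop is iterative, this pass is linear and does not recurse.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Java analogue of the List.Accumulate gap-filling step (assumes rows sorted by start).
class GapFiller {
    record Interval(Duration start, Duration stop, String status) {}

    static List<Interval> fillGaps(List<Interval> rows) {
        List<Interval> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        out.add(rows.get(0)); // seed, like {listOfRecords{0}}
        for (int i = 1; i < rows.size(); i++) {
            Interval previous = rows.get(i - 1);
            Interval current = rows.get(i);
            // A gap exists when this row does not start where the previous one stopped
            if (!current.start().equals(previous.stop())) {
                out.add(new Interval(previous.stop(), current.start(), "passive"));
            }
            out.add(current);
        }
        return out;
    }

    // Hypothetical helper: parses "hh:mm:ss" into a Duration for the demo data.
    static Duration hms(String text) {
        String[] parts = text.split(":");
        return Duration.ofHours(Long.parseLong(parts[0]))
                .plusMinutes(Long.parseLong(parts[1]))
                .plusSeconds(Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        List<Interval> rows = List.of(
            new Interval(hms("09:01:10"), hms("09:01:40"), "active"),
            new Interval(hms("09:02:30"), hms("09:04:50"), "active"),
            new Interval(hms("09:10:01"), hms("09:11:50"), "active"));
        for (Interval r : fillGaps(rows)) {
            System.out.println(r.start() + " " + r.stop() + " " + r.status());
        }
    }
}
```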
I think I may have a better performing solution.
From your source table (assuming it's sorted), add an index column starting from 0 and another index column starting from 1. Then merge the table with itself using a left outer join on the two index columns, and expand the start column.
Remove all columns except stop, status, and start.1, then filter out nulls.
Rename the columns to start, status, and stop, and replace "active" with "passive".
Finally, append this table to your original table.
let
    // #"Removed Columns" is an earlier step in my query (not shown here)
    Source = Table.RenameColumns(#"Removed Columns",{{"Column1.2", "start"}, {"Column1.3", "stop"}, {"Column1.4", "status"}}),
    // 1-based and 0-based indexes: row k's "Index" equals row k+1's "Index.1"
    Add1Index = Table.AddIndexColumn(Source, "Index", 1, 1),
    Add0Index = Table.AddIndexColumn(Add1Index, "Index.1", 0, 1),
    // Self-merge so that each row picks up the next row's start
    SelfMerge = Table.NestedJoin(Add0Index,{"Index"},Add0Index,{"Index.1"},"Added Index1",JoinKind.LeftOuter),
    ExpandStart1 = Table.ExpandTableColumn(SelfMerge, "Added Index1", {"start"}, {"start.1"}),
    RemoveCols = Table.RemoveColumns(ExpandStart1,{"start", "Index", "Index.1"}),
    // The last row has no next row, so its start.1 is null
    FilterNulls = Table.SelectRows(RemoveCols, each ([start.1] <> null)),
    // Each gap runs from this row's stop to the next row's start
    RenameCols = Table.RenameColumns(FilterNulls,{{"stop", "start"}, {"start.1", "stop"}}),
    ActiveToPassive = Table.ReplaceValue(RenameCols,"active","passive",Replacer.ReplaceText,{"status"}),
    AppendQuery = Table.Combine({Source, ActiveToPassive}),
    #"Sorted Rows" = Table.Sort(AppendQuery,{{"start", Order.Ascending}})
in
    #"Sorted Rows"
Apart from the final sort, this should be roughly O(n), using logic similar to @chillin's answer, but I think it should be faster than a custom function, since it uses the built-in merge, which is likely to be highly optimized.
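A rough Java sketch of the same self-merge idea follows: a lookup keyed on the 0-based position plays the role of "Index.1", each row probes it with its 1-based "Index", and the matched next start becomes the stop of a passive row. It assumes times are zero-padded "hh:mm:ss" strings, so text order is chronological. The class and record names are hypothetical. Like the M query, it emits a zero-length passive row when a row stops exactly where the next one starts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Java sketch of the index-shift self-merge (times as zero-padded "hh:mm:ss" strings).
class SelfMergeGaps {
    record Interval(String start, String stop, String status) {}

    static List<Interval> fillGaps(List<Interval> source) {
        // Lookup keyed on the 0-based position, playing the role of "Index.1"
        Map<Integer, String> startByIndex0 = new HashMap<>();
        for (int i = 0; i < source.size(); i++) {
            startByIndex0.put(i, source.get(i).start());
        }
        List<Interval> passive = new ArrayList<>();
        for (int i = 0; i < source.size(); i++) {
            // Row i's 1-based "Index" is i + 1, which matches the next row's "Index.1"
            String nextStart = startByIndex0.get(i + 1);
            // The last row finds no match (null), like the left outer join + FilterNulls
            if (nextStart != null) {
                passive.add(new Interval(source.get(i).stop(), nextStart, "passive"));
            }
        }
        // Append to the original rows and sort by start, like Table.Combine + Table.Sort
        List<Interval> combined = new ArrayList<>(source);
        combined.addAll(passive);
        combined.sort(Comparator.comparing(Interval::start));
        return combined;
    }

    public static void main(String[] args) {
        List<Interval> rows = List.of(
            new Interval("09:01:10", "09:01:40", "active"),
            new Interval("09:02:30", "09:04:50", "active"),
            new Interval("09:10:01", "09:11:50", "active"));
        fillGaps(rows).forEach(r -> System.out.println(r.start() + " " + r.stop() + " " + r.status()));
    }
}
```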
I would approach this as follows:
Duplicate the first table.
Replace "active" with "passive".
Remove the start column.
Rename stop to start.
Create a new stop column by looking up the earliest start time from your original table that occurs after the current stop time.
Filter out nulls in this new column.
Append this table to the original table.
The M code will look something like this:
let
    Source = <...your starting table...>,
    PassiveStatus = Table.ReplaceValue(Source,"active","passive",Replacer.ReplaceText,{"status"}),
    RemoveStart = Table.RemoveColumns(PassiveStatus,{"start"}),
    // Each passive row starts where an active row stops
    RenameStart = Table.RenameColumns(RemoveStart,{{"stop", "start"}}),
    // Its stop is the earliest original start after it (null if there is none)
    AddStop = Table.AddColumn(RenameStart, "stop", (C) => List.Min(List.Select(Source[start], each _ > C[start])), type time),
    RemoveNulls = Table.SelectRows(AddStop, each ([stop] <> null)),
    CombineTables = Table.Combine({Source, RemoveNulls}),
    #"Sorted Rows" = Table.Sort(CombineTables,{{"start", Order.Ascending}})
in
    #"Sorted Rows"
The only tricky bit above is the custom column part where I define the new column like this:
(C) => List.Min(List.Select(Source[start], each _ > C[start]))
This takes each item in the column/list Source[start] and compares it to the time in the current row. It keeps only the times that occur after the current row's time and then takes the minimum of that list to find the earliest one.
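The custom column scans all of Source[start] for every row, which is quadratic. A swapped-in alternative, sketched in Java below, puts the start times in a sorted set and asks for the nearest one with TreeSet.ceiling, which is logarithmic per lookup. One deliberate change from the M code: it looks for a start at or after the stop and skips exact matches, so back-to-back rows produce no passive row instead of one that jumps ahead to a later start. That adjustment, and the names used, are mine and not part of the original answer.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// "Earliest later start" lookup via a sorted set instead of a full scan per row.
class NextStartLookup {
    record Interval(LocalTime start, LocalTime stop, String status) {}

    static List<Interval> fillGaps(List<Interval> source) {
        // Sorted set of all original start times, for O(log n) lookups
        TreeSet<LocalTime> starts = new TreeSet<>();
        for (Interval r : source) {
            starts.add(r.start());
        }
        List<Interval> combined = new ArrayList<>(source);
        for (Interval r : source) {
            // Earliest start at or after this row's stop (null if there is none)
            LocalTime next = starts.ceiling(r.stop());
            // Skip the last row (null) and rows immediately followed by another row (no gap)
            if (next != null && !next.equals(r.stop())) {
                combined.add(new Interval(r.stop(), next, "passive"));
            }
        }
        combined.sort(Comparator.comparing(Interval::start));
        return combined;
    }

    public static void main(String[] args) {
        List<Interval> rows = List.of(
            new Interval(LocalTime.parse("09:01:10"), LocalTime.parse("09:01:40"), "active"),
            new Interval(LocalTime.parse("09:02:30"), LocalTime.parse("09:04:50"), "active"),
            new Interval(LocalTime.parse("09:10:01"), LocalTime.parse("09:11:50"), "active"));
        fillGaps(rows).forEach(r -> System.out.println(r.start() + " " + r.stop() + " " + r.status()));
    }
}
```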
I want to check whether a set of entities exists according to the following rule.
I have a table with a composite primary key (id, key):
| id | key |
| 1 | a |
| 2 | b |
| 1 | c |
So I want to do something like this:
boolean existsByIdAndAllOfKey(
long id,
Set<Key> keys
)
This query should return true if the database contains an entity with the given id for every key in the input set.
I'm wondering whether Spring Data has a keyword for this. Or what is the best way to do it?
I found the following solution:
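// Derived count query: rows with the given id whose key is in the set
long countByIdAndKeyIn(
    long id,
    Set<Key> keys
);

boolean isThereEntityWithAllKeys(long id, Set<Key> keys) {
    // (id, key) is unique, so the count equals keys.size() only if every key is present
    return countByIdAndKeyIn(id, keys) == keys.size();
}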
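To show why the count comparison works, here is a plain-Java stand-in with no Spring involved: the "table" is an in-memory list of (id, key) pairs, and countByIdAndKeyIn is a hypothetical local method that mimics what the derived query counts. Because (id, key) is unique, each key in the set can match at most one row, so the count reaches keys.size() only when all keys are present. Note that an empty set trivially returns true.

```java
import java.util.List;
import java.util.Set;

// In-memory illustration of the "count equals set size" check (not Spring Data itself).
class AllKeysCheck {
    record Entry(long id, String key) {}

    // Mimics the derived query: rows with the given id whose key is in the set
    static long countByIdAndKeyIn(List<Entry> table, long id, Set<String> keys) {
        return table.stream()
                .filter(e -> e.id() == id && keys.contains(e.key()))
                .count();
    }

    // Relies on (id, key) being unique, as with the composite primary key
    static boolean isThereEntityWithAllKeys(List<Entry> table, long id, Set<String> keys) {
        return countByIdAndKeyIn(table, id, keys) == keys.size();
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(new Entry(1, "a"), new Entry(2, "b"), new Entry(1, "c"));
        System.out.println(isThereEntityWithAllKeys(table, 1, Set.of("a", "c"))); // true
        System.out.println(isThereEntityWithAllKeys(table, 1, Set.of("a", "b"))); // false
    }
}
```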
I have two entities with a one-to-many relationship:
Race and Cars (one race contains many cars).
I need to generate a JSON result to pass to jqGrid. I thought it might be possible to do that without creating a new class to hold the properties, something like this:
var jsonData = new
{
total = totalPages,
page = page,
records = totalRecords,
rows = (from c in Races
select new
{
//c.Cars.Id.ToString(), - need iteration
cell = new string[] {
//c.Cars.Id.ToString(), - need iteration
c.Date.ToString(),
c.Type.ToString(),
c.Cars //But how i may loop all Cars colection here?
//c.Cars.Name - need iteration
//c.Cars.Speed - need iteration
}
}).ToArray()
};
But the Cars property is a collection. How can I iterate over it inside the initializer? Or should I create a class that contains all the properties I need?
Any ideas?
Let's say Car has the properties Name, Speed, and Id, and Race has the properties Date and Type.
The data will be displayed like this:
Date | Type | Id | Name | Speed
02/03/2011 | A | 1 | MegaName1 | 130
02/03/2011 | A | 2 | MegaName2 | 112
02/03/2011 | A | 3 | MegaName3 | 132
03/05/2011 | B | 4 | MegaName2 | 112
03/05/2011 | B | 5 | MegaName4 | 33
Try the following:
var jsonData = new
{
    total = totalPages,
    page = page,
    records = totalRecords,
    rows =
        (from race in races
         from car in race.Cars // one output row per car in each race
         select new
         {
             cell = new string[]
             {
                 race.Date.ToString(),
                 race.Type.ToString(),
                 car.Id.ToString(), // Id is not a string, so convert it
                 car.Name,
                 car.Speed.ToString()
             }
         }).ToArray()
};
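The key step is the second from clause, which flattens the one-to-many relationship so each (race, car) pair becomes one grid row. As a hedged illustration of the same flattening in Java, Stream.flatMap plays the role of that nested from. The Race and Car records are hypothetical, and Date is kept as a pre-formatted string to sidestep date formatting.

```java
import java.util.List;
import java.util.stream.Collectors;

// Flattening a one-to-many relationship into one row per (race, car) pair.
class RaceGridRows {
    record Car(int id, String name, int speed) {}
    record Race(String date, String type, List<Car> cars) {}

    static List<String[]> toCells(List<Race> races) {
        return races.stream()
                .flatMap(race -> race.cars().stream()
                        .map(car -> new String[] {
                            race.date(), race.type(),
                            String.valueOf(car.id()), car.name(), String.valueOf(car.speed())
                        }))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Race> races = List.of(
            new Race("02/03/2011", "A", List.of(new Car(1, "MegaName1", 130), new Car(2, "MegaName2", 112))),
            new Race("03/05/2011", "B", List.of(new Car(4, "MegaName2", 112))));
        for (String[] cells : toCells(races)) {
            System.out.println(String.join(" | ", cells));
        }
    }
}
```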
I want to read an Excel 2003 file (I can't change the format, as it comes from a third party) and group the data in a List or a Dictionary (I don't know which is better).
For example, the Excel layout is as follows:
Books Data [first row, first column]
(second row is empty)
Code, Name, IBN [third row (second column, third column)]
Aust [fourth row, first column]
UX test1 34 [fifth row (second column, third column)]
...
Books Data
Code Name IBN
Aust
UX test1 34
UZ test2 345
UN test3 5654
US
UX name1 567
TG nam2 123
UM name3 234
I am reading the Excel data using the following code (with some help from Google):
string filename = @"C:\" + "Book1.xls";
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + filename + ";" +
"Extended Properties=Excel 8.0;";
OleDbDataAdapter dataAdapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
DataSet myDataSet = new DataSet();
dataAdapter.Fill(myDataSet, "BookInfo");
DataTable dataTable = myDataSet.Tables["BookInfo"];
var rows = from p in dataTable.AsEnumerable()
where p[0].ToString() != null || p[0].ToString() != "" && p.Field<string>("F2") != null
select new
{ countryName= p[0],
bookCode= p.Field<string>("F2"),
bookName= p.Field<string>("F3")
};
The code above is not good: to get the "Code" I am using "F2", and for the country I am using p[0]. What should I use to get the code and name for each country?
Also, it gives me the information I want, but I don't know how to put it in a list, dictionary, or class so that I can get data by passing a country name as a parameter.
In short, it must first put all the data in a list or dictionary, and then I can query that list or dictionary to get the data filtered by country.
Thanks
There are two things you need to do:
First, reformat the spreadsheet so that the column headers are in the first row, as the table below shows:
| Country | Code | Name | IBN |
|---------|------|---------|------|
| Aust | UX | test1 | 34 |
| Aust | UZ | test2 | 345 |
| Aust | UN | test3 | 5654 |
| US | UX | name1 | 567 |
| US | TG | name2 | 123 |
| US | UM | name3 | 234 |
Second, use the Linq to Excel library to retrieve the data. It takes care of making the OLEDB connection and creating the SQL for you. Below is an example of how easy the library is to use:
var book = new ExcelQueryFactory("pathToExcelFile");
var australia = from x in book.Worksheet()
where x["Country"] == "Aust"
select new
{
Country = x["Country"],
BookCode = x["Code"],
BookName = x["Name"]
};
Check out the Linq to Excel intro video for more information about the open source project.
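Since the question says the file comes from a third party and cannot be changed, the reformatting step can also be done in code after reading the raw rows. Below is a hedged Java sketch of that "fill-down": a row with only the first cell filled is treated as a country, and each following row with a code is tagged with the most recent country. It assumes four cells per row and that the "Books Data" title row and the "Code Name IBN" header row have already been skipped; the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fill-down from the grouped sheet layout to flat Country/Code/Name/IBN rows.
class FillDownCountry {
    record Book(String country, String code, String name, String ibn) {}

    static boolean blank(String s) {
        return s == null || s.isBlank();
    }

    static String cell(String s) {
        return s == null ? "" : s.trim();
    }

    static List<Book> flatten(List<String[]> rawRows) {
        List<Book> books = new ArrayList<>();
        String currentCountry = null;
        for (String[] r : rawRows) {
            if (!blank(r[0]) && blank(r[1])) {
                // Country row: remember it for the book rows that follow
                currentCountry = cell(r[0]);
            } else if (blank(r[0]) && !blank(r[1]) && currentCountry != null) {
                // Book row: copy the most recent country onto it
                books.add(new Book(currentCountry, cell(r[1]), cell(r[2]), cell(r[3])));
            }
            // Any other row (e.g. a completely empty one) is ignored
        }
        return books;
    }

    public static void main(String[] args) {
        List<String[]> raw = List.of(
            new String[] {"Aust", "", "", ""},
            new String[] {"", "UX", "test1", "34"},
            new String[] {"US", "", "", ""},
            new String[] {"", "TG", "name2", "123"});
        flatten(raw).forEach(System.out::println);
    }
}
```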
Suggestion 1
Check out this link: as AKofC suggests, creating a class to hold your data should be your first port of call. The linked page has a small example of the sort of approach we are proposing.
Suggestion 2, with an example:
The obvious thing to do from the code you have posted would be to create a new class to store your book information in.
Then you simply define which fields from your Excel document you want to pass into each new instance of your book information class.
New Book Information Class:
class MyBookInfo
{
public string CountryName { get; set; }
public string BookCode { get; set; }
public string BookName { get; set; }
}
Method To Retrieve Info:
public List<MyBookInfo> GetMyBookInfoFromExcelDocument()
{
    string filename = @"C:\" + "Book1.xls";
    string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
                              "Data Source=" + filename + ";" +
                              "Extended Properties=Excel 8.0;";
    OleDbDataAdapter dataAdapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
    DataSet myDataSet = new DataSet();
    dataAdapter.Fill(myDataSet, "BookInfo");
    DataTable dataTable = myDataSet.Tables["BookInfo"];
    var rows = from p in dataTable.AsEnumerable()
               // Note: p[0].ToString() is never null, so this condition is always true;
               // adjust it to filter out the title, header, and blank rows
               where p[0].ToString() != null || p[0].ToString() != "" && p.Field<string>("F2") != null
               select new MyBookInfo
               {
                   CountryName = p.Field<string>("InsertFieldNameHere"),
                   BookCode = p.Field<string>("InsertFieldNameHere"),
                   BookName = p.Field<string>("InsertFieldNameHere")
               };
    // Materialize the query so the caller gets a List it can filter by country
    return rows.ToList();
}
From what I understand, I suggest creating a BookData class containing the properties you need, in this case Country, Code, Name, and IBN.
Then, once you've filled your DataSet with the Excel data, create a new List and loop through the DataRows in the DataSet, adding the Excel values to the List.
Then you can use Linq on the List like so:
List<BookData> results = (from books in bookList
                          where books.Country == "US"
                          select books).ToList();
Or something like that. I don't have Visual Studio on me, and Intellisense has spoiled me, so yeah. >__>
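On the "List or Dictionary" question: both work, and the difference is when the filtering happens. As a hedged Java illustration (Map standing in for C#'s Dictionary), groupByCountry builds a country-to-books lookup once, while forCountry filters the flat list on every call, which is what the LINQ query above does. The BookData record mirrors the suggested class but is a hypothetical sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// List-style vs dictionary-style lookup of books by country.
class BooksByCountry {
    record BookData(String country, String code, String name, String ibn) {}

    // Dictionary-style: group once, then each lookup is a map access
    static Map<String, List<BookData>> groupByCountry(List<BookData> books) {
        return books.stream().collect(Collectors.groupingBy(BookData::country));
    }

    // List-style: scan the flat list each time it is queried
    static List<BookData> forCountry(List<BookData> books, String country) {
        return books.stream()
                .filter(b -> b.country().equals(country))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<BookData> books = List.of(
            new BookData("Aust", "UX", "test1", "34"),
            new BookData("US", "UX", "name1", "567"),
            new BookData("US", "TG", "name2", "123"));
        System.out.println(groupByCountry(books).get("US").size());
        System.out.println(forCountry(books, "Aust").size());
    }
}
```

If you look up many countries repeatedly, building the map once is usually the better fit; for a single query, filtering the list is simpler.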