Add a column to a data set in R - logic

I have an existing data frame with these columns. Each home team played against the corresponding away team in the sequence shown. In the Results column, H denotes that the home team won, P denotes that the away team won, and D denotes a draw.
HomeTeam = Liverpool, Brighton, Birmingham, Manchester, Portsmouth
Away Team = Netherland, Austria, Cambodia, Netherlands, Austria
Results = H,H,P,D,H
My new data frame has a column 'TeamName' that lists all the teams playing in the series.
TeamName = Liverpool, Brighton, Birmingham, Manchester, Netherland, Austria, Cambodia, Portsmouth
I need two columns in the new data frame, 'HomeRec' and 'AwayRec', that record the number of matches each of these teams won at home and away.

Related

Can I add a new column with Linear Interpolation in Power Query M?

I am working on extracting an interest rate curve from futures market prices and creating a table (Table 1) inside Power Query with the following columns:
- BusinessDays: Represents the number of business days from today to the expiry of each futures contract
- InterestRate: Represents the rate from today until the expiry of the futures contract
The second table (Table 2) refers to the IDs of internal financial products that expire on different business days.
- InstrumentID: Unique internal ID of a financial product sold by a financial institution
- BusinessDays: Represents the number of business days from today to the expiry of each financial product
I am having some trouble with the M language, and unfortunately this specific calculation must be executed in Excel, so I am restricted to Power Query M.
The specific step I am not able to do is:
Creating a function in Power Query that adds a new column to Table 2 containing the interpolated interest rate of each financial product.
The end result I am looking for would look like this:
There are several ways to approach this, but one way or another you'll need to do some kind of lookup to determine which bracket your BusinessDays value falls into, so you can calculate the interpolated value.
I think it's simpler to just generate an all-inclusive list of days vs. interest rates, and then do a Join to pull out the matches.
I named this first query intRates and expanded the interest rate table:
let
    //Get the interest rate/business day table
    Source = Excel.CurrentWorkbook(){[Name="intRates"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"BusinessDays", Int64.Type}, {"InterestRate", Percentage.Type}}),

    //Add two columns which are the interest rate and business day columns offset by one
    //It is faster to subtract this way than by adding an Index column
    offset =
        Table.FromColumns(
            Table.ToColumns(#"Changed Type")
                & {List.RemoveFirstN(#"Changed Type"[BusinessDays]) & {null}}
                & {List.RemoveFirstN(#"Changed Type"[InterestRate]) & {null}},
            type table[BusinessDays=Int64.Type, InterestRate=Percentage.Type, shifted BusDays=Int64.Type, shifted IntRate=Percentage.Type]),

    //Add a column with a list of the interest rates for each day, interpolated between the segments
    #"Added Custom" = Table.AddColumn(offset, "IntList", each
        let
            sbd = [shifted BusDays],
            intRateIncrement = ([shifted IntRate] - [InterestRate]) / ([shifted BusDays] - [BusinessDays]),
            Lists = List.Generate(
                () => [d = [BusinessDays], i = [InterestRate]],
                each [d] < sbd,
                each [d = [d] + 1, i = [i] + intRateIncrement],
                each [i])
        in
            Lists),

    //Add another column with a list of days corresponding to the interest rates
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "dayList", each {[BusinessDays]..[shifted BusDays] - 1}),

    //Remove the last row as it will have an error
    remErrRow = Table.RemoveLastN(#"Added Custom1", 1),

    //Create the new table which has the rates for every duration
    intRateTable = Table.FromColumns(
        {List.Combine(remErrRow[dayList]), List.Combine(remErrRow[IntList])},
        type table[Days=Int64.Type, Interest=Percentage.Type])
in
    intRateTable
This results in a table that has every day (from 39 through the last bracketed day), each with its corresponding interest rate.
Then read in the "Instruments" table and join it with intRates, using JoinKind.LeftOuter:
let
    Source = Excel.CurrentWorkbook(){[Name="Instruments"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"InstrumentID", type text}, {"BusinessDays", Int64.Type}}),

    //add the rate column
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"BusinessDays"}, intRates, {"Days"}, "intRates", JoinKind.LeftOuter),
    #"Expanded intRates" = Table.ExpandTableColumn(#"Merged Queries", "intRates", {"Interest"}, {"Interest"})
in
    #"Expanded intRates"
Some of the results in the middle part of the table differ from what you've posted, but they seem to be consistent with the linear interpolation formula between two values, so I'm not sure how the discrepancy arises.
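For reference, the per-day rates built above follow the standard two-point linear interpolation. For a day d between consecutive table points (d1, r1) and (d2, r2) (my own notation for the bracketing BusinessDays/InterestRate pairs, not column names from the query):

r(d) = r1 + (d - d1) * (r2 - r1) / (d2 - d1)

which is what the intRateIncrement step applies one day at a time.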

Power Query to Convert List of Links to Grid of Crossings

In Excel I have a data table of Paired Items that are tagged with an identifier. Essentially, named linkages.
Worksheet: Links
| Tag        | Point-A | Point-B   |
|------------|---------|-----------|
| Route 1    | Home    | Office    |
| Route 2    | Home    | Grocery 1 |
| Happy Hour | Office  | Bar       |
| Sad Hour   | Office  | Dump      |
| Headaches  | Bar     | Pharmacy  |
| Sick       | Bar     | Dump      |
| Route 3    | Office  | Moms      |
| Route 4    | Office  | Park      |
| Victory    | Park    | Bar       |
| Discard    | Park    | Dump      |
I want to transform this data into a grid of all points in rows and columns, with the tag placed at the intersection (much like the mileage grids on old paper road maps).
Worksheet: Grid
| A \ B     | Bar        | Dump     | Grocery 1 | Home    | Moms    | Office     | Park    | Pharmacy  |
|-----------|------------|----------|-----------|---------|---------|------------|---------|-----------|
| Bar       |            | Sick     |           |         |         | Happy Hour | Victory | Headaches |
| Dump      | Sick       |          |           |         |         | Sad Hour   | Discard |           |
| Grocery 1 |            |          |           | Route 2 |         |            |         |           |
| Home      |            |          | Route 2   |         |         | Route 1    |         |           |
| Moms      |            |          |           |         |         | Route 3    |         |           |
| Office    | Happy Hour | Sad Hour |           | Route 1 | Route 3 |            | Route 4 |           |
| Park      | Victory    | Discard  |           |         |         | Route 4    |         |           |
| Pharmacy  | Headaches  |          |           |         |         |            |         |           |
I have written the following M code for the transformation, but it seems a bit wayward and overwrought. I am using bit coding of the points to construct a join key, so the bitting process will probably break at around 32 points.
Is there a shorter set of let steps that does the same transform to a grid?
Is there a way to create a key that is Min(Point-A, Point-B) concatenated with a delimiter and Max(Point-A, Point-B), and thus not rely on bitting?
M code (copied from Advanced Editor)
let
    LinksTable = Table.SelectRows(Excel.CurrentWorkbook(), each [Name] = "Links"),
    Links = Table.RemoveColumns(Table.ExpandTableColumn(LinksTable, "Content", {"Tag", "Point-A", "Point-B"}), "Name"),

    AllPoints = Table.Combine(
        { Table.SelectColumns(Table.RenameColumns(Links, {"Point-A", "Point"}), "Point"),
          Table.SelectColumns(Table.RenameColumns(Links, {"Point-B", "Point"}), "Point")
        }),
    ThePoints = Table.Sort(Table.Distinct(AllPoints), {"Point"}),
    PointsIndexed = Table.AddIndexColumn(ThePoints, "Index", 0, 1, Int64.Type),
    PointsBitted = Table.RemoveColumns(Table.AddColumn(PointsIndexed, "Bit", each Number.Power(2, [Index]), Int64.Type), "Index"),

    AllPairsBitted = Table.Join(
        Table.RenameColumns(PointsBitted, {{"Point", "Point-A"}, {"Bit", "Bit-A"}}), {},
        Table.RenameColumns(PointsBitted, {{"Point", "Point-B"}, {"Bit", "Bit-B"}}), {},
        JoinKind.FullOuter
    ),
    AllPairsKeyed = Table.RemoveColumns(
        Table.AddColumn(AllPairsBitted, "BitKeyPair", each Number.BitwiseOr([#"Bit-A"], [#"Bit-B"])),
        {"Bit-A", "Bit-B"}
    ),

    #"Links-A-Bitted" = Table.Join(
        Links, "Point-A",
        Table.RenameColumns(PointsBitted, {{"Point", "Point-A"}, {"Bit", "Bit-A"}}), "Point-A"
    ),
    #"Links-AB-Bitted" = Table.Join(
        #"Links-A-Bitted", "Point-B",
        Table.RenameColumns(PointsBitted, {{"Point", "Point-B"}, {"Bit", "Bit-B"}}), "Point-B"
    ),
    LinksKeyed = Table.RemoveColumns(
        Table.AddColumn(#"Links-AB-Bitted", "BitKeyLink", each Number.BitwiseOr([#"Bit-A"], [#"Bit-B"])),
        {"Bit-A", "Bit-B"}
    ),

    AllPairsTagged = Table.Sort(
        Table.RemoveColumns(
            Table.Join(
                AllPairsKeyed, "BitKeyPair",
                Table.SelectColumns(LinksKeyed, {"BitKeyLink", "Tag"}), "BitKeyLink",
                JoinKind.LeftOuter
            ),
            {"BitKeyPair", "BitKeyLink"}
        ),
        {"Point-A", "Point-B"}
    ),
    Grid = Table.Pivot(AllPairsTagged, List.Distinct(AllPairsTagged[#"Point-B"]), "Point-B", "Tag", List.First)
in
    Grid
I think you can use Pivot to achieve this. Using that functionality directly would not work, because you are looking for symmetry of columns and rows.
The trick is to force that symmetry by appending the Point-B values onto the Point-A values.
Steps
Create a secondary table and reorder the columns in the opposite order from the original table, so Tag, Point-B and Point-A.
On the secondary table, rename the columns to Tag, Point-A and Point-B, in that order. Append matches columns by name, so without the rename the values would simply land back in their original columns.
Pivot on column Point-B without aggregating data.
Reorder the columns using Point-A as a reference, so you get symmetry of columns and rows.
It's worth mentioning that it's good practice to Buffer the source table, because it is used multiple times across the calculation.
Calculation
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Zc69CsMgEMDxVwnOLv14ghKoS2looEvIcJwXIkEMZxx8+6axiUInwd/99bpOvFxYqDoJKZSztB7PYTBIope7nbPd2SFxXMe/rGCeY6Vc4JxJcQPetAX9Z3Wwc0oJNOBI/hdI0YzAFjCm1uB0yBGldS7lgw9nfWHX0hrgabO3wcVx3K/yirXxCKwzpK/6Dw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Tag = _t, #"Point-A" = _t, #"Point-B" = _t]),
    BufferedSource = Table.Buffer(Source),
    SecondTable = Table.ReorderColumns(BufferedSource, {"Tag", "Point-B", "Point-A"}),
    SecondTableRenameCols = Table.RenameColumns(SecondTable, {{"Point-A", "Point-B"}, {"Point-B", "Point-A"}}),
    AppendTables = Table.Combine({BufferedSource, SecondTableRenameCols}),
    PivotTables = Table.Pivot(AppendTables, List.Distinct(AppendTables[#"Point-B"]), "Point-B", "Tag"),
    ReorderCols = Table.ReorderColumns(PivotTables, PivotTables[#"Point-A"])
in
    ReorderCols
Output
| Point-A   | Bar        | Dump     | Grocery 1 | Home    | Moms    | Office     | Park    | Pharmacy  |
|-----------|------------|----------|-----------|---------|---------|------------|---------|-----------|
| Bar       |            | Sick     |           |         |         | Happy Hour | Victory | Headaches |
| Dump      | Sick       |          |           |         |         | Sad Hour   | Discard |           |
| Grocery 1 |            |          |           | Route 2 |         |            |         |           |
| Home      |            |          | Route 2   |         |         | Route 1    |         |           |
| Moms      |            |          |           |         |         | Route 3    |         |           |
| Office    | Happy Hour | Sad Hour |           | Route 1 | Route 3 |            | Route 4 |           |
| Park      | Victory    | Discard  |           |         |         | Route 4    |         |           |
| Pharmacy  | Headaches  |          |           |         |         |            |         |           |

Referencing from table with mixed cells of different categories

I'm trying to build a Google Sheets spreadsheet for comparing and analyzing logistics costs.
I have the following:
A sheet with a database of numbers, organized like this:
A second sheet with a table in which, using the MIN function, I get the price of the cheapest provider for each model, depending on quantity and destination.
And last, in another sheet, I have what I call "the interface". Using an INDEX MATCH MATCH formula, I let the user choose the destination and quantity for each one of the models available, and it returns the cheapest price. (I can't post more images, so basically it has this structure):
MODEL A
DESTINATION: DESTINATION 2
NUM. OBJ: 2
PRICE: 59
PROVIDER:
My problem is that I can't figure out how to make it return the name of the provider with the cheapest price, as I'm referencing the second table, in which the same row or column contains prices that belong to different providers.
Using MIN is undesirable in this context, because it doesn't tell you where the minimal value was found, and you need this information.
Here is a formula that returns the minimal cost together with the provider. In my example, the data is in the range A1:E7, as below; destination is in G1 and model is in G2.
=iferror(array_constrain(sort({filter(A1:A7, B1:B7=G2), filter(filter(C1:E7, B1:B7=G2), C1:E1=G1)}, 2, True), 1, 2), "Not found")
The same with linebreaks for readability:
=iferror(
  array_constrain(
    sort(
      {
        filter(A1:A7, B1:B7 = G2),
        filter(filter(C1:E7, B1:B7 = G2), C1:E1 = G1)
      },
      2, True),
    1, 2),
  "Not found")
Explanation:
filtering by B1:B7 = G2 means keeping only the rows with the desired model
filtering by C1:E1 = G1 means keeping only the column with the desired destination
{ , } means putting the two parts of the filtered table side by side: column A, and the column for the desired destination
sort by the 2nd column (price), in ascending order (True)
array_constrain keeps only the first row of this sort, that is, the one with the lowest price
iferror is there in case there is no such destination or model in the table; then the formula returns "Not found"
Example: with G1 = Destination 1 and G2 = A, the formula returns
Provider 2 2

Complicated Cube Query

I'm working on a fairly complicated view, which calculates the total cost of a guest's stay based on data pulled from four different tables. The output, however, is not exactly what I want. My code is:
CREATE OR REPLACE VIEW Price AS
SELECT UNIQUE
    Booking.Booking_ID AS "Booking",
    Booking.GuestID AS "Guest ID",
    Room.Room_Price * (Booking.CheckOutDate - Booking.CheckInDate) AS "Room Price",
    Add_Ons.Price AS "Add ons Price",
    Room.Room_Price * (Booking.CheckOutDate - Booking.CheckInDate) + (Add_Ons.Price) AS "Total Price"
FROM Booking
    JOIN Room ON Room.Room_Num = Booking.Room_Num
    JOIN Booking_Add_Ons ON Booking.Booking_ID = Booking_Add_Ons.Booking_ID
    JOIN Add_Ons ON Booking_Add_Ons.Add_On_ID = Add_Ons.Add_On_ID
ORDER BY Booking.Booking_ID;
Now, I'm trying to get this to return the total cost of all add-ons plus the cost of the hotel room as the total price; however, it is returning the cost of the room plus each of the add-ons on separate lines, as follows:
My question is: is it possible to use something like CUBE or SUM to add up the rows, so that there is only one entry for each booking, with the total price of all add-ons accounted for?

How do I break up high-CPU requests on Google App Engine?

To give an example of the kind of request that I can't figure out what else to do about:
The application is a bowling score/stat tracker. When someone enters their scores in advanced mode, a number of stats are calculated, as well as their score. The data is modeled as:
Game - members like name, user, reference to the bowling alley, score
Frame - pinfalls for each ball, boolean lists for which pins were knocked down on each ball, information about the path of the ball (stance, target, where it actually went), the score as of that frame, etc
GameStats - stores calculated statistics for the entire game, to be merged with other game stats as needed for statistics display across groups of games.
An example of this information in practice can be found here.
When a game is complete, and a frame is updated, I have to update the game, the frame, every frame after it and possibly some before it (to make sure their scores are correct), and the stats. This operation always flags the CPU monitor. Even if the game isn't complete, and statistics don't need to be calculated, the scores and such need to be updated to show the real-time progress to the user, and so these also get flagged. The average CPU time for this handler is over 7000 mcycles, and it doesn't even display a view. Most people bowl 3 to 4 games per series - if they are entering their scores realtime, at the lanes, that's about 1 request every 2 to 4 minutes, but if they write it all down and enter it later, there are 30-40 of these requests being made in a row.
As requested, the data model for the important classes:
from google.appengine.ext import db


class Stats(db.Model):
    version = db.IntegerProperty(default=1)
    first_balls = db.IntegerProperty(default=0)
    pocket_tracked = db.IntegerProperty(default=0)
    pocket = db.IntegerProperty(default=0)
    strike = db.IntegerProperty(default=0)
    carry = db.IntegerProperty(default=0)
    double = db.IntegerProperty(default=0)
    double_tries = db.IntegerProperty(default=0)
    target_hit = db.IntegerProperty(default=0)
    target_missed_left = db.IntegerProperty(default=0)
    target_missed_right = db.IntegerProperty(default=0)
    target_missed = db.FloatProperty(default=0.0)
    first_count = db.IntegerProperty(default=0)
    first_count_miss = db.IntegerProperty(default=0)
    second_balls = db.IntegerProperty(default=0)
    spare = db.IntegerProperty(default=0)
    single = db.IntegerProperty(default=0)
    single_made = db.IntegerProperty(default=0)
    multi = db.IntegerProperty(default=0)
    multi_made = db.IntegerProperty(default=0)
    split = db.IntegerProperty(default=0)
    split_made = db.IntegerProperty(default=0)


class Game(db.Model):
    version = db.IntegerProperty(default=3)
    user = db.UserProperty(required=True)
    series = db.ReferenceProperty(Series)
    score = db.IntegerProperty()
    game_number = db.IntegerProperty()
    pair = db.StringProperty()
    notes = db.TextProperty()
    simple_entry_mode = db.BooleanProperty(default=False)
    stats = db.ReferenceProperty(Stats)
    complete = db.BooleanProperty(default=False)


class Frame(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty()
    game = db.ReferenceProperty(Game, required=True)
    frame_number = db.IntegerProperty(required=True)
    first_count = db.IntegerProperty(required=True)
    second_count = db.IntegerProperty()
    total_count = db.IntegerProperty()
    score = db.IntegerProperty()
    ball = db.ReferenceProperty(Ball)
    stance = db.FloatProperty()
    target = db.FloatProperty()
    actual = db.FloatProperty()
    slide = db.FloatProperty()
    breakpoint = db.FloatProperty()
    pocket = db.BooleanProperty()
    pocket_type = db.StringProperty()
    notes = db.TextProperty()
    first_pinfall = db.ListProperty(bool)
    second_pinfall = db.ListProperty(bool)
    split = db.BooleanProperty(default=False)
A few suggestions:
You could store the per-frame data as part of the same entity as the game, rather than having a separate entity for each frame, by storing it as a list of bitfields (packed into integers) for the pins standing at the end of each half-frame, for example (see the sketch after the next suggestion). Let me know if you want more details on how this would be implemented.
Failing that, you can calculate some of the more interrelated stats on fetch. For example, calculating the score-so-far ought to be simple if you have the whole game loaded at once, which means you can avoid having to update multiple frames on every request.
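A minimal sketch of the bitfield idea, assuming ten pins per half-frame and the same google.appengine.ext.db API as the models above; pack_pins, unpack_pins and the pin_states property are hypothetical names, not part of the existing model:

from google.appengine.ext import db

def pack_pins(standing):
    """Pack a list of 10 booleans (True = pin still standing) into one integer."""
    bits = 0
    for i, up in enumerate(standing):
        if up:
            bits |= 1 << i
    return bits

def unpack_pins(bits, pin_count=10):
    """Unpack an integer bitfield back into a list of booleans."""
    return [bool(bits & (1 << i)) for i in range(pin_count)]

class Game(db.Model):
    # ...existing Game properties from the question, plus:
    # one packed integer per half-frame, in bowling order, so the whole
    # game's pinfall lives on the Game entity instead of on separate Frame entities
    pin_states = db.ListProperty(int)

Packed this way, a whole game's pinfall fits in a single list property on the Game entity, so updating a score touches one entity rather than one per frame.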
We can be of more help if you show us your data model. :)
