Including for loop in subsets - algorithm

I want to calculate the weekly averages of the the load for two ships with containers. One ship sails at Sunday and the other on Wednesday. I have a big excel file with bookings.I will load up a small part of this file in the following link: https://docs.google.com/spreadsheets/d/1BxHTClTkrQzIzZzG5vXXnvKtV0_az83PGJ2ghBaAQr0/edit?usp=sharing
The first ship gets the containers that should be delivered on Monday (Mo), Tuesday(Di) and Wednesday (Mi). The second ship should deliver the containers demanded in the other port for Thursday (Do), Friday (Fr), Saturday(Sa) and Sunday(So). The data contains information about the containers from 2017-01-01 till 2018-07-31. These are 82 full weeks. I would like to make a vector with length 82, with each number the amount of containers of the days combined for that week. For example, the first number of the vector should be the demand of containers for Monday, Tuesday and Wednesday in the first week. So, I want to create a vector, one per ship, that contains information about the amount of containers that should be loaded on this ship. A vector of 82 weeks, to see which weeks we had low demand and the mean etc.
Can anyone please help me?
Here is the beginning of my code:
containers <- "https://docs.google.com/spreadsheets/d/1BxHTClTkrQzIzZzG5vXXnvKtV0_az83PGJ2ghBaAQr0/edit?usp=sharing"
#Containers between Rotterdam and Duisburg
containersRTMDUI <- subset(containers, containers$Laadhaven == "Rotterdam" & containers$Loshaven == "Duisburg")
#I used to do this in subsets, because I could not make a loop
Week1 <- subset(containersRTMDUI, containersRTMDUI$Datum1 >= "2017-01-02" &
containersRTMDUI$Datum1 < "2017-01-09" & containersRTMDUI$Dag1 = "Mo" &
containersRTMDUI$Dag1 = "Di" &containersRTMDUI$Dag1 = "Mi")
Week2 <- subset(etc..)
Of course, the hard point comes by the fact that for some days there is no demand.

I think I got it. One approach with data.table:
# read in data as a data.table
library(data.table)
dt <- data.table(read.csv("path/to/file", stringsAsFactors = F))
# rename variables to english (
# there are shorter ways to do this, but I like to keep track)
setnames(dt, old = "ISO", new = "containter_type")
setnames(dt, old = "F.E", new = "full_empty")
setnames(dt, old = "Gewicht", new = "weight")
setnames(dt, old = "Laadhaven", new = "pickup_port")
setnames(dt, old = "Laadterminal", new = "pickup_terminal")
setnames(dt, old = "Loshaven", new = "dropoff_port")
setnames(dt, old = "Losterminal", new = "dropoff_terminal")
setnames(dt, old = "Datum1", new = "pickup_date")
setnames(dt, old = "Dag1", new = "pickup_dow")
setnames(dt, old = "Datum2", new = "dropoff_date")
setnames(dt, old = "Dag2", new = "dropoff_dow")
# convert date variable to date-type (instead of factor/string)
dt[ , pickup_date := as.Date(pickup_date, "%d.%m.%Y")]
dt[ , dropoff_date := as.Date(dropoff_date, "%d.%m.%Y")]
# create a week variable
dt[ , week := lubridate::week(pickup_date)]
# create a variable (MTW) by day-of-week
# MTW=1 for mon, tues, wed; MTW=0 for thurs, fri, sat, sun
dt[ , MTW := pickup_dow %in% c("Mo", "Di", "Mi")]
# count the number of rows by week and MTW
result <- dt[ , .(nrows = .N), by=.(week, MTW)]
# print result
result
# fill in 0 weeks
dt2 <- data.table(week = rep(1:7, each=2), MTW = rep(c(T,F), each=7))
result <- merge(result, dt2, by=c("week", "MTW"), all=T)
result[is.na(nrows), nrows := 0]
# print updated result
result

Related

Quantlib Zero Coupon Inflation Swap Ignores CPI Values

I am trying to price a zero-coupon USD CPI inflation swap in Quantlib and Python. My discount curve and NPV of the fixed leg looks good, but I'm a few percentage points out compared to BBG SWPM on the NPV of the inflation leg.
One thing I've noticed is that changing the values of the CPI has no effect on the price of the swap. So I think I'm setting the CPI wrong and as such the base index of the swap is wrong. Can anyone see what I'm doing wrong here?
For reference, I've backed this out from looking at the C++ unit tests. If there's a complete Python example I would be interested to see it, but couldn't find one on my own.
import QuantLib as quantlib
import pandas as pd
start_date = quantlib.Date.from_date(pd.Timestamp(2022, 9, 6))
calc_date = quantlib.Date.from_date(pd.Timestamp(2022, 9, 6))
end_date = quantlib.Date.from_date(pd.Timestamp(2024, 9, 6))
swap_type = quantlib.ZeroCouponInflationSwap.Receiver
calendar = quantlib.TARGET()
day_count_convention = quantlib.ActualActual()
contract_observation_lag = quantlib.Period(3, quantlib.Months)
business_day_convention = quantlib.ModifiedFollowing
nominal = 10e6
fixed_rate = 0.05
cpi_json = '{"columns":[1],"index":[1640908800000,1643587200000,1646006400000,1648684800000,1651276800000,1653955200000,1656547200000,1659225600000],"data":[[277.948],[278.802],[283.716],[287.504],[289.109],[292.296],[296.311],[296.276]]}'
cpi_prints = pd.read_json(cpi_json, orient='split')
# Pretty-printed CPI:
# 1
# 2021-12-31 277.948
# 2022-01-31 278.802
# 2022-02-28 283.716
# 2022-03-31 287.504
# 2022-04-30 289.109
# 2022-05-31 292.296
# 2022-06-30 296.311
# 2022-07-31 296.276
zero_coupon_observations = pd.DataFrame(index=[0],
data={'1Y': 2.73620,
'2Y': 2.975,
'3Y': 2.967,
'4Y': 2.917,
'5Y': 2.8484})
inflation_yield_term_structure = quantlib.RelinkableZeroInflationTermStructureHandle()
inflation_index = quantlib.USCPI(True, inflation_yield_term_structure)
for date, value in cpi_prints.itertuples():
# Setting the CPI as fixings, but no matter what I put here the NPV comes out the same
# Looks like the base index for the swap is not being set by me/set through the CPI prints
# I put here.
inflation_index.addFixing(quantlib.Date.from_date(date), value)
inflation_rate_helpers = []
nominal_term_structure = quantlib.YieldTermStructureHandle(quantlib.FlatForward(calc_date,
0.00, # Changing this seems to have no effect
quantlib.ActualActual()))
for tenor in zero_coupon_observations.columns:
maturity = calendar.advance(calc_date, quantlib.Period(tenor))
quote = quantlib.QuoteHandle(quantlib.SimpleQuote(zero_coupon_observations.at[0, tenor] / 100.0))
helper = quantlib.ZeroCouponInflationSwapHelper(quote,
contract_observation_lag,
maturity,
calendar,
business_day_convention,
day_count_convention,
inflation_index,
nominal_term_structure)
inflation_rate_helpers.append(helper)
# Not sure how to choose this number, just taking the 1Y tenor on the calc date?
# I'm pricing a 2Y swap, and will want to price it off it's start date as well
base_zero_rate = zero_coupon_observations.at[0, '1Y']/100
inflation_curve = quantlib.PiecewiseZeroInflation(calc_date,
calendar,
day_count_convention,
contract_observation_lag,
quantlib.Monthly,
inflation_index.interpolated(),
base_zero_rate,
inflation_rate_helpers,
1.0e-12,
quantlib.Linear())
inflation_yield_term_structure.linkTo(inflation_curve)
swap = quantlib.ZeroCouponInflationSwap(swap_type,
nominal,
start_date,
end_date,
calendar,
business_day_convention,
day_count_convention,
fixed_rate,
inflation_index,
contract_observation_lag)
# Leaving off the construction of the discount curve for brevity.
# NPV of the fixed legs checks out
discount_curve = ...
swap_engine = quantlib.DiscountingSwapEngine(discount_curve)
swap.setPricingEngine(swap_engine)
print(swap.NPV())

Technical Analyis (MACD) for crpto trading

Background:
I have writing a crypto trading bot for fun and profit.
So far, it connects to an exchange and gets streaming price data.
I am using this price to create a technical indicator (MACD).
Generally for MACD, it is recommended to use closing prices for 26, 12 and 9 days.
However, for my trading strategy, I plan to use data for 26, 12 and 9 minutes.
Question:
I am getting multiple (say 10) price ticks in a minute.
Do I simply average them and round the time to the next minute (so they all fall in the same minute bucket)? Or is there is better way to handle this.
Many Thanks!
This is how I handled it. Streaming data comes in < 1s period. Code checks for new low and high during streaming period and builds the candle. Probably ugly since I'm not a trained developer, but it works.
Adjust "...round('20s')" and "if dur > 15:" for whatever candle period you want.
def on_message(self, msg):
df = pd.json_normalize(msg, record_prefix=msg['type'])
df['date'] = df['time']
df['price'] = df['price'].astype(float)
df['low'] = df['low'].astype(float)
for i in range(0, len(self.df)):
if i == (len(self.df) - 1):
self.rounded_time = self.df['date'][i]
self.rounded_time = pd.to_datetime(self.rounded_time).round('20s')
self.lhigh = self.df['price'][i]
self.lhighcandle = self.candle['high'][i]
self.llow = self.df['price'][i]
self.lowcandle = self.candle['low'][i]
self.close = self.df['price'][i]
if self.lhigh > self.lhighcandle:
nhigh = self.lhigh
else:
nhigh = self.lhighcandle
if self.llow < self.lowcandle:
nlow = self.llow
else:
nlow = self.lowcandle
newdata = pd.DataFrame.from_dict({
'date': self.df['date'],
'tkr': tkr,
'open': self.df.price.iloc[0],
'high': nhigh,
'low': nlow,
'close': self.close,
'vol': self.df['last_size']})
self.candle = self.candle.append(newdata, ignore_index=True).fillna(0)
if ctime > self.rounded_time:
closeit = True
self.en = time.time()
if closeit:
dur = (self.en - self.st)
if dur > 15:
self.st = time.time()
out = self.candle[-1:]
out.to_sql(tkr, cnx, if_exists='append')
dat = ['tkr', 0, 0, 100000, 0, 0]
self.candle = pd.DataFrame([dat], columns=['tkr', 'open', 'high', 'low', 'close', 'vol'])
As far as I know, most or all technical indicator formulas rely on same-sized bars to produce accurate and meaningful results. You'll have to do some data transformation. Here's an example of an aggregation technique that uses quantization to get all your bars into uniform sizes. It will convert small bar sizes to larger bar sizes; e.g. second to minute bars.
// C#, see link above for more info
quoteHistory
.OrderBy(x => x.Date)
.GroupBy(x => x.Date.RoundDown(newPeriod))
.Select(x => new Quote
{
Date = x.Key,
Open = x.First().Open,
High = x.Max(t => t.High),
Low = x.Min(t => t.Low),
Close = x.Last().Close,
Volume = x.Sum(t => t.Volume)
});
See Stock.Indicators for .NET for indicators and related tools.

Time Delta problem in Hackerrank not taking good answer / Python 3

The hackerrank challenge is in the following url: https://www.hackerrank.com/challenges/python-time-delta/problem
I got testcase 0 correct, but the website is saying that I have wrong answers for testcase 1 and 2, but in my pycharm, I copied the website expected output and compared with my output and they were exactly the same.
Please have a look at my code.
#!/bin/pyth
# Complete the time_delta function below.
from datetime import datetime
def time_delta(tmp1, tmp2):
dicto = {'Jan':1, 'Feb':2, 'Mar':3,
'Apr':4, 'May':5, 'Jun':6,
'Jul':7, 'Aug':8, 'Sep':9,
'Oct':10, 'Nov':11, 'Dec':12}
# extracting t1 from first timestamp without -xxxx
t1 = datetime(int(tmp1[2]), dicto[tmp1[1]], int(tmp1[0]), int(tmp1[3][:2]),int(tmp1[3][3:5]), int(tmp1[3][6:]))
# extracting t1 from second timestamp without -xxxx
t2 = datetime(int(tmp2[2]), dicto[tmp2[1]], int(tmp2[0]), int(tmp2[3][:2]), int(tmp2[3][3:5]), int(tmp2[3][6:]))
# converting -xxxx of timestamp 1
t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
# converting -xxxx of timestamp 2
t2_utc = int(tmp2[4][:3])*3600 + int(tmp2[4][3:])*60
# absolute difference
return abs(int((t1-t2).total_seconds()-(t1_utc-t2_utc)))
if __name__ == '__main__':
# fptr = open(os.environ['OUTPUT_PATH'], 'w')
t = int(input())
for t_itr in range(t):
tmp1 = list(input().split(' '))[1:]
tmp2 = list(input().split(' '))[1:]
delta = time_delta(tmp1, tmp2)
print(delta)
t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
For a time zone like +0715, you correctly add “7 hours of seconds” and “15 minutes of seconds”
For a timezone like -0715, you are adding “-7 hours of seconds” and “+15 minutes of seconds”, resulting in -6h45m, instead of -7h15m.
You need to either use the same “sign” for both parts, or apply the sign afterwards.

How to obtain first date of each month based on given year range in Nifi flow

I would like to know what could be the best way to obtain the starting date values for each month based on the date range.
For example: If I am given a year range of 2015-11-10 and 2018-01-15(format YYYY-mm-dd). Then I would like to extract following dates:
2015-12-01
2016-01-01
.
.
2018-01-01
You can try to use this flow for generating the first day of each month in the provided date range.
Overall flow
Step 1 Configuration: Start
Step 2 Configuration: Configure Date Range
Provide the start and end dates as configuration parameters via this step.
Step 3 Configuration: Generate First Dates For Months
This uses a Groovy script, which is provided below
Groovy script
flowFile = session.get();
if(!flowFile)
return;
DATE_FORMAT = 'yyyy-MM-dd';
startDate = Date.parse(DATE_FORMAT, flowFile.getAttribute("start_date"));
endDate = Date.parse(DATE_FORMAT, flowFile.getAttribute("end_date"));
allFirstDates = "";
Calendar calendar = Calendar.getInstance();
Set firstDaysOfMonths = new LinkedHashSet();
for (int i = 0; i <= endDate-startDate; i++) {
calendar.setTime(startDate.plus(i));
calendar.set(Calendar.DAY_OF_MONTH, 1);
firstDayOfMonth = calendar.getTime();
if (firstDayOfMonth.compareTo(startDate) >= 0) {
firstDaysOfMonths.add(calendar.getTime().format(DATE_FORMAT));
}
}
firstDaysOfMonths.each {
firstDayOfMonth -> allFirstDates = allFirstDates + firstDayOfMonth + "\n";
}
flowFile = session.putAttribute(flowFile,"all_first_dates", allFirstDates );
session.transfer(flowFile,REL_SUCCESS)
Step 4 Configuration: View Result
Output of run:
When the flow is run, the attribute all_first_dates will be populated with the first dates of each month in the date range.

VBA written in Excel for Windows not working on Mac

I have a set of macros to hide and unhide columns based on the contents of a specific row. They were all written in Excel 2013 for Windows (running in parallels on my MBA, if that's relevant) and work fine there. But when I open the worksheet in Excel 2011 for Mac, the macros give odd results. The "unhide all columns" macro works fine; the other functions get as far as hiding all columns but not as far as unhiding the ones I want to see.
I can only assume Excel for Mac is having a problem with what's in the FOR EACH loop, but I can't figure out what! I'd appreciate any guidance: I need to get this system working on both Windows and Mac.
Code below.
This function works:
Sub GANTT_Filter_Show_All()
Dim rngDates As Range
Set rngDates = Range("GANTT_Dates")
rngDates.EntireColumn.Hidden = False
End Sub
But this one only hides all the columns:
Sub GANTT_Filter_This_Quarter()
Dim intCurrentMonth As Integer, intCurrentYear As Integer, rngDates As Range, cell As Range
Dim intCurrentQuarterMonths(3) As Integer
Set rngDates = Range("GANTT_Dates")
intCurrentMonth = DatePart("m", Date)
intCurrentYear = DatePart("yyyy", Date)
'loading months of current quarter into an array intCurrentMonth
Select Case intCurrentMonth
Case 1 To 3
intCurrentQuarterMonths(0) = 1
intCurrentQuarterMonths(1) = 2
intCurrentQuarterMonths(2) = 3
Case 4 To 6
intCurrentQuarterMonths(0) = 4
intCurrentQuarterMonths(1) = 5
intCurrentQuarterMonths(2) = 6
Case 7 To 9
intCurrentQuarterMonths(0) = 7
intCurrentQuarterMonths(1) = 8
intCurrentQuarterMonths(2) = 9
Case 10 To 12
intCurrentQuarterMonths(0) = 10
intCurrentQuarterMonths(1) = 11
intCurrentQuarterMonths(2) = 12
End Select
'hiding all columns
rngDates.EntireColumn.Hidden = True
'comparing each column to array of months in current quarter and hiding if false
For Each cell In rngDates
For Each v In intCurrentQuarterMonths
If v = DatePart("m", cell.Value) And DatePart("yyyy", cell.Value) = intCurrentYear Then cell.EntireColumn.Hidden = False
Next v
Next cell
Application.Goto Reference:=Range("a1"), Scroll:=True
End Sub
I'm with #Steven on this one, nothing obviously wrong with the code. I'm not a Mac user, but it's entirely possible that there's some weirdness around the date functions, particularly those that require formatting to resolve.
I would try replacing the calls to DatePart() with calls to Month() and Year() in situations like this - even for non-Mac users. It doesn't rely on parsing the strings for formatting, so it's much more efficient (and easy to read):
Sub Benchmarks()
Dim starting As Double, test As Date, i As Long
test = Now
starting = Timer
For i = 1 To 1000000
Year test
Next i
Debug.Print "Elapsed: " & (Timer - starting)
starting = Timer
For i = 1 To 1000000
DatePart "yyyy", test
Next i
Debug.Print "Elapsed: " & (Timer - starting)
End Sub
Since you likely can't run the benchmark...
Elapsed for Year(): 0.109375
Elapsed for DatePart(): 0.515625
Also note that in addition to this, the dates in the column you're searching are coming through as Variants, it may help to explicitly cast them to dates:
If v = Month(CDate(cell.Value)) And intCurrentYear = Year(CDate(cell.Value)) Then
cell.EntireColumn.Hidden = False
End If

Resources