QuantLib Zero-Coupon Inflation Swap Ignores CPI Values - quantitative-finance

I am trying to price a zero-coupon USD CPI inflation swap in QuantLib and Python. My discount curve and the NPV of the fixed leg look good, but I'm a few percentage points off compared to BBG SWPM on the NPV of the inflation leg.
One thing I've noticed is that changing the CPI values has no effect on the price of the swap, so I think I'm setting the CPI wrong and, as a result, the base index of the swap is wrong. Can anyone see what I'm doing wrong here?
For reference, I've backed this out from looking at the C++ unit tests. If there's a complete Python example I'd be interested to see it, but I couldn't find one on my own.
import QuantLib as quantlib
import pandas as pd
start_date = quantlib.Date.from_date(pd.Timestamp(2022, 9, 6))
calc_date = quantlib.Date.from_date(pd.Timestamp(2022, 9, 6))
end_date = quantlib.Date.from_date(pd.Timestamp(2024, 9, 6))
swap_type = quantlib.ZeroCouponInflationSwap.Receiver
calendar = quantlib.TARGET()
day_count_convention = quantlib.ActualActual()
contract_observation_lag = quantlib.Period(3, quantlib.Months)
business_day_convention = quantlib.ModifiedFollowing
nominal = 10e6
fixed_rate = 0.05
cpi_json = '{"columns":[1],"index":[1640908800000,1643587200000,1646006400000,1648684800000,1651276800000,1653955200000,1656547200000,1659225600000],"data":[[277.948],[278.802],[283.716],[287.504],[289.109],[292.296],[296.311],[296.276]]}'
cpi_prints = pd.read_json(cpi_json, orient='split')
# Pretty-printed CPI:
# 1
# 2021-12-31 277.948
# 2022-01-31 278.802
# 2022-02-28 283.716
# 2022-03-31 287.504
# 2022-04-30 289.109
# 2022-05-31 292.296
# 2022-06-30 296.311
# 2022-07-31 296.276
zero_coupon_observations = pd.DataFrame(index=[0],
data={'1Y': 2.73620,
'2Y': 2.975,
'3Y': 2.967,
'4Y': 2.917,
'5Y': 2.8484})
inflation_yield_term_structure = quantlib.RelinkableZeroInflationTermStructureHandle()
inflation_index = quantlib.USCPI(True, inflation_yield_term_structure)
for date, value in cpi_prints.itertuples():
    # Setting the CPI as fixings, but no matter what I put here the NPV comes out the same.
    # It looks like the base index for the swap is not being set through the CPI prints I add here.
    inflation_index.addFixing(quantlib.Date.from_date(date), value)
inflation_rate_helpers = []
nominal_term_structure = quantlib.YieldTermStructureHandle(quantlib.FlatForward(calc_date,
0.00, # Changing this seems to have no effect
quantlib.ActualActual()))
for tenor in zero_coupon_observations.columns:
    maturity = calendar.advance(calc_date, quantlib.Period(tenor))
    quote = quantlib.QuoteHandle(quantlib.SimpleQuote(zero_coupon_observations.at[0, tenor] / 100.0))
    helper = quantlib.ZeroCouponInflationSwapHelper(quote,
                                                    contract_observation_lag,
                                                    maturity,
                                                    calendar,
                                                    business_day_convention,
                                                    day_count_convention,
                                                    inflation_index,
                                                    nominal_term_structure)
    inflation_rate_helpers.append(helper)
# Not sure how to choose this number; just taking the 1Y tenor on the calc date?
# I'm pricing a 2Y swap, and will want to price it off its start date as well.
base_zero_rate = zero_coupon_observations.at[0, '1Y'] / 100
inflation_curve = quantlib.PiecewiseZeroInflation(calc_date,
calendar,
day_count_convention,
contract_observation_lag,
quantlib.Monthly,
inflation_index.interpolated(),
base_zero_rate,
inflation_rate_helpers,
1.0e-12,
quantlib.Linear())
inflation_yield_term_structure.linkTo(inflation_curve)
swap = quantlib.ZeroCouponInflationSwap(swap_type,
nominal,
start_date,
end_date,
calendar,
business_day_convention,
day_count_convention,
fixed_rate,
inflation_index,
contract_observation_lag)
# Leaving off the construction of the discount curve for brevity.
# NPV of the fixed legs checks out
discount_curve = ...
swap_engine = quantlib.DiscountingSwapEngine(discount_curve)
swap.setPricingEngine(swap_engine)
print(swap.NPV())
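One quick diagnostic worth adding (a sketch, not part of the original post; base_obs_date is a name I'm introducing) is to query the index at the base observation date and print the two legs separately. If the value printed comes from the inflation curve rather than from the addFixing() calls above, that would explain why changing the CPI prints has no effect:
# Hypothetical check using the objects defined above.
base_obs_date = calendar.advance(start_date, -contract_observation_lag)  # start date shifted back by the observation lag
print("Base observation date:", base_obs_date)
print("Index value at base date:", inflation_index.fixing(base_obs_date))  # should reflect the CPI prints added via addFixing()
print("Fixed leg NPV:", swap.legNPV(0))
print("Inflation leg NPV:", swap.legNPV(1))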

Related

auto_arima() m value, and seasonal decomposition period parameter

I am working on ARIMA modeling. The data has hourly granularity, taken from 1st May 2022 till 8th June 2022. I am trying to forecast the next 30 days, i.e. 720 hours. I am running into trouble and getting confused by the points below; any pointers would be great.
Tried plotting the raw data and found no trend and no seasonality.
a) Checked seasonal_decompose() with a few period values, starting with period=1 (consistent with my understanding that there should be no seasonality).
b) period=12 (chosen at random) - but why does it show some seasonality? Even if I plot without a period, for which the default value is 7, it still shows seasonality - why?
Plotted the forecast with seasonal=False, since in the raw plot I do not see any seasonality or trend, and got the plot below. What should be concluded from it?
Then I thought of capturing this seasonal behaviour by resampling to a daily series and plotting it, and got further confused.
a) period=7 (the default for seasonal_decompose) - again I can see a seasonality of about 4 days, even though the raw plot does not show seasons.
The forecast for this resampled (daily) data is below.
I am extremely clueless now as to what to look at. The more I read, the more confused I get.
Below is the code that I am using.
import numpy as np
import pandas as pd
import pmdarima as pm
import matplotlib.pyplot as plt
df = pd.read_csv('~/Desktop/gru-scl/gru-scl-filtered.csv', index_col="time")
del df["Index"]
df.index = pd.to_datetime(df.index)
model = pm.auto_arima(df.bps, start_p=0, start_q=0,
test='adf', # use adftest to find optimal 'd'
max_p=3, max_q=3, # maximum p and q
m=24, # frequency of series
d=None, # let model determine 'd'
seasonal=False, # No Seasonality
start_P=0,
D=0,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True)
f_steps=720
fc, confint = model.predict(n_periods=f_steps, return_conf_int=True)
fc_index = np.arange(len(df.bps), len(df.bps)+f_steps)
val = 0
for f in fc:
    val = val + f
mean = val / f_steps
print(mean)
# make series for plotting purpose
fc_series = pd.Series(fc, index=fc_index)
lower_series = pd.Series(confint[:, 0], index=fc_index)
upper_series = pd.Series(confint[:, 1], index=fc_index)
# Plot
plt.plot(df.bps, label="Actual values")
plt.plot(fc_series, color='darkgreen', label="Predicted values")
plt.fill_between(fc_index,
lower_series,
upper_series,
color='k', alpha=.15)
plt.legend(loc='upper left', fontsize=8)
plt.title('Forecast vs Actuals')
plt.xlabel("Hours since 1st May 2022")
plt.ylabel("Bps")
plt.show()
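If the working assumption is that the hourly series has a daily cycle, one thing worth trying (a sketch under that assumption, not a definitive recipe) is to decompose with period=24 and, if a clear pattern shows up, let auto_arima fit a seasonal model with m=24 instead of seasonal=False:
from statsmodels.tsa.seasonal import seasonal_decompose
import pmdarima as pm
# Hypothetical check for a daily (24-hour) cycle in the hourly data from the question.
decomp = seasonal_decompose(df.bps, period=24)
decomp.plot()
# If the seasonal component looks meaningful, refit with a seasonal specification.
seasonal_model = pm.auto_arima(df.bps, start_p=0, start_q=0,
                               max_p=3, max_q=3,
                               test='adf', d=None,
                               seasonal=True, m=24,  # daily seasonality for hourly data
                               stepwise=True, trace=True,
                               error_action='ignore', suppress_warnings=True)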

How to add a maximum travel time duration for the sum of all routes in VRP Google OR-TOOLS

I am new to programming and used Google OR-Tools to create my VRP model. In my current model I have included a general time window and a capacity constraint per vehicle, creating a capacitated vehicle routing problem with time windows. I followed the OR-Tools guides, which include a maximum travel duration for each vehicle.
However, I want to include a maximum travel duration for the sum of all routes, whereas the maximum travel duration per vehicle does not matter (so I set it to 100,000). Accordingly, I want to add something to the model/solution printer that tells me how many addresses could not be visited because of the constraint on the total travel duration over all routes. From the examples I have seen I think it should be fairly easy, but my knowledge of programming is limited and my attempts have not been successful. Can anyone help me?
import pandas as pd
import openpyxl
import numpy as np
import math
from random import sample
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
from scipy.spatial.distance import squareform, pdist
from haversine import haversine
#STEP - create data
# import/read excel file
data = pd.read_excel(r'C:\Users\Jean-Paul\Documents\Thesis\OR TOOLS\Data.xlsx', engine = 'openpyxl')
df = pd.DataFrame(data, columns= ['number','lat','lng']) # create dataframe with 10805 addresses + address of the depot
#print (df)
# randomly sample X addresses from the dataframe and their corresponding number/latitude/longtitude
df_sample = df.sample(n=100)
#print (df_data)
# read first row of the excel file (= coordinates of the depot)
df_depot = pd.DataFrame(data, columns= ['number','lat','lng']).iloc[0:1]
#print (df_depot)
# combine dataframe of depot and sample into one dataframe
df_data = pd.concat([df_depot, df_sample], ignore_index=True, sort=False)
#print (df_data)
#STEP - create distance matrix data
# determine distance between latitude and longtitude
df_data.set_index('number', inplace=True)
matrix_distance = pd.DataFrame(squareform(pdist(df_data, metric=haversine)), index=df_data.index, columns=df_data.index)
matrix_list = np.array(matrix_distance)
#print (matrix_distance) # create table of distances between addresses including headers
#print (matrix_list) # converting table to list of lists and exclude headers
#STEP - create time matrix data
travel_time = matrix_list / 15 * 60 # divide distance by a travel speed of 15 km/h and multiply by 60 to get minutes
#print (travel_time) # converting distance matrix to travel time matrix
#STEP - create time window data
# create list for each sample - couriers have to visit this address within 0-X minutes of time using a list of lists
window_range = []
for i in range(len(df_data)):
    window_range.append([0, 240]) # create list of lists with a time window range for each address
#print (window_range)
#STEP - create demand data
# create list for each sample - all addresses demand 1 parcel except the depot
demand_range = []
for i in range(len(df_data.iloc[0:1])):
    demand_range.append(0) # the depot has no demand
for j in range(len(df_data.iloc[1:])):
    demand_range.append(1) # every sampled address demands one parcel
#print (demand_range)
#STEP - create fleet size data # amount of vehicles in the fleet
fleet_size = 6
#print (fleet_size)
#STEP - create capacity data for each vehicle
fleet_capacity = []
for i in range(fleet_size): # capacity per vehicle
    fleet_capacity.append(20)
#print (fleet_capacity)
#STEP - create data model that stores all data for the problem
def create_data_model():
    data = {}
    data['time_matrix'] = travel_time
    data['time_windows'] = window_range
    data['num_vehicles'] = fleet_size
    data['depot'] = 0 # index of the depot
    data['demands'] = demand_range
    data['vehicle_capacities'] = fleet_capacity
    return data
#STEP - creating the solution printer
def print_solution(data, manager, routing, solution):
    """Prints solution on console."""
    print(f'Objective: {solution.ObjectiveValue()}')
    time_dimension = routing.GetDimensionOrDie('Time')
    total_time = 0
    for vehicle_id in range(data['num_vehicles']):
        index = routing.Start(vehicle_id)
        plan_output = 'Route for vehicle {}:\n'.format(vehicle_id)
        while not routing.IsEnd(index):
            time_var = time_dimension.CumulVar(index)
            plan_output += '{0} Time({1},{2}) -> '.format(
                manager.IndexToNode(index), solution.Min(time_var),
                solution.Max(time_var))
            index = solution.Value(routing.NextVar(index))
        time_var = time_dimension.CumulVar(index)
        plan_output += '{0} Time({1},{2})\n'.format(manager.IndexToNode(index),
                                                    solution.Min(time_var),
                                                    solution.Max(time_var))
        plan_output += 'Time of the route: {}min\n'.format(
            solution.Min(time_var))
        print(plan_output)
        total_time += solution.Min(time_var)
    print('Total time of all routes: {}min'.format(total_time))
#STEP - create the VRP solver
def main():
    # instantiate the data problem
    data = create_data_model()
    # create the routing index manager
    manager = pywrapcp.RoutingIndexManager(len(data['time_matrix']),
                                           data['num_vehicles'], data['depot'])
    # create routing model
    routing = pywrapcp.RoutingModel(manager)
    #STEP - create demand callback and dimension for capacity
    # create and register a transit callback
    def demand_callback(from_index):
        """Returns the demand of the node."""
        # convert from routing variable Index to demands NodeIndex
        from_node = manager.IndexToNode(from_index)
        return data['demands'][from_node]
    demand_callback_index = routing.RegisterUnaryTransitCallback(
        demand_callback)
    routing.AddDimensionWithVehicleCapacity(
        demand_callback_index,
        0, # null capacity slack
        data['vehicle_capacities'], # vehicle maximum capacities
        True, # start cumul to zero
        'Capacity')
    #STEP - create time callback
    # create and register a transit callback
    def time_callback(from_index, to_index):
        """Returns the travel time between the two nodes."""
        # convert from routing variable Index to time matrix NodeIndex
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        return data['time_matrix'][from_node][to_node]
    transit_callback_index = routing.RegisterTransitCallback(time_callback)
    # define cost of each arc (costs in terms of travel time)
    routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
    # STEP - create a dimension for the travel time (TIMEWINDOW) - a dimension keeps track of quantities that accumulate over a vehicle's route
    # add time windows constraint
    time = 'Time'
    routing.AddDimension(
        transit_callback_index,
        2, # allow waiting time (does not have an influence in this model)
        100000, # maximum total route length in minutes per vehicle (does not have an influence because of the capacity constraint)
        False, # do not force start cumul to zero
        time)
    time_dimension = routing.GetDimensionOrDie(time)
    # add time window constraints for each location except the depot
    for location_idx, time_window in enumerate(data['time_windows']):
        if location_idx == data['depot']:
            continue
        index = manager.NodeToIndex(location_idx)
        time_dimension.CumulVar(index).SetRange(time_window[0], time_window[1])
    # add time window constraint for each vehicle start node
    depot_idx = data['depot']
    for vehicle_id in range(data['num_vehicles']):
        index = routing.Start(vehicle_id)
        time_dimension.CumulVar(index).SetRange(
            data['time_windows'][depot_idx][0],
            data['time_windows'][depot_idx][1])
    #STEP - instantiate route start and end times to produce feasible times
    for i in range(data['num_vehicles']):
        routing.AddVariableMinimizedByFinalizer(
            time_dimension.CumulVar(routing.Start(i)))
        routing.AddVariableMinimizedByFinalizer(
            time_dimension.CumulVar(routing.End(i)))
    #STEP - set default search parameters and a heuristic method for finding the first solution
    search_parameters = pywrapcp.DefaultRoutingSearchParameters()
    search_parameters.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
    #STEP - solve the problem with the search parameters and print the solution
    solution = routing.SolveWithParameters(search_parameters)
    if solution:
        print_solution(data, manager, routing, solution)

if __name__ == '__main__':
    main()
See Mizux's answer, which goes under the hood in the solver to build a summation cost over all vehicle route lengths:
https://stackoverflow.com/a/68756570/13773745
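For reference, the gist of that approach (a sketch only, assuming the 'Time' dimension built in main() above; the 4,000-minute budget and the drop penalty are made-up numbers) is to constrain the sum of the route-end cumul variables and to add disjunctions so that addresses may be dropped, and counted, when the global budget is too tight:
# Hypothetical additions inside main(), after time_dimension has been created and before solving.
max_total_route_time = 4000  # made-up budget in minutes for the sum of all routes
solver = routing.solver()
solver.Add(solver.Sum([time_dimension.CumulVar(routing.End(v))
                       for v in range(data['num_vehicles'])]) <= max_total_route_time)
penalty = 100000  # made-up penalty per dropped address; allows nodes to be skipped
for node in range(1, len(data['time_matrix'])):
    routing.AddDisjunction([manager.NodeToIndex(node)], penalty)
# After SolveWithParameters(), count the dropped addresses.
dropped = []
for index in range(routing.Size()):
    if routing.IsStart(index) or routing.IsEnd(index):
        continue
    if solution.Value(routing.NextVar(index)) == index:
        dropped.append(manager.IndexToNode(index))
print('Addresses not visited:', len(dropped))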

Problem backtesting with backtrader and btalib: orders are not placed as intended - want to buy when sma5 > sma14 and sell when sma5 < sma14

import btalib
from btalib.indicators import sma
import pandas as pd
import backtrader as bt
import os.path #To manage paths
import sys # to find out the script name
import datetime
import matplotlib as plt
from backtrader import cerebro
from numpy import mod #for datetime object
df = pd.read_csv('C:/Users/User/Desktop/programming/dataset/coin_Bitcoin.csv',parse_dates=True, index_col='Date')
sma14 = btalib.sma(df, period = 14)
sma5 = btalib.sma(df, period=5)
class TestStrategy(bt.Strategy):
    params = (
        ('exitbars', 5),
    )

    def log(self, txt, dt=None):
        # Logging function for this strategy
        dt = dt or self.datas[0].datetime.date(0)
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        # Keep a reference to the "close" line in the data[0] dataseries
        self.dataclose = self.datas[0].close
        # To keep track of pending orders
        self.order = None
        self.buyprice = None
        self.buycomm = None

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - nothing to do
            return
        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm: %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm))
                self.buyprice = order.executed.price
                self.buycomm = order.executed.comm
            else: # sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price,
                          order.executed.value,
                          order.executed.comm))
            self.bar_executed = len(self)
        elif order.status in [order.Canceled, order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')
        # Write down: no pending order
        self.order = None

    def notify_trade(self, trade):
        if not trade.isclosed:
            return
        self.log('OPERATION PROFIT, GROSS %.2f, NET %.2f' %
                 (trade.pnl, trade.pnlcomm))

    def next(self):
        #sma = btalib.sma(df, period=30)
        # Simply log the closing price of the series from the reference
        self.log('Close, %.2f' % self.dataclose[0])
        # Check if an order is pending ... if yes, we cannot send a 2nd one
        if self.order:
            return
        # Check if we are in the market
        #if not self.position:
        # Not yet ... we MIGHT BUY if ...
        if sma5[0] > sma14[0]:
            # BUY, BUY, BUY!!! (with all possible default parameters)
            self.log('BUY CREATE, %.2f' % self.dataclose[0])
            # Keep track of the created order to avoid a 2nd order
            self.order = self.buy()
        else:
            # Already in the market ... we might sell
            if sma5[0] < sma14[0]:
                # SELL, SELL, SELL!!! (with all possible default parameters)
                self.log('SELL CREATE, %.2f' % self.dataclose[0])
                self.order = self.sell()
if __name__ == '__main__':
    # Create a cerebro entity
    cerebro = bt.Cerebro()
    # Add a strategy
    cerebro.addstrategy(TestStrategy)
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))
    datapath = os.path.join(modpath, 'C:/programming/AlgoTrading/backtest/BTC-USD-YF.csv')
    data = bt.feeds.YahooFinanceCSVData(
        dataname=datapath,
        fromdate=datetime.datetime(2020, 5, 1),
        todate=datetime.datetime(2021, 6, 1),
        reverse=False)
    # Add the Data Feed to Cerebro
    cerebro.adddata(data)
    cerebro.broker.setcash(100000.0)
    # Add a FixedSize sizer according to the stake
    #cerebro.addsizer(bt.sizers.FixedSize, stake=10)
    cerebro.addsizer(bt.sizers.FixedSize)
    # Set the commission
    cerebro.broker.setcommission(commission=0.0)
    # Print out the starting conditions
    print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())
    # Run over everything
    cerebro.run()
    #print(df(data))
    # Print out the final result
    print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())
    cerebro.plot()
I tried hard to place a buy order when sma5 > sma14 and a sell order when sma5 < sma14, but it doesn't work.
I use backtrader as the backtesting library and btalib for the indicators that generate the signal, via btalib.sma(df, period).
cerebro is the backtesting engine of the backtrader module.
Sometimes it buys and sells every day, or buys today and sells tomorrow.
You probably have to invert the order of your df; this was my problem when computing the RSI with btalib.
Example: df = df.iloc[::-1]
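Beyond the row order, part of the problem may be that the btalib SMAs are computed once on the raw DataFrame and always read with [0], so they do not move along with backtrader's bars. A more idiomatic pattern (a sketch of the usual backtrader approach, not the poster's exact setup) is to build the moving averages as backtrader indicators inside the strategy and trade on their crossover:
import backtrader as bt

class SmaCrossStrategy(bt.Strategy):
    # Hypothetical strategy; the 5/14 periods are taken from the question.
    params = (('fast', 5), ('slow', 14),)

    def __init__(self):
        self.fast_sma = bt.indicators.SimpleMovingAverage(self.data.close, period=self.p.fast)
        self.slow_sma = bt.indicators.SimpleMovingAverage(self.data.close, period=self.p.slow)
        # +1 when the fast SMA crosses above the slow one, -1 when it crosses below
        self.crossover = bt.indicators.CrossOver(self.fast_sma, self.slow_sma)
        self.order = None

    def notify_order(self, order):
        if order.status in [order.Completed, order.Canceled, order.Margin, order.Rejected]:
            self.order = None  # order finished; allow new orders

    def next(self):
        if self.order:  # an order is already pending
            return
        if not self.position and self.crossover[0] > 0:
            self.order = self.buy()
        elif self.position and self.crossover[0] < 0:
            self.order = self.sell()
This only enters on an actual cross rather than on every bar where sma5 > sma14, which avoids the buy-every-day, sell-the-next-day churn described above.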

Converge on Best Combination of Elements

You have $10,000 to invest in stocks. You are given a list of 200 stocks, and are told to select 8 of those stocks to buy, and also indicate how many of those stocks you want to buy. You cannot spend more than $2,500 on a single stock alone, and each stock has its own price ranging from $100 to $1000. You cannot buy a fraction of a stock, only whole numbers. Each stock also has a value attached to it indicating how profitable it is. This is an arbitrary number from 0-100 that serves as a simple rating system.
The end goal is to list the optimal selection of 8 stocks, and indicate the best quantity of each of those stocks to buy without going over the $2,500 limit for each stock.
• I'm not asking for investment advice; I chose stocks because they act as a good metaphor for the actual problem I'm trying to solve.
• Seems like what I'm looking at is a more complex version of the 0/1 Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem.
• No, this isn't homework.
Here is lightly tested code for solving your problem exactly in time that is polynomial in the amount of money available, the number of stocks that you have, and the maximum amount of stock that you can buy.
#! /usr/bin/env python
from collections import namedtuple
Stock = namedtuple('Stock', ['id', 'price', 'profit'])
def optimize(stocks, money=10000, max_stocks=8, max_per_stock=2500):
    Investment = namedtuple('investment', ['profit', 'stock', 'quantity', 'previous_investment'])
    investment_transitions = []
    last_investments = {money: Investment(0, None, None, None)}
    for _ in range(max_stocks):
        next_investments = {}
        investment_transitions.append([last_investments, next_investments])
        last_investments = next_investments
    def prioritize(stock):
        # This puts the best profit/price, as a ratio, first.
        val = [-(stock.profit + 0.0)/stock.price, stock.price, stock.id]
        return val
    for stock in sorted(stocks, key=prioritize):
        # We reverse transitions so we have not yet added the stock to the
        # old investments when we add it to the new investments.
        for transition in reversed(investment_transitions):
            old_t = transition[0]
            new_t = transition[1]
            for avail, invest in old_t.items():
                for i in range(int(min(avail, max_per_stock)/stock.price)):
                    quantity = i+1
                    new_avail = avail - quantity*stock.price
                    new_profit = invest.profit + quantity*stock.profit
                    if new_avail not in new_t or new_t[new_avail].profit < new_profit:
                        new_t[new_avail] = Investment(new_profit, stock, quantity, invest)
    best_investment = investment_transitions[0][0][money]
    for transition in investment_transitions:
        for invest in transition[1].values():
            if best_investment.profit < invest.profit:
                best_investment = invest
    purchase = {}
    while best_investment.stock is not None:
        purchase[best_investment.stock] = best_investment.quantity
        best_investment = best_investment.previous_investment
    return purchase
optimize([Stock('A', 100, 10), Stock('B', 1040, 160)])
And here it is with the small optimization of deleting investments once we see that continuing to add stocks to them cannot improve on the best profit found so far. This will probably run orders of magnitude faster than the previous code on your data.
#! /usr/bin/env python
from collections import namedtuple
Stock = namedtuple('Stock', ['id', 'price', 'profit'])
def optimize(stocks, money=10000, max_stocks=8, max_per_stock=2500):
    Investment = namedtuple('investment', ['profit', 'stock', 'quantity', 'previous_investment'])
    investment_transitions = []
    last_investments = {money: Investment(0, None, None, None)}
    for _ in range(max_stocks):
        next_investments = {}
        investment_transitions.append([last_investments, next_investments])
        last_investments = next_investments
    def prioritize(stock):
        # This puts the best profit/price, as a ratio, first.
        val = [-(stock.profit + 0.0)/stock.price, stock.price, stock.id]
        return val
    best_investment = investment_transitions[0][0][money]
    for stock in sorted(stocks, key=prioritize):
        profit_ratio = (stock.profit + 0.0) / stock.price
        # We reverse transitions so we have not yet added the stock to the
        # old investments when we add it to the new investments.
        for transition in reversed(investment_transitions):
            old_t = transition[0]
            new_t = transition[1]
            # Copy the items so entries can be deleted while iterating.
            for avail, invest in list(old_t.items()):
                if avail * profit_ratio + invest.profit <= best_investment.profit:
                    # We cannot possibly improve with this or any other stock.
                    del old_t[avail]
                    continue
                for i in range(int(min(avail, max_per_stock)/stock.price)):
                    quantity = i+1
                    new_avail = avail - quantity*stock.price
                    new_profit = invest.profit + quantity*stock.profit
                    if new_avail not in new_t or new_t[new_avail].profit < new_profit:
                        new_invest = Investment(new_profit, stock, quantity, invest)
                        new_t[new_avail] = new_invest
                        if best_investment.profit < new_invest.profit:
                            best_investment = new_invest
    purchase = {}
    while best_investment.stock is not None:
        purchase[best_investment.stock] = best_investment.quantity
        best_investment = best_investment.previous_investment
    return purchase

Calculate running/cumulative cost of EC2 spot instance

I often run spot instances on EC2 (for Hadoop task jobs, temporary nodes, etc.). Some of these are long-running spot instances.
It's fairly easy to calculate the cost of on-demand or reserved EC2 instances, but how do I calculate the cost incurred for a specific node (or nodes) running as spot instances?
I am aware that the cost of a spot instance changes every hour depending on the market rate - so is there any way to calculate the cumulative total cost for a running spot instance, through an API or otherwise?
OK I found a way to do this in the Boto library. This code is not perfect - Boto doesn't seem to return the exact time range, but it does get the historic spot prices more or less within a range. The following code seems to work quite well. If anyone can improve on it, that would be great.
import boto, datetime, time
# Enter your AWS credentials
aws_key = "YOUR_AWS_KEY"
aws_secret = "YOUR_AWS_SECRET"
# Details of instance & time range you want to find spot prices for
instanceType = 'm1.xlarge'
startTime = '2012-07-01T21:14:45.000Z'
endTime = '2012-07-30T23:14:45.000Z'
aZ = 'us-east-1c'
# Some other variables
maxCost = 0.0
minTime = float("inf")
maxTime = 0.0
totalPrice = 0.0
oldTimee = 0.0
# Connect to EC2
conn = boto.connect_ec2(aws_key, aws_secret)
# Get prices for instance, AZ and time range
prices = conn.get_spot_price_history(instance_type=instanceType,
                                     start_time=startTime, end_time=endTime, availability_zone=aZ)
# Output the prices
print("Historic prices")
for price in prices:
    timee = time.mktime(datetime.datetime.strptime(price.timestamp,
                                                   "%Y-%m-%dT%H:%M:%S.000Z").timetuple())
    print("\t" + price.timestamp + " => " + str(price.price))
    # Get max and min time from results
    if timee < minTime:
        minTime = timee
    if timee > maxTime:
        maxTime = timee
    # Get the max cost
    if price.price > maxCost:
        maxCost = price.price
    # Calculate total price
    if not (oldTimee == 0):
        totalPrice += (price.price * abs(timee - oldTimee)) / 3600
    oldTimee = timee
# Difference b/w first and last returned times
timeDiff = maxTime - minTime
# Output aggregate, average and max results
print("For: one %s in %s" % (instanceType, aZ))
print("From: %s to %s" % (startTime, endTime))
print("\tTotal cost = $" + str(totalPrice))
print("\tMax hourly cost = $" + str(maxCost))
print("\tAvg hourly cost = $" + str(totalPrice * 3600 / timeDiff))
I've rewritten Suman's solution to work with boto3. Make sure to pass UTC datetimes with the timezone set:
def get_spot_instance_pricing(ec2, instance_type, start_time, end_time, zone):
    result = ec2.describe_spot_price_history(InstanceTypes=[instance_type], StartTime=start_time, EndTime=end_time, AvailabilityZone=zone)
    assert 'NextToken' not in result or result['NextToken'] == ''
    total_cost = 0.0
    total_seconds = (end_time - start_time).total_seconds()
    total_hours = total_seconds / (60 * 60)
    computed_seconds = 0
    last_time = end_time
    for price in result["SpotPriceHistory"]:
        price["SpotPrice"] = float(price["SpotPrice"])
        available_seconds = (last_time - price["Timestamp"]).total_seconds()
        remaining_seconds = total_seconds - computed_seconds
        used_seconds = min(available_seconds, remaining_seconds)
        total_cost += (price["SpotPrice"] / (60 * 60)) * used_seconds
        computed_seconds += used_seconds
        last_time = price["Timestamp"]
    avg_hourly_cost = total_cost / total_hours
    return avg_hourly_cost, total_cost, total_hours
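For example, calling it with timezone-aware datetimes might look like this (a sketch; the region, instance type, zone and date range are placeholders):
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client('ec2', region_name='us-east-1')
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)
avg_hourly, total, hours = get_spot_instance_pricing(ec2, 'm5.xlarge', start, end, 'us-east-1c')
print('Avg $/hour: %.4f, total: $%.2f over %.1f hours' % (avg_hourly, total, hours))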
You can subscribe to the spot instance data feed to get charges for your running instances dumped to an S3 bucket. Install the ec2 toolset and then run:
ec2-create-spot-datafeed-subscription -b bucket-to-dump-in
Note: you can have only one data feed subscription for your entire account.
In about an hour you should start seeing gzipped, tab-delimited files show up in the bucket that look something like this:
#Version: 1.0
#Fields: Timestamp UsageType Operation InstanceID MyBidID MyMaxPrice MarketPrice Charge Version
2013-05-20 14:21:07 UTC SpotUsage:m1.xlarge RunInstances:S0012 i-1870f27d sir-b398b235 0.219 USD 0.052 USD 0.052 USD 1
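Once the feed files start landing in the bucket, totalling your actual charges is mostly a matter of parsing that tab-separated Charge column. A rough sketch (the bucket name is a placeholder and the position of the Charge field is an assumption based on the #Fields header above):
import gzip
import boto3

def sum_spot_charges(bucket_name):
    # Sum the 'Charge' column across every data feed file in the bucket.
    s3 = boto3.resource('s3')
    total = 0.0
    for obj in s3.Bucket(bucket_name).objects.all():
        body = gzip.decompress(obj.get()['Body'].read()).decode('utf-8')
        for line in body.splitlines():
            if line.startswith('#'):  # skip the #Version / #Fields header lines
                continue
            fields = line.split('\t')
            charge = fields[-2]       # e.g. "0.052 USD", just before the trailing Version column
            total += float(charge.split()[0])
    return total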
I have recently developed a small Python library that calculates the cost of a single EMR cluster, or of a list of clusters, over a given period of days.
It takes Spot instances and Task nodes into account as well (these may come and go while the cluster is still running).
In order to calculate the cost I use the bid price, which (in many cases) might not be the exact price that you end up paying for the instance.
Depending on your bidding policy however, this price can be accurate enough.
You can find the code here: https://github.com/memosstilvi/emr-cost-calculator
