Tried resampling monthly data with month-end dates to quarterly by mean but getting NaN

I want to resample monthly data with month-end dates to quarterly frequency, using the mean.
# interpolation for US_RGDPI_R
df1 = df_new[['Date','US_RGDPI_R']]
df1.head()
df1.set_index(['Date'], inplace = True)
df1.head()
#output
US_RGDPI_R
Date
2001-01-31 13218.75936
2001-04-30 13301.23871
2001-07-31 13247.97901
2001-10-31 13284.85795
2002-01-31 13394.97196
upsampled_1 = df1.resample('M')
# Create the interpolated DataFrame using the cubic method
interpolated_1 = upsampled_1.interpolate(method='cubic')
interpolated_1.head()
#output
US_RGDPI_R
Date
2001-01-31 13218.759360
2001-02-28 13275.335186
2001-03-31 13301.728161
2001-04-30 13301.238710
2001-05-31 13284.765079
# Create a new column in df1 by resampling the US_RGDPI_R column of interpolated_1 to quarterly means
df1['interpolated'] = interpolated_1['US_RGDPI_R'].resample('Q').mean()
df1.head()
# output
US_RGDPI_R interpolated
Date
2001-01-31 13218.759360 NaN
2001-02-28 13275.335186 NaN
2001-03-31 13301.728161 NaN
2001-04-30 13301.238710 NaN
2001-05-31 13284.765079 NaN
I tried resample('Q').mean(), but the new column contains only NaN.
How can I resolve this?
The same approach works for month-start dates with resample('QS').mean().
Thanks
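A likely explanation, assuming df1 is indexed by the month-end dates shown in the question: assigning a Series to a DataFrame column aligns on the index, and the quarter-end labels (end of Mar/Jun/Sep/Dec) never coincide with the dates in df1 (end of Jan/Apr/Jul/Oct), so every row comes back as NaN. With month-start data and 'QS', the labels do coincide with existing dates, which would explain why that case works. The sketch below reproduces this on the first four values from the question. It uses offset objects in place of the 'M' and 'Q' aliases, and linear interpolation in place of cubic so that SciPy is not needed; both are substitutions made for the illustration.

```python
import pandas as pd

# Quarterly readings stamped at the end of Jan/Apr/Jul/Oct (values from the question).
dates = pd.to_datetime(["2001-01-31", "2001-04-30", "2001-07-31", "2001-10-31"])
df1 = pd.DataFrame(
    {"US_RGDPI_R": [13218.75936, 13301.23871, 13247.97901, 13284.85795]},
    index=dates,
)

# Upsample to month ends, then fill the gaps (linear here, cubic in the question).
monthly = df1.resample(pd.offsets.MonthEnd()).interpolate(method="linear")

# Quarterly means, labelled with the month ends of Mar/Jun/Sep/Dec.
quarterly = monthly["US_RGDPI_R"].resample(pd.offsets.QuarterEnd()).mean()

# None of the quarterly labels exists in df1.index.
shared_labels = quarterly.index.intersection(df1.index)

# Column assignment aligns on the index, so every row of df1 gets NaN.
df1["interpolated"] = quarterly

# Keeping the quarterly means in their own frame avoids the alignment step.
quarterly_df = quarterly.to_frame("interpolated")
print(quarterly_df)
```

If the quarterly values are needed next to the original rows, they have to be mapped onto the existing dates explicitly (for example by quarter), since the two indexes share no labels.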

auto_arima() m value, and seasonal decomposition period parameter

I am working on ARIMA modeling. The data has hourly granularity, taken from 1st May 2022 to 8th June 2022. I am trying to forecast the next 30 days, i.e. 720 hours. I am confused about the points below; any pointers would be great.
I plotted the raw data and found no trend or seasonality.
a) I checked seasonal_decompose() with a few period values, starting with period=1 (the result matches my understanding that the seasonal component should be 0).
b) With period=12 (chosen at random), why does it show a seasonal pattern? Even if I plot without a period, for which the default value is 7, it still shows seasonality. Why?
I plotted this graph with seasonal=False, since I do not see any seasonality or trend in the raw plot, and got the plot below. How should it be interpreted, and what should be concluded?
Then I tried to capture the seasonality by resampling to daily data and plotting that, which confused me further.
a) With period=7 (the default for seasonal_decompose), I again see a seasonality of 4 days even though the raw plot does not show any.
The forecast for this resampled (daily) data is below.
I no longer know what to look at. The more I read, the more confused I get.
Below is the code that I am using.
import numpy as np
import pandas as pd
import pmdarima as pm
import matplotlib.pyplot as plt

df = pd.read_csv('~/Desktop/gru-scl/gru-scl-filtered.csv', index_col="time")
del df["Index"]
df.index = pd.to_datetime(df.index)
model = pm.auto_arima(df.bps, start_p=0, start_q=0,
                      test='adf',        # use the ADF test to find the optimal 'd'
                      max_p=3, max_q=3,  # maximum p and q
                      m=24,              # frequency of series
                      d=None,            # let the model determine 'd'
                      seasonal=False,    # no seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)
f_steps = 720
fc, confint = model.predict(n_periods=f_steps, return_conf_int=True)
fc_index = np.arange(len(df.bps), len(df.bps) + f_steps)
val = 0
for f in fc:
    val = val + f
mean = val / f_steps
print(mean)
# make series for plotting purposes, indexed by hour number
fc_series = pd.Series(np.asarray(fc), index=fc_index)
lower_series = pd.Series(confint[:, 0], index=fc_index)
upper_series = pd.Series(confint[:, 1], index=fc_index)
# Plot actuals and forecast on the same hour-number axis
plt.plot(np.arange(len(df.bps)), df.bps.values, label="Actual values")
plt.plot(fc_series, color='darkgreen', label="Predicted values")
plt.fill_between(fc_index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.legend(loc='upper left', fontsize=8)
plt.title('Forecast vs Actuals')
plt.xlabel("Hours since 1st May 2022")
plt.ylabel("Bps")
plt.show()
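On why the decomposition reports a seasonal pattern for any period: classical additive decomposition estimates the seasonal component by averaging the detrended values at each position within the period, so it returns some pattern even for pure noise. The sketch below is a simplified pure-Python illustration of that idea, not statsmodels' own code; the function names and the synthetic white-noise series are made up for the example.

```python
import random
import statistics

def centered_moving_average(values, period):
    """Trend estimate; None where the averaging window does not fit."""
    if period % 2 == 0:
        # Even period: half weights at both ends keep the window centered.
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    half = len(weights) // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(w * v for w, v in zip(weights, window)) / period
    return trend

def seasonal_component(values, period):
    """Average the detrended values per position in the period, then center."""
    trend = centered_moving_average(values, period)
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    means = [statistics.fmean(bucket) for bucket in buckets]
    center = statistics.fmean(means)
    return [m - center for m in means]

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(240)]  # no real seasonality at all

seasonal = seasonal_component(noise, period=12)
print("seasonal amplitude:", max(abs(s) for s in seasonal))
print("seasonal spread   :", statistics.pstdev(seasonal))
print("series spread     :", statistics.pstdev(noise))
```

A nonzero seasonal panel is therefore not evidence of seasonality on its own. Comparing the size of the seasonal component with the residual, or with the spread of the series, gives a better indication of whether the pattern matters.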

Time Conversion from str to float (00:54:50) -> (54.8) for example

I am trying to convert my time watched in a Netflix show to a float so I can total it up. I cannot figure out how to convert it. I have tried many ways, including:
temp['Minutes'] = temp['Duration'].apply(lambda x: float(x))
Error: ValueError: could not convert string to float: '00:54:45'
''' 2022-05-18 05:21:42 00:54:45 NaN Ozark: Season 4: Mud (Episode 13)
NaN Amazon FTVET31DOVI2020 Smart TV 00:54:50 00:54:50 US (United
States) Wednesday 2022-05-18
'''
I have pulled the day of week and Day out but I would like to plot it just for fun and think the minutes would be the best to add up over time.
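Do it like this:
var = '00:54:45'
hours, minutes, seconds = var.split(':')
# Do not name the result "float": that would shadow the built-in
total_minutes = float(hours) * 60 + float(minutes) + float(seconds) / 60
print(total_minutes)
Output: 54.75 (from here you can round the seconds part; since it is added separately, rounding it does not affect the minutes term)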
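To total the time over many rows, the same arithmetic can be wrapped in a function. This is a minimal sketch assuming every value has the form HH:MM:SS; the name duration_to_minutes and the sample list are made up for illustration. With the DataFrame from the question, it could presumably be applied as temp['Duration'].apply(duration_to_minutes).

```python
def duration_to_minutes(duration: str) -> float:
    """Convert an 'HH:MM:SS' string to minutes as a float."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 60 + minutes + seconds / 60

# Hypothetical sample values in the same format as the Duration column.
durations = ["00:54:45", "00:54:50", "01:02:30"]
minutes_watched = [duration_to_minutes(d) for d in durations]
total_minutes = sum(minutes_watched)

print(minutes_watched[0])       # 54.75
print(round(total_minutes, 2))
```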

Change all images in training set

I have a convolutional neural network, and I want to train it on images from the training set. Before training, each image should be passed through my function change(tensor, float), which takes a tensor/image of shape [height, width, 3] and a float.
batch_size = 4
# Loading data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
# CNN architecture
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # size of inputs [4,3,32,32]
        # size of labels [4]
        inputs = change(inputs, 0.1)  # <-- the call in question
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)  # [4, 10]
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
I am trying to apply the function change to the images, but it gives an object error.
Is there a quick way to fix it?
change is a Julia function, and it works completely fine with other objects. Error message:
JULIA: MethodError: no method matching copy(::PyObject)
Closest candidates are:
copy(!Matched::T) where T<:SHA.SHA3_CTX at /opt/julia-1.7.2/share/julia/stdlib/v1.7/SHA/src/types.jl:213
copy(!Matched::T) where T<:SHA.SHA2_CTX at /opt/julia-1.7.2/share/julia/stdlib/v1.7/SHA/src/types.jl:212
copy(!Matched::Number) at /opt/julia-1.7.2/share/julia/base/number.jl:113
I would recommend putting the change function in the transforms list, so that the data is changed at the transformation stage.
partial from functools lets you fix the extra argument, like this:
from functools import partial
from torchvision.transforms import Compose, RandomResizedCrop, ToTensor
def change(input, float):
    pass
# Use partial to fix the float argument, so that change accepts only input
change_partial = partial(change, float=pass_float_value_here)
# Add change_partial to the list of transforms before or after converting to tensors
# (keep only the placement that matches what change operates on)
transforms = Compose([
    RandomResizedCrop(img_size),  # example
    # Add change_partial here if it operates on PIL Image
    change_partial,
    ToTensor(),  # convert to tensor
    # Add change_partial here if it operates on torch tensors
    change_partial,
])
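To show what partial does on its own, here is a small sketch without torch. The change function, its strength parameter and the compose helper are hypothetical stand-ins for the real change function and torchvision's Compose; they only illustrate how a two-argument function becomes a one-argument step in a pipeline.

```python
from functools import partial

def change(image, strength):
    """Hypothetical stand-in: scale every pixel value by (1 + strength)."""
    return [pixel * (1 + strength) for pixel in image]

def compose(*steps):
    """Minimal stand-in for a transform pipeline: apply the steps in order."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Fix the second argument so the pipeline can call the function with one input.
change_partial = partial(change, strength=0.5)

pipeline = compose(change_partial, lambda image: [round(p, 2) for p in image])
result = pipeline([1.0, 2.0, 4.0])
print(result)  # [1.5, 3.0, 6.0]
```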

Time Delta problem in Hackerrank not taking good answer / Python 3

The HackerRank challenge is at the following URL: https://www.hackerrank.com/challenges/python-time-delta/problem
I got test case 0 correct, but the website says my answers for test cases 1 and 2 are wrong. In PyCharm, I copied the website's expected output and compared it with my output, and they were exactly the same.
Please have a look at my code.
#!/bin/pyth
# Complete the time_delta function below.
from datetime import datetime
def time_delta(tmp1, tmp2):
    dicto = {'Jan':1, 'Feb':2, 'Mar':3,
             'Apr':4, 'May':5, 'Jun':6,
             'Jul':7, 'Aug':8, 'Sep':9,
             'Oct':10, 'Nov':11, 'Dec':12}
    # extracting t1 from first timestamp without -xxxx
    t1 = datetime(int(tmp1[2]), dicto[tmp1[1]], int(tmp1[0]), int(tmp1[3][:2]), int(tmp1[3][3:5]), int(tmp1[3][6:]))
    # extracting t2 from second timestamp without -xxxx
    t2 = datetime(int(tmp2[2]), dicto[tmp2[1]], int(tmp2[0]), int(tmp2[3][:2]), int(tmp2[3][3:5]), int(tmp2[3][6:]))
    # converting -xxxx of timestamp 1
    t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
    # converting -xxxx of timestamp 2
    t2_utc = int(tmp2[4][:3])*3600 + int(tmp2[4][3:])*60
    # absolute difference
    return abs(int((t1-t2).total_seconds()-(t1_utc-t2_utc)))
if __name__ == '__main__':
    # fptr = open(os.environ['OUTPUT_PATH'], 'w')
    t = int(input())
    for t_itr in range(t):
        tmp1 = list(input().split(' '))[1:]
        tmp2 = list(input().split(' '))[1:]
        delta = time_delta(tmp1, tmp2)
        print(delta)
The problem is in this line (and the matching one for t2_utc):
t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
For a time zone like +0715, you correctly add “7 hours of seconds” and “15 minutes of seconds”.
For a time zone like -0715, you are adding “-7 hours of seconds” and “+15 minutes of seconds”, resulting in -6h45m instead of -7h15m.
You need to either use the same sign for both parts, or apply the sign afterwards.
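A minimal sketch of the "apply the sign afterwards" option, assuming the offset always has the form +HHMM or -HHMM. The helper name offset_seconds is made up for illustration. The second function shows an alternative that lets datetime.strptime parse the offset through %z, assuming the input format used in the challenge's samples and the default C locale for day and month names.

```python
from datetime import datetime

def offset_seconds(offset: str) -> int:
    """Convert a UTC offset such as '+0530' or '-0715' to signed seconds."""
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    # The sign is applied once, to hours and minutes together.
    return sign * (hours * 3600 + minutes * 60)

def time_delta(t1: str, t2: str) -> int:
    """Absolute difference in seconds between two timestamps with offsets."""
    fmt = "%a %d %b %Y %H:%M:%S %z"
    first = datetime.strptime(t1, fmt)
    second = datetime.strptime(t2, fmt)
    return abs(int((first - second).total_seconds()))

print(offset_seconds("-0715"))  # -26100, i.e. -7h15m
print(time_delta("Sun 10 May 2015 13:54:36 -0700",
                 "Sun 10 May 2015 13:54:36 -0000"))  # 25200
```

Because both parsed datetimes are timezone-aware, subtracting them already accounts for the offsets, so no manual correction is needed in the second approach.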

Subtract Time from CSV using Ruby

Hi, I would like to subtract two times taken from a CSV array using Ruby.
time[0] is 12:12:00AM
time[1] is 12:12:01AM
Here is my code
time_converted = DateTime.parse(time)
difference = time_converted[1].to_i - time_converted[0].to_i
p difference
However, I got 0.
p time[0].to_i gives me 12.
Is there a way to fix this?
You can use Time.strptime to define the format of the string being parsed.
In your case the format string is %I:%M:%S%p.
%I = hour on a 12-hour clock
%M = minutes
%S = seconds
%p = AM/PM indicator
So to parse your example:
require 'time'
time = %w(12:12:00AM 12:12:01AM)
parsed_time = time.map { |t| Time.strptime(t, '%I:%M:%S%p').to_i }
parsed_time.last - parsed_time.first
=> 1
Use the Ruby DateTime class and parse your dates into objects of that class.
