Python pandas: index by time

I have a CSV file that looks like this:
"06/09/2013 14:08:34.930","7.2680542849633447","1.6151231744362988","0","0","21","1546964992","15.772567829158248","1577332736","8360","21.400382061280961","0","15","0","685","0","0","0","0","0","0","0","4637","0"
The CSV contains one month of daily values (24 hours per day).
I need to load it into pandas and then get some stats on the data (min, max), but only over records that fall within working hours (between 8:00 and 18:00) on each day.
I am very new to the pandas library.

Load your data:
import pandas as pd
from datetime import datetime
df = pd.read_csv('data.csv', header=None, index_col=0)
Filter your data for working hours from 8:00 to 18:00:
work_hours = lambda d: datetime.strptime(d, '%d/%m/%Y %H:%M:%S.%f').hour in range(8, 18)
df = df[[work_hours(d) for d in df.index]]  # list of booleans; a bare map object is not a valid indexer in Python 3
Get the min and max of the first data column:
col_min, col_max = df[1].min(), df[1].max()  # avoid shadowing the built-in min/max
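A more idiomatic alternative (a sketch, assuming your pandas version has DataFrame.between_time, which it has had for a long time): parse the first column as a DatetimeIndex while reading, then select rows by time of day. Note that between_time includes both endpoints by default, so 18:00:00 exactly would be kept, unlike the range(8, 18) check above.
import pandas as pd
# dayfirst=True matches the DD/MM/YYYY layout of the sample row
df = pd.read_csv('data.csv', header=None, index_col=0,
                 parse_dates=[0], dayfirst=True)
# with a DatetimeIndex, between_time() selects rows by time of day
work = df.between_time('08:00', '18:00')
col_min, col_max = work[1].min(), work[1].max()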

From the yfinance library, how can I read the ex-dividend date?

This code should return the ex-dividend date:
import yfinance as yf
yf.Ticker('AGNC').info['exDividendDate']
but I get this as an output:
1661817600
Is there a way to get the date from that number?
That number is a Unix timestamp: seconds elapsed since the epoch (1 January 1970, UTC). To get the calendar date, you can use pd.to_datetime to convert the seconds:
import pandas as pd
pd.to_datetime(1661817600, unit='s')
Out[6]: Timestamp('2022-08-30 00:00:00')
Or you can use Python's built-in datetime module:
from datetime import datetime
print(datetime.fromtimestamp(1661817600))
2022-08-30 08:00:00
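Note why the two results differ: pd.to_datetime with unit='s' interprets the number as UTC, while datetime.fromtimestamp converts it to the machine's local timezone (apparently UTC+8 wherever the output above was produced). If you want an unambiguous result, pass the timezone explicitly:
from datetime import datetime, timezone
# interpret the Unix timestamp explicitly as UTC
print(datetime.fromtimestamp(1661817600, tz=timezone.utc))
2022-08-30 00:00:00+00:00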

Days from dates in qlikview expression

I am trying to get days from given data like this:
In this data, suppose ID B has a start date of 4/10/2019 and an end date of 10/25/2019. That spans 7 months, April through October. For the first month the start date is 4/10/2019 and the end date is 4/30/2019, so only 10 days are availed from this month and the remaining days are 21. The same applies at the other end: the end date is 10/25/2019, and looking at the calendar the month ends on 10/31/2019, so only 6 days are availed. I want to get the data mentioned in the image above. I am trying this formula in QlikView:
=sum(
If(
MonthName(CalendarMonthEnd) = MonthName([End Date]),
([End Date]-CalendarMonthStart+1),
(RangeMin([End Date],CalendarMonthEnd)-RangeMax([Start Date],CalendarMonthStart))
)
)
Through this formula I get the remaining days, whereas I want the days that are availed.
Here is a link to a folder, please download and check:
https://www.dropbox.com/s/v48373io1bv9qqj/file_qlik.rar?dl=0
In this folder, the first table in the Excel file "output.. " is the output I need.
Just add another If:
=Sum(
  If(CalendarMonthStart >= [Start Date] and CalendarMonthEnd <= [End Date],
     CalendarMonthEnd - CalendarMonthStart,
     If([Start Date] > CalendarMonthStart,
        [Start Date] - CalendarMonthStart + 1,
        CalendarMonthEnd - [End Date])
  )
)
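If you want to sanity-check the arithmetic outside QlikView, here is a minimal Python sketch of the per-month overlap logic such an expression implements (the function name and the inclusive day-count convention are my assumptions, not part of the QlikView model):
from datetime import date
import calendar

def availed_days(start, end, year, month):
    # days of (year, month) that fall inside [start, end], both ends inclusive
    month_start = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    lo, hi = max(start, month_start), min(end, month_end)
    return (hi - lo).days + 1 if hi >= lo else 0

start, end = date(2019, 4, 10), date(2019, 10, 25)
print(availed_days(start, end, 2019, 4))   # 21 (Apr 10-30)
print(availed_days(start, end, 2019, 10))  # 25 (Oct 1-25)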

PySpark round off timestamps to full hours?

I am interested in rounding off timestamps to full hours. What I got so far is to round to the nearest hour. For example with this:
df.withColumn("Full Hour", hour((round(unix_timestamp("Timestamp")/3600)*3600).cast("timestamp")))
But this round function uses HALF_UP rounding, which means 23:56 becomes 00:00, whereas I would prefer 23:00. Is this possible? I didn't find an option for setting the rounding behaviour of the function.
I think you're overcomplicating things. The hour function already returns the hour component of a timestamp, which is exactly the round-down behaviour you want.
from pyspark.sql.functions import to_timestamp, unix_timestamp, hour
from pyspark.sql import Row

df = (sc
      .parallelize([Row(Timestamp='2016_08_21 11_59_08')])
      .toDF()
      # HH (0-23) rather than hh (1-12), since there is no AM/PM marker
      .withColumn("parsed", to_timestamp("Timestamp", "yyyy_MM_dd HH_mm_ss")))
df2 = df.withColumn("Full Hour", hour(unix_timestamp("parsed").cast("timestamp")))
df2.show()
Output:
+-------------------+-------------------+---------+
| Timestamp| parsed|Full Hour|
+-------------------+-------------------+---------+
|2016_08_21 11_59_08|2016-08-21 11:59:08| 11|
+-------------------+-------------------+---------+
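If what you actually need is the whole timestamp floored to the hour, rather than just the hour number, date_trunc (available in pyspark.sql.functions since Spark 2.3) does that directly, for example:
from pyspark.sql.functions import date_trunc
# truncate the parsed timestamp down to the start of its hour
df3 = df.withColumn("Hour Floor", date_trunc("hour", "parsed"))
# 2016-08-21 11:59:08 -> 2016-08-21 11:00:00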

Speed up Pandas DateTime variable

I have a number of quite large CSV files (1,000,000 rows each) which contain a DateTime column. I am using pandas pivot tables to summarise them. Part of what this involves is splitting this DateTime variable into hours and minutes. I am using the following code, which works fine, but it takes quite a lot of time (around 4-5 minutes).
My question is: is this just because the files are so large / my laptop is too slow, or is there more efficient code that lets me split hours and minutes out of a DateTime variable?
Thanks
df['hours'], df['minutes'] = pd.DatetimeIndex(df['DateTime']).hour, pd.DatetimeIndex(df['DateTime']).minute
If the dtype of the DateTime column is not datetime64, first convert it with to_datetime. Then use dt.hour and dt.minute:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute
Sample:
import pandas as pd
df = pd.DataFrame({'DateTime': ['2014-06-17 11:09:20', '2014-06-18 10:02:10']})
print(df)
              DateTime
0  2014-06-17 11:09:20
1  2014-06-18 10:02:10
print(df.dtypes)
DateTime    object
dtype: object
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute
print(df)
             DateTime  hours  minutes
0 2014-06-17 11:09:20     11        9
1 2014-06-18 10:02:10     10        2
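If it is still slow, the parsing itself is usually the bottleneck. Passing an explicit format to to_datetime lets pandas skip per-row format inference, and computing the conversion once (instead of building a DatetimeIndex twice, as in the original code) halves the work. A sketch, assuming your timestamps all share one layout:
dt = pd.to_datetime(df['DateTime'], format='%Y-%m-%d %H:%M:%S')
df['hours'], df['minutes'] = dt.dt.hour, dt.dt.minute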

Awk and calculating start time from end time and duration

I have a file with date, end time and duration in decimal format and I need to calculate the start time. The file looks like:
20140101;1212;1.5
20140102;1515;1.58
20140103;1759;.69
20140104;1100;12.5
...
The duration 1.5 for the time 12:12 means one and a half hours, and the start time would be 12:12 - 1:30 = 10:42 AM; or 11:00 - 12.5 = 11:00 - 12:30 = 22:30 on the previous day. Is there an easy way to calculate such time differences in Awk, or is it the good ol' split-multiply-subtract-and-handle-the-day-break-yourself all over again?
Since the values are in hours and minutes, only the minutes matter and the seconds can be discarded; for example, duration 1.58 means 1:34 and the leftover 0.8 minutes can be discarded.
I'm on GNU Awk 4.1.3
As you are using gawk, take advantage of its native time functions:
gawk -F\; '{
    # build a datespec "YYYY MM DD HH MM SS" from fields 1 and 2
    tmst = sprintf("%s %s %s %s %s 00",
                   substr($1,1,4),
                   substr($1,5,2),
                   substr($1,7,2),
                   substr($2,1,2),
                   substr($2,3,2))
    t1 = mktime(tmst)                  # end time as epoch seconds
    seconds = sprintf("%f", $3) + 0    # duration, hours as a float
    seconds *= 60 * 60                 # hours -> seconds
    difference = strftime("%H%M", t1 - seconds)
    print $0 FS difference
}' file
Results:
20140101;1212;1.5;1042
20140102;1515;1.58;1340
20140103;1759;.69;1717
20140104;1100;12.5;2230
Check: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
Explanation:
tmst = sprintf(...): builds a date string from the file that conforms to the datespec that mktime expects: YYYY MM DD HH MM SS [DST].
t1 = mktime(tmst): turns the datespec into a timestamp that gawk can handle (the number of seconds elapsed since 1 January 1970).
seconds = sprintf("%f", $3) + 0: converts the third field to a float.
seconds *= 60 * 60: converts hours (as a float) to seconds.
difference = strftime("%H%M", t1 - seconds): formats the difference in a human-readable manner, as hours and minutes.
I highly recommend using a programming language that supports datetime calculations, because the details can be tricky, for instance around daylight saving time shifts. You can use Python, for example:
start_times.py:
import csv
from datetime import datetime, timedelta

with open('input.txt', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        end_day = row[0]
        end_time = row[1]
        # Create a datetime object for the end of the interval
        end = datetime.strptime(end_day + end_time, "%Y%m%d%H%M")
        # Translate the duration (hours as a float) into minutes
        duration = float(row[2]) * 60
        # Calculate the start time
        start = end - timedelta(minutes=duration)
        # Column 3 is the start day (can differ from the end day!)
        row.append(start.strftime("%Y%m%d"))
        # Column 4 is the start time
        row.append(start.strftime("%H%M"))
        print(';'.join(row))
Run:
python start_times.py
Output:
20140101;1212;1.5;20140101;1042
20140102;1515;1.58;20140102;1340
20140103;1759;.69;20140103;1717
20140104;1100;12.5;20140103;2230 <-- you see, the day matters!
The above example uses the system's timezone. If the input data refers to a different timezone, Python's datetime module allows you to specify it.
I would do something like this:
awk 'BEGIN{FS=OFS=";"}
{
    h = substr($2,1,2); m = substr($2,3,2)   # substr() is 1-indexed in awk
    mins = h*60 + m                          # end time in minutes
    diff = mins - $3*60                      # minus the duration in minutes
    print $0, int(diff/60) ":" int(diff%60)
}' file
That is, convert everything to minutes and then back to hours/minutes. Note that, unlike the answers above, this neither zero-pads the minutes nor handles a start time that falls on the previous day (diff goes negative for the 12.5-hour line).
Test
$ awk 'BEGIN{FS=OFS=";"}{h=substr($2,1,2); m=substr($2,3,2); mins=h*60 + m; diff=mins - $3*60; print $0, int(diff/60) ":" int(diff%60)}' a
20140101;1212;1.5;10:42
20140102;1515;1.58;13:40
20140103;1759;.69;17:17
