Pandas timeseries: specifying the time unit
I have unevenly timestamped data that I would like to process in Pandas. The timestamps are in milliseconds starting at 0, so the data looks like:
Timestamp Property
0 1
1 2
2 3
4 4
10 4
19 7
I have a very basic question: I can create a pd.Series object with the Timestamp column as the index. But how does Pandas know whether my timestamps are in milliseconds, or for that matter in seconds or hours?
Pandas assumes integer timestamps are in nanoseconds. If your column is in milliseconds, convert it with df['col'] = df['col'].astype('datetime64[ms]'). Note that the resulting column will still be stored in nanoseconds.
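If you would rather state the unit explicitly, pandas can be told directly. A minimal sketch, assuming the data above has been loaded into a DataFrame named df with columns Timestamp and Property:

    import pandas as pd

    df = pd.DataFrame({"Timestamp": [0, 1, 2, 4, 10, 19],
                       "Property": [1, 2, 3, 4, 4, 7]})

    # Interpret the integers as milliseconds since the Unix epoch...
    df.index = pd.to_datetime(df["Timestamp"], unit="ms")

    # ...or, since they start at 0, as elapsed time (a TimedeltaIndex)
    df.index = pd.to_timedelta(df["Timestamp"], unit="ms")

Either way the unit is spelled out instead of relying on the nanosecond default.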
Related
How to find difference between two timestamps in Talend
I am new to Talend. I want to find the difference between two timestamps. I have two columns, start_time and end_time, and I want to build a table in the destination that shows the difference between the two timestamps, specifically as hours, minutes and seconds. I also want the time as a timestamp, not in long format. How can I achieve this?

start_time: 2021-06-18 08:27:52.000000
end_time:   2021-06-18 08:29:59.000000

What I tried: creating a variable 'ms' of long type in tMap:

    TalendDate.diffDate(row181.start_time, row181.end_time, "mm")

and converting it to hh:mm:ss with:

    String.format("%02d:%02d:%02d.%d", (Var.ms / (1000 * 60 * 60)) % 24, (Var.ms / (1000 * 60)) % 60, (Var.ms / 1000) % 60, Var.ms % 1000)

If I make the destination column a string, I get this error: column "call_duration" is of type bigint but expression is of type character varying. The tMap expression above also returns zero. And I have to use long as the destination column type, but I want a date type.
Pattern "MM" refers to months, not minutes; use "mm" instead. How could you return a date type for a difference between two dates? The result is necessarily a number (long/double...). If you want your output as hours/minutes/seconds, you should use diffDate with the "ss" pattern to get a long representing the duration in seconds. Then you'll have to transform that to get hours and minutes (e.g. 3700 s would give you 1 hour, 1 minute, 40 seconds). You also have to decide what kind of output you want (one column for each, a string concatenating hours/minutes/seconds...). Example: with row1.diffDate being your diffDate in seconds as input to a tMap, you could split it into three different columns; then you only have to concatenate the values into a string with a ":" separator if you want a string output.
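The hours/minutes/seconds split described above is just integer division and modulo. A minimal Python sketch of that arithmetic (illustration only, not Talend code):

    def split_duration(total_seconds):
        # e.g. 3700 s -> (1, 1, 40): 1 hour, 1 minute, 40 seconds
        hours, remainder = divmod(total_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        return hours, minutes, seconds

    print("%02d:%02d:%02d" % split_duration(3700))  # 01:01:40

In Talend, the same divisions and modulos can be written in a tMap expression or a routine.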
Delete rows from csv/txt file by filtering on the basis of time
How can I delete rows from a csv/txt file by filtering on the basis of time? I want to delete all rows which lie outside the time period 09:01 to 16:00 (Column 3). Column 3 contains only time in hh:mm format, whereas Column 2 contains only the date (dtype int64). There are no headers, and the time dtype is object. I am able to filter based on other columns but not able to deal with time. My data looks like this:

RTY,20200401,07:10,964.80,964.80,964.80,964.80,456,20
RTY,20200401,08:15,964.80,964.80,964.80,964.80,456,250
RTY,20200401,09:00,964.80,964.80,964.80,964.80,456,155
RTY,20200401,09:01,964.80,964.80,964.80,964.80,456,10
RTY,20200401,09:05,964.80,964.80,964.80,964.80,456,63
RTY,20200401,09:16,964.80,964.80,951.25,956.20,4587,159
RTY,20200401,09:17,956.20,957.25,953.10,955.15,4555,578
RTY,20200401,10:18,954.95,959.00,954.95,958.55,5121,951
RTY,20200401,12:19,958.50,960.00,956.50,959.20,3944,753
RTY,20200401,15:20,959.30,962.55,958.25,959.35,7071,258
RTY,20200401,15:30,960.00,960.00,956.15,956.15,2991,89
RTY,20200401,15:40,955.25,955.90,953.90,954.65,3812,574
RTY,20200401,16:00,955.25,955.90,953.90,954.65,3812,46
RTY,20200401,17:00,954.65,956.00,954.00,955.05,2775,654
RTY,20200401,18:00,954.65,956.00,954.00,955.05,2775,259
RTY,20200402,07:15,964.80,964.80,964.80,964.80,456,71
RTY,20200402,08:15,964.80,964.80,964.80,964.80,456,359
RTY,20200402,09:01,964.80,964.80,964.80,964.80,456,452
RTY,20200402,09:05,964.80,964.80,964.80,964.80,456,256
RTY,20200402,09:15,964.80,964.80,964.80,964.80,456,96
RTY,20200402,09:18,964.80,964.80,951.25,956.20,4587,754
RTY,20200402,09:55,956.20,957.25,953.10,955.15,4555,145
RTY,20200402,10:28,954.95,959.00,954.95,958.55,5121,252
RTY,20200402,12:49,958.50,960.00,956.50,959.20,3944,59
RTY,20200402,15:25,959.30,962.55,958.25,959.35,7071,745
RTY,20200402,15:30,960.00,960.00,956.15,956.15,2991,352
RTY,20200402,15:45,955.25,955.90,953.90,954.65,3812,621
RTY,20200401,16:00,950.25,959.90,950.90,951.65,3812,25
RTY,20200402,17:55,954.65,956.00,954.00,955.05,2775,48
RTY,20200402,18:00,954.65,956.00,954.00,955.05,2775,100
Here's a method that converts the strings to integers and then filters by value:

    with open('example.txt', 'r') as file_handle:
        example_file_content = file_handle.read().split("\n")

    for line in example_file_content:
        if not line:
            continue  # skip blank lines such as a trailing newline
        line_as_list = line.split(",")
        # Delete all rows which lie outside time period 09:01 to 16:00
        # (note: this compares only the hour, so 09:00 and later times within the 16:00 hour also pass)
        if not (int(line_as_list[2].split(":")[0]) < 9 or int(line_as_list[2].split(":")[0]) > 16):
            print(line)

A better approach is to convert the column to datetime:

    import datetime

    with open('example.txt', 'r') as file_handle:
        example_file_content = file_handle.read().split("\n")

    for line in example_file_content:
        if not line:
            continue  # skip blank lines
        line_as_list = line.split(",")
        # Delete all rows which lie outside time period 09:01 to 16:00
        if not ((datetime.datetime.strptime(line_as_list[2], '%H:%M') < datetime.datetime.strptime("09:01", '%H:%M'))
                or (datetime.datetime.strptime(line_as_list[2], '%H:%M') > datetime.datetime.strptime("16:00", '%H:%M'))):
            print(line)
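Since the question mentions dtypes, a pandas version may also be convenient. A minimal sketch, assuming the data is in example.txt with no header row; the zero-padded hh:mm strings compare correctly as plain strings:

    import pandas as pd

    df = pd.read_csv("example.txt", header=None)

    # Column index 2 holds the hh:mm time; keep rows between 09:01 and 16:00 inclusive
    mask = (df[2] >= "09:01") & (df[2] <= "16:00")
    df[mask].to_csv("filtered.txt", header=False, index=False)

The filtered.txt output name is arbitrary; overwrite the original file only once you are happy with the result.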
Elasticsearch: store time series data when there are many property fields (about 60000 fields)
I need to store the values of 60000 points every 500 milliseconds in Elasticsearch. There are two ways to design the Elasticsearch mapping:

1. One pointName and one timestamp per doc, like this:

_id    timestamp      pointName  value
uuid1  1582130490000  p1         1
uuid2  1582130490000  p2         2
...
uuid   1582130490500  p1         x

2. One timestamp with 60000 pointNames in one doc, like this:

_id    timestamp      p1  p2  p3  p4  ...  p60000
uuid1  1582130490000  1   2   3   4   ...  x
uuid2  1582130490500  1   2   3   4   ...  x

I have tested write/query performance both ways. Aggregate query performance is not a problem, but batch write performance is too slow: it takes about 1500 milliseconds to write 10000 points with 3 Elasticsearch clusters. Does Elasticsearch have bad write performance with over 1000 fields? How should I design the Elasticsearch mapping to store 60000 points every 500 milliseconds with good write performance? Thanks.
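For illustration only (this is not from the original question), the two designs correspond to documents shaped roughly like the following; the field names mirror the tables above, and how the documents are sent (e.g. via a bulk request) is omitted:

    # Design 1: one point per document, many small documents per timestamp
    doc_per_point = {
        "timestamp": 1582130490000,
        "pointName": "p1",
        "value": 1,
    }

    # Design 2: one very wide document per timestamp, one field per point
    doc_per_timestamp = {
        "timestamp": 1582130490000,
        **{f"p{i}": 0 for i in range(1, 60001)},  # p1 ... p60000 (placeholder values)
    }

Design 2 puts 60000 fields into a single mapping, well above Elasticsearch's default index.mapping.total_fields.limit of 1000, so that limit would have to be raised explicitly.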
SUMIF with date range for specific column
I've been trying to find an answer for this but haven't succeeded. I need to sum a column for a specified date range, as long as my row name matches the reference sheet's column name, i.e.

Reference_Sheet
Date      John  Matt
07/01/19  1     2
07/02/19  1     2
07/03/19  2     1
07/04/19  1     1
07/05/19  3     3
07/06/19  1     2
07/07/19  1     1
07/08/19  5     9
07/09/19  9     2

Sheet1
    A      B
1   07/01
2   07/07
3   Week1
4   John   10
5   Matt   12

This has to work in Google Sheets. I tried using SUMPRODUCT, which told me I can't multiply texts, and I tried SUMIFS, which let me know I can't have different array arguments. My failed efforts were similar to the ones below:

    =SUMIFS('Reference_Sheet'!B2:AO1000,'Reference_Sheet'!A1:AO1,"=A4",'Reference_Sheet'!A2:A1000,">=B1",'Reference_Sheet'!A2:A1000,"<=B2")

    =SUMPRODUCT(('Reference_Sheet'!$A$2:$AO$1000)*('Reference_Sheet'!$A$2:$A$1000>=B$1)*('Reference_Sheet'!$A$2:$A$1000<=B$2)*('Reference_Sheet'!$A$1:$AO$1=$A4))
This might work:

    =sumifs(indirect("Reference_Sheet!"&address(2,match(A4,Reference_Sheet!A$1:AO$1,0))&":"&address(100,match(A4,Reference_Sheet!A$1:AO$1,0))),Reference_Sheet!A$2:A$100,">="&B$1,Reference_Sheet!A$2:A$100,"<="&B$2)

But you'll need to specify how many rows down it should look. In my formula, it looks down to row 100. To change the number of rows, you need to change the number in three places: in &address(100 and in Reference_Sheet!A$2:A$100, which appears in two places. To briefly explain what is going on: the formula looks for the person's name in row 1 using MATCH, uses ADDRESS and INDIRECT to build the range of cells to add, and then applies SUMIFS based on the dates.
An alternative:

    =SUMPRODUCT(QUERY(TRANSPOSE(QUERY($A:$D, "where A >= date '"&TEXT(F$1, "yyyy-mm-dd")&"' and A <= date '"&TEXT(F$2, "yyyy-mm-dd")&"'", 1)), "where Col1 = '"&$E4&"'", 0))
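If it helps to see the logic outside of Sheets, here is a rough pandas equivalent of the same idea (an illustration only; the data and the date window mirror the example above):

    import pandas as pd

    ref = pd.DataFrame({
        "Date": pd.to_datetime(
            ["07/01/19", "07/02/19", "07/03/19", "07/04/19", "07/05/19",
             "07/06/19", "07/07/19", "07/08/19", "07/09/19"],
            format="%m/%d/%y"),
        "John": [1, 1, 2, 1, 3, 1, 1, 5, 9],
        "Matt": [2, 2, 1, 1, 3, 2, 1, 9, 2],
    })

    in_range = ref["Date"].between(pd.Timestamp("2019-07-01"), pd.Timestamp("2019-07-07"))

    # Sum the column whose header matches the row name, over the date window
    print(ref.loc[in_range, "John"].sum())  # 10
    print(ref.loc[in_range, "Matt"].sum())  # 12

Both Sheets formulas above do the same two things: restrict the rows to the date window and pick out the column matching the name.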
How to reference a specific Group Total of one dataset in an expression in a different dataset
I have a report with multiple datasets. In one of them I need to reference the group total from another dataset. It looks like this:

Tablix1:
Region1  Total  Age1  Age2
a        7      5     2
b        12     6     6
c        20     12    8
Total    39     23    16

Tablix2:
Region2  Value  %
a        4      57.14%
b        6      50.00%
c        5      25.00%

The values in the "%" column of Tablix2 come from the formula %a = Tablix2 Value a / Tablix1 Total a. My current expression in the % column of Tablix2 is:

    =CountDistinct(Fields!ID.Value, "Region2")/CountDistinct(Fields!CONSTITUENT_ID.Value, "Tablix1")

but what I get is the percentage calculated against the Total row of Tablix1, not against each region of Tablix1.
The Lookup function would work for this. It's similar to a VLOOKUP in Excel. It would look something like this:

    =Lookup(Fields!Region1.Value, Fields!Region2.Value, Fields!ID.Value, "Region2")

This would pull the corresponding value from Region2 into Tablix1. You can just switch it around if you want it in the other table.