Pandas timeseries: specifying the time unit
I have unevenly timestamped data that I would like to process in Pandas. The timestamps are in milliseconds starting at 0, so the data looks like:
Timestamp Property
0 1
1 2
2 3
4 4
10 4
19 7
I have a very basic question: I can create a pd.Series object with the Timestamp column as the index. But how does Pandas know whether my timestamps are in milliseconds, or for that matter in seconds or hours?
Pandas assumes integer timestamps are in nanoseconds. If your column is in milliseconds, convert it with df['col'] = df['col'].astype('datetime64[ms]'). Note that the resulting column will still be stored in nanoseconds.
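If you would rather state the unit explicitly, pandas can be told directly. A minimal sketch, assuming the data above has been loaded into a DataFrame named df with columns Timestamp and Property:

    import pandas as pd

    df = pd.DataFrame({"Timestamp": [0, 1, 2, 4, 10, 19],
                       "Property": [1, 2, 3, 4, 4, 7]})

    # Interpret the integers as milliseconds since the Unix epoch...
    df.index = pd.to_datetime(df["Timestamp"], unit="ms")

    # ...or, since they start at 0, as elapsed time (a TimedeltaIndex)
    df.index = pd.to_timedelta(df["Timestamp"], unit="ms")

Either way the unit is spelled out instead of relying on the nanosecond default.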
Related
How to find difference between two timestamps in Talend
I am new to Talend. I want to find the difference between two timestamps. I have two columns, start_time and end_time, and I want to build a table in the destination that shows the difference between the two timestamps, specifically as hours, minutes and seconds. I also want the time as a timestamp, not in long format. How can I achieve this?

start_time: 2021-06-18 08:27:52.000000
end_time:   2021-06-18 08:29:59.000000

What I tried: creating a variable 'ms' of long type in tMap:

    TalendDate.diffDate(row181.start_time, row181.end_time, "mm")

and converting it to hh:mm:ss with:

    String.format("%02d:%02d:%02d.%d", (Var.ms / (1000 * 60 * 60)) % 24, (Var.ms / (1000 * 60)) % 60, (Var.ms / 1000) % 60, Var.ms % 1000)

If I make the destination column a string, I get this error: column "call_duration" is of type bigint but expression is of type character varying. The tMap expression above also returns zero. And I have to use long as the destination column type, but I want a date type.
Pattern "MM" refers to months, not minutes; use "mm" instead. How could you return a date type for a difference between two dates? The result is necessarily a number (long/double...). If you want your output as hours/minutes/seconds, you should use diffDate with the "ss" pattern to get a long representing the duration in seconds. Then you'll have to transform that to get hours and minutes (e.g. 3700 s would give you 1 hour, 1 minute, 40 seconds). You also have to decide what kind of output you want (one column for each, a string concatenating hours/minutes/seconds...). Example: with row1.diffDate being your diffDate in seconds as input to a tMap, you could split it into three different columns; then you only have to concatenate the values into a string with a ":" separator if you want a string output.
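The hours/minutes/seconds split described above is just integer division and modulo. A minimal Python sketch of that arithmetic (illustration only, not Talend code):

    def split_duration(total_seconds):
        # e.g. 3700 s -> (1, 1, 40): 1 hour, 1 minute, 40 seconds
        hours, remainder = divmod(total_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        return hours, minutes, seconds

    print("%02d:%02d:%02d" % split_duration(3700))  # 01:01:40

In Talend, the same divisions and modulos can be written in a tMap expression or a routine.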
Delete rows from csv/txt file by filtering on the basis of time
How can I delete rows from a csv/txt file by filtering on the basis of time? I want to delete all rows which lie outside the time period 09:01 to 16:00 (Column 3). Column 3 contains only time in hh:mm format, whereas Column 2 contains only the date (dtype int64). There are no headers, and the time dtype is object. I am able to filter based on other columns but not able to deal with time. My data looks like this:

RTY,20200401,07:10,964.80,964.80,964.80,964.80,456,20
RTY,20200401,08:15,964.80,964.80,964.80,964.80,456,250
RTY,20200401,09:00,964.80,964.80,964.80,964.80,456,155
RTY,20200401,09:01,964.80,964.80,964.80,964.80,456,10
RTY,20200401,09:05,964.80,964.80,964.80,964.80,456,63
RTY,20200401,09:16,964.80,964.80,951.25,956.20,4587,159
RTY,20200401,09:17,956.20,957.25,953.10,955.15,4555,578
RTY,20200401,10:18,954.95,959.00,954.95,958.55,5121,951
RTY,20200401,12:19,958.50,960.00,956.50,959.20,3944,753
RTY,20200401,15:20,959.30,962.55,958.25,959.35,7071,258
RTY,20200401,15:30,960.00,960.00,956.15,956.15,2991,89
RTY,20200401,15:40,955.25,955.90,953.90,954.65,3812,574
RTY,20200401,16:00,955.25,955.90,953.90,954.65,3812,46
RTY,20200401,17:00,954.65,956.00,954.00,955.05,2775,654
RTY,20200401,18:00,954.65,956.00,954.00,955.05,2775,259
RTY,20200402,07:15,964.80,964.80,964.80,964.80,456,71
RTY,20200402,08:15,964.80,964.80,964.80,964.80,456,359
RTY,20200402,09:01,964.80,964.80,964.80,964.80,456,452
RTY,20200402,09:05,964.80,964.80,964.80,964.80,456,256
RTY,20200402,09:15,964.80,964.80,964.80,964.80,456,96
RTY,20200402,09:18,964.80,964.80,951.25,956.20,4587,754
RTY,20200402,09:55,956.20,957.25,953.10,955.15,4555,145
RTY,20200402,10:28,954.95,959.00,954.95,958.55,5121,252
RTY,20200402,12:49,958.50,960.00,956.50,959.20,3944,59
RTY,20200402,15:25,959.30,962.55,958.25,959.35,7071,745
RTY,20200402,15:30,960.00,960.00,956.15,956.15,2991,352
RTY,20200402,15:45,955.25,955.90,953.90,954.65,3812,621
RTY,20200401,16:00,950.25,959.90,950.90,951.65,3812,25
RTY,20200402,17:55,954.65,956.00,954.00,955.05,2775,48
RTY,20200402,18:00,954.65,956.00,954.00,955.05,2775,100
Here's a method that converts the strings to integers and then filters by value:

    with open('example.txt', 'r') as file_handle:
        example_file_content = file_handle.read().split("\n")

    for line in example_file_content:
        if not line:
            continue  # skip blank lines such as a trailing newline
        line_as_list = line.split(",")
        # Delete all rows which lie outside time period 09:01 to 16:00
        # (note: this compares only the hour, so 09:00 and later times within the 16:00 hour also pass)
        if not (int(line_as_list[2].split(":")[0]) < 9 or int(line_as_list[2].split(":")[0]) > 16):
            print(line)

A better approach is to convert the column to datetime:

    import datetime

    with open('example.txt', 'r') as file_handle:
        example_file_content = file_handle.read().split("\n")

    for line in example_file_content:
        if not line:
            continue  # skip blank lines
        line_as_list = line.split(",")
        # Delete all rows which lie outside time period 09:01 to 16:00
        if not ((datetime.datetime.strptime(line_as_list[2], '%H:%M') < datetime.datetime.strptime("09:01", '%H:%M'))
                or (datetime.datetime.strptime(line_as_list[2], '%H:%M') > datetime.datetime.strptime("16:00", '%H:%M'))):
            print(line)
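Since the question mentions dtypes, a pandas version may also be convenient. A minimal sketch, assuming the data is in example.txt with no header row; the zero-padded hh:mm strings compare correctly as plain strings:

    import pandas as pd

    df = pd.read_csv("example.txt", header=None)

    # Column index 2 holds the hh:mm time; keep rows between 09:01 and 16:00 inclusive
    mask = (df[2] >= "09:01") & (df[2] <= "16:00")
    df[mask].to_csv("filtered.txt", header=False, index=False)

The filtered.txt output name is arbitrary; overwrite the original file only once you are happy with the result.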
Elasticsearch: store time series data when there are many property fields (about 60000 fields)
I need to store the values of 60000 points every 500 milliseconds in Elasticsearch. There are two ways to design the Elasticsearch mapping:

1. One pointName and one timestamp per doc, like this:

_id    timestamp      pointName  value
uuid1  1582130490000  p1         1
uuid2  1582130490000  p2         2
...
uuid   1582130490500  p1         x

2. One timestamp with 60000 pointNames in one doc, like this:

_id    timestamp      p1  p2  p3  p4  ...  p60000
uuid1  1582130490000  1   2   3   4   ...  x
uuid2  1582130490500  1   2   3   4   ...  x

I have tested write/query performance both ways. Aggregate query performance is not a problem, but batch write performance is too slow: it takes about 1500 milliseconds to write 10000 points with 3 Elasticsearch clusters. Does Elasticsearch have bad write performance with over 1000 fields? How should I design the Elasticsearch mapping to store 60000 points every 500 milliseconds with good write performance? Thanks.
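For illustration only (this is not from the original question), the two designs correspond to documents shaped roughly like the following; the field names mirror the tables above, and how the documents are sent (e.g. via a bulk request) is omitted:

    # Design 1: one point per document, many small documents per timestamp
    doc_per_point = {
        "timestamp": 1582130490000,
        "pointName": "p1",
        "value": 1,
    }

    # Design 2: one very wide document per timestamp, one field per point
    doc_per_timestamp = {
        "timestamp": 1582130490000,
        **{f"p{i}": 0 for i in range(1, 60001)},  # p1 ... p60000 (placeholder values)
    }

Design 2 puts 60000 fields into a single mapping, well above Elasticsearch's default index.mapping.total_fields.limit of 1000, so that limit would have to be raised explicitly.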
SUMIF with date range for specific column
I've been trying to find an answer for this but haven't succeeded. I need to sum a column for a specified date range, as long as my row name matches the reference sheet's column name, i.e.

Reference_Sheet
Date      John  Matt
07/01/19  1     2
07/02/19  1     2
07/03/19  2     1
07/04/19  1     1
07/05/19  3     3
07/06/19  1     2
07/07/19  1     1
07/08/19  5     9
07/09/19  9     2

Sheet1
    A      B
1   07/01
2   07/07
3   Week1
4   John   10
5   Matt   12

This has to work in Google Sheets. I tried using SUMPRODUCT, which told me I can't multiply texts, and I tried SUMIFS, which let me know I can't have different array arguments. My failed efforts were similar to the ones below:

    =SUMIFS('Reference_Sheet'!B2:AO1000,'Reference_Sheet'!A1:AO1,"=A4",'Reference_Sheet'!A2:A1000,">=B1",'Reference_Sheet'!A2:A1000,"<=B2")

    =SUMPRODUCT(('Reference_Sheet'!$A$2:$AO$1000)*('Reference_Sheet'!$A$2:$A$1000>=B$1)*('Reference_Sheet'!$A$2:$A$1000<=B$2)*('Reference_Sheet'!$A$1:$AO$1=$A4))
This might work:

    =sumifs(indirect("Reference_Sheet!"&address(2,match(A4,Reference_Sheet!A$1:AO$1,0))&":"&address(100,match(A4,Reference_Sheet!A$1:AO$1,0))),Reference_Sheet!A$2:A$100,">="&B$1,Reference_Sheet!A$2:A$100,"<="&B$2)

But you'll need to specify how many rows down it should look. In my formula, it looks down to row 100. To change the number of rows, you need to change the number in three places: in &address(100 and in Reference_Sheet!A$2:A$100, which appears in two places. To briefly explain what is going on: the formula looks for the person's name in row 1 using MATCH, uses ADDRESS and INDIRECT to build the range of cells to add, and then applies SUMIFS based on the dates.
An alternative:

    =SUMPRODUCT(QUERY(TRANSPOSE(QUERY($A:$D, "where A >= date '"&TEXT(F$1, "yyyy-mm-dd")&"' and A <= date '"&TEXT(F$2, "yyyy-mm-dd")&"'", 1)), "where Col1 = '"&$E4&"'", 0))
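If it helps to see the logic outside of Sheets, here is a rough pandas equivalent of the same idea (an illustration only; the data and the date window mirror the example above):

    import pandas as pd

    ref = pd.DataFrame({
        "Date": pd.to_datetime(
            ["07/01/19", "07/02/19", "07/03/19", "07/04/19", "07/05/19",
             "07/06/19", "07/07/19", "07/08/19", "07/09/19"],
            format="%m/%d/%y"),
        "John": [1, 1, 2, 1, 3, 1, 1, 5, 9],
        "Matt": [2, 2, 1, 1, 3, 2, 1, 9, 2],
    })

    in_range = ref["Date"].between(pd.Timestamp("2019-07-01"), pd.Timestamp("2019-07-07"))

    # Sum the column whose header matches the row name, over the date window
    print(ref.loc[in_range, "John"].sum())  # 10
    print(ref.loc[in_range, "Matt"].sum())  # 12

Both Sheets formulas above do the same two things: restrict the rows to the date window and pick out the column matching the name.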
How to reference a specific Group Total of one dataset in an expression in a different dataset
I have a report with multiple datasets. In one of them I need to reference the group total from another dataset. It looks like this:

Tablix1:
Region1  Total  Age1  Age2
a        7      5     2
b        12     6     6
c        20     12    8
Total    39     23    16

Tablix2:
Region2  Value  %
a        4      57.14%
b        6      50.00%
c        5      25.00%

The values in the "%" column of Tablix2 come from the formula %a = Tablix2 Value a / Tablix1 Total a. My current expression in the % column of Tablix2 is:

    =CountDistinct(Fields!ID.Value, "Region2")/CountDistinct(Fields!CONSTITUENT_ID.Value, "Tablix1")

but what I get is the percentage calculated against the Total row of Tablix1, not against each region of Tablix1.
The Lookup function would work for this. It's similar to a VLOOKUP in Excel. It would look something like this:

    =Lookup(Fields!Region1.Value, Fields!Region2.Value, Fields!ID.Value, "Region2")

This would pull the corresponding value from Region2 into Tablix1. You can just switch it around if you want it in the other table.