How does one convert a string/varchar to a timestamp in MonetDB?
Like this, but with millisecond precision (to six decimal places, ideally):
sql>select str_to_date('2008-09-19-18.40.09.812000', '%Y-%m-%d-%H.%M.%6S');
+--------------------------+
| str_to_date_single_value |
+==========================+
| 2008-09-19 |
+--------------------------+
1 tuple (0.312ms)
I'm not sure whether str_to_date is built in or whether I created it ages ago and forgot.
create function str_to_date(s string, format string) returns date
external name mtime."str_to_date";
Edit: expected output something like
+---------------------------------+
| str_to_timestamp_single_value |
+=================================+
| 2008-09-19 18:40:09.812000 |
+---------------------------------+
MonetDB time conversion functions are listed in:
[Monetdb installation folder]\MonetDB5\lib\monetdb5\createdb\13_date.sql.
Besides the str_to_date function, there is a str_to_timestamp function.
The syntax of the format string follows the MySQL one.
Example:
select sys.str_to_timestamp('2016-02-04 15:30:29', '%Y-%m-%d %H:%M:%S');
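If the format string really follows MySQL's conventions, the question's input would presumably parse along these lines (an untested sketch; %i, %s, and %f are MySQL's minute, second, and fractional-second specifiers, and fractional-second support may vary by MonetDB version):
select sys.str_to_timestamp('2008-09-19-18.40.09.812000', '%Y-%m-%d-%H.%i.%s.%f');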
The date/time specifiers might need to be changed:
select str_to_date('2008-09-19-18.40.09.812000','%Y-%m-%d-%H.%i.%s.%f')
output:
2008-09-19 18:40:09.812000
*MonetDB could be different, although these are the standard MySQL date specifiers (%i for minutes, %s for seconds, %f for microseconds).
You could also use date_format in addition to str_to_date:
select date_format(str_to_date('SEP 19 2008 06:40:09:812PM','%M %D %Y %h:%i:%s:%f%p'),'%Y-%m-%d-%H.%i.%s.%f');
output:
2008-09-19-18.40.09.812000
Background Story
I have an Excel table of values with thousands separator . and decimal separator ,.
If the number is lower than 1000, only the , exists. In UiPath, I'm using a Read Range and store the data in a data table. Somehow, UiPath manages to replace the , with a . because it interprets the value as a float. But this only happens to values lower than 1000. Larger numbers are interpreted as strings and all the separators stay the same.
Example:
+───────────+───────────────+─────────+
| Input | UiPath Value | Type |
+───────────+───────────────+─────────+
| 4.381,14 | 4.381,14 | String |
| 5.677,50 | 5.677,50 | String |
| 605,27 | 605.27 | Double |
+───────────+───────────────+─────────+
Problem
I want to loop through the data table and apply some logic to each value. Because of the different data types, I assign the value to a GenericValue variable. It is a huge problem that the , is automatically replaced by a ., because in my context this is a completely different value. Therefore I somehow need to check the data type, so I can replace the separator again.
Attempt
I'm trying to get the type with GetType().ToString(), but it only returns: UiPath.Core.GenericValue
I tried to replicate this, and I successfully converted the value to a double. I took one value and followed the steps below:
strValue = dt(0)(0).ToString.Replace(".","$") ' protect the thousands separator "."
strValue = strValue.Replace(",",".") ' turn the decimal comma into a decimal point
strValue = strValue.Replace("$",",") ' turn the former thousands "." into ","
dblValue = CDbl(strValue) ' "4.381,14" has become "4,381.14", which CDbl parses under an English locale
In UiPath, when we read data from Excel, the cell values are treated as generic objects, so we explicitly convert them to String first.
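Alternatively (a sketch, not part of the original answer), .NET can parse the German-style number directly when given a matching culture, since Double.Parse allows group separators by default:
' Hypothetical one-step alternative: the de-DE culture uses "." for thousands
' and "," for decimals, so "4.381,14" parses straight to 4381.14.
dblValue = Double.Parse(dt(0)(0).ToString, New System.Globalization.CultureInfo("de-DE"))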
I did a simple proc freq in SAS:
PROC FREQ DATA=test;
TABLES a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From "ERROR: Insufficient page size to print frequency table in SAS PROC FREQ" I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange, with an extremely tall header for column b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT b $7.;
TABLES a * b;
RUN;
Now my result looks like this, and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewildered, because nowhere in the code did I apply a format to b before, just a length statement. Other variables that had their lengths fixed did not have this problem. I did switch from an Excel source file to Oracle Exadata as the source. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an Excel file. So if a character variable has a storage length of 200, SAS will also attach the $200. format to the variable.
When you combine two datasets that both contain the same variable, the length will be set by the first version of the variable seen, but the format attached will be set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the $200. format attached, and the result will be a variable with an actual length of 10 but the $200. format attached.
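For example, a minimal sketch of that behavior:
data one;
length a $10; /* no format attached */
a = 'short';
run;
data two;
length a $200;
format a $200.; /* format attached, as a database import would do */
a = 'also here';
run;
data combined;
set one two; /* a: length 10 (first seen), format $200. (first non-empty) */
run;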
You can use a FORMAT statement that lists the variable names with no format specification to remove them. You could do it in the PROC step.
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a DATA step, or use PROC DATASETS to modify the formats attached to the variables in an existing dataset.
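For instance, a sketch with PROC DATASETS (assuming the dataset lives in the WORK library):
proc datasets lib=work nolist;
modify test;
format _character_; /* no format specification: removes the attached formats */
quit;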
I'm using ClickHouse for the first time, and when I'm doing an import like this:
cat /home/data/_XDR_IMPORT_1001_20001010_000001_.tsv | clickhouse-client --password=123 --query="INSERT INTO ts FORMAT TSV";
It gives me an error:
Column 13, name: dpc, type: Nullable(Int32), parsed text: "0"
ERROR: garbage after Nullable(Int32): "3242"
And this is because I have a column (dpc) of type Int32 and the value of this column is 03242, so it seems the import process takes only the 0 and then expects a tab after it.
Can anyone help, please?
OK, you can use the following command:
sed -E "s/(\t+)0([0-9]+)/\1\2/g" /home/data/_XDR_IMPORT_1001_20001010_000001_.tsv | clickhouse-client --password=123 --query="INSERT INTO ts FORMAT TSV";
and hope the first column doesn't contain leading zeros ;)
Change the dpc field to String and add a new column:
ALTER TABLE ts
ADD COLUMN dpc_int UInt64 MATERIALIZED toUInt64(dpc);
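Note that a MATERIALIZED column is filled in automatically on insert and is not included in SELECT *, so query it explicitly, for example:
SELECT dpc, dpc_int FROM ts LIMIT 5;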
For timestamps I am using the ISO format with zone ID included, defined as:
yyyy-MM-dd'T'HH:mm:ss.SSS'Z['z']'
This format for example matches these two timestamps:
2015-02-20T09:46:56.336Z[UTC]
2015-02-20T10:46:55.221+01:00[Europe/Berlin]
For indexing data into elasticsearch I also defined this date format in a mapping like this (using elastic4s DSL):
create index indexName mappings {
"/exampleType" as (
"exampleField" typed DateType
) dynamicDateFormats "yyyy-MM-dd'T'HH:mm:ss.SSS'Z['z']'"
}
Basically this mapping works as expected, but I experience problems when the formatted date string gets too long due to the zone ID. E.g. the above example
2015-02-20T09:46:56.336Z[UTC]
having 30 chars works fine, whereas
2015-02-20T10:46:55.221+01:00[Europe/Berlin]
having 44 chars fails to index with following error:
...
Caused by: java.io.IOException: Cannot read numeric data larger than 32 chars
at org.elasticsearch.index.analysis.NumericTokenizer.incrementToken(NumericTokenizer.java:78) ~[elasticsearch-1.4.2.jar:na]
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:618) ~[lucene-core-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:51:56]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) ~[lucene-core-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:51:56]
...
My question is whether there's a way to get around this problem, e.g. by means of configuration, or if I am forced to change my date format to ensure that formatted dates do not exceed 32 chars.
I have a string '20141014123456789' which represents a timestamp with milliseconds that I need to convert to a timestamp in Hive (0.13.0) without losing the milliseconds.
I tried this but unix_timestamp returns an integer, so I lose the milliseconds:
from_unixtime(unix_timestamp('20141014123456789', 'yyyyMMddHHmmssSSS')) >> 2014-10-14 12:34:56
Casting a string works:
cast('2014-10-14 12:34:56.789' as timestamp) >> 2014-10-14 12:34:56.789
but my string isn't in that form.
I think I need to reformat my string from '20141014123456789' to '2014-10-14 12:34:56.789'. My challenge is how to do that without a messy concatenation of substrings.
I found a way to avoid the messy concatenation of substrings using the following code:
select cast(regexp_replace('20141014123456789',
'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})',
'$1-$2-$3 $4:$5:$6.$7') as timestamp)
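Each \\d{...} group captures one fixed-width component of the 17-digit string, and the replacement reassembles them into Hive's default yyyy-MM-dd HH:mm:ss.SSS layout, which the outer cast can then convert directly.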
A simple strategy would be to use date_format(arg1, arg2), where arg1 is the timestamp (as a formatted string, date, or timestamp) and arg2 is the format of the string in arg1. Refer to the SimpleDateFormat Java documentation for what is acceptable in the format argument.
So, in this case:
date_format('20141014123456789', 'yyyyMMddHHmmssSSS')
would yield the following string: '2014-10-14 12:34:56.789' which can then be cast as timestamp:
cast(date_format('20141014123456789', 'yyyyMMddHHmmssSSS') as timestamp)
The above statement would return timestamp (as desired).
I had the date field in this form: 2015-07-22T09:00:32.956443Z (stored as a string). I needed to do some date manipulations.
The following command, even though a little messy, worked fine for me :)
select cast(concat(concat(substr(date_created,1,10),' '),substr(date_created,12,15)) as timestamp) from tablename;
This looks confusing, but it is quite easy if you break it down:
extract the date and the time (with milliseconds), concat a space in between, concat the whole thing, and cast it to a timestamp. Now this can be used for date or timestamp manipulations.
Let's say you have a column birth_date in your table which is in string format. You should use the following expression to filter using birth_date:
date_format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS')
You can use it in a query in the following way:
select * from yourtable
where
date_format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS') = '2019-04-16 07:12:59999';
I don't think this can be done without being messy, because according to the unix_timestamp() function documentation it returns the time in seconds and hence will omit the milliseconds part:
"Convert time string with given pattern to Unix time stamp (in seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400."
The best option here would be to write a UDF to handle this if you want to avoid messy concatenations. However, the concatenation (though messy) would be the more practical way to do the job.
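If you do go the UDF route, a minimal sketch could look like this (hypothetical class and function names; it uses the old-style UDF API available in Hive 0.13, and SimpleDateFormat parses the adjacent fixed-width fields while keeping the milliseconds):
package com.example.hive; // hypothetical package

import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;

public class ToTimestampMillis extends UDF {
    // Turn '20141014123456789' (yyyyMMddHHmmssSSS) into a timestamp,
    // preserving the millisecond part.
    public Timestamp evaluate(String s) throws Exception {
        if (s == null) return null;
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmssSSS");
        return new Timestamp(fmt.parse(s).getTime());
    }
}
After ADD JAR and CREATE TEMPORARY FUNCTION (e.g. as to_ts_millis), it could be called as to_ts_millis('20141014123456789').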