Data import: integer starting with zero - ClickHouse

I'm using ClickHouse for the first time, and when I import data like this:
cat /home/data/_XDR_IMPORT_1001_20001010_000001_.tsv | clickhouse-client --password=123 --query="INSERT INTO ts FORMAT TSV";
It gives me an error:
Column 13, name: dpc, type: Nullable(Int32), parsed text: "0"
ERROR: garbage after Nullable(Int32): "3242"
This happens because the dpc column is of type Int32 and its value is 03242. It seems the import process parses only the 0 and then expects a tab right after it.
Can anyone help?

OK, you can use the following command:
sed -E "s/(\t+)0+([0-9]+)/\1\2/g" /home/data/_XDR_IMPORT_1001_20001010_000001_.tsv | clickhouse-client --password=123 --query="INSERT INTO ts FORMAT TSV";
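and hope the first column doesn't contain leading zeros ;) The pattern only matches fields that follow a tab, so the first column is left untouched, and it strips leading zeros from every later field that starts with 0 followed by digits, not just dpc. (\t inside the regex is a GNU sed extension.)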
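If you would rather clean only the dpc column instead of every matching field, a small Python filter placed between cat and clickhouse-client can do it. This is a minimal sketch, not a ClickHouse feature: the function names are hypothetical, and you have to supply dpc's 0-based position in the file yourself (check your table layout rather than relying on the "Column 13" in the error message). ClickHouse's \N null marker and a lone 0 are left unchanged.

```python
import io
import sys
from typing import TextIO


def strip_leading_zeros(line: str, column: int) -> str:
    """Drop leading zeros from one tab-separated field, keeping a lone "0"."""
    fields = line.rstrip("\n").split("\t")
    value = fields[column]
    # Only touch purely numeric values; "\N" (NULL in TSV) and text pass through.
    if value.isdigit():
        fields[column] = value.lstrip("0") or "0"
    return "\t".join(fields) + "\n"


def filter_stream(src: TextIO, dst: TextIO, column: int) -> None:
    """Copy src to dst line by line, cleaning the given column."""
    for line in src:
        dst.write(strip_leading_zeros(line, column))


# Demo on an in-memory sample; column 1 plays the role of dpc here.
sample = io.StringIO("a\t03242\tx\nb\t0\ty\nc\t\\N\tz\n")
filter_stream(sample, sys.stdout, column=1)
```

In a real pipeline you would call filter_stream(sys.stdin, sys.stdout, <dpc position>) from a script and pipe its output into clickhouse-client.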

Change the dpc field to String and add a new column:
ALTER TABLE ts
ADD COLUMN dpc_int UInt64 MATERIALIZED toUInt64(dpc);


How to write a for loop and if statement to fill a specific value

I'm trying to fill a new column in my data frame based on a specific condition.
The From column is already of timedelta dtype.
for index in df:
    if From >= "07:00:00" & From <"15:00:00"
        df.shift="A"
I get a syntax error.
My dataframe looks like this (inferred from comment):
From,shift
00:00:00,None
00:30:00,None
01:00:00,None
01:30:00,None
02:00:00,None
[...]
Try adding ":" to the end of the "if" statement. It is missing in your example.
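Even with the colon added, the loop has further problems. & binds more tightly than >= and <, so each comparison needs its own parentheses. for index in df iterates over column names, not rows. And shift is an existing DataFrame method, so df.shift = "A" does not fill a column. A vectorised boolean mask avoids the loop altogether. Here is a minimal sketch, assuming pandas and a half-hourly sample like the one in the question (the 48-row range is made up for illustration):

```python
import pandas as pd

# Half-hourly sample shaped like the question's data (range is illustrative).
df = pd.DataFrame({"From": pd.timedelta_range(start="00:00:00", periods=48, freq="30min")})
df["shift"] = None

# Compare against Timedelta values and parenthesise each side of &,
# because & has higher precedence than >= and <.
mask = (df["From"] >= pd.Timedelta("07:00:00")) & (df["From"] < pd.Timedelta("15:00:00"))

# Use df.loc[..., "shift"] rather than df.shift: shift is a DataFrame method.
df.loc[mask, "shift"] = "A"
print(df.iloc[13:17])
```

Further shifts can be filled the same way with additional masks.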

Handle date column with spaces in NiFi

I am trying to format a date field from a CSV file in a ReplaceText processor. In the first record the date field contains only spaces, and the processor fails because it cannot parse that value as a date. How can I handle this?
Error message: Replace text failed to process session due to Cannot parse attribute value as a date; date format ddMMyyyy; attribute value:
Input csv:
1, , 123
2,02091997,234
Search value : (.{1}),(.{8}), (.{3})
Replacement value : $1, ${'$2':toDate("ddMMyyyy") :format("yyyy-MM-dd HH:mm:ss.SSS") }, $3
Replacement strategy : Regex Replace
Evaluation mode : Entire Text
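You can use the isEmpty and ifElse functions from the NiFi Expression Language. isEmpty returns true when the value is null, empty, or contains only whitespace, so the blank date field is caught.
For example: ${'$2':isEmpty():ifElse('null', '$2':toDate("ddMMyyyy"):format("yyyy-MM-dd HH:mm:ss.SSS")) }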
Here I put 'null' when there is no date, but you can choose any value you want.
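The expression's logic (blank field gives a placeholder, otherwise parse as ddMMyyyy and reformat) can be sketched outside NiFi to check the expected results. This is plain Python, not NiFi Expression Language; the function names and the simple comma split (no quoted fields) are assumptions for illustration:

```python
from datetime import datetime


def convert_date_field(raw: str, placeholder: str = "null") -> str:
    """Blank/whitespace-only -> placeholder; otherwise ddMMyyyy -> yyyy-MM-dd HH:mm:ss.SSS."""
    if not raw.strip():  # like isEmpty(): whitespace-only counts as empty
        return placeholder
    parsed = datetime.strptime(raw.strip(), "%d%m%Y")
    # strftime has no millisecond directive, so build the .SSS part by hand.
    return parsed.strftime("%Y-%m-%d %H:%M:%S") + ".{:03d}".format(parsed.microsecond // 1000)


def convert_line(line: str) -> str:
    """Rewrite the second comma-separated field of one CSV line."""
    fields = [f.strip() for f in line.split(",")]
    fields[1] = convert_date_field(fields[1])
    return ",".join(fields)


for row in ["1, , 123", "2,02091997,234"]:
    print(convert_line(row))
```

This prints 1,null,123 and 2,1997-09-02 00:00:00.000,234.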
However, if you can, parse your CSV files with a record-based processor using a CSVReader, which handles that out of the box.

Extract 2 fields from string with search

I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.
Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON.
More to the point, your input should probably be processed record by record, and my guess is that a two-column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since the data looks like mapping objects, and is in fact valid JSON, something like this should do if you don't mind using Python (which comes with JSON support):
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input into s and parse it as JSON into a dictionary d. Then we return a formatted string with the double-quoted id and hwVersion keys, each followed by a colon and the double-quoted value of that key from the dictionary.
We can try this with these test input strings and print the results:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
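For example, here is a minimal sketch that applies get_id_hw (repeated so the snippet runs on its own) to every non-blank line of a file-like object. The extract_all name is hypothetical; in practice you would pass an open file, e.g. open("/your/file"), instead of the in-memory sample:

```python
import io
import json


def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])


def extract_all(lines):
    """Apply get_id_hw to each non-blank line, one JSON record per line."""
    return [get_id_hw(line) for line in lines if line.strip()]


sample = io.StringIO(
    '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}\n'
    '{"id":"5555","name":"6666","hwVersion":"7777"}\n'
)
for result in extract_all(sample):
    print(result)
```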
If you really wanted to use awk, you could, but it's not the most robust or suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming the fields can be in any order, we use one regex to extract id and another to extract hwVersion, then print them in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ part would need to reflect that. Note that gensub is a GNU awk (gawk) extension.
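The same one-regex-per-field idea in Python's re module, as a sketch (the extract name is hypothetical). One difference worth knowing: when its pattern does not match, gawk's gensub returns the line unchanged, whereas here a missing field is detected and reported as None:

```python
import re

# One pattern per field, so the key order within a line does not matter.
ID_RE = re.compile(r'"id":"([0-9]+)"')
HW_RE = re.compile(r'"hwVersion":"([0-9]+)"')


def extract(line):
    """Return '"id":"...","hwVersion":"..."' or None if either field is missing."""
    i = ID_RE.search(line)
    h = HW_RE.search(line)
    if i is None or h is None:
        return None
    return '"id":"{}","hwVersion":"{}"'.format(i.group(1), h.group(1))


print(extract('{"hwVersion":"7777","name":"6666","id":"5555"}'))
```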
And for the fun of it, in sed (this preserves the order of the entries from the file):
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".
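In Python, re.findall gives an order-preserving variant of the sed approach: it returns every "id" or "hwVersion" pair in the order it appears. Unlike the sed command, which leaves a line unchanged unless it finds two such pairs, this sketch (extract_in_order is a hypothetical name) simply joins whatever pairs it finds:

```python
import re

# Either key followed by :"<digits>"; the group is non-capturing so
# findall returns the whole matched pair.
PAIR_RE = re.compile(r'"(?:id|hwVersion)":"[0-9]+"')


def extract_in_order(line):
    return ",".join(PAIR_RE.findall(line))


print(extract_in_order('{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'))
```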

Convert string to timestamp in MonetDB

How does one convert a string/varchar to a timestamp in MonetDB?
Like this, but keeping the fractional seconds (ideally to six decimal places, i.e. microseconds):
sql>select str_to_date('2008-09-19-18.40.09.812000', '%Y-%m-%d-%H.%M.%6S');
+--------------------------+
| str_to_date_single_value |
+==========================+
| 2008-09-19 |
+--------------------------+
1 tuple (0.312ms)
I'm not sure whether str_to_date is built in or whether I created it ages ago and forgot.
create function str_to_date(s string, format string) returns date
external name mtime."str_to_date";
Edit: the expected output is something like:
+---------------------------------+
| str_to_timestamp_single_value |
+=================================+
| 2008-09-19 18:40:09.812000 |
+---------------------------------+
MonetDB time conversion functions are listed in:
[MonetDB installation folder]\MonetDB5\lib\monetdb5\createdb\13_date.sql.
Besides the str_to_date function, there is a str_to_timestamp function.
The format string follows C strptime() conventions rather than MySQL's (note that %M means minutes here).
Example :
select sys.str_to_timestamp('2016-02-04 15:30:29', '%Y-%m-%d %H:%M:%S');
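Before running a format string in the database, it can help to check it against the sample value. Python's datetime.strptime uses C-style directives for these fields and adds %f for up to six fractional digits. Whether your MonetDB version accepts %f in str_to_timestamp is an assumption to verify against 13_date.sql and the MonetDB documentation:

```python
from datetime import datetime

# Format for '2008-09-19-18.40.09.812000': dashes and dots as in the sample,
# %f for the six fractional digits (Python directive; MonetDB support unverified).
FMT = "%Y-%m-%d-%H.%M.%S.%f"

parsed = datetime.strptime("2008-09-19-18.40.09.812000", FMT)
print(parsed)
```

This prints 2008-09-19 18:40:09.812000, matching the expected output in the question.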
The date/time specifiers might need to be changed:
select str_to_date('2008-09-19-18.40.09.812000','%Y-%m-%d-%H.%i.%s.%f')
output:
2008-09-19 18:40:09.812000
*MonetDB could be different; these are the MySQL date specifiers.
You could also use date_format in addition to str_to_date:
select date_format(str_to_date('SEP 19 2008 06:40:09:812PM','%M %D %Y %h:%i:%s:%f%p'),'%Y-%m-%d-%H.%i.%f');
output:
2008-09-19-18.40.812000

Reading format in Fortran 90

I have a huge file to read whose structure is:
[...]
(0,0,0,0,0): 5.00634e-33, 5.59393e-33, 6.24691e-33, 7.29338e-33,
(0,0,0,0,4): 7.77607e-33, 8.95879e-33, 9.65316e-33, 1.07434e-32,
(0,0,0,0,8): 1.20824e-32, 1.34983e-32, 1.49877e-32, 1.73061e-32,
(0,0,0,0,12): 1.919e-32, 2.15391e-32, 2.3996e-32, 2.67899e-32,
[...]
I'm interested in reading the values after ":". Which format should I use in the read statement in Fortran 90?
I've tried:
read(1,'("(",I6,",",I6,",",I6,",",I6,",",I6,"):",F10.4,F10.4,F10.4,F10.4)')idx1,idx2,idx3,idx4,idx5,dummy1,dummy2,dummy3,dummy4
but I get forrtl: severe (64): input conversion error.
Since the items don't appear to line up in columns, this is tricky to do with formats. I'd approach it this way:
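character(len=256) :: string   ! assumes lines are shorter than 256 characters
integer :: colon_pos
real :: real1, real2, real3, real4
read (55, '(A)') string
colon_pos = index (string, ":")
read (string(colon_pos+1:), *) real1, real2, real3, real4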
Read each line into a string, locate the colon, then use list-directed I/O to process the numeric values after the colon.
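The same split-at-the-colon idea, sketched in Python for comparison (the parse_line name is hypothetical). It also recovers the index tuple on the left of the colon, which the Fortran snippet skips:

```python
def parse_line(line):
    """Split '(i,j,k,l,m): v1, v2, ...,' into an index tuple and a list of floats."""
    head, _, tail = line.partition(":")
    indices = tuple(int(x) for x in head.strip().strip("()").split(","))
    # The trailing comma leaves an empty last field, which is skipped.
    values = [float(v) for v in tail.split(",") if v.strip()]
    return indices, values


print(parse_line("(0,0,0,0,4): 7.77607e-33, 8.95879e-33, 9.65316e-33, 1.07434e-32,"))
```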
