ClickHouse dataset import error: DB::ParsingException

I encountered an error while importing the sample data from https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz.
It appears to be a parse error. I don't understand why, in Row 4531, Column 7 is parsed as "0<TAB>??:?[?<0x03>U" rather than "0". Why is the '\t' parsed into the string?
First, create the table:
CREATE TABLE hits_NoPrimaryKey
(
`UserID` UInt32,
`URL` String,
`EventTime` DateTime
)
ENGINE = MergeTree
PRIMARY KEY tuple();
Second, import the data:
INSERT INTO hits_NoPrimaryKey SELECT
intHash32(c11::UInt64) AS UserID,
c15 AS URL,
c5 AS EventTime
FROM url('https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz')
WHERE URL != '';
↓ Progress: 4.37 million rows, 8.28 GB (18.25 thousand rows/s., 34.55 MB/s.) (0.0 CPU, 121.22 MB RAM)
0 rows in set. Elapsed: 239.650 sec. Processed 4.37 million rows, 8.28 GB (18.25 thousand rows/s., 34.55 MB/s.)
Received exception from server (version 22.10.1):
Code: 27. DB::Exception: Received from localhost:9000. DB::ParsingException. DB::ParsingException: Cannot parse input: expected '\t' before: 'c??m???\t115\t2668037917139250981\t0\t227\t105\thttp://yandsearch[filter=user_page=http://book-nika/nyurttunian/haberlandsearch&text=all&user_page-148564b4080_1280x1':
Row 4530:
Column 0, name: c1, type: Nullable(DateTime64(9)), parsed text: "7111657564365305139"
Column 1, name: c2, type: Nullable(Int64), parsed text: "1"
Column 2, name: c3, type: Nullable(String), parsed text: <EMPTY>
Column 3, name: c4, type: Nullable(Int64), parsed text: "1"
Column 4, name: c5, type: Nullable(DateTime64(9)), parsed text: "2014-03-17 20:56:03"
Column 5, name: c6, type: Nullable(Date), parsed text: "2014-03-17"
Column 6, name: c7, type: Nullable(String), parsed text: "31440846"
Column 7, name: c8, type: Nullable(DateTime64(9)), parsed text: "3653375523"
Column 8, name: c9, type: Nullable(String), parsed text: "??:?[?<0x03>Uc??m???<0x1A>"
Column 9, name: c10, type: Nullable(String), parsed text: "42"
Column 10, name: c11, type: Nullable(String), parsed text: "2668037917139250981"
Column 11, name: c12, type: Nullable(Int64), parsed text: "0"
Column 12, name: c13, type: Nullable(Int64), parsed text: "227"
Column 13, name: c14, type: Nullable(Int64), parsed text: "105"
Column 14, name: c15, type: Nullable(String), parsed text: "http://yandsearch[filter=user_page=http://book-nika/nyurttunian/haberlandsearch&text=all&user_page-148564b4080_1280x120"
Column 15, name: c16, type: Nullable(String), parsed text: <EMPTY>
Column 16, name: c17, type: Nullable(String), parsed text: "yandex.ru.msn"
Column 17, name: c18, type: Nullable(String), parsed text: <EMPTY>
Column 18, name: c19, type: Nullable(Int64), parsed text: "0"
Column 19, name: c20, type: Nullable(Int64), parsed text: "0"
Column 20, name: c21, type: Array(Nullable(Int64)), parsed text: "[]"
Column 21, name: c22, type: Array(Nullable(Int64)), parsed text: "[]"
Column 22, name: c23, type: Array(Nullable(Int64)), parsed text: "[239]"
Column 23, name: c24, type: Array(Nullable(Int64)), parsed text: "[]"
Column 24, name: c25, type: Nullable(Int64), parsed text: "355"
Column 25, name: c26, type: Nullable(Int64), parsed text: "514"
Column 26, name: c27, type: Nullable(Int64), parsed text: "57"
Column 27, name: c28, type: Nullable(Int64), parsed text: "0"
Column 28, name: c29, type: Nullable(Int64), parsed text: "0"
Column 29, name: c30, type: Nullable(Float64), parsed text: <EMPTY>
Column 30, name: c31, type: Nullable(Int64), parsed text: "0"
Column 31, name: c32, type: Nullable(Int64), parsed text: "0"
Column 32, name: c33, type: Nullable(Int64), parsed text: "44"
Column 33, name: c34, type: Nullable(String), parsed text: "s?"
Column 34, name: c35, type: Nullable(Int64), parsed text: "1"
Column 35, name: c36, type: Nullable(Int64), parsed text: "1"
Column 36, name: c37, type: Nullable(Int64), parsed text: "1"
Column 37, name: c38, type: Nullable(Int64), parsed text: "0"
Column 38, name: c39, type: Nullable(String), parsed text: <EMPTY>
Column 39, name: c40, type: Nullable(String), parsed text: <EMPTY>
Column 40, name: c41, type: Nullable(String), parsed text: "2023156"
Column 41, name: c42, type: Nullable(Int64), parsed text: "0"
Column 42, name: c43, type: Nullable(Int64), parsed text: "0"
Column 43, name: c44, type: Nullable(String), parsed text: <EMPTY>
Column 44, name: c45, type: Nullable(Int64), parsed text: "0"
Column 45, name: c46, type: Nullable(Int64), parsed text: "1"
Column 46, name: c47, type: Nullable(Int64), parsed text: "436"
Column 47, name: c48, type: Nullable(Int64), parsed text: "1002"
Column 48, name: c49, type: Nullable(Int64), parsed text: "296"
Column 49, name: c50, type: Nullable(DateTime64(9)), parsed text: "2014-03-17 07:47:03"
Column 50, name: c51, type: Nullable(Int64), parsed text: "0"
Column 51, name: c52, type: Nullable(Int64), parsed text: "0"
Column 52, name: c53, type: Nullable(String), parsed text: "0"
Column 53, name: c54, type: Nullable(Int64), parsed text: "0"
Column 54, name: c55, type: Nullable(String), parsed text: "utf-8"
Column 55, name: c56, type: Nullable(Int64), parsed text: "315"
Column 56, name: c57, type: Nullable(Int64), parsed text: "0"
Column 57, name: c58, type: Nullable(Int64), parsed text: "0"
Column 58, name: c59, type: Nullable(Int64), parsed text: "1"
Column 59, name: c60, type: Nullable(String), parsed text: "0"
Column 60, name: c61, type: Nullable(String), parsed text: "559851309"
Column 61, name: c62, type: Nullable(Int64), parsed text: "0"
Column 62, name: c63, type: Nullable(Int64), parsed text: "0"
Column 63, name: c64, type: Nullable(Int64), parsed text: "0"
Column 64, name: c65, type: Nullable(Int64), parsed text: "1"
Column 65, name: c66, type: Nullable(Int64), parsed text: "0"
Column 66, name: c67, type: Nullable(String), parsed text: "E"
Column 67, name: c68, type: Nullable(DateTime64(9)), parsed text: "2014-03-17 07:06:29"
Column 68, name: c69, type: Nullable(Int64), parsed text: "55"
Column 69, name: c70, type: Nullable(Int64), parsed text: "1"
Column 70, name: c71, type: Nullable(Int64), parsed text: "3"
Column 71, name: c72, type: Nullable(String), parsed text: "0"
Column 72, name: c73, type: Nullable(Int64), parsed text: "0"
Column 73, name: c74, type: Array(Nullable(Int64)), parsed text: "[72,14]"
Column 74, name: c75, type: Nullable(DateTime64(9)), parsed text: "2044414662"
Column 75, name: c76, type: Nullable(String), parsed text: "??<0x17>?_Y???<0x06>?5Ӵ."
Column 76, name: c77, type: Nullable(String), parsed text: "54527"
Column 77, name: c78, type: Nullable(Int64), parsed text: "-1"
Column 78, name: c79, type: Nullable(Int64), parsed text: "1"
Column 79, name: c80, type: Nullable(String), parsed text: "nD"
Column 80, name: c81, type: Nullable(String), parsed text: "??"
Column 81, name: c82, type: Nullable(String), parsed text: <EMPTY>
Column 82, name: c83, type: Nullable(String), parsed text: <EMPTY>
Column 83, name: c84, type: Nullable(Int64), parsed text: "0"
Column 84, name: c85, type: Nullable(String), parsed text: "1978"
Column 85, name: c86, type: Nullable(Int64), parsed text: "-1"
Column 86, name: c87, type: Nullable(String), parsed text: "-1"
Column 87, name: c88, type: Nullable(String), parsed text: "-1"
Column 88, name: c89, type: Nullable(String), parsed text: "-1"
Column 89, name: c90, type: Nullable(String), parsed text: "-1"
Column 90, name: c91, type: Nullable(String), parsed text: "-1"
Column 91, name: c92, type: Nullable(String), parsed text: "-1"
Column 92, name: c93, type: Nullable(String), parsed text: "-1"
Column 93, name: c94, type: Nullable(String), parsed text: "2852"
Column 94, name: c95, type: Nullable(String), parsed text: "3597"
Column 95, name: c96, type: Nullable(Int64), parsed text: "15"
Column 96, name: c97, type: Nullable(String), parsed text: "-1"
Column 97, name: c98, type: Nullable(String), parsed text: "3888"
Column 98, name: c99, type: Nullable(Int64), parsed text: "-1"
Column 99, name: c100, type: Nullable(Int64), parsed text: "0"
Column 100, name: c101, type: Nullable(String), parsed text: <EMPTY>
Column 101, name: c102, type: Nullable(Int64), parsed text: "0"
Column 102, name: c103, type: Nullable(String), parsed text: <EMPTY>
Column 103, name: c104, type: Nullable(String), parsed text: "<0x07>?<0x1F>"
Column 104, name: c105, type: Nullable(Int64), parsed text: "0"
Column 105, name: c106, type: Array(Nullable(Int64)), parsed text: "[]"
Column 106, name: c107, type: Nullable(String), parsed text: <EMPTY>
Column 107, name: c108, type: Nullable(String), parsed text: <EMPTY>
Column 108, name: c109, type: Nullable(String), parsed text: <EMPTY>
Column 109, name: c110, type: Nullable(String), parsed text: <EMPTY>
Column 110, name: c111, type: Nullable(String), parsed text: <EMPTY>
Column 111, name: c112, type: Nullable(String), parsed text: <EMPTY>
Column 112, name: c113, type: Nullable(String), parsed text: <EMPTY>
Column 113, name: c114, type: Nullable(String), parsed text: <EMPTY>
Column 114, name: c115, type: Nullable(String), parsed text: <EMPTY>
Column 115, name: c116, type: Nullable(String), parsed text: <EMPTY>
Column 116, name: c117, type: Nullable(Int64), parsed text: "0"
Column 117, name: c118, type: Nullable(String), parsed text: "15284527577228392792"
Column 118, name: c119, type: Nullable(String), parsed text: "1303689622826169012"
Column 119, name: c120, type: Nullable(String), parsed text: "0"
Column 120, name: c121, type: Nullable(String), parsed text: "0"
Column 121, name: c122, type: Nullable(String), parsed text: <EMPTY>
Column 122, name: c123, type: Nullable(String), parsed text: <EMPTY>
Column 123, name: c124, type: Nullable(String), parsed text: <EMPTY>
Column 124, name: c125, type: Array(Nullable(String)), parsed text: "[]"
Column 125, name: c126, type: Array(Nullable(String)), parsed text: "[]"
Column 126, name: c127, type: Array(Nullable(String)), parsed text: "[]"
Column 127, name: c128, type: Array(Nullable(String)), parsed text: "[]"
Column 128, name: c129, type: Array(Nullable(String)), parsed text: "[]"
Column 129, name: c130, type: Array(Nullable(Float64)), parsed text: "[]"
Column 130, name: c131, type: Nullable(String), parsed text: "???+???<0x19>?<0x04>??bKQ9"
Column 131, name: c132, type: Nullable(String), parsed text: "6"
Column 132, name: c133, type: Nullable(Int64), parsed text: "1"
Row 4531:
Column 0, name: c1, type: Nullable(DateTime64(9)), parsed text: "8484166349348046735"
Column 1, name: c2, type: Nullable(Int64), parsed text: "1"
Column 2, name: c3, type: Nullable(String), parsed text: "Почта Mail.ru - Почта Mail.Ru | Spor,Magazin,Haberler, Oyun, Video moda.ru"
Column 3, name: c4, type: Nullable(Int64), parsed text: "1"
Column 4, name: c5, type: Nullable(DateTime64(9)), parsed text: "2014-03-17 21:33:13"
Column 5, name: c6, type: Nullable(Date), parsed text: "2014-03-17"
Column 6, name: c7, type: Nullable(String), parsed text: "31440846"
Column 7, name: c8, type: Nullable(DateTime64(9)), parsed text: "0<TAB>??:?[?<0x03>U"
ERROR: garbage after Nullable(DateTime64(9)): "c??m???<0x1A><TAB>1"
: While executing ParallelParsingBlockInputFormat: While executing URL: (in file/uri https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz): (at row 4377757)
. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
Does anyone know why the exception occurred?
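One way to narrow this down, offered as a diagnostic sketch rather than a confirmed explanation: the log lists the columns as c1 to c133, which suggests the column types were inferred from the data rather than declared. The hypothetical helpers below (the names and the sample bytes are assumptions for illustration, not part of ClickHouse) split raw TSV rows on the tab byte and flag rows whose field count or chosen numeric column looks wrong.

```python
def split_tsv_row(raw: bytes) -> list[bytes]:
    """Split one raw TSV row on the tab byte, without decoding it."""
    return raw.rstrip(b"\r\n").split(b"\t")


def find_suspect_rows(rows, expected_fields, numeric_column):
    """Return (row_number, reason) pairs for rows that look malformed.

    rows is any iterable of bytes, e.g. a file opened with lzma.open(path, "rb").
    """
    suspects = []
    for number, raw in enumerate(rows, start=1):
        fields = split_tsv_row(raw)
        if len(fields) != expected_fields:
            suspects.append((number, f"{len(fields)} fields"))
        elif not fields[numeric_column].isdigit():
            suspects.append((number, "non-numeric value"))
    return suspects


# Made-up sample: the second row holds "0" followed by a field of binary bytes.
sample = [
    b"31440846\t3653375523\tabc\t42\n",
    b"31440846\t0\t\x03U\x1a\xff\t115\n",
    b"31440846\tnot-a-number\tabc\t42\n",
]
print(find_suspect_rows(sample, expected_fields=4, numeric_column=1))
```

In this sample the row with "0" followed by binary bytes is not flagged, because at the TSV level "0" is a complete field. If the real file behaves the same way, the inferred column types would be the next thing to check; the `url` table function accepts a structure argument after the format, so declaring the types explicitly may be worth trying.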

Related

How to get sum of field1 for unique values of field2 in Kibana

I have an index pattern in Kibana. I want to get the SUM of one field based on the UNIQUE values of a second field. How can I do this in a Kibana visualization?
For example:
{ server: 1, size: 813, #timestamp: "2018-12-28 09:00"}
{ server: 2, size: 654, #timestamp: "2018-12-28 09:00"}
{ server: 3, size: 752, #timestamp: "2018-12-28 09:00"}
{ server: 1, size: 915, #timestamp: "2018-12-28 10:00"}
{ server: 2, size: 823, #timestamp: "2018-12-28 10:00"}
{ server: 3, size: 783, #timestamp: "2018-12-28 10:00"}
I want a result like the following:
(915 + 823 + 783 = 2521)
Is there any solution?
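The calculation being asked for can be sketched outside Kibana as follows, assuming the intent is to keep the most recent document for each server and sum the size values of those documents. This is plain Python over the sample documents above, not a Kibana or Elasticsearch API; the key name `timestamp` and the function name are stand-ins chosen for illustration.

```python
docs = [
    {"server": 1, "size": 813, "timestamp": "2018-12-28 09:00"},
    {"server": 2, "size": 654, "timestamp": "2018-12-28 09:00"},
    {"server": 3, "size": 752, "timestamp": "2018-12-28 09:00"},
    {"server": 1, "size": 915, "timestamp": "2018-12-28 10:00"},
    {"server": 2, "size": 823, "timestamp": "2018-12-28 10:00"},
    {"server": 3, "size": 783, "timestamp": "2018-12-28 10:00"},
]


def sum_latest_per_key(documents, key, value, order_by):
    """Keep the newest document per key, then sum the chosen value field."""
    latest = {}
    for doc in documents:
        current = latest.get(doc[key])
        # "YYYY-MM-DD HH:MM" strings sort chronologically when compared as text.
        if current is None or doc[order_by] > current[order_by]:
            latest[doc[key]] = doc
    return sum(doc[value] for doc in latest.values())


total = sum_latest_per_key(docs, "server", "size", "timestamp")
print(total)  # 2521
```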

Converting Language Detection Score of CLD2 to CLD3 Accuracy

My CLD2 language detection model (langID) returns the following values for the input sentence to classify:
{ reliable: true,
textBytes: 181,
languages:
[ { name: 'ITALIAN', code: 'it', percent: 61, score: 774 },
{ name: 'ENGLISH', code: 'en', percent: 38, score: 1573 } ],
chunks:
[ { name: 'ITALIAN', code: 'it', offset: 0, bytes: 116 },
{ name: 'ENGLISH', code: 'en', offset: 116, bytes: 71 } ] }
where textBytes is the size of the input text, percent is the share of each language in the sentence, and score is an indicator of the quality of the detection (the smaller, the better).
In the newer CLD3 neural network, however, the result of the classification is just the accuracy (a probability value between 0 and 1), for example:
println(ld.getCode(0))
println(ld.getScore(0))
en
0.99
I would like to figure out how to convert CLD2 scores to probability values in order to compare the results with the new CLD3 model.

How to create a c3js bar chart from a JSON array

If I have a JSON array like the one below:
[
  {id: 27, name: "ACT LIFE BU1", count: 360},
  {id: 38, name: "ALUMATREND BU2", count: 802},
  {id: 36, name: "ASPATREND BU2", count: 350},
  {id: 28, name: "AVOSOYA BU1", count: 569}
]
how can I make a chart like the one in the picture below?
Many thanks for your help.
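As a rough sketch of the reshaping involved, assuming the goal is one bar per `name` with height `count`: the hypothetical helper below, written in Python for illustration, turns the array into the column-oriented configuration that c3 uses for a bar chart with a category x axis. The printed JSON would then be passed to `c3.generate` on the JavaScript side; the target element (`bindto`) is left out.

```python
import json

rows = [
    {"id": 27, "name": "ACT LIFE BU1", "count": 360},
    {"id": 38, "name": "ALUMATREND BU2", "count": 802},
    {"id": 36, "name": "ASPATREND BU2", "count": 350},
    {"id": 28, "name": "AVOSOYA BU1", "count": 569},
]


def to_c3_bar_config(records, label_key, value_key):
    """Build a c3-style config: one column of x labels, one column of values."""
    labels = ["x"] + [record[label_key] for record in records]
    values = [value_key] + [record[value_key] for record in records]
    return {
        "data": {"x": "x", "columns": [labels, values], "type": "bar"},
        "axis": {"x": {"type": "category"}},
    }


config = to_c3_bar_config(rows, "name", "count")
print(json.dumps(config, indent=2))
```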

Elixir POST file to Heroku file Attachment Scanner add-on

I am trying to scan uploaded documents for viruses when a user uploads, using the Heroku Add-on Attachment Scanner.
I am attempting to encode the file directly with Poison.encode, but it throws an error, so I am not sure this is the correct method. Any help is appreciated. Below are my attempted HTTPoison post request and the error from Poison.encode!.
def scan do
  url = System.get_env("ATTACHMENT_SCANNER_URL") <> "/requests"
  token = System.get_env("ATTACHMENT_SCANNER_API_TOKEN")
  headers =
    [
      "Authorization": "bearer " <> token,
      "Content-Type": "multipart/form-data",
    ]
  file_path = local_path_to_pdf_file
  file = file_path |> File.read!
  body = Poison.encode!(%{file: file})
  res = HTTPoison.post(url, body, headers, recv_timeout: 40_000)
end
Poison.encode(file) error:
iex(3)> Poison.encode(file)
** (FunctionClauseError) no function clause matching in Poison.Encoder.BitString.chunk_size/3
The following arguments were given to Poison.Encoder.BitString.chunk_size/3:
# 1
<<226, 227, 207, 211, 13, 10, 49, 48, 51, 32, 48, 32, 111, 98, 106, 13, 60, 60,
47, 76, 105, 110, 101, 97, 114, 105, 122, 101, 100, 32, 49, 47, 76, 32, 50,
53, 50, 53, 51, 52, 51, 47, 79, 32, 49, 48, 53, 47, 69, 32, ...>>
# 2
nil
# 3
1
P.S. I need to send the file directly and am unable to host the image publicly, so the Node.js examples in the docs will not work.
file = "/some/path/video.mp4"
HTTPoison.post("api.vid.me/video/upload", {:multipart, [{:file, file, {"form-data", [name: "filedata", filename: Path.basename(file)]}, []}]}, ["AccessToken": "XXXXX"])
Will this help you?
Following on from Dinesh's answer, here is the code snippet I went with:
headers =
  [
    "Authorization": "bearer " <> token,
    "Content-Type": "multipart/form-data",
  ]
file_path = Ev2.Lib.MergerAPI.get_timecard_document_path
body = {:multipart, [{:file, file_path}]}
res = HTTPoison.post(url, body, headers)

How do I save CSV data into MongoDB?

I want to write a service method that saves a large amount of CSV data into a Mongo database without running into out-of-memory errors.
How can I do this? If anyone has already done this or knows how, please help.
You can use GridFS to save documents larger than 16 MB; here is the link: https://docs.mongodb.com/v3.2/core/gridfs/. Come back when you have further questions.
Try to model a document based on the CSV that you're trying to store. It's hard to comment directly on your CSV format, so take the following as an example:
Id,Name,Student 1 Id,Student 1 Name,Student 2 Id,Student 2 Name,Student 3 Id,Student 3 Name
48,Bill,1001,Kev,1002,Sakis,1005,Lee
78,Fred,1005,Lee,1073,Karen,1021,Jay
We could break this down into a teacher collection:
{
  _id: 48,
  name: "Bill",
  students: [
    {_id: 1001, name: "Kev"},
    {_id: 1002, name: "Sakis"},
    {_id: 1005, name: "Lee"}
  ]
}
{
  _id: 78,
  name: "Fred",
  students: [
    {_id: 1005, name: "Lee"},
    {_id: 1073, name: "Karen"},
    {_id: 1021, name: "Jay"}
  ]
}
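The modelling above, combined with the memory concern from the question, could be sketched like this: read the CSV as a stream, reshape each row into a teacher document, and hand the documents over in fixed-size batches so the whole file is never held in memory. The function names and the `sink` callback are hypothetical; with a MongoDB driver, the callback would typically be the collection's bulk insert call.

```python
import csv
import io
from itertools import islice


def row_to_teacher(row):
    """Turn one CSV row (a dict) into a teacher document with embedded students."""
    students = []
    n = 1
    while f"Student {n} Id" in row:
        students.append(
            {"_id": int(row[f"Student {n} Id"]), "name": row[f"Student {n} Name"]}
        )
        n += 1
    return {"_id": int(row["Id"]), "name": row["Name"], "students": students}


def import_csv(lines, sink, batch_size=1000):
    """Stream rows from an iterable of text lines and pass them to sink in batches."""
    reader = csv.DictReader(lines)
    total = 0
    while True:
        # Only batch_size rows are materialised at any one time.
        batch = [row_to_teacher(row) for row in islice(reader, batch_size)]
        if not batch:
            return total
        sink(batch)
        total += len(batch)


sample = io.StringIO(
    "Id,Name,Student 1 Id,Student 1 Name,Student 2 Id,Student 2 Name,"
    "Student 3 Id,Student 3 Name\n"
    "48,Bill,1001,Kev,1002,Sakis,1005,Lee\n"
    "78,Fred,1005,Lee,1073,Karen,1021,Jay\n"
)
batches = []
count = import_csv(sample, batches.append, batch_size=1)
print(count, batches[0][0])
```

Here `batches.append` stands in for the database write, and a batch size of 1 is used only to show that the two rows arrive as separate batches.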
