Inserting data into ClickHouse from a CSV file with curl adds '&' to the first record - bash

I'm having issues inserting data into ClickHouse from a CSV file using curl: the very first value gets extra characters prepended and ends up looking like this:
┌─name────┬─lastname─┐
│ &'Mark' │ Olson    │
│ Joe     │ Robins   │
└─────────┴──────────┘
My CSV file is fine; it looks like this:
'Mark','Olson'
'Joe','Robins'
As you can see, the table stores the first value of the first record as &'Mark'.
This is my bash code:
query="INSERT INTO Myschema.persons FORMAT CSV"
cat ${csv} | curl -X POST -d "$query" $user:$password@localhost:8123 --data-binary @-
Do you know what the problem is?
Thanks

I think you should use the following format, where the query is part of the URL:
cat *.csv | curl 'http://localhost:8123/?query=INSERT%20INTO%20Myschema.persons%20FORMAT%20CSV' --data-binary @-
I am not sure why your curl command is not working, but my best guess is that because ${query} and ${csv} are both passed as POST data, curl joins the two parts with an '&' (as it does for multiple --data options), and ClickHouse's parser cannot consume the resulting body.
Quotes from the ClickHouse documentation:
You can send the query itself either in the POST body, or in the URL
parameter.
and
The POST method of transmitting data is necessary for INSERT queries.
In this case, you can write the beginning of the query in the URL
parameter, and use POST to pass the data to insert. The data to insert
could be, for example, a tab-separated dump from MySQL. In this way,
the INSERT query replaces LOAD DATA LOCAL INFILE from MySQL.
Check here for more details and examples - https://clickhouse.yandex/docs/en/interfaces/http_interface/
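To make that concrete, here is a minimal sketch of the whole pipeline under the question's setup (the Myschema.persons table, $user, $password and the CSV file come from the question; the persons.csv file name is a stand-in). The key point is that the INSERT statement travels in the URL and the CSV rows are the only POST body, so curl never has two data parts to glue together with '&':

# Minimal sketch: query in the URL, CSV rows as the only POST body.
csv="persons.csv"   # stand-in for the question's ${csv}
cat "$csv" | curl -sS \
  "http://$user:$password@localhost:8123/?query=INSERT%20INTO%20Myschema.persons%20FORMAT%20CSV" \
  --data-binary @-

If the password contains characters that are special in URLs, passing the credentials with curl's -u option instead avoids encoding issues.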

Related

How to add sysdate from bcp

I have a .csv file with the following sample data format:
REFID|PARENTID|QTY|DESCRIPTION|DATE
AA01|1234|1|1st item|null
AA02|12345|2|2nd item|null
AA03|12345|3|3rd item|null
AA04|12345|4|4th item|null
To load the above file into a table I am using the BCP command below:
/bcp $TABLE_NAME in $FILE_NAME -S $DB_SERVER -t "|" -F 1 -U $DB_USERNAME -d $DB_NAME
What I am trying to get is shown below (adding sysdate instead of null via bcp):
AA01|1234|1|1st item|3/16/2020
AA02|12345|2|2nd item|3/16/2020
AA03|12345|3|3rd item|3/16/2020
AA04|12345|4|4th item|3/16/2020
Update: I was able to exclude the header using the -F option from @Jamie's answer, but I'm still looking for help on inserting the date with bcp. I tried looking through some old Q&As, but no luck so far.
To exclude a single header record, you can use the -F option. This will tell BCP which line in the file is the first line to begin loading from. For your sample, -F2 should work fine. However, your command has other issues. See comments.
There is no way to introduce new data using the BCP command as you stated. BCP cannot introduce a date value while copying data into your table. To accomplish this I suggest a default for your date column or to first load the raw data into a table without the date column then you can introduce the date value as you see fit in late processing.
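A rough sketch of that second suggestion (load the raw rows first, stamp the date afterwards), assuming SQL Server tooling is available; the object names dbo.stage_items and dbo.items and the $DB_PASSWORD variable are hypothetical, and the column list is taken from the sample header:

# Sketch: bulk-load into a staging table that has no DATE column, then copy
# the rows into the real table while adding the current date.
bcp "dbo.stage_items" in "$FILE_NAME" \
  -S "$DB_SERVER" -d "$DB_NAME" -U "$DB_USERNAME" -P "$DB_PASSWORD" \
  -c -t "|" -F 2        # -c character mode, -F 2 skips the header row
sqlcmd -S "$DB_SERVER" -d "$DB_NAME" -U "$DB_USERNAME" -P "$DB_PASSWORD" -Q "
INSERT INTO dbo.items (REFID, PARENTID, QTY, DESCRIPTION, [DATE])
SELECT REFID, PARENTID, QTY, DESCRIPTION, GETDATE()
FROM dbo.stage_items;"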

Getting an 'invalid boolean' error while inserting a file into an Influx measurement through shell

I'm trying to insert a file into Influx through a shell script using the write API. Inconsistently, data is not getting inserted; I get an 'invalid boolean' response whenever the insert fails.
Kindly help me identify where I'm making a mistake.
Here is my code to write to influx
curl -s -X POST "http://influxdb:8186/write?db=mydb" --data-binary @data.txt
I get the JSON response below with an error, very inconsistently.
I'm generating the data.txt file after some calculation (the file itself was shown as a screenshot in the original post).
{"error":"unable to parse 'databackupstatus1,env=engg-az-dev2 tenant=dataplatform snapshot_name=engg-az-dev2-dataplatform-2019-07-08_12-43-59 state=Started backup_status=Not-Applicable': invalid boolean\nunable to parse 'databackupstatus1,env=engg-az-dev2 tenant=dataplatform snapshot_name=engg-az-dev2-dataplatform-2019-07-08_12-43-59 state=Completed backup_status=\"SUCCESS\"': invalid boolean"}
Note: the same data shown above has been inserted successfully multiple times before.
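A note on the error itself: in InfluxDB line protocol a bare (unquoted) word in a field value is parsed as a boolean, so field values like state=Started and backup_status=Not-Applicable fail with 'invalid boolean'; string field values need double quotes, and fields need to be comma-separated rather than space-separated. A minimal sketch of a correctly formed write, keeping the measurement, tag and field names from the question:

# Sketch: string field values double-quoted, fields comma-separated
# (env stays a tag; everything after the first space is a field).
cat <<'EOF' > data.txt
databackupstatus1,env=engg-az-dev2 tenant="dataplatform",snapshot_name="engg-az-dev2-dataplatform-2019-07-08_12-43-59",state="Started",backup_status="Not-Applicable"
EOF
curl -s -X POST "http://influxdb:8186/write?db=mydb" --data-binary @data.txt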

export github commits/names to CSV with bash & jq

For a project I need to extract data from a lot of different blockchain GitHub profiles into a CSV.
After browsing through the GitHub API I was able to get some of the necessary data output as txt/csv files using bash commands and jq.
Doing all of this manually would probably take 7 days. I have a list of profiles I need to loop through, saved as a CSV.
The list looks like this --> https://docs.google.com/spreadsheets/d/1lFsewAYI7F8zSw7WPhI9E9WwR8f4G1clw1yjxY3wz_4/edit#gid=0
My approach so far to get all the repo names looks like this:
sample='[{"name":"0chain"},{"name":"0stateapp"},{"name":"0xcert"}]'
The CSV belongs in here; I don't know yet how to redirect it into that variable, but for testing purposes this was enough. If somebody knows how, feel free to give a hint (see the sketch after the output listing below).
for row in $(echo "${sample}" | jq -r '.[] | @base64'); do
  _jq() {
    echo "${row}" | base64 --decode | jq -r "${1}"
  }
  for GHUSER in $(_jq '.name'); do
    curl -s "https://api.github.com/users/$GHUSER/repos?per_page=100" | jq -r '.[] | .full_name'
  done
done
The output looks like this:
0chain/0chain-token
0chain/client-sdk
0chain/docs
0chain/gorocksdb
0chain/hostadmin
0chain/rocksdb
0stateapp/ZSCoin
0xcert/0xcert
0xcert/conventions
0xcert/docs
0xcert/erc721-validator
0xcert/erc721-validator-api
0xcert/erc721-validator-ui
0xcert/erc721-website
0xcert/ethereum
0xcert/ethereum-crowdsale
0xcert/ethereum-dex
0xcert/ethereum-erc20
0xcert/ethereum-erc721
0xcert/ethereum-minter
0xcert/ethereum-utils
0xcert/ethereum-xcert
0xcert/ethereum-xcert-builder
0xcert/ethereum-zxc
0xcert/framework
0xcert/framework-cert-test
0xcert/nonfungiblealliance-www
0xcert/solidity-style-guide
0xcert/techpaper
0xcert/truffle
0xcert/web3.js
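Regarding feeding the profile list in from the CSV instead of the hard-coded sample (the hint asked for above): a minimal sketch, assuming a one-column file of profile names called profiles.csv, that skips the base64 round-trip entirely:

# Sketch: read profile names straight from a one-column CSV (hypothetical
# file name profiles.csv) and list each profile's repositories.
while IFS=',' read -r GHUSER _; do
  [ -n "$GHUSER" ] || continue      # skip blank lines
  curl -s "https://api.github.com/users/$GHUSER/repos?per_page=100" |
    jq -r '.[].full_name'
done < profiles.csv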
What I need to do is use all of the above values and generate a file that contains:
Github Profile (already stored in the attached sheet)
The date when this information was accessed
All the repositories belonging to that profile (code above but filtered)
Now the interesting part:
The commit history:
number of commits (ID)
date of commit
description of commit
person who committed
checks passed
checks failed
Almost the same needs to be done for closed and open pull requests, although I think that once the "problem" above is solved, the pull requests follow the same strategy.
For the commits I'd do something like this:
for commits in "${repoarray[@]}"; do curl -s https://api.github.com/repos/$commits/commits | jq -r '.[] | .author.login' (and whatever else is needed); done
Basically, this chart here needs to be filled:
https://docs.google.com/spreadsheets/d/1mFXiohiWNXNP8CVztFA1PFF41jn3J9sRUhYALZShsPY/edit?usp=sharing
What I need help with:
storing the output from the first loop in an array
looping through that array to get the number of commits
looping through that array to get the data for closed pull requests
looping through that array to get the data for open pull requests
Excuse my "noobish" question.
I'm using bash/jq and the GitHub API for the first time.
I'd appreciate any kind of help.
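A minimal sketch of the loops asked for above, assuming the repo names from the first step have been saved one per line to a hypothetical repos.txt, and that the standard GitHub REST endpoints for commits and pull requests are what is needed; the jq field picks are illustrative, not the full chart:

# Sketch: collect "owner/repo" names into an array, then loop over it.
mapfile -t repoarray < repos.txt
for repo in "${repoarray[@]}"; do
  # commits: sha, date, author login, first line of the message, as CSV rows
  curl -s "https://api.github.com/repos/$repo/commits?per_page=100" |
    jq -r '.[] | [.sha, .commit.author.date, (.author.login // "n/a"), (.commit.message | split("\n")[0])] | @csv'
  # closed and open pull requests: number, title, relevant date
  curl -s "https://api.github.com/repos/$repo/pulls?state=closed&per_page=100" |
    jq -r '.[] | [.number, .title, .closed_at] | @csv'
  curl -s "https://api.github.com/repos/$repo/pulls?state=open&per_page=100" |
    jq -r '.[] | [.number, .title, .created_at] | @csv'
done

Note that unauthenticated GitHub API requests are heavily rate-limited, so for a long profile list you would want to send an access token, and the pass/fail check results live on a separate endpoint (the Checks API) rather than on the commit objects themselves.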

Prometheus Pushgateway: newly pushed data overwrites previous data

cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/pushgetway/instance/test_instance
http_s_attack_type{hostname="test1",scheme="http",src_ip="192.168.33.86",dst_ip="192.168.33.85",port="15555"} 44
http_s_attack_type{hostname="other",scheme="tcp",src_ip="1.2.3.4",dst_ip="192.168.33.85",port="15557"} 123
EOF
Change data and write again:
cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/pushgetway/instance/test_instance
http_s_attack_type{hostname="test2",scheme="http",src_ip="192.168.33.86",dst_ip="192.168.33.85",port="15555"} 55
http_s_attack_type{hostname="other3",scheme="tcp",src_ip="1.2.3.4",dst_ip="192.168.33.85",port="15557"} 14
EOF
Viewing the data on localhost:9091 shows only the most recent write; the data written the first time has been overwritten.
Is there a problem with my operation? Please tell me how to keep introducing new data without it being overwritten or replaced.
This is working exactly as designed. The pushgateway is meant to hold the results of batch jobs when they exit, so on the next run the results will replace the previous run.
It sounds like you're trying to do event logging. Prometheus is not a suitable tool for that use case, you might want to consider something like the ELK stack instead.
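That said, if the goal is only to see both payloads at the same time rather than true event logging, the Pushgateway keeps one metric group per set of URL-path labels, and a push only replaces the group it was pushed to. A small sketch reusing the question's metric, varying the instance label per push (the run1/run2 names are arbitrary):

cat <<'EOF' | curl --data-binary @- http://localhost:9091/metrics/job/pushgetway/instance/run1
http_s_attack_type{hostname="test1",scheme="http",src_ip="192.168.33.86",dst_ip="192.168.33.85",port="15555"} 44
EOF
cat <<'EOF' | curl --data-binary @- http://localhost:9091/metrics/job/pushgetway/instance/run2
http_s_attack_type{hostname="test2",scheme="http",src_ip="192.168.33.86",dst_ip="192.168.33.85",port="15555"} 55
EOF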

msearch not working with bool must

Elasticsearch version - 0.90.1
The following works perfectly.
cat names
{"index":"events","type":"news"}
{"query":{"term":{"Type":"MarketEvent"}}}
{"query":{"term":{"Type":"MarketEvent"}}}
curl -XGET 'http://localhost:9200/_msearch' --data-binary @names
The following also works
{"index":"events","type":"news"}
{"query":{"bool":{"must":[{"query_string":{"query":"*","fields":["Events.Event"],"default_operator":"AND"}},{"term":{"Type":"MarketEvent"}}]}}}
But queries with more than one bool don't work -
cat names
{"index":"events","type":"news"}
{"query":{"bool":{"must":[{"query_string":{"query":"*","fields":["Events.Event"],"default_operator":"AND"}},{"term":{"Type":"MarketEvent"}}]}}}
{"query":{"bool":{"must":[{"query_string":{"query":"*","fields":["Events.Event"],"default_operator":"AND"}},{"term":{"Type":"MarketEvent"}}]}}}
curl -XGET 'http://localhost:9200/_msearch' --data-binary @names
{"error":"must doesn't support arrays"}
I am not seeing anything for this in the logs (not even in DEBUG mode).
Is this a bug?
The _msearch query should have the following format:
header\n
body\n
header\n
body\n
In the first and the last queries, the second header is missing. The only reason the error is not also generated for the first query is the way the header is parsed. For this query to work, the names file should be changed to:
{"index":"events","type":"news"}
{"query":{"bool":{"must":[{"query_string":{"query":"*","fields":["Events.Event"],"default_operator":"AND"}},{"term":{"Type":"MarketEvent"}}]}}}
{"index":"events","type":"news"}
{"query":{"bool":{"must":[{"query_string":{"query":"*","fields":["Events.Event"],"default_operator":"AND"}},{"term":{"Type":"MarketEvent"}}]}}}
