H2O: importing files via the REST API from a local server fails

I am trying to use the H2O REST API to import CSV files that are on my local server.
Command:
curl -v -X GET 'http://127.0.0.1:54321/3/ImportFiles?path=http://127.0.0.1:8083/datasets/tables/csv/RDsTWgcvAjHeWJFnbhCKTCE5rn6aLCjJ.csv'
This results in the following log:
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 54321 (#0)
> GET /3/ImportFiles?path=http://127.0.0.1:8083/datasets/tables/csv/RDsTWgcvAjHeWJFnbhCKTCE5rn6aLCjJ.csv HTTP/1.1
> Host: 127.0.0.1:54321
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-h2o-build-project-version: 3.16.0.2
< X-h2o-rest-api-version-max: 3
< X-h2o-cluster-id: 1512722051559
< X-h2o-cluster-good: true
< X-h2o-context-path: /
< Content-Type: application/json
< Content-Length: 349
< Server: Jetty(8.y.z-SNAPSHOT)
<
* Connection #0 to host 127.0.0.1 left intact
{"__meta":{"schema_version":3,"schema_name":"ImportFilesV3","schema_type":"ImportFiles"},"_exclude_fields":"","path":"http://127.0.0.1:8083/datasets/tables/csv/RDsTWgcvAjHeWJFnbhCKTCE5rn6aLCjJ.csv","pattern":null,"files":[],"destination_frames":[],"fails":["http://127.0.0.1:8083/datasets/tables/csv/RDsTWgcvAjHeWJFnbhCKTCE5rn6aLCjJ.csv"],"dels":[]}
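Note that the call returns 200 OK even though the import failed: the outcome is only visible in the `files` and `fails` arrays of the ImportFilesV3 response body. When scripting against this endpoint, it helps to check those arrays explicitly. A minimal Python sketch, assuming the response body has already been read into a string; `import_outcome` and `check_import` are hypothetical helper names, not part of any H2O client:

```python
import json
from typing import List, Tuple


def import_outcome(response_body: str) -> Tuple[List[str], List[str]]:
    """Split an ImportFilesV3 JSON response into (imported, failed) paths."""
    payload = json.loads(response_body)
    # "files" holds paths H2O imported; "fails" holds paths it could not import.
    imported = payload.get("files") or []
    failed = payload.get("fails") or []
    return imported, failed


def check_import(response_body: str) -> List[str]:
    """Return the imported paths, or raise if any path landed in "fails"."""
    imported, failed = import_outcome(response_body)
    if failed:
        raise RuntimeError("H2O could not import: " + ", ".join(failed))
    return imported
```

This only reports the failure; it does not explain it, because the response carries no error message for the failed path.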
The H2O log at TRACE level shows only:
12-08 15:41:59.951 10.8.128.101:54321 36013 #4756-331 INFO: GET /3/ImportFiles, parms: {path=http://127.0.0.1:8083/datasets/tables/csv/RDsTWgcvAjHeWJFnbhCKTCE5rn6aLCjJ.csv}
Is there any way to find out why the import fails? H2O does not send any request to my local server at all.
Importing from other servers works fine:
curl -v -X GET "http://127.0.0.1:54321/3/ImportFiles?path=http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
curl -v -X GET "http://127.0.0.1:54321/3/ImportFiles?path=https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv"

In general, trying to interact directly with the H2O REST API isn't easy. The vast majority of people use a pre-made API client like Python or R.
But if you really want to do this, I would debug it by comparing against something that works, such as the H2O R client.
Write an R program that does this:
library(h2o)
h2o.init()
h2o.startLogging()
h2o.importFile("/path/to/data.csv")
The startLogging() call will produce a detailed log file with all the REST API requests and responses. Look at that and try to mimic it.
You can also refer to the autogenerated REST API documentation (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/rest-api-reference.html), but I would caution that writing a working client based only on the docs would be hard.
Looking at a logged conversation from an already working client is by far your best bet.

library(h2o)
h2o.init()
h2o.startLogging()
h2o.importFile("http://localhost:8082/datasets/tables/csv/vPrzC5TOQr6JTvnAYrU5AKyz8SP4ao8p.csv")
Time: 2017-12-11 11:55:09.237
GET http://localhost:54321/3/Cloud?skip_ticks=true postBody:
curlError: FALSE curlErrorMessage: httpStatusCode: 200
httpStatusMessage: OK millis: 7
{"__meta":{"schema_version":3,"schema_name":"CloudV3","schema_type":"Iced"},"_exclude_fields":"","skip_ticks":true,"version":"3.16.0.2","branch_name":"rel-wheeler","build_number":"2","build_age":"10 days","build_too_old":false,"node_idx":0,"cloud_name":"H2O_started_from_R_vasiliy_gey658","cloud_size":1,"cloud_uptime_millis":306486,"cloud_healthy":true,"bad_nodes":0,"consensus":true,"locked":true,"is_client":false,"nodes":[{"__meta":{"schema_version":3,"schema_name":"NodeV3","schema_type":"Iced"},"h2o":"localhost/127.0.0.1:54321","ip_port":"127.0.0.1:54321","healthy":true,"last_ping":1512982506643,"pid":97891,"num_cpus":4,"cpus_allowed":4,"nthreads":4,"sys_load":2.0917969,"my_cpu_pct":-1,"sys_cpu_pct":-1,"mem_value_size":17408,"pojo_mem":12224512,"free_mem":1896688640,"max_mem":1908930560,"swap_mem":0,"num_keys":56,"free_disk":0,"max_disk":0,"rpcs_active":0,"fjthrds":[-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,-1,1,0,0,0,0,0,0,0],"fjqueue":[-1,0,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,0,0,0,0,0,0,0,0],"tcps_active":0,"open_fds":-1,"gflops":4.598999977111816,"mem_bw":6.423728128E9}],"internal_security_enabled":false}
Time: 2017-12-11 11:55:09.251
GET http://localhost:54321/3/ImportFiles?path=http%3A%2F%2Flocalhost%3A8082%2Fdatasets%2Ftables%2Fcsv%2FvPrzC5TOQr6JTvnAYrU5AKyz8SP4ao8p.csv&pattern= postBody:
curlError: FALSE curlErrorMessage: httpStatusCode: 200
httpStatusMessage: OK millis: 6
{"__meta":{"schema_version":3,"schema_name":"ImportFilesV3","schema_type":"ImportFiles"},"_exclude_fields":"","path":"http://localhost:8082/datasets/tables/csv/vPrzC5TOQr6JTvnAYrU5AKyz8SP4ao8p.csv","pattern":"","files":[],"destination_frames":[],"fails":["http://localhost:8082/datasets/tables/csv/vPrzC5TOQr6JTvnAYrU5AKyz8SP4ao8p.csv"],"dels":[]}

This is the log of my import attempt from R: the file ends up in "fails" there as well.
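One visible difference from the hand-written curl command is that the R client percent-encodes the `path` value and sends an empty `pattern`. A minimal Python sketch that builds the same query string with the standard library; `import_files_url` is a hypothetical helper, and `http://localhost:54321` is simply the address used in the logs. Since the R request fails in the same way, matching the encoding is not expected to fix the import by itself; it only removes one difference when comparing against the client.

```python
from urllib.parse import urlencode


def import_files_url(h2o_base: str, path: str, pattern: str = "") -> str:
    """Build a GET /3/ImportFiles URL encoded the way the R client log shows."""
    # urlencode quotes values with quote_plus(safe=""), so ":" and "/" inside
    # the path become %3A and %2F instead of being sent raw.
    query = urlencode({"path": path, "pattern": pattern})
    return f"{h2o_base.rstrip('/')}/3/ImportFiles?{query}"
```

The resulting URL can be passed to curl or fetched with `urllib.request.urlopen`.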

Related

Server responded with 400 Bad Request during HTTP Post request

I am using a SIM800 to send an HTTP POST request to a remote InfluxDB database. I have successfully sent an HTTP POST request to the database from my computer using curl, as shown below.
01:~$ curl -i -XPOST 'http://<ip address>:8086/write?db=mydb' --data-binary 'location,host=server01,region=us-west value=0.99'
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: 3c958273-edb2-11eb-88ca-000000000000
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.6.3
X-Request-Id: 3c958273-edb2-11eb-88ca-000000000000
Date: Mon, 26 Jul 2021 01:38:54 GMT
After that, when I tried using the SIM800, I received 400 Bad Request after executing AT+HTTPACTION=1. The remote server responded with "+HTTPACTION: 1,400,0". Below are the AT commands.
AT+SAPBR=3,1,"CONTYPE","GPRS"
AT+SAPBR=3,1,"APN","myAPN"
AT+SAPBR=1,1
AT+HTTPINIT
AT+HTTPPARA="CID",1
AT+HTTPPARA="URL","http://<ip address>:8086/write?db=mydb"
AT+HTTPPARA="CONTENT","application/json"
AT+HTTPPARA="USERDATA","location,host=server01,region=us-west value=0.55"
AT+HTTPDATA=300,5000
AT+HTTPACTION=1
AT+HTTPREAD
AT+HTTPTERM
AT+SAPBR=0,1
Besides that, I captured the packets when the SIM800 sent the HTTP POST request but could not find the HTTP POST body.
The body can be seen in the captured packets when sending the HTTP POST with curl.
I would appreciate it if anyone could shed some light on this. Thank you in advance.
I managed to get it to work. Here are the AT commands:
AT+SAPBR=3,1,"CONTYPE","GPRS"
AT+SAPBR=3,1,"APN","myAPN"
AT+SAPBR=1,1
AT+HTTPINIT
AT+HTTPPARA="CID",1
AT+HTTPPARA="URL","http://<ip address>:8086/write?db=mydb"
AT+HTTPPARA="CONTENT","application/json"
AT+HTTPDATA=48,5000
location,host=server01,region=us-west value=0.55
AT+HTTPACTION=1
AT+HTTPREAD
AT+HTTPTERM
AT+SAPBR=0,1
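Instead of passing the body through AT+HTTPPARA="USERDATA", send it as raw data right after AT+HTTPDATA: after executing AT+HTTPDATA, quickly send the HTTP body (location,host=...) before the timeout given as the second parameter (5000 ms) expires. The first parameter (48) is the length of the body in bytes.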
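The working sequence can be generated so that the size declared to AT+HTTPDATA always matches the body. A minimal Python sketch, assuming you write each returned string as one line to the module's serial port yourself (for example with pyserial, not shown); `sim800_post_commands` is a hypothetical helper. The SIM800 is expected to answer AT+HTTPDATA with a DOWNLOAD prompt and then wait up to the given time for exactly that many bytes, which is why the body line has to follow promptly.

```python
from typing import List


def sim800_post_commands(apn: str, url: str, content_type: str,
                         body: str, timeout_ms: int = 5000) -> List[str]:
    """Return the AT command lines for one HTTP POST, body included."""
    size = len(body.encode("utf-8"))  # byte count declared to AT+HTTPDATA
    return [
        'AT+SAPBR=3,1,"CONTYPE","GPRS"',
        f'AT+SAPBR=3,1,"APN","{apn}"',
        "AT+SAPBR=1,1",
        "AT+HTTPINIT",
        'AT+HTTPPARA="CID",1',
        f'AT+HTTPPARA="URL","{url}"',
        f'AT+HTTPPARA="CONTENT","{content_type}"',
        f"AT+HTTPDATA={size},{timeout_ms}",
        body,  # raw body, sent after the module prompts for data
        "AT+HTTPACTION=1",
        "AT+HTTPREAD",
        "AT+HTTPTERM",
        "AT+SAPBR=0,1",
    ]
```

Computing the size from the encoded body avoids the mismatch in the original attempt, where 300 bytes were declared but no body was sent.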

wget on Debian Server gets 302 Found while wget on Manjaro gets 200 OK

I'm using wget to retrieve the Instagram JSON from the URL https://www.instagram.com/instagram/?__a=1.
Running wget from my local Manjaro setup returns a 200 OK and the proper JSON response, but running it from a Debian server returns a 302 Found.
At first I thought it could be due to differences between wget versions, but running curl locally works too, while wget on the server does not.
Is there anything I should set up on my server to get a proper response? My guess is that something in the HTTPS connection is preventing my server from connecting properly.
So, this is a weird quirk of the Instagram servers. Nothing you can do about it.
The problem is that Instagram responds differently depending on whether you connect to their server over IPv4 or IPv6. Why they would do that is beyond me, but I can reliably reproduce the result by controlling for only this variable.
IPv4:
$ wget -O/dev/null -4 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:15-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 157.240.27.174
Connecting to www.instagram.com (www.instagram.com)|157.240.27.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 118552 (116K) [application/json]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 118,552 306KB/s in 0.4s
2020-09-03 14:22:17 (306 KB/s) - ‘/dev/null’ saved [118552/118552]
IPv6:
$ wget -O/dev/null -6 "https://www.instagram.com/instagram/?__a=1"
--2020-09-03 14:22:54-- https://www.instagram.com/instagram/?__a=1
Resolving www.instagram.com (www.instagram.com)... 2a03:2880:f23f:e5:face:b00c:0:4420
Connecting to www.instagram.com (www.instagram.com)|2a03:2880:f23f:e5:face:b00c:0:4420|:443... connected.
HTTP request sent, awaiting response... 302 Found
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Location: https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1 [following]
--2020-09-03 14:22:54-- https://www.instagram.com/accounts/login/?next=/instagram/%3F__a%3D1
Reusing existing connection to [www.instagram.com]:443.
HTTP request sent, awaiting response... 200 OK
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Cookie coming from www.instagram.com attempted to set domain to www.instagram.com
Length: 48094 (47K) [text/html]
Saving to: ‘/dev/null’
100%[================================================================================================================================>] 48,094 --.-K/s in 0.04s
2020-09-03 14:22:54 (1.28 MB/s) - ‘/dev/null’ saved [48094/48094]
This matches what you see in your debug logs: on Manjaro, wget makes an IPv4 connection, while on Debian it makes an IPv6 connection, which leads to the different responses.
Welcome to the world of crazy webservers :)
In any case, the answer to your question is to use only an IPv4 connection, for example with wget -4 (--inet4-only).
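If you fetch the JSON from your own code rather than with wget, the same effect can be had by resolving the host with the address family restricted to IPv4. A minimal Python sketch using only the standard library; `ipv4_addresses` is a hypothetical helper name:

```python
import socket
from typing import List


def ipv4_addresses(host: str, port: int = 443) -> List[str]:
    """Resolve host to IPv4 addresses only, skipping any AAAA results."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses: List[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (ip, port) pair.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

A client can then connect to one of these addresses while keeping www.instagram.com as the TLS server name and Host header. For command-line use, wget -4 or curl -4 is simpler.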

G-WAN v7.12.6 can't visit the static contents

Out of the box, G-WAN serves dynamic content without problems but cannot serve static content.
I ran G-WAN as the kk user (not root, no sudo). All files and directories belong to the kk user/group.
I installed it on localhost. When I typed 127.0.0.1:8080 in the browser, it returned:
Server not found
Firefox can't find the server at www.index.html.
It is strange that the returned server name was www.index.html.
When I typed 127.0.0.1:8080/index.html, it returned 404 File Not Found.
How can I work around these problems until a new release is out?
This time, there is no error message in the log file.
(I installed G-WAN in Ubuntu 15.10)
UPDATE: ------------------------
There are two strange pieces of information in the output of served_from.c, shown below:
This page was processed...
Using get_env():
by the Server: 0.0.0.0:8080:8080(hostname: 127.0.0.1)
^^^^^^^^^^^^^^^^^(Should it be only one '8080'?)
Virtual Host: /home/kk/dev/gwan_v7.12.6/0.0.0.0:8080/#0.0.0.0
HTTP method: GET
HTTP request: /?served_from
HTTP query: served_from
HTTP entity: (null)
Content-Encoding: 0
Content-Length: 0
Content-Type: 0
for the Client: 127.0.0.1:43199
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0
Using HTTP Headers to get the same information:
by the Server: 27.0.0.1:8080 (hostname: 27.0.0.1)
^^^^^^^^^^^^^(Should it be 127.0.0.1? The leading '1' is missing.)
HTTP method: GET
HTTP entity: -
Content-Encoding: 0
Content-Length: 0
Content-Type: 0
for the Client: 127.0.0.1:43199
ozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0
Acpt-Language: n-US,en;q=0.5
Acpt-Encoding: 3: |GZIP|DEFLATE
Cookies: -
Is there anything wrong in v7.12.6?
UPDATE 2 ================================
With the following steps, G-WAN v7.12.6 can't show static content:
1. all files and directories belong to the same user/group (i.e. kk)
2. out of the box, listening on port 8080
3. run ./gwan
Result: static content cannot be reached.
With the following changes, G-WAN works normally:
1. change directory 0.0.0.0.8080 to 0.0.0.0:80
2. run sudo ./gwan -d:kk:kk
Everything is normal.
Sorry for my typo in UPDATE 2. It should be 0.0.0.0:80 (note the colon); I typed it correctly in the actual directory name. To repeat: ./gwan should work according to G-WAN's documentation, since the user launching the program owns all the files and directories.
If you can query the server for scripts then G-WAN is running fine. The problem is obviously a file permission issue, hence the error 404 file not found you got for the query 127.0.0.1:8080/index.html.
This system configuration problem will affect any program, not specifically G-WAN. Consult our dedicated FAQ to find out how to correct it.
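To check the permission explanation on your own machine, you can list everything under the served directory that the current user cannot read. A minimal, generic Python sketch (not a G-WAN tool); `unreadable_paths` is a hypothetical helper, and you would run it as the user that launches gwan, pointing it at the virtual host's directory:

```python
import os
from typing import List


def unreadable_paths(root: str) -> List[str]:
    """Return paths under root the current user cannot read (or traverse, for directories)."""
    problems: List[str] = []
    # The root itself must be listable (R) and traversable (X).
    if not os.access(root, os.R_OK | os.X_OK):
        problems.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return sorted(problems)
```

This only checks Unix permission bits as reported by `os.access`; an empty result suggests the 404 has another cause, such as the listener directory name discussed in the update below.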
For the served_from.c script, feel free to display the IP address in the "correct" order and to cut the string at the expected position to avoid the listener port repetition.
UPDATE:
Your latest update says "changing directory 0.0.0.0.8080 to 0.0.0.0:80 works". This was the problem: "0.0.0.0.8080" has never been a valid syntax for G-WAN.

New cluster creation using Cloudera Director

Getting the following error while trying to create a new cluster using Cloudera Director. Any advice?
[ec2-user#ip-10-0-2-227 cloudera-director-1.0.0]$ ./bin/cloudera-director bootstrap-remote aws.reference.conf --lp.remote.hostAndPort=127.0.0.1:7189
Process logs can be found at /home/ec2-user/cloudera/cloudera-director-1.0.0/logs/application.log
Cloudera Director 1.0.0 initializing ...
Configuration file passes all validation checks.
Creating a new environment ...
>> POST http://127.0.0.1:7189/api/v1/environments
<< 401 Unauthorized
Unexpected internal error (see logs): HTTP/1.1 401 Unauthorized [X-Content-Type-Options: nosniff, X-XSS-Protection: 1; mode=block, Pragma: no-cache, X-Frame-Options: DENY, Set-Cookie: JSESSIONID=j0ii441ungs61o1ivobib7zn2;Path=/, Content-Type: application/json;charset=UTF-8, Transfer-Encoding: chunked, Server: Jetty(8.1.15.v20140411)]
You are using the Cloudera Director Server (which currently has known issues). In the meantime, you can still get the cluster running with Cloudera Director without the server part.
The command is
./bin/cloudera-director bootstrap aws.simple.conf (simple config)
-OR-
./bin/cloudera-director bootstrap aws.reference.conf (advanced config)
You need to supply the username and password for the Director server when using the bootstrap-remote command, for example:
... --lp.remote.username=admin --lp.remote.password=admin ...
This should have been included in our docs; we're working on that. (I work for Cloudera.)
Feel free to also post questions to community.cloudera.com.
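If you drive Cloudera Director from a script, building the bootstrap-remote command as an argument list keeps the credential flags from being dropped or mangled by shell quoting. A minimal Python sketch; `bootstrap_remote_argv` is a hypothetical helper, and the flag names are the ones shown in the question and answer:

```python
from typing import List


def bootstrap_remote_argv(conf_file: str, host_and_port: str,
                          username: str, password: str) -> List[str]:
    """Build the bootstrap-remote command line including server credentials."""
    return [
        "./bin/cloudera-director",
        "bootstrap-remote",
        conf_file,
        f"--lp.remote.hostAndPort={host_and_port}",
        # Per the answer, omitting these two flags leads to 401 Unauthorized.
        f"--lp.remote.username={username}",
        f"--lp.remote.password={password}",
    ]
```

You would run the result with `subprocess.run(argv, check=True)` from the cloudera-director-1.0.0 directory. Keep in mind that a password passed on the command line can be visible to other local users in the process list.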

HTTP2_Plain in node-http2 module is not working?

I want to create an HTTP/2 server using the node-http2 module without TLS. My code is as follows:
const http2 = require('http2'); // node-http2 module, which provides http2.raw
const bunyan = require('bunyan');
const log = bunyan.createLogger({name: "HTTP2 server without TLS!"});
const options = {
  log: log
};
// http2.raw creates a plain-text HTTP/2 server (no TLS)
const server = http2.raw.createServer(options, function(request, response) {
  console.log("Receiving HTTP2 request!");
  // response.writeHead(200);
  response.end('Hello world from HTTP2!');
});
server.listen(8000);
However, it does not work. When I connect to this server from Chrome, it starts downloading something. When I stop the server, the download finishes with a blank file (26 bytes).
Does anyone know what is wrong here? Do I need to configure the browser? Thanks in advance!
Chrome and all other browsers only support HTTP/2 over TLS (h2) and not plain HTTP/2 (h2c). So your browser does not understand what is returned from the server and apparently node-http2 does not send a proper error response when it receives a non-http2 request.
The problem does not seem to be just the browser. Using curl, which supports HTTP/2 over an http:// URL, does not work either. Following is the output from curl:
$ curl -I --http2 http://54.208.83.136:8000/ -v -k
* Trying 54.208.83.136...
* Connected to 54.208.83.136 (54.208.83.136) port 8000 (#0)
> HEAD / HTTP/1.1
> Host: 54.208.83.136:8000
> User-Agent: curl/7.47.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQAAP__
>
As the curl output shows, it sends an HTTP/1.1 Upgrade request with the proper headers set, as it is supposed to according to the HTTP/2 RFC (RFC 7540).
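The HTTP2-Settings header is the base64url-encoded payload of an HTTP/2 SETTINGS frame (RFC 7540, section 3.2.1), where each setting is a 16-bit identifier followed by a 32-bit value (section 6.5.1). Decoding it is a quick way to confirm that the Upgrade request is well formed. A minimal Python sketch; `decode_http2_settings` and `describe` are hypothetical helper names:

```python
import base64
import struct
from typing import Dict, List, Tuple

# Setting identifiers defined in RFC 7540, section 6.5.2.
SETTING_NAMES = {
    0x1: "SETTINGS_HEADER_TABLE_SIZE",
    0x2: "SETTINGS_ENABLE_PUSH",
    0x3: "SETTINGS_MAX_CONCURRENT_STREAMS",
    0x4: "SETTINGS_INITIAL_WINDOW_SIZE",
    0x5: "SETTINGS_MAX_FRAME_SIZE",
    0x6: "SETTINGS_MAX_HEADER_LIST_SIZE",
}


def decode_http2_settings(header_value: str) -> List[Tuple[int, int]]:
    """Decode an HTTP2-Settings header into (identifier, value) pairs."""
    # The header omits "=" padding, so restore it before decoding.
    padded = header_value + "=" * (-len(header_value) % 4)
    payload = base64.urlsafe_b64decode(padded)
    if len(payload) % 6:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    # "!HI" = network byte order, 16-bit id then 32-bit value.
    return [struct.unpack("!HI", payload[i:i + 6]) for i in range(0, len(payload), 6)]


def describe(header_value: str) -> Dict[str, int]:
    """Map decoded settings to their RFC names."""
    return {SETTING_NAMES.get(ident, f"UNKNOWN_0x{ident:x}"): value
            for ident, value in decode_http2_settings(header_value)}
```

Decoding curl's value AAMAAABkAAQAAP__ gives SETTINGS_MAX_CONCURRENT_STREAMS = 100 and SETTINGS_INITIAL_WINDOW_SIZE = 65535, so the request itself is valid HTTP/2 Upgrade syntax.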
On the server side, the logs were very long, so here I present only the content of msg in the three relevant log entries:
New incoming HTTP/2 connection
Client connection header prelude does not match
PROTOCOL ERROR, Fatal error, closing connection
So basically the server closed the connection because the client connection header prelude does not match. By checking the code, I found that the error originated in the readPrelude function of endpoint.js. It is a function that reads the client header, but I don't know what is wrong with the client header :(.
So it seems that the node-http2 module does not support HTTP/2 over plaintext.
Update: it turns out that I was wrong. The node-http2 module does support HTTP/2 over plaintext with a direct connection, but it does not support upgrading to HTTP/2 from HTTP/1.1. The problem was that the client used the Upgrade mechanism to connect to a server that does not support Upgrade. Connecting with the nghttp client using prior knowledge works, as follows:
$ nghttp http://127.0.0.1:8000/
Hello world from HTTP2!
The nghttpd server also supports HTTP/2 without TLS, even though it does not support HTTP Upgrade.
$ nghttpd -d /Documents/Proxy 8080 --no-tls -v
So I highly recommend using nghttp when you want to test HTTP/2 without TLS.
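The "Client connection header prelude does not match" error refers to the fixed client connection preface that RFC 7540 (section 3.5) requires at the start of every HTTP/2 connection. With prior knowledge, as nghttp does, the client sends these 24 bytes first; with Upgrade, as curl does, the first bytes are an ordinary HTTP/1.1 request line, so a server that only accepts direct connections rejects them. A minimal Python sketch of such a check, illustrating the idea rather than reproducing node-http2's readPrelude code; `starts_with_preface` is a hypothetical name:

```python
# RFC 7540, section 3.5: the 24-octet client connection preface.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def starts_with_preface(first_bytes: bytes) -> bool:
    """True if the data received on a new connection begins with the HTTP/2 preface."""
    return first_bytes[:len(CLIENT_PREFACE)] == CLIENT_PREFACE
```

An h2c Upgrade request such as curl's begins with "HEAD / HTTP/1.1", so a check like this fails on it, while a prior-knowledge client passes.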
