elasticsearch HttpClient 3.x bulk API

I am using HttpClient 3.x to call the _bulk endpoint of the Elasticsearch 5.4.1 REST API:
HttpClient httpClient = new HttpClient();
String postUrl = "http://127.0.0.1:9200/_bulk";
PostMethod postMethod = new PostMethod(postUrl);
// one action object per line; the body must end with a newline
String query = "{\"delete\":{\"_index\":\"equipment\", \"_type\":\"unit\", \"_id\":\"3\" } }" + "\n";
System.out.println("query =" + query);
StringRequestEntity requestEntity = new StringRequestEntity(query,
        "application/x-ndjson", "UTF-8");
postMethod.setRequestEntity(requestEntity);
int statusCode = httpClient.executeMethod(postMethod);
System.out.println("Bulk Response status code: " + statusCode);
System.out.println("Bulk Response body: ");
System.out.println(postMethod.getResponseBodyAsString());
The console returns:
query ={"delete":{"_index":"equipment", "_type":"unit", "_id":"3" } }
Bulk Response status code: 400
Bulk Response body:
{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [delete].","line":1,"col":11}],"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [delete].","line":1,"col":11},"status":400}
When I paste the code below into Kibana, it returns "status": 200:
POST _bulk
{
  "delete": {
    "_index": "equipment",
    "_type": "unit",
    "_id": "1"
  }
}
Is this because HttpClient 3.x doesn't support x-ndjson, or because the query format is wrong, or is there some other reason? How can I call the _bulk API successfully with HttpClient 3.x?
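For reference, here is a sketch of how a body with several bulk actions could be built with the same HttpClient 3.x classes: each action object (and, for index/create actions, the document source after it) sits on its own line, and the whole body ends with a newline. The extra document and its field are made up for illustration.

StringBuilder bulk = new StringBuilder();
// delete action: a single metadata line
bulk.append("{\"delete\":{\"_index\":\"equipment\",\"_type\":\"unit\",\"_id\":\"3\"}}\n");
// index action: a metadata line followed by the document source on the next line
bulk.append("{\"index\":{\"_index\":\"equipment\",\"_type\":\"unit\",\"_id\":\"4\"}}\n");
bulk.append("{\"name\":\"unit-4\"}\n");

PostMethod bulkPost = new PostMethod("http://127.0.0.1:9200/_bulk");
bulkPost.setRequestEntity(new StringRequestEntity(bulk.toString(),
        "application/x-ndjson", "UTF-8"));
int bulkStatus = httpClient.executeMethod(bulkPost);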
Result
2017/08/02 14:29:49:571 CST [DEBUG] HttpClient - Java version: 1.7.0_79
2017/08/02 14:29:49:575 CST [DEBUG] HttpClient - Java vendor: Oracle Corporation
2017/08/02 14:29:49:575 CST [DEBUG] HttpClient - Java class path: /Users/qk/Documents/workspace-luna/httpclient-test/bin:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-beanutils.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-codec-1.10.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-collections4-4.1.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-httpclient.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-lang.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/commons-logging-1.1.3.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/ezmorph-1.0.5.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/json-lib-2.2-jdk15.jar:/Users/qk/Documents/workspace-luna/httpclient-test/lib/morph-1.1.1.jar
2017/08/02 14:29:49:575 CST [DEBUG] HttpClient - Operating system name: Mac OS X
2017/08/02 14:29:49:575 CST [DEBUG] HttpClient - Operating system architecture: x86_64
2017/08/02 14:29:49:575 CST [DEBUG] HttpClient - Operating system version: 10.11.6
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SUN 1.7: SUN (DSA key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; X.509 certificates; JKS keystore; PKIX CertPathValidator; PKIX CertPathBuilder; LDAP, Collection CertStores, JavaPolicy Policy; JavaLoginConfig Configuration)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunRsaSign 1.7: Sun RSA signature provider
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunEC 1.7: Sun Elliptic Curve provider (EC, ECDSA, ECDH)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunJSSE 1.7: Sun JSSE provider(PKCS12, SunX509 key/trust factories, SSLv3, TLSv1)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunJCE 1.7: SunJCE Provider (implements RSA, DES, Triple DES, AES, Blowfish, ARCFOUR, RC2, PBE, Diffie-Hellman, HMAC)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunJGSS 1.7: Sun (Kerberos v5, SPNEGO)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunSASL 1.7: Sun SASL provider(implements client mechanisms for: DIGEST-MD5, GSSAPI, EXTERNAL, PLAIN, CRAM-MD5, NTLM; server mechanisms for: DIGEST-MD5, GSSAPI, CRAM-MD5, NTLM)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - XMLDSig 1.0: XMLDSig (DOM XMLSignatureFactory; DOM KeyInfoFactory)
2017/08/02 14:29:49:635 CST [DEBUG] HttpClient - SunPCSC 1.7: Sun PC/SC provider
2017/08/02 14:29:49:636 CST [DEBUG] HttpClient - Apple 1.1: Apple Provider
2017/08/02 14:29:49:639 CST [DEBUG] DefaultHttpParams - Set parameter http.useragent = Jakarta Commons-HttpClient/3.1
2017/08/02 14:29:49:641 CST [DEBUG] DefaultHttpParams - Set parameter http.protocol.version = HTTP/1.1
2017/08/02 14:29:49:642 CST [DEBUG] DefaultHttpParams - Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
2017/08/02 14:29:49:642 CST [DEBUG] DefaultHttpParams - Set parameter http.protocol.cookie-policy = default
2017/08/02 14:29:49:642 CST [DEBUG] DefaultHttpParams - Set parameter http.protocol.element-charset = US-ASCII
2017/08/02 14:29:49:642 CST [DEBUG] DefaultHttpParams - Set parameter http.protocol.content-charset = ISO-8859-1
2017/08/02 14:29:49:643 CST [DEBUG] DefaultHttpParams - Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler#2972a4d0
2017/08/02 14:29:49:644 CST [DEBUG] DefaultHttpParams - Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
query ={"delete":{"_index":"equipment", "_type":"unit", "_id":"2" } }
2017/08/02 14:29:49:692 CST [DEBUG] HttpConnection - Open connection to 127.0.0.1:9200
2017/08/02 14:29:49:707 CST [DEBUG] header - >> "POST /_bulk HTTP/1.1[\r][\n]"
2017/08/02 14:29:49:707 CST [DEBUG] HttpMethodBase - Adding Host request header
2017/08/02 14:29:49:719 CST [DEBUG] header - >> "User-Agent: Jakarta Commons-HttpClient/3.1[\r][\n]"
2017/08/02 14:29:49:719 CST [DEBUG] header - >> "Host: 127.0.0.1:9200[\r][\n]"
2017/08/02 14:29:49:719 CST [DEBUG] header - >> "Content-Length: 63[\r][\n]"
2017/08/02 14:29:49:719 CST [DEBUG] header - >> "Content-Type: application/x-ndjson; charset=UTF-8[\r][\n]"
2017/08/02 14:29:49:720 CST [DEBUG] header - >> "[\r][\n]"
2017/08/02 14:29:49:720 CST [DEBUG] content - >> "{"delete":{"_index":"equipment", "_type":"unit", "_id":"2" } }[\n]"
2017/08/02 14:29:49:720 CST [DEBUG] EntityEnclosingMethod - Request body sent
2017/08/02 14:29:49:735 CST [DEBUG] header - << "HTTP/1.1 200 OK[\r][\n]"
2017/08/02 14:29:49:735 CST [DEBUG] header - << "HTTP/1.1 200 OK[\r][\n]"
2017/08/02 14:29:49:736 CST [DEBUG] header - << "Warning: 299 Elasticsearch-5.4.1-2cfe0df "Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header." "Wed, 02 Aug 2017 06:29:49 GMT"[\r][\n]"
2017/08/02 14:29:49:736 CST [DEBUG] header - << "content-type: application/json; charset=UTF-8[\r][\n]"
2017/08/02 14:29:49:736 CST [DEBUG] header - << "content-length: 203[\r][\n]"
2017/08/02 14:29:49:737 CST [DEBUG] header - << "[\r][\n]"
Bulk Response status code: 200
Bulk Response body:
2017/08/02 14:29:49:737 CST [DEBUG] HttpMethodBase - Buffering response body
2017/08/02 14:29:49:738 CST [DEBUG] content - << "{"took":1,"errors":false,"items":[{"delete":{"found":false,"_index":"equipment","_type":"unit","_id":"2","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"status":404}}]}"
2017/08/02 14:29:49:738 CST [DEBUG] HttpMethodBase - Resorting to protocol version default close connection policy
2017/08/02 14:29:49:738 CST [DEBUG] HttpMethodBase - Should NOT close connection, using HTTP/1.1
2017/08/02 14:29:49:738 CST [DEBUG] HttpConnection - Releasing connection back to connection manager.
{"took":1,"errors":false,"items":[{"delete":{"found":false,"_index":"equipment","_type":"unit","_id":"2","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"status":404}}]}

Related

How do I connect icCube to Snowflake?

After copying the latest version of the Snowflake driver into the lib folder of icCube, starting the server, and then performing the following:
Schema create - Wizard (Dimensions/Measures -> Table)
Relational Database
Connection details....
Driver Type: JDBC
Server Name:
net.snowflake.client.jdbc.SnowflakeDriver
DB Name:
jdbc:snowflake://xxx-eu-west-1.snowflakecomputing.com
User: dummy
Password: xxx
I get the following error.
[ qtp525575644-48] [DEBUG] (13:21:33.986 UTC) [R] GWT 20 servlet-started
[ qtp525575644-48] [DEBUG] (13:21:34.031 UTC) [R] GWT 20 request-process-started [session:node0s0rjncom0tmx12mojb0y00nl60] OTHER (schema:none) GwtDiscoverTableNamesQuery cl_GWT_GwtDiscoverTableNamesQuery_1546953693969_1151490167
[ qtp525575644-48] [DEBUG] (13:21:34.031 UTC) [R] GWT 20 submit-tasks-started 1 q:0 t:0/8
[ qtp525575644-48] [DEBUG] (13:21:34.031 UTC) [R] GWT 20 submit-task-started GWT
[ qtp525575644-48] [DEBUG] (13:21:34.032 UTC) [R] GWT 20 execute-task-started GWT [LOCK:none]
[ qtp525575644-48] [DEBUG] (13:21:34.034 UTC) [JDBC] creating a new OLAP connection [780055920]
[ qtp525575644-48] [DEBUG] (13:21:34.065 UTC) [JDBC] opening a new DB connection [780055920]
[ qtp525575644-48] [DEBUG] (13:21:34.065 UTC) [JDBC] Postgres URL [-] [net.snowflake.client.jdbc.SnowflakeDriver] [null] [jdbc:snowflake://xxx.eu-west-1.snowflakecomputing.com]
[ gc] [ WARN] (13:21:34.339 UTC) [GC] (PS Scavenge) : 14ms ( free:174MB / total:227MB / max:456MB )
[ qtp525575644-48] [DEBUG] (13:21:36.640 UTC) [JDBC] closing the DB connection [780055920]
[ qtp525575644-48] [ERROR] (13:21:37.119 UTC) [builder] validation error(s)
[BUILDER_JDBC_CONNECTION_CANNOT_BE_CREATED] JDBC connection for url 'jdbc:snowflake://xxx.eu-west-1.snowflakecomputing.com' and user 'pentaho_reporting' cannot be created due to error 'null'
at crazydev.iccube.builder.datasource.jdbc.OlapBuilderJdbcConnection.onOpen(SourceFile:110)
at crazydev.iccube.builder.datasource.OlapBuilderAbstractConnection.open(SourceFile:73)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.datatable.GwtDiscoverTableNamesQueryHandler.doHandleImpl(SourceFile:65)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.datatable.GwtDiscoverTableNamesQueryHandler.doHandleImpl(SourceFile:29)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.common.GwtAbstractBuilderQueryHandler.unsafeHandleImpl(SourceFile:239)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.common.GwtAbstractBuilderQueryHandler.safeHandleImpl(SourceFile:186)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.common.GwtAbstractBuilderQueryHandler.handleImpl(SourceFile:178)
at crazydev.iccube.gwt.server.requesthandler.builder.handlers.common.GwtAbstractBuilderQueryHandler.handleImpl(SourceFile:70)
at crazydev.iccube.gwt.server.requesthandler.common.GwtAbstractQueryHandler.handle(SourceFile:75)
at crazydev.iccube.gwt.server.requesthandler.common.GwtAbstractQueryHandler.handle(SourceFile:58)
at crazydev.iccube.gwt.server.requesthandler.common.GwtQueryHandlerDispatcher.dispatchQuery(SourceFile:528)
at crazydev.iccube.server.request.request.gwt.IcCubeGwtServerRequest$Task.unsafeExecute(SourceFile:629)
at crazydev.iccube.server.request.task.IcCubeServerTask.execute(SourceFile:247)
at crazydev.iccube.server.request.executor.IcCubeServerTaskRunnable.run(SourceFile:42)
The Snowflake JDBC driver throws a 'SQLFeatureNotSupportedException' with an empty message when setReadOnly is called on the connection.
We fixed this in our dev branch; the fix will be available in the next release or as a pre-release.
PS: Discovering tables doesn't work very well; as a workaround you might add SQL queries as tables.
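For illustration, a minimal sketch of the failure mode described above (the account URL and credentials are placeholders, not the ones from the question): a plain JDBC client hits the same exception when it calls setReadOnly on a Snowflake connection, and guarding that call is one possible client-side workaround until the icCube fix is released.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLFeatureNotSupportedException;

public class SnowflakeReadOnlySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder account URL and credentials, for illustration only.
        Connection con = DriverManager.getConnection(
                "jdbc:snowflake://myaccount.eu-west-1.snowflakecomputing.com",
                "myuser", "mypassword");
        try {
            con.setReadOnly(true); // reported to fail in the Snowflake driver
        } catch (SQLFeatureNotSupportedException e) {
            // Reported behaviour: thrown with an empty message; log and move on.
            System.out.println("setReadOnly not supported: " + e.getMessage());
        }
        con.close();
    }
}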

Apache Camel https4 client not using the same tcp port for multiple requests with keepAlive=true

Hi, I am using the Apache Camel http4 component to send HTTPS requests with keepAlive=true, but when I check netstat after sending multiple requests I see that each request opens a new TCP port to the peer.
This should not be the usual behavior of keep-alive connections. Why is the same TCP port not being reused for communicating with the server, and how can that be achieved, if at all?
Turns out this is not a keep-alive problem; connections are in fact kept alive properly. The problem is that the connections in the pool managed by the default PoolingHttpClientConnectionManager were not being reused. Both things can easily be seen by enabling logging for Apache's HttpClient (which is used under the hood):
Keep-Alive is being used:
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "HTTP/1.1 200 OK[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Keep-Alive: timeout=5, max=300[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Server: Apache-Coyote/1.1[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Content-Encoding: gzip[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Vary: Accept-Encoding[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Cluster-Id: A[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Date: Wed, 19 Dec 2018 06:59:17 GMT[\r][\n]"
2018/12/19 07:59:17:470 CET [DEBUG] wire - http-outgoing-7 << "Content-Type: text/xml[\r][\n]"
2018/12/19 07:59:17:471 CET [DEBUG] wire - http-outgoing-7 << "Content-Length: 239[\r][\n]"
2018/12/19 07:59:17:471 CET [DEBUG] wire - http-outgoing-7 << "[\r][\n]"
Connections not being reused:
2018/12/19 08:00:08:240 CET [DEBUG] PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://someurl.com:443][total kept alive: 1; route allocated: 1 of 1; total allocated: 1 of 1]
2018/12/19 08:00:08:240 CET [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-7: Close connection
2018/12/19 08:00:08:242 CET [DEBUG] PoolingHttpClientConnectionManager - Connection leased: [id: 8][route: {s}->https://someurl.com:443][total kept alive: 0; route allocated: 1 of 1; total allocated: 1 of 1]
Note that one can easily enable logging for HttpClient by passing a few arguments to the JVM at startup.
So, why are connections not being reused? Because SSL is in use, the PoolingHttpClientConnectionManager used by Apache's HttpClient does not allow reuse of a connection when the user principal of the existing connection differs from that of the requested connection (this check is driven by the DefaultUserTokenHandler). See also e.g. this Stack Overflow post. The solution is to implement a custom UserTokenHandler (or use the NullTokenHandler if that's sufficient) and configure the HttpClientBuilder accordingly, as sketched below.
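A minimal sketch of that configuration, assuming plain HttpClient 4.x (with Camel, the builder customisation would typically be plugged in through the component's httpClientConfigurer option): the handler simply never associates a user token with a connection, so pooled TLS connections remain reusable across requests.

import org.apache.http.client.UserTokenHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.protocol.HttpContext;

public class ReusableTlsClientSketch {
    public static CloseableHttpClient build() {
        // Never bind a user principal to a pooled connection, so connections
        // over SSL can be leased again for subsequent requests.
        UserTokenHandler noTokenHandler = new UserTokenHandler() {
            @Override
            public Object getUserToken(HttpContext context) {
                return null;
            }
        };
        return HttpClientBuilder.create()
                .setUserTokenHandler(noTokenHandler)
                .build();
    }
}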
Did you already analyze the HTTP response headers in order to check whether Keep-Alive is being taken into account, and with which timeout value?
Example of expected response:
HTTP/1.1 200 OK
Connection: Keep-Alive
Keep-Alive: timeout=10, max=20
Content-Type: text/html; charset=UTF-8
Date: ...
Content-Length: ...

Spring returning 200 with no content and not hitting controller

I have a controller as
@Controller
@RequestMapping("/v2/**")
public class ReactController {

    @RequestMapping(method = RequestMethod.GET)
    public String reactEntry() {
        return "react-entry";
    }
}
When I log into the app and go through the login page, then navigate to the page that hits this URL, I get the content. However, if I go directly to the URL (it hits the login page but then forwards directly to this URL), Spring returns a status code of 200 with a content length of 0 and my controller is never hit.
In debug logging, in the normal case I see:
[DEBUG] 17 Aug 2018 10:41:50,805 org.springframework.security.web.FilterChainProxy - /v2/ reached end of additional filter chain; proceeding with original chain
[TRACE] 17 Aug 2018 10:41:50,807 org.springframework.web.servlet.DispatcherServlet - Bound request context to thread: SecurityContextHolderAwareRequestWrapper[ org.springframework.security.web.context.HttpSessionSecurityContextRepository Servlet3SaveToSessionRequestWrapper#38f60506]
[DEBUG] 17 Aug 2018 10:41:50,807 org.springframework.web.servlet.DispatcherServlet - DispatcherServlet with name 'cem' processing GET request for [/cems/v2/]
[TRACE] 17 Aug 2018 10:41:50,807 org.springframework.web.servlet.DispatcherServlet - Testing handler map [org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping#746fb8d3] in DispatcherServlet with name 'cem'
[DEBUG] 17 Aug 2018 10:41:50,807 org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping - Looking up handler method for path /v2/
[TRACE] 17 Aug 2018 10:41:50,842 org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping - Found 1 matching mapping(s) for [/v2/] : [{[/v2/**],methods=[GET],params=[],headers=[],consumes=[],produces=[],custom=[]}]
[DEBUG] 17 Aug 2018 10:41:50,842 org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping - Returning handler method [public java.lang.String rest.controller.ReactController.reactEntry()]
However, when hit via the second mechanism, I get:
[DEBUG] 17 Aug 2018 10:45:32,702 org.springframework.security.web.FilterChainProxy - /v2/ reached end of additional filter chain; proceeding with original chain
[DEBUG] 17 Aug 2018 10:45:32,779 org.springframework.beans.factory.annotation.InjectionMetadata - Processing injected element of bean 'domain.User': PersistenceElement for transient javax.persistence.EntityManager rest.domain.Personnel.entityManager
Notice the time gap, which suggests that the second line is part of a different request.
It appears that in the second case, the request is not being mapped to the cems DispatcherServlet even though it is the same URL.

Make Cygnus use WebHDFS to write to local HDFS

I'm trying to make a local Orion+Cygnus setup persist Orion's data on a local HDFS through WebHDFS.
In Cygnus' instructions on GitHub, very little is mentioned about WebHDFS; the configuration is mostly about HttpFS.
The OrionHDFSSink .md says that hdfs_port=50070 is for WebHDFS, which is indeed what my HDFS exposes. So I would expect that by setting the port this way Cygnus would automatically use WebHDFS, but in my case it doesn't seem to work that way.
So, here's my agent_1.conf:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
# source configuration
cygnusagent.sources.http-source.channels = hdfs-channel
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnusagent.sources.http-source.port = 5050
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
cygnusagent.sources.http-source.handler.notification_target = /notify
cygnusagent.sources.http-source.handler.default_service = def_serv
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
cygnusagent.sources.http-source.handler.events_ttl = 4
cygnusagent.sources.http-source.interceptors = ts gi
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# OrionHDFSSink configuration
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
cygnusagent.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.OrionHDFSSink
cygnusagent.sinks.hdfs-sink.hdfs_host = localHDFS.ip
cygnusagent.sinks.hdfs-sink.hdfs_port = 50070
cygnusagent.sinks.hdfs-sink.hdfs_username = HDFSrootUser
cygnusagent.sinks.hdfs-sink.attr_persistence = column
# hdfs-channel configuration
cygnusagent.channels.hdfs-channel.type = memory
cygnusagent.channels.hdfs-channel.capacity = 1000
cygnusagent.channels.hdfs-channel.transactionCapacity = 100
When I update an entity in Orion, to which Cygnus is subscribed, Cygnus logs the following:
02 Sep 2015 20:09:12,353 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:150) - Starting transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:12,362 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:236) - Received data ({ "subscriptionId" : "55e735c9b89e8535f8ca5ef2", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "Reading", "isPattern" : "false", "id" : "Reading1.1", "attributes" : [ { "name" : "Cost", "type" : "double", "value" : "32" }, { "name" : "Reading_ID", "type" : "integer", "value" : "14" }, { "name" : "Threshold", "type" : "double", "value" : "30" }, { "name" : "email", "type" : "string", "value" : "arthurmvieira#hotmail.com" } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
02 Sep 2015 20:09:12,366 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:258) - Event put in the channel (id=2020008711, ttl=4)
02 Sep 2015 20:09:12,432 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=4, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:12,549 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:12,557 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=3)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:13,560 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=3, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:13,574 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:13,574 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=2)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:15,576 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=2, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:15,590 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:15,599 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=1)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:18,601 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=1, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:18,615 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:18,618 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:22,622 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=0, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:22,635 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:22,635 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:163) - The event TTL has expired, it is no more re-injected in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
So you can see it's trying to use HttpFS, as it logs the response:
HttpFS response: 503 Service unavailable
...on each writing try.
How should I configure the agent to use WebHDFS?
Thank you
I don't know what was happening, but the configuration mentioned is correct and is working now.
After several rounds of rebooting the instance, rewriting the config files, and chasing log errors other than the one mentioned, it worked.
At some point Cygnus was trying to write to localhost:50075 instead of {localHDFS.ip}:50070, but that went away after rebooting Cygnus.
All instances are at their latest version (important).
Cygnus configuration for WebHDFS is just about setting the port to 50070, nothing else is required.
Regarding the connections to 50075 that you mention, they are correct as well, since that's how WebHDFS behaves: when you want to upload data to HDFS, the client (in this case, Cygnus) first contacts the Namenode on TCP port 50070; the Namenode then responds with a redirection Location pointing to the Datanode where the data will actually be uploaded. That redirection uses TCP port 50075, so datanode:50075 must be reachable by the client (Cygnus). That's why we use HttpFS in the global instance of Cosmos at FIWARE Lab: HttpFS works as a gateway hiding the details of the Datanodes, and only a single entry point and port (14000) is required.
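To illustrate the two-step exchange just described, here is a minimal sketch with plain java.net, assuming the namenode host localHDFS.ip and the user HDFSrootUser from the configuration above (the path and file content are made up for illustration):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsCreateSketch {
    public static void main(String[] args) throws Exception {
        String create = "http://localHDFS.ip:50070/webhdfs/v1/user/HDFSrootUser/test.txt"
                + "?op=CREATE&user.name=HDFSrootUser&overwrite=true";

        // Step 1: ask the namenode; it answers with a redirection whose
        // Location header points at a datanode on port 50075.
        HttpURLConnection nn = (HttpURLConnection) new URL(create).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        String dataNodeUrl = nn.getHeaderField("Location");
        System.out.println("Redirected to: " + dataNodeUrl);

        // Step 2: send the actual bytes to the datanode URL, which is why
        // datanode:50075 must be reachable from the client (Cygnus).
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            out.write("hello webhdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Datanode response: " + dn.getResponseCode());
    }
}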

hadoop too many logs on screen

I started learning Hadoop and Hive recently. As a beginner I am not familiar with all the logs showing up on the screen, so it would be better to see a clean version with only the important ones. I am learning Hive from Rutherglen's "Programming Hive" book.
I've just started, and I got numerous logs after the first command, while in the book it's just "OK, Time taken: 3.543 seconds".
Does anyone have a solution for reducing these logs?
PS: below are the logs I got from the command "create table x (a int);"
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Sep 28, 2014 12:10:28 AM org.apache.hadoop.hive.conf.HiveConf <clinit>
WARNING: hive-site.xml not found on CLASSPATH
Logging initialized using configuration in jar:file:/Users/admin/Documents/Study/software /Programming/Hive/hive-0.9.0-bin/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Sep 28, 2014 12:10:28 AM SessionState printInfo
INFO: Logging initialized using configuration in jar:file:/Users/admin/Documents/Study/software/Programming/Hive/hive-0.9.0-bin/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/admin/hive_job_log_admin_201409280010_720612579.txt
Sep 28, 2014 12:10:28 AM hive.ql.exec.HiveHistory printInfo
INFO: Hive history file=/tmp/admin/hive_job_log_admin_201409280010_720612579.txt
hive> CREATE TABLE x (a INT);
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=Driver.run>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=compile>
Sep 28, 2014 12:10:31 AM hive.ql.parse.ParseDriver parse
INFO: Parsing command: CREATE TABLE x (a INT)
Sep 28, 2014 12:10:31 AM hive.ql.parse.ParseDriver parse
INFO: Parse Completed
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.parse.SemanticAnalyzer analyzeInternal
INFO: Starting Semantic Analysis
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.parse.SemanticAnalyzer analyzeCreateTable
INFO: Creating table x position=13
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver compile
INFO: Semantic Analysis Completed
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver getSchema
INFO: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=compile start=1411877431127 end=1411877431388 duration=261>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=Driver.execute>
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.ql.Driver execute
INFO: Starting command: CREATE TABLE x (a INT)
Sep 28, 2014 12:10:31 AM hive.ql.exec.DDLTask createTable
INFO: Default to LazySimpleSerDe for table x
Sep 28, 2014 12:10:31 AM hive.log getDDLFromFieldSchema
INFO: DDL: struct x { i32 a}
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.metastore.HiveMetaStore newRawStore
INFO: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Sep 28, 2014 12:10:31 AM org.apache.hadoop.hive.metastore.ObjectStore initialize
INFO: ObjectStore, initialize called
Sep 28, 2014 12:10:32 AM org.apache.hadoop.hive.metastore.ObjectStore getPMF
INFO: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Sep 28, 2014 12:10:32 AM org.apache.hadoop.hive.metastore.ObjectStore setConf
INFO: Initialized ObjectStore
Sep 28, 2014 12:10:33 AM org.apache.hadoop.hive.metastore.HiveMetaStore logInfo
INFO: 0: create_table: db=default tbl=x
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=Driver.execute start=1411877431389 end=1411877434527 duration=3138>
OK
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver printInfo
INFO: OK
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogBegin
INFO: <PERFLOG method=releaseLocks>
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=releaseLocks start=1411877434529 end=1411877434529 duration=0>
Sep 28, 2014 12:10:34 AM org.apache.hadoop.hive.ql.Driver PerfLogEnd
INFO: </PERFLOG method=Driver.run start=1411877431126 end=1411877434530 duration=3404>
Time taken: 3.407 seconds
Sep 28, 2014 12:10:34 AM CliDriver printInfo
INFO: Time taken: 3.407 seconds
Try starting the Hive shell as follows:
hive --hiveconf hive.root.logger=WARN,console
If you want to make this change persistent, set hive.root.logger=WARN,console in the HIVE_CONF_DIR/hive-log4j.properties file. If you don't have this file in your HIVE_CONF_DIR, create it by copying the contents of hive-log4j.default into HIVE_CONF_DIR.
