Apache Phoenix UPSERT timeouts on large data - hadoop

Exception in thread "main"
org.apache.phoenix.exception.PhoenixIOException:
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Mon May 09 20:45:17 PDT 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=214280: row '�.�A`' on tabl..
Any suggestions?
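One thing that may be worth trying: the callTimeout=60000 in the trace looks like the 60-second default of HBase client settings such as hbase.rpc.timeout, so raising the client-side timeouts is a common first step. Below is a minimal sketch, assuming the three property names shown apply to your Phoenix/HBase versions and that 10 minutes is an acceptable limit. The class and method names are hypothetical, and the same values may also need to be set in hbase-site.xml on the servers.

```java
import java.util.Properties;

// Hypothetical helper: builds client-side timeout overrides for a Phoenix JDBC connection.
final class PhoenixTimeoutProps {

    // Returns properties that raise the query, RPC and scanner timeouts to timeoutMs.
    static Properties clientTimeouts(long timeoutMs) {
        Properties props = new Properties();
        String value = Long.toString(timeoutMs);
        props.setProperty("phoenix.query.timeoutMs", value);             // Phoenix statement timeout
        props.setProperty("hbase.rpc.timeout", value);                   // single RPC call timeout
        props.setProperty("hbase.client.scanner.timeout.period", value); // scanner lease/timeout
        return props;
    }

    public static void main(String[] args) {
        // Assumption: 10 minutes; tune to the size of the UPSERT.
        Properties props = clientTimeouts(600_000L);
        // These would be passed to DriverManager.getConnection("jdbc:phoenix:<zk-quorum>", props).
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If the timeout persists after raising these values, the limit being hit is probably configured somewhere else (for example on the region servers), so treat this only as a starting point.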

Related

mvn gluonfx:runagent failure - Unknown Attribute 'queryAllDeclaredMethods' .. in definition of class ..HelloController

I'm trying to get my first native build working (Windows 10, JDK 17, JavaFX 17, Gluon 1.0.9, Gluon GraalVM: graalvm-svm-windows-gluon-21.2.0-dev.zip).
I am able to run mvn gluonfx:run and click the single button in my test UI.
However, when I run mvn gluonfx:runagent, I get:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Error: Error parsing reflection configuration in file:/C:/devel/repos/Gluon-SingleViewProject-jdk17/target/classes/META-INF%5cnative-image%5creflect-config.json:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Unknown attribute 'queryAllDeclaredMethods' (supported attributes: allDeclaredConstructors, allPublicConstructors, allDeclaredMethods, allPublicMethods, allDeclaredFields, allPublicFields, methods, fields) in defintion of class com.gluonapplication.HelloController
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Verify that the configuration matches the schema described in the -H:PrintFlags=+ output for option ReflectionConfigurationResources.
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] com.oracle.svm.core.util.UserError$UserException: Error parsing reflection configuration in file:/C:/devel/repos/Gluon-SingleViewProject-jdk17/target/classes/META-INF%5cnative-image%5creflect-config.json:
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Unknown attribute 'queryAllDeclaredMethods' (supported attributes: allDeclaredConstructors, allPublicConstructors, allDeclaredMethods, allPublicMethods, allDeclaredFields, allPublicFields, methods, fields) in defintion of class com.gluonapplication.HelloController
[Wed Nov 17 08:10:41 PST 2021][INFO] [SUB] Verify that the configuration matches the schema described in the -H:PrintFlags=+ output for option ReflectionConfigurationResources.
The HelloController currently consists of a single method:
import javafx.event.ActionEvent;

public class HelloController {
    // Handler for the single button in the test UI
    public void pressButton(ActionEvent ae) {
        System.out.println("hello, source pressed: " + ae.getSource());
    }
}
Any suggestions or tips are greatly appreciated. (Based on the error above, it looks like the build process may be calling a method that is unsupported on JDK 17?)

Oracle BI times out while connecting to Hive

I'm trying to create a new data source in Oracle BI Visual Analyzer, but the dialog displays a timeout error message.
This is the message from the log:
<Feb 18, 2017, 9:38:55,614 PM EET> <Error> <oracle.bi.datasource> <BEA-000000> <Failed to write to output stream
java.util.concurrent.TimeoutException null
oracle.bi.datasource.buffer.RingBuffer.dequeue(RingBuffer.java:602)
oracle.bi.datasource.service.adf.protocol.ADFProtocolRespBufferedOutputStream$StreamWriter.run(ADFProtocolRespBufferedOutputStream.java:104)
java.lang.Thread.run(Thread.java:745)
>
<Feb 18, 2017, 9:38:55,639 PM EET> <Error> <oracle.bi.datasource.trace> <BEA-000000> <TIMEOUT_ERROR Request timed out: Request Headers
X-BI-SERVICE-INSTANCE-KEY=ssi
User-Agent=DSS/v1
Accept-Language=en-US,en;q=0.5
OBIEE_SERVICE_NAME=DatasourceService
OBIEE_BYPASS_ADF_PROTOCOL=true
OBIEE_CONN_POOL_TIMEOUT=300000
OBIPARAM_ImpersonatedUser=weblogic
BISuperUserName=tzmv4us7do8ji2xcenylwhafg3qbkrp5
OBIARG_datasourceName=JDBC
OBIARG_targetConn=JDBC (BIJDBC)
OBIPARAM_DriverClass=oracle.bi.jdbc.AnaJdbcDriver
OBIPARAM_TargetType=Hadoop
OBIPARAM_ConnectionString=jdbc:oraclebi://oracle-VirtualBox:9508/PrimaryCCS=oracle-VirtualBox;PrimaryCCSPort=9508;SSL=false;NQ_SESSION.SERVICEINSTANCEKEY=ssi;
OBIPARAM_UserName=tzmv4us7do8ji2xcenylwhafg3qbkrp5
OBIPARAM_Password=XXXXX
OBIPARAM_ConnectionID='weblogic'.'HiveDS'
OBIEE_METHOD_NAME=getObjectsList
OBIPARAM_SchemaClearCache=true
Host=oracle-VirtualBox:9502
Accept=text/html, image/gif, image/jpeg, */*; q=.2
Connection=Keep-Alive
ECID-Context=1.e9d4bbd6-0808-481a-8484-f50ed2c2552c-0000031a;kYhnv0YDJPHBBLPUn3VSdGXIVVNLCKIHVKHM5VML5JGLFINN1ODDs9DEn4DDt2C
Cause - The request did not complete within the allotted time.
Action - Check the log for details and increase the connection pool timeout or fix the root cause.
oracle.bi.datasource.service.adf.server.DatasourceServlet.doGet(DatasourceServlet.java:454)
javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>
<Feb 18, 2017, 9:38:55,642 PM EET> <Error> <oracle.bi.web.datasetsvc> <BEA-000000> <getObjectList failed for 'weblogic'.'HiveDS'>
I ran telnet against oracle-VirtualBox 9508 and the service on that port responds (it's the Cluster Controller), so I'm lost as to why it is:
1) connecting there at all (I assumed it would connect to Hive directly)
2) failing to do its job
Has anyone had the same experience?

Error with reverse scan in HBase

I'm getting an exception while performing a reverse scan on an HBase table. There is some problem with seeking to the previous row. Any suggestions will be highly appreciated. The error log follows:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Mon May 02 10:59:29 CEST 2016, RpcRetryingCaller{globalStartTime=1462179569123, pause=100, retries=35}, java.io.IOException: java.io.IOException: Could not seekToPreviousRow StoreFileScanner[HFileScanner for reader reader=file:/data/hbase-1.1.2/data/hbase/data/default/table/c8cdadcd1247e04720972ab5a25597a7/outlinks/3eac358ffb9d43018221fbddf9274ffd, compression=none, cacheConf=blockCache=LruBlockCache{blockCount=149348, currentSize=9919772624, freeSize=2866589744, maxSize=12786362368, heapSize=9919772624, minSize=12147044352, minFactor=0.95, multiSize=6073522176, multiFactor=0.5, singleSize=3036761088, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=Danmark2010-01-26T21:02:50Z/outlinks:.dk/1459765153334/Put, lastKey=Motorveje i Danmark2010-08-24T14:03:07Z/outlinks:\xC3\x98ver\xC3\xB8d/1459766037971/Put, avgKeyLen=70, avgValueLen=20, entries=49195292, length=4896832843, cur=Hj\xC3\xA6lp:Sandkassen2010-11-02T21:40:44Z/outlinks:Adriaterhav/1459771842796/Put/vlen=20/seqid=0] to key Hj\xC3\xA6lp:Sandkassen2010-11-02T21:34:14Z/outlinks:\xC4\x8Crnomelj/1459771842779/Put/vlen=20/seqid=0
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:457)
at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:596)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: On-disk size without header provided is 196736, but block header contains 65582. Block offset: -1, data starts with: DATABLK*\x00\x01\x00.\x00\x01\x00\x1A\x00\x00\x00\x00\x8D\xA08\xE2\x01\x00\x00#\x00\x00\x01\x00
at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:500)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:85)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1625)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:673)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:646)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:425)
... 13 more
The source code is:
Table WHtable = connection.getTable(TableName.valueOf("table"));
Scan lowerClosestRowScan = new Scan();
lowerClosestRowScan.addFamily(Bytes.toBytes("outlinks"));
lowerClosestRowScan.setStartRow(Bytes.toBytes("A row"));
lowerClosestRowScan.setReversed(true); // to fetch last
ResultScanner lowerClosestRowScanner = WHtable.getScanner(lowerClosestRowScan);

Make Cygnus use WebHDFS to write to local HDFS

I'm trying to make a local Orion+Cygnus setup persist Orion's data to a local HDFS through WebHDFS.
Cygnus' instructions on GitHub mention very little about WebHDFS, as the configuration is mostly about HttpFS.
The OrionHDFSSink .md file says that hdfs_port=50070 is for WebHDFS, which is indeed the port my HDFS uses. So I would expect Cygnus to use WebHDFS automatically when the port is set this way, but in my case it doesn't seem to work like that.
Here's my agent_1.conf:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
# source configuration
cygnusagent.sources.http-source.channels = hdfs-channel
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnusagent.sources.http-source.port = 5050
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
cygnusagent.sources.http-source.handler.notification_target = /notify
cygnusagent.sources.http-source.handler.default_service = def_serv
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
cygnusagent.sources.http-source.handler.events_ttl = 4
cygnusagent.sources.http-source.interceptors = ts gi
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# OrionHDFSSink configuration
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
cygnusagent.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.OrionHDFSSink
cygnusagent.sinks.hdfs-sink.hdfs_host = localHDFS.ip
cygnusagent.sinks.hdfs-sink.hdfs_port = 50070
cygnusagent.sinks.hdfs-sink.hdfs_username = HDFSrootUser
cygnusagent.sinks.hdfs-sink.attr_persistence = column
# hdfs-channel configuration
cygnusagent.channels.hdfs-channel.type = memory
cygnusagent.channels.hdfs-channel.capacity = 1000
cygnusagent.channels.hdfs-channel.transactionCapacity = 100
When I update an entity on Orion to which Cygnus is subscribed, Cygnus logs the following:
02 Sep 2015 20:09:12,353 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:150) - Starting transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:12,362 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:236) - Received data ({ "subscriptionId" : "55e735c9b89e8535f8ca5ef2", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "Reading", "isPattern" : "false", "id" : "Reading1.1", "attributes" : [ { "name" : "Cost", "type" : "double", "value" : "32" }, { "name" : "Reading_ID", "type" : "integer", "value" : "14" }, { "name" : "Threshold", "type" : "double", "value" : "30" }, { "name" : "email", "type" : "string", "value" : "arthurmvieira#hotmail.com" } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
02 Sep 2015 20:09:12,366 INFO [2055470757#qtp-1523539038-0] (com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents:258) - Event put in the channel (id=2020008711, ttl=4)
02 Sep 2015 20:09:12,432 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=4, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:12,549 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:12,557 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=3)
02 Sep 2015 20:09:12,558 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:13,560 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=3, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:13,574 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:13,574 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=2)
02 Sep 2015 20:09:13,575 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:15,576 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=2, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:15,590 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:15,599 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=1)
02 Sep 2015 20:09:15,600 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:18,601 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=1, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:18,615 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:18,618 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:173) - An event was put again in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:18,621 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
02 Sep 2015 20:09:22,622 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:128) - Event got from the channel (id=2020008711, headers={fiware-servicepath=def_servpath, destination=reading1.1_reading, content-type=application/json, fiware-service=def_serv, ttl=0, transactionId=1441217314-956-0000000000, timestamp=1441217352368}, bodyLength=812)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionHDFSSink.persist:356) - [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (def_serv/def_servpath/reading1.1_reading/reading1.1_reading.txt), Data ({"recvTime":"2015-09-02T18:09:12.368Z","Cost":"32", "Cost_md":[],"Reading_ID":"14", "Reading_ID_md":[],"Threshold":"30", "Threshold_md":[],"email":"arthurmvieira#hotmail.com", "email_md":[]})
02 Sep 2015 20:09:22,635 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:143) - Persistence error (The /user/root/def_serv/def_servpath/reading1.1_reading directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
02 Sep 2015 20:09:22,635 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:163) - The event TTL has expired, it is no more re-injected in the channel (id=2020008711, ttl=0)
02 Sep 2015 20:09:22,635 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (com.telefonica.iot.cygnus.sinks.OrionSink.process:193) - Finishing transaction (1441217314-956-0000000000)
So you can see it's trying to use HttpFS, as it logs the following response on each write attempt:
HttpFS response: 503 Service unavailable
How should I configure the agent to use WebHDFS?
Thank you
I don't know what was happening, but the configuration shown above is correct and is working now.
It started working after several attempts at rebooting the instance and rewriting the config files, during which I saw log errors other than the one mentioned.
At some point Cygnus was trying to write to localhost:50075 instead of {localHDFS.ip}:50070, but that went away after restarting Cygnus.
All instances are at their latest version (important).
Configuring Cygnus for WebHDFS is just a matter of setting the port to 50070; nothing else is required.
Regarding the connections to port 50075 that you mention, they are correct as well, since that is how WebHDFS behaves. When you want to upload data to HDFS, the client (in this case, Cygnus) first accesses the NameNode through the TCP/50070 port. The NameNode then responds with a redirection pointing to the DataNode where the data will actually be uploaded. That redirection uses the TCP/50075 port, so datanode:50075 must be reachable by the client (Cygnus). That's why we use HttpFS in the global instance of Cosmos at FIWARE Lab: HttpFS works as a gateway that hides the details of the DataNodes, so only a single entry point and port (14000) is required.
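To make the two-step flow above concrete, here is a minimal sketch that only builds and inspects the URLs involved; it does not talk to a cluster. The class name, method names and the example hostnames are hypothetical, and the URL layout assumes the standard WebHDFS REST form (/webhdfs/v1/<path>?op=CREATE).

```java
import java.net.URI;

// Hypothetical helper illustrating the WebHDFS CREATE handshake.
final class WebHdfsCreateSketch {

    // Step 1: URL of the PUT request the client sends to the NameNode (TCP/50070).
    // hdfsPath must start with "/".
    static String namenodeCreateUrl(String host, int port, String hdfsPath, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&user.name=" + user;
    }

    // Step 2: the NameNode replies with a redirect whose Location header names a DataNode.
    // These helpers extract the host and port the client must be able to reach.
    static String redirectHost(String locationHeader) {
        return URI.create(locationHeader).getHost();
    }

    static int redirectPort(String locationHeader) {
        return URI.create(locationHeader).getPort();
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from a real cluster.
        String step1 = namenodeCreateUrl("namenode.example", 50070,
                "/user/root/def_serv/file.txt", "root");
        System.out.println("PUT " + step1);

        String location = "http://datanode.example:50075/webhdfs/v1"
                + "/user/root/def_serv/file.txt?op=CREATE&user.name=root";
        System.out.println("Data goes to " + redirectHost(location) + ":" + redirectPort(location));
    }
}
```

If the host and port extracted in step 2 are not reachable from the machine running Cygnus, the upload fails even though the NameNode request succeeded, which is the situation HttpFS avoids.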

MapReduce job failing even though the required class is present

Hi, my job is failing with a runtime exception saying that the mapper class was not found.
Below is the exception:
14/06/16 05:52:56 INFO mapred.JobClient: Task Id : attempt_201406071432_1142_m_000028_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.cloudera.sa.omniture.mr.OmnitureRawDataMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class com.cloudera.sa.omniture.mr.OmnitureRawDataMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
... 8 more
However, the jar I used to run the job does contain the Mapper class, and I am still facing the issue. Please advise.
The listing below confirms the presence of the Mapper class in the jar file.
[svtdphpd#d-hcr75y1 testJARs]$ jar tvf MalformedAnalysis.jar | grep Omniture
1405 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataMapper$HITDATA_PROBLEM.class
3728 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataMapper.class
5280 Mon Jun 16 16:00:46 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureToRCFileJob.class
6436 Mon Jun 16 16:03:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureDataFileRecordReader.class
3642 Mon Jun 16 16:00:16 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureRawDataReducer.class
1792 Mon Jun 16 15:59:34 CDT 2014 com/cloudera/sa/omniture/mr/OmnitureDataFileInputFormat.class
Below is the Job configuration:
// Create job
Job job = Job.getInstance(config, "LoadOmnitureData");
job.setJarByClass(OmnitureToRCFileJob.class);
// Add named output for malformed records
MultipleOutputs.addNamedOutput(job, "Malformed",
        TextOutputFormat.class, NullWritable.class, Text.class);
FileInputFormat.addInputPath(job, new Path(inputPath));
job.setInputFormatClass(OmnitureDataFileInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(OmnitureRawDataReducer.class);
job.setMapperClass(OmnitureRawDataMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
TextOutputFormat.setOutputPath(job, new Path(outputPath));
job.setNumReduceTasks(numReduceTasks);
