I am trying to call the API in batches. For example, the first batch calls with offset 0 and limit 10,000, the second with offset 10,000 and limit 10,000 (bringing records 10,000 - 20,000), and the third with offset 20,000 and limit 10,000 (bringing 20,000 - 30,000). It should break once it has fetched all records, but I see more calls than expected.
Sample code:
AtomicBoolean makeNextCall = new AtomicBoolean(true);
Flux.fromStream(Stream.iterate(0, i -> i + 1))
.takeWhile(integer -> {
LOGGER.withTask(GET_TRANSACTIONS)
.withMessage(String.format("Batch =[%s] and MaxResultsReturned = [%s]", integer, makeNextCall.get()))
.info();
return makeNextCall.get();
}).concatMap(counter -> {
int histOffset = counter * batchSize;
return bbTransactionRepository.accountTransactions(transactionContext, histOffset, batchSize)
.flatMap(tranList -> {
int size = ((List<BBTransaction>) tranList).size();
LOGGER.withTask(GET_TRANSACTIONS)
.withAttribute(RESULT, size)
.withAttribute(HIST_OFFSET, histOffset)
.withAttribute(HIST_LIMIT, batchSize)
.withAttribute(BATCH, counter)
.withMessage("fetching bb transactions in batches")
.info();
boolean shouldContinue = size >= batchSize;
makeNextCall.set(shouldContinue);
return Mono.just(tranList);
});
})
.flatMap(Flux::fromIterable)
.collectList()
So for 26,000 records there should be 3 calls and then it should break, since the 3rd call returns only 6,000 records (6,000 < batch size of 10,000).
But I see around 33 calls in the UAT environment; it works correctly in my local environment, though.
I'm not sure I fully understand the code, but the best way to validate the flow is to create a test using StepVerifier. One likely cause of the extra calls: concatMap prefetches from its upstream (32 elements by default), and the synchronous Stream.iterate source satisfies that demand immediately, so around 33 batch numbers pass the takeWhile check while makeNextCall is still true; everything that has already passed the check is then processed regardless of the flag flipping later.
As for the batch processing, I would suggest simplifying the code: use Flux.buffer to group the data and Flux.takeUntil to cancel the publisher when the condition matches.
private Flux<List<Integer>> processInBatch(int batchSize) {
AtomicInteger offset = new AtomicInteger();
return Flux.range(0, Integer.MAX_VALUE)
.buffer(batchSize)
.concatMap(batch -> {
var histOffset = offset.getAndAdd(batch.size());
log.info("offset: {}, batch: {}", histOffset, batch.size());
return accountTransactions(histOffset, batch.size());
})
.doOnNext(res -> log.info("res: {}", res.size()))
.takeUntil(res -> res.size() < batchSize);
}
and here is a test to verify the flow
@Test
void validateBuffer() {
    StepVerifier.create(processInBatch(10000))
            .expectNextCount(3)
            .verifyComplete();
}
23:00:11.341 [Test worker] INFO - offset: 0, batch: 10000
23:00:11.369 [Test worker] INFO - res: 10000
23:00:11.370 [Test worker] INFO - offset: 10000, batch: 10000
23:00:11.371 [Test worker] INFO - res: 10000
23:00:11.372 [Test worker] INFO - offset: 20000, batch: 10000
23:00:11.372 [Test worker] INFO - res: 6000
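For completeness: accountTransactions isn't defined in the snippet above. Here is a minimal stub that makes the test runnable (it needs reactor.core.publisher.Mono, java.util.stream.IntStream, and java.util.stream.Collectors). It simulates a source of 26,000 records to match the log output; the constant, the Integer payload, and the body are assumptions for illustration only:
// Hypothetical stub (not from the original code): pages over a fixed
// dataset of 26,000 records so processInBatch can be tested standalone.
private static final int TOTAL_RECORDS = 26_000;

private Mono<List<Integer>> accountTransactions(int offset, int limit) {
    int from = Math.min(offset, TOTAL_RECORDS);
    int to = Math.min(offset + limit, TOTAL_RECORDS);
    // Page [from, to) of the simulated dataset; the last page is short (6,000 records).
    return Mono.just(IntStream.range(from, to).boxed().collect(Collectors.toList()));
}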
How do I set a connection timeout for only one particular Spring MVC REST request?
I want to render a huge amount of clothes data in the browser, but the server sends a connection timeout error.
@RequestMapping(value = "/clothes/{clothType}/properties", method = RequestMethod.GET)
public @ResponseBody List<Clothes> getClothesList(@PathVariable String clothType, HttpServletResponse response) {
    List<Clothes> listClothes = null; // must be initialized, or returning it after the catch will not compile
    try {
        listClothes = ClothesApiService.getClothesProperties(clothType);
    } catch (Exception e) {
        log.info("Problem getting the properties for Clothes " + clothType + ". Error: " + e.getMessage());
    }
    return listClothes;
}
Below is the stack trace from the IBM server logs:
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:1613) ~[jackson-all-1.9.11.jar:1.9.11]
at org.springframework.http.converter.json.MappingJacksonHttpMessageConverter.writeInternal(MappingJacksonHttpMessageConverter.java:140) ~[org.springframework.web-3.1.4.RELEASE.jar:3.1.4.RELEASE]
... 49 common frames omitted
Caused by: java.io.IOException: Async IO operation failed (1), reason: RC: 32 Broken pipe
at com.ibm.io.async.AsyncLibrary$IOExceptionCache.<init>(AsyncLibrary.java:891) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.io.async.AsyncLibrary$IOExceptionCache.get(AsyncLibrary.java:904) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.io.async.AsyncLibrary.getIOException(AsyncLibrary.java:918) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.io.async.AbstractAsyncChannel.multiIO(AbstractAsyncChannel.java:473) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.io.async.AsyncSocketChannelHelper.write(AsyncSocketChannelHelper.java:478) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.io.async.AsyncSocketChannelHelper.write(AsyncSocketChannelHelper.java:396) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.tcp.channel.impl.AioSocketIOChannel.writeAIO(AioSocketIOChannel.java:282) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.tcp.channel.impl.AioTCPWriteRequestContextImpl.processAsyncWriteRequest(AioTCPWriteRequestContextImpl.java:53) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.tcp.channel.impl.TCPWriteRequestContextImpl.writeInternal(TCPWriteRequestContextImpl.java:382) ~[na:CC70.CF [a0849.02]]
at com.ibm.ws.tcp.channel.impl.TCPWriteRequestContextImpl.write(TCPWriteRequestContextImpl.java:353) ~[na:CC70.CF [a0849.02]]
at com.ibm.ws.http.channel.impl.HttpServiceContextImpl.asynchWrite(HttpServiceContextImpl.java:2442) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.http.channel.impl.HttpServiceContextImpl.sendOutgoing(HttpServiceContextImpl.java:2229) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.http.channel.inbound.impl.HttpInboundServiceContextImpl.sendResponseBody(HttpInboundServiceContextImpl.java:866) ~[com.ibm.ws.runtime.jar:na]
at com.ibm.ws.webcontainer.channel.WCChannelLink.writeBufferAsynch(WCChannelLink.java:551) [com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.channel.WCChannelLink.writeBufferResponse(WCChannelLink.java:528) [com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.channel.WCChannelLink.writeBuffer(WCChannelLink.java:472) [com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.flushWriteBuffer(WCCByteBufferOutputStream.java:406) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.checkWriteArray(WCCByteBufferOutputStream.java:378) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.write(WCCByteBufferOutputStream.java:111) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.ws.webcontainer.srt.SRTOutputStream.write(SRTOutputStream.java:97) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.wsspi.webcontainer.util.BufferedServletOutputStream.writeOut(BufferedServletOutputStream.java:569) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.wsspi.webcontainer.util.BufferedServletOutputStream.flushBytes(BufferedServletOutputStream.java:433) ~[com.ibm.ws.webcontainer.jar:na]
at com.ibm.wsspi.webcontainer.util.BufferedServletOutputStream.write(BufferedServletOutputStream.java:383) ~[com.ibm.ws.webcontainer.jar:na]
at org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754) ~[jackson-all-1.9.11.jar:1.9.11]
at org.codehaus.jackson.impl.Utf8Generator.writeString(Utf8Generator.java:561) ~[jackson-all-1.9.11.jar:1.9.11]
at org.codehaus.jackson.map.ser.std.StringSerializer.serialize(StringSerializer.java:28) ~[jackson-all-1.9.11.jar:1.9.11]
at org.codehaus.jackson.map.ser.std.StringSerializer.serialize(StringSerializer.java:18) ~[jackson-all-1.9.11.jar:1.9.11]
at org.codehaus.jackson.map.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:446) ~[jackson-all-1.9.11.jar:1.9.11]
... 58 common frames omitted
I'm trying to use VisualVM to profile my program, but it always crashes with generally the same error message:
Waiting...
Profiler Agent: Waiting for connection on port 5140 (Protocol version: 15)
Profiler Agent: Established connection with the tool
Profiler Agent: Local accelerated session
Starting test 0
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000c12e9d20, pid=4808, tid=11472
#
# JRE version: Java(TM) SE Runtime Environment (8.0_31-b13) (build 1.8.0_31-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.31-b07 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C 0x00000000c12e9d20
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\Brendon\workspace\JavaFx\hs_err_pid4808.log
Compiled method (c1) 10245 952 3 pathfinderGameTest.Pathfinder$$Lambda$4/1163157884::get$Lambda (10 bytes)
total in heap [0x00000000027d72d0,0x00000000027d7798] = 1224
relocation [0x00000000027d73f0,0x00000000027d7430] = 64
main code [0x00000000027d7440,0x00000000027d7620] = 480
stub code [0x00000000027d7620,0x00000000027d76b0] = 144
oops [0x00000000027d76b0,0x00000000027d76b8] = 8
metadata [0x00000000027d76b8,0x00000000027d76d0] = 24
scopes data [0x00000000027d76d0,0x00000000027d7730] = 96
scopes pcs [0x00000000027d7730,0x00000000027d7790] = 96
dependencies [0x00000000027d7790,0x00000000027d7798] = 8
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Profiler Agent: JNI OnLoad Initializing...
Profiler Agent: JNI OnLoad Initialized successfully
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
Profiler Agent Warning: JVMTI classLoadHook: class name is null.
VisualVM manages to give me a very quick snapshot (~60 ms), but I'm not sure how reliable such a quick test is.
I followed these instructions, but it didn't change anything. I'm using Java 7 anyway, so it shouldn't even be an issue.
This is the code I'm trying to profile:
package pathfinderGameTest;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.function.BiConsumer;
import utils.Duple;
public class Pathfinder<T> {
//To track walls
Area<T> area;
public Pathfinder(Area<T> a) {
area = a;
}
/**
* Preset offsets representing each of the four directions.
*/
private static final List<Duple<Double>> fourDirectionOffsets = Collections.unmodifiableList(Arrays.asList(
new Duple<Double>(1.0,0.0), new Duple<Double>(-1.0,0.0), new Duple<Double>(0.0,1.0), new Duple<Double>(0.0,-1.0) ));
/**
* Finds a path from aStart to bGoal, taking into consideration walls
*
* @param aStart The start position
* @param bGoal The goal position
* @return A list representing the path from the start to the goal
*/
public List<Duple<Double>> findPathFromAToB(Duple<Double> aStart, Duple<Double> bGoal) {
Deque<Duple<Double>> frontier = new ArrayDeque<>();
Map<Duple<Double>, Duple<Double>> cameFrom = new HashMap<>();
frontier.push(aStart);
while (!frontier.isEmpty()) {
Duple<Double> current = frontier.pop();
if (current.equals(bGoal)) break;
List<Duple<Double>> neighbors = cellsAround(current, fourDirectionOffsets);
neighbors.stream()
.filter(location -> !cameFrom.containsKey(location) && area.cellIsInBounds(location) && area.getCellAt(location) == null)
.forEach( neighbor -> {
frontier.add(neighbor);
cameFrom.put(neighbor, current);
});
}
return reconstructPath(cameFrom, aStart, bGoal);
}
/**
* Transforms a backtracking map into a path
*
* @param cameFrom Backtracking map
* @param start Start position
* @param goal Goal position
* @return A list representing the path from the start to the goal
*/
private static List<Duple<Double>> reconstructPath(Map<Duple<Double>, Duple<Double>> cameFrom, Duple<Double> start, Duple<Double> goal) {
List<Duple<Double>> path = new ArrayList<>();
//path.add(goal);
Duple<Double> current = goal;
do {
if (current != goal) {
path.add(current);
}
current = cameFrom.get(current);
} while (current != null && !current.equals(start));
Collections.reverse(path);
return path;
}
/**
* Calculates the cells surrounding pos as indicated by the given offsets
* @param pos The position to find the surrounding cells of
* @param offsets Positions relative to pos to check
* @return The cells surrounding pos
*/
private static List<Duple<Double>> cellsAround(Duple<Double> pos, List<Duple<Double>> offsets) {
List<Duple<Double>> surroundingCells = new ArrayList<>();
/*offsets.stream()
.map( offset -> pos.map(offset, (x1, x2) -> x1 + x2) )
.forEach(surroundingCells::add);*/
for (Duple<Double> offset : offsets) {
surroundingCells.add( pos.map(offset, (x1, x2) -> x1 + x2) );
}
return surroundingCells;
}
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
System.out.println("Waiting...");
s.nextLine();
List<Long> times = new ArrayList<>();
for (int tests = 0; tests < 900; tests++) {
System.out.println("Starting test " + tests);
long startT = new Date().getTime();
Area<Wall> a = new Area<>(500, 500);
Duple<Double> source = new Duple<>(0.0, 0.0);
Duple<Double> target = new Duple<>(500.0, 500.0);
Pathfinder<Wall> p = new Pathfinder<>(a);
List<Duple<Double>> path = p.findPathFromAToB(source, target);
times.add( (new Date().getTime()) - startT );
}
System.out.println("\n\n");
long sum = 0;
for (long t : times) {
System.out.println(t);
sum += t;
}
System.out.println("Average: " + ((double)sum / times.size() / 1000) + " seconds.");
}
}
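Note: the code above references utils.Duple, as well as Area and Wall, none of which were included in the post. For context, here is a minimal sketch of what Duple might look like, inferred from its usage in Pathfinder: value semantics so it can serve as a HashMap key, plus the pairwise map operation. The field names and exact API are assumptions:
package utils;

import java.util.Objects;
import java.util.function.BiFunction;

public class Duple<T> {
    private final T first;
    private final T second;

    public Duple(T first, T second) {
        this.first = first;
        this.second = second;
    }

    // Pairwise combination, as used in cellsAround: pos.map(offset, (x1, x2) -> x1 + x2)
    public Duple<T> map(Duple<T> other, BiFunction<T, T, T> combiner) {
        return new Duple<>(combiner.apply(first, other.first), combiner.apply(second, other.second));
    }

    // Value semantics are required because Duple is used as a HashMap key
    // and compared with equals() in findPathFromAToB.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Duple)) return false;
        Duple<?> d = (Duple<?>) o;
        return Objects.equals(first, d.first) && Objects.equals(second, d.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }
}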
Waker.cs
class Waker
{
Timer timer;
public Waker()
{
timer = null;
}
public void WakeUpApplicationPool(object obj)
{
string url = "http://www.example.com";
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Program.LogToFile("WakeUpApplicationPool: " + response.StatusDescription);
response.Close();
}
catch (Exception ex)
{
Program.LogToFile("WakeUpApplicationPool_Error: " + ex.ToString());
}
}
public void Start()
{
TimerCallback callback = new TimerCallback(WakeUpApplicationPool);
int DUE_TIME = 0; //The amount of time to delay before the callback parameter invokes its methods.
int PERIOD = int.Parse(ConfigurationManager.AppSettings["WakerIntervalPeriod"]); //The time interval (milliseconds) between invocations of the methods referenced by callback
timer = new Timer(callback, null, DUE_TIME, PERIOD);
}
public void Stop()
{
timer.Dispose();
}
}
Program.cs:
static void Main(string[] args)
{
try
{
Waker waker = new Waker();
waker.Start();
}
catch(Exception ex)
{
LogToFile(ex.ToString());
}
}
Log file:
15 Apr 2015 18:29:39 - WakeUpApplicationPool: OK
15 Apr 2015 18:31:39 - WakeUpApplicationPool: OK
15 Apr 2015 18:33:59 - WakeUpApplicationPool_Error: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 205.144.171.35:80
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.GetResponse()
at ConsoleReporting.Waker.WakeUpApplicationPool(Object obj)
15 Apr 2015 18:35:39 - WakeUpApplicationPool: OK
15 Apr 2015 18:37:39 - WakeUpApplicationPool: OK
15 Apr 2015 18:41:18 - WakeUpApplicationPool_Error: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at ConsoleReporting.Waker.WakeUpApplicationPool(Object obj)
15 Apr 2015 18:43:18 - WakeUpApplicationPool_Error: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at ConsoleReporting.Waker.WakeUpApplicationPool(Object obj)
15 Apr 2015 18:45:18 - WakeUpApplicationPool_Error: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at ConsoleReporting.Waker.WakeUpApplicationPool(Object obj)
15 Apr 2015 18:47:18 - WakeUpApplicationPool_Error: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at ConsoleReporting.Waker.WakeUpApplicationPool(Object obj)
The problem is:
My code stops working after it hits the timed-out error, but after I restart Program.exe it works again, until it hits the timed-out error again after about 10 minutes.
I want to use this Program.exe to wake up my application pool, which is hosted at a hosting provider.
So could anyone tell me the reason and the solution? I referred to this, but it is not working for my code either.
The problem was solved after I set the WakerIntervalPeriod to 10 minutes instead of 5 minutes.
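For reference, that value is the appSettings entry read in Start(). Since the code parses it as milliseconds, a 10-minute interval would be configured like this (a sketch; the key name comes from the code above and the unit from the comment in Start()):
<appSettings>
  <!-- Waker interval in milliseconds: 600000 ms = 10 minutes -->
  <add key="WakerIntervalPeriod" value="600000" />
</appSettings>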
I have a Java UDF that takes tuples and returns a bag of tuples. When I operate on that bag (see code below) I get the error message
2013-12-18 14:32:33,943 [main] ERROR
org.apache.pig.tools.pigstats.PigStats - ERROR: java.lang.Long cannot
be cast to org.apache.pig.data.Tuple
I cannot recreate this error just by reading in data, grouping, and flattening; it only happens with the bag of tuples returned by the UDF, even when the DESCRIBE-ed data looks identical to the result of group/flatten/etc.
UPDATE: Here is the actual code that reproduces the error. (A thousand thanks to anyone who takes the time to read through it.)
REGISTER test.jar;
A = LOAD 'test-input.txt' using PigStorage(',')
AS (id:long, time:long, lat:double, lon:double, alt:double);
A_grouped = GROUP A BY (id);
U_out = FOREACH A_grouped
GENERATE FLATTEN(
test.Test(A)
);
DESCRIBE U_out;
V = FOREACH U_out GENERATE output_tuple.id, output_tuple.time;
DESCRIBE V;
rmf test.out
STORE V INTO 'test.out' using PigStorage(',');
file 'test-input.txt':
0,1000,33,-100,5000
0,1010,33,-101,6000
0,1020,33,-102,7000
0,1030,33,-103,8000
1,1100,34,-100,15000
1,1110,34,-101,16000
1,1120,34,-102,17000
1,1130,34,-103,18000
The output:
$ pig -x local test.pig
2013-12-18 16:47:50,467 [main] INFO org.apache.pig.Main - Logging error messages to: /home/jsnider/pig_1387403270431.log
2013-12-18 16:47:50,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
U_out: {bag_of_tuples::output_tuple: (id: long,time: long,lat: double,lon: double,alt: double)}
V: {id: long,time: long}
2013-12-18 16:47:51,532 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-12-18 16:47:51,532 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2013-12-18 16:47:51,907 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: V: Store(file:///home/jsnider/test.out:PigStorage(',')) - scope-32 Operator Key: scope-32)
2013-12-18 16:47:51,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-12-18 16:47:51,988 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-12-18 16:47:51,988 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-12-18 16:47:51,996 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2013-12-18 16:47:52,139 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-12-18 16:47:52,158 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-12-18 16:47:52,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-12-18 16:47:54,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-12-18 16:47:54,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=164
2013-12-18 16:47:54,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2013-12-18 16:47:54,299 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-18 16:47:54,299 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-12-18 16:47:54,308 [Thread-1] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-12-18 16:47:54,601 [Thread-1] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-18 16:47:54,601 [Thread-1] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-12-18 16:47:54,627 [Thread-1] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is available
2013-12-18 16:47:54,627 [Thread-1] INFO org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library loaded
2013-12-18 16:47:54,633 [Thread-1] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-12-18 16:47:54,801 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-12-18 16:47:54,965 [Thread-1] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-jsnider/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.system.dir; Ignoring.
2013-12-18 16:47:54,966 [Thread-1] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-jsnider/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.trash.interval; Ignoring.
2013-12-18 16:47:54,966 [Thread-1] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-jsnider/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.userlog.retain.hours; Ignoring.
2013-12-18 16:47:54,968 [Thread-1] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-jsnider/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.userlog.limit.kb; Ignoring.
2013-12-18 16:47:54,970 [Thread-1] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-jsnider/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.temp.dir; Ignoring.
2013-12-18 16:47:54,991 [Thread-2] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2013-12-18 16:47:54,994 [pool-1-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local_0001_m_000000_0
2013-12-18 16:47:55,047 [pool-1-thread-1] INFO org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-12-18 16:47:55,053 [pool-1-thread-1] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ffeef1
2013-12-18 16:47:55,058 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 164
Input split[0]:
Length = 164
Locations:
-----------------------
2013-12-18 16:47:55,068 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-12-18 16:47:55,118 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-12-18 16:47:55,118 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-12-18 16:47:55,152 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-12-18 16:47:55,164 [pool-1-thread-1] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2013-12-18 16:47:55,167 [pool-1-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-12-18 16:47:55,170 [pool-1-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner -
2013-12-18 16:47:55,171 [pool-1-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2013-12-18 16:47:55,171 [pool-1-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local_0001_m_000000_0
2013-12-18 16:47:55,172 [Thread-2] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-12-18 16:47:55,192 [Thread-2] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@38650646
2013-12-18 16:47:55,192 [Thread-2] INFO org.apache.hadoop.mapred.LocalJobRunner -
2013-12-18 16:47:55,196 [Thread-2] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2013-12-18 16:47:55,201 [Thread-2] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 418 bytes
2013-12-18 16:47:55,201 [Thread-2] INFO org.apache.hadoop.mapred.LocalJobRunner -
2013-12-18 16:47:55,257 [Thread-2] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.Tuple
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:408)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:360)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:402)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:382)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:392)
2013-12-18 16:47:55,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2013-12-18 16:47:59,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
2013-12-18 16:48:00,008 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-12-18 16:48:00,010 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-12-18 16:48:00,011 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2013-12-18 16:48:00,015 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u6 0.8.1-cdh3u6 jsnider 2013-12-18 16:47:52 2013-12-18 16:48:00 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0001 A,A_grouped,U_out,V GROUP_BY Message: Job failed! Error - NA file:///home/jsnider/test.out,
Input(s):
Failed to read data from "file:///home/jsnider/test-input.txt"
Output(s):
Failed to produce result in "file:///home/jsnider/test.out"
Job DAG:
job_local_0001
2013-12-18 16:48:00,015 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-12-18 16:48:00,040 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/jsnider/pig_1387403270431.log
And the three java files:
Test.java
package test;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import org.apache.pig.Accumulator;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigException;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class Test extends EvalFunc<DataBag> implements Accumulator<DataBag>
{
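// NOTE: this buffer is static, so it is shared by every instance of this UDF
// in the same JVM; exec() resets it between calls via cleanup().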
public static ArrayList<Point> points = null;
public DataBag exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
accumulate(input);
DataBag output = getValue();
cleanup();
return output;
}
public void accumulate(DataBag b) throws IOException {
try {
if (b == null)
return;
Iterator<Tuple> fit = b.iterator();
while (fit.hasNext()) {
Tuple f = fit.next();
storePt(f);
}
} catch (Exception e) {
int errCode = 2106;
String msg = "Error while computing in " + this.getClass().getSimpleName();
throw new ExecException(msg, errCode, PigException.BUG, e);
}
}
public void accumulate(Tuple b) throws IOException {
try {
if (b == null || b.size() == 0)
return;
for (Object f : b.getAll()) {
if (f instanceof Tuple) {
storePt((Tuple)f);
} else if (f instanceof DataBag) {
accumulate((DataBag)f);
} else {
throw new IOException("tuple input is not a tuple or a databag... x__x");
}
}
} catch (Exception e) {
int errCode = 2106;
String msg = "Error while computing in " + this.getClass().getSimpleName();
throw new ExecException(msg, errCode, PigException.BUG, e);
}
}
@Override
public DataBag getValue() {
if (points == null)
points = new ArrayList<Point>();
Collections.sort(points);
DataBag myBag = BagFactory.getInstance().newDefaultBag();
for (Point pt : points) {
Measure sm = new Measure(pt);
myBag.add(sm.asTuple());
}
return myBag;
}
public void cleanup() {
points = null;
}
public Schema outputSchema(Schema input) {
try {
Schema.FieldSchema tupleFs
= new Schema.FieldSchema("output_tuple", Measure.smSchema(), DataType.TUPLE);
Schema bagSchema = new Schema(tupleFs);
Schema.FieldSchema bagFs = new Schema.FieldSchema("bag_of_tuples", bagSchema, DataType.BAG);
return new Schema(bagFs);
} catch (Exception e){
return null;
}
}
public static void storePt(Tuple f) {
Object[] field = f.getAll().toArray();
Point pt = new Point(
field[0] == null ? 0 : (Long)field[0],
field[1] == null ? 0 : (Long)field[1],
field[2] == null ? 0 : (Double)field[2],
field[3] == null ? 0 : (Double)field[3],
field[4] == null ? Double.MIN_VALUE : (Double)field[4]
);
if (points == null)
points = new ArrayList<Point>();
points.add(pt);
}
}
Point.java:
package test;
public class Point implements Comparable<Point> {
long id;
long time;
double lat;
double lon;
double alt;
public Point(Point c) {
this.id = c.id;
this.time = c.time;
this.lat = c.lat;
this.lon = c.lon;
this.alt = c.alt;
}
public Point(long l, long m, double d, double e, double f) {
id = l;
time = m;
lat = d;
lon = e;
alt = f;
}
@Override
public int compareTo(Point other) {
final int BEFORE = -1;
final int EQUAL = 0;
final int AFTER = 1;
if (this == other) return EQUAL;
if (this.id < other.id) return BEFORE;
if (this.id > other.id) return AFTER;
if (this.time < other.time) return BEFORE;
if (this.time > other.time) return AFTER;
if (this.lat > other.lat) return BEFORE;
if (this.lat < other.lat) return AFTER;
if (this.lon > other.lon) return BEFORE;
if (this.lon < other.lon) return AFTER;
if (this.alt > other.alt) return BEFORE;
if (this.alt < other.alt) return AFTER;
return EQUAL;
}
public String toString() {
return id + " " + time;
}
}
Measure.java:
package test;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class Measure {
private long id;
private long time;
private double lat;
private double lon;
private double alt;
public Measure(Point pt) {
id = pt.id;
time = pt.time;
lat = pt.lat;
lon = pt.lon;
alt = pt.alt;
}
public Tuple asTuple() {
Tuple myTuple = TupleFactory.getInstance().newTuple();
myTuple.append(id);
myTuple.append(time);
myTuple.append(lat);
myTuple.append(lon);
myTuple.append(alt);
return myTuple;
}
public static Schema smSchema() {
Schema tupleSchema = new Schema();
tupleSchema.add(new Schema.FieldSchema("id", DataType.LONG));
tupleSchema.add(new Schema.FieldSchema("time", DataType.LONG));
tupleSchema.add(new Schema.FieldSchema("lat", DataType.DOUBLE));
tupleSchema.add(new Schema.FieldSchema("lon", DataType.DOUBLE));
tupleSchema.add(new Schema.FieldSchema("alt", DataType.DOUBLE));
return tupleSchema;
}
}
The solution is to cast the return of the UDF to the appropriate bag:
U_out = FOREACH A_grouped
GENERATE FLATTEN(
(bag{tuple(long,long,double,double,double)})(test.Test(A))
) AS (id:long, time:long, lat:double, lon:double, alt:double);
Even though the schema returned by the UDF is correct, the output still needs to be cast for the downstream projection to work correctly.