Join - Pig scripts - hadoop

I am new to Pig scripts and need help joining 'B' and 'E'. Below is my script.
A = LOAD ....
PAPS_1 = FILTER A BY (dataMap#'corr_id_' is NOT null);
B = FOREACH PAPS_1 GENERATE dataMap#'corr_id_' as id, dataMap#'response' as resp, status;
C = LOAD ..
D = FILTER C BY (dataMap#'corr_id_' is NOT null);
E = FOREACH D GENERATE dataMap#'corr_id_' as id, status;
I tried joining like this, but it doesn't work: I am getting null values. Please correct me.
F = JOIN B BY id, E BY id;
Values in B:
23456ac, 200, 0
3453da3, 200, 0
Values in C:
23456ac, 0
3453da3, 0
Values in E:
23456ac, 0
3453da3, 0
My output is:
NULL, 200, 0, NULL, 0
NULL, 200, 0, NULL, 0
Expected is:
23456ac, 200, 0, 23456ac, 0
3453da3, 200, 0, 3453da3, 0
Thanks In Advance

Related

Filtering data in Linq c#

Val  param  Status
1    100    1
2    100    1
3    100    1
4    100    1
5    100    1
3    200    0
5    200    0
I want a LINQ filter in C# that produces this:
Val  param  Status
1    100    1
2    100    1
4    100    1
1) I want to eliminate rows with status zero ('0').
2) I want to eliminate all rows that share a Val value with a row whose status is 0.
Any help will be appreciated.
Thanks in advance.
Try this:
void Main()
{
    var data = new List<Item>()
    {
        new Item(1, 100, 1),
        new Item(2, 100, 1),
        new Item(3, 100, 1),
        new Item(4, 100, 1),
        new Item(5, 100, 1),
        new Item(3, 200, 0),
        new Item(5, 200, 0)
    };
    // Rows whose Status is not 0.
    IEnumerable<Item> dataWithNonZeroStatus = data.Where(d => d.Status != 0);
    // Val values of the rows that were dropped because their Status is 0.
    int[] itemsToSkip = data.Except(dataWithNonZeroStatus).Select(v => v.Val).ToArray();
    // Keep only the rows whose Val never appears with Status 0.
    var result = dataWithNonZeroStatus.Where(item => !itemsToSkip.Contains(item.Val));
    result.Dump(); // Dump() is a LINQPad extension method
}
class Item
{
    public int Val { get; set; }
    public int param { get; set; }
    public int Status { get; set; }
    public Item(int val, int par, int status)
    {
        Val = val;
        param = par;
        Status = status;
    }
}
Result: the rows with Val 1, 2 and 4 (each with param 100 and Status 1).

how to group by columnA and sum columnB, columnC in mixpanel JQL

I have a JQL in mixpanel. I managed to get to a result in the following format:
key, count1, count2, count3
a , 10, 0, 0
a , 0, 3, 0
a , 0, 0, 7
b , 2, 0, 0
b , 0, 3, 0
b , 0, 0, 5
And I'd like to get the results like:
key, count1, count2, count3
a , 10, 3, 7
b , 2, 3, 5
In other words: groupBy(['key'], WHAT_REDUCER_DO_I_NEED???)
You can use multiple reducers in your groupBy statement, like this:
.groupBy(['key'], [
    mixpanel.reducer.sum('count1'),
    mixpanel.reducer.sum('count2'),
    mixpanel.reducer.sum('count3')
])
Alternatively, you can write a custom reducer function:
.groupBy(["key"], function(accumulators, events) {
    var sum = {"count1": 0, "count2": 0, "count3": 0};
    // Fold in the partial sums that were already computed.
    for (var i = 0; i < accumulators.length; ++i) {
        sum["count1"] += accumulators[i]["count1"];
        sum["count2"] += accumulators[i]["count2"];
        sum["count3"] += accumulators[i]["count3"];
    }
    // Add the rows that have not been reduced yet.
    for (i = 0; i < events.length; ++i) {
        var event = events[i];
        sum["count1"] += event["count1"];
        sum["count2"] += event["count2"];
        sum["count3"] += event["count3"];
    }
    return sum;
})

HBase FuzzyRowFilter doesn't work

I have a row key composed of twenty characters, like this:
XXAAAAXXXXXXXXXXXXXX
I want to scan using a FuzzyRowFilter on the AAAA value (the four characters after the first two), but AAAA is not a fixed value.
If AAAA is not fixed, you can do it like this:
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
String[] matches = new String[]{"AAAA", "BBBB"};
for (String match : matches) {
    byte[] rk = Bytes.toBytesBinary("??" + match + "??????????????");
    byte[] fuzzyVal = new byte[]{1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    List<Pair<byte[], byte[]>> pairs = new ArrayList<>();
    pairs.add(new Pair<>(rk, fuzzyVal));
    filterList.addFilter(new FuzzyRowFilter(pairs));
}
Scan scan = new Scan();
scan.setFilter(filterList);
This applies every fuzzy filter in the list, and a row is returned if it matches at least one of them (FilterList.Operator.MUST_PASS_ONE).
Modify it as needed to fit your requirements.
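To make the mask semantics concrete, here is a minimal pure-Python sketch of how a fuzzy pattern is compared with a row key. It does not call HBase; the names fuzzy_match and matches_any are hypothetical helpers introduced for illustration only. It assumes the usual FuzzyRowFilter convention that a mask byte of 0 marks a fixed position and 1 marks a position that may hold any byte.

```python
from typing import Iterable, Sequence, Tuple


def fuzzy_match(row_key: bytes, pattern: bytes, mask: Sequence[int]) -> bool:
    """Return True if row_key equals pattern at every fixed position.

    mask[i] == 0: byte i must equal pattern[i].
    mask[i] == 1: byte i may be anything.
    """
    if len(row_key) < len(mask):
        return False
    return all(m == 1 or row_key[i] == pattern[i] for i, m in enumerate(mask))


def matches_any(row_key: bytes, pairs: Iterable[Tuple[bytes, Sequence[int]]]) -> bool:
    """Analogue of MUST_PASS_ONE: one matching (pattern, mask) pair is enough."""
    return any(fuzzy_match(row_key, pattern, mask) for pattern, mask in pairs)


# Two wildcard bytes, four fixed bytes, fourteen wildcard bytes (20 in total).
MASK = [1, 1, 0, 0, 0, 0] + [1] * 14
PAIRS = [(b"??" + part + b"?" * 14, MASK) for part in (b"1609", b"1610")]

print(matches_any(b"02160901222720647002", PAIRS))  # fixed part is "1609"
print(matches_any(b"02161101222720647002", PAIRS))  # fixed part is "1611"
```

With an "all filters must pass" combination instead, a key would have to match both patterns at once, which no key can do when the fixed parts differ.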
Sorry, but using
FilterList filterList = new FilterList(/*FilterList.Operator.MUST_PASS_ONE*/);
String[] matches = new String[]{"1609", "1610"};
for (String match : matches) {
    byte[] rk = Bytes.toBytesBinary("??" + match + "??????????????");
    byte[] fuzzyVal = new byte[]{1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    List<Pair<byte[], byte[]>> pairs = new ArrayList<>();
    pairs.add(new Pair<>(rk, fuzzyVal));
    filterList.addFilter(new FuzzyRowFilter(pairs));
}
Scan scan_fuzzy = new Scan();
scan_fuzzy.setFilter(filterList);
ResultScanner rs_fuzzy = table.getScanner(scan);
for (Result result : rs_fuzzy) {
    System.out.println(result);
    System.out.println("Value: " + Bytes.toString(result.getValue(FAMILY, COLUMN)));
}
my result contains not just the row for the row key 02160901222720647002 but every row (and row key) in my table.

Algorithms: How to highlight image difference using rectangles?

I need to compare two images and create rectangles of the difference. I can build a difference matrix like this:
0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1
0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 1 1 1 0
0 0 0 0 1 1 0 0
where 1 is diff pixel. I need to find the way to create rectangles for areas of image diff. In my example, I have three areas to highlight.
# # #
# 1 #
# # #
# # # #
# 1 1 1
# # # #
# # # # #
# 1 0 0 #
# 0 1 0 #
# 1 1 1 #
# 1 1 0 #
So I'm looking for an algorithm to do that in a convenient way.
I'm assuming that the problem is as follows. Given a 0/1-matrix, cover the regions containing 1s with disjoint rectangles (i.e. the rectangles must not intersect). In particular, non-rectangular shapes (for example, an L-shaped domino) are disallowed.
Here's an idea for an algorithm:
1) Start at the origin, at index (0,0), and expand out until the expanded region contains a single rectangle of 1s that you cannot enlarge by moving to adjacent cells in either direction.
2) Add that rectangle to the collection, and remove the processed region.
3) Recurse on the remaining cells.
The running time should be linear in the number of cells; however, depending on whether there are additional specifications on the type of the output, you may need to change the first step.
I like the problem a lot. Notice that many different solutions may exist for a problem instance. A natural variation would be to require a cover comprised of as few rectangles as possible (i.e. a minimal cover); in this case, too, there may exist many different solutions. (The counting version of the problem looks interesting from the complexity-theoretic viewpoint.)
You could do a two-step algorithm: first, you find the 8-connected components in your image; then you calculate the bounding box of each component.
This approach may lead to overlapping rectangles (imagine two nearby "L"-shapes), which you could solve by either merging the overlapping rectangles or by zeroing out non-connected components from each rectangle (so that you can sum all the rectangles and reconstruct the difference image appropriately).
If you go with the second choice, you can get the rectangles in Matlab as follows:
%# each element of the struct array rectangles contains a field
%# .Image, which is the rectangle, and
%# .BoundingBox, which is the coordinates of the rectangle.
rectangles = regionprops(differenceImage,'Image','BoundingBox');
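For readers without Matlab, the two-step idea (8-connected components, then one bounding box per component) can be sketched in plain Python. This is an illustrative sketch, not the regionprops implementation: bounding_boxes and DIFF are names made up here, the search is a simple breadth-first traversal, and boxes are reported as (x0, y0, x1, y1) with inclusive cell coordinates.

```python
from collections import deque

# Difference matrix from the question (1 marks a differing pixel).
DIFF = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]


def bounding_boxes(matrix):
    """Return one (x0, y0, x1, y1) box per 8-connected component of 1s."""
    height = len(matrix)
    width = len(matrix[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not matrix[y][x] or seen[y][x]:
                continue
            # New component: breadth-first search from this cell.
            x0 = x1 = x
            y0 = y1 = y
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                # Visit all 8 neighbours (the cell itself is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and matrix[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


for box in bounding_boxes(DIFF):
    print(box)
```

Using an explicit queue rather than recursion avoids hitting the recursion limit on large difference regions. The boxes are the tight (inner) bounds; subtract 1 from x0 and y0 and add 1 to x1 and y1 if you want the surrounding frame.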
Below is some JS demonstration code to find the rectangles, starting each time at the next remaining non-empty cell and exploring all paths in a recursive way. Cells are cleared as they are visited.
This is pretty close to what the 'fill' tool does in MS Paint and the like. More precisely, this is a flood fill algorithm, as mentioned by j-random-hacker in the comments.
This code will find the inner bounds of the rectangles. It'd need to be slightly updated if you want the outer bounds instead.
var W = 8, H = 8;
var matrix = [
  //  0  1  2  3  4  5  6  7
    [ 0, 0, 0, 0, 0, 0, 0, 0 ], // 0
    [ 0, 0, 0, 0, 0, 1, 1, 1 ], // 1
    [ 0, 0, 0, 1, 0, 0, 0, 0 ], // 2
    [ 0, 0, 0, 0, 0, 0, 0, 0 ], // 3
    [ 0, 0, 0, 0, 1, 0, 0, 0 ], // 4
    [ 0, 0, 0, 0, 0, 1, 0, 0 ], // 5
    [ 0, 0, 0, 0, 1, 1, 1, 0 ], // 6
    [ 0, 0, 0, 0, 1, 1, 0, 0 ]  // 7
];
// Offsets of the 8 neighbours of a cell.
var dir = [
    [ -1, -1 ], [ 0, -1 ], [ 1, -1 ], [ 1, 0 ], [ 1, 1 ], [ 0, 1 ], [ -1, 1 ], [ -1, 0 ]
];
var x, y, rect;
for(y = 0; y < H; y++) {
    for(x = 0; x < W; x++) {
        if(diffAt(x, y)) {
            rect = { x0:x, y0:y, x1:x, y1:y };
            recurse(x, y, rect);
            console.log(
                'Found rectangle: ' +
                '(' + rect.x0 + ',' + rect.y0 + ') -> ' +
                '(' + rect.x1 + ',' + rect.y1 + ')'
            );
        }
    }
}
// Grow rect to include (x, y), clear the cell, then visit its neighbours.
function recurse(x, y, rect) {
    rect.x0 = Math.min(rect.x0, x);
    rect.y0 = Math.min(rect.y0, y);
    rect.x1 = Math.max(rect.x1, x);
    rect.y1 = Math.max(rect.y1, y);
    matrix[y][x] = 0;
    for(var d = 0; d < 8; d++) {
        if(diffAt(x + dir[d][0], y + dir[d][1])) {
            recurse(x + dir[d][0], y + dir[d][1], rect);
        }
    }
}
// Bounds-checked lookup: cells outside the matrix count as 0.
function diffAt(x, y) {
    return x < 0 || x >= W || y < 0 || y >= H ? 0 : matrix[y][x];
}

SSRS Divide by zero error

I am getting NaN in 3 places in my SSRS report. I believe it is because I am dividing by 0. I am trying to find the average days for prescriptions that were on time, late, and not filled. The 3 expressions that I was given are below. What IIF statement would I need to add, and where, to address the divide-by-zero issue? I am new to this.
=sum(iif(Fields!DaysDifference.Value >= -1 and Fields!DaysDifference.Value <= 1 and Fields!ActualNextFillDateKey.Value <> 0, Fields!DaysDifference.Value,0))/
sum(iif(Fields!DaysDifference.Value >= -1 and Fields!DaysDifference.Value <= 1 and Fields!ActualNextFillDateKey.Value <> 0, 1,0))
=sum(iif(Fields!DaysDifference.Value > 1 and Fields!ActualNextFillDateKey.Value <> 0, Fields!DaysDifference.Value,0))/
sum(iif(Fields!DaysDifference.Value > 1 and Fields!ActualNextFillDateKey.Value <> 0, 1,0))
=sum(iif(Fields!ActualNextFillDateKey.Value = 0, Fields!DaysDifference.Value, 0))/
sum(iif(Fields!ActualNextFillDateKey.Value = 0, 1, 0))
Instead of using 0, you should be dividing by your field:
=SUM(IIF(Fields!ActualNextFillDateKey.Value = 0, Fields!DaysDifference.Value, 0))/
SUM(IIF(Fields!ActualNextFillDateKey.Value = 0, 1, Fields!ActualNextFillDateKey.Value))
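A common alternative is to leave the count as it is and guard the division itself: compute the average only when at least one row matched, and fall back to a default otherwise. The Python sketch below illustrates that logic only; safe_average and the sample data are hypothetical and not part of the report. In an SSRS expression the same guard is usually written with IIF, and because IIF evaluates both of its branches, the denominator inside the division generally needs its own guard as well.

```python
def safe_average(values, predicate, default=0.0):
    """Average the values that satisfy predicate; return default if none do."""
    selected = [v for v in values if predicate(v)]
    if not selected:
        # No matching rows: the plain sum/count would be 0/0.
        return default
    return sum(selected) / len(selected)


# Hypothetical DaysDifference values for filled prescriptions.
days_difference = [-1, 0, 1, 3, 5]

on_time = safe_average(days_difference, lambda d: -1 <= d <= 1)
late = safe_average(days_difference, lambda d: d > 1)
very_late = safe_average(days_difference, lambda d: d > 30)  # no rows match

print(on_time, late, very_late)
```

Returning a default such as 0 is a presentation choice; a report might prefer a blank or "n/a" when no rows match.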
