Need help to solve this mapreduce code - hadoop

Below is the data
c1 p1 q1 d1
c2 p2 q2 d2
I need output like this: if the customer has purchased p1, the flag should be 1; otherwise the flag should be 0. There are millions of customers and millions of products. The required output is below. Any help on this is highly appreciated.
c1 p1 q1 d1 1
c1 p2 q1 d1 0
c2 p2 q2 d2 1
c2 p1 q2 d2 0

You can achieve it with just map-side logic. Sample code for your reference:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    String[] tokens = line.split("<your delim>");
    // Compare string contents with equals(); == only compares references
    if ("p1".equals(tokens[1])) {
        line = line + "<your delim>" + "1";
    } else if ("p2".equals(tokens[1])) {
        line = line + "<your delim>" + "0";
    }
    // Emit the flagged line as the key; the value carries no data
    context.write(new Text(line), NullWritable.get());
}

Optimised EmEditor Macro to transpose and transform tab delimited data

I currently have a large tab-delimited data set that looks like this (n is simply a number; there are more than 4 headers): an initial header column that repeats, followed by at most 2 columns of data (sometimes only one):
Input file:
Hdr1 A1 B1
Hdr2 A2 B2
Hdr3 A3 B3
Hdrn An Bn
Hdr1 C1
Hdr2 C2
Hdr3 C3
Hdrn Cn
Hdr1 D1 E1
Hdr2 D2 E2
Hdr3 D3 E3
Hdrn Dn En
I need to transpose and transform the data so output looks similar to this (so the repeating headers are removed and the data remains):
Hdr1 Hdr2 Hdr3 Hdrn
A1 A2 A3 An
B1 B2 B3 Bn
C1 C2 C3 Cn
D1 D2 D3 Dn
E1 E2 E3 En
Any ideas for how to do this with an Optimised EmEditor javascript Macro would be much appreciated.
Here is a JavaScript macro for you:
sSeparator = "Hdr1";
sDelimiter = "\t";
sNL = "\r\n";
Redraw = false;
document.CellMode = true; // Must be cell selection mode
editor.ExecuteCommandByID(4323); // Clear All Bookmarks in This Document
document.Filter("^" + sSeparator,0,eeFindReplaceCase | eeFindReplaceRegExp,0,0,0);
editor.ExecuteCommandByID(3927); // Bookmark All
document.Filter("",0,0,0,0,0); // Reset Filter
if( document.BookmarkCount == 0 ) {
    alert( "Cannot find any lines that begin with \"" + sSeparator + "\"" );
    Quit();
}
document.selection.StartOfDocument(false);
x1 = 1;
y1 = 1;
nLines = 0;
nMaxCol = document.GetColumns();
str = "";
for( ;; ) {
    bStop = false;
    if( document.selection.NextBookmark() ) { // if next bookmark found
        y2 = document.selection.GetActivePointY(eePosCellLogical);
    }
    else {
        y2 = document.GetLines(); // if bookmark does NOT exist at end of document
        bStop = true;
    }
    if( nLines == 0 ) {
        nLines = y2 - y1;
    }
    else {
        if( nLines != y2 - y1 ) {
            alert( "Number of lines between " + sSeparator + " is not same. Check the format of the input file." );
            Quit();
        }
    }
    for( iCol = x1; iCol <= nMaxCol; ++iCol ) {
        s = document.GetCell( y1, iCol, eeCellIncludeQuotes );
        if( s.length == 0 ) {
            break;
        }
        str += s;
        str += sDelimiter;
        str += document.GetColumn( iCol, sDelimiter, eeCellIncludeQuotes, y1 + 1, y2 - y1 - 1 );
        str += sNL;
    }
    y1 = y2;
    if( bStop ) {
        break;
    }
    x1 = 2; // don't need the first column except the first time
}
editor.NewFile();
document.selection.Text = str; // insert copied text
editor.ExecuteCommandByID(22529); // set TSV mode
To run this, save this code as, for instance, Transpose.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Transpose.jsee in the Macros menu.

All possible combinations of N objects in K buckets

Suppose I have 3 boxes labeled A, B, C and I have 2 balls, B1 and B2. I want to get all possible combinations of these balls in the boxes. Please note, it is important to know which ball is in each box, meaning B1 and B2 are not the same.
A        B        C
B1, B2   -        -
B1       B2       -
B1       -        B2
B2       B1       -
B2       -        B1
-        B1, B2   -
-        B1       B2
-        B2       B1
-        -        B1, B2
Edit
If there is a known algorithm for this problem, please tell me its name.
Let N be the number of buckets (3 in the example) and M the number of balls (2). Now look at the numbers in the range [0..N**M), which is [0..9) in the example, written in radix N. For the example in the question these are ternary numbers.
We can easily interpret these numbers: the lowest (rightmost) digit shows the first ball's bucket, the next digit shows the second ball's bucket.
    |--- Second ball position [0..2]
    ||-- First ball position [0..2]
    ||
0 = 00 - both balls are in the bucket #0 (`A`)
1 = 01 - first ball is in the bucket #1 (`B`), second is in the bucket #0 (`A`)
2 = 02 - first ball is in the bucket #2 (`C`), second is in the bucket #0 (`A`)
3 = 10 - first ball is in the bucket #0 (`A`), second is in the bucket #1 (`B`)
4 = 11 - both balls are in the bucket #1 (`B`)
5 = 12 ...
6 = 20
7 = 21 ...
8 = 22 - both balls are in the bucket #2 (`C`)
The general algorithm is:
For each number in the range [0 .. N**M):
the i-th ball (i = 0..M-1) goes in bucket # (number / N**i) % N (here / stands for integer division and % for remainder)
If you just want the total count, the answer is simply N ** M; in the example above, 3 ** 2 == 9.
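As a quick illustration of the digit rule above, here is a minimal Python sketch. The name `ball_locations` and the tuple layout are my own choices for the example, not part of the answer; buckets are numbered from 0 as in the table.

```python
def ball_locations(n_buckets, m_balls):
    """Yield one tuple per arrangement; entry i is the bucket index of ball i."""
    for number in range(n_buckets ** m_balls):
        # Digit i of `number` in base n_buckets is the bucket of ball i.
        yield tuple((number // n_buckets ** i) % n_buckets
                    for i in range(m_balls))

arrangements = list(ball_locations(3, 2))
print(len(arrangements))   # 9
print(arrangements[:4])    # [(0, 0), (1, 0), (2, 0), (0, 1)]
```

Each tuple lists the first ball's bucket first, so `(1, 0)` corresponds to the row `1 = 01` in the table.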
C# code. The algorithm itself is easy to implement:
// requires: using System.Collections.Generic; using System.Numerics;
static IEnumerable<int[]> BallsLocations(int boxCount, int ballCount) {
  BigInteger count = BigInteger.Pow(boxCount, ballCount);
  for (BigInteger i = 0; i < count; ++i) {
    // balls[k] is the zero-based box index of ball k
    int[] balls = new int[ballCount];
    int index = 0;
    // peel off the base-boxCount digits of i, lowest digit first
    for (BigInteger value = i; value > 0; value /= boxCount)
      balls[index++] = (int)(value % boxCount);
    yield return balls;
  }
}
It is the representation of the answer that can get tangled:
// requires: using System; using System.Linq;
static IEnumerable<string> BallsSolutions(int boxCount, int ballCount) {
  foreach (int[] balls in BallsLocations(boxCount, ballCount)) {
    // one (initially empty) list of ball numbers per box
    List<int>[] boxes = Enumerable
      .Range(0, boxCount)
      .Select(_ => new List<int>())
      .ToArray();
    for (int j = 0; j < balls.Length; ++j)
      boxes[balls[j]].Add(j + 1);
    yield return string.Join(Environment.NewLine, boxes
      .Select((item, index) => $"Box {index + 1} : {string.Join(", ", item.Select(b => $"B{b}"))}"));
  }
}
Demo:
int balls = 3;
int boxes = 2;
string report = string.Join(
Environment.NewLine + "------------------" + Environment.NewLine,
BallsSolutions(boxes, balls));
Console.Write(report);
Outcome:
Box 1 : B1, B2, B3
Box 2 :
------------------
Box 1 : B2, B3
Box 2 : B1
------------------
Box 1 : B1, B3
Box 2 : B2
------------------
Box 1 : B3
Box 2 : B1, B2
------------------
Box 1 : B1, B2
Box 2 : B3
------------------
Box 1 : B2
Box 2 : B1, B3
------------------
Box 1 : B1
Box 2 : B2, B3
------------------
Box 1 :
Box 2 : B1, B2, B3
There's a very simple recursive implementation that at each level adds the current ball to each box. The recursion ends when all balls have been processed.
Here's some Java code to illustrate. We use a Stack to represent each box so we can simply pop the last-added ball after each level of recursion.
void boxBalls(List<Stack<String>> boxes, String[] balls, int i)
{
    if(i == balls.length)
    {
        System.out.println(boxes);
        return;
    }
    for(Stack<String> box : boxes)
    {
        box.push(balls[i]);
        boxBalls(boxes, balls, i+1);
        box.pop();
    }
}
Test:
String[] balls = {"B1", "B2"};
List<Stack<String>> boxes = new ArrayList<>();
for(int i=0; i<3; i++) boxes.add(new Stack<>());
boxBalls(boxes, balls, 0);
Output:
[[B1, B2], [], []]
[[B1], [B2], []]
[[B1], [], [B2]]
[[B2], [B1], []]
[[], [B1, B2], []]
[[], [B1], [B2]]
[[B2], [], [B1]]
[[], [B2], [B1]]
[[], [], [B1, B2]]

Evaluate a BitVec in Z3Py

I am learning Z3 and perhaps my question does not apply, so please be patient.
Suppose I have the following:
c1, c2 = z3.BitVec('c1', 32), z3.BitVec('c2', 32)
c1 = c1 + c1
c2 = c2 + c2
c2 = c2 + c1
c1 = c1 + c2
e1 = z3.simplify(c1)
e2 = z3.simplify(c2)
When I print their sexpr():
print("e1=", e1.sexpr())
print("e2=", e2.sexpr())
Output:
e1= (bvadd (bvmul #x00000004 c1) (bvmul #x00000002 c2))
e2= (bvadd (bvmul #x00000002 c2) (bvmul #x00000002 c1))
My question is, how can I evaluate the numerical value of 'e1' and 'e2' for user supplied values of c1 and c2?
For example, e1(c1=1, c2=1) == 6, e2(c1=1, c2=1) == 4
Thanks!
I figured it out. I had to introduce two separate variables that will hold the expressions. Then I had to introduce two result variables for which I can query the model for their value:
import z3

# c1, c2 are the inputs; r1, r2 receive the results
c1, c2 = z3.BitVec('c1', 32), z3.BitVec('c2', 32)
r1, r2 = z3.BitVec('r1', 32), z3.BitVec('r2', 32)
# e1, e2 hold the expressions, so c1 and c2 stay plain symbols
e1 = c1
e2 = c2
e1 = e1 + e1
e2 = e2 + e2
e2 = e2 + e1
e1 = e1 + e2
e1 = z3.simplify(e1)
e2 = z3.simplify(e2)
print("e1=", e1)
print("e2=", e2)
s = z3.Solver()
# Pin the inputs and tie each expression to its result variable
s.add(c1 == 5, c2 == 1, e1 == r1, e2 == r2)
if s.check() == z3.sat:
    m = s.model()
    print('r1=', m[r1].as_long())
    print('r2=', m[r2].as_long())
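As a sanity check that does not depend on Z3 at all, the same sequence of additions can be replayed on plain Python integers, wrapping at 32 bits the way a 32-bit bit-vector does. This is only a sketch for cross-checking the numbers; `evaluate` and `MASK` are names I made up for the example.

```python
MASK = 0xFFFFFFFF  # 32-bit bit-vectors wrap around modulo 2**32

def evaluate(c1, c2):
    """Replay the question's additions on plain integers, wrapping at 32 bits."""
    c1 = (c1 + c1) & MASK
    c2 = (c2 + c2) & MASK
    c2 = (c2 + c1) & MASK
    c1 = (c1 + c2) & MASK
    return c1, c2  # (e1, e2)

print(evaluate(1, 1))  # (6, 4)
print(evaluate(5, 1))  # (22, 12)
```

The first call reproduces the values asked for in the question (6 and 4); the second uses the inputs c1 == 5, c2 == 1 from the solver constraints.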

Using java 8 streams api for nested lookup

How can we use the Java 8 Streams API to get the expected output here?
A1 has B1, B2
A2 has B1, B2, B3
B1 and B2 belong to C1
B3 belongs to C2
So the count for C1 should be 4, as B1 and B2 appear 4 times in total.
Likewise, the count for C2 will be 1, as B3 appears once.
List<String> A= new ArrayList<>();
A.add("A1");
A.add("A2");
Map<String, List<String>> AMap = new HashMap<>();
AMap.put("A1", Arrays.asList("B1", "B2"));
AMap.put("A2", Arrays.asList("B1", "B2", "B3"));
Map<String, String> CMap = new HashMap<>();
CMap.put("B1", "C1");
CMap.put("B2", "C1");
CMap.put("B3", "C2");
Expected output
C1 : 4 , C2 : 1
For each key in the A list, I would fetch each B key which would fetch each C value from the CMap. Then flatmap the stream, group by identity and count the values.
import static java.util.Collections.emptyList;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
...
Map<String, Long> res = A.stream()
    .flatMap(a -> AMap.getOrDefault(a, emptyList()).stream().map(CMap::get))
    .collect(groupingBy(identity(), counting()));
In two steps...
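// requires: java.util.Collection, java.util.Map.Entry, java.util.stream.Collectors
// Step 1: flatten every B key that appears under any A
List<String> all = AMap.values()
    .stream()
    .flatMap(Collection::stream)
    .collect(Collectors.toList());
// Step 2: group the B -> C entries by C and sum how often each B occurs
Map<String, Long> result = CMap.entrySet().stream()
    .collect(Collectors.groupingBy(
        Entry::getValue,
        Collectors.summingLong(
            x -> all.stream().filter(y -> y.equals(x.getKey())).count())));
System.out.println(result); // {C1=4, C2=1}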

Using stack to find the greatest common divisor

I am trying to implement an algorithm to find the greatest common divisor using a stack, but I am unable to get the correct answer with my logic below. Please help. Here is my code:
def d8(a,b)
  if (a==b)
    return a
  end
  s = Stack.new
  s.push(b)
  s.push(a)
  c1 = s.pop
  c2 = s.pop
  while c1!=c2
    if s.count>0
      c1 = s.pop
      c2 = s.pop
    end
    if c1== c2
      return c1
    elsif c1>c2
      c1 = c1-c2
      s.push(c2)
      s.push(c1)
    else
      c2 = c2 -c1
      s.push(c2)
      s.push(c1)
    end
  end
  return nil
end
GCD cannot be nil. Two integers always have a GCD. So the logic in the function is already incorrect just because under some condition it has a return nil.
Looking at this return nil condition, it is happening when c1 == c2 (it will exit the while loop). At the same time, inside the while loop, you return a value if c1 == c2. These two cases are in logical contradiction. In other words, you are exiting the while loop on the c1 == c2 condition and treating that condition as invalid before your if c1 == c2 condition can trigger and treat the condition as valid and return the correct answer.
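To see the contradiction concretely, here is a rough Python port of the question's method. A plain list stands in for the `Stack` class, which is an assumption about how its `push`, `pop` and `count` behave.

```python
def d8(a, b):
    """Line-by-line port of the question's Ruby method; a list plays the stack."""
    if a == b:
        return a
    s = [b, a]
    c1 = s.pop()
    c2 = s.pop()
    while c1 != c2:
        if len(s) > 0:
            c1 = s.pop()
            c2 = s.pop()
        if c1 == c2:          # never true here: the loop condition just ruled it out
            return c1
        elif c1 > c2:
            c1 = c1 - c2
        else:
            c2 = c2 - c1
        s.append(c2)
        s.append(c1)
    return None  # reached as soon as a subtraction makes c1 equal c2

print(d8(4, 2))  # None
print(d8(9, 6))  # None
```

For positive inputs that differ, the loop always exits through the `while` condition, so the inner `return c1` is never reached.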
Simplifying the logic a little, you get:
def d8(a,b)
  return a if a == b  # Just a simpler way of doing a small if statement
  s = Stack.new       # "Stack" must be a gem, not std Ruby; "Array" will work here
  s.push(b)
  s.push(a)
  # c1 and c2 must be assigned before the `while` condition reads them,
  # so initialize them directly instead of popping here
  c1 = a
  c2 = b
  while c1 != c2
    if s.count > 0
      c1 = s.pop
      c2 = s.pop
    end
    # if c1 == c2 isn't needed because the `while` condition takes care of it
    if c1 > c2
      c1 = c1 - c2
    else
      c2 = c2 - c1
    end
    # These pushes are the same at the end of both if conditions, so they
    # can be pulled out
    s.push(c2)
    s.push(c1)
  end
  return c1  # This return occurs when c1 == c2
end
This will work, but it becomes more obvious that the use of a stack is superfluous and serves no purpose at all in the algorithm. s.count > 0 will always be true, and you are popping variables off right after you push them (basically a no-op). So this is equivalent to:
def d8(a,b)
  return a if a == b
  c1 = a
  c2 = b
  while c1 != c2
    if c1 > c2
      c1 = c1 - c2
    else
      c2 = c2 - c1
    end
  end
  return c1
end
Java code for it would be
public static int gcd (int p, int q) {
    StackGeneric<Integer> stack = new StackGeneric<Integer>();
    int temp;
    stack.push(p);
    stack.push(q);
    while (true) {
        q = stack.pop();
        p = stack.pop();
        if (q == 0) {
            break;
        }
        temp = q;
        q = p % q;
        p = temp;
        stack.push(p);
        stack.push(q);
    }
    return p;
}
It replaces the function call in the recursive solution below with a while loop that iterates until the second argument becomes 0, just as the recursive function does:
public static int gcd (int p, int q) {
    if (q == 0) {
        return p;
    }
    return gcd(q, p % q);
}
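The same recursion-to-loop idea can be sketched in Python, with a list used as the stack and each pushed pair playing the role of one call frame. The function name `gcd_with_stack` is just for this example, and it assumes non-negative integers.

```python
def gcd_with_stack(p, q):
    """Euclid's algorithm with the recursive call replaced by an explicit stack."""
    stack = [(p, q)]              # each entry stands for one call frame
    while True:
        p, q = stack.pop()
        if q == 0:                # base case of the recursion
            return p
        stack.append((q, p % q))  # the "recursive call" gcd(q, p % q)

print(gcd_with_stack(48, 18))  # 6
```

Because the recursion is a tail call, the stack never holds more than one frame, which is another way of seeing that the stack is not really needed.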
