Hadoop Pig comparing two values and sort them

I'm currently learning the Hadoop framework and the Pig Latin language, and I've run into a problem.
I've got a data set with the following format:
"long a, long b, char c, char d"
I want to read this data set with Pig. That's no problem with the load statement and the PigStorage function:
bla = load 'data/examples/test' as (a:long, b:long, c:chararray, d:chararray);
My next step is to compare a with b on each line. If a is greater than b, that's fine. If b is greater than a, I want to swap a and b, so that the higher value is always the first value in each row.
Is this possible? In Java I could do this with a simple "compareTo".
sorry for my bad english :-)

blb = FOREACH bla GENERATE ((a < b) ? b : a), ((a < b) ? a : b), c, d;
This operator in Pig is called bincond. The first expression says: if a is less than b, output b, otherwise output a. The second says: if a is less than b, output a, otherwise output b. Together they always put the larger value first.
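For readers who think more easily in a general-purpose language, the per-row logic of the two bincond expressions can be sketched in Python. This is only an illustration of what the Pig statement computes for each row; the function name order_pair and the sample rows are made up for the example.

```python
def order_pair(a, b, c, d):
    """Return the row with the larger of a and b first."""
    first = b if a < b else a    # mirrors Pig: ((a < b) ? b : a)
    second = a if a < b else b   # mirrors Pig: ((a < b) ? a : b)
    return (first, second, c, d)

# Hypothetical sample rows in the (a, b, c, d) layout from the question.
rows = [(3, 7, "x", "y"), (9, 2, "p", "q"), (5, 5, "m", "n")]
ordered = [order_pair(*row) for row in rows]
print(ordered)
```

When a and b are equal, both expressions take their "otherwise" branch, so the row is left unchanged.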

Related

Prolog pythagorean triplet

I'm trying to solve project Euler problem 9 using prolog. I'm a 100% n00b with prolog and I'm just trying to understand the basics right now.
I'm trying to use findall to get all triplets that would add up to 1000, but I can't really figure out the syntax.
What I'm hoping for is something like:
pythag_trip(A, B, C, D) :- D is (A * A) + (B * B) + (C * C).
one_thou_pythag(A, B, C) :- pythag_trip(A, B, C, 1000).
product_trip(A, B, C, D) :- D is A * B * C.
findall([A, B, C], one_thou_pythag(A, B, C) , Bag)).
writeln(Bag).
I know that doesn't work because it's saying Bag is not instantiated. But there are still some basics that I don't understand about the language, too.
1: Can I even do this, with multiple moving pieces at once? Can I find all triplets satisfying a condition? Do I need to go down a completely different route, like using clpfd?
2: What is supposed to be going in that last argument where I put Bag?
3: Is it possible to create data types? I was thinking it might be good to create a triplet set type and an operation to get the pythagorean triplet sum of them if I have to find some way to generate all the possibilities on my own
Basically those questions and then, I could use some pointing in the right direction if anyone has tips

Sorry, but this doesn't answer your questions directly. It seems to me that your approach isn't very Prolog-like.
You should try to solve it logically.
So work through the problem from top to bottom.
We want to have 3 numbers that sum to 1000.
between(1,1000,A), between(A,1000,B), between(B,1000,C), C is 1000-A-B.
In that case, we will have them sorted and we won't take permutations.
So let's go a step further. We want them to be a Pythagorean triplet.
between(1,1000,A), between(A,1000,B), between(B,1000,C), C is 1000-A-B, is_triplet(A,B,C).
But we don't have is_triplet/3 predicate, so let's create it
is_triplet(A,B,C) :- D is C*C - A*A -B*B, D=0.
And that's actually it.
So let's sum it up
is_triplet(A, B, C) :- D is C*C - A*A - B*B, D = 0.
triplet(A,B,C) :- between(1,1000,A), between(A,1000,B), C is 1000-A-B, between(B,1000,C), is_triplet(A,B,C).
When you call triplet(A,B,C) you should get an answer.
Notice one thing: in the final version I've swapped C is 1000-A-B with between(B,1000,C), so C is computed first and between/3 only checks it. This makes the program much faster; try to think why.
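The same search can be sketched in Python to make the effect of the reordering visible. This is a rough translation of the idea, not of the Prolog itself: the function name find_triplets is made up, and the inner loop computes c directly instead of enumerating it, which is what putting C is 1000-A-B first achieves.

```python
def find_triplets(total):
    """Return all (a, b, c) with a <= b <= c, a + b + c == total and a*a + b*b == c*c."""
    found = []
    for a in range(1, total + 1):
        for b in range(a, total + 1):
            c = total - a - b      # computed directly, like "C is 1000-A-B"
            if c < b:              # plays the role of between(B, 1000, C)
                break              # a larger b only makes c smaller
            if a * a + b * b == c * c:
                found.append((a, b, c))
    return found

triplets = find_triplets(1000)
print(triplets)  # [(200, 375, 425)]
```

Only two loops are needed because c is fully determined by a and b. Enumerating c as a third loop and checking the sum afterwards would try up to 1000 candidates for every (a, b) pair.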

get all combinations of elements and elements can be repeat many times in single combination

I need to generate all combinations of a set of elements, where an element can be repeated and reused many times, even within a single combination.
For example, I have a box of 100 cm2 and the following objects:
1) Object A: 20cm2
2) Object B: 50cm2
The expected combinations would be: (A), (A, A), (A, A, A), (A, A, A, A), (A, A, A, A, A), (A, B), (A, B, A), (A, B, A, A) .....
Any combination is allowed, as long as it fits into the box. An object can be repeated many times in a single combination. However, reordered duplicates are not needed, e.g. (A, B) is equal to (B, A).
I'm not sure what keyword to search for, so do let me know if this is a duplicate question.

Seems to me like a recursive algo would do the job: fit the first object then add all combinations of the next objects (including the one you just included) in the box with a reduced size.
Then do the same with the second object, always using combinations with the next objects in line, not the previous ones (can't have an A after a B).
With your example, you would have:
(A)
(A,A)
(A,A,A)
(A,A,A,A)
(A,A,A,A,A)
(A,A,A,A,B) does not work
(A,A,A,B) does not work
(A,A,B)
(A,B)
(A,B,B) does not work
(B)
(B,B)
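The recursion described above can be sketched in Python as follows. The function name fitting_combinations and the dictionary layout are assumptions made for the example, and the sketch assumes every object has a positive size (otherwise the recursion would never stop).

```python
def fitting_combinations(sizes, capacity):
    """List every multiset of objects whose total size fits in capacity.

    sizes maps an object name to its size. Combinations are built in a
    fixed object order, so (A, B) is produced but (B, A) is not.
    """
    names = list(sizes)
    results = []

    def extend(combo, start, remaining):
        for i in range(start, len(names)):
            name = names[i]
            if sizes[name] <= remaining:
                new_combo = combo + (name,)
                results.append(new_combo)
                # Reuse the same object (index i) or a later one, never an earlier one.
                extend(new_combo, i, remaining - sizes[name])

    extend((), 0, capacity)
    return results

combos = fitting_combinations({"A": 20, "B": 50}, 100)
for combo in combos:
    print(combo)
```

For the example box this yields the nine combinations from the list above that are not marked "does not work", in the same order.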

Construction from many sets

I have four sets:
A={a,b,c}, B={d,e}, C={c,d}, D={a,b,c,e}
I want to search the sequence of sets that give me: a b c d
Example: the sequence A A A C can give me a b c d because "a" is an element of A, "b" is an element of A, "c" is an element of A and "d" is an element of C.
The same thing for : D A C B, etc.
I want an algorithm to enumerate all possible sequences, or a mathematical method to find them.

You should really come up with some code of your own and then ask specific questions about problems with it. But it's interesting, so I'll share some thoughts.
You want a b c d.
a can come from A, D
b can come from A, D
c can come from A, C, D
d can come from B, C
So the problem reduces to finding all of the 2*2*3*2=24 ways to combine those options.
One way is recursion with backtracking. Build it from left to right, output when you have a complete set. Like the 8 queens problem, but much simpler since everything is independent.
Another way is to count the integers and map them into a mixed-base system. First digit base 2, then 2, 3, 2. So 0 becomes AAAB, 1 is AAAC, 2 is AACB, etc. 23 is DDDC and 24 needs five digits so you stop there.
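The mixed-base counting idea can be sketched in Python like this. The helper name nth_sequence is made up for the example, and the set names are tried in alphabetical order so that the numbering matches the one described above; itertools.product is used only as a cross-check.

```python
from itertools import product

sets = {"A": {"a", "b", "c"}, "B": {"d", "e"}, "C": {"c", "d"}, "D": {"a", "b", "c", "e"}}
target = "abcd"

# For each target letter, the names of the sets that contain it, in alphabetical order.
options = [[name for name in sorted(sets) if letter in sets[name]] for letter in target]

def nth_sequence(n):
    """Decode integer n as a mixed-base number, one digit per target letter."""
    digits = []
    for choices in reversed(options):      # the last position varies fastest
        n, digit = divmod(n, len(choices))
        digits.append(choices[digit])
    return "".join(reversed(digits))

total = 1
for choices in options:
    total *= len(choices)                  # 2 * 2 * 3 * 2

by_counting = [nth_sequence(n) for n in range(total)]
by_product = ["".join(seq) for seq in product(*options)]
print(total, by_counting[:3], by_counting[-1])
```

Both lists contain the same 24 sequences in the same order, starting with AAAB, AAAC, AACB and ending with DDDC.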

Getting all nodes that have atleast 2 nodes connected to them

Having some issues with a Prolog question:
The following clauses represent a directed graph, where nodes are
atoms, and edges are denoted by the connected predicate. Given that
the following clauses are in the database, answer the two questions
below.
connected(a,b).
connected(b,d).
connected(b,e).
connected(b,f).
connected(a,c).
connected(c,f).
connected(c,g).
connected(h,c).
path(X,Y) :- connected(X,Y).
path(X,Z) :- connected(X,Y), path(Y,Z).
Show the Prolog query that returns all nodes having two or more
different incoming edges (i.e., at least two different nodes are
connected to it). Also, show the result of entering the query (asking
for every solution). Your query may return the same node multiple
times, and may printout the values of other variables in the query.
The variable denoting the node in question should be called DNode.
So far I have:
path(DNode,_) , path(_,DNode).
But that only gives me b and c.
I think the nodes with more than one connection are a, b, c, f.
I tried this to get a, b and c:
path(_,DNode),path(DNode,_) ; path(DNode,_) , path(DNode,_).
But I got a, b, c and h.
I am assuming I'll have to do something like this to get all the letters I want:
path(_,DNode),path(DNode,_) ; path(DNode,_) , path(DNode,_) ; path(_,DNode) , path(_,DNode).
It gives me a, b, c, e, f, g and h though.
Any advice about how to get the 4 letters I want would be appreciated.

If you display your graph, perhaps with Graphviz,
?- writeln('digraph G {'), forall(connected(A,B), writeln(A->B)), writeln('}').
you can see that only c and f have two incoming edges, and that you don't need path/2. A join on the second argument of connected/2 is sufficient, with a test that the first arguments are different (operator (\=)/2).
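To double-check the result outside Prolog, the same join can be sketched in Python: group the edges by their target node and keep the targets that have at least two different sources. The variable names here are made up for the example; the edge list is copied from the connected/2 facts.

```python
from collections import defaultdict

connected = [("a", "b"), ("b", "d"), ("b", "e"), ("b", "f"),
             ("a", "c"), ("c", "f"), ("c", "g"), ("h", "c")]

# Group the source nodes by target node (the join on the second argument).
sources = defaultdict(set)
for src, dst in connected:
    sources[dst].add(src)

# Keep the targets reached from at least two different sources.
dnodes = sorted(node for node, srcs in sources.items() if len(srcs) >= 2)
print(dnodes)  # ['c', 'f']
```

Node c is reached from a and h, and node f from b and c; every other node has at most one incoming edge.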

Algorithm for generating different orders

I am trying to write a simple algorithm that generates different sets
(c b a) (c a b) (b a c) (b c a) (a c b) from (a b c)
by doing two operations:
exchange the first and second elements of the input (a b c), so I get (b a c)
then shift the first element to the end: the input is (b a c), the output is (a c b)
so final output of this procedure is (a c b).
Of course, this method only generates a c b and a b c. I was wondering if using these two operations (perhaps using 2 exchange in a row and then a shift, or any variation) is enough to produce all different orderings?
I would like to come up with a simple algorithm, not using > < or + , just by repeatedly exchanging certain positions (for example always exchanging positions 1 and 2) and always shifting certain positions (for example shift 1st element to last).

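Note that the shift operation (move the first element to the end), combined with the exchange of the first two elements, is equivalent to allowing an exchange (swap) of any adjacent pair: you simply shift until the pair you want to swap is at the front, swap the elements, and then keep shifting until the other elements are back in place.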
So your question is essentially equivalent to the following question: Is it possible to generate every permutation using only adjacent-pair swap. And if it is, is there an algorithm to do that.
The answer is yes (to both questions). One of the algorithms to do that is called "The Johnson–Trotter algorithm" and you can find it on Wikipedia.
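A quick way to convince yourself that the two operations are enough is to explore everything they can reach. The sketch below is not the Johnson–Trotter algorithm, just a brute-force breadth-first search; the function names are made up for the example.

```python
from collections import deque

def swap_first_two(p):
    """Exchange the first and second elements."""
    return (p[1], p[0]) + p[2:]

def shift_first_to_last(p):
    """Move the first element to the end."""
    return p[1:] + p[:1]

def reachable_orders(start):
    """Breadth-first search over everything the two operations can produce."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for op in (swap_first_two, shift_first_to_last):
            nxt = op(current)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

orders = reachable_orders(("a", "b", "c"))
print(sorted(orders))
```

Starting from (a b c), the search reaches all 6 orderings, and starting from four elements it reaches all 24.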
