Loading data based on conditions in Apache Pig (Hadoop)

Problem statement:
I want to check whether the value of a column in relation XYZ is even. If it is, load the first 10 fields (1-10) of a file ABC; if not, load the next 10 fields (11-20).
Relation XYZ
123
Relation ABC
a b c d e f g h i j k l m n o p q r s t
If 123 is even, then
relation PQR should contain a-j,
otherwise k-t.
Could somebody help?

You should write a custom storage function to do that.
For an example, see the implementation of CSVExcelStorage: http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java
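The selection rule itself is small. As a rough illustration only, here is a plain-Python sketch (not Pig code, and not part of the answer above) of the logic such a storage function would have to apply to each row. The function name select_fields and the width parameter are made up for this example.

```python
def select_fields(key, fields, width=10):
    """Return the first `width` fields if key is even, otherwise the next `width`."""
    start = 0 if key % 2 == 0 else width
    return fields[start:start + width]


# Sample data taken from the question: XYZ holds 123, ABC holds a..t.
row = "a b c d e f g h i j k l m n o p q r s t".split()
pqr = select_fields(123, row)
print(pqr)  # 123 is odd, so this prints the fields k through t
```

With an even key such as 124, the same call would return a through j instead.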

Related

Why does \lstinputlisting show up as raw text in my latex file?

I have the .txt file encoded as UTF-8, I loaded the package with "\usepackage{listings}", and I referenced the file properly, yet it won't show up as code. It only prints out the
"\ l s t i n p u t l i s t i n g { f i g u r e s / C o d e S t a t e m a c h i n e . t x t }" part. What am I doing wrong here?
\begin{lstlisting}[language=Matlab, caption={Code Statemachine}, label={lst:CodeStatemachine}, captionpos=b]
\lstinputlisting{figures/CodeStatemachine.txt}
\end{lstlisting}
I also tried pasting the code itself, but it just cuts off after it fills the A4 page.
The correct syntax is
\begin{lstlisting}[language=Matlab, caption={Code Statemachine}, label={lst:CodeStatemachine}, captionpos=b]
Code goes here
\end{lstlisting}
or
\lstinputlisting[language=Matlab, caption={Code Statemachine}, label={lst:CodeStatemachine}, captionpos=b]{test.txt}
but not both at the same time.

Can I use a GraphQL union for plain strings?

In GraphQL, I can create a union such as the following:
union SearchResult = Book | Movie
Is there a way I can do this for plain strings? Something like this:
union AccountRole = "admin" | "consumer"
I'm afraid you cannot do that, because the specification does not allow it.
According to the union syntax in the specification, each part you want to change must follow the Name syntax, whose first character may only be an uppercase letter, a lowercase letter, or _
(i.e. one of the following characters):
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m
n o p q r s t u v w x y z _
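To make the rule concrete, here is a small Python sketch that checks candidate union members against the Name pattern from the GraphQL specification (a letter or underscore first, then letters, digits, or underscores). The helper name is_valid_name is made up for this example, and the check covers only this lexical rule, not the other requirements on union members.

```python
import re

# Name pattern as given in the GraphQL specification.
NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def is_valid_name(token):
    """Return True if the whole token matches the GraphQL Name pattern."""
    return NAME.fullmatch(token) is not None


for token in ["Book", "Movie", '"admin"', '"consumer"']:
    print(token, is_valid_name(token))
```

The quoted strings fail because they start with a double quote, which is not in the allowed set of first characters.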

Restricted combinations (algorithm)

Consider the following example:
I have a list of 5 items, each with an occurrence of either 1 or 0:
{a, b, c, d, e}
The restricted combinations are as follows:
the occurrences of a, c, and e cannot all be 1 at the same time.
the occurrences of b, d, and e cannot all be 1 at the same time.
Basically, if the database shows that a and c already have an occurrence of 1, then an input of e (which would give e an occurrence of 1) is not allowed (clause 1), and vice versa.
As another example, if d and e each have an occurrence of 1 in the database, a new input of b is not allowed (clause 2).
A more concrete example:
LETTER | COUNT(OCCURRENCE)
------------------------------
a | 1
b | 1
c | 1
d | 0
e | 0
Therefore, a new input of e would be rejected because of the violation of clause 1.
What is the best algorithm or practice for this?
I thought of using many if-else statements, but that doesn't seem efficient enough. What if I had a dynamic list of elements instead? At the least, I would like this part of the program to be more extensible.
As mentioned by BKassem (I think) in the comments (since removed for whatever reason),
the algorithm for this scenario is:
(count(a) * count(c) * count(e)) == 0 //proceed to further actions
Worked flawlessly!
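For a dynamic list of elements, the same product check can be applied to every restricted group in a loop. The following Python sketch is one possible way to do that, assuming the occurrences are held in a dictionary. The names RESTRICTED_SETS and is_allowed are invented for this example.

```python
# Each set lists items whose occurrences may not all be 1 together.
RESTRICTED_SETS = [{"a", "c", "e"}, {"b", "d", "e"}]


def is_allowed(occurrence, new_item, restricted_sets=RESTRICTED_SETS):
    """Return True if setting new_item to 1 violates no restricted set."""
    updated = dict(occurrence)   # copy so the caller's data is untouched
    updated[new_item] = 1
    for group in restricted_sets:
        product = 1
        for item in group:
            product *= updated.get(item, 0)
        if product != 0:         # every item in the group would be 1
            return False
    return True


# The table from the question: a, b, c are 1; d, e are 0.
db = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}
print(is_allowed(db, "e"))  # False: a, c and e would all be 1 (clause 1)
print(is_allowed(db, "d"))  # True: e stays 0, so neither clause is violated
```

Adding a new restriction then means appending one more set to the list rather than writing another if-else branch.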

Algorithm to find a group seating arrangement for an open book test

You are planning the group seating arrangement for an open book test, given a list V of students from different schools. Assume that students who know each other, directly or indirectly, are more likely to cheat than strangers sitting together.
Suppose you are also given a lookup table T where T[u] for u ∈ V is a list of students that u knows. If u knows v, then v knows u. You are required to arrange the seating such that no student at a table knows any other student sitting at the same table, either directly or through some other student sitting at the same table. For example, if x knows y, and y knows z, then x, y, z can sit at the same table. Describe an efficient algorithm that, given V and T, returns the minimum number of tables needed to achieve this requirement. Analyze the running time of your algorithm.
Follow each student's relations out to two edges to get a graph:
a - e - j
\ q
b - d
\ t
r - w - x - y - z
All the students in the same subgraph have to be separated, so the minimum number of tables is one for each student in the largest group. In this example the largest subgraph is r-w-x-y-z, so 5 tables.
Untested Python pseudocode:
# Given a student list
#   a b c d e f j q r t w x y z
# start a chain at a
#   a b c d e f j q r t w x y z
#   .
# visit friends of a
#   a b c d e f j q r t w x y z
#   .       .
# visit friends of a's friends
#   a b c d e f j q r t w x y z
#   .       .   . .
# if e and j are friends, don't double-count
# Get a count of 4 starting at person a
# Repeat for all students
# Report the longest chain.
from pprint import pprint


def countFriendsOf(T, student, friendTracker, moreSteps=2):
    # Mark the student and everyone within moreSteps edges of them.
    friendTracker[student] = True  # quicker to set it regardless,
                                   # than to check if it's set
    if not moreSteps:
        return friendTracker
    for friend in T.get(student, []):  # students missing from T know nobody
        countFriendsOf(T, friend, friendTracker, moreSteps - 1)
    return friendTracker


def reportTables(V, T):
    # V is the student list and T the lookup table from the question.
    friendCounts = {}
    for u in V:
        friends = countFriendsOf(T, u, friendTracker={})
        friendCounts[u] = (len(friends), friends)
    # Largest friend group first.
    results = sorted(friendCounts.items(), key=lambda x: x[1][0], reverse=True)
    (student, (friendCount, friends)) = results[0]
    print("The smallest number of tables is:", friendCount)
    print("Mandated by the friend group of:", student)
    print()
    pprint(friends)
    return friendCount
Analyze the running time of your algorithm.
Analysis: Fine on any computer more powerful than a snowglobe.
Not sure. Best case: students have no friends, so it is linear in the number of students: O(n). Worst case: every student is friends with every other student, so for each student it looks up every friend of every friend: O(n^3). Ew.
It was running more like O(n^2) until I realised that version was definitely wrong.
This version is only not-definitely-wrong; it isn't definitely-right.
I didn't even start it as a recursive solution; it just ended up going that way. The use of friendTracker is a nasty side effect, and the recursive call is not tail-call optimizable. Not that Python does that,
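For comparison, here is a sketch of the same two-step neighbourhood count written as an iterative breadth-first search, which avoids both the recursion and the shared tracker argument. It is an alternative formulation, not the code from the answer. The names within_steps and largest_group are invented, and the sample table contains only the r-w-x-y-z chain from the example graph.

```python
from collections import deque


def within_steps(T, start, max_steps=2):
    """Return the set of students reachable from start in at most max_steps edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        student, dist = queue.popleft()
        if dist == max_steps:
            continue                 # do not expand beyond the step limit
        for friend in T.get(student, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return seen


def largest_group(V, T, max_steps=2):
    """Return the largest neighbourhood found over all students in V."""
    return max((within_steps(T, u, max_steps) for u in V), key=len, default=set())


# Only the r - w - x - y - z chain from the example graph.
T = {"r": ["w"], "w": ["r", "x"], "x": ["w", "y"], "y": ["x", "z"], "z": ["y"]}
V = list(T)
group = largest_group(V, T)
print(len(group), sorted(group))  # 5 ['r', 'w', 'x', 'y', 'z']
```

Each search enqueues a student at most once, so one search costs at most O(n + m) for n students and m acquaintance pairs, and running it for every student costs O(n * (n + m)).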

Can you cube between multiple relations in PIG?

I want to find the combinations of rows given another variable:
Example:
name, group, points
jim, T, 12
steven, T, 10
ting, T, 15
matt, F, 16
aamir, F, 12
I want to get all combinations between members of T and F and multiply their points columns. I first thought of breaking this into two relations, i.e. a T relation and an F relation, and combining them using CUBE, but I don't think you can use CUBE between relations. Any suggestions?
Results:
jim, matt, 12*16
jim, aamir, 12*12
steven, matt, 10*16
...
...
ting, aamir, 15*12
Can you try this?
input.txt
jim,T,12
steven,T,10
ting,T,15
matt,F,16
aamir,F,12
PigScript:
A = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, group:chararray, points:int);
B = FILTER A BY group=='T';
C = FILTER A BY group=='F';
D = CROSS B,C;
E = FOREACH D GENERATE B::name,C::name,B::points*C::points;
DUMP E;
Output:
(jim,matt,192)
(jim,aamir,144)
(steven,matt,160)
(steven,aamir,120)
(ting,matt,240)
(ting,aamir,180)
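To check the expected output outside Pig, the same filter-then-cross logic can be sketched in plain Python with itertools.product. This is only an illustration of what CROSS followed by FOREACH computes here; the variable names are made up for this example.

```python
from itertools import product

# Same rows as input.txt: (name, group, points)
rows = [
    ("jim", "T", 12),
    ("steven", "T", 10),
    ("ting", "T", 15),
    ("matt", "F", 16),
    ("aamir", "F", 12),
]

group_t = [r for r in rows if r[1] == "T"]   # like FILTER A BY group=='T'
group_f = [r for r in rows if r[1] == "F"]   # like FILTER A BY group=='F'

# Every T row paired with every F row, like CROSS B,C, then multiply points.
result = [(t[0], f[0], t[2] * f[2]) for t, f in product(group_t, group_f)]

for row in result:
    print(row)
```

The first printed row is ('jim', 'matt', 192), and the six rows match the Pig output above.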
