Recently, I have been reading Hadoop: The Definitive Guide.
I have two questions:
1. I saw this code for a custom Partitioner:
public class KeyPartitioner extends Partitioner<TextPair, Text> {
    @Override
    public int getPartition(TextPair key, Text value, int numPartitions) {
        return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
What does & Integer.MAX_VALUE mean here? Why use the & operator?
2. I also want to write a custom Partitioner for IntWritable keys. Is it OK, and is it the best approach, to use key.value % numPartitions directly?
As I already wrote in the comments, it is used to keep the resulting integer non-negative.
Let's use a simple example using Strings:
String h = "Hello I'm negative!";
int hashCode = h.hashCode();
hashCode is negative with the value of -1937832979.
If you take this modulo a positive number of partitions, the result is negative or zero, because Java's % operator gives the remainder the sign of the dividend.
System.out.println(hashCode % 5); // yields -4
Since a partition index can never be negative, you need to make sure the number is non-negative. Here a simple bit-twiddling trick comes into play: Integer.MAX_VALUE has every bit set except the sign bit (the most significant bit of Java's two's-complement int), which is 1 only for negative numbers.
So if you have a negative number with the sign bit set, that bit is ANDed with the zero in Integer.MAX_VALUE and always becomes zero, while all other bits are left unchanged.
You can make it more readable though:
return Math.abs(key.getFirst().hashCode() % numPartitions);
For example I have done that in Apache Hama's partitioner for arbitrary objects:
@Override
public int getPartition(K key, V value, int numTasks) {
    return Math.abs(key.hashCode() % numTasks);
}
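To see the difference concretely, here is a small stand-alone sketch (plain Java, no Hadoop classes; the class and method names are made up for illustration). It compares the sign-bit mask with the Math.abs variant, plus Math.floorMod from the JDK as a third option. All three stay within 0 to numPartitions - 1, but they do not always pick the same partition, so whichever one you choose should be used consistently.

```java
class NonNegativePartition {
    // Clears the sign bit first, then takes the remainder (the KeyPartitioner approach).
    static int byMask(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Takes the remainder first, then its absolute value.
    // Note the order: Math.abs(hash) % n would be unsafe, because
    // Math.abs(Integer.MIN_VALUE) is still negative.
    static int byAbs(int hash, int numPartitions) {
        return Math.abs(hash % numPartitions);
    }

    // Math.floorMod returns a non-negative result for a positive divisor.
    static int byFloorMod(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int hash = -1937832979; // the negative hash code from the example above
        System.out.println(hash % 5);            // -4: plain remainder keeps the sign
        System.out.println(byMask(hash, 5));     // 4
        System.out.println(byAbs(hash, 5));      // 4
        System.out.println(byFloorMod(hash, 5)); // 1
    }
}
```

For an IntWritable key the same reasoning applies to the int value itself: a negative key would produce a negative remainder unless one of these guards is used.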
While using a random number generator (RNG) with a given seed several times (i.e., calling setSeed() with the same seed each time to start over), I encountered some deviation in the sequence of numbers generated on each pass. After banging my head against the wall a few times, I found the reason to be this:
box2d's World.createBody() calls LongMap.put(), which calls LongMap.push(), which calls MathUtils.random() inside a while loop.
To my knowledge particle effects call MathUtils.random() too.
So how can I trust a sequence of numbers to always repeat itself if LibGDX internally uses the same static RNG instance and could therefore disturb the sequence?
How am I supposed to know exactly where and when MathUtils.random() gets called outside my code?
As noted by @Peter R, one can create one's own RNG, which guarantees that nothing would interfere with the sequence of numbers.
Either Java's Random can be used:
import java.util.Random;
private Random random = new Random();
or RandomXS128 that is used by MathUtils (which extends Java's Random but is faster):
import com.badlogic.gdx.math.RandomXS128;
private Random random = new RandomXS128();
The convenient wrapper methods in MathUtils (the random() overloads) can also be copied into one's own class as needed, whether static or not, e.g.:
/** Returns a random number between start (inclusive) and end (inclusive). */
public int random (int start, int end) {
    return start + random.nextInt(end - start + 1);
}
/** Returns a random number between start (inclusive) and end (exclusive). */
public float random (float start, float end) {
    return start + random.nextFloat() * (end - start);
}
For myself I'm still befuddled as to why MathUtils provides a shared RNG to be used both internally and externally, which makes using it with a seed unsafe, and with no mention of that in the comments.
But the above workaround should be satisfactory to anyone who is not as petty as I am.
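A minimal sketch of that idea, using only java.util.Random (the class name, method name and seed are arbitrary choices for illustration): because the generator is a private instance, no library code can advance it, so re-seeding replays the same sequence. RandomXS128 should be usable the same way, since it extends Random.

```java
import java.util.Arrays;
import java.util.Random;

class PrivateRng {
    // Private generator: only this class ever draws numbers from it.
    private final Random random = new Random();

    // Re-seeds the generator and returns the next `count` ints in [0, bound).
    int[] replay(long seed, int count, int bound) {
        random.setSeed(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        PrivateRng rng = new PrivateRng();
        int[] first = rng.replay(1234L, 5, 100);
        int[] second = rng.replay(1234L, 5, 100);
        System.out.println(Arrays.equals(first, second)); // true: same seed, same sequence
    }
}
```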
I'm trying to run an algorithm on an Arduino UNO. It needs a const table with some large numbers, and sometimes I get overflow values. This is the case for this number: 628331966747.0
Okay, this is a big one, but its type is float (32-bit), whose maximum is 3.4028235e38. So in theory it should work?
What can I do about this? Do you know a solution?
EDIT: On the Arduino UNO, double is exactly the same type as float (32 bits).
Here is code that leads to the error:
float A;
void setup() {
  A = 628331966747.0;
  Serial.begin(9600);
}
void loop() {
  Serial.println(A);
  delay(1000);
}
It prints "ovf, ovf, ..., ovf".
There is nothing wrong with the constant itself (except for its rather optimistic number of significant figures), but the problem is with the implementation of the Arduino's library support for printing floating point values. Print::printFloat() contains the following pre-condition tests:
if (isnan(number)) return print("nan");
if (isinf(number)) return print("inf");
if (number > 4294967040.0) return print ("ovf"); // constant determined empirically
if (number <-4294967040.0) return print ("ovf"); // constant determined empirically
It seems that the range of printable values is deliberately restricted, presumably to reduce complexity and code size. The subsequent code reveals why:
// Extract the integer part of the number and print it
unsigned long int_part = (unsigned long)number;
double remainder = number - (double)int_part;
n += print(int_part);
The somewhat simplistic implementation requires that the absolute value of the integer part fit in a 32-bit unsigned integer.
The worrying thing perhaps is the comment "constant determined empirically", which rather suggests that the values were arrived at by trial and error rather than by an understanding of the mathematics! One has to wonder why these values are not defined in terms of ULONG_MAX.
There is a proposed "fix" described here, but it will not work, if only because it applies the integer abs() function to the double parameter number, which will only work if the integer part is less than the even more restrictive INT_MAX. The author has posted a link to a zip file containing a fix that looks more likely to work (there is evidence at least of testing!).
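As a side note on the "empirical" constant: 4294967040 is 2^32 - 256, which is the largest 32-bit float below 2^32, so it appears to be the largest float that still fits in a 32-bit unsigned long. The sketch below checks that and mirrors the range test quoted above. It is written in Java purely for illustration, assuming the UNO's float uses the same IEEE 754 single-precision format as Java's; the class and method names are made up, and this is not the Arduino source.

```java
class PrintFloatLimit {
    // Largest 32-bit float strictly below 2^32 (4294967296).
    static float largestFloatBelowTwoPow32() {
        return Math.nextDown(4294967296.0f);
    }

    // Mirrors the two "ovf" range checks from Print::printFloat() quoted above.
    static boolean wouldPrintOvf(float number) {
        return number > 4294967040.0f || number < -4294967040.0f;
    }

    public static void main(String[] args) {
        System.out.println((long) largestFloatBelowTwoPow32()); // 4294967040
        System.out.println(wouldPrintOvf(628331966747.0f));     // true: the value from the question
        System.out.println(wouldPrintOvf(1234567.0f));          // false
    }
}
```

So the question's constant is a perfectly valid float; it is simply outside the range that this printing routine can handle.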
I am learning the partitioner concept now. Can anyone explain the piece of code below? It is hard for me to understand.
public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {
    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return taggedKey.getJoinKey().hashCode() % numPartitions;
    }
}
How does taggedKey.getJoinKey().hashCode() % numPartitions determine which reducer handles a given key?
Can anyone explain this to me?
It's not as complex as you think once you break things down a little bit.
taggedKey.getJoinKey().hashCode() will simply return an integer. Every object will have a hashCode() function that simply returns a number that will hopefully be unique to that object itself. You could look into the source code of TaggedKey to see how it works if you'd like, but all you need to know is that it returns an integer based on the contents of the object.
The % operator performs modulus division, which is where you return the remainder after performing division. (8 % 3 = 2, 15 % 7 = 1, etc.).
So let's say you have 3 partitions (numPartitions = 3). Every time you do modulus division of a non-negative number by 3, you'll get either 0, 1, or 2, no matter what number is passed. This is used to determine which of the 3 partitions, and therefore which reducer, will get the data.
The whole idea of partitioners is that you can use them to group data to be sorted. If you wanted to sort by month, you could pass every piece of data with the string "January" to the first partition, "December" to the 12th partition, etc. In your case it looks a bit confusing from the outside, but really they just want to spread the data out (hopefully) evenly, so they use a simple hash/modulus function to pick the partition. The choice looks arbitrary, but it is deterministic: records with the same join key always land in the same partition.
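Here is a small sketch of that grouping in plain Java, without any Hadoop classes. The class name, method names and sample keys are made up for illustration, and the hash is masked with Integer.MAX_VALUE so that a negative hashCode() cannot produce a negative partition (the code in the question does not do this).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HashPartitionDemo {
    // Maps a join key to a partition index in the range 0 .. numPartitions - 1.
    static int partitionFor(String joinKey, int numPartitions) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups keys by the partition they are assigned to.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numPartitions);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alice", "bob", "carol", "alice", "dave", "bob");
        // Repeated keys ("alice", "bob") end up in the same list each time.
        System.out.println(assign(keys, 3));
        // "a".hashCode() is 97, and 97 % 3 is 1.
        System.out.println(partitionFor("a", 3)); // 1
    }
}
```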
I cannot write an integer to the LCD using these functions; it shows something weird on the screen.
I have just added the function below. Please check it for me.
I added everything needed.
my_delay(1000);
LCDWriteStringXY(0,0,"Welcome..");
my_delay(1000);
LCDWriteStringXY(0,0,"Welcome...");
my_delay(1000);
LCDClear();
LCDWriteStringXY(4,0,"Testing");
LCDGotoXY(2,1);
int m=952520;
LCDWriteInt(m,6);//I can not write it!!!
void LCDWriteInt(int val,unsigned int field_length)
{
    char str[5]={0,0,0,0,0};
    int i=4,j=0;
    while(val)
    {
        str[i]=val%10;
        val=val/10;
        i--;
    }
    if(field_length==-1)
        while(str[j]==0) j++;
    else
        j=5-field_length;
    if(val<0) LCDData('-');
    for(i=j;i<5;i++)
    {
        LCDData(48+str[i]);
    }
}
I think the function is written for 16-bit integers, for which the maximum value would be 65535 (5 digits, the same as the length of str[]). You are giving it a 6-digit value, which first overruns the buffer when the loop writes the sixth digit to str[-1], and then produces j = -1.
My suggestion is to either use smaller integers (16-bit only), or write another function like the one you showed us that does the same thing for larger values.
Lastly, I don't know whether the if(val<0) LCDData('-') check would ever work properly, since you overwrite val in the first while loop.
Use the itoa function. It will help you convert the integer to a string and display it on the LCD. Best of luck!
I built a web application that is going to launch a beta test soon. I would really like to hand out beta invites and keys that look nice.
i.e. A3E6-7C24-9876-235B
That is 16 hexadecimal digits.
It looks like the typical beta key you might see.
My question is what is a standard way to generate something like this and make sure that it is unique and that it will not be easy for someone to guess a beta key and generate their own.
I have some ideas that would probably work for beta keys:
MD5 is secure enough for this, but it is long and ugly looking and could cause confusion between 0 and O, or 1 and l.
I could start off with a large hexadecimal number that is 16 digits in length. To prevent people from guessing what the next beta key might be, I would increment the value by a random number each time. The range of numbers between 1111-1111-1111-1111 and eeee-eeee-eeee-eeee will have plenty of room to spare even if I am skipping large quantities of numbers.
I guess I am just wondering if there is a standard way of doing this that I am not finding with Google. Is there a better way?
The canonical "unique identifying number" is a UUID. There are various forms: you can generate one from random numbers (version 4) or from a hash of some value (user's email + salt?) (versions 3 and 5), for example.
Libraries for Java, Python and a bunch more exist.
PS I have to add that when I read your question title I thought you were looking for something cool and different. You might consider using an "interesting" word list and combining words with hyphens to encode a number (based on hash of email + salt). That would be much more attractive imho: "your beta code is secret-wombat-cookie-ninja" (I'm sure I read an article describing an example, but I can't find it now).
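As a rough sketch of the UUID route in Java, using java.util.UUID from the standard library: the class name, method names and the salt value are hypothetical, and truncating to 16 hex digits is my own assumption to match the key format in the question. Truncation reduces the number of random bits, so issued keys should still be stored and checked for duplicates.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.UUID;

class UuidBetaKey {
    // Formats the first 16 hex digits of a UUID as XXXX-XXXX-XXXX-XXXX.
    // Note: the 13th hex digit of a UUID is its version number, so it is not random.
    static String format(UUID uuid) {
        String hex = uuid.toString().replace("-", "").toUpperCase(Locale.ROOT);
        return hex.substring(0, 4) + "-" + hex.substring(4, 8) + "-"
                + hex.substring(8, 12) + "-" + hex.substring(12, 16);
    }

    // Version 4: generated from random numbers.
    static String randomKey() {
        return format(UUID.randomUUID());
    }

    // Version 3: derived from an MD5 hash of a name; `salt` is a hypothetical secret.
    static String keyFor(String email, String salt) {
        byte[] name = (email + salt).getBytes(StandardCharsets.UTF_8);
        return format(UUID.nameUUIDFromBytes(name));
    }

    public static void main(String[] args) {
        System.out.println(randomKey());
        // The same email and salt always give the same key.
        System.out.println(keyFor("user@example.com", "hypothetical-salt"));
    }
}
```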
One way (C# but the code is simple enough to port to other languages):
private static readonly Random random = new Random(Guid.NewGuid().GetHashCode());
static void Main(string[] args)
{
    string x = GenerateBetaString();
}
public static string GenerateBetaString()
{
    const string alphabet = "ABCDEF0123456789";
    string x = GenerateRandomString(16, alphabet);
    return x.Substring(0, 4) + "-" + x.Substring(4, 4) + "-"
        + x.Substring(8, 4) + "-" + x.Substring(12, 4);
}
public static string GenerateRandomString(int length, string alphabet)
{
    int maxlen = alphabet.Length;
    StringBuilder randomChars = new StringBuilder(length);
    for (int i = 0; i < length; i++)
    {
        randomChars.Append(alphabet[random.Next(0, maxlen)]);
    }
    return randomChars.ToString();
}
Output:
97A8-55E5-C6B8-959E
8C60-6597-B71D-5CAF
8E1B-B625-68ED-107B
A6B5-1D2E-8D77-EB99
5595-E8DC-3A47-0605
Doing it this way gives you precise control over the characters in the alphabet. If you need crypto-strength randomness (unlikely), use the crypto random class to generate random bytes (possibly mod the alphabet length).
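If crypto-strength randomness does matter, the same idea can be sketched with a cryptographic generator. The version below is in Java rather than C#, using java.security.SecureRandom; the class and method names are made up for illustration. Calling nextInt(bound) picks each index directly, so there is no need to reduce raw bytes modulo the alphabet length.

```java
import java.security.SecureRandom;

class SecureBetaKey {
    private static final String ALPHABET = "ABCDEF0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a key of the form XXXX-XXXX-XXXX-XXXX from the alphabet above.
    static String generate() {
        StringBuilder key = new StringBuilder(19);
        for (int i = 0; i < 16; i++) {
            if (i > 0 && i % 4 == 0) {
                key.append('-'); // separator between groups of four
            }
            key.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

Random keys can collide, so each generated key should still be checked against the keys already issued.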
Computing power is cheap: take your idea of the MD5 and run an "aesthetic" filter of your own devising over the set. The code below generates 2000 unique keys almost instantaneously, none of which contains a 0, 1, L, or O character. Modify aesthetic to fit any additional criteria:
import hashlib
import random

def potential_key():
    # Hash a random float and keep the first 16 hex digits, grouped in fours.
    x = random.random()
    m = hashlib.md5()
    m.update(str(x).encode("ascii"))
    s = m.hexdigest().upper()[:16]
    return "%s-%s-%s-%s" % (s[:4], s[4:8], s[8:12], s[12:])

def aesthetic(s):
    # Reject keys that contain easily confused characters.
    bad_chars = ["0", "1", "L", "O"]
    for b in bad_chars:
        if b in s:
            return False
    return True

key_set = set()
while len(key_set) < 2000:
    k = potential_key()
    if aesthetic(k):
        key_set.add(k)
print(key_set)
Example keys:
'4297-CAC6-9DA8-625A', '43DD-2ED4-E4F8-3E8D', '4A8D-D5EF-C7A3-E4D5',
'A68D-9986-4489-B66C', '9B23-6259-9832-9639', '2C36-FE65-EDDB-2CF7',
'BFB6-7769-4993-CD86', 'B4F4-E278-D672-3D2C', 'EEC4-3357-2EAB-96F5',
'6B69-C6DA-99C3-7B67', '9ED7-FED5-3CC6-D4C6', 'D3AA-AF48-6379-92EF', ...