I want to make the following Protocol Buffers message forward compatible.
The current Storage message defines a state field as a bool type:
message Storage {
bool state = 1;
}
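As I understand the Protobuf encoding, varint-typed fields such as bool and enum are laid out in the following format (the official documentation calls the three header parts the continuation bit, the field number, and the wire type):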
|1-bit sequence number|4-bit serial number|3-bit data type|n-bit payload|
For the Varint type, the data type value will become 000:
|X|XXXX|000|XXXX...|
Since the Storage message contains only one field, with a serial number of 1, the sequence bit is 0 because the header fits in a single byte. Hence, the above format becomes:
|0|0001|000|XXXX...|
Now, if I set Storage.state = 0, it will be stored as follows:
|0|0001|000|<0 will not be encoded>
The Protobuffer value for the Storage message will become 0x8.
If I set Storage.state = 1, it will be stored as follows:
|0|0001|000|00000001|
The Protobuffer value for the Storage message will become 0x8 0x1.
Now, I want to change the above Storage.state definition from the bool type to an enum type as follows:
// BIT7 | BIT6 | BIT5 | BIT4 | BIT3 | BIT2 | BIT1 | BIT0 |
//-------------------------------------------------------
// 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = STATE0 (0)
// 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = STATE1 (1)
// 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = STATE2 (2)
// 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | = STATE3 (3)
// ... and so on
// 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | = STATE254 (254)
// 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = STATE255 (255)
enum State {
STATE0 = 0;
STATE1 = 1;
STATE2 = 2;
STATE3 = 3;
// ... and so on
STATE254 = 254;
STATE255 = 255;
}
message Storage {
State state = 1;
}
So now, in the Protobuf encoding,
if I set Storage.state = State.STATE0, it will be stored as follows:
|0|0001|000|<0 will not be encoded>
The Protobuffer value for the Storage message will become 0x8.
If I set Storage.state = State.STATE1, it will be stored as follows:
|0|0001|000|00000001|
The Protobuffer value for the Storage message will become 0x8 0x1.
If I set Storage.state = State.STATE2, it will be stored as follows:
|0|0001|000|00000010|
The Protobuffer value for the Storage message will become 0x8 0x2.
If I set Storage.state = State.STATE255, it will be stored as follows:
|0|0001|000|11111111|
The Protobuffer value for the Storage message will become 0x8 0xFF.
Will this change still be forward compatible for proto2 and proto3 and in C and Java?
I based my question on the reference below:
google protocol buffer -- the coding principle of protobuf II
I'm assuming that what you're actually trying to store here is a set of bitwise state values, what might be a [Flags] enum in C# (mentioned purely to set context).
Honestly, declaring an enum with a value per bit combination isn't a good idea; it will escalate very quickly, and it isn't intuitive to use. It also leaves potential for silly errors when copy/pasting large volumes of lines...
// omitted... 212 lines - but would you spot the error?
STATE213 = 213;
STATE214 = 214;
STATE215 = 214;
STATE216 = 216;
STATE217 = 217;
// ... etc
(OK, that specific error requires the allow-alias flag, but: you get the point)
In proto2, enums are expected to be recognised; when unexpected enum values are encountered, it gets a bit... hazy, with any of:
1. parse failure
2. treated as an unknown field (needing to be accessed via a separate API)
3. silently handled and parsed via the integer value (which has the effect of preserving bit flags)
Since not every flag combination will have an enum definition, what you want here is option 3, but that isn't guaranteed in all implementations.
In proto3, the framework leans as far in the direction of option 3 as possible, explicitly in the language specification, with the integer value being stored and retrieved (which has the effect of preserving bit flags), but it is also explicitly called out that some platforms do not allow open enum types - for example, Java.
Because of this limitation, since you mention java in the tags, I would recommend simply using an integer directly. It will at least work similarly on all implementations. By comparison to your proposed solution, it is at least as usable - but usually a lot more usable; consider how it works as an enum:
obj.state = State.STATE217;
vs as an integer:
obj.state = 217;
This will also allow bitwise combination, test, and similar operations to be used on the values, which isn't the case for closed enum types.
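As a rough illustration of that last point (a sketch only; the flag names are made up and do not come from the question), a plain integer state can be combined and tested bit by bit:

```python
# Hypothetical meanings for three of the eight state bits; the names are
# illustrative assumptions, not part of the question's schema.
READY = 1 << 0
BUSY = 1 << 1
FAULT = 1 << 7


def set_flag(state: int, flag: int) -> int:
    """Return state with the given bit(s) switched on."""
    return state | flag


def clear_flag(state: int, flag: int) -> int:
    """Return state with the given bit(s) switched off, kept within 8 bits."""
    return state & ~flag & 0xFF


def has_flag(state: int, flag: int) -> bool:
    """True if every bit in flag is set in state."""
    return state & flag == flag


state = set_flag(set_flag(0, READY), FAULT)  # READY and FAULT both set
print(state, has_flag(state, READY), has_flag(state, BUSY))
```

None of the combined values needs its own named constant, which is the usability gain over a 256-member enum.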
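As for whether bool, enum and int32/uint32 (and the 64-bit counterparts) are technically interchangeable (scale permitting): yes; they're all encoded as plain varints. The exception is sint32/sint64: they are also varints on the wire, but the value is ZigZag-encoded first, so they are not interchangeable with the other types.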
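To make the wire format concrete, here is a minimal hand-rolled sketch of how a varint field is laid out. It is not the protobuf library; the function names are invented for illustration, and it assumes the field is actually written to the wire (proto3 normally omits a field holding its default value altogether). Two details differ from the byte sequences in the question: an explicitly written 0 still gets a payload byte, and values of 128 or more need a second payload byte.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key byte(s) (field_number << 3 | wire type 0) followed by the varint payload."""
    key = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(key) + encode_varint(value)


for v in (0, 1, 2, 255):
    print(v, encode_varint_field(1, v).hex())
```

The key byte for field 1 is 0x08 whether the field is declared as bool, enum or int32, which is why those declarations decode each other's bytes.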
What is the preferred (or right) way to group a large number of related constants in the Go language? For example, C# and C++ both have enum for this.
const?
const (
pi = 3.14
foo = 42
bar = "hello"
)
There are two options, depending on how the constants will be used.
The first is to create a new type based on int, and declare your constants using this new type, e.g.:
type MyFlag int
const (
	Foo MyFlag = 1
	Bar MyFlag = 2
)
Foo and Bar will have type MyFlag. If you want to extract the int value back from a MyFlag, you need a type conversion:
var i int = int( Bar )
If this is inconvenient, use newacct's suggestion of a bare const block:
const (
Foo = 1
Bar = 2
)
Foo and Bar are untyped constants that can be assigned to int, float, etc.
This is covered in Effective Go in the Constants section. See also the discussion of the iota keyword for auto-assignment of values like C/C++.
My closest approach to enums is to declare constants as a type. At least you have some type-safety which is the major perk of an enum type.
type PoiType string
const (
Camping PoiType = "Camping"
Eatery PoiType = "Eatery"
Viewpoint PoiType = "Viewpoint"
)
It depends on what you want to achieve by this grouping. Go allows grouping with the following parenthesized syntax:
const (
c0 = 123
c1 = 67.23
c2 = "string"
)
This just adds a nice visual block for the programmer (editors can fold it), but does nothing for the compiler (you cannot give the block a name).
The only thing that depends on this block is the iota constant declaration in Go (which is pretty handy for enums). It allows you to create auto-increments easily (and far more than just auto-increments: more on this in the link).
For example this:
const (
c0 = 3 + 5 * iota
c1
c2
)
will create constants c0 = 3 (3 + 5 * 0), c1 = 8 (3 + 5 * 1) and c2 = 13 (3 + 5 * 2).
(Edit: What is code golf? Code golf is a challenge to solve a specific problem with the shortest amount of code, by character count, in whichever language you prefer. More info here on Meta StackOverflow.)
Code Golfers, here's a challenge on string operations.
Email Address Validation, but without regular expressions (or similar parsing library) of course. It's not so much about the email addresses but how short you can write the different string operations and constraints given below.
The rules are the following (yes, I know, this is not RFC compliant, but these are going to be the 5 rules for this challenge):
1. At least 1 character out of this group before the #:
   A-Z, a-z, 0-9, . (period), _ (underscore)
2. # has to exist, exactly one time
   john#smith.com
       ^
3. Period (.) has to exist exactly one time after the #
   john#smith.com
             ^
4. At least 1 only [A-Z, a-z] character between # and the following . (period)
   john#s.com
        ^
5. At least 2 only [A-Z, a-z] characters after the final . period
   john#smith.ab
              ^^
Please post the method/function only, which would take a string (proposed email address) and then return a Boolean result (true/false) depending on the email address being valid (true) or invalid (false).
Samples:
b#w.org (valid/true) #w.org (invalid/false)
b#c#d.org (invalid/false) test#org (invalid/false)
test#%.org (invalid/false) s%p#m.org (invalid/false)
j_r#x.c.il (invalid/false) j_r#x.mil (valid/true)
r..t#x.tw (valid/true) foo#a%.com (invalid/false)
Good luck!
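For reference, here is an ungolfed sketch of the five rules in Python, useful as a baseline for checking submissions. The function name is_valid is made up, and the sketch assumes (as the samples suggest) that the part before the # may contain only the characters listed in rule 1.

```python
import string

LETTERS = set(string.ascii_letters)
LOCAL_CHARS = LETTERS | set(string.digits) | {".", "_"}


def is_valid(address: str) -> bool:
    # Rule 2: exactly one '#'.
    if address.count("#") != 1:
        return False
    local, domain = address.split("#")
    # Rule 1: at least one character before '#', all from the allowed group.
    if not local or not set(local) <= LOCAL_CHARS:
        return False
    # Rule 3: exactly one period after '#'.
    if domain.count(".") != 1:
        return False
    name, suffix = domain.split(".")
    # Rule 4: at least one character, letters only, between '#' and the period.
    if not name or not set(name) <= LETTERS:
        return False
    # Rule 5: at least two characters, letters only, after the period.
    return len(suffix) >= 2 and set(suffix) <= LETTERS


for sample in ("b#w.org", "#w.org", "j_r#x.c.il", "r..t#x.tw"):
    print(sample, is_valid(sample))
```

Explicit character sets are used instead of str.isalpha(), which would also accept non-ASCII letters.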
C89 (166 characters)
#define B(c)isalnum(c)|c==46|c==95
#define C(x)if(!v|*i++-x)return!1;
#define D(x)for(v=0;x(*i);++i)++v;
v;e(char*i){D(B)C(35)D(isalpha)C(46)D(isalpha)return!*i&v>1;}
Not re-entrant, but can be run multiple times. Test bed:
#include<stdio.h>
#include<assert.h>
main(){
assert(e("b#w.org"));
assert(e("r..t#x.tw"));
assert(e("j_r#x.mil"));
assert(!e("b#c#d.org"));
assert(!e("test#%.org"));
assert(!e("j_r#x.c.il"));
assert(!e("#w.org"));
assert(!e("test#org"));
assert(!e("s%p#m.org"));
assert(!e("foo#a%.com"));
puts("success!");
}
J
:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]
C89, 175 characters.
#define G &&*((a+=t+1)-1)==
#define H (t=strspn(a,A
t;e(char*a){char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;return H))G 35&&H+12))G 46&&H+12))>1 G 0;}
I am using the standard library function strspn(), so I feel this answer isn't as "clean" as strager's answer which does without any library functions. (I also stole his idea of declaring a global variable without a type!)
One of the tricks here is that by putting . and _ at the start of the string A, it's possible to include or exclude them easily in a strspn() test: when you want to allow them, use strspn(something, A); when you don't, use strspn(something, A+12). Another is assuming that sizeof (short) == 2 * sizeof (char), and building up the array of valid characters 2 at a time from the "seed" pair Aa. The rest was just looking for a way to force subexpressions to look similar enough that they could be pulled out into #defined macros.
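The two tricks can be spelled out in a readable Python sketch. This is a paraphrase rather than a translation of the golfed C: the strspn helper here is a hand-written stand-in for the C library function, and build_letters and valid are names chosen for illustration.

```python
import string

# Same layout idea as the C array A: '_' and '.' first, then the ten digits,
# so the letters start at offset 12.
A = "_." + string.digits + string.ascii_uppercase + string.ascii_lowercase


def strspn(s: str, accept: str) -> int:
    """Length of the leading run of s made only of characters in accept."""
    n = 0
    while n < len(s) and s[n] in accept:
        n += 1
    return n


def build_letters() -> str:
    """Grow 'AaBbCc...Zz' from the seed pair 'Aa' by adding 257 (0x0101),
    which bumps both bytes of the 16-bit pair by one."""
    pair = int.from_bytes(b"Aa", "little")
    out = b""
    for _ in range(26):
        out += pair.to_bytes(2, "little")
        pair += 257
    return out.decode("ascii")


def valid(a: str) -> bool:
    t = strspn(a, A)                  # local part: any character of A
    if t < 1 or a[t:t + 1] != "#":
        return False
    a = a[t + 1:]
    t = strspn(a, A[12:])             # skip the first 12 entries: letters only
    if t < 1 or a[t:t + 1] != ".":
        return False
    a = a[t + 1:]
    t = strspn(a, A[12:])
    return t > 1 and t == len(a)      # at least two letters, then end of string


print(build_letters())
print(valid("r..t#x.tw"), valid("j_r#x.c.il"))
```

Adding 0x0101 never carries between the two bytes for this range of letters, so the result is the same on either byte order; the C version additionally relies on a short being exactly two chars wide.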
To make this code more "portable" (heh :-P) you can change the array-building code from
char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;
to
char*A="_.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
for a cost of 5 additional characters.
Python (181 characters including newlines)
def v(E):
 import string as t;a=t.ascii_letters;e=a+"1234567890_.";t=e,e,"#",e,".",a,a,a,a,a,"",a
 for c in E:
  if c in t[0]:t=t[2:]
  elif not c in t[1]:return 0>1
 return""==t[0]
Basically just a state machine using obfuscatingly short variable names.
C (166 characters)
#define F(t,u)for(r=s;t=(*s-35?*s-46?isalpha(*s)?3:isdigit(*s)|*s==95?4:0:2:1);++s);if(s-r-1 u)return 0;
V(char*s){char*r;F(2<,<0)F(1=)F(3=,<0)F(2=)F(3=,<1)return 1;}
The single newline is required, and I've counted it as one character.
Python, 149 chars (after putting the whole for loop into one semicolon-separated line, which I haven't done here for "readability" purposes):
def v(s,t=0,o=1):
    for c in s:
        k=c=="#"
        p=c=="."
        A=c.isalnum()|p|(c=="_")
        L=c.isalpha()
        o&=[A,k|A,L,L|p,L,L,L][t]
        t+=[1,k,1,p,1,1,0][t]
    return(t>5)&o
Test cases, borrowed from strager's answer:
assert v("b#w.org")
assert v("r..t#x.tw")
assert v("j_r#x.mil")
assert not v("b#c#d.org")
assert not v("test#%.org")
assert not v("j_r#x.c.il")
assert not v("#w.org")
assert not v("test#org")
assert not v("s%p#m.org")
assert not v("foo#a%.com")
print "Yeah!"
Explanation: When iterating over the string, two variables keep getting updated.
t keeps the current state:
t = 0: We're at the beginning.
t = 1: We were at the beginning and have found at least one legal character (letter, number, underscore, period)
t = 2: We have found the "#"
t = 3: We have found at least one legal character (i.e. letter) after the "#"
t = 4: We have found the period in the domain name
t = 5: We have found one legal character (letter) after the period
t = 6: We have found at least two legal characters after the period
o as in "okay" starts as 1, i.e. true, and is set to 0 as soon as a character is found that is illegal in the current state.
Legal characters are:
In state 0: letter, number, underscore, period (change state to 1 in any case)
In state 1: letter, number, underscore, period, "#" (change state to 2 if "#" is found)
In state 2: letter (change state to 3)
In state 3: letter, period (change state to 4 if period found)
In states 4 thru 6: letter (increment state when in 4 or 5)
When we have gone all the way through the string, we return whether t==6 (t>5 is one char less) and o is 1.
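The same state machine can be written out as a table, which may make the transitions easier to follow. This is an ungolfed sketch rather than the submitted code: the names STATES and valid are illustrative, it returns early instead of carrying the o flag, and it uses explicit ASCII sets where the golfed version relies on isalnum()/isalpha().

```python
import string

LETTER = set(string.ascii_letters)
LOCAL = LETTER | set(string.digits) | {".", "_"}

# One entry per state: (characters that are legal, characters that advance
# the machine to the next state).
STATES = [
    (LOCAL, LOCAL),            # 0: beginning, any legal character moves on
    (LOCAL | {"#"}, {"#"}),    # 1: local part, only "#" moves on
    (LETTER, LETTER),          # 2: just after "#", need a letter
    (LETTER | {"."}, {"."}),   # 3: domain name, only "." moves on
    (LETTER, LETTER),          # 4: just after ".", need a letter
    (LETTER, LETTER),          # 5: one letter seen after "."
    (LETTER, set()),           # 6: two or more letters seen, stay here
]


def valid(s: str) -> bool:
    state = 0
    for c in s:
        legal, advance = STATES[state]
        if c not in legal:
            return False       # corresponds to o dropping to 0
        if c in advance:
            state += 1
    return state == 6


print(valid("r..t#x.tw"), valid("test#org"))
```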
Whatever version of C++ MSVC2008 supports.
Here's my humble submission. Now I know why they told me never to do the things I did in here:
#define N return 0
#define I(x) &&*x!='.'&&*x!='_'
bool p(char*a) {
if(!isalnum(a[0])I(a))N;
char*p=a,*b=0,*c=0;
for(int d=0,e=0;*p;p++){
if(*p=='#'){d++;b=p;}
else if(*p=='.'){if(d){e++;c=p;}}
else if(!isalnum(*p)I(p))N;
if (d>1||e>1)N;
}
if(b>c||b+1>=c||c+2>=p)N;
return 1;
}
Not the greatest solution no doubt, and pretty darn verbose, but it is valid.
Fixed (All test cases pass now)
static bool ValidateEmail(string email)
{
    var numbers = "1234567890";
    var uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var lowercase = uppercase.ToLower();
    var arUppercase = uppercase.ToCharArray();
    var arLowercase = lowercase.ToCharArray();
    var arNumbers = numbers.ToCharArray();
    var atPieces = email.Split(new string[] { "#"}, StringSplitOptions.RemoveEmptyEntries);
    if (atPieces.Length != 2)
        return false;
    foreach (var c in atPieces[0])
    {
        if (!(arNumbers.Contains(c) || arLowercase.Contains(c) || arUppercase.Contains(c) || c == '.' || c == '_'))
            return false;
    }
    if(!atPieces[1].Contains("."))
        return false;
    var dotPieces = atPieces[1].Split('.');
    if (dotPieces.Length != 2)
        return false;
    foreach (var c in dotPieces[0])
    {
        if (!(arLowercase.Contains(c) || arUppercase.Contains(c)))
            return false;
    }
    var found = 0;
    foreach (var c in dotPieces[1])
    {
        if ((arLowercase.Contains(c) || arUppercase.Contains(c)))
            found++;
        else
            return false;
    }
    return found >= 2;
}
C89 character set agnostic (262 characters)
#include <stdio.h>
/* the 'const ' qualifiers should be removed when */
/* counting characters: I don't like warnings :) */
/* also the 'int ' should not be counted. */
/* it needs only 2 spaces (after the returns), should be only 2 lines */
/* that's a total of 262 characters (1 newline, 2 spaces) */
/* code golf starts here */
#include<string.h>
int v(const char*e){
const char*s="0123456789._abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(e=strpbrk(e,s))
if(e=strchr(e+1,'#'))
if(!strchr(e+1,'#'))
if(e=strpbrk(e+1,s+12))
if(e=strchr(e+1,'.'))
if(!strchr(e+1,'.'))
if(strlen(e+1)>1)
return 1;
return 0;
}
/* code golf ends here */
int main(void) {
const char *t;
t = "b#w.org"; printf("%s ==> %d\n", t, v(t));
t = "r..t#x.tw"; printf("%s ==> %d\n", t, v(t));
t = "j_r#x.mil"; printf("%s ==> %d\n", t, v(t));
t = "b#c#d.org"; printf("%s ==> %d\n", t, v(t));
t = "test#%.org"; printf("%s ==> %d\n", t, v(t));
t = "j_r#x.c.il"; printf("%s ==> %d\n", t, v(t));
t = "#w.org"; printf("%s ==> %d\n", t, v(t));
t = "test#org"; printf("%s ==> %d\n", t, v(t));
t = "s%p#m.org"; printf("%s ==> %d\n", t, v(t));
t = "foo#a%.com"; printf("%s ==> %d\n", t, v(t));
return 0;
}
Version 2
Still C89 character set agnostic, bugs hopefully corrected (303 chars; 284 without the #include)
#include<string.h>
#define Y strchr
#define X{while(Y
v(char*e){char*s="0123456789_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(*e!='#')X(s,*e))e++;if(*e++=='#'&&!Y(e,'#')&&Y(e+1,'.'))X(s+12,*e))e++;if(*e++=='.'
&&!Y(e,'.')&&strlen(e)>1){while(*e&&Y(s+12,*e++));if(!*e)return 1;}}}return 0;}
That #define X is absolutely disgusting!
Test as for my first (buggy) version.
VBA/VB6 - 484 chars
Explicit off
usage: VE("b#w.org")
Function V(S, C)
V = True
For I = 1 To Len(S)
If InStr(C, Mid(S, I, 1)) = 0 Then
V = False: Exit For
End If
Next
End Function
Function VE(E)
VE = False
C1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
C2 = "0123456789._"
P = Split(E, "#")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1 & C2) Then GoTo X
E = P(1): P = Split(E, ".")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1) Or Len(P(1)) < 2 Or Not V(P(1), C1) Then GoTo X
VE = True
X:
End Function
Java: 257 chars (not including the 3 end of lines for readability ;-)).
boolean q(char[]s){int a=0,b=0,c=0,d=0,e=0,f=0,g,y=-99;for(int i:s)
d=(g="#._0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm".indexOf(i))<0?
y:g<1&&++e>0&(b<1|++a>1)?y:g==1&e>0&(c<1||f++>0)?y:++b>0&g>12?f>0?d+1:f<1&e>0&&++c>0?
d:d:d;return d>1;}
Passes all the tests (my older version was incorrect).
Erlang 266 chars:
-module(cg_email).
-export([test/0]).
%%% golf code begin %%%
-define(E,when X>=$a,X=<$z;X>=$A,X=<$Z).
-define(I(Y,Z),Y([X|L])?E->Z(L);Y(_)->false).
-define(L(Y,Z),Y([X|L])?E;X>=$0,X=<$9;X=:=$.;X=:=$_->Z(L);Y(_)->false).
?L(e,m).
m([$#|L])->a(L);?L(m,m).
?I(a,i).
i([$.|L])->l(L);?I(i,i).
?I(l,c).
?I(c,g).
g([])->true;?I(g,g).
%%% golf code end %%%
test() ->
true = e("b#w.org"),
false = e("b#c#d.org"),
false = e("test#%.org"),
false = e("j_r#x.c.il"),
true = e("r..t#x.tw"),
false = e("test#org"),
false = e("s%p#m.org"),
true = e("j_r#x.mil"),
false = e("foo#a%.com"),
ok.
Ruby, 225 chars.
This is my first Ruby program, so it's probably not very Ruby-like :-)
def v z;r=!a=b=c=d=e=f=0;z.chars{|x|case x when'#';r||=b<1||!e;e=!1 when'.'
e ?b+=1:(a+=1;f=e);r||=a>1||(c<1&&!e)when'0'..'9';b+=1;r|=!e when'A'..'Z','a'..'z'
e ?b+=1:f ?c+=1:d+=1;else r=1 if x!='_'||!e|!b+=1;end};!r&&d>1 end
'Using no regex':
PHP 47 Chars.
<?=filter_var($argv[1],FILTER_VALIDATE_EMAIL);
Haskell (GHC 6.8.2), 144 characters (previously 165, then 161)
Using pattern matching, elem, span and all:
a=['A'..'Z']++['a'..'z']
e=f.span(`elem`"._0123456789"++a)
f(_:_,'#':d)=g$span(`elem`a)d
f _=False
g(_:_,'.':t@(_:_:_))=all(`elem`a)t
g _=False
The above was tested with the following code:
main :: IO ()
main = print $ and [
e "b#w.org",
e "r..t#x.tw",
e "j_r#x.mil",
not $ e "b#c#d.org",
not $ e "test#%.org",
not $ e "j_r#x.c.il",
not $ e "#w.org",
not $ e "test#org",
not $ e "s%p#m.org",
not $ e "foo#a%.com"
]