Determine country from IP - IPv6 - validation

In my project, I have a PostgreSQL function (PL/pgSQL) that determines the country for a given IP address:
CREATE OR REPLACE FUNCTION get_country_for_ip(character varying)
  RETURNS character varying AS
$BODY$
declare
    ip ALIAS for $1;
    ccode varchar;
    cparts varchar[];
    nparts bigint[];
    addr bigint;
begin
    -- split the dotted-quad address into its four octets
    cparts := string_to_array(ip, '.');
    if array_upper(cparts, 1) <> 4 then
        raise exception 'gcfi01: Invalid IP address: %', ip;
    end if;

    nparts := array[a2i(cparts[1])::bigint, a2i(cparts[2])::bigint,
                    a2i(cparts[3])::bigint, a2i(cparts[4])::bigint];
    if (nparts[1] is null or nparts[1] < 0 or nparts[1] > 255 or
        nparts[2] is null or nparts[2] < 0 or nparts[2] > 255 or
        nparts[3] is null or nparts[3] < 0 or nparts[3] > 255 or
        nparts[4] is null or nparts[4] < 0 or nparts[4] > 255) then
        raise exception 'gcfi02: Invalid IP address: %', ip;
    end if;

    -- pack the four octets into a single 32-bit number
    addr := (nparts[1] << 24) | (nparts[2] << 16) | (nparts[3] << 8) | nparts[4];

    select into ccode t_country_code from ip_to_country
     where addr between n_from and n_to limit 1;
    if ccode is null then
        ccode := '';
    end if;
    return ccode;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
This may not be the most efficient, but it does the job. Note that it uses an internal table (ip_to_country), which contains data as below (the numbers n_from and n_to are the long values of the start and end of each address range):
n_from | n_to | t_country_code
----------+----------+----------------
0 | 16777215 | ZZ
16777216 | 16777471 | AU
...
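For illustration, a quick check against the sample data above (the AU row covers 16777216 to 16777471, i.e. 1.0.0.0 through 1.0.0.255):
-- 1.0.0.10 packs to 1*2^24 + 10 = 16777226, which falls inside the AU range
SELECT get_country_for_ip('1.0.0.10');  -- expected to return 'AU'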
Now we are starting to look at IPv6 addressing as well, and I need to add similar functionality for IPv6 addresses. I have a similar set of data for IPv6, which looks like this:
t_start | t_end | t_country_code
-------------+-----------------------------------------+----------------
:: | ff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | ZZ
100:: | 1ff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | ZZ
...
2000:: | 2000:ffff:ffff:ffff:ffff:ffff:ffff:ffff | ZZ
...
2001:1200:: | 2001:1200:ffff:ffff:ffff:ffff:ffff:ffff | MX
...
Now, given an IP address ::1, how do I (1) check that it's a valid IPv6 address and (2) get the corresponding country mapping?

I believe I found the solution. It involves modifying the data first and then some massaging of the input. Here's what worked.
First, the data needs to be converted so that all addresses are fully expanded, without shortening, and with the colon separators removed. The sample data shown in my question is converted to:
t_start | t_end | t_country_code
----------------------------------+----------------------------------+----------------
00000000000000000000000000000000 | 00ffffffffffffffffffffffffffffff | ZZ
01000000000000000000000000000000 | 01ffffffffffffffffffffffffffffff | ZZ
...
20000000000000000000000000000000 | 2000ffffffffffffffffffffffffffff | ZZ
...
20011200000000000000000000000000 | 20011200ffffffffffffffffffffffff | MX
...
This is what is stored in the database. Because every address is now a fixed-length, lower-case hex string, plain string comparison orders the values the same way as their numeric values, so a simple BETWEEN works for range lookups.
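The table itself is simple; a minimal sketch of the definition (the exact column types are an assumption, any fixed-width text type would do):
CREATE TABLE ipv6_to_country (
    t_start        char(32) NOT NULL,   -- expanded start of range, e.g. 20011200000000000000000000000000
    t_end          char(32) NOT NULL,   -- expanded end of range
    t_country_code varchar(2) NOT NULL  -- country code, e.g. 'MX'
);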
The next step was to convert the IP address received by the code into the same format. This is done in PHP with the following code (assume that $ip_address is the incoming IPv6 address):
$addr_bin = inet_pton($ip_address);  // packed binary representation of the address
$bytes = unpack('n*', $addr_bin);    // eight 16-bit groups, big-endian
$ip_address = implode('', array_map(function ($b) { return sprintf("%04x", $b); }, $bytes));
Now the variable $ip_address will contain the full IPv6 address, for example:
:: => 00000000000000000000000000000000
2001:1200::ab => 200112000000000000000000000000ab
and so on.
Now you can simply compare this full address with the ranges in the database. I added a second function to the database to deal with IPv6 addresses, which looks like this:
CREATE OR REPLACE FUNCTION get_country_for_ipv6(character varying)
  RETURNS character varying AS
$BODY$
declare
    ip ALIAS for $1;
    ccode varchar;
begin
    -- the expanded, fixed-length hex strings compare correctly as plain text
    select into ccode t_country_code from ipv6_to_country
     where ip between t_start and t_end limit 1;
    if ccode is null then
        ccode := '';
    end if;
    return ccode;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Finally, in my PHP code I added logic that calls one or the other Postgres function depending on whether the input ip_address is IPv4 or IPv6; a sketch of that dispatch follows.
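This is only a sketch of that dispatch, assuming a PDO connection in $db and the two functions above (error handling is up to you):
if (filter_var($ip_address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
    $stmt = $db->prepare('SELECT get_country_for_ip(?)');
    $stmt->execute([$ip_address]);
} elseif (filter_var($ip_address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
    // expand the address to the 32-character hex form first (see above)
    $addr_bin = inet_pton($ip_address);
    $bytes = unpack('n*', $addr_bin);
    $full = implode('', array_map(function ($b) { return sprintf("%04x", $b); }, $bytes));
    $stmt = $db->prepare('SELECT get_country_for_ipv6(?)');
    $stmt->execute([$full]);
} else {
    throw new InvalidArgumentException("Not a valid IP address: $ip_address");
}
$country = $stmt->fetchColumn();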

First, I see a couple of things you are doing that will pose problems. The first is the use of varchar and long values to represent IP addresses when PostgreSQL has perfectly valid INET and CIDR types that will do what you want, only better and faster. Note that these do not support GIN indexing properly at present, so you can't do exclusion constraints on them. If you need that, look at the ip4r extension, which does support this.
As a patch for now, note that you can cast your varchar to inet. Inet supports both IPv4 and IPv6 addresses, as does cidr, and similar types exist in ip4r.
This will solve the IPv6 validation issue for you, and will likely cut down on your storage as well as provide better operational checks and better performance.
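For example, a minimal sketch of that approach, assuming the lookup table's range columns were converted to inet (the table and column names here just mirror the ones above):
-- the cast itself is the validation step: an invalid address raises an error
SELECT t_country_code
  FROM ipv6_to_country
 WHERE '::1'::inet BETWEEN t_start AND t_end
 LIMIT 1;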
As for countries, I am also thinking that the mappings may not be so straightforward.

Related

Will changing the ProtoBuffer Varint type from a bool type to an enum type representing all bit-mask values be forward compatible?

I want to make the following Protobuf message forward compatible.
The current Storage message defines a state field as a bool type:
message Storage {
bool state = 1;
}
Protobuf encodes varint-typed fields such as bool and enum in the following format:
|1-bit sequence number|4-bit serial number|3-bit data type|n-bit payload|
For varint types, the data type value will be 000:
|X|XXXX|000|XXXX...|
Since the Storage message contains only one field, with serial number 1, the sequence bit is 0 because the tag fits into a single byte. Hence, the above format becomes:
|0|0001|000|XXXX...|
Now, if we set Storage.state = 0 (false), it will be stored as follows:
|0|0001|000|<0 will not be encoded>
The Protobuf value for the Storage message will become 0x8.
If we set Storage.state = 1 (true), it will be stored as follows:
|0|0001|000|00000001|
The Protobuf value for the Storage message will become 0x8 0x1.
Now, I want to change the above Storage.state definition from the bool type to an enum type as follows:
// BIT7 | BIT6 | BIT5 | BIT4 | BIT3 | BIT2 | BIT1 | BIT0 |
//-------------------------------------------------------
// 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = STATE0 (0)
// 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = STATE1 (1)
// 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = STATE2 (2)
// 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | = STATE3 (3)
// ... and so on
// 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | = STATE254 (254)
// 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = STATE255 (255)
enum State {
STATE0 = 0;
STATE1 = 1;
STATE2 = 2;
STATE3 = 3;
// ... and so on
STATE254 = 254;
STATE255 = 255;
}
message Storage {
State state = 1;
}
So now, in the Protobuf encoding:
If we set Storage.state = State.STATE0, it will be stored as follows:
|0|0001|000|<0 will not be encoded>
The Protobuf value for the Storage message will become 0x8.
If we set Storage.state = State.STATE1, it will be stored as follows:
|0|0001|000|00000001|
The Protobuf value for the Storage message will become 0x8 0x1.
If we set Storage.state = State.STATE2, it will be stored as follows:
|0|0001|000|00000010|
The Protobuf value for the Storage message will become 0x8 0x2.
If we set Storage.state = State.STATE255, the value no longer fits in a single varint byte, so it will be stored as follows:
|0|0001|000|11111111|00000001|
The Protobuf value for the Storage message will become 0x8 0xFF 0x1.
Will this change still be forward compatible for proto2 and proto3 and in C and Java?
I based my question on the reference below:
google protocol buffer -- the coding principle of protobuf II
I'm assuming that what you're actually trying to store here is the bitwise state values, i.e. what might be a [Flags] enum in C# (mentioned purely to set context).
Honestly, declaring an enum with a value per bit combination: isn't a good idea; it will escalate very quickly, and it isn't intuitive to use. It also leaves potential for silly errors when copy/pasting large volumes of lines...
// omitted... 212 lines - but would you spot the error?
STATE213 = 213;
STATE214 = 214;
STATE215 = 214;
STATE216 = 216;
STATE217 = 217;
// ... etc
(OK, that specific error requires the allow-alias flag, but: you get the point)
In proto2, enums are expected to be recognised; when unexpected enum values are encountered, it gets a bit... hazy, with any of:
parse failure
treated as an unknown field (needing to be accessed via a separate API)
silently handled and parsed via the integer value (which has the effect of preserving bit flags)
Since not every flag combination will have an enum definition, what you want here is option 3, but that isn't guaranteed in all implementations.
In proto3, the framework leans as far in the direction of 3 as possible, explicitly in the language specification, with the integer value being stored and retrieved (which has the effect of preserving bit flags), but it is also explicitly called out that some platforms do not allow open enum types, for example Java.
Because of this limitation, and since you mention Java in the tags, I would recommend simply using an integer directly. It will at least work similarly on all implementations. Compared to your proposed solution, it is at least as usable, and usually a lot more usable; consider how it works as an enum:
obj.state = State.State217;
vs as an integer:
obj.state = 217;
This will also allow bitwise combination/test/etc. operations to be used on the values, which isn't the case for closed enum types.
As for whether bool, enum and int32/uint32/sint32 (and the 64-bit counterparts) are technically interchangeable (scale permitting): yes; they're all encoded as varint.

Where is the canonical specification for proto3 that allows JavaScript-like object assignment to an option?

In the Protocol Buffers Version 3 Language Specification
The EBNF syntax for an option is
option = "option" optionName "=" constant ";"
optionName = ( ident | "(" fullIdent ")" ) { "." ident }
constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | strLit | boolLit
ident = letter { letter | decimalDigit | "_" }
fullIdent = ident { "." ident }
strLit = ( "'" { charValue } "'" ) | ( '"' { charValue } '"' )
charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/
hexEscape = '\' ( "x" | "X" ) hexDigit hexDigit
octEscape = '\' octalDigit octalDigit octalDigit
charEscape = '\' ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | '\' | "'" | '"' )
Or in plain English, an option may be assigned a dotted.notation.identifier, an integer, a float, a boolean, or a single- or double-quoted string, which MUST NOT have "raw" newline characters.
And yet, I'm encountering .proto files in various projects such as grpc-gateway and googleapis, where the rhs of the assignment is not quoted and spans multiple lines. For example in googleapis/google/api/http.proto there is this service definition in a comment block:
// service Messaging {
// rpc UpdateMessage(Message) returns (Message) {
// option (google.api.http) = {
// patch: "/v1/messages/{message_id}"
// body: "*"
// };
// }
// }
In other files, the use of semicolons (and occasionally commas) as separators seems somewhat arbitrary, and I have also seen keys repeated, which in JSON or JavaScript would result in loss of data due to overwriting.
Are there any canonical extensions to the language specification, or are people just Microsofting? (Yes, that's a verb now.)
I posted a similar question on the Protocol Buffers Google Group, and received a private message from a fellow at Google stating the following
This syntax is correct and valid for setting fields on a proto option field which is itself a field referencing a message type. This form is based on the TextFormat spec which I'm unclear if its super well documented, but here's an implementation of it: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
When I have time, I will try to unpack what I learn from analyzing TextFormat.
update
I received an answer on the Groups forum
I think for better or worse, "what protoc implements" takes precedence over whatever the spec says. The spec came later and as far as I know we have not put a lot of effort into ensuring that it comprehensively matches the format that protoc expects. I believe the syntax you are looking at is missing from the .proto file format spec but is mentioned here as the "aggregate syntax."
The link above is to a section titled Custom Options in the Language Guide (proto2) page. If you scroll all the way to the end of that section, there is the following snippet that mentions TextFormat:
message FooOptions {
optional int32 opt1 = 1;
optional string opt2 = 2;
}
extend google.protobuf.FieldOptions {
optional FooOptions foo_options = 1234;
}
// usage:
message Bar {
optional int32 a = 1 [(foo_options).opt1 = 123, (foo_options).opt2 = "baz"];
// alternative aggregate syntax (uses TextFormat):
optional int32 b = 2 [(foo_options) = { opt1: 123 opt2: "baz" }];
}

Convert connect address with address family AF_SYSTEM to human readable string

Background
I'm writing a dtrace program which tracks the application's socket file descriptors. The aim is to provide logs which help me spot file descriptor leaks in a very complex OS X application.
Here is my other question with a very helpful answer.
Problem
I want my program to log the address to which a file descriptor has been connected. Among the examples there is code which partially does what I need: soconnect_mac.d; here is the link to GitHub.
soconnect_mac.d works great when applied to Firefox, but it completely fails with my application. A quick investigation showed that soconnect_mac.d can only interpret AF_INET (value 2) family addresses, and some library used by my application is using AF_SYSTEM (value 32) family addresses.
I can't find anything that would help me convert the received address into something human readable.
So far I've got this:
#!/usr/sbin/dtrace -s
inline int af_inet = 2 ; /* AF_INET defined in Kernel/sys/socket.h */
inline int af_inet6 = 30; /* AF_INET6 defined in Kernel/sys/socket.h */
inline int af_system = 32; /* AF_SYSTEM defined in Kernel/sys/socket.h */
… // some stuff
syscall::connect:entry
/pid == $target && isOpened[pid, arg0] == 1/
{
/* assume this is sockaddr_in until we can examine family */
this->s = (struct sockaddr_in *)copyin(arg1, arg2);
this->f = this->s->sin_family;
self->fileDescriptor = arg0;
}
/* this section is copied with pride from "soconnect_mac.d" */
syscall::connect:entry
/this->f == af_inet/
{
/* Convert port to host byte order without ntohs() being available. */
self->port = (this->s->sin_port & 0xFF00) >> 8;
self->port |= (this->s->sin_port & 0xFF) << 8;
/*
* Convert an IPv4 address into a dotted quad decimal string.
* Until the inet_ntoa() functions are available from DTrace, this is
* converted using the existing strjoin() and lltostr(). It's done in
* two parts to avoid exhausting DTrace registers in one line of code.
*/
this->a = (uint8_t *)&this->s->sin_addr;
this->addr1 = strjoin(lltostr(this->a[0] + 0ULL),
strjoin(".",
strjoin(lltostr(this->a[1] + 0ULL),
".")));
this->addr2 = strjoin(lltostr(this->a[2] + 0ULL),
strjoin(".",
lltostr(this->a[3] + 0ULL)));
self->address = strjoin(this->addr1, this->addr2);
}
/* this section is mine */
syscall::connect:entry
/this->f == af_system/
{
/* TODO: Problem how to handle AF_SYSTEM address family */
/* Convert port to host byte order without ntohs() being available. */
self->port = (this->s->sin_port & 0xFF00) >> 8;
self->port |= (this->s->sin_port & 0xFF) << 8; // this also doesn't work as it should
self->address = "system family address needed here";
}
// a fallback
syscall::connect:entry
/this->f && this->f != af_inet && this->f != af_system/
{
/* Convert port to host byte order without ntohs() being available. */
self->port = (this->s->sin_port & 0xFF00) >> 8;
self->port |= (this->s->sin_port & 0xFF) << 8;
self->address = strjoin("Can't handle family: ", lltostr(this->f));
}
syscall::connect:return
/self->fileDescriptor/
{
this->errstr = err[errno] != NULL ? err[errno] : lltostr(errno);
printf("%Y.%03d FD:%d Status:%s Address:%s Port:%d",
walltimestamp, walltimestamp % 1000000000 / 1000000,
self->fileDescriptor, this->errstr, self->address, self->port);
self->fileDescriptor = 0;
self->address = 0;
self->port = 0;
}
What is even more annoying, my code fails to read the port number (I get the value 512 instead of one of these: 443, 8443, 5061).
IMO the problem is the first syscall::connect:entry clause, where it is assumed that the second argument can be treated as a struct sockaddr_in. I'm guessing struct sockaddr_storage should be used in the case of the AF_SYSTEM address family, but I haven't found any documentation or source code that proves this directly.
My section with the this->f == af_system condition properly catches events from the application I'm investigating.

Aerospike Query Return Highest Value

I'm trying to create a query for my Aerospike database that would return the highest value in a specific bin, similar to the way the MAX() function works in MySQL. For example, if I had a set like this:
+--------------+---------+
| filename | version |
+--------------+---------+
| alphabet.doc | 4 |
| people.doc | 2 |
| alphabet.doc | 6 |
| people.doc | 3 |
+--------------+---------+
What I need is to return only the filename with the highest version number. At the moment I can add a filter like this:
stmt := db.NewStatement(DBns, DBset, "filename", "version")
stmt.Addfilter(db.NewEqualFilter("filename", "alphabet.doc"))
// run database query
records := runQuery(stmt)
Anyone know how to do this?
You can apply a Lua user-defined function (UDF) to the query to filter the results efficiently.
E.g. here is a Stream UDF that would return the record with the max. version number:
function maxVersion(stream, bin)
-- The stream function cannot return record objects directly,
-- so we have to map to a Map data type first.
local function toArray(rec)
local result = map()
result['filename'] = rec['filename']
result['version'] = rec['version']
return result
end
local function findMax(a, b)
if a.version > b.version then
return a
else
return b
end
end
return stream : map(toArray) : reduce(findMax)
end
Using the Go client you would execute the function like this:
stmt := NewStatement(ns, set)
recordset, _ := client.QueryAggregate(nil, stmt, "udfFilter", "maxVersion")
for rec := range recordset.Results() {
res := rec.Record.Bins["SUCCESS"].(map[interface{}]interface{})
fmt.Printf("filename with max. version: %s (ver. %d)\n", res["filename"], res["version"])
}
I've uploaded a fully working example as a Gist here: https://gist.github.com/jhecking/b98783bea7564d610ea291b5ac47808c
You can find more information about how to work with Stream UDFs for query aggregation here: http://www.aerospike.com/docs/guide/aggregation.html

Loop through records in a datatable in a formula field in Crystal Reports

I have a datatable attached to my Crystal Report with the following structure:
TypeId
TypeName
I want to display TypeName in the GroupHeaderSection based on a condition.
For example
if TypeId = 1 then display hans
if TypeId = 2 then display MNHS
I tried the following formula to display records from this datatable
WhilePrintingRecords;
Local NumberVar result := -1;
Local NumberVar i := 1;
Local StringVar inString := "";
While i <= 5 And result = -1 Do
(
// inString := IIF({DTPMS_RptLocationTr.LocationTypeId} = 1,{DTPMS_RptLocationTr.LocationTypeName},"")
If {DTPMS_RptLocationTr.LocationTypeId} = 5 Then
inString := {DTPMS_RptLocationTr.LocationTypeName};
i := i + 1;
);
inString
Any suggestions on how to solve this?
I found out how to solve my issue.
First, I changed the way records are returned from the database. I returned the data like this:
Type1 | Type2 | Type3
======================
hans | MNHS | nhues
So now I can bind the data directly from the datatable to the report header.
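For reference, one way to produce rows in that shape is a pivot query along these lines (a sketch only; the actual query isn't shown above, and the source table name is an assumption based on the columns described in the question):
SELECT
    MAX(CASE WHEN TypeId = 1 THEN TypeName END) AS Type1,
    MAX(CASE WHEN TypeId = 2 THEN TypeName END) AS Type2,
    MAX(CASE WHEN TypeId = 3 THEN TypeName END) AS Type3
FROM LocationTypes;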
