typedef with range clamping - go

In Go we can say:
type Month int
to create a new type based off of int.
Is it possible to also say that the range of values allowed for this type is 1 - 12, and to guarantee that no value < 1 or > 12 can be assigned?

No, you cannot put limits on an int, whether you define it as a custom type or not. The closest you can get is something like the following code, which uses a construct called iota:
package main
import "fmt"
type Month int
const (
Jan Month = iota + 1
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
)
func main() {
fmt.Println(Jan, Feb, Mar)
}
This will print 1 2 3. iota has many more uses; you can find more information here: https://splice.com/blog/iota-elegant-constants-golang/
This does not guarantee that you cannot assign random values to the resulting variable, but as long as you use the defined constants everywhere you should be fine.

You can limit access to a single package by using an unexported variable to store the value. For example,
package date
type Month struct {
month int
}
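One way to complete this idea is sketched below. The constructor NewMonth, the accessor Int and the error ErrMonthRange are hypothetical names chosen for illustration, and everything is placed in package main only so the sketch runs on its own; in practice the type would live in its own package (such as date above) so that other packages cannot touch the unexported field.

```go
package main

import (
	"errors"
	"fmt"
)

// Month wraps an unexported int, so code outside the defining package
// cannot assign the value directly.
type Month struct {
	month int
}

// ErrMonthRange is a hypothetical sentinel error for this sketch.
var ErrMonthRange = errors.New("month must be between 1 and 12")

// NewMonth is a hypothetical constructor that rejects out-of-range values.
func NewMonth(m int) (Month, error) {
	if m < 1 || m > 12 {
		return Month{}, ErrMonthRange
	}
	return Month{month: m}, nil
}

// Int returns the wrapped value.
func (m Month) Int() int { return m.month }

func main() {
	if m, err := NewMonth(7); err == nil {
		fmt.Println(m.Int())
	}
	if _, err := NewMonth(13); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Note that callers can still create the zero value Month{}, whose field is 0, so code that receives a Month may still want to check for it.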

Related

Understanding golang date formatting for time package

So I have the function performing well.
func Today()(result string){
current_time := time.Now().Local()
result = current_time.Format("01/02/2006")
return
}
This prints MM/DD/YYYY. I thought it would be more readable if I had a value greater than 12 in the day position, to make it clear that the format was MM/DD/YYYY, so I changed it to the following:
func Today()(result string){
current_time := time.Now().Local()
result = current_time.Format("01/23/2004")
return
}
Which to my chagrin caused bad results. Prints MM/DDHH/DD0MM
Realizing my mistake I see that the format is defined by the reference time...
Mon Jan 2 15:04:05 -0700 MST 2006
I'm wondering if there are any other instances of this moment being used as a formatting reference for date times, and if this reference moment has a nickname (like null island)?
The values in a date string are not arbitrary. You can't just change 02 to 03 and expect it to work. The date formatter looks for those specific values, and knows that 1 means month, 2 means day of month, etc.
Changing 01/02/2006 to 01/23/2004 is like changing a human-readable form that says First Name: ______ Last Name: ______ to one that says First Name: ______ Ice Cream: ______. You can't expect anyone to know that Ice Cream should mean Last Name.
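To see how the formatter reads each digit, here is a small sketch that formats one fixed instant with both layouts. The helper formatFixed and the chosen date are arbitrary assumptions for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// formatFixed formats one fixed, arbitrarily chosen instant
// (7 December 2015, 21:45 UTC) with the given layout.
func formatFixed(layout string) string {
	t := time.Date(2015, time.December, 7, 21, 45, 0, 0, time.UTC)
	return t.Format(layout)
}

func main() {
	// "01" = zero-padded month, "02" = zero-padded day, "2006" = year.
	fmt.Println(formatFixed("01/02/2006")) // 12/07/2015

	// "2" = day, "3" = 12-hour clock hour, then "2" = day again,
	// a literal "0", and "04" = zero-padded minute.
	fmt.Println(formatFixed("01/23/2004")) // 12/79/7045
}
```

The second layout contains no year element at all, which is why the result looks so strange.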
The name
The only name provided for this is "reference time", here:
Parse parses a formatted string and returns the time value it represents. The layout defines the format by showing how the reference time, defined to be
Mon Jan 2 15:04:05 -0700 MST 2006
and here:
These are predefined layouts for use in Time.Format and Time.Parse. The reference time used in the layouts is the specific time:
Mon Jan 2 15:04:05 MST 2006
which is Unix time 1136239445. Since MST is GMT-0700, the reference time can be thought of as
01/02 03:04:05PM '06 -0700
To define your own format, write down what the reference time would look like formatted your way; see the values of constants like ANSIC, StampMicro or Kitchen for examples. The model is to demonstrate what the reference time looks like so that the Format and Parse methods can apply the same transformation to a general time value.
To specify that you're talking about Go's reference time, I'd say "Go's reference time." Or to be blatantly obvious, "Go's time.Parse reference time."
As an aside, your function can be greatly shortened:
func Today() string {
return time.Now().Local().Format("01/02/2006")
}

Generate a custom date range

How can I generate a date range, reject some days (for example Sundays or some holidays) and extend the range with the next available day? Obviously I can do something like (Date.today..Date.today+5.days).reject{|day| day.sunday?} but this would remove Sunday and make my range smaller. How can I solve this? Should I implement a custom Range class?
That is impossible in general. A range has to be contiguous, so unless the date you want to reject is at either end of the original range, you cannot remove it and still have a range.
However, by converting the range to an array, you can do a similar thing:
(Date.today..Date.today+5.days).to_a.reject(&:sunday?)
This cannot be done with a Range, as @sawa already pointed out.
I think you need to use an array filled with qualified days:
def working_days(number)
[].tap do |days|
date = Date.today
while days.size < number
days << date unless date.sunday? || date.saturday?
date = date.next
end
end
end
working_days(5)
#=> [02 Dec 2015, 03 Dec 2015, 04 Dec 2015, 07 Dec 2015, 08 Dec 2015]

is there a way to iterate over constant used as enum

I am trying to use an enum in golang as below. I am struggling to find an easy way to iterate over the list of constant values. What is the common practice in golang for iterating over constant values used as an enum? Thanks!
type DayOfWeek int
const(
Monday DayOfWeek = iota
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
)
In Java, we can iterate as below.
public enum DayOfWeek {
MONDAY,
TUESDAY,
WEDNESDAY,
THURSDAY,
FRIDAY,
SATURDAY,
SUNDAY
}
for (DayOfWeek day: DayOfWeek.values()) {
// code logic
}
There is no direct way to enumerate the values/instances of named type at runtime, whether variables or constants, unless you specifically define a slice that lists them. This is left up to the definer or the user of the enumeration type.
package main
import (
"fmt"
"time"
)
var Weekdays = []time.Weekday{
time.Sunday,
time.Monday,
time.Tuesday,
time.Wednesday,
time.Thursday,
time.Friday,
time.Saturday,
}
func main() {
for _, day := range Weekdays {
fmt.Println(day)
}
}
In order to be able to generate this list dynamically at runtime, e.g. via reflection, the linker would have to retain all the symbols defined in all packages, like Java does. The golang-nuts group discussed this, regarding names and functions exported from a package, a superset of package constant definitions. https://groups.google.com/forum/#!topic/golang-nuts/M0ORoEU115o
It would be possible for the language to include syntactic sugar for generating this list at compile time if and only if it were referenced by the program. What should the iteration order be, though? If your week starts on Monday the list I defined is not very helpful; you will have to define your own slice to range through the days from Monday to Sunday.
You can do that without reflection.
First run the Go tool stringer via go generate before compiling. This creates a file [filename]_string.go that contains a map _[structname]_map from enum values to their names as strings. The map is unexported, so simply assign it to an exported map during package initialization.
var EnumMap map[Enum]string
func init() {
EnumMap = _Enum_map
}
type Enum uint
//go:generate go run golang.org/x/tools/cmd/stringer -type=Enum
const (
One Enum = iota
Two
)
Then you can simply loop over the keys of the map.
The comment from @davec was great. This works perfectly when you have a count that increments by one:
you can use a simple loop such as for d := Monday; d <= Sunday; d++ {}
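Written out in full, the simple loop might look like the sketch below. It reuses the DayOfWeek constants from the question; the helper allDays is a hypothetical name, and the approach only works because the values are contiguous and Monday and Sunday are the first and last constants.

```go
package main

import "fmt"

type DayOfWeek int

const (
	Monday DayOfWeek = iota
	Tuesday
	Wednesday
	Thursday
	Friday
	Saturday
	Sunday
)

// allDays collects every value from Monday to Sunday.
// It relies on the constants being contiguous (0 through 6).
func allDays() []DayOfWeek {
	var days []DayOfWeek
	for d := Monday; d <= Sunday; d++ {
		days = append(days, d)
	}
	return days
}

func main() {
	for _, d := range allDays() {
		fmt.Println(int(d))
	}
}
```

If a new constant is later added after Sunday, the loop bound has to be updated by hand.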
I had constants that jumped in bits (1, 2, 4, 8, 16, etc.):
const (
Approved = 1 << iota
AlreadyApproved
NotApproved
OldTicket
Unknown
)
I avoided range and did a left shift one to move through my constant:
var score Bits
score = Set(score, AlreadyApproved)
for i := Approved; i <= Unknown; i = i << 1 { // <= so that Unknown (16) is included
fmt.Println(i)
}
Output:
1
2
4
8
16
Using stringer is preferable, because code generators help you keep your codebase up to date. Unfortunately, stringer does not always generate the map.
For anyone interested in continuing to use Go generators for this purpose, I wrote a small code generator called enumall. It produces a file for each provided type, with a variable holding all values of the given type.
Use it by adding code generator comment to your code like this:
//go:generate go run github.com/tomaspavlic/enumall@latest -type=Season
type Season uint8
const (
Spring Season = 1 << iota
Summer
Autumn
Winter
)
You can find more information here: https://github.com/tomaspavlic/enumall

Compact data structure for storing parsed log lines in Go (i.e. compact data structure for multiple enums in Go)

I'm working on a script that parses and graphs information from a database logfile. Some example loglines might be:
Tue Dec 2 03:21:09.543 [rsHealthPoll] DBClientCursor::init call() failed
Tue Dec 2 03:21:09.543 [rsHealthPoll] replset info example.com:27017 heartbeat failed, retrying
Thu Nov 20 00:05:13.189 [conn1264369] insert foobar.fs.chunks ninserted:1 keyUpdates:0 locks(micros) w:110298 110ms
Thu Nov 20 00:06:19.136 [conn1263135] update foobar.fs.chunks query: { files_id: ObjectId('54661657b23a225c1e4b00ac'), n: 0 } update: { $set: { data: BinData } } nscanned:1 nupdated:1 keyUpdates:0 locks(micros) w:675 137ms
Thu Nov 20 00:06:19.136 [conn1258266] update foobar.fs.chunks query: { files_id: ObjectId('54661657ae3a22741e0132df'), n: 0 } update: { $set: { data: BinData } } nscanned:1 nupdated:1 keyUpdates:0 locks(micros) w:687 186ms
Thu Nov 20 00:12:14.859 [conn1113639] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1416453003000|74 } } cursorid:7965836327322142721 ntoreturn:0 keyUpdates:0 numYields: 15 locks(micros) r:351042 nreturned:3311 reslen:56307 188ms
Not every logline contains all fields, but some of the fields we parse out include:
Datetime
Query Duration
Name of Thread
Connection Number (e.g. 1234, 532434, 53433)
Logging Level (e.g. Warning, Error, Info, Debug etc.)
Logging Component (e.g. Storage, Journal, Commands, Indexin etc.)
Type of operation (e.g. Query, Insert, Delete etc.)
Namespace
The total logfile can often be fairly large (several hundred MBs up to a couple of GBs). Currently the script is in Python, and as well as the fields, it also stores the original raw logline and a tokenised version. The resulting memory consumption is several multiples of the original logfile size, so memory consumption is one of the main things I'd like to improve.
For fun/learning, I thought I might try re-doing this in Go, and looking at whether we could use a more compact data structure.
Many of the fields are enumerations (enums) - for some of them the set of values is known in advance (e.g. logging level, logging component). For others (e.g. name of thread, connection number, namespace), we'll work out the set at runtime as we parse the logfile.
Planned Changes
Firstly, many of these enums are stored as strings. So I'm guessing one improvement will be to move to using something like a uint8 to store them, and then either using consts (for the ones we know in advance), or having some kind of mapping table back to the original string (for the ones we work out). Or are there any other reasons I'd prefer consts versus some kind of mapping structure?
Secondly, rather than storing the original logline as a string, we can probably store an offset back to the original file on disk.
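For the values that are only discovered at runtime, the mapping-table idea from the first planned change could be sketched roughly as follows. The type internTable and its methods are hypothetical names, and uint32 is an assumed ID width (a uint8 would overflow for connection numbers).

```go
package main

import "fmt"

// internTable is a hypothetical mapping table: it assigns a small integer
// ID to each distinct string and can map the ID back to the string.
type internTable struct {
	ids   map[string]uint32
	names []string
}

func newInternTable() *internTable {
	return &internTable{ids: make(map[string]uint32)}
}

// ID returns the existing ID for s, or assigns the next free one.
func (t *internTable) ID(s string) uint32 {
	if id, ok := t.ids[s]; ok {
		return id
	}
	id := uint32(len(t.names))
	t.ids[s] = id
	t.names = append(t.names, s)
	return id
}

// Name maps an ID back to the original string.
func (t *internTable) Name(id uint32) string { return t.names[id] }

func main() {
	threads := newInternTable()
	a := threads.ID("rsHealthPoll")
	b := threads.ID("conn1264369")
	c := threads.ID("rsHealthPoll") // seen before, so it reuses the first ID
	fmt.Println(a, b, c, threads.Name(b)) // 0 1 0 conn1264369
}
```

Each parsed logline then stores only the integer ID, and every distinct string is kept in memory once.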
Questions
Do you see any issues with either of the two planned changes above? Are these a good starting point?
Do you have any other tips/suggestions for optimising the memory consumption of how we store the loglines?
I know for bitmaps, there's things like Roaring Bitmaps (http://roaringbitmap.org/), which are compressed bitmaps which you can still access/modify normally whilst compressed. Apparently the overall term for things like this is succinct data structures.
However, are there any equivalents to roaring bitmaps but for enumerations? Or any other clever way of storing this compactly?
I also thought of bloom filters, and maybe using those to store whether each logline was in a set (i.e. logging level warning, logging level error) - however, it can only be in one of those sets, so I don't know if that makes sense. Also, not sure how to handle the false positives.
Thoughts?
Do you see any issues with either of the two planned changes above? Are these a good starting point?
No problems with either. If the logs are definitely line-delimited you can just store the line number, but it may be more robust to store the byte offset. The Read method of the standard io.Reader interface returns the number of bytes read, so you can use that to track the offset.
Do you have any other tips/suggestions for optimising the memory consumption of how we store the loglines?
It depends on what you want to use them for, but once they've been tokenized (and you've got the data you want from the line), why hold onto the line in memory? It's already in the file, and you've now got an offset to look it up again quickly.
are there any equivalents to roaring bitmaps but for enumerations? Or any other clever way of storing this compactly?
I'd tend to just define each enum type as an int, and use iota. Something like:
package main
import (
"fmt"
"time"
)
type LogLevel int
type LogComponent int
type Operation int
const (
Info LogLevel = iota
Warning
Debug
Error
)
const (
Storage LogComponent = iota
Journal
Commands
Indexin
)
const (
Query Operation = iota
Insert
Delete
)
type LogLine struct {
DateTime time.Time
QueryDuration time.Duration
ThreadName string
ConNum uint
Level LogLevel
Comp LogComponent
Op Operation
Namespace string
}
func main() {
l := &LogLine{
time.Now(),
10 * time.Second,
"query1",
1000,
Info,
Journal,
Delete,
"ns1",
}
fmt.Printf("%v\n", l)
}
Produces &{2009-11-10 23:00:00 +0000 UTC 10s query1 1000 0 1 2 ns1}.
Playground
You could pack some of the struct fields, but then you need to define bit-ranges for each field and you lose some open-endedness. For example define LogLevel as the first 2 bits, Component as the next 2 bits etc.
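A minimal sketch of such packing, assuming two bits per field (level in bits 0-1, component in bits 2-3, operation in bits 4-5). The helpers pack and unpack and the shift constants are hypothetical names, and the enum types are declared as uint8 here rather than int.

```go
package main

import "fmt"

type LogLevel uint8
type LogComponent uint8
type Operation uint8

const (
	Info LogLevel = iota
	Warning
	Debug
	Error
)

const (
	Storage LogComponent = iota
	Journal
	Commands
	Indexin
)

const (
	Query Operation = iota
	Insert
	Delete
)

// Assumed layout: two bits per field, so at most four values each.
const (
	levelShift = 0
	compShift  = 2
	opShift    = 4
	fieldMask  = 0x3
)

// pack combines the three enums into a single byte.
func pack(l LogLevel, c LogComponent, o Operation) uint8 {
	return uint8(l)<<levelShift | uint8(c)<<compShift | uint8(o)<<opShift
}

// unpack extracts the three enums from a packed byte.
func unpack(p uint8) (LogLevel, LogComponent, Operation) {
	return LogLevel(p >> levelShift & fieldMask),
		LogComponent(p >> compShift & fieldMask),
		Operation(p >> opShift & fieldMask)
}

func main() {
	p := pack(Debug, Journal, Delete)
	l, c, o := unpack(p)
	fmt.Println(p, l, c, o) // 38 2 1 2
}
```

The loss of open-endedness is visible here: adding a fifth log level would no longer fit in two bits and would force a new layout.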
I also thought of bloom filters, and maybe using those to store whether each logline was in a set (i.e. logging level warning, logging level error) - however, it can only be in one of those sets, so I don't know if that makes sense. Also, not sure how to handle the false positives.
For your current example, bloom filters may be overkill. It may be easier to have a []int for each enum, or some other master "index" that keeps track of line-number to (for example) log level relationships. As you said, each log line can only be in one set. In fact, depending on the number of enum fields, it may be easier to use the packed enums as an identifier for something like a map[int][]int.
Set := make(map[int][]int)
Set[int(Delete) << 4 + int(Journal) << 2 + int(Debug)] = []int{7, 45, 900} // Line numbers in this set.
See here for a complete, although hackish example.

GetDateFormat() fails on dates before 1/1/1601

I am trying to format a date using the Windows GetDateFormat API function:
nResult = GetDateFormat(
localeId, //0x409 for en-US, or LOCALE_USER_DEFAULT if you're not testing
0, //flags
dt, //a SYSTEMTIME structure
"M/d/yyyy", //the format we require
null, //the output buffer to contain string (null for now while we get the length)
0); //the length of the output buffer (zero while we get the length)
Now we pass it a date/time:
SYSTEMTIME dt;
dt.wYear = 1600;
dt.wMonth = 12;
dt.wDay = 31;
In this case nResult returns zero:
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:
ERROR_INSUFFICIENT_BUFFER. A supplied buffer size was not large enough, or it was incorrectly set to NULL.
ERROR_INVALID_FLAGS. The values supplied for flags were not valid.
ERROR_INVALID_PARAMETER. Any of the parameter values was invalid.
If, however, I pass a date one day later:
SYSTEMTIME dt;
dt.wYear = 1601;
dt.wMonth = 1;
dt.wDay = 1;
Then it works.
What am I doing wrong? How do I format dates?
e.g. the date of the birth of Christ:
12/25/0000
or the date when the universe started:
-10/22/4004 6:00 PM
or the date Caesar died:
-3/15/44
Bonus Reading
Sorting It All Out: GetDateFormat is Gregorian based
GetDateFormatEx function
This is actually a limitation on SystemTime.
...year/month/day/hour/minute/second/milliseconds value since 1 January 1601 00:00:00 UT... to 31 December 30827 23:59:59.999
I spent some time looking up how to get around this limitation, but since GetDateFormat() takes a SystemTime you'll probably have to bite the bullet and write your own format() method.
The SYSTEMTIME struct is valid only for years 1601 through 30827, because Windows counts system time as intervals elapsed since 1 January 1601 00:00. See the Wikipedia article.
