How to check if a string has more than n repetitive patterns in Go? - go

I want to check if a string contains repetitive patterns above a threshold.
For example, these two strings both exceed a threshold of 2:
"xyzxyzxyz" // contains "xyz" 3 times in succession
"abxyxyxyns" // contains "xy" 3 times in succession
Does anyone know how this is possible?

Use the "repetitions" modifier.
re := regexp.MustCompile(`(xy){3,}`) // match "xy" 3 or more times
fmt.Println(re.MatchString("abxyxyns")) // false
fmt.Println(re.MatchString("abxyxyxyns")) // true
The available options for the regexp package's RE2 implementation are documented here:
https://github.com/google/re2/wiki/Syntax
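The regexp above works when the repeated unit ("xy") is known in advance. RE2 does not support backreferences, so a pattern like `(.+)\1{2,}` is not available in Go. If the unit is not known, one option is a plain loop. The following is only a minimal sketch: `hasRepeats` is a hypothetical helper, it compares bytes rather than runes (fine for ASCII input), and it is not tuned for very long strings.

```go
package main

import "fmt"

// hasRepeats reports whether s contains some non-empty substring that
// occurs more than threshold times back to back.
// Hypothetical helper for illustration; byte-based, so ASCII is assumed.
func hasRepeats(s string, threshold int) bool {
	need := threshold + 1 // number of consecutive copies required
	if need < 1 {
		need = 1
	}
	// Try every unit length that could fit need times into s.
	for unit := 1; unit*need <= len(s); unit++ {
		span := unit * need
		for i := 0; i+span <= len(s); i++ {
			// A window of length span has period unit exactly when it
			// equals itself shifted by unit bytes.
			if s[i:i+span-unit] == s[i+unit:i+span] {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(hasRepeats("xyzxyzxyz", 2))  // true
	fmt.Println(hasRepeats("abxyxyxyns", 2)) // true
	fmt.Println(hasRepeats("abxyxyns", 2))   // false
}
```

The shifted comparison avoids an inner loop over each copy of the unit; the overall cost is still roughly cubic in the worst case, which should be acceptable for short strings.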

Related

Efficient log parsing in golang

What would be an efficient way (in terms of both performance and readability) of parsing lines in a log file and extracting points of interest?
For example:
*** Time: 2/1/2019 13:51:00
17.965 Pump 10 hose FF price level 1 limit 0.0000 authorise pending (Type 00)
17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
38.791 Pump 10 delivery complete, Hose 1, price 72.9500, level 1, value 100.0000, volume 1.3700, v-total 8650924.3700, m-total 21885705.8800, T13:51:38
Things I need to extract are 10 (for pump 10), the price level, the limit, the _PSTATE changes, the values from the delivery complete line, etc.
Currently I'm using a regular expression to capture each one and using capture groups. But it feels inefficient and there is quite a bit of duplication.
For example, I have a bunch of these:
reStateChange := regexp.MustCompile(`^(?P<offset>.*) Pump (?P<pump>\d{2}) State change (?P<oldstate>\w+_PSTATE) to (?P<newstate>\w+)_PSTATE`)
Then inside a while loop
if match := reStateChange.FindStringSubmatch(text); len(match) > 0 {
matched = true
for i, name := range match {
result[reStateChange.SubexpNames()[i]] = name
}
} else if match := otherReMatch.FindStringSubmatch(text); len(match) > 0 {
matched = true
for i, name := range match {
result[reStateChange.SubexpNames()[i]] = name
}
} else if strings.Contains(text, "*** Time:") {
}
It feels that there could be a much better way to do this. I would trade some performance for readability. The log files are only really 10MB max. Often smaller.
I'm after some suggestions on how to make this better in golang.
If all your log lines are similar to that sample you posted, they seem quite structured so regular expressions might be a bit overkill and hard to generalize.
Another option would be to transform each of those lines into a slice of strings ([]string) by using strings.Fields, or even strings.FieldsFunc so that you can strip both white space and commas.
Then you can design an interface like:
type LogLineProcessor interface {
CanParse(line []string) bool
GetResultFrom(line []string) LogLineResult
}
Where LogLineResult is a struct containing the extracted information.
You can then define multiple structs with methods that implement LogLineProcessor (each implementation would look at specific positions on that []string to realize if it is a line it can process or not, like looking for the words "hose", "FF" and "price" in the positions it expects to find them).
The GetResultFrom implementations would also extract each data point from specific positions in the []string (it can rely on that information being there if it already determined it was one of the lines it can process).
You can create a var processors []LogLineProcessor, put all your processors in there and then just iterate over that slice:
line := strings.Fields(text)
for _, processor := range processors {
if processor.CanParse(line) {
result := processor.GetResultFrom(line)
// do whatever needed with the result
}
}
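To make the idea concrete, here is a minimal sketch of one processor for the "State change" line from the question. The field layout of `LogLineResult`, the `stateChangeProcessor` type, and the `splitLine` and `parseLine` helpers are assumptions made for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// LogLineResult holds the extracted data points (hypothetical shape).
type LogLineResult struct {
	Pump     string
	OldState string
	NewState string
}

// LogLineProcessor is the interface described in the answer.
type LogLineProcessor interface {
	CanParse(line []string) bool
	GetResultFrom(line []string) LogLineResult
}

// stateChangeProcessor handles lines such as
// "17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]".
type stateChangeProcessor struct{}

// CanParse checks that the expected keywords sit at the expected positions.
func (stateChangeProcessor) CanParse(f []string) bool {
	return len(f) >= 8 && f[1] == "Pump" && f[3] == "State" &&
		f[4] == "change" && f[6] == "to"
}

// GetResultFrom relies on CanParse having accepted the line already.
func (stateChangeProcessor) GetResultFrom(f []string) LogLineResult {
	return LogLineResult{Pump: f[2], OldState: f[5], NewState: f[7]}
}

// splitLine splits on white space and commas using strings.FieldsFunc.
func splitLine(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return r == ',' || unicode.IsSpace(r)
	})
}

// parseLine returns the result of the first processor that accepts the line.
func parseLine(processors []LogLineProcessor, text string) (LogLineResult, bool) {
	line := splitLine(text)
	for _, p := range processors {
		if p.CanParse(line) {
			return p.GetResultFrom(line), true
		}
	}
	return LogLineResult{}, false
}

func main() {
	processors := []LogLineProcessor{stateChangeProcessor{}}
	text := "17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]"
	if res, ok := parseLine(processors, text); ok {
		fmt.Printf("%+v\n", res)
	}
}
```

Each additional line type would get its own small struct, so the matching rules stay next to the extraction code instead of being spread across one long if/else chain.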

Enter string repeatedly Ruby Cucumber

I have a test that includes character lengths within fields etc.
I was wondering if I could have a set string of 10 characters like str = 'abcdefghij'
then have it multiply that string by the amount of times needed to fulfil the character length and fill in the field.
I've tried the times method but that just enters the same value over x iterations.
What I want is to take str, increase it ten fold and enter that value as 1 continuous string so abcdefghij becomes abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij etc
I'd parameterize the number of times to increase it depending on the field I'm testing. I want to do this so that I don't have huge amounts of variables stored to satisfy each test.
Can this be done? I hope I've explained clearly.
String#* would do:
'abc' * 10
#⇒ "abcabcabcabcabcabcabcabcabcabc"
To use a floating point parameter:
λ = ->(input, count) do
i, f = *count.divmod(1)
input * i << input[0...(f * input.size).to_i]
end
λ.('abcd', 2.5)
#⇒ 'abcdabcdab'

Hashing a long integer ID into a smaller string

Here is the problem: I need to transform an ID (defined as a long integer) into a smaller alphanumeric identifier. The details are the following:
Each individual in the problem has a unique ID, a long integer of size 13 (something like 123123412341234).
I need to generate a smaller representation of this unique ID, an alphanumeric string, something like A1CB3X. The problem is that a length of 5 or 6 characters will not be enough to represent such a large integer.
The new ID (eg A1CB3X) should be valid in a context where we know that only a small number of individuals are present (less than 500). The new ID should be unique within that small set of individuals.
The new ID (eg A1CB3X) should be the result of a calculation made over the original ID. This means that taking the original ID elsewhere and applying the same calculation, we should get the same new ID (eg A1CB3X).
This calculation should occur when the individual is added to the set, meaning that not all individuals belonging to that set will be known at that time.
Any directions on how to solve such a problem?
Assuming that you don't need a formula that goes in both directions (which is impossible if you are reducing a 13-digit number to a 5 or 6-character alphanum string):
If you can have up to 6 alphanumeric characters, that gives you 36^6 = 2,176,782,336 possibilities, assuming only numbers and uppercase letters.
To map your larger 13-digit number onto this space, you can take it modulo some prime number slightly smaller than that, for example 2,176,782,317, then encode it with base-36 encoding.
alphanum_id = base36encode(longnumber_id % 2176782317)
For a set of 500, this gives you a
1 - (2176782317 P 500) / 2176782317^500 chance of at least one collision, roughly 0.006%
(P is permutation; the quotient itself is the chance of no collision)
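As a rough illustration of the modulo-plus-base-36 idea, the sketch below uses Go's strconv.FormatInt for the encoding. The names `shortID` and `collisionProbability` are hypothetical, the modulus is the value quoted in the answer, and the probability estimate assumes the reduced IDs are spread uniformly, which real IDs may not be.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modulus is the value suggested in the answer, just below 36^6.
const modulus = 2176782317

// shortID maps a non-negative ID to at most 6 base-36 characters.
// Hypothetical helper; the mapping is one-way.
func shortID(id int64) string {
	return strings.ToUpper(strconv.FormatInt(id%modulus, 36))
}

// collisionProbability returns the chance that k values drawn uniformly
// from n slots contain at least one duplicate:
// 1 - (1 - 0/n)(1 - 1/n)...(1 - (k-1)/n).
func collisionProbability(n float64, k int) float64 {
	noCollision := 1.0
	for i := 0; i < k; i++ {
		noCollision *= 1 - float64(i)/n
	}
	return 1 - noCollision
}

func main() {
	fmt.Println(shortID(123123412341234))
	fmt.Println(collisionProbability(modulus, 500))
}
```

Because the same calculation runs anywhere, the short ID is reproducible, but uniqueness within the set is only probable, not guaranteed, so a collision check on insertion may still be worth having.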
Best option is to change the base to 62 using case sensitive characters
If you want it to be shorter, you can add unicode characters. See below.
Here is javascript code for you: https://jsfiddle.net/vewmdt85/1/
function compress(n) {
var symbols = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïð'.split('');
var d = n;
var compressed = '';
while (d >= 1) {
compressed = symbols[(d - (symbols.length * Math.floor(d / symbols.length)))] + compressed;
d = Math.floor(d / symbols.length);
}
return compressed;
}
$('input').keyup(function() {
$('span').html(compress($(this).val()))
})
$('span').html(compress($('input').val()))
How about using some base-X conversion, for example 123123412341234 becomes 17N644R7CI in base-36 and 9999999999999 becomes 3JLXPT2PR?
If you need a mapping that works both directions, you can simply go for a larger base.
Meaning: using base 16, you can reduce any value from 0 to 15 to a single character.
So, base 36 is the "maximum" that allows for shorter strings using only digits and single-case letters (when a 1-to-1 mapping is required)!
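A reversible base-36 mapping can be sketched with the standard strconv package. The helper names `encode36` and `decode36` are made up for this example; strconv.FormatInt emits lowercase letters, so the output is upper-cased to match the style used above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encode36 converts an ID to upper-case base 36 (hypothetical helper).
func encode36(id int64) string {
	return strings.ToUpper(strconv.FormatInt(id, 36))
}

// decode36 reverses encode36; ParseInt accepts both letter cases.
func decode36(s string) (int64, error) {
	return strconv.ParseInt(s, 36, 64)
}

func main() {
	s := encode36(123123412341234)
	fmt.Println(s) // 17N644R7CI

	id, err := decode36(s)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(id) // 123123412341234
}
```

The result is shorter than the decimal form (10 characters instead of 15 here), but it cannot reach 5 or 6 characters for numbers of this size without losing information.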

Secure Random hex digits only

I am trying to generate random digits with the SecureRandom class of Rails. Can we create a random number with SecureRandom.hex that includes only digits and no letters?
For example:
Instead of
SecureRandom.hex(4)
=> "95bf7267"
It should give
SecureRandom.hex(4)
=> "95237267"
Check out the api for SecureRandom: http://rails.rubyonrails.org/classes/ActiveSupport/SecureRandom.html
I believe you're looking for a different method: #random_number.
SecureRandom.random_number(a_big_number)
Since #hex returns a hexadecimal number, it would be unusual to ask for a random result that contained only numerical characters.
For basic use cases, it's simple enough to use #rand.
rand(9999)
Edited:
I'm not aware of a library that generates a random number of specified length, but it seems simple enough to write one. Here's my pass at it:
def rand_by_length(length)
rand((9.to_s * length).to_i).to_s.center(length, rand(9).to_s).to_i
end
The method #rand_by_length takes an integer specifying length as a param and tries to generate a random number of max digits based on the length. String#center is used to pad the missing numbers with random number characters. Worst case calls #rand for each digit of specified length. That may serve your need.
Numeric IDs are good because they are easier to read over the phone (no "c for charlie").
Try this
length = 20
id = (SecureRandom.random_number * (10**length)).round.to_s # => "98075825200269950976"
and for bonus points break it up for easier reading
id.split(//).each_slice(4).to_a.map(&:join).join('-') # => "9807-5825-2002-6995-0976"
This will create a number of the desired length.
length = 11
rand(10**(length - 1)..10**length - 1).to_s # lower bound has exactly `length` digits, upper bound is all nines
length = 4
[*'0'..'9'].sample(length).join # sample picks without replacement: no digit repeats, and length must be <= 10
as simple as that :)

Iteration behaviour when list length = 1

I'm quite new to python and am having issues with for loop behaviour. In my code I'm reading config from a file using configobj. The contents of the config file are variable and that is where I'm seeing issues.
Here's my test code:
if webconf.has_key(group):
scenario_list = webconf[group]['Scenarios']['names']
for scenario in scenario_list:
print "Scenario name = %s\n" % scenario
The "scenario_list" variable will contain any number of strings. When 'names' has multiple elements "scenario" is set to the value of each element, which is fine. When "names" has only 1 element then the loop iterates over each character of the first entry, breaking my code.
So, how do I get the for loop simply to return the value of the entry in "scenario_list" when list length is 1?
Thank you in advance for any advice offered.
Are you using tuples rather than lists?
aTuple = (1,2,3)
aList = [1,2,3]
The big difference between tuples and lists is that tuples are immutable and lists are mutable. That is, with a list you may change an element, or even add and remove elements.
The problem that you are likely encountering is that parentheses alone do not create a tuple; the comma does.
aList = [0] # aList is now [0]
notATuple = (0) # notATuple is now 0
# the parentheses only group the expression, so this is just the integer 0
aTuple = (0,) # aTuple is now (0,) - a tuple with one element
# the trailing comma is what makes it a tuple
The only other problem I can think of is that you are not putting the scenario string in a list or tuple when you have only one scenario. Python treats strings like lists (well, more like tuples) of characters. As such, if you iterate over a string you get the individual characters (the behaviour you experienced). Hence, you must put your scenario string in a list (or tuple) if you want to iterate over your one string, and not its characters. Had you not been using strings, you would have seen a runtime error (a TypeError, because a value such as an integer is not iterable).
