I was at a carnival where at each location they mark your program with a special hole punch. The hole punch is a grid of 3x3 spaces. In each space, there's either a pin that punctures your paper or there isn't. This got me to wondering how many different patterns you could make with this tool. My first thought was: 2^9 = 512, but all 9 spaces being pinless isn't really a punch, so really: 511.
Then the complexity hit me. Especially since the workers aren't all that careful when they punch your paper, these would all look idential:
x.. .x. ... etc.
.x. x.. .x.
... ... ..x
Question: How could a test be written to account for rotation and shifting?
Diligence and thoughts so far:
Binary feels like an obvious part of this equation
When a unique pattern is found, store it in memory so future patterns can be tested against it
There are 4 rotation possibilities.Edit: what I mean by "rotations" is that you can take any shape and turn it 90 degrees. Consider the pattern that is a dot in the upper left corner. You can turn/rotate it 90 degrees and get the dot in the upper right corner. Do this again and it's in the lower right. Again and it's in the lower left. Using the pure 2^9 calculation, these are 4 different combinations. For this problem however, these are exactly the kind of duplicates I'm trying to weed out.
For each rotation, there are 25 ways to make 3x3 grids overlap:
Overlaps:
/ = the spaces in the new one to test
\ = the spaces in a verified unique one
1 2 25
/ / / . . . . . / / / . . . . . . . . . .
/ / / . . . . . / / / . . . . . . . . . .
/ / X \ \ . . . / X X \ . . . . \ \ \ . .
. . \ \ \ . . . . \ \ \ . . . . \ \ \ . .
. . \ \ \ . . . . \ \ \ . . . . \ \ X / /
. . . . . . . . . . . . . . . . . . / / /
. . . . . . . . . . . . . . . . . . / / /
An overlap doesn't need to be tested if either pattern contains a pin that isn't in the overlap area. Bitwise AND could help here.
If you make each position for each of the 2 patterns into strings, you can just check for equality
Can these previous two ideas be combined to increase efficiency?
We need to only consider patterns that have punches in the first row and column. If the first row is empty, the pattern can be shifted up. If the first column is empty, the pattern can be shifted left. In either case, we can derive a similar pattern that we do consider.
For these patterns, we need to check if the rotated versions are identical. We do this by applying up to three 90 degree rotations, possibly shifting left to remove leading empty columns (the first row is never empty) and finding the pattern with the lowest numeric value.
We can then add this value to a hash set, which will only keep unique values.
The empty pattern is not included because all its rows are empty.
To implement this, we encode patterns as successive bits:
012
345
678
The operations we will need are mostly very simple:
Test for an empty row: (n & 7) == 0 // bits 0,1,2 not set
Test for an empty column: (n & 73) == 0 // bits 0,3,6 not set
Shift pattern up: n -> (n >> 3)
Shift pattern left: n -> (n >> 1)
The trickiest part is the rotation, which is really just rearranging all the bits:
n -> ((n & 1) << 2) + ((n & 2) << 4) + ((n & 4) << 6)
+ ((n & 8) >> 2) + (n & 16) + ((n & 32) << 2)
+ ((n & 64) >> 6) + ((n & 128) >> 4) + ((n & 256) >> 2);
In C#:
public static int Count3x3() {
HashSet<int> patterns = new HashSet<int>();
for (int i = 0; i < 512; i++) {
if ((i & 7) == 0 || (i & 73) == 0)
continue;
int nLowest = i;
int n = i;
do {
nLowest = Math.Min(nLowest, n);
n = ((n & 1) << 2) + ((n & 2) << 4) + ((n & 4) << 6)
+ ((n & 8) >> 2) + (n & 16) + ((n & 32) << 2)
+ ((n & 64) >> 6) + ((n & 128) >> 4) + ((n & 256) >> 2);
while ((n & 73) == 0)
n >>= 1;
} while (n != i);
patterns.Add(nLowest);
}
return patterns.Count;
}
This function returns 116. The time taken on my machine was 0.023ms.
EDIT: You can get an additional 7x improvement by using 4 observations:
We can use a simple visited array instead of a hash set. If a pattern was seen before, we don't count it. This also eliminates the need to keep track of the 'lowest' pattern in the inner loop. If a pattern was visited, then its lowest rotated pattern was visited, too.
If we don't have 180 degree rotation symmetry, then the 3rd rotation won't yield the original pattern. The 4th rotation will, always, so it is unnecessary.
The rotation expression can be slightly simplified.
So, if we apply these observations and unroll the inner do loop, we get the following:
static int Rotate(int n) {
n = ((n & (1+32)) << 2) + ((n & 2) << 4) + ((n & 4) << 6)
+ ((n & (8+256)) >> 2) + (n & 16)
+ ((n & 64) >> 6) + ((n & 128) >> 4);
while ((n & 73) == 0)
n >>= 1;
return n;
}
public static int Count3x3_3() {
bool[] visited = new bool[512];
int count = 0, r;
for (int i = 0; i < 512; i++) {
if (visited[i])
continue;
if ((i & 7) == 0 || (i & 73) == 0)
continue;
count++;
if ((r = Rotate(i)) == i) continue;
visited[r] = true;
if ((r = Rotate(r)) == i) continue;
visited[r] = true;
visited[Rotate(r)] = true;
}
return count;
}
This runs in about 3μs on the same machine.
My solution: 116 unique shapes
When testing 2 shapes for equality, comparing the number of pins saves a lot of time. But my biggest breakthrough was realizing that all of those 25 positions can be replaced by this: for each of the two 3x3 shapes to be checked for equality, concatenate the lines with two zeros then trim leading and trailing zeros. The concat zeros are to prevent wrap-around. Example:
010 => 01000 => 0100010100000 => 1000101
101 10100
000 000
000 => 00000 => 0000001000101 => 1000101
010 01000
101 101
Then just test the results for equality. That's 4 easy iterations (1 for each rotation) instead of 100 (25 positions * 4 rotations) more complex ones.
Time:
strings only:
brute force, all 25 positions for each rotation: 2018ms
...00...00... trimmed: 75ms
more optimization: 59ms
OOP and better caching: 17ms
You aren't asking for a count of the number of unique patterns up to translation and rotation, but for a congruence test.
Choose a bit string representation of the 3x3 grid. I'll choose row-by-row, top down. By setting a bit where its corresponding hole is punched, we can now map 9-bit integers to punch patterns (and vice-versa.)
For any particular representation, we can contrive bit-fiddling operations representing rotations and translations. Some translations are illegal to perform on some patterns, as we wish to avoid "wrapping around".
Just as rotations and translations are reversible, so will our operations be. If two motions map pattern A to B, and then B to C, we can certainly compose the motions to make a transformation taking A to C. Doing nothing (the identity transform) is also legal, so we can reach A from A. Reachability by transformation is therefore an equivalence relation on punch patterns, and so equivalent patterns partition the space.
Having a means of assigning a positive integer score to each punch pattern, we can invoke the well-ordered principle: the equivalence class containing the pattern will contain a unique pattern (including translations and rotations) of least score. We'll choose this least pattern to be a representative of the equivalence class. If two patterns have the same equivalence class representative, they are necessarily congruent. If they do not, they are necessarily not congruent.
Given a pattern, how do we find its least-weight representative? By inspection, greedy algorithms aren't guaranteed to work. We can reach for one of umpteen heuristic optimization algorithms, or we can note we're only pushing 9 bits around and exhaustively search the space. It should be noted that by the same token this lends itself nicely to being computed once, and shoved in a lookup table forever after.
Here's the exhaustive search. Note with proper caching it is still quite fast (less than a second.)
#!/usr/bin/env ruby
require 'set'
class PunchPattern < String
##representatives = Hash.new do |h, k|
equivalence_class = k.closure
representative = equivalence_class.min
k.closure.each do |node|
h[node] = representative
end
representative
end
def initialize(s)
raise "9 digits of 0 and 1, pls" unless s =~ /[01]{9}/
super
end
def left
return nil unless self =~ /0..0..0../
PunchPattern.new([self[1...3], 0, self[4...6], 0, self[7...9], 0].join)
end
def right
return nil unless self =~ /..0..0..0/
PunchPattern.new([0, self[0...2], 0, self[3...5], 0, self[6...8]].join)
end
def up
return nil unless self =~ /000....../
PunchPattern.new([self[3...9], 0, 0, 0].join)
end
def down
return nil unless self =~ /......000/
PunchPattern.new([0, 0, 0, self[0...6]].join)
end
def turn
PunchPattern.new([2, 5, 8, 1, 4, 7, 0, 3, 6].collect { |i| self[i].chr }.join)
end
def closure
seen = Set.new([])
frontier = Set.new([self])
until frontier.empty?
node = frontier.first
frontier.delete(node)
seen.add(node)
%w{left right up down turn}.collect do |motion|
node.send motion
end.compact.each do |neighbor|
frontier.add(neighbor) unless seen.include? neighbor
end
end
seen
end
def representative
self.class.representatives[self]
end
def self.representatives
##representatives
end
end
(0...512).collect do |p|
p = PunchPattern.new(p.to_s(2).rjust(9, '0'))
[p.representative, p]
end.inject(Hash.new { |h, k| h[k] = [] }) do |h, pair|
h[pair.first] << pair.last
h
end.sort_by { |pair| pair.first }.each do |representative, patterns|
puts patterns.collect { |p| p[0...3] + " " }.join
puts patterns.collect { |p| p[3...6] + " " }.join
puts patterns.collect { |p| p[6...9] + " " }.join
puts
end
puts "#{PunchPattern.representatives.values.uniq.size} total congruence classes"
Produces
$ ./congruence.rb
000
000
000
000 000 000 000 000 000 001 010 100
000 000 000 001 010 100 000 000 000
001 010 100 000 000 000 000 000 000
000 000 000 000 000 000 000 001 010 011 100 110
000 000 001 010 011 100 110 001 010 000 100 000
011 110 001 010 000 100 000 000 000 000 000 000
000 000 001 010 100 101
000 101 000 000 000 000
101 000 001 010 100 000
000 000 001 010 100 111
000 111 001 010 100 000
111 000 001 010 100 000
000 000 000 000 001 010 010 100
001 010 010 100 010 001 100 010
010 001 100 010 000 000 000 000
000 000 000 000 000 000 000 000 001 010 010 011 011 100 110 110
001 010 010 011 011 100 110 110 011 011 110 001 010 110 010 100
011 011 110 001 010 110 010 100 000 000 000 000 000 000 000 000
000 001 010 100
001 100 000 000
100 000 001 010
000 000 001 010 011 100 101 110
001 101 101 000 000 000 100 000
101 100 000 011 001 110 000 010
000 000 001 010 010 011 100 100
001 011 110 001 010 100 010 100
110 100 000 001 001 000 010 010
000 000 001 010 011 100 110 111
001 111 111 010 001 100 010 100
111 100 000 011 001 110 010 000
000 000 001 010 010 010 100 101
010 101 010 001 100 101 010 010
101 010 001 010 010 000 100 000
000 000 001 010 010 010 100 111
010 111 011 011 110 111 110 010
111 010 001 010 010 000 100 000
000 000 011 110
011 110 011 110
011 110 000 000
000 000 010 011 011 100 101 110
011 101 001 010 101 010 110 100
101 110 011 001 000 110 000 010
000 010 011 100
011 011 110 110
110 001 000 010
000 000 010 011 011 100 110 111
011 111 011 011 111 110 110 110
111 110 011 001 000 110 010 000
000 001 010 100
100 000 000 001
001 010 100 000
000 000 001 001 010 010 100 110
100 110 001 010 010 100 011 001
011 001 010 010 100 100 000 000
000 000 001 010 011 100 101 110
100 101 000 000 000 101 001 000
101 001 011 110 010 000 000 100
000 000 001 010 011 100 110 111
100 111 001 010 010 111 100 001
111 001 011 110 010 000 100 000
000 000 001 010 011 101 110 110
101 110 010 100 001 011 010 101
011 101 011 110 010 000 100 000
000 011 101 110
101 000 101 000
101 011 000 110
000 000 011 011 101 110 110 111
101 111 001 010 111 010 100 101
111 101 011 011 000 110 110 000
000 001 010 110
110 011 110 011
011 010 100 000
000 000 001 010 011 110 110 111
110 111 011 110 011 110 111 011
111 011 011 110 010 100 000 000
000 011 110 111
111 011 110 111
111 011 110 000
001 100
000 000
100 001
001 100 101 101
000 000 000 000
101 101 001 100
001 011 100 100
000 000 001 100
110 100 001 001
001 100 101 111
000 100 001 000
111 101 001 100
001 001 100 110
001 100 000 000
100 100 011 001
001 100 101 111
001 000 100 000
101 111 100 001
001 011 100 110
001 100 100 001
110 100 011 001
001 100 111 111
001 100 001 100
111 111 001 100
001 100
010 010
100 001
001 100 101 101
010 010 010 010
101 101 001 100
001 011 100 100
010 010 011 110
110 100 001 001
001 100 101 111
010 110 011 010
111 101 001 100
001 001 100 110
011 110 010 010
100 100 011 001
001 100 101 111
011 010 110 010
101 111 100 001
001 011 100 110
011 110 110 011
110 100 011 001
001 100 111 111
011 110 011 110
111 111 001 100
001 010 100 101
100 000 001 000
001 101 100 010
001 010 010 100
100 001 100 001
010 100 001 010
001 010 101 110
100 100 001 001
011 101 010 100
001 101 101 110
100 000 001 000
101 011 100 101
001 011 100 110
100 001 001 100
110 100 011 001
001 101 110 111
100 001 100 001
111 011 101 100
001 010 100 111
101 000 101 000
001 111 100 010
001 010 010 110
101 100 101 001
010 011 100 010
001 010 110 111
101 100 101 001
011 111 100 010
001 110
101 000
100 011
001 101 110 111
101 101 000 000
101 100 111 011
001 011 110 110
101 101 001 100
110 100 011 011
001 110 111 111
101 100 001 101
111 111 011 100
001 010 100 101
110 010 011 010
001 101 100 010
001 010 010 100
110 011 110 011
010 100 001 010
001 010 101 110
110 110 011 011
011 101 010 100
001 101 101 110
110 010 011 010
101 011 100 101
001 011 100 110
110 011 011 110
110 100 011 001
001 101 110 111
110 011 110 011
111 011 101 100
001 010 100 111
111 010 111 010
001 111 100 010
001 010 010 110
111 110 111 011
010 011 100 010
001 010 110 111
111 110 111 011
011 111 100 010
001 110
111 010
100 011
001 101 110 111
111 111 010 010
101 100 111 011
001 011 110 110
111 111 011 110
110 100 011 011
001 110 111 111
111 110 011 111
111 111 011 100
010 011 100 101
001 100 001 100
101 001 110 010
010 010 011 100
001 101 100 101
110 001 010 010
010 011 100 111
001 101 101 100
111 001 110 010
010 011 100 101
011 110 011 110
101 001 110 010
010 010 011 100
011 111 110 111
110 001 010 010
010 011 100 111
011 111 111 110
111 001 110 010
010
101
010
010 010 011 110
101 101 101 101
011 110 010 010
010 011 101 110
101 100 101 001
101 011 010 110
010 011 110 111
101 101 101 101
111 011 110 010
010
111
010
010 010 011 110
111 111 111 111
011 110 010 010
010 011 101 110
111 110 111 011
101 011 010 110
010 011 110 111
111 111 111 111
111 011 110 010
011 100 101 101
000 001 000 100
101 101 110 001
011 100
000 101
110 001
011 100 101 111
000 101 101 000
111 101 001 110
011 100 101 111
001 001 100 100
101 111 110 001
011 011 100 110
001 100 101 101
110 110 011 001
011 100 111 111
001 101 100 101
111 111 110 001
011 100 101 101
010 011 010 110
101 101 110 001
011 100
010 111
110 001
011 100 101 111
010 111 111 010
111 101 001 110
011 100 101 111
011 011 110 110
101 111 110 001
011 011 100 110
011 110 111 111
110 110 011 001
011 100 111 111
011 111 110 111
111 111 110 001
011 101 101 110
100 001 100 001
101 110 011 101
011 101 110 111
100 101 101 001
111 011 101 110
011 101 110 111
101 101 001 100
101 110 111 011
011 110
101 101
110 011
011 110 111 111
101 101 101 101
111 111 011 110
011 101 101 110
110 011 110 011
101 110 011 101
011 101 110 111
110 111 111 011
111 011 101 110
011 101 110 111
111 111 011 110
101 110 111 011
011 110
111 111
110 011
011 110 111 111
111 111 111 111
111 111 011 110
101
000
101
101 101 101 111
000 001 100 000
111 101 101 101
101 101 111 111
001 100 001 100
111 111 101 101
101
010
101
101 101 101 111
010 011 110 010
111 101 101 101
101 101 111 111
011 110 011 110
111 111 101 101
101 111
101 000
101 111
101 111 111 111
101 001 100 101
111 111 111 101
101 111
111 010
101 111
101 111 111 111
111 011 110 111
111 111 111 101
111
101
111
111
111
111
117 total congruence classes
..for 117 classes.
First, we can view two punches that are equivalent, except for a translation, as rotations of each other. Imagine the punch pattern is on the surface of a sphere: we can 'translate' it by rotating the sphere along the horizontal and vertical axes (as it is held in our hand.)
Two punches that are equivalent up to rotation (like a 90-degree turn) are also captured here by us rotating our sphere along the third, remaining axis.
Now we've reduced the problem to "How many unique punch patterns are there on the surface of a sphere, up to rotation?" For counting unique objects up to symmetry like this, you want not-Burnside's Lemma. This book is a good primer.
I don't think that this is like the sphere case, since you can't rotate around the edges? IE:
XOO
XXO
XOO
is not the same as
OOX
XOX
OOX
I tried counting by hand on paper to see what I got. Consider the 2x2 case - you have 1 with 0 dots, 1 with 1 dot, 2 with 2 dots (adjacent, or diagonal), 1 with 3 dots and 1 with 4; for a total of 5 (or 4 if you neglect the empty case). Note that the enumeration is symmetric, since it is the same to count empty spaces as full ones. For the 3x3 case I got this:
C(0) = 1
C(1) = 1
C(2) = 5
C(3) = 10
C(4) = 21
and then by symmetry, 21, 10, 5, 1, 1
I get 76. I could very easily have miscounted, especially in the 4/5 case.
The only way I can think of enumerating these automatically would involve shifting and rotating the patterns to see if they match a previously-enumerated one. Shifting is tricky, since you can only shift until you "bump" against an edge.
It's worth pointing out that if you truly need each shape to "look" unique, no matter how it's rotated or shifted, you have very few to choose from. For example, a single punch, no matter WHERE it is in the grid, will always look the same. Furthermore, assuming a square grid and round pins, and assuming that minor spacing differences (√2) are insignificant, then 2 holes diagonal in a row will look the same as two adjacent pins, since all the viewer sees is 2 holes close together. Likewise, 3 in a diagonal will look just like 3 in a straight line, which dramatically limits your options.
Note that shape is probably a better word for what we're after than combination, since we don't care what the actual combination was, just what the resultant shape is on paper.
I think we can posit that no matter what the shape, it can be rotated and shifted such that the top-left pin is punched (specifically if you allow rotation on the 45-degree), which allows us to narrow our search even further. We accomplish this using the following rules:
If any corner is punched, rotate the grid until the punched corner is in the top-left
Otherwise shift the pattern as far up and left as it will go.
Repeat step 1
If we get this far, then we know that only the top middle position is punched (since we know that neither corner is), in which case we rotate the pattern 45 degrees, making the top-middle now the top-left. QED.
I did a really quick pen-and-paper brute-force search for possible shapes and it appears that the list of viable options is so small that you could list them all out in just a few minutes.
I don't understand clearly this part about rotations but for this scenario:
There are 3x3=9 holes and 10 cases and every time only one case can take place:
Case 1 = no holes
Case 2 = one hole
...
Case 10 = 9 holes
Then it would go like this with the combination formula C(n,k):
Total = C(9,0)+C(9,1)+....+C(9,9)
this is to sum k different combinations of n elements.
Totol = 1+9+36+84+126+126+84+36+9+1 = 512
I think you are right, 512 seems to be correct and theoritically pinless is defined as a combination too, if you don't want it just remove it to get 511.
All these cases happen separately so we add the different cases.
If they were to happen synchronously, for example calculating the combinations for 3 employees punching the paper once or calculating the combinations for marking the paper 3 times by one employee then it gets more complicated and should be 512 * 512 * 512 by the nxm rule.
The m*n rule in a simple display is:
I have 4=m pockets and two=n hands:
my_left * 4(pockets)=4 my_right * 4(pockets)=4
total= 4+4=4*2=m*n
Only one hand can enter the pocket at a time or only one combination of one employe is paired with only one combination of the other employee and this is the exact reason we take the m*n rule.
This is what I think, I am not a mathematician and do not claim to be correct 100%, it's just what I remember from the probabilities course.
I don't claim to be 100% correct, is jus what I remember for the probability course.
As for patterns overlapping, checking if pattern already found etc I wouldn't bother at all since there are so many algorithms in pseudocode that generate combinations. Since they generate then no need to check overlaps or anything.
After all this is what generat means it is designed a priori to find all results, just let it run.
I had found once a even more rare algorithm than simple combinations and when I turned it to php it did the job perfectly, no need to worry for overlaps or anything.
Related
I am trying to decode base64 encoded binary content in JQ using explode function.
When I run explode and then through implode, I am expecting it to return the same string. But it is not. Try it here: https://jqplay.org/s/Rt8H1qv8VRP
Base64 encoded string: "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
JQ: '#base64d | explode | implode | #base64'
Output: "AQEAAAABAQAyGWRkZBXvv71bBwAAAAABAgMEBQYHCAkKUgbvv70/Ugbvv70/"
Debugging further,
#base64d | explode | .[14]
returns
65533
Running the following on Ubuntu, you can see the [14] char is 315 (octal) == 215(decimal)
$ echo "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" | base64 -d | od -bc
0000000 001 001 000 000 000 001 001 000 062 031 144 144 144 025 315 133
001 001 \0 \0 \0 001 001 \0 2 031 d d d 025 315 [
0000020 007 000 000 000 000 001 002 003 004 005 006 007 010 011 012 122
\a \0 \0 \0 \0 001 002 003 004 005 006 \a \b \t \n R
0000040 006 236 077 122 006 236 077
006 236 ? R 006 236 ?
0000047
Why is JQ returning this weird 65533 (0xFFFD) character? What am I missing?
First of all, the issue has nothing to do with explode or implode. Using just #base64d | #base64 produces the same result.
jq expects the string encoded with base64 to be text encoded with UTF-8.
If the decoded string is not UTF-8, the results are undefined.
Your input is not UTF-8.
U+FFFD REPLACEMENT CHARACTER is a character used to mark input errors.
this one was tricky: I have a text file with several user DNs and bellow each one a matrix with the LogonHours pattern (extracted from Active Directory using PowerShell). Looks like this:
CN=John Doe,OU=Users,OU=Almox,DC=acme,DC=inc
Day
of --------- Daily Hours --------
Week M-3 3-6 6-9 9-N N-3 3-6 6-9 9-M
Dom 000 000 000 000 000 000 000 000
Seg 000 000 011 111 111 111 100 000
Ter 000 000 011 111 111 111 100 000
Qua 000 000 011 111 111 111 100 000
Qui 000 000 011 111 111 111 100 000
Sex 000 000 011 111 111 111 100 000
Sab 000 000 000 000 000 000 000 000
CN=Jack Dumb,OU=Users,OU=Managers,DC=acme,DC=inc
All Hours
CN=Barry Foo,OU=Disabled Users,DC=acme,DC=inc
Day
of --------- Daily Hours --------
Week M-3 3-6 6-9 9-N N-3 3-6 6-9 9-M
Dom 000 000 000 000 000 011 111 111
Seg 111 111 111 111 111 111 111 111
Ter 111 111 111 111 111 111 111 111
Qua 111 111 111 111 111 111 111 111
Qui 111 111 111 111 111 111 111 111
Sex 111 111 111 111 111 111 111 111
Sab 111 111 111 111 110 000 000 000
How can I extract usernames with an specific LogonHours pattern with AWK?
I was able to count the number of specific patterns using AWK using a "pattern file" as a RS as follows:
awk -v RS="$(<patternfile.txt)" '{print RT}' logonhoursCNlist.txt | grep "Week M-3" | wc -l
Expected result using a patternfile with the following content...
Day
of --------- Daily Hours --------
Week M-3 3-6 6-9 9-N N-3 3-6 6-9 9-M
Dom 000 000 000 000 000 000 000 000
Seg 000 000 011 111 111 111 100 000
Ter 000 000 011 111 111 111 100 000
Qua 000 000 011 111 111 111 100 000
Qui 000 000 011 111 111 111 100 000
Sex 000 000 011 111 111 111 100 000
Sab 000 000 000 000 000 000 000 000
... is just the usernames on CNs, like this:
John Doe
Any help I'll appreciate!
It'd be something like (untested but will be very close if not exactly what you want):
awk -v RS="$(<patternfile.txt)" 'RT!=""{print gensub(/.*CN=([^,]+).*/,"\\1",1)}' logonhoursCNlist.txt
The above uses GNU awk for multi-char RS and gensub().
I want to SUM the RW column for each port hourly
Time ID Name RW
-------- --- ------- ----------
14:57:01 000 Port0 1340
14:57:01 001 Port1 13
14:58:01 000 Port0 864
14:58:01 001 Port1 36
14:59:01 000 Port0 1394
14:59:01 001 Port1 22
15:57:01 000 Port0 1340
15:57:01 001 Port1 13
15:58:01 000 Port0 864
15:58:01 001 Port1 36
15:59:01 000 Port0 1394
15:59:01 001 Port1 22
.
.
.
20:57:01 000 Port0 1340
20:57:01 001 Port1 13
20:58:01 000 Port0 864
20:58:01 001 Port1 36
20:59:01 000 Port0 1394
20:59:01 001 Port1 22
My script is
data = LOAD 'hdfs:/data/data.txt' USING PigStorage(',') AS (time:chararray, id:chararray, name:chararray, read:int, write:int, rw:int);
runs = FOREACH data GENERATE time, name, rw;
How to
You will have to generate a new column from the time column called hours,then group by hours,port name and then get the sum for each grouping.
data = LOAD 'hdfs:/data/data.txt' USING PigStorage(',') AS (time:chararray, id:chararray, name:chararray, read:int, write:int, rw:int);
runs = FOREACH data GENERATE GetHour((timestamp)time) as hour, name, rw;
grouped = GROUP runs by (hour,name);
port_total = FOREACH grouped GENERATE FLATTEN(group) as (hour,name),SUM(data.rw);
DUMP port_total;
There are some file formats that i am looking into, and in order to use them i want to understand them at the byte level. I have been trying to find a program that displays the files bytes but the only ones i have so far only support displaying the bytes in hexadecimal. I would prefer them to be displayed in decimal instead of hexadecimal since the format i am looking at uses decimal. I could write my own program to do this, however that would be less readable, and would take more time. Currently all the programs i have found so far have displayed their output only in hexadecimal .
You can try HexEdit. It gives your the option to show Content as Integer-Values. (Property-Window):
You can download it from www.hexedit.com.
For me, the following works:
$ od -t u1 /bin/ls | head -5
0000000 127 069 076 070 001 002 001 000 000 000 000 000 000 000 000 000
0000020 000 002 000 002 000 000 000 001 000 001 015 192 000 000 000 052
0000040 000 000 103 092 000 000 000 000 000 052 000 032 000 006 000 040
0000060 000 023 000 021 000 000 000 006 000 000 000 052 000 001 000 052
0000100 000 000 000 000 000 000 000 192 000 000 000 192 000 000 000 005
I find it a bad idea, though, to look at it in decimal.
One should be able to mentally convert hexadecimal byte values to decimal or to ASCII characters if one dives into such matters.
000 000 000 000 (4 fields each of which is a group of 3 zeros separated by a space)
Process to generate 4 new lines
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
On ea line a group of three zeros is replaced by 100
How can I do this ?
tom
$ echo '000 000 000 000' | awk '{for (i=1;i<=NF;i++) {f=$i; $i="100"; print; $i=f}}'
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
Edit:
The fields are iterated over using the for loop. Each field ($i - for the field number i) is saved to a temporary variable f. Then the contents of the field are replaced. The record ($0) is printed. The field is returned to its previous value using the temporary value.
It might be easier to follow if this data was used: 001 002 003 004. Then the output would look like:
100 002 003 004
001 100 003 004
001 002 100 004
001 002 003 100
Here's a shell script version using sed:
data='001 002 003 004' # or 000 000 000 000
for i in {1..4}; do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or
count=$(echo "data" | $wc -w)
for ((i=1;i<=count;i++)); do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or Bash without any external utilities:
data=(001 002 003 004) # use an array
count=${#data[#]}
for ((i=0;i<count;i++)); do f=${data[i]}; data[i]=100; echo "${data[#]}"; data[i]=$f; done
Or many other possibilities.