Icon -- how to implement equivalent of ZipWith - lazy-evaluation

In Icon, every expression is basically a stream. Most of the functions like * are named in a way that's suggestive of what they do to a single element.
For instance, here is foo.icn which maps * 2 over a stream.
procedure main()
    every write((1 | 2 | 3) * 2)
end
I couldn't tell from the small bits of Icon documentation I've seen how it would interleave two non-trivial streams ... and whether a singleton like 2 is a one-element stream or some kind of magical scalar value that distributes and sometimes behaves as an infinite stream ... similar to how scalar extension works in APL.
Doing a test with two non-trivial streams shows what traversal order Icon uses. The rightmost stream is the innermost loop and streams can be traversed multiple times.
This is also nice because it means that we don't need a distinction between a scalar and a singleton stream.
procedure main()
    every write((1 | 2 | 3) * (2 | 0 | 0))
end
produces
% icon foo.icn
2
0
0
4
0
0
6
0
0
However, I can't figure out how to write an operation that traverses two streams at the same time ... How would you implement the equivalent of zipWith?
Here's an example using the hypothetical zipWith function, with * passed in as a string. (I'm not sure whether Icon has first-class functions)
procedure main()
every write(zipWith("*", (1 | 2 | 3), (2 | 0 | 0)))
end
Which would produce:
2
0
0
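For comparison only, here is the difference between the two traversal orders written in Python rather than Icon. This is a sketch, not an Icon solution: cross_apply and zip_with are names invented for this example, and Python lists stand in for Icon's generators.

```python
from itertools import product
from operator import mul

def cross_apply(op, xs, ys):
    # Mirrors the Icon behaviour observed above: the rightmost
    # sequence is the innermost loop and is traversed repeatedly.
    return [op(x, y) for x, y in product(xs, ys)]

def zip_with(op, xs, ys):
    # Walks both sequences in lockstep, pairing the i-th elements.
    return [op(x, y) for x, y in zip(xs, ys)]

print(cross_apply(mul, [1, 2, 3], [2, 0, 0]))
print(zip_with(mul, [1, 2, 3], [2, 0, 0]))
```

The first call gives the nine products shown in the Icon output above; the second gives the three products the hypothetical zipWith is meant to produce.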


hpack encoding integer significance

After reading this, https://httpwg.org/specs/rfc7541.html#integer.representation
I am confused about quite a few things, although I seem to have the overall gist of the idea.
For one, What are the 'prefixes' exactly/what is their purpose?
For two:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 | 10 stored on 5 bits
+---+---+---+---+---+---+---+---+
What are the leading Xs? What is the starting 0 for?
>>> bin(10)
'0b1010'
>>>
Typing this in the python IDE, you see almost the same output... Why does it differ?
This is when the number fits within the number of prefix bits though, making it seemingly simple.
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (2^5 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (2^5 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 10<128, encode(10), done
+---+---+---+---+---+---+---+---+
The octet-like diagram shows three different numbers being produced... Since the numbers are produced throughout the loop, how do you replicate this octet-like diagram within an integer? What is the actual final result? The diagram or "I" being 10, or 00001010.
def f(a, b):
    if a < 2**b - 1:
        print(a)
    else:
        c = 2**b - 1
        remain = a - c
        print(c)
        if remain >= 128:
            while 1:
                e = remain % 128
                g = e + 128
                remain = remain / 128
                if remain >= 128:
                    continue
                else:
                    print(remain)
                    c+=int(remain)
                    print(c)
                    break
As I'm trying to figure this out, I wrote a quick Python implementation of it. It seems that I am left with a few useless variables, one being g, which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
For one, What are the 'prefixes' exactly/what is their purpose?
Integers are used in a few places in HPACK messages and often they have leading bits that cannot be used for the actual integer. Therefore, there will often be a few leading bits that are unavailable for the integer itself. They are represented by the X. For the purposes of this calculation it doesn't matter what those Xs are: could be 000, or 111, or 010 or...etc. Also, there will not always be 3 Xs - that is just an example. There could only be one leading X, or two, or four...etc.
For example, to look up a previous HPACK decoded header, we use 6.1. Indexed Header Field Representation which starts with a leading 1, followed by the table index value. Therefore that 1 is the X in the previous example. We have 7 bits (instead of only 5 bits in the original example in your question). If the table index value is 126 or less we can represent it using those 7 bits. If it's >= 127 then we need to do some extra work (we'll come back to this), because the all-ones value 127 is reserved as the overflow signal.
If it's a new value we want to add to the table (to reuse in future requests), but we already have that header name in the table (so it's just a new value for that name we want as a new entry) then we use 6.2.1. Literal Header Field with Incremental Indexing. This has 2 bits at the beginning (01 - which are the Xs), and we only have 6-bits this time to represent the index of the name we want to reuse. So in this case there are two Xs.
So don't worry about there being 3 Xs - that's just an example. In the above examples there was one X (as first bit had to be 1), and two Xs (as first two bits had to be 01) respectively. The Integer Representation section is telling you how to handle any prefixed integer, whether prefixed by 1, 2, 3... etc unusable "X" bits.
What are the leading Xs? What is the starting 0 for?
The leading Xs are discussed above. The starting 0 is just because, in this example we have 5-bits to represent the integers and only need 4-bits. So we pad it with 0. If the value to encode was 20 it would be 10100. If the value was 40, we couldn't fit it in 5-bits so need to do something else.
Typing this in the python IDE, you see almost the same output... Why does it differ?
Python uses 0b to show it's a binary number. It doesn't bother showing any leading zeros. So 0b1010 is the same as 0b01010 and also the same as 0b00001010.
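A quick way to see the padded form in Python is to ask for a fixed width. This is just a small sketch using the built-in format function; the widths 5 and 8 are chosen to match the examples in the question.

```python
value = 10

plain = bin(value)               # Python's default: no leading zeros
five_bits = format(value, "05b") # zero-padded to the 5-bit prefix width
one_octet = format(value, "08b") # zero-padded to a full octet

print(plain, five_bits, one_octet)

# All three spellings denote the same integer.
same_value = int(five_bits, 2) == int(one_octet, 2) == value
```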
This is when the number fits within the number of prefix bits though, making it seemingly simple.
Exactly. If you need more than the number of bits you have, you don't have space for it. You can't just use more bits as HPACK will not know whether you are intending to use more bits (so should look at next byte) or if it's just a straight number (so only look at this one byte). It needs a signal to know that. That signal is using all 1s.
So to encode 40 in 5 bits, we need to use 11111 to say "it's not big enough", overflow to next byte. 11111 in binary is 31, so we know it's bigger than that, so we'll not waste that, and instead use it, and subtract it from the 40 to give 9 left to encode in the next byte. A new additional byte gives us 8 new bits to play with (well actually only 7 as we'll soon discover, as the first bit is used to signal a further overflow). This is enough so we can use 00001001 to encode our 9. So our complex number is represented in two bytes: XXX11111 and 00001001.
If we want to encode a value bigger than can fit in the prefix bits of the first byte, AND the left over is bigger than the 127 that would fit into the available 7 bits of the second byte, then we can't use this overflow mechanism using two bytes. Instead we use another "overflow, overflow" mechanism using three bytes:
For this "overflow, overflow" mechanism, we set the first byte bits to 1s as usual for an overflow (XXX11111) and then set the first bit of the second byte to 1. This leaves 7 bits available to encode the value, plus the next 8 bits in the third byte we're going to have to use (actually only 7 bits of the third byte, because again it uses the first bit to indicate another overflow).
There are various ways they could have gone about this using the second and third bytes. What they decided to do was encode this as two numbers: the 128 mod, and the 128 multiplier.
1337 = 31 + (128 * 10) + 26
So that means the first byte is set to 31 as per the previous example, the second byte is set to 26 (which is 0011010 in 7 bits) plus the leading 1 to show we're using the overflow overflow method (so 10011010), and the third byte is set to 10 (or 00001010).
So 1337 is encoded in three bytes: XXX11111 10011010 00001010 (including setting X to whatever those values were).
Using 128 mod and multiplier is quite efficient and means this large number (and in fact any number whose left over after subtracting the prefix maximum is up to 16,383) can be represented in three bytes. Not coincidentally, 16,383 is also the max integer that can be represented in 7 + 7 = 14 bits. But it does take a bit of getting your head around!
If the left over is bigger than 16,383 then we need to do another round of overflow in a similar manner.
All this seems horrendously complex but is actually relatively simply, and efficiently, coded up. Computers can do this pretty easily and quickly.
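To make that concrete, here is a minimal Python sketch of the encode loop described above, plus the matching decode. The names encode_integer and decode_integer are invented for this example, and the leading X flag bits are assumed to be zero (a real encoder would OR them into the first byte).

```python
def encode_integer(value, prefix_bits):
    """Return the list of byte values for a prefixed integer (X bits left as 0)."""
    max_prefix = (1 << prefix_bits) - 1   # all 1s, e.g. 31 for 5 bits
    if value < max_prefix:
        return [value]                    # fits in the prefix
    out = [max_prefix]                    # all 1s signals "overflow"
    value -= max_prefix
    while value >= 128:
        out.append(value % 128 + 128)     # low 7 bits, continuation bit set
        value //= 128                     # integer division, not /
    out.append(value)                     # last byte, continuation bit clear
    return out

def decode_integer(octets, prefix_bits):
    """Inverse of encode_integer; ignores the X bits of the first byte."""
    max_prefix = (1 << prefix_bits) - 1
    value = octets[0] & max_prefix
    if value < max_prefix:
        return value
    shift = 0
    for octet in octets[1:]:
        value += (octet & 127) << shift   # add the next 7 bits
        shift += 7
        if octet & 128 == 0:              # continuation bit clear: done
            break
    return value

print(encode_integer(10, 5))
print(encode_integer(40, 5))
print(encode_integer(1337, 5))
```

The answer to "what is the actual final result" is the whole list of bytes, not a single integer: for 1337 with a 5-bit prefix that is the three bytes 31, 154 and 10 from the diagram.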
It seems that i am left with a few useless variables, one being g
You are not printing this value in the if branch, only the left over value in the else. You need to print both.
which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
Exactly, it's because the first bit (value 128) needs to be set as per explanation above, to show we are continuing/overflowing into needing a third byte.

Put elements of similar type together

I read this interview question somewhere and was trying to solve it:
Given a fruit stall (at max 8 different types of fruits). Put fruits of similar types together.
Restrictions: a) Fruit Stall is your entire world (i.e. dont use extra space), b) Taking a fruit and knowing its type (getType()) is a costly operation but swapping is a very cheap operation.
Note: You need to write a code to handle all cases keeping in mind the max types of fruit can be 8.
So, the idea which pops in my mind is, we need to call getType() for all the fruits(array elements) and then sort them accordingly based on a particular type. I am not able to get how swapping can be done here without knowing the Type of the fruit and what can be the best solution to this problem?
Since this is an interview question, I'm going to assume that your fruit stall is an array. Divide the array into eight regions, so that each region contains only fruit of a given type, using seven pointers, one to the start of each region except the first. Use an eighth pointer to point at the start of the unsorted area.
Initialize the pointers to point at the start of the array. Getting the definition of the pointers is tricky because you have to cover cases where there are no fruits of a given type. One possible definition is that Pointer i contains the number of fruits sorted so far of types up to and including i, for i = 1..8. Then at the beginning all the pointers are set equal to zero and 1 1 1 2 2 3 4 4 | corresponds to p1=3 p2=5 p3=6 p4=8 p5=p6=p7=p8=8
Repeatedly look at the first fruit at the start of the unsorted region to find out its type. If it should not go in the final region swap it with the element at the start of the final region and advance the pointer to the start of the final region. If it should not go in the second last region swap it with the element at the start of the second last region and advance the pointer to the start of the second last region... and so on until the new fruit is in its correct place. Now advance the pointer to the first unsorted fruit and repeat.
This looks at each fruit once, and I don't think you can sort with fewer calls to getType(). You don't care about the number of swaps, so I think this is optimal.
I will put in lines showing the swaps starting with c1,c2,c1,c3,c2,c1,c4,c4. I won't bother to write in the cs and I will use a | to divide the region on the left where everything is known to be in order from the region on the right where the types are unknown
| 1 2 1 3 2 1 4 4
1 | 2 1 3 2 1 4 4
1 2 | 1 3 2 1 4 4
1 1 | 2 3 2 1 4 4
1 1 2 | 3 2 1 4 4
1 1 2 3 | 2 1 4 4
1 1 2 2 | 3 1 4 4
1 1 2 2 3 | 1 4 4
1 1 2 2 1 | 3 4 4
1 1 1 2 2 | 3 4 4
1 1 1 2 2 3 | 4 4
1 1 1 2 2 3 4 | 4
1 1 1 2 2 3 4 4 |
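The pointer scheme above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: group_fruits and get_type are hypothetical names, types are numbered from 0 to num_types - 1, and the seven-or-eight pointers are kept in a small fixed-size list (start[r] is the index where the region for type r begins).

```python
def group_fruits(stall, get_type, num_types=8):
    """Group equal-typed fruits in place, calling get_type once per fruit."""
    start = [0] * num_types            # start[r]: first index of region r
    for i in range(len(stall)):
        t = get_type(stall[i])         # the only costly call for this fruit
        pos = i                        # the new fruit sits just past the sorted area
        # Walk the new fruit leftwards through every region of a later type.
        for region in range(num_types - 1, t, -1):
            first = start[region]
            stall[pos], stall[first] = stall[first], stall[pos]
            pos = first                # new fruit is now just before this region
            start[region] += 1         # region shifted right by one slot
    return stall

# Example: integers double as their own type (an assumption for the demo).
calls = []
def counting_type(fruit):
    calls.append(fruit)
    return fruit

stall = [0, 1, 0, 2, 1, 0, 3, 3]
group_fruits(stall, counting_type)
print(stall, len(calls))
```

Each fruit costs one get_type call and at most num_types - 1 swaps, which matches the trade-off stated in the question (swaps are cheap, type checks are not).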
This can most likely be done as an in-place merge sort. As you mentioned, check the type of each fruit immediately. This won't use up any extra memory (there are many guides on how to do an in-place merge sort), will only call getType() once per fruit, and will result in n log(n) run time with n memory usage.
Is there any info we know right off the bat? It seems like the question is worded in such a way that they would normally give us an alternative way to avoid having to make the getType() call n times. If this is an in person interview question don't be surprised if the goal of this exercise is supposed to evolve as the interviewer starts going into it. This would explain why they specifically mention the getType() as being expensive

Algorithm for building an equation from a list of numbers using basic operators (+,-,*,/)

I am struggling to find a smart/creative way to approach this problem instead of a brute force algorithm. Any insights that could allow me to view this problem in a different light would be greatly appreciated. Any language can be used.
Given a sequence of numbers, it is sometimes possible to place the basic operations of =, +, −, ×, and ÷ between those numbers to build a complete, correct equation. For instance, the sequence [2, 3, 8, 4, 22] can be made into an equation by putting the symbols in the following places:
2 + 3 × 8 − 4 = 22
Note how the order of operations is followed. Write a program that will build such a number sentence, given the list of numbers. An ideal solution would allow the numbers to be specified on the command line, but prompting the user for a list is also acceptable. If no true sentence can be found, print such a message to the user. Assume only one = will be used (allowing for multiple equal signs makes the assignment a lot harder, but is fun!)
You may present the sentence using prefix, infix, or postfix operators.
Sample Output
Note that some of these sequences could form multiple sentences. Only one need be printed.
$ assignment -4 1 2 3 4
1 - 2 = 3 - 4
$ assignment -4 1 1 1
1 / 1 = 1
$ assignment -4 0 18 1 2 1 1
0 / 18 = 1 - 2 + 1 * 1
$ assignment -4 2 2 2
No equation found
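As a baseline to compare smarter ideas against, the brute-force search can be sketched in Python as below. It is only a sketch under assumptions: find_equation, evaluate and render are names made up here, exactly one = is used, output is infix, and exact fractions are used so that division does not introduce rounding errors.

```python
from fractions import Fraction
from itertools import product

def evaluate(nums, ops):
    """Evaluate nums[0] ops[0] nums[1] ... with * and / binding tighter.
    Returns None if a division by zero occurs."""
    terms = [Fraction(nums[0])]            # additive terms, summed at the end
    for op, n in zip(ops, nums[1:]):
        if op == "*":
            terms[-1] *= n
        elif op == "/":
            if n == 0:
                return None
            terms[-1] /= n
        elif op == "+":
            terms.append(Fraction(n))
        else:                              # "-": start a negated term
            terms.append(Fraction(-n))
    return sum(terms)

def render(nums, ops):
    parts = [str(nums[0])]
    for op, n in zip(ops, nums[1:]):
        parts += [op, str(n)]
    return " ".join(parts)

def find_equation(nums):
    """Try every position for '=' and every operator choice elsewhere."""
    for split in range(1, len(nums)):
        left, right = nums[:split], nums[split:]
        for ops in product("+-*/", repeat=len(nums) - 2):
            lops, rops = ops[:split - 1], ops[split - 1:]
            lv, rv = evaluate(left, lops), evaluate(right, rops)
            if lv is not None and lv == rv:
                return render(left, lops) + " = " + render(right, rops)
    return None

print(find_equation([2, 3, 8, 4, 22]) or "No equation found")
print(find_equation([2, 2, 2]) or "No equation found")
```

This tries (n - 1) * 4^(n - 2) candidates for n numbers, so it is fine for short lists like the samples but grows quickly; it may print a different valid sentence than the one shown in the sample output.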

golang array referencing eg. b[1:4] references elements 1,2,3

The golang blog states :
"A slice can also be formed by "slicing" an existing slice or array. Slicing is done by specifying a half-open range with two indices separated by a colon. For example, the expression b[1:4] creates a slice including elements 1 through 3 of b (the indices of the resulting slice will be 0 through 2)."
Can someone please explain to me the logic in the above. IE. Why doesn't b[1:4] reference elements 1 through 4? Is this consistent with other array referencing?
Indexes point to the "start" of the element. This view is common among languages that use zero-based indexing with half-open ranges:
        | 0 | first | 1 | second | 2 | third | 3 | fourth | 4 | fifth | 5 |
[0]   =   ^
[0:1] =   ^ --------> ^
[1:4] =               ^ ----------------------------------> ^
[0:5] =   ^ ----------------------------------------------------------> ^
It's also common to support negative indexing, although Go doesn't allow this:
|-6 |       |-5 |        |-4 |       |-3 |        |-2 |       |-1 |
| 0 | first | 1 | second | 2 | third | 3 | fourth | 4 | fifth | 5 |
The reason is given in the Go Language Specification section on Slices.
For a string, array, or slice a, the primary expression
a[low : high]
constructs a substring or slice. The index expressions low and high select which elements appear in the result. The result has indexes starting at 0 and length equal to high - low.
For convenience, any of the index expressions may be omitted. A missing low index defaults to zero; a missing high index defaults to the length of the sliced operand.
It's easy and efficient to calculate the length of the slice as high - low.
Half-open intervals make sense for many reasons, when you get down to it. For instance, with a half-open interval like this, the number of elements is:
n = end - start
which is a pretty nice and easy formula. For a closed interval, it would be:
n = (end - start) + 1
which is (not a lot, but still) more complicated.
It also means that for e.g. a string, the entire string is [0, len(s)), which also seems intuitive. If the interval was closed, to get the entire string you would need [0, len(s) - 1].
Go uses half-open intervals for slices like many other languages. In a more mathematical notation, the slice b[1:4] is the interval [1,4) which excludes the upper endpoint.
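Python slices follow the same half-open convention, so the properties mentioned in these answers can be checked there quickly. This is a sketch in Python rather than Go, and the list b is just an assumed example.

```python
b = ["first", "second", "third", "fourth", "fifth"]

low, high = 1, 4
s = b[low:high]                 # elements 1, 2 and 3; element 4 is excluded

print(s)
print(len(s) == high - low)     # length is simply high - low

# Adjacent half-open slices tile the sequence with no overlap and no gap.
mid = 2
rejoined = b[:mid] + b[mid:]

# A missing low defaults to 0, a missing high to the length.
whole = b[0:len(b)]
```

The same expressions, b[1:4], b[:mid] and b[mid:], are valid Go slice expressions with the same meaning.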

How can I maximally partition a set?

I'm trying to solve one of the Project Euler problems. As a consequence, I need an algorithm that will help me find all possible partitions of a set, in any order.
For instance, given the set 2 3 3 5:
2 | 3 3 5
2 | 3 | 3 5
2 | 3 3 | 5
2 | 3 | 3 | 5
2 5 | 3 3
and so on. Pretty much every possible combination of the members of the set. I've searched the net of course, but haven't found much that's directly useful to me, since I speak programmer-ese not advanced-math-ese.
Can anyone help me out with this? I can read pretty much any programming language, from BASIC to Haskell, so post in whatever language you wish.
Have you considered a search tree? Each node would represent a choice of where to put an element and the leaf nodes are answers. I won't give you code because that's part of the fun of Project Euler ;)
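For readers who do want to see the shape of such a search, here is one possible sketch in Python. The names set_partitions and multiset_partitions are made up for this example; each recursion level decides which block the next element joins (an existing block or a new one), and repeated values such as the two 3s are handled by discarding duplicates afterwards, which is simple but not the most efficient choice.

```python
def set_partitions(items):
    """Yield every partition of items (treated as distinct) as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Choice 1: put `first` into one of the existing blocks.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # Choice 2: give `first` a block of its own.
        yield [[first]] + partial

def multiset_partitions(items):
    """Like set_partitions, but equal values are interchangeable."""
    seen = set()
    for partition in set_partitions(list(items)):
        # Canonical form: sort inside each block, then sort the blocks.
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        if key not in seen:
            seen.add(key)
            yield [list(block) for block in key]

for p in multiset_partitions([2, 3, 3, 5]):
    print(" | ".join(" ".join(str(x) for x in block) for block in p))
```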
Take a look at:
The Art of Computer Programming, Volume 4, Fascicle 3: Generating All Combinations and Partitions
7.2.1.5. Generating all set partitions
In general I would look at the structure of the recursion used to compute the number of configurations, and build a similar recursion for enumerating them. Best is to compute a one-to-one mapping between integers and configurations. This works well for permutations, combinations, etc. and ensures that each configuration is enumerated only once.
Now even the recursion for the number of partitions of some identical items is rather complicated.
For partitions of multisets the counting amounts to solving the generalization of Project Euler problem 181 to arbitrary multisets.
Well, the problem has two aspects.
Firstly, the items can be arranged in any order. So for N items, there are N! permutations (assuming the items are treated as unique).
Secondly, you can envision the grouping as a bit flag between each item indicating a divide. There would be N-1 of these flags, so for a given permutation there would be 2^(N-1) possible groupings.
This means that for N items, there would be a total of N!*(2^(N-1)) groupings/permutations, which gets big very very fast.
In your example, the top four items are groupings of one permutation. The last item is a grouping of another permutation. Your items can be viewed as :
2 on 3 off 3 off 5
2 on 3 on 3 off 5
2 on 3 off 3 on 5
2 on 3 on 3 on 5
2 off 5 on 3 off 3
The permutations (the order of display) can be derived by looking at them like a tree, as mentioned by the other two. This would almost certainly involve recursion, such as here.
The grouping is independent of them in many ways. Once you have all the permutations, you can link them with the groupings if needed.
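The "bit flag between each item" idea for a single permutation can be sketched in Python like this. The function name contiguous_groupings is invented for the example; bit i - 1 of the mask plays the role of the on/off flag between item i - 1 and item i.

```python
def contiguous_groupings(items):
    """Return all 2^(N-1) ways to cut one ordering of items into groups."""
    n = len(items)
    if n == 0:
        return [[]]
    result = []
    for mask in range(1 << (n - 1)):       # one bit per gap between items
        groups, current = [], [items[0]]
        for i in range(1, n):
            if (mask >> (i - 1)) & 1:      # flag "on": divide here
                groups.append(current)
                current = [items[i]]
            else:                          # flag "off": same group continues
                current.append(items[i])
        groups.append(current)
        result.append(groups)
    return result

for groups in contiguous_groupings([2, 3, 3, 5]):
    print(" | ".join(" ".join(str(x) for x in g) for g in groups))
```

Running this over every permutation (for instance from itertools.permutations) gives the N!*(2^(N-1)) total mentioned above, duplicates included.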
Here is the code you need for this part of your problem:
def memoize(f):
    # Cache results so each A000041(J) is computed only once
    memo={}
    def helper(x):
        if x not in memo:
            memo[x]=f(x)
        return memo[x]
    return helper

@memoize
def A000041(n):
    # Number of integer partitions of n, via the pentagonal number recurrence
    if n == 0: return 1
    S = 0
    J = n-1
    k = 2
    while 0 <= J:
        T = A000041(J)
        S = S+T if k//2%2!=0 else S-T
        J -= k if k%2!=0 else k//2
        k += 1
    return S

print(A000041(100)) #the 100th number in this series, as an example
I quickly whipped up some code to do this. However, I left out separating every possible combination of the given list, because I wasn't sure it was actually needed, but it should be easy to add, if necessary.
Anyway, the code runs quite well for small amounts, but, as CodeByMoonlight already mentioned, the amount of possibilities gets really high really fast, so the runtime increases accordingly.
Anyway, here's the python code:
import time

def separate(toseparate):
    "Find every possible way to separate a given list."
    #The list of every possibility
    possibilities = []
    n = len(toseparate)
    #We can distribute n-1 separations in the given list, so iterate from 0 to n
    for i in xrange(n):
        #Create a copy of the list to avoid modifying the already existing list
        copy = list(toseparate)
        #A boolean list indicating where a separator is put. 'True' indicates a separator
        #and 'False', of course, no separator.
        #The list will contain i separators, the rest is filled with 'False'
        separators = [True]*i + [False]*(n-i-1)
        for j in xrange(len(separators)):
            #We insert the separators into our given list. The separators have to
            #be between two elements. The index between two elements is always
            #2*[index of the left element]+1.
            copy.insert(2*j+1, separators[j])
        #The first possibility is, of course, the one we just created
        possibilities.append(list(copy))
        #The following is a modification of the QuickPerm algorithm, which finds
        #all possible permutations of a given list. It was modified to only permutate
        #the spaces between two elements, so it finds every possibility to insert n
        #separators in the given list.
        m = len(separators)
        hi, lo = 1, 0
        p = [0]*m
        while hi < m:
            if p[hi] < hi:
                lo = (hi%2)*p[hi]
                copy[2*lo+1], copy[2*hi+1] = copy[2*hi+1], copy[2*lo+1]
                #Since the items are non-unique, some possibilities will show up more than once, so we
                #avoid this by checking first.
                if not copy in possibilities:
                    possibilities.append(list(copy))
                p[hi] += 1
                hi = 1
            else:
                p[hi] = 0
                hi += 1
    return possibilities

t1 = time.time()
separations = separate([2, 3, 3, 5])
print time.time()-t1
sepmap = {True:"|", False:""}
for a in separations:
    for b in a:
        if sepmap.has_key(b):
            print sepmap[b],
        else:
            print b,
    print "\n",
It's based on the QuickPerm algorithm, which you can read more about here: QuickPerm
Basically, my code generates a list containing n separations, inserts them into the given list and then finds all possible permutations of the separations in the list.
So, if we use your example we would get:
2 3 3 5
2 | 3 3 5
2 3 | 3 5
2 3 3 | 5
2 | 3 | 3 5
2 3 | 3 | 5
2 | 3 3 | 5
2 | 3 | 3 | 5
In 0.000154972076416 seconds.
However, I read through the description of the problem you are doing and I see how you are trying to solve this, but seeing how quickly the runtime increases I don't think that it would work as fast as you would expect. Remember that Project Euler's problems should solve in around a minute.
