We have some elements characterized by some key value.
We consider the elements in descending order of key values. So, if we have ten elements with key values 4, 5, 7, 10, 2, 8, 9, 10, 8.5, and 9, we sort the elements by their key values and consider elements with equal key values together.
As such, elements with equal key values, e.g. 10, will be considered together, followed by elements with key value 9, and so on. When an element is considered and passes a certain fitness function, it is removed from the list and is not considered any more.
Now we relax the restriction that only equal key values are considered together, and instead consider elements with approximately equal key values together. So, if in sorted order the second element is within 10% of the first, the two are considered together.
Now elements with key values 10, 10, 9, 9 are considered together. And provided one element with key value 9 is not removed here, it will have to be considered again along with 8.5.
The only way I can think of implementing the above scenario is something like this,
Sort the elements in descending order of key values.
For the first element in the order, take 10% as the allowable deviation, and find the elements that fall within this deviation window. So here we consider 10, 10, 9, 9 in this window.
If any of the elements passes the fitness function, remove it from the list.
Form the next window and repeat the cycle.
This is where my idea breaks down. How do I find the start of the next window? If the sorted values are 10, 10, 9, 9, 8.5, 8, ..., and 10, 10, 9, 9 have been considered in the first window, the next window should start with 9 and consist of 9, 8.5.
Is it always sufficient to start the next window with the last value of the previous window? I tried to find counterexamples, and none of them invalidated my conjecture. But what if both the 9's pass the fitness function and get removed from the list: which value starts the next window? The next one available in the sorted list?
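In code, the loop I have in mind looks roughly like this (a Python sketch; fitness is a placeholder predicate, and the restart rule follows my conjecture):

```python
def process(values, fitness, tolerance=0.10):
    """Windowing loop under the conjecture: the next window starts at the
    last value of the previous window, or at the next available value in
    the sorted list if that one was removed."""
    vals = sorted(values, reverse=True)
    i = 0
    while i < len(vals):
        lo = vals[i] * (1 - tolerance)
        j = i
        while j < len(vals) and vals[j] >= lo:
            j += 1                        # window is vals[i:j]
        window = vals[i:j]
        passed = [fitness(x) for x in window]
        kept = [x for x, p in zip(window, passed) if not p]
        vals[i:j] = kept                  # passing elements are gone for good
        if not kept:
            continue                      # next window: next available value
        if passed[-1]:                    # window's last value was removed:
            i += len(kept)                # start at the next available value
        elif len(window) == 1:
            i += 1                        # lone survivor: move on
        else:
            i += len(kept) - 1            # restart at the last surviving value
    return vals
```

Under this scheme an element that survives a window can be re-tested in the next one, which matches the 9 / 8.5 example above.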
So, my questions are,
Is the conjecture regarding starting the next window with the last value of the previous window (or the next one available in the sorted list if it gets removed) correct?
Is there a better algorithm for the whole process?
No, it's probably not correct to start the next window from the last value of the previous window.
Try starting from the last window's midpoint initially; then dynamically lower the upper edge as you iterate the lower edge down, to maintain an appropriate span for the window.
It's unclear whether the fitness function & "removal from the list" you describe constitute acceptance of ideal elements, or rejection, or what.
The ideal, correct semantics for your windowing may depend on an accurate specification and understanding of what that overall operation is, and your question is sorely lacking in that respect.
Basically I just need help with finding the max of this code.
set numberList to {3, 4, 5, 6, 7, 8, 9, 10}
set max to numberList
repeat with x in numberList
if x > max then set max to x
end repeat
display dialog max
The error that I am getting is:
"Can’t make {3, 4, 5, 6, 7, 8, 9, 10} into type number." number -1700 from {3, 4, 5, 6, 7, 8, 9, 10} to number
I'm glad @vadian went for the AppleScriptObjC method, because it allows me to geek out on doing it the vanilla AppleScript way, just like you tried (and came very close to achieving). Let me show you how close you were. Here's your script, where I've just switched out an identifier: I like denoting a generic list of stuff by L, and will do so consistently in the various improvements that follow on from this initial naive algorithm.
Also, let's make the list a bit more interesting than a sequence of numbers with increasing size.
I've labelled problem lines with a number that refers to footnotes immediately following:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
set max to L --(1)
repeat with x in L
if x > max then set max to x
end repeat
display dialog max --(2)
(1) Setting max to the initial list L is an idea we'll look at later, but it won't work here, because it's not possible to compare a number to a list of numbers and say which of the two has the greater numeric value. Therefore, you want your initial value of max to be a number, but it can't be a number that is, itself, greater than the maximum value contained in your list. This is why the common mistake of thinking 0 is a suitable initial value will catch you out (consider a list of negative numbers). The only initial values guaranteed to be safe, for a list of any length, are values taken from the list itself. In other words, choose any value from the list L.
(2) Get away from the novice's inclination to visualise results in dialog boxes; you'll be thankful you did. Instead, use the Results (or, even more useful, the Replies) pane at the bottom of your Script Editor window. If you're using Script Debugger, its events panel and results panel try hard to improve on this, but sometimes simpler is better.
And now the working version:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
set max to some item in L
repeat with x in L
if x > max then set max to x's contents
end repeat
return max
I spent a lot of time as a teenager with no friends learning the joys of what can be done by grouping things together in lists. In mathematics, lists are called sets, and there's a field of study called Set Theory, which it turns out can be used exclusively to define every mathematical principle that exists or ever will exist. That's how amazing lists are.
Finding the maximum value in a list is one of the first, and incredibly important, things one either learns about or discovers for oneself. It's a type of mapping that operates on a set, picking out the largest element, which is fundamental to defining a number system that now has a means of putting numbers in order. I first wrote such a procedure in Pascal, and have since written it in a dozen other languages, but AppleScript is so much more fun to do this with than some other languages, because you can get it to do things that it probably wasn't meant to be able to do but somehow does. AppleScript lists are also an odd construct, as they are recursively, and infinitely, self-referential, which turns out to be an interesting property.
I won't explore what referenced values are here, but it's why my script refers to x's contents rather than just x (thanks @vadian), as contents is what dereferences (i.e. evaluates) a referenced item. (Don't worry about it for now, but learn about it later on, as it can be a very useful concept if you know how to use it.)
It's also amusing that, unless dealing with really big lists, vanilla AppleScript typically out-performs AppleScriptObjC speed-wise, since the latter suffers the extra overhead of bridging to Objective-C. That cost is eventually recouped when dealing with hundreds or thousands of items per list, where AppleScriptObjC will, without question, wipe the floor with AppleScript in a speed test. But in many everyday situations handling lists of under a hundred items, the tortoise beats the hare quite unexpectedly and satisfyingly, provided you know how to structure your script to make it do so.
So... The rest of what will follow is for educational fun...
I'll start by taking your initial idea of setting max to the initial list, and perform a process on this list over and over that ends up getting us the maximum value. But I'm not going to bother to define a separate max variable, as we can just perform the process on the original list, L. Take a look at this slight variation on your initial technique:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
repeat while L's length > 1
if L's first item > L's last item then set ¬
L's last item to L's first item
set L to the rest of L
end repeat
return L's first item
There are two properties of any list in AppleScript that the above script makes use of: length tells us how many items the list contains, and rest is all but the first item of the list.
In this method, we compare the item at position 1 to the item at the last position. The winner (the larger value) gets to sit in the last position. Then the list is shortened (or reduced) by taking the rest of it, which removes the first item each time, until we're left with a one-item list. That item is the survivor from the original list, and so it's the maximum value, which we retrieve and return.
This technique is a basic implementation of a process to reduce or fold a list bit by bit, often acting to reduce the complexity of a list (but it doesn't have to), so that you start with an object composed of many items that might be structured in a complex way, but after repeated folding, what you're most often left with is a simpler object, like a single, unary value such as a number.
It might not appear so, but this is a really powerful technique that can form the basis of lots of other things we like to do with lists, such as counting the number of items it contains, or changing the items by using the same rules applied to each, e.g. doubling all items (this is a type of process called a map), or removing all odd-valued items but keeping all even-valued ones (this is a filter).
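For comparison (outside AppleScript), the same fold idea in Python, where `functools.reduce` repeatedly combines two items into one:

```python
from functools import reduce

L = [6, 3, 7, 9, 4, 10, 2, 10, 8, 2]

# Folding with "keep the larger of each pair" yields the maximum...
maximum = reduce(lambda a, b: a if a > b else b, L)

# ...and the same skeleton counts items, by folding with "+1 per item".
count = reduce(lambda acc, _: acc + 1, L, 0)

print(maximum, count)  # 10 10
```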
More to follow tomorrow...
You're given a bit string that consists of 1 <= n <= 32 bits.
You are also given a sequence of changes that invert some of the bits. If the original string is 001011, and the change is "3", then the changed bit string would be 000011. (The 3rd bit from the left was flipped.)
I have to find after each change, the length of the longest substring in which each bit is the same. Therefore, for 000011, the answer would be 4.
I think brute force would just be a sliding window that starts at size of the string and shrinks until the first instance where all the bits in the window are the same.
How would that be altered for a dynamic programming solution?
You can solve this by maintaining a list of indices at which the bit flips. You start by creating that list: shift the sequence by one bit (losing the one on the end), and compare to the original. Any mismatch here is a bit flip. Make a list of those indices:
001011
01011
-234--
In this case, you have bit flips after locations 2, 3, and 4.
Now, you need a simple function to process your change operation. This is very simple: for a change of bit n, you toggle whether indices n-1 and n are in the list: if an index is not in the list, add it; if it is, remove it. In the case of changing bit 3, both 2 and 3 are in the list, so you remove them:
---4--
Any time you want to check the longest substring, you merely check adjacent indices for the largest difference. Include 0 and the string length as endpoints. Thus, when you have the list [0, 2, 3, 4, 6], you have a maximum difference of 2, at 2-0 and 6-4. After the change, with the list [0, 4, 6], you have a maximum difference of 4, at 4-0.
If you have a large list with many indices, you can simply maintain differences, altering only the adjacent intervals affected by the single change.
This should get you going; I leave the implementation details to the student. :-)
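That said, a compact Python sketch of the index-list bookkeeping (bits numbered 1-based from the left, matching the example above; the names are mine):

```python
def boundaries(bits):
    """Split points where adjacent bits differ (point p sits after bit p)."""
    return {p for p in range(1, len(bits)) if bits[p] != bits[p - 1]}

def change(cuts, n_bits, k):
    """Flip bit k: toggle split points k-1 and k. The fixed endpoints
    0 and n_bits are never stored in the set."""
    for p in (k - 1, k):
        if 0 < p < n_bits:
            if p in cuts:
                cuts.remove(p)
            else:
                cuts.add(p)

def longest_run(cuts, n_bits):
    """Largest gap between consecutive split points, endpoints included."""
    pts = sorted(cuts | {0, n_bits})
    return max(b - a for a, b in zip(pts, pts[1:]))
```

For "001011" the split points are {2, 3, 4}; flipping bit 3 removes 2 and 3, leaving {4}, and the longest run becomes 4.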
I have several sets of pairs like:
a: ([1, 2], [4,5])
b: ([1, 3])
c: ([4, 7], [1, 8])
d: ([9, 7], [1, 5])
...
Where no two pairs are identical, and the elements of no pair are identical. Each set can contain many pairs. There is a smallish number of elements (around 200).
From each set I take exactly one pair. I want to choose the pairs in such a way that the number of distinct elements used is as small as possible.
The problem is too large to try every combination, is there any algorithm or heuristic that might help me find the optimal (or a close guess)?
The problem has a definite NP-complete feel about it. So here are two greedy approaches that may produce reasonable approximate answers. To figure out which is better, you should implement both and compare.
The first is bottom up. Give each set a value of 2 if it has a pair fully selected from it, and (n+1)/n if it has n pairs partially selected from it. At each round, give each element a value for being selected, which is the sum of the amounts by which selecting it would increase the values of all of the sets. In each round, select the element with the highest value, then update the values of all the sets, update the values of all remaining elements, and continue.
This will pick elements that look like they are making progress towards covering all sets.
The second is top down. Start with all elements selected, and give each set a value of 1/n, where n is the number of selected pairs. Elements that are required for all pairs in a given set are put into the final answer. Of the remaining elements, find the one whose removal increases the value the least, and remove it.
The idea is that we start with too big a cover and repetitively remove the one which seems least important for covering all the sets. What we are left with is hopefully minimal.
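A rough sketch of a bottom-up greedy in this spirit, though with a simplified scoring of my own (completing a pair for an unsatisfied set counts far more than starting one), rather than the exact valuation described above:

```python
def greedy_pick(sets):
    """sets: list of lists of pairs (2-tuples). Returns a set of chosen
    elements such that every input set has at least one pair fully chosen."""
    chosen = set()
    unsatisfied = list(range(len(sets)))
    while unsatisfied:
        # Score every element appearing in a pair of an unsatisfied set.
        scores = {}
        for i in unsatisfied:
            for pair in sets[i]:
                for e in pair:
                    if e in chosen:
                        continue
                    other = pair[0] if pair[1] == e else pair[1]
                    # Completing a pair counts much more than starting one.
                    scores[e] = scores.get(e, 0) + (100 if other in chosen else 1)
        best = max(scores, key=scores.get)
        chosen.add(best)
        unsatisfied = [i for i in unsatisfied
                       if not any(p[0] in chosen and p[1] in chosen
                                  for p in sets[i])]
    return chosen
```

On the small example above (sets a-d) this happens to find a 5-element solution, which is optimal there; in general it is only a heuristic.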
Let's say I have an array of ~20-100 integers, for example [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (actually numbers more like [106511349 , 173316561, ...], all nonnegative 64-bit integers under 2^63, but for demonstration purposes let's use these).
And many (~50,000) smaller arrays of usually 1-20 terms to match or not match:
1=[2, 3, 8, 20]
2=[2, 3, NOT 8]
3=[2, 8, NOT 16]
4=[2, 8, NOT 16] (there will be duplicates with different list IDs)
I need to find which of these are subsets of the array being tested. A matching list must have all of the positive matches, and none of the negative ones. So for this small example, I would need to get back something like [3, 4]. List 1 fails to match because it requires 20, and list 2 fails to match because it has NOT 8. The NOT can easily be represented by using the high bit/making the number negative in those cases.
I need to do this quickly, up to 10,000 times per second. The small arrays are "fixed" (they change infrequently, like once every few seconds), while the large array changes per data item to be scanned (so 10,000 different large arrays per second).
This has become a bit of a bottleneck, so I'm looking into ways to optimize it.
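As a concrete baseline, the naive per-filter check is just set membership (a Python sketch; negative terms encode NOT, which assumes 0 itself is never negated):

```python
def matching_filters(big, filters):
    """big: iterable of ints to scan. filters: dict mapping filter id to a
    list of terms, where a negative term -t means 'must NOT contain t'."""
    present = set(big)
    return [fid for fid, terms in filters.items()
            if all((t in present) if t >= 0 else (-t not in present)
                   for t in terms)]
```

For the example above this returns [3, 4]; the point of the question is doing better than this O(filters × terms) scan per input.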
I'm not sure the best data structures or ways to represent this. One solution would be to turn it around and see what small lists we even need to consider:
2=[1, 2, 3, 4]
3=[1, 2]
8=[1, 2, 3, 4]
16=[3, 4]
20=[1]
Then we'd build up a list of lists to check, and do the full subset matching on these. However, certain terms (often the more frequent ones) are going to end up in many of the lists, so there's not much of an actual win here.
I was wondering if anyone is aware of a better algorithm for solving this sort of problem?
You could try to build a tree from the smaller arrays, since they change less frequently, such that each subtree tries to halve the number of small arrays left.
For example, do frequency analysis on numbers in the smaller arrays. Find which number is found in closest to half of the smaller arrays. Make that the first check in the tree. In your example that would be '3' since it occurs in half the small arrays. Now that's the head node in the tree. Now put all the small lists that contain 3 to the left subtree and all the other lists to the right subtree. Now repeat this process recursively on each subtree. Then when a large array comes in, reverse index it, and then traverse the subtree to get the lists.
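A sketch of that splitting idea in Python (my own simplification: only positive terms drive the splits, and NOT-terms are checked during final verification):

```python
from collections import Counter

def build_tree(fids, filters):
    """Recursively split filter ids on the positive term that occurs in
    closest to half of them; leaves are small lists of filter ids."""
    if len(fids) <= 2:
        return list(fids)
    counts = Counter(t for f in fids for t in set(filters[f]) if t >= 0)
    half = len(fids) / 2
    term = min(counts, key=lambda t: abs(counts[t] - half))
    with_t = [f for f in fids if term in filters[f]]
    without_t = [f for f in fids if term not in filters[f]]
    if not with_t or not without_t:
        return list(fids)                 # no useful split possible
    return (term, build_tree(with_t, filters), build_tree(without_t, filters))

def candidates(tree, present):
    """Collect filters that might match: the 'contains term' branch is
    pruned whenever the term is absent from the input."""
    if isinstance(tree, list):
        return list(tree)
    term, with_t, without_t = tree
    out = candidates(without_t, present)  # these don't require the term
    if term in present:
        out += candidates(with_t, present)
    return out
```

Candidates still need the full positive/negative check afterwards; the tree only prunes filters whose required terms are missing.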
You did not state which of your arrays are sorted - if any.
Since your data is not that big, I would use a hash map to store the entries of the source set (the one with ~20-100 integers). That basically lets you test whether an integer is present in O(1).
Then, given that 50,000 (arrays) × 20 (terms each) × 8 (bytes per term) = 8 megabytes (plus hash-map overhead) does not seem large for most systems either, I would use another hash map to store already-tested arrays. This way you don't have to re-test duplicates.
I realize this may be less satisfying from a CS point of view, but if you're doing a huge number of tiny tasks that don't affect each other, you might want to consider parallelizing them (multithreading). 10,000 tasks per second, comparing a different array in each task, should fit the bill; you don't give any details about what else you're doing (e.g., where all these arrays are coming from), but it's conceivable that multithreading could improve your throughput by a large factor.
First, do what you were suggesting: make a hashmap from each input integer to the IDs of the filter arrays it occurs in. That lets you say "input #27 is in these 400 filters", and toss those 400 into a sorted set. You then have to intersect the sorted sets for each input integer.
Optional: make a second hashmap from each input integer to its frequency in the set of filters. When an input comes in, sort its integers using the second hashmap. Then start with the least common input integer, so you have less overall work to do at each step. Also compute the frequencies for the "not" cases, so you get the most bang for your buck at each step.
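Putting the hashmap ideas together, a sketch (simplified: it gathers candidate filter IDs through the inverted index, then verifies each candidate once, rather than doing full sorted-set intersections):

```python
from collections import defaultdict

def build_index(filters):
    """Inverted index: positive term -> ids of filters requiring it.
    Filters with no positive terms at all are always candidates."""
    index = defaultdict(set)
    always = set()
    for fid, terms in filters.items():
        pos = [t for t in terms if t >= 0]
        if pos:
            for t in pos:
                index[t].add(fid)
        else:
            always.add(fid)
    return index, always

def match(big, filters, index, always):
    present = set(big)
    cand = set(always)
    for t in present:
        cand |= index.get(t, set())
    # Verify each candidate: all positives present, no negatives present.
    return sorted(fid for fid in cand
                  if all((t in present) if t >= 0 else (-t not in present)
                         for t in filters[fid]))
```

The index is rebuilt only when the filters change, so the per-input cost is dominated by candidate verification.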
Finally: this could be pretty easily made into a parallel programming problem; if it's not fast enough on one machine, it seems you could put more machines on it pretty easily, if whatever it's returning is useful enough.
I tried looking through my lovely textbook (to no avail) and online. According to the book I'm working from, by Cormen, we are to use the first element in the array as the pivot. I'm just stuck on what to do since the first element happens to be 1.
The array looks as follows:
[1, 16, 2, 3, 14, 5, 12, 7, 10, 8, 9, 17, 19, 21, 23, 26, 27]
Again, the problem with the algorithm in the book is that it chooses the first element as the pivot. Once we have compared 1 to all the other elements and found that no other element is smaller than or equal to it, we are supposed to swap the pivot with the middle element, so that the subarray on the left is smaller than the pivot and the subarray on the right is greater. But if our pivot IS 1, there is nothing to swap with. Really confused; any help would be great. The title of the book is Introduction to Algorithms, 3rd Edition, in case someone out there is familiar with it.
No different from the normal case: just treat the left part as empty and quicksort the right part, i.e. the subarray after the pivot 1.
This is not a special case. In fact, when the input is sorted, naive quicksort degenerates into an O(N^2) sorting algorithm. Quoting Wikipedia:
In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by R. Sedgewick).
You can use something known as the median-of-three rule: pick the first, middle, and last values of the array, and take the median of those three as the pivot. This doesn't guarantee that you will get the best pivot, but it lowers the chances of getting a really bad one.
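A sketch of that rule grafted onto a simple Lomuto-style quicksort (Python; the variable names are mine):

```python
def median_of_three(a, lo, hi):
    """Index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Move the median-of-three pivot to the end, then Lomuto partition.
    p = median_of_three(a, lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

On the mostly sorted array from the question, the median-of-three pivot avoids the degenerate all-on-one-side partitions that a first-element pivot produces.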