Basically I just need help with finding the max of this code.
set numberList to {3, 4, 5, 6, 7, 8, 9, 10}
set max to numberList
repeat with x in numberList
if x > max then set max to x
end repeat
display dialog max
The error that I am getting is:
"Can’t make {3, 4, 5, 6, 7, 8, 9, 10} into type number." number -1700 from {3, 4, 5, 6, 7, 8, 9, 10} to number
I'm glad @vadian went for the AppleScriptObjC method, because it lets me geek out on doing it the vanilla AppleScript way, just as you tried (and came very close to achieving). Let me show you how close you were. Here's your script, with one identifier switched: I like denoting a generic list of stuff by L, and I'll do so consistently in the improvements that follow on from this initial naive algorithm.
Also, let's make the list a bit more interesting than a sequence of numbers with increasing size.
I've labelled problem lines with a number that refers to footnotes immediately following:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
set max to L --(1)
repeat with x in L
    if x > max then set max to x
end repeat
display dialog max --(2)
¹ Setting max to the initial list, L, is an idea we'll look at later, but it won't work here, because AppleScript can't compare a number to a list of numbers and tell you which of the two has the greater numeric value (that is the -1700 coercion error you're seeing). Therefore, you want your initial value of max to be a number, but it can't be a number that is, itself, greater than the maximum value contained in your list. That's why the common mistake of thinking 0 is a suitable initial value will catch you out (consider a list of negative numbers). The only initial values for which no list can be constructed that makes this algorithm return an incorrect result are the values in the list the algorithm operates upon. In other words, choose any value from the list L.
² Get away from the novice's inclination to visualise results in dialog boxes. You'll be thankful you did. Instead, use the Results (or, even more useful, the Replies) pane at the bottom of your Script Editor window. If you're using Script Debugger, it's the one oversight that their events panel and their results panel tried so hard to improve on, but sometimes simpler is better.
And now the working version:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
set max to some item in L
repeat with x in L
    if x > max then set max to x's contents
end repeat
return max
I spent a lot of time as a teenager with no friends learning the joys of what can be done by grouping things together in lists. In mathematics, the closest relative of a list is a set (strictly, a set is unordered and has no duplicates), and there's a field of study called Set Theory, which it turns out can serve as a foundation for defining pretty much all of mathematics. That's how amazing lists are.
Finding the maximum value in a list is one of the first, and most important, things one either learns about or discovers for oneself. It's a type of mapping that operates on a set, picking out the largest element, which is fundamental to defining a number system that has a means of putting numbers in order. I first wrote such a procedure in Pascal, and have since written it in a dozen other languages, but AppleScript is so much more fun to do this with than some other languages, because you can get it to do things that it probably wasn't meant to be able to do but somehow does. AppleScript lists are also an odd construct, as they are recursively, and infinitely, self-referential, which turns out to be an interesting property.
I won't explore what referenced values are, but it's why my script refers to x's contents rather than just x (thanks @vadian), as contents is what dereferences (i.e. evaluates) a referenced item. (Don't worry about it for now, but learn about it later on, as it can be a very useful concept if you know how to make use of it.)
It's also amusing that, unless dealing with lists that are really big, vanilla AppleScript typically out-performs AppleScriptObjC speedwise, since the latter initially suffers from the extra overhead of bridging to Objective-C. The cost eventually gets recouped when dealing with hundreds or thousands of items per list, at which point AppleScriptObjC will, without question, wipe the floor with AppleScript in a speed test. But for many everyday situations handling lists with under a hundred items, the tortoise beats the hare quite unexpectedly and satisfyingly, though you need to know how to structure your script to get it to do this.
So... The rest of what will follow is for educational fun...
I'll start by taking your initial idea of setting max to the initial list, and perform a process on this list over and over that ends up getting us the maximum value. But I'm not going to bother to define a separate max variable, as we can just perform the process on the original list, L. Take a look at this slight variation on your initial technique:
set L to {6, 3, 7, 9, 4, 10, 2, 10, 8, 2}
repeat while L's length > 1
    if L's first item > L's last item then set ¬
        L's last item to L's first item
    set L to the rest of L
end repeat
return L's first item
There are two properties of any list in AppleScript that the above script makes use of: length tells us how many items the list contains; and rest is all but the first item in the list.
In this method, we compare the item at position 1 to the item at the last position. The winner (the largest value) gets to sit in the last position. Then the list is shortened (or reduced) by getting the rest of it, which removes the first item each time, until we're left with just a one-item list. That item is the survivor from the original list, and so that's the maximum value we retrieve and return.
This technique is a basic implementation of a process to reduce or fold a list bit by bit, often acting to reduce the complexity of a list (but it doesn't have to), so that you start with an object composed of many items that might be structured in a complex way, but after repeated folding, what you're most often left with is a simpler object, like a single, unary value such as a number.
It might not appear so, but this is a really powerful technique that can form the basis of lots of other things we like to do with lists, such as counting the number of items it contains, or changing the items by using the same rules applied to each, e.g. doubling all items (this is a type of process called a map), or removing all odd-valued items but keeping all even-valued ones (this is a filter).
More to follow tomorrow...
Related
You're given a bit string that consists of 1 <= n <= 32 bits.
You are also given a sequence of changes that invert some of the bits. If the original string is 001011, and the change is "3", then the changed bit string would be 000011. (The 3rd bit from the left was flipped.)
I have to find after each change, the length of the longest substring in which each bit is the same. Therefore, for 000011, the answer would be 4.
I think brute force would just be a sliding window that starts at size of the string and shrinks until the first instance where all the bits in the window are the same.
How would that be altered for a dynamic programming solution?
You can solve this by maintaining a list of indices at which the bit flips. You start by creating that list: shift the sequence by one bit (losing the one on the end), and compare to the original. Any mismatch here is a bit flip. Make a list of those indices:
001011
01011
-234--
In this case, you have bit flips after locations 2, 3, and 4.
Now, you need to develop a simple function to process your change operation. This is very simple: for a change of bit n, you toggle the membership of each of the indices n-1 and n: if an index is not in the list, add it; if it is in the list, remove it. (Index 0 and the string length are fixed endpoints and are never toggled.) In the case of changing bit 3, both 2 and 3 are in the list, so you now remove them:
---4--
Any time you want to check the longest substring, you need merely check adjacent indices for the largest difference. Include 0 and the string length as endpoints. Thus, when you have the list [0, 2, 3, 4, 6], you have a maximum difference of 2 at 2-0 and 6-4. After the change, with the list [0, 4, 6], you have the maximum difference of 4 at 4-0.
If you have a large list with many indices, you can simply maintain differences, altering only the adjacent intervals affected by the single change.
This should get you going; I leave the implementation details to the student. :-)
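For readers who want something concrete, here is a minimal Python sketch of the boundary-list idea described above. The function names (`flip_points`, `apply_change`, `longest_run`) are made up for illustration, positions are counted from the left starting at 1 as in the worked example, and a set is assumed as the container since the string has at most 32 bits.

```python
def flip_points(bits):
    """Indices i (1 <= i < len(bits)) after which the bit value changes."""
    return {i for i in range(1, len(bits)) if bits[i - 1] != bits[i]}


def apply_change(points, n, length):
    """Flip bit n (1-based, from the left) by toggling boundaries n-1 and n."""
    for boundary in (n - 1, n):
        if 0 < boundary < length:  # 0 and length are fixed endpoints, never toggled
            points.symmetric_difference_update({boundary})


def longest_run(points, length):
    """Largest gap between adjacent boundaries, with 0 and length as endpoints."""
    edges = [0] + sorted(points) + [length]
    return max(b - a for a, b in zip(edges, edges[1:]))


bits = "001011"
points = flip_points(bits)              # {2, 3, 4}
before = longest_run(points, len(bits))
apply_change(points, 3, len(bits))      # boundaries 2 and 3 are removed
after = longest_run(points, len(bits))
print(before, after)                    # 2 4
```

Re-sorting on every query is fine at this size; for much longer strings one would keep the boundaries in an ordered structure and update only the affected gaps.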
Let's say I have an array of ~20-100 integers, for example [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (actually numbers more like [106511349 , 173316561, ...], all nonnegative 64-bit integers under 2^63, but for demonstration purposes let's use these).
And many (~50,000) smaller arrays of usually 1-20 terms to match or not match:
1=[2, 3, 8, 20]
2=[2, 3, NOT 8]
3=[2, 8, NOT 16]
4=[2, 8, NOT 16] (there will be duplicates with different list IDs)
I need to find which of these are subsets of the array being tested. A matching list must have all of the positive matches, and none of the negative ones. So for this small example, I would need to get back something like [3, 4]. List 1 fails to match because it requires 20, and list 2 fails to match because it has NOT 8. The NOT can easily be represented by using the high bit/making the number negative in those cases.
I need to do this quickly, up to 10,000 times per second. The small arrays are "fixed" (they change infrequently, like once every few seconds), while the large array is done per data item to be scanned (so 10,000 different large arrays per second).
This has become a bit of a bottleneck, so I'm looking into ways to optimize it.
I'm not sure the best data structures or ways to represent this. One solution would be to turn it around and see what small lists we even need to consider:
2=[1, 2, 3, 4]
3=[1, 2]
8=[1, 2, 3, 4]
16=[3, 4]
20=[1]
Then we'd build up a list of lists to check, and do the full subset matching on these. However, certain terms (often the more frequent ones) are going to end up in many of the lists, so there's not much of an actual win here.
I was wondering if anyone is aware of a better algorithm for solving this sort of problem?
You could try to build a tree from the smaller arrays, since they change less frequently, such that each level of the tree tries to halve the number of small arrays left.
For example, do frequency analysis on numbers in the smaller arrays. Find which number is found in closest to half of the smaller arrays. Make that the first check in the tree. In your example that would be '3' since it occurs in half the small arrays. Now that's the head node in the tree. Now put all the small lists that contain 3 to the left subtree and all the other lists to the right subtree. Now repeat this process recursively on each subtree. Then when a large array comes in, reverse index it, and then traverse the subtree to get the lists.
You did not state which of your arrays are sorted - if any.
Since your data is not that big, I would use a hash-map to store the entries of the source set (the one with ~20-100 integers). That would basically let you test whether an integer is present in O(1).
Then, given that 50,000(arrays) * 20(terms each) * 8(bytes per term) = 8 megabytes + (hash map overhead), does not seem large either for most systems, I would use another hash-map to store tested arrays. This way you don't have to re-test duplicates.
I realize this may be less satisfying from a CS point of view, but if you're doing a huge number of tiny tasks that don't affect each other, you might want to consider parallelizing them (multithreading). 10,000 tasks per second, comparing a different array in each task, should fit the bill; you don't give any details about what else you're doing (e.g., where all these arrays are coming from), but it's conceivable that multithreading could improve your throughput by a large factor.
First, do what you were suggesting; make a hashmap from input integer to the IDs of the filter arrays it exists in. That lets you say "input #27 is in these 400 filters", and toss those 400 into a sorted set. You've then gotta do an intersection of the sorted sets for each one.
Optional: make a second hashmap from each input integer to its frequency in the set of filters. When an input comes in, sort it using the second hashmap. Then take the least common input integer and start with it, so you have less overall work to do on each step. Also compute the frequencies for the "not" cases, so you basically get the most bang for your buck on each step.
Finally: this could be pretty easily made into a parallel programming problem; if it's not fast enough on one machine, it seems you could put more machines on it pretty easily, if whatever it's returning is useful enough.
How can I form orderings of, say, 10 questions so that each student (10 students in total) gets a unique combination?
I don't want to use factorial.
You can use a circular queue data structure.
Now you can cut this at any point you like, and then it will give you a unique sequence.
For example, if you cut it at the point between 2 and 3 and then iterate over your queue, you will get:
3, 4, 5, 6, 7, 8, 9, 10, 1, 2
So you need to implement a circular queue, then cut it at 10 different points (after 1, after 2 [shown in picture 2], after 3, ...).
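A small Python sketch of the cutting idea, assuming plain list slicing stands in for the circular queue (the helper name `rotations` is made up here):

```python
def rotations(items):
    """Every ordering obtained by cutting the circular sequence at one point."""
    return [items[k:] + items[:k] for k in range(len(items))]


questions = list(range(1, 11))
papers = rotations(questions)
print(papers[2])  # [3, 4, 5, 6, 7, 8, 9, 10, 1, 2]
```

Each of the 10 cut points yields a different ordering, so 10 students can each receive one.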
There are 3,628,800 different permutations of 10 items taken 10 at a time.
If you only need 10 of them you could start with an array that has the values 1-10 in it. Then shuffle the array. That becomes your first permutation. Shuffle the array again and check to see that you haven't already generated that permutation. Repeat that process: shuffle, check, save, until you have 10 unique permutations.
It's highly unlikely (although possible) that you'll generate a duplicate permutation in only 10 tries.
The likelihood that you generate a duplicate increases as you generate more permutations, reaching about 50% by the time you've generated roughly 2,200 (the birthday-problem estimate, about 1.18 × √3,628,800). But if you just want a few hundred or fewer, then this method will do it for you pretty quickly.
The proposed circular queue technique works, too, and has the benefit of simplicity, but the resulting sequences are simply rotations of the original order, and it can't produce more than 10 without a shuffle. The technique I suggest will produce more "random" looking orderings.
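The shuffle-check-save loop might look like this in Python. It is only a sketch: `unique_orderings` is a hypothetical helper name, and the standard library's `random.shuffle` is assumed to be an acceptable source of randomness.

```python
import random


def unique_orderings(items, count, rng=random):
    """Shuffle repeatedly, keeping each ordering not seen before.

    Loops forever if count exceeds the number of distinct permutations.
    """
    seen = set()
    while len(seen) < count:
        candidate = list(items)
        rng.shuffle(candidate)
        seen.add(tuple(candidate))  # the set silently drops duplicates
    return [list(p) for p in seen]


papers = unique_orderings(range(1, 11), 10)
for paper in papers:
    print(paper)
```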
We have some elements characterized by some key value.
We consider the elements in descending order of key values. So, if we have ten elements with key values, 4, 5, 7, 10, 2, 8, 9, 10, 8.5, 9, we sort the elements by their key values, and consider the elements with equal key values together.
As such, elements with equal key values, e.g. 10, will be considered together, followed by elements with key values 9, and so on. When an element is considered, and it passes a certain fitness function, it is removed from the list and is not considered any more.
Now we relax the restriction of having equal key values to be considered together a little, and consider the elements with approximately equal key values together. So, when we say that in a sorted order the second element is within 10% of the first one, they are to be considered together.
So now the elements with key values 10, 10, 9, 9 are to be considered together. And, provided an element with key value 9 is not removed here, it will have to be considered again with 8.5.
The only way I can think of implementing the above scenario is something like this,
Sort the elements in descending order of key values.
For the first element in the order, find 10% as allowable deviation. Find elements which fall within this deviation window. So, here we consider, 10, 10, 9, 9, in this window.
If any of the elements passes the fitness function, remove it from the list.
Form the next window and repeat the cycle.
This is where my idea gets boggled. How do I form the start of the next window? If the sorted values are 10, 10, 9, 9, 8.5, 8 ..., and 10, 10, 9, 9 have been considered in the first window, the next window should start with 9 and consist of 9, 8.5.
Is it always sufficient to start the next window with the last value of the previous window? I tried some counterexamples and none of them invalidated my conjecture. But what if both the 9's pass the fitness function and get removed from the list? Which value starts the next window then? The next one available in the sorted list?
So, my questions are,
Is the conjecture about starting the next window with the last value of the previous window (or the next one available in the list, if that value gets removed) correct?
Is there a better algorithm for the whole process?
No, it's probably less than correct to start the window from the last value of the previous window.
Try starting from the last window's midpoint initially; then dynamically lower the upper edge, as you iterate the lower edge down, to maintain an appropriate 'span' for the window.
It's unclear whether the fitness function & "removal from the list" you describe constitute acceptance of ideal elements, or rejection, or what.
The ideal, correct semantics for your windowing may depend on an accurate specification and understanding of what that overall operation is, and your question was sorely lacking in that.
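For reference, forming a single window over the sorted keys from the question can be sketched in Python as below. This covers only the "within 10% of the first element" step and takes no position on where the next window should start. The name `window_from` is hypothetical, and positive key values are assumed.

```python
def window_from(keys_desc, start, tolerance=0.10):
    """Keys from index `start` onward that lie within `tolerance` of keys_desc[start].

    keys_desc must be sorted in descending order and contain positive values.
    """
    floor = keys_desc[start] * (1 - tolerance)
    end = start
    while end < len(keys_desc) and keys_desc[end] >= floor:
        end += 1
    return keys_desc[start:end]


keys = sorted([4, 5, 7, 10, 2, 8, 9, 10, 8.5, 9], reverse=True)
first = window_from(keys, 0)               # window anchored at 10
second = window_from(keys, keys.index(9))  # window anchored at the first 9
print(first, second)
```

With the question's data, the first window holds 10, 10, 9, 9 and a window anchored at 9 holds 9, 9, 8.5, matching the windows described in the question.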
I'm currently implementing an algorithm where one particular step requires me to calculate subsets in the following way.
Imagine I have sets (possibly millions of them) of integers, where each set could potentially contain around 1,000 elements:
Set1: [1, 3, 7]
Set2: [1, 5, 8, 10]
Set3: [1, 3, 11, 14, 15]
...,
Set1000000: [1, 7, 10, 19]
Imagine a particular input set:
InputSet: [1, 7]
I now want to quickly calculate which sets this InputSet is a subset of. In this particular case, it should return Set1 and Set1000000.
Now, brute-forcing it takes too much time. I could also parallelise via Map/Reduce, but I'm looking for a more intelligent solution. Also, to a certain extent, it should be memory-efficient. I already optimised the calculation by making use of BloomFilters to quickly eliminate sets of which the input set could never be a subset.
Any smart technique I'm missing out on?
Thanks!
Well, it seems that the bottleneck is the number of sets, so instead of finding a set by iterating over all of them, you could improve performance by mapping from elements to all the sets containing them, and returning the sets that contain all the elements you searched for.
This is very similar to what is done in an AND query when searching the inverted index in the field of information retrieval.
In your example, you will have:
1 -> [set1, set2, set3, ..., set1000000]
3 -> [set1, set3]
5 -> [set2]
7 -> [set1, set1000000]
8 -> [set2]
...
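A minimal Python sketch of such an inverted index and its AND query, using the four example sets from the question. The function names are invented for illustration, and Python sets stand in for the posting lists.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the names of all sets that contain it."""
    index = defaultdict(set)
    for name, members in sets.items():
        for member in members:
            index[member].add(name)
    return index


def supersets_of(query, index):
    """Names of the sets that contain every element of query (an AND query)."""
    postings = sorted((index.get(x, set()) for x in query), key=len)
    if not postings:
        return set()
    result = set(postings[0])        # start from the rarest element
    for posting in postings[1:]:
        result &= posting
        if not result:               # nothing left to intersect
            break
    return result


sets = {
    "Set1": [1, 3, 7],
    "Set2": [1, 5, 8, 10],
    "Set3": [1, 3, 11, 14, 15],
    "Set1000000": [1, 7, 10, 19],
}
index = build_inverted_index(sets)
found = supersets_of([1, 7], index)
print(sorted(found))  # ['Set1', 'Set1000000']
```

Starting with the shortest posting list keeps the intermediate result small, which matters when a common element such as 1 appears in nearly every set.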
EDIT:
In an inverted index in IR, to save space we sometimes use d-gaps, meaning we store the offset between consecutive documents and not the actual number. For example, [2,5,10] will become [2,3,5]. Doing so, and using delta encoding to represent the numbers, tends to help a lot when it comes to space.
(Of course there is also a downside: you need to read the entire list in order to find out whether a specific set/document is in it, and cannot use binary search, but it is sometimes worth it, especially if it is the difference between fitting the index into RAM or not.)
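The d-gap transformation itself is tiny. A Python sketch, with hypothetical helper names and assuming the IDs are sorted in ascending order:

```python
from itertools import accumulate


def to_gaps(sorted_ids):
    """Store each ID as its distance from the previous one (the first from 0)."""
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]


def from_gaps(gaps):
    """Recover the original IDs with a running sum."""
    return list(accumulate(gaps))


gaps = to_gaps([2, 5, 10])
print(gaps)             # [2, 3, 5]
print(from_gaps(gaps))  # [2, 5, 10]
```

The gaps are small numbers, which is what lets a variable-length integer encoding save space; the running sum in `from_gaps` is the reason a lookup has to read the list from the start.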
How about storing a list of the sets which contain each number?
1 -- 1, 2, 3, 1000000
3 -- 1, 3
5 -- 2
etc.
Extending amit's solution, instead of storing the actual numbers, you could just store intervals and their associated sets.
For example, using an interval size of 5:
(1-5): [1,2,3,1000000]
(6-10): [1,2,1000000]
(11-15): [3]
(16-20): [1000000]
In the case of (1,7) you should consider intervals (1-5) and (6-10) (which can be determined simply by knowing the size of the interval). Intersecting the set lists for those two intervals gives you [1,2,1000000]. A binary search of each candidate then shows that (1,7) is indeed contained in sets 1 and 1000000, while set 2 (which has 8 and 10 in the second interval, but not 7) is rejected.
Though you'll want to check the min and max values for each set to get a better idea of what the interval size should be. For example, 5 is probably a bad choice if the min and max values go from 1 to a million.
You should probably keep it so that a binary search can be used to check for values, so the subset range should be something like (min + max)/N, where 2N is the max number of values that will need to be binary searched in each set. For example, "does set 3 contain any values from 5 to 10?" is answered by finding the closest values to 5 (3) and 10 (11); in this case, no, it does not. You would have to go through each set and do binary searches for the interval values that could be within the set. This means ensuring that you don't go searching for 100 when the set only goes up to 10.
You could also just store the range (min and max). However, the issue is that I suspect your numbers are going to be clustered, so the range would not be of much use. Although, as mentioned, it'll probably be useful for determining how to set up the intervals.
It'll still be troublesome to pick what range to use: too large, and it'll take a long time to build the data structure (1000 * million * log(N)); too small, and you'll start to run into space issues. The ideal size of the range is probably one that ensures that the number of sets related to each range is approximately equal, while also ensuring that the total number of ranges isn't too high.
Edit:
One benefit is that you don't actually need to store all intervals, just the ones you need. Although, if you have too many unused intervals, it might be wise to increase the interval size and split the current intervals to ensure that the search is fast. This is especially true if processing time isn't a major issue.
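To make the interval idea concrete, here is a Python sketch under a few assumptions: fixed-width buckets starting at 1, set members stored as sorted lists, and a non-empty query. All names (`bucket`, `build_buckets`, `contains`, `query`) are invented for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def bucket(value, width):
    """Bucket number for a value: 1-5 -> 0, 6-10 -> 1, ... when width is 5."""
    return (value - 1) // width


def build_buckets(sets, width):
    """Map each bucket to the IDs of the sets with at least one member in it."""
    index = defaultdict(set)
    for set_id, members in sets.items():
        for member in members:
            index[bucket(member, width)].add(set_id)
    return index


def contains(sorted_members, value):
    """Binary search for value in a sorted list."""
    i = bisect_left(sorted_members, value)
    return i < len(sorted_members) and sorted_members[i] == value


def query(values, sets, index, width):
    """Intersect the buckets to get candidates, then verify each by binary search."""
    candidates = set.intersection(
        *(index.get(bucket(v, width), set()) for v in values)
    )
    return sorted(s for s in candidates if all(contains(sets[s], v) for v in values))


sets = {1: [1, 3, 7], 2: [1, 5, 8, 10], 3: [1, 3, 11, 14, 15], 1000000: [1, 7, 10, 19]}
index = build_buckets(sets, 5)
answer = query([1, 7], sets, index, 5)
print(answer)  # [1, 1000000]
```

The bucket intersection only narrows the field (set 2 survives it because it has 8 and 10 in the second bucket); the binary search is what confirms or rejects each candidate.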
Start searching with the biggest number (7) of the input set and eliminate the sets that do not contain it (Set1 and Set1000000 will be returned).
Then search for the other input elements (1) in the remaining sets.