Algorithm to find blocks of trends

Let's say I have 24 lines of data, such that each line represents an hour in the day. What I want to achieve is to implement an algorithm that can detect trends in the data and divide it into 2 kinds of blocks: a "good" block and a "bad" block. For example, in the attached image you can see that a good block begins at line 6 and ends at line 19. Line 0 also has a good score, but it is not part of a block, so the algorithm should know how to handle this situation.
I think it's about clustering, but I couldn't find something simple enough that fits our needs.
Looking forward to any advice.

start = -1
Append a below-threshold value to the end of the data array x[]
For i from 0 to n (0-based; x[n] is the appended value):
    If x[i] >= thresholdValue:
        If start == -1:
            start = i
    Else:
        If start != -1 and i - start >= thresholdLength:
            ReportGoodBlock(start, i-1)
        start = -1
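
For reference, here is a minimal Python sketch of the same run-length idea. The function name `find_good_blocks` and the parameters `threshold` and `min_length` are illustrative assumptions, not part of the original answer; a run of good values is reported only if it is at least `min_length` long, which is how an isolated good hour such as line 0 gets ignored.

```python
def find_good_blocks(values, threshold, min_length):
    """Return (start, end) index pairs of runs with value >= threshold.

    Runs shorter than min_length are ignored. All names here are
    hypothetical; this is a sketch, not a definitive implementation.
    """
    blocks = []
    start = None
    # A below-threshold sentinel closes a run that reaches the end of the data.
    for i, value in enumerate([*values, float("-inf")]):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                blocks.append((start, i - 1))
            start = None
    return blocks


# 24 hourly scores: hour 0 is good but isolated, hours 6..19 form a block.
scores = [9] + [1] * 5 + [9] * 14 + [1] * 4
print(find_good_blocks(scores, threshold=5, min_length=3))
```

With the sample data above, only the run from index 6 to index 19 is long enough to count as a block.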

Similar code, same function: can't figure out the difference between them

https://leetcode.com/problems/find-all-numbers-disappeared-in-an-array/discuss/93007/simple-java-in-place-sort-solution
Could you please check the link above?
I can't understand this line of the code:
while (nums[i] != i + 1 && nums[i] != nums[nums[i] - 1])
What is the difference between these two conditions?
1) nums[i] != i+1
2) nums[i] != nums[nums[i]-1]
For example:
index 0 : 1
index 1 : 2
index 2 : 3
The first condition simply uses the index to check whether the value equals index + 1.
For the second one, with i = 0:
nums[0] = nums[nums[i]-1]
nums[0] = nums[nums[0]-1]
nums[0] = nums[1-1]
nums[0] = nums[0]
It is ultimately the same thing: it just checks that the value at an index equals index + 1.
So why does the while loop need both conditions? Couldn't we use just one of them?
I agree the second condition is unnecessary. In fact, I think it needlessly clutters the code.
In English, the code essentially says "if [something] and (x != y), then swap x and y". All the "x != y" check does is prevent swapping x with (something equal to) itself. But that is a no-op, so that check can be removed without changing the behavior or O(n) performance.
Removing that check makes it easier to read the algorithm: "For each slot i, while the item at slot i is wrong, swap it to where it belongs."
[Update]
Whoops! I just realized the point of the check... It prevents a potential infinite loop where you keep swapping the same value back and forth. (Because the condition is actually a "while", not an "if".)
So the algorithm as presented is correct.
nums[i] != i+1
Is the value at its place? If not, maybe swap it to its place...
This is needed because you have to test every position.
nums[i] != nums[nums[i]-1]
Does the value need to be swapped to its place?
This is needed because the algorithm places every element in a chain at its place.
Take this example:
[3,1,2,4,6,5,8,7]
It should be clear that you need to rearrange 1,2,3 and 5,6 and 7,8.
Let's look at how the sorting takes place:
i:0 [3,1,2,4,6,5,8,7] 3<->2
i:0 [2,1,3,4,6,5,8,7] 2<->1
i:0 [1,2,3,4,6,5,8,7] now 1 is at its place, go to the right and find another chain
i:1 [1,2,3,4,6,5,8,7] 2 at its place
i:2 [1,2,3,4,6,5,8,7] 3 at its place
i:3 [1,2,3,4,6,5,8,7] 4 at its place
i:4 [1,2,3,4,6,5,8,7] 6<->5
i:4 [1,2,3,4,5,6,8,7] now 5 is at its place, go to the right and find another chain
i:5 [1,2,3,4,5,6,8,7] 6 at its place
i:6 [1,2,3,4,5,6,8,7] 8<->7
i:6 [1,2,3,4,5,6,7,8] now 7 is at its place, go to the right and find another chain
i:7 [1,2,3,4,5,6,7,8] 8 at its place
END
Beware that the algorithm does not fully sort the array given in the link! What the algorithm guarantees is that if an element e is present in the initial array, then it will be at its place at the end. In the given example, 3 is present twice: one copy ends up at the right place but the other does not. The end of the algorithm keeps the values that are at their right places and ignores the others. So it is more of a "sort and remove duplicates" algorithm, or a "longest strictly increasing sequence" algorithm.
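
To make the role of the two conditions concrete, here is a small Python sketch of the placement loop. The function names `place_values` and `find_disappeared` are made up for illustration, and the input `[4, 3, 2, 7, 8, 2, 3, 1]` is assumed to be the example from the linked problem. Dropping the second condition would make the loop swap a duplicate with its twin forever.

```python
def place_values(nums):
    """Return a copy of nums with every present value v moved to index v - 1."""
    nums = list(nums)
    for i in range(len(nums)):
        # Condition 1: nums[i] is not sitting in its own slot.
        # Condition 2: the target slot does not already hold the same value;
        # without it a duplicate would be swapped back and forth forever.
        while nums[i] != i + 1 and nums[i] != nums[nums[i] - 1]:
            target = nums[i] - 1
            nums[i], nums[target] = nums[target], nums[i]
    return nums


def find_disappeared(nums):
    """Values in 1..n that never appear: their slots hold a wrong value."""
    placed = place_values(nums)
    return [i + 1 for i, value in enumerate(placed) if value != i + 1]


print(place_values([4, 3, 2, 7, 8, 2, 3, 1]))      # [1, 2, 3, 4, 3, 2, 7, 8]
print(find_disappeared([4, 3, 2, 7, 8, 2, 3, 1]))  # [5, 6]
```

The slots at index 4 and 5 end up holding leftover duplicates, which is exactly how the missing values 5 and 6 are detected.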

Running a function multiple times and tracking results of the fight simulation

I've made a function to run a fight simulation. It has a random element, so I would like to run it 100 times to check the results.
I've learnt that Ruby can't have functions inside functions.
$p1_skill = 10
$p1_health = 10
$p2_skill = 10
$p2_health = 10
def hp_check
  if $p2_health >= 1 && $p1_health == 0
    return "p2_wins"
  elsif $p1_health >= 1 && $p2_health == 0
    return "p1_wins"
  else
    battle
  end
end
def battle
  p1_fight = $p1_skill + rand(2..12)
  p2_fight = $p2_skill + rand(2..12)
  if p1_fight > p2_fight
    $p2_health -= 2
    hp_check
  elsif p2_fight > p1_fight
    $p1_health -= 2
    hp_check
  else
    battle
  end
end
battle
Right now this accurately produces a winner. It rolls two dice and adds them to a player's skill. If the total is higher than the other player's, the other player loses 2 health.
The skills and HP of players will change throughout the game; this is for a project assignment.
I'd like this to produce win-chance odds so I can address balancing issues.
I have several suggestions regarding your implementation. Note that since this is homework, I'm providing the answer in pieces rather than just giving you an entire program. In no particular order...
Don't use global variables. I suspect this is the major hurdle you're running into with trying to achieve multiple runs of your model. The model state should be contained within the model methods, and initial state can be passed to it as arguments. Example:
def battle(p1_skill, p1_health, p2_skill, p2_health)
Unless your instructor has mandated that you use recursion, a simple loop structure will serve you much better. There's no need to check who won until one player or the other drops down to zero (or lower). There's also no need for an else to recursively call battle, the loop will iterate to the next round of the fight if both are still in the running, even if neither player took a hit.
while p1_health > 0 && p2_health > 0
  # roll the dice and update health
end
# check who won and return that answer
hp_check really isn't needed, when you lose the recursive calls it becomes a one-liner if you perform the check after breaking out of the loop. Also, it would be more useful to return just the winner, so whoever gets that return value can decide whether they want to print it, use it to update a tally, both, or something else entirely. After you break out of the loop outlined above:
# determine which player won, since somebody's health dropped to 0 or less
p1_health > 0 ? 1 : 2
When you're incrementing or decrementing a quantity, don't do equality testing. p1_health <= 0 is much safer than p1_health == 0, because some day you or somebody else is going to start from an odd number while decrementing by 2's, or decrement by some other (random?) amount.
Generating a number uniformly between 2 and 12 is not the same as summing two 6-sided dice. There are 36 possible outcomes for the two dice. Only one of the 36 yields a 2, only one yields a 12, and at the other extreme, there are six ways to get a sum of 7. I created a little die-roll method which takes the number of dice as an argument:
def roll_dice(n)
  n.times.inject(0) { |total| total + rand(1..6) }
end
so, for example, determining player 1's fight score becomes p1_fight = p1_skill + roll_dice(2).
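
The difference between a uniform 2..12 draw and the sum of two dice can be checked by enumerating all 36 outcomes. The following Python sketch does that; the function name `two_dice_distribution` is only illustrative.

```python
from collections import Counter
from itertools import product


def two_dice_distribution():
    """Count how many of the 36 ordered outcomes produce each sum."""
    return Counter(a + b for a, b in product(range(1, 7), repeat=2))


dist = two_dice_distribution()
for total in range(2, 13):
    print(total, dist[total], f"{dist[total] / 36:.3f}")
```

A uniform draw would give every sum a probability of 1/11, whereas the enumeration shows 1/36 for a 2 or a 12 and 6/36 for a 7.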
After making these sorts of changes, tallying up the statistics is pretty straightforward:
n = 10000
number_of_p1_wins = 0
n.times { number_of_p1_wins += 1 if battle(10, 10, 10, 10) == 1 }
proportion = number_of_p1_wins.to_f / n
puts "p1 won #{"%5.2f" % (100.0 * proportion)}% of the time"
If you replace the constant 10's in the call to battle by getting user input or iterating over ranges, you can explore a rich set of other scenarios.
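
For comparison only, the same structure (state passed as arguments, a loop instead of recursion, a `<= 0` style termination test, and summed dice) might look like this in Python. This is a sketch under the assumptions stated in the answer; the helper `estimate_p1_win_rate` and the `rng` and `seed` parameters are additions made here for reproducibility and are not part of the original.

```python
import random


def roll_dice(n, rng):
    """Sum of n six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n))


def battle(p1_skill, p1_health, p2_skill, p2_health, rng):
    """Fight until one player's health drops to 0 or below; return 1 or 2."""
    while p1_health > 0 and p2_health > 0:
        p1_fight = p1_skill + roll_dice(2, rng)
        p2_fight = p2_skill + roll_dice(2, rng)
        if p1_fight > p2_fight:
            p2_health -= 2
        elif p2_fight > p1_fight:
            p1_health -= 2
        # On a tie nobody is hurt and the loop simply runs another round.
    return 1 if p1_health > 0 else 2


def estimate_p1_win_rate(trials, p1_skill, p1_health, p2_skill, p2_health, seed=0):
    """Hypothetical helper: fraction of simulated fights won by player 1."""
    rng = random.Random(seed)
    wins = sum(
        battle(p1_skill, p1_health, p2_skill, p2_health, rng) == 1
        for _ in range(trials)
    )
    return wins / trials


print(f"p1 won {100 * estimate_p1_win_rate(10_000, 10, 10, 10, 10):.2f}% of the time")
```

With equal skill and health the estimate should land close to 50%, which is a quick sanity check before exploring unbalanced match-ups.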

Algorithm for traveling through a sequence of digits

Does anybody know an efficient algorithm for traveling through a sequence of digits while looking for a certain combination? For example:
Given a sequence, I want to find the index of each occurrence of a combination such as 21??73, e.g. in
... 124321947362862188734738 ...
So I have the pattern 21??73 and need to find the indices of:
219473
218873
I assume that there is a way to avoid touching every single digit.
EDIT:
"Lasse V. Karlsen" has brought up an important point that I forgot.
No overlapping is allowed, e.g.
21217373215573
212173 is OK, then the next match would be 215573
Seems like you are looking for the regular expression 21..73, where . stands for "any character" (1).
Next, you just need to iterate over all matches of this regex.
Most high level languages already have a regex library built in that is simple and easy to use for such tasks.
Note that many regex libraries already take care of "no overlapping" for you, including java:
String s = "21217373215573";
Matcher m = Pattern.compile("21..73").matcher(s);
while (m.find()) System.out.println(m.group());
Will yield the required output of:
212173
215573
(1) This assumes your sequence is of digits in the first place, as your question implies.
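
Since the question asks for the index of each match rather than the matched text, here is a short Python sketch using the standard `re` module. The function name `find_match_indices` is illustrative, and `\d` is used in place of `.` on the assumption that only digits should fill the two wildcard positions. `re.finditer` scans left to right and returns non-overlapping matches.

```python
import re


def find_match_indices(sequence, pattern=r"21\d\d73"):
    """Start index of every non-overlapping match, scanning left to right."""
    return [match.start() for match in re.finditer(pattern, sequence)]


print(find_match_indices("124321947362862188734738"))  # [4, 14]
print(find_match_indices("21217373215573"))            # [0, 8]
```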
Depending on what language you are using, you could use a regular expression such as 21\d{2}73, which looks for 21 followed by two digits, which are in turn followed by 73. Languages such as C# allow you to get the index of each match.
Alternatively, you could construct your own finite state machine, which could be something of the sort:
string input = ...
int index = 0
while (index < input.length - 5)
    if (input[index] == '2' && input[index + 1] == '1' && input[index + 4] == '7' && input[index + 5] == '3')
        print(index)
        index += 6
    else
        index++
Since you don't know where these combinations start, and you are not looking just for the first one, there is no way to avoid touching each digit (except perhaps the last n-1 digits, where n is the length of the combination, because a full combination can no longer fit there).
I just don't know a better way than reading the whole sequence, because you can have
... 84452121737338494684 ...
and then you have two overlapping combinations. If you are not looking for overlapping combinations, it's just an easier version of the problem, but overlap is a possibility in your example.
Some non-overlapping algorithm pseudo-code ('?' in the combination matches any digit):
pos := 0
while pos + length(combination) <= length(sequence)
    i := 0
    while i < length(combination) and (combination[i] = '?' or sequence[pos + i] = combination[i])
        i++
    end
    if i = length(combination)
        possibleCombinations.add(pos)
        pos := pos + length(combination)   // skip past the match: no overlapping
    else
        pos := pos + 1                     // retry from the next digit
    endif
end
This is O(n * length(combination)) in the worst case, which is effectively O(n) for a short, fixed-length combination: the same order as looking for one value in an unsorted array. If you are looking for overlapping combinations like in my example, advance pos by 1 after a match as well instead of skipping past it. Every possible start is then checked, and the complexity remains O(n * length(combination)).
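
A position-by-position scan of the kind discussed in this answer could be sketched in Python as follows. The function name `find_combination` and the `allow_overlap` flag are assumptions introduced here for illustration; `?` is treated as a wildcard for any single character.

```python
def find_combination(sequence, combination, allow_overlap=False):
    """Return the start index of every match of combination in sequence.

    '?' in combination matches any single character. This is a plain
    O(n * m) scan, not an optimized string-search algorithm.
    """
    m = len(combination)
    if m == 0:
        return []
    starts = []
    pos = 0
    while pos + m <= len(sequence):
        matched = all(
            c == "?" or sequence[pos + i] == c
            for i, c in enumerate(combination)
        )
        if matched:
            starts.append(pos)
            # Skip the whole match unless overlapping matches are wanted.
            pos += 1 if allow_overlap else m
        else:
            pos += 1
    return starts


text = "84452121737338494684"
print(find_combination(text, "21??73"))                      # [4]
print(find_combination(text, "21??73", allow_overlap=True))  # [4, 6]
```

In the overlapping case the second match (217373, starting at index 6) begins inside the first one (212173, starting at index 4), which is why the non-overlapping scan reports only one start.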

Trying to understand binary search

I'm trying to understand this code my pairing partner wrote. I don't understand why she used an until loop that runs until (finish - start) == 1. What exactly is she looping until?
def binary_search(object, array)
  array.sort!
  start = -1
  finish = array.length
  until (finish - start) == 1 do
    median = start + ((finish - start) / 2)
    # p start
    # p finish
    return median if object == array[median]
    if object > array[median]
      start = median
    elsif object < array[median]
      finish = median
    end
  end
  -1
end
finish - start is the length of the window left to search (+ 1, for easier arithmetic); it starts off as the entire array and gets halved on every iteration, by setting either the start or the finish to the median.
When it reaches 1, there is nothing left to search, and the input object was not found.
Think about how kids play the "guess a number between 1 and 100" game. "Is it bigger than 50?" "No." You now know it's a number between 1 and 50. "Is it bigger than 25?" "Yes." You now know it's between 26 and 50. And so on...
It's the same with binary search. You check to see if the target is above or below the midrange. Whichever way the answer turns out, you've eliminated half of the possibilities and can focus on the remaining subset. Every time you repeat the process, you cut the range that's still under consideration in half. When the range gets down to size one, you've either found the target value or established it wasn't in the set.
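
The shrinking window can be watched directly in a small Python sketch of the same exclusive-bounds search. The name `binary_search_trace` and the returned list of windows are additions for illustration; unlike the Ruby version, this sketch assumes the input is already sorted instead of sorting it in place.

```python
def binary_search_trace(target, sorted_items):
    """Return (index or -1, list of (start, finish) windows examined).

    start and finish are both exclusive bounds, so finish - start == 1
    means no candidate index is left between them.
    """
    start, finish = -1, len(sorted_items)
    windows = []
    while finish - start > 1:
        windows.append((start, finish))
        median = start + (finish - start) // 2
        if sorted_items[median] == target:
            return median, windows
        if target > sorted_items[median]:
            start = median
        else:
            finish = median
    return -1, windows


print(binary_search_trace(7, [1, 3, 5, 7, 9]))  # (3, [(-1, 5), (2, 5)])
print(binary_search_trace(4, [1, 3, 5, 7, 9]))  # (-1, [(-1, 5), (-1, 2), (0, 2)])
```

In the second call the window narrows to start = 1, finish = 2, at which point finish - start is 1, nothing lies strictly between the bounds, and the search reports -1.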

Transforming set of arbitrary intervals into set of continuous intervals, where possible

I have a practical situation where I need to minimize the amount of data.
Let's say I'm given a set of intervals of normal numbers.
e.g. N1 = {(0,1],(1,2],(3,4]};
I would like to minimize this set to:
N2 = {(0,2],(3,4]};
So basically what I need is to combine multiple small intervals into continuous intervals, where possible.
Are there any clever/efficient algorithms for doing this? I would like to avoid inefficient for-each-ing.
*If this problem has a widely known name, please name it in the comments.
This is a sweep-line algorithm.
Split the intervals into start and end points.
Sort the points.
Let count = 0.
Iterate through the points:
    Whenever you encounter an end point:
        Decrement the count.
        If the count = 0, record this point.
    Whenever you encounter a start point:
        If the count = 0, record this point.
        Increment the count.
As a technical note, when sorting, if both a start point and an end point have the same value, put the start point first, otherwise you may record that as a gap, as opposed to a continuous interval.
Example:
(0,1],(1,2],(3,4]
Split:   0 start   1 start   1 end   2 end   3 start   4 end
Count:   1         2         1       0       1         0
Record:  (0        N/A       N/A     2]      (3        4]
Getting the recorded values gives us {(0,2], (3,4]}.
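
The sweep described in this answer can be sketched in Python as follows. The function name `merge_intervals` is illustrative, and intervals are represented as plain `(low, high)` tuples on the assumption that the open/closed ends are the same for every interval, as in the question.

```python
def merge_intervals(intervals):
    """Merge touching or overlapping (low, high) intervals with a sweep line."""
    START, END = 0, 1  # START < END, so starts sort first at equal values
    points = []
    for low, high in intervals:
        points.append((low, START))
        points.append((high, END))
    points.sort()

    merged = []
    count = 0
    current_start = None
    for value, kind in points:
        if kind == START:
            if count == 0:
                current_start = value  # a new merged interval opens here
            count += 1
        else:
            count -= 1
            if count == 0:
                merged.append((current_start, value))  # merged interval closes
    return merged


print(merge_intervals([(0, 1), (1, 2), (3, 4)]))  # [(0, 2), (3, 4)]
```

Sorting dominates the cost, so the whole procedure runs in O(n log n) for n intervals.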
