My question isn't language specific... I would probably implement this in C# or Python unless there is a specific feature of a language that helps me get what I am looking for.
Is there some sort of algorithm that anyone knows of that can help me determine if a list of numbers contains a repeating pattern?
Let's say I have several lists of numbers...
[12, 4, 5, 7, 1, 2]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 1, 1, 1, 1, 1]
[ 1, 2, 4, 12, 13, 1, 2, 4, 12, 13]
I need to detect if there is a repeating pattern in each list... For example, list 1 returns false, but lists 2, 3, and 4 return true.
I was thinking maybe taking a count of each value that appears in the list and if val 1 == val 2 == val n... then that would do it. Any better ideas?
You want to look at the autocorrelation of the signal. Autocorrelation basically does a convolution of the signal with itself. When you iteratively slide one copy of the signal across the other and there is a repeating pattern, the output will resonate strongly at shifts that are multiples of the pattern length.
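A rough Python sketch of that sliding idea, under the simplifying assumption that "correlation" just means counting positions where the list equals a shifted copy of itself (a true autocorrelation would multiply values instead). The name `match_counts` is made up for this illustration.

```python
def match_counts(seq):
    """For each lag 1..len(seq)-1, count positions where seq[i] == seq[i + lag]."""
    return [
        sum(1 for a, b in zip(seq, seq[lag:]) if a == b)
        for lag in range(1, len(seq))
    ]


if __name__ == "__main__":
    counts = match_counts([1, 2, 3, 1, 2, 3, 1, 2, 3])
    # counts[lag - 1] is the number of matches at that lag
    print(counts)
```

For a list built from a repeated block, the count peaks at lags that are multiples of the block length; a list with no repetition stays near zero at every lag.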
The second and fourth strings are periodic; I'm going to assume you're looking for an algorithm for detecting periodic strings. Most fast string matching algorithms need to find periods of strings in order to compute their shifting rules.
Knuth-Morris-Pratt's preprocessing, for instance, computes, for every prefix P[0..k] of the pattern P, the length SP[k] of the longest proper suffix P[s..k] of P[0..k] that exactly matches the prefix P[0..(k-s)]. If SP[k] < k/2, then P[0..k] is aperiodic; otherwise, it is a prefix of a string with period k - SP[k].
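As a sketch of how that preprocessing can answer the original question, the Python below computes the usual KMP prefix (failure) table and then checks whether the whole list is a whole number of copies of its shortest period. The function names are illustrative, and the "whole number of copies" requirement is an assumption taken from the sample lists in the question.

```python
def prefix_function(seq):
    """pi[i] = length of the longest proper prefix of seq[:i+1] that is also its suffix."""
    pi = [0] * len(seq)
    k = 0
    for i in range(1, len(seq)):
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi


def is_repeating(seq):
    """True if seq consists of some block repeated two or more times."""
    n = len(seq)
    if n < 2:
        return False
    border = prefix_function(seq)[-1]
    period = n - border
    return border > 0 and n % period == 0


if __name__ == "__main__":
    for sample in ([12, 4, 5, 7, 1, 2], [1, 2, 3] * 3, [1] * 6,
                   [1, 2, 4, 12, 13, 1, 2, 4, 12, 13]):
        print(sample, is_repeating(sample))
```

This runs in linear time in the length of the list, since the prefix table is built in a single pass.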
One option would be to look at compression algorithms; some of those rely on finding repeating patterns and replacing them with another symbol. In your case you simply need the part that identifies the pattern. You may find that it is similar to the method that you've described already, though.
Assuming that your "repeating pattern" is always repeated in full, like your sample data suggests, you could just think of your array as a bunch of repeating arrays of equal length. Meaning:
[1, 2, 3, 1, 2, 3, 1, 2, 3] is the same as [1, 2, 3] repeated three times.
This means that you could just check whether the values at every x-th position in the array are equal to each other, where x is the length of the repeated pattern. So, for x = 3:
array[0] == array[3] == array[6]
array[1] == array[4] == array[7]
array[2] == array[5] == array[8]
Since you don't know the length of the repeated pattern, you'd just have to try all possible lengths until you found a pattern or ran out of possible shorter arrays. I'm sure there are optimizations that can be added to the following, but it works (assuming I understand the question correctly, of course).
static void Main(string[] args)
{
int[] array1 = {12, 4, 5, 7, 1, 2};
int[] array2 = {1, 2, 3, 1, 2, 3, 1, 2, 3};
int[] array3 = {1, 1, 1, 1, 1, 1 };
int[] array4 = {1, 2, 4, 12, 13, 1, 2, 4, 12, 13 };
Console.WriteLine(splitMethod(array1));
Console.WriteLine(splitMethod(array2));
Console.WriteLine(splitMethod(array3));
Console.WriteLine(splitMethod(array4));
Console.ReadLine();
}
static bool splitMethod(int[] array)
{
for(int patternLength = 1; patternLength <= array.Length/2; patternLength++)
{
// if the pattern length doesn't divide the length of the array evenly,
// then we can't have a pattern of that length.
if(array.Length % patternLength != 0)
{
continue;
}
// To check if every x value is equal, we need to give a start index
// To begin our comparisons at.
// We'll start at index 0 and check it against 0+x, 0+x+x, 0+x+x+x, etc.
// Then we'll use index 1 and check it against 1+x, 1+x+x, 1+x+x+x, etc.
// Then... etc.
// If we find that the values at every x-th position starting at a given
// start index aren't all equal, then we'll continue to the next pattern length.
// We'll assume our patternLength will produce a pattern and let
// our test determine whether we don't have one.
bool foundPattern = true;
for (int startIndex = 0; startIndex < patternLength; startIndex++)
{
if (!everyXValueEqual(array, patternLength, startIndex))
{
foundPattern = false;
break;
}
}
if (foundPattern)
{
return true;
}
}
return false;
}
static bool everyXValueEqual(int[] array, int x, int startIndex)
{
// if the next index we want to compare against is outside the bounds of the array
// we've done all the matching we can for a pattern of length x.
if (startIndex+x > array.Length-1)
return true;
// if the value at startIndex equals the value at startIndex + x
// we can go on to test values at startIndex + x and startIndex + x + x
if (array[startIndex] == array[startIndex + x])
return everyXValueEqual(array, x, startIndex + x);
return false;
}
Simple pattern recognition is the task of compression algorithms. Depending on the type of input and the type of patterns you're looking for the algorithm of choice may be very different - just consider that any file is an array of bytes and there are many types of compression for various types of data. Lossless compression finds exact patterns that repeat and lossy compression - approximate patterns where the approximation is limited by some "real-world" consideration.
In your case you can apply a pseudo zip compression where you start filling up a list of encountered sequences.
Here's a pseudo suggestion:
// C#-based pseudo code. GetInputData, SubArray and Output are placeholder helpers,
// and the dictionary is assumed to compare its array keys by content.
int[] input = GetInputData();
var encounters = new Dictionary<int[], int>(); // the substring and the number of times it's found
int from = 0;
for (int i = 0; i < input.Length; i++) {
    for (int j = from; j <= i; j++) { // for each substring between 'from' and 'i'
        if (encounters.ContainsKey(input.SubArray(j, i))) {
            if (j == from) from++; // if the entire substring already exists - move the starting point
            encounters[input.SubArray(j, i)] += 1; // increase the count where the substring already exists
        } else {
            // consider: if (MeetsSomeMinimumRequirements(input.SubArray(j, i)))
            encounters.Add(input.SubArray(j, i), 1); // add a new pattern
        }
    }
}
Output(encounters.Where(itemValue => itemValue.Value > 1)); // show the patterns found more than once
I haven't debugged the sample above, so use it just as a starting point. The core idea is that you'd have an encounters list where various substrings are collected and counted, the most frequent will have highest Value in the end.
You can alter the algorithm above by storing some function of the substrings instead of the entire substring or add some minimum requirements such as minimum length etc. Too many options, complete discussion is not possible within a post.
Since you're looking for repeated patterns, you could force your array into a string and run a regular expression against it. This being my second answer, I'm just playing around here.
static Regex regex = new Regex(@"^(?<main>(?<v>;\d+)+?)(\k<main>)+$", RegexOptions.Compiled);
static bool regexMethod(int[] array)
{
string a = ";" + string.Join(";", array);
return regex.IsMatch(a);
}
The regular expression is
(?<v>;\d+) - A group named "v" which matches a semicolon (the delimiter in this case) and 1 or more digits
(?<main>(?<v>;\d+)+?) - a group named "main" which matches the "v" group 1 or more times, but the least number of times it can to satisfy the regex.
(\k<main>)+ - matches the text that the "main" group matched 1 or more times
^ ... $ - these anchor the ends of the pattern to the ends of the string.
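For comparison, roughly the same expression can be written with Python's `re` module. This is only a sketch: the name `regex_method` mirrors the C# above, and like the original it assumes non-negative integers, because `\d+` does not match a minus sign.

```python
import re

# ((?:;\d+)+?) : the shortest run of ";number" items (group 1)
# \1+          : that exact run again, one or more times
PATTERN = re.compile(r"((?:;\d+)+?)\1+")


def regex_method(values):
    """True if the list is one block of numbers repeated two or more times."""
    text = "".join(f";{v}" for v in values)
    # fullmatch plays the role of the ^ ... $ anchors
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    print(regex_method([1, 2, 3, 1, 2, 3, 1, 2, 3]))
    print(regex_method([12, 4, 5, 7, 1, 2]))
```

The leading delimiter on every item matters: it stops a value such as 11 from being read as 1 followed by 1.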
This is a problem about substrings that I created. I am wondering how to implement an O(nlog(n)) solution to this problem because the naive approach is pretty easy. Here is how it goes. You have a string S. S has many substrings. In some substrings, the first character and the last character each occur more than once within the substring. Count the substrings for which this holds for both the first and the last character.
Input: "ABCDCBE"
Expected output: 2
Explanation: "BCDCB" and "CDC" are two such substrings
That test case explanation only has "BCDCB" and "CDC" where first and last char are same.
There can be another case aside from the sample case with "ABABCAC" being the substring where the first character "A" appears 3 times and the last character "C" appears twice. "AAAABB" is also another substring.
"AAAAB" does not satisfy.
What I have learned that is O(nlog(n)) that might or might not contribute to solution is Binary Indexed Trees. Binary Indexed Trees can somehow be used to solve this. There is also sorting and binary search, but first I want to focus especially on Binary Indexed Trees.
I am looking for a space complexity of O(n log(n)) or better.
Also Characters are in UTF-16
The gist of my solution is as follows:
Iterate over the input array, and, for each position, compute the amount of 'valid' substrings that end on that position. The sum of these values is the total amount of valid substrings. We achieve this by counting the amount of valid starts to a substring, that come before the current position, using a Binary Indexed Tree.
Now for the full detail:
As we iterate over the array we think of the current element as the end of a substring, and we say that the positions that are a valid start are those such that its value appears again between it, and the position we are currently iterating over. (i.e. if the value at the start of a substring appears at least twice in it)
For example:
             current index V
data  = [1, 2, 3, 4, 1, 4, 3, 2]
valid = [1, 0, 1, 1, 0, 0, 0, 0]
         0  1  2  3  4  5  6  7
The first 1 (at index 0) is a valid start, because there is another 1 (at index 4) after it, but before the current index (index 6).
Now, counting the amount of valid starts that come before the current index gives us something pretty close to what we wanted, except that we may grab some substrings that don't have two appearances of the last value of the substring (i.e. the one we are currently iterating over)
For example:
             current index V
data  = [1, 2, 3, 4, 1, 4, 3, 2]
valid = [1, 0, 1, 1, 0, 0, 0, 0]
         0  1  2  3  4  5  6  7
                  ^--------^
Here, the 4 is marked as a valid start (because there is another 4 that comes after it), but the corresponding substring does not have two 3s.
To fix this, we shall only consider valid starts up to the previous appearance of the current value. (this means that the substring will contain both the current value, and its previous appearance, thus, the last element will be in the substring at least twice)
The pseudocode goes as follows:
fn solve(arr) {
answer := 0
for i from 1 to length(arr) {
previous_index := find_previous(arr, i)
if there is a previous_index {
arr[previous_index].is_valid_start = true
answer += count_valid_starts_up_to_and_including(arr, previous_index)
}
}
return answer
}
To implement these operations efficiently, we use a hash table for looking up the previous position of a value, and a Binary Indexed Tree (BIT) to keep track of and count the valid positions.
Thus, a more fleshed out pseudocode would look like
fn solve(arr) {
n := length(arr)
prev := hash_table{}
bit := bit_indexed_tree{length = n}
answer := 0
for i from 1 to length(arr) {
value := arr[i]
previous_index := prev[value]
if there is a previous_index {
bit.update(previous_index, 1)
answer += bit.query(previous_index)
}
prev[value] = i
}
return answer
}
Finally, since a pseudocode is not always enough, here is an implementation in C++, where the control flow is a bit munged, to ensure efficient usage of std::unordered_map (C++'s built-in hash table)
#include <unordered_map>
#include <vector>

class Bit {
    std::vector<int> m_data;
public:
    // initialize BIT of size `n` with all 0s
    Bit(int n);
    // add `value` to index `i`
    void update(int i, int value);
    // sum from index 0 to index `i` (inclusive)
    int query(int i);
};

long long solve(std::vector<int> const& arr) {
    int const n = arr.size();
    std::unordered_map<int, int> prev_index;
    Bit bit(n);
    long long answer = 0;
    int i = 0;
    for (int value : arr) {
        // insert() does nothing if the key exists, so a single lookup
        // tells us whether `value` has been seen before
        auto insert_result = prev_index.insert({value, i});
        if (!insert_result.second) { // there is a previous index
            int j = insert_result.first->second;
            bit.update(j, 1);
            answer += bit.query(j);
            insert_result.first->second = i;
        }
        ++i;
    }
    return answer;
}
EDIT: For transparency, here is the Fenwick tree implementation I used to test this code
struct Bit {
std::vector<int> m_data;
Bit(int n) : m_data(n+2, 0) { }
int query(int i) {
int res = 0;
for(++i; i > 0; i -= i&-i) res += m_data[i];
return res;
}
void update(int i, int x) {
for(++i; i < m_data.size(); i += i&-i) m_data[i] += x;
}
};
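For readers who prefer Python, here is a sketch of the same approach, with a small Fenwick tree included so it runs on its own. The names `Fenwick` and `count_substrings` are chosen for this illustration; the logic follows the pseudocode above using 0-based indices.

```python
class Fenwick:
    """Binary Indexed Tree over positions 0..n-1 supporting point add and prefix sum."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        i += 1  # the tree is 1-based internally
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def query(self, i):
        """Sum of positions 0..i inclusive."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def count_substrings(s):
    """Count substrings whose first and last characters each occur at least twice in them."""
    prev = {}  # character -> index of its most recent occurrence
    bit = Fenwick(len(s))
    answer = 0
    for i, ch in enumerate(s):
        if ch in prev:
            j = prev[ch]
            bit.update(j, 1)  # position j now has a later twin, so it is a valid start
            answer += bit.query(j)  # valid starts at or before the previous occurrence
        prev[ch] = i
    return answer


if __name__ == "__main__":
    print(count_substrings("ABCDCBE"))  # the sample input from the question
```

Each position is marked at most once, so the loop performs at most one update and one query per character, giving O(n log n) time and O(n) extra space.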
I am trying to pick one number from multiple arraylists and find all possible ways to pick the numbers such that the sum of those numbers is greater than a given number. I can only think of brute force implementation.
For example, I have five arraylists such as
A = [2, 6, 7]
B = [6, 9]
C = [4]
D = [4, 7]
E = [8, 10, 15]
and a given number is 40.
Then after picking one number from each list, all possible ways could be
[7, 9, 4, 7, 15]
[6, 9, 4, 7, 15]
So, these are the two possible ways to pick numbers whose sum is greater than or equal to 40. If the given number is small, there could be many solutions. So how can I count them without brute force? And even with brute force, how do I devise the solution in Java?
Below is my implementation. It works fine for small numbers but if the numbers are large then it gives me runtime error since the program runs for too long.
public static void numberOfWays(List<Integer> A, List<Integer> B, List<Integer> C, List<Integer> D,
List<Integer> E, int k){
long ways = 0;
for(Integer a:A){
for(Integer b:B){
for(Integer c:C){
for(Integer d:D){
for(Integer e:E){
int sum = a+b+c+d+e;
//System.out.println(a+" "+b+" "+c+" "+d+" "+e+" "+sum);
if(sum > k)
ways++;
}
}
}
}
}
System.out.println(ways);
}
The list can contain up to 1000 elements and the elements can range from 1 to 1000. The threshold value k can range from 1 to 10^9.
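One possible way to avoid enumerating every combination (this is not from the question, just a sketch under the stated bounds): with five lists of values up to 1000, a total can never exceed 5000, so you can count how many picks reach each total, one list at a time. The function name `count_ways` is made up, and the strict `>` comparison follows the posted Java code rather than the "greater than or equal" wording.

```python
from collections import Counter


def count_ways(lists, k):
    """Count ways to pick one value from each list so that the sum is > k."""
    ways = Counter({0: 1})  # ways[s] = number of partial picks with sum s
    for values in lists:
        value_counts = Counter(values)  # duplicates in a list count as separate picks
        next_ways = Counter()
        for partial_sum, count in ways.items():
            for value, multiplicity in value_counts.items():
                next_ways[partial_sum + value] += count * multiplicity
        ways = next_ways
    return sum(count for total, count in ways.items() if total > k)


if __name__ == "__main__":
    lists = [[2, 6, 7], [6, 9], [4], [4, 7], [8, 10, 15]]
    print(count_ways(lists, 40))
```

The work per list is bounded by (number of distinct partial sums) times (number of distinct values), which stays in the low millions for the limits given, instead of up to 1000^5 combinations.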
I am not a Java programmer, but I think it's a logic problem, so I have solved it for you in Python. I am pretty sure you can convert it into Java.
Here is the code:
x = input('Enter the number:')
a = [2, 6, 7]
b = [6, 9]
c = [4]
d = [4, 7]
e = [8, 10, 15]
i = 0
z = 0
final_list = []
while i <= int(x):
    # add the z-th element of each list, skipping lists that are too short
    try:
        i += a[z]
        final_list.append(a[z])
    except IndexError:
        pass
    try:
        i += b[z]
        final_list.append(b[z])
    except IndexError:
        pass
    try:
        i += c[z]
        final_list.append(c[z])
    except IndexError:
        pass
    try:
        i += d[z]
        final_list.append(d[z])
    except IndexError:
        pass
    try:
        i += e[z]
        final_list.append(e[z])
    except IndexError:
        pass
    z += 1
print(final_list)
One way is this. There has to be at least one solution where you pick one number from each array and add them up to be greater than or equal to another.
Considering the fact that arrays might have random numbers in any order, first use this sort function to have them in decreasing order (largest number first, smallest number last) :
Arrays.sort(<array name>, Collections.reverseOrder());
Then pick the 1st element in the array :
v = A[0]
w = B[0]
x = C[0]
y = D[0]
z = E[0]
Then you can print them like this : v,w,x,y,z
Now your output will be :
7,9,4,7,15
Since it took the largest number of each array, it has to be equal to or greater than the given number, unless the number is greater than all of these combined in which case it is impossible.
Edit : I think I got the question wrong. If you want to know how many of the possible solutions there are, that is much easier.
First create a variable to store the possibilities
var total = 0
Use a random function to pick a random index into each array. For your array, say something like:
v = A[Math.floor(Math.random() * A.length)]
Do the same thing for all arrays, then add them up
var sum = v+w+x+y+z
Now you have an if statement to see if the sum is greater than or equal to the number given (lets say the value is stored in the variable "given")
if(sum >= given){
total+=1
}else{
<repeat the random function to restart the process and generate a new sum>
}
Finally, you need to repeat this multiple times, because in case there are multiple solutions, the code will only find one and give you a false total.
To solve this, create a for loop and put all of this code inside it :
//create a variable outside to store the total number of elements in all the arrays
var elements = A.length + B.length + C.length + D.length + E.length
for(var i = 0; i <= elements; i++){
<The code is inside here, except for "total" as otherwise the value will keep resetting>
}
The end result should look something like this :
var total = 0
var elements = A.length + B.length + C.length + D.length + E.length
for(var i = 0; i <= elements; i++){
    v = A[Math.floor(Math.random() * A.length)]
    w = B[Math.floor(Math.random() * B.length)]
    x = C[Math.floor(Math.random() * C.length)]
    y = D[Math.floor(Math.random() * D.length)]
    z = E[Math.floor(Math.random() * E.length)]
    var sum = v+w+x+y+z
    if(sum >= given){
        total+=1
    }else{
        v = A[Math.floor(Math.random() * A.length)]
        w = B[Math.floor(Math.random() * B.length)]
        x = C[Math.floor(Math.random() * C.length)]
        y = D[Math.floor(Math.random() * D.length)]
        z = E[Math.floor(Math.random() * E.length)]
    }
}
At the end just print the total once the entire cycle is over or just do
console.log(total)
This is just for reference and the code might not work; it probably has a bunch of bugs in it, as this was just my first draft attempt. I have to test it out on my own, but I hope you see where I'm coming from. Just look at the process, make your own amendments and this should work fine.
I have not deleted the first part of my answer even though it isn't the answer to this question just so that if you're having trouble in that part as well, where you select the highest possible number, it might help you
Good luck!
I only heard of this question, so I don't know the exact limits. You are given a list of positive integers. Each two consecutive values form a closed interval. Find the number that appears in most intervals. If two values appear the same amount of times, select the smallest one.
Example: [4, 1, 6, 5] results in [1, 4], [1, 6], [5, 6] with 1, 2, 3, 4, 5, 6 each showing up twice. The correct answer would be 1 since it's the smallest.
I unfortunately have no idea how this can be done without going for an O(n^2) approach. The only optimisation I could think of was merging consecutive descending or ascending intervals, but this doesn't really work since [4, 3, 2] would count 3 twice.
Edit: Someone commented (but then deleted) a solution with this link http://www.zrzahid.com/maximum-number-of-overlapping-intervals/. I find this one the most elegant, even though it doesn't take into account the fact that some elements in my input would be both the beginning and end of some intervals.
Sort the intervals by their starting value. Then run a sweep line from the left (the global smallest value) to the right (the global maximum value). At each meeting point (start or end of an interval) count the number of intersections with the sweep line (in O(log(n))). The time complexity of this algorithm would be O(n log(n)) (n is the number of intervals).
The major observation is that the result will be one of the numbers in the input (proof left to the reader as simple exercise, yada yada).
My solution is inspired by @Prune's solution. The important step is mapping the input numbers to their order within all different numbers in the input.
I will work with C++ std. We can first load all the numbers into a set. We can then create map from that, which maps a number to its order within all numbers.
// assumes <algorithm>, <map>, <set>, <vector> and `using namespace std;`
int solve(const vector<int>& input) {
    set<int> vals;
    for (int n : input) {
        vals.insert(n);
    }
    map<int, int> numberOrder;
    int order = 0;
    for (int n : vals) { // values in a set are ordered
        numberOrder[n] = order++;
    }
We then create the process array (similar to @Prune's solution).
    vector<int> process(numberOrder.size() + 1, 0); // adding past-the-end element
    // each pair of consecutive values forms one closed interval
    for (size_t i = 1; i < input.size(); ++i) {
        int last = input[i - 1];
        int curr = input[i];
        process[numberOrder[min(last, curr)]]++;
        process[numberOrder[max(last, curr)] + 1]--;
    }
    int appear = 0;
    int maxAppear = 0;
    int maxOrder = 0;
    for (size_t i = 0; i < process.size(); ++i) {
        appear += process[i];
        if (appear > maxAppear) {
            maxAppear = appear;
            maxOrder = i;
        }
    }
Last, we need to find our found value in the map.
    for (pair<int, int> a : numberOrder) {
        if (a.second == maxOrder) {
            return a.first;
        }
    }
    return -1; // only reached when the input is empty
}
This solution has O(n * log(n)) time complexity and O(n) space complexity, which is independent of the maximum input number size (unlike other solutions).
If the maximum number in the range array is less than the maximum size limit of an array, my solution will work with complexity O(n).
1- I created a new array to process ranges and use it to find the
numbers that appears most in all intervals. For simplicity let's use
your example. the input = [1, 4], [1, 6], [5, 6]. let's call the new
array process and give it length 6 and it is initialized with 0s
process = [0,0,0,0,0,0].
2-Then loop through all the intervals and mark the start with (+1) and
the cell immediately after my range end with (-1)
for range [1,4] process = [1,0,0,0,-1,0]
for range [1,6] process = [2,0,0,0,-1,0]
for range [5,6] process = [2,0,0,0,0,0]
3- The process array works as a cumulative array. Initialize a
variable, let's call it appear = process[0], which will be equal to 2
in our case. Go through process and keep accumulating. What do you
notice? Elements 1,2,3,4,5,6 will have appear = 2 because each of
them appeared twice in the given ranges.
4- Keep track of the maximum while you loop through the process array
and you will find the solution.
public class Test {
public static void main(String[] args) {
int[] arr = new int[] { 4, 1, 6, 5 };
System.out.println(solve(arr));
}
public static int solve(int[] range) {
// Assumes every input value is smaller than size (1e8 here)
int size = (int) 1e8;
int[] process = new int[size];
// fill process array
for (int i = 0; i < range.length - 1; ++i) {
int start = Math.min(range[i], range[i + 1]);
int end = Math.max(range[i], range[i + 1]);
process[start]++;
if (end + 1 < size)
process[end + 1]--;
}
// Find the number that appears in most intervals (smallest one)
int appear = process[0];
int max = appear;
int solu = 0;
for (int i = 1; i < size; ++i) {
appear += process[i];
if (appear > max){
solu = i;
max = appear;
}
}
return solu;
}
}
Think of these as parentheses: ( to start an interval, ) to end one. Now check the bounds for each pair [a, b], and tally interval start/end markers for each position: the lower number gets an interval start to the left; the larger number gets a close interval to the right. For the given input:
Process [4, 1]
result: [0, 1, 0, 0, 0, -1]
Process [1, 6]
result: [0, 2, 0, 0, 0, -1, 0, -1]
Process [6, 5]
result: [0, 2, 0, 0, 0, 0, 0, -2]
Now, merely make a cumulative sum of this list; the position of the largest value is your desired answer.
result: [0, 2, 0, 0, 0, 0, 0, -2]
cumsum: [0, 2, 2, 2, 2, 2, 2, 0]
Note that the final sum must be 0, and can never be negative. The largest value is 2, which appears first at position 1. Thus, 1 is the lowest integer that appears the maximum (2) quantity.
Now, that's one pass on the input, and one pass on the range of numbers. Note that with a simple table of values, you can save storage. The processing table would look something like:
[(1, 2)
(4, -1)
(5, 1)
(6, -2)]
If you have input with intervals both starting and stopping at a number, then you need to handle the starts first. For instance, [4, 3, 2] would look like
[(2, 1)
(3, 1)
(3, -1)
(4, -1)]
NOTE: maintaining a sorted insert list is O(n^2) time on the size of the input; sorting the list afterward is O(n log n). Either is O(n) space.
My first suggestion, indexing on the number itself, is O(n) time, but O(r) space on the range of input values.
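A minimal Python sketch of the table-based variant, assuming the input is the flat list of values from the question. The function name and the event encoding are made up for illustration: each start is stored as kind 0 and each end as kind 1, so sorting handles all starts at a value before any end at that value.

```python
def most_covered_value(values):
    """Smallest number contained in the most intervals formed by consecutive values."""
    events = []
    for a, b in zip(values, values[1:]):
        low, high = min(a, b), max(a, b)
        events.append((low, 0, +1))   # interval opens at `low`
        events.append((high, 1, -1))  # interval closes after `high`
    events.sort()  # by value, then starts (0) before ends (1)

    best_value = None
    best_count = 0
    open_count = 0
    for value, _kind, delta in events:
        open_count += delta
        # strict > keeps the smallest value when several tie
        if delta > 0 and open_count > best_count:
            best_count = open_count
            best_value = value
    return best_value


if __name__ == "__main__":
    print(most_covered_value([4, 1, 6, 5]))
    print(most_covered_value([4, 3, 2]))
```

Only interval start points need to be considered as candidates, because the count can only rise at a start. Sorting dominates, so this is O(n log n) time and O(n) space regardless of how large the values are.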
I have the following code, which implements a recursive solution for this problem. Instead of using the reference variable 'x' to store the overall max, can I return the result from the recursion itself, so that I don't have to use 'x'? That would help with memoization.
// Test Cases:
// Input: {1, 101, 2, 3, 100, 4, 5} Output: 106
// Input: {3, 4, 5, 10} Output: 22
int sum(vector<int> seq)
{
int x = INT32_MIN;
helper(seq, seq.size(), x);
return x;
}
int helper(vector<int>& seq, int n, int& x)
{
if (n == 1) return seq[0];
int maxTillNow = seq[0];
int res = INT32_MIN;
for (int i = 1; i < n; ++i)
{
res = helper(seq, i, x);
if (seq[i - 1] < seq[n - 1] && res + seq[n - 1] > maxTillNow) maxTillNow = res + seq[n - 1];
}
x = max(x, maxTillNow);
return maxTillNow;
}
First, I don't think this implementation is correct. For this input {5, 1, 2, 3, 4} it gives 14 while the correct result is 10.
For writing a recursive solution for this problem, you don't need to pass x as a parameter, as x is the result you expect to get from the function itself. Instead, you can construct a state as the following:
Current index: this is the index you're processing at the current step.
Last taken number: This is the value of the last number you included in your result subsequence so far. This is to make sure that you pick larger numbers in the following steps to keep the result subsequence increasing.
So your function definition is something like sum(current_index, last_taken_number) = the maximum increasing sum from current_index until the end, given that you have to pick elements greater than last_taken_number to keep it an increasing subsequence. The answer that you desire is sum(0, a small value), since it calculates the result for the whole sequence. By a small value I mean one smaller than any other value in the whole sequence.
sum(current_index, last_taken_number) could be calculated recursively using smaller substates. First assume the simple cases:
N = 0, result is 0 since you don't have a sequence at all.
N = 1, the sequence contains only one number, the result is either that number or 0 in case the number is negative (I'm considering an empty subsequence as a valid subsequence, so not taking any number is a valid answer).
Now to the tricky part, when N >= 2.
Assume that N = 2. In this case you have two options:
Either ignore the first number, then the problem can be reduced to the N=1 version where that number is the last one in the sequence. In this case the result is the same as sum(1,MIN_VAL), where current_index=1 since we already processed index=0 and decided to ignore it, and MIN_VAL is the small value we mentioned above
Take the first number. Assume that its value is X. Then the result is X + sum(1, X). That means the solution includes X since you decided to include it in the sequence, plus whatever the result is from sum(1,X). Note that we're calling sum with MIN_VAL=X since we decided to take X, so the following values that we pick have to be greater than X.
Both decisions are valid. The result is the maximum of these two. So we can deduce the general recurrence as the following:
sum(current_index, MIN_VAL) = max(
sum(current_index + 1, MIN_VAL) // ignore,
seq[current_index] + sum(current_index + 1, seq[current_index]) // take
).
The second decision is not always valid, so you have to make sure that the current element > MIN_VAL in order to be valid to take it.
This is a pseudo code for the idea:
sum(current_index, MIN_VAL){
    if(current_index == END_OF_SEQUENCE) return 0
    if( state[current_index,MIN_VAL] was calculated before ) return the previously calculated result
    decision_1 = sum(current_index + 1, MIN_VAL) // ignore case
    if(sequence[current_index] > MIN_VAL) // decision_2 is valid
        decision_2 = sequence[current_index] + sum(current_index + 1, sequence[current_index]) // take case
    else
        decision_2 = INT_MIN
    result = max(decision_1, decision_2)
    memoize result for the state[current_index, MIN_VAL]
    return result
}
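The pseudocode above could be written in Python roughly as follows. This is a sketch, not a tuned implementation: `functools.lru_cache` plays the role of the memo table, negative infinity stands in for the "small value", and the name `max_increasing_sum` is made up here.

```python
from functools import lru_cache


def max_increasing_sum(seq):
    """Maximum sum of a strictly increasing subsequence (the empty subsequence counts as 0)."""
    seq = tuple(seq)
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(index, min_val):
        if index == n:
            return 0
        # decision 1: ignore seq[index]
        result = best(index + 1, min_val)
        # decision 2: take seq[index], allowed only if it keeps the subsequence increasing
        if seq[index] > min_val:
            result = max(result, seq[index] + best(index + 1, seq[index]))
        return result

    return best(0, float("-inf"))


if __name__ == "__main__":
    print(max_increasing_sum([1, 101, 2, 3, 100, 4, 5]))
    print(max_increasing_sum([3, 4, 5, 10]))
    print(max_increasing_sum([5, 1, 2, 3, 4]))
```

Because the recursion goes one level deeper per element, very long sequences would need an iterative version or a raised recursion limit.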
Let's say I have an increasing sequence of integers: seq = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4 ... ] not guaranteed to have exactly the same number of each integer but guaranteed to be increasing by 1.
Is there a function F that can operate on this sequence whereby F(seq, x) would give me all 1's when an integer in the sequence equals x and all other integers would be 0.
For example:
t = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
F(t, 2) = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
EDIT: I probably should have made it more clear. Is there a solution where I can do some algebraic operations on the entire array to get the desired result, without iterating over it?
So, I'm wondering if I can do something like: F(t, x) = t op x ?
In Python (t is a numpy.array) it could be:
(t * -1) % x or something...
EDIT2: I found out that the indicator function I(t[i] == x) is acceptable to use as an algebraic operation. Sorry, I did not know about indicator functions.
There's a very simple solution to this that doesn't require most of the restrictions you place upon the domain. Just create a new array of the same size, loop through and test for equality between the element in the array and the value you want to compare against. When they're the same, set the corresponding element in the new array to 1. Otherwise, set it to 0. The actual implementation depends on the language you're working with, but should be fairly simple.
If we do take into account your domain, you can introduce a couple of optimisations. If you start with an array of zeroes, you only need to fill in the ones. You know you don't need to start checking until the (n - 1)th element, where n is the value you're comparing against, because there must be at least one of the numbers 1 to n in increasing order. If you don't have to start at 1, you can still start at (n - start). Similarly, if you haven't come across it at array[n - 1], you can jump n - array[n - 1] more elements. You can repeat this, skipping most of the elements, as much as you need to until you either hit the right value or the end of the list (if it's not in there at all).
After you finish dealing with the value you want, there's no need to check the rest of the array, as you know it'll always be increasing. So you can stop early too.
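The skipping idea can be sketched in Python like this. It relies on the assumption from the question that each element is either equal to the previous one or exactly one larger; the function name is made up for illustration.

```python
def indicator_with_skips(t, x):
    """Return a list with 1 where t[i] == x and 0 elsewhere, skipping ahead where possible.

    Assumes t is non-decreasing and grows by at most 1 per step.
    """
    out = [0] * len(t)
    i = 0
    # If t[i] is d below x, the first x cannot appear sooner than d positions later.
    while i < len(t) and t[i] < x:
        i += x - t[i]
    # Fill the run of x, then stop early: everything after it is larger.
    while i < len(t) and t[i] == x:
        out[i] = 1
        i += 1
    return out


if __name__ == "__main__":
    t = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
    print(indicator_with_skips(t, 2))
```

Creating the zero-filled output is still linear in the length of the list; the skipping only reduces the number of comparisons against `x`.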
A simple method (with C# code) is to simply iterate over the sequence and test it, returning either 1 or 0.
foreach (int element in sequence)
if (element == myValue)
yield return 1;
else
yield return 0;
(Written using LINQ)
sequence.Select(elem => elem == myValue ? 1 : 0);
A dichotomy (binary search) algorithm can quickly locate the range where t[i] = x, making the search step sub-linear in time.
Are you asking for a readymade c++, java API or are you asking for an algorithm? Or is this homework question?
I see the simple algorithm of scanning the array from start to end and comparing each element. If it equals the number, put 1, else put 0. In any case, to fill in the new array you have to access each of its elements at least once, so the overall approach will be O(n).
You can certainly reduce the comparison by starting a binary search. Once you find the required number then simply go forward and backward searching for the same number.
Here is a java method which returns a new array.
public static int[] sequence(int[] seq, int number)
{
int[] newSequence = new int[seq.length];
for ( int index = 0; index < seq.length; index++ )
{
if ( seq[index] == number )
{
newSequence[index] = 1;
}
else
{
newSequence[index] = 0;
}
}
return newSequence;
}
I would initialize an array of zeroes, then do a binary search on the sequence to find the first element that fits your criteria, and only start setting 1's from there. As soon as you have a not equal condition, stop.
Here is a way to do it where locating the run takes O(log n); building the output list itself is still O(n).
>>> from bisect import bisect
>>> def f(t, n):
...     i = bisect(t, n - 1)
...     j = bisect(t, n, lo=i) - i
...     return [0] * i + [1] * j + [0] * (len(t) - j - i)
...
>>> t = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
>>> print(f(t, 2))
[0, 0, 0, 0, 1, 1, 0, 0, 0, 0]