Find all anagrams in a string O(n) solution - algorithm

Here is the problem:
Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.
Input: s: "cbaebabacd" p: "abc"
Output: [0, 6]
Input: s: "abab" p: "ab"
Output: [0, 1, 2]
Here is my solution:
vector<int> findAnagrams(string s, string p) {
    vector<int> res, s_map(26,0), p_map(26,0);
    int s_len = s.size();
    int p_len = p.size();
    if (s_len < p_len) return res;
    for (int i = 0; i < p_len; i++) {
        ++s_map[s[i] - 'a'];
        ++p_map[p[i] - 'a'];
    }
    if (s_map == p_map)
        res.push_back(0);
    for (int i = p_len; i < s_len; i++) {
        ++s_map[s[i] - 'a'];
        --s_map[s[i - p_len] - 'a'];
        if (s_map == p_map)
            res.push_back(i - p_len + 1);
    }
    return res;
}
However, I think it is an O(n^2) solution because I have to compare the vectors s_map and p_map at every step.
Does an O(n) solution exist for this problem?

Let's say p has size n.
Let's say you have an array A of size 26 that holds the number of a's, b's, c's, ... that p contains.
Then you create a new array B of size 26 filled with 0.
Let's call the given (big) string s.
First of all, you initialize B with the number of a's, b's, c's, ... in the first n chars of s.
Then you iterate through each n-sized window of s, always updating B to fit the current window.
Whenever B matches A, you have found the start index of an anagram.
To change B from one n-sized window to the next, notice that you just have to remove from B the first char of the previous window and add the new char of the next window.
Look at the example:
Input
s: "cbaebabacd"
p: "abc" n = 3 (size of p)
A = {1, 1, 1, 0, 0, 0, ... } // p contains just 1a, 1b and 1c.
B = {1, 1, 1, 0, 0, 0, ... } // initially, the first n-sized word contains this.
compare(A,B)
for i = n; i < size of s; i++ {
    B[ s[i-n] ]--;
    B[ s[ i ] ]++;
    compare(A,B)
}
and suppose that compare(A,B) prints the start index of the current window (i-n+1) whenever A matches B.
The total complexity will be:
first fill of A = O(size of p)
first fill of B = O(size of p)
first comparison = O(26)
for-loop = (|s| - n) * (2 + O(26)) <= 28|s| = O(size of s)
____________________________________________________________________
2 * O(size of p) + O(26) + O(size of s)
which is linear in the size of s (n <= |s|, otherwise there is nothing to search).
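The pseudocode above translates almost line for line into Python. This is a sketch that assumes s and p contain only lowercase ASCII letters, which is the same assumption the 26-entry arrays make. find_anagrams is a name chosen here for illustration, not taken from the question.

```python
from typing import List

def find_anagrams(s: str, p: str) -> List[int]:
    n = len(p)
    if n == 0 or n > len(s):
        return []
    a = [0] * 26  # A: letter counts of p
    b = [0] * 26  # B: letter counts of the current n-sized window of s
    for ch in p:
        a[ord(ch) - ord('a')] += 1
    for ch in s[:n]:
        b[ord(ch) - ord('a')] += 1
    result = [0] if a == b else []
    for i in range(n, len(s)):
        b[ord(s[i - n]) - ord('a')] -= 1  # char leaving the window
        b[ord(s[i]) - ord('a')] += 1      # char entering the window
        if a == b:                        # 26 comparisons, independent of len(s)
            result.append(i - n + 1)      # the window now starts at i - n + 1
    return result

print(find_anagrams("cbaebabacd", "abc"))  # [0, 6]
```

The list comparison a == b is the O(26) step. It costs the same no matter how long s is, which is why the whole loop stays linear in |s|.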

Your solution is already an O(n) solution. The s_map and p_map vectors have a constant size (26) that doesn't depend on n, so comparing s_map and p_map takes a constant amount of time regardless of how big n is.
Your solution takes about 26 * n integer comparisons to complete, which is O(n).

// In papers on string searching algorithms, the alphabet is often
// called Sigma, and it is often not considered a constant. Your
// algorithm works in O(Sigma * n) time, where n is the length of the
// longer string. Below is an algorithm that works in O(n) time even
// when Sigma is too large to make an array of size Sigma, as long as
// values from Sigma are a constant number of "machine words".
// This solution works in O(n) time "with high probability", meaning
// that for all c > 2 the probability that the algorithm finishes
// within c*n time is 1-o(n^-c). This is a looser bound than O(n)
// worst-case because it uses hash tables, which depend on randomness.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <vector>
using namespace std;
// Finding a needle in a haystack. This works for any iterable type
// whose members can be stored as keys of an unordered_map.
template <typename T>
vector<size_t> AnagramLocations(const T& needle, const T& haystack) {
  // Think of a contiguous region of an ordered container as
  // representing a function f with the domain being the type of item
  // stored in the container and the codomain being the natural
  // numbers. We say that f(x) = n when there are n x's in the
  // contiguous region.
  //
  // Then two contiguous regions are anagrams when they have the same
  // function. We can track how close they are to being anagrams by
  // subtracting one function from the other, pointwise. When that
  // difference is uniformly 0, then the regions are anagrams.
  unordered_map<remove_const_t<remove_reference_t<decltype(*needle.begin())>>,
                intmax_t> difference;
  // As we iterate through the haystack, we track the lead (part
  // closest to the end) and lag (part closest to the beginning) of a
  // contiguous region in the haystack. When we move the region
  // forward by one, one part of the function f is increased by +1 and
  // one part is decreased by -1, so the same is true of difference.
  auto lag = haystack.begin(), lead = haystack.begin();
  // To compare difference to the uniformly-zero function in O(1)
  // time, we make sure it does not contain any points that map to
  // 0. Then the property of being uniformly zero is the same as the
  // property of having an empty difference.
  const auto find = [&](const auto& x) {
    difference[x]++;
    if (0 == difference[x]) difference.erase(x);
  };
  const auto lose = [&](const auto& x) {
    difference[x]--;
    if (0 == difference[x]) difference.erase(x);
  };
  vector<size_t> result;
  // First we initialize the difference with the first needle.size()
  // items from both needle and haystack.
  for (const auto& x : needle) {
    // Haystack shorter than needle: no window fits, so no anagrams.
    if (lead == haystack.end()) return result;
    lose(x);
    find(*lead);
    ++lead;
  }
  // The first window starts at index 0.
  if (difference.empty()) result.push_back(0);
  // Now we iterate through the haystack with lead, lag, and i (the start
  // of the window once lag has moved past one more item), updating
  // difference in O(1) time at each spot.
  for (size_t i = 1; lead != haystack.end(); ++lead, ++lag, ++i) {
    find(*lead);
    lose(*lag);
    if (difference.empty()) result.push_back(i);
  }
  return result;
}
int main() {
  string needle, haystack;
  cin >> needle >> haystack;
  const auto result = AnagramLocations(needle, haystack);
  for (auto x : result) cout << x << ' ';
  cout << '\n';
}
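The same idea can be sketched in Python for any hashable items: keep a map of count differences and delete entries as soon as they reach zero, so that "is an anagram" becomes "the map is empty". This is an illustrative translation, not the answerer's code. anagram_locations and bump are names chosen here, and Python dicts stand in for unordered_map.

```python
from typing import Hashable, List, Sequence

def anagram_locations(needle: Sequence[Hashable], haystack: Sequence[Hashable]) -> List[int]:
    m = len(needle)
    if m > len(haystack):
        return []
    # difference[x] = (count of x in window) - (count of x in needle);
    # zero entries are never stored, so an empty dict means "anagram".
    difference = {}

    def bump(x, delta):
        v = difference.get(x, 0) + delta
        if v == 0:
            del difference[x]  # v == 0 implies x was present with value -delta
        else:
            difference[x] = v

    for x, y in zip(needle, haystack):
        bump(x, -1)
        bump(y, +1)
    result = [0] if not difference else []
    for i in range(m, len(haystack)):
        bump(haystack[i], +1)      # item entering the window
        bump(haystack[i - m], -1)  # item leaving the window
        if not difference:
            result.append(i - m + 1)
    return result
```

Each step touches at most two dictionary entries, so the work per window is O(1) on average and does not depend on the alphabet size.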

For comparison, here is a straightforward Java version that checks every window separately. It runs in O(|s| * |p|) time rather than O(|s|).
import java.util.*;
public class FindAllAnagramsInAString_438 {
    public static void main(String[] args) {
        String s = "abab";
        String p = "ab";
        // String s = "cbaebabacd";
        // String p = "abc";
        System.out.println(findAnagrams(s, p));
    }
    // Tests every window s[i, j) of length p.length() independently.
    public static List<Integer> findAnagrams(String s, String p) {
        int i = 0;
        int j = p.length();
        List<Integer> list = new ArrayList<>();
        while (j <= s.length()) {
            if (isAnagram(s.substring(i, j), p)) {
                list.add(i);
            }
            i++;
            j++;
        }
        return list;
    }
    // +1 for each char of s, -1 for each char of p; anagrams leave every count at 0.
    public static boolean isAnagram(String s, String p) {
        HashMap<Character, Integer> map = new HashMap<>();
        if (s.length() != p.length()) return false;
        for (int i = 0; i < s.length(); i++) {
            char chs = s.charAt(i);
            char chp = p.charAt(i);
            map.put(chs, map.getOrDefault(chs, 0) + 1);
            map.put(chp, map.getOrDefault(chp, 0) - 1);
        }
        for (int val : map.values()) {
            if (val != 0) return false;
        }
        return true;
    }
}

Related

String permutation with duplicate characters

I have the string "0011" and want all of its permutations without duplicates,
that is, every string made of two '0's and two '1's,
for example: [0011, 0101, 0110, 1001, 1010, 1100].
I tried the following, and the result is exactly what I need:
private void permutation(String result, String str, HashSet<String> hashSet) {
    // needs java.util.HashSet and java.util.stream.IntStream
    if (str.length() == 0 && !hashSet.contains(result)) {
        System.out.println(result);
        hashSet.add(result);
        return;
    }
    IntStream.range(0, str.length()).forEach(pos ->
            permutation(result + str.charAt(pos), str.substring(0, pos) + str.substring(pos + 1), hashSet));
}
If I remove the HashSet, this code produces 24 results instead of 6.
But the time complexity of this code is O(n!).
How can I avoid creating duplicate strings and reduce the time complexity?
Probably something like this can be faster than n!, even for small n.
The idea is to count how many 1 bits each resulting item should have, then iterate through all possible values and keep only those that have the same number of 1 bits. It takes about the same amount of time whether the input has a single 1 or an even split of 0s and 1s.
// Number of 1 bits in a 32-bit integer
function bitCount(n) {
  n = n - ((n >> 1) & 0x55555555);
  n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
  return ((n + (n >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
// Assumes inp.length <= 31, since JavaScript bitwise operators work on 32-bit integers
function perm(inp) {
  const bitString = 2; // radix used to parse and print the binary strings
  const len = inp.length;
  const target = bitCount(parseInt(inp, bitString));
  const min = Math.pow(bitString, target) - 1; // smallest value with `target` ones, e.g. 0011
  const max = min << (len - target);           // largest such value, e.g. 1100
  const result = [];
  for (let i = min; i < max + 1; i++) {
    if (bitCount(i) === target) {
      result.push(i.toString(bitString).padStart(len, '0'));
    }
  }
  return result;
}
const inp = '0011';
const res = perm(inp);
console.log('result', res);
P.S. My first idea was probably faster than the code above, but the code above is easier to implement.
The first idea was to convert the string to an int and use a bitwise left shift, moving only one bit at a time. Its running time still depends on n and can be larger or smaller than the solution above, but a bitwise shift is fast in itself.
Example:
const input = '0011'
const len = input.length;
step 1: count the 1 bits = 2
then generate the first element: 3 (decimal) = '0011' in binary
step 2: move the rightmost 1 bit one position left with the << operator: '0101'
step 3: move it again: '1001'
step 4: it has reached `len`, so move the next bit, 100'1': '1010'
step 5: repeat: '1100'
step 6: shift the initial 3 left by one (3 << 1): '0110'
step 7: repeat the steps above: '1010'
step 8: '1100'
This generates duplicates ('1010' and '1100' appear twice), so it can probably be improved.
Hope it helps.
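The duplicates come from moving bits independently of each other. A well-known way to step directly from one value with k one-bits to the next larger such value is Gosper's hack. With it, every pattern is produced exactly once and nothing has to be filtered. The technique is swapped in here and is not the answerer's method. This Python sketch (same_popcount_strings is an illustrative name) returns the C(n, k) strings in increasing numeric order, which takes far fewer steps than n! permutations or 2^n candidates.

```python
from typing import List

def same_popcount_strings(bits: str) -> List[str]:
    n = len(bits)
    k = bits.count('1')
    if k == 0:
        return ['0' * n]
    result = []
    x = (1 << k) - 1  # smallest value with k one-bits, e.g. 0011
    limit = 1 << n    # stop once the pattern no longer fits in n bits
    while x < limit:
        result.append(format(x, f'0{n}b'))
        c = x & -x                        # lowest set bit
        r = x + c                         # carry the lowest block of ones one step left
        x = (((r ^ x) >> 2) // c) | r     # refill the remaining ones at the bottom
    return result

print(same_popcount_strings("0011"))
# ['0011', '0101', '0110', '1001', '1010', '1100']
```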
The worst-case time complexity cannot be improved, because the string may contain no repeated characters, in which case all n! permutations are distinct. However, for a multiset we can prune many subtrees to prevent duplicates.
The key idea is to permute the string with the traditional backtracking algorithm, but to skip a swap if the same character has already been swapped into the current position.
Here is a C++ code snippet that prevents duplicates and doesn't use any memory for lookup.
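#include <iostream>
#include <string>
#include <utility>
using namespace std;
// Returns false if str[index] already appears in str[start, index), i.e. the
// same character has already been tried at position start.
bool shouldSwap(const string& str, size_t start, size_t index) {
    for (auto i = start; i < index; ++i) {
        if (str[i] == str[index])
            return false;
    }
    return true;
}
// Prints every distinct permutation of str[index..], keeping str[0..index) fixed.
void permute(string& str, size_t index)
{
    if (index >= str.size()) {
        cout << str << endl;
        return;
    }
    for (size_t i = index; i < str.size(); ++i) {
        if (shouldSwap(str, index, i)) {
            swap(str[index], str[i]);
            permute(str, index + 1);
            swap(str[index], str[i]);
        }
    }
}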
See also the related SO answer and "Distinct permutations" for more references.
Also, note that the time complexity of this solution is O(n^2 * n!):
O(n) for printing a string
O(n) for iterating over the string to generate swaps and recursion
O(n!) possible states for the number of permutations

Points and segments

I'm doing an online course and got stuck on this problem.
The first line contains two integers 1 ≤ n, m ≤ 50000: the number of segments and the number of points on a line, respectively. The next n lines each contain two integers a_i ≤ b_i defining the i-th segment. The next line contains m integers defining the points. All the integers have absolute value at most 10^8. For each point, output the number of segments that contain it.
My solution is:
for point in points:
    occurrence = 0
    for l, r in segments:
        if l <= point <= r:
            occurrence += 1
    print(occurrence, end=' ')
The complexity of this algorithm is O(m*n), which is obviously not very efficient. What is the best way of solving this problem? Any help will be appreciated!
Sample Input:
2 3
0 5
7 10
1 6 11
Sample Output:
1 0 0
Sample Input 2:
1 3
-10 10
-100 100 0
Sample Output 2:
0 0 1
You can use the sweep line algorithm to solve this problem.
First, break each segment into two points: an open point and a close point.
Add all these points together with the m query points, and sort them by location.
Iterate through the sorted list while maintaining a counter: every time you encounter an open point, increase the counter, and every time you encounter a close point, decrease it. When you encounter one of the m query points, its result is the current value of the counter.
For sample 2, we have:
1 3
-10 10
-100 100 0
After sorting, what we have is:
-100 -10 0 10 100
At point -100, we have `counter = 0`, so its result is 0
At point -10, this is an open point, so we increase `counter = 1`
At point 0, the result is 1
At point 10, this is a close point, so we decrease `counter = 0`
At point 100, the result is 0
So, the result for point -100 is 0, for point 100 is 0, and for point 0 is 1, as expected.
Time complexity is O((n + m) log (n + m)).
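Here is a minimal Python sketch of the sweep, assuming closed segments [a, b]. When a query point coincides with a segment endpoint, the order of events at that coordinate decides the answer. This sketch assumes endpoints count as inside, so at equal coordinates it sorts opens first, then query points, then closes. count_segments_per_point is an illustrative name.

```python
from typing import List, Sequence, Tuple

def count_segments_per_point(segments: Sequence[Tuple[int, int]],
                             points: Sequence[int]) -> List[int]:
    # Event kinds double as tie-breakers at equal coordinates:
    # opens before query points before closes, so endpoints count as inside.
    OPEN, POINT, CLOSE = 0, 1, 2
    events = []
    for a, b in segments:
        events.append((a, OPEN, -1))
        events.append((b, CLOSE, -1))
    for idx, x in enumerate(points):
        events.append((x, POINT, idx))
    events.sort()  # O((n + m) log (n + m))
    result = [0] * len(points)
    counter = 0  # number of segments open at the current position
    for _, kind, idx in events:
        if kind == OPEN:
            counter += 1
        elif kind == CLOSE:
            counter -= 1
        else:
            result[idx] = counter  # idx keeps the answer in input order
    return result

print(*count_segments_per_point([(0, 5), (7, 10)], [1, 6, 11]))  # 1 0 0
```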
[Original answer] by how many segments is each point used
I am not sure I understood the problem correctly, but it looks like a simple example of histogram use:
create a counter array (one item per point)
set it to zero
process the last line, incrementing each used point's counter: O(m)
write the answer by reading the histogram: O(n)
So the result should be O(m+n), something like (C++):
const int n=2,m=3;
const int p[n][2]={ {0,5},{7,10} };
const int s[m]={1,6,11};
int i,cnt[n];
for (i=0;i<n;i++) cnt[i]=0;
for (i=0;i<m;i++) if ((s[i]>=0)&&(s[i]<n)) cnt[s[i]]++;
for (i=0;i<n;i++) cout << cnt[i] << " "; // result: 0 1
But as you can see, the p[] coordinates are never used, so either I missed something in your problem description, you are missing something, or it is there just to trick solvers ...
[edit1] After the inconsistencies in the OP were cleared up, the result is a bit different.
By how many points is each segment used:
create a counter array (one item per segment)
set it to zero
process the last line, incrementing each used point's counter: O(m)
write the answer by reading the histogram: O(m)
So the result is O(m), something like (C++):
const int n=2,m=3;
const int p[n][2]={ {0,5},{7,10} };
const int s[m]={1,6,11};
int i,cnt[m];
for (i=0;i<m;i++) cnt[i]=0;
for (i=0;i<m;i++) if ((s[i]>=0)&&(s[i]<n)) cnt[i]++;
for (i=0;i<m;i++) cout << cnt[i] << " "; // result: 1,0,0
[Notes]
After the new sample set was added to the OP, it is now clear that:
indexes start from 0
the problem is how many points from table p[n] are really used by each segment (m numbers in output)
Use binary search.
Sort the line segments by their first value, then by their second value. If you use C++, you can use a custom comparator like this:
bool fun(pair<int,int> a, pair<int,int> b){
    if(a.first<b.first)
        return true;
    if(a.first>b.first)
        return false;
    return a.second < b.second;
}
sort(a,a+n,fun); //a is your array of pair<int,int>, coordinates representing line
Then, for every point, find the first line that captures the point and the first line after it that does not. If no line captures the point, you can return -1 or something (and skip the search for the line that does not).
Something like:
int checkFirstHold(pair<int,int> a[], int p,int min, int max){ //p is the point
    while(min < max){
        int mid = (min + max)/2;
        if(a[mid].first <= p && a[mid].second>=p && a[mid-1].first<p && a[mid-1].second<p) //ie, p is in line a[mid] and not in line a[mid-1]
            return mid;
        if(a[mid].first <= p && a[mid].second>=p && a[mid-1].first<=p && a[mid-1].second>=p) //ie, p is in both line a[mid] and line a[mid-1]
            max = mid-1;
        if(a[mid].first < p && a[mid].second<p ) //ie, p is not in line a[mid]
            min = mid + 1;
    }
    return -1; //implying no line holds the point
}
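Similarly, write a checkLastHold function.
Then, compute checkLastHold - checkFirstHold for every point, which is the answer.
The complexity of this solution will be O((n + m) log n): O(n log n) for the sort, plus an O(log n) binary search over the n segments for each of the m points. Note that the approach assumes the segments containing a given point are contiguous in the sorted order, which does not hold in general. For example, with the segments [0,10], [1,2], [3,11] and the point 5, the first and third segments contain the point but the second does not.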
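Binary search can also answer each query without assuming that the segments containing a point are contiguous. A segment [a, b] contains x exactly when a <= x and not b < x. Since a <= b, every segment with b < x also has a <= x, so the count is #(starts <= x) - #(ends < x). Both counts come from binary searches over two sorted arrays. This is a different use of binary search than the checkFirstHold/checkLastHold scheme. It is sketched here in Python with the standard bisect module, and count_segments_bisect is an illustrative name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

def count_segments_bisect(segments: Sequence[Tuple[int, int]],
                          points: Sequence[int]) -> List[int]:
    starts = sorted(a for a, _ in segments)  # O(n log n)
    ends = sorted(b for _, b in segments)
    # bisect_right(starts, x): segments with a <= x
    # bisect_left(ends, x):    segments with b < x (all of which also have a <= x)
    return [bisect_right(starts, x) - bisect_left(ends, x) for x in points]
```

Each point costs two O(log n) searches, so the total is O((n + m) log n).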
Here is my counter-based solution in Java.
Note that all points, segment starts, and segment ends are read into one array.
If a point shares its x-coordinate with segment endpoints, the point is sorted after the segment start and before the segment end. This counts the point as "in" the segment when it coincides with either the segment start (counter already increased) or the segment end (counter not yet decreased).
For storing an answer in the same order as the points from the input, I create the array result of size pointsCount (only points counted, not the segments) and set its element with index SuperPoint.index, which stores the position of the point in the original input.
import java.util.Arrays;
import java.util.Scanner;
public final class PointsAndSegmentsSolution {
    enum PointType { // in order of sort, so that the point will be counted on both segment start and end coordinates
        SEGMENT_START,
        POINT,
        SEGMENT_END,
    }
    static class SuperPoint {
        final PointType type;
        final int x;
        final int index; // -1 (actually does not matter) for segments, index for points
        public SuperPoint(final PointType type, final int x) {
            this(type, x, -1);
        }
        public SuperPoint(final PointType type, final int x, final int index) {
            this.type = type;
            this.x = x;
            this.index = index;
        }
    }
    private static int[] countSegments(final SuperPoint[] allPoints, final int pointsCount) {
        Arrays.sort(allPoints, (o1, o2) -> {
            if (o1.x < o2.x)
                return -1;
            if (o1.x > o2.x)
                return 1;
            return Integer.compare( o1.type.ordinal(), o2.type.ordinal() ); // points with the same X coordinate by order in PointType enum
        });
        final int[] result = new int[pointsCount];
        int counter = 0;
        for (final SuperPoint superPoint : allPoints) {
            switch (superPoint.type) {
                case SEGMENT_START:
                    counter++;
                    break;
                case SEGMENT_END:
                    counter--;
                    break;
                case POINT:
                    result[superPoint.index] = counter;
                    break;
                default:
                    throw new IllegalArgumentException( String.format("Unknown SuperPoint type: %s", superPoint.type) );
            }
        }
        return result;
    }
    public static void main(final String[] args) {
        final Scanner scanner = new Scanner(System.in);
        final int segmentsCount = scanner.nextInt();
        final int pointsCount = scanner.nextInt();
        final SuperPoint[] allPoints = new SuperPoint[(segmentsCount * 2) + pointsCount];
        int allPointsIndex = 0;
        for (int i = 0; i < segmentsCount; i++) {
            final int start = scanner.nextInt();
            final int end = scanner.nextInt();
            allPoints[allPointsIndex] = new SuperPoint(PointType.SEGMENT_START, start);
            allPointsIndex++;
            allPoints[allPointsIndex] = new SuperPoint(PointType.SEGMENT_END, end);
            allPointsIndex++;
        }
        for (int i = 0; i < pointsCount; i++) {
            final int x = scanner.nextInt();
            allPoints[allPointsIndex] = new SuperPoint(PointType.POINT, x, i);
            allPointsIndex++;
        }
        final int[] pointsSegmentsCounts = countSegments(allPoints, pointsCount);
        for (final int count : pointsSegmentsCounts) {
            System.out.print(count + " ");
        }
    }
}

Parallel radix sort with virtual memory and write-combining

I'm attempting to implement the variant of parallel radix sort described in http://arxiv.org/pdf/1008.2849v2.pdf (Algorithm 2), but my C++ implementation (for 4 digits in base 10) contains a bug that I'm unable to locate.
For debugging purposes I'm using no parallelism, but the code should still sort correctly.
For instance, the line arr.at(i) = item accesses indices outside its bounds in the following:
std::vector<int> v = {4612, 4598};
radix_sort2(v);
My implementation is as follows:
#include <set>
#include <array>
#include <vector>
void radix_sort2(std::vector<int>& arr) {
    std::array<std::set<int>, 10> buckets3;
    for (const int item : arr) {
        int d = item / 1000;
        buckets3.at(d).insert(item);
    }
    //Prefix sum
    std::array<int, 10> outputIndices;
    outputIndices.at(0) = 0;
    for (int i = 1; i < 10; ++i) {
        outputIndices.at(i) = outputIndices.at(i - 1) +
                              buckets3.at(i - 1).size();
    }
    for (const auto& bucket3 : buckets3) {
        std::array<std::set<int>, 10> buckets0, buckets1;
        std::array<int, 10> histogram2 = {};
        for (const int item : bucket3) {
            int d = item % 10;
            buckets0.at(d).insert(item);
        }
        for (const auto& bucket0 : buckets0) {
            for (const int item : bucket0) {
                int d = (item / 10) % 10;
                buckets1.at(d).insert(item);
                int d2 = (item / 100) % 10;
                ++histogram2.at(d2);
            }
        }
        for (const auto& bucket1 : buckets1) {
            for (const int item : bucket1) {
                int d = (item / 100) % 10;
                int i = outputIndices.at(d) + histogram2.at(d);
                ++histogram2.at(d);
                arr.at(i) = item;
            }
        }
    }
}
Can anyone spot my mistake?
I took a look at the paper you linked. You haven't made any mistakes, none that I can see. In fact, in my estimation, you corrected a mistake in the algorithm.
I wrote out the algorithm and ended up with exactly the same problem as you did. After reviewing Algorithm 2, either I woefully misunderstand how it is supposed to work, or it is flawed. There are at least a couple of problems with the algorithm, specifically revolving around outputIndices and histogram2.
Looking at the algorithm, the final index of an item is determined by the counting sort stored in outputIndices (let's ignore the histogram for now).
If you had an initial array of numbers {0100, 0103, 0102, 0101}, the prefix sum of that would be 4.
The algorithm gives no indication that I can find that the result should be lagged by 1. That being said, for the algorithm to work the way they intend, it does have to be lagged, so, moving on.
Now, the prefix sums are 0, 4, 4, .... The algorithm doesn't use the MSD as the index into the outputIndices array; it uses "MSD - 1". So, taking 1 as the index into the array, the starting index for the first item (without the histogram) is 4: outside the array on the first try.
Since outputIndices is built with the MSD, it makes sense for it to be accessed by the MSD.
Further, even if you tweak the algorithm to correctly use the MSD as the index into outputIndices, it still won't sort correctly. With your initial inputs swapped, {4598, 4612}, they will stay in that order: they are sorted (locally) as if they were 2-digit numbers. If you add other numbers not starting with 4, they will be globally sorted, but the local sort is never finished.
According to the paper, the goal is to use the histogram to do that, but I don't see that happening.
Ultimately, I'm assuming that what you want is an algorithm that works the way described. I've modified the algorithm, keeping to the paper's overall stated goal of using the MSD to do a global sort and sorting the rest of the digits by reverse LSD.
I don't think these changes should have any impact on your desire to parallelize the function.
#include <array>
#include <vector>
void radix_sort2(std::vector<int>& arr)
{
    // Global step: bucket by the most significant (thousands) digit.
    std::array<std::vector<int>, 10> buckets3;
    for (const int item : arr)
    {
        int d = item / 1000;
        buckets3.at(d).push_back(item);
    }
    //Prefix sum: outputIndices[d] is where the items with MSD d start in arr
    std::array<int, 10> outputIndices;
    outputIndices.at(0) = 0;
    for (int i = 1; i < 10; ++i)
    {
        outputIndices.at(i) = outputIndices.at(i - 1) + static_cast<int>(buckets3.at(i - 1).size());
    }
    for (const auto& bucket3 : buckets3)
    {
        if (bucket3.empty())
            continue;
        // Local step: LSD passes on the ones, tens, and hundreds digits.
        std::array<std::vector<int>, 10> buckets0, buckets1, buckets2;
        for (const int item : bucket3)
            buckets0.at(item % 10).push_back(item);
        for (const auto& bucket0 : buckets0)
            for (const int item : bucket0)
                buckets1.at((item / 10) % 10).push_back(item);
        for (const auto& bucket1 : buckets1)
            for (const int item : bucket1)
                buckets2.at((item / 100) % 10).push_back(item);
        // Write the locally sorted bucket into its global slot.
        int count = 0;
        for (const auto& bucket2 : buckets2)
        {
            for (const int item : bucket2)
            {
                int d = (item / 1000) % 10;
                int i = outputIndices.at(d) + count;
                ++count;
                arr.at(i) = item;
            }
        }
    }
}
For extensibility, it would probably make sense to create a helper function that does the local sorting. You should then be able to extend it to handle numbers with any number of digits.
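For reference, the modified algorithm above can be transcribed to Python as follows. It uses MSD bucketing for the global order, then stable LSD passes inside each bucket. This is a sketch that assumes non-negative inputs below 10000, matching the question's 4-digit setting, and the function name simply mirrors the C++ one. It uses lists, like the answer's std::vector, rather than std::set, so duplicate keys are kept; a std::set silently drops them.

```python
from typing import List

def radix_sort2(arr: List[int]) -> None:
    """Sort ints in [0, 9999] in place."""
    # Global step: bucket by the most significant (thousands) digit.
    buckets3 = [[] for _ in range(10)]
    for item in arr:
        buckets3[item // 1000].append(item)
    # Exclusive prefix sum: where each MSD bucket starts in the output.
    output_indices = [0] * 10
    for d in range(1, 10):
        output_indices[d] = output_indices[d - 1] + len(buckets3[d - 1])
    for d3, bucket in enumerate(buckets3):
        # Local step: stable passes on the ones, tens, then hundreds digit.
        for divisor in (1, 10, 100):
            digit_buckets = [[] for _ in range(10)]
            for item in bucket:
                digit_buckets[(item // divisor) % 10].append(item)
            bucket = [item for db in digit_buckets for item in db]
        # Write the locally sorted bucket into its global slot.
        for offset, item in enumerate(bucket):
            arr[output_indices[d3] + offset] = item
```

Each MSD bucket is handled independently of the others, which is the property that makes the per-bucket work easy to parallelize.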

Finding union length of many line segments

I have a few line segments on the x-axis, given by their beginning and ending x-coordinates. Some line segments may overlap. How can I find the length of the union of all the line segments?
For example, one line segment is (5,0) to (8,0) and another is (9,0) to (12,0). They don't overlap, so the total length is 3 + 3 = 6.
If one line segment is (5,0) to (8,0) and another is (7,0) to (12,0), they overlap on the range (7,0) to (8,0), so the length of the union is 7.
Note that the x-coordinates may be floating-point numbers.
Represent a line segment as two EndPoint objects. Each EndPoint object has the form <coordinate, isStartEndPoint>. Put all EndPoint objects of all the line segments together in a list endPointList.
The algorithm:
Sort endPointList, first by coordinate in ascending order, and then, at equal coordinates, place the start end points before the tail end points (regardless of which segment they belong to, since it doesn't matter: they are all at the same coordinate).
Loop through the sorted list according to this pseudocode:
prevCoordinate = -Inf
numSegment = 0
unionLength = 0
for (endPoint in endPointList):
    if (numSegment > 0):
        unionLength += endPoint.coordinate - prevCoordinate
    prevCoordinate = endPoint.coordinate
    if (endPoint.isStartEndPoint):
        numSegment = numSegment + 1
    else:
        numSegment = numSegment - 1
The numSegment variable will tell whether we are in a segment or not. When it is larger than 0, we are inside some segment, so we can include the distance to the previous end point. If it is 0, it means that the part before the current end point doesn't contain any segment.
The complexity is dominated by the sorting step: comparison-based sorting has a lower bound of Omega(n log n), while the loop is O(n). So the complexity of the algorithm is O(n log n) if you choose an O(n log n) comparison-based sorting algorithm.
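Here is a minimal Python sketch of the endpoint sweep described above, assuming each segment is given as a pair (a, b) with a <= b. union_length is an illustrative name. Starts are encoded as 0 and ends as 1, so plain tuple sorting puts start end points first at equal coordinates. Floating-point coordinates work unchanged.

```python
from typing import Iterable, Tuple

def union_length(segments: Iterable[Tuple[float, float]]) -> float:
    end_points = []
    for a, b in segments:
        end_points.append((a, 0))  # 0 = start end point
        end_points.append((b, 1))  # 1 = tail end point
    end_points.sort()  # by coordinate, starts before ends on ties
    prev = float('-inf')
    num_segments = 0  # how many segments cover the gap before this end point
    total = 0.0
    for coord, kind in end_points:
        if num_segments > 0:
            total += coord - prev
        prev = coord
        num_segments += 1 if kind == 0 else -1
    return total

print(union_length([(5, 8), (7, 12)]))  # 7.0
```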
Use a range tree (here, a binary search tree of disjoint ranges). Building it costs about n log(n), just like sorting the begin/end points, as long as the tree stays reasonably balanced (the snippet does no rebalancing). It has the additional advantage that overlapping ranges are merged, which reduces the number of elements, though it may increase the cost of insertion. Snippet (untested):
#include <stdlib.h>
/* Disjoint ranges in BST order: everything in ll lies left of [lo, hi],
** everything in rr lies right of it. */
struct segment {
    struct segment *ll, *rr;
    float lo, hi;
};
struct segment * newsegment(float lo, float hi) {
    struct segment * ret;
    ret = malloc (sizeof *ret);
    ret->lo = lo; ret->hi = hi;
    ret->ll = ret->rr = NULL;
    return ret;
}
struct segment * insert_range(struct segment *root, float lo, float hi)
{
    if (!root) return newsegment(lo, hi);
    /* non-overlapping (and non-touching) ranges can be put into the {ll,rr} subtrees */
    if (hi < root->lo) {
        root->ll = insert_range(root->ll, lo, hi);
        return root;
    }
    if (lo > root->hi) {
        root->rr = insert_range(root->rr, lo, hi);
        return root;
    }
    /* when we get here, we must have overlap; we can extend the current node.
    ** The widened node may now overlap its in-order neighbours (the rightmost
    ** node of ll, the leftmost node of rr), so absorb them until it doesn't.
    */
    if (lo < root->lo) {
        root->lo = lo;
        while (root->ll) {
            struct segment **link = &root->ll, *tmp;
            while ((*link)->rr) link = &(*link)->rr;  /* predecessor */
            tmp = *link;
            if (tmp->hi < root->lo) break;
            if (tmp->lo < root->lo) root->lo = tmp->lo;
            *link = tmp->ll;  /* unlink it; its rr is NULL */
            free(tmp);
        }
    }
    if (hi > root->hi) {
        root->hi = hi;
        while (root->rr) {
            struct segment **link = &root->rr, *tmp;
            while ((*link)->ll) link = &(*link)->ll;  /* successor */
            tmp = *link;
            if (tmp->lo > root->hi) break;
            if (tmp->hi > root->hi) root->hi = tmp->hi;
            *link = tmp->rr;  /* unlink it; its ll is NULL */
            free(tmp);
        }
    }
    return root;
}
float total_width(struct segment *ptr)
{
    float ret;
    if (!ptr) return 0.0;
    ret = ptr->hi - ptr->lo;
    ret += total_width(ptr->ll);
    ret += total_width(ptr->rr);
    return ret;
}
Here is a solution I just wrote in Haskell; below it is an example of how it can be used at the interpreter prompt. The segments must be given as a list of tuples [(a,a)]. The idea is to sort the segments, merge each segment into the next one while they overlap, and add up the lengths of the merged runs.
import Data.List (sort)
unionSegments :: (Ord a, Num a) => [(a, a)] -> a
unionSegments = go . sort
  where
    go [] = 0
    go [(a, b)] = b - a
    go ((a, b) : (c, d) : rest)
      | c <= b    = go ((a, max b d) : rest) -- overlap: merge into one segment
      | otherwise = (b - a) + go ((c, d) : rest)
*Main> :load "unionSegments.hs"
[1 of 1] Compiling Main ( unionSegments.hs, interpreted )
Ok, modules loaded: Main.
*Main> unionSegments [(5,8), (7,12)]
7
Java implementation
import java.util.Map;
import java.util.TreeMap;
public class HelloWorld {
    // Prints the length of the union of the first `sets` segments in a.
    static void unionLength(int[][] a, int sets) {
        // coordinate -> net change in the number of open segments (+1 per start, -1 per end);
        // Integer::sum keeps coinciding endpoints from overwriting each other
        TreeMap<Integer, Integer> t = new TreeMap<>();
        for (int i = 0; i < sets; i++) {
            t.merge(a[i][0], 1, Integer::sum);
            t.merge(a[i][1], -1, Integer::sum);
        }
        int count = 0; // segments open between the previous and current coordinate
        int res = 0;
        Integer prev = null;
        for (Map.Entry<Integer, Integer> me : t.entrySet()) {
            if (prev != null && count > 0)
                res += me.getKey() - prev;
            count += me.getValue();
            prev = me.getKey();
        }
        System.out.println(res);
    }
    public static void main(String[] args) {
        int[][] a = {{0, 4}, {3, 6}, {8, 10}};
        int[][] b = {{5, 10}, {8, 12}};
        unionLength(a, 3); // 8
        unionLength(b, 2); // 7
    }
}

Reorder a string by half the character

This is an interview question.
Given a string such as 123456abcdef, consisting of n/2 digits followed by n/2 letters, reorder it into 1a2b3c4d5e6f. The algorithm should be in place.
The solution I gave was trivial, O(n^2): just shift the characters by n/2 places to the left.
I tried using recursion as follows:
a. Swap the second half of the first part with the first half of the second part, e.g.
123 456 abc def
123 abc 456 def
b. Recurse on the two halves.
The problem I am stuck on is that the swapping varies with the number of elements, for example:
123 abc
12ab 3c
What to do next? And what to do for 12345 abcde:
123abc 45de
This is a pretty old question and may be a duplicate. Please let me know. :)
Another example:
Input: 38726zfgsa
Output: 3z8f7g2s6a
Here's how I would approach the problem:
1) Divide the string into two partitions: the number part and the letter part
2) Divide each of those partitions into two more (equal-sized) partitions
3) Swap the second and third partitions (inner numbers and inner letters)
4) Recurse on the original two partitions (with their newly swapped bits)
5) Stop when a partition has a size of 2
For example:
123456abcdef -> 123456 abcdef -> 123 456 abc def -> 123 abc 456 def
123abc -> 123 abc -> 12 3 ab c -> 12 ab 3 c
12 ab -> 1 2 a b -> 1 a 2 b
... etc
And the same for the other half of the recursion.
All can be done in place with the only gotcha being swapping partitions that aren't the same size (but it'll be off by one, so not difficult to handle).
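The recursive scheme above can be sketched on a Python list. Strings are immutable, so the in-place work happens on a list of characters. When the halves have odd length, the first quarter is rounded up. The unequal-sized blocks are then swapped with three reversals, the same block-swap trick the C answer further down uses. riffle, _reverse and _rotate are names chosen here for illustration.

```python
from typing import List, Optional

def _reverse(a: List[str], lo: int, hi: int) -> None:
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1

def _rotate(a: List[str], lo: int, mid: int, hi: int) -> None:
    """Swap the blocks a[lo:mid] and a[mid:hi] in place (three reversals)."""
    _reverse(a, lo, mid)
    _reverse(a, mid, hi)
    _reverse(a, lo, hi)

def riffle(a: List[str], lo: int = 0, hi: Optional[int] = None) -> None:
    """Turn d1..dk l1..lk in a[lo:hi] into d1 l1 ... dk lk, in place."""
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n < 4:                  # "1a" (or empty) is already interleaved
        return
    k = n // 2                 # size of the number part and of the letter part
    h = (k + 1) // 2           # outer quarter, rounded up
    # swap the inner numbers a[lo+h:lo+k] with the first h letters a[lo+k:lo+k+h]
    _rotate(a, lo + h, lo + k, lo + k + h)
    riffle(a, lo, lo + 2 * h)  # h numbers followed by h letters
    riffle(a, lo + 2 * h, hi)  # the remaining k-h numbers and k-h letters

chars = list("123456abcdef")
riffle(chars)
print("".join(chars))  # 1a2b3c4d5e6f
```

Each level of recursion does O(n) swaps and there are O(log n) levels, so this runs in O(n log n) time. The only extra memory is the O(log n) recursion stack.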
It is easy to permute an array in place by chasing elements round cycles if you have a bit-map to mark which elements have been moved. We don't have a separate bit-map, but IF your characters are letters (or at least have the high order bit clear) then we can use the top bit of each character to mark this. This produces the following program, which is not recursive and so does not use stack space.
class XX
{
    /** new position given old position */
    static int newFromOld(int x, int n)
    {
        if (x < n / 2)
        {
            return x * 2;
        }
        return (x - n / 2) * 2 + 1;
    }
    private static int HIGH_ORDER_BIT = 1 << 15; // 16-bit chars
    public static void main(String[] s)
    {
        // input data - create an array so we can modify
        // characters in place
        char[] x = s[0].toCharArray();
        if ((x.length & 1) != 0)
        {
            System.err.println("Only works with even length strings");
            return;
        }
        // Character we have read but not yet written, if any
        char holding = 0;
        // where character in hand was read from
        int holdingPos = 0;
        // whether picked up a character in our hand
        boolean isHolding = false;
        int rpos = 0;
        while (rpos < x.length)
        { // Here => moved out everything up to rpos
          // and put in place with top bit set to mark new occupant
            if (!isHolding)
            { // advance read pointer to read new character
                char here = x[rpos];
                holdingPos = rpos++;
                if ((here & HIGH_ORDER_BIT) != 0)
                {
                    // already dealt with
                    continue;
                }
                int targetPos = newFromOld(holdingPos, x.length);
                // pick up char at target position
                holding = x[targetPos];
                // place new character, and mark as new
                x[targetPos] = (char)(here | HIGH_ORDER_BIT);
                // Now holding a character that needs to be put in its
                // correct place
                isHolding = true;
                holdingPos = targetPos;
            }
            int targetPos = newFromOld(holdingPos, x.length);
            char here = x[targetPos];
            if ((here & HIGH_ORDER_BIT) != 0)
            { // back to where we picked up a character to hold
                isHolding = false;
                continue;
            }
            x[targetPos] = (char)(holding | HIGH_ORDER_BIT);
            holding = here;
            holdingPos = targetPos;
        }
        for (int i = 0; i < x.length; i++)
        {
            x[i] ^= HIGH_ORDER_BIT;
        }
        System.out.println("Result is " + new String(x));
    }
}
These days, if I asked someone that question, what I'm looking for them to write on the whiteboard first is:
assertEquals("1a2b3c4d5e6f",funnySort("123456abcdef"));
...
and then maybe ask for more examples.
(And then, depending, if the task is to interleave numbers & letters, I think you can do it with two walking-pointers, indexLetter and indexDigit, and advance them across swapping as needed til you reach the end.)
In your recursive solution, why don't you just test whether n/2 % 2 == 0 (i.e. n % 4 == 0) and treat the two situations differently?
As templatetypedef commented, your recursion cannot be in place.
But here is a solution (not in place) using the recursion you had in mind:
def f(s):
    n = len(s)
    if n <= 2:  # base case (also stops the recursion on an empty string)
        return s
    elif n % 4 == 0:  # if n % 4 == 0 it's easy
        return f(s[:n//4] + s[n//2:3*n//4]) + f(s[n//4:n//2] + s[3*n//4:])
    else:  # otherwise, (n - 2) % 4 == 0
        return s[0] + s[n//2] + f(s[1:n//2] + s[n//2+1:])
Here we go: recursive, cuts it in half each time, and in place. It uses the approach outlined by @Chris Mennie. Getting the splitting right was tricky. A lot longer than Python, innit?
/* In-place, divide-and-conquer, recursive riffle-shuffle of strings;
 * even length only. No wide characters or Unicode; old school. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void testrif(const char *s);
void riffle(char *s);
void rif_recur(char *s, size_t len);
void swap(char *s, size_t midpt, size_t len);
void flip(char *s, size_t len);
void if_odd_quit(const char *s);
int main(void)
{
    testrif("");
    testrif("a1");
    testrif("ab12");
    testrif("abc123");
    testrif("abcd1234");
    testrif("abcde12345");
    testrif("abcdef123456");
    return 0;
}
void testrif(const char *s)
{
    char mutable[20];
    strcpy(mutable, s);
    printf("'%s'\n", mutable);
    riffle(mutable);
    printf("'%s'\n\n", mutable);
}
void riffle(char *s)
{
    if_odd_quit(s);
    rif_recur(s, strlen(s));
}
void rif_recur(char *s, size_t len)
{
    /* Turn, e.g., "abcde12345" into "abc123de45", then recurse. */
    size_t pivot = len / 2;
    size_t half = (pivot + 1) / 2;
    size_t twice = half * 2;
    if (len < 4)
        return;
    swap(s + half, pivot - half, pivot);
    rif_recur(s, twice);
    rif_recur(s + twice, len - twice);
}
void swap(char *s, size_t midpt, size_t len)
{
    /* Swap s[0..midpt] with s[midpt..len], in place. Algorithm from
     * Programming Pearls, Chapter 2. */
    flip(s, midpt);
    flip(s + midpt, len - midpt);
    flip(s, len);
}
void flip(char *s, size_t len)
{
    /* Reverse order of characters in s, in place. */
    char *p, *q, tmp;
    if (len < 2)
        return;
    for (p = s, q = s + len - 1; p < q; p++, q--) {
        tmp = *p;
        *p = *q;
        *q = tmp;
    }
}
void if_odd_quit(const char *s)
{
    if (strlen(s) % 2) {
        fputs("String length is odd; aborting.\n", stderr);
        exit(1);
    }
}
By comparing 123456abcdef and 1a2b3c4d5e6f, we can note that only the first and the last characters are in their correct positions. We can also note that for each of the remaining n-2 characters we can compute its correct position directly from its original position. Each one gets there, and the element that was there surely was not in its correct position, so it will have to replace another one. By doing n-2 such steps, all the elements get to their correct positions, provided the n-2 misplaced positions form a single cycle, as they do for n = 6 and n = 12. For some lengths they do not: for n = 8, positions 1 -> 2 -> 4 -> 1 and 3 -> 6 -> 5 -> 3 form two separate cycles, and the loop below only fixes the first one.
#include <utility>
void funny_sort(char* arr, int n){
    int pos = 1; // first unordered element
    char aux = arr[pos];
    for (int iter = 0; iter < n-2; iter++) { // n-2 unordered elements
        pos = (pos < n/2) ? pos*2 : (pos-n/2)*2+1; // correct pos for aux
        std::swap(aux, arr[pos]); // place aux, pick up the element it displaces
    }
}
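This is a sketch of the position-by-position idea that does not depend on all misplaced positions forming a single cycle. It follows every cycle of the permutation in turn and remembers visited positions in a separate list. That list is O(n) extra memory (the kind of "bit-map" the marker-bit answer above avoids), so this version is not strictly in place. It is written in Python, assumes an even-length string, and interleave_by_cycles is an illustrative name.

```python
def interleave_by_cycles(s: str) -> str:
    a = list(s)
    n = len(a)
    half = n // 2

    def new_pos(x: int) -> int:
        # digit at x < half goes to 2x; letter at x >= half goes to 2(x - half) + 1
        return 2 * x if x < half else 2 * (x - half) + 1

    visited = [False] * n
    for start in range(n):
        if visited[start]:
            continue
        carried = a[start]  # item that belongs at new_pos(start)
        pos = start
        while True:
            visited[pos] = True
            pos = new_pos(pos)
            a[pos], carried = carried, a[pos]  # drop it off, pick up the displaced item
            if pos == start:                   # cycle closed
                break
    return "".join(a)

print(interleave_by_cycles("1234abcd"))  # 1a2b3c4d
```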
Score each digit as its numerical value, and each letter as a = 1.5, b = 2.5, c = 3.5, etc. Run an insertion sort of the string based on the score of each character.
[ETA] Simple scoring won't work, so use two pointers and reverse the piece of the string between them. One pointer starts near the front of the string (at the second character) and advances one step each cycle. The other pointer starts in the middle of the string and advances every second cycle.
123456abcdef
 ^    ^
1a65432bcdef
  ^   ^
1a23456bcdef
   ^   ^
1a2b6543cdef
    ^  ^
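The two-pointer reversal scheme can be sketched as follows, assuming an even-length input of n/2 digits followed by n/2 letters. On each cycle the front pointer is at 1 + cycle and the middle pointer at n/2 + cycle/2 (rounded down), which matches the trace above. Even cycles bring the next letter forward, and odd cycles restore the order of the digits left behind. Each reversal costs O(n), so the whole pass is O(n^2) time with O(1) extra memory beyond the character list. interleave_by_reversals and _reverse are illustrative names.

```python
from typing import List

def _reverse(a: List[str], i: int, j: int) -> None:
    """Reverse a[i..j] (inclusive) in place."""
    while i < j:
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1

def interleave_by_reversals(s: str) -> str:
    a = list(s)
    half = len(a) // 2
    cycle = 0
    while True:
        i = 1 + cycle          # front pointer: one step every cycle
        j = half + cycle // 2  # middle pointer: one step every second cycle
        if i >= j:
            break
        _reverse(a, i, j)
        cycle += 1
    return "".join(a)

print(interleave_by_reversals("123456abcdef"))  # 1a2b3c4d5e6f
```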
