Find all pairs of similar words - algorithm

I have some 40000 words and want to find all similar pairs. For similarity, I use a soft of Damerau–Levenshtein distance scaled by the word lengths. For simplicity, I don't consider overlapping edits (just like the linked algorithm). All words (most of them being German, French or English) were converted to lowercase as case caries no information in our data. I did two modifications to the distance computation
the distance between two characters is
0, when they're the same
0.2, when they differ just in the accent (like a vs ä or à)
0.2, when they're s and ß (the German Sharp S)
1, otherwise
Additionally, the distance of the strings ß and ss is set to 0.2. Our data shows that this complication is necessary.
For finding all similar pairs, my idea was to only consider pairs found via a common n-gram, but this fails for short words (which is acceptable) and in general because of the above modifications.
My next idea was an early abort of the distance computation, if it's known that the result is over a threshold (say 4). However, as many words have common prefixes, this abort comes too late to being a nice speed-up. Reversing the words fails (not so badly) on common suffixes.
The whole computation for all 20e6 pairs takes some five minutes, so for now, it's possible to do it once and store all the results (only distances below a threshold are needed). But I'm looking for something more future-proof.
Is it possible to quickly compute a good lower bound on the Damerau–Levenshtein distance (ideally allowing an early exit)?
Is it possible with the above modifications? Note that e.g., "at least the difference of the sizes of the two strings" doesn't hold because of the second modification.
The code
public final class Levenshtein {
private static final class CharDistance {
private static int distance(char c0, char c1) {
if (c0 == c1) return 0;
if ((c0|c1) < 0x80) return SCALE;
return c0<=c1 ? distanceInternal(c0, c1) : distanceInternal(c1, c0);
}
private static int distanceInternal(char c0, char c1) {
assert c0 <= c1;
final String key = c0 + " " + c1;
{
final Integer result = CACHE.get(key);
if (result != null) return result.intValue();
}
final int result = distanceUncached(c0, c1);
CACHE.put(key, Integer.valueOf(result));
return result;
}
private static int distanceUncached(char c0, char c1) {
final String norm0 = Normalizer.normalize("" + c0, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
final String norm1 = Normalizer.normalize("" + c1, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
if (norm0.equals(norm1)) return DIACRITICS;
assert c0 <= c1;
if (c0=='s' && c1=='ß') return DIACRITICS;
return SCALE;
}
private static final Map<String, Integer> CACHE = new ConcurrentHashMap<>();
}
/**
* Return the scaled distance between {#code s0} and {#code s1}, if it's below {#code limit}.
* Otherwise, return some lower bound ({#code >= limit}.
*/
int distance(String s0, String s1, int limit) {
final int len0 = s0.length();
final int len1 = s1.length();
int result = SCALE * (len0 + len1);
final int[] array = new int[len0 * len1];
for (int i0=0; i0<len0; ++i0) {
final char c0 = s0.charAt(i0);
for (int i1=0; i1<len1; ++i1) {
final char c1 = s1.charAt(i1);
final int d = CharDistance.distance(c0, c1);
// Append c0 and c1 respectively.
result = get(len0, array, i0-1, i1-1) + d;
// Append c0.
result = Math.min(result, get(len0, array, i0-1, i1) + SCALE);
// Append c1.
result = Math.min(result, get(len0, array, i0, i1-1) + SCALE);
// Handle the "ß" <-> "ss" substitution.
if (c0=='ß' && c1=='s' && i1>0 && s1.charAt(i1-1)=='s') result = Math.min(result, get(len0, array, i0-1, i1-2) + DIACRITICS);
if (c1=='ß' && c0=='s' && i0>0 && s0.charAt(i0-1)=='s') result = Math.min(result, get(len0, array, i0-2, i1-1) + DIACRITICS);
// Handle a transposition.
if (i0>0 && i1>0 && s0.charAt(i0-1)==c1 && s1.charAt(i1-1)==c0) result = Math.min(result, get(len0, array, i0-2, i1-2) + SCALE);
set(len0, array, i0, i1, result);
}
// Early exit.
{
final int j = i0 - len0 + len1;
final int lowerBound = get(len0, array, i0, j);
if (lowerBound >= limit) return lowerBound;
}
}
return result;
}
// Simulate reading from a 2D array at indexes i0 and i1;
private int get(int stride, int[] array, int i0, int i1) {
if (i0<0 || i1<0) return SCALE * (i0+i1+2);
return array[i1*stride + i0];
}
// Simulate writing to a 2D array at indexes i0 and i1;
private void set(int stride, int[] array, int i0, int i1, int value) {
array[i1*stride + i0] = value;
}
private static final int SCALE = 10;
private static final int DIACRITICS = 2;
}
Example words
rotwein
rotweincuv
rotweincuvee
rotweincuveé
rotweincuvée
rotweincuvúe
rotweindekanter
rotweinessig
rotweinfass
rotweinglas
rotweinkelch
rotweißkomposition
rotwild
roug
rouge
rougeaoc
rougeaop
rougeots
rougers
rouges
rougeáaop
rough
roughstock
rouladen
roulette
roumier
roumieu
round
rounded
roundhouse
rounds
rouret
rouss
roussanne
rousseau
roussi
roussillion
roussillon
route
rouvinez
rove
roveglia
rovere
roveri
rovertondo
rovo
rowan
rowein
roxburgh
roxx
roy
roya
royal
royalbl
royalblau
royaldanishnavy
royale
royales
royaline
royals
royer
royere
roze
rozenberg
rozes
rozier
rozès
rozés
roßberg
roßdorfer
roßerer
rpa
rr
rrvb
rry
rs
rsaftschorle
rsbaron
rscastillo
rsgut
rsl
rstenapfel
rstenberg
rstenbrõu
rt
rtd
rtebecker
rtebeker
ru
ruadh
ruanda
rub
rubaiyat
ruban
rubata
rubblez
rubenkov
rubeno
rubentino
ruber

I would suggest putting all of the words into a trie, and then recursively searching the trie against itself.
Generating the trie should be fast, and now matching of common prefixes against each other is only calculated once no matter how many words share them.
You do have to keep track of a lot of state as you wander the trie, because your state is, "All of the intermediate stuff we could be in the middle of calculating."

Related

Points and segments

I'm doing online course and got stuck at this problem.
The first line contains two non-negative integers 1 ≤ n, m ≤ 50000 — the number of segments and points on a line, respectively. The next n lines contain two integers a_i ≤ b_i defining the i-th segment. The next line contain m integers defining points. All the integers are of absolute value at most 10^8. For each segment, output the number of points it is used from the n-points table.
My solution is :
for point in points:
occurrence = 0
for l, r in segments:
if l <= point <= r:
occurrence += 1
print(occurrence),
The complexity of this algorithm is O(m*n), which is obviously not very efficient. What is the best way of solving this problem? Any help will be appreciated!
Sample Input:
2 3
0 5
7 10
1 6 11
Sample Output:
1 0 0
Sample Input 2:
1 3
-10 10
-100 100 0
Sample Output 2:
0 0 1
You can use sweep line algorithm to solve this problem.
First, break each segment into two points, open and close points.
Add all these points together with those m points, and sort them based on their locations.
Iterating through the list of points, maintaining a counter, every time you encounter an open point, increase the counter, and if you encounter an end point, decrease it. If you encounter a point in list m point, the result for this point is the value of counter at this moment.
For example 2, we have:
1 3
-10 10
-100 100 0
After sorting, what we have is:
-100 -10 0 10 100
At point -100, we have `counter = 0`
At point -10, this is open point, we increase `counter = 1`
At point 0, so result is 1
At point 10, this is close point, we decrease `counter = 0`
At point 100, result is 0
So, result for point -100 is 0, point 100 is 0 and point 0 is 1 as expected.
Time complexity is O((n + m) log (n + m)).
[Original answer] by how many segments is each point used
I am not sure I got the problem correctly but looks like simple example of Histogram use ...
create counter array (one item per point)
set it to zero
process the last line incrementing each used point counter O(m)
write the answer by reading histogram O(n)
So the result should be O(m+n) something like (C++):
const int n=2,m=3;
const int p[n][2]={ {0,5},{7,10} };
const int s[m]={1,6,11};
int i,cnt[n];
for (i=0;i<n;i++) cnt[i]=0;
for (i=0;i<m;i++) if ((s[i]>=0)&&(s[i]<n)) cnt[s[i]]++;
for (i=0;i<n;i++) cout << cnt[i] << " "; // result: 0 1
But as you can see the p[] coordinates are never used so either I missed something in your problem description or you missing something or it is there just to trick solvers ...
[edit1] after clearing the inconsistencies in OP the result is a bit different
By how many points is each segment used:
create counter array (one item per segment)
set it to zero
process the last line incrementing each used point counter O(m)
write the answer by reading histogram O(m)
So the result is O(m) something like (C++):
const int n=2,m=3;
const int p[n][2]={ {0,5},{7,10} };
const int s[m]={1,6,11};
int i,cnt[m];
for (i=0;i<m;i++) cnt[i]=0;
for (i=0;i<m;i++) if ((s[i]>=0)&&(s[i]<n)) cnt[i]++;
for (i=0;i<m;i++) cout << cnt[i] << " "; // result: 1,0,0
[Notes]
After added new sample set to OP it is clear now that:
indexes starts from 0
the problem is how many points from table p[n] are really used by each segment (m numbers in output)
Use Binary Search.
Sort the line segments according to 1st value and the second value. If you use c++, you can use custom sort like this:
sort(a,a+n,fun); //a is your array of pair<int,int>, coordinates representing line
bool fun(pair<int,int> a, pair<int,int> b){
if(a.first<b.first)
return true;
if(a.first>b.first)
return false;
return a.second < b.second;
}
Then, for every point, find the 1st line that captures the point and the first line that does not (after the line that does of course). If no line captures the point, you can return -1 or something (and not check for the point that does not).
Something like:
int checkFirstHold(pair<int,int> a[], int p,int min, int max){ //p is the point
while(min < max){
int mid = (min + max)/2;
if(a[mid].first <= p && a[mid].second>=p && a[mid-1].first<p && a[mid-1].second<p) //ie, p is in line a[mid] and not in line a[mid-1]
return mid;
if(a[mid].first <= p && a[mid].second>=p && a[mid-1].first<=p && a[mid-1].second>=p) //ie, p is both in line a[mid] and not in line a[mid-1]
max = mid-1;
if(a[mid].first < p && a[mid].second<p ) //ie, p is not in line a[mid]
min = mid + 1;
}
return -1; //implying no point holds the line
}
Similarly, write a checkLastHold function.
Then, find checkLastHold - checkFirstHold for every point, which is the answer.
The complexity of this solution will be O(n log m), as it takes (log m) for every calculation.
Here is my counter-based solution in Java.
Note that all points, segment start and segment end are read into one array.
If points of different PointType have the same x-coordinate, then the point is sorted after segment start and before segment end. This is done to count the point as "in" the segment if it coincides with both the segment start (counter already increased) and the segment end (counter not yet decreased).
For storing an answer in the same order as the points from the input, I create the array result of size pointsCount (only points counted, not the segments) and set its element with index SuperPoint.index, which stores the position of the point in the original input.
import java.util.Arrays;
import java.util.Scanner;
public final class PointsAndSegmentsSolution {
enum PointType { // in order of sort, so that the point will be counted on both segment start and end coordinates
SEGMENT_START,
POINT,
SEGMENT_END,
}
static class SuperPoint {
final PointType type;
final int x;
final int index; // -1 (actually does not matter) for segments, index for points
public SuperPoint(final PointType type, final int x) {
this(type, x, -1);
}
public SuperPoint(final PointType type, final int x, final int index) {
this.type = type;
this.x = x;
this.index = index;
}
}
private static int[] countSegments(final SuperPoint[] allPoints, final int pointsCount) {
Arrays.sort(allPoints, (o1, o2) -> {
if (o1.x < o2.x)
return -1;
if (o1.x > o2.x)
return 1;
return Integer.compare( o1.type.ordinal(), o2.type.ordinal() ); // points with the same X coordinate by order in PointType enum
});
final int[] result = new int[pointsCount];
int counter = 0;
for (final SuperPoint superPoint : allPoints) {
switch (superPoint.type) {
case SEGMENT_START:
counter++;
break;
case SEGMENT_END:
counter--;
break;
case POINT:
result[superPoint.index] = counter;
break;
default:
throw new IllegalArgumentException( String.format("Unknown SuperPoint type: %s", superPoint.type) );
}
}
return result;
}
public static void main(final String[] args) {
final Scanner scanner = new Scanner(System.in);
final int segmentsCount = scanner.nextInt();
final int pointsCount = scanner.nextInt();
final SuperPoint[] allPoints = new SuperPoint[(segmentsCount * 2) + pointsCount];
int allPointsIndex = 0;
for (int i = 0; i < segmentsCount; i++) {
final int start = scanner.nextInt();
final int end = scanner.nextInt();
allPoints[allPointsIndex] = new SuperPoint(PointType.SEGMENT_START, start);
allPointsIndex++;
allPoints[allPointsIndex] = new SuperPoint(PointType.SEGMENT_END, end);
allPointsIndex++;
}
for (int i = 0; i < pointsCount; i++) {
final int x = scanner.nextInt();
allPoints[allPointsIndex] = new SuperPoint(PointType.POINT, x, i);
allPointsIndex++;
}
final int[] pointsSegmentsCounts = countSegments(allPoints, pointsCount);
for (final int count : pointsSegmentsCounts) {
System.out.print(count + " ");
}
}
}

Check if binary string can be partitioned such that each partition is a power of 5

I recently came across this question - Given a binary string, check if we can partition/split the string into 0..n parts such that each part is a power of 5. Return the minimum number of splits, if it can be done.
Examples would be:
input = "101101" - returns 1, as the string can be split once to form "101" and "101",as 101= 5^1.
input = "1111101" - returns 0, as the string itself is 5^3.
input = "100"- returns -1, as it can't be split into power(s) of 5.
I came up with this recursive algorithm:
Check if the string itself is a power of 5. if yes, return 0
Else, iterate over the string character by character, checking at every point if the number seen so far is a power of 5. If yes, add 1 to split count and check the rest of the string recursively for powers of 5 starting from step 1.
return the minimum number of splits seen so far.
I implemented the above algo in Java. I believe it works alright, but it's a straightforward recursive solution. Can this be solved using dynamic programming to improve the run time?
The code is below:
public int partition(String inp){
if(inp==null || inp.length()==0)
return 0;
return partition(inp,inp.length(),0);
}
public int partition(String inp,int len,int index){
if(len==index)
return 0;
if(isPowerOfFive(inp,index))
return 0;
long sub=0;
int count = Integer.MAX_VALUE;
for(int i=index;i<len;++i){
sub = sub*2 +(inp.charAt(i)-'0');
if(isPowerOfFive(sub))
count = Math.min(count,1+partition(inp,len,i+1));
}
return count;
}
Helper functions:
public boolean isPowerOfFive(String inp,int index){
long sub = 0;
for(int i=index;i<inp.length();++i){
sub = sub*2 +(inp.charAt(i)-'0');
}
return isPowerOfFive(sub);
}
public boolean isPowerOfFive(long val){
if(val==0)
return true;
if(val==1)
return false;
while(val>1){
if(val%5 != 0)
return false;
val = val/5;
}
return true;
}
Here is simple improvements that can be done:
Calculate all powers of 5 before start, so you could do checks faster.
Stop split input string if the number of splits is already greater than in the best split you've already done.
Here is my solution using these ideas:
public static List<String> powers = new ArrayList<String>();
public static int bestSplit = Integer.MAX_VALUE;
public static void main(String[] args) throws Exception {
// input string (5^5, 5^1, 5^10)
String inp = "110000110101101100101010000001011111001";
// calc all powers of 5 that fits in given string
for (int pow = 1; ; ++pow) {
String powStr = Long.toBinaryString((long) Math.pow(5, pow));
if (powStr.length() <= inp.length()) { // can be fit in input string
powers.add(powStr);
} else {
break;
}
}
Collections.reverse(powers); // simple heuristics, sort powers in decreasing order
// do simple recursive split
split(inp, 0, -1);
// print result
if (bestSplit == Integer.MAX_VALUE) {
System.out.println(-1);
} else {
System.out.println(bestSplit);
}
}
public static void split(String inp, int start, int depth) {
if (depth >= bestSplit) {
return; // can't do better split
}
if (start == inp.length()) { // perfect split
bestSplit = depth;
return;
}
for (String pow : powers) {
if (inp.startsWith(pow, start)) {
split(inp, start + pow.length(), depth + 1);
}
}
}
EDIT:
I also found another approach which looks like very fast one.
Calculate all powers of 5 whose string representation is shorter than input string. Save those strings in powers array.
For every string power from powers array: if power is substring of input then save its start and end indexes into the edges array (array of tuples).
Now we just need to find shortest path from index 0 to index input.length() by edges from the edges array. Every edge has the same weight, so the shortest path can be found very fast with BFS.
The number of edges in the shortest path found is exactly what you need -- minimum number of splits of the input string.
Instead of calculating all possible substrings, you can check the binary representation of the powers of 5 in search of a common pattern. Using something like:
bc <<< "obase=2; for(i = 1; i < 40; i++) 5^i"
You get:
51 = 1012
52 = 110012
53 = 11111012
54 = 10011100012
55 = 1100001101012
56 = 111101000010012
57 = 100110001001011012
58 = 10111110101111000012
59 = 1110111001101011001012
510 = 1001010100000010111110012
511 = 101110100100001110110111012
512 = 11101000110101001010010100012
513 = 10010001100001001110011100101012
514 = 1011010111100110001000001111010012
515 = 111000110101111110101001001100011012
516 = 100011100001101111001001101111110000012
517 = 10110001101000101011110000101110110001012
518 = 1101111000001011011010110011101001110110012
...
529 = 101000011000111100000111110101110011011010111001000010111110010101012
As you can see, odd powers of 5 always ends with 101 and even powers of 5 ends with the pattern 10+1 (where + means one or more occurrences).
You could put your input string in a trie and then iterate over it identifying the 10+1 pattern, once you have a match, evaluate it to check if is not a false positive.
You just have to save the value for a given string in a map. For example having if you have a string ending like this: (each letter may be a string of arbitrary size)
ABCD
You find that part A mod 5 is ok, so you try again for BCD, but find that B mod 5 is also ok, same for C and D as well as CD together. Now you should have the following results cached:
C -> 0
D -> 0
CD -> 0
BCD -> 1 # split B/CD is the best
But you're not finished with ABCD - you find that AB mod 5 is ok, so you check the resulting CD - it's already in the cache and you don't have to process it from the beginning.
In practice you just need to cache answers from partition() - either for the actual string or for the (string, start, length) tuple. Which one is better depends on how many repeating sequences you have and whether it's faster to compare the contents, or just indexes.
Given below is a solution in C++. Using dynamic programming I am considering all the possible splits and saving the best results.
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int isPowerOfFive(ll n)
{
if(n == 0) return 0;
ll temp = (ll)(log(n)/log(5));
ll t = round(pow(5,temp));
if(t == n)
{
return 1;
}
else
{
return 0;
}
}
ll solve(string s)
{
vector<ll> dp(s.length()+1);
for(int i = 1; i <= s.length(); i++)
{
dp[i] = INT_MAX;
for(int j = 1; j <= i; j++)
{
if( s[j-1] == '0')
{
continue;
}
ll num = stoll(s.substr(j-1, i-j+1), nullptr, 2);
if(isPowerOfFive(num))
{
dp[i] = min(dp[i], dp[j-1]+1);
}
}
}
if(dp[s.length()] == INT_MAX)
{
return -1;
}
else
{
return dp[s.length()];
}
}
int main()
{
string s;
cin>>s;
cout<<solve(s);
}

Finding union length of many line segments

I have few bolded line segments on x-axis in form of their beginning and ending x-coordinates. Some line segments may be overlapping. How to find the union length of all the line segments.
Example, a line segment is 5,0 to 8,0 and other is 9,0 to 12,0. Both are non overlapping, so sum of length is 3 + 3 = 6.
a line segment is 5,0 to 8,0 and other is 7,0 to 12,0. But they are overlapping for range, 7,0 to 8,0. So union of length is 7.
But the x- coordinates may be floating points.
Represent a line segment as 2 EndPoint object. Each EndPoint object has the form <coordinate, isStartEndPoint>. Put all EndPoint objects of all the line segments together in a list endPointList.
The algorithm:
Sort endPointList, first by coordinate in ascending order, then place the start end points in front of the tail end points (regardless of which segment, since it doesn't matter - all at the same coordinate).
Loop through the sorted list according to this pseudocode:
prevCoordinate = -Inf
numSegment = 0
unionLength = 0
for (endPoint in endPointList):
if (numSegment > 0):
unionLength += endPoint.coordinate - prevCoordinate
prevCoordinate = endPoint.coordinate
if (endPoint.isStartCoordinate):
numSegment = numSegment + 1
else:
numSegment = numSegment - 1
The numSegment variable will tell whether we are in a segment or not. When it is larger than 0, we are inside some segment, so we can include the distance to the previous end point. If it is 0, it means that the part before the current end point doesn't contain any segment.
The complexity is dominated by the sorting part, since comparison-based sorting algorithm has lower bound of Omega(n log n), while the loop is clearly O(n) at best. So the complexity of the algorithm can be said to be O(n log n) if you choose an O(n log n) comparison-based sorting algorithm.
Use a range tree. A range tree is n log(n), just like the sorted begin/end points, but it has the additional advantage that overlapping ranges will reduce the number of elements (but maybe increase the cost of insertion) Snippet (untested)
struct segment {
struct segment *ll, *rr;
float lo, hi;
};
struct segment * newsegment(float lo, float hi) {
struct segment * ret;
ret = malloc (sizeof *ret);
ret->lo = lo; ret->hi = hi;
ret->ll= ret->rr = NULL;
return ret;
}
struct segment * insert_range(struct segment *root, float lo, float hi)
{
if (!root) return newsegment(lo, hi);
/* non-overlapping(or touching) ranges can be put into the {l,r} subtrees} */
if (hi < root->lo) {
root->ll = insert_range(root->ll, lo, hi);
return root;
}
if (lo > root->hi) {
root->rr = insert_range(root->rr, lo, hi);
return root;
}
/* when we get here, we must have overlap; we can extend the current node
** we also need to check if the broader range overlaps the child nodes
*/
if (lo < root->lo ) {
root->lo = lo;
while (root->ll && root->ll->hi >= root->lo) {
struct segment *tmp;
tmp = root->ll;
root->lo = tmp->lo;
root->ll = tmp->ll;
tmp->ll = NULL;
// freetree(tmp);
}
}
if (hi > root->hi ) {
root->hi = hi;
while (root->rr && root->rr->lo <= root->hi) {
struct segment *tmp;
tmp = root->rr;
root->hi = tmp->hi;
root->rr = tmp->rr;
tmp->rr = NULL;
// freetree(tmp);
}
}
return root;
}
float total_width(struct segment *ptr)
{
float ret;
if (!ptr) return 0.0;
ret = ptr->hi - ptr->lo;
ret += total_width(ptr->ll);
ret += total_width(ptr->rr);
return ret;
}
Here is a solution I just wrote in Haskell and below it is an example of how it can be implemented in the interpreter command prompt. The segments must be presented in the form of a list of tuples [(a,a)]. I hope you can get a sense of the algorithm from the code.
import Data.List
unionSegments segments =
let (x:xs) = sort segments
one_segment = snd x - fst x
in if xs /= []
then if snd x > fst (head xs)
then one_segment - (snd x - fst (head xs)) + unionSegments xs
else one_segment + unionSegments xs
else one_segment
*Main> :load "unionSegments.hs"
[1 of 1] Compiling Main ( unionSegments.hs, interpreted )
Ok, modules loaded: Main.
*Main> unionSegments [(5,8), (7,12)]
7
Java implementation
import java.util.*;
public class HelloWorld{
static void unionLength(int a[][],int sets)
{
TreeMap<Integer,Boolean> t=new TreeMap<>();
for(int i=0;i<sets;i++)
{
t.put(a[i][0],false);
t.put(a[i][1],true);
}
int count=0;
int res=0;
int one=1;
Set set = t.entrySet();
Iterator it = set.iterator();
int prev=0;
while(it.hasNext()) {
if(one==1){
Map.Entry me = (Map.Entry)it.next();
one=0;
prev=(int)me.getKey();
if((boolean)me.getValue()==false)
count++;
else
count--;
}
Map.Entry me = (Map.Entry)it.next();
if(count>0)
res=res+((int)me.getKey()-prev);
if((boolean)me.getValue()==false)
count++;
else
count--;
prev=(int)me.getKey();
}
System.out.println(res);
}
public static void main(String []args){
int a[][]={{0, 4}, {3, 6},{8,10}};
int b[][]={{5, 10}, {8, 12}};
unionLength(a,3);
unionLength(b,2);
}
}

Dynamic programming: Algorithm to solve the following?

I have recently completed the following interview exercise:
'A robot can be programmed to run "a", "b", "c"... "n" kilometers and it takes ta, tb, tc... tn minutes, respectively. Once it runs to programmed kilometers, it must be turned off for "m" minutes.
After "m" minutes it can again be programmed to run for a further "a", "b", "c"... "n" kilometers.
How would you program this robot to go an exact number of kilometers in the minimum amount of time?'
I thought it was a variation of the unbounded knapsack problem, in which the size would be the number of kilometers and the value, the time needed to complete each stretch. The main difference is that we need to minimise, rather than maximise, the value. So I used the equivalent of the following solution: http://en.wikipedia.org/wiki/Knapsack_problem#Unbounded_knapsack_problem
in which I select the minimum.
Finally, because we need an exact solution (if there is one), over the map constructed by the algorithm for all the different distances, I iterated through each and trough each robot's programmed distance to find the exact distance and minimum time among those.
I think the pause the robot takes between runs is a bit of a red herring and you just need to include it in your calculations, but it does not affect the approach taken.
I am probably wrong, because I failed the test. I don't have any other feedback as to the expected solution.
Edit: maybe I wasn't wrong after all and I failed for different reasons. I just wanted to validate my approach to this problem.
import static com.google.common.collect.Sets.*;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.log4j.Logger;
import com.google.common.base.Objects;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
public final class Robot {
static final Logger logger = Logger.getLogger (Robot.class);
private Set<ProgrammedRun> programmedRuns;
private int pause;
private int totalDistance;
private Robot () {
//don't expose default constructor & prevent subclassing
}
private Robot (int[] programmedDistances, int[] timesPerDistance, int pause, int totalDistance) {
this.programmedRuns = newHashSet ();
for (int i = 0; i < programmedDistances.length; i++) {
this.programmedRuns.add (new ProgrammedRun (programmedDistances [i], timesPerDistance [i] ) );
}
this.pause = pause;
this.totalDistance = totalDistance;
}
public static Robot create (int[] programmedDistances, int[] timesPerDistance, int pause, int totalDistance) {
Preconditions.checkArgument (programmedDistances.length == timesPerDistance.length);
Preconditions.checkArgument (pause >= 0);
Preconditions.checkArgument (totalDistance >= 0);
return new Robot (programmedDistances, timesPerDistance, pause, totalDistance);
}
/**
* #returns null if no strategy was found. An empty map if distance is zero. A
* map with the programmed runs as keys and number of time they need to be run
* as value.
*
*/
Map<ProgrammedRun, Integer> calculateOptimalStrategy () {
//for efficiency, consider this case first
if (this.totalDistance == 0) {
return Maps.newHashMap ();
}
//list of solutions for different distances. Element "i" of the list is the best set of runs that cover at least "i" kilometers
List <Map<ProgrammedRun, Integer>> runsForDistances = Lists.newArrayList();
//special case i = 0 -> empty map (no runs needed)
runsForDistances.add (new HashMap<ProgrammedRun, Integer> () );
for (int i = 1; i <= totalDistance; i++) {
Map<ProgrammedRun, Integer> map = new HashMap<ProgrammedRun, Integer> ();
int minimumTime = -1;
for (ProgrammedRun pr : programmedRuns) {
int distance = Math.max (0, i - pr.getDistance ());
int time = getTotalTime (runsForDistances.get (distance) ) + pause + pr.getTime();
if (minimumTime < 0 || time < minimumTime) {
minimumTime = time;
//new minimum found
map = new HashMap<ProgrammedRun, Integer> ();
map.putAll(runsForDistances.get (distance) );
//increase count
Integer num = map.get (pr);
if (num == null) num = Integer.valueOf (1);
else num++;
//update map
map.put (pr, num);
}
}
runsForDistances.add (map );
}
//last step: calculate the combination with exact distance
int minimumTime2 = -1;
int bestIndex = -1;
for (int i = 0; i <= totalDistance; i++) {
if (getTotalDistance (runsForDistances.get (i) ) == this.totalDistance ) {
int time = getTotalTime (runsForDistances.get (i) );
if (time > 0) time -= pause;
if (minimumTime2 < 0 || time < minimumTime2 ) {
minimumTime2 = time;
bestIndex = i;
}
}
}
//if solution found
if (bestIndex != -1) {
return runsForDistances.get (bestIndex);
}
//try all combinations, since none of the existing maps run for the exact distance
List <Map<ProgrammedRun, Integer>> exactRuns = Lists.newArrayList();
for (int i = 0; i <= totalDistance; i++) {
int distance = getTotalDistance (runsForDistances.get (i) );
for (ProgrammedRun pr : programmedRuns) {
//solution found
if (distance + pr.getDistance() == this.totalDistance ) {
Map<ProgrammedRun, Integer> map = new HashMap<ProgrammedRun, Integer> ();
map.putAll (runsForDistances.get (i));
//increase count
Integer num = map.get (pr);
if (num == null) num = Integer.valueOf (1);
else num++;
//update map
map.put (pr, num);
exactRuns.add (map);
}
}
}
if (exactRuns.isEmpty()) return null;
//finally return the map with the best time
minimumTime2 = -1;
Map<ProgrammedRun, Integer> bestMap = null;
for (Map<ProgrammedRun, Integer> m : exactRuns) {
int time = getTotalTime (m);
if (time > 0) time -= pause; //remove last pause
if (minimumTime2 < 0 || time < minimumTime2 ) {
minimumTime2 = time;
bestMap = m;
}
}
return bestMap;
}
private int getTotalTime (Map<ProgrammedRun, Integer> runs) {
int time = 0;
for (Map.Entry<ProgrammedRun, Integer> runEntry : runs.entrySet()) {
time += runEntry.getValue () * runEntry.getKey().getTime ();
//add pauses
time += this.pause * runEntry.getValue ();
}
return time;
}
private int getTotalDistance (Map<ProgrammedRun, Integer> runs) {
int distance = 0;
for (Map.Entry<ProgrammedRun, Integer> runEntry : runs.entrySet()) {
distance += runEntry.getValue() * runEntry.getKey().getDistance ();
}
return distance;
}
class ProgrammedRun {
private int distance;
private int time;
private transient float speed;
ProgrammedRun (int distance, int time) {
this.distance = distance;
this.time = time;
this.speed = (float) distance / time;
}
#Override public String toString () {
return "(distance =" + distance + "; time=" + time + ")";
}
#Override public boolean equals (Object other) {
return other instanceof ProgrammedRun
&& this.distance == ((ProgrammedRun)other).distance
&& this.time == ((ProgrammedRun)other).time;
}
#Override public int hashCode () {
return Objects.hashCode (Integer.valueOf (this.distance), Integer.valueOf (this.time));
}
int getDistance() {
return distance;
}
int getTime() {
return time;
}
float getSpeed() {
return speed;
}
}
}
public class Main {
/* Input variables for the robot */
private static int [] programmedDistances = {1, 2, 3, 5, 10}; //in kilometers
private static int [] timesPerDistance = {10, 5, 3, 2, 1}; //in minutes
private static int pause = 2; //in minutes
private static int totalDistance = 41; //in kilometers
/**
* #param args
*/
public static void main(String[] args) {
Robot r = Robot.create (programmedDistances, timesPerDistance, pause, totalDistance);
Map<ProgrammedRun, Integer> strategy = r.calculateOptimalStrategy ();
if (strategy == null) {
System.out.println ("No strategy that matches the conditions was found");
} else if (strategy.isEmpty ()) {
System.out.println ("No need to run; distance is zero");
} else {
System.out.println ("Strategy found:");
System.out.println (strategy);
}
}
}
Simplifying slightly, let ti be the time (including downtime) that it takes the robot to run distance di. Assume that t1/d1 ≤ … ≤ tn/dn. If t1/d1 is significantly smaller than t2/d2 and d1 and the total distance D to be run are large, then branch and bound likely outperforms dynamic programming. Branch and bound solves the integer programming formulation
minimize ∑i ti xi
subject to
∑i di xi = D
∀i xi &in; N
by using the value of the relaxation where xi can be any nonnegative real as a guide. The latter is easily verified to be at most (t1/d1)D, by setting x1 to D/d1 and ∀i ≠ 1 xi = 0, and at least (t1/d1)D, by setting the sole variable of the dual program to t1/d1. Solving the relaxation is the bound step; every integer solution is a fractional solution, so the best integer solution requires time at least (t1/d1)D.
The branch step takes one integer program and splits it in two whose solutions, taken together, cover the entire solution space of the original. In this case, one piece could have the extra constraint x1 = 0 and the other could have the extra constraint x1 ≥ 1. It might look as though this would create subproblems with side constraints, but in fact, we can just delete the first move, or decrease D by d1 and add the constant t1 to the objective. Another option for branching is to add either the constraint xi = ⌊D/di⌋ or xi ≤ ⌊D/di⌋ - 1, which requires generalizing to upper bounds on the number of repetitions of each move.
The main loop of branch and bound selects one of a collection of subproblems, branches, computes bounds for the two subproblems, and puts them back into the collection. The efficiency over brute force comes from the fact that, when we have a solution with a particular value, every subproblem whose relaxed value is at least that much can be thrown away. Once the collection is emptied this way, we have the optimal solution.
Hybrids of branch and bound and dynamic programming are possible, for example, computing optimal solutions for small D via DP and using those values instead of branching on subproblems that have been solved.
Create array of size m and for 0 to m( m is your distance) do:
a[i] = infinite;
a[0] = 0;
a[i] = min{min{a[i-j] + tj + m for all j in possible kilometers of robot. and j≠i} , ti if i is in possible moves of robot}
a[m] is lowest possible value. Also you can have array like b to save a[i]s selection. Also if a[m] == infinite means it's not possible.
Edit: we can solve it in another way by creating a digraph, again our graph is dependent to m length of path, graph has nodes labeled {0..m}, now start from node 0 connect it to all possible nodes; means if you have a kilometer i you can connect 0 and vi with weight ti, except for node 0->x, for all other nodes you should connect node i->j with weight tj-i + m for j>i and j-i is available in input kilometers. now you should find shortest path from v0 to vn. but this algorithm still is O(nm).
Let G be the desired distance run.
Let n be the longest possible distance run without pause.
Let L = G / n (Integer arithmetic, discard fraction part)
Let R = G mod n (ie. The remainder from the above division)
Make the robot run it's longest distance (ie. n) L times, and then whichever distance (a, b, c, etc.) is greater than R by the least amount (ie the smallest available distance that is equal to or greater than R)
Either I understood the problem wrong, or you're all over thinking it
I am a big believer in showing instead of telling. Here is a program that may be doing what you are looking for. Let me know if it satisfies your question. Simply copy, paste, and run the program. You should of course test with your own data set.
import java.util.Arrays;
public class Speed {
/***
*
* #param distance
* #param sprints ={{A,Ta},{B,Tb},{C,Tc}, ..., {N,Tn}}
*/
public static int getFastestTime(int distance, int[][] sprints){
long[] minTime = new long[distance+1];//distance from 0 to distance
Arrays.fill(minTime,Integer.MAX_VALUE);
minTime[0]=0;//key=distance; value=time
for(int[] speed: sprints)
for(int d=1; d<minTime.length; d++)
if(d>=speed[0] && minTime[d] > minTime[d-speed[0]]+speed[1])
minTime[d]=minTime[d-speed[0]]+speed[1];
return (int)minTime[distance];
}//
public static void main(String... args){
//sprints ={{A,Ta},{B,Tb},{C,Tc}, ..., {N,Tn}}
int[][] sprints={{3,2},{5,3},{7,5}};
int distance = 21;
System.out.println(getFastestTime(distance,sprints));
}
}

Support Resistance Algorithm - Technical analysis [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 months ago.
The community reviewed whether to reopen this question last month and left it closed:
Original close reason(s) were not resolved
Improve this question
I have an intra-day chart and I am trying to figure out how to calculate
support and resistance levels, anyone knows an algorithm for doing that, or a good starting point?
Yes, a very simple algorithm is to choose a timeframe, say 100 bars, then look for local turning points, or Maxima and Minima. Maxima and Minima can be computed from a smoothed closing price by using the 1st and second derivative (dy/dx and d^2y/dx). Where dy/dx = zero and d^y/dx is positive, you have a minima, when dy/dx = zero and d^2y/dx is negative, you have a maxima.
In practical terms this could be computed by iterating over your smoothed closing price series and looking at three adjacent points. If the points are lower/higher/lower in relative terms then you have a maxima, else higher/lower/higher you have a minima. You may wish to fine-tune this detection method to look at more points (say 5, 7) and only trigger if the edge points are a certain % away from the centre point. this is similar to the algorithm that the ZigZag indicator uses.
Once you have local maxima and minima, you then want to look for clusters of turning points within a certain distance of each other in the Y-Direction. this is simple. Take the list of N turning points and compute the Y-distance between it and each of the other discovered turning points. If the distance is less than a fixed constant then you have found two "close" turning points, indicating possible support/resistance.
You could then rank your S/R lines, so two turning points at $20 is less important than three turning points at $20 for instance.
An extension to this would be to compute trendlines. With the list of turning points discovered now take each point in turn and select two other points, trying to fit a straight line equation. If the equation is solvable within a certain error margin, you have a sloping trendline. If not, discard and move on to the next triplet of points.
The reason why you need three at a time to compute trendlines is any two points can be used in the straight line equation. Another way to compute trendlines would be to compute the straight line equation of all pairs of turning points, then see if a third point (or more than one) lies on the same straight line within a margin of error. If 1 or more other points does lie on this line, bingo you have calculated a Support/Resistance trendline.
No code examples sorry, I'm just giving you some ideas on how it could be done. In summary:
Inputs to the system
Lookback period L (number of bars)
Closing prices for L bars
Smoothing factor (to smooth closing price)
Error Margin or Delta (minimum distance between turning points to constitute a match)
Outputs
List of turning points, call them tPoints[] (x,y)
List of potential trendlines, each with the line equation (y = mx + c)
EDIT: Update
I recently learned a very simple indicator called a Donchian Channel, which basically plots a channel of the highest high in 20 bars, and lowest low. It can be used to plot an approximate support resistance level. But the above - Donchian Channel with turning points is cooler ^_^
I am using a much less complex algorithm in my algorithmic trading system.
Following steps are one side of the algorithm and are used for calculating support levels. Please read notes below the algorithm to understand how to calculate resistance levels.
Algorithm
Break timeseries into segments of size N (Say, N = 5)
Identify minimum values of each segment, you will have an array of minimum values from all segments = :arrayOfMin
Find minimum of (:arrayOfMin) = :minValue
See if any of the remaining values fall within range (X% of :minValue) (Say, X = 1.3%)
Make a separate array (:supportArr)
add values within range & remove these values from :arrayOfMin
also add :minValue from step 3
Calculating support (or resistance)
Take a mean of this array = support_level
If support is tested many times, then it is considered strong.
strength_of_support = supportArr.length
level_type (SUPPORT|RESISTANCE) = Now, if current price is below support then support changes role and becomes resistance
Repeat steps 3 to 7 until :arrayOfMin is empty
You will have all support/resistance values with a strength. Now smoothen these values, if any support levels are too close then eliminate one of them.
These support/resistance were calculated considering support levels search. You need perform steps 2 to 9 considering resistance levels search. Please see notes and implementation.
Notes:
Adjust the values of N & X to get more accurate results.
Example, for less volatile stocks or equity indexes use (N = 10, X = 1.2%)
For high volatile stocks use (N = 22, X = 1.5%)
For resistance, the procedure is exactly opposite (use maximum function instead of minimum)
This algorithm was purposely kept simple to avoid complexity, it can be improved to give better results.
Here's my implementation:
public interface ISupportResistanceCalculator {
/**
* Identifies support / resistance levels.
*
* #param timeseries
* timeseries
* #param beginIndex
* starting point (inclusive)
* #param endIndex
* ending point (exclusive)
* #param segmentSize
* number of elements per internal segment
* #param rangePct
* range % (Example: 1.5%)
* #return A tuple with the list of support levels and a list of resistance
* levels
*/
Tuple<List<Level>, List<Level>> identify(List<Float> timeseries,
int beginIndex, int endIndex, int segmentSize, float rangePct);
}
Main calculator class
/**
*
*/
package com.perseus.analysis.calculator.technical.trend;
import static com.perseus.analysis.constant.LevelType.RESISTANCE;
import static com.perseus.analysis.constant.LevelType.SUPPORT;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import com.google.common.collect.Lists;
import com.perseus.analysis.calculator.mean.IMeanCalculator;
import com.perseus.analysis.calculator.timeseries.ITimeSeriesCalculator;
import com.perseus.analysis.constant.LevelType;
import com.perseus.analysis.model.Tuple;
import com.perseus.analysis.model.technical.Level;
import com.perseus.analysis.model.timeseries.ITimeseries;
import com.perseus.analysis.util.CollectionUtils;
/**
* A support and resistance calculator.
*
* #author PRITESH
*
*/
public class SupportResistanceCalculator implements
ISupportResistanceCalculator {
static interface LevelHelper {
Float aggregate(List<Float> data);
LevelType type(float level, float priceAsOfDate, final float rangePct);
boolean withinRange(Float node, float rangePct, Float val);
}
static class Support implements LevelHelper {
#Override
public Float aggregate(final List<Float> data) {
return Collections.min(data);
}
#Override
public LevelType type(final float level, final float priceAsOfDate,
final float rangePct) {
final float threshold = level * (1 - (rangePct / 100));
return (priceAsOfDate < threshold) ? RESISTANCE : SUPPORT;
}
#Override
public boolean withinRange(final Float node, final float rangePct,
final Float val) {
final float threshold = node * (1 + (rangePct / 100f));
if (val < threshold)
return true;
return false;
}
}
static class Resistance implements LevelHelper {
#Override
public Float aggregate(final List<Float> data) {
return Collections.max(data);
}
#Override
public LevelType type(final float level, final float priceAsOfDate,
final float rangePct) {
final float threshold = level * (1 + (rangePct / 100));
return (priceAsOfDate > threshold) ? SUPPORT : RESISTANCE;
}
#Override
public boolean withinRange(final Float node, final float rangePct,
final Float val) {
final float threshold = node * (1 - (rangePct / 100f));
if (val > threshold)
return true;
return false;
}
}
private static final int SMOOTHEN_COUNT = 2;
private static final LevelHelper SUPPORT_HELPER = new Support();
private static final LevelHelper RESISTANCE_HELPER = new Resistance();
private final ITimeSeriesCalculator tsCalc;
private final IMeanCalculator meanCalc;
public SupportResistanceCalculator(final ITimeSeriesCalculator tsCalc,
final IMeanCalculator meanCalc) {
super();
this.tsCalc = tsCalc;
this.meanCalc = meanCalc;
}
#Override
public Tuple<List<Level>, List<Level>> identify(
final List<Float> timeseries, final int beginIndex,
final int endIndex, final int segmentSize, final float rangePct) {
final List<Float> series = this.seriesToWorkWith(timeseries,
beginIndex, endIndex);
// Split the timeseries into chunks
final List<List<Float>> segments = this.splitList(series, segmentSize);
final Float priceAsOfDate = series.get(series.size() - 1);
final List<Level> levels = Lists.newArrayList();
this.identifyLevel(levels, segments, rangePct, priceAsOfDate,
SUPPORT_HELPER);
this.identifyLevel(levels, segments, rangePct, priceAsOfDate,
RESISTANCE_HELPER);
final List<Level> support = Lists.newArrayList();
final List<Level> resistance = Lists.newArrayList();
this.separateLevels(support, resistance, levels);
// Smoothen the levels
this.smoothen(support, resistance, rangePct);
return new Tuple<>(support, resistance);
}
private void identifyLevel(final List<Level> levels,
final List<List<Float>> segments, final float rangePct,
final float priceAsOfDate, final LevelHelper helper) {
final List<Float> aggregateVals = Lists.newArrayList();
// Find min/max of each segment
for (final List<Float> segment : segments) {
aggregateVals.add(helper.aggregate(segment));
}
while (!aggregateVals.isEmpty()) {
final List<Float> withinRange = new ArrayList<>();
final Set<Integer> withinRangeIdx = new TreeSet<>();
// Support/resistance level node
final Float node = helper.aggregate(aggregateVals);
// Find elements within range
for (int i = 0; i < aggregateVals.size(); ++i) {
final Float f = aggregateVals.get(i);
if (helper.withinRange(node, rangePct, f)) {
withinRangeIdx.add(i);
withinRange.add(f);
}
}
// Remove elements within range
CollectionUtils.remove(aggregateVals, withinRangeIdx);
// Take an average
final float level = this.meanCalc.mean(
withinRange.toArray(new Float[] {}), 0, withinRange.size());
final float strength = withinRange.size();
levels.add(new Level(helper.type(level, priceAsOfDate, rangePct),
level, strength));
}
}
private List<List<Float>> splitList(final List<Float> series,
final int segmentSize) {
final List<List<Float>> splitList = CollectionUtils
.convertToNewLists(CollectionUtils.splitList(series,
segmentSize));
if (splitList.size() > 1) {
// If last segment it too small
final int lastIdx = splitList.size() - 1;
final List<Float> last = splitList.get(lastIdx);
if (last.size() <= (segmentSize / 1.5f)) {
// Remove last segment
splitList.remove(lastIdx);
// Move all elements from removed last segment to new last
// segment
splitList.get(lastIdx - 1).addAll(last);
}
}
return splitList;
}
private void separateLevels(final List<Level> support,
final List<Level> resistance, final List<Level> levels) {
for (final Level level : levels) {
if (level.getType() == SUPPORT) {
support.add(level);
} else {
resistance.add(level);
}
}
}
private void smoothen(final List<Level> support,
final List<Level> resistance, final float rangePct) {
for (int i = 0; i < SMOOTHEN_COUNT; ++i) {
this.smoothen(support, rangePct);
this.smoothen(resistance, rangePct);
}
}
/**
* Removes one of the adjacent levels which are close to each other.
*/
private void smoothen(final List<Level> levels, final float rangePct) {
if (levels.size() < 2)
return;
final List<Integer> removeIdx = Lists.newArrayList();
Collections.sort(levels);
for (int i = 0; i < (levels.size() - 1); i++) {
final Level currentLevel = levels.get(i);
final Level nextLevel = levels.get(i + 1);
final Float current = currentLevel.getLevel();
final Float next = nextLevel.getLevel();
final float difference = Math.abs(next - current);
final float threshold = (current * rangePct) / 100;
if (difference < threshold) {
final int remove = currentLevel.getStrength() >= nextLevel
.getStrength() ? i : i + 1;
removeIdx.add(remove);
i++; // start with next pair
}
}
CollectionUtils.remove(levels, removeIdx);
}
private List<Float> seriesToWorkWith(final List<Float> timeseries,
final int beginIndex, final int endIndex) {
if ((beginIndex == 0) && (endIndex == timeseries.size()))
return timeseries;
return timeseries.subList(beginIndex, endIndex);
}
}
Here are some supporting classes:
public enum LevelType {
SUPPORT, RESISTANCE
}
public class Tuple<A, B> {
private final A a;
private final B b;
public Tuple(final A a, final B b) {
super();
this.a = a;
this.b = b;
}
public final A getA() {
return this.a;
}
public final B getB() {
return this.b;
}
#Override
public String toString() {
return "Tuple [a=" + this.a + ", b=" + this.b + "]";
};
}
public abstract class CollectionUtils {
/**
* Removes items from the list based on their indexes.
*
* #param list
* list
* #param indexes
* indexes this collection must be sorted in ascending order
*/
public static <T> void remove(final List<T> list,
final Collection<Integer> indexes) {
int i = 0;
for (final int idx : indexes) {
list.remove(idx - i++);
}
}
/**
* Splits the given list in segments of the specified size.
*
* #param list
* list
* #param segmentSize
* segment size
* #return segments
*/
public static <T> List<List<T>> splitList(final List<T> list,
final int segmentSize) {
int from = 0, to = 0;
final List<List<T>> result = new ArrayList<>();
while (from < list.size()) {
to = from + segmentSize;
if (to > list.size()) {
to = list.size();
}
result.add(list.subList(from, to));
from = to;
}
return result;
}
}
/**
* This class represents a support / resistance level.
*
* #author PRITESH
*
*/
public class Level implements Serializable {
private static final long serialVersionUID = -7561265699198045328L;
private final LevelType type;
private final float level, strength;
public Level(final LevelType type, final float level) {
this(type, level, 0f);
}
public Level(final LevelType type, final float level, final float strength) {
super();
this.type = type;
this.level = level;
this.strength = strength;
}
public final LevelType getType() {
return this.type;
}
public final float getLevel() {
return this.level;
}
public final float getStrength() {
return this.strength;
}
#Override
public String toString() {
return "Level [type=" + this.type + ", level=" + this.level
+ ", strength=" + this.strength + "]";
}
}
I put together a package that implements support and resistance trendlines like what you're asking about. Here are a few examples of some examples:
import numpy as np
import pandas.io.data as pd
from matplotlib.pyplot import *
gentrends('fb', window = 1.0/3.0)
Output
That example just pulls the adjusted close prices, but if you have intraday data already loaded in you can also feed it raw data as a numpy array and it will implement the same algorithm on that data as it would if you just fed it a ticker symbol.
Not sure if this is exactly what you were looking for but hopefully this helps get you started. The code and some more explanation can be found on the GitHub page where I have it hosted: https://github.com/dysonance/Trendy
I have figured out another way of calculating Support/Resistance dynamically.
Steps:
Create a list of important price - The high and low of each candle in your range is important. Each of this prices is basically a probable SR(Support / Resistance).
Give each price a score.
Sort the prices by score and remove the ones close to each other(at a distance of x% from each other).
Print the top N prices and having a mimimum score of Y. These are your Support Resistances. It worked very well for me in ~300 different stocks.
The scoring technique
A price is acting as a strong SR if there are many candles which comes close to this but cannot cross this.
So, for each candle which are close to this price (within a distance of y% from the price), we will add +S1 to the score.
For each candle which cuts through this price, we will add -S2(negative) to the score.
This should give you a very basic idea of how to assign scores to this.
Now you have to tweak it according to your requirements.
Some tweak I made and which improved the performance a lot are as follows:
Different score for different types of cut. If the body of a candle cuts through the price, then score change is -S3 but the wick of a candle cuts through the price, the score change is -S4. Here Abs(S3) > Abs(S4) because cut by body is more significant than cut by wick.
If the candle which closes close the price but unable to cross is a high(higher than two candles on each side) or low(lower than 2 candles on each side), then add a higher score than other normal candles closing near this.
If the candle closing near this is a high or low, and the price was in a downtrend or a uptrend (at least y% move) then add a higher score to this point.
You can remove some prices from the initial list. I consider a price only if it is the highest or the lowest among N candles on both side of it.
Here is a snippet of my code.
private void findSupportResistance(List<Candle> candles, Long scripId) throws ExecutionException {
// This is a cron job, so I skip for some time once a SR is found in a stock
if(processedCandles.getIfPresent(scripId) == null || checkAlways) {
//Combining small candles to get larger candles of required timeframe. ( I have 1 minute candles and here creating 1 Hr candles)
List<Candle> cumulativeCandles = cumulativeCandleHelper.getCumulativeCandles(candles, CUMULATIVE_CANDLE_SIZE);
//Tell whether each point is a high(higher than two candles on each side) or a low(lower than two candles on each side)
List<Boolean> highLowValueList = this.highLow.findHighLow(cumulativeCandles);
String name = scripIdCache.getScripName(scripId);
Set<Double> impPoints = new HashSet<Double>();
int pos = 0;
for(Candle candle : cumulativeCandles){
//A candle is imp only if it is the highest / lowest among #CONSECUTIVE_CANDLE_TO_CHECK_MIN on each side
List<Candle> subList = cumulativeCandles.subList(Math.max(0, pos - CONSECUTIVE_CANDLE_TO_CHECK_MIN),
Math.min(cumulativeCandles.size(), pos + CONSECUTIVE_CANDLE_TO_CHECK_MIN));
if(subList.stream().min(Comparator.comparing(Candle::getLow)).get().getLow().equals(candle.getLow()) ||
subList.stream().min(Comparator.comparing(Candle::getHigh)).get().getHigh().equals(candle.getHigh())) {
impPoints.add(candle.getHigh());
impPoints.add(candle.getLow());
}
pos++;
}
Iterator<Double> iterator = impPoints.iterator();
List<PointScore> score = new ArrayList<PointScore>();
while (iterator.hasNext()){
Double currentValue = iterator.next();
//Get score of each point
score.add(getScore(cumulativeCandles, highLowValueList, currentValue));
}
score.sort((o1, o2) -> o2.getScore().compareTo(o1.getScore()));
List<Double> used = new ArrayList<Double>();
int total = 0;
Double min = getMin(cumulativeCandles);
Double max = getMax(cumulativeCandles);
for(PointScore pointScore : score){
// Each point should have at least #MIN_SCORE_TO_PRINT point
if(pointScore.getScore() < MIN_SCORE_TO_PRINT){
break;
}
//The extremes always come as a Strong SR, so I remove some of them
// I also reject a price which is very close the one already used
if (!similar(pointScore.getPoint(), used) && !closeFromExtreme(pointScore.getPoint(), min, max)) {
logger.info("Strong SR for scrip {} at {} and score {}", name, pointScore.getPoint(), pointScore.getScore());
// logger.info("Events at point are {}", pointScore.getPointEventList());
used.add(pointScore.getPoint());
total += 1;
}
if(total >= totalPointsToPrint){
break;
}
}
}
}
private boolean closeFromExtreme(Double key, Double min, Double max) {
return Math.abs(key - min) < (min * DIFF_PERC_FROM_EXTREME / 100.0) || Math.abs(key - max) < (max * DIFF_PERC_FROM_EXTREME / 100);
}
private Double getMin(List<Candle> cumulativeCandles) {
return cumulativeCandles.stream()
.min(Comparator.comparing(Candle::getLow)).get().getLow();
}
private Double getMax(List<Candle> cumulativeCandles) {
return cumulativeCandles.stream()
.max(Comparator.comparing(Candle::getLow)).get().getHigh();
}
private boolean similar(Double key, List<Double> used) {
for(Double value : used){
if(Math.abs(key - value) <= (DIFF_PERC_FOR_INTRASR_DISTANCE * value / 100)){
return true;
}
}
return false;
}
private PointScore getScore(List<Candle> cumulativeCandles, List<Boolean> highLowValueList, Double price) {
List<PointEvent> events = new ArrayList<>();
Double score = 0.0;
int pos = 0;
int lastCutPos = -10;
for(Candle candle : cumulativeCandles){
//If the body of the candle cuts through the price, then deduct some score
if(cutBody(price, candle) && (pos - lastCutPos > MIN_DIFF_FOR_CONSECUTIVE_CUT)){
score += scoreForCutBody;
lastCutPos = pos;
events.add(new PointEvent(PointEvent.Type.CUT_BODY, candle.getTimestamp(), scoreForCutBody));
//If the wick of the candle cuts through the price, then deduct some score
} else if(cutWick(price, candle) && (pos - lastCutPos > MIN_DIFF_FOR_CONSECUTIVE_CUT)){
score += scoreForCutWick;
lastCutPos = pos;
events.add(new PointEvent(PointEvent.Type.CUT_WICK, candle.getTimestamp(), scoreForCutWick));
//If the if is close the high of some candle and it was in an uptrend, then add some score to this
} else if(touchHigh(price, candle) && inUpTrend(cumulativeCandles, price, pos)){
Boolean highLowValue = highLowValueList.get(pos);
//If it is a high, then add some score S1
if(highLowValue != null && highLowValue){
score += scoreForTouchHighLow;
events.add(new PointEvent(PointEvent.Type.TOUCH_UP_HIGHLOW, candle.getTimestamp(), scoreForTouchHighLow));
//Else add S2. S2 > S1
} else {
score += scoreForTouchNormal;
events.add(new PointEvent(PointEvent.Type.TOUCH_UP, candle.getTimestamp(), scoreForTouchNormal));
}
//If the if is close the low of some candle and it was in an downtrend, then add some score to this
} else if(touchLow(price, candle) && inDownTrend(cumulativeCandles, price, pos)){
Boolean highLowValue = highLowValueList.get(pos);
//If it is a high, then add some score S1
if (highLowValue != null && !highLowValue) {
score += scoreForTouchHighLow;
events.add(new PointEvent(PointEvent.Type.TOUCH_DOWN, candle.getTimestamp(), scoreForTouchHighLow));
//Else add S2. S2 > S1
} else {
score += scoreForTouchNormal;
events.add(new PointEvent(PointEvent.Type.TOUCH_DOWN_HIGHLOW, candle.getTimestamp(), scoreForTouchNormal));
}
}
pos += 1;
}
return new PointScore(price, score, events);
}
private boolean inDownTrend(List<Candle> cumulativeCandles, Double price, int startPos) {
//Either move #MIN_PERC_FOR_TREND in direction of trend, or cut through the price
for(int pos = startPos; pos >= 0; pos-- ){
Candle candle = cumulativeCandles.get(pos);
if(candle.getLow() < price){
return false;
}
if(candle.getLow() - price > (price * MIN_PERC_FOR_TREND / 100)){
return true;
}
}
return false;
}
private boolean inUpTrend(List<Candle> cumulativeCandles, Double price, int startPos) {
for(int pos = startPos; pos >= 0; pos-- ){
Candle candle = cumulativeCandles.get(pos);
if(candle.getHigh() > price){
return false;
}
if(price - candle.getLow() > (price * MIN_PERC_FOR_TREND / 100)){
return true;
}
}
return false;
}
private boolean touchHigh(Double price, Candle candle) {
Double high = candle.getHigh();
Double ltp = candle.getLtp();
return high <= price && Math.abs(high - price) < ltp * DIFF_PERC_FOR_CANDLE_CLOSE / 100;
}
private boolean touchLow(Double price, Candle candle) {
Double low = candle.getLow();
Double ltp = candle.getLtp();
return low >= price && Math.abs(low - price) < ltp * DIFF_PERC_FOR_CANDLE_CLOSE / 100;
}
private boolean cutBody(Double point, Candle candle) {
return Math.max(candle.getOpen(), candle.getClose()) > point && Math.min(candle.getOpen(), candle.getClose()) < point;
}
private boolean cutWick(Double price, Candle candle) {
return !cutBody(price, candle) && candle.getHigh() > price && candle.getLow() < price;
}
Some Helper classes:
public class PointScore {
Double point;
Double score;
List<PointEvent> pointEventList;
public PointScore(Double point, Double score, List<PointEvent> pointEventList) {
this.point = point;
this.score = score;
this.pointEventList = pointEventList;
}
}
public class PointEvent {
public enum Type{
CUT_BODY, CUT_WICK, TOUCH_DOWN_HIGHLOW, TOUCH_DOWN, TOUCH_UP_HIGHLOW, TOUCH_UP;
}
Type type;
Date timestamp;
Double scoreChange;
public PointEvent(Type type, Date timestamp, Double scoreChange) {
this.type = type;
this.timestamp = timestamp;
this.scoreChange = scoreChange;
}
#Override
public String toString() {
return "PointEvent{" +
"type=" + type +
", timestamp=" + timestamp +
", points=" + scoreChange +
'}';
}
}
Some example of SR created by the code.
Here's a python function to find support / resistance levels
This function takes a numpy array of last traded price and returns a
list of support and resistance levels respectively. n is the number
of entries to be scanned.
def supres(ltp, n):
"""
This function takes a numpy array of last traded price
and returns a list of support and resistance levels
respectively. n is the number of entries to be scanned.
"""
from scipy.signal import savgol_filter as smooth
# converting n to a nearest even number
if n % 2 != 0:
n += 1
n_ltp = ltp.shape[0]
# smoothening the curve
ltp_s = smooth(ltp, (n + 1), 3)
# taking a simple derivative
ltp_d = np.zeros(n_ltp)
ltp_d[1:] = np.subtract(ltp_s[1:], ltp_s[:-1])
resistance = []
support = []
for i in xrange(n_ltp - n):
arr_sl = ltp_d[i:(i + n)]
first = arr_sl[:(n / 2)] # first half
last = arr_sl[(n / 2):] # second half
r_1 = np.sum(first > 0)
r_2 = np.sum(last < 0)
s_1 = np.sum(first < 0)
s_2 = np.sum(last > 0)
# local maxima detection
if (r_1 == (n / 2)) and (r_2 == (n / 2)):
resistance.append(ltp[i + ((n / 2) - 1)])
# local minima detection
if (s_1 == (n / 2)) and (s_2 == (n / 2)):
support.append(ltp[i + ((n / 2) - 1)])
return support, resistance
SRC
The best way I have found to get SR levels is with clustering. Maxima and Minima is calculated and then those values are flattened (like a scatter plot where x is the maxima and minima values and y is always 1). You then cluster these values using Sklearn.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
# Calculate VERY simple waves
mx = df.High_15T.rolling( 100 ).max().rename('waves')
mn = df.Low_15T.rolling( 100 ).min().rename('waves')
mx_waves = pd.concat([mx,pd.Series(np.zeros(len(mx))+1)],axis = 1)
mn_waves = pd.concat([mn,pd.Series(np.zeros(len(mn))+-1)],axis = 1)
mx_waves.drop_duplicates('waves',inplace = True)
mn_waves.drop_duplicates('waves',inplace = True)
W = mx_waves.append(mn_waves).sort_index()
W = W[ W[0] != W[0].shift() ].dropna()
# Find Support/Resistance with clustering
# Create [x,y] array where y is always 1
X = np.concatenate((W.waves.values.reshape(-1,1),
(np.zeros(len(W))+1).reshape(-1,1)), axis = 1 )
# Pick n_clusters, I chose the sqrt of the df + 2
n = round(len(W)**(1/2)) + 2
cluster = AgglomerativeClustering(n_clusters=n,
affinity='euclidean', linkage='ward')
cluster.fit_predict(X)
W['clusters'] = cluster.labels_
# I chose to get the index of the max wave for each cluster
W2 = W.loc[W.groupby('clusters')['waves'].idxmax()]
# Plotit
fig, axis = plt.subplots()
for row in W2.itertuples():
axis.axhline( y = row.waves,
color = 'green', ls = 'dashed' )
axis.plot( W.index.values, W.waves.values )
plt.show()
Here is the PineScript code for S/Rs. It doesn't include all the logic Dr. Andrew or Nilendu discuss, but definitely a good start:
https://www.tradingview.com/script/UUUyEoU2-S-R-Barry-extended-by-PeterO/
//#version=3
study(title="S/R Barry, extended by PeterO", overlay=true)
FractalLen=input(10)
isFractal(x) => highestbars(x,FractalLen*2+1)==-FractalLen
sF=isFractal(-low), support=low, support:=sF ? low[FractalLen] : support[1]
rF=isFractal(high), resistance=high, resistance:=rF ? high[FractalLen] : resistance[1]
plot(series=support, color=sF?#00000000:blue, offset=-FractalLen)
plot(series=resistance, color=rF?#00000000:red, offset=-FractalLen)
supportprevious=low, supportprevious:=sF ? support[1] : supportprevious[1]
resistanceprevious=low, resistanceprevious:=rF ? resistance[1] : resistanceprevious[1]
plot(series=supportprevious, color=blue, style=circles, offset=-FractalLen)
plot(series=resistanceprevious, color=red, style=circles, offset=-FractalLen)
I'm not sure if it's really "Support & Resistance" detection but what about this:
function getRanges(_nums=[], _diff=1, percent=true) {
let nums = [..._nums];
nums.sort((a,b) => a-b);
const ranges = [];
for (let i=0; i<nums.length; i+=1) {
const num = nums[i];
const diff = percent ? perc(_diff, num) : _diff;
const range = nums.filter( j => isInRange(j, num-diff, num+diff) );
if (range.length) {
ranges.push(range);
nums = nums.slice(range.length);
i = -1;
}
}
return ranges;
}
function perc(percent, n) {
return n * (percent * 0.01);
}
function isInRange(n, min, max) {
return n >= min && n <= max;
}
So let's say you have an array of close prices:
const nums = [12, 14, 15, 17, 18, 19, 19, 21, 28, 29, 30, 30, 31, 32, 34, 34, 36, 39, 43, 44, 48, 48, 48, 51, 52, 58, 60, 61, 67, 68, 69, 73, 73, 75, 87, 89, 94, 95, 96, 98];
and you want to kinda split the numbers by an amount, like difference of 5 (or 5%), then you would get back a result array like this:
const ranges = getRanges(nums, 5, false) // ranges of -5 to +5
/* [
[12, 14, 15, 17]
[18, 19, 19, 21]
[28, 29, 30, 30, 31, 32]
[34, 34, 36, 39]
[43, 44, 48, 48, 48]
[51, 52]
[58, 60, 61]
[67, 68, 69]
[73, 73, 75]
[87, 89]
[94, 95, 96, 98]
]
*/
// or like
//const ranges = getRanges(nums, 5, true) // ranges of -5% to +5%
therefore the more length a range has, the more important of a support/resistance area it is.
(again: not sure if this could be classified as "Support & Resistance")
I briefly read Jacob's contribution. I think it may have some issues with the code below:
# Now the min
if min1 - window < 0:
min2 = min(x[(min1 + window):])
else:
min2 = min(x[0:(min1 - window)])
# Now find the indices of the secondary extrema
max2 = np.where(x == max2)[0][0] # find the index of the 2nd max
min2 = np.where(x == min2)[0][0] # find the index of the 2nd min
The algorithm does try to find secondary min value outside given window, but then the position corresponding to np.where(x == min2)[0][0] may lie inside the the window due to possibly duplicate values inside the window.
If you are looking for horizontal SR lines, I would rather want to know the whole distribution. But I think it is also a good assumption to just take the max of your histogram.
# python + pandas
spy["Close"][:60].plot()
hist, border = np.histogram(spy["Close"][:60].values, density=False)
sr = border[np.argmax(hist)]
plt.axhline(y=sr, color='r', linestyle='-')
You might need to tweak the bins and eventually you want to plot the whole bin not just the lower bound.
lower_bound = border[np.argmax(hist)]
upper_bound = border[np.argmax(hist) + 1]
PS the underlying "idea" is very similar to #Nilendu's solution.
Interpretations of Support & Resistance levels is very subjective. A lot of people do it different ways. […] When I am evaluating S&R from the charts, I am looking for two primary things:
Bounce off - There needs to be a visible departure (bounce off) from the horizontal line which is perceived to define the level of support or resistance.
Multiple touches - A single touch turning point is not sufficient to indicate establish support or resistance levels. Multiple touches to the same approximately level should be present, such that a horizontal line could be drawn through those turning points.

Resources