Determining the strength of a candlestick is not straightforward in most cases. How can I combine the lengths of the upper and lower wicks with the body-to-wick ratio to arrive at a way of determining the strength of every candlestick type?
The "strength of a candlestick" is not a well-defined concept. No single candlestick indicates anything significant on its own. Depending on your view of trading, your strategy, and your interpretation, you can evaluate the state of the market or of individual candles to make an estimate about where the market is heading. In forex, most of the time our predictions do not work out as expected.
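If you still want to quantify the raw components mentioned in the question, here is a minimal sketch that measures the body, the wicks and their ratio from OHLC values; how (or whether) to combine them into a single "strength" score remains a subjective choice:

```python
def candle_components(open_, high, low, close):
    """Raw measurements of a single OHLC candle; combining them into a
    'strength' score is a subjective choice left to the trader."""
    body = abs(close - open_)                  # length of the real body
    upper_wick = high - max(open_, close)      # from the top of the body to the high
    lower_wick = min(open_, close) - low       # from the bottom of the body to the low
    total_range = high - low
    body_to_range = body / total_range if total_range > 0 else 0.0
    direction = 1 if close > open_ else (-1 if close < open_ else 0)
    return {
        "body": body,
        "upper_wick": upper_wick,
        "lower_wick": lower_wick,
        "body_to_range": body_to_range,
        "direction": direction,
    }

# Example: a bullish candle with a long lower wick.
print(candle_components(open_=1.1000, high=1.1050, low=1.0900, close=1.1040))
```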
For my project I need to estimate a point on a grid that I use. To check that my method works, I took some readings along the x axis, as shown below, at y = 2.
Blue, brown and grey are the average RSSI readings of access points 1, 2 and 3. The node moves from the red dot to the blue one, a distance of 120 cm. The fluctuations of the RSSI readings are not linear, which is a very big problem when trying to get an accurate position. I use k-NN to get the nearest position. What can I do to make it more accurate? Would using some other classifier help?
Check out "Is RSSI a reliable parameter in sensor localization algorithms: an experimental study".
While I don't completely agree with the way those tests were executed and analyzed (they miss a lot of details, such as differentiating the RSSI readings across the different BLE channels, or measurements of antenna characteristics and orientation), I consider the core statement quite on point:
RSSI cannot be used as a reliable metric in localization algorithms
Besides the difficulties that reflection, shadowing, antenna characteristics, etc. introduce, the BLE specification itself adds to the problem: RSSI is not defined as an absolute value, but is specified to be used relative to the golden receiver range, i.e. the RX power that is neither too weak nor too strong for the receiver to achieve the best receive quality. On top of that, the reported RSSI is allowed to deviate by +/- 6 dB from the real value.
This means, firstly, that we can hardly rely on getting the same readings across different devices, and secondly, that the accuracy is allowed to vary a lot according to the specification.
For that reason, projects that rely too heavily on that accuracy are doomed to fail one way or another. However, there are still applications that can get something positive out of these RSSI readings, i.e. ones that do not rely on them completely but instead use them as a supporting indicator.
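To get a feeling for how much that tolerance matters, here is a small sketch using the common log-distance path-loss model; the reference power and path-loss exponent are assumed values, not measured ones. With an exponent of 2, a reading that is off by 6 dB roughly halves or doubles the estimated distance:

```python
def distance_from_rssi(rssi_dbm, p0_dbm=-59.0, n=2.0, d0_m=1.0):
    """Log-distance path-loss model: RSSI = p0 - 10*n*log10(d/d0), solved for d.

    p0_dbm (power at the reference distance d0_m) and the path-loss exponent n
    are assumed values here; in practice they vary per device and environment.
    """
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10 * n))

measured = -75.0  # example reading in dBm
for error in (-6, 0, +6):  # the tolerance the specification allows
    d = distance_from_rssi(measured + error)
    print(f"reported {measured + error:.0f} dBm  ->  ~{d:.1f} m")
```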
If you want to read more on this, search for "indoor localization RSSI", e.g. on Google Scholar.
I am working with a sample data set to learn clustering. This data set contains the number of occurrences of various keywords.
Since all the features are occurrence counts of different keywords, is it OK not to scale the values and to use them as they are?
I have read a couple of articles on the internet where it is emphasized that scaling is important because it adjusts the relative weight of the frequencies. Since most of the frequencies are 0 (95%+), z-score scaling will change the shape of the distribution, which I feel could be a problem because I am changing the nature of the data.
I am thinking of not changing the values at all to avoid this. Will that affect the quality of the results I get from the clustering?
As has already been noted, the answer heavily depends on the algorithm being used.
If you're using a distance-based algorithm with the (usually default) Euclidean distance (for example, k-means or k-NN), it will rely more on features with a bigger range, simply because the "typical difference" in values of such a feature is bigger.
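A tiny numeric illustration of that effect, using made-up keyword counts: almost all of the Euclidean distance comes from the high-count feature.

```python
import numpy as np

# Two documents described by two keyword counts: a rare word and a very common word.
a = np.array([2, 10_000])
b = np.array([40, 10_500])

diff = a - b
print(np.abs(diff))                        # [38 500]
print(np.linalg.norm(diff))                # ~501.4 - dominated by the second feature
print(diff[1] ** 2 / (diff ** 2).sum())    # ~0.994 share of the squared distance
```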
Non-distance-based models can be affected too. One might think that linear models do not fall into this category, since scaling (and translating, if needed) is a linear transformation, so if it makes the results better, then the model should learn it, right? It turns out the answer is no. The reason is that no one uses vanilla linear models; they're always used with some sort of regularization that penalizes overly large weights. This can prevent your linear model from learning the scaling from the data.
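A small sketch of that point with scikit-learn's ridge regression (assuming scikit-learn is available, on made-up data): plain least squares compensates exactly for a rescaled feature, while the regularized fit changes, because the penalty acts on the coefficient in whatever units the feature happens to have.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 3 * x[:, 0] + rng.normal(scale=0.1, size=200)

for scale in (1.0, 1000.0):
    X = x * scale
    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=50.0).fit(X, y)
    # OLS absorbs the rescaling into the coefficient; Ridge does not,
    # because the penalty is applied to the coefficient itself.
    print(f"scale={scale:7.1f}  OLS R^2={ols.score(X, y):.4f}  "
          f"Ridge R^2={ridge.score(X, y):.4f}")
```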
There are also models that are independent of the feature scale. For example, tree-based algorithms (decision trees and random forests) are not affected. A node of a tree partitions your data into two sets by comparing a feature (the one that splits the data set best) to a threshold value. There's no regularization on the threshold (regularization there is about keeping the height of the tree small), so it is not affected by different scales.
That being said, it's usually advisable to standardize your data (subtract the mean and divide by the standard deviation).
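For completeness, a minimal sketch of that standardization on made-up count data (scikit-learn's StandardScaler does the same thing, if it is available):

```python
import numpy as np

# Made-up keyword-count rows (documents) and columns (keywords).
X = np.array([[0., 2., 100.],
              [0., 0.,  40.],
              [1., 3., 160.]])

# z-score standardization: subtract the column mean, divide by the column std.
# A column with zero variance would need a guard against division by zero.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Equivalent with scikit-learn:
# from sklearn.preprocessing import StandardScaler
# X_std = StandardScaler().fit_transform(X)
print(X_std)
```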
It probably depends on the classification algorithm. I'm only familiar with SVMs; please see Ch. 2.2 for an explanation of scaling.
The type of feature (word counts) doesn't matter; the feature ranges should be more or less similar. If, for example, the count of "dignity" is 10 and the count of "have" is 100000000 in your texts, then (at least with an SVM) the results would be less accurate than if you scaled both counts to a similar range.
The cases where no scaling is needed are those where the data is scaled implicitly, e.g. when the features are pixel values in an image: the data is already scaled to the range 0-255.
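If you want to reproduce that implicit scaling for count features, a minimal min-max sketch (with made-up counts) brings every column into the range 0-1:

```python
import numpy as np

# Made-up counts: a rare word ("dignity") and a very common word ("have").
counts = np.array([[10., 100_000_000.],
                   [ 3.,  40_000_000.],
                   [ 0.,  90_000_000.]])

col_min = counts.min(axis=0)
col_max = counts.max(axis=0)
# Min-max scaling to [0, 1]; a constant column would need a guard against
# dividing by zero (col_max == col_min).
scaled = (counts - col_min) / (col_max - col_min)
print(scaled)
```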
* Distance-based algorithms need scaling.
* There is no need for scaling in tree-based algorithms.
But it is good to scale your data and train the model both ways; if possible, compare the model accuracy and other evaluation metrics before and after scaling, and use whichever works best.
This is as per my knowledge.
OK, so you have some historic data in the form of [say] an array of integers. This, for example, could represent free-space on a server HDD over a two-year period, with each array element representing a daily sample.
The data (free-space in this example) has a downward trend, but also has periodic positive spikes where files have been removed/compressed, Etc.
How would you go about identifying the overall trend for the two-year period, i.e.: iron out the peaks and troughs in the data?
Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and well, it's leaked out of my brain.
I'm not after a bit of code as such, more of a description of how you'd approach this problem...
Thanks in advance!
You'll get many different answers, and the one you choose really depends on more specific requirements you may have. Examples:
Low-pass filter, or any other spectral analysis technique, and use the low frequencies to determine trend.
Linear regression (time/value) to find "r" (the correlation between time and the value).
Moving average of the last "n" samples. If "n" is large enough, this is my favorite, as it is often sufficient and very easy to code (see the sketch after this list). It's a sort of approximation to #1 above.
I'm sure there'll be others.
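For example, a bare-bones trailing moving average over the daily samples might look like this (the 30-day window and the made-up free-space series are just assumptions for illustration):

```python
def moving_average(samples, window=30):
    """Trailing moving average; no value is produced for the first window-1 days."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(samples):
        running_sum += value
        if i >= window:
            running_sum -= samples[i - window]
        if i >= window - 1:
            smoothed.append(running_sum / window)
    return smoothed

# Made-up daily free-space readings: a downward trend with periodic clean-up spikes.
free_space = [500 - 0.5 * d + (80 if d % 90 == 0 else 0) for d in range(730)]
trend = moving_average(free_space, window=30)
```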
If I were doing this to produce a line through the points for me to look at, I would probably use some variant of Loess, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
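If Python is an option, a rough equivalent is the LOWESS smoother in statsmodels (assuming that package is available); frac controls how local each weighted regression is:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

days = np.arange(730)
free_space = 500 - 0.5 * days + 80 * (days % 90 == 0)   # same made-up data as above

# Returns (x, smoothed y) pairs sorted by x.
smoothed = lowess(free_space, days, frac=0.2)
trend = smoothed[:, 1]
```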
Hello all, this is my first post here.
I work on tracking objects through images without prior training. I use two features: the color of the region (the ab channels of the Lab space) and HOG. In my initial experiments, I found that using a minimum-distance classifier with the HOG feature alone has the advantage of low false positives (FP) but comes with high false negatives (FN). On the other hand, using the minimum-distance classifier with color alone increases the true positives (TP) and decreases the FN, but at the price of increasing the FP.
My question is: how do I combine the two classifiers? I would like to know the standard algorithm for doing that in an unsupervised way.
I tried combining the two features into one feature vector (after normalization), but HOG dominates the results. Even when I weighted the combined feature, the results were worse than with either of the two alone.
The best results I have reached so far come from cascading the two classifiers: running the color classifier first to widen the pool of possibilities, then running HOG (with a threshold a little higher than the one used with HOG alone). I googled the topic, but I don't have enough knowledge about classification to find the standard methods.
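In case it helps to see the idea written down, here is a bare-bones sketch of that cascade; the distance functions and both threshold values are placeholders for whatever is already computed:

```python
def cascade_match(candidate, target, color_dist, hog_dist,
                  color_threshold=0.5, hog_threshold=0.35):
    """Two-stage minimum-distance cascade.

    Stage 1 (color): cheap, permissive test that keeps the true positives.
    Stage 2 (HOG):   confirms the survivors with its own separately tuned threshold.
    Both threshold values here are placeholders and would need tuning.
    """
    if color_dist(candidate, target) > color_threshold:
        return False                      # rejected by the color stage
    return hog_dist(candidate, target) <= hog_threshold

# Usage sketch: matches = [c for c in candidates if cascade_match(c, target, d_ab, d_hog)]
```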
Thanks for the help.
A firm is supplied with large wooden panels, which are cut into required pieces. To make, for example, a bookshelf, they have to cut its pieces from a large panel. In most cases the big panel is not used 100%; there is some loss, some leftover pieces that cannot be used. So to minimize the loss, they have to find an optimal layout of the separate pieces on the big panel(s). I think this is called the "two-dimensional rectangular bin packing problem".
Now it gets more interesting.
Not all panels are the same; they can have slightly different color tones. An ideal bookshelf is made from pieces all cut from one panel, or from multiple panels with the same tone. But a bookshelf can be produced in different qualities (ideal; one piece with a different tone; two pieces...; three different color tones used; etc.). Each quality has its own price (the higher the quality, the more expensive).
Now we have some wooden panels in stock and an order for some furniture (e.g. 100 bookshelves). The goal is to maximize profit (e.g. make some pieces in ideal quality and some in lower quality in order to keep the material loss low).
How can I solve this problem? How can I combine it with the bin packing problem? Any hints, papers or articles would be appreciated. I know I can minimize/maximize a function subject to inequalities with integer linear programming, but I really do not know how to solve this.
(Please do not consider the real-world scenario, where for example it might be best to create only ideal ones... Imagine that the loss from remaining material is X money per cm^2, that Y is the price for a specific product quality, and that X and Y can be "arbitrary".)
I can give an idea of how these problems are solved and why yours is particularly difficult.
In a typical optimization problem, you want to maximize or minimize a function (e.g. energy) with respect to a set number of variables (e.g. length). For example: how long should a spring be in order to minimize the stored energy? The answer is just a number, the equilibrium length of the spring. Another example would be "what price should we set for our product to maximize profit?" (Too expensive and no-one will buy anything; too cheap and you won't cover your costs.) Again, the answer is just a number, the optimal price. Optimizations like that are handled with ordinary calculus.
A much more difficult optimization problem is one where the answer isn't a number but a function, like a shape. An example is: what shape will a hanging chain take in order to minimize its gravitational potential energy? Or: what shapes should we cut out of these boards in order to maximize profit? This type of problem is solved using variational calculus, which is very difficult.
In any case, when solving optimization problems numerically, there are a few basic steps to follow. First you have to define a function, for example profit(cuts, params), that you want to maximize with respect to some variables 'cuts', with the other parameters 'params' held fixed. 'params' stores information like the amount and type of wood that you have, and how much money each type of furniture is worth.
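Purely as a sketch of what such a function could look like (the encoding of 'cuts' and every field name below are assumptions for illustration, not something from the problem statement):

```python
def profit(cuts, params):
    """Value of the furniture produced minus the cost of wasted material.

    This is only one possible encoding, chosen for the sketch:
      cuts:   list of placements (product_id, panel_id, width, height, x, y)
      params: dict with "panel_area", "panel_tone", "price" keyed by the number
              of distinct tones in a product, and "waste_cost_per_cm2"
    """
    used_area = {}   # area consumed per panel
    tones = {}       # set of panel tones used per product
    for product_id, panel_id, w, h, x, y in cuts:
        used_area[panel_id] = used_area.get(panel_id, 0.0) + w * h
        tones.setdefault(product_id, set()).add(params["panel_tone"][panel_id])

    # Revenue: a product made from fewer distinct tones is of higher quality
    # and therefore fetches a higher price.
    revenue = sum(params["price"][len(t)] for t in tones.values())

    # Waste: unused area on every panel that was cut into at all.
    waste = sum(params["panel_area"][pid] - a for pid, a in used_area.items())
    return revenue - params["waste_cost_per_cm2"] * waste
```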
The second step is to come up with a guess for the best set of cuts; we'll call it cuts_guess. To do this, you need an algorithm that will suggest a set of furniture you could actually make using the supplies that you have. For example, if you can make at least one bookshelf from each board, then that could be your initial guess for the best way to use the wood.
The third phase is the optimization. For the initialization, set cuts_best = cuts_guess and profit_best = profit_guess = profit(cuts_guess, params). Then you need an algorithm that makes small pseudo-random changes to 'cuts' and checks whether the profit increases or decreases. Record the best set of cuts that you find and the corresponding profit. Usually it's best if there is some randomness involved, in order to explore the largest number of possibilities and not get 'stuck' on a poor choice. You'll find examples of this if you look up 'Monte Carlo algorithms'.
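The search loop itself can stay generic; a minimal sketch (with make_small_change left as the problem-specific piece) might look like this:

```python
def optimize(cuts_guess, params, profit, make_small_change, iterations=100_000):
    """Generic random local search: keep the best cutting plan found so far.

    make_small_change is the hard, problem-specific part (e.g. move or swap one
    piece); accepting an occasionally worse candidate, as in simulated annealing,
    helps avoid getting stuck in a poor local optimum.
    """
    cuts_best = cuts_guess
    profit_best = profit(cuts_guess, params)
    for _ in range(iterations):
        candidate = make_small_change(cuts_best)
        candidate_profit = profit(candidate, params)
        # Greedy acceptance; replace with a probabilistic rule for annealing.
        if candidate_profit > profit_best:
            cuts_best, profit_best = candidate, candidate_profit
    return cuts_best, profit_best
```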
Anyway, all of this will be very difficult for your problem. It's easy to come up with a guess for a single variable (e.g. a length), and to change that guess (e.g. increase or decrease the length a bit). It's not at all obvious how to make a 'guess' for how to place the cut-outs on the boards, or how to make a small change to such a guess.