How can close all the winning trades in the forex market using the Python language? - algorithmic-trading

I'm writing code in Python, which opens trades according to certain conditions. Sometimes I need to close all profitable trades, which takes a long time if I do it manually, therefore, How can close all the winning trades in the forex market using the Python language?

I'm writing code in Python, which opens trades according to certain conditions. Sometimes I need to close all profitable trades, which takes a long time if I do it manually, therefore, How can close all the winning trades in the forex market using the Python language?

using asyncio/multiprocessing, opening/requesting/closing trades one by one at a time is a bit of like slightly silly. open your trades as a batch of parallel tasks.
and yeah, they shoul close themselves once they are done with their job

Related

Need an advice about algorithm to solve quite specific Job Shop Scheduling Problem

Job Shop Scheduling Problem (JSSP): I have jobs that consist of tasks and I have machines that can perform these tasks.
I should be able to add new jobs dynamically. E.g. I have a schedule for the first 5 jobs, and when the 6th arrive - I need to be able to fit it into the schedule in the best way. It is possible to adjust existing schedule within the given flexibility constrains.
Look at the picture below.
Jobs have tasks, each task is the same type of action. Think about painting of some objects with paint spray. All the machines are the same (paint sprays), and all of the tasks are the same.
Constraint 1. Jobs have a preferred deadline for completion, but the deadline is flexible to some extent.
Edit after #tucuxi answer: Flexible deadline mean that the time of completion can be extended by some delta if necessary.
Constraint 2. Between the jobs there is resting phase. Think about drying the paint. Resting phase has minimal required duration. Resting phase can be longer or shorter if necessary.
Edit after #tucuxi answer: So there is planned time of rest Tp which is desired, but flexible value that can be increased or decreased if this allows for better scheduling. And there is minimal time of rest Tm. So Tp-Tadjustmenet>=Tm.
The machine is occupied by the job from the start to the completion.
Here goes parts that make this problem very distinct from what I have read about.
Jobs arrive in batches of several jobs. For example a batch can contain 10 jobs of the type Job_1 and 5 of Job_2. Different batches can contain different types of jobs. All the jobs from the batches should be finished as close to each other as possible. Not necessary at the same time, but we need to minimize the delay between the completion of first and last jobs from the batch.
Constraint 3. Machines are grouped. In each group only M machines can work simultaneously. Think about paint sprays that are connected to the common pressurizer that has limited performance.
The goal.
Having given description of the problem, it should be possible to solve JSSP. It should be also possible to add new jobs to the existing schedule.
Edit after #tucuxi answer: This is not a task that should be solved immediately: it is not a time-critical system. But it shouldn't be too long to irritate a human who put new tasks into the algorithm.
Question
What kind of many JSSP algorithms can help me solve this? I can implement an algorithm by myself, if there is one. The closest I found is This - Resource Constrained Project Scheduling Problem. But I was not able to comprehend how can I glue it to the JSSP solving algorithm.
Edit after #tucuxianswer: No, I haven't tried it yet.
Is there any libraries that can be used to solve this problem? Python or C# are the preferred languages, but in the end it doesn't really matter.
I appreciate any help: keyword to search for, link, reference to a book, reference to a library.
Thank you.
I doubt that there is a pre-made algorithm that solves your exact problem.
If I had to solve it, I would first:
compile datasets of inputs that I can feed into candidate solvers.
think of a metric to rank outputs, so that I can compare the candidates to see which is better.
A baseline solver could be a brute-force search: test and rate all possible job schedulings for small sample problems. This is of course infeasible for large inputs, but for small inputs it allows you to compare the outputs of more efficient solvers to a known-best answer.
Your link is to localsolver.com, which appears to provide a library for specifying problem constraints to then solve them. It is not freely available, requiring a license to use; but it would seem that your problem can be readily modeled in it. Have you tried to do so? They appear to support both C++ and Python. Other free options exist, including optaplanner (2.8k stars in github) or python-constraint (I have not looked into other languages).
Note that a good metric is crucial to choosing a good algorithm: unless you have a clear cost function to minimize, choosing "a good algorithm" is impossible. In your description of the problem, I see several places where cost is unclear (marked in italics):
job deadlines are flexible
minimal required rest times... which may be shortened
jobs from a batch should be finished as close together as possible
(not from specification): how long can you wait for an optimal vs a less-optimal-but-faster solution?

Expressing both concurrency and time in activity diagrams

I am not sure how to express my scenario using activity diagrams:
What I am trying to visualise is the fact that:
A message is received
Two independent and concurrent actions take place: logging of the message and processing the message
Logging always takes less time than processing
The first activity in the diagram is correct in the sense that the actions are independent but it does not relay the fact that logging is guaranteed to take less time than processing.
The second activity in the diagram is not correct because, even if logging completes before processing, it looks as though processing depended on the logging's finishing first and that does not represent the reality.
Here is a non-computer related example:
You are a novice in birdwatching, trying to make your first notes in your notebook about birds passing by
A flock of birds approaches, you try to recognise as many details as possible
You want to write down the details in your notebook, but wait, you begin to realise that your theoretical background does not work in practice, what should be a quick scribble actually amounts to nothing in the end because you did not recognise anything
In the meantime, the birds majestically flew away without waiting for you, the activity is gone
Or maybe you did actually write it down, it took you only a moment and the birds are still nearby, slowly flying away, ending the activity again after some time
Or maybe you were under such awe that you just kept watching at them, without taking any notes - they fly away, disappearing in the horizon, ending the activity
After a few hours, you have enough notes and you come home very happy - maybe you did not capture everything but this was enough to make you smile anyway
I can always add a comment to a diagram to express it all somehow but I wonder, is there a more structured way to express what I described in an activity diagram? If not an activity diagram then what kind of a diagram would be better suited in your opinion? Thank you.
Your first diagram assumes that the duration of logging is always shorter than processing:
If this assumption is correct, the upper flow reaches the flow-final node, and the remaining flows continue until the first reaches the activity-final node. Here, the processing continues and the activity ends when the processing ends. This is exactly what you want.
But if once, the execution would deviate from this assumption and logging would get delayed for any reason, then the end of the processing would reach the activity-final node, resulting in the immediate interruption of all other ongoing activities. So logging would not complete. Maybe it’s not a problem for you, but in most cases audit expects logs to be complete.
You may be interested in a safer way that would be to add a join node:
The advantage is that the activity does not depend on any assumptions. It will always work:
whenever the logging is faster, the token on that flow will wait at the join node, and as soon as process is finished the activity (safely) the join can happen and the outgoing token reaches the end. This is exactly what you currently expect.
if the logging is exceptionally slower, no problem: the processing will be over, but the activity will wait for the logging to be completed.
This robust notation makes logging like Schroedinger's cat in its box: we don't have to know what activity is longer or shorter. At the end of the activity, both actions are completed.
Time in activity diagrams?
Activity diagrams are not really meant to express timing and duration. It's about the flow of control and the synchronization.
However, if time is important to you, you could:
visually make one activity shorter than the other. This is super-ambiguous and absolute meaningless from a formal UML point of view. But it's intuitive when readers see the parallel flow (a kind of sublminal communication ;-) ) .
add a comment note to express your assumption in plain English. This has the advantage of being very clear an unambiguous.
using UML duration constraints. This is often used in timing diagram, sometimes in sequence diagrams, but in general not in activity diagrams (personally I have never seen it, but UML specs doesn't exclude it either).
Time is something very general in the UML specs, and defined independently of the diagram. For example:
8.4.4.2: A Duration is a value of relative time given in an implementation specific textual format. Often a Duration is a non- negative integer expression representing the number of “time ticks” which may elapse during this duration.
8.5.1: An Interval is a range between two values, primarily for use in Constraints that assert that some other Element has a value in the given range. Intervals can be defined for any type of value, but they are especially useful for time and duration values as part of corresponding TimeConstraints and DurationConstraints.
In your case you have a duration observation for the processing (e.g. d), and a duration constraint for the logging (e.g. 0..d).
8.5.4.2: An IntervalConstraint is shown as an annotation of its constrainedElement. The general notation for Constraints may be used for an IntervalConstraint, with the specification Interval denoted textually (...).
Unfortunately little more is said. The only graphical examples are for messages in sequence diagrams (Fig 8.5 and 17.5) and for timing diagrams (Fig 17.28 to 17.30). Nevertheless, the notation could be extrapolated for activity diagrams, but it would be so unusal that I'd rather recommend the comment note.

Is there guaranteed order of subscribers in chronicle queue?

I'm looking at Chronicle and I don't understand one thing.
An example - I have a queue with one writer - market data provider writes tick data as they appear.
Let's say queue has 10 readers - each reader is a different trading strategy that reads the new tick, and might send buy or sell order, let's name them Strategy1 .. Strategy10.
Let's say there is a rule that I can have only one trade open at any given time.
Now the problem - as far as I understand, there is no guarantee on the order how these subscribed readers process the tick event. Each strategy is subscribed to the queue, so each of them will get the new tick asynchronously.
So when I run it for the first time, it might be that Strategy1 receives the tick first and places the order, all the other strategies will then be not able to place their orders.
If I'll replay the same sequence of events, it might be that a different strategy processes the tick first, and places its order.
This will result in totally different results while using the same sequence of initial events.
Am I understanding something wrong or is this how it really works?
What are the possible solutions to this problem?
What I want to achieve is that the same sequence of source events (ticks) produces always the same sequence of trades.
If you want determinism in your system, then you need to remove sources of non-determinism. In your case, since you can only have a single trade open at one time, it sounds like it would be sensible to run all 10 strategies on a single thread (reader). This would also remove the need for any synchronisation on the reader side to ensure that there is only one open trade.
You can then use some fixed ordering to your strategies (e.g. round-robin) that will always produce the same output for a given set of inputs. Alternatively, if the decision logic is too expensive to run in serial, it could be executed in parallel, with each reader feeding into some form of gate (a Phaser-like structure), on which a decision about what strategy to use can be made deterministically. The drawback to this design is that the slowest trading strategy would hold back all the others.
Think you need to make a choice about how much you want concurrently and independently, and how much you want in order and serial. I suggest you allow the strategies to place orders independently, however the reader of those order to process them in the original order by checking a sequence number such as the queue index in the first queue.
This way the reader of the orders will process them in the same order regardless of the order they are processed and written which appears to be your goal.

When timing how long a quick process runs, how many runs should be used?

Lets say I am going to run process X and see how long it takes.
I am going to save into a database a date I ran this process, and the time it took. I want to know what to put into the DB.
Process X almost always runs under 1500ms, so this is a short process. It usually runs between 500 and 1500ms, quite a range (3x difference).
My question is, how many "runs" should be saved into the DB as a single run?
Every run saved into the DB as its
own row?
5 Runs, averaged, then save that
time?
10 Runs averaged?
20 Runs, remove anything more than 2
std deviations away, and save
everything inside that range?
Does anyone have any good info backing them up on this?
Save the data for every run into its own row. Then later you can use and analyze the data however you like... ie, all you the other options you listed can be performed after the fact. It's not really possible for someone else to draw meaningful conclusions about how to average/analyze the data without knowing more about what's going on.
The fastest run is the one that most accurately times only your code.
All slower runs are slower because of noise introduced by the operating system scheduler.
The variance you experience is going to differ from machine to machine, and even on identical machines, the set of runnable processes will introduce noise.
None of the above. Bran is close though. You should save every measurment. But don't average them. The average (arithmetic mean) can be very misleading in this type of analysis. The reason is that some of your measurments will be much longer than the others. This will happen becuse things can interfere with your process - even on 'clean' test systems. It can also happen becuse your process may not be as deterministic as you might thing.
Some people think that simply taking more samples (running more iterations) and averaging the measurmetns will give them better data. It doesn't. The more you run, the more likelty it is that you will encounter a perturbing event, thus making the average overly high.
A better way to do this is to run as many measurments as you can (time permitting). 100 is not a bad number, but 30-ish can be enough.
Then, sort these by magnitude and graph them. Note that this is not a standard distribution. Compute compute some simple statistics: mean, median, min, max, lower quaertile, upper quartile.
Contrary to some guidance, do not 'throw away' outside vaulues or 'outliers'. These are often the most intersting measurments. For example, you may establish a nice baseline, then look for departures. Understanding these departures will help you fully understand how your process works, how the sytsem affecdts your process, and what can interfere with your process. It will often readily expose bugs.
Depends what kind of data you want. I'd say one line per run initially, then analyze the data, go from there. Maybe store a min/max/average of X runs if you want to consolidate it.
http://en.wikipedia.org/wiki/Sample_size
Bryan is right - you need to investigate more. if your code has that much variance even "most" of the time then you might have a lot of fluctuation in your test environment because of other processes, os paging or other factors. If not it seems that you have code paths doing wildly varying amount of work and coming up with a single number/run data to describe the performance of such a multi-modal system is not going to tell you much. So i'd say isolate your setup as much as possible, run at least 30 trials and get a feel for what your performance curve looks like. Once you have that, you can use that wikipedia page to come up with a number that will tell you how many trials you need to run per code-change to see if the performance has increased/decreased with some level of statistical significance.
While saying, "Save every run," is nice, it might not be practical in your case. However, I do think that storing only the average eliminates too much data. I like storing the average of ten runs, but instead of storing just the average, I'd also store the max and min values, so that I can get a feel for the spread of the data in addition to its center.
The max and min information in particular will tell you how often corner cases arise. Is the 1500ms case a one-in-1000 outlier? Or is it something that recurs on a regular basis?

Is user delay between random takes is good improvement for PRNG?

I thought that for making random choices for example for next track in a player or next page in the browser it could be possible to use time as 'natural phenomenon', for example decent RPNG just can continuously get next random number without program request (for example in a thread every several milliseconds or event more often) and when the time comes (based on the user decision), the choice will be naturally affected by this user delay.
Is this approach is good enough and how can it be tested? The problem for testing manually is that I can not wait that long in real world to save enough random numbers to feed them to some test program. Any artificial attempt to speed this up will make the method itself invalid.
Thanks
A good random number generator really doesn't need improvement, and even if it did, it isn't clear that user input timing would help.
Could a user ever detect a pattern in tracks selected by an LCG? Whatever your platform, its likely that its built-in random() function would be good enough (that is, it would appear completely random to a user).
If you are still worried, however, use a cryptographic quality RNG, seeded with data from the dedicated source of randomness on your system. Nowadays, many of these system RNGs use truly random bits generated through quantum events in hardware. However, they can be slow to produce bits, so its best to use them as a seed for a fast, algorithmic PRNG.
Now, if you aren't convinced these approaches are good enough, you should be very skeptical that the timing of user typing is a good source. The keys that are pressed by users are highly predictable, given the limited vocabulary in use and the patterns that tend to appear within that limited set of words. This predictability in letter sequences leads to a high degree of predictability in timing between key presses.
I know that a lot of security programs use this technique during key generation. I don't think that it is pure snake oil, but it could be a placebo to placate users. A good product will depend on the system RNG.
Acquiring the time information that you describe can indeed add entropy to a PRNG. However, from your description of your intended applications, I don't think you need it. For "random choices for example for next track in a player or next page in the browser", a trivial, unmodified PRNG is fine. For security applications such as nonces, etc. it is much more important.
Anyway, you should read about PRNG entropy sources.
I wouldn't improve PRNGs with user delays, mostly because they're quite regular: you type at around the same speed, and it takes too long to measure the delay between a click and another (assuming normal usage). I'd rather use other user-triggered events: pressed keys, distance between each click, position of the mouse at given moments.

Resources