How to efficiently print strings with time-span conditions in a stream? - algorithm

Suppose we get a stream of string input with a clear data structure:
content : arrival time
Below are samples:
AAA : 12:00:00
ABC : 12:00:01
ABB : 12:00:02
ABM : 12:00:11
And we have a program that checks this stream input:
1) if this content has not been seen before, print the content;
2) if this content has arrived before and the time span is less than 10 seconds, print nothing;
3) if this content has arrived before and the time span is more than 10 seconds, print the content;
A Hashtable(String, Date) is OK, and we can update the date when a new entry comes in.
And my question is:
What if the number of strings is quite large and they can't all be stored in a hashtable? Consider that we are designing a program that runs 24*7, so the hashtable becomes bigger and bigger.
Is there any other way to solve this issue? And can we solve this with several servers?

You could use a Hashtable(String -> Date) and a simple Queue([Date, String]); every time you add a new item to the hashtable, add it to the queue as well.
Every few minutes, pop items from the queue and remove them from the hashtable until you reach a date that isn't old enough (< 10 seconds).
This way you only hold data from the last few minutes, and neither the hashtable nor the queue will grow too much.
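A minimal sketch of that eviction scheme in Java (class and method names are illustrative, and arrival times are passed as epoch milliseconds for simplicity):
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
// Sketch of the answer's idea: a map answers "seen recently?",
// a FIFO queue remembers arrival order so old entries can be evicted cheaply.
public class RecentContentFilter {
    private static final long SPAN_MILLIS = 10_000; // the 10-second window
    private record Entry(String content, long arrivalMillis) {}
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final ArrayDeque<Entry> queue = new ArrayDeque<>();
    // Returns the content if it should be printed, or "" for the "print empty" case.
    public String accept(String content, long arrivalMillis) {
        evictOlderThan(arrivalMillis - SPAN_MILLIS);
        Long previous = lastSeen.get(content);
        lastSeen.put(content, arrivalMillis);
        queue.addLast(new Entry(content, arrivalMillis));
        if (previous == null || arrivalMillis - previous > SPAN_MILLIS) {
            return content; // never seen, or seen long enough ago
        }
        return ""; // seen within the last 10 seconds
    }
    // Pop from the queue until the head is recent enough; drop the matching map entries.
    private void evictOlderThan(long cutoffMillis) {
        while (!queue.isEmpty() && queue.peekFirst().arrivalMillis() < cutoffMillis) {
            Entry old = queue.removeFirst();
            // Remove only if the map still points at this (old) arrival time.
            lastSeen.remove(old.content(), old.arrivalMillis());
        }
    }
}
Evicting on every call keeps both structures bounded by the number of distinct strings seen in the last 10 seconds; the periodic cleanup described above works the same way, just batched. For the multi-server question, the usual approach is to partition the stream by a hash of the content so each server only has to remember its own shard.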

Related

How to see what value is being calculated in the Pine Editor

I have the following script running with the intention of closing a trade after it has been open for a period of 4 days since the trade was taken.
TimeDiff = time - time[1]
MinutesPerBar = TimeDiff / 60000
// calculates how long one bar is in minutes
BarsSinceSwingLongCondition = barssince(SwingLongCondition)
// Calculates how many bars have passed since open of trade
CurrentSwingTradeDuration = BarsSinceSwingLongCondition * MinutesPerBar
// calculates the duration the trade has been open for (minutes * number of bars)
MaximumSwingTradeDuration = 4*1440
// Sets maximum trade duration. Set at 4 Days in minutes
SwingLongCloseLogic3 = CurrentSwingTradeDuration > MaximumSwingTradeDuration
// Closes trade when trade duration exceeds the maximum duration set (4 days)
The close logic, however, isn't executing when I run the strategy, as I have trades open for longer than the maximum duration.
Is there any way to see what value each element of the formula is calculating, so that I can see where the error is (I suspect it could be the time element)? Or can anyone see where I am going wrong in the code?
The fastest way to achieve that is using the plotchar() function, which shows the values in the Data Window on mouse-over of each bar. The user manual contains several other techniques available for debugging.
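A minimal sketch of that technique, appended to the script above (the empty char keeps the chart clean while the values still appear in the Data Window; the boolean is converted to 0/1 so it can be plotted):
plotchar(MinutesPerBar, "MinutesPerBar", "", location.top)
plotchar(BarsSinceSwingLongCondition, "BarsSinceSwingLongCondition", "", location.top)
plotchar(CurrentSwingTradeDuration, "CurrentSwingTradeDuration", "", location.top)
plotchar(SwingLongCloseLogic3 ? 1 : 0, "SwingLongCloseLogic3", "", location.top)
Hovering over the bars where you expect the close should then show whether CurrentSwingTradeDuration ever actually exceeds MaximumSwingTradeDuration.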

Find the difference between 2 dates and check if smaller than a given value

My issue is that I want to be able to get two time stamps and check whether the second (later) one is less than 59 minutes away from the first one.
Following this thread Compare two dates with JavaScript,
the Date object may do the job.
But the first thing I am not happy with is that it takes the time from my system.
Is it possible to get the time from some public server or something?
Because there is always a chance that the system clock gets manipulated between the time stamps, that would be too unreliable.
Some outside source would be great.
Then I am not too sure how to get the difference between 2 times (using 2 Date objects).
Many issues may pop up:
the times being something like 3:59 and 6:12,
so just comparing minutes would give the wrong idea,
so we consider hours too.
But then there is the issue of the modulo 24:
day 3 23:59 and day 4 0:33 wouldn't be handled properly either,
so we include days too.
Then there is the modulo-30 thing, which on top of that changes from month to month,
so month and year have to be included as well.
So we would need the whole date, everything from the current year down to the second (seconds would be nice too, for precision),
and comparing them would require tons of if clauses for year, month, etc.
Do the Date objects have some predefined date comparison function that actually keeps all these things in mind (I haven't even mentioned leap years yet, have I)?
Time would be very important, because exactly at the 59-minute mark (+- max 5 seconds wouldn't matter, but getting remotely close to 60 is forbidden)
a certain function would have to be called that closes a website without fail.
The script opens a website at the 0-minute mark, does some stuff rinse-and-repeat style, and closes the page at the 59-minute mark.
Checking the time every few seconds would be smart.
Any good ideas how to implement such a time comparison that doesn't take too much computing power, yet is robust enough that things like a new month starting don't mess it up?
You can compare the two date-times directly; when creating one, there is a Date(value) constructor you can use.
You can use this API to get the current UTC time, which returns an example JSON response like this:
{
"$id":"1",
"currentDateTime":"2019-11-09T21:12Z",
"utcOffset":"00:00:00",
"isDayLightSavingsTime":false,
"dayOfTheWeek":"Saturday",
"timeZoneName":"UTC",
"currentFileTime":132178075626292927,
"ordinalDate":"2019-313",
"serviceResponse":null
}
So you can use either the currentFileTime or the currentDateTime returned from that API to construct your Date object.
Example:
const date1 = new Date('2019-11-09T21:12Z') // time when I started writing this answer
const date2 = new Date('2019-11-09T21:16Z') // time when I finished writing this answer
const diffMinutes = (date2 - date1) / 60000 // subtracting two Dates gives the difference in milliseconds
console.log(diffMinutes) // time it took me to write this, in minutes
Please keep in mind that due to network speeds, the time API will be a little bit off (by a few milliseconds)
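For the 59-minute requirement in the question, a minimal sketch along those lines (the endpoint URL is a placeholder for whatever time API you use; the field name matches the JSON shown above):
// Hypothetical helper: fetch the current time from an external time API
// instead of trusting the local system clock.
async function fetchServerTime(url) {
  const response = await fetch(url)
  const body = await response.json()
  return new Date(body.currentDateTime)
}
// True while the second timestamp is still less than 59 minutes after the first.
function withinLimit(start, now, limitMinutes = 59) {
  const diffMinutes = (now - start) / 60000
  return diffMinutes < limitMinutes
}
Polling withinLimit() every few seconds, as the question suggests, and closing the page as soon as it returns false avoids all the month, day and leap-year bookkeeping, because subtracting two Date objects always yields plain milliseconds.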

Hard Disk scheduling simulator algorithm (track to track timing) Perl

I am trying to get to grips with Perl. I am trying to write a few scripts as a disk scheduling simulator: FCFS, SSTF, SCAN and LOOK.
I have one array with a list of block requests and another to act as the buffer. First I will copy over the first request, then I need to work out the time it takes to get from the first to the second block.
The buffer reads in blocks at 1 per ms; seek, search and access time are all 1 ms to make the calculations a bit easier; the simulator always starts on block 1, track 1.
http://postimg.org/image/d9osb8tkj/
So if the first block is 5, the search time will be 3 ms to traverse to the start of the 5th block, the seek time will be zero as it is on the same track, and the access time to read the block will always be 1 ms. This means the time for this request will be 4 ms, so the simulator will read the next 4 requests into the buffer. In first come first served this is just the order in which the requests are served.
So if the next request to serve is 12, the arm is at the end of the 5th block, so it will take 2 ms to get to the right track, then 1 ms to get to the start of the 12th block, and another 1 ms to access it.
I was just wondering if anyone could give me some idea how I could express this as an algorithm. Just some pointers would be much appreciated.
Write a class HardDiskSim::Abstract with 3 subs: seek_time(), spin_time(), and read_time().
Write a subclass of HardDiskSim::Abstract for each different set of values/logic for the three methods.
For example:
package HardDiskSim::Simple;
use base qw(HardDiskSim::Abstract);
our $SECTORS_PER_TRACK = 5;
our $SEEK_TIME_PER_TRACK = 1;
# Reading a block always takes 1 ms.
sub read_time { return 1 }
sub seek_time {
    my ($block) = @_;
    # Blocks are 1-indexed, so block 5 is still on track 1.
    my $tracks_to_seek = int(($block - 1) / $SECTORS_PER_TRACK);
    return $tracks_to_seek * $SEEK_TIME_PER_TRACK;
}
sub spin_time {
    # compute head position at end of seek using seek time and RPM of disk
    # compute number of sectors to spin past using computed head position
    # return number_of_sectors_to_spin_past * time_per_sector
}
1;
I had the fun of writing this kind of code in Fortran, for a class, back in 1985.
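As a rough illustration of how such a class could drive the FCFS case, here is a minimal sketch (the request list is made up, the subs are called as plain functions because the example above does not unpack an object argument, and spin_time is skipped since it is left unimplemented):
use strict;
use warnings;
use HardDiskSim::Simple;
my @requests = (5, 12, 3, 20);   # hypothetical block requests
my $total_ms = 0;
# FCFS: serve the requests in arrival order and add up the per-request cost.
# (Note: the sketch's seek_time always measures from track 1, as above.)
for my $block (@requests) {
    my $cost = HardDiskSim::Simple::seek_time($block)
             + HardDiskSim::Simple::read_time();
    $total_ms += $cost;
    print "block $block served in ${cost}ms (running total ${total_ms}ms)\n";
}
SSTF, SCAN and LOOK would reuse the same cost functions and only change the order in which @requests is traversed.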

Spark Streaming: how to sum up all results for several DStreams?

I am now using Spark Streaming + Kafka to construct my message processing system. But I have a little technical problem, which I will describe below:
For example, I want to do a word count for each 10 minutes, so in my earliest code I set the batch interval to 10 minutes. The code is like below:
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(10))
But I don't think it is a very good solution, because 10 minutes is a long time and my memory cannot sustain so much data. So I want to reduce the batch interval to 1 minute, like:
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(1))
Then the problem comes: how can I sum up the results of ten '1 minute' batches into a 10-minute result? I think this work can only be done in the driver instead of the worker program, so what can I do?
I am a new learner of Spark Streaming. Can anyone give me a hand?
Maybe I have an idea: in this situation, I should use a stateful function like updateStateByKey(), because what I want is a global 10-minute result but what I can get is just the intermediate result of each 1-minute batch. So before each 10-minute period ends, I have to record the state of each 1-minute batch, such as its word count result, and add them up minute by minute.
Posting here as I had a similar issue and came across the Window Operations section of Spark Streaming. In the poster's original case, they want a count for the past 10 minutes, done every 10 minutes, although their program calculates counts every 1 minute. Assuming we have counts defined and calculated as the standard word count (i.e. at a 1-minute batch duration, with tuples (word, count)), we could follow the linked guide and define something along the lines of
// Reduce/count the last 10 minutes' worth of data, every 10 minutes
val windowedWordCounts = counts.reduceByKeyAndWindow(_ + _, Minutes(10), Minutes(10))
where _ + _ is a sum function.
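For completeness, a minimal sketch of the whole pipeline under those assumptions (1-minute batches; a socket source stands in for the Kafka input, and the app name/master arguments follow the question's style):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
    // Small 1-minute batches, so each micro-batch fits in memory.
    val ssc = new StreamingContext(sparkConf, Minutes(1))
    // Stand-in source; the original question reads from Kafka instead.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    // Sum the ten 1-minute results into one 10-minute result, emitted every 10 minutes.
    val windowedWordCounts = counts.reduceByKeyAndWindow(_ + _, Minutes(10), Minutes(10))
    windowedWordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
Because the window is handled by reduceByKeyAndWindow on the workers, nothing has to be accumulated manually in the driver.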

Limitation in retrieving rows from MongoDB from Ruby code

I have code which gets all the records from a MongoDB collection and then performs some computations.
My program takes too much time, as the "coll_id.find().each do |eachitem|......." loop returns only 300 records at a time.
If I place a counter inside the loop and check, it prints 300 records and then sleeps for around 3 to 4 seconds before printing the counter values for the next set of 300 records.
coll_id.find().each do |eachcollectionitem|
  puts "counter value for record " + counter.to_s
  counter = counter + 1
  # ---- My computations here ----
end
Is this a limitation of the Ruby MongoDB API, or does some configuration need to be done so that the code can get access to all the records at once?
How large are your documents? It's possible that the deserialization is taking a long time. Are you using the C extensions (bson_ext)?
You might want to try passing a logger when you connect. That could help sort out what's going on. Alternatively, can you paste in the MongoDB log? What's happening there during the pause?
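A minimal sketch of the logger suggestion, using the current mongo gem's API (the original answer predates it, and the host, database and collection names are placeholders):
require 'mongo'
require 'logger'
# A DEBUG-level logger makes the driver's activity visible while the loop pauses.
logger = Logger.new($stdout)
logger.level = Logger::DEBUG
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb', logger: logger)
coll_id = client[:mycollection]
counter = 0
coll_id.find.each do |eachcollectionitem|
  counter += 1
end
If the pauses line up with the driver fetching the next batch of results from the server, the log output around those moments should show it.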
