Understanding rounds in Chainlink oracles

I want to use Chainlink to get the price of an asset at a given point in time in the past (which I will refer to as "expiry") in order to settle options.
To get a historical price, Chainlink requires you to pass a roundId as an argument to the getRoundData function. To verify that the round implied by the given roundId includes the expiry time, my initial thought was to check inside the smart contract whether startedAt <= expiry && timestamp >= expiry for the received roundData.
In order to assess whether this is a feasible approach, I would like to better understand the concept of rounds in Chainlink:
Can I assume that rounds always span adjacent time intervals, i.e. if a round starts at Unix time t1 and ends at t2, can I assume that the next round starts at t2?
getRoundData(roundId) returns startedAt and timestamp. Does the timestamp represent the end of the given round?
What exactly is the answeredInRound that I receive as output from getRoundData?
Any help is highly appreciated.

There are currently two "trigger" parameters that kick off Chainlink nodes to update. If the real-world price of an asset deviates beyond a set threshold, it will trigger all the nodes to do an update. Right now, most Ethereum data feeds have a 0.5% deviation threshold. If the price stays within the deviation threshold, an update is only triggered every X minutes/hours (the heartbeat). You can see these parameters on data.chain.link.
timestamp (updatedAt) - the timestamp of when the round was updated.
answeredInRound is a legacy variable from when the Chainlink price feeds were on the Flux Aggregator model instead of OCR (Off-Chain Reporting), and it was possible for a price to be updated slowly and leak into the next "round". This is no longer the case.
More info is available at Data Feeds API Reference.
Additionally, since you want to use Chainlink to get the price of an asset at a given point in time in the past, there is an open-source community project that solves the same issue.

Related

Spring Boot - observability on *_max *_count *_sum metrics

A small question regarding Spring Boot, some of the useful default metrics, and how to properly use them in Grafana.
Currently, with Spring Boot 2.5.1+ (the question applies to 2.x.x) and the Actuator + Micrometer + Prometheus dependencies, there are lots of very handy default metrics that come out of the box.
I am seeing many of them following the pattern _max, _count, _sum.
For example, to take just a few:
spring_data_repository_invocations_seconds_max
spring_data_repository_invocations_seconds_count
spring_data_repository_invocations_seconds_sum
reactor_netty_http_client_data_received_bytes_max
reactor_netty_http_client_data_received_bytes_count
reactor_netty_http_client_data_received_bytes_sum
http_server_requests_seconds_max
http_server_requests_seconds_count
http_server_requests_seconds_sum
Unfortunately, I am not sure what to do with them or how to use them correctly, and I feel like my ignorance makes me miss out on some great application insights.
Searching the web, I see some people using queries like this to compute what seems to be an average in Grafana:
irate(http_server_requests_seconds_sum{exception="None", uri!~".*actuator.*"}[5m]) / irate(http_server_requests_seconds_count{exception="None", uri!~".*actuator.*"}[5m])
But I am not sure if this is the correct way to use them.
May I ask what sort of queries are possible and usually used when dealing with metrics of the type _max, _count, _sum?
Thank you
UPD 2022/11: Recently I've had a chance to work with these metrics myself and I made a dashboard with everything I say in this answer and more. It's available on Github or Grafana.com. I hope this will be a good example of how you can use these metrics.
Original answer:
count and sum are generally used to calculate an average. count accumulates the number of times sum was increased, while sum holds the total value of something. Let's take http_server_requests_seconds for example:
http_server_requests_seconds_sum 10
http_server_requests_seconds_count 5
With the example above one can say that there were 5 HTTP requests and their combined duration was 10 seconds. If you divide sum by count you'll get the average request duration of 2 seconds.
Having these you can create at least two useful panels: average request duration (=average latency) and request rate.
Request rate
Using the rate() or irate() function you can get how many requests per second there were:
rate(http_server_requests_seconds_count[5m])
rate() works in the following way:
Prometheus takes samples from the given interval ([5m] in this example) and calculates the difference between the value at the current timepoint (not necessarily now) and the value [5m] ago.
The obtained value is then divided by the number of seconds in the interval.
A short interval will make the graph look like a saw (every fluctuation will be noticeable); a long interval will make the line smoother and slower to reflect changes.
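As a made-up worked example: if the counter sample was 600 at the start of the 5-minute window and 660 at the current timepoint, rate() returns (660 - 600) / 300 = 0.2 requests per second.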
Average Request Duration
You can proceed with
http_server_requests_seconds_sum / http_server_requests_seconds_count
but it is highly likely that you will only see a straight line on the graph. This is because the values of those metrics grow too big over time, and a really drastic change must occur for this query to show any difference. Because of this, it is better to calculate the average over interval samples of the data. Using the increase() function you can get an approximate value of how much the metric changed during the interval. Thus:
increase(http_server_requests_seconds_sum[5m]) / increase(http_server_requests_seconds_count[5m])
The value is approximate because under the hood increase() is just rate() multiplied by [interval]. The error is insignificant for fast-moving counters (such as the request rate); just be prepared to see non-integer values such as an increase of 2.5 requests.
Aggregation and filtering
If you have already run one of the queries above, you will have noticed that there is not one line but many. This is due to labels: each unique set of labels on the metric is considered a separate time series. This can be fixed by using an aggregation function (like sum()). For example, you can aggregate the request rate by instance:
sum by(instance) (rate(http_server_requests_seconds_count[5m]))
This will show you a line for each unique instance label. Now if you want to see only some instances and not all of them, you can do that with a filter. For example, to calculate the value just for the nodeA instance:
sum by(instance) (rate(http_server_requests_seconds_count{instance="nodeA"}[5m]))
Read more about selectors in the Prometheus documentation. With labels you can create any number of useful panels: perhaps you'd like to calculate the percentage of exceptions, or their rate of occurrence, or perhaps a request rate by status code, you name it.
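For instance, the share of requests that ended in an exception could be sketched roughly like this (this assumes the default exception label shown in the query earlier; multiply by 100 if you want an actual percentage rather than a ratio):
sum(rate(http_server_requests_seconds_count{exception!="None"}[5m])) / sum(rate(http_server_requests_seconds_count[5m]))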
Note on max
From what I found on the web, max shows the maximum value recorded during some interval set in the settings (the default is 2 minutes, if the source is to be trusted). This is a somewhat uncommon metric, and whether it is useful is up to you. Since it is a gauge (unlike sum and count it can go both up and down), you don't need extra functions (such as rate()) to see its dynamics. Thus
http_server_requests_seconds_max
... will show you the maximum request duration. You can augment this with aggregation functions (avg(), sum(), etc) and label filters to make it more useful.
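As a rough sketch of such an augmentation (the uri label and the actuator filter mirror the query from the question; adjust to whatever labels your setup exposes), the slowest observed request per endpoint could be plotted with:
max by(uri) (http_server_requests_seconds_max{uri!~".*actuator.*"})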

Data Structure for time scheduling?

I am in need of a data structure that can properly model blocks of time, like appointments. For example, each appointment has a time it starts on, and a time it ends on. I need to have extremely fast access to things like:
Does a specified start time and end time conflict with an existing event?
What events exist from a specified start time and end time?
Ideally the data structure could model something like the image below.
I thought of using a binary search tree (e.g. Java's TreeMap) but I can't think of what key or value I would use. Is there a single data structure or combination of data structures that is strong at modeling this?
A Guava Table would probably work for your use case, depending on what it is you want to actually index on.
A naive approach would be to index by name, then time of day, and then have a value indicating whether or not this particular block is occupied by that particular person.
This would make the instantiation of the object become...
Table<String, LocalDateTime, Boolean> calendar = TreeBasedTable.create();
You would populate each individual's allocation at a given interval. You get to set what that interval is, whether it's broken into 15-minute, 30-minute or 1-hour periods (as defined by the table).
To find out whether a time is occupied, you look for the closest interval to the time at which you want to schedule a person. You'd use the column() method to see if there's any availability, or you could get specific and fetch the row for the individual. This means you'd have to pull two values: the start time you want, and however many minutes your interval extends. That part I'll leave as an exercise for the reader.
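That said, a rough illustration of the populate-and-check flow might look something like the sketch below (the 30-minute slot size, the class and method names, and the assumption that bookings are aligned to slot boundaries are all mine, not part of the question):
import com.google.common.collect.Table;
import com.google.common.collect.TreeBasedTable;
import java.time.LocalDateTime;

public class CalendarSketch {

    // Rows are people, columns are slot start times, values mark occupied slots.
    private final Table<String, LocalDateTime, Boolean> calendar = TreeBasedTable.create();

    private static final int SLOT_MINUTES = 30; // chosen slot granularity (assumption)

    // Marks every slot in [start, end) as occupied for a person.
    public void book(String person, LocalDateTime start, LocalDateTime end) {
        for (LocalDateTime slot = start; slot.isBefore(end); slot = slot.plusMinutes(SLOT_MINUTES)) {
            calendar.put(person, slot, Boolean.TRUE);
        }
    }

    // Returns true if any slot in [start, end) is already occupied for that person.
    public boolean conflicts(String person, LocalDateTime start, LocalDateTime end) {
        for (LocalDateTime slot = start; slot.isBefore(end); slot = slot.plusMinutes(SLOT_MINUTES)) {
            if (Boolean.TRUE.equals(calendar.get(person, slot))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CalendarSketch sketch = new CalendarSketch();
        LocalDateTime nine = LocalDateTime.of(2024, 1, 15, 9, 0);
        sketch.book("Alice", nine, nine.plusHours(1));
        System.out.println(sketch.conflicts("Alice", nine.plusMinutes(30), nine.plusHours(2))); // true
        System.out.println(sketch.conflicts("Bob", nine, nine.plusHours(1)));                   // false
    }
}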

Best practices for overall rating calculation

I have a LAMP-based business application, SugarCRM to be more precise. There are 120+ active users at the moment. Every day each user generates some records that are used in a complex calculation to get a so-called "individual rating".
It takes about 6 seconds to calculate one "individual rating" value. That was not a big problem before: each user hits the link provided to start the "individual rating" calculation, waits 6-7 seconds, and gets the value displayed.
But now I need to implement “overall rating” calculation. That means that additionally to “individual rating” I have to calculate and display to the user:
minimum individual rating among ALL the users of the application
maximum individual rating among ALL the users of the application
the current user's position in the ranking of all individual ratings.
Say, current user has individual rating equal to 220 points, minimum value of rating is 80, maximum is 235 and he is on 23rd position among all the users.
What are (imho) the main problems to be solved?
If one calculation takes 6 seconds, the overall calculation will take more than 10 minutes. I think it's no good to make the application almost inaccessible for that period. And what if the number of users grows 2-3 times in the near future?
Those calculations could be done as a nightly job, but the users are in different timezones. In Russia the difference between the extreme timezones is 9 hours, so people in the western part of Russia are still working in "today" while people in the eastern part are waking up to work in "tomorrow". So what is the best time for a nightly job in this case?
Are there any best practices|approaches|algorithms to build such a rating system?
Given only the information provided, the only options I see:
The obvious one - reduce the time taken for a rating calculation (6 seconds to calculate 1 user's rating seems like a lot)
If possible, have intermediate values of which you only recalculate some, as required (for example, have 10 values that make up the rating, all based on different data; when some of the data changes, flag the appropriate values for recalculation). Either do this recalculation:
During your daily recalculation or
When the update happens
Partial batch calculation - only recalculate x of the users' ratings at chosen intervals (where x is some chosen value) - has the disadvantage that, at all times, some of the ratings can be out of date
Calculate if not busy - either continuously recalculate ratings or only do so at a chosen interval, but instead of locking the system, have it run as a background process, only doing work if the system is idle
(Sorry, I didn't manage to fit this into a "long" comment, so I decided to post it as an answer.)
@Dukeling
The SQL query that takes almost all of the time for the calculation mentioned above is just a replication of business logic that would otherwise be executed in PHP code. The logic was moved into SQL in the hope of reducing calculation time. OK, I'll try both optimizing the SQL query and playing with executing the logic in PHP code.
Suppose that after optimization the application calculates an individual rating in just 1 second. Great! But even in this case the first user to log into the system would have to wait 120 seconds (120+ users * 1 sec = 120 sec) for the overall rating to be calculated and to get their position in it.
I’m thinking of implementing the following approach:
Let’s have 2 “overall ratings” – “today” and “yesterday”.
For display purposes we'll use the "yesterday" overall rating, represented as a huge, already sorted PHP array.
When a user hits the calculation link, they start the "today" calculation, but the application displays the "yesterday" value to them. Thus we have a quickly accessible "yesterday" rating, and each user randomly launches the rating calculation that will be displayed to them tomorrow.
The user list is partitioned by timezone. Each hour a cron job starts and checks whether there are any users in the selected timezone who don't yet have a "today" individual rating calculated (e.g. the user didn't log into the application). If so, the application calculates the individual rating and puts its value into the "today" (still invisible) overall rating array. Thus we have a cron job that runs nightly for each timezone-specific user group and fills the probable gaps in case users didn't log into the system.
After all users in all timezones have been processed, the application
sorts the "today" array,
drops the "yesterday" one,
renames "today" to "yesterday", and
initializes a new "today".
What do you think of it? Is it reasonable enough or not?
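To make the idea a bit more concrete, here is a minimal double-buffer sketch, written in Java purely for illustration even though the application itself is PHP; the class, field and method names are all made up:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: "today" is filled in the background, "yesterday" is the sorted snapshot users see.
public class OverallRating {

    private final Map<String, Double> today = new ConcurrentHashMap<>(); // being filled during the day
    private volatile List<Double> yesterday = Collections.emptyList();   // sorted ascending, shown to users

    // Called whenever a user's "today" individual rating has been calculated (by a click or by cron).
    public void recordTodayRating(String userId, double rating) {
        today.put(userId, rating);
    }

    public double minRating() { return yesterday.isEmpty() ? 0 : yesterday.get(0); }

    public double maxRating() { return yesterday.isEmpty() ? 0 : yesterday.get(yesterday.size() - 1); }

    // 1-based position among yesterday's ratings, where 1 means the highest rating.
    public int positionOf(double rating) {
        List<Double> snapshot = yesterday;
        int higher = 0;
        for (double r : snapshot) {
            if (r > rating) {
                higher++;
            }
        }
        return higher + 1;
    }

    // Run once per day, after the last timezone has been processed.
    public void rollover() {
        List<Double> snapshot = new ArrayList<>(today.values());
        Collections.sort(snapshot);
        yesterday = snapshot; // "today" becomes "yesterday"
        today.clear();        // start a fresh "today"
    }
}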

Several sensors - noise filtering algorithm needed

My software receives information from several sensors. The number of sensors is not fixed - they can be added and removed, and each sensor has its own unique identifier. Sensors send data irregularly - they can keep silent for weeks or push data every second. Each sensor generates a value from a fixed set of values, so the sensors are discrete. My program logs each message from each sensor into an SQL database table (sensorId, time, value).
The task is to filter the information. I need to select only one record from this log, which I consider to be the actual information. For example, if I get the latest record from a single sensor, which says that the value is A, but before it 10 different sensors told me that the value is B, then I shall still consider B to be the actual information. At the same time the problem is not just the usual noise filtering, because if there was one sensor that told me every second for a month that the value was C, and then five sensors recently report that the value is in fact D, I shall immediately consider D to be the actual data despite the long history; that is, the number of independent sources must also carry weight.
So, I think I end up with a kind of function of two variables: time (ageing) and the number of unique sensors at a given moment. I must somehow calculate the weight of each record and then just select the one with the biggest weight. And I suppose that to calculate a record's weight I should use not only the information from the current record but also information from all the previous ones.
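Just to make the weighting idea concrete, here is a rough sketch of one possible interpretation (the exponential decay, the half-life constant and the "one vote per sensor, latest reading only" rule are my assumptions, not a worked-out solution):
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Purely illustrative: each sensor votes with its latest reading, and that vote decays with age.
public class ActualValueEstimator {

    record Reading(String sensorId, Instant time, String value) {}

    private static final double HALF_LIFE_SECONDS = 3600; // decay constant is an assumption

    static String estimate(List<Reading> log, Instant now) {
        // Keep only the most recent reading per sensor.
        Map<String, Reading> latestPerSensor = new HashMap<>();
        for (Reading r : log) {
            latestPerSensor.merge(r.sensorId(), r,
                    (existing, incoming) -> existing.time().isAfter(incoming.time()) ? existing : incoming);
        }
        // Each sensor votes for its value with a weight that halves every HALF_LIFE_SECONDS of age.
        Map<String, Double> weightPerValue = new HashMap<>();
        for (Reading r : latestPerSensor.values()) {
            double ageSeconds = Duration.between(r.time(), now).getSeconds();
            double weight = Math.pow(0.5, ageSeconds / HALF_LIFE_SECONDS);
            weightPerValue.merge(r.value(), weight, Double::sum);
        }
        // The value with the largest combined weight is taken as the "actual" one.
        return weightPerValue.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}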
I need some help with the algorithm. Maybe there is actually some well-known solution which I'm not aware of?

How do I model "relative" time in a database?

Clarification: I am not trying to calculate friendly times (e.g., "8 seconds ago") by using a timestamp and the current time.
I need to create a timeline of events in my data model, but where these events are only relative to each other. For example, I have events A, B, and C. They happen in order, so it may be that B occurs 20 seconds after A, and that C occurs 20 years after B.
I don't care about the unit of time. For my purpose, there is no time, just relativity.
I intend to model this like a linked list, where each event is a node:
Event
id
name
prev_event
next_event
Is this the most efficient way to model relative events?
All time recorded by computers is relative time; nominally it is stored relative to an epoch, as an offset in milliseconds. Normally this epoch is 1970/01/01, as is the case with Unix.
If you store normal, everyday timestamp values, you already have the relative time between events: you just need to subtract them to get intervals, which are what you are calling relative times, but they are actually intervals.
You can use whatever resolution you need. Milliseconds are what most things use; if you are sampling things at sub-millisecond resolution, you would use nanoseconds.
I don't think you need to link to the previous and next events; why not just use a timestamp and order by it?
If you can have multiple, simultaneous event timelines, then you would use some kind of identifier to identify the timeline (int, guid, whatever) and key that in with the timestamp. No id is even necessary unless you need to refer to an event by a single number.
Something like this:
Event
TimeLineID (key)
datetime (key)
Name
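A tiny Java sketch of the same idea for a single timeline (the types and names are mine, just for illustration): events are keyed by timestamp, and the "relative" part falls out by subtraction.
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: events ordered by timestamp; intervals are obtained by subtracting neighbours.
public class Timeline {

    private final TreeMap<Instant, String> events = new TreeMap<>();

    public void add(Instant when, String name) {
        events.put(when, name);
    }

    // Prints each event together with the interval since the previous one.
    public void printIntervals() {
        Instant previous = null;
        for (Map.Entry<Instant, String> e : events.entrySet()) {
            Duration sincePrevious = previous == null ? Duration.ZERO : Duration.between(previous, e.getKey());
            System.out.println(e.getValue() + " (+" + sincePrevious.getSeconds() + "s)");
            previous = e.getKey();
        }
    }
}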
