Cumulative Return For the following data - algorithm

I was reading a blog on how to calculate the cumulative return of a stock for each day.
The formula described in the blog for the cumulative return was
(1 + TodayReturn) * (1 + Cumulative_Return_Of_Previous_Day) - 1, but I am still not able to reproduce the cumulative return values it provided.
Can someone please explain how the cumulative return has been calculated in the table given below? That would be a lot of help.
Thanks in advance.
| Days  | Stock Price | Return | Cumulative Return |
|-------|-------------|--------|-------------------|
| Day 1 | 150         |        |                   |
| Day 2 | 153         | 2.00 % | 2.00 %            |
| Day 3 | 160         | 4.58 % | 6.67 %            |
| Day 4 | 163         | 1.88 % | 8.67 %            |
| Day 5 | 165         | 1.23 % | 10.00 %           |
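For what it's worth, here is a minimal Python sketch (my own illustration, not from the blog) showing that the formula above does reproduce the table: each day's return is price_today / price_yesterday - 1, and compounding those daily returns with the formula gives the same result as price_today / first_price - 1.
# Hypothetical illustration of the compounding formula from the question.
prices = [150, 153, 160, 163, 165]

cumulative = 0.0
for i in range(1, len(prices)):
    daily_return = prices[i] / prices[i - 1] - 1             # e.g. 153/150 - 1 = 2.00 %
    cumulative = (1 + daily_return) * (1 + cumulative) - 1   # the blog's formula
    print(f"Day {i + 1}: return {daily_return:.2%}, cumulative {cumulative:.2%}")

# The cumulative column matches the table (2.00 %, 6.67 %, 8.67 %, 10.00 %);
# the final value also equals the total return 165/150 - 1 = 10 %.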

Related

filter and parse unstructured log with logstash

I'm pretty new to Logstash and I have a problem.
I want to filter a line from a log that is different from the others and then do some manipulation with grok.
This is the log file that I have:
Date: 3/1/2021 -- 05:08:14 (uptime: 2d, 22h 36m 18s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 433066
capture.kernel_drops | Total | 18183
decoder.pkts | Total | 414883
decoder.bytes | Total | 453509832
decoder.ipv4 | Total | 413834
decoder.ipv6 | Total | 778
decoder.ethernet | Total | 414883
decoder.tcp | Total | 409208
decoder.udp | Total | 5266
decoder.icmpv6 | Total | 56
decoder.avg_pkt_size | Total | 1093
decoder.max_pkt_size | Total | 1514
flow.tcp | Total | 1273
flow.udp | Total | 1336
flow.icmpv6 | Total | 26
flow.wrk.spare_sync_avg | Total | 100
flow.wrk.spare_sync | Total | 18
decoder.event.ipv4.opt_pad_required | Total | 82
decoder.event.ipv6.zero_len_padn | Total | 24
flow.wrk.flows_evicted_needs_work | Total | 535
flow.wrk.flows_evicted_pkt_inject | Total | 579
flow.wrk.flows_evicted | Total | 406
flow.wrk.flows_injected | Total | 535
tcp.sessions | Total | 667
tcp.syn | Total | 669
tcp.synack | Total | 668
tcp.rst | Total | 407
tcp.stream_depth_reached | Total | 14
tcp.reassembly_gap | Total | 8
tcp.overlap | Total | 27
detect.alert | Total | 1106
app_layer.flow.http | Total | 41
app_layer.tx.http | Total | 126
app_layer.flow.tls | Total | 611
app_layer.flow.ntp | Total | 15
app_layer.tx.ntp | Total | 15
app_layer.flow.dhcp | Total | 4
app_layer.tx.dhcp | Total | 6
app_layer.flow.dns_udp | Total | 964
app_layer.tx.dns_udp | Total | 1934
app_layer.flow.failed_udp | Total | 353
flow.mgr.full_hash_pass | Total | 35
flow.spare | Total | 9856
flow.mgr.rows_maxlen | Total | 2
flow.mgr.flows_checked | Total | 3998
flow.mgr.flows_notimeout | Total | 1808
It keeps repeating, starting from the date line. I need only the date string and nothing more, and then I want to do some manipulation on the data in order to send it as JSON. Is there a way to do so?
First, you must manage the multiline input. I assume you have a log file as input (you can change this as you want; the approach is the same).
input {
  file {
    path => ["inputlogs/*"]
    codec => multiline {
      pattern => "^Date: %{DATE:date} -- %{TIME:time}"
      negate => true
      what => previous
    }
  }
}
After this, you must filter each line (I guess the csv filter would be more useful in this case, but we can handle it with grok too, as you asked):
grok {
  match => ["message", "^%{NOTSPACE:fieldname}\s*\| %{WORD:field}\s*\| %{INT:value}"]
}
All lines that don't match this pattern (the header and the dashed separator lines) are flagged with _grokparsefailure.

Manually calculating time complexity of recursive Fibonacci algorithm

I am trying to understand the time complexity of the recursive Fibonacci algorithm.
fib(n)
    if (n < 2)
        return n
    return fib(n-1) + fib(n-2)
Having not much mathematical background, I tried computing it by hand. That is, I manually counted the number of steps as n increases, ignoring everything that I think takes constant time. Here is how I did it. Say I want to compute fib(5).
n = 0 - just a comparison on an if statement. This is constant.
n = 1 - just a comparison on an if statement. This is constant.
n = 2 - ignoring anything else, this should be 2 steps, fib(1) takes 1 step and fib(0) takes 1 step.
n = 3 - 3 steps now, fib(2) takes two steps and fib(1) takes 1 step.
n = 4 - 5 steps now, fib(3) takes 3 steps and fib(2) takes 2 steps.
n = 5 - 8 steps now, fib(4) takes 5 steps and fib(3) takes 3 steps.
Judging from these, I believe the running time might be fib(n+1). I am not so sure the +1 is just a constant factor, because the difference between fib(n) and fib(n+1) can be very large.
I've read the following on SICP:
In general, the number of steps required by a tree-recursive process
will be proportional to the number of nodes in the tree, while the
space required will be proportional to the maximum depth of the tree.
In this case, I believe the number of nodes in the tree is fib(n+1). So I am confident I am correct. However, this video confuses me:
So this is a thing whose time complexity is order of actually, it
turns out to be Fibonacci of n. There's a thing that grows exactly as
Fibonacci numbers. 
...
That every one of these nodes in this tree has to be examined.
I am absolutely shocked. I've examined all the nodes in the tree, and there are always fib(n+1) of them, and thus that many steps when computing fib(n). I can't figure out why some people say it takes fib(n) steps and not fib(n+1).
What am I doing wrong?
In your program, you have these time-consuming actions (sorted by time used per action, quickest actions at the top of the list):
Addition
IF (conditional jump)
Return from subroutine
Function call
Let's look at how many of these actions are executed, and let's compare that with n and fib(n):
 n | fib | #ADD | #IF | #RET | #CALL
---+-----+------+-----+------+-------
 0 |   0 |    0 |   1 |    1 |     0
 1 |   1 |    0 |   1 |    1 |     0
For n≥2 you can calculate the numbers this way:
fib(n) = fib(n-1) + fib(n-2)
ADD(n) = 1 + ADD(n-1) + ADD(n-2)
IF(n) = 1 + IF(n-1) + IF(n-2)
RET(n) = 1 + RET(n-1) + RET(n-2)
CALL(n) = 2 + CALL(n-1) + CALL(n-2)
Why?
ADD: One addition is executed directly in the top instance of the program, but both of the subroutines that you call also contain additions that need to be executed.
IF and RET: Same argument as before.
CALL: Also the same, but you execute two calls in the top instance.
So, this is your list for other values of n:
 n |    fib |   #ADD |    #IF |   #RET |  #CALL
---+--------+--------+--------+--------+--------
 0 |      0 |      0 |      1 |      1 |      0
 1 |      1 |      0 |      1 |      1 |      0
 2 |      1 |      1 |      3 |      3 |      2
 3 |      2 |      2 |      5 |      5 |      4
 4 |      3 |      4 |      9 |      9 |      8
 5 |      5 |      7 |     15 |     15 |     14
 6 |      8 |     12 |     25 |     25 |     24
 7 |     13 |     20 |     41 |     41 |     40
 8 |     21 |     33 |     67 |     67 |     66
 9 |     34 |     54 |    109 |    109 |    108
10 |     55 |     88 |    177 |    177 |    176
11 |     89 |    143 |    287 |    287 |    286
12 |    144 |    232 |    465 |    465 |    464
13 |    233 |    376 |    753 |    753 |    752
14 |    377 |    609 |   1219 |   1219 |   1218
15 |    610 |    986 |   1973 |   1973 |   1972
16 |    987 |   1596 |   3193 |   3193 |   3192
17 |   1597 |   2583 |   5167 |   5167 |   5166
18 |   2584 |   4180 |   8361 |   8361 |   8360
19 |   4181 |   6764 |  13529 |  13529 |  13528
20 |   6765 |  10945 |  21891 |  21891 |  21890
21 |  10946 |  17710 |  35421 |  35421 |  35420
22 |  17711 |  28656 |  57313 |  57313 |  57312
23 |  28657 |  46367 |  92735 |  92735 |  92734
24 |  46368 |  75024 | 150049 | 150049 | 150048
25 |  75025 | 121392 | 242785 | 242785 | 242784
26 | 121393 | 196417 | 392835 | 392835 | 392834
27 | 196418 | 317810 | 635621 | 635621 | 635620
You can see that the number of additions is exactly half the number of function calls (well, you could have read this directly out of the code too). And if you count the initial program call as the very first function call, then you have exactly the same number of IFs, returns and calls.
So you can combine 1 ADD, 2 IFs, 2 RETs and 2 CALLs into one super-action that needs a constant amount of time.
You can also read from the list that the number of additions is 1 less than fib(n+1) (a difference that can be ignored).
So, the running time is of order fib(n+1).
The ratio fib(n+1) / fib(n) gets closer and closer to Φ as n grows. Φ is the golden ratio, approximately 1.6180339887, which is a constant, and constant factors are ignored in orders. So the order O(fib(n+1)) is exactly the same as O(fib(n)).
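If you want to check the table above empirically, here is a small Python sketch (my own instrumentation, not part of the original answer) that counts the four kinds of actions while running the recursive algorithm:
# Hypothetical instrumentation of the recursive Fibonacci algorithm from the question.
from collections import Counter

counts = Counter()

def fib(n):
    counts["IF"] += 1              # the (n < 2) comparison
    if n < 2:
        counts["RET"] += 1         # return n
        return n
    counts["CALL"] += 2            # the two recursive calls below
    left, right = fib(n - 1), fib(n - 2)
    counts["ADD"] += 1             # the single addition
    counts["RET"] += 1             # return of the sum
    return left + right

for n in (5, 10, 20):
    counts.clear()
    fib(n)
    print(n, dict(counts))         # matches the #ADD/#IF/#RET/#CALL columns above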
Now let's look at the space:
It is true that the maximum space needed to process a tree is equal to the maximum distance between the root of the tree and the most distant leaf, because you call fib(n-2) only after fib(n-1) has returned.
So the space needed by your program is of order n.

Is it possible to do a 'normalized' dense_rank() in hive?

I have a consumer table like so.
consumer | product | quantity
-------- | ------- | --------
a        | x       | 3
a        | y       | 4
a        | z       | 1
b        | x       | 3
b        | y       | 5
c        | x       | 4
What I want is a 'normalized' rank assigned to each consumer so that I can split the table easily for testing and training. I used dense_rank() in Hive, and I got the table below.
rank | consumer | product | quantity
---- | -------- | ------- | --------
1    | a        | x       | 3
1    | a        | y       | 4
1    | a        | z       | 1
2    | b        | x       | 3
2    | b        | y       | 5
3    | c        | x       | 4
This is all well and good, but I want to scale this to any number of consumers, so I would ideally like the ranks to lie between 0 and 1, like so.
rank | consumer | product | quantity
---- | -------- | ------- | --------
0.33 | a        | x       | 3
0.33 | a        | y       | 4
0.33 | a        | z       | 1
0.67 | b        | x       | 3
0.67 | b        | y       | 5
1    | c        | x       | 4
This way, I'd always know what the range of ranks is, and I can split the data in a standard way (rank <= 0.7 for training, and rank > 0.7 for testing).
Is there a way to achieve this in Hive?
Or is there a different and better approach to my original issue of splitting the data?
I tried to do a select * where rank < 0.7*max(rank), but Hive says the MAX UDAF is not yet available in the where clause.
percent_rank
select percent_rank() over (order by consumer) as pr
,*
from mytable
;
+-----+----------+---------+----------+
| pr  | consumer | product | quantity |
+-----+----------+---------+----------+
| 0.0 | a        | z       | 1        |
| 0.0 | a        | y       | 4        |
| 0.0 | a        | x       | 3        |
| 0.6 | b        | y       | 5        |
| 0.6 | b        | x       | 3        |
| 1.0 | c        | x       | 4        |
+-----+----------+---------+----------+
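In case it helps to see where these values come from, percent_rank() is (rank - 1) / (total rows - 1), where rank counts the rows strictly before the current consumer in the ordering. A rough Python sketch of that definition (my own illustration, not Hive code):
# Hypothetical illustration of how percent_rank() over (order by consumer) is computed.
rows = [("a", "x", 3), ("a", "y", 4), ("a", "z", 1),
        ("b", "x", 3), ("b", "y", 5), ("c", "x", 4)]

total = len(rows)
for consumer, product, quantity in sorted(rows, key=lambda r: r[0]):
    rank = 1 + sum(1 for r in rows if r[0] < consumer)   # rank() semantics: ties share a rank
    pr = (rank - 1) / (total - 1)                         # percent_rank definition
    print(round(pr, 2), consumer, product, quantity)
# prints 0.0 for 'a' (rank 1), 0.6 for 'b' (rank 4), 1.0 for 'c' (rank 6)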
For filtering you'll need a sub-query / CTE
select *
from (select percent_rank() over (order by consumer) as pr
            ,*
      from mytable
     ) t
where pr <= ...
;

How to implement cumulative sum for specific group in informatica

I have the below data set
ESN | DATE   | SV_NO
123 | 22-NOV | 2
123 | 23-NOV | 2
123 | 25-NOV | 3
123 | 27-NOV | 2
123 | 27-NOV | 3
123 | 28-NOV | 4
123 | 28-NOV | 2
124 | 21-NOV | 0
124 | 23-NOV | 3
124 | 24-NOV | 3
124 | 25-NOV | 2
124 | 27-NOV | 2
124 | 28-NOV | 3
124 | 30-NOV | 0
and I want to achieve the below output using Informatica. All data is sorted based on ESN and DATE. I have to calculate the SUM on the basis of ESN and SV_NO; the 0.11 value is stored in one variable port.
ESN | DATE   | SV_NO | SUM
123 | 22-NOV | 2     | 0.11
123 | 23-NOV | 2     | 0.22
123 | 25-NOV | 3     | 0.11
123 | 27-NOV | 2     | 0.33
123 | 27-NOV | 3     | 0.22
123 | 28-NOV | 4     | 0.11
123 | 28-NOV | 2     | 0.44
124 | 21-NOV | 0     | 0.11
124 | 23-NOV | 3     | 0.11
124 | 24-NOV | 3     | 0.22
124 | 25-NOV | 2     | 0.11
124 | 27-NOV | 2     | 0.22
124 | 28-NOV | 3     | 0.33
124 | 30-NOV | 0     | 0.22
Please provide me the proper solution for this.
First sort the data by ESN and SV_NO. Then in an Expression transformation, do the following:
ESN:           <-- I/O port
DATE:          <-- I/O port
SV_NO:         <-- I/O port
v_CONST      := 0.11
v_SUM        := IIF(ESN = v_prev_ESN AND SV_NO = v_prev_SV_NO, v_SUM + v_CONST, v_CONST)
o_SUM        := v_SUM   <-- Output port
v_prev_ESN   := ESN
v_prev_SV_NO := SV_NO
Now, sort the data again by ESN and DATE before loading the target.
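If it helps to see the logic outside Informatica, here is a rough Python sketch of the same idea (my own illustration): 0.11 times the running occurrence count of each (ESN, SV_NO) pair, processed in ESN/DATE order. The dictionary takes the place of the extra sort and the v_prev_* ports.
# Hypothetical illustration of the running sum per (ESN, SV_NO) group.
rows = [
    ("123", "22-NOV", 2), ("123", "23-NOV", 2), ("123", "25-NOV", 3),
    ("123", "27-NOV", 2), ("123", "27-NOV", 3), ("123", "28-NOV", 4),
    ("123", "28-NOV", 2), ("124", "21-NOV", 0), ("124", "23-NOV", 3),
    ("124", "24-NOV", 3), ("124", "25-NOV", 2), ("124", "27-NOV", 2),
    ("124", "28-NOV", 3), ("124", "30-NOV", 0),
]

CONST = 0.11                         # the constant held in the variable port
running = {}                         # (ESN, SV_NO) -> cumulative SUM so far
for esn, date, sv_no in rows:        # rows are already sorted by ESN and DATE
    running[(esn, sv_no)] = round(running.get((esn, sv_no), 0.0) + CONST, 2)
    print(esn, date, sv_no, running[(esn, sv_no)])
# reproduces the SUM column of the expected output above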

80% Rule Estimation Value in PL/SQL

Assume a range of values is inserted into a schema table, and at the end of the month I want to apply the following algorithm to these records (e.g. 2500 rows of numeric values): sort the values ascending (from the smallest to the highest value) and then find the value at the 80% position of the sorted column.
In my example, if each row increases by one starting from 1, the 80% value will be the 2000th row/value (= 2500 - 2500*20/100). This algorithm needs to be implemented in a procedure where the number of rows is not constant; for example, it can vary from 2500 to 1,000,000 per month.
Hint: You can achieve this using Oracle's cumulative aggregate functions. For example, suppose your table looks like this:
MY_TABLE
+-----+----------+
| ID  | QUANTITY |
+-----+----------+
| A   |        1 |
| B   |        2 |
| C   |        3 |
| D   |        4 |
| E   |        5 |
| F   |        6 |
| G   |        7 |
| H   |        8 |
| I   |        9 |
| J   |       10 |
+-----+----------+
At each row, you can sum the quantities so far using this:
SELECT
    id,
    quantity,
    SUM(quantity)
        OVER (ORDER BY quantity ROWS UNBOUNDED PRECEDING)
        AS cumulative_quantity_so_far
FROM
    MY_TABLE
Giving you:
+-----+----------+----------------------------+
| ID  | QUANTITY | CUMULATIVE_QUANTITY_SO_FAR |
+-----+----------+----------------------------+
| A   |        1 |                          1 |
| B   |        2 |                          3 |
| C   |        3 |                          6 |
| D   |        4 |                         10 |
| E   |        5 |                         15 |
| F   |        6 |                         21 |
| G   |        7 |                         28 |
| H   |        8 |                         36 |
| I   |        9 |                         45 |
| J   |       10 |                         55 |
+-----+----------+----------------------------+
Hopefully this will help in your work.
Write a query using the percentile_disc function to solve your problem. Sounds like it does what you want.
An example would be
select percentile_disc(0.8) within group (order by the_value)
from my_table
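For intuition, here is a rough Python sketch of the discrete-percentile idea (my own illustration, not Oracle code): sort the values ascending and take the first value whose cumulative position reaches 80% of the row count.
# Hypothetical illustration of a discrete 80th percentile, as in the question:
# for the values 1..2500 the result is 2000.
import math

def percentile_disc(values, p):
    ordered = sorted(values)                  # ascending, as in ORDER BY the_value
    index = math.ceil(p * len(ordered)) - 1   # first position whose cumulative share >= p
    return ordered[index]

print(percentile_disc(range(1, 2501), 0.8))   # 2000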
