How to convert a positive duration into a negative in Google Spreadsheet - time

I am working with times and durations in the Google Sheets app, and I need to calculate with negative durations.
Problem:

Begin | End   | Duration | calculated as negative (for some reason)
------+-------+----------+-----------------------------------------
08:00 | 14:00 | 06:00    | no
10:00 | 15:00 | 05:00    | yes
If the 'Begin' and 'End' columns are formatted as "Time", the difference is easy to calculate in the Duration column. However, converting the duration into a negative value with a simple formula like (end - begin) * (-1) does not seem to be supported.
First solution:
With the following formula I achieved that goal:
[duration = end - begin]
= ((HOUR(duration) * 60) + MINUTE(duration)) * (-1)
I convert the duration into minutes and multiply by -1 to make the number negative. But this leads to strange behavior:
Begin | End   | Duration    | calculated as negative (for some reason)
------+-------+-------------+-----------------------------------------
08:00 | 14:00 | 06:00       | no
10:00 | 15:00 | -7200:00:00 | yes
So I tried dividing by 24, 60, and 3600, but nothing seemed to fit, until I used the magic number 1440.
This number is 60 multiplied by exactly 24: the number of minutes in a day. (The strange -7200:00:00 appears because -300 minutes is read as -300 days, which the duration format renders as -300 * 24 = -7200 hours.)
Final solution:
[duration = end - begin]
= (((HOUR(duration) * 60) + MINUTE(duration)) * (-1)) / 1440
My questions are:
Does anyone know why 1440 is the right number to use?
Is there another way to solve this problem?

Google Sheets treats dates and times as serial numbers (the same way Excel does):
today() is 42458;
tomorrow = today() + 1 = 42459;
each day counts as one.
A time of day is a number between 0 and 1: there are 24 hours in 1, and 60 minutes in 1 hour. Therefore one day expressed
in minutes is 24 * 60 = 1440;
in seconds, 24 * 60 * 60 = 86400.
So dividing a count of minutes by 1440 converts it back into the day units that duration formatting expects.
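To see the arithmetic concretely, here is a minimal Python sketch of that serial-number model (my addition, not a Sheets feature; it just redoes the same math exactly with fractions):

from fractions import Fraction

# Serial-number model: a time of day is a fraction of one day.
begin = Fraction(10, 24)          # 10:00
end = Fraction(15, 24)            # 15:00
duration = end - begin            # 5/24 of a day

minutes = duration * 1440         # one day = 24 * 60 = 1440 minutes
print(minutes)                    # 300

neg_duration = -minutes / 1440    # minutes back into day units
print(neg_duration * 24)          # -5 (hours)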

Related

How to calculate no of matches in given days

We have 16 teams and 2 grounds. Both grounds are available for 60 minutes on Monday and 120 minutes on Tuesday, which means 2 matches can be played on Monday and 4 matches on Tuesday. If no team may play a second match on the same day, how many matches are possible on Monday and Tuesday respectively?
Hint:
the number of matches for N teams is
= (N-1) + (N-2) + (N-3) + (N-4) + ... + (N-N)
For example, for 16 teams:
(16-1) + (16-2) + (16-3) + ... + (16-16) = 120 matches
How can we find the formula for N teams and N grounds?
I am asking about an optimal solution, so I can find the minimum number of days to complete all the matches. A sketch of the general computation follows the worked code and its output below.
from itertools import combinations

# All pairings of 16 teams taken 2 at a time.
# The count equals C(N, 2) = N * (N - 1) / 2.
matches = list(combinations(range(1, 17), 2))
print("Total number of matches:", len(matches))

# 2 matches can be played on Monday and 4 on Tuesday,
# so 6 matches are played per week.
m = 2
t = 4

# Total number of weeks required to complete all matches:
total_weeks = len(matches) // (m + t)
print("Total weeks required to complete {} matches:".format(len(matches)), total_weeks)

# Number of matches played on Mondays:
total_matches_on_monday = total_weeks * m
print("Total number of matches on Monday:", total_matches_on_monday)

# Number of matches played on Tuesdays:
total_matches_on_tuesday = total_weeks * t
print("Total number of matches on Tuesday:", total_matches_on_tuesday)
Output:
Total number of matches: 120
Total weeks required to complete 120 matches: 20
Total number of matches on Monday: 40
Total number of matches on Tuesday: 80
We can simply use the round-robin formula N×(N-1)/2 to calculate the total number of matches, which gives 120 in this case.
As mentioned, 6 matches are possible per week, so it will take 20 weeks for 120 matches:
2 on Monday and 4 on Tuesday, meaning 40 on Mondays and the remaining 80 on Tuesdays over the 20 weeks.
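For the general question, here is a minimal Python sketch (the days_needed helper is hypothetical, not from the post): it counts the C(N, 2) round-robin matches and walks a repeating weekly capacity pattern until they are exhausted, assuming the no-repeat-per-day rule never binds because the daily capacity stays well below N/2 matches.

from itertools import cycle

def days_needed(n_teams, daily_capacity):
    # C(N, 2) matches in a full round robin.
    total = n_teams * (n_teams - 1) // 2
    remaining, days = total, 0
    # Walk the repeating weekly pattern of per-day capacities.
    for cap in cycle(daily_capacity):
        if remaining <= 0:
            break
        remaining -= cap
        days += 1
    return total, days

print(days_needed(16, [2, 4]))  # (120, 40): 20 weeks of Monday + Tuesday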

Using Arrays to Calculate Previous and Next Values

Is there a way I can use ClickHouse (arrays?) to calculate sequential values that depend on previously calculated values?
For e.g.
On day 1, I start with 0, consume 5, and add 100, ending up with 0 - 5 + 100 = 95.
Day 2 starts with what I ended day 1 with, which is 95; I again consume 10 and add 5, ending up with 95 - 10 + 5 = 90 (which will be the start for day 3).
Given
ConsumeArray = [5, 10, 25]
AddArray = [100, 5, 10]
calculate EndingPosition (which also becomes StartingPosition for the next day):

                                               | Day1 | Day2 | Day3
-----------------------------------------------+------+------+------
StartingPosition (a) = previous EndingPosition |    0 |   95 |   90   (to calculate)
Consumed (b)                                   |    5 |   10 |   25
Added (c)                                      |  100 |    5 |   10
EndingPosition (d) = a - b + c                 |   95 |   90 |   75   (to calculate)
Just finish all the add/consume operations first and then do an accumulation:
WITH
    [5, 10, 25] AS ConsumeArray,
    [100, 5, 10] AS AddArray
SELECT arrayCumSum(arrayMap((c, a) -> a - c, ConsumeArray, AddArray));
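For a quick check of the numbers outside ClickHouse, the same map-then-accumulate idea can be sketched in Python (my addition, not part of the original answer):

from itertools import accumulate

consume = [5, 10, 25]
add = [100, 5, 10]

# Net change per day, then a running sum gives each day's ending position.
ending = list(accumulate(a - c for c, a in zip(consume, add)))
print(ending)  # [95, 90, 75]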

Is there a way to generate lags in a panel without collapsing the data?

I have a dataset that looks like this
State Year Policy other_variables
a 2000 0 18
a 2000 0 19
.
.
.
a 2001 1 86
a 2001 1 23
The policy value is constant within each state and year, but it changes across states and years. The other_variables differ for each observation.
I want to generate lags of the policy value for each state. However, I cannot use xtset state year and then the L. operator, because there are repeated values within each state-year combination. I know that collapsing the dataset, generating the lag variables, and then merging back to the dataset would work. My question is: is there an easier way to do this?
This may help:
clear
input str1 State Year Policy
a 2000 0
a 2000 0
a 2001 1
a 2001 1
end
* Within each state (sorted by year), difference Policy against the
* previous observation, but only across a one-year step.
bysort State (Year) : gen diff = Policy - Policy[_n-1] if Year == Year[_n-1] + 1
* Spread the value computed on the first observation of each state-year
* cell to the remaining observations in that cell.
by State Year: replace diff = diff[_n-1] if missing(diff)
list, sepby(State Year)
+------------------------------+
| State Year Policy diff |
|------------------------------|
1. | a 2000 0 . |
2. | a 2000 0 . |
|------------------------------|
3. | a 2001 1 1 |
4. | a 2001 1 1 |
+------------------------------+
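For comparison, the collapse-then-merge route the question mentions is only a few lines in pandas; this is a hedged sketch on the same toy data (assuming consecutive years within each state), not a replacement for the Stata answer:

import pandas as pd

df = pd.DataFrame({
    "state": ["a", "a", "a", "a"],
    "year": [2000, 2000, 2001, 2001],
    "policy": [0, 0, 1, 1],
})

# One row per state-year (policy is constant within the cell) ...
cell = df.groupby(["state", "year"], as_index=False)["policy"].first()
# ... lag policy within each state ...
cell["policy_lag"] = cell.groupby("state")["policy"].shift(1)
# ... and broadcast the lag back to every observation.
df = df.merge(cell[["state", "year", "policy_lag"]], on=["state", "year"])
print(df)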

Stata: Extracting values and save them as scalars (and more)

This question is a follow-up to Stata: replace, if, forvalues. Consider this data:
set seed 123456
set obs 5000
g firmid = "firm" + string(_n) /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
The first 10 observations are:
+--------------------------------------+
| firmid nw lat lon |
|--------------------------------------|
1. | firm1 81 39.915526 -75.505018 |
2. | firm2 35 39.548523 -75.201567 |
3. | firm3 10 39.657866 -75.17988 |
4. | firm4 83 39.957938 -75.898837 |
5. | firm5 56 39.575881 -75.169157 |
6. | firm6 73 39.886184 -75.857255 |
7. | firm7 27 39.33288 -75.724665 |
8. | firm8 75 39.165549 -75.96502 |
9. | firm9 64 39.688819 -75.232764 |
10. | firm10 76 39.012228 -75.166272 |
+--------------------------------------+
I need to calculate the distances between firm 1 and all other firms. So, the vincenty command looks like:
. scalar theLat = 39.915526
. scalar theLon = -75.505018
. vincenty lat lon theLat theLon, hav(distance_km) inkm
The vincenty command creates the distance_km variable, which holds the distance between each observation and firm 1. Here I manually copied and pasted the two numbers 39.915526 and -75.505018.
Question 1: What is the syntax that extracts those numbers?
Now I can keep observations where distance_km <= 2, and
. egen near_nw_sum = sum(nw)
will create the sum of workers within 2 kilometers of firm 1. (Or the collapse command may do the job.)
Question 2: I have to do this for all firms, and the final data should look like:
+-----------------------------------------------------------------+
| firmid nw lat lon near_nw_sum |
|-----------------------------------------------------------------|
1. | firm1 81 39.915526 -75.505018 (# workers near firm1) |
2. | firm2 35 39.548523 -75.201567 (# workers near firm2) |
3. | firm3 10 39.657866 -75.17988 (# workers near firm3) |
4. | firm4 83 39.957938 -75.898837 (# workers near firm4) |
5. | firm5 56 39.575881 -75.169157 (# workers near firm5) |
6. | firm6 73 39.886184 -75.857255 (# workers near firm6) |
7. | firm7 27 39.33288 -75.724665 (# workers near firm7) |
8. | firm8 75 39.165549 -75.96502 (# workers near firm8) |
9. | firm9 64 39.688819 -75.232764 (# workers near firm9) |
10. | firm10 76 39.012228 -75.166272 (# workers near firm10) |
+-----------------------------------------------------------------+
Creating the near_nw_sum variable is my final goal. I need your help here; my data management skills are weak.
The following is basically the same strategy found here and is based on your "final goal". Again, its usefulness depends on the size of your original dataset: joinby creates observations, so you may exceed the Stata limit. However, I believe it does what you want.
clear all
set more off
set seed 123456
set obs 10
g firmid = _n /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
gen dum = 1
list
* joinby procedure
tempfile main
save "`main'"
rename (firmid lat lon nw) =0
joinby dum using "`main'"
drop dum
* Pretty print
sort firmid0 firmid
order firmid0 firmid
list, sepby(firmid0)
* Uncomment if you do not want to include workers in the "base" firm.
*drop if firmid0 == firmid
* Compute distance
vincenty lat0 lon0 lat lon, hav(distance_km) inkm
keep if distance_km <= 40 // an arbitrary distance
list, sepby(firmid0)
* Compute workers of nearby-firms
collapse (sum) nw_sum=nw (mean) nw0 lat0 lon0, by(firmid0)
list
What it does is form pairwise combinations of firms, compute the distances, and sum the workers of nearby firms. There is no need here to extract scalars as asked in Question 1, and no need to complicate the firmid variable by converting it to string.
The following overcomes the problem of the Stata limit on number of observations.
clear all
set more off
* Create empty database
gen x = .
tempfile results
save "`results'", replace
* Create input for exercise
set seed 123456
set obs 500
g firmid = _n /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
gen dum = 1
*list
* Save number of firms
local size = _N
display "`size'"
* joinby procedure
tempfile main
save "`main'"
timer clear 1
timer clear 2
timer clear 3
timer clear 4
quietly {
    timer on 1
    forvalues i = 1/`size' {
        timer on 2
        use "`main'" in `i', clear // assumed sorted on firmid
        rename (firmid lat lon nw) =0
        joinby dum using "`main'", unmatched(using)
        drop _merge dum
        order firmid0 firmid
        timer off 2
        timer on 3
        vincenty lat0 lon0 lat lon, hav(dist) inkm
        timer off 3
        keep if dist <= 40 // an arbitrary distance
        timer on 4
        collapse (sum) nw_sum=nw (mean) nw0 lat0 lon0, by(firmid0)
        append using "`results'"
        save "`results'", replace
        timer off 4
    }
    timer off 1
}
use "`results'", clear
sort firmid0
drop x
list
timer list
However inefficient, some testing using timer shows that most of the computation time goes into the vincenty command, which you won't be able to escape. The following are the times (in seconds) for 10,000 observations with an Intel Core i5 processor and a conventional hard drive (not an SSD). Timer 1 is the total, while 2, 3, and 4 are its (approximate) components; timer 3 corresponds to vincenty:
. timer list
1: 1953.99 / 1 = 1953.9940
2: 169.19 / 10000 = 0.0169
3: 1669.95 / 10000 = 0.1670
4: 94.47 / 10000 = 0.0094
Of course, note that both codes compute duplicate distances (e.g. both the distance between firm1 and firm2 and the distance between firm2 and firm1), which you can probably avoid. As it stands, for 110,000 observations it will take a long time. On the positive side, I noticed this second setup demands very little RAM compared to the same number of observations in the first setup; in fact, my 4GB machine freezes with the first one.
Also note that even though I use the same seed as you, the data is different because I create a different number of observations (not 5000), which changes the variable creation process.
(By the way, if you wanted to save the value as a scalar you could use subscripting: scalar latitude = lat[1]).
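If you ever want to sanity-check the pairwise logic outside Stata, here is a hedged numpy sketch of the same strategy (haversine rather than vincenty, and the variable names are my own): compute all pairwise distances at once, then sum workers within a radius. It assumes the N x N distance matrix fits in memory.

import numpy as np

rng = np.random.default_rng(123456)
n = 10
nw = np.floor(100 * rng.random(n))      # workers per firm
lat = np.radians(39 + rng.random(n))    # latitudes in radians
lon = np.radians(-76 + rng.random(n))   # longitudes in radians

# Haversine distance between every pair of firms (Earth radius ~6371 km).
dlat = lat[:, None] - lat[None, :]
dlon = lon[:, None] - lon[None, :]
h = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
dist_km = 2 * 6371 * np.arcsin(np.sqrt(h))

# Sum workers of all firms within 40 km of each firm (self included,
# as in the first joinby snippet).
near_nw_sum = (nw * (dist_km <= 40)).sum(axis=1)
print(near_nw_sum)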

Ruby: Time difference in percentage?

How would I, for example, find out that 6pm is 50% between 4pm and 8pm?
Or that 12am Wednesday is 50% between 12pm Tuesday and 12pm Wednesday?
Convert the times to seconds, calculate the span in seconds, calculate the difference between your desired time and the first time in seconds, take that as a fraction of the whole span, and then multiply by 100%?
Example:
12 AM = 0 seconds (of day)
12 PM = 43200 seconds (of day)
Your desired time = 3 AM = 10800 seconds of day
Total time span = 43200 - 0 = 43200 seconds
Time difference of your desired time from first time = 10800 - 0 = 10800 seconds
Fraction = 10800 / 43200 = 0.25
Percentage = 0.25 * 100% = 25%
(Sorry, I don't know Ruby, but there's the idea.)
require 'date'
start = Date.new(2008, 4, 10)
middle = Date.new(2008, 10, 10)    # a date inside the span
enddate = Date.new(2009, 4, 10)
duration = enddate - start         # duration of the whole span
desired = middle - start           # offset of the desired date from the start
fraction = desired / duration
percentage = fraction * 100
puts percentage.to_i               # => 50
Thanks to 'John W' for the math.
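The same arithmetic for the clock-time case in the question, sketched here in Python (my addition; the date is arbitrary, only the times matter):

from datetime import datetime

start = datetime(2024, 1, 1, 16, 0)     # 4 pm on an arbitrary date
end = datetime(2024, 1, 1, 20, 0)       # 8 pm
desired = datetime(2024, 1, 1, 18, 0)   # 6 pm

# timedelta / timedelta gives a float fraction of the span.
fraction = (desired - start) / (end - start)
print(fraction * 100)  # 50.0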
