Labeled LDA learn in Stanford Topic Modeling Toolbox - stanford-nlp

It's ok when I run the example-6-llda-learn.scala as follows:
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
val tokenizer = {
SimpleEnglishTokenizer() ~> // tokenize on space and punctuation
CaseFolder() ~> // lowercase everything
WordsAndNumbersOnlyFilter() ~> // ignore non-words and non-numbers
MinimumLengthFilter(3) // take terms with >=3 characters
}
val text = {
source ~> // read from the source file
Column(4) ~> // select column containing text
TokenizeWith(tokenizer) ~> // tokenize with tokenizer above
TermCounter() ~> // collect counts (needed below)
TermMinimumDocumentCountFilter(4) ~> // filter terms in <4 docs
TermDynamicStopListFilter(30) ~> // filter out 30 most common terms
DocumentMinimumLengthFilter(5) // take only docs with >=5 terms
}
// define fields from the dataset we are going to slice against
val labels = {
source ~> // read from the source file
Column(2) ~> // take column two, the year
TokenizeWith(WhitespaceTokenizer()) ~> // turns label field into an array
TermCounter() ~> // collect label counts
TermMinimumDocumentCountFilter(10) // filter labels in < 10 docs
}
val dataset = LabeledLDADataset(text, labels);
// define the model parameters
val modelParams = LabeledLDAModelParams(dataset);
// Name of the output model folder to generate
val modelPath = file("llda-cvb0-"+dataset.signature+"-"+modelParams.signature);
// Trains the model, writing to the given output path
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
// or could use TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
But it's not ok when I change the last line from:
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
to:
TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
And the method of CVB0 cost much memory.I train a corpus of 10,000 documents with about 10 labels each document,it will cost 30G memory.

I've encountered the same situation and indeed I believe it's a bug. Check GIbbsLabeledLDA.scala in edu.stanford.nlp.tmt.model.llda under the src/main/scala folder, from line 204:
val z = doc.labels(zI);
val pZ = (doc.theta(z)+topicSmoothing(z)) *
(countTopicTerm(z)(term)+termSmooth) /
(countTopic(z)+termSmoothDenom);
doc.labels is self-explanatory, and doc.theta records the distribution (counts, actually) of its labels, which has the same size as doc.labels.
zI is index variable iterating doc.labels, while the value z gets the actual label number. Here comes the problem: it's possible this documents has only one label - say 1000 - therefore zI is 0 and z is 1000, then doc.theta(z) gets out of range.
I suppose the solution would be to modify doc.theta(z) to doc.theta(zI).
(I'm trying to check whether the results would be meaningful, anyway this bug has made me not so confident in this toolbox.)

Related

Find cities reachable in given time

I am trying to solve a problem where i have 3 columns in csv like below
connection Distance Duration
Prague<>Berlin 400 4
Warsaw<>Berlin 600 6
Berlin<>Munich 800 8
Munich<>Vienna 400 3.5
Munich<>Stuttgart 800 8
Stuttgart<>Freiburg 150 2
I need to find out how many cities i can cover in given time from the origin city
Example if i would give input as
Input: Berlin, 10
Output: ["Prague","Munich","Warsaw"]
Input : Berlin, 30
Output : ["Prague","Munich","Warsaw", "Vienna", "Stuttgart",
"Freiburg"]
This is something a Graph problem in real time.
I am trying this with Scala, can someone help please.
Below what i tried:
I made it working partially.
import scalax.collection.Graph // or scalax.collection.mutable.Graph
import scalax.collection.GraphPredef._, scalax.collection.GraphEdge._
import scalax.collection.edge.WDiEdge
import scalax.collection.edge.Implicits._
val rows = """Prague<>Berlin,400,4
Warsaw<>Berlin,600,6
Berlin<>Munich,800,8
Munich<>Vienna,400,3.5
Munich<>Stuttgart,800,8
Stuttgart<>Freiburg,150,2""".split("\n").toList
I am preparing the input for my application.
Below i am having a list of cities which are present in the given file.
NOTE: We can have it from file itself while reading and kept in list. Here i kept all as lowercase
val cityList = List("warsaw","berlin","prague","munich","vienna","stuttgart","freiburg")
Now creating a case class:
case class Bus(connection: String, distance: Int, duration: Float)
val buses: List[Bus] = rows.map(row => {
val r =
row.split("\\,")
Bus(r(0).toLowerCase, r(1).toInt, r(2).toFloat)
})
case class City(name: String)
// case class BusMeta(distance: Int, duration: Float)
val t = buses.map(bus => {
val s = bus.connection.split("<>")
City(s.head) ~ City(s.last) % bus.duration
})
val busGraph = Graph(t:_*)
From above we will create a Graph as required from the input file. "busGraph" in my case.
import scala.collection.mutable
val travelFrom = ("BERLIN").toLowerCase
val travelDuration = 16F
val possibleCities: mutable.Set[String] = mutable.Set()
if (cityList.contains(travelFrom)){
busGraph.nodes.get(City(travelFrom)).edges.filter(_.weight <= travelDuration).map(edge => edge.map(_.name)).flatten.toSet.filterNot(_ == travelFrom).foreach(possibleCities.add)
println("City PRESENT in File")
}
else
{
println("City Not Present in File")
}
I am geting Output here :
possibleCities: scala.collection.mutable.Set[String] = Set(munich, warsaw, prague)
Expected Output : Set(munich, warsaw, prague, stuttgart, Vienna)
Your solution only finds direct routes (that's why your output is shorter than expected). To get complete answer, you need to also consider connections, by recursively traversing the graph from each of the direct destinations.
Also, do not use mutable collections, they are evil.
Here is a possible solution for you:
// First, create the graph structure
def routes: Map[String, (String, Double)] = Source
.fromInputStream(System.in)
.getLines
.takeWhile(_.nonEmpty)
.map(_.split("\\s+"))
.flatMap { case Array(from_to, dist, time) =>
val Array(from,to) = from_to.split("<>")
Seq(from -> (to, time.toDouble), to -> (from, time.toDouble))
}.toSeq
.groupMap(_._1)(_._2)
// Now search for suitable routes
def reachable(
routes: Map[String, Seq[(String, Double)]],
from: String,
limit: Double,
cut: Set[String] = Set.empty
): Set[String] = routes
.getOrElse(from, Nil)
.filter(_._2 <= limit)
.filterNot { case (n, _) => cut(n) }
.flatMap { case(name, time) =>
reachable(routes, name, limit - time, cut + from) + name
}.toSet
// And here is how you use it
def main(argv: Array[String]): Unit = {
val Array(from, limit) = new Scanner(System.in).nextLine().split("\\s")
val reach = reachable(routes, from, limit.toDouble)
println(reach)
}
Do a breadth first search from the origin city, stopping going deeper when you reach the time limit. Output the stops reached by the search.
To my best knowledge this tasks should be solved with Graph Adjacency Matrix, which first need to build from input data.
In your particular case the Graph Adjacency Matrix would be 2D and contains cities on rows and columns and weight of direction as value.
See screenshot from Excel with example below,
At the first iteration you search for possible routes from starting cities and store city name (row/column id) and weight.
Each next iteration you try to add route and compare with limit (can you add it or not also make sure you are not adding same city)
To store results you will need again 2D array, where first element is you possible route and next element is a Tuple of visited city and value taken.
After few iterations you should get all possible options and just provide summary of founded.
TL;DR; Most of Graph programmatical algorithms use or depends (with different extent) on Graph Adjacency Matrix

Sort a range or array based on two columns that contain the date and time

Currently I'm trying to create a Google Apps Script for Google Sheets which will allow adding weekly recurring events, batchwise, for upcoming events. My colleagues will then make minor changes to these added events (e.g. make date and time corrections, change the contact person, add materials neccessary for the event and so forth).
So far, I have written the following script:
function CopyWeeklyEventRows() {
var ss = SpreadsheetApp.getActiveSheet();
var repeatingWeeks = ss.getRange(5,1).getValue(); // gets how many weeks it should repeat
var startDate = ss.getRange(6, 1).getValue(); // gets the start date
var startWeekday = startDate.getDay(); // gives the weekday of the start date
var regWeek = ss.getRange(9, 2, 4, 7).getValues(); // gets the regular week data
var regWeekdays = new Array(regWeek.length); // creates an array to store the weekdays of the regWeek
var ArrayStartDate = new Array(startDate); // helps to store the We
for (var i = 0; i < regWeek.length; i++){ // calculates the difference between startWeekday and each regWeekdays
regWeekdays[i] = regWeek[i][1].getDay() - startWeekday;
Logger.log(regWeekdays[i]);
// Add 7 to move to the next week and avoid negative values
if (regWeekdays[i] < 0) {
regWeekdays[i] = regWeekdays[i] + 7;
}
// Add days according to difference between startWeekday and each regWeekdays
regWeek[i][0] = new Date(ArrayStartDate[0].getTime() + regWeekdays[i]*3600000*24);
}
// I'm struggling with this line. The array regWeek is not sorted:
//regWeek.sort([{ column: 1, ascending: true }]);
ss.getRange(ss.getLastRow() + 1, 2, 4, 7).setValues(regWeek); // copies weekly events after the last row
}
It allows to add one week of recurring events to the overview section of the spreadsheet based on a start date. If the start date is a Tuesday, the regular week is added starting from a Tuesday. However, the rows are not sorted according to the dates:
.
How can the rows be sorted by ascending date (followed by time) before adding them to the overview?
My search for similar questions revealed Google Script sort 2D Array by any column which is the closest hit I've found. The same error message is shown when running my script with the sort line. I don't understand the difference between Range and array yet which might help to solve the issue.
To give you a broader picture, here's what I'm currently working on:
I've noticed that the format will not necessarily remain when adding
new recurring events. So far I haven't found the rule and formatted by
hand in a second step.
A drawback is currently that the weekly recurring events section is
fixed. I've tried to find the last filled entry and use it to set the
range of regWeek, but got stuck.
Use the column A to exclude recurring events from the addition
process using a dropdown.
Allow my colleagues to add an event to the recurring events using a
dropdown (e.g. A26). This event should then be added with sorting to
the right day of the week and start time. The sorting will come in
handy.
Thanks in advance for your input regarding the sorting as well as suggestions on how to improve the code in general.
A demo version of the spreadsheet
UpdateV01:
Here the code lines which copy and sort (first by date, then by time)
ss.getRange(ss.getLastRow()+1,2,4,7).setValues(regWeek); // copies weekly events after the last row
ss.getRange(ss.getLastRow()-3,2,4,7).sort([{column: 2, ascending: true}, {column: 4, ascending: true}]); // sorts only the copied weekly events chronologically
As #tehhowch pointed out, this is slow. Better to sort BEFORE writing.
I will implement this method and post it here.
UpdateV02:
regWeek.sort(function (r1, r2) {
// sorts ascending on the third column, which is index 2
return r1[2] - r2[2];
});
regWeek.sort(function (r1, r2) {
// r1 and r2 are elements in the regWeek array, i.e.
// they are each a row array if regWeek is an array of arrays:
// Sort ascending on the first column, which is index 0:
// if r1[0] = 1, r2[0] = 2, then 1 - 2 is -1, so r1 sorts before r2
return r1[0] - r2[0];
});
UpdateV03:
Here an attempt to repeat the recurring events over several weeks. Don't know yet how to include the push for the whole "week".
// Repeat week for "A5" times and add to start/end date
for (var j = 0; j < repeatingWeeks; j++){
for (var i = 0; i < numFilledRows; i++){
regWeekRepeated[i+j*6][0] = new Date(regWeek[i][0].getTime() + j*7*3600000*24); // <-This line leads to an error message
regWeekRepeated[i+j*6][3] = new Date(regWeek[i][3].getTime() + j*7*3600000*24);
}
}
My question was answered and I was able to make the code work as intended.
Given your comment - you want to sort the written chunk - you have two methods available. One is to sort written data after writing, by using the Spreadsheet service's Range#sort(sortObject) method. The other is to sort the data before writing, using the JavaScript Array#sort(sortFunction()) method.
Currently, your sort code //regWeek.sort([{ column: 1, ascending: true }]); is attempting to sort a JavaScript array, using the sorting object expected by the Spreadsheet service. Thus, you can simply chain this .sort(...) call to your write call, as Range#setValues() returns the same Range, allowing repeated Range method calling (e.g. to set values, then apply formatting, etc.).
This looks like:
ss.getRange(ss.getLastRow() + 1, 2, regWeek.length, regWeek[0].length)
.setValues(regWeek)
/* other "chainable" Range methods you want to apply to
the cells you just wrote to. */
.sort([{column: 1, ascending: true}, ...]);
Here I have updated the range you access to reference the data you are attempting to write - regWeek - so that it is always the correct size to hold the data. I've also visually broken apart the one-liner so you can better see the "chaining" that is happening between Spreadsheet service calls.
The other method - sorting before writing - will be faster, especially as the size and complexity of the sort increases. The idea behind sorting a range is you need to use a function that returns a negative value when the first index's value should come before the second's, a positive value when the first index's value should come after the second's, and a zero value if they are equivalent. This means a function that returns a boolean is NOT going to sort as one thinks, since false and 0 are equivalent in Javascript, while true and 1 are also equivalent.
Your sort looks like this, assuming regWeek is an array of arrays and you are sorting on numeric values (or at least values which will cast to numbers, like Dates).
regWeek.sort(function (r1, r2) {
// r1 and r2 are elements in the regWeek array, i.e.
// they are each a row array if regWeek is an array of arrays:
// Sort ascending on the first column, which is index 0:
// if r1[0] = 1, r2[0] = 2, then 1 - 2 is -1, so r1 sorts before r2
return r1[0] - r2[0];
});
I strongly recommend reviewing the Array#sort documentation.
You could sort the "Weekly Events" range before you set the regWeek variable. Then the range would be in the order you want before you process it. Or you could sort the whole "Overview" range after setting the data. Here's a quick function you can call to sort the range by multiple columns. You can of course tweak it to sort the "Weekly Events" range instead of the "Overview" range.
function sortRng() {
var ss = SpreadsheetApp.getActiveSheet();
var firstRow = 22; var firstCol = 1;
var numRows = ss.getLastRow() - firstRow + 1;
var numCols = ss.getLastColumn();
var overviewRng = ss.getRange(firstRow, firstCol, numRows, numCols);
Logger.log(overviewRng.getA1Notation());
overviewRng.sort([{column: 2, ascending: true}, {column: 4, ascending: true}]);
}
As for getting the number of filled rows in the Weekly Events section, you need to search a column that will always have data if any row has data (like the start date column b), loop through the values and the first time it finds a blank, return that number. That will give you the number of rows that it needs to copy. Warning: if you don't have at least one blank value in column B between the Weekly Events section and the Overview section, you will probably get unwanted results.
function getNumFilledRows() {
var ss = SpreadsheetApp.getActiveSheet();
var eventFirstRow = 9; var numFilledRows = 0;
var colToCheck = 'B';//the StartDate col which should always have data if the row is filled
var vals = ss.getRange(colToCheck + eventFirstRow + ":" + colToCheck).getValues();
for (i = 0; i < vals.length; i++) {
if (vals[i][0] == '') {
numFilledRows = i;
break;
}
}
Logger.log(numFilledRows);
return numFilledRows;
}
EDIT:
If you just want to sort the array in javascript before writing, and you want to sort by Start Date first, then by Time of day, you could make a temporary array, and add a column to each row that is date and time combined. array.sort() sorts dates alphabetically, so you would need to convert that date to an integer. Then you could sort the array by the new column, then delete the new column from each row. I included a function that does this below. It could be a lot more compact but I thought it might be more legible like this.
function sortDates() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var vals = ss.getActiveSheet().getRange('B22:H34').getDisplayValues(); //get display values because getValues returns time as weird date 1899 and wrong time.
var theDate = new Date(); var newArray = []; var theHour = ''; var theMinutes = '';
var theTime = '';
//Create a new array that inserts date and time as the first column in each row
vals.forEach(function(aRow) {
theTime = aRow[2];//hardcoded - assumes time is the third column that you grabbed
//get the hours (before colon) as a number
theHour = Number(theTime.substring(0,theTime.indexOf(':')));
//get the minutes(after colon) as a number
theMinutes = Number(theTime.substring(theTime.indexOf(':')+1));
theDate = new Date(aRow[0]);//hardcoded - assumes date is the first column you grabbed.
theDate.setHours(theHour);
theDate.setMinutes(theMinutes);
aRow.unshift(theDate.getTime()); //Add the date and time as integer to the first item in the aRow array for sorting purposes.
newArray.push(aRow);
});
//Sort the newArray based on the first item of each row (date and time as number)
newArray.sort((function(index){
return function(a, b){
return (a[index] === b[index] ? 0 : (a[index] < b[index] ? -1 : 1));
};})(0));
//Remove the first column of each row (date and time combined) that we added in the first step
newArray.forEach(function(aRow) {
aRow.shift();
});
Logger.log(newArray);
}

Java8 stream average of object property in collection

I'm new to Java so if this has already been answered somewhere else then I either don't know enough to search for the correct things or I just couldn't understand the answers.
So the question being:
I have a bunch of objects in a list:
try(Stream<String> logs = Files.lines(Paths.get(args))) {
return logs.map(LogLine::parseLine).collect(Collectors.toList());
}
And this is how the properties are added:
LogLine line = new LogLine();
line.setUri(matcher.group("uri"));
line.setrequestDuration(matcher.group("requestDuration"));
....
How do I sort logs so that I end up with list where objects with same "uri" are displayed only once with average requestDuration.
Example:
object1.uri = 'uri1', object1.requestDuration = 20;
object2.uri = 'uri2', object2.requestDuration = 30;
object3.uri = 'uri1', object3.requestDuration = 50;
Result:
object1.uri = 'uri1', 35;
object2.uri = 'uri2', 30;
Thanks in advance!
Take a look at Collectors.groupingBy and Collectors.averagingDouble. In your case, you could use them as follows:
Map<String, Double> result = logLines.stream()
.collect(Collectors.groupingBy(
LogLine::getUri,
TreeMap::new,
Collectors.averagingDouble(LogLine::getRequestDuration)));
The Collectors.groupingBy method does what you want. It is overloaded, so that you can specify the function that returns the key to group elements by, the factory that creates the returned map (I'm using TreeMap here, because you want the entries ordered by key, in this case the URI), and a downstream collector, which collects the elements that match the key returned by the first parameter.
If you want an Integer instead of a Double value for the averages, consider using Collectors.averagingInt.
This assumes LogLine has getUri() and getRequestDuration() methods.

How can I sum binned time series using d3.js?

I want a simple graph like:
The data I have is a simple list of transactions with two properties:
timestamp
amount
I tried d3.layout.histogram().bins() but it seems it only supports counting the transactions.
I mustn't be the only one looking for that, am I ?
Ok, so the IRC folks helped me out and pointed to nest, which works great (this is CoffeeScript):
nested_data = d3.nest()
.key((d) -> d3.time.day(d.timestamp))
.rollup((a) -> d3.sum(a, (d) -> d.amount))
.entries(incoming_data) # An array of {timestamp: ..., amount: ...} objects
# Optional
nested_data.map (d) ->
d.date = new Date(d.key)
The trick here is d3.time.day which takes a timestamp, and tells you which day (12 a.m. in the night) that timestamp belongs to. This function and the other ones like d3.time.week, etc.. can bin timeseries very well.
The other trick is the nest().rollup() function, which after being grouped by key(), sum all of the events on a given day.
Last thing I wanted, was to interpolate empty values on the days where I had no transactions. This is the last part of the code:
# Interpolate empty vals
nested_data.sort((a, b) -> d3.descending(a.date, b.date))
ex = d3.extent(nested_data, (d) -> d.date)
each_day = d3.time.days(ex[0], ex[1])
# Build a hashmap with the days we have
data_hash = {}
angular.forEach(data, (d) ->
data_hash[d.date] = d.values
)
# Build a new array for each day, including those where we didn't have transactions
new_data = []
angular.forEach(each_day, (d) ->
val = 0
if data_hash[d]
val = data_hash[d]
new_data.push({date: d, values: val})
)
final_data = new_data
Hope this helps somebody!
The histogram code doesn't support this, but you can easily do the binning yourself. Assuming that you have a date and a count for each transaction, you can bin by day like this.
var bins = {};
transactions.forEach(function(t) {
var key = t.date.toDateString();
bins[key] = bins[key] || 0;
bins[key] += t.amount;
});
You can obviously parse the date string back into a date if you need it; the point of using .toDateString() here is that the time part is chopped off and everything binned by day. If you want to bin by another time interval, you can use the same technique and extract a different part of the date.

Stata: how to get observation value 5 minutes ahead with gabbed time data

I got high frequency data from a limit order book in Stata. Time does not have a regular interval, and some observations are at the same time (in milliseconds). For each observation I need to get the midpoint 5 minutes later in a separate column. So for observation 1 the midpoint would be 10.49, because the last midpoint closest to 09:05:02.579 would be 10.49.
How to do this in Stata?
datetime midpoint
12/02/2012 09:00:02.579 10.5125
12/02/2012 09:00:03.471 10.5125
12/02/2012 09:00:03.471 10.5125
12/02/2012 09:00:03.471 10.51
12/02/2012 09:00:03.471 10.51
12/02/2012 09:00:03.549 10.505
12/02/2012 09:00:03.549 10.5075
......
12/02/2012 09:04:59.785 10.495
12/02/2012 09:05:00.829 10.4925
12/02/2012 09:05:01.209 10.49
12/02/2012 09:05:03.057 10.4875
12/02/2012 09:05:05.055 10.485
.....
My approach would be
generate a new data set shifted by five minutes
append this shifter data set
find closest before and after observations to your five minute delta
use some criteria to pick the better of these two values
You specified closest, but you might want to add some other criteria depending on your book. Also, you mentioned more than one value at a given ms tick, but without more information I'm not sure how to handle that. Do you want to combine those midpoints first? Or are they different stocks?
Here's some code that implements the basics of the approach above.
clear
version 11.2
set seed 2001
* generate some data
set obs 100000
generate double dt = ///
tc(02dec2012 09:00:00.000) + 1000*_n + int(100*rnormal())
format dt %tcDDmonCCYY_HH:MM:SS.sss
sort dt
generate midpt = 100
replace midpt = ///
round(midpt[_n - 1] + 0.1*rnormal(), 0.005) if (_n != 1)
* add back future midpts
preserve
tempfile future
rename midpt fmidpt
rename dt fdt
generate double dt = fdt - tc(00:05:00.000)
save `future'
restore
append using `future'
* generate midpoints before and after 5 minutes in the future
sort dt
foreach v of varlist fdt fmidpt {
clonevar `v'_b = `v'
replace `v'_b = `v'_b[_n - 1] if missing(`v'_b)
}
gsort -dt
foreach v of varlist fdt fmidpt {
clonevar `v'_a = `v'
replace `v'_a = `v'_a[_n - 1] if missing(`v'_a)
}
format fdt* %tcDDmonCCYY_HH:MM:SS.sss
* use some algorithm to pick correct value
sort dt
generate choose_b = ///
((dt + tc(00:05:00.000)) - fdt_b) < (fdt_a - (dt + tc(00:05:00.000)))
generate fdt_c = cond(choose_b, fdt_b, fdt_a)
generate fmidpt_c = cond(choose_b, fmidpt_b, fmidpt_a)
format fdt_c %tcDDmonCCYY_HH:MM:SS.sss
// Construct a variable to look for in the dataset
gen double midpoint_5 = (datetime + 5*60000)
format midpoint_5 %tcNN/DD/CCYY_HH:MM:SS.sss
// will contain the closest observation number and midpoint 5 minutes a head
gen _t = .
gen double midpoint_at5 = .
// How many observations in the sample?
local N = _N
// We will use these variables to skip some observations in the loop
egen obs_in_minute = count(minutes_filter), by(minutes_filter)
egen max_obs_in_minute = max(obs_in_minute)
set more off
// For each observation
forvalues i = 1/`N' {
// If it is a trade
if type[`i'] == "Trade" {
// Set the time to lookup in the data
local lookup = midpoint_5[`i']
// The time should be between the min and max(*5)
local min = `i' + obs_in_minute[`i'] // this might cause errors
local max = `i' + max_obs_in_minute[`i']*5
// For each of these observations
forvalues j = `min'/`max' {
// Check if the lookup date is smaller than the datetime of the observation
if `lookup' < datetime[`j'] {
// Set the observation ID at the lookup ID 1 observation before
quietly replace _t = `j'-1 in `i'
// Set the midpoint at the lookup ID 1 observation before
quietly replace midpoint_at5 = midpoint[`j'-1] in `i'
// We have found the closest 5th min ahead... now stop loop and continue to next observation.
continue, break
}
}
// This is to indicate where we are in the loop
display "`i'/`N'"
}
}

Resources