I've got sports data that I've imported from an online source via a .xlsx file. Each observation is a penalty in an NFL (American football) game. In order to later merge this with another dataset, I need to have certain variables/values that match up between the two files. I'm hitting an issue with one variable, however.
In the main dataset in question (the penalty dataset originally mentioned), my ultimate goal is to create two variables, Minute and Second, that are of type byte and format %8.0g. This would make them perfectly correspond with the respective variables in the destination dataset. I have the required information available, which is the time remaining in the given quarter of the NFL game, but it's stored in a strange way, and I'm having trouble converting things.
The data is stored in a variable called Time. Visibly, the data looks fine as imported from the original .xlsx file. For example, the first observation reads "12:21", indicating that there are 12 minutes and 21 seconds left in the quarter. When importing from the .xlsx sheet, however, Stata assumes that the variable Time is a date/time variable measured in hh:mm, and thus assigns it a type of double and a format of %tchh:MM.
In the end, I don't really care about correctly formatting this Time variable, but I need to somehow make this match the required Minute and Second columns of the destination file. I've tried several different approaches, but so far nothing seems to work.
If Stata is misreading minutes and seconds as hours and minutes, and also (as it does) storing date-times in milliseconds, then it is off by a factor of 60 (minutes/hour) x 1000 (ms/s) = 60000. So, consider
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen double wrong = clock("1jan1960 12:21:00", "DMY hms")
. format wrong %tchh:MM
. clonevar alsowrong = wrong
. format alsowrong %15.0f
. list
+------------------+
| wrong alsowr~g |
|------------------|
1. | 12:21 44460000 |
+------------------+
. gen right = wrong/60000
. gen byte Minute = floor(right/60)
. gen byte Second = mod(right, 60)
. list
+--------------------------------------------+
| wrong alsowr~g right Minute Second |
|--------------------------------------------|
1. | 12:21 44460000 741 12 21 |
+--------------------------------------------+
I can't comment easily on your import, as neither imported file nor exact import code are given as examples.
EDIT: Another way to do it:
. gen alsoright = string(wrong, "%tchh:MM")
. gen minute = real(substr(alsoright, 1, strpos(alsoright, ":") - 1))
. gen second = real(substr(alsoright, strpos(alsoright, ":") + 1, .))
. l alsoright minute second
+----------------------------+
| alsori~t minute second |
|----------------------------|
1. | 12:21 12 21 |
+----------------------------+
I'm importing a big (5 GB) CSV file into BigQuery, and it reported an error in the file together with its position, given as a byte offset from the start of the file (for example, 134683757). I'd like to look at the lines around this error position.
Some example lines of the file:
field1, field2, field3
abc, bcd, efg
...
dge, hfr, kdf,
dgj, "a""a", fbd   # this line contains the invalid CSV element; the error is reported at, say, position 134683757
skd, frd, lqw
...
asd, fij, fle
I need some command to show the lines around the error, like:
dge, hfr, kdf,
dgj, "a""a", fbd
skd, frd, lqw
I tried sed and awk but I didn't find any simple solution.
It was definitely not clear from the original version of the question that you only got a byte offset from the start of the file.
You need to get a better position from the software generating the error; the developer was lazy in reporting an unusable number. It is reasonable to request a line number (and preferably offset within the line), rather than (or as well as) the byte offset from the start.
Assuming that the number is a byte position in the file, that gets tricky. Most Unix utilities work with lines (of variable length). I'd be tempted to write some C code to do the job, but that might be beyond you (and no shame in that).
Failing that, your best bet is likely the dd command. If the number reported is 134683757, then I'd guess that your lines are probably not more than 1 KiB each (adjust the numbers if they're bigger or smaller), and then use:
dd if=big.csv of=extract.csv bs=1 skip=$((134683757 - 3 * 1024)) count=6144
echo >> extract.csv
You'd then look at extract.csv. The raw dd output probably won't have a newline at the end of the last line (the echo >>extract.csv fixes that). The output will probably start part way through a record and end part way through another record. However, you're likely to have the relevant information, as well as some irrelevant information. As I said, adjust the numbers to suit your exact situation.
The trickiest part is identifying exactly where the byte offset is in the file you get. With custom C code, that can be provided easily (more easily). With the output from dd, you have to do the calculation yourself.
awk -v offset=$((134683757 - 3 * 1024)) '
{ printf "%9d: %s\n", offset, $0; offset += length($0) + 1 }
' extract.csv
That takes the starting offset from the dd command and prefixes the (possibly partial) first line with that offset; it then adds the line's length, plus one for the newline that length($0) doesn't count, and continues to the end of the file. That gives you the start offset of each line in the extracted data. You can see where your actual target was by looking at the offsets; you should be able to identify which record it fell in.
You could use a variant of this Awk script that reads the whole file line by line, and tracks the offset (as well as the line numbers) and prints the data when it gets to the vicinity of where you have the problem.
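A minimal sketch of that whole-file variant (the target offset and window size are placeholders, and a tiny demo file stands in for big.csv):

```shell
# Sketch of the whole-file variant: track each line's starting byte
# offset and print the lines that start near a target offset.
# Demo data stands in for big.csv; target/window are placeholders.
printf 'field1,field2\naaa,bbb\nccc,ddd\neee,fff\n' > /tmp/demo.csv
out=$(awk -v target=22 -v window=8 '
{
    if (offset >= target - window && offset <= target + window)
        printf "line %d @ byte %d: %s\n", NR, offset, $0
    offset += length($0) + 1          # +1 for the newline awk strips
}' /tmp/demo.csv)
printf '%s\n' "$out"
```

On the real file you would substitute big.csv and the reported offset (134683757), and widen the window to a few KiB.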
In times long past, I had to deal with data from 1/2 inch mag tapes (those big circular tapes you see in old movies) where the files generated on a mainframe seemed sanely formatted for the first few tens of megabytes, but then the format changed to some alternative format for a few megabytes, and then reverted to the original format once more. I never did find out why; I just learned how to deal with it. Trial and error!
I have a table keyed by time, e.g.
time | valA | valB
---- | ---- | ----
09:00| 1.4 | 1.2
09:05| 1.5 | 1.4
09:10| 1.5 | 1.4
I want to store this in a data structure and query values as of arbitrary times. E.g.
asof 09:01, valA = 1.4
asof 09:06, valB = 1.4
asof 09:14, valA = 1.5
What is the best way of structuring this in C++11? Which std::chrono datatype should I use to represent my times? How can I develop a solution that supports time zones? E.g. the times listed in my table may be in US/Central time and I may want to query using Australia/Sydney-based times.
To support local times in different time zones with <chrono> I recommend Howard Hinnant's free, open-source time zone library. This library is built on top of <chrono>, and uses the IANA time zone database to manage time zones.
Also, to handle time zones, you will need to store more than just time-of-day. You will need to store the entire date, as a time zone's UTC offset often varies with date. I.e. 09:05 Australia/Sydney doesn't really nail down a moment in time. But 2017-08-16 09:05 Australia/Sydney does.
Here is how you could create such a time stamp with <chrono> and the time zone library:
using namespace date;
using namespace std::chrono;
auto zt = make_zoned("Australia/Sydney", local_days{2017_y/aug/16} + 9h + 5min);
You can print it out like this:
std::cout << zt << '\n';
And the output is:
2017-08-16 09:05:00 AEST
If you want to find out the local time in US/Central that corresponds to this same moment in time:
auto zt2 = make_zoned("US/Central", zt);
std::cout << zt2 << '\n';
And the output is:
2017-08-15 18:05:00 CDT
date::zoned_time<std::chrono::seconds> is the type of zt and zt2 in these examples, and that is what I recommend you store. Under the hood this type is a pairing of {date::time_zone const*, std::chrono::time_point<system_clock, seconds>} (two words of storage).
Source code: https://github.com/HowardHinnant/date
Documentation: http://howardhinnant.github.io/date/tz.html
Video: https://www.youtube.com/watch?v=Vwd3pduVGKY
I've looked at this answer, which states that this problem can happen when the description file for the negative images is created with tools other than opencv_createsamples, but that is not the case here.
The break occurs somewhere between the fourth and the seventh stage. In another post, someone suggested that this message means the classifier cannot be improved, but with only 5 stages, it is at least odd.
For training, I'm using numPos=800 while the vec file (60x60 px) contains 1200 positive samples. Moreover, I'm using 1491 negative samples (30x30 px). I've made all kinds of changes to the parameters, and none of them worked.
For the last attempt I used the parameters as follows:
cascadeDirName: 15stages
vecFileName: pos.vec
bgFileName: neg_dir.txt
numPos: 800
numNeg: 1491
numStages: 15
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 60
sampleHeight: 60
boostType: GAB
minHitRate: 0.9999
maxFalseAlarmRate: 0.3
weightTrimRate: 0.9
maxDepth: 1
maxWeakCount: 100
mode: ALL
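(For reference, a parameter dump like the one above corresponds to an opencv_traincascade invocation along these lines; the file names are the ones from the listing, and this is a reconstruction of the command, not a copy of the one actually run.)

```shell
opencv_traincascade -data 15stages -vec pos.vec -bg neg_dir.txt \
    -numPos 800 -numNeg 1491 -numStages 15 \
    -precalcValBufSize 1024 -precalcIdxBufSize 1024 \
    -stageType BOOST -featureType HAAR -w 60 -h 60 \
    -bt GAB -minHitRate 0.9999 -maxFalseAlarmRate 0.3 \
    -weightTrimRate 0.9 -maxDepth 1 -maxWeakCount 100 -mode ALL
```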
I had the same problem. After doing a lot of research, I found the parameters that should be supplied to opencv_traincascade.
If you are using rectangular images, specify -w 24 -h 24. In addition, make sure you have more positives than negatives, and set -maxFalseAlarmRate 0.5.
That worked very well for me; I hope it is useful for you too.
I also had this problem before, but after I reduced the maxFalseAlarmRate parameter (setting it smaller than 0.1), it worked OK. Hope this helps.
The Fortran program I am working on encounters a runtime error when processing an input file:
At line 182 of file ../SOURCE_FILE.f90 (unit = 1, file = 'INPUT_FILE.1')
Fortran runtime error: Bad value during integer read
Looking at line 182, I see a READ statement with an implied DO loop:
182: READ(IT4, 310 )((IPPRM2(IP,I),IP=1,NP),I=1,16) ! read 6 integers
183: READ(IT4, 320 )((PPARM2(IP,I),IP=1,NP),I=1,14) ! read 5 reals
Format statement:
310 FORMAT(1X,6I12)
When I reach this code in the debugger, NP has a value of 2, I has a value of 6, and IP has a value of 67. I think I and IP should be reinitialized by the loop.
My problem is that when I try to step through in the debugger, once I get to the READ statement it seems to execute and then throw the error. I'm not sure how to follow it as it reads. I tried stepping into the statement, but that seems like a difficult route to take since I am unfamiliar with the gfortran library. The input file looks OK; I think it should be read just fine. This makes me think the READ statement isn't looping as intended.
I am completely new to Fortran and implied DO loops like this, but from what I can gather, line 182 should read 6 integers according to format statement 310. However, when I arrive, NP has a value of 2, which makes me think it will only try to read 2 integers 16 times.
How can I debug this READ statement to examine the values read into IPPRM2 as they are read from the file? Will I have to step through the Fortran library?
Any tips that can clear up my confusion regarding these implicit loops would be appreciated!
Thanks!
NOTE: I'm using gfortran/gcc and gdb on Linux.
Is there any reason you need specific formatting on the read? I would use READ(IT4, *) where feasible...
Later versions of gfortran support the Fortran 2008 unlimited format item (see http://fortranwiki.org/fortran/show/Fortran+2008+status).
Then it may be helpful to specify
310 FORMAT(*(1X,6I12))
Or, for older compilers, a large finite repeat count:
310 FORMAT(1000(1X,6I12))
The variables IP and I are loop indices, so they are reinitialized by the implied DO loops. With NP=2 the first statement is going to read a total of 32 integers: the implied DO loops determine the list of items to read, and the format determines how they are read. With "1X,6I12" they will be read as 6 integers per line of the input file. When the first 6 of the requested 32 integers have been read from a line/record, Fortran will consider that line/record completed and advance to the next record.
With a format of "1X,6I12" the integers must be precisely arranged in the file. There should be a single blank, then the integers should each be right-justified in fields of 12 columns. If they get out of alignment you could get the wrong value read or a runtime error.
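Not Fortran, but a quick Python sketch of the fixed-width slicing that "1X,6I12" implies may make the alignment requirement concrete (the helper name is made up for illustration):

```python
def parse_1x_6i12(line):
    """Slice one record laid out as (1X,6I12): one skipped column,
    then six right-justified integers in 12-column fields."""
    body = line[1:]                       # 1X: skip the single leading column
    return [int(body[i * 12:(i + 1) * 12]) for i in range(6)]

# A correctly aligned record: each integer right-justified in 12 columns.
record = " " + "".join(str(n).rjust(12) for n in (10, 20, 30, 40, 50, 60))
print(parse_1x_6i12(record))  # -> [10, 20, 30, 40, 50, 60]
```

If a value drifts out of its 12-column field, the slices pick up fragments of neighbouring numbers, which is exactly how the misalignment errors described above arise.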
Is there a way to find out what the number is that the build process assigns to the * when I have the assembly version set to something like 1.0.0.*?
I've been looking at the "EnvDTE" namespace in the macros, but haven't been able to find anything useful.
According to MSDN, if you have a version number in the form
major.minor.build.revision
and specify
1.0.*
then
major = 1
minor = 0
build = the number of days since January 1, 2000, local time
revision = the number of seconds since midnight, local time, divided by 2
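That rule can be sketched in Python (the function name is made up; it simply mirrors the MSDN description, assuming local time):

```python
from datetime import datetime

def auto_build_revision(now):
    """Mirror the documented auto-numbering: build = days since
    2000-01-01 (local time), revision = seconds since midnight
    (local time) divided by 2."""
    build = (now.date() - datetime(2000, 1, 1).date()).days
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    revision = int((now - midnight).total_seconds()) // 2
    return build, revision

print(auto_build_revision(datetime(2000, 1, 2, 0, 0, 10)))  # -> (1, 5)
```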