How do I read ASCII files from internet URLs directly into Stata?

I'm trying to read a set of files (e.g. https://www.fiscal.treasury.gov/files/reports-statements/mts/mts1103.txt) into Stata.
Because I want a number of months and years from the source website (https://www.fiscal.treasury.gov/reports-statements/mts/previous.html), I was hoping I could read each file directly from its URL into Stata using a foreach loop and then clear/append them together, as I've done many times before. (I don't want to download them all individually.)
Something like this:
foreach yr in 04 05 06 07 08 09 {
    foreach month in 01 02 03 04 05 06 07 08 09 10 11 12 {
        insheet using "https://www.fiscal.treasury.gov/files/reports-statements/mts/mts`month'03.txt", clear
    }
}
Would something like this be possible? What would I need to do to make this work with an ASCII file?

The code should work (do note Nick Cox's comment on the year local). I would personally recommend that you copy the files to your local drive first, because you cannot trust that those files will remain available forever.
You can use Stata's copy command to do that.
This has the additional advantage that you split the download and parsing parts - maybe insheet fails to handle the particular format of one file and causes issues. In general, modular code is almost always a good idea.
Don't forget to include a save command in the loop, or all your hard work will be for nothing (been there).
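As an illustration of the download-first pattern, here is a minimal Python sketch that builds the list of files up front (the URL pattern and file-name scheme are assumed from the question, and, unlike the original loop, the year local is actually used):

```python
# Build the list of monthly MTS report URLs before downloading anything.
# Assumed naming pattern from the question: mts<MM><YY>.txt
BASE = "https://www.fiscal.treasury.gov/files/reports-statements/mts"

def mts_urls(years, months):
    """Return (url, local_name) pairs for every year/month combination."""
    pairs = []
    for yr in years:
        for month in months:
            name = f"mts{month}{yr}.txt"
            pairs.append((f"{BASE}/{name}", name))
    return pairs

pairs = mts_urls(["04", "05"], [f"{m:02d}" for m in range(1, 13)])
print(len(pairs))        # 2 years x 12 months = 24 files
print(pairs[0][0])       # first URL, for January 2004
```

Each (url, local name) pair can then be fed to a downloader such as Stata's copy command, and the saved local files parsed with insheet in a separate loop, keeping the two steps modular as suggested above.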

Related

I am using the Dark Sky API for a weather app; what does this time value mean?

Part of the API response shows {"time":1578475688}. This doesn't match the actual time: it was around 5:27 pm when I received it, and I am wondering what this number means.
What you got was a UNIX timestamp. A UNIX timestamp is the number of seconds that have elapsed since Jan 01 1970 (UTC).
You can't read the ordinary date and time directly from that number, but you can convert it on websites like this one: https://www.unixtimestamp.com/index.php
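For example, converting the exact number from the question (a sketch in Python; any language with a standard date library works the same way):

```python
from datetime import datetime, timezone

# Convert the UNIX timestamp from the question into a readable UTC time.
ts = 1578475688
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.isoformat())  # 2020-01-08T09:28:08+00:00
```

09:28 UTC is about 5:28 pm in a UTC+8 timezone, which would line up with the local time you observed.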
Hopefully this can help you understand what the number you got is.

Need to find the maximum difference between two consecutive timestamps out of multiple timestamps that are logged in a single log file

I have some log files written on a Unix server by a front-end application. Each logging statement in these files starts with a timestamp value followed by the logging text. A sample of the logging is shown below:
02 07:31:05.578 logging text........(I will use this first timestamp to explain the notation below)
02 07:31:05.579 logging text........
02 07:31:05.590 logging text........
02 07:31:05.591 logging text........
02 07:31:05.593 logging text........
Time stamp value explanation -
02 : Day of month (if the date is July 02, the value will be 02)
07 : Hours
31 : Minutes
05 : Seconds
578 : Milliseconds
Note: there is no 'YYYY' (year) field. For simplicity, please stick to the above format only.
What I have to achieve: I have to find the two consecutive timestamps in a given file with the maximum difference between them, compared to all other pairs of consecutive timestamps in that file.
Example: in the logging sample above, the pair of consecutive timestamps with the maximum difference is
02 07:31:05.579 and 02 07:31:05.590
I am looking for a shell script that I can run on the required file and get the output as the two consecutive timestamps that have the maximum difference.
Why I need it : There are many such log files that I need to monitor for the cases where there is a huge difference between the logging statements. This could potentially help me find out situations like SQL query is waiting for long for the transaction to happen due to locks, API request is not getting the response from the destination etc.
If anyone can also share links to related posts, or any other efficient approach, that would be helpful.
Thank you everyone for reading and taking the time. Please let me know if any more information is required.
What you could do is write a script with the awk command.
You have examples here on how to convert dates with awk: Converting dates in AWK.
This will help you parse the file, and add 2 columns at the beginning of each line:
line number
difference compared to previous line
Then you have to sort the resulting file using the second column, and you are done.
Of course, it would be too easy if I wrote the script for you (and it would take time I don't really have). So try the above on your own, and then come back with specific questions. As it stands, your question is too broad compared with the on-topic questions of SO.
I would propose walking through the lines and converting every timestamp into a UNIX epoch time (seconds since 1970-01-01; date can do this). Unfortunately you lack the month and year, but maybe you can just assume the current month and year; except at month boundaries this should still give correct results for the distances.
Then I would print each line again, prefixed with the difference between its timestamp and the previous one. So out of
02 07:31:05.579 logging text........
02 07:31:05.590 logging text........
02 07:31:05.591 logging text........
02 07:31:05.593 logging text........
I would make
0.000 02 07:31:05.579 logging text........
0.011 02 07:31:05.590 logging text........
0.001 02 07:31:05.591 logging text........
0.002 02 07:31:05.593 logging text........
Then you can simply sort -g this new output to sort it by the time between each line and its predecessor. The last line will be the line with the maximum timestamp difference.
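The same convert-then-compare idea can be sketched in Python (the fixed 15-character timestamp prefix and the missing month/year are assumptions taken from the question's format):

```python
from datetime import timedelta

# Parse "DD HH:MM:SS.mmm" timestamps (no month/year, as in the question)
# and find the pair of consecutive lines with the largest gap.
def parse_ts(line):
    stamp = line[:15]                      # e.g. "02 07:31:05.578"
    day, clock = stamp.split(" ", 1)
    h, m, s = clock.split(":")
    sec, ms = s.split(".")
    return timedelta(days=int(day), hours=int(h), minutes=int(m),
                     seconds=int(sec), milliseconds=int(ms))

def max_gap(lines):
    stamps = [parse_ts(l) for l in lines]
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(stamps, stamps[1:]))]
    gap, i = max(gaps)
    return lines[i][:15], lines[i + 1][:15], gap

log = [
    "02 07:31:05.578 logging text",
    "02 07:31:05.579 logging text",
    "02 07:31:05.590 logging text",
    "02 07:31:05.591 logging text",
    "02 07:31:05.593 logging text",
]
print(max_gap(log))  # the .579/.590 pair, 11 ms apart
```

This loads everything into memory and sorts nothing, so for very large files the streaming awk/sort pipeline described above may be preferable.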

Reverse engineering iWork '13 formats

Prior versions of Apple's iWork suite used a very simple document format:
documents were Bundles of resources (folders, zipped or not)
the bundle contained an index.apxl[z] file describing the document structure in a proprietary but fairly easy to understand schema
iWork '13 has completely redone the format. Documents are still bundles, but what was in the index XML file is now encoded in a set of binary files with type suffix .iwa packed into Index.zip.
In Keynote, for example, there are the following iwa files:
AnnotationAuthorStorage.iwa
CalculationEngine.iwa
Document.iwa
DocumentStylesheet.iwa
MasterSlide-{n}.iwa
Metadata.iwa
Slide{m}.iwa
ThemeStylesheet.iwa
ViewState.iwa
Tables/DataList.iwa
for MasterSlides 1…n and Slides 1…m
The purpose of each of these is quite clear from their naming. The files even appear uncompressed, with essentially all content text directly visible as strings among the binary blobs (albeit with some RTF/NSAttributedString-related garbage in the midst of the readable ASCII characters).
I have posted the unpacked Index of a simple example Keynote document here: https://github.com/jrk/iwork-13-format.
However, the overall file format is non-obvious to me. Apple has a long history of using simple, platform-standard formats like plists for encoding most of their documents, but there is no clear type tag at the start of the files, and it is not obvious to me what these iwa files are.
Do these files ring any bells? Is there evidence they are in some reasonably comprehensible serialization format?
Rummaging through the Keynote app runtime and class dumps with F-Script, the only evidence I've found is for some use of Protocol Buffers in the serialization classes which seem to be used for iWork, e.g.: https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/iWorkImport.framework/TSPArchiverBase.h.
Quickly piping a few of the files through protoc --decode_raw with the first 0…16 bytes lopped off produced nothing obviously usable.
I've done some work reverse engineering the format and published my results here. I've written up a description of the format and provided a sample project as well.
Basically, the .iwa files are Protobuf streams compressed using Snappy.
Hope this helps!
Interesting project, I like it! Here is what I have found so far.
The first 4 bytes of each of the iwa files appear to be a length, with a tweak. So it looks like there will not be any 'magic' to verify file type.
Look at Slide1.iwa:
First 4 bytes are 00 79 02 00
File size is 637 bytes
take the first 00 off, and reverse the bytes: 00 02 79
00 02 79 == 633
637 - 633 = 4, which accounts for the 4 header bytes that hold the size of the file.
This checks out for the 4 files I looked at: Slide1.iwa, Slide2.iwa, Document.iwa, DocumentStylesheet.iwa
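A quick sketch of that header interpretation in Python (the observation that this matches a leading 0x00 type byte plus a 24-bit little-endian length, as used in Snappy chunk framing, is my assumption, but it fits the Snappy finding in the other answer):

```python
# Check the observed length header of an .iwa file: drop the first byte,
# read the remaining three bytes little-endian, and add the 4 header
# bytes back to recover the total file size.
def iwa_payload_length(header: bytes) -> int:
    """Interpret the first 4 bytes of an .iwa file as described above."""
    assert header[0] == 0x00, "expected leading zero byte"
    return int.from_bytes(header[1:4], "little")

header = bytes([0x00, 0x79, 0x02, 0x00])   # first 4 bytes of Slide1.iwa
length = iwa_payload_length(header)
print(length)          # 633
print(length + 4)      # 637, the observed file size
```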

How can I programmatically draw a graph of a continuously growing data set from a file?

I have a memorylog.csv file which keeps filling up with data every second. I want to keep drawing a timeline GUI graph from memorylog.csv in parallel: the graph should keep updating as new data arrives in the file. How can I achieve that programmatically, using gnuplot (or another utility)?
Sample data set:
Fri Aug 2 04:46:59 IST 2013,14576,28823,24128,2050
Fri Aug 2 04:47:00 IST 2013,14580,28823,24187,1992
Fri Aug 2 04:47:01 IST 2013,14584,28823,24245,1933
Fri Aug 2 04:47:03 IST 2013,14604,28823,24303,1875
Fri Aug 2 04:47:04 IST 2013,14636,28823,24361,1817
Fri Aug 2 04:47:05 IST 2013,14668,28823,24421,1757
Fri Aug 2 04:47:06 IST 2013,14708,28823,24479,1699
I want the timestamp values on the x-axis and the remaining four values on the y-axis.
Put something like this into a script continuous.gp:
plot '<tail -n 100 data'
pause 1
reread
and run it like gnuplot continuous.gp. This will replot, every second, the last 100 entries. Unfortunately, this will cause the plot window to rise to the foreground of your display each time, which may or may not be what you want. Also, you need to figure out how to get gnuplot to interpret the timestamps. I think you need to format them in a way that consists of numbers only (although it can display them in any format).
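One way to get numbers-only timestamps, as suggested above, is a small preprocessing step before gnuplot sees the file. A Python sketch (it treats each stamp as UTC and ignores the IST offset, which is fine for plotting relative times):

```python
import calendar
from datetime import datetime

# Convert the textual timestamps from memorylog.csv into plain epoch
# seconds so gnuplot can treat the x column as an ordinary number.
def to_epoch(line):
    stamp, rest = line.split(",", 1)
    dt = datetime.strptime(stamp, "%a %b %d %H:%M:%S IST %Y")
    return f"{calendar.timegm(dt.timetuple())},{rest}"

print(to_epoch("Fri Aug 2 04:46:59 IST 2013,14576,28823,24128,2050"))
# 1375418819,14576,28823,24128,2050
```

Running each new line through a filter like this (or an equivalent awk one-liner) gives gnuplot a purely numeric first column to plot against.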
Another possibility to consider is rrdtool. You feed data values to this tool and it will keep running summaries (min, max, avg) for the last minute, last hour, last day, and so on. These are visualized as graphs on a web page. Basically, it does exactly what you are asking for.

FTP List Command Response Structure Issue

I've been beating my head against a wall on this issue so far. My server presently responds to the LIST -a command like this:
drwxr-xr-x 1 owner group 1 Feb 21 04:37 test
drwxr-xr-x 1 owner group 129024 Feb 21 11:05 tardis.mp3
For some reason, the second one is being parsed in FileZilla as a folder instead of a file. Long story short, it's not a folder. I know I'm missing something; FileZilla also seems unable to see the file size.
Anyone have any thoughts on why FileZilla can't parse this entry correctly? What am I missing?
Unfortunately, the LIST command output is meant for human reading and not for machine parsing. The output format is not standardised and as such not easy to parse. It might work for your particular case, but it might break if you change the FTP server software or change the locale.
As you mentioned FileZilla, you can have a look at the FileZilla directory parser for LIST command.
The best approach nowadays is to use the MLSD command for Listings for Machine Processing as the output format is well-defined and easy to parse.
The second line in the listing is a directory: it has d in the first character of the permissions field. The size doesn't matter; directories have a size field too.
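To make the type flag concrete, here is a minimal Python sketch of the check a client performs on each listing line (a hypothetical helper, not FileZilla's actual parser):

```python
# The first character of the permissions field is the type flag:
# 'd' = directory, '-' = regular file. Both lines in the question
# start with 'd', so any client will show them both as folders.
def entry_type(listing_line):
    return "directory" if listing_line[0] == "d" else "file"

print(entry_type("drwxr-xr-x 1 owner group 129024 Feb 21 11:05 tardis.mp3"))
# directory -- what the server currently sends
print(entry_type("-rw-r--r-- 1 owner group 129024 Feb 21 11:05 tardis.mp3"))
# file -- what it should send for a regular file
```

So the fix is on the server side: emit '-' as the first character for regular files, and clients will parse the size and type as expected.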
