When I do an ncdump -h on some old I/O API NetCDF files, I always get the same dimensions:
dimensions:
TSTEP
DATE-TIME
LAY
VAR
ROW
COL
Are these exact dimensions and names always required in an I/O API NetCDF file? (If this is the convention, why does it require TSTEP and DATE-TIME? They sound redundant.)
The most recent documentation seems to be included in https://github.com/cjcoats/ioapi-3.2
There is a section called "FILES: Variables, Layers, and Time Steps" where you can read that "There are eight types of data currently supported by the I/O API." Without knowing your files exactly, I cannot tell precisely which format they should correspond to.
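If you want to check what a given file actually contains, a quick look with Python's netCDF4 package (an assumption on my part; ncdump -h shows the same information) is often easier than reading the spec:
from netCDF4 import Dataset   # assumes the netCDF4 package is installed

ds = Dataset('oldfile.nc')    # placeholder file name

# The six dimensions you listed are the standard I/O API layout.
for name, dim in ds.dimensions.items():
    print(name, len(dim))

# TSTEP and DATE-TIME are not redundant: TSTEP counts the time steps,
# while DATE-TIME (length 2) indexes the date/time pair that the TFLAG
# variable stores for each step (Julian date YYYYDDD and time HHMMSS).
print(ds.variables['TFLAG'].dimensions)   # typically ('TSTEP', 'VAR', 'DATE-TIME')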
How to convert image.png or image.bmp to integer array? (do not use any non-standard library)
Please ignore chunks that are not directly related to the image data (IHDR, IEND, etc.).
Thank you very much.
SOLVED: I should use the binary I/O functions in stdio.h to read the image file. Thanks.
If you have to read images into arrays without any image-processing libraries, you need two things:
You need means to read files in general.
You need to know the internal structure of the file formats you want to read.
So for PNG, refer to https://www.w3.org/TR/2003/REC-PNG-20031110/
This document will tell you where to find the image dimensions, pixel data and other features. It's basically a manual for software developers on how to use this standard format properly.
Some image formats will require additional work, like decompression.
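For example, here is a minimal Python sketch of the idea for an uncompressed 24-bit BMP (the file name is a placeholder, and the same approach works in C with fread from stdio.h); the offsets come from the BMP specification:
import struct

def read_bmp(path):
    # Read an uncompressed 24-bit BMP into a list of rows of (R, G, B) tuples.
    with open(path, 'rb') as f:
        data = f.read()

    if data[:2] != b'BM':
        raise ValueError('not a BMP file')

    pixel_offset = struct.unpack_from('<I', data, 10)[0]  # where the pixel array starts
    width, height = struct.unpack_from('<ii', data, 18)   # image dimensions
    bpp = struct.unpack_from('<H', data, 28)[0]            # bits per pixel
    if bpp != 24:
        raise ValueError('this sketch only handles 24-bit BMPs')

    row_size = (width * 3 + 3) & ~3   # each row is padded to a multiple of 4 bytes
    rows = []
    for y in range(abs(height)):
        start = pixel_offset + y * row_size
        row = [tuple(reversed(data[start + 3 * x:start + 3 * x + 3]))  # BGR -> RGB
               for x in range(width)]
        rows.append(row)
    return rows[::-1] if height > 0 else rows  # rows are stored bottom-up when height > 0

pixels = read_bmp('image.bmp')  # placeholder path
print(len(pixels), 'rows x', len(pixels[0]), 'columns')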
I trained my model in NVIDIA DIGITS 5 and I would now like to extract the accuracy and loss plots that were generated during training for a report. Is this data saved somewhere so that it would be possible to extract it for these plots, plot it in Python, and perhaps ultimately modify the plots to compare different models, etc.?
The best solution I have found is to either look at the HTML file or to scan the text file caffe_output.log that is produced by Caffe. The text file is usually stored in /var/digits/jobs/insert_your_job_id/, but on Linux systems you can also just run:
locate caffe_output.log
Go to your DIGITS job folder and locate your job's subfolder. Inside you'll find a file status.pickle, which is a pickled object containing all your job's information.
You can load it in Python like so:
import digits  # imported so pickle can resolve the DIGITS job classes
import pickle

with open('status.pickle', 'rb') as f:
    data = pickle.load(f)
This object is somewhat generic and may contain multiple tasks. For a typical classification task it will likely be just one, but you will still need to access it via data.tasks[0]. From there you can grab the plots:
data.tasks[0].combined_graph_data()
which returns a somewhat convoluted dict (unfortunately, since your network can produce many accuracy/loss outputs, as well as custom ones). It contains everything you need, though; I managed to plot accuracy with:
import matplotlib.pyplot as plt
plt.plot(data.tasks[0].combined_graph_data()['columns'][2][1:])
but it's likely that you'll have to write a bit of custom code. As always, dir() is your friend.
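If you want all the curves rather than one hard-coded column, something along these lines should work; note that I am assuming (as above) that each entry in 'columns' is a list whose first element is the series name:
import digits  # imported so pickle can resolve the DIGITS job classes
import pickle
import matplotlib.pyplot as plt

with open('status.pickle', 'rb') as f:
    data = pickle.load(f)

graph = data.tasks[0].combined_graph_data()
for column in graph['columns']:
    name, values = column[0], column[1:]  # assumed layout: [series name, v1, v2, ...]
    plt.plot(values, label=str(name))

plt.xlabel('point index')
plt.legend()
plt.show()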
How can I read a Garmin .fit file on Linux? I'd like to use it for some data analysis, but the file is a binary file.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks
You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out other data, i.e. if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in XML and human-readable, but I haven't yet found a single-line solution for getting that data into a CSV.
The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When you unzip it, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested with a couple of files and it seems to work really well. Of course, the format of the CSV can be a little complex.
For example, latitude and longitude are stored in semicircles, so they should be multiplied by 180/(2^31) to give coordinates in degrees.
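The conversion is just a scale factor, so in Python it is a one-liner (the raw value below is only an illustration):
semicircles = 501336821                  # example raw value from the CSV
degrees = semicircles * (180.0 / 2**31)  # FIT stores angles as signed 32-bit semicircles
print(degrees)                           # roughly 42.0 degrees for this example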
You need to convert the file to a .csv; the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine; try again, as it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses; which one you choose depends on the use. GPSBabel is another converter tool that may help: gpsbabel.org (I can't post two links yet :)
This page parses the file and lets you download it as tables: https://www.fitfileviewer.com/. The fun bit is converting the timestamps from numbers to readable timestamps (see the related question "Garmin .fit file timestamp").
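For reference, FIT timestamps count seconds since 1989-12-31 00:00:00 UTC, so converting them in Python looks roughly like this (the raw value is only an illustration):
from datetime import datetime, timezone

FIT_EPOCH_OFFSET = 631065600  # seconds between the Unix epoch and 1989-12-31 00:00:00 UTC

def fit_timestamp_to_datetime(raw):
    # Convert a raw FIT timestamp (seconds since 1989-12-31 UTC) to a UTC datetime.
    return datetime.fromtimestamp(raw + FIT_EPOCH_OFFSET, tz=timezone.utc)

print(fit_timestamp_to_datetime(1000000000))  # example raw value only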
I'm new to protocol buffers and I was wondering whether it is possible to search a protocol buffers binary file and read the data in a structured format. For example, if a message in my .proto file has 4 fields, I would like to serialize the message and write multiple messages into a file, and then search for a particular field in the file. If I find the field, I would like to read back the message in the same structured format as it was written. Is this possible with protocol buffers? If possible, any sample code or examples would be very helpful. Thank you.
You should treat the protobuf library as a serialization protocol, not an all-in-one library that supports complex operations (such as querying, indexing, or picking out particular data). Google has various libraries on top of the open-sourced portion of protobuf to do so, but they are not released as open source, as they are tied to their unique infrastructure. That being said, what you want is certainly possible, yet you need to write some code.
Anyhow, some of your requirements are:
one file contains various serialized binaries.
search a particular field in each serialized binary and extract that chunk.
There are several ways to achieve them.
The most popular way for serial read/write is for the file to contain a series of [size, type, serialization output] records. That is, each serialized output is prefixed by a size and type (either 4/8-byte or variable-length) to help with reading and parsing. So you just repeat this procedure: 1) read the size and type, 2) read the binary with the given size, 3) parse it with the given type, 4) go to 1). If you use a union type, or one file shares the same type, you may skip the type. You cannot drop the size, as there is no way to know the end of an output by itself. If you want random read/write, another type of data structure is necessary.
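A minimal Python sketch of that framing, assuming a single message type per file and a 4-byte size prefix (MyMessage and my_messages_pb2 are hypothetical names for whatever protoc generates from your .proto):
import struct
from my_messages_pb2 import MyMessage  # hypothetical generated module

def write_messages(path, messages):
    # Append each message as [4-byte little-endian size][serialized bytes].
    with open(path, 'wb') as f:
        for msg in messages:
            payload = msg.SerializeToString()
            f.write(struct.pack('<I', len(payload)))
            f.write(payload)

def read_messages(path):
    # Repeat: read the size, read that many bytes, parse with the known type.
    with open(path, 'rb') as f:
        while True:
            header = f.read(4)
            if not header:
                break  # end of file
            size = struct.unpack('<I', header)[0]
            msg = MyMessage()
            msg.ParseFromString(f.read(size))
            yield msg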
'Searching for a field' in a binary file is trickier. One way is to read/parse the outputs one by one and check the existence of the field with HasField(). It is the most obvious and slow, yet straightforward, way to do so. If you want to search for a field by number (say, you want to search for 'optional string email = 3;'), and thus search for a binary blob (like 0x1A: field number 3, wire type 2), it is not possible. In a serialized binary stream, field information is saved as merely a number. Without the exact context (the .proto scheme or the binary file's structure), the number alone doesn't mean anything. There is no guarantee that 0x1A comes from that field's tag; it could be field information from another message type, or actually the number 26, or part of another number, etc. That is, you need to maintain that information yourself. You may create another file or a database with the necessary information to fetch a particular message (like the location of the serialization output with a given field).
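Continuing the sketch above, the slow-but-straightforward scan would then look roughly like this (again, MyMessage and the email field are hypothetical):
# Keep only the messages in which the optional field is actually set.
matches = [msg for msg in read_messages('messages.bin') if msg.HasField('email')]
print(len(matches), 'messages have the email field set')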
Long story short, what you ask for is beyond what the open-sourced protobuf library itself does, yet you can write it yourself to fit your requirements.
I hope this is what you are looking for:
http://temk.github.io/protobuf-utils/
This is a command-line utility for searching within a protobuf file.
The word file here refers to the shell file command, not actual files. I want to determine whether a file is, for example, a video file (.mpg, .mkv, .avi). file is pretty good at returning image for image files, video for video files, and audio for audio files (and, for some reason, application/x-empty for text). My question is how reliable this is for identifying types. If I did a simple
file -ib deliverance.avi | grep video
would that work for all of the main video files outlined here?
The results from file are less than perfect, and it has more problems with some types of files than others. file basically just looks for particular pieces of binary data in predictable patterns to figure out filetypes.
Unfortunately, in particular, some of the filetypes often used for video fall into this "problematic" category. The newer container formats like .mp4 and .mkv usually have several different MIME types that should properly depend on what type of data is being contained. For example, an .mp4 could properly be identified as video/mp4, audio/mp4, or application/mp4 depending on the content.
In practice, file often makes guesses that simply conform with common usage, and it may work perfectly well for you. For example, while I mentioned some theoretical difficulties with identifying Matroska files correctly, file basically just assumes that any Matroska file is a video. On the other hand, the usage of the Ogg container is more evenly split between audio and video, and I believe the current version of file just splits the difference, and identifies Ogg files as application/ogg, which wouldn't fall into any of your categories.
The one thing I can say with certainty is that you want the most up-to-date version of file you can get your hands on. The "magic" files that contain the patterns to match against and the MIME types that will result from a match are updated fairly often to include newer filetypes like WebM, or just to improve accuracy for older types.
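If it helps, a small Python wrapper around the same idea (calling file and bucketing on the MIME prefix) might look like this; it is only a sketch and assumes file is on your PATH:
import subprocess

def media_category(path):
    # Return 'video', 'audio', or 'image', or the full MIME type otherwise.
    # -b: brief output; --mime-type: print only the MIME type (e.g. video/x-msvideo)
    mime = subprocess.run(['file', '-b', '--mime-type', path],
                          capture_output=True, text=True, check=True).stdout.strip()
    prefix = mime.split('/')[0]
    return prefix if prefix in ('video', 'audio', 'image') else mime

print(media_category('deliverance.avi'))  # file name taken from the question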
file works by referencing the header of the file against a "magic number" file. I suspect the best way to see how robust file is would be to check your local magic-number file (possibly /usr/share/magic, but see man file for details) for the file types from your referenced list.
It seems like it should work for most video/audio/image files. But if it doesn't, there's actually a file that contains the relations between a file's magic number and its type:
The information identifying these files is read from the compiled magic file /usr/share/magic.mgc, or /usr/share/magic if the compiled file does not exist.
see:
http://linux.about.com/library/cmd/blcmdl1_file.htm
Hope this helps!