read a .fit file on Linux - converters

How can I read Garmin's .fit files on Linux? I'd like to use the data for some analysis, but the file is binary.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks

You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out the other data, i.e. speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in XML and human-readable, but I haven't yet found a single-line solution for getting that data into a CSV.
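If you do route the data through a .gpx first, a short script can flatten the trackpoints into a CSV afterwards. This is only a minimal sketch in Python, assuming a standard GPX track with <time> elements; speed and heart rate, if present, usually live in device-specific <extensions> and would need extra handling. The file names are placeholders.
import csv
import xml.etree.ElementTree as ET

def strip_ns(tag):
    # drop the XML namespace: "{http://www.topografix.com/GPX/1/1}trkpt" -> "trkpt"
    return tag.rsplit("}", 1)[-1]

def gpx_to_csv(gpx_path, csv_path):
    tree = ET.parse(gpx_path)
    rows = []
    for elem in tree.iter():
        if strip_ns(elem.tag) == "trkpt":
            row = {"lat": elem.get("lat"), "lon": elem.get("lon"), "ele": "", "time": ""}
            for child in elem:
                name = strip_ns(child.tag)
                if name in ("ele", "time"):
                    row[name] = child.text or ""
            rows.append(row)
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["lat", "lon", "ele", "time"])
        writer.writeheader()
        writer.writerows(rows)

gpx_to_csv("ride.gpx", "ride.csv")  # placeholder file names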

The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When you unzip it, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested it with a couple of files and it seems to work really well, though the format of the CSV can be a little complex.
For example, latitude and longitude are stored in semicircles, so they need to be multiplied by 180/2^31 to get degrees.
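If you want to do that conversion yourself on the CSV output, it's a one-liner; a small Python sketch (the raw value is made up):
def semicircles_to_degrees(semicircles):
    # FIT stores positions as signed 32-bit integers spanning -180..180 degrees
    return semicircles * (180.0 / 2**31)

print(semicircles_to_degrees(622585898))  # made-up raw value, prints roughly 52.18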

You need to convert the file to a .csv; the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine; try again, it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats; which one you choose depends on the use. GPSBabel is another converter tool that may help: gpsbabel.org (I can't post two links yet :)

This page parses the file and lets you download the data as tables: https://www.fitfileviewer.com/ The fun bit is converting the timestamps from raw numbers into readable dates (see the related question on Garmin .fit file timestamps).
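For what it's worth, the timestamp conversion itself is mechanical: FIT timestamps count seconds since 1989-12-31 00:00:00 UTC, which is 631065600 seconds after the Unix epoch. A minimal Python sketch:
from datetime import datetime, timezone

FIT_EPOCH_OFFSET = 631065600  # seconds between 1970-01-01 and 1989-12-31 00:00:00 UTC

def fit_timestamp_to_datetime(fit_seconds):
    return datetime.fromtimestamp(fit_seconds + FIT_EPOCH_OFFSET, tz=timezone.utc)

print(fit_timestamp_to_datetime(1000000000))  # made-up raw value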

Related

Read a tiff image in lua löve (love2d)

I wrote a little program using lua LÖVE. Now I would like to make it read some TIFF files, since LÖVE does not support this image format. And I failed.
Basically, LÖVE can read the file from some userdata. I thought that I might read the data with another library and convert it internally to a format that LÖVE supports, but I can't find anything suitable. I looked at the graphicsmagick Lua bindings, but unfortunately they do not appear to be up to date. I tried to get them to run, but gave up after a while; I would probably have to rewrite the whole package and I can't even find some of the modules it uses (for example the "sys" module).
EDIT: Some more background. I need a fast image viewer to quickly browse through files on the disk. I do not like to use the file manager for that purpose, and I would like it to behave exactly as I want it to behave. I was using xzgv for this purpose for years.
When I discovered Lua and LÖVE, I decided to write one both as an exercise and because I want a little tool like that (you can see what it looks like here).
Here is a solution which does not require any extra Lua libraries. The idea is to convert the image with the convert program from the ImageMagick suite and pipe its output to a file handle with io.popen. That way the file is read from storage only once.
-- convert the TIFF to JPEG on the fly and read the result from convert's stdout
local cmd = '/usr/bin/convert "%s" jpg:-'
local file = "test.tiff"
local fh = io.popen(cmd:format(file), "r")
local fdata = fh:read("*a") -- read the whole converted image
fh:close()
-- wrap the raw bytes in a FileData object so LÖVE can decode them
fdata = love.filesystem.newFileData(fdata, file)
local img = love.graphics.newImage(fdata)

Is there a way to change the projection in a topojson file?

I am trying to create a topojson file projected using geoAlbersUsa, originating from the US Census's ZCTA (ZIP Codes, essentially) shapefile. I was able to work through the examples in the excellent https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c using the maps it specifies, and now I'm trying to get the same result using the ZIP-Code-level shapefiles.
I keep running into various issues due to the size of the file and the length of the strings within the file. While I have been able to create a geojson file and a topojson file, I haven't been able to give it the geoAlbersUsa projection I want. I was hoping to find something to convert the current topojson file into a topojson file with a geoAlbersUsa projection but I haven't been able to find any way.
I know this can be done programmatically in the browser, but everything I've read indicates that performance will be significantly better if as much as possible can be done in the files themselves first.
Attempt 1: I was able to convert the ZCTA-level shapefile to a geojson file successfully using shp2json (as in Mike Bostock's example) but when I try to run geoproject (from d3-geo-projection) I get errors related to excessive string length. In node (using npm) I installed d3-geo-projection (npm install -g d3-geo-projection) then ran the following:
geoproject "d3.geoAlbersUsa()" < us_zips.geojson > us_zips_albersUsa.json
I get errors stating "Error: Cannot create a string longer than 0x3fffffe7 characters"
Attempt 2: I used ogr2ogr (https://gdal.org/programs/ogr2ogr.html) to create the geojson file (instead of shp2json), then tried to run the same geoproject code as above and got the same error.
Attempt 3: I used ogr2ogr to create a geojson sequence file (instead of a regular geojson file), then ran geo2topo to create the topojson file from the geojson sequence. While this succeeded in creating the topojson file, it still doesn't include the geoAlbersUsa projection.
I get from the rather obtuse documentation of ogr2ogr that an output projection can be specified using -a_srs but I can't for the life of me figure out how to specify something that would get me the geoAlbersUsa projection. I found this reference https://spatialreference.org/ref/sr-org/44/ but I think that would get me the Albers and it may chop off Alaska and Hawaii, which is not what I want.
Any suggestions here? I was hoping I'd find a way to change the projection in the topojson file itself since that would avoid the excessively-long-string issue I seem to run into whenever I try to do anything in node that requires the use of the geojson file. It seems like possibly that was something that could be done in earlier versions of topojson (see Ways to project topojson?) but I don't see any way to do it now.
Not quite an answer, but more than a comment..
So, I Googled just "0x3fffffe7" and found this comment on a random GitHub/Node.js project, and based on reading it, my gut feeling is that the Node stuff and/or the D3 stuff you're using is reducing your entire ZCTA-level shapefile down to... a single string stored in memory!! That's not good for a continent-scale map with such granular detail.
Moreover, the person who left that comment suggested that the OP in that case would need a different approach to introduce their dataset to the client (which I suppose is a browser?). In your case, might it work if you query out each state's collection of ZIPs into its own shapefile (ogr2ogr can do this using OGR_SQL), which would give you 50 different shapefiles? Then for each of these, run them through your conversions to get json/geoalbers. To test this concept, try exporting just one state and see if everything else works as expected.
That being said, I'm concerned that your approach to this project has an unworkable UI/architectural expectation: I just don't think you can put that much geodata in a browser DIV! How big is the DIV, full screen I hope?!?
My advice would be to think of a different way to present the data. For example, an inset DIV to "select your state"; clicking a state zooms the main DIV to a larger view of that state and simultaneously pulls down and renders that state's specific ZCTA-level data using the 50 files you prepped using the strategy I mentioned above. Does that make sense?
Here's a quick example for how I expect you can apply the OGR_SQL to your scenario, adapt to fit:
ogr2ogr idaho_zcta.shp USA_zcta.shp -sql "SELECT * FROM USA_zcta WHERE STATE_NAME = 'ID'"
Parameters as follows:
idaho_zcta.shp < this is your new file
USA_zcta.shp < this is your source shapefile
-sql < this signals the OGR_SQL query expression
As for the query itself, a couple of tips. First, wrap the whole query string in double quotes. If something weird happens, try adding leading and trailing spaces to the start and end of your query, like:
" SELECT ... 'ID' "
It's odd I know, but I once had a situation where it only worked that way.
Second, relative to the query, the table name is the same as the shapefile name, only without the ".shp" file extension. I can't remember whether or not there is case-sensitivity between the shapefile name and the query string's table name. If you run into a problem, give the shapefile an all-lowercase name and use lowercase in the SQL, too.
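If the single-state test works, you can script the remaining exports instead of typing 50 commands by hand. Here is a rough Python sketch that drives ogr2ogr with subprocess; the state codes and the STATE_NAME attribute are just carried over from the example above, so check the real field names with ogrinfo before relying on them.
import subprocess

# carried over from the example above; extend to all 50 states and
# verify the attribute name with ogrinfo first
STATES = ["ID", "MT", "WY"]

for code in STATES:
    out_shp = "zcta_%s.shp" % code
    query = "SELECT * FROM USA_zcta WHERE STATE_NAME = '%s'" % code
    # no shell involved, so the query string needs no extra quoting
    subprocess.run(["ogr2ogr", out_shp, "USA_zcta.shp", "-sql", query], check=True)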
As for your projection conversion--you're on your own there. That geoAlbersUsa looks like it's not an industry-standard (i.e. EPSG-coded) projection and is D3-specific, intended exclusively for a browser. So ogr2ogr isn't going to handle it. But I agree with the strategy of converting the data in advance. Hopefully the conversion pipeline you already researched will work if you just have much smaller (i.e. state-scale) datasets to put through it.
Good luck.

How do I effectively identify an unknown file format

I want to write a program that parses yum config files. These files look like this:
[google-chrome]
name=google-chrome - 64-bit
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub
This format looks like it is very easy to parse, but I do not want to reinvent the wheel. If there is an existing library that can generically parse this format, I want to use it.
But how to find a library for something you can not name?
The file extension is no help here. The term ".repo" does not yield any general results besides yum itself.
So, please teach me how to fish:
How do I effectively find the name of a file format that is unknown to me?
Identifying an unknown file format can be a pain.
But you have some options. I will start with a very obvious one.
Ask
Showing other people the format is maybe the best way to find out its name.
Someone will likely recognize it. And if no one does, chances are good that
you have a proprietary file format in front of you.
In the case of your yum repository file, I would say it is a plain old INI file.
But let's do some more research on this.
Reverse Engineering
Reverse engineering may be your best bet if nobody recognizes your format.
Take the reference implementation and find out what they are using to parse the format.
Luckily, yum is open source. So it is easy to look up.
Let's see what the yum authors use to parse their repo files:
try:
    ini = INIConfig(open(repo.repofile))
except:
    return None
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L1304
The corresponding import can be found here:
from iniparse import INIConfig
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L32
This leads us to a library called iniparse (https://pypi.org/project/iniparse/).
So yum uses an INI parser for its config files.
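In other words, for the original question a stock INI parser should be enough. Python's standard-library configparser reads the snippet from the question directly; a minimal sketch (the path is just an example):
import configparser

config = configparser.ConfigParser()
config.read("/etc/yum.repos.d/google-chrome.repo")  # example path

for section in config.sections():
    # e.g. prints: google-chrome {'name': 'google-chrome - 64-bit', 'enabled': '1', ...}
    print(section, dict(config[section]))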
I will show you how to quickly navigate to those kinds of code passages, since navigating in somewhat large projects can be intimidating.
I use a tool called ripgrep (https://github.com/BurntSushi/ripgrep).
My initial anchors are usually well-known file paths. In the case of yum, I took /etc/yum.repos.d for my initial search:
# assuming you are in the root directory of yum's source code
rg /etc/yum.repos.d yum
yum/config.py
769: reposdir = ListOption(['/etc/yum/repos.d', '/etc/yum.repos.d'])
yum/__init__.py
556: # (typically /etc/yum/repos.d)
This narrows it down to two files. If you go on further with terms like read or parse,
you will quickly find the results you want.
What if you do not have the reference source?
Well, sometimes you have no access to the source code of a reference implementation, e.g. because the reference implementation is closed source.
Try to break the format. Insert some garbage and observe the log files afterwards. If you are lucky, you may find
a helpful error message which might give you hints about the format.
If you feel very brave, you can try to use an actual decompiler as well. This may or may not be illegal and may or may not be a waste of time.
I personally would only do this as a last resort.

Using the linux 'file' command to determine type (ie. image, audio, or video)

The word file here refers to the shell file command, not actual files. I want to determine whether a file is, for example, a video file (.mpg, .mkv, .avi). file is pretty good at returning image for image files, video for video files, and audio for audio files (and, for some reason, application/x-empty for text). My question is how reliable this is for identifying types. If I did a simple
file -ib deliverance.avi | grep video
would that work for all of the main video files outlined here?
The results from file are less than perfect, and it has more problems with some types of files than others. File basically just looks for particular pieces of binary data in predictable patterns to figure out filetypes.
Unfortunately, in particular, some of the filetypes often used for video fall into this "problematic" category. The newer container formats like .mp4 and .mkv usually have several different MIME types that should properly depend on what type of data is being contained. For example, an .mp4 could properly be identified as video/mp4, audio/mp4, or application/mp4 depending on the content.
In practice, file often makes guesses that simply conform with common usage, and it may work perfectly well for you. For example, while I mentioned some theoretical difficulties with identifying Matroska files correctly, file basically just assumes that any Matroska file is a video. On the other hand, the usage of the Ogg container is more evenly split between audio and video, and I believe the current version of file just splits the difference, and identifies Ogg files as application/ogg, which wouldn't fall into any of your categories.
The one thing I can say with certainty is that you want the most up-to-date version of file you can get your hands on. The "magic" files that contain the patterns to match against and the MIME types that will result from a match are updated fairly often to include newer filetypes like WebM, or just to improve accuracy for older types.
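If you end up scripting the check, asking file for just the MIME type and comparing the top-level category is a bit more robust than grepping the full description. A small sketch using Python's subprocess module (the file name is a placeholder); it relies only on file's standard -b and --mime-type options:
import subprocess

def mime_category(path):
    # "file -b --mime-type" prints only the MIME type, e.g. "video/x-msvideo"
    result = subprocess.run(
        ["file", "-b", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().split("/")[0]  # "video", "audio", "image", ...

print(mime_category("deliverance.avi") == "video")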
file works by referencing the header of the file against a "magic number" file. I suspect the best way to see how robust file is would be to check your local magic-number file (possibly /usr/share/magic, but see man file for details) for the file types from your referenced list.
It seems like it should work for most video/audio/image files. But if it doesn't, there's actually a file that contains the relations between an extension and its type:
The information identifying these files is read from the compiled magic file /usr/share/magic.mgc, or /usr/share/magic if the compiled file does not exist.
see:
http://linux.about.com/library/cmd/blcmdl1_file.htm
Hope this helps!

Methods of Parsing Large PDF Files

I have a very large PDF File (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge about the PDF file format internals... I suspect you don't have much experience with those yet).
If "2." works, you may halfway be done already. If it works, you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the Xpdf folks). There are other, somewhat less convenient methods available too, but this might already do the job:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages but only 1-10 (as a proof of concept, to see if it works at all). To extract every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector") you should ask a new question describing the details of your findings so far. Then you need to resort to bigger calibres to shoot down the problem.
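Once you have a plain-text dump, the row-reassembly logic the question describes is simple in any language (the question asks for Ruby, but here is the idea sketched in Python). This assumes the extracted text keeps the pipe separators shown in the question; with -layout you may instead get fixed-width columns and would split on runs of spaces.
import csv

EXPECTED_COLUMNS = 5  # Name | Address | Cash Reported | Year Reported | Holder Name

def parse_rows(txt_path):
    rows, carry = [], []
    with open(txt_path, encoding="latin-1") as fh:  # matches the -enc Latin1 flag above
        for line in fh:
            line = line.strip()
            if not line:
                continue
            carry += [field.strip() for field in line.split("|")]
            # an overflowed record is completed by the following line
            if len(carry) >= EXPECTED_COLUMNS:
                rows.append(carry[:EXPECTED_COLUMNS])
                carry = []
    return rows

with open("rows.csv", "w", newline="") as out:
    csv.writer(out).writerows(parse_rows("first-10-pages-from-Monster-PDF-sanitized.txt"))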
"At the very least, could anyone point me to a Ruby PDF library for this task?"
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure whether any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text at predetermined locations. It may not be possible to determine what the rows and columns are; it depends on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby with iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn ruby library?
