How do I effectively identify an unknown file format

How do I effectively identify an unknown file format - format

I want to write a program that parses yum config files. These files look like this:
[google-chrome]
name=google-chrome - 64-bit
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub
This format looks like it is very easy to parse, but I do not want to reinvent the wheel. If there is an existing library that can generically parse this format, I want to use it.
But how to find a library for something you can not name?
The file extension is no help here. The term ".repo" does not yield any general results besieds yum itself.
So, please teach me how to fish:
How do I effectively find the name of a file format that is unknown to me?

Identifying an unknown file format can be a pain.
But you have some options. I will start with a very obvious one.
Ask
Showing other people the format is maybe the best way to find out its name.
Someone will likely recognize it. And if no one does, chances are good that
you have a proprietary file format in front of you.
In case of your yum repository file, I would say it is a plain old INI file.
But let's do some more research on this.
Reverse Engineering
Reverse Engineering maybe your best bet if nobody recognizes your format.
Take the reference implementation and find out what they are using to parse the format.
Luckily, yum is open source. So it is easy to look up.
Let's see, what the yum authors use to parse their repo file:
try:
ini = INIConfig(open(repo.repofile))
except:
return None
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L1304
Now the import of this function can be found here:
from iniparse import INIConfig
https://github.com/rpm-software-management/yum/blob/master/yum/config.py#L32
This leads us to a library called iniparse (https://pypi.org/project/iniparse/).
So yum uses an INI parser for its config files.
I will show you how to quickly navigate to those kind of code passages
since navigating in somewhat large projects can be intimidating.
I use a tool called ripgrep (https://github.com/BurntSushi/ripgrep).
My initial anchors are usually well known filepaths. In case of yum, I took /etc/yum.repos.d for my initial search:
# assuming you are in the root directory of yum's source code
rg /etc/yum.repos.d yum
yum/config.py
769: reposdir = ListOption(['/etc/yum/repos.d', '/etc/yum.repos.d'])
yum/__init__.py
556: # (typically /etc/yum/repos.d)
This narrows it down to two files. If you go on further with terms like read or parse,
you will quickly find the results you want.
What if you do not have the reference source?
Well, sometimes, you have no access to the source code of a reference implementation. E.g: The reference implementation is closed source.
Try to break the format. Insert some garbage and observe the log files afterwards. If you are lucky, you may find
a helpful error message which might give you hints about the format.
If you feel very brave, you can try to use an actual decompiler as well. This may or may not be illegal and may or may not be a waste of time.
I personally would only do this as a last resort.

Related

read a .fit file on Linux

How could I read Garmin's .fit file on Linux. I'd like to use it for some data analysis but the file is a binary file.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks

You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out other data, ie: if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in xml and human-readable, but I haven't yet found a single line solution for getting that data into a csv.

The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When unzipping this, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested with a couple of files and it seems to work really well. Then of course the format of the csv can be a little bit complex.
For example, latitude and longitude are stored in semicircles, so it should be multiplied by 180/(2^31) to give GPS coordinates.

You need to convert the file to a .csv, the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine, try again it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses, which one you choose depends on the use. GPSBabel is another converer tool that may help. gpsbabel.org (I can't post two links yet :)

This page parses the file and lets you download it as tables. https://www.fitfileviewer.com/ The fun bit is converting the timestamps from numbers to readable timestamps Garmin .fit file timestamp

How can one create a polyglot PDF?

I like reading the PoC||GTFO issues and one thing I found remarkable when I first discovered it, was the "polyglot" nature of their PDF files.
Let met explain: when you consider for example their 8th issue, you may unzip files from it; execute the encryption they are talking about by running it as a script and even better(worse?) with their 9th issue you can even play it as a music file!
I'm currently in the process of writing small scripts every week and writing each time a little one page PDF in LaTeX to explain the said scripts. So I would really enjoy being able to create the same kind of PDF files. Sadly they explained (partly) in their first issue how to include zip files, but they did so through three small sketches of cmd lines without actual explanations.
So my question is basically :
how can one create such a polyglot PDF file containing stuff like a zip as well as being a shell script which may be run using arguments just like normal scripts?
I'm asking here about the process of creation, not just an explanation of how this is possible. The ideal way for me would that there are already some scripts or programs allowing to create easily such PDF files.
I've tried to search the net for the keywords "polyglot files" and others of the kind and wasn't able to find any useful matches. Maybe this process has another name?
I've already read the presentation by Julia Wolf which explains how things works, but I sadly haven't had time to apply the knowledge there to real world, because I'm sadly not used to play with file headers and the way a PDF is constructed.
EDIT:
Okay, I've read more and found the 7th edition of PoC||GTFO to be really informative concerning this subject. I may end up being able to create my own scripts to do such polyglot PDF files if I have some more time to consider it.

I played around with polyglots myself after attending Ange's talks and also talking to him in person. You really need to understand the file formats to be able to nest them into each other.
However, long story short, here are some links I found extremely useful for creating polyglots:
Some older Google Code Trunk
PoC of the polyglot stuff
Especially the second link (to github) will help you creating polyglots, but also understanding how they are working and how they are implemented. Since it is mostly Python stuff and very well / clean written, it is very useful and easy to follow.

I feel dissecting some file formats would be a good place to start. You can find many file format specifications for different file types through Google, but they can be a tough read and will likely take you some time to translate into whatever language you are using.
PDF: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
ELF: https://www.cs.cmu.edu/afs/cs/academic/class/15213-s00/doc/elf.pdf
ZIP: http://kat.sdf.org/zip_file_format.txt
The language(s) you select will need a way to read and write raw bytes (not just ascii alphanumeric), so perhaps C would be good for more direct access to memory. Some Python tricks could help with open sourcing the scripts easily.
To dissect the files, you may want to build a tool kinda like https://github.com/kvesel/zipbrk/ to take them apart, then put them all back together in a polyglot format. For example, zip does not require the section headers to be at the start (or even contiguous for that matter), and PDF magic number can appear in multiple places within the file as well. I also believe I recall a polyglot tool being included in one of the PoC||GTFO publishings (maybe issue 8 or 2??) as a polyglot in the pdf file.
Don't forget the hackers bible! :)
https://nostarch.com/gtfo

Getting data from .dat files

I'm hoping somebody out there can help me with this. I'm attempting to extract some barcode data from some .dat files. Its a B Tree file system with groups of three files .dat .ix. .dia. The company that wrote the software (a long time ago) say that the program is written in Pascal. I have no experience in reverse engineering but from what I read its most likely the only way to extract the data as the structure of the database is contained in the code of the program. I'm looking for advice on where to start.

I suppose the first thing you need to do is to see if the exe you've got was written with Delphi. You can check with this: http://cc.embarcadero.com/Item/15250
Then, to see if the exe that creates those .dat files were made with 'TurboPower B-Tree Filer', the I'd suggest you download and take a look at this: http://sourceforge.net/projects/tpbtreefiler/
At this step, looking at these sources is needed to familiarize yourself with the class names used in 'TurboPower B-Tree Filer' to help determine if any of those classes were used in your exe.
Then, using 'XN Resource Editor' [search the Internet for this] or, probhably better, 'MiTeC Portable Executable Reader' [ http://www.mitec.cz/pe.html ], see if any class names are relevant.
If they are, then you're in luck --sort of. All you will need to do is to write an app using 'TurboPower B-Tree Filer' to import the data in your dat files to export or manipulate as you wish.
At that point, you might find this link useful.
TurboPower B-Tree Filer and Delphi XE2 - Anyone done it?
If, OTOH, none of the above applies; I fear the only option is to reverse engineer the exe you have.

How can I convert GOPixbuf strings to images

I have an XML file produced with Gnumeric that contains images, stored as GOPixbuf strings inside XML. They look like this:
eXyA/4KEiP9xcnf/f3+E/3l5ff9xb3L/jo2Q/29wdP+ [truncated]
For each string I have width and height, and a rowstride parameter, like in this example:
<GOImage name="Image(70)" type="GOPixbuf" width="151" height="135" rowstride="604">
Is there a reasonable way to convert that to an image - any format will do?
I'm conversant with perl and image conversion tools (imagemagick, gimp) but I have not found any documentation by googling beyond GTK or GOffice docs.

You have already found stuff that is helpful. But since there are no Perl bindings for this on CPAN, you would have to make your own if you want to use Perl.
Fortunately, you don't have to know XS to do that. You can use FFI::Platypus to create temporary bindings and only map what you need.
The docs you have probably already found have a Getting started with GOffice section. After a quick check I found that on my recent Ubuntu there is a package that contains that lib. It is called libgoffice-0.10-dev.
Now you can set that up and play around with the lib functions. Somewhere in https://developer.gnome.org/goffice/unstable/GOImage.html there probably is a method to read and convert it.
One of the good ones might be go-image-get-pixbuf, which returns a GdkPixbuf. That in turn has a very extensive documentation. Maybe what you need might be in this one.
Good luck.

ELF/DWARF Parser to Out Structure elements

Is there a way to extract the size and address of elements within a structure using an elf file? I am hoping there is a tool available that can do this and export it to a more readable format.
My end goal is to convert the ELF file to a ASAM A2L file. A open source/free tool that could do this would even be better but most companies that do this charge alot for their tools.

I don't know offhand of anything pre-canned, but it isn't very hard to modify an existing tool to do it.
The "pahole" program from the "dwarves" project does something similar. It prints a structure definition in a certain way.
There's also a "pahole.py" script for gdb that does pretty much the same thing. This would be trivial to modify to print things however you like.
If you want to get a little deeper you could write it yourself using one of the existing DWARF libraries. I like the one in elfutils, but YMMV.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio