NCBI gene database question - bioinformatics

I m trying to find gene_info file with genenames and chromosomal location. However, I can't seem to locate it on NCBI FTP site. Can anyone give me a pointer?

See: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/README for details of what is in what files at the NCBI ftp site.
If you want to get the data from NCBI itself you will need to combine multiple files, probably a gene2accession (which also includes position information) and a gene_info file which maps ids to symbols and names etc.
It is probably more convenient to go to the UCSC site for this information, they also provide a public mysql database if you want to explore what is available:
http://workshops.arl.arizona.edu/sql1/sql_workshop/mysql/mysqlclient.html
If you just want human, mouse or rat data then the Rat Genome Database has already compiled the data you want (fresh from the NCBI and Ensembl sources):
ftp://rgd.mcw.edu/pub/data_release
e.g. for human data look at: ftp://rgd.mcw.edu/pub/data_release/GENES_HUMAN.txt

Related

read a .fit file on Linux

How could I read Garmin's .fit file on Linux. I'd like to use it for some data analysis but the file is a binary file.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks
You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out other data, ie: if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in xml and human-readable, but I haven't yet found a single line solution for getting that data into a csv.
The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When unzipping this, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested with a couple of files and it seems to work really well. Then of course the format of the csv can be a little bit complex.
For example, latitude and longitude are stored in semicircles, so it should be multiplied by 180/(2^31) to give GPS coordinates.
You need to convert the file to a .csv, the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine, try again it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses, which one you choose depends on the use. GPSBabel is another converer tool that may help. gpsbabel.org (I can't post two links yet :)
This page parses the file and lets you download it as tables. https://www.fitfileviewer.com/ The fun bit is converting the timestamps from numbers to readable timestamps Garmin .fit file timestamp

Getting data from .dat files

I'm hoping somebody out there can help me with this. I'm attempting to extract some barcode data from some .dat files. Its a B Tree file system with groups of three files .dat .ix. .dia. The company that wrote the software (a long time ago) say that the program is written in Pascal. I have no experience in reverse engineering but from what I read its most likely the only way to extract the data as the structure of the database is contained in the code of the program. I'm looking for advice on where to start.
I suppose the first thing you need to do is to see if the exe you've got was written with Delphi. You can check with this: http://cc.embarcadero.com/Item/15250
Then, to see if the exe that creates those .dat files were made with 'TurboPower B-Tree Filer', the I'd suggest you download and take a look at this: http://sourceforge.net/projects/tpbtreefiler/
At this step, looking at these sources is needed to familiarize yourself with the class names used in 'TurboPower B-Tree Filer' to help determine if any of those classes were used in your exe.
Then, using 'XN Resource Editor' [search the Internet for this] or, probhably better, 'MiTeC Portable Executable Reader' [ http://www.mitec.cz/pe.html ], see if any class names are relevant.
If they are, then you're in luck --sort of. All you will need to do is to write an app using 'TurboPower B-Tree Filer' to import the data in your dat files to export or manipulate as you wish.
At that point, you might find this link useful.
TurboPower B-Tree Filer and Delphi XE2 - Anyone done it?
If, OTOH, none of the above applies; I fear the only option is to reverse engineer the exe you have.

Is there a part of a windows file that can't be modified?

I'm trying to accomplish something that will let a user download a file from a web application onto their system. The file will contain a unique five digit code. Using this unique five digit code the users can search for a file in their file system.
I'm wondering where is the best place to put this five digit code in a file so that users can easily search for the file. The simplest approach would be to put it in the name of the file, however, users can change the name of the file easily.
I'm looking for a filed where I can put the code so that users won't be able to modify it but will still be able to search for it. Is this possible?
If you say File.. what kind of file format do you mean. I'm asking because a file is just a pile of bytes and you can append your 5 digit code every where in the file, if it is your own file format. But if you tell us which file format you use, probably there are some fields which can be used to search for it. As example Tiff has many tags. Images have other meta data. etc

Generating vector data (points) for OpenLayers Cluster

In my web application I am going to use OpenLayers.Strategy.AnimatedCluster strategy due to the fact that I need to visualize a great amount of point features. Here is a very good example of what it looks like. In both examples in above mentioned example the data (point features) are generated of taken from the GeoJSON file.
So, can anybody provide me with a file containing 100 000+ (better is even 500 000+) features (world cities, for instance), or explain how I can generate them so that they will be located all over the world (not like in Spain in the first example in above mentioned link).
use a geolocation database to supply you the data you need. GeoLite, for example
If 400K+ locations is ok, use download their CSV CITY LIST
If you want more, then you might want to give the Nominatim downloads, but they are quite bulky (more than 25GB) and parsing data is not as simple as a csv file.

Programmatically find common European street names

I am in the middle of designing a web form for German and French users. Within this form, the users would have to type street names several times.
I want to minimize the annoyance to the user, and offer autocomplete feature based on common French and German street names.
Any idea where I can a royalty-free list?
Would your users have to type the same street name multiple times? Because you could easily prevent this by coding something that prefilled the fields.
Another option could be to use your user database as a resource. Query it for all the available street names entered by your existing users and use that to generate suggestions.
Of course this would only work if you have a considerable number of users.
[EDIT] You could have a look at OpenStreetMap with their Planet.osm dumbs (or have a look here for a dump containing data for just Europe). That is basically the OSM database with all the map information they have, including street names. It's all in an XML format and streets seem to be stored as Ways. There are tools (i.e. Osmosis) to extract the data and put it into a database, or you could write something to plough through the data and filter out the street names for your database.
Start with http://en.wikipedia.org/wiki/Category:Streets_in_Germany and http://en.wikipedia.org/wiki/Category:Streets_in_France. You may want to verify the Wikipedia copyright isn't more protective than would be suitable for your needs.
Edit (merged from my own comment): Of course, to answer the "programmatically" part of your question: figure out how to spider and scrape those Wikipedia category pages. The polite thing to do would be to cache it, rather than hitting it every time you need to get the street list; refreshing once every month or so should be sufficient, since the information is unlikely to change significantly.
You could start by pulling names via Google API (just find e.g. lat/long outer bounds - of Paris and go to the center) - but since Google limits API use, it would probably take very long to do it.
I had once contacted City of Bratislava about the street names list and they sent it to me as XLS. Maybe you could try doing that for your preferred cities.
I like Tom van Enckevort's suggestion, but I would be a little more specific that just looking inside the Planet.osm links, because most of them require the usage of some tool to deal with the supported formats (pbf, osm xml etc)
In fact, take a look at the following link
http://download.gisgraphy.com/openstreetmap/
The files there are all in .txt format and if it's only the street names that you want to use, just extract the second field (name) and you are done.
As an fyi, I didn't have any use for the French files in my project, but mining the German files resulted (after normalization) in a little more than 380K unique entries (~6 MB in size)
#dusoft might be onto something - maybe someone at a government level can help? I don't think that a simple list of street names cannot be copyrighted, nor any royalties be charged. If that is the case, maybe you could even scrape some mapping data from something like a TomTom?
The "Deutsche Post" offers a list with all street names in Germany:
http://www.deutschepost.de/dpag?xmlFile=link1015590_3877
They don't mention the price, but I reckon it's not for free.

Resources