Extracting Wikimedia pageview statistics

Wikipedia publishes its page views as hourly text files. (See for instance http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/)
For a project I need to extract keywords and their associated page views for the year 2014. But one file covers 1 hour, so the year totals 24*365 files, and each file is ~80MB. Doing this manually would be a hard task.
My questions:
1. Is there any way to download the files automatically? (The files are structured consistently, which could be helpful.)

Download? Sure, that's easy:
wget -r -np http://dumps.wikimedia.org/other/pagecounts-raw/
Recursive wget does it. Note, these files are deprecated now; you probably want to use http://dumps.wikimedia.org/other/pagecounts-all-sites/ instead.
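If only one year is needed, the list of hourly files can be generated instead of crawled. The following is a minimal sketch, not a tested downloader: it assumes the files are named pagecounts-YYYYMMDD-HH0000.gz inside YYYY/YYYY-MM/ directories, as the listing linked in the question suggests. The function name hourly_urls is made up for illustration, and the seconds part of real file names may occasionally differ, so check the directory listing before relying on it.

```python
from datetime import datetime, timedelta

BASE_URL = "http://dumps.wikimedia.org/other/pagecounts-raw"


def hourly_urls(year):
    """Yield one assumed dump URL for every hour of the given year."""
    current = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    while current < end:
        # Assumed naming scheme: <base>/YYYY/YYYY-MM/pagecounts-YYYYMMDD-HH0000.gz
        yield (
            f"{BASE_URL}/{current:%Y}/{current:%Y-%m}/"
            f"pagecounts-{current:%Y%m%d-%H}0000.gz"
        )
        current += timedelta(hours=1)
```

Writing the generated URLs to a text file and passing that file to wget -i downloads them one after another.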

I worked on this project: https://github.com/idio/wikiviews
You call it like python wikiviews 2 2015, and it will download all the files for February 2015 and join them into a single file.
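For the extraction step itself, here is a rough sketch of summing page views per title across files. It is not the wikiviews implementation. It assumes each line of a pagecounts file has four space-separated fields (project code, page title, view count, bytes transferred); the helper name add_counts and the default project "en" are illustrative choices.

```python
from collections import Counter


def add_counts(lines, totals, project="en"):
    """Add the view counts found in `lines` to the running `totals` counter."""
    for line in lines:
        parts = line.strip().split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        proj, title, count, _size = parts
        if proj == project and count.isdigit():
            totals[title] += int(count)
    return totals
```

Each downloaded file can be opened with gzip.open(path, "rt", encoding="utf-8", errors="replace") and passed to add_counts with the same Counter, so that the totals accumulate over the whole year.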

Related

Partial update of tags in ctags

I have a large code base, which universal ctags takes 10 minutes to process. When I develop it, I modify a small number of files and would like to update the tags of those files only (to avoid spending 10 minutes again).
ctags's --append option does not work for me: it either erases the previous tags file and I only get the few modified files in it, or it appends the modified files and I get duplicated tags for those. Is there an option that erases only the modified files, and then appends the new tags?
I would also accept a program that merges 2 tags files, in the way described above.
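One way such a merge program could look is sketched below. This is an assumption-laden illustration, not a ctags feature: it relies on the usual tags layout (tag name, file name and search pattern separated by tabs, with pseudo-tag lines starting with !_TAG_), and the function name merge_tags is made up. File names must be written the same way in both tags files (for example, both relative to the same directory).

```python
def merge_tags(old_lines, new_lines, modified_files):
    """Drop old entries for modified files, add the new entries, re-sort."""
    modified = set(modified_files)
    headers, entries = [], []
    for line in old_lines:
        line = line.rstrip("\n")
        if line.startswith("!_TAG_"):
            headers.append(line)  # keep pseudo-tags from the old file
            continue
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] in modified:
            continue  # stale entry belonging to a modified file
        entries.append(line)
    for line in new_lines:
        line = line.rstrip("\n")
        if not line.startswith("!_TAG_"):
            entries.append(line)
    # Pseudo-tags stay first; the remaining tags are sorted by name.
    return headers + sorted(entries)
```

The new tags file would come from running ctags on the modified files only, writing to a temporary file, and the merged lines would then replace the old tags file.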

SVN: How to list author, date and comments from svn log

I am using SVN on a Windows 10 machine. I want to list the Author, Date and Comment of all commits within a date range. So I want a report with 1 line per commit, where each line has 3 columns. How can I do that?
I want to be able to copy that report and paste it into Excel.
Thanks

Short answer
You can't. Pure SVN cannot change the format of the log output; you can only suppress the log message lines (-q option).
Longer answer
Because svn log always has a single (documented) output format and the -r option accepts dates as parameters, you can write an appropriate log command and post-process the results (either the standard human-readable form or the XML output).
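As an illustration of the post-processing route, the sketch below turns XML log output into tab-separated lines that paste cleanly into Excel. It assumes the XML was produced with something like svn log --xml -r {2016-01-01}:{2016-12-31} (the dates are placeholders), and the function name log_xml_to_rows is invented for this example.

```python
import xml.etree.ElementTree as ET


def log_xml_to_rows(xml_text):
    """Return one 'author<TAB>date<TAB>comment' line per commit."""
    rows = []
    root = ET.fromstring(xml_text)
    for entry in root.iter("logentry"):
        author = entry.findtext("author", default="")
        date = entry.findtext("date", default="")
        message = entry.findtext("msg", default="") or ""
        # Collapse line breaks so each commit stays on a single line.
        comment = " ".join(message.split())
        rows.append("\t".join([author, date, comment]))
    return rows
```

The XML can be redirected to a file, read in binary mode, and passed to the function; printing the returned rows gives the three-column report.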
Long answer
If generating different custom reports from SVN repositories is a regular, long-running task for you, you could consider (at least) using Mercurial (with hgsubversion) as an interface for processing the data. With HG you'll have
- transparent access to the original SVN repos
- the full power of templating and revsets for extracting and manipulating data to fit your needs and requirements

What you are looking for is called a Subversion web view. These are third-party, mostly free-to-use web views of your repository where you can filter commits.
You can either filter there in the view or copy the list into Excel and add a filter yourself.
Hope this helps.

Using 7za x -aou to extract multiple files and keeping duplicates results in duplicate files having creation dates 1 hour apart, why?

I currently have about 102 zip files, which I would like to combine into one folder. Many files inside different zip files have the same name and content. I do not want them to overwrite each other. I used the following command:
7za x '*.zip' -aou -o/Path/To/Export/To
This works fine: if, say, zipfile1.zip and zipfile2.zip both contain a file called IMG.jpg with the EXACT same content, it creates two files, IMG.jpg and IMG_1.jpg.
HOWEVER, upon comparing the files, I noticed that the creation/modification times were off by 1 hour. Is there a reasonable explanation for this?

According to this forum, preserving the creation time is not supported due to lack of interest from the 7-zip team. It's not a great answer, but it seems to be the answer.

read a .fit file on Linux

How can I read Garmin's .fit files on Linux? I'd like to use them for some data analysis, but the files are binary.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks

You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting other data out, e.g. if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in XML and human-readable, but I haven't yet found a single-line solution for getting that data into a csv.

The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When unzipping this, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested with a couple of files and it seems to work really well. The format of the csv can be a little complex, though.
For example, latitude and longitude are stored in semicircles, so they should be multiplied by 180/(2^31) to give GPS coordinates in degrees.
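The conversion above can be written as a one-line helper. This is only a sketch of the stated formula; the function name semicircles_to_degrees is not part of the FIT SDK.

```python
def semicircles_to_degrees(semicircles):
    """Convert a FIT position value in semicircles to decimal degrees."""
    # 2**31 semicircles correspond to 180 degrees.
    return semicircles * (180.0 / 2**31)
```

A value of 2**30 semicircles, for instance, corresponds to 90 degrees.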

You need to convert the file to a .csv; the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine, so try again; it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses; which one you choose depends on the use. GPSBabel is another converter tool that may help: gpsbabel.org (I can't post two links yet :)

This page parses the file and lets you download it as tables: https://www.fitfileviewer.com/ The fun bit is converting the timestamps from numbers to readable timestamps (see "Garmin .fit file timestamp").
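For the timestamp conversion, a minimal sketch follows. It assumes the numbers are standard FIT timestamps, which count seconds since 1989-12-31 00:00:00 UTC; the names FIT_EPOCH and fit_timestamp_to_datetime are chosen here for illustration.

```python
from datetime import datetime, timedelta, timezone

# FIT timestamps are assumed to count seconds from this reference instant.
FIT_EPOCH = datetime(1989, 12, 31, tzinfo=timezone.utc)


def fit_timestamp_to_datetime(seconds):
    """Convert a FIT timestamp (seconds since the FIT epoch) to a UTC datetime."""
    return FIT_EPOCH + timedelta(seconds=seconds)
```

Equivalently, adding 631065600 to a FIT timestamp gives a regular Unix timestamp.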

YouMax 2.0 change max-results

I found a plug-in called YouMax which embeds your YouTube channel into your website. The only problem I'm having with it is changing the number of video results that are fetched: the default is 25 videos, and I want to change this to another value like 12 or 24.
http://www.codehandling.com/2013/03/youmax-20-complete-youtube-channel-on.html?m=1

There seem to be 3 sections to this plug-in's results: Featured, Uploads, and Playlists.
I edited the youmax.min.js file for the Featured section because it is the first results page that loads. My edit was very small. Essentially, I added the following:
&start-index=1&max-results=2
at the end of the string var apiFeaturedPlaylistVideosURL
This var is located inside the function: function getFeaturedVideos(playlistId)
You can change the value from 2 to 12 or whatever you want, and that will be the maximum number of results you get back from YouTube.
Also, you can add this same argument (&start-index=1&max-results=2) to the Uploads and Playlists functions in the youmax.min.js file if that's where you want to limit your results instead (or in addition to the Featured section).
I created a copy of my edited youmax.min.js file in jsfiddle. My edit is on line 152 there. Try downloading it and giving it a try. I hope it helps:
http://jsfiddle.net/wCKKU/

Youmax 2.0 (free version) has been upgraded and now has the maxResults option built in - http://demos.codehandling.com/youmax/home.html
You already get a "maxResults" option with the plugin, plus "Load More" functionality.
Regarding the timestamps, you can try the PRO version, which has options to display relative timestamps (2 hours ago) or fixed timestamps (23 March 2016).
Cheers :)
