Sphinxdoc/Readthedocs: redirect a "flat" structure to a hierarchical one - python-sphinx

I have a Sphinxdoc page on readthedocs with a large number (>1000) of hierarchically sorted pages, https://iraf.readthedocs.io/en/latest/tasks/index.html.
My problem is now that I also need to access them in a "flat" structure, i.e.
https://iraf.readthedocs.io/en/latest/tasks/addstar.html
should redirect to
https://iraf.readthedocs.io/en/latest/tasks/noao/digiphot/daophot/addstar.html
How could one do this? The exact URL does not matter; what matters is that the pages can be reached by name alone (like addstar in the example). There are a few conflicts (the same name at different places in the hierarchy), but they could be resolved with a pragmatic "take the first one" approach.

The Sphinx extension sphinx-reredirects might do what you want.
You could also configure Read The Docs through User-defined Redirects.
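As a rough sketch of the first option: sphinx-reredirects reads a `redirects` dictionary from conf.py, so the flat-to-hierarchical mapping can be generated instead of written by hand. The helper name `build_flat_redirects` and its arguments are made up for illustration, and the assumption that a relative target is resolved against the generated source page should be checked against the extension's documentation.

```python
from pathlib import PurePosixPath


def build_flat_redirects(docnames, flat_dir="tasks"):
    """Map <flat_dir>/<name> to the first hierarchical page called <name>.

    `docnames` are Sphinx-style document names without extension,
    e.g. "tasks/noao/digiphot/daophot/addstar" (hypothetical input,
    for instance collected by walking the source directory).
    """
    existing = set(docnames)
    base = PurePosixPath(flat_dir)
    redirects = {}
    for docname in sorted(docnames):
        path = PurePosixPath(docname)
        try:
            rel = path.relative_to(base)
        except ValueError:
            continue  # page lives outside flat_dir
        # Skip pages that are already flat, and index pages.
        if len(rel.parts) < 2 or path.name == "index":
            continue
        source = str(base / path.name)
        if source in existing:
            continue  # a real page already has this flat name
        # "Take the first one": never overwrite an earlier match.
        redirects.setdefault(source, str(rel) + ".html")
    return redirects


# In conf.py this would be used roughly as follows (assumption):
# extensions = ["sphinx_reredirects"]
# redirects = build_flat_redirects(all_docnames)
```

With the example from the question, `tasks/addstar` maps to `noao/digiphot/daophot/addstar.html`. Name conflicts are resolved in sorted order, which is one arbitrary but reproducible reading of "take the first one".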

Related

Applying different parsefilters to each domain in the same topology

I am trying to crawl different websites (e-commerce websites) and extract specific information from the pages of each website (i.e. product price, quantity, date of publication, etc.).
My question is: how to configure the parsing since each website has a different HTML layout which means I need different Xpaths for the same item depending on the website? Can we add multiple parser bolts in the topology for each website? If yes, how can we assign different parsefilters.json files to each parser bolt?
You need #586. At the moment there is no way to do it but to put all your XPATH expressions regardless of the site you want to use them on in the parsefilters.json.
You can't assign different parsefilters.json to the various instances of a bolt.
UPDATE: However, you could have multiple XPathFilter sections within parsefilters.json. Each could cover a specific source, but there is currently no way of constraining which source a parse filter gets applied to. You could extend XPathFilter so that it takes some extra config, e.g. a regular expression that a URL must match for the filter to be applied. That would work quite nicely, I think.
I've recently added JsoupFilters, which will be in the next release. These should be useful for your use case, but that still doesn't solve the issue that you need an implementation of the filter that organizes the resources per host. It shouldn't be too hard to implement, taking the URL filter one as an example, and it would also make a very nice contribution to the project.
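To make the "regular expression a URL must match" idea concrete, here is a small Python sketch of the selection logic only. It is not StormCrawler code and uses none of its API. The rule list, the host names and the XPath expressions are all invented for illustration.

```python
import re

# Hypothetical rule set: each entry pairs a URL pattern with the XPath
# expressions that should only be applied to matching pages.
RULES = [
    (re.compile(r"^https?://(www\.)?shop-a\.example/"),
     {"price": "//span[@class='price']"}),
    (re.compile(r"^https?://(www\.)?shop-b\.example/"),
     {"price": "//div[@id='cost']/text()"}),
]


def xpaths_for(url, rules=RULES):
    """Return the XPath expressions whose URL pattern matches `url`."""
    selected = {}
    for pattern, xpaths in rules:
        if pattern.search(url):
            selected.update(xpaths)
    return selected
```

A filter extended this way would run the lookup once per page and evaluate only the returned expressions, so the same key (here `price`) can use a different XPath per site.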

Is there an OSM XAPI tag/value list?

I'm new to OSM querying, but would like to query vector data for a large area. Thus I need to limit the results I would like to get by tagging the request.
http://www.informationfreeway.org/api/0.6/way[tag=value][bbox=x,y,z,j]
I'd like to filter for specific tag/values when querying for a way. Though I don't know which tags/values exist. Is there a list listing the most common of them?
You are approaching your problem from the wrong direction. The number of different tags is almost unlimited. According to taginfo there are currently 75 380 856 different tags. I'm pretty sure you are not interested in most of them. Likewise you are probably not even interested in many of the most common tags.
What data do you want to query?
The OSM wiki should be your starting point for generating a list of tags you are interested in. For a generic overview, take a look at the map features. Are you interested in streets? Then look at the highway key. Routing? Then take a look at the routing wiki page.
Always remember that these lists aren't complete. People can use any tag they like (but should use well-established tags whenever possible of course).
Also consider using Overpass API instead of XAPI. Overpass API is much more powerful.
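For comparison with the XAPI URL in the question, here is a minimal sketch that builds the equivalent Overpass QL query for ways with one tag inside a bounding box. The function name is made up, and the tag and coordinates in the usage note are only example values. Note that Overpass expects the bounding box as south, west, north, east.

```python
def overpass_way_query(key, value, south, west, north, east):
    """Build an Overpass QL query for ways tagged key=value in a bbox."""
    bbox = f"{south},{west},{north},{east}"
    # "out geom" asks for the way geometry along with the tags.
    return f'[out:json];way["{key}"="{value}"]({bbox});out geom;'
```

Calling `overpass_way_query("highway", "residential", 50.7, 7.1, 50.8, 7.2)` yields a query string that can be sent to an Overpass API endpoint of your choice.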

What is the purpose of weirdly named subdirectories for cache directories?

When looking at the "Storage" directory of Spotify's cache I realized there are a lot of subdirectories, named with 2-digit hexadecimal names. Each of them contains one or more weirdly named files.
I've come across similar directory structures created by other programs in the past, and I have always wondered what the reason for such a naming/storing scheme is.
So why would you do such a thing? What benefits does this concept hold?
I hope you are still interested in the answer.
Usually you cache things like images or content. Say we cache an image from a different url to our server for performance/stability reasons. How do you name that image? You could name it like the URL, but you cannot include slashes or other special characters in the name and the length has a limit too.
Therefore you compute a so-called hash of that URL and use it as the file name. Hashes are usually written in hexadecimal, so they contain no illegal characters and their length is always the same. If you now need the image from the URL, you compute its hash and check whether you find it in the cache.
The reason you don't store all cached files in one directory is that a single directory holding a very large number of files hits file system limits and becomes slow to work with. You usually group the cached files in subdirectories based on their first characters. See this answer: softwareengineering.stackexchange.com/a/301401.
For example, let's say we want to cache https://example.com/favicon.ico. The MD5 hash of that URL is f54403d0da4a57aa79bdf459897f08bd. You now have three options:
If the cache is expected to remain small, store it like /cache/f54403d0da4a57aa79bdf459897f08bd.ico (you usually want to preserve file extensions for a number of reasons).
For medium caches you could do /cache/f/f54403d0da4a57aa79bdf459897f08bd.ico, or remove duplicate information by trimming the first character, as it already exists in the directory name, like /cache/f/54403d0da4a57aa79bdf459897f08bd.ico
For very large caches you can subdivide even more, like /cache/f/5/4403d0da4a57aa79bdf459897f08bd.ico
These are just a few examples, but the basic principle stays the same.
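The scheme can be sketched in a few lines of Python. This is only an illustration of the trimmed variant (the leading characters move into directory names). The function name, the `levels` parameter and the /cache root are assumptions, not what Spotify or any particular program does.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def cache_path(url, levels=1, root="/cache"):
    """Return a cache file path for `url`, sharded by leading hash digits.

    levels=0 keeps everything in one directory; levels=1 and levels=2
    move the first one or two hex digits into subdirectory names.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Preserve the original file extension, ignoring any query string.
    suffix = PurePosixPath(urlsplit(url).path).suffix
    shards = list(digest[:levels])           # one directory per hex digit
    filename = digest[levels:] + suffix      # rest of the hash + extension
    return str(PurePosixPath(root, *shards, filename))
```

Looking a file up works the same way as storing it: compute the path from the URL and check whether the file exists. MD5 is fine here because the hash is only used to derive names, not for security.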

Should I use hashbang/shebang?

I read here that the idea of the shebang (#!) was so Google knows that an alternative conventional URL exists providing the same page "state".
So, if I don't have conventional URLs corresponding to these hash-states, am I right to say that I should be using just a hash and not a shebang?
Background: The hashes are created based on a search form, and the search results are loaded on the same page. The hashes are there so that people can go back to the URL with the hashes and repeat the same search.
More broadly, is there a reason I should have real URLs corresponding to my hashes?
To be clear, is this a JavaScript-powered site? As far as I'm concerned, that would be the only reason one might need to use a #!. That said, even the #! can be full of issues, as could be seen for a while during the Gawker switch earlier this year.

Naming convention for images in a large website

This is a question about organising lots of images in a web project. Say you had the following two icons in a web project that represented, for example, a product selected or a product not selected:
What would you name them?
Seems a simple question, but I suspect naming images is something of an art.
For example:
star_active.png and star_inactive.png: Seems fair enough, but what if you want to replace the star at a later date with, say, a circle? Then the name is misleading, so you would have to rename it and then update all your CSS etc.
product_selected.png and product_unselected.png: Great when used for the specific action of selecting a product, but what if I wanted to use the same image for a different purpose? Then the name is confusing and too specific.
Should the image size be part of the image name? E.g. someImage_16.png
What is the best naming convention you have found for naming images?
You're asking for a naming convention that predicts future attributes and applications of the file so that you never have to update the file name. That is impossible. You have to rely on your own intuition when you initially name the files.
There is no way around it. If you end up changing either a file or its application so drastically that the file name no longer accurately reflects its use, then you will either need to keep the misleading name or replace it throughout your files.
Most decent text editors should be able to easily do the latter across multiple files.
The only alternative is to assign names which are not descriptive from the start, which is obviously not a good idea.
Listen to Kobi and look into sprites, or if you're averse to sprites, do it the way Arvin said for the reasons given.
