Dynamic navigation based on metadata and URL patterns - google-search-appliance

Is it possible to create a dynamic navigation filter based on metadata and URL patterns?
Some of the pages on our site have metadata and others do not. We need to build a dynamic navigation filter that includes pages both with and without metadata.
For example:
Some pages contain metadata:
<meta name="content_format" content="Video"/>
<meta name="content_format" content="Audio"/>
The other pages can be recognized by URL pattern:
http://test.com/.*.pdf - Document
http://test.com/blogs/.* - Blog
The filter should look like this:
Content Format
- Video
- Audio
- Document
- Blog
I reviewed the documentation but didn't find such an option.
Are there any workarounds?

You can create dynamic navigation out of metadata or entity extraction but not both.

Related

How does the HTML Link Parser pre-processor work in JMeter?

I want to understand how the HTML Link Parser pre-processor works, and how to retrieve all links and all other elements that are present in the HTML response. Every blog I have checked says that .* will extract all links, but what about other elements? What if I don't want links and want to test with other elements instead, for example fetching an image source or working with a drop-down or radio button available in the response? How can I extract those?
Is there another regex for that, or is it the same .*?
As per the documentation:
This modifier parses HTML response from the server and extracts links and forms.
So there are two main use cases for the HTML Link Parser:
- site link crawling (spidering)
- submitting random data into a form
In both cases you need to provide a Perl 5 compatible regular expression in order to limit crawling to the current domain or to narrow down the option selection.
If you need to fetch image sources, the best option would be the CSS/JQuery Extractor, configured like this:
Selector: img
Attribute: src
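To make it concrete what the selector `img` with the attribute `src` picks out of a response, here is a minimal standalone sketch in Python. It is not JMeter code and does not use any JMeter API; the class name, function name and sample HTML are made up for illustration only.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Equivalent in spirit to "Selector: img, Attribute: src".
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def extract_img_sources(html):
    """Return the src values of all <img> tags in document order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical response body used only for this illustration.
sample = (
    '<html><body><a href="/x">x</a>'
    '<img src="/a.png"><img alt="no src"><img src="/b.jpg"/>'
    "</body></html>"
)
print(extract_img_sources(sample))
```

Links (`<a href>`) are ignored here and images without a `src` are skipped, which is the difference from the link-and-form extraction the HTML Link Parser performs.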

Why does my `pdf` generated through RTD include all articles under the Contents heading?

I started using Sphinx and Read the Docs for my project only recently (yesterday).
So far I am happy with the HTML documentation, but the PDF version includes all the articles that appear in the index within the Contents heading, and the document's original content/index consists of just two links.
Please help.
The documentation is at http://todx.rtfd.io and the pdf is here.
When generating the PDF, Sphinx always adds the content that is referenced via a .. toctree:: directive exactly where the directive is placed.
In contrast, when generating HTML, Sphinx places the content after the file that contains the toctree.
In this respect, PDF generation and HTML generation are simply inconsistent.
Thus, you should place all content of the index page before the table of contents.
If you want to provide an overview of the most important sections inline, you can use a list of references. Then you might want to hide the toctree using the :hidden: option, for example like this:
Contents
--------

- :ref:`quickstart`
- :ref:`userguide`

Features
--------

- Fast
- Simple
- Intuitive
- Easy to Use
- Offline
- Open Source

.. toctree::
   :hidden:

   quickstart
   userguide

Aggregate results with different metadata content in one dynamic facet

We need to combine different content under a single filter.
For example:
There are two pages with a content_format meta tag.
One page has video content: <meta name="content_format" content="Video"/>
And the other has audio: <meta name="content_format" content="Audio"/>
We want to create one facet, "Video/Audio", that contains both pages.
I found in the documentation that it is possible to change the "Display Label" for dynamic navigation through Entity Recognition:
The display label for the attribute appears on the search results page. The display label can be different from the name of the entity as configured for entity recognition or the attribute in HTML. For example, for "pub" in the following META tag,
<META NAME="pub" CONTENT="Google">, you might use the display label "Publisher."
https://www.google.com/support/enterprise/static/gsa/docs/admin/72/admin_console_help/serve_dynamic_navigation.html#displaylabel
I was thinking of combining several metadata values through Entity Recognition, but the documentation for Entity Recognition says that entities are not created from metadata:
The search appliance extracts entities from the content of documents; it does not extract entities from the metadata associated with a document.
https://www.google.com/support/enterprise/static/gsa/docs/admin/72/admin_console_help/crawl_entity_recognition.html
Is it possible to aggregate search results with different metadata content (but the same metadata name) in one dynamic facet?
You won't be able to do this. The display label wouldn't help you here either; it only lets you specify the label that is shown to the user.
To solve this problem, you'll have to find a way to aggregate the metadata prior to indexing. If you're dynamically generating content, simply add a new metadata field whenever the current metadata field matches audio or video. If you're not dynamically generating content, use a metadata feed or crawl proxy to inject the new metadata field.
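For the dynamically generated case, the idea of adding a second metadata field could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the field name `content_format_group` and the mapping are hypothetical, and the regular expression assumes the meta tag is written exactly in the form shown in the question (name first, then content).

```python
import re

# Hypothetical mapping from individual values to the combined facet value.
COMBINED = {"video": "Video/Audio", "audio": "Video/Audio"}

# Assumes the attribute order used in the question: name, then content.
META_RE = re.compile(
    r'<meta\s+name="content_format"\s+content="([^"]*)"\s*/?>',
    re.IGNORECASE,
)


def add_combined_meta(html, field="content_format_group"):
    """Insert an extra meta tag right after content_format when it maps to a group."""
    match = META_RE.search(html)
    if match is None:
        return html  # page has no content_format metadata
    combined = COMBINED.get(match.group(1).strip().lower())
    if combined is None:
        return html  # value is not one we want to group
    new_tag = '<meta name="{}" content="{}"/>'.format(field, combined)
    return html[: match.end()] + new_tag + html[match.end():]


page = '<head><meta name="content_format" content="Video"/></head>'
print(add_combined_meta(page))
```

The dynamic navigation attribute would then be configured on the new field rather than on `content_format`, so both the video and the audio page fall under the single "Video/Audio" value.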

Retrieve the content of a section via MediaWiki API

I have a MediaWiki page set up in my company's intranet.
I would like to get the content of a section in a specific page using MediaWiki API (through AJAX).
I would like to refer to the section by its title like 'General' and refer to the page by its title as well, like 'Licenses'.
Is it possible somehow?
The only thing I could achieve is referring to the page by its title and refer to the section by a number like this:
http://mywiki.local/wiki/api.php?format=xml&action=parse&prop=text&page=Licenses&section=1
But if I create a new section before 'General', I would have to update all my AJAX URLs that query this page. So this isn't good enough.
I couldn't find any working solution for this. Any ideas?
You can do this by first retrieving prop=sections to get the list of sections and their numbers:
http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=License
Then make your original request, with the section number you figured out based on the previous request.
Keep in mind that two different sections can have the same name.
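The two-step lookup can be sketched in Python as follows. This is a rough illustration, not a tested client: it requests `format=json` instead of the `format=xml` used above, and it assumes the default JSON layout in which the section list sits under `parse.sections` (with `line` holding the heading text and `index` the section number) and the rendered text under `parse.text["*"]`. The function names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://mywiki.local/wiki/api.php"  # URL from the question


def build_url(api_url, **params):
    """Build an API request URL from keyword parameters."""
    return api_url + "?" + urlencode(params)


def section_indexes(sections_response, title):
    """Return the index of every section whose heading equals title.

    A list is returned because two sections can share the same name.
    """
    sections = sections_response["parse"]["sections"]
    return [s["index"] for s in sections if s["line"] == title]


def fetch_section_html(api_url, page, title):
    """Look up the section number by title, then fetch that section's text."""
    listing_url = build_url(
        api_url, format="json", action="parse", prop="sections", page=page
    )
    with urlopen(listing_url) as resp:
        listing = json.load(resp)
    indexes = section_indexes(listing, title)
    if not indexes:
        raise LookupError("no section named {!r} on page {!r}".format(title, page))
    text_url = build_url(
        api_url, format="json", action="parse", prop="text",
        page=page, section=indexes[0],  # first match; duplicates are possible
    )
    with urlopen(text_url) as resp:
        return json.load(resp)["parse"]["text"]["*"]
```

Calling `fetch_section_html(API_URL, "Licenses", "General")` would then keep working when a new section is inserted before 'General', at the cost of one extra request per lookup.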

SEO with keywords for a website based on images?

I have a website with newspaper front pages, so the pages contain only big images and no text (not even a description, because the images change daily).
What's the best way to insert context keywords into the pages?
Is it correct to insert only keywords, without links, in the body?
If you can change the image names, put keywords in those, and also use related words, synonyms, and plural/singular forms.
Also add alt and title attributes.
Even though your pages do not have any content (only images), you still can provide proper page descriptions which will be used by Google (and other search engines).
A couple of handy tips:
Create unique, accurate page titles using <title> tag placed within the <head> tag.
Please bear in mind that Google discourages stuffing keywords into the title tag, so it is good practice to make sure that your title effectively communicates the topic of the page's content.
Use the description meta tag (<meta name="description" content="">) to give search engines a summary of what the page is about. It is good practice to use a unique description for each page.
Use the keywords meta tag (<meta name="keywords" content="">) to provide page-related keywords.
Also, as far as images are concerned, I would recommend proper use of the alt and title attributes to describe your image content. Image names are sometimes composed of identifiers that are meaningless to customers.
Please have a look at Matt Cutts' blog, Gadgets, Google, and SEO, where Matt describes in detail the importance of correct information in the alt and title attributes of an image tag.

Resources