Filter out specific page title - google-search-appliance

The GSA documentation shows how to filter out meta name-value pairs but doesn't mention how to filter out specific page titles. The nearest I could find was intitle which looks for matches on the page title. I'm trying to filter out my error page which has "Error Page" as its title from the search results. Does anyone know if this is possible?
As an aside, the reason it's coming back in the results is from 404 errors which redirect to the error page.

Are your errors redirecting as 302 to your 404 page? In that case, the GSA will keep the links and use back-off and retry to find the page on recrawl. You can change settings to meet your requirements. Here is the documentation on this for software v 7.4

Got a workaround going. Rather than exclude on the title, I added a new meta tag to my error pages, meta name="pagetype" content="error", and then added a filter for it to my search query URL params: ".-pagetype:error"
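A rough sketch of what such a query URL might look like when built in code. It assumes the filter goes in the requiredfields parameter (partialfields is the other candidate), and the host, collection, and front end names are placeholders that are not from the post. The "-" prefix negates a meta tag match and "." joins several filters with AND, which matches the ".-pagetype:error" suffix above.

```python
from urllib.parse import urlencode


def build_gsa_search_url(host, query, exclude_meta):
    """Build a search URL that drops results carrying the given meta tag pairs."""
    # "-name:value" negates a meta tag match; "." joins several filters with AND.
    filters = ".".join(f"-{name}:{value}" for name, value in exclude_meta)
    params = {
        "q": query,
        "site": "default_collection",   # assumed collection name
        "client": "default_frontend",   # assumed front end name
        "output": "xml_no_dtd",
        "requiredfields": filters,      # assumption: could also be partialfields
    }
    return f"http://{host}/search?" + urlencode(params)


# Hypothetical host; excludes every page tagged pagetype=error.
url = build_gsa_search_url("gsa.example.com", "widgets", [("pagetype", "error")])
print(url)
```

The colon is percent-encoded by urlencode, so the filter appears in the URL as requiredfields=-pagetype%3Aerror.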

Related

How to load a specific number of records per page and add a "more" button

On my page I would like to output all records of a specific folder
but the number should initially be limited to a certain quantity (to reduce the loading times). With a "Load more" button further records should be loaded.
Does anyone have a hint on how I can achieve this?
I have already found several approaches on the web in connection with AJAX, but since I'm not familiar with this yet, more questions than answers have emerged ...
For info: I use my own template extension / distribution under TYPO3 9.5.8
Thank you in advance for any help!!
The state of the art solution is the AJAX solution, where you load only the required records from the server and modify the page on the fly.
Another option would be a URL parameter which is evaluated by your extension.
With the parameter, the full list is shown;
without it, only the first N records are shown, plus a button linking to the same URL with the parameter added for the full list.
Make sure the parameter is handled correctly and generates a separate cached version of the page (keyword: cHash).
As you now have two pages with partially identical content, don't forget to tell search engines that the short variant should not be indexed.
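The decision behind the URL-parameter variant can be sketched in a few lines. This is only an illustration of the logic in Python; in practice it would live in the extension's controller and Fluid template, and the function name and the limit of 10 are made up for the example.

```python
def visible_records(records, show_all, limit=10):
    """Return the records to render and whether a "Load more" link is needed."""
    if show_all or len(records) <= limit:
        # Full list requested, or nothing left to load: no button.
        return records, False
    # Short variant: first `limit` records plus a link to the full list.
    return records[:limit], True


records = [f"record {i}" for i in range(25)]

short_list, needs_button = visible_records(records, show_all=False)
full_list, _ = visible_records(records, show_all=True)
print(len(short_list), needs_button, len(full_list))
```

Here show_all stands for "the URL parameter is present", so the button simply links to the same page with that parameter set.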
You could use the Paginate Widget as documented here: https://docs.typo3.org/other/typo3/view-helper-reference/9.5/en-us/typo3/fluid/latest/Widget/Paginate.html
By overriding the paginate template file and only rendering the pagination.nextPage link, you could load the next page via AJAX.

Magento: www.mysite.com/?SID="number" is appearing in one of the google urls

I was searching all my URLs using site:www.mysite.com to check that they all redirect correctly, but I found that one of the URLs in Google search is www.mysite.com/?SID="long number".
How can I remove this URL from Google?
Thanks in advance
It is possible to remove URL parameters in Google Webmaster Tools.
On the Dashboard, under Crawl, click URL Parameters.
Next to the parameter you want, click Edit. (If the parameter isn’t listed, click Add parameter. Note that this tool is case sensitive, so be sure to type your parameter exactly as it appears in your URL.)
If the parameter doesn't affect the content displayed to the user, select No ... in the Does this parameter change... list, and then click Save. If the parameter does affect the display of content, click Yes: Changes, reorders, or narrows page content, and then select how you want Google to crawl URLs with this parameter. -- URL parameters - Webmaster Tools Help

Markup tags in product descriptions

I'm trying to use markup tags to link to info pages within the "description" attribute of a product. However, it's not giving me a clean URL path when the description is printed to the page.
Trying:
Contact Us
does not give the expected URL. I've confirmed I'm doing this outside of WYSIWYG mode too.
When the text is rendered, it is returned to the browser as this HTML:
Contact Us
A resulting click on the link then ends up as:
http://example.com/prod-category/my-product/%7B%7Bstore%20url=
From what I can tell the markup tags aren't designed to be used in this way. Is it possible to extend it so it could work? Otherwise I guess I need to include the actual URL in the description?
Thanks for suggestions.
You can't use the double-curly syntax because the attribute's value is not processed by Magento's template filter, which does the magic. You can use them e.g. in CMS or email templates out of the box.

Retrieve the content of a section via MediaWiki API

I have a MediaWiki page set up in my company's intranet.
I would like to get the content of a section in a specific page using MediaWiki API (through AJAX).
I would like to refer to the section by its title like 'General' and refer to the page by its title as well, like 'Licenses'.
Is it possible somehow?
The only thing I could achieve is referring to the page by its title and refer to the section by a number like this:
http://mywiki.local/wiki/api.php?format=xml&action=parse&prop=text&page=Licenses&section=1
But let's say I create a new section before 'General'; I would then have to update all my AJAX URLs that query this page. So this isn't good enough.
I couldn't find any working solution for this. Any ideas?
You can do this by first retrieving prop=sections to get the list of sections and their numbers:
http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=License
Then make your original request, with the section number you figured out based on the previous request.
Keep in mind that two different sections can have the same name.
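A minimal sketch of the two-step lookup, assuming the response is requested as format=json (the URLs above use format=xml) and that each section entry carries the heading in "line" and its number in "index". The sample response is trimmed and invented for illustration, and the helper names are not part of the API.

```python
from urllib.parse import urlencode


def section_indexes(parse_response, heading):
    """Return the index of every section whose heading equals `heading`."""
    sections = parse_response["parse"]["sections"]
    return [s["index"] for s in sections if s["line"] == heading]


def section_text_url(api_url, page, index):
    """Build the follow-up request for the text of one section."""
    params = {"format": "json", "action": "parse", "prop": "text",
              "page": page, "section": index}
    return api_url + "?" + urlencode(params)


# Trimmed, made-up example of an action=parse&prop=sections response.
sample = {"parse": {"title": "Licenses", "sections": [
    {"line": "Overview", "index": "1"},
    {"line": "General", "index": "2"},
]}}

matches = section_indexes(sample, "General")
if len(matches) == 1:  # more than one match means the heading is ambiguous
    print(section_text_url("http://mywiki.local/wiki/api.php", "Licenses", matches[0]))
```

Returning a list rather than a single index keeps the duplicate-heading case visible to the caller, who can then decide which section to use.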

Is there a way to tell Google certain elements are irrelevant to the page?

I have a page that shows the main product for that page, next to it though are "related products" which when you click on them you go to their page, and they have their own related products as well. The problem is that the related products are getting indexed by Google so when you search for product-A you may get the product-B page where product-A is a related item, instead of just getting the product-A page. I am trying to prevent this. Any ideas?
Thanks!
You can add rel="nofollow" in any links you don't want a bot to crawl. In this case, you can apply that tag to all your links and google won't follow them off your main page.
http://en.wikipedia.org/wiki/Nofollow
EDIT for clarification:
Page "A" is for widgets. You want this page to be returned for searches regarding widgets; on this page is a "related searches" section which links to Other Widgets. On all the anchor tags on page "A" which link to pages "B" and "C" (the related searches for Other Widgets), you'll put a rel="nofollow" tag. This will prevent Google from hitting page A and then following your "related searches" links off to pages "B" and "C".
This will NOT prevent pages "B" and "C" from being indexed on their own, it just prevents them from getting pulled in from page "A".
EDIT#2:
rel="nofollow" tells bots you don't want them to follow the link to the second page. Regardless of the anchor text on a link from A->B, if you've nofollowed it the bot won't "flow" pagerank to the linked-to page and should not follow the link to page "B" to index it due to that tag on the anchor. Note that this is not foolproof: Yahoo and other SE's may not treat nofollow like Google....so your best bet is to make sure that each page is strongly on-page-SEO'd such that it gets included in the index for the term you want it to be included for. Hope this helps...but like much of the SEO world there are few hard-and-fast rules which apply universally.
yes... put them at the bottom of the page for content,
if you want that to appear visually at the top of the page, use a css layout to re-arrange the page elements
also, as darksquid already said, add rel="nofollow" to links you don't want considered
another tip (pertaining to your comment on darksquid's post):
You could load the content via ajax, which would keep most search engine spiders from seeing it at all (since they don't generally execute javascript)
Use the Google Search Appliance googleoff / googleon tags:
http://www.geekzilla.co.uk/ViewC8614968-56ED-4729-9C12-F01677DAC412.htm
