Google Search Appliance displays results on the left half of the web page

I am a newbie to the Google Search Appliance (GSA). I would like the GSA results to fill the whole web page, but right now they are displayed in the left half of the page. Is there any setting in the GSA, or in the GSA default front end XSLT stylesheet, where I can change this? I might be missing something obvious. Thank you for your help.

Yes, you can change that from the stylesheet. I think you are asking about changing the width of the search results. Search for the following code in the stylesheet:
<xsl:if test="$res_cluster_position = 'right'">
div.main-results-without-dn {
margin-right: 15.1em;
}
</xsl:if>
You will need to change the value of margin-right.
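For example, to let the results span the full width, you could shrink or remove that margin. A minimal sketch (the value 0 is just an illustration, and assumes you are not showing the right-hand dynamic result clusters):
<xsl:if test="$res_cluster_position = 'right'">
div.main-results-without-dn {
margin-right: 0; /* was 15.1em */
}
</xsl:if>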

Related

Setting the correct xpath

I'm trying to set the right XPath for use with RSelenium, but I'm not very experienced in this area, so any help would be much appreciated.
Since I'm not allowed to post pictures yet, I have tried to add a link to a screenshot of the HTML.
I need R to scrape the dates (28-10-2020 - 13-11-2020), but so far I have not been able to set the correct XPath when using html_nodes.
I'm trying to scrape from sites like this one: https://www.boligsiden.dk/adresse/topperne-9-3-33-2620-albertslund-01650532___9__3__33
I usually do this in Python rather than R.
When you right-click on the element concerned in the browser's developer tools, you get a drop-down menu with an option to copy an XPath to the element.
That said, the site layout and XPaths can change, so a full XPath is only a good option in the short run. I prefer a relative XPath such as driver.find_element_by_xpath('//button[contains(text(), "Login")]').click()
In your case that would be driver.find_element_by_xpath("//*[contains(@class, 'u-pb-4 u-block')]")
I hope this helps; the approach is mostly the same across different languages.
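For completeness, here is a minimal sketch in Python with Selenium (the class names come from the question's screenshot and may have changed; newer Selenium releases use find_element(By.XPATH, ...) instead of find_element_by_xpath, and the same idea carries over to RSelenium):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.boligsiden.dk/adresse/topperne-9-3-33-2620-albertslund-01650532___9__3__33")

# Find the element carrying the classes from the screenshot and read its text,
# which should contain the date range, e.g. "28-10-2020 - 13-11-2020".
element = driver.find_element(By.XPATH, "//*[contains(@class, 'u-pb-4 u-block')]")
print(element.text)

driver.quit()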

UiPath Studio - data scraping error appears after I modified the selector

I use UiPath with the data scraping activity. The robot first opens the browser, navigates to the e-commerce site, and searches for the product. Everything is fine until the search results are shown; then the data scraping stops with an error message I couldn't understand.
This is because I had previously edited the selector. Currently, my selector is:
<html app='chrome.exe' title='Qoo10 - "ItemsFList" Search Results : (Q·Ranking): Items now on sale at qoo10.sg' />
My previous selector did not cause any error; that selector was:
<html app='chrome.exe' title='Qoo10 - cooking oil; Search Results : (Q·Ranking): Items now on sale at qoo10.sg' />
ItemsFList is actually a String variable I created. It stores the exact text the RPA types into the search box on the e-commerce site when the run begins.
UiPath tries to write as specific a selector as it can based on the data you provide it. Unfortunately, sometimes that selector is too specific.
For example, when you scrape a page, it includes the title of the page in the selector. But the page title will change if you are looping through more than one page. And sometimes the page title is completely dynamic, perhaps including a variable that changes every time the page is loaded. If the title is hard-coded into the selector, your program will only work if that page remains constant, which rarely happens.
Remove the title
You can use wildcards in the title to make this part of the selector more generic. Quite frankly, in my experience the title is rarely needed at all, so I just remove it whenever I do a UiPath web scrape of HTML pages.
In UI Explorer, deselect the title attribute, then click the Validate button to confirm that the page scraping will still work without the title. If everything goes green, you're good to go.
As you've found, the title almost always gets in the way.
The issue is your UI selector. The error makes this clear: the title is dynamic, and you're relying on the title to find the browser window or browser control. You have to make your selector more generic, and then it should work. Try going through UI Explorer or the UiPath documentation. A few options you can try in your selector:
Remove the title from the selector: <html app='chrome.exe' />
Or make the title generic:
<html app='chrome.exe' title='Qoo10 - *' />
Note the * wildcard in the title, which makes it more generic; see the UiPath documentation for details.
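If you do want the selector to track the current search term, one option (a sketch only; the exact mechanics depend on your Studio version) is to pass the Selector property a string expression that combines the variable with a wildcard:
"<html app='chrome.exe' title='Qoo10 - " + ItemsFList + "*' />"
This matches whatever text the robot typed into the search box, with the trailing * absorbing the rest of the dynamic title. Newer Studio versions also let you embed a variable directly in a selector as {{ItemsFList}}.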

Top-level navigation from an HTML link in a Markup slice

I had seen comments that Apache Superset was changed to allow users to enable top-level navigation for links in a Markup slice (so clicking a link redirects the whole page instead of just the contents of the slice). Does anyone know how to enable this option?
You can use plain HTML in the Markup slice, and it works.
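For top-level navigation specifically, the relevant mechanism (an assumption on my part, since the slice renders inside an iframe) is the standard HTML target attribute; the URL below is a placeholder:
<a href="https://example.com/other-dashboard" target="_top">Open the other dashboard</a>
target="_top" makes the click replace the whole page rather than just the slice's frame.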

Lazy-loading images via JS with <noscript> fallback

I'm using a jQuery slider to display a series of images and none of them are showing in a Google image search, even though we rank at the top of normal search results for the relevant keyword. My suspicion is that Google is not indexing the images because they're being (lazy-)loaded into the slider with JavaScript via the data-image attribute. It is critical for performance purposes that I lazy-load the images and not use a set of standard <img> tags instead, so I'm trying to figure out the best way to serve the assets in a way that's more easily indexed by search engines. I'm considering using the <noscript> tag within the slide markup as follows:
<li class="slide" data-image="img/image.jpg">
<div class="caption">IMAGE INFO</div>
<noscript><img src="img/image.jpg" alt="Image info" width="x" height="x"></noscript>
</li>
I'm curious if there are any potential issues with this approach, or if something entirely different would be preferable? Will search engines still consider this markup relevant with respect to SEO if it's contained within <noscript> tags?
Thanks for any insight here.
The noscript tag is a solution, but not the best one.
I had the same problem: first I tried image sitemaps, then the noscript tag, and finally I found the best solution.
I wrote a blog post on this with a fully working example:
Lazy loading and the SEO problem, solved!
The best solution is to use the method provided by Google to index AJAX content. It is not limited to AJAX, though; you can use it for any dynamically generated content.
In my sample I use this method for an image gallery that loads images dynamically.
In a few words, you have to use escaped fragments.
A fragment is the last part of the URL, prefixed by #. Fragments are not propagated to the server; they are used only on the client side to tell the browser to show something, usually to move to an in-page bookmark.
If instead of # you use #! as the prefix, this instructs Google to ask the server for a special version of your page using an ugly URL. When the server receives this ugly request, it's your responsibility to send back a static version of the page that renders an HTML snapshot (the otherwise unindexed image, in our case).
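For example (the URLs are illustrative), when a user visits the pretty URL
http://www.example.com/gallery#!image=5
the crawler instead requests the ugly URL
http://www.example.com/gallery?_escaped_fragment_=image=5
and it is that request your server answers with the snapshot.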
I generate HTML snapshots on the server side using ASP.NET (but you can generate them with any technology).
// Detect Google's "ugly" request and serve a static HTML snapshot.
var fragment = Page.Request.QueryString["_escaped_fragment_"];
if (!String.IsNullOrEmpty(fragment))
{
    // The fragment arrives as e.g. "image=5"; split it into name and value.
    var escapedParams = fragment.Split(new[] { '=' });
    if (escapedParams.Length == 2)
    {
        var imageID = escapedParams[1];
        // Render the page with the gallery showing the requested image (statically!)
        ...
    }
}
The drawback of the noscript tag method is that it provides a poor user experience: the user is not able to bookmark the page showing a specific image.
Using fragments and JavaScript, you give users the best experience:
// Client side: read the fragment and show the requested image dynamically.
if (window.location.hash)
{
    // Strip the leading '#' and split e.g. "!image=5" into name and value.
    var fragmentParams = window.location.hash.substring(1).split('=');
    var imageToDisplay = fragmentParams[1];
    // Render the page with the gallery showing the requested image (dynamically!)
    ...
}

Is there a way to tell Google that certain elements are irrelevant to a page?

I have a page that shows the main product for that page, next to it though are "related products" which when you click on them you go to their page, and they have their own related products as well. The problem is that the related products are getting indexed by Google so when you search for product-A you may get the product-B page where product-A is a related item, instead of just getting the product-A page. I am trying to prevent this. Any ideas?
Thanks!
You can add rel="nofollow" to any links you don't want a bot to crawl. In this case, you can apply that attribute to all your related-product links and Google won't follow them off your main page.
http://en.wikipedia.org/wiki/Nofollow
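For example, on the product-A page a related-product link would look like this (the URL is hypothetical):
<a href="/products/product-b" rel="nofollow">Product B</a>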
EDIT for clarification:
Page "A" is for widgets. You want this page to be returned for searches regarding widgets; on this page is a "related searches" section which links to Other Widgets. On all the anchor tags on page "A" which link to pages "B" and "C" (the related searches for Other Widgets), you'll put a rel="nofollow" tag. This will prevent Google from hitting page A and then following your "related searches" links off to pages "B" and "C".
This will NOT prevent pages "B" and "C" from being indexed on their own, it just prevents them from getting pulled in from page "A".
EDIT#2:
rel="nofollow" tells bots you don't want them to follow the link to the second page. Regardless of the anchor text on a link from A->B, if you've nofollowed it the bot won't "flow" pagerank to the linked-to page and should not follow the link to page "B" to index it due to that tag on the anchor. Note that this is not foolproof: Yahoo and other SE's may not treat nofollow like Google....so your best bet is to make sure that each page is strongly on-page-SEO'd such that it gets included in the index for the term you want it to be included for. Hope this helps...but like much of the SEO world there are few hard-and-fast rules which apply universally.
Yes: put them at the bottom of the page in the source order. If you want them to appear visually at the top of the page, use a CSS layout to rearrange the page elements, as sketched below.
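A minimal sketch with flexbox (the class names are made up for illustration): the related products come last in the source, which is what crawlers read, but render first visually:
<div class="page" style="display: flex; flex-direction: column;">
<!-- First in the source, so crawlers see the main content first... -->
<div class="main-product" style="order: 2;">Main product</div>
<!-- ...while the related products render above it thanks to the order values. -->
<div class="related-products" style="order: 1;">Related products</div>
</div>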
Also, as darksquid already said, add rel="nofollow" to links you don't want considered.
Another tip (pertaining to your comment on darksquid's post):
You could load the content via AJAX, which would keep most search engine spiders from seeing it at all (since they don't generally execute JavaScript).
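A minimal sketch with jQuery (the endpoint URL is hypothetical):
<div id="related-products"></div>
<script>
// Fetch the related-products markup after the page renders; crawlers that
// don't execute JavaScript never see these links.
$('#related-products').load('/related-products?for=product-a');
</script>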
Use the Google Search Appliance googleoff / googleon tags:
http://www.geekzilla.co.uk/ViewC8614968-56ED-4729-9C12-F01677DAC412.htm
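These are HTML comments that wrap the content the appliance should skip when indexing, e.g.:
<!--googleoff: index-->
<div class="related-products">related product links here</div>
<!--googleon: index-->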
