Extract a screenshot of the page where the text is found in Azure Cognitive Search - azure-blob-storage

I have PDF documents stored in Azure Blobs that are indexed with Azure Search. I am searching for text in the content of the PDFs and everything works correctly. When I perform the search, is it possible that Azure returns a screenshot of the page where the text was found?
For example, if I search for the word 'information', which is on page 2 of a PDF, I would like Azure to return a screenshot of that page.
Thanks.

You can find an example of this in the JFK sample. The sample uses an image store custom skill to extract the images and an HOCR skill to extract the data needed to overlay zones corresponding to the text. The full skillset can be found here.
The front-end can then use that data to build an HOCR viewer component.
I encourage you to read through the sample code for the full details, which wouldn't fit in a Stack Overflow response.
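To give a rough idea of what the front-end does with the HOCR data, here is a minimal Python sketch that finds the bounding boxes of a search term. It assumes the HOCR output follows the common hOCR convention of `ocrx_word` spans with a `bbox x0 y0 x1 y1` entry in the `title` attribute; the names `WordBoxParser` and `find_word_boxes` and the sample markup are made up for illustration and are not taken from the JFK sample.

```python
import re
from html.parser import HTMLParser


class WordBoxParser(HTMLParser):
    """Collects (text, bbox) pairs from hOCR `ocrx_word` spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or ""):
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            self._bbox = tuple(int(v) for v in match.groups()) if match else None

    def handle_data(self, data):
        # Only text directly inside a word span is recorded.
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))
            self._bbox = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._bbox = None


def find_word_boxes(hocr, term):
    """Return the bounding boxes of every word equal to `term` (case-insensitive)."""
    parser = WordBoxParser()
    parser.feed(hocr)
    parser.close()
    return [bbox for text, bbox in parser.words if text.lower() == term.lower()]


# Hypothetical hOCR fragment for one page.
sample = (
    "<div class='ocr_page' title='bbox 0 0 1000 1400'>"
    "<span class='ocrx_word' title='bbox 100 200 180 230'>More</span> "
    "<span class='ocrx_word' title='bbox 190 200 400 230'>information</span>"
    "</div>"
)
print(find_word_boxes(sample, "information"))
```

A viewer would then draw a highlight rectangle at each returned box on top of the stored page image, which is what gives the "screenshot with the hit marked" effect.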

Related

Change image next to Website in Google search results

Good morning,
a customer of ours asked us if it was possible to change the image that Google shows next to his site in Google search results.
After several searches, we tried using different techniques all followed by re-indexing the page in order to instantly see the results.
We tried using structured data (both with ld+json and with microdata) and also the "og:image" and "og:title" attributes in the "meta" tags, but none of these tests changed the image displayed on the right side next to the site in Google results.
We expected that one of these methods would change the image, but nothing happened.
Therefore, we wondered whether it was possible to change that image or whether Google chose the best image based on its search parameters.
Thank you for your valuable help,
Best regards

Elasticsearch - TikaOnDotNet Text Extraction page by page

We are exploring Elasticsearch and are currently extracting text from MS Office documents, PDF, .eml and other file formats using TikaOnDotNet.
We want to store the document content page by page in Elasticsearch, so that we can tell users that the keyword they were looking for is on page number x.
I am not sure whether this is possible. If you could share your thoughts or point us in the right direction, it would be greatly appreciated.
Regards,
Hiten

Pass an image get a list of URLs matching the image, HOW?

I'm essentially trying to do a reverse image search, i.e. I want to pass in an image and get back a results list of instances on the web where that image is found. I know Google's old API that did this is deprecated. I see some answers on SO (e.g. Google custom search for images only) that talk about doing an image search with Google's Custom Search API, but every time I dig into the code they are retrieving images from a string rather than what I'm trying to do. Is there currently any API that will help me with what I'm trying to do?
I'm sorry. I cannot write comments yet. How about this? https://github.com/tanaikech/goris
Recently, I found this. I don't know whether this is what you want.

Loading the first 5 images from google with specific keyword

I'm trying to load the first 5 images that come up on Google when I type a given keyword in my app. So, say the keyword is "Butter": I want to load the first 5 images that come up on Google if you type butter.
I've been looking at the GitHub project SDWebImage (https://github.com/rs/SDWebImage), but it looks like you can only load the images if you have the URL of the image.
Does anybody know how I can do what I described, or can anybody point me in the right direction as to what I should look at to do it?
Google has deprecated their image search API and you are now supposed to use custom search which supports images. You will need to sign up for an API key. When you make your search request you need to set your searchType parameter to image.
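As a rough sketch of what such a request looks like, the Python below builds a Custom Search JSON API URL with `searchType=image` and pulls the image URLs out of a response. The helper names and the placeholder key and engine ID are made up for illustration, and no request is actually sent here; the resulting URLs are what you would then hand to an image loader.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(query, api_key, engine_id, count=5):
    """Build a request URL that asks for `count` image results for `query`."""
    params = {
        "key": api_key,          # your API key
        "cx": engine_id,         # ID of your custom search engine
        "q": query,
        "searchType": "image",   # restrict results to images
        "num": count,            # the API allows at most 10 per request
    }
    return ENDPOINT + "?" + urlencode(params)


def image_links(response_json):
    """Extract the image URLs from a parsed JSON response."""
    return [item["link"] for item in response_json.get("items", [])]


# Placeholder credentials, for illustration only.
url = build_image_search_url("Butter", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)
```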

Google apps API, is it possible to search the text of a presentation?

I'd like to produce a list of all of the words that appear in a Google Docs presentation. I thought that the API would allow this, but it seems that only the spreadsheets API allows searching the contents of a document.
This is correct, you can't get the content of the presentation with the Documents List API, but you can easily download an exported version of a presentation, for example:
GET https://docs.google.com/feeds/download/presentations/Export
?docID=0AsJD12345&exportFormat=txt
You can use plain text output and just split up the words.
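For the splitting step, something like the following would do. It is only a sketch in Python: it assumes the exported text is already in a string, and the function name `unique_words` and the sample text are made up for illustration.

```python
import re


def unique_words(text):
    """Return the distinct words in `text`, lowercased and sorted."""
    # A "word" here is a run of letters, digits or apostrophes.
    return sorted({word.lower() for word in re.findall(r"[A-Za-z0-9']+", text)})


# Stand-in for the plain text export of a presentation.
exported = "Hello world, hello Slides!"
print(unique_words(exported))
```

Adjust the regular expression if your slides contain accented characters or hyphenated terms that should count as a single word.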
