I want to link to a random question within the "resolved questions" section of Yahoo Answers.
I've found some JS techniques that involve assigning a number to each URL, so the clicked link chooses randomly from a list I'd create that way. But there are tens of thousands of resolved questions, and new ones are added every day, so that method won't work for me.
How can I link to a random "resolved question?"
You could use the Yahoo! Answers API to get the data you need (in either XML or JSON).
The documentation is available here: http://developer.yahoo.com/answers/
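If it helps, here is a rough Python sketch of that idea. The questionSearch endpoint, its parameters, and the response field names below are assumptions based on the linked documentation rather than anything verified here, so check them against those docs before using this.

    # Rough sketch: pick a random resolved question via the Yahoo! Answers API.
    # The endpoint, parameters (appid, query, type, results, output) and the
    # JSON field names ("all", "questions", "Link") are assumptions taken from
    # the documentation linked above; verify them there before relying on this.
    import json
    import random
    import urllib.request

    APP_ID = "YOUR_APP_ID"  # placeholder: register your own application ID
    url = (
        "http://answers.yahooapis.com/AnswersService/V1/questionSearch"
        "?appid=" + APP_ID + "&query=surfing&type=resolved&results=50&output=json"
    )

    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # Choose one question at random from the returned batch and use its link.
    question = random.choice(data["all"]["questions"])
    print(question["Link"])

Because new resolved questions are added constantly, fetching a fresh batch per request and choosing randomly from it avoids maintaining any hand-built list of URLs.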
My scenario is that our marketing department has various (FAQ-related) URLs, each of which contains approximately 20 question-and-answer pairs.
I have imported these URLs into QnA Maker and (importantly) the marketing department has finessed and tidied the answers held in the knowledge base (* reason below).
The marketing department still uses a process where they add new questions and answers to the relevant URL when needed. So, for example, URL "X" might currently contain 20 question-and-answer pairs, all of which have been imported (and tidied/finessed) in the KB.
They want to add a new question-and-answer pair to the URL (i.e. make the change on the website), and then they want to "refresh" that URL into the knowledge base. The problem is (I believe) that the original 20 question-and-answer pairs on the web URL will actually overwrite the "finessed" ones in the KB.
Is there any way to avoid this? Ideally I'd just (somehow) bring in the newly added question-and-answer pair, without having to copy/paste the content from the web into the KB.
(*) The reason this needed doing was simply that, on the source URLs, the answers were quite verbose and included images. These work fine on the actual website, but when the answers were imported they didn't look elegant within the bot, so they were edited (within QnA Maker) and now look good when returned by the bot.
Importing a URL into QnA Maker will always overwrite the knowledge base with the content found at that source. This is true whether the content comes from a document linked from a URL, an uploaded document, or some other method. If you want to add new questions to an existing knowledge base, you first need to export that KB, update the content contained within it (i.e. add your new questions to the old ones), and then upload or link to that new data set.
If you want to do this via the REST API, you should look over the QnA Maker REST API reference docs. Using operations like Get, List, and Update, you should be able to achieve what you want. There are samples you can reference as well, to help you on your way.
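As a minimal sketch of that API route in Python (assuming the v4.0 Update Knowledgebase operation; the endpoint, key, KB id, and exact request shape below are placeholders to confirm against the reference docs), adding a single new pair without touching the finessed ones might look like this:

    # Sketch: add one new QnA pair to an existing knowledge base via the QnA
    # Maker REST API. Resource endpoint, subscription key, and KB id are
    # placeholders; confirm the request shape against the REST API reference.
    import requests

    endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
    key = "<your-qnamaker-subscription-key>"                          # placeholder
    kb_id = "<your-knowledge-base-id>"                                # placeholder

    body = {
        "add": {
            "qnaList": [
                {
                    "id": 0,
                    "answer": "The new, already-finessed answer text.",
                    "source": "manual-update",
                    "questions": ["The newly added question?"],
                    "metadata": [],
                }
            ]
        }
    }

    # The update call is asynchronous; it returns an operation you can poll,
    # after which you publish the KB as usual.
    resp = requests.patch(
        endpoint + "/qnamaker/v4.0/knowledgebases/" + kb_id,
        headers={"Ocp-Apim-Subscription-Key": key},
        json=body,
        timeout=30,
    )
    print(resp.status_code, resp.json())

The design point is that an "add" update only appends the new pair, whereas re-crawling the source URL replaces everything imported from it, which is exactly what you are trying to avoid.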
I am trying to import, into Google Sheets, the last quarter's research and development expense for a few thousand companies from their financial statements. While I want to import several different elements from the financial statements, the last quarter's R&D expense is the one currently pertinent (and potentially the previous 3 quarters' as well).
I have tried several different sites (Yahoo Finance, Bloomberg, etc.), but the simplest URL seems to be from stockrow.com, because I can simply automate the substitution of the stock ticker in the URL.
To get the xpath, I inspect the element and copy the xpath using the browser (have tried with Chrome and Firefox).
I am using IMPORTXML in Google Sheets and, on my last attempt, used the following input: =IMPORTXML("https://stockrow.com/JNJ/financials/income/quarterly","/html/body/div[1]/div/div/section/div/div[2]/div[1]/section[4]/div/div[3]/div/div/div[3]/div/div/div[11]/div/span")
I have attempted all sorts of combinations of sites, browsers, and xpaths related to the element, but no matter what I do, I always get the same error "Imported content is empty."
I read xpath google sheet importxml but can't make sense of what is happening in the change to the xpath or how to solve this particular challenge.
Because I want this to be repeatable across multiple stock tickers in Google Sheets, I am hoping that the "location" of the R&D expense (and other elements in the financial statements) is consistent across all pages, rather than just a specific solution to this challenge.
Looking forward to receiving guidance. Thanks!!
You need some other source. Google Sheets does not support scraping of JavaScript-rendered elements. You can test the JS dependency simply by disabling JS for a given site; whatever is left is what can be scraped. In your case, that's nothing.
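If you want to confirm this outside the browser, here is a small Python check. The URL is the one from your formula; the label strings are just guesses at text that should appear in the rendered table, so treat them as illustrative.

    # Quick check: fetch the raw HTML that IMPORTXML would receive and see
    # whether the income-statement labels are present. If they are missing,
    # the table is rendered by JavaScript after page load, so IMPORTXML will
    # always return "Imported content is empty."
    import requests

    url = "https://stockrow.com/JNJ/financials/income/quarterly"
    html = requests.get(url, timeout=30).text

    for label in ("Research", "Revenue"):  # assumed labels, purely illustrative
        print(label, "found in static HTML:", label in html)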
I'm brand new to programming (though I'm willing to learn), so apologies in advance for my very basic question.
The SEC makes available all of their filings via FTP, and eventually I would like to download a subset of these files in bulk. However, before creating such a script, I need to generate a list of the locations of these files, which follow this format:
/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm
51143 = the company ID, and I already accessed the list of company IDs I need via FTP
000005114313000007/0000051143-13-000007 = the report ID, aka "accession number"
I'm struggling with how to figure this out as the documentation is fairly light. If I already have the 000005114313000007/0000051143-13-000007 (what the SEC calls the "accession number") then it's pretty straightforward. But I'm looking for ~45k entries and would obviously need to generate these automatically for a given CIK ID (which I already have).
Is there an automated way to achieve this?
Welcome to SO.
I'm currently scraping the same site, so I'll explain what I've done so far. What I am assuming is that you'll have the CIK numbers of the companies you're looking to scrape. If you search the company's CIK, you'll get a list of all of the files that are available for the company in question. Let's use Apple as an example (since they have a TON of files):
Link to Apple's Filings
From here you can set a search filter. The document you linked was a 10-Q, so let's use that. If you filter 10-Q, you'll have a list of all of the 10-Q documents. You'll notice that the URL changes slightly, to accommodate for the filter.
You can use Python and its web scraping libraries to take that URL and scrape all of the URLs of the documents in the table on that page. For each of these links you can scrape whatever links or information you want off the page. I personally use BeautifulSoup4, but lxml is another choice for web scraping, should you choose Python as your programming language. I would recommend using Python, as it's fairly easy to learn the basics and some intermediate programming constructs.
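As a minimal sketch of that flow in Python (the browse-edgar URL pattern and the href filter below are assumptions based on how EDGAR's filing-list pages are structured, so verify them against the live site):

    # Sketch: list filing index pages (which contain the accession numbers)
    # for one CIK, filtered to 10-Q, using requests + BeautifulSoup. The URL
    # pattern and the link filter are assumptions to check against EDGAR.
    import requests
    from bs4 import BeautifulSoup

    cik = "0000320193"  # Apple, as in the example above
    url = (
        "https://www.sec.gov/cgi-bin/browse-edgar"
        "?action=getcompany&CIK=" + cik + "&type=10-Q&dateb=&owner=include&count=40"
    )

    resp = requests.get(
        url,
        headers={"User-Agent": "research-script example@example.com"},
        timeout=30,
    )
    soup = BeautifulSoup(resp.text, "html.parser")

    # Each results row links to a filing index page whose URL embeds the
    # accession number you are after.
    for link in soup.select("a[href*='Archives/edgar/data']"):
        print("https://www.sec.gov" + link["href"])

Looping that over your ~45k CIK IDs would give you the list of index pages, and from each index page you can scrape the individual document URLs the same way.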
Past that, the project is yours. Good luck, I've posted some links below to get you started. I'm only allowed to post two links since I'm new to the site, so I'll give you the beautiful soup link:
Beautiful Soup Home Page
If you choose to use Python and are new to the language, check out the Codecademy Python course, and don't forget to check out lxml, as some people prefer it over BeautifulSoup (some people also use both in conjunction, so it's all a matter of personal preference).
I've read over Google's specification for crawling AJAX-enabled pages. Since part of Google's indexing method uses the URL itself, will converting to #! negatively affect SEO?
For instance, if I have a page at www.mysite.com/surfing, Google will be likely to rate it highly if a user searches for "surfing" because it has "surfing" in the URL. Would the same be true for www.mysite.com/#!surfing or does it ignore the hash fragments for the purposes of weighting the URL itself?
Perhaps you have already read in the Google AJAX-crawling instructions that the #! is actually transformed into ?_escaped_fragment_ by the Google crawler. So let's use your example:
For www.mysite.com/#!surfing, the Google crawler will see the link as www.mysite.com/?_escaped_fragment_=surfing. So it comes down to the question: which is better for Google SEO, a link with the parameter ?_escaped_fragment_=surfing or one without it (/surfing)?
Search engine representatives have confirmed on numerous occasions that URLs with more than 2 dynamic parameters may not be spidered unless they are perceived as significantly important (i.e. have many, many links pointing to them). So unless you're using too many parameters in the URL, you don't have much to worry about. If you haven't done so already, you can always read the detailed Google documentation: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started. Now, just a piece of advice: don't rely on # in your AJAX website. Use history.pushState() to change your URL to whatever you wish. I use #! only on browsers that don't support history.pushState(), like IE. The SEO problem with #! doesn't come from the URL but from the difficulty of the server-side processing needed to provide an HTML snapshot for the crawler.
The question is old.
Google no longer supports AJAX crawling:
https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
And this document is now officially deprecated:
https://developers.google.com/search/docs/ajax-crawling/docs/getting-started
So don't use hashbangs in URLs.
Traditionally, from an SEO perspective, the hash tag (#) is used to avoid the following issues:
- Cannibalization issues
- Affiliate URLs (here is a good article about how to use the hash for tracking purposes instead of using a question mark in the URL)
- Showing limited content on the page (pagination issues)
The usage you are referring to is what Google recommends for making AJAX pages readable by Google - https://support.google.com/webmasters/answer/174992?hl=en
For more info about hash tag and its SEO benefits, check this blog post - https://digitalreadymarketing.com/adding-hash-in-urls-seo-benefits/
In my personal opinion, and with 8 years in SEO & development, it won't do harm; it depends more on the site's other parameters, so adding the #! won't hurt.
Do you have the site URL so I can take a more in-depth look?
That could cause a problem if Google's crawler thought there could be an infinite number of possibilities, as with a ? in the URL. But beyond that, the answer is clear.
website.com/oreo-cookies
is more semantic and easier to understand for both people and crawlers than
website.com/#!oreo-cookies
But is this going to have a major impact? If you were a client paying me for SEO, I would tell you that incoming text links with relevant keyword phrases from relevant, related websites are far more important. I would also say that if you are submitting an XML sitemap for Google to digest, and lots of popular websites are using the #!, Google will figure it out and ignore it.
So, bottom line: if my content were worth linking to, and I made sure Google was finding all my pages and indexing them, I would not worry about it.
I think it will not harm your SEO in any way. I have been in SEO for the last 5 years and haven't experienced such a problem yet, so don't worry about it. My opinion is that you can do it by adding the #!; no harm!
I posted this one a couple of months ago on the Mathematica newsgroup, but got no usable response. I thought I'd give SO a try.
The question was: I don't seem to be able to find the method to generate the table of contents of a Mathematica document I'm working on. Does anyone know this feature's hideout?
David Annetts pointed me in the direction of the AuthorTools, an old v5.1 utility package that's still hidden in Mathematica. However, it doesn't work on my document (v7). Any clue?
Edit
The TOC should contain correct section numbers (if present in the stylesheet) and list page numbers (this requires taking page size settings into account).
Perhaps looking at the code of Yuri Kandrashkin's package, Sidebar, will be useful?