Pull info from website - bash

I'm looking to pull the timer from this site: http://invasiontimer.com/
But it looks like the timer isn't in the HTML, so the normal curl or wget isn't getting it for me.
Is there any way to get this in a bash script and print it to a text file?
Thanks.

I think what you want is the content loaded by JavaScript. Check out this answer for more details: How to get webcontent that is loaded by JavaScript using cURL?
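As a hedged sketch, if a headless browser is installed you can render the page and grep the resulting DOM; the chromium binary name and the id="timer" selector below are assumptions to check against the real page:
# Render the JS-driven page with headless Chromium, pull out the timer
# element (hypothetical selector), and write it to a text file.
chromium --headless --disable-gpu --dump-dom 'http://invasiontimer.com/' \
  | grep -o 'id="timer"[^<]*' > timer.txt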

Related

curl 1020 error when trying to scrape page using bash script

I'm trying to write a bash script to access a journal overview page on SSRN.
I'm trying to use curl for this, which works for me on other webpages, but it returns error code 1020 when I run the following command:
curl https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1925128
I thought it might have to do with the question mark in the URL, but I got it to work with other pages that contained question marks.
It probably has something to do with what the page allows. However, I can also access the page using R's rvest package, so I think it should also work from bash in general.
Looks like the site has blocked access via curl. Change the user agent and it should work fine, i.e.
curl --user-agent 'Chrome/79' "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1925128"
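Error 1020 is Cloudflare's access-denied code, so a fuller desktop User-Agent string can also be worth trying; this is only a sketch, and Cloudflare may still block non-browser clients regardless:
# Same request with a full Chrome User-Agent string; no guarantee against
# Cloudflare's bot detection.
curl -A 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36' \
  "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1925128" -o paper.html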

ajax request and robots.txt

A website has a URL http://example.com/wp-admin/admin-ajax.php?action=FUNCTION_NAME. When I click the URL, it executes the ajax function.
When I put the URL in the address bar, it gives a redirect error because the URL doesn't actually take you anywhere, but it definitely still executes the ajax function.
When I use the command-line bash call firefox -new-window http://example.com/wp-admin/admin-ajax.php?action=FUNCTION_NAME, it opens an empty page except for the line "Bad user...". After some digging I found that the robots.txt file has "Disallow: /wp-admin/". I am assuming this is why it isn't working from the command line. I have used wget -e robots=off URL before, but there isn't anything to download, so that doesn't apply here.
What type of URL is this? (I believe it's dynamic or formula, but not sure)
I want to get the same results with the command line as when I plug the URL into the address bar. Ideas?
It's nothing special; it's a dynamic URL handled by server-side code, and it returns that HTML no matter what requests it. HTTP servers don't have to serve files from disk; the handler could be written in C++, Java, Python, or Node.js (probably not, though; admin-ajax.php is WordPress, so this one is PHP).
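Since robots.txt is advisory and only honored by well-behaved crawlers, a plain HTTP request from the shell reproduces what the address bar does. A sketch, assuming the "Bad user..." message comes from redirect or User-Agent filtering:
# -L follows the redirect the browser reported; -A sends a browser-like
# User-Agent in case the server filters on it.
curl -L -A 'Mozilla/5.0' 'http://example.com/wp-admin/admin-ajax.php?action=FUNCTION_NAME'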

how curl retrieves a url with # and ! symbols in it?

I was considering using curl to retrieve a page from a URL (http://bbs.byr.cn/#!board/JobInfo?p=3) but ended up getting an error from bash:
$ curl bbs.byr.cn/#!article/JobInfo/102321
bash: !article/JobInfo/102321: event not found
this url is accessible in my browser window, how can I write a curl command line that works on this url?
In general this is not possible: the part after the hash (#) is the URL fragment, which is handled by JavaScript on the client side and is never sent to the server. curl cannot execute JavaScript. You can put the URL in quotes to get the static part of the page, but that is surely not what you want.
If you observe the page's traffic in Firebug, you will see that the URL http://bbs.byr.cn/board/JobInfo?p=3 is requested. You can download that URL to get your results.
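Quoting the URL stops bash's history expansion on the exclamation mark, but since the fragment is never sent to the server anyway, fetching the underlying page directly is the way to go:
$ curl 'http://bbs.byr.cn/board/JobInfo?p=3'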

Wget download after POST, make it wait?

I am working on a bash script in which I use Wget to supply POST data. Wget is supposed to make a POST request to a specific page, and that page is supposed to return a file for download. The problem is that the page returns the file a few seconds after the request, not immediately, so Wget only downloads the HTML page and doesn't wait for the file. Is there any option to make this work, i.e. make the POST request and wait a few seconds for the file to be returned from the remote server?
If your only problem is that you need more time, you can use the sleep command.
You can get more information about it here: http://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/timedate.html
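For example, a minimal sketch of the two-step approach; the URLs, POST fields, and the 10-second wait are placeholders for your site's actual values:
# Step 1: send the POST request that triggers file generation on the server.
wget --post-data 'id=123' -O response.html 'http://example.com/generate'
# Step 2: give the server time to produce the file, then fetch it.
sleep 10
wget -O result.zip 'http://example.com/download/123'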
Hope that helped!

Magento - blank lines being added to wsdl file

I am trying to call the API but I keep getting a SOAP error saying it can't load the file. I found that the reason is that there are about 3 blank lines at the top of the XML file that is returned. I found this by running wget on the URL.
This used to work just fine. When I debug through the API controller, the response/XML looks fine all the way through; I don't see any spaces at all. I have no idea what might be causing this. I don't think we modified anything that would do this.
UPDATE: I have found that it appears to be because of an observer class I made for the controller_action_predispatch event. It appears I have some spaces above the opening '<?php' tag.
I'm not completely sure of what I'm talking about, as I've never used the API, but you should look at the end of the file that generates your XML for a closing '?>'. If that closing PHP marker is there, remove it and try your API call again.
In Magento, a PHP file should never end with the closing PHP marker '?>'.
Edit: as said in the comments, also look for spaces before the opening tag '<?php'.
The first thing to check is the api.php file in the root folder; most API blank-space issues come from empty lines added in api.php before the opening '<?php' tag.
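A quick way to confirm the stray whitespace is to dump the first bytes of the WSDL response and look for newlines before the <?xml declaration (placeholder URL, sketched with od):
$ wget -qO- 'http://example.com/api/v2_soap/?wsdl' | head -c 100 | od -c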
