Github API /issues - pagination trouble - bash

I am using curl from a bash command line to GET Github issues like this:
curl -o myoutput --user "myuser:mypasswd" -G https://api.github.com/issues?filter=all
This is working fine and returns 52 open issues.
I know there are more issues, so I am also examining the headers (using -i), which provide links to the next and last pages: https://api.github.com/issues?filter=all&page=2 and https://api.github.com/issues?filter=all&page=14 respectively.
However, using curl with these link URIs produces the same 52 results as before. In fact, any page number I try returns the same most recent issues. I am deleting myoutput each time.
What am I missing?
Any words of wisdom on this would be much appreciated.
Thanks

What am I missing?
Use a single-quoted string for the URL to make sure the ampersand (e.g. &page=2) is not interpreted as a control operator:
curl -o myoutput2 --user "user:pwd" \
'https://api.github.com/issues?filter=all&page=2'
Without the quotes, the shell treats the & as a control operator: it runs curl in the background and interprets page=2 as a separate command. You therefore systematically perform a https://api.github.com/issues?filter=all request, which is why the output is always the same.
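To see the failure mode, here is roughly what the shell actually executes when the URL is left unquoted (a sketch, using the placeholder credentials from the question):
# The unquoted & splits the line into two statements:
curl -o myoutput --user "myuser:mypasswd" -G https://api.github.com/issues?filter=all &   # curl runs in the background, without the page parameter
page=2    # parsed as a plain shell variable assignment, not part of the URL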

Related

Bash script execute wget with an input inside url

Complete newbie, I know you probably can't use the variables like that in there but I have 20 minutes to deliver this so HELP
read -r -p "Month?: " month
read -r -p "Year?: " year
URL= "https://gz.blockchair.com/ethereum/blocks/"
wget -w 2 --limit-rate=20k "${URL}blockchair_ethereum_blocks_$year$month*.tsv.gz"
exit
There are two issues with your code.
First, you should remove the whitespace that follows the equal symbol when you declare your URL variable. So the line becomes
URL="https://gz.blockchair.com/ethereum/blocks/"
Then, you are building your URL using a wildcard, which is not allowed here (wget only supports wildcards for FTP URLs, not HTTP). So you cannot do something like month*.tsv.gz as you are doing right now. If you need to perform requests to several URLs, you need to run wget for each one of them.
It's possible to do what you're trying to do with wget, however, this particular site's robots.txt has a rule to disallow crawling of all files (https://gz.blockchair.com/robots.txt):
User-agent: *
Disallow: /
That means the site's admins don't want you to do this. wget respects robots.txt by default, but it's possible to turn that off with -e robots=off.
For this reason, I won't post a specific, copy/pasteable solution.
Here is a generic example for selecting (and downloading) files using a glob pattern, from a typical html index page:
url=https://www.example.com/path/to/index
wget \
    --wait 2 --random-wait --limit-rate=20k \
    --recursive --no-parent --level 1 \
    --no-directories \
    -A "file[0-9][0-9]" \
    "$url"
This would download all files named file, with a two digit suffix (file52 etc), that are linked on the page at $url, and whose parent path is also $url (--no-parent).
This is a recursive download, recursing one level of links (--level 1). wget allows us to use patterns to accept or reject filenames when recursing (-A and -R for globs, also --accept-regex, --reject-regex).
Certain sites may block the wget user-agent string; it can be spoofed with --user-agent.
Note that certain sites may ban your IP (and/or add it to a blacklist) for scraping, especially if you do it repeatedly or don't respect robots.txt.
If you want to download the blocks for every day in a month, you can replace the * in the original script with a day argument: assign the list of days to a variable, then iterate over it with a for loop and do your wget stuff once per day, as sketched below.
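A minimal sketch of that loop (subject to the robots.txt caveat from the previous answer), assuming the archives are named blockchair_ethereum_blocks_YYYYMMDD.tsv.gz; trim the day range for shorter months:
read -r -p "Month?: " month
read -r -p "Year?: " year
url="https://gz.blockchair.com/ethereum/blocks/"
for day in $(seq -w 1 31); do    # -w zero-pads: 01, 02, ...
    wget -w 2 --limit-rate=20k "${url}blockchair_ethereum_blocks_${year}${month}${day}.tsv.gz"
done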

How to execute a cURL request which requires uploading a file, in Golang?

I have a cURL request as follows.
$(curl --request PUT --upload-file "<path to catalog file on your local machine>" "<presigned URL>")
Let's say that I have to upload a bin/test.txt file with the presigned URL being https://www.someurl.com
I execute the command in my terminal
curl --request PUT --upload-file "bin/test.txt" "https://www.someurl.com" and it works fine.
How do I write a piece of Golang which does the same? I have tried
cmd := exec.Command("curl", "--request", "PUT", "--upload-file", fmt.Sprintf("\"%s\"", catalogPath), fmt.Sprintf("\"%s\"", presignedURL))
err = cmd.Run()
but found no success.
I see one obvious problem preventing that curl call from working properly, a second one that is quite possible, and a third that is also possible.
The obvious problem is that string quoting — such as in curl … --upload-file "bin/test.txt" … — is interpreted by the shell which executes the command. Quoting — using either double or single quotes — inhibits interpretation of otherwise special characters by the shell; chiefly it's used to prevent the shell from splitting a string into separate "words" on whitespace characters or runs of them.
The key takeaway is that the command run by the shell after it's fully parsed the command to be executed (and interpreted the quotes) does not "see" these quotes because they are removed by the shell.
os/exec.Cmd calls the specified program directly and does not "pass it through" the shell. Hence if you include double quotes in the command-line parameters of the program to execute, they are passed to that program unchanged. This means curl would try to find a file named test.txt" located in a directory named "bin — which is most probably not what you expected.
The same applies to the URL.
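In other words, drop the added quotes and pass the values as they are; a sketch of the corrected call:
// Each argument is passed to curl verbatim — exec.Command does no
// word-splitting, so no shell-style quoting is needed.
cmd := exec.Command("curl", "--request", "PUT", "--upload-file", catalogPath, presignedURL)
err = cmd.Run()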
The second — possible — problem is that your call relies on the current directory of your Go program because you pass a relative path to curl.
This might or might not be a problem but you might check this anyway.
The third problem is that you might want to pass your URL through the "percent escaping" algorithm before passing it to curl.
You might look at PathEscape and QueryEscape functions of the net/url package.
Two pieces of advice follow.
First, I would try very hard not to call out to curl to perform such a ridiculously simple task. Go has excellent support for making HTTP requests (and serving them, FWIW) in its standard library, and PUTting a file is really a no-brainer with solutions googleable in, like, five minutes.
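For illustration, a minimal sketch of the same PUT using only the standard library; it assumes catalogPath and presignedURL hold the same values as in the question, and keeps error handling terse:
package main

import (
	"fmt"
	"net/http"
	"os"
)

func uploadFile(catalogPath, presignedURL string) error {
	f, err := os.Open(catalogPath)
	if err != nil {
		return err
	}
	defer f.Close()

	// Stat the file to send an explicit Content-Length; some
	// presigned-URL backends reject chunked transfer encoding.
	fi, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPut, presignedURL, f)
	if err != nil {
		return err
	}
	req.ContentLength = fi.Size()

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := uploadFile("bin/test.txt", "https://www.someurl.com"); err != nil {
		fmt.Println(err)
	}
}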
Second, if, for some reason, you intend to stick with calling curl, consider passing it some options to make it fail loudly on errors — otherwise you're doomed to be in that „but found no success” situation in your attempts. For instance, consider passing curl the -s and -S command line options (together).
That's not how you quote shell arguments; that would break if your argument starts with or ends with \ or ". The proper way to quote shell arguments on Unix would be
// requires: import "strings"
func quoteshellarg(str string) string {
	if strings.Contains(str, "\x00") {
		panic("argument contains null bytes, it is impossible to escape null bytes in shell arguments!")
	}
	return "'" + strings.ReplaceAll(str, "'", "'\\''") + "'"
}
and with that, since shell quoting only matters when a shell actually parses the string, run the command through one:
cmd := exec.Command("/bin/sh", "-c",
	"curl --request PUT --upload-file "+quoteshellarg(catalogPath)+" "+quoteshellarg(presignedURL))
... at least that's how to do it on Unix systems. As for how to do it on Windows, it seems nobody knows for sure, not even Microsoft.

Strange characters appearing in bash variable expansion

Trying to do the following on CentOS 7 works as I expect:
pod_in_question=$(curl -u uname:password -k very.cluster.com/api/v1/namespaces/default/pods/ | grep -i '"name": "myapp-' | cut -d '"' -f 4)
echo "$pod_in_question"
curl -u uname:password -k -X DELETE "very.cluster.com/api/v1/namespaces/default/pods/${pod_in_question}"
However, trying the same thing on MacOS (10.12.1) yields:
curl: (3) [globbing] bad range in column 92
When I run the last curl command with the -g option, it substitutes a malformed name such as: myapp-\\x1b[m\\x1b[Kl1eti\
The echo statement would always execute just fine and show something like myapp-v7454 which I later want to put into the last curl statement. So where are these other characters coming from?
A robust solution - Basic cURL CLI debugging.
This answer was revised after it was identified that the OP's issue relates to grep applying color output.
There's a proposed answer which explains clearly what the embedded special characters mean, with instructions to override the grep behaviour so it does not output color. Certainly this is good practice for grep use in pipes. There are, however, a number of best practices that can help diagnose this or a similar issue with cURL and ultimately lead to the most robust solution.
Re-creating the problem
Assuming it's a JSON Content-Type, we use echo {'"name": "myapp-7414"'} to simulate the output from cURL
We filter the text and set a variable with it that we use in a cURL command
We force grep to output color, since it normally doesn't when its output is not a tty (here it is piped into cut).
Recreation:
myvar=$(echo {'"name": "myapp-7414"'} | grep --color=always -i '"name": "myapp-' | cut -d '"' -f 4)
curl "https://www.google.com/${myvar}"
Output:
curl: (3) [globbing] bad range in column 32
First up:
'{}' are special characters to cURL, period.
The best practice for URL syntax in cURL:
If Variable Expansion is required:
Apply the -g switch to disable potential globbing done by cURL
Otherwise:
Use $variable as part of a "quoted" url string, instead of ${variable}
Second: In addition to -g, we add --libcurl /tmp/libcurl so we can get some insight into what cURL is seeing.
Recreation with -g and --libcurl:
curl -g --libcurl /tmp/libcurl "https://www.google.com/${myvar}"
Output:
<p>Your client has issued a malformed or illegal request <ins>That’s all we know.
Perfect, at least now everything is getting to the server and back! Let's see what cURL sent out to the server:
cat /tmp/libcurl
Sure enough, we find this line (note the escape sequence in the middle):
curl_easy_setopt(hnd, CURLOPT_URL, "https://www.google.com/myapp-\033[m7414");
So we know that:
The shell is doing something strange with our variable.
curl knows not to attempt globbing once we send the -g switch. That way, if there is an error with the shell variable, we can actually see what it is. We shouldn't be debugging a globbing error if we're not trying to use URL ranges.
The special characters are colors. They come from the --color=always that we added to simulate the OP's environment.
At this point. Since it looks like we're working with JSON data, why not just use a widely available, high performance JSON parsing tool. That has a number of benefits, including:
Not relying on any environment that could affect string filtering
Can request the data we want (aka. "name")
The app name "myapp" can change and we won't have to re-write the code to retrieve it.
It's cleaner and accounts for things I haven't considered yet.
If we used jq for example (while we're at it, we no longer need the -g switch: there are no '{}' characters left in the URL, and we're already double-quoting it):
myvar=$(echo {'"name": "myapp-7414"'} | jq -r .name)
curl --libcurl /tmp/libcurl "https://www.google.com/$myvar"
Now we get:
<p>The requested URL /myapp-7414 was not found on this server. That’s all we know.
Great. It's all working now. It should be obvious that the test server at www.google.com has no idea what myapp-7414 is.
So we've gone from:
Globbing bad range, to:
Malformed URL, to:
URL not found on server.
We could also, as suggested elsewhere, override the grep output with --color=never (as noted: if grep has to be used in a pipe, --color=never is good practice, period). However, given the portability issues already experienced because of string filtering, and the fact that we are already handed structured data on a plate that can be parsed reliably, the more robust solution is to do just that, if possible.
The substitution you showed at the last part looks like one of your calls injected ANSI escape sequences. It's possible that grep isn't detecting non-TTY output and is colorizing.
On a terminal that supports ANSI escape sequences, your particular codes might not be visible. The codes ^[[m^[[K reset the character attributes and clear to the end of the line. That's why you thought the echo command proved your data was correct.
You can examine the raw data with:
echo "$pod_in_question" | hexdump -C
And you should see there are other characters in there which did not appear in your terminal before. When you put these "invisible" codes into the URL, curl's URL globbing trips over the [ characters in the escape sequences and fails with the bad-range error.
The solution is to add the argument --color=never to your grep call, which will disable colorization.
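Applied to the pipeline from the question, only the grep invocation changes:
pod_in_question=$(curl -u uname:password -k very.cluster.com/api/v1/namespaces/default/pods/ | grep --color=never -i '"name": "myapp-' | cut -d '"' -f 4)
curl -u uname:password -k -X DELETE "very.cluster.com/api/v1/namespaces/default/pods/${pod_in_question}"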

How to urlencode data into a URL, with bash or curl

How can a string be urlencoded and embedded into the URL? Please note that I am not trying to GET or POST data, so the -G and --data and --data-urlencode options of curl don't seem to do the job.
For example, if you used
curl -G http://example.com/foo --data-urlencode "bar=spaced data"
that would be functionally equivalent to
curl "http://example.com/foo?bar=spaced%20data"
which is not desired.
I have a string foo/bar which must be urlencoded foo%2fbar and embedded into the URL.
curl http://example.com/api/projects/foo%2fbar/events
One hypothetical solution (if I could find something like this) would be to preprocess the data in bash, if there exists some kind of urlencode function.
DATA=foo/bar
ENCODED=`urlencode $DATA`
curl http://example.com/api/projects/${ENCODED}/events
Another hypothetical solution (if I could find something like this) would be some switch in curl, similar to this:
curl http://example.com/api/projects/{0}/events --string-urlencode "0=foo/bar"
The specific reason I'm looking for an answer to this question is the Gitlab API. For example, gitlab get single project NAMESPACE/PROJECT_NAME is URL-encoded, eg. /api/v3/projects/diaspora%2Fdiaspora (where / is represented by %2F). Further to this, you can request individual properties in the project, so you end up with a URL such as http://example.com/projects/diaspora%2Fdiaspora/events
Although this question is GitLab-specific, I imagine it's generally applicable to REST APIs in general, and I'm surprised I can't find a pre-existing answer on Stack Overflow or via an internet search.
The urlencode function you propose is easy enough to implement:
urlencode() {
    python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], sys.argv[2]))' \
        "$1" "$urlencode_safe"
}
...used as:
data=foo/bar
encoded=$(urlencode "$data")
curl "http://example.com/api/projects/${encoded}/events"
If you want to have some characters which are passed through literally -- in many use cases, this is desired for /s -- instead use:
encoded=$(urlencode_safe='/' urlencode "$data")
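If you'd rather not depend on Python at all, here is a pure-bash sketch of the same function; it assumes LC_ALL=C so the string is processed byte by byte:
urlencode() {
    local LC_ALL=C    # iterate over bytes, not multibyte characters
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9._~-]) out+=$c ;;                # unreserved characters pass through
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;  # everything else becomes %XX
        esac
    done
    printf '%s\n' "$out"
}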

Curl and wget: why isn't the GET parameter used?

I am trying to fetch data from this page using wget and curl in PHP. As you can see by using your browser, the default result is 20 items, but by setting the GET parameter iip to a number x, I can fetch x items, i.e. http://www.example.com/foo?a=26033&iip=100
The problem is that the iip parameter only works in browsers. If I try to fetch the last link using wget or curl, only 20 items are returned. Why? Try this at the command-line:
curl -O http://www.example.com/foo?a=26033&iip=100
wget http://www.example.com/foo?a=26033&iip=100
Why can't I use the GET parameter iip?
Try adding quotes:
curl -O 'http://www.objektvision.se/annonsorer?ai=26033&iip=100'
wget 'http://www.objektvision.se/annonsorer?ai=26033&iip=100'
The & has special functionality on the command line which is likely causing the issues.
Try quoting the argument. At least in cmd, & is used to delimit two commands that are run individually.
You'll have to enclose your URL in either " or ', since the & has a special meaning in shell scripts... That'll give you:
curl -O "http://www.objektvision.se/annonsorer?ai=26033&iip=100"
wget "http://www.objektvision.se/annonsorer?ai=26033&iip=100"
& is a control operator in the shell. Just escape it like this: \&
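For example, with escaping instead of quoting:
curl -O http://www.objektvision.se/annonsorer?ai=26033\&iip=100
wget http://www.objektvision.se/annonsorer?ai=26033\&iip=100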
