Strange characters appearing in bash variable expansion - bash

Trying to do the following on CentOS 7 works as I expect:
pod_in_question=$(curl -u uname:password -k very.cluster.com/api/v1/namespaces/default/pods/ | grep -i '"name": "myapp-' | cut -d '"' -f 4)
echo "$pod_in_question"
curl -u uname:password -k -X DELETE "very.cluster.com/api/v1/namespaces/default/pods/${pod_in_question}"
However, trying the same thing on macOS (10.12.1) yields:
curl: (3) [globbing] bad range in column 92
When I try to curl the last line with a -g option it substitutes with a malformed name such as: myapp-\x1b[m\x1b[Kl1eti\
The echo statement would always execute just fine and show something like myapp-v7454 which I later want to put into the last curl statement. So where are these other characters coming from?

A robust solution - Basic cURL CLI debugging.
This answer was revised after it was identified that the OP's issue relates to grep applying color output.
There's a proposed answer which explains clearly what the embedded special characters were, along with instructions to override the grep behaviour so it doesn't output color. That is certainly good practice for grep use in piping. There are, however, a number of best practices that can help diagnose this or a similar issue with cURL and ultimately lead to the most robust solution.
Re-creating the problem
Assuming it's a JSON Content-Type, we use echo {'"name": "myapp-7414"'} to simulate the output from cURL
We filter the text and set a variable with it that we use in a cURL command
We force grep to output color, since by default it doesn't colorize when its output goes to a pipe rather than a tty.
Recreation:
myvar=$(echo {'"name": "myapp-7414"'} | grep --color=always -i '"name": "myapp-' | cut -d '"' -f 4)
curl "https://www.google.com/${myvar}"
Output:
curl: (3) [globbing] bad range in column 32
First up:
'{}' are special characters to cURL, period.
The best practice for URL syntax in cURL:
If Variable Expansion is required:
Apply the -g switch to disable potential globbing done by cURL
Otherwise:
Use $variable as part of a "quoted" URL string, instead of ${variable} (a short sketch of both forms follows below)
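A minimal sketch of both forms (example.com and the variable's value are made up for illustration):
myvar='myapp-7414'
# -g disables cURL's URL globbing, so any [ ] { } that arrive via the
# variable's value can't be misread as ranges
curl -g "https://example.com/${myvar}"
# with a clean value, plain $myvar inside a double-quoted URL needs no -g
curl "https://example.com/$myvar"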
Second: In addition to -g, we add --libcurl /tmp/libcurl so we can get some insight into what cURL is seeing.
Recreation with -g and --libcurl:
curl -g --libcurl /tmp/libcurl "https://www.google.com/${myvar}"
Output:
<p>Your client has issued a malformed or illegal request <ins>That’s all we know.
Perfect, at least now everything is getting to the server and back! Let's see what cURL sent out to the server:
cat /tmp/libcurl
Sure enough, we find this line (note the embedded escape sequence):
curl_easy_setopt(hnd, CURLOPT_URL, "https://www.google.com/myapp-\033[m7414");
So we know that:
The shell is doing something strange with our variable.
cURL knows not to try to glob once we pass the -g switch. That way, if there is an error with the shell variable, we can actually see what it is. We shouldn't be debugging a globbing error if we're not trying to use URL ranges.
The special characters are colors. They represent the --color=always that we added to simulate the OP's environment.
At this point, since it looks like we're working with JSON data, why not just use a widely available, high-performance JSON parsing tool? That has a number of benefits, including:
Not relying on any environment that could affect string filtering
Can request exactly the data we want (i.e. the "name" field)
The app name "myapp" can change and we won't have to re-write the code to retrieve it.
It's cleaner and accounts for things I haven't considered yet.
If we use jq, for example (while we're at it, we no longer need the -g switch: we use $myvar inside an already double-quoted URL rather than ${myvar}, so there are no '{}' characters for cURL to glob):
myvar=$(echo {'"name": "myapp-7414"'} | jq -r .name)
curl --libcurl /tmp/libcurl "https://www.google.com/$myvar"
Now we get:
<p>The requested URL /myapp-7414 was not found on this server. That’s all we know.
Great, it's all working now. Naturally, the test URL here, www.google.com, is not going to know what myapp-7414 is.
So we've gone from:
Globbing bad range, to:
Malformed URL, to:
URL not found on server.
We could also, as suggested elsewhere, override the grep output with --color=never (as noted: if grep has to be used, --color=never is good practice when piping strings, period). However, given the portability issues already experienced because of string filtering, and the fact that we are already handed structured data on a plate that can be parsed reliably, the more robust solution is to do just that, where possible.
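Applied to the question's API call, a minimal sketch might look like this (assuming the endpoint returns the usual Kubernetes pod-list JSON, i.e. an "items" array of objects with "metadata.name"; host and credentials are the question's placeholders):
pod_in_question=$(curl -s -u uname:password -k "very.cluster.com/api/v1/namespaces/default/pods/" |
  jq -r '.items[].metadata.name | select(startswith("myapp-"))')
# if several pods match, tighten the filter or take the first with: | head -n 1
curl -u uname:password -k -X DELETE "very.cluster.com/api/v1/namespaces/default/pods/${pod_in_question}"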

The substitution you showed in the last part looks like one of your calls injected ANSI escape sequences. It's possible that grep isn't detecting non-TTY output and is colorizing anyway.
On a terminal that supports ANSI escape sequences, your particular codes might not be visible. The codes ^[[m^[[K set the screen mode and clear the current line. That's why you thought the echo command proved your data was correct.
You can examine the raw data with:
echo "$pod_in_question" | hexdump -C
And you should see there are other characters in there which did not appear in your terminal before. When you put these "invisible" codes into the URL, curl tries to encode them and then fails when it encounters a control character (ESC).
The solution is to add the argument --color=never to your grep call, which will disable colorization.
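Applied to the pipeline from the question (host and credentials as the OP posted them):
pod_in_question=$(curl -s -u uname:password -k very.cluster.com/api/v1/namespaces/default/pods/ |
  grep --color=never -i '"name": "myapp-' | cut -d '"' -f 4)
curl -u uname:password -k -X DELETE "very.cluster.com/api/v1/namespaces/default/pods/${pod_in_question}"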

Related

Sending script and file content via STDIN

I generate (dynamically) a script concatenating the following files:
testscript1
echo Writing File
cat > /tmp/test_file <<EOF
testcontent
line1
second line
testscript2
EOF
echo File is written
And I execute by calling
$ cat testscript1 testcontent testscript2 | ssh remote_host bash -s --
The effect is that the file /tmp/test_file is filled with the desired content.
Is there also a variant thinkable where binary files can be supplied in a similar fashion? Instead of cat, other tools such as dd could of course be used, but the problem I see is 'telling' them that STDIN has now ended (can I send ^D through that stream?)
I am not able to get my head around that problem, but there is likely no comparable solution. However, I might be wrong, so I'd be happy to hear from you.
can I send ^D through that stream
Yes but you don't want to.
Control+D, commonly notated ^D, is just a character -- or to be pedantic (as I often am), a codepoint in the usual character code (ASCII or a superset like UTF-8) that we treat as a character. You can send that character/byte by a number of methods, most simply printf '\004', but the receiving system won't treat it as end-of-file; it will instead be stored in the destination file, just like any other data byte, followed by the subsequent data that you meant to be a new command and file etc.
^D only causes end-of-file when input from a terminal (more exactly, a 'tty' device) -- and then only in 'cooked' mode (which is why programs like vi and less can do things very different from ending a file when you type ^D). The form of ssh you used doesn't make the input a 'tty' device. ssh can make the input (and output) a 'tty' (more exactly a subclass of 'tty' called a pseudo-tty or 'pty', but that doesn't matter here) if you add the -t option (in some situations you may need to repeat it as -t -t or -tt). But then if your binary file contains any byte with the value \004 -- or several other special values -- which is quite possible, then your data will be corrupted and garbage commands executed (sometimes), which definitely won't do what you want and may damage your system.
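A quick way to see for yourself that ^D is just an ordinary data byte outside a tty (od -c is standard and prints the \004 as data rather than ending anything):
printf 'before\004after\n' | od -c
# 0000000   b   e   f   o   r   e 004   a   f   t   e   r  \n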
The traditional approach to what you are trying to do, back in the 1980s and 1990s, was 'shar' (shell archive), and the usual solution for handling binary data was 'uuencode', which converts binary data into only printable characters that can safely go through a link like this, matched by 'uudecode' which converts it back. See this surviving example from GNU. uuencode and uudecode were part of the 'uucp' communication suite used mostly for email and Usenet, which are (all) now mostly obsolete and forgotten.
However, nearly all systems today contain a 'base64' program which provides equivalent (though not identical) functionality. Within a single system you can do:
base64 <infile | base64 -d >outfile
to get the same effect as cp infile outfile. In your case you can do something like:
{ echo "base64 -d <<END# >outfile"; base64 <infile; echo "END#"; otherstuff; } | ssh remote bash
You can also try:
cat testscript1 testcontent testscript2 | base64 | ssh <options> "base64 --decode | bash"
Don't worry about ^D: when your input is exhausted, the next processes in the pipeline will notice that they have reached the end of the input.
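To check that a binary round trip survived intact, you could compare checksums on both ends; a sketch, with remote_host and the file paths as illustrative placeholders:
base64 </usr/bin/true | ssh remote_host 'base64 -d >/tmp/true.copy'
cksum </usr/bin/true                       # run locally
ssh remote_host 'cksum </tmp/true.copy'    # should print the same CRC and size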

Unable to use Select query while using curl

Out of interest, I am trying to use the below curl query:
Delphi$ curl -o /dev/null -s -w %{time_total}
http://localhost:8080/getquery?db=EmpDt&col=HRDt&query=select * from
emp where id=1111
but unable to execute it:
[1] 2784
[2] 2785
0.003799invalid option or syntax: 10
[1]- Done curl -o /dev/null -s -w %{time_total}
http://localhost:8080/getquery?db=EmpDt
[2]+ Done col=HRDt
Something is not correct here, but I'm not able to work out what. Any help would be really helpful. Thanks
In the shell, an unquoted & terminates the command and runs the command to its left in the background; thus your post contains three separate commands run concurrently. Either quote each & individually with a backslash as \&, or surround at least the &s, and usually the whole string, with either single quotes 'http://host/q?x&y&z' or double quotes "http://host/q?x&y&z".
? and * are also special in shell, although not command terminators, and in general must also be quoted; in your case, after fixing the spaces (below), this becomes less critical.
A URL cannot contain a space; it must be encoded as + (preferred) or %20. Other special characters (here * and =) may not work depending on how your server handles URL parsing, which in turn depends on what your server is, and you didn't give any hint; in that case they too must be percent-encoded. (If you want an actual +, which you don't here, it is encoded as %2B.)
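Putting those points together, you can also let curl do the encoding itself with -G and --data-urlencode (both are standard curl options; the host and parameters are the question's own, assuming the server accepts percent-encoded values):
curl -o /dev/null -s -w '%{time_total}' -G \
  --data-urlencode 'db=EmpDt' \
  --data-urlencode 'col=HRDt' \
  --data-urlencode 'query=select * from emp where id=1111' \
  'http://localhost:8080/getquery'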

how to edit url string with sed

My Linux repository file contains a link that until now was using http with a port number to point to its repository.
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to change that URL to use https with no port, or a different port.
I also need the ability to change the server name, for example from host.domain.com to host2.domain.com.
So my idea was to use sed to match from the start of the http up to the first / that comes after the //, capturing whatever is in between; that gives me the ability to change the server name, the port, or http/https usage.
I'm now using the code below (echo is just for the example). It shows two cases: in the first, a link with http and port 123 is converted to https, and in the second, the other way around. Both use the same sed, for generic reasons.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
Is that the correct way of doing so?
sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separating character)
The key is using [^/]*, which matches anything but slashes, so it stops matching at the first slash (effectively non-greedy).
You used /.*/, and .* can contain slashes, which is not what you wanted (greedy by default).
Anyway, my approach is different because the expression does not include the trailing slash, so it is not removed from the final output.
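A quick contrast of the two behaviours on the sample URL:
$ echo 'http://host.domain.com:123/folder1/folder2' | sed 's|http://.*|https://host.domain.com|'
https://host.domain.com
$ echo 'http://host.domain.com:123/folder1/folder2' | sed 's|http://[^/]*|https://host.domain.com|'
https://host.domain.com/folder1/folder2
The greedy .* swallows the path entirely; [^/]* stops at the first slash and leaves the path intact.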
Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.
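For instance, a hedged sketch of escaping a whole URL before it expands into the replacement section (the character class covers &, \ and the | delimiter used in this script):
wanted_url='https://host2.domain.com'
# make &, \ and | literal in the replacement text
esc=$(printf '%s\n' "$wanted_url" | sed 's/[&\\|]/\\&/g')
echo 'http://host.domain.com:123/folder1/folder2' |
  sed 's|https\{0,1\}://[^/]*|'"$esc"'|'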

Github API /issues - pagination trouble

I am using curl from a bash command line to GET Github issues like this:
curl -o myoutput --user "myuser:mypasswd" -G https://api.github.com/issues?filter=all
This is working fine and returns 52 open issues.
I know there are more issues, so I am also examining the headers (using -i) which provides links to the next & last pages, https://api.github.com/issues?filter=all&page=2 & https://api.github.com/issues?filter=all&page=14 respectively
However, using curl with these link URIs produces the same 52 results as before. In fact, any page number I try returns the same most recent issues. I am deleting myoutput each time.
What am I missing?
Any words of wisdom on this would be much appreciated.
Thanks
What am I missing?
Use a single-quoted string for the URL to make sure the ampersand (e.g. &page=2) is not interpreted as a control operator:
curl -o myoutput2 --user "user:pwd" \
'https://api.github.com/issues?filter=all&page=2'
Without the quotes, the shell cuts the URL at the ampersand, so you always perform a plain https://api.github.com/issues?filter=all request, which is why the output is always the same.
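A sketch for walking all the pages (the page count of 14 comes from the Link header mentioned in the question; password auth is kept only to match the question, tokens are preferred nowadays):
for page in $(seq 1 14); do
  curl -s --user "myuser:mypasswd" \
    -o "myoutput-page-${page}" \
    "https://api.github.com/issues?filter=all&page=${page}"
done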

curl to compile a list of redirected pages

Suppose I have a bash script that goes through a file that contains a list of old URLs that have all been redirected.
curl --location http://destination.com will process a page by following a redirect. However, I'm interested not in the content, but in where the redirect points, so that I can update my records.
What is the command-line option for curl to output what that new location for the URL is?
You want to leave out the --location/-L flag and use -w, checking the redirect_url variable. curl -w "%{redirect_url}" http://someurl.com should do it.
Used in a script:
# -s -o /dev/null keeps the response body out of the captured value
REDIRECT=$(curl -s -o /dev/null -w "%{redirect_url}" http://someurl.com)
echo "http://someurl.com redirects to: ${REDIRECT}"
From the curl man page:
-w, --write-out <format>
Make curl display information on stdout after a completed transfer. The
format is a string that may contain plain text mixed with any number
of variables. The format can be specified as a literal "string", or
you can have curl read the format from a file with "@filename" and to
tell curl to read the format from stdin you write "@-".
The variables present in the output format will be substituted by the
value or text that curl thinks fit, as described below. All variables
are specified as %{variable_name} and to output a normal % you just
write them as %%. You can output a newline by using \n, a carriage
return with \r and a tab space with \t.
NOTE: The %-symbol is a special symbol in the win32-environment, where
all occurrences of % must be doubled when using this option.
The variables available are:
...
redirect_url When an HTTP request was made without -L to follow
redirects, this variable will show the actual URL a redirect would
take you to. (Added in 7.18.2)
...
This might work (as a starting point)
curl -sI google.com | head -1 | grep 301 | wc -l
man curl
then
search redirect_url
redirect_url When a HTTP request was made without -L to follow
redirects, this variable will show the actual URL a redirect would
take you to. (Added in 7.18.2)
The variable above is for -w/--write-out <format>.
