cURL Scraping Wrong Webpage - macOS

I am attempting to scrape a webpage that requires a login using curl in the Mac Terminal but can't seem to get it right. I have a cookies.txt file with my login info that I am reading into the command, but I can't get it to scrape the intended page. When I run
curl -b /Users/dwm8/Desktop/cookies.txt -o /Users/dwm8/Desktop/file.txt https://kenpom.com/team.php?team=Duke&y=2002
the contents of file.txt are the webpage data from https://kenpom.com/team.php?team=Duke instead of https://kenpom.com/team.php?team=Duke&y=2002. Is there a fix for this? Thanks for the help.

& is a shell metacharacter that separates commands and indicates the command before it should be run in the background. So, your command:
curl ... https://kenpom.com/team.php?team=Duke&y=2002
gets parsed as two separate commands:
curl ... https://kenpom.com/team.php?team=Duke & # The & means run curl in the background
y=2002 # This just sets a shell variable
In order to get the shell to treat & as part of the argument to curl rather than a command separator, you need to quote it (either single- or double-quotes would work) or escape it with a backslash:
curl ... 'https://kenpom.com/team.php?team=Duke&y=2002'
curl ... "https://kenpom.com/team.php?team=Duke&y=2002"
curl ... https://kenpom.com/team.php\?team=Duke\&y=2002
Oh, and notice that I also escaped the ? in that last example. That's because ? is also a shell metacharacter (specifically, a wildcard). In this case it probably wouldn't cause any trouble, but it's safest to quote or escape it just in case. And since it's hard to keep track of exactly which characters can cause trouble, I recommend quoting instead of escaping, and simply quoting anything you're at all unsure about.
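Applied to your original command (same paths, URL now quoted), that would be:
curl -b /Users/dwm8/Desktop/cookies.txt -o /Users/dwm8/Desktop/file.txt 'https://kenpom.com/team.php?team=Duke&y=2002'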

You need to wrap the URL part in quotes.

Related

sed with regular expression as a bash variable

We have an application that keeps some info in an encrypted file. To edit the file we have to put the text editor name in an environment variable in bash, for example EDITOR=vi. Then we run the application and it opens the decrypted file in vi. I am trying to come up with a bash script that updates the encrypted file. The only solution I can think of is passing a sed command instead of vi in the EDITOR variable. It works perfectly for something like EDITOR='sed -i s#aaaa#bbbb#'.
The problem starts when I need a space and a regular expression, for example EDITOR='sed -i -r "s#^(\s*masterkey: )(.*)#\1xxxxx#"', which returns an error. I tried running the EDITOR in bash with $EDITOR test.txt and I can see the problem: it doesn't like the double quotes and the space between them, so I added a backslash before the double quotes and used \s instead of the space. Now it says "unterminated address regex". I googled for several hours and couldn't find a solution. I tried replacing single quotes with double quotes and vice versa, and everything else I could find on the internet, but no luck.
How can I escape and which characters should I escape here?
Update:
Maybe if I explain the whole situation somebody can suggest an alternative solution. There is an application written in Ruby that runs inside a container. The Ruby application has a secret_key_base for production, and we are supposed to change the key with EDITOR=vi rails credentials:edit --environment=production. I don't know Ruby, and Google did not turn up any Ruby solution for automating this, so the only thing I could think of was passing sed instead of vi to Ruby.
How can I escape and which characters should I escape here?
That is not possible. Word splitting on the result of an expansion cannot be escaped from inside the result of that expansion; it always runs. Note that filename expansion also runs over the result of the expansion.
Create an executable file with the script content and set EDITOR to it.
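A minimal sketch of that approach, reusing the sed expression from the question (the script name and location are just examples):
cat > /tmp/myeditor.sh <<'EOF'
#!/bin/bash
# Non-interactive "editor": apply the substitution to whatever file
# the calling application passes as an argument.
sed -i -E 's#^(\s*masterkey: )(.*)#\1xxxxx#' "$@"
EOF
chmod +x /tmp/myeditor.sh
EDITOR=/tmp/myeditor.sh rails credentials:edit --environment=production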
You could export a bash shell function; after some tries I got to:
myeditor() {
    # Edit in place (-i) with extended regexes (-E) every file passed in
    sed -i -E 's#^(\s*masterkey: )(.*)#\1xxxxx#' "$@"
}
export -f myeditor
# When the application appends the file name, this effectively runs: myeditor <file>
EDITOR='bash -c "$@" _ myeditor'

How to read arguments with ampersand in bash [duplicate]

I'm building a shell script (trying to be POSIX compliant) and I'm stuck in an issue.
The script is supposed to receive a URL and do some things with its content.
myscript www.pudim.com.br/?&args=ok
The thing is, the ampersand symbol is interpreted as a command separator, so my script only gets the www.pudim.com.br/? part as an argument.
I know that the proper fix would be to surround the URL with quotes but, because I need to use this script several times in a row, I wanted to be able to paste the URLs without wrapping them in quotes every time.
Is there some way to get the full URL argument, somehow bypassing the ampersand?
Quotes for full URL
Wrapping the URL in quotes will be your safest bet. See the popular shell utility curl, which says this about its URL argument:
When using [] or {} sequences when invoked from a command line prompt,
you probably have to put the full URL within double quotes to avoid
the shell from interfering with it. This also goes for other
characters treated special, like for example '&', '?' and '*'.
Extra argument(s) for specifying query parameters
You can also pass each query parameter (key-value pair) as a separate argument, which sidesteps & as a separator. See curl's -F option:
-F, --form <name=content>
Read URL from STDIN
If your script allows user interaction, you could read the unescaped URL (including metacharacters such as &) from an uninterpreted input source, as sketched below.
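A minimal sketch of that last idea, reading inside the script itself (prompt text and variable name are made up):
printf 'Paste the URL: '
IFS= read -r url                   # the line is taken verbatim; & and ? are not special here
printf 'Got: %s\n' "$url"          # keep using "$url" quoted from here on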
You can escape just the ampersand; quotes effectively escape every character between them.
myscript www.pudim.com.br/\?\&args=ok # The ? should be escaped as well
There is no solution that lets you avoid all quoting, as & is a shell metacharacter whose unquoted meaning cannot be disabled. The & terminates the preceding command, causing it to be run in a background process; adding some redundant whitespace, your attempt is the same as
myscript www.pudim.com.br/? &
args=ok
Unescaped, the ? will cause the URL to be treated as a pattern to expand. However, it's unlikely the pattern will match any existing file, and bash's default behavior is to treat an unmatched pattern literally. (The failglob option will treat it as an error, and the nullglob option will make the URL disappear completely from the command line, but neither option is enabled by default.)

spaces, ', `, /, \, <, >, ?, &, | are filtered - how to bypass them with Bash commands

I have PHP code that runs some bash commands, and it has a bug that allows RCE in bash.
A command like "$(id)" is executed fine,
but if I execute any other command that contains a space, like "ls -la",
the space is automatically replaced with "-".
I checked the source and found that the following characters are filtered: spaces, ', `, /, \, <, >, ?, &, |.
How can I bypass them and execute a command like "wget link" so it runs properly?
UPDATE
I've added the following code as a live example; send the command via the sendcmd function:
`https://pastebin.com/raw/1MfR6aic`
This is (example) output from id
uid=1000(ibug) gid=1000(ibug)
Since these characters aren't filtered, you can get an unfiltered space like this:
ID=$(id)
echo${ID:14:1}foo
Now you have a space. You can get virtually any character with echo -e and then eval an expression.
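For instance, a sketch of that idea in a plain shell (the single quotes are only for readability here; in the filtered context you would have to build them the same way):
SP=$(echo -e '\x20')             # 0x20 is a space
SL=$(echo -e '\x2f')             # 0x2f is a slash, one of the filtered characters
eval "ls${SP}-la${SP}${SL}tmp"   # evaluates to: ls -la /tmp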
I tried your PHP code and found this working:
sendcmd("http://52.27.167.139", "{echo,hello}");
Just wrap them in braces and use commas. The shell will expand the braces to
echo hello
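The same trick works for commands with arguments; for example (illustrative, run in a plain bash shell):
echo {ls,-la}        # prints: ls -la (brace expansion supplies the space)
{ls,-la}             # expands to the two words "ls" "-la" and runs: ls -la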

Escaping the output of date in a BASH script

I'm working on what should be a very simple BASH script. What I want to do is a pull an image from a webcamera using curl and write it to a file whose name is datestamped.
#! /bin/bash
DATE=$(date +%Y-%m-%d_%H-%M)
DIRECTORY1=home/manager/security_images/Studio_1/
TARGET1=${DIRECTORY1}${DATE}.jpg
curl http://web#192.168.180.211/snapshot.cgi > $TARGET1
When I try to run this I am told that there is no such file or directory. I believe this is due to an error in my escaping, but I have tried seemingly every combination of quotation marks around the variables at each stage and still can't get it to work. I just don't understand what is going wrong and could really use some pointers.
Many thanks
No, it’s just a typo.
DIRECTORY1=home/manager/security_images/Studio_1/
^^
Should be
DIRECTORY1=/home/manager/security_images/Studio_1/
of course.
As for escaping: even though only safe characters are used here, so the quotes are technically superfluous, quoting all $variables by default is a good habit in shell scripting; there are very few cases where you do not want them.
Double quoting the redirection target should be enough:
curl http://web#192.168.180.211/snapshot.cgi > "$TARGET1"
Just make sure the path to it exists. You can run your script with
set -xv
to see how variables are interpolated.
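Putting the two answers together, a corrected version of the script from the question might look like this (same URL and paths; the mkdir -p line is an extra safeguard that isn't in the original):
#! /bin/bash
set -xv                                               # trace expansions while debugging
DATE=$(date +%Y-%m-%d_%H-%M)
DIRECTORY1=/home/manager/security_images/Studio_1/   # note the leading slash
TARGET1="${DIRECTORY1}${DATE}.jpg"
mkdir -p "$DIRECTORY1"                                # make sure the directory exists
curl "http://web#192.168.180.211/snapshot.cgi" > "$TARGET1"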

wget errors break shell script - how to prevent that?

I have a huge file with lots of links to files of various types to download. Each line is one download command like:
wget 'URL1'
wget 'URL2'
...
and there are thousands of those.
Unfortunately some URLs look really ugly, like for example:
http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc
It opens OK in a browser, but confuses wget.
I'm getting an error:
./tasks001.sh: line 35: syntax error near unexpected token `1'
./tasks001.sh: line 35: `wget 'http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc''
I've tried both URL and 'URL' ways of specifying what to download.
Is there a way to make a script like that running unattended?
I'm OK if it'll just skip the file it couldn't download.
Do not (ab)use the shell.
Save your URLs to some file (let's say my_urls.lst) and do:
wget -i my_urls.lst
Wget will handle quoting etc. on its own.
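If the task file is already full of wget '...' lines, one rough way to turn it into a plain URL list (a sketch that assumes every line has exactly that shape) is:
sed -e "s/^wget '//" -e "s/'$//" tasks001.sh > my_urls.lst
wget -i my_urls.lst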
I think you need to use double quotes (") and not single quotes (') around the URL.
If that still doesn't work, try escaping the paren characters ( and ) with a backslash: \( and \)
Which shell are you using? Bash? zsh?
This doesn't exactly answer your question but:
Both of the following commands work directly in a bash shell:
wget "http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc"
and
wget 'http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc'
Can you check to see if either of those work for you?
What seems to be happening is that your shell is doing something with the ( characters. I would try using double quotes " instead of single quotes ' around your URL.
If you wish to suppress errors you can append >/dev/null under unix to redirect standard output, or 2>/dev/null to redirect standard error. Under other operating systems it may be something else.
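For example (illustrative):
wget 'URL1' 2>/dev/null    # discard error output so a failed download doesn't clutter the run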
