Strip characters from cURL command output - bash

I am looking to take the download progress of a cURL command by taking only the first few characters of its progress bar output. Normally, I would use ${string:position:length}, but that doesn't seem to work in this situation.
Here's what I'm working with:
curl -O https://file.download.link/ > output.txt 2>&1
As you can see, I'm redirecting the output of the cURL command to the file output.txt, but let's say I want to only store the first three characters. Using what I just suggested returns a 'bad substitution' error:
echo ${curl -O https://file.download.link/:0:3} > output.txt 2>&1
so I'm out of my depth here.
If you'd like some more context, I was hoping to then change the command to output to a named pipe, so that it would change the progress of a CocoaDialog progress bar. I'm basically giving a GUI representation of the cURL download progress bar.
I would really appreciate any help or advice you could offer, so thank you in advance.
... and my apologies if this is a 'bad' question. I'm fairly new to bash, and scripting in general for that matter.

Here are two methods to get the first three characters of every line/update that curl produces. Note that, after curl prints its header and first output line, each subsequent line/update of output is preceded not by a newline character but by a carriage return, \r. On a terminal, this gives the output its nice update-in-place look. In our case, we have to add, as shown below, a little bit of special handling to interpret the \r correctly.
Using tr and grep
curl -O https://file.download.link/ 2>&1 | tr '\r' '\n' | grep -o '^...' >output.txt
Using awk
curl -O https://file.download.link/ 2>&1 | awk -v RS='\r' '{print substr($0,1,3)}' >output.txt
Sample Output
$ curl -O http://www.google.com/index.html 2>&1 | awk -v RS='\r' '{print substr($0,1,3)}'
%
0
100
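The 'bad substitution' error in the question comes from the fact that ${name:position:length} expects a variable name, not a command. A minimal sketch of one workaround follows: capture the output in a variable first, then take the substring. The fake_progress function is a hypothetical stand-in for curl's progress output, so the example runs without a network connection.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for curl's progress meter: updates are
# separated by carriage returns, with a newline only at the end.
fake_progress() {
    printf '  0 1000    0     0\r 50 1000   50   500\r100 1000  100  1000\n'
}

# Substring expansion works on variables, so capture the last update first.
last_update=$(fake_progress | tr '\r' '\n' | tail -n 1)

# Take the first three characters of the captured line.
percent=${last_update:0:3}
echo "$percent"
```

With a real download, fake_progress would be replaced by something like curl -O URL 2>&1, since curl writes its progress meter to stderr.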

Related

pipe a command not printing newline (but using \r)

I want to pipe the output of a program that doesn't print newlines, as it uses a carriage return to replace its line with new content.
This code represents the behavior of the program whose output I'd like to retrieve.
#!/usr/bin/env bash
for i in {1..100};do
echo -ne "[ $i% ] long unneeded log\r"
sleep 0.3
done
I'd like, in a bash script, to cut this output live to display only the important info,
but as the program doesn't print newlines, ./program | awk ... shows the output only when the command has ended.
I cannot modify the program that gives the output I'm trying to trim.
(I don't have its source, and I want to share my own script with other users.)
I know my request is pretty specific, but is there a way to pipe the output character by character instead of line by line?
You may try:
./program | tr '\r' '\n'
You can then continue piping into a third program that processes the output line by line.
I found it thanks to a mix of #OznOg's answer and #Walter-A's link.
Indeed, replacing carriage returns with newlines using tr works,
but tr's output is buffered by default when it goes to a pipe; stdbuf -o0 unbuffers it.
So the final command is:
./program | stdbuf -o0 tr '\r' '\n' | awk -F'[][]' '{printf "%s\r", $2}'
This prints, live, the first match between brackets, followed by a carriage return.
So a live log from the program, shown on a single updated line as [ x% ] long compile detail, is abbreviated to just x%, still using one line.
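On systems without stdbuf, a pure-bash loop is one possible alternative (a different technique from the tr/awk pipeline above). The sketch below assumes bash's read -d option, which lets the carriage return act as the record terminator. The program function is a hypothetical stand-in for the real binary, and the shown variable exists only to make the result easy to inspect.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real program: prints \r-terminated updates.
program() {
    local i
    for i in 1 2 3; do
        printf '[ %s%% ] long unneeded log\r' "$i"
    done
}

# read -d $'\r' returns as soon as a carriage return arrives,
# so each update is processed without waiting for a newline.
shown=""
while IFS= read -r -d $'\r' update; do
    percent=${update#*\[ }      # drop everything up to and including "[ "
    percent=${percent%% \]*}    # drop " ]" and everything after it
    printf '%s\r' "$percent"    # redraw the shortened status in place
    shown+="$percent "
done < <(program)
printf '\n'
```

Because the loop reads from a process substitution rather than a pipe, variables set inside it (such as shown) remain available after the loop ends.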
Change your echo command from:
echo -ne
To:
echo -e
From the echo docs:
‘-n’
Do not output the trailing newline.

bash script grep using variable fails to find result that actually does exist

I have a bash script that iterates over a list of links, curl's down an html page per link, greps for a particular string format (syntax is: CVE-####-####), removes the surrounding html tags (this is a consistent format, no special case handling necessary), searches a changelog file for the resulting string ID, and finally does stuff based on whether the string ID was found or not.
The found string ID is set as a variable. The issue is that when grepping for the variable there are no results, even though I positively know there should be for some of the ID's. Here is the relevant portion of the script:
for link in $(cat links.txt); do
curl -s "$link" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
echo "$cve"
grep "$cve" ./changelog.txt
done
done
If I hardcode a known ID in the grep command, the script finds the ID and returns things as expected. I've tried many variations of grepping on this variable (e.g. exporting it and doing command expansion, cat'ing the changelog and piping to grep, setting variable directly via command expansion of the curl chain, single and double quotes surrounding variables, half a dozen other things).
Am I missing something nuanced with the outputted variable from the curl | grep | sed chain? When it is echo'd to stdout or >> to a file, things look fine (a single ID with no odd characters or carriage returns etc.).
Any hints or alternate solutions would be much appreciated. Thanks!
FYI:
OSX:$bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Edit:
The html file that I was curl'ing was chock full of carriage returns. Running the script with set -x was helpful because it revealed the true string being grepped: $'CVE-2011-2716\r'.
+ read -r link
+ curl -s http://localhost:8080/link1.html
+ sed -n '/CVE-/s/<[^>]*>//gp'
+ read -r cve
+ grep -q -F $'CVE-2011-2716\r' ./kernelChangelog.txt
Investigating from another angle, opening the curled file in vim showed ^M, and running printf %s "$cve" | xxd showed the carriage return hex code 0d appended to the grep'd variable. Relying on echo's output was the wrong way to diagnose this. Writing a simple html page with a valid CVE-####-####, then adding a carriage return (in vim insert mode, type ctrl-v ctrl-m to insert one), will create a sample file that fails with the original script snippet above.
This is pretty standard string sanitization that I should have figured out. The solution is to remove the carriage returns; piping to tr -d '\r' is one way of doing that. I'm not sure there is a specific duplicate on SO for this series of steps, but in any case here is my now working script:
while read -r link; do
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
if grep -q -F "$cve" ./changelog.txt; then
echo "FOUND: $cve";
else
echo "NOT FOUND: $cve";
fi;
done
done < links.txt
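The hidden carriage return can be reproduced and inspected in isolation. The following is a minimal sketch, assuming bash; the cve value is a hypothetical example of what a CRLF-terminated HTML line would yield. It uses printf %q to make the invisible character visible and parameter expansion to strip it without an extra process.

```shell
#!/usr/bin/env bash
# Hypothetical value as read from a line that ended in CRLF.
cve=$'CVE-2011-2716\r'

# echo would hide the problem; %q prints a shell-quoted form
# in which the carriage return is visible.
printf '%q\n' "$cve"

# Remove a single trailing carriage return, if present.
clean=${cve%$'\r'}
printf '%q\n' "$clean"

# The lengths differ by exactly one character.
echo "${#cve} ${#clean}"
```

Inside the while read loop, the same expansion could be applied to each value as an alternative to piping the whole stream through tr -d '\r'.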
HTML files can contain carriage returns at the ends of lines; you need to filter those out.
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do
Notice that there's no need to use grep; you can use a regular expression filter in the sed command. (You can also remove characters within sed itself, but doing this for \r is cumbersome, so I piped to tr instead.)
It should look like this:
# First: Care about quoting your variables!
# Use read to read the file line by line
while read -r link ; do
# No grep required. sed can do that.
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | while read -r cve; do
echo "$cve"
# grep -F searches for fixed strings instead of patterns
grep -F "$cve" ./changelog.txt
done
done < links.txt

Filtering output from wget using sed

How would I go about taking the output from a POST call made using wget and filtering out everything but a string I want using sed? In other words, let's say I have some wget call that returns (amongst part of some string):
'userPreferences':'some stuff' }
How would I get the string "some stuff" such that the command would look something like:
sed whatever-command-here | wget my-post-parameters some-URL
Also is that the proper way to chain the two as one line?
You want the output of wget to go to sed, so the order would be wget foo | sed bar
wget -q -O - someurl | sed ...
The -q flag will silence most of wget's output and -O - will write to standard output, so you can then pipe everything to sed.
The pipe works the other way around: it feeds the left command's output to the right command's input:
wget ... | sed -n "/'userPreferences':/{s/[^:]*://;s/}$//p}" # keeps quotes
The filtering might be easier to express with GNU grep though:
wget ... | grep -oP "(?<='userPreferences':').*(?=' })" # strips the quotes, too
If you are on a system that supports named pipes (FIFOs) or the /dev/fd method of naming open files, you could avoid a pipe and use < <(...)
sed whatever-command-here < <(wget my-post-parameters some-URL)
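Where grep does not support -P, the quoted value can also be extracted with a single sed substitution and a capture group. This is a sketch under assumptions: the response variable is a hypothetical stand-in for the output of wget -q -O -, and the value is assumed to contain no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the text returned by: wget -q -O - someurl
response="{ 'userPreferences':'some stuff' }"

# Capture everything between the quotes that follow the key,
# and print only lines where the substitution succeeded.
value=$(printf '%s\n' "$response" |
    sed -n "s/.*'userPreferences':'\([^']*\)'.*/\1/p")

echo "$value"
```

Using -n together with the p flag means lines without the key produce no output at all, so value stays empty when the key is missing.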

grepping string from long text

The command below in OSX checks whether an account is disabled (or not).
I'd like to grep the string "isDisabled=X" to create a report of disabled users, but am not sure how to do this since the output is on three lines, and I'm interested in the first 12 characters of line three:
bash-3.2# pwpolicy -u jdoe -getpolicy
Getting policy for jdoe /LDAPv3/127.0.0.1
isDisabled=0 isAdminUser=1 newPasswordRequired=0 usingHistory=0 canModifyPasswordforSelf=1 usingExpirationDate=0 usingHardExpirationDate=0 requiresAlpha=0 requiresNumeric=0 expirationDateGMT=12/31/69 hardExpireDateGMT=12/31/69 maxMinutesUntilChangePassword=0 maxMinutesUntilDisabled=0 maxMinutesOfNonUse=0 maxFailedLoginAttempts=0 minChars=0 maxChars=0 passwordCannotBeName=0 validAfter=01/01/70 requiresMixedCase=0 requiresSymbol=0 notGuessablePattern=0 isSessionKeyAgent=0 isComputerAccount=0 adminClass=0 adminNoChangePasswords=0 adminNoSetPolicies=0 adminNoCreate=0 adminNoDelete=0 adminNoClearState=0 adminNoPromoteAdmins=0
Your ideas/suggestions are most appreciated! Ultimately this will be part of a Bash script. Thanks.
This is how you would use grep to match "isDisabled=X":
grep -o "isDisabled=."
Explanation:
grep: invoke the grep command
-o: Use the --only-matching option for grep (From grep manual: "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.")
"isDisabled=.": This is the search pattern you give to grep. The . is part of the regular expression, it means "match any character except for newline".
Usage:
This is how you would use it as part of your script:
pwpolicy -u jdoe -getpolicy | grep -oE "isDisabled=."
This is how you can save the result to a variable:
status=$(pwpolicy -u jdoe -getpolicy | grep -oE "isDisabled=.")
If your command was run some time earlier and its results were saved to a file called "results.txt", you can use that file as input to grep as follows:
grep -o "isDisabled=." results.txt
You can also use sed:
cat results.txt | sed -n 's/.*isDisabled=\(.\).*/\1/p'
This will print the value of isDisabled.
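For the report of disabled users mentioned in the question, the grep -o approach can be wrapped in a loop. The sketch below is illustrative only: get_policy is a hypothetical stand-in for pwpolicy -u USER -getpolicy, and the user names and flag values are made up so the example runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: pwpolicy -u "$1" -getpolicy
get_policy() {
    local flag=1
    [ "$1" = "jdoe" ] && flag=0   # pretend only jdoe is enabled
    printf 'Getting policy for %s /LDAPv3/127.0.0.1\n' "$1"
    printf 'isDisabled=%s isAdminUser=1 newPasswordRequired=0\n' "$flag"
}

# Collect every user whose policy reports isDisabled=1.
disabled=""
for user in jdoe asmith; do
    status=$(get_policy "$user" | grep -o 'isDisabled=.')
    if [ "$status" = "isDisabled=1" ]; then
        disabled="$disabled$user "
    fi
done

echo "Disabled users: $disabled"
```

Because grep -o extracts the match regardless of which line it is on, the loop does not need to know that the flag appears on the third line of the real output.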

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match the second\nthird sequence of characters, i.e. something containing a newline.
Since grep works on lines and these are two different lines, you cannot match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
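One possibly simpler approach replaces the grep/perl/sed chain with awk (a different tool from the one the question asks about). This minimal sketch remembers the previous line and prints the pair only when "second" is immediately followed by "third". The input variable holds the sample data from the question.

```shell
#!/usr/bin/env bash
# Sample input from the question.
input=$(printf 'first\nsecond\nthird\n')

# Print both lines when the previous line is exactly "second"
# and the current line is exactly "third"; then remember the current line.
result=$(printf '%s\n' "$input" | awk '
    prev == "second" && $0 == "third" { print prev; print $0 }
    { prev = $0 }
')

echo "$result"
```

For a file, the same awk program can be given the file name as an argument instead of reading from a pipe.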
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get last two lines
tail -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This will print the matching line plus one line before and one line after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources