I am trying to craft a script that performs curl requests against webservers and parses out the "Server" and "Location" headers, so I can easily import the results into my Excel tables without reformatting.
My current script:
curl -sD - -o /dev/null -A "Mozilla/4.0" http://site/ | sed -e '/Server/p' -e '/Location/!d' | paste - -
Expected/Desired output:
Server: Apache Location: http://www.site
Current output:
From curl:
HTTP/1.1 301 Moved Permanently
Date: Sun, 16 Nov 2014 20:14:01 GMT
Server: Apache
Set-Cookie: USERNAME=;path=/
Set-Cookie: CFID=16581239;path=/
Set-Cookie: CFTOKEN=32126621;path=/
Location: http://www.site
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Piped into 'sed':
Server: Apache
Location: http://www.site
Piped into 'paste':
Server: Location: http://www.site
Why does paste immediately 'paste' after the first space? How do I get it to format correctly? I'm open to other methods, but keep in mind, the responses from the 'curl' request will be different lengths.
Thanks
The output of "curl" contains carriage return (\r) characters, which cause that behaviour.
curl -sD - -o /dev/null -A "Mozilla/4.0" http://site/ | tr -d '\r'| sed -e '/Server/p' -e '/Location/!d' | paste - -
tr -d '\r' filters out all carriage return characters.
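If you want to confirm that the stray carriage returns are really there, you can make every byte visible with od -c; a quick self-contained sketch (the header line here is just a sample):

```shell
# \r is invisible in normal terminal output; od -c prints every byte explicitly.
printf 'Server: Apache\r\n' | od -c | head -n 1
# The \r shows up right before the \n at the end of the line.
```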
About line endings
While Linux/Unix uses LF (Line Feed, \n) line endings, many other systems use CR LF (Carriage Return + Line Feed, \r\n). That can cause weird-looking results unless you are prepared for it. Let's see some examples without \r and the same with \r.
Concatenation of strings:
a=$(echo -e "Please notice don't delete your files in /<config_dir> ")
b=$(echo -e "without hesitation ")
echo "$a""$b"
Result:
Please notice don't delete your files in /<config_dir> without hesitation
We get somewhat different result if lines end with CR LF:
a=$(echo -e "Please notice don't delete your files in /<config_dir> \r")
b=$(echo -e "without hesitation \r")
echo "$a""$b"
Result:
without hesitation delete your files in /<config_dir>
What might happen with programs which modify text only if the matching string is at the end of a line?
Let's remove "ny" if it appears at line end:
echo "Stackoverflow is funny" | sed 's/ny$//g'
Result:
Stackoverflow is fun
The same with a CR LF ended line:
echo -e "Stackoverflow is funny\r" | sed 's/ny$//g'
Result:
Stackoverflow is funny
sed works as designed, because the line does not end with "ny" but with "ny CR".
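If the input may or may not carry CR LF endings, one option (with GNU sed, which understands \r and \?) is to allow an optional carriage return before the end-of-line anchor; a sketch:

```shell
# Match "ny" at end of line even when a carriage return precedes the newline (GNU sed).
echo -e "Stackoverflow is funny\r" | sed 's/ny\r\?$//'
# Stackoverflow is fun
```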
The teaching of all this is to be prepared for unexpected input data. In most cases it is a good idea to filter out \r from the data completely, since it's seldom needed for anything useful in a Bash script. Filtering out unwanted character(s) is simple with "tr":
tr -d '\r'
Related
I have a small script that downloads a value from a web page.
Before anyone loses their mind because I am not using an HTML parser: besides the headers, the whole web page only has 3 lines of text between one pair of pre tags. I am just after the number values - that is it.
</head><body><pre>
sym
---
12300
</pre></body></html>
This is the script :
#!/bin/bash
wget -O foocounthtml.txt "http://foopage"
tr -d "\n" foocounthtml.txt > foocountnonewlines.txt
Anyhow the tr command is throwing an error.
tr: extra operand ‘foocounthtml.txt’
Only one string may be given when deleting without squeezing repeats.
Try 'tr --help' for more information.
Yes, I could use sed for in-place modification with the -i flag. However, I am perplexed by this tr error. Redirecting tr output works fine from the command line, but not in a script.
The 'tr' command operates on SETs of text rather than files. From the man page:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
...
SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:
So tr is expecting the actual content you want to operate on, rather than the name of the target file. You can simply pipe the file's contents to tr for the results you want:
cat foocounthtml.txt | tr -d "\n" > foocountnonewlines.txt
or, as @CharlesDuffy points out, it would be faster to read directly from the file:
tr -d "\n" < foocounthtml.txt > foocountnonewlines.txt
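To make the stdin-only behaviour concrete, here is a self-contained sketch (the file names and contents are just examples mirroring the question):

```shell
# tr only reads standard input; any file must reach it via redirection or a pipe.
printf 'sym\n---\n12300\n' > foocounthtml.txt
tr -d '\n' < foocounthtml.txt > foocountnonewlines.txt
cat foocountnonewlines.txt
# sym---12300
```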
I have some text in one file which I want to be copied to another file, using shell script.
This is the script -
#!/bin/sh
PROPERTY_FILE=/path/keyValuePairs.properties
function getValue {
FIELD_KEY=$1
FIELD_VALUE=`cat $PROPERTY_FILE | grep "$FIELD_KEY" | cut --complement -d'=' -f1`
}
SERVER_FILE=/path/FileToReplace.yaml
getValue "xyz.abc"
sed -i -e "s|PASSWORD|$FIELD_VALUE|g" $SERVER_FILE
keyValuePairs.properties:
xyz.abc=abs
FileToReplace.yaml:
someField:
  address: "someValue"
  password: PASSWORD
The goal of the script is to fetch "abs" from keyValuePairs.properties and substitute it for the PASSWORD placeholder in FileToReplace.yaml.
The FileToReplace.yaml should look like
someField:
  address: "someValue"
  password: abs
Note - Instead of "abs", there could be '=' in the text. It should work fine too.
The current situation is that when I run the script, it updates FileToReplace.yaml as
someField:
  address: "someValue"
  password:
It is setting the value as empty.
Can someone please help me figure what's wrong with this script?
Note - Whenever I execute the script, I get the issue -
sh scriptToRun.sh
cut: illegal option -- -
usage: cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-s] [-d delim] [file ...]
If I use gcut, the code just works fine, but I can't use gcut (requirement issues). I need to fix this using cut.
There are a few issues with your script:
FIELD_VALUE is local to the getValue() function.
getValue() will match rows containing FIELD_KEY anywhere in the line (e.g. some.property=string.containing.xyz.abc)
getValue() could return multiple rows.
All occurrences of the string "PASSWORD" in the server file will be updated, not just the ones on the "password: PASSWORD" line.
If you can use bash instead of sh, this should resolve all of the issues:
#!/bin/bash
declare property_file=/path/keyValuePairs.properties
declare server_file=/path/FileToReplace.yaml
declare property="xyz.abc"
property_line=$(grep -m 1 "^${property}=" "${property_file}")
sed -i "s|^\(\s*password:\s*\)PASSWORD\s*$|\1${property_line##*=}|" "${server_file}"
The original code which I posted worked. I was using the wrong file name in the shell (in my real code), which caused it to not read the value and hence set it to empty.
Replace the cut command with:
cut -d'=' -f2-
and it should work on all versions of cut.
-f2- means field 2 and all later. This is necessary to handle values containing '='s.
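To see the difference between -f2 and -f2- on a value that itself contains '=' (the sample string here is invented for illustration):

```shell
# -f2 stops at the next '='; -f2- keeps everything from field 2 to the end.
printf 'xyz.abc=abs=extra\n' | cut -d'=' -f2    # abs
printf 'xyz.abc=abs=extra\n' | cut -d'=' -f2-   # abs=extra
```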
And yes, some characters will cause problems for the sed command. It's hard to get a robust solution without getting into trouble here. A python script may be the better choice.
If shell script is the only option, you could try something like this:
(sed -n -e '1,/PASSWORD/p' FileToReplace.yaml | head -n -1;
echo " password: ${FIELD_VALUE}";
sed -n -e '/PASSWORD/,$ p' FileToReplace.yaml) > FileToReplace.yaml.new \
&& mv FileToReplace.yaml.new FileToReplace.yaml
but it gets quite ugly. (print the file up to the line containing "PASSWORD", then echo the full password line, then print the rest of the file)
You can also use something like this:
cat << EOF > FileToCreate.yaml
someField:
  address: "someValue"
  password: ${FIELD_VALUE}
EOF
if keeping the old contents of the file is not important.
Goal: using an input file with a list of file names, get the first 5 lines of each file and output to another file. Basically, I'm trying to find out what each program does by reading the header.
Shell: Ksh
Input: myfile.txt
tmp/file1.txt
tmp/file2.txt
Output:
tmp/file1.txt - "Creates web login screen"
tmp/file2.txt - "Updates user login"
I can use "head -5" but I'm not sure how to read the input file names from the file. I'm assuming I could redirect the output (>> output.txt) into my output file.
Input file names use a relative path.
Update: I created the script below, but I'm getting "syntax error: unexpected end of file". The script was created with vi.
#! /bin/sh
cat $HOME/jmarti20.list | while read line
do
#echo $line" >> jmarti20.txt
head -n 5 /los_prod/$line >> $HOME/jmarti20.txt
done
Right, you can append output with >> to a file.
head -n 5 file1.txt >> file_descriptions.txt
You can also use sed to print lines, from documentation at pinfo sed.
sed 5q file1.txt >> file_descriptions.txt
Personal preference is to put the file description on line 3, and only print line 3 of each file.
sed -n 3p file1.txt >> file_descriptions.txt
The reasoning for using line 3 has to do with the first line often containing a "shebang" like #!/bin/bash, and the 2nd line having localization strings, such as # -*- coding: UTF-8 -*-, to allow proper display of extra character glyphs and languages in terminals and text editors that support them.
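As a sketch of that layout (the sample file name and contents are invented for illustration):

```shell
# Build a sample script whose third line holds the description, then print only line 3.
printf '#!/bin/bash\n# -*- coding: UTF-8 -*-\n# Creates web login screen\n' > sample.sh
sed -n 3p sample.sh
# # Creates web login screen
```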
Below is what I came up with and seems to work fairly well:
#! /bin/sh
cat $HOME/jmarti20.list | while read line
do
temp=$line
temp2=$(head -n 5 /los_prod/$line)
echo "$temp" "$temp2" >> jmarti20.txt
#echo "$line" >> jmarti20.txt
#head -n 5 /los_prod/$line >> $HOME/jmarti20.txt
done
I need to get a nonce from an HTTP service. I am using curl, and later openssl to calculate the SHA-1 of that nonce, but for that I need to get the nonce into a variable.
Step 1 (done):
curl --user username:password -v -i -X POST http://192.168.0.202:8080/RPC3 -o output.txt -d #initial.txt
and now the output file output.txt holds the HTTP response:
HTTP/1.1 401 Unauthorized
Server: WinREST HTTP Server/1.0
Connection: Keep-Alive
Content-Length: 89
WWW-Authenticate: ServiceAuth realm="WinREST", nonce="/wcUEQOqUEoS64zKDHEUgg=="
<html><head><title>Unauthorized</title></head><body>Error 401: Unauthorized</body></html>
I have to find the position of "nonce=" and extract everything up to the closing " character.
How can I get the value of the nonce in bash?
Regards
Pretty simple with grep using the -o/--only-matching and -P/--perl-regexp options (available in GNU grep):
$ grep -oP 'nonce="\K[^"]+' output.txt
/wcUEQOqUEoS64zKDHEUgg==
The -o option will print only the matched part, which would normally include nonce=" if we had not used the \K reset-match-start escape sequence available in PCRE.
Additionally, if your output.txt (i.e. server response) can contain more than one nonce, and you are interested in only reading the first one, you can use the -m1 option (as Glenn suggests):
$ grep -oPm1 'nonce="\K[^"]+' output.txt
To store that nonce in a variable, simply use command substitution; or just pass it through openssl sha1 to get that digest you need:
$ nonce=$(grep -oPm1 'nonce="\K[^"]+' output.txt)
$ echo "$nonce"
/wcUEQOqUEoS64zKDHEUgg==
$ read hash _ <<<"$(grep -oPm1 'nonce="\K[^"]+' output.txt | openssl sha1 -r)"
$ echo "$hash"
2277ef32822c37b5c2b1018954f750163148edea
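If GNU grep's -P option is not available (e.g. on macOS or BusyBox), a plain parameter-expansion sketch works on the header line alone; the sample line below is copied from the response in the question:

```shell
# Strip everything up to and including nonce=", then cut at the next double quote.
line='WWW-Authenticate: ServiceAuth realm="WinREST", nonce="/wcUEQOqUEoS64zKDHEUgg=="'
nonce=${line#*nonce=\"}
nonce=${nonce%%\"*}
echo "$nonce"
# /wcUEQOqUEoS64zKDHEUgg==
```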
You can use GNU sed for this as below :
ubuntu$ cat output.txt
HTTP/1.1 401 Unauthorized
Server: WinREST HTTP Server/1.0
Connection: Keep-Alive
Content-Length: 89
WWW-Authenticate: ServiceAuth realm="WinREST", nonce="/wcUEQOqUEoS64zKDHEUgg=="
<html><head><title>Unauthorized</title></head><body>Error 401: Unauthorized</body></html>
ubuntu$ sed -E -n 's/.*nonce="([^"]+)".*/\1/p' output.txt
/wcUEQOqUEoS64zKDHEUgg==
Regards!
For a course called 'Programming Techniques', I have to scan a file with lines having the following format:
[IP-Address] - - [[Date and time]] "GET [some URL]" [HTML reply code] [some non-interesting number]
An example:
129.232.223.206 - - [30/Apr/1998:22:00:02 +0000] "GET /images/home_intro.anim.gif HTTP/1.0" 200 60349
My task is to scan all lines and extract from it the HTTP reply code only if this code is not equal to 200.
We have to use the command line. The following almost works:
cat file.out | sed 's/^.*\"[[:space:]]//' | sed 's/[[:space:]].*//' | grep -v '200' | sort | uniq 1> result1.txt
First, read in the file, remove everything up until the second " and the space after it, remove everything from the first space to the end, remove lines with 200, sort the numbers, remove duplicates, and send the remaining numbers to a file.
This produces the following output:
-
206
26.146.85.150ÀüŒÛ/ HTTP/1.0" 404 305
302
304
400
404
500
As we can see, it almost works. There is one line causing trouble:
26.146.85.150 - - [01/May/1998:16:47:28 +0000] "GET /images/home_fr_phra><HR><H3>\C0\FC\BC\DB/ HTTP/1.0" 404 305
This line causes the weird third output-line. What is wrong with this line? The only thing I can think of is the part \C0\FC\BC\DB. Backslashes always seem to cause trouble. So, what part of my command conflicts with this line?
Also, I noticed that if I switch sort and uniq, the file does get sorted, but duplicates are not removed. Why?
(By the way, I'm relatively new to using the command line for the purposes described above.)
So, this looks like an encoding SNAFU. If I'm not mistaken, what's happening is:
You're using a UTF-8 locale,
The input file does not contain valid UTF-8,
sed attempts to read the file as UTF-8 because of the aforementioned locale, and
sed breaks because of this (in particular, . does not match the offending bytes).
The stuff with the backslashes denotes a series of four bytes by their hex values, that is C0 FC BC DB. This is not valid UTF-8-encoded data.1
Given a UTF-8 locale, (GNU) sed interprets input as UTF-8, and . matches a valid UTF-8 character. It does not match invalid byte sequences. You can see this by running
echo -e '\xc0\xfc\xbc\xdb' | sed 's/.//g'
in a UTF-8 locale and noticing that the output is not empty. I am inclined to agree that this behavior is a bit of a nuisance, but here we are.
Since you don't seem to rely on any Unicode features, the solution could be to run sed with a non-UTF-8 locale, such as C. In your case:
cat file.out | LC_ALL=C sed 's/^.*\"[[:space:]]//' \
| LC_ALL=C sed 's/[[:space:]].*//' \
| grep -v '200' \
| sort \
| uniq \
> result1.txt
(line breaks added for readability). By the way, you could conflate the two sed commands to a single one as follows:
LC_ALL=C sed 's/^.*\"[[:space:]]//; s/[[:space:]].*//'
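To see the locale effect in isolation, you can run the same deletion under the C locale on exactly those four bytes from the log line:

```shell
# Under the C locale every byte counts as a character, so `.` matches all four bytes.
printf '\300\374\274\333' | LC_ALL=C sed 's/.//g' | wc -c
# prints 0: nothing survives the deletion
```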
1 c0 would indicate a two-byte UTF-8 sequence whose code point's uppermost five bits are zero, which already makes no sense since such a code point could be encoded as plain ASCII, and fc does not begin with the 10 bit pattern that UTF-8 requires of a continuation byte. So, although I am unsure what exactly their encoding is, it is definitely not UTF-8.