Elegant way to replace tr '\n' '\0' (Null byte generating warnings at runtime) - bash

I strongly doubt about the grep best use in my code and would like to find a better and cleaner coding style for extracting the session ID and security level from my cookie file :
cat mycookie
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_127.0.0.1 FALSE / FALSE 0 PHPSESSID 1hjs18icittvqvpa4tm2lv9b12
#HttpOnly_127.0.0.1 FALSE /mydir/ FALSE 0 security medium
The expected output is the SSID hash :
1hjs18icittvqvpa4tm2lv9b12
Piping grep with tr '\n' '\0' works like a charm in the command line, but generates warnings (warning: command substitution: ignored null byte in input”) at the bash code execution. Here is the related code (with warnings):
ssid=$(grep -Po 'PHPSESSID.*' path/sessionFile | grep -Po '[a-z]|[0-9]' | tr '\n' '\0')
I am using bash 4.4.12 (x86_64-pc-linux-gnu) and could read here this crystal clear explanation :
Bash variables are stored as C strings. C strings are NUL-terminated.
They thus cannot store NULs by definition.
I could see here and there in both cases a coding solution using read:
# read content from stdin into array variable and a scalar variable "suffix"
array=( )
while IFS= read -r -d '' line; do
array+=( "$line" )
done < <(process that generates NUL stream here)
suffix=$line # content after last NUL, if any
# emit recorded content
printf '%s\0' "${array[#]}"; printf '%s' "$suffix"
I don't want to use arrays nor a while loop for this specific case, or others. I found this workaround using sed:
ssid=$(grep -Po 'PHPSESSID.*' path/sessionFile | grep -Po '[a-z]|[0-9]' | tr '\n' '_' | sed -e 's/_//g')
My two questions are :
1) Would it be a better way to substitute tr '\n' '\0', without using read into a while loop ?
2) Would it be a better way to extract properly the SSID and security level ?
Thx

It looks like you're trying to get rid of the newlines in the output from grep, but turning them into nulls doesn't do this. Nulls aren't visible in your terminal, but are still there and (like many other nonprinting characters) will wreak havoc if they get treated as part of your actual data. If you want to get rid of the newlines, just tell tr to delete them for you with ... | tr -d '\n'. But if you're trying to get the PHPSESSID value from a Netscape-format cookie file, there's a much much better way:
ssid=$(awk '($6 == "PHPSESSID") {print $7}' path/sessionFile)
This looks for "PHPSESSID" only in the sixth field (not in e.g. the path or cookie values -- both places it could legally appear), and specifically prints the seventh field of matching lines (not just anything after "PHPSESSID" that happens to be a digit or lowercase letter).

You could also try this, if you don't want to use awk:
ssid=$(grep -P '\bPHPSESSID\b' you_cookies_file)
echo $ssid # for debug only
which outputs something like
#HttpOnly_127.0.0.1 FALSE / FALSE 0 PHPSESSID 1hjs18icittvqvpa4tm2lv9b12
Then with cut(1) extract the relevant field:
echo $ssid |cut -d" " -f7
which outputs
1hjs18icittvqvpa4tm2lv9b12
Of course you should capture the last echo.
UPDATE
If you don't want to use cut, it is possible to emulate it with:
echo $ssid | (read a1 b2 c3 d4 e5 f6 g7; echo $g7)
Demonstration to capture in a variable:
$ field=$(echo $ssid | (read a1 b2 c3 d4 e5 f6 g7; echo $g7))
$ echo $field
1hjs18icittvqvpa4tm2lv9b12
$
Another way is to use positional parameters passing the string to a function which then refers to $7. Perhaps cleaner. Otherwise, you can use an array:
array=($(echo $ssid))
echo ${array[6]} # outputs the 7th field
It should also be possible to use regular expressions and/or string manipulation is bash, but they seem a little more difficult to me.

Related

Parsing CSV records when a value is multiline

Source file looks like this:
"google.com", "vuln_example1
vuln_example2
vuln_example3"
"facebook.com", "vuln_example2"
"reddit.com", "stupidly_long_vuln_name1"
"stackoverflow.com", ""
I've been trying to get the output to be something like this but the line breaks seem to cause me no end of problems. I'm using a "while read line" job to do this because I do some processing on the columns (e.g Vulnerability count and url in this example). This is output into a jenkins job (yuk).
The basic summary of the problem is getting the linebreaks in the csv to be output into the third column while retaining the table structure. I've got a sort of weird example of the desired output below.
||hostname ||Vulnerability count|| Vulnerability list || URL ||
|google.com |3 |vuln_example1 |http://cve.com/vuln_example1|
| | |vuln_example2 |http://cve.com/vuln_example2|
| | |vuln_example3 |http://cve.com/vuln_example3|
|facebook.com |1 |vuln_example2 |http://cve.com/vuln_example2|
|reddit.com |1 |stupidly_long_vuln_name1 |http://cve.com/stupidly_long_vuln_name1|
|stackoverflow.com |0 | ||
Looking at this... I've got a feeling it might be easier by showing some code and example output.
Parsing your input with the command line below makes the problem easier (I'm assuming the inputs are correct):
perl -0777 -pe 's/([^"])\s*\n/\1 /g ; s/[",]//g' < sample.txt
This line invokes Perl to perform two regex substitutions:
s/([^"])\s*\n/\1 /g: This substitution removes an end of line if it doesn't terminate by a quote " (i.e. if a host entry, with all vulnerabilities isn't yet complete).
s/[",]//g removes all quotes and commas remaining.
For each host entry like this one:
"google.com", "vuln_example1
vuln_example2
vuln_example3"
You'll get:
google.com vuln_example1 vuln_example2 vuln_example3
Then you can assume for each line, you have an host and a set of vulnerabilities.
The given example below stores vulnerabilities in an array and loop through it, formatting and printing each line:
# Replace this by your custom function
# to get an URL for a given vulnerability
function get_vuln_url () {
# This just displays a random url for an non-empty arg
[[ -z "$1" ]] || echo "http://host/$1.htm"
}
# Format your line (see printf help)
function print_row () {
printf "%-20s|%5s|%-30s|%s\n" "$#"
}
# The perl line reformat
perl -0777 -pe 's/([^"])\s*\n/\1 /g ; s/[",]//g' < sample.txt |
while read -r line ; do
arr=(${line})
print_row "${arr[0]}" "$((${#arr[#]} - 1))" "${arr[1]}" "$(get_vuln_url ${arr[1]})"
#echo -e "${arr[0]}\t|$vul_count\t|${arr[1]}\t|$(get_vuln_url ${arr[1]})"
for v in "${arr[#]:2}" ; do
print_row " " " " "$v" "$(get_vuln_url ${arr[1]})"
done
done
Output:
google.com | 3|vuln_example1 |http://host/vuln_example1.htm
| |vuln_example2 |http://host/vuln_example1.htm
| |vuln_example3 |http://host/vuln_example1.htm
facebook.com | 1|vuln_example2 |http://host/vuln_example2.htm
reddit.com | 1|stupidly_long_vuln_name1 |http://host/stupidly_long_vuln_name1.htm
stackoverflow.com | 0| |
Update.
If you don't have Perl, and if your file doesn't have tabulations, you can use this command as a workaround instead:
tr '\n' '\t' < sample.txt | sed -r -e 's/([^"])\s*\t/\1 /g' -e 's/[",]//g' -e 's/\t/\n/g'
tr '\n' '\t' replaces all ends of line by tabulations
sed part acts like Perl line, except it deals with tabulations instead of ends of line and restores tabulations back to ends of line.

Unix bash - using cut to regex lines in a file, match regex result with another similar line

I have a text file: file.txt, with several thousand lines. It contains a lot of junk lines which I am not interested in, so I use the cut command to regex for the lines I am interested in first. For each entry I am interested in, it will be listed twice in the text file: Once in a "definition" section, another in a "value" section. I want to retrieve the first value from the "definition" section, and then for each entry found there find it's corresponding "value" section entry.
The first entry starts with ' gl_ ', while the 2nd entry would look like ' "gl_ ', starting with a '"'.
This is the code I have so far for looping through the text document, which then retrieves the values I am interested in and appends them to a .csv file:
while read -r line
do
if [[ $line == gl_* ]] ; then (param=$(cut -d'\' -f 1 $line) | def=$(cut -d'\' -f 2 $line) | type=$(cut -d'\' -f 4 $line) | prompt=$(cut -d'\' -f 8 $line))
while read -r glline
do
if [[ $glline == '"'$param* ]] ; then val=$(cut -d'\' -f 3 $glline) |
"$project";"$param";"$val";"$def";"$type";"$prompt" >> /filepath/file.csv
done < file.txt
done < file.txt
This seems to throw some syntax errors related to unexpected tokens near the first 'done' statement.
Example of text that needs to be parsed, and paired:
gl_one\User Defined\1\String\1\\1\Some Text
gl_two\User Defined\1\String\1\\1\Some Text also
gl_three\User Defined\1\Time\1\\1\Datetime now
some\junk
"gl_one\1\Value1
some\junk
"gl_two\1\Value2
"gl_three\1\Value3
So effectively, the while loop reads each line until it hits the first line that starts with 'gl_', which then stores that value (ie. gl_one) as a variable 'param'.
It then starts the nested while loop that looks for the line that starts with a ' " ' in front of the gl_, and is equivalent to the 'param' value. In other words, the
script should couple the lines gl_one and "gl_one, gl_two and "gl_two, gl_three and "gl_three.
The text file is large, and these are settings that have been defined this way. I need to collect the values for each gl_ parameter, to save them together in a .csv file with their corresponding "gl_ values.
Wanted regex output stored in variables would be something like this:
first while loop:
$param = gl_one, $def = User Defined, $type = String, $prompt = Some Text
second while loop:
$val = Value1
Then it stores these variables to the file.csv, with semi-colon separators.
Currently, I have an error for the first 'done' statement, which seems to indicate an issue with the quotation marks. Apart from this,
I am looking for general ideas and comments to the script. I.e, not entirely sure I am looking for the quotation mark parameters "gl_ correctly, or if the
semi-colons as .csv separators are added correctly.
Edit: Overall, the script runs now, but extremely slow due to the inner while loop. Is there any faster way to match the two lines together and add them to the .csv file?
Any ideas and comments?
This will generate a file containing the data you want:
cat file.txt | grep gl_ | sed -E "s/\"//" | sort | sed '$!N;s/\n/\\/' | awk -F'\' '{print $1"; "$5"; "$7"; "$NF}' > /filepath/file.csv
It uses grep to extract all lines containing 'gl_'
then sed to remove the leading '"' from the lines that contain one [I have assumed there are no further '"' in the line]
The lines are sorted
sed removes the return from each pair of lines
awk then prints
the required columns according to your requirements
Output routed to the file.
LANG=C sort -t\\ -sd -k1,1 <file.txt |\
sed '
/^gl_/{ # if definition
N; # append next line to buffer
s/\n"gl_[^\\]*//; # if value, strip first column
t; # and start next loop
}
D; # otherwise, delete the line
' |\
awk -F\\ -v p="$project" -v OFS=\; '{print p,$1,$10,$2,$4,$8 }' \
>>/filepath/file.csv
sort lines so gl_... appears immediately before "gl_... (LANG fixes LC_TYPE) - assumes definition appears before value
sed to help ensure matching definition and value (may still fail if duplicate/missing value), and tidy for awk
awk to pull out relevant fields

The content of a variable created in bash contains some invisible symbols, and these are not a new line

These 2 variables will have the same visible content
x_sign1="aabbccdd_and_somthing_else"
var1="...."
[........]
x_sign2=$(echo -n "${var1}${var2}${var3}" | shasum -a 256)
echo $x_sign2
====>
aabbccdd_and_somthing_else -
Note the "-" in the end.
However, their lengths will be different. Even though the x_sign2 doesn't contain a new line symbol. To ensure this:
x_sign22=$(echo -n "${var1}${var2}${var3}" | shasum -a 256 | tr -d '\n')
But:
echo ${#x_sign1}
====> 64
And:
And:
echo ${#x_sign2}
====> 67
echo ${#x_sign22}
====> 67
The difference is 3 symbols. The visible content is identical.
Also, when I make a request via curl to a REST API which needs that value of a signature, x_sign1 always succeeds, whereas x_sign2 doesn't -- "wrong signature"
Why? How to fix that?
Nonprintable characters could be a lot of things. If you want to remove any "nonprintable" characters, then tell tr that's what you want removed.
POSIX tr has a number of named character classes. For all printable characters, use [:print:]. To remove everything NOT in that set, -delete the -complement of the set.
So, instead of
tr -d '\n'
use
tr -dc '[:print:]'
IF THE PROBLEM is just the ' - ' on the end...
x_sign22="${x_sign22% -*}"
e.g.:
$: x="aabbccdd_and_somthing_else - "
$: echo "[$x]"
[aabbccdd_and_somthing_else - ]
$: x="${x% -*}"
$: echo "[$x]"
[aabbccdd_and_somthing_else]

UNIX - Replacing variables in sql with matching values from .profile file

I am trying to write a shell which will take an SQL file as input. Example SQL file:
SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'
Now the script should extract all variables, which in this case everything starting with %%. So the output file will be something as below:
%%DB
%%TBLEXT
%%CITY
Now I should be able to extract the matching values from the user's .profile file for these variables and create the SQL file with the proper values.
SELECT *
FROM tempdb.TBL_abc
WHERE CITY = 'Chicago'
As of now I am trying to generate the file1 which will contain all the variables. Below code sample -
sed "s/[(),']//g" "T:/work/shell/sqlfile1.sql" | awk '/%%/{print $NF}' | awk '/%%/{print $NF}' > sqltemp2.sql
takes me till
%%DB.TBL_%%TBLEXT
%%CITY
Can someone help me in getting to file1 listing the variables?
You can use grep and sort to get a list of unique variables, as per the following transcript:
$ echo "SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'" | grep -o '%%[A-Za-z0-9_]*' | sort -u
%%CITY
%%DB
%%TBLEXT
The -o flag to grep instructs it to only print the matching parts of lines rather than the entire line, and also outputs each matching part on a distinct line. Then sort -u just makes sure there are no duplicates.
In terms of the full process, here's a slight modification to a bash script I've used for similar purposes:
# Define all translations.
declare -A xlat
xlat['%%DB']='tempdb'
xlat['%%TBLEXT']='abc'
xlat['%%CITY']='Chicago'
# Check all variables in input file.
okay=1
for key in $(grep -o '%%[A-Za-z0-9_]*' input.sql | sort -u) ; do
if [[ "${xlat[$key]}" == "" ]] ; then
echo "Bad key ($key) in file:"
grep -n "${key}" input.sql | sed 's/^/ /'
okay=0
fi
done
if [[ ${okay} -eq 0 ]] ; then
exit 1
fi
# Process input file doing substitutions. Fairly
# primitive use of sed, must change to use sed -i
# at some point.
# Note we sort keys based on descending length so we
# correctly handle extensions like "NAME" and "NAMESPACE",
# doing the longer ones first makes it work properly.
cp input.sql output.sql
for key in $( (
for key in ${!xlat[#]} ; do
echo ${key}
done
) | awk '{print length($0)":"$0}' | sort -rnu | cut -d':' -f2) ; do
sed "s/${key}/${xlat[$key]}/g" output.sql >output2.sql
mv output2.sql output.sql
done
cat output.sql
It first checks that the input file doesn't contain any keys not found in the translation array. Then it applies sed substitutions to the input file, one per translation, to ensure all keys are substituted with their respective values.
This should be a good start, though there may be some edge cases such as if your keys or values contain characters sed would consider important (like / for example). If that is the case, you'll probably need to escape them such as changing:
xlat['%%UNDEFINED']='0/0'
into:
xlat['%%UNDEFINED']='0\/0'

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note the last 6 characters: d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed you can even simplify the pattern using the -r option. Now you won't need to escape {} and () any more. Using ~ as the regex delimiter allows to use the path separator / without need to escape it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Explained simple the pattern does the following: It matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
and replaces it by
/$1/$2/$3/$0
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed sed 's|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|'
Assuming info is a correct MD5 (and only) string
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=$(echo "${md5:0-2:2}")
p2=$(echo "${md5:0-4:2}")
p1=$(echo "${md5:0-6:2}")
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.

Resources