need help in grep (BASH) - bash

I try to make a tool to get list of proxies and You have downloaded Index for one of the free proxy sites I use this :
wget http://free-proxy.cz/en/proxylist/country/all/https/ping/all
and outputs something like that :
<script type="text/javascript">document.write(Base64.decode("MTg1LjExNC4xMzcuMTQ="))</script></td><td style=""><span class="fport" style=''>12195</span></td><td><small>HTTPS</small></td><td class="left"><div style="padding-left:2px"><img src="/flags/blank.gif" class="flag flag-ua" alt="Ukraine" /> Ukraine</div></td><td class="small"><small></small></td><td class="small"><small></small></td><td class="small"><small>High anonymity</small></td><td> <i class="icon-black icon-question-sign"></i></td><td><small>2.4%</small><div class="progress"><div class="fill" style="width:4%;background-color:red;"></div></div></td><td><div style="padding-left:5px"><small>649 ms</small> <div class="progress"><div class="fill" style="width:94%;background-color:#A5DA74;;"></div></div></div></td><td><small>8 hours ago</small></td></tr><tr><td style="text-align:center" class="left"><script type="text/javascript">document.write(Base64.decode("MTYxLjk3LjEzOC4yMzg="))</script></td><td style=""><span class="fport" style=''>3128</span></td><td>
As you can see the IP is encrypted in base64 and port is normal
I try to grep base64 codes first and this is work ↓
echo (outputs) | grep -Eo '("[A-Za-z0-9]{12,30}[=]{0,2}")' | cut -d '"' -f2
and I try this code to get ports ↓
echo (output) | grep -Eo "(class=\"fport\" style=''>[0-9]{1,9})" | cut -d '>' -f2
how can I mix it and make it like that
(base64 code):(port)
and after that I wanna decrypt the base64 code and make it look like :
IP:PORT

1st step
base64 is not an encryption, but an encoding. If you are working on
Linux or other Unix-variants, the command base64, base64-encoder/decoder,
will be pre-installed. If not, it will be easily installed with your
OS-dependent package manager.
Then please try to execute:
base64 -d <<< "MTg1LjExNC4xMzcuMTQ="
It will output:
185.114.137.14
2nd step
Then we can combine the base64 decoder with your command pipeline.
The problem is base64-coding ignores newlines and we need to process
the result of the pipeline line by line. Assuming the variable $output
holds the output of the wget command, please try:
while IFS= read -r line; do
base64 -d <<< "$line"
echo
done < <(echo "$output" | grep -Eo '("[A-Za-z0-9]{12,30}[=]{0,2}")' | cut -d '"' -f2)
It will print something like:
185.114.137.14
161.97.138.238
The <(command) notation is a process substitution and the output of
echo .. grep .. cut pipeline is fed to the while loop via stdin
and the while loop processes the base64-encoded strings line by line.
3rd step
Now we want to merge the IPs and PORTs in a format of IP:PORT.
We can make use of paste command. The final script will be:
paste -d ":" <(
while IFS= read -r line; do
base64 -d <<< "$line"
echo
done < <(echo "$output" | grep -Eo '("[A-Za-z0-9]{12,30}[=]{0,2}")' | cut -d '"' -f2)
) \
<(echo "$output" | grep -Eo "(class=\"fport\" style=''>[0-9]{1,9})" | cut -d '>' -f2)
Output:
185.114.137.14:12195
161.97.138.238:3128
The paste command takes filenames as arguments. Here we make use
of process substitution again in a manner as: paste <(command1) <(command2)
which saves to create temporary files.

Related

GNU parallel with custom script doing string comparison

The follwoing script.sh compares part of a string (coming from stdin by cating a csv file) to a defined string and reports the differences in a certain format
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intendet to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when i use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, i get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to obtain XYZ,C3M I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
-k keeps the order.
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
#or
parallel --pipepart -a large_file --block -10 -k doit

Echo the command result in a file.txt

I have a script such as :
cat list_id.txt | while read line; do for ACC in $line;
do
echo -n "$ACC\t"
curl -s "link=fasta&retmode=xml" |\
grep TSeq_taxid |\
cut -d '>' -f 2 |\
cut -d '<' -f 1 |\
tr -d "\n"
echo
sleep 0.25
done
done
This script allows me from a list of ID in list_id.txt to get the corresponding names in a database in https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACC}&rettype=fasta&retmode=xml
So from this script I get something like
CAA42669\t9913
V00181\t7154
AH002406\t538120
And what I would like is directly to print or echo this result in fiel call new_ids.txt, I tried echo >> new_ids.txt but the file is empty.
Thanks for your help.
A minimal refactoring of your script might look like
# Avoid useless use of cat
# Use read -r
# Don't use upper case for private variables
while read -r line; do
for acc in $line; do
echo -n "$acc\t"
# No backslash necessary after | character
curl -s "link=fasta&retmode=xml" |
# Probably use a proper XML parser for this
grep TSeq_taxid |
cut -d '>' -f 2 |
cut -d '<' -f 1 |
tr -d "\n"
echo
sleep 0.25
done
done <list_id.txt >new_ids.txt
This could probably still be simplified significantly, but without knowledge of what your input file looks like exactly, or what curl returns, this is somewhat speculative.
tr -s ' \t\n' '\n' <list_id.txt |
while read -r acc; do
curl -s "link=fasta&retmode=xml" |
awk -v acc="$acc" '/TSeq_taxid/ {
split($0, a, /[<>]/); print acc "\t" a[3] }'
sleep 0.25
done <list_id.txt >new_ids.txt

Inline array substitution

I have file with a few lines:
x 1
y 2
z 3 t
I need to pass each line as paramater to some program:
$ program "x 1" "y 2" "z 3 t"
I know how to do it with two commands:
$ readarray -t a < file
$ program "${a[#]}"
How can i do it with one command? Something like that:
$ program ??? file ???
The (default) options of your readarray command indicate that your file items are separated by newlines.
So in order to achieve what you want in one command, you can take advantage of the special IFS variable to use word splitting w.r.t. newlines (see e.g. this doc) and call your program with a non-quoted command substitution:
IFS=$'\n'; program $(cat file)
As suggested by #CharlesDuffy:
you may want to disable globbing by running beforehand set -f, and if you want to keep these modifications local, you can enclose the whole in a subshell:
( set -f; IFS=$'\n'; program $(cat file) )
to avoid the performance penalty of the parens and of the /bin/cat process, you can write instead:
( set -f; IFS=$'\n'; exec program $(<file) )
where $(<file) is a Bash equivalent to to $(cat file) (faster as it doesn't require forking /bin/cat), and exec consumes the subshell created by the parens.
However, note that the exec trick won't work and should be removed if program is not a real program in the PATH (that is, you'll get exec: program: not found if program is just a function defined in your script).
Passing a set of params should be more organized :
In this example case I'm looking for a file containing chk_disk_issue=something etc.. so I set the values by reading a config file which I pass in as a param.
# -- read specific variables from the config file (if found) --
if [ -f "${file}" ] ;then
while IFS= read -r line ;do
if ! [[ $line = *"#"* ]]; then
var="$(echo $line | cut -d'=' -f1)"
case "$var" in
chk_disk_issue)
chk_disk_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
;;
chk_mem_issue)
chk_mem_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
;;
chk_cpu_issue)
chk_cpu_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
;;
esac
fi
done < "${file}"
fi
if these are not params then find a way for your script to read them as data inside of the script and pass in the file name.

Weird bash results using cut

I am trying to run this command:
./smstocurl SLASH2.911325850268888.911325850268896
smstocurl script:
#SLASH2.911325850268888.911325850268896
model=$(echo \&model=$1 | cut -d'.' -f 1)
echo $model
imea1=$(echo \&simImea1=$1 | cut -d'.' -f 2)
echo $imea1
imea2=$(echo \&simImea2=$1 | cut -d'.' -f 3)
echo $imea2
echo $model$imea1$imea2
Result Received
&model=SLASH2911325850268888911325850268896
Result Expected
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
What am I missing here ?
You are cutting based on the dot .. In the first case your desired string contains the first string, the one containing &model, so then it is printed.
However, in the other cases you get the 2nd and 3rd blocks (-f2, -f3), so that the imea text gets cutted off.
Instead, I would use something like this:
while IFS="." read -r model imea1 imea2
do
printf "&model=%s&simImea1=%s&simImea2=%s\n" $model $imea1 $imea2
done <<< "$1"
Note the usage of printf and variables to have more control about what we are writing. Using a lot of escapes like in your echos can be risky.
Test
while IFS="." read -r model imea1 imea2; do printf "&model=%s&simImea1=%s&simImea2=%s\n" $model $imea1 $imea2
done <<< "SLASH2.911325850268888.911325850268896"
Returns:
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
Alternatively, this sed makes it:
sed -r 's/^([^.]*)\.([^.]*)\.([^.]*)$/\&model=\1\&simImea1=\2\&simImea2=\3/' <<< "$1"
by catching each block of words separated by dots and printing back.
You can also use this way
Run:
./program SLASH2.911325850268888.911325850268896
Script:
#!/bin/bash
String=`echo $1 | sed "s/\./\&simImea1=/"`
String=`echo $String | sed "s/\./\&simImea2=/"`
echo "&model=$String
Output:
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896
awk way
awk -F. '{print "&model="$1"&simImea1="$2"&simImea2="$3}' <<< "SLASH2.911325850268888.911325850268896"
or
awk -F. '$0="&model="$1"&simImea1="$2"&simImea2="$3' <<< "SLASH2.911325850268888.911325850268896"
output
&model=SLASH2&simImea1=911325850268888&simImea2=911325850268896

hash each line in text file

I'm trying to write a little script which will open a text file and give me an md5 hash for each line of text. For example I have a file with:
123
213
312
I want output to be:
ba1f2511fc30423bdbb183fe33f3dd0f
6f36dfd82a1b64f668d9957ad81199ff
390d29f732f024a4ebd58645781dfa5a
I'm trying to do this part in bash which will read each line:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
echo $line
done
later on I do:
$ more 123.txt | ./read.line.by.line.sh | md5sum | cut -d ' ' -f 1
but I'm missing something here, does not work :(
Maybe there is an easier way...
Almost there, try this:
while read -r line; do printf %s "$line" | md5sum | cut -f1 -d' '; done < 123.txt
Unless you also want to hash the newline character in every line you should use printf or echo -n instead of echo option.
In a script:
#! /bin/bash
cat "$#" | while read -r line; do
printf %s "$line" | md5sum | cut -f1 -d' '
done
The script can be called with multiple files as parameters.
You can just call md5sum directly in the script:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
echo $line | md5sum | awk '{print $1}'
done
That way the script spits out directly what you want: the md5 hash of each line.
this worked for me..
cat $file | while read line; do printf %s "$line" | tr -d '\r\n' | md5 >> hashes.csv; done

Resources