Handling wildcard expansion from a string array in a bash shell scripting - bash

Following is a sample script that I have written
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
echo "$line" <-- "$line" prints the text correctly
result_array=( `echo "$line"| sed 's/:/\n/1' | sed 's/:/\n/1'`)
echo "${result_array[0]}"
echo "${result_array[1]}"
echo "${result_array[2]}" <-- prints the first filename in the directory due to wildcard character * .
How to get the text "* http://abcd.com/index.do " printed instead of the filename when retrieved from an array?

Assuming bash is the right tool, there are a few ways:
disable filename expansion temporarily
use read with IFS
use the substitution feature of bash expansion
Disabling expansion:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
set -f
OIFS=$IFS
IFS=$'\n'
result_array=( `echo "$line"| sed 's/:/\n/1' | sed 's/:/\n/1'`)
IFS=$OIFS
set +f
echo "${result_array[0]}"
echo "${result_array[1]}"
echo "${result_array[2]}"
(note we also had to set IFS, otherwise each part of the contents ends up in result_array[2], [3], [4], etc.)
Using read:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
echo "$line"
IFS=: read file number match <<<"$line"
echo "$file"
echo "$number"
echo "$match"
Using bash parameter expansion/substitution:
line="/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>"
rest="$line"
file=${rest%%:*}
[ "$file" = "$line" ] && echo "Error"
rest=${line#$file:}
number=${rest%%:*}
[ "$number" = "$rest" ] && echo "Error"
rest=${rest#$number:}
match=$rest
echo "$file"
echo "$number"
echo "$match"

How about:
$ line='/path/IntegrationFilter.java:150: * <td>http://abcd.com/index.do</td>'
$ echo "$line" | cut -d: -f3-
* <td>http://abcd.com/index.do</td>

Related

base64 decode while ignoring brackets

I'm trying to decode a file, which is mostly encoded with base64. What I want to do is to decode the following, while still maintaining the [_*_].
example.txt
wq9cXyjjg4QpXy/Crwo=
[_NOTBASE64ED_]
aGkgdGhlcmUK
[_CONSTANT_]
SGVsbG8gV29ybGQhCg==
Sometimes it'll be in this form
aGkgdGhlcmUK[_CONSTANT_]SGVsbG8gV29ybGQhCg==
Desired output
¯\_(ツ)_/¯
[_NOTBASE64ED_]
hi there
[_CONSTANT_]
Hello World!
hi there[_CONSTANT_]Hello World!
Error output
¯\_(ツ)_/¯
4��!:�#�H\�B�8ԓ��[��ܛBbase64: invalid input
What I've tried
base64 -di example.txt
base64 -d example.txt
base64 --wrap=0 -d -i example.txt
I tried to individually base64 the [_*_] using grep -o. Then find and
replacing them through a weird arrangement with arrays, but I couldn't
get it to work.
base64ing it all, then decoding. Results in double base64ed rows.
The file is significantly downsized!
Encoded using base64 --wrap=0, while loop, and if/else statement.
The [_*_] still need to be there after being decoded.
I am sure someone has a more clever solution than this. But try this
#! /bin/bash
MYTMP1=""
function printInlineB64()
{
local lines=($(echo $1 | sed -e 's/\[/\n[/g' -e 's/\]/]\n/g'))
OUTPUT=""
for line in "${lines[#]}"; do
MYTMP1=$(base64 -d <<< "$line" 2>/dev/null)
if [ "$?" != "0" ]; then
OUTPUT="${OUTPUT}${line}"
else
OUTPUT="${OUTPUT}${MYTMP1}"
fi;
done
echo "$OUTPUT"
}
MYTMP2=""
function printB64Line()
{
local line=$1
# not fully base64 line
if [[ ! "$line" =~ ^[A-Za-z0-9+/=]+$ ]]; then
printInlineB64 "$line"
return
fi;
# likely base64 line
MYTMP2=$(base64 -d <<< "$line" 2>/dev/null)
if [ "$?" != "0" ]; then
echo $line
else
echo $MYTMP2
fi;
}
FILE=$1
if [ -z "$FILE" ]; then
echo "Please give a file name in argument"
exit 1;
fi;
while read line; do
printB64Line "$line"
done < ${FILE}
and here is output
$ cat example.txt && echo "==========================" && ./base64.sh example.txt
wq9cXyjjg4QpXy/Crwo=
[_NOTBASE64ED_]
aGkgdGhlcmUK
[_CONSTANT_]
SGVsbG8gV29ybGQhCg==
==========================
¯\_(ツ)_/¯
[_NOTBASE64ED_]
hi there
[_CONSTANT_]
Hello World!
$ cat example2.txt && echo "==========================" && ./base64.sh example2.txt
aGkgdGhlcmUK[_CONSTANT_]SGVsbG8gV29ybGQhCg==
==========================
hi there[_CONSTANT_]Hello World!
You need a loop that reads each line and tests whether it's base64 or non-base64, and processes it appropriately.
while read -r line
do
case "$line" in
\[*\]) echo "$line" ;;
*) base64 -d <<< "$line" ;;
esac
done << example.txt
I would suggest using other languages other than sh but here is a solution using cut. This would handle the case where there are more than one [_constant_] in a line.
#!/bin/bash
function decode() {
local data=""
local line=$1
while [[ -n $line ]]; do
data=$data$(echo $line | cut -d[ -f1 | base64 -d)
const=$(echo $line | cut -d[ -sf2- | cut -d] -sf1)
[[ -n $const ]] && data=$data[$const]
line=$(echo $line | cut -d] -sf2-)
done
echo "$data"
}
while read -r line; do
decode $line
done < example.txt
If Perl is an option, you can say something like:
perl -MMIME::Base64 -lpe '$_ = join("", grep {/^\[/ || chomp($_ = decode_base64($_)), 1} split(/(?=\[)|(?<=\])/))' example.txt
The code below is equivalent to the above but is broken down into steps for the explanation purpose:
#!/bin/bash
perl -MMIME::Base64 -lpe '
#ary = split(/(?=\[)|(?<=\])/, $_);
foreach (#ary) {
if (! /^\[/) {
chomp($_ = decode_base64($_));
}
}
$_ = join("", #ary);
' example.txt
-MMIME::Base64 option loads the base64 codec module.
-lpe option makes Perl bahave like AWK to loop over input lines and implicitly handle newlines.
The regular expression (?=\[)|(?<=\]) matches the boundary between the base64 block and the maintaining block surrounded by [...].
The split function divides the line into blocks on the boundary and store them in an array.
Then loop over the array and decode the base64-encoded entry if found.
Finally merge the substring blocks into a line to print.

Bash (split) file name comparison fails

In my directory I have files (*fastq.gz.fasta) and directories, whose names contain the filenames (*fastq.gz.fasta-blastdb):
IVC6_Meino.clust.gz.fasta-blastdb
IVC5_Mehiv.clust.gz.fasta-blastdb
....
IVC6_Meino.clust.gz.fasta
IVC5_Mehiv.clust.gz.fasta
....
In a bash script I want to compare the filenames with the direcories using the cut option on the latter to extract only the filename part. If those two names match I want to do further stuff (for now echo match or no match respectively).
I have written the following piece of code:
#!/bin/bash
for file in *.fasta
do
for db in *-blastdb
do
echo $file, $db | cut -d '-' -f 1
if [[ $file = "$db | cut -d '-' -f 1" ]]; then
echo "match"
else
echo "no match"
fi
done
done
But it does not detect matches. The output looks like this:
...
IVC6_Meino.clust.gz.fasta, IIIA11_Meova.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC5_Mehiv.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC6_Meino.clust.gz.fasta
no match
The last line should read match as you can see, the strings look the same.
What am i missing?
You can use parameter expansion to do this more easily:
for file in *.fasta
do
for db in *-blastdb
do
echo "$file", "$db"
if [[ "${file%%.fasta}" = "${db%%.fasta-blastdb}" ]]; then
echo "match"
else
echo "no match"
fi
done
done
If you want to fix yours, the problem is the use of $db | cut -d '-' -f 1 With echo it appears that echo is printing the pipe. It isn't. cut is printing. When you do [[ $file = "$db | cut -d '-' -f 1" ]] it is equivalent to [[ $file = [return code from last pipe component] ]]
You need to use the $(..) shell construct to capture the output of the pipe and you need to echo to get the contents of $db to start the pipe. You should quote "$db" so you do not have word splitting or globbing from the contents of the variable.
Like so:
for file in *.fasta
do
for db in *-blastdb
do
ts=$(echo "$db" | cut -d '-' -f 1)
echo "$file", "$ts"
if [[ "$file" = "$ts" ]]; then
echo "match"
else
echo "no match"
fi
done
done # this works I think -- not tested...
Please be careful with your quoting with Bash and liberally use ShellCheck.
The structure you have is also not the most efficient. You will loop over the *-blastdb glob once for every file in *-blastdb. If you have a lot of files, that could get really slow.
To solve that, you could rewrite this loop with Bash arrays (best if you have Bash 4+) or use awk:
ext1=.fasta
ext2=.fasta-blastdb
awk 'FNR==NR{
s=$0
sub("\\"ext1"$","",s)
seen[s]=$0
next}
{
s=$0
sub("\\"ext2"$","",s)
if (s in seen)
print seen[s], $0
}
' ext1="$ext1" ext2="$ext2" <(for fn in *$ext1; do echo "$fn"; done) <(for fn in *$ext2; do echo "$fn"; done)
Each glob is only executing once and awk is using an array to test if the basenames are the same.
Best

Check if a string contains "-" and "]" at the same time

I have the next two regex in Bash:
1.^[-a-zA-Z0-9\,\.\;\:]*$
2.^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" and the other values.
The second when contains a "]".
I put this values at the beginning of my regex because I can't scape them.
How I can get match the two values at the same time?
You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Colons, semicolons, and commas have no special meaning in any part of a regular expression, and while a period loses its special meaning inside a bracket expression.
Basically you can use this:
grep -E '^.*\-.*\[|\[.*\-.*$'
It matches either a - followed by zero or more arbitrary chars and a [ or a [ followed by zero or more chars and a -
However since you don't accept arbitrary chars, you need to change it to:
grep -E '^[a-zA-Z0-9,.;:]*\-[a-zA-Z0-9,.;:]*\[|\[[a-zA-Z0-9,.;:]*\-[a-zA-Z0-9,.;:]*$'
Maybe, this can help you
#!/bin/bash
while read p; do
echo $p | grep -E '\-.*\]|\].*\-' | grep "^[]a-zA-Z0-9,.;:-]*$"
done <$1
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-
There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which could be done with a range that mix your two ranges:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match a line that either contains one (or several) -…OR…]…OR any of the included characters. That would be all the lines (except ?????, ++++++ and as df gh) in the test script below.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep '-' | grep ']' | grep '^[-a-zA-Z0-9,.;:]*$'
The first two calls to grep select only the lines that:
contain both (one or several) - and (one or several) ]
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo

bash Shell: lost first element data partially

Using bash shell:
I am trying to read a file line by line.
and every line contains two meaning full file names delimited by "``"
file:1 image_config.txt
bbbbb.mp4``thumb/hashdata.gif
bbbbb.mp4``thumb/hashdata2.gif
Shell Script
#!/bin/bash
filename="image_config.txt"
while IFS='' read -r line || [[ -n "$line" ]]; do
IFS='``' read -r -a array <<< "$line"
if [ "$line" = "" ]; then
echo lineempty
else
file=${array[0]}
hash=${array[2]}
echo $file$hash;
output=$(ffmpeg -v warning -ss 2 -t 0.8 -i $file -vf scale=200:-1 -gifflags +transdiff -y $hash);
echo $output;
# echo ${array[0]}${array[1]}${array[2]}
fi;
done < "$filename"
first time executed successfully but when loop executes second time.
variable file lost bbbbb from bbbbb.mp4
and following output comes out
Output :
user#domain [~/public_html/Videos]$ sh imager.sh
bbbbb.mp4thumb/hashdata.gif
.mp4thumb/hashdata2.gif
.mp4: No such file or directory
lineempty
Please check out Bash FAQ 89 - I'm using a loop which runs once per line of input but it only seems to run once; everything after the first line is ignored? which seems to be helpful in your case.
Aside:
There is no point in using the same character twice in IFS.
IFS=\`
Is enough.
Check out this:
var='abc``def'
IFS=\`\` read -ra arr <<< "$var"
printf '<%s>\n' "${arr[#]}"
Output:
<abc>
<>
<def>
As you can see, arr[0] is abc, arr[1] is empty and arr[2] is def, and not arr[0] is abc and arr[1] is def as one might expect.
Taken from the IFS wiki of Greycat and Lhunath Bash Guide :
The IFS variable is used in shells (Bourne, POSIX, ksh, bash) as the input field separator (or internal field separator). Essentially, it is a string of special characters which are to be treated as delimiters between words/fields when splitting a line of input.
Here is how you could do differently, avoiding a read in the read:
#!/bin/bash
filename="image_config.txt"
while IFS='' read -r line || [[ -n "$line" ]]; do
if [ "$line" = "" ]; then
echo lineempty
else
file=$( echo ${line} | awk -F \` ' { print $1 } ' )
hash=$( echo ${line} | awk -F \` ' { print $3 } ' )
echo $file$hash;
output=$(ffmpeg -v warning -ss 2 -t 0.8 -i $file -vf scale=200:-1 -gifflags +transdiff -y $hash);
echo $output;
fi;
done < "$filename"

count words in a file without using wc

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easy enough, but I'm struggling here to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read Line
do
l=`expr $line + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $chars"
echo "Number words: $words"
echo "Number lines: $line"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read line ; do
set $line ;
while true ; do
[ -z $1 ] && break
l=`expr $l + 1`
shift ;
done ;
done
You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count

Resources