shell - Characters contained in both strings - edited

I want to compare two string variables and print the characters that appear in both. I'm not sure how to do this. I was thinking of using comm or diff, but I don't know the right parameters to print only the matching characters. Also, both tools take files as input, and these are strings. Can anyone help?
Input:
a=$(echo "abghrsy")
b=$(echo "cgmnorstuvz")
Output:
"grs"

You don't need to do that much work to assign $a and $b shell variables, you can just...
a=abghrsy
b=cdgmrstuvz
Now, there is a classic computer science problem called the longest common subsequence [1] that is similar to yours.
However, if you just want the common characters, one way would be to let Ruby do the work...
$ ruby -e "puts ('$a'.chars.to_a & '$b'.chars.to_a).join"
1. Not to be confused with the different longest common substring problem.

Use Character Classes with GNU Grep
This isn't a widely applicable solution, but it fits your particular use case quite well. The idea is to use the first variable as a character class to match against the second string. For example:
a='abghrsy'
b='cgmnorstuvz'
echo "$b" | grep --only-matching "[$a]" | xargs | tr --delete ' '
This produces grs as you expect. Note that the use of xargs and tr is simply to remove the newlines and spaces from the output; you can certainly handle this some other way if you prefer.
Set Intersection
What you're really looking for is a set intersection, though. While you can "wing it" in the shell, you'd be better off using a language like Ruby, Python, or Perl to do this.
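As a rough idea of what "winging it" in the shell could look like, here is a minimal sketch that uses only POSIX parameter expansion and case patterns. The function name common_chars is made up for illustration, and its variables are global because POSIX sh has no local. It reports each shared character once, in the order it first appears in the first string.

```shell
# common_chars (hypothetical helper): print every character of $1 that
# also occurs in $2, once each, in the order it first appears in $1.
common_chars() {
    rest=$1
    seen=
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character of $rest
        rest=${rest#?}          # remainder after that character
        case $2 in
            *"$c"*)             # quoted, so $c is matched literally
                case $seen in
                    *"$c"*) ;;              # already reported
                    *) seen=$seen$c ;;
                esac ;;
        esac
    done
    printf '%s\n' "$seen"
}

common_chars abghrsy cgmnorstuvz   # prints grs
```

Because the character is quoted inside the case pattern, input containing *, ] or - needs no escaping, which is the main difference from building a bracket expression out of the variable.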
A Ruby One-Liner
If you need to integrate with an existing shell script, a simple Ruby one-liner that uses Bash variables could be called like this inside your current script:
a='abghrsy'
b='cgmnorstuvz'
ruby -e "puts ('$a'.split(//) & '$b'.split(//)).join"
A Ruby Script
You could certainly make things more elegant by doing the whole thing in Ruby instead.
string1_chars = 'abghrsy'.split //
string2_chars = 'cgmnorstuvz'.split //
intersection = string1_chars & string2_chars
puts intersection.join
This certainly seems more readable and robust to me, but your mileage may vary. At least now you have some options to choose from.

Nice question +1.
You can use an awk trick to get this done.
a=abghrsy
b=cdgmrstuvz
comm -12 <(echo $a|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}') <(echo $b|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}')|tr -d '\n'
OUTPUT:
grs
Note the use of awk -F"\0", which breaks the input string character by character into separate awk fields. The rest is a straightforward use of comm and tr.
PS: If your input string is not sorted, you need to pipe awk's output to sort, or sort the array inside awk.
UPDATE: awk only solution (without comm):
echo "$a;$b" | awk -F"\0" '{scnd=0; for (i=1; i<=NF; i++) {if ($i!=";") {if (!scnd) arr1[$i]=$i; else if ($i in arr1) arr2[$i]=$i} else scnd=1}} END { for (a in arr2) printf("%s", a)}'
This assumes semicolon doesn't appear in your string (you can use any other character if that's not the case).
UPDATE 2: I think the simplest solution is using grep -o
(thanks to the answer from @CodeGnome)
echo "$b" | grep -o "[$a]" | tr -d '\n'

Using GNU coreutils (inspired by @DigitalRoss):
a="abghrsy"
b="cgmnorstuvz"
echo "$(comm -12 <(echo "$a" | fold -w1 | sort | uniq) <(echo "$b" | fold -w1 | sort | uniq) | tr -d '\n')"
This prints grs. I assumed you only want unique characters.
UPDATE:
Modified for dash..
#!/bin/dash
# Pass the arguments as data ('%s'), never as the printf format string.
string1=$(printf '%s' "$1" | fold -w1 | sort | uniq | tr -d '\n')
string2=$(printf '%s' "$2" | fold -w1 | sort | uniq | tr -d '\n')
while [ "$string1" != "" ]; do
    c1=$(printf '%s\n' "$string1" | cut -c 1-1)
    # Rebuild the second character list for every character of the first.
    string2=$(printf '%s' "$2" | fold -w1 | sort | uniq | tr -d '\n')
    while [ "$string2" != "" ]; do
        c2=$(printf '%s\n' "$string2" | cut -c 1-1)
        if [ "$c1" = "$c2" ]; then
            printf '%s' "$c1"
        fi
        string2=$(printf '%s\n' "$string2" | cut -c 2-)
    done
    string1=$(printf '%s\n' "$string1" | cut -c 2-)
done
echo
Note: I am just a beginner. There might be a better way of doing this.

Related

BASH: Iterate range of numbers in a for cycle

I want to create an array from a list of words, so I'm using this code:
for i in {1..$count}
do
array[$i]=$(cat file.txt | cut -d',' -f3 | sort -r | uniq | tail -n ${i})
done
but it fails... in tail -n ${i}
I already tried tail -n $i, tail -n $(i) but can't pass tail the value of i
Any ideas?

It fails because Bash performs brace expansion before variable expansion, so you cannot use a variable in a brace range: {1..10} is fine, but {1..$n} is not.
In Bash you can use a C-style for ((...)) loop instead:
for ((i=1; i<=count; i++)); do
array[$i]=$(cut -d',' -f3 file.txt | sort -r | uniq | tail -n $i)
done
Also note removal of useless use of cat from your command.

Your range is not evaluated the way you are thinking, e.g.:
$ x=10
$ echo {1..$x}
{1..10}
You're better off just using a for loop:
for ((i = 1; i <= count; i++))
do
# ...
done

Just to elaborate on the previous answers, this occurs because 'brace expansion' is the first step of Bash's expansion process and is never repeated: when the braces are examined, '$count' is just a piece of text, so the braces are left as is. Later, when '$count' is expanded to a number, brace expansion does not run again.
If for some reason you want to force brace expansion to happen again, you can use 'eval':
Replace {1..$count} with $(eval echo {1..${count}}).
Better, in your case, to do as anubhava suggests.
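To make the expansion order above concrete, here is a small sketch. It captures what the brace range turns into, then counts with a plain while loop, which is an alternative that also works in shells without the C-style for loop. The helper name count_up is made up for illustration.

```shell
count=3

# Brace expansion happens before $count is substituted, so the braces
# survive as plain text and no range is generated.
literal=$(echo {1..$count})
echo "$literal"     # prints {1..3}

# count_up (hypothetical helper): a while loop reads the variable at
# run time, so it does not depend on brace expansion at all.
count_up() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

count_up "$count"   # prints 1, 2 and 3 on separate lines
```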

Instead of reading the file numerous times, use the built-in mapfile command:
mapfile -t array < <(cut -d, -f3 file.txt | sort -r | uniq)

Assigning deciles using bash

I'm learning bash, and here's a short script to assign deciles to the second column of file $1.
The complicating bit is the use of awk within the script, leading to ambiguous redirects when I run the script.
I would have gotten this done in SAS by now, but like the idea of two lines of code doing the job.
How can I communicate the total number of rows (${N}) to awk within the script? Thanks.
N=$(wc -l < $1)
cat $1 | sort -t' ' -k2gr,2 | awk '{$3=int((((NR-1)*10.0)/"${N}")+1);print $0}'

You can set an awk variable from the command line using -v.
N=$(wc -l < "$1" | tr -d ' ')
sort -t' ' -k2gr,2 "$1" | awk -v n=$N '{$3=int((((NR-1)*10.0)/n)+1);print $0}'
I added tr -d to get rid of the leading spaces that wc -l puts in its result.
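As a sketch of how this fits together, the two lines can be wrapped in a function and tried on a small file. The function name deciles and the sample rows are assumptions made for illustration; the file is expected to hold space-separated rows with the value to rank in column 2.

```shell
# deciles (hypothetical helper): append a decile number (1 = highest
# values) as a third column, based on column 2 of the file named in $1.
deciles() {
    # Row count, with any padding from wc stripped.
    n=$(wc -l < "$1" | tr -d ' ')
    # Sort descending on column 2, then let awk see the row count as n.
    sort -t' ' -k2gr,2 "$1" |
        awk -v n="$n" '{ $3 = int((NR - 1) * 10 / n) + 1; print }'
}

# Ten sample rows: r1 1, r2 2, ... r10 10.
sample=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "r$i $i"
done > "$sample"

deciles "$sample"   # first line: r10 10 1, last line: r1 1 10
rm -f "$sample"
```

Passing the value with -v keeps the awk program in single quotes, so the shell never has to substitute anything inside it.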

Is there a better way to retrieve the elements of a delimited pair in bash?

I have entries of the form: cat:rat and I would like to assign them to separate variables in bash. I am currently able to do this via:
A=$(echo $PAIR | tr ':' '\n' | head -n1)
B=$(echo $PAIR | tr ':' '\n' | tail -n1)
after which $A and $B are, respectively, cat and rat. The echo, the two pipes and all the rest feel like overkill. Am I missing a much simpler way of doing this?

Using the read command
entry=cat:rat
IFS=: read A B <<< "$entry"
echo $A # => cat
echo $B # => rat

Yes, using Bash parameter substitution:
PAIR='cat:rat'
A=${PAIR/:*/}
B=${PAIR/*:/}
echo $A
cat
echo $B
rat
Alternately, if you are willing to use an array in place of individual variables:
IFS=: read -r -a ARR <<<"${PAIR}"
echo ${ARR[0]}
cat
echo ${ARR[1]}
rat
EDIT: Refer to glenn jackman's answer for the most elegant read-based solution.
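The same split can also be sketched with prefix and suffix removal, which is standard POSIX parameter expansion rather than a Bash extension. This variant is not from the answer above; it is shown as an alternative under the assumption that the pair contains exactly one colon.

```shell
PAIR='cat:rat'

A=${PAIR%%:*}   # remove the longest suffix that starts with a colon
B=${PAIR#*:}    # remove the shortest prefix that ends with a colon

echo "$A"       # prints cat
echo "$B"       # prints rat
```

With more than one colon, A would still be the first field, while B would hold everything after the first colon.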

animal="cat:rat"
A=$(echo "${animal}" | cut -d ":" -f1)
B=$(echo "${animal}" | cut -d ":" -f2)
This might not be the best solution; it's just one possible approach.

Bash escaping and syntax

I have a small bash file which I intend to use to determine my current ping vs my average ping.
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc
This outputs my ping - 20 ms, which is the number I want. However, I also want to prepend a + if the number is positive and append "ms".
This brings me to my overarching problem: Bash syntax regarding escaping and such heavy "indenting" is kind of flaky.
While I'll be satisfied with an answer of how to do what I wanted, I'd like a link to, or explanation of how exactly bash syntax works dealing with this sort of thing.

output=($(ping -qc 1 google.com | tail -n 1))
echo "${output[3]}" | awk -F/ '{printf "%+fms\n", $1-20}'
The + modifier in printf tells it to print the sign, whether it's positive or negative.
And since we're using awk, there's no need to use cut or bc to get a field or do arithmetic.
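The sign handling can be tried without touching the network. In this sketch the function name format_delta, its two arguments and the three-decimal precision are assumptions made for illustration; only the + flag comes from the answer above.

```shell
# format_delta (hypothetical helper): print "measured - baseline" with an
# explicit sign and a trailing "ms". $1 is the measured ping and $2 the
# baseline, both in milliseconds.
format_delta() {
    awk -v ping="$1" -v base="$2" 'BEGIN { printf "%+.3fms\n", ping - base }'
}

format_delta 38.209 20   # prints +18.209ms
format_delta 12.5 20     # prints -7.500ms
```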

Escaping is pretty awful in bash if you use the deprecated `..` style command expansion. In this case, you have to escape any backticks, which means you also have to escape any other escapes. $(..) nests a lot better, since it doesn't add another layer of escaping.
In any case, I'd just do it directly:
ping -qc 1 google.com | awk -F'[=/ ]+' '{n=$6}
END { v=(n-20); if(v>0) printf("+"); print v}'
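To illustrate the nesting point made at the start of this answer, here is a minimal sketch that runs the same nested command substitution in both styles. The variable names are made up for the example.

```shell
# Backticks: the inner pair has to be escaped with backslashes.
outer_old=`echo "inner: \`echo nested\`"`

# $(...): the inner substitution nests without any extra escaping.
outer_new=$(echo "inner: $(echo nested)")

echo "$outer_old"   # prints inner: nested
echo "$outer_new"   # prints inner: nested
```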

Here's my take on it, recognizing that the result from bc can be treated as a string:
output=($(ping -qc 1 google.com | tail -n 1))
output=$(echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc)' ms'
[[ "$output" != -* ]] && output="+$output"
echo "$output"

Bash cannot handle floating point numbers. A workaround is to use awk like this:
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc | awk '{if ($1 >= 0) printf "+%fms\n", $1; else printf "%fms\n", $1}'
Note that the + is prepended only when the result of bc is zero or positive; a negative result already carries its own minus sign.
Output:
$ ./testping.sh
+18.209000ms

Extract text from hostname

Using OS X, I need a one line bash script to look at a client mac hostname like:
12345-BA-PreSchool-LT.local
Where the first 5 digits are an asset serial number, and the hyphens separate a business unit code from a department name followed by something like 'LT' to denote a laptop.
I guess I need to echo the hostname and use a combination of sed, awk and perhaps cut to strip characters out to leave me with:
"BA PreSchool"
Any help much appreciated. This is what I have so far:
echo $HOSTNAME | sed 's/...\(...\)//' | sed 's/.local//'

echo "12345-BA-PreSchool-LT.local" | cut -d'-' -f2,3 | sed -e 's/-/ /g'
(Not on OSX, so not sure if cut is defined)

I like to keep things simple :)
You could do it with just cut:
echo 12345-BA-PreSchool-LT.local | cut -d"-" -f2,3
BA-PreSchool
If you want to remove the hyphen you can use tr
echo 12345-BA-PreSchool-LT.local | cut -d"-" -f2,3 | tr "-" " "
BA PreSchool

How about
echo $HOSTNAME | awk 'BEGIN { FS = "-" } ; { print $2, $3 }'

Awk can solve your question easily.
echo "12345-BA-PreSchool-LT.local" | awk -F'-' '$0=$2" "$3'
BA PreSchool

bash$ string="12345-BA-PreSchool-LT.local"
bash$ IFS="-"
bash$ set -- $string
bash$ echo $2-$3
BA-PreSchool
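The session above changes IFS for the whole shell and replaces the positional parameters. A possible way to contain both side effects is to run the same steps inside a function whose body is a subshell, as in this sketch. The function name unit_and_dept is made up for illustration, and the output uses a space instead of a hyphen to match the format asked for in the question.

```shell
# unit_and_dept (hypothetical helper): print fields 2 and 3 of a
# hyphen-separated hostname. The body is wrapped in ( ), so it runs in a
# subshell and none of the changes below leak into the caller.
unit_and_dept() (
    IFS=-
    set -f          # no globbing while the unquoted expansion is split
    set -- $1       # split the hostname on hyphens into $1, $2, $3, ...
    printf '%s %s\n' "$2" "$3"
)

unit_and_dept "12345-BA-PreSchool-LT.local"   # prints BA PreSchool
```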
