Bash show characters if not in string - bash

I am trying out bash, and I am trying to make a simple hangman game now.
Everything is working but I don't understand how to do one thing:
I am showing the user the word with guessed letters (so for example, if the word is hello world and the user guessed the 'l', I show them **ll* ***l* )
I store the letters that the user already tried in var guess
I do that with the following:
echo "${word//[^[:space:]$guess]/*}"
The thing I want to do now is echo the alphabet, but leave out the letters that the user already tried, so in this case show the full alphabet without the L.
I already tried to do it the same way as I just showed, but it doesn't quite work.
If you need any more info please let me know.
Thanks,
Tim

You don't show what you tried, but parameter expansion works fine.
$ alphabet=abcdefghijklmnopqrstuvwxyz
$ word="hello world"
$ guesses=aetl
$ echo "${word//[^[:space:]$guesses]/*}"
*ell* ***l*
$ echo "${alphabet//[$guesses]/*}"
*bcd*fghijk*mnopqrs*uvwxyz
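If the guessed letters should disappear entirely rather than show up as asterisks, the same expansion with an empty replacement should do it. This is a minimal sketch; the function name remaining_letters is made up for illustration, and it assumes the guesses are plain letters (characters such as ], ^ or - would change the meaning of the bracket expression).

```bash
#!/usr/bin/env bash

# remaining_letters GUESSES: print the alphabet without the guessed letters.
# (Hypothetical helper; assumes GUESSES contains only plain letters.)
remaining_letters() {
    local alphabet=abcdefghijklmnopqrstuvwxyz
    local guesses=$1
    if [[ -z $guesses ]]; then
        # With no guesses yet, skip the substitution rather than rely on
        # how an empty bracket expression is parsed.
        printf '%s\n' "$alphabet"
    else
        # Delete every character of the alphabet that appears in $guesses.
        printf '%s\n' "${alphabet//[$guesses]/}"
    fi
}

remaining_letters aetl
```

Called with the guesses from the example above, this prints the 22 letters that have not been tried yet.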

First store both strings in files where they are stored one char per line:
sed 's/./&\n/g' <<< "$guess" | sort > guessfile
sed 's/./&\n/g' <<< "$word" | sort > wordfile
Then we can filter the words that are only present in one of the files and paste the lines together as a string:
grep -xvf guessfile wordfile | paste -s -d'\0'
And of course we clean up after ourselves:
rm wordfile
rm guessfile
If the output is not correct, try switching arguments in grep (i.e. wordfile guessfile instead of guessfile wordfile).
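The same filtering idea can be sketched without temporary files by using process substitution. The function name letters_not_guessed is an assumption for illustration, fold -w1 replaces the sed step for splitting a string into one character per line, and grep -F is added so that the guesses are treated as literal characters.

```bash
#!/usr/bin/env bash

# letters_not_guessed POOL GUESS: print the characters of POOL that do not
# occur in GUESS, keeping their original order. (Hypothetical helper.)
letters_not_guessed() {
    local pool=$1 guess=$2
    # fold -w1 writes one character per line; grep -vxFf drops every line of
    # the pool that exactly matches one of the guessed characters.
    grep -vxFf <(fold -w1 <<< "$guess") <(fold -w1 <<< "$pool") | tr -d '\n'
    echo
}

letters_not_guessed abcdefghijklmnopqrstuvwxyz aetl
```

Passing the alphabet as the pool gives the letters still available; passing the word gives the letters of the word that have not been guessed.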

Related

sed can't replace substring with special characters

[Mac/Terminal] I'm trying to replace words in a sentence with red-colored versions of them. I'm trying to use sed, but it's not outputting the result in the format I'm expecting. i.e.
for w in ${sp}; do
msg=`echo $msg | sed "s/$w/\\033[1;31m$w\\033[0m/g"`
done
results in:
033[1;31mstb033[0m 033[1;31mshu033[0m 033[1;31mkok033[0m
where $sp is a list of a subset of words contained in $msg
the desired output would look like:
\033[1;31mstb\033[0m \033[1;31mshu\033[0m \033[1;31mkok\033[0m
and then my hope would be that echo -e would interpret this correctly and show the red coloring instead. So far, however, I seem to not understand quite correctly how sed works in order to accomplish this.
This seems hugely inefficient. Why do you not simply replace all the words in one go and put in the actual escape codes immediately?
sp='one two three'
msg='one little mouse, two little mice, three little mice'
echo "$msg" | sed -E "s/${sp// /|}/^[[1;31m&^[[0m/g"
Output (where I use bold to mark up the red color [1]):
one little mouse, two little mice, three little mice
The sed -E option is just to allow us to use a simpler regex syntax (on Linux and some other platforms, try sed -r or simply translate the script to Perl).
You would type ctrl-V esc where you see ^[ in the command line above.
If you need the message in a variable for repeated use, look at printf -v
[1] Looks like Stack Overflow doesn't support <span style="color:red">, unfortunately.
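If typing a literal escape character with ctrl-V esc is awkward, bash can supply it through $'\033'. The sketch below wraps the one-pass sed call in a function; the name highlight_words is made up, and it assumes the words contain no characters that are special in a regex or in a sed replacement.

```bash
#!/usr/bin/env bash

# highlight_words MESSAGE WORDS: wrap each space-separated word from WORDS
# in red wherever it occurs in MESSAGE. (Hypothetical helper.)
highlight_words() {
    local msg=$1 words=$2
    local esc=$'\033'   # a real ESC byte, so no echo -e is needed later
    # ${words// /|} turns "one two" into the alternation one|two.
    sed -E "s/${words// /|}/${esc}[1;31m&${esc}[0m/g" <<< "$msg"
}

highlight_words 'one little mouse, two little mice' 'one two'
```

Because the escape bytes are already in the output, a plain echo or printf '%s\n' of the result should show the colors on a terminal that understands ANSI sequences.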
What about using an array, and printf instead of echo?
$ sp="Now is the time..."
$ w=( $sp )
$ printf -v output '\e[1;31m%s\e[0m ' "${w[@]}"
$ echo "$output"
Now is the time...
The output is obviously red, which doesn't come across here, but:
$ printf '%q\n' "$output"
$'\E[1;31mNow\E[0m \E[1;31mis\E[0m \E[1;31mthe\E[0m \E[1;31mtime...\E[0m '
And if you don't like the trailing space, you can trim it with ${output% }.

Delete everything before a pattern

I'm trying to clean a text file.
I want to delete everything before the first 12-digit number on each line.
1:0:135103079189:0:0:2:0::135103079189:000011:00
A:908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Output desired:
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Here's my command, but it doesn't seem to work.
sed '/:\([0-9]\{12\}\)/d' t.txt
The d command in sed deletes the entire line when the given regex matches; you need the s command to search and replace only part of a line. However, sed is not well suited to this problem because it doesn't support non-greedy matching.
You can use perl instead:
$ perl -pe's/^.*?(?=\d{12}:)//' ip.txt
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
.*? match zero or more characters as minimally as possible
(?=\d{12}:) only if it is followed by 12-digits ending with :
use perl -i -pe for in-place editing
some possible corner cases
$ # this is matching part of field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe's/^.*?(?=\d{12}:)//'
135103079189:23:603307102606:1
$ # this is not matching 12-digit field at end of line
$ echo 'foo:123:135103079189' | perl -pe's/^.*?(?=\d{12}:)//'
foo:123:135103079189
$ # so, add start/end of line matching cases and restrict 12-digits to whole field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe 's/^(?:.*?:)?(?=\d{12}(:|$))//'
603307102606:1
$ echo 'foo:123:135103079189' | perl -pe's/^(?:.*?:)?(?=\d{12}(:|$))//'
135103079189
Could you please try following.
awk --re-interval 'match($0,/[0-9]{12}/){print substr($0,RSTART)}' Input_file
Since I have an old version of awk, I am using --re-interval; you can remove it if you have a newer version.
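The same match-then-substring idea can also be sketched in bash alone with the =~ operator. The function name strip_before_id is an assumption, and unlike the awk command, this sketch prints lines that contain no 12-digit number unchanged instead of dropping them.

```bash
#!/usr/bin/env bash

# strip_before_id: read lines on stdin and print each one starting at its
# first run of 12 digits. (Hypothetical helper.)
strip_before_id() {
    local line prefix
    local re='[0-9]{12}'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[0] holds the leftmost match; everything before
            # its first occurrence is the prefix to drop.
            prefix=${line%%"${BASH_REMATCH[0]}"*}
            printf '%s\n' "${line:${#prefix}}"
        else
            # No 12-digit run: pass the line through as is.
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' '1:0:135103079189:0:0:2:0::135103079189:000011:00' \
              'A:908529896240:0:10250:2:0:1:' | strip_before_id
```

Usage on a file would look like strip_before_id < Input_file.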
This might work for you (GNU sed):
sed -n 's/[0-9]\{12\}/\n&/;s/.*\n//p' file
We only want to print specific lines so use the -n option to turn off automatic printing. If a line contains a 12 digit number, insert a newline before it. Remove any characters before and including a newline and print the result.
If you want to print lines that do not contain a 12 digit number as is, use:
sed 's/[0-9]\{12\}/\n&/;s/.*\n//' file
The crux of the problem is to identify the start of a multi-character string, insert a unique marker and delete all characters before and including the unique marker. As sed uses the newline to delimit lines, only the user can introduce newlines into the pattern space and as a result, newlines will always be unique.
Taking the nice answer from @Sundeep, in case you would like to use grep or pcregrep (macOS/BSD) you could give a try to:
$ grep -oP '^(?:.*?:)?(?=\d{12})\K.*' file
or
$ pcregrep -o '^(?:.*?:)?(?=\d{12})\K.*' file
The \K discards everything matched before it, so only the text from the 12-digit number onward is printed.
Alternative thoughts - I almost think your data is too dirty for a quick sed fix but if generally it's all similar to your sample set of data then certainly pick one of the answers with sed etc. However if you wanted to be more particular about it you could build up a set of commands to ensure the values. I like doing this for debugging and when speed isn't urgent.
Take this tiny sample of code. You could do this other ways, but I'm getting the value for each part of the string, and I know the order because it's contiguous. You could then set up controls on which parts to keep as it builds out, say, a new string per line. Overwrought for sure, but sometimes that is a better long-term approach.
#!/bin/bash
while IFS= read -r line ;do
IFS=':' read -r -a array <<< "$line"
for ((i=0; i<${#array[@]}; i++)) ;do
echo "part : ${array[$i]}"
done
done < "test_data.txt"
You could then build the data back up how you wanted and more easily understand what's happening every step of the way ..
part : 1
part : 0
part : 135103079189
part : 0
part : 0
part : 2
part : 0
part :
part : 135103079189
part : 000011
part : 00
part : A
part : 908529896240
part : 0

list words from file using shell script in alphabetical order and with no punctuation

I am using Shell script and bash commands.
I have to generate a list of words in alphabetical order from a file which has many sentences in it; I am using song lyrics to work this out. I can return each word in alphabetical order, but the output still includes some apostrophes, question marks and full stops. To do this I use:
cat lyrics01.txt | tr "\"' " '\n' | sort -u >> lyrics01.wl
I know this tells the list to go down after each space and apostrophe but I need it to delete the punctuation and simply be the words in an alphabetical order.
I have tried implementing this part:
-d ',.;:-+=()'
after the 'tr' from my original code but it will not work. Any help for a simpler way or even to solve this would be much appreciated.
Assuming you want lines split on words but not split on punctuation so that "The world isn't fair." becomes
The
world
isnt
fair
and not
The
world
isn
t
fair
<blank line>
the following should do what you want
sed 's/[[:punct:]]*//g;s/ /\n/g' lyrics01.txt | sort -u >> lyrics01.wl
Try sed as below:
sed 's/\([[:punct:] ]\)/\n/g' lyrics01.txt | sort -u >> lyrics01.wl
This will remove any punctuation marks or space and replace it with new line character.
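Since the question started from tr, here is a tr-only sketch of the same idea. tr performs one operation per invocation, so deleting punctuation and splitting on whitespace are chained as two calls. The function name word_list is made up, and LC_ALL=C is an assumption added to make the sort order predictable (uppercase before lowercase).

```bash
#!/usr/bin/env bash

# word_list: read text on stdin and print its unique words, one per line,
# sorted, with punctuation removed. (Hypothetical helper.)
word_list() {
    tr -d '[:punct:]' |            # drop apostrophes, full stops, etc.
        tr -s '[:space:]' '\n' |   # one word per line, runs squeezed
        grep . |                   # discard any empty line
        LC_ALL=C sort -u           # sort and remove duplicates
}

printf '%s\n' "The world isn't fair." "Isn't it ?" | word_list
```

For the lyrics file, the call would be word_list < lyrics01.txt > lyrics01.wl.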
All of the examples seem to remove the single quote from the word "isn't"
If that is not what you want, I've tested and come up with this :
$ cat test.txt
The
world
isn't
fair.
Isn't it ?
$ sed "s/ /\n/g" test.txt | sed "s/[[:punct:]]$/\n/g" | grep .
The
world
isn't
fair
Isn't
it
$
It's not sorted, but this is to show you can retain punctuation if it is not at the end of a word.

BASH Palindrome Checker

This is my first time posting on here so bear with me please.
I received a bash assignment but my professor is completely unhelpful and so are his notes.
Our assignment is to filter and print out palindromes from a file. In this case, the directory is:
/usr/share/dict/words
The word lengths range from 3 to 45, and only words made up entirely of lowercase letters should be kept (the given dictionary also contains uppercase letters and other characters, e.g. "-dkas-das"), so something like "q-evvavve-q" may count as a palindrome but I shouldn't be getting that as a proper result.
Anyways, I can get it to filter out x amount of words and return (not filtering only lowercase though).
grep "^...$" /usr/share/dict/words |
grep "\(.\).\1"
And I can use subsequent lines for 5 letter words and 7 and so on:
grep "^.....$" /usr/share/dict/words |
grep "\(.\)\(.\).\2\1"
But the prof does not want that. We are supposed to use a loop. I get the concept but I don't know the syntax, and like I said, the notes are very unhelpful.
What I tried was setting variables x=... and y=.. and in a while loop, having x=$x$y but that didn't work (syntax error) and neither did x+=..
Any help is appreciated. Even getting my non-lowercase letters filtered out.
Thanks!
EDIT:
If you're providing a solution or a hint to a solution, the simplest method is preferred.
Preferably one that uses 2 grep statements and a loop.
Thanks again.
Like this:
for word in `grep -E '^[a-z]{3,45}$' /usr/share/dict/words`;
do [ $word == `echo $word | rev` ] && echo $word;
done;
Output using my dictionary:
aha
bib
bob
boob
...
wow
Update
As pointed out in the comments, reading in most of the dictionary into a variable in the for loop might not be the most efficient, and risks triggering errors in some shells. Here's an updated version:
grep -E '^[a-z]{3,45}$' /usr/share/dict/words | while read -r word;
do [ $word == `echo $word | rev` ] && echo $word;
done;
Why use grep? Bash will happily do that for you:
#!/bin/bash
is_pal() {
local w=$1
while (( ${#w} > 1 )); do
[[ ${w:0:1} = ${w: -1} ]] || return 1
w=${w:1:-1}
done
}
while read word; do
is_pal "$word" && echo "$word"
done
Save this as banana, chmod +x banana and enjoy:
./banana < /usr/share/dict/words
If you only want to keep the words with at least three characters:
grep ... /usr/share/dict/words | ./banana
If you only want to keep the words that only contain lowercase and have at least three letters:
grep '^[[:lower:]]\{3,\}$' /usr/share/dict/words | ./banana
The multiple greps are wasteful. You can simply do
grep -E '^([a-z])[a-z]\1$' /usr/share/dict/words
in one fell swoop, and similarly, put the expressions on grep's standard input like this:
echo '^([a-z])[a-z]\1$
^([a-z])([a-z])\2\1$
^([a-z])([a-z])[a-z]\2\1$' | grep -E -f - /usr/share/dict/words
However, regular grep does not permit backreferences beyond \9. With grep -P you can use double-digit backreferences, too.
The following script constructs the entire expression in a loop. Unfortunately, grep -P does not allow for the -f option, so we build a big thumpin' variable to hold the pattern. Then we can actually also simplify to a single pattern of the form ^(.)(?:.|(.)(?:.|(.)...\3)?\2)?\1$, except we use [a-z] instead of . to restrict to just lowercase.
head=''
tail=''
for i in $(seq 1 22); do
head="$head([a-z])(?:[a-z]|"
tail="\\$i${tail:+)?}$tail"
done
grep -P "^${head%|})?$tail$" /usr/share/dict/words
The single grep should be a lot faster than individually invoking grep 22 or 43 times on the large input file. If you want to sort by length, just add that as a filter at the end of the pipeline; it should still be way faster than multiple passes over the entire dictionary.
The expression ${tail:+)?} evaluates to a closing parenthesis and question mark only when tail is non-empty, which is a convenient way to force the \1 back-reference to be non-optional. Somewhat similarly, ${head%|} trims the final alternation operator from the ultimate value of $head.
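To see what the loop actually produces, it may help to print the pattern for a small number of nesting levels. The sketch below wraps the same loop in a function; the name build_palindrome_regex and its parameter are assumptions for illustration.

```bash
#!/usr/bin/env bash

# build_palindrome_regex N: print the PCRE built by the loop above for N
# nesting levels. (Hypothetical wrapper around the same two assignments.)
build_palindrome_regex() {
    local n=$1 head='' tail='' i
    for (( i = 1; i <= n; i++ )); do
        head="$head([a-z])(?:[a-z]|"
        # ")?" is inserted only once tail is non-empty, so \1 stays mandatory.
        tail="\\$i${tail:+)?}$tail"
    done
    # ${head%|} drops the alternation bar left over from the last iteration.
    printf '%s\n' "^${head%|})?$tail\$"
}

build_palindrome_regex 1
build_palindrome_regex 2
```

With N levels the pattern covers lengths from 2 up to 2N+1, which is why 22 iterations reach 45 characters. Note that two-letter words such as "aa" also match, so a minimum length of 3 would need a separate filter.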
Ok here is something to get you started:
I suggest to use the plan you have above, just generate the number of "." using a for loop.
This question will explain how to make a for loop from 3 to 45:
How do I iterate over a range of numbers defined by variables in Bash?
for i in {3..45};
do
* put your code above here *
done
Now you just need to figure out how to make "i" number of dots "." in your first grep and you are done.
Also, look into sed, it can nuke the non-lowercase answers for you..
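One way the loop hint could be fleshed out is to generate the whole pattern for each length, in the style of the two grep commands from the question. This is only a sketch: the function names are made up, [a-z] replaces the dots to keep lowercase words only, LC_ALL=C is an assumption to keep the range strictly ASCII, and because plain grep stops at back-reference \9 it covers lengths 3 to 19 rather than the full 3 to 45.

```bash
#!/usr/bin/env bash

# palindrome_pattern LEN: print a basic regex that matches lowercase
# palindromes of exactly LEN characters. (Hypothetical helper.)
palindrome_pattern() {
    local len=$1 half=$(( $1 / 2 )) front='' back='' i
    for (( i = 1; i <= half; i++ )); do
        front+='\([a-z]\)'      # capture the i-th letter
        back="\\$i$back"        # and require it again, mirrored
    done
    if (( len % 2 )); then
        front+='[a-z]'          # odd lengths have a free middle letter
    fi
    printf '%s\n' "^$front$back\$"
}

# find_palindromes FILE: print palindromes of length 3..19, shortest first.
find_palindromes() {
    local file=$1 len
    for len in {3..19}; do
        LC_ALL=C grep "$(palindrome_pattern "$len")" "$file"
    done
    return 0   # grep exits non-zero when a length has no matches
}
```

A run over the dictionary would then be find_palindromes /usr/share/dict/words.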
Another solution that uses a Perl-compatible regular expressions (PCRE) with recursion, heavily inspired by this answer:
grep -P '^(?:([a-z])(?=[a-z]*(\1(?(2)\2))$))++[a-z]?\2?$' /usr/share/dict/words

Trimming pathnames beyond a keyword (awk, sed, ?)

I want to trim a pathname beyond a certain point after finding a keyword. I'm drawing a blank this morning.
/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java
I want to find the keyword Java, save the pathname beyond that (tsupdater), then cut everything off after the Java portion.
I don't know if this is what you want, but you can split the pathname into two with:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 'h;s/.*Java//p;g;s/Java.*/Java/'
Which outputs:
/tsupdater/src/tsupdater.java
/home/quikq/1.0/dev/Java
If you would like to save the second part into a file part2.txt and print the first part, you could do:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed -e 'h;s/.*Java//;w part2.txt' -e 'g;s/Java.*/Java/'
If you're writing a shell script:
myvar="/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"
part1="${myvar%Java*}Java"
part2="${myvar#*Java/}"
Hope this helps =)
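Building on the parameter expansions above, the question's two goals (the path up to the keyword, and the single component right after it) could be sketched as two small functions. The names keyword_base and keyword_next are made up, and the sketch assumes the keyword occurs as a complete path component followed by a slash.

```bash
#!/usr/bin/env bash

# keyword_base PATH KEYWORD: print PATH cut off right after KEYWORD.
# (Hypothetical helper; assumes /KEYWORD/ occurs in PATH.)
keyword_base() {
    local path=$1 keyword=$2
    # Remove the longest suffix starting at the first "/KEYWORD/".
    local base=${path%%/"$keyword"/*}
    printf '%s\n' "$base/$keyword"
}

# keyword_next PATH KEYWORD: print the path component following KEYWORD.
keyword_next() {
    local path=$1 keyword=$2
    # Drop everything up to and including the first "/KEYWORD/" ...
    local rest=${path#*/"$keyword"/}
    # ... then keep only what precedes the next slash.
    printf '%s\n' "${rest%%/*}"
}

myvar="/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"
keyword_base "$myvar" Java
keyword_next "$myvar" Java
```

The matching is case-sensitive, so the lowercase .java extension at the end of the path is not mistaken for the keyword.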
take one you need:
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java/[^/]*).*#\1#g'
/home/quikq/1.0/dev/Java/tsupdater
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java).*#\1#g'
/home/quikq/1.0/dev/Java
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#.*Java/([^/]*).*#\1#g'
tsupdater
I'm not entirely sure what you want as output (please specify more clearly), but this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/.*Java//'
results in:
/tsupdater/src/tsupdater.java
If you want the preceding part then this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/Java.*//'
results in:
/home/quikq/1.0/dev/
Like I said, I was having a weird morning, but it dawned on me.
echo /home/quikq/1.0/dev/Java/TSUpdater/src/TSUpdater.java | sed s/Java.*//g
Yields
/home/quikq/1.0/dev/
Lots of great tips here for chopping it up different ways though. Thanks a bunch!
