Bash - stdir words to file - bash

I am trying to store whole user input in a bash variable (appending variable).
Then to sort them etc.
The problem is that for input f.e.:
sdsd fff sss
asdasds
It creates this output:
fff
sdsd
sssasdasds
Expected output is:
asdasds
fff
sdsd
sss
Code follows:
content=''
while read line
do
content+=$(echo "$line")
done
result=`echo "$content" | sed -r 's/[^a-zA-Z ]+/ /g' | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' | sort -u | sed '/^$/d' | sed 's/[^[:alpha:]]/\n/g'`
echo "$result" >> "$dictionary"

You aren't providing a space when you are appending.
content+=$(echo "$line")
You need to make sure there is a space between the end of the old value and the new value.
content+=" $line"
(There's no need for echo for this either as #gniourf_gniourf correctly pointed out.)

Something that will achieve what you're showing in your example:
words_ary=()
while read -r -a line_ary; do
(( ${#line_ary[#]} )) || continue # skip empty lines
words_ary+=( "${line_ary[#],,}" ) # The ,, is to convert to lower-case
done
printf '%s\n' "${words_ary[#]}" | sort -u >> "$dictionary"
We're splitting input into words at spaces and put these words in array line_ary
We're checking that we have a non-empty input
we append each word, converted to lowercase, from input to the array words_ary
finally we sort each word from words_ary and append the sorted words to file $dictionary.

Related

Check for duplicate word in comma-separated string in bash

I need to check that a variable does not contain a duplicate entry in a comma-separated string.
For example, inside of $animals, if I have:
,dog,cat,bird,goat,fish,
That would be considered valid since every word is unique.
The string:
,dog,cat,dog,bird,fish,
would be invalid since dog is entered twice.
,dog,cat,dogs,bird,fish,
Would be valid since there is only one instance of dog (dogs is there but allowed since it's not the same exact word)
The string:
,dog,cat,DOG,bird,fish
Would also be invalid since dog is the same as DOG only in uppercase.
Is there any way I can do this? I would put some code I've tried but I don't know what to use to even experiment.
Using bash 3.2.57(1)-release on 10.11.6 El Capitan
Case sensitive:
echo ",dog,cat,dog,bird,fish," | tr ',' '\n' | grep -v '^$' | sort | uniq -c | sort -k 1,1nr
Case insensitive:
echo ",dog,DOG,cat,dog,bird,fish," | tr ',' '\n' | grep -v '^$' | sort -rf | uniq -ci | sort -k 1,1nr
Perform a reverse sort (-r) and do it case insensitive to get the lower-case letters after upper ones. Then uniq them with -i. (You might have to ensure the defined collation LC_COLLATE and maybe locales like LANG and LC_ALL aren't affecting sort behavior).
Then check if the number in the first row > 1
Simple script-based solution
Usage
$ .\script.sh ,dog,dog,cat,
Actual Script
#!/bin/sh
num_duplicated() {
echo $1 |
tr ',' '\n' | # Split each items into its own line
tr '[:upper:]' '[:lower:]' | # Convert everything to lowercase
sort | # Sorts the lines (required for the call to `uniq`
uniq -d | # Passing the `-d` flag to show only duplicated lines
grep -v '^$' | # Passing `-v` on the pattern `^$` to remove empty lines
wc -l # Count the number of duplicate lines
}
main() {
num_duplicates=$(num_duplicated "$1")
if [[ $num_duplicates -eq '0' ]]
then
echo "No duplicates"
else
echo "Contains duplicate(s)"
fi
}
main $1

Reformatting text to return text before delimiter

I have been cleansing the following string of file names () as follows:
#!/bin/sh
QUEUES='FILENAME1.WORLD1.J00.2D.00;FILENAME2.WORLD1.J01.2D.00;FILENAME3.WORLD1.J00.2D.00;FILENAME4.WORLD1.J01.2D.00;FILENAME5.WORLD1.J00.2D.00'
for i in $(echo $MQ_QUEUES | sed "s/;/ /g")
do
a=$(echo "$i" | tr [:upper:] '[:lower:]' | sed 's/\./_/g')
print $a
done
This currently produces a lower case version of the queue above with the full stop delimiter replaced by an underscore. But I now only want to grab the first section of the title. How would I do this?
I.e. so instead of $a returning filename1_world1_j00_2d_00, it would simply return filename1.
why not use awk? with:
echo "$QUEUES" | tr ';' '\n' | awk -F'.' '{print tolower($1)}'
you will get:
filename1
filename2
...

Unix file pattern issue: append changing value of variable pattern to copies of matching line

I have a file with contents:
abc|r=1,f=2,c=2
abc|r=1,f=2,c=2;r=3,f=4,c=8
I want a result like below:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
The third column value is r value. A new line would be inserted for each occurrence.
I have tried with:
for i in `cat $xxxx.txt`
do
#echo $i
live=$(echo $i | awk -F " " '{print $1}')
home=$(echo $i | awk -F " " '{print $2}')
echo $live
done
but is not working properly. I am a beginner to sed/awk and not sure how can I use them. Can someone please help on this?
awk to the rescue!
$ awk -F'[,;|]' '{c=0;
for(i=2;i<=NF;i++)
if(match($i,/^r=/)) a[c++]=substr($i,RSTART+2);
delim=substr($0,length($0))=="|"?"":"|";
for(i=0;i<c;i++) print $0 delim a[i]}' file
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Use an inner routine (made up of GNU grep, sed, and tr) to compile a second more elaborate sed command, the output of which needs further cleanup with more sed. Call the input file "foo".
sed -n $(grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n') foo | \
sed 's/|[0-9|]*|/|/'
Output:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Looking at the inner sed code:
grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n'
It's purpose is to parse foo on-the-fly (when foo changes, so will the output), and in this instance come up with:
1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;
Which is almost perfect, but it leaves in old data on the last line:
sed -n '1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;' foo
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1|3
...which old data |1 is what the final sed 's/|[0-9|]*|/|/' removes.
Here is a pure bash solution. I wouldn't recommend actually using this, but it might help you understand better how to work with files in bash.
# Iterate over each line, splitting into three fields
# using | as the delimiter. (f3 is only there to make
# sure a trailing | is not included in the value of f2)
while IFS="|" read -r f1 f2 f3; do
# Create an array of variable groups from $f2, using ;
# as the delimiter
IFS=";" read -a groups <<< "$f2"
for group in "${groups[#]}"; do
# Get each variable from the group separately
# by splitting on ,
IFS=, read -a vars <<< "$group"
for var in "${vars[#]}"; do
# Split each assignment on =, create
# the variable for real, and quit once we
# have found r
IFS== read name value <<< "$var"
declare "$name=$value"
[[ $name == r ]] && break
done
# Output the desired line for the current value of r
printf '%s|%s|%s\n' "$f1" "$f2" "$r"
done
done < $xxxx.txt
Changes for ksh:
read -A instead of read -a.
typeset instead of declare.
If <<< is a problem, you can use a here document instead. For example:
IFS=";" read -A groups <<EOF
$f2
EOF

Extract words from files

How can I extract all the words from a file, every word on a single line?
Example:
test.txt
This is my sample text
Output:
This
is
my
sample
text
The tr command can do this...
tr [:blank:] '\n' < test.txt
This asks the tr program to replace white space with a new line.
The output is stdout, but it could be redirected to another file, result.txt:
tr [:blank:] '\n' < test.txt > result.txt
And here the obvious bash line:
for i in $(< test.txt)
do
printf '%s\n' "$i"
done
EDIT Still shorter:
printf '%s\n' $(< test.txt)
That's all there is to it, no special (pathetic) cases included (And handling multiple subsequent word separators / leading / trailing separators is by Doing The Right Thing (TM)). You can adjust the notion of a word separator using the $IFS variable, see bash manual.
The above answer doesn't handle multiple spaces and such very well. An alternative would be
perl -p -e '$_ = join("\n",split);' test.txt
which would. E.g.
esben#mosegris:~/ange/linova/build master $ echo "test test" | tr [:blank:] '\n'
test
test
But
esben#mosegris:~/ange/linova/build master $ echo "test test" | perl -p -e '$_ = join("\n",split);'
test
test
This might work for you:
# echo -e "this is\tmy\nsample text" | sed 's/\s\+/\n/g'
this
is
my
sample
text
perl answer will be :
pearl.214> cat file1
a b c d e f pearl.215> perl -p -e 's/ /\n/g' file1
a
b
c
d
e
f
pearl.216>

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep try like this :
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[#]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[#]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First use xargs split it using ":",-n1 means every line only have one part.Then,pring the last part.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
improving from #mateusz-piotrowski and #user3133260 answer,
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' -> replace ':' with whitespace
then, trim with xargs
after that, tr ' ' '\n' -> replace remained whitespace to newline
lastly, tail -1 -> get the last string
For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
it contains more boilerplate code (i.e. sys.stdout.read/write) but requires only std libraries from python.

Resources