I have two strings:
l1='a1 a2 b1 b2 c1 c2'
l2='a1 b3 c1'
And I want to check if each element of string l2 exists in l1, and then remove it from l1.
Is it possible to do that without a for loop?
You can do this:
l1=$(comm -23 <(echo "$l1" | tr ' ' '\n' | sort) <(echo "$l2" | tr ' ' '\n' | sort) | tr '\n' ' ')
The comm compares lines and outputs the lines that are unique to the first input, unique to the second input, and common to both. The -23 option suppresses the second two sets of outputs, so it just reports the lines that are unique to the first input.
Since it requires the input to be sorted lines, I first pipe the variables to tr to put each word on its own line, and then sort to sort it. <(...) is a common shell extension called process substitution that allows a command to be used where a filename is expected; it's available in bash and zsh, for example (see here for a table that lists which shells have it).
At the end I use tr again to translate the newlines back to spaces.
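A self-contained sketch of what the three columns look like (the file names here are illustrative):

```shell
# Build two sorted word lists, one word per line
printf '%s\n' a1 a2 b1 b2 c1 c2 > left.txt
printf '%s\n' a1 b3 c1 > right.txt

# Full comm output: column 1 = only in left, column 2 = only in right,
# column 3 = in both (columns 2 and 3 are tab-indented)
comm left.txt right.txt

# -23 keeps only column 1: the words unique to left.txt
comm -23 left.txt right.txt | tr '\n' ' '; echo
# a2 b1 b2 c2

rm -f left.txt right.txt
```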
If you don't have process substitution, you can emulate it with named pipes:
mkfifo p1
echo "$l1" | tr ' ' '\n' | sort > p1 &
mkfifo p2
echo "$l2" | tr ' ' '\n' | sort > p2 &
l1=$(comm -23 p1 p2 | tr '\n' ' ')
rm p1 p2
Related
I want to count the occurrences of each word in a text file and display them in descending order.
So far I have :
cat sample.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
Which is mostly giving me satisfying output, except that it includes special characters like commas, full stops, exclamation marks and hyphens.
How can I modify existing command to not include special characters mentioned above?
You can use tr with a composite string of the characters you wish to delete.
Example:
$ echo "abc, def. ghi! boss-man" | tr -d ',.!'
abc def ghi boss-man
Or, use a POSIX character class knowing that boss-man for example would become bossman:
$ echo "abc, def. ghi! boss-man" | tr -d '[:punct:]'
abc def ghi bossman
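Putting the two together, a minimal sketch of the full counting pipeline (the sample text here is made up):

```shell
# Strip punctuation first, then count words in descending order
printf 'one, two! one. two two\n' > sample.txt
tr -d '[:punct:]' < sample.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
#       3 two
#       2 one
rm -f sample.txt
```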
Side note: You can have a lot more control and speed by using awk for this:
$ echo "one two one! one. oneone
two two three two-one three" |
awk 'BEGIN{RS="[^[:alpha:]]"}
/[[:alpha:]]/ {seen[$1]++}
END{for (e in seen) print seen[e], e}' |
sort -k1,1nr -k2,2
4 one
4 two
2 three
1 oneone
How about first extracting words with grep:
grep -o "\w\+" sample.txt | sort | uniq -c | sort -nr
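A quick demonstration with made-up input:

```shell
printf 'the cat, the dog! the cat.\n' > sample.txt
# \w+ matches runs of word characters, so punctuation is skipped entirely
grep -o "\w\+" sample.txt | sort | uniq -c | sort -nr
#       3 the
#       2 cat
#       1 dog
rm -f sample.txt
```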
I have a big txt file which I want to edit in a pipeline. But at some point in the pipeline I want to store the number of lines in a variable $nol. I just want to see the syntax for setting a variable in a pipeline, like:
cat ${!#} | tr ' ' '\n'| grep . ; $nol=wc -l | sort | uniq -c ...
The part after the second pipe is very wrong, but how can I do it in bash?
One solution is:
nol=$(cat ${!#} | tr ' ' '\n'| grep . | wc -l)
and then run the whole pipeline again from the start,
but I don't want the script to do the same thing twice, because I have more pipes than shown here.
I mustn't use awk or sed...
You can use a tee and then write it to a file which you use later:
tempfile="xyz"
tr ' ' '\n' < "${!#}" | grep '.' | tee "$tempfile" | sort | uniq -c ...
nol=$(wc -l < "$tempfile")
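A self-contained sketch of the tee approach (the input is illustrative):

```shell
tempfile=$(mktemp)
# tee copies the intermediate stream into the temp file while the pipeline continues
printf 'b a b c\n' | tr ' ' '\n' | grep . | tee "$tempfile" | sort | uniq -c
nol=$(wc -l < "$tempfile")
echo "lines: $nol"
rm -f "$tempfile"
```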
Or you can use it the other way around:
nol=$(tr ' ' '\n' < "${!#}" | grep '.' \
| tee >(sort | uniq -c ... > /dev/tty) | wc -l)
You can set a variable in a particular link of a pipeline, but that's not very useful since only that particular link will be affected by it.
I recommend simply using a temporary file.
set -e
trap 'rm -f "$tmpf"' EXIT
tmpf=`mktemp`
cat ${!#} | tr ' ' '\n'| grep . | sort > "$tmpf"
nol="$(wc -l < "$tmpf")"
< "$tmpf" uniq -c ...
You can avoid the temporary file with tee and a named pipe, but it probably won't perform much better (it may even perform worse).
UPDATE:
Took a minute but I got it...
cat ${!#} | tr ' ' '\n'| tee >(nol=$(wc -l)) | sort | uniq -c ...
(Caveat: nol is assigned inside the process substitution's subshell, so the value is not visible to the parent shell afterwards; write the count to a file or file descriptor instead if you need it later.)
PREVIOUS:
The only way I can think of to do this is to store the output in variables along the way and call them back. You would not execute the command more than once.
aCommand=($(cat ${!#} | tr ' ' '\n'));sLineCount=${#aCommand[@]};echo ${aCommand[@]} | sort | uniq -c ...
aCommand will store the results of the first set of commands in an array
sLineCount will count the elements (lines) in the array
;... echo the array elements and continue the commands from there.
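Spelled out more readably, a sketch of the same idea (with a literal file standing in for ${!#}):

```shell
printf 'a b\nc a\n' > demo.txt
aCommand=( $(tr ' ' '\n' < demo.txt) )   # word-split the output into an array
sLineCount=${#aCommand[@]}               # number of elements (one per word)
echo "$sLineCount"                       # 4
printf '%s\n' "${aCommand[@]}" | sort | uniq -c
rm -f demo.txt
```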
Looks to me like you're asking how to avoid stepping through your file twice, just to get both word and line count.
Bash lets you read variables, and wc can produce all the numbers you need at once.
NAME
wc -- word, line, character, and byte count
So to start...
read lines words chars < <( wc < "${!#}" )
This populates the three variables based on input generated from process substitution.
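A quick sketch with made-up file content (note that wc prints its counts in the order lines, words, characters):

```shell
printf 'hello world\nfoo\n' > demo.txt
read lines words chars < <(wc < demo.txt)
echo "$lines $words $chars"   # 2 3 16
rm -f demo.txt
```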
But your question includes another partial command line which I think you intend as:
nol=$( sort -u ${!#} | wc -l )
This is markedly different from the word count of your first command line, so you can't use a single wc instance to generate both. Instead, one option might be to put your functionality into a script that does both functions at once:
read words uniques < <(
awk '
{
words += NF
for (i=1; i<=NF; i++) { unique[$i] }
}
END {
print words,length(unique)
}
' ${!#}
)
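A quick demonstration with made-up input (note that length() on an array is a common awk extension, not guaranteed by POSIX):

```shell
printf 'one two two\nthree two one\n' > demo.txt
read words uniques < <(
  awk '
  {
    words += NF                          # running total of all words
    for (i = 1; i <= NF; i++) unique[$i] # referencing creates the key
  }
  END { print words, length(unique) }
  ' demo.txt
)
echo "$words $uniques"   # 6 3
rm -f demo.txt
```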
For some reason, I can't, for the life of me, get zsh to produce an array containing one line from the entire shell history per element. (i.e. hist_arr[1] == $(history 1 1 | tr -s " " | cut -d ' ' -f 3-), hist_arr[2] == $(history 2 2 | tr -s " " | cut -d ' ' -f 3-), ... <for ten thousand lines>). I'd like to compute the whole array in a single step, so it's more efficient.
hist_arr[1]=$(history 1 1) works fine, but contains redundant history number.
If that is your problem, then simply remove it, e.g. this way:
hist_arr[1]=$(history 1 1 | tr -s " " | cut -d ' ' -f 3-)
Edit:
If you want to assign every element from the history file to the array, then
IFS=$'\n'
hist_arr=( $(awk 'BEGIN{FS=OFS=";"} {$1=""; sub(/;/, "")} 1' .zsh_history) )
should work.
How can I list all words of length 3 without duplicates?
Using tr ' ' '\n' < cca1.txt | grep '^.\{3\}$'
lists all words of length 3,
but when I add sort -u, making it tr ' ' '\n' < cca1.txt | grep '^.\{3\}$' | sort -u
to list the words of length 3 without duplicates,
it lists parts of words, not whole words of length 3.
Any suggestions?
sort -u can be tricky.
simply use:
tr ' ' '\n' < cca1.txt | grep '^...$' | sort | uniq
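For instance, with made-up input in place of cca1.txt:

```shell
# Extract three-character words, then deduplicate with sort | uniq
printf 'the cat sat on the mat cat\n' | tr ' ' '\n' | grep '^...$' | sort | uniq
# cat
# mat
# sat
# the
```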
Ok, so I need to create a command that lists the 100 most frequent words in any given file, in a block of text.
What I have at the moment:
$ alias words='tr " " "\012" <hamlet.txt | sort -n | uniq -c | sort -r | head -n 10'
outputs
$ words
14 the
14 of
8 to
7 and
5 To
5 The
5 And
5 a
4 we
4 that
I need it to output in the following format:
the of to and To The And a we that
((On that note, how would I tell it to print the output in all caps?))
And I need to change it so that I can pipe any file to 'words': instead of having the file specified within the alias, the initial input would name the file and the pipe would do the rest.
Okay, taking your points one by one, though not necessarily in order.
You can change words to use standard input just by removing the <hamlet.txt bit since tr will take its input from standard input by default. Then, if you want to process a specific file, use:
cat hamlet.txt | words
or:
words <hamlet.txt
You can remove the effects of capital letters by making the first part of the pipeline:
tr '[A-Z]' '[a-z]'
which will lower-case your input before doing anything else.
Lastly, if you take that entire pipeline (with the suggested modifications above) and then pass it through a few more commands:
| awk '{printf "%s ", $2}END{print ""}'
This prints the second argument of each line (the word) followed by a space, then prints an empty string with terminating newline at the end.
For example, the following script words.sh will give you what you need:
tr '[A-Z]' '[a-z]' | tr ' ' '\012' | sort | uniq -c | sort -rn
| head -n 3 | awk '{printf "%s ", $2}END{print ""}'
(on one line: I've split it for readability) as per the following transcript:
pax> echo One Two two Three three three Four four four four | ./words.sh
four three two
You can achieve the same end with the following alias:
alias words="tr '[A-Z]' '[a-z]' | tr ' ' '\012' | sort | uniq -c | sort -rn
| head -n 3 | awk '{printf \"%s \", \$2}END{print \"\"}'"
(again, one line) but, when things get this complex, I prefer a script, if only to avoid interminable escape characters :-)