Print Unique Values while using a While loop - bash

I have a file named textfile.txt like below:
a 1 xxx
b 1 yyy
c 2 zzz
d 2 aaa
e 3 bbb
f 3 ccc
I am trying to filter the second column down to its unique values. I have the code below:
while read LINE
do
    compname=`echo ${LINE} | cut -d' ' -f2 | uniq`
    echo -e "${compname}"
done < textfile.txt
It is displaying:
1
1
2
2
3
3
But I am looking for an output like:
1
2
3
I also tried another command: echo ${LINE} | cut -d' ' -f2 | sort -u | uniq
but still did not get the expected output.
Can anyone help me?

There's no need to loop; sort -u already processes the whole input.
cut -d' ' -f2 textfile.txt | sort -u
Maybe you wanted to get the output in the original order, showing the first occurrence only? You can use an associative array to remember which values have already been seen:
#! /bin/bash
declare -A seen
while read x ; do
    [[ ${seen[$x]} ]] || printf '%s\n' "$x"
    seen[$x]=1
done < <(cut -d' ' -f2 textfile.txt)
For the last occurrence only, change the last line to
done < <(cut -d' ' -f2 textfile.txt | tac) | tac
(i.e. the last occurrence is the first occurrence in the reversed order)
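For reference, plain awk can do the same first-occurrence filtering in one pass with the classic !seen idiom (a sketch, assuming the same space-separated textfile.txt):
awk '!seen[$2]++ { print $2 }' textfile.txt
Here seen is an associative array as well; the expression is true only the first time a given $2 appears.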

Just pipe the output of the loop to sort -u. There's no need for cut; the read command can handle this type of splitting.
while read -r _ compname _; do
    echo "$compname"
done < textfile.txt | sort -u

Try moving the sort -u or sort | uniq after the done statement like this:
while read LINE
do
    compname=$(echo ${LINE} | cut -d' ' -f2)
    echo "${compname}"
done < textfile.txt | sort -u
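Any of these should produce the expected output for the sample file, e.g.:
$ cut -d' ' -f2 textfile.txt | sort -u
1
2
3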

Related

Split pipe into two and paste the results together?

I want to pipe the output of the command into two commands and paste the results together. I found this answer and similar ones suggesting using tee but I'm not sure how to make it work as I'd like it to.
My problem (simplified):
Say that I have a myfile.txt with keys and values, e.g.
key1 /path/to/file1
key2 /path/to/file2
What I am doing right now is
paste \
  <( cat myfile.txt | cut -f1 ) \
  <( cat myfile.txt | cut -f2 | xargs wc -l )
and it produces
key1 23
key2 42
The problem is that cat myfile.txt is repeated here (in the real problem it's a heavier operation). Instead, I'd like to do something like
cat myfile.txt | tee \
  <( cut -f1 ) \
  <( cut -f2 | xargs wc -l ) \
  | paste
But it doesn't produce the expected output. Is it possible to do something similar to the above with pipes and standard command-line tools?
This doesn't answer your question about pipes, but you can use AWK to solve your problem:
$ printf %s\\n 1 2 3 > file1.txt
$ printf %s\\n 1 2 3 4 5 > file2.txt
$ cat > myfile.txt <<EOF
key1 file1.txt
key2 file2.txt
EOF
$ cat myfile.txt | awk '{ ("wc -l " $2) | getline size; sub(/ .+$/,"",size); print $1, size }'
key1 3
key2 5
On each line, we first run wc -l $2 and save the result into a variable. Not sure about yours, but on my system wc -l includes the filename in the output, so we strip it with sub() to match your example output. Finally, we print the $1 field (the key) and the size we got from the wc -l command.
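One caveat worth noting: cmd | getline leaves a pipe open per distinct command string, so a large myfile.txt can exhaust file descriptors. Closing each command after reading avoids that (the same one-liner as a sketch, with close() added):
$ cat myfile.txt | awk '{ cmd = "wc -l " $2; cmd | getline size; close(cmd); sub(/ .+$/,"",size); print $1, size }'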
It can also be done with the shell, now that I think about it:
cat myfile.txt | while read -r key value; do
    printf '%s %s\n' "$key" "$(wc -l "$value" | cut -d' ' -f1)"
done
Or more generally, by piping to two commands and using paste, therefore answering the question:
cat myfile.txt | while read -r line; do
    printf %s "$line" | cut -f1
    printf %s "$line" | cut -f2 | xargs wc -l | cut -d' ' -f1
done | paste - -
P.S. The use of cat here is useless, I know. But it's just a placeholder for the real command.
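If you drop the placeholder, a minimal sketch of the same loop reads the file once and lets wc read from stdin, so there is no filename to strip (assuming GNU wc, which prints the bare count when reading stdin):
while read -r key value; do
    printf '%s\t%s\n' "$key" "$(wc -l < "$value")"
done < myfile.txt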

Extract number in every line of TSV file

I have a file with tab-separated-values and also with blank spaces like this:
! (desambiguación) http://es.dbpedia.org/resource/!_(desambiguación) 5
! (álbum) http://es.dbpedia.org/resource/!_(álbum_de_Trippie_Redd) 2
!! http://es.dbpedia.org/resource/!! 4
$9.99 http://es.dbpedia.org/resource/$9.99 6
Tomlinson http://es.dbpedia.org/resource/(10108)_Tomlinson 20
102 Miriam http://es.dbpedia.org/resource/(102)_Miriam 2
2003 QQ47 http://es.dbpedia.org/resource/(143649)_2003_QQ47 2
I want to extract the last number of every line:
5
2
4
6
20
2
2
For that, I have done this:
while read line;
do
    NUMBER=$(echo $line | cut -f 3 -d ' ')
    echo $NUMBER
done < $PAIRCOUNTS_FILE
The main problem is that some lines have more spaces than others, and cut doesn't work for me with the default delimiter (tab). I don't know why; maybe it is because I am using WSL.
I have tried cut with several options, but none of them work:
NUMBER=$(echo $line | cut -f 3 -d ' ')
NUMBER=$(echo $line | cut -f 4 -d ' ')
NUMBER=$(echo $line | cut -f 2)
NUMBER=$(echo $line | cut -f 3)
Hope you can help me with this. Thanks in advance.
I want to extract the last number of every line:
You could use grep
grep -Eo '[[:digit:]]+$' file
Or mapfile aka readarray, which is a bash4+ feature.
mapfile -t array < file
printf '%s\n' "${array[@]##* }"
You can use awk:
awk '{print $NF}' file
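$NF is the last field, so this works no matter how many fields each line has:
$ awk '{print $NF}' file
5
2
4
6
20
2
2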
With cut (if it is truly TAB separated and 3 fields per line):
cat file | cut -f3
If you have some variable number of fields per line, use rev|cut|rev to get the last field:
cat file | rev | cut -f1 | rev
Or with pure Bash and parameter expansion:
while IFS= read -r line; do
    last=${line##*$'\t'}   # $'\t' is a tab; a literal TAB in the parameter expansion works too
    printf "%s\n" "$last"
done <file
Or, read into a bash array and echo the last field:
while IFS=$'\t' read -r -a arr; do
    echo "${arr[${#arr[@]}-1]}"
done <file
If you have a mixture of tabs and spaces, you can do what is usually a mistake and let an unquoted Bash variable break on whitespace in general (tabs and spaces) into an array:
while IFS= read -r line; do
    arr=($line)   # unquoted on purpose: breaks on either tab or space
    echo "${arr[${#arr[@]}-1]}"
done <file
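A quoting-safe middle ground for the tab-or-space mixture is a glob character class in the parameter expansion, which avoids unquoted word splitting entirely (a sketch):
while IFS= read -r line; do
    printf '%s\n' "${line##*[[:space:]]}"   # strip up to the last tab or space
done <file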

How to get nth line of a file in bash?

I want to extract the second line from a file named datax.txt, which is:
0/0/0/0/0/0 | 0/0/0/0/0/0 | 0/0/0/0/0/0
And then I want to store in 3 variables the 3 sequences 0/0/0/0/0/0.
How am I supposed to do that?
Read the 2nd line into variables a,b and c.
read a b c <<< $(awk -F'|' 'NR==2{print $1 $2 $3}' datax)
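This works because with -F'|' each field keeps its surrounding spaces, so the printed line is still space-separated and read can split it. A sketch of a variant that strips the delimiter padding explicitly (assuming the fields themselves contain no spaces):
read a b c <<< $(awk -F' *\| *' 'NR==2{print $1, $2, $3}' datax)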
The key is to split the problem in two:
you want to get the nth line of a file -> see here
you want to split a line into chunks according to a delimiter -> that's the job of many tools; cut is one of them
For future questions, be sure to include a more complete dataset; here is one for now. I changed the second line a bit so that we can verify that we got the right column:
f.txt
4/4/4/4/4/4 | 4/4/4/4/4/4 | 4/4/4/4/4/4
0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
Then a proper script building on the two key actions described above:
extract.bash
file=$1
target_line=2
# get the n-th line
# https://stackoverflow.com/questions/6022384/bash-tool-to-get-nth-line-from-a-file
line=$(cat "$file" | head -n "$target_line" | tail -1)
# get the n-th field on a line, using delimiter '|'
var1=$(echo $line | cut --delimiter='|' --fields=1)
echo $var1
var2=$(echo $line | cut --delimiter='|' --fields=2)
echo $var2
var3=$(echo $line | cut --delimiter='|' --fields=3)
echo $var3
aaand:
$ ./extract.bash f.txt
0/0/0/0/a/0
0/0/0/0/b/0
0/0/0/0/c/0
Please try the following:
IFS='|' read a b c < <(sed -n 2P < datax | tr -d ' ')
Then the variables a, b and c are each assigned one field of the 2nd line.
You can use sed to print a specific line of a file, so for your example on the second line:
sed -n -e 2p ./datax
Set the output of the sed to be a variable:
Var=$(sed -n -e 2p ./datax)
Then split the string into the 3 variables you need:
A="$(echo $Var | cut -d'|' -f1)"
B="$(echo $Var | cut -d'|' -f2)"
C="$(echo $Var | cut -d'|' -f3)"
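The three cut calls can also be collapsed into a single read: putting both | and a space in IFS makes the shell absorb the padding around each delimiter (a sketch):
IFS='| ' read -r A B C <<< "$Var"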

Infinite loop in bash

I have written the following command to loop over the set of strings in the second column of my file, sort on column 11 for each string, then take the second and eleventh columns and count the number of unique occurrences. Very simple, but it seems to enter an infinite loop, and I can't see why. I would appreciate your help very much.
for item in $(cat file.txt | cut -f2 -d " " | uniq)
do
    sort -k11,11 file.txt | cut -f2,11 -d " " | uniq -c | sort -k2,2 > output
done
There's no infinite loop here, but it is a very silly loop (that takes a long time to run, while not accomplishing the script's stated purpose). Let's look at how one might accomplish that purpose more sanely:
Using a temporary file for counts.txt to avoid needing to rerun the sort, cut and uniq steps on each iteration:
sort -k11,11 file.txt | cut -f2,11 -d " " | uniq -c >counts.txt
while read -r item; do
    fgrep -e " ${item}" counts.txt
done < <(cut -f2 -d' ' <file.txt | uniq)
Even better, using bash 4 associative arrays and no temporary file:
# reads counts into an array
declare -A counts=( )
while read -r count item; do
    counts[$item]=$count
done < <(sort -k11,11 file.txt | cut -f2,11 -d " " | sort | uniq -c)

# reads counts back out
while read -r item; do
    echo "$item ${counts[$item]}"
done < <(cat file.txt | cut -f2 -d " " | sort | uniq)
...that said, that's only if you want to use sort for ordering on pulling data back out. If you don't need to do that, the latter part could be replaced as such:
# read counts back out
for item in "${!counts[@]}"; do
    echo "$item ${counts[$item]}"
done
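For completeness, the whole count can be done in a single awk pass with no sort, temporary file, or outer loop (a sketch; the output order is arbitrary):
awk '{ c[$2 " " $11]++ } END { for (k in c) print k, c[k] }' file.txt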

sort fields within a line

input:
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
(i.e. a tab-delimited file where the second field is a comma-delimited list of unordered integers.)
desired output:
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
Goal:
for each line separately, sort the comma-separated list appearing in the second field. That is, sort the 2nd column within each line, independently of the other lines.
Note: the rows should not be re-ordered.
What I've tried:
sort - Since the order of the rows should not change, sort is simply not applicable.
awk - Since the file as a whole is tab-delimited, not comma-delimited, awk cannot parse the second column as multiple "sub-fields".
There might be a perl way? I know nothing about perl though...
It can be done with a simple perl one-liner:
perl -F'/\t/' -alne'$s=join",",sort{$a<=>$b}split",",$F[1];print"$F[0]\t$s"'
and with a shell (bash) one as well:
while read a b;do echo -e "$a\t$(echo $b|tr , '\n'|sort -n|tr '\n' ,|sed 's/,$//')"; done
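And despite the concern in the question, GNU awk can handle the sub-fields too: split the second column on commas, sort the pieces numerically, and glue them back together (a sketch, assuming gawk 4.0+ for asort's sort-specification argument):
gawk 'BEGIN{FS=OFS="\t"} {n=split($2,a,","); asort(a,a,"@val_num_asc"); s=a[1]; for(i=2;i<=n;i++) s=s "," a[i]; $2=s; print}' input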
while read LINE; do
    echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)"
done < input
Obviously a lot going on here so here we go:
input contains your input
$(echo $LINE | awk '{print $1}') prints the first field, pretty straightforward
$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,) prints the second field, but breaks it down into lines by replacing the commas by newlines (tr ',' '\n'), then sort numerically, then assemble the lines back to comma-delimited values (paste -s -d,).
$ cat input
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
$ while read LINE; do echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)"; done < input
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
Another way (gawk; asort is a GNU extension). This sorts the individual characters of a string, but the same asort idea applies to split-out sub-fields:
$ echo happybirthday | awk '{split($0,A); asort(A); for (i=1;i<=length(A);i++) {print A[i]}}' FS="" | tr -d '\n'; echo
aabdhhipprtyy
