Count how many times the same values appear across many lines [closed] - shell

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
My main file is:
785
785
788
788
883
883
883
921
921
921
921
921
921
925
925
I want to count the same values and write the results in a new file (as follows):
785 2
788 2
883 3
921 6
925 2
Thank you for your help.

sort myFile.txt | uniq -c | awk '{ print $2 " " $1}' > myNewFile.txt
Edit: added sort and removed cat to take comments into account
And if you want only values which appear at least 4 times:
sort temp.txt | uniq -c | sort -n | egrep -v "^ *[0-3] " | awk '{ print $2 " " $1}'
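Since uniq -c puts the count in the first field, a numeric comparison in awk is a less fragile alternative to the egrep filter; this is a sketch assuming the same temp.txt input:

```shell
# Keep only values occurring at least 4 times, comparing the
# count numerically instead of matching it with a regex.
sort temp.txt | uniq -c | awk '$1 >= 4 { print $2, $1 }'
```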

Imagine your file is called t
You can do it with:
sort -u t | while read -r line   # read each unique value in sorted order
do
echo -n "$line "   # print the value followed by a space
grep -cx "$line" t # count the lines that match it exactly
done

kent$ awk '{a[$0]++}END{for(x in a)print x, a[x]}' f
921 6
925 2
883 3
785 2
788 2
print only count >=4:
kent$ awk '{a[$0]++}END{for(x in a)if(a[x]>=4)print x, a[x]}' f
921 6
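Note that awk's for (x in a) visits keys in an unspecified order, which is why the counts above come out unsorted; piping through sort restores numeric order if you need it:

```shell
# Count identical lines, then sort the summary numerically by value.
awk '{a[$0]++} END {for (x in a) print x, a[x]}' f | sort -n
```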

Related

AWK or SED Replace space between alphabets in a particular column [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I have an infile as below:
infile:
INM00042170 28.2500 74.9167 290.0 CHURU 2015 2019 2273
INM00042182 28.5833 77.2000 211.0 NEW DELHI/SAFDARJUNG 1930 2019 67874
INXUAE05462 28.6300 77.2000 216.0 NEW DELHI 1938 1942 2068
INXUAE05822 25.7700 87.5200 40.0 PURNEA 1933 1933 179
INXUAE05832 31.0800 77.1800 2130.0 SHIMLA 1926 1928 728
PKM00041640 31.5500 74.3333 214.0 LAHORE CITY 1960 2019 22915
I want to replace the space between two words by an underscore in column 5 (example: NEW DELHI becomes NEW_DELHI). I want output as below.
outfile:
INM00042170 28.2500 74.9167 290.0 CHURU 2015 2019 2273
INM00042182 28.5833 77.2000 211.0 NEW_DELHI/SAFDARJUNG 1930 2019 67874
INXUAE05462 28.6300 77.2000 216.0 NEW_DELHI 1938 1942 2068
INXUAE05822 25.7700 87.5200 40.0 PURNEA 1933 1933 179
INXUAE05832 31.0800 77.1800 2130.0 SHIMLA 1926 1928 728
PKM00041640 31.5500 74.3333 214.0 LAHORE_CITY 1960 2019 22915
Thank you.
#!/bin/bash
# connect field 5 and 6 and remove those with numbers.
# this returns a list of new names (with underscore) for
# all cities that need to be replaced
declare -a NEW_NAMES=($(cat infile | awk '{print $5 "_" $6}' | grep -vE "_[0-9]"))
# iterating all new names
for NEW_NAME in "${NEW_NAMES[@]}"; do
    OLD_NAME=$(echo "$NEW_NAME" | tr '_' ' ')
    # replace in file
    sed -i "s/${OLD_NAME}/${NEW_NAME}/g" infile
done
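The loop above rewrites the file once per multi-word city; a single awk pass is a possible alternative. This is a sketch assuming a normal record has 8 fields (so a 9-field record means the name in column 5 was split in two); decrementing NF to rebuild the record works in GNU awk and mawk, though POSIX leaves it unspecified:

```shell
awk 'NF == 9 {                               # 9 fields => city name was split into $5 and $6
    $5 = $5 "_" $6                           # join the two words with an underscore
    for (i = 6; i < NF; i++) $i = $(i + 1)   # shift the remaining fields left
    NF--                                     # drop the now-duplicated last field
} 1' infile > outfile
```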

awk length is counting +1

I'm trying, as an exercise, to output how many words exist in the dictionary for each possible length.
Here is my code:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
Here is the output:
...
1799 5
427 4
81 3
1 2
My problem is that awk's length counts one more character for each word in my file. The right output should have been:
1799 4
427 3
81 2
1 1
I checked my file and it does not contain any space after the word:
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
...
So I guess awk is counting the newline as a character, despite the fact it is not supposed to.
Is there any solution? Or something I'm doing wrong?
I'm going to venture a guess: isn't your awk expecting "U*X" style newlines (LF), while your dico.txt has Windows style (CR+LF)? That would easily give you the +1 on all lengths.
I took your four words:
$ cat dico.txt
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
And ran your line:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
1 11
1 10
1 8
1 7
So far so good. Now the same, but dico.txt with Windows newlines:
$ cat dico.txt | todos > dico_win.txt
$ awk '{print length}' dico_win.txt | sort -nr | uniq -c
1 12
1 11
1 9
1 8
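If the file does turn out to have Windows line endings, stripping the carriage return inside awk avoids a separate conversion step; this is just one sketch, and dos2unix or tr -d '\r' would work equally well:

```shell
# Remove a trailing CR from each record before measuring its length,
# so CRLF files give the same counts as LF files.
awk '{ sub(/\r$/, ""); print length }' dico_win.txt | sort -nr | uniq -c
```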

Use SED in order to filter a file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I would like to use sed to filter a file and keep only the id, which consists of 3 digits, and the domain (e.g. google.com).
Original File:
451 [04/Jan/1997:03:35:55 +0100] http://www.netvibes.com
448 [04/Jan/1997:03:36:30 +0100] www.google.com:443
450 [04/Jan/1997:03:36:48 +0100] http://84.55.151.142:8080
452 [04/Jan/1997:03:36:51 +0100] http://127.0.0.1:9010
451 [04/Jan/1997:03:36:55 +0100] http://www.netvibes.com
453 [04/Jan/1997:03:37:10 +0100] api.del.icio.us:443
453 [04/Jan/1997:03:37:33 +0100] api.del.icio.us:443
448 [04/Jan/1997:03:37:34 +0100] www.google.com:443
Used SED commands : sed -e 's/\[[^]]*\]//g' -e 's/http:\/\///g' -e 's/www.//g' -e 's/^.com//g' -e 's/:[0-9]*//g'
Current Output:
451 netvibes.com
448 google.com
450 84.55.151.142
452 127.0.0.1
451 netvibes.com
453 api.del.icio.us
453 api.del.icio.us
448 google.com
Wished Output:
451 netvibes.com
448 google.com
451 netvibes.com
448 google.com
using grep
sed ... | grep -F '.com'
or
sed ... | grep '\.com$'
or with sed -n, using p to print match
sed -ne 's/\[[^]]*\]//g;s/http:\/\///g;s/www\.//g;s/:[0-9]*//g;/\.com$/p'
It looks like you've lost api.del.icio.us in your wished output, so:
cat testfile | awk '{print $1" "$NF}' | sed -r 's/http\:\/\/*//g;s/www\.//g' | awk -F: '{print $1}' | sed -r 's/([0-9]{1,3}) [0-9].*/\1 /g' | sed -r 's/[0-9]{3} $//g' | grep -v '^$' | uniq
If you need only *.com domains:
cat testfile | awk '{print $1" "$NF}' | sed -r 's/http:\/\/*//g;s/www\.//g' | awk -F: '{print $1}' | sed -r 's/([0-9]{1,3}) [0-9].*/\1 /g' | sed -r 's/[0-9]{3} $//g' | grep -v '^$' | grep com | uniq
Here's one in awk:
$ awk 'match($NF,/[^\.]+\.[a-z]+($|:)/) {
print $1,substr($NF,RSTART,RLENGTH-($NF~/:[0-9]+/?1:0))
}' file
451 netvibes.com
448 google.com
451 netvibes.com
453 icio.us
453 icio.us
448 google.com
If you want just the .coms, replace [a-z]+ in the match regex with com.
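Since the question asked for sed specifically, here is one possible single-command sketch (GNU sed with -E; `file` is a placeholder name). It captures the id and a domain ending in .com, tolerating an optional scheme, www. prefix, and port, and prints only the matching lines:

```shell
# -n suppresses default output; p prints only lines the substitution matched.
# Groups: \1 = id, \2 = optional scheme, \3 = optional www., \4 = .com domain.
sed -nE 's#^([0-9]+) \[[^]]*\] (https?://)?(www\.)?([^:/ ]+\.com)(:[0-9]+)?$#\1 \4#p' file
```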

Count duplicated couple of lines

I have a configuration file with this format:
cod 11
loc1 23
pto1 33
loc2 55
pto2 66
cod 12
loc1 55
pto1 66
loc2 88
pto2 77
...
I want to count how many times a pair of numbers appears in the sequence loc/pto (independently of the loc/pto number). In the example, the couple 55/66 appears 2 times (once as loc1/pto1 and once as loc2/pto2).
I have googled around and tried some combinations of grep, uniq and awk, but I only managed to count duplicated single lines or numbers. I read the man pages of those commands without finding any clue for my problem.
You could use the following:
$ sort -k2 file | uniq -f1 -dc
2 loc1 55
2 pto1 66
-k2 is sorting on the 2nd field, so lines with equal values end up adjacent
-f1 is skipping the 1st field when comparing lines
-dc is printing each duplicated line with its associated count
Despite no visible effort on the part of the OP, this was an interesting question to work out.
awk '{array[NR]=$2} END {for (i=1 ; i < NR ; i++) print array[i] "," array[i+1]}' file | sort | uniq -c
Output-
1 11,23
1 12,55
1 23,33
1 33,55
2 55,66
1 66,12
1 66,88
1 88,77
The output tells you that 55 is followed by 66 twice. Other pairs only occur once.
Explanation-
I define an array in awk whose elements are the values of the second column, indexed by line number. The part after the END concatenates the ith and (i+1)th elements. Then sort | uniq -c shows whether these pairs occur more than once.
If you want to know how many times a duplicate number appeared in the file:
awk '{print $2}' <filename> | sort | uniq -dc
Output:
2 55
2 66
If you want to know how many times a number appeared in the file regardless of being duplicate or not:
awk '{print $2}' <filename> | sort | uniq -c
Output:
1 11
1 12
1 23
1 33
2 55
2 66
1 77
1 88
If you want to print the full line on duplicate match based on second column:
awk '{print $2}' <filename> | sort | uniq -d | grep -Fw -f - <filename>
Output:
loc2 55
pto2 66
loc1 55
pto1 66
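A two-pass awk sketch can print the full duplicate lines without the extra grep: the file is read twice, first to tally the values in column 2, then to print the lines whose value occurred more than once (`file` stands in for the real filename):

```shell
awk 'NR == FNR { count[$2]++; next }   # first pass: tally each column-2 value
     count[$2] > 1                     # second pass: print duplicated lines
' file file
```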

bash uniq, how to show count number at back

Normally when I do cat number.txt | sort -n | uniq -c, I get numbers like this:
3 43
4 66
2 96
1 97
But what I need is the number of occurrences shown at the back, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.
Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'
Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < numbers.txt
Through GNU sed,
cat number.txt | sort -n | uniq -c | sed -r 's/^ *([0-9]+) ([0-9]+)$/\2 \1/'
