I'm trying to find out which lines are repeated X times in a text file, and I'm using awk, but I see that the awk in my command does not work with lines that begin with the same characters or words. That is, it does not recognize each full line individually.
Using this command I try to get the lines that are repeated 3 times:
awk '++A[$1]==3' ./textfile > ./log
Hopefully this is what you need. Your command counts only the first field ($1), which is why lines that begin with the same word get lumped together; index the array on the whole line ($0) instead:
awk '{a[$0]++}END{for(i in a){if(a[i]==3)print i}}' File
Increment the array a using the line ($0) as the index, for each line. At the end, for each index, check whether the count (a[i], which is the original a[$0]) equals 3. If so, print the line (i, which is the original $0). Hope it's clear.
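If the required repeat count varies, the same idea works with the count passed in as an awk variable (a minimal sketch; the variable name n is my own):
awk -v n=3 '{a[$0]++} END{for(i in a) if(a[i]==n) print i}' File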
This also returns the lines repeated 3 times, but it adds a space at the beginning of each of them:
sort ./textfile | uniq -c | awk '$1 == 3 {$1 = ""; print}' > ./log
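If the leading space matters, one variant (a sketch along the same lines) strips the count prefix with sub() instead of blanking the first field, which also preserves the line's internal spacing:
sort ./textfile | uniq -c | awk '$1 == 3 {sub(/^[[:space:]]*[0-9]+[[:space:]]/, ""); print}' > ./log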
I need to find two numbers in lines which look like this:
>Chr14:453901-458800
I have a large number of those lines mixed with lines that don't contain ":", so we can search for the colon to find the lines with numbers. Every line has different numbers.
I need to find the two numbers after ":", which are separated by "-", then subtract the first number from the second one and print the result on the screen for each line.
I'd like this to be done using awk.
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result
For the example I showed above, my result would be:
4899
Because it is the result of 458800 - 453901 = 4899
I can't figure it out on my own and would appreciate some help
With GNU awk: separate the row into multiple columns using the : and - separators. In each row containing :, subtract the contents of column 2 from the contents of column 3 and print the result.
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899
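With GNU awk, the three-argument form of match() can capture both numbers in a single call (a gawk-only sketch):
awk 'match($0, /:([0-9]+)-([0-9]+)/, m) {print m[2] - m[1]}' input_file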
I want to capture lines which have the same beginning in their first n characters, and only output one of those lines no matter what comes after the first n characters.
If the line has fewer than n chars, then send it to output as it is.
I tried grep to capture the first specific number of chars, but it removes the rest!
cat myfile.txt | grep -o -P '^.{0,41}'
or
cat myfile.txt | grep -o -P '.{0,0}http.{0,41}'
Here I have a file and I want to capture lines which are the same in their first 41 characters and only show one of them:
https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/first/second/blahblah/?oriwo=asldkjalkdjf2kasd
https://example.com/first/second/blahblah/some/more/dir
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
https://example.com/third/fourth/something/?cldl=5145652
https://example.com/third/fourth/something/?hfdg=156569&wuew=8428
https://example.com/first/second/blahblah/
Desired output
https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
Thanks.
awk '!seen[substr($0,1,41)]++' file
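This prints a line only the first time its 41-character prefix is seen (for lines shorter than 41 characters, substr() simply returns the whole line, so they are compared in full). To make the prefix length adjustable, the same idiom can take it as a variable (a sketch):
awk -v n=41 '!seen[substr($0,1,n)]++' file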
Just the usual sort & uniq pair.
sort file | uniq -w41
You might want to use something like sort -s -k1.1,1.41 file to get a stable sort.
only output one of those lines no matter what comes after the first n characters. If the line has fewer than n chars, then send it to output as it is.
For anything else, there is almighty awk.
awk -v N=41 '
# Put lines at least 41 characters long in an associative array, if not there already
length($0) >= N { i = substr($0,1,N); if (!(i in a)) a[i] = $0 }
# output lines shorter than 41 characters
length($0) < N {print}
# output the array
END{ for (i in a) print a[i] } ' file
I have a script whose output is as follows:
AS1234
1.2.3.4/24
AS4534
2.3.4.5/24
I have been trying to write all the CIDRs to one file and the ASN numbers to another. Currently I am using grep to do it; can this be achieved with an if/else instead?
Thanks
You could do this within a single awk program. I have created 2 awk variables named asnOutputFile and cidrOutputFile which hold the respective output file names; change them as per your need.
awk -v asnOutputFile="ASN_numbers.txt" -v cidrOutputFile="cidr_values.txt" '
/^AS[0-9]+/{
print > (asnOutputFile)
next
}
/^[0-9]+(\.[0-9]+){3}\/[0-9]+/{
print > (cidrOutputFile)
}
' Input_file
Or, to match the specific digit counts shown in your sample Input_file, try:
awk -v asnOutputFile="ASN_numbers.txt" -v cidrOutputFile="cidr_values.txt" '
/^AS[0-9]{4}/{
print > (asnOutputFile)
next
}
/^([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}\/[[:digit:]]{2}/{
print > (cidrOutputFile)
}
' Input_file
grep -Eo 'AS[[:digit:]]{4}' Input_file > file1
grep -Eo '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}\/[[:digit:]]{2}' Input_file > file2
Search for "AS" followed by a digit 4 times and output the found entries to file1
Search for 1 to 3 digits and then a full stop 3 times followed by a digit 1 to 3 times and then a forward slash and a digit 2 times. Output the found entries to file 2.
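Given the sample output shown above (saved as Input_file), the two files would end up containing:
$ cat file1
AS1234
AS4534
$ cat file2
1.2.3.4/24
2.3.4.5/24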
This is my sample file.
I want to do this.
I have a fixed requirement to delete the 2nd and 3rd lines while keeping the 1st line.
From the bottom, I want to delete the 2 lines above the last line, excluding the last line itself; I won't know the last line number in advance, as it depends on the file.
Once I delete the 2nd and 3rd lines, the 4th line should come up to 2nd and so on, and the same at the bottom after the delete.
I want to use the head/tail commands and modify the existing file only, i.e. write the changes back to the same file.
Sample file text format.
Input File
> This is First Line
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> ..
> ..
> ..
> ..
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> This is Last Line, should not be deleted It could be come at any line number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted It could be come at any line number (variable)
Edit: because of compatibility issues on Unix (using HP-UX with the ksh shell) I want to implement this using head/tail/awk, not sed.
Adding a solution as per the OP's request to make it a generic one.
Approach: with this solution the OP can provide the lines to be skipped, counted from the start and from the end of any Input_file.
What the code does: it generates an awk command from the given lines-to-skip and then runs it.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" '
BEGIN{
num_start=split(start, a,",");
num_end=split(end, b,",");
for(i=1;i<=num_start;i++){
val=val?val OFS "FNR=="a[i]:"FNR=="a[i]};
for(j=1;j<=num_end;j++){
b[j]=b[j]>1?len-(b[j]-1):b[j];
val=val?val OFS "FNR=="b[j]:"FNR=="b[j]};
print "awk " s1 val "{next} 1" s1" Input_file"}
' | sh
Change Input_file to your actual file name and let me know how it goes.
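To illustrate, for a 10-line Input_file with start_line="2,3" and end_line="2,3", the script above generates and runs:
awk 'FNR==2||FNR==3||FNR==9||FNR==8{next} 1' Input_file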
The following awk may help you with the same (since I don't have an HP system, I didn't test it):
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: adding a non-one-liner form of the solution too:
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
wc + sed solution:
len=$(wc -l < inpfile)
sed "$((len-2)),$((len-1))d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
> This is First Line
> ..
> ..
> ..
> ..
> This is Last Line, should not be deleted It could be come at any line number (variable)
Perl suggestion... read the whole file into array @L, get the index of the last line. Delete the 2nd-last, 3rd-last, 3rd and 2nd lines. Print what's left.
perl -e '@L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print @L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '@L=<>; splice @L,1,2; splice @L,$#L-2,2; print @L' file.txt
If you wish to have some flexibility, a ksh script approach may work, though it is a little expensive in terms of resources:
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l < "$1")
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes the second and third lines.
Plus, counting from the end and excluding the last line, it deletes as many lines as the user enters (stored in no).
Note: My ksh version is 93u+ 2012-08-01
awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D'
The awk here serves the purpose of providing line numbers and then passing the output to the sed which uses the line numbers to do the required operations.
%d : used to print the numbers; you can also use '%i'
'\t' : places a tab between the number and the string
%s : prints the string of characters
'\n' : starts a new line
NR : the line number, starting from 1
For sed
N : read/append the next line of input into the pattern space.
$! : do not apply the command to the last line.
D : if the pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete the text in the pattern space up to the first newline, and restart the cycle with the resultant pattern space, without reading a new line of input.
P : print up to the first embedded newline of the current pattern space. This prints the lines after removing the subjected lines.
I enjoyed this task and wrote an awk script for the more scalable case (huge files).
It reads/scans the input file only once (no need to know the line count) and does not store the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10
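If the number of lines to drop at each end needs to vary, the range can be passed on the command line instead of being hard-coded (a sketch of the same script; it drops range-1 lines after the first line and before the last one, and assumes the file is longer than 2*range lines):
awk -v range=3 '
{lines[NR] = $0}                # capture each line in the sliding window
NR == 1 {print}                 # print 1st line
NR > range * 2 {                # past the window: emit and drop its top line
    print lines[NR - range]
    delete lines[NR - range]
}
END {print}                     # print last line
' input.txt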
I have a file with contents
x
a
x
b
x
c
I want to grep from the last occurrence onwards,
x
c
when I try
sed -n "/x/,/b/p" file
it lists all the lines, from the first x through c.
I'm not sure if I got your question right, so here are some shots in the dark:
Print the last occurrence of x (regex):
grep x file | tail -1
Alternatively:
tac file | grep -m1 x
Print file from first matching line to end:
awk '/x/{flag = 1}; flag' file
Print file from last matching line to end (prints all lines in case of no match):
tac file | awk '!flag; /x/{flag = 1};' | tac
grep -A 1 x file | tail -n 2
-A 1 tells grep to print one line after a match line
with tail you get the last two lines.
or in a reversed way:
tac file | grep -B 1 x -m1 | tac
Note: You should make sure your pattern is "strong" enough that it gets you the right lines, e.g. by enclosing it with ^ at the start and $ at the end.
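For the sample file above, an anchored version of the first command would be (note that GNU grep may print -- separators between non-adjacent context blocks, which would then need filtering):
grep -A 1 '^x$' file | tail -n 2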
This might work for you (GNU sed):
sed 'H;/x/h;$!d;x' file
Saves the last x and what follows in the hold space and prints it out at end-of-file.
Not sure how to do it using sed, but you can try awk:
awk '{a=a"\n"$0; if ($0 == "x"){ a=$0}} END{print a}' file
POSIX vi (or ex or ed), in case it is useful to someone
Done in Command mode, of course
:set wrapscan
Go to the first line and just search Backwards!
1G?pattern
Slower way, without :set wrapscan
G$?pattern
Explanation:
G go to the last line
$ move to the end of that line
? search backwards for pattern
The first backwards match will be the same as the last forward match
Either way, you may now delete all lines above current (match)
:1,.-1d
or
kd1G
You could also delete to the beginning of the matched line prior to the line deletions with d0 in case there were multiple matches on the same line.
POSIX awk, as suggested at
get last line from grep search on multiple files
awk '(FNR==1)&&s{print s; s=""}/PATTERN/{s=$0}END{if(s) print s}'
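For instance, invoked over several files (the file names here are placeholders):
awk '(FNR==1)&&s{print s; s=""}/PATTERN/{s=$0}END{if(s) print s}' file1.txt file2.txt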
If you want to do awk in truly hideous one-liner fashion, in a syntax closer to the functional programming paradigm, without having to keep track of when the last occurrence is:
mawk/mawk2/gawk 'BEGIN { FS = "=7713[0-9]+="; RS = "^$";
} END { print ar0[split($(0 * sub(/\n.+$/,"",$NF)), ar0, ORS)] }'
Here i'm employing multiple awk short-hands :
sub(/\n.+$/, "", $NF) # trims all extra rows after the pattern
g/sub() returns the number of substitutions made, so multiplying that by 0 forces the split() to split $0, the full file, instead.
split() returns the number of items in the array (which is another way of saying the position of the last element), so even though I've already trimmed out the trailing \n, I can still directly print ar0[split(...)], knowing that ORS will fill in the missing trailing \n.
That's why this code looks like I'm trying to extract array items before the array itself is defined; due to the flow of logic needed, the array becomes defined by the time print is reached.
Now if you want something simpler, these 2 also work
mawk/gawk 'BEGIN { FS="=7713[0-9]+="; RS = "^$"
} END { $NF = substr($NF, 1, index($NF, ORS));
FS = ORS; $0 = $0; print $(NF-1) }'
or
mawk/gawk '/=7713[0-9]+=/ { lst = $0 } END { print lst }'
I didn't use the OP's exact x|c requirements, just to showcase that these work regardless of whether you need fixed strings or regex-based matches.
The above solutions only work for a single file; to print the last occurrence for many files (say with suffix .txt), use the following bash script:
#!/bin/bash
for fn in *.txt
do
result=$(grep 'pattern' "$fn" | tail -n 1)
echo "$result"
done
where 'pattern' is what you would like to grep.