How to crop text after pattern in bash

If the text is
aaaa
bbbb
cccc
====
dddd
I want dddd as the result
If the text is
aaaa
====
bbbb
cccc
dddd
I want
bbbb
cccc
dddd
as the result.
I'm trying something like awk '{print $1}' | sed '/.*\n=*$/d' but it seems like sed can only delete a line.

You can try something like
# $1 is the input file; '^==*$' matches a line made only of "=" (at least one)
n=$(grep -n '^==*$' "$1" | head -n 1 | cut -d: -f1)
tail -n +"$((n + 1))" "$1"

You can indicate a range of lines, e.g. from line 1 to the line containing the pattern:
sed '1,/====/d'
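
As a rough sketch of the same idea, here is a small wrapper, crop_after (a hypothetical name, not from the answers above), that prints only what follows the first line made entirely of = characters. It uses awk rather than sed so that a separator on the very first line is handled too; with sed '1,/====/d' the end pattern is only tested from line 2 onward.

```shell
# crop_after: print everything after the first separator line.
# Assumption: a separator is a line consisting only of "=" characters.
crop_after() {
  awk 'found { print } /^==*$/ { found = 1 }' "$@"
}

printf 'aaaa\n====\nbbbb\ncccc\ndddd\n' | crop_after
```

The function reads standard input when no file name is given, so it can sit at the end of a pipeline or take a file argument.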

Related

Loop comma separated values in awk function

I want to extract the values from value2 variable, but this function is not printing the values.
value2=aaaa,bbbb,cccc,dddd
awk -F '|' -v value1=".$1" -v value2="$2" '
{
print "value1: " value1
print "value2: " value2
for ( i in value2//./ )
{
print "looping: " i
}
}
input value value2=aaaa,bbbb,cccc,dddd
expected output:
aaaa
bbbb
cccc
dddd
How would I print all values using awk?

echo 'aaaa,bbbb,cccc,dddd' |
mawk NF=NF FS=',' OFS='\n'
aaaa
bbbb
cccc
dddd
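
If the list has to stay in an awk variable, as in the question, one possible approach is split(). awk has no shell-style expansion such as ${value2//./}, and for (i in x) only iterates over the indices of an array, so the string must first be turned into an array. The sketch below assumes the values contain no commas of their own; split_csv, list and parts are illustrative names.

```shell
# split_csv: print each comma-separated value on its own line.
# "list" and "parts" are illustrative names, not from the question.
split_csv() {
  awk -v list="$1" 'BEGIN {
    n = split(list, parts, ",")      # parts[1..n] hold the values
    for (i = 1; i <= n; i++)
      print parts[i]
  }'
}

split_csv 'aaaa,bbbb,cccc,dddd'
```

A counted loop is used instead of for (i in parts) because awk does not guarantee the iteration order of for-in.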

How to use awk to print a line before match and until next blank space after the match

For example, I have a variable $var
awk '/'$var'/ { }' file.txt
Once awk matches the variable in the text file, I want to print from one line before the match until the next blank line.
Edit:
My File
AAAA
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK

SSEE
AASS
If $var = SSSS, my output should look like:
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK
Sorry if my explanation is not very clear; I am new here.

With your shown samples and attempts, please try the following solution, written and tested in GNU grep:
grep -ozP '(?:[^\n]+\n)?AAAA(?:\n[^\n]+)*' Input_file
Here are a few scenarios, checking the above code against the shown samples with three different input strings.
1st scenario: Checking with input string SSSS:
grep -ozP '(?:[^\n]+\n)?SSSS(?:\n[^\n]+)*' Input_file
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK
2nd scenario: Checking with string AAAA in code:
grep -ozP '(?:[^\n]+\n)?AAAA(?:\n[^\n]+)*' Input_file
AAAA
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK
3rd scenario: Checking with input string BEKK:
grep -ozP '(?:[^\n]+\n)?BEKK(?:\n[^\n]+)*' Input_file
ASAD
BEKK

awk -v tgt="$var" '!f && ($0==tgt){print prev; f=1} f{if (NF) print; else exit} {prev=$0}' file
The above assumes you just want the first such range printed. If that's wrong then change to:
awk -v tgt="$var" '!f && ($0==tgt){print prev; f=1} f{if (NF) print; else f=0} {prev=$0}' file
Both scripts assume you want to do a full-line string match.
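
For experimenting, the first script can be wrapped in a function. This is only a sketch: print_block is a made-up name, and the NR > 1 guard is an addition so that a match on the very first line does not print an empty "previous" line.

```shell
# print_block TARGET FILE: print the line before the first line equal to
# TARGET, then everything up to (not including) the next blank line.
print_block() {
  awk -v tgt="$1" '
    !f && $0 == tgt { if (NR > 1) print prev; f = 1 }  # skip prev on line 1
    f { if (NF) print; else exit }                      # stop at a blank line
    { prev = $0 }
  ' "$2"
}

sample=$(mktemp)
printf 'AAAA\nBBBB\nSSSS\nCCCC\n\nSSEE\n' > "$sample"
print_block SSSS "$sample"   # prints BBBB, SSSS, CCCC
rm -f "$sample"
```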

You may use this awk solution using match function and empty RS:
awk -v var='SSSS' -v RS= 'match($0, "(^|[^\n]+\n[^\n]*)" var) {print substr($0, RSTART)}' file
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK
# more testing
awk -v var='BEKK' -v RS= 'match($0, "(^|[^\n]+\n[^\n]*)" var) {print substr($0, RSTART)}' file
ASAD
BEKK
awk -v var='AAAA' -v RS= 'match($0, "(^|[^\n]+\n[^\n]*)" var) {print substr($0, RSTART)}' file
AAAA
BBBB
SSSS
CCCC
DDDD
LLLL
PPPP
ASAD
BEKK

How to keep the last occurrence of duplicate lines in a text file?

I have a text file whose contents may contain duplicates. Below is a simplified representation of my txt file (text means a unique character, word, or phrase). Note that the separator ---------- may not be present. Also, the whole content of the file consists of Unicode Japanese and Chinese characters.
EDITED
sometext1
sometext2
sometext3
aaaa
sometext4
aaaa
aaaa
bbbb
bbbb
cccc
dddd
eeee
ffff
gggg
----------
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
What I want to achieve is to keep only the line with the last occurrence of the duplicates like so:
sometext1
sometext2
sometext3
sometext4
aaaa
bbbb
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
The closest I found online is How to remove only the first occurrence of a line in a file using sed but this requires that you know which matching pattern(s) to delete. The suggested topics provided when writing the title gives Duplicating characters using sed and last occurence of date but they didn't work.
I am on a Mac with Sierra. I am writing my executable commands in a script.sh file to execute commands line by line. I'm using sed and gsed as my primary stream editors.

I am not sure whether you want to preserve the original order of the lines. If so, you could do this:
export LC_ALL=en_US.utf8 # to handle unicode characters in file
nl -n rz -ba file | sort -t$'\t' -k2,2 -k1,1r | uniq -f1 | sort -k1,1 | cut -f2-
nl -n rz -ba file adds zero-padded line numbers to every line of the file
sort -t$'\t' -k2,2 -k1,1r sorts the output of nl by the second field (note that nl puts a tab after the line number) and, among identical lines, puts the highest line number first
uniq -f1 removes the duplicates while ignoring the line number field (-f1); since it keeps the first line of each group, the last occurrence is the one that survives
the final sort restores the original order of the remaining lines
cut -f2- removes the line number field, restoring the content to its original format

This awk is very close.
Given:
$ cat file
sometext1
sometext2
sometext3
aaaa
sometext4
aaaa
aaaa
bbbb
bbbb
cccc
dddd
eeee
ffff
gggg
----------
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
You can do:
$ awk 'BEGIN{FS=":"}
FNR==NR {for (i=1; i<=NF; i++) {dup[$i]++; last[$i]=NR;} next}
/^$/ {next}
{for (i=1; i<=NF; i++)
if (dup[$i] && FNR==last[$i]) {print $0; next}}
' file file
sometext1
sometext2
sometext3
sometext4
aaaa
bbbb
----------
sometext5
eeee
ffff
gggg
sometext6
sometext7:cccc
sometext8:dddd
sometext9
sometext10
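
A reduced variant of the same two-pass idea, as a sketch: it treats two lines as duplicates only when they are identical, so it does not reproduce the colon handling above and would leave cccc, dddd and the separator in place. keep_last is a hypothetical name.

```shell
# keep_last FILE: keep only the last occurrence of each line.
# Assumption: two lines are duplicates only when they are identical.
keep_last() {
  awk '
    NR == FNR { last[$0] = FNR; next }  # pass 1: record the last line number per text
    FNR == last[$0]                     # pass 2: print a line only at that number
  ' "$1" "$1"
}

sample=$(mktemp)
printf 'aaaa\nsometext4\naaaa\nbbbb\nbbbb\n' > "$sample"
keep_last "$sample"   # prints sometext4, aaaa, bbbb
rm -f "$sample"
```

The file is read twice, so this works on regular files but not on a pipe.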

This might work for you (GNU sed):
sed -r '1h;1!H;x;s/([^\n]+)\n(.*\1)$/\2/;s/\n-+$//;x;$!d;x' file
Store the first line in the hold space (HS) and append every subsequent line. Swap to the HS and remove any duplicate line that matches the last line. Also delete any separator lines and then swap back to the pattern space (PS). Delete all but the last line, which is swapped with the HS and printed out.

I found a simpler solution, but it sorts the file in the process. If you don't mind sorted output, you can use the following:
$ sort -u input.txt > output.txt
Note: the -u flag makes sort output only one copy of each distinct line.

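As described in the uniq manual, -d prints one copy of each line that is repeated on adjacent lines, so this lists the duplicates rather than removing them:
cat input.txt | uniq -d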

Grep for a word and then grep the next occurring word

I would like to grep a word. Then, I would like to print the next matching line.
Ex: Input file
AAAA
BBBB
CCCC
BBBB
CCCC
EEEE
AAAA
WWWW
CCCC
Output
AAAA
CCCC
AAAA
CCCC
I would want to search for AAAA first every time and then print the first line with CCCC after it. Please help by using grep if possible.

Answer for Revised Question
Using this as the sample file:
$ cat file2
AAAA
BBBB
CCCC
BBBB
CCCC
EEEE
AAAA
WWWW
CCCC
To print every line containing AAAA followed by the first line after it that contains CCCC, use:
$ sed -n '/AAAA/,/CCCC/ {/AAAA/p;/CCCC/p}' file2
AAAA
CCCC
AAAA
CCCC
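
The same selection can also be written as a small state machine in awk, which some readers find easier to adapt than a sed range. This is a sketch, not part of the original answer; first_after and the variable names are illustrative, and START and STOP are treated as regular expressions.

```shell
# first_after START STOP FILE: print each line matching START, and the
# first line matching STOP that follows it.
first_after() {
  awk -v start="$1" -v stop="$2" '
    $0 ~ start        { print; want = 1; next }  # remember that a START was seen
    want && $0 ~ stop { print; want = 0 }        # print only the first STOP after it
  ' "$3"
}

sample=$(mktemp)
printf 'AAAA\nBBBB\nCCCC\nBBBB\nCCCC\nEEEE\nAAAA\nWWWW\nCCCC\n' > "$sample"
first_after AAAA CCCC "$sample"   # prints AAAA, CCCC, AAAA, CCCC
rm -f "$sample"
```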
Answer for Original Question
"I would want to search for AAAA first and then print the first line with CCCC after it."
Using this as the sample file:
$ cat file
AAAA
BBBB
CCCC
BBBB
CCCC
EEEE
To match the first line containing CCCC that occurs after the first line containing AAAA:
$ sed -n '/AAAA/,$ {/CCCC/{p;q}}' file
CCCC
How it works:
-n
This tells sed not to print anything unless we explicitly ask it to.
/AAAA/,$ {/CCCC/{p;q}}
/AAAA/,$ is a range. It specifies that the commands that follow it in braces are executed only if we are between a line that matches AAAA and the last line in the file, denoted $.
/CCCC/ is another condition. It tells sed to execute the commands which follow only if we are on a line that matches CCCC.
{p;q} is a group of two commands. p tells sed to print the current line. q tells sed to quit (so no further lines will be read or matched).

awk compare two files - erase row from second file from condition of first file

I need some help.
first file
0.5
0.4
0.1
0.6
0.9
second file .bam
(I have to use samtools view)
aaaa bbbb cccc
aaab bbaa ccaa
hoho jojo toto
sese rere baba
jouj douj trou
And I need output:
aaaa bbbb cccc
aaab bbaa ccaa
sese rere baba
Condition: if $1 from the first file is in the interval <0.3;0.6>, print the same row from the second file; if it is not, drop the row. In other words, I want to filter the second file by a condition on the first file. I prefer awk or bash code, but that is not essential.
condition for the first file:
awk '{if($1>0.3 && $1<0.6) {print $0}}'
Please could you help me?
Thanks a lot

Another way
paste file1 file2 | awk '$1<=0.6&&$1>=0.3{$1="";print substr($0,2) }'

Here is one awk solution:
awk 'FNR==NR {a[NR]=$1;next} a[FNR]>0.3 && a[FNR]<0.6' firstfile secondfile
aaaa bbbb cccc
aaab bbaa ccaa
sese rere baba is not printed since you say <0.6 and not <=0.6
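
If the interval is meant to be closed, as the expected output in the question suggests, the comparisons only need to become >= and <=. A minimal sketch, assuming both inputs are plain text files with the same number of lines (filter_rows is a made-up name):

```shell
# filter_rows SCORES ROWS: print line N of ROWS when line N of SCORES
# lies in the closed interval [0.3, 0.6].
filter_rows() {
  awk '
    NR == FNR { score[FNR] = $1 + 0; next }     # first file: store as numbers
    score[FNR] >= 0.3 && score[FNR] <= 0.6      # second file: default action prints
  ' "$1" "$2"
}

scores=$(mktemp); rows=$(mktemp)
printf '0.5\n0.4\n0.1\n0.6\n0.9\n' > "$scores"
printf 'aaaa bbbb cccc\naaab bbaa ccaa\nhoho jojo toto\nsese rere baba\njouj douj trou\n' > "$rows"
filter_rows "$scores" "$rows"   # prints rows 1, 2 and 4
rm -f "$scores" "$rows"
```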

You can use awk and its getline function. It reads lines from the second file and, for each one, uses getline to read the corresponding line from the first file, compares its number, and prints the row if the number is in range:
awk '
BEGIN { f = ARGV[2]; --ARGC }
{
getline n <f
if ( (n >= 0.3) && (n <= 0.6) ) {
print $0
}
}
' file2 file1
It yields:
aaaa bbbb cccc
aaab bbaa ccaa
sese rere baba
