How should I use sed to delete specific strings but keep longer lines that contain them?


I generated a list of files; it has 17417 lines like:
./usr
./usr/share
./usr/share/mime-info
./usr/share/mime-info/libreoffice7.0.mime
./usr/share/mime-info/libreoffice7.0.keys
./usr/share/appdata
./usr/share/appdata/libreoffice7.0-writer.appdata.xml
./usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
./usr/share/appdata/libreoffice7.0-draw.appdata.xml
./usr/share/appdata/libreoffice7.0-impress.appdata.xml
./usr/share/appdata/libreoffice7.0-base.appdata.xml
./usr/share/appdata/libreoffice7.0-calc.appdata.xml
./usr/share/applications
./usr/share/applications/libreoffice7.0-xsltfilter.desktop
./usr/share/applications/libreoffice7.0-writer.desktop
./usr/share/applications/libreoffice7.0-base.desktop
./usr/share/applications/libreoffice7.0-math.desktop
./usr/share/applications/libreoffice7.0-startcenter.desktop
./usr/share/applications/libreoffice7.0-calc.desktop
./usr/share/applications/libreoffice7.0-draw.desktop
./usr/share/applications/libreoffice7.0-impress.desktop
./usr/share/icons
./usr/share/icons/gnome
./usr/share/icons/gnome/16x16
./usr/share/icons/gnome/16x16/mimetypes
./usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
I want to delete lines like:
./usr
./usr/share
./usr/share/mime-info
./usr/share/appdata
./usr/share/applications
./usr/share/icons
./usr/share/icons/gnome
./usr/share/icons/gnome/16x16
./usr/share/icons/gnome/16x16/mimetypes
and also the "." at the start, so that the result looks like:
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
Is this possible using sed, or is another tool more practical?

With your list in a file named list, you could do:
sed -n 's/^[.]//;/\/.*[._].*$/p' list
Where:
sed -n suppresses printing of pattern-space; then
s/^[.]// is the substitution form that simply removes the first character '.' from each line; then
/\/.*[._].*$/p prints (p) every line that contains a '.' or '_' somewhere after a '/'. Note that .* also matches '/', so the '.' or '_' may sit in any path component after the first '/', not only after the last one.
Example Use/Output
$ sed -n 's/^[.]//;/\/.*[._].*$/p' list
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
Note, without GNU sed that allows chaining of expressions with ';' you would need:
sed -n -e 's/^[.]//' -e '/\/.*[._].*$/p' list
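Because .* can span slashes, a '.' or '_' in a parent directory name is enough for a line to be printed. A minimal sketch that restricts the test to the final path component by using [^/]* instead; the helper name files_only and the sample paths are made up for illustration. It is still a heuristic: a directory whose own name contains '.' or '_' is printed, and a file whose name contains neither is dropped.

```shell
# Keep only lines whose LAST path component contains '.' or '_'.
# files_only is a hypothetical helper; it reads the list on standard input.
files_only() {
    sed -n -e 's/^[.]//' -e '/\/[^/]*[._][^/]*$/p'
}

# Hypothetical sample paths, not taken from the question's list.
printf '%s\n' ./opt ./opt/office7.0 ./opt/office7.0/program \
    ./opt/office7.0/program/soffice.bin | files_only
```

Here /opt/office7.0/program is no longer printed, while /opt/office7.0 still is, which shows the limit of matching on file names alone.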

Assuming you want to delete every line that is the parent directory of
another pathname in the list, please try:
sort -r list.txt | awk ' # sort the list in the reverse order
{
sub("^\\.", "") # remove leading dot
s = prev; sub("/[^/]+$", "", s) # remove the rightmost slash and following characters
if (s != $0) print # if s != $0, $0 is not the parent directory of the previous line
prev = $0 # keep $0 for the next line
}'
Result:
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
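If the original order should be kept, the same idea can be sketched without sort. This assumes the list is in the order find prints it, so a directory is immediately followed by its first entry; strip_parents is a hypothetical name, and an empty directory would still be printed.

```shell
# Print every line that is NOT the parent directory of the line after it,
# with the leading dot removed. Assumes find-style (parent first) ordering.
strip_parents() {
    awk '
        function emit(p) { sub(/^\./, "", p); print p }
        # prev is a parent only if the current line starts with prev "/"
        NR > 1 && index($0, prev "/") != 1 { emit(prev) }
        { prev = $0 }
        END { if (NR > 0) emit(prev) }   # the last line has no children
    ' "$@"
}
```

Usage would be strip_parents list.txt, or piping the list into strip_parents.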

Related

sed removing # and ; comments from files up to certain keyword

I have files from which comments and blank lines must be removed, but only up to a keyword. The keyword's line number varies. Is it possible to limit several chained sed substitutions to the lines before the keyword?
This removes all comments and blank lines from the file:
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example, given something like this:
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.

sed -i '1,/keyword/{/^[#;]/d;/^$/d;}' file

I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
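A minimal sketch of that redirect-and-rename step, wrapped in a function. The name strip_comments_inplace is hypothetical, the keyword is matched as a literal string with index(), and [ \t] is used in place of [[:blank:]] on the assumption that some older awk builds lack character classes. Note that mv replaces the file, so its permissions become those of the temporary file.

```shell
# Remove comment and blank lines up to (not including) the keyword line,
# then replace the original file. Hypothetical helper: file, keyword.
strip_comments_inplace() {
    file=$1
    keyword=$2
    tmp=$(mktemp) || return 1
    if awk -v kw="$keyword" '
           index($0, kw) { stop = 1 }            # keyword seen: keep everything from here
           stop || !/^[ \t]*([;#]|$)/            # before it: drop comment/blank lines
       ' "$file" > "$tmp"
    then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```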

Use grep -n keyword to get the number of the line that contains the keyword.
Use sed -i -e '1,N s/#..., where N is that line number, to remove comments only on lines 1 to N.
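The two steps can be combined roughly as follows. This is a sketch, not a tested tool: strip_above_keyword is a made-up name, the keyword is treated as a grep regular expression, only its first occurrence is used, and the result goes to standard output rather than being edited in place.

```shell
# Strip '#' and ';' comments and empty lines on lines 1..N, where N is the
# first line that matches the keyword. Hypothetical helper: file, keyword.
strip_above_keyword() {
    file=$1
    keyword=$2
    n=$(grep -n -- "$keyword" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$n" ] || { echo "keyword not found" >&2; return 1; }
    sed -e "1,${n}"'s/#.*$//' \
        -e "1,${n}"'s/;.*$//' \
        -e "1,${n}"'{/^$/d;}' "$file"
}
```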

Looking for a regex pattern, passing that pattern to a script, and replacing the pattern with the output of the script

Every time the pattern shows up (in this example, a 2-digit number), I want to pass the matched text to a script and replace it with the output of that script.
I'm using sed. An example of what it should look like would be
echo 'siedi87sik65owk55dkd' | sed 's/[0-9][0-9]/.\/script.sh/g'
Right now this returns
siedi./script.shsik./script.showk./script.shdkd
But I would like it to return
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
This is what is in ./script.sh
#!/bin/bash
echo "!!!$1!!!"
It has to be replaced with the output. In this example I know I could just use a normal sed substitution but I don't want that as an answer.

sed is for simple substitutions on individual lines, that is all. Anything else, even if it can be done, requires arcane language constructs that became obsolete in the mid-1970s when awk was invented and are used today purely for the mental exercise. Your problem is not a simple substitution so you shouldn't try to use sed to solve it.
You're going to want something like:
awk '{
head = ""
tail = $0
while ( match(tail,/[0-9]{2}/) ) {
tgt = substr(tail,RSTART,RLENGTH)
cmd = "./script.sh " tgt
if ( (cmd | getline line) > 0) {
tgt = line
}
close(cmd)
head = head substr(tail,1,RSTART-1) tgt
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
e.g. using an echo in place of your script.sh command:
$ echo 'siedi87sik65owk55dkd' |
awk '{
head = ""
tail = $0
while ( match(tail,/[0-9]{2}/) ) {
tgt = substr(tail,RSTART,RLENGTH)
cmd = "echo !!!" tgt "!!!"
if ( (cmd | getline line) > 0) {
tgt = line
}
close(cmd)
head = head substr(tail,1,RSTART-1) tgt
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
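For a single string rather than a stream, the same head/tail loop can be sketched in bash itself using [[ =~ ]] and BASH_REMATCH. This is an alternative technique, not the awk approach above; replace_with_output and its argument order are hypothetical, and the pattern is assumed never to match the empty string.

```shell
#!/bin/bash
# Replace every match of an ERE in a string with the output of a command.
# Arguments (hypothetical interface): regex, command, input string.
replace_with_output() {
    local re=$1 cmd=$2 rest=$3 out='' m
    while [[ $rest =~ $re ]]; do
        m=${BASH_REMATCH[0]}                 # text of the leftmost match
        [[ -n $m ]] || break                 # guard against empty matches
        out+=${rest%%"$m"*}$("$cmd" "$m")    # text before the match + command output
        rest=${rest#*"$m"}                   # continue after the match
    done
    printf '%s\n' "$out$rest"
}
```

With the script.sh from the question, replace_with_output '[0-9]{2}' ./script.sh 'siedi87sik65owk55dkd' should print the desired string. One process is started per match, so this is slow for large inputs.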

Ed's awk solution is obviously the way to go here.
For fun, I tried to come up with a sed solution, and here is (a convoluted GNU sed) one that takes the pattern and the script to be run as parameters; the input is either read from standard input (i.e., you can pipe to it) or from a file supplied as the third argument.
For your example, we'd have infile with contents
siedi87sik65owk55dkd
siedi11sik22owk33dkd
(two lines to demonstrate how this works for multiple lines), then script with contents
#!/bin/bash
echo "!!!${1}!!!"
and finally the solution script itself, so. Usage is
./so pattern script [input]
where pattern is an extended regular expression as understood by GNU sed (with the -r option), script is the name of the command you want to run for each match, and the optional input is the name of the input file if input is not standard input.
For your example, this would be
./so '[[:digit:]]{2}' script infile
or, as a filter,
cat infile | ./so '[[:digit:]]{2}' script
with output
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
siedi!!!11!!!sik!!!22!!!owk!!!33!!!dkd
This is what so looks like:
#!/bin/bash
pat=$1 # The pattern to match
script=$2 # The command to run for each pattern
infile=${3:-/dev/stdin} # Read from standard input if not supplied
# Use sed and have $pattern and $script expand to the supplied parameters
sed -r "
:build_loop # Label to loop back to
h # Copy pattern space to hold space
s/.*($pat).*/.\/\"$script\" \1/ # (1) Extract last match and prepare command
# Replace pattern space with output of command
e
G # (2) Append hold space to pattern space
s/(.*)$pat(.*)/\1~~~\2/ # (3) Replace last match of pattern with ~~~
/\n[^\n]*$pat[^\n]*$/b build_loop # Loop if string contains match
:fill_loop # Label for second loop
s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/ # (4) Replace last ~~~
t fill_loop # Loop if there was a replacement
s/(.*)\n(.*)~~~(.*)$/\2\1\3/ # (5) Final ~~~ replacement
" < "$infile"
The sed command works with two loops. The first one copies the pattern space to the hold space, then removes everything but the last match from the pattern space and prepares the command to be run. After the substitution with (1) in its comment, the pattern space looks like this:
./script 55
The e command (a GNU extension) then replaces the pattern space with the output of this command. After this, G appends the hold space to the pattern space (2). The pattern space now looks like this:
!!!55!!!
siedi87sik65owk55dkd
The substitution at (3) replaces the last match with a string hopefully not equal to the pattern and we get
!!!55!!!
siedi87sik65owk~~~dkd
The loop repeats if the last line of the pattern space still has a match for the pattern. After three loops, the pattern space looks like this:
!!!87!!!
!!!65!!!
!!!55!!!
siedi~~~sik~~~owk~~~dkd
The second loop now replaces the last ~~~ with the second to last line of the pattern space with substitution (4). The command uses lots of "not a newline" ([^\n]) to make sure we're not pulling the wrong replacement for ~~~.
Because of the way command (4) is written, the loop ends with one last substitution to go, so before command (5), we have this pattern space:
!!!87!!!
siedi~~~sik!!!65!!!owk!!!55!!!dkd
Command (5) is a simpler version of command (4), and after it, the output is as desired.
This seems to be fairly robust and can deal with spaces in the name of the script to be run as long as it's properly quoted when calling:
./so '[[:digit:]]{2}' 'my script' infile
This would fail if
The input file contains ~~~ (solvable by replacing all occurrences at the start, putting them back at the end)
The output of script contains ~~~
The pattern contains ~~~
i.e., the solution very much depends on ~~~ being unique.
Because nobody asked: so as a one-liner.
#!/bin/bash
sed -re ":b;h;s/.*($1).*/.\/\"$2\" \1/;e" -e "G;s/(.*)$1(.*)/\1~~~\2/;/\n[^\n]*$1[^\n]*$/bb;:f;s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/;tf;s/(.*)\n(.*)~~~(.*)$/\2\1\3/" < "${3:-/dev/stdin}"
Still works!

A conceptually simpler multi-utility solution:
Using GNU utilities:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' |
xargs -d'\n' -I% sh -c 'echo '\"%\"
Using BSD utilities (also works with GNU utilities):
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' | tr '\n' '\0' |
xargs -0 -I% sh -c 'echo '\"%\"
The idea is to use sed to translate the tokens of interest lexically into a string containing shell command substitutions that invoke the target script with the token, and then pass the result to the shell for evaluation.
Note:
Any embedded " and $ characters in the input must be \-escaped.
xargs -d'\n' (GNU) and tr '\n' '\0' / xargs -0 (BSD) are only needed to correctly preserve whitespace in the input - if that is not needed, the following POSIX-compliant solution will do:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' |
xargs -I% sh -c 'printf "%s\n" '\"%\"

How can I retrieve the matching records from mentioned file format in bash

XYZNA0000778800Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
I have the file format above, from which I want to find a matching record. For example, match a number (7789) on a line starting with XYZ and, once matched, look for another number (7345) in the lines below that start with 1, until a line starting with 9 is reached. Then retrieve the entire matching line. How can I accomplish this using a shell script, awk, sed or any combination?
Expected Output:
XYZNA0000778900Z
17345000012300324000000004000000000000000

With sed one can do:
$ sed -n '/^XYZ.*7789/,/^9$/{/^1.*7345/p}' file
17345000012300324000000004000000000000000
Breakdown:
sed -n ' ' # -n disables automatic printing
/^XYZ.*7789/, # Match line starting with XYZ, and
# containing 7789
/^1.*7345/p # Print line starting with 1 and
# containing 7345, which is coming
# after the previous match
/^9$/ { } # Match line that is 9
range { stuff } will execute stuff when it's inside range, in this case the range is starting at /^XYZ.*7789/ and ending with /^9$/.
.* matches any sequence of zero or more characters; here the pattern space holds a single line, so the match stays within that line.

If you want to print the whole block matching the conditions, one can use:
$ sed -n '/^XYZ.*7789/{:s;N;/\n9$/!bs;/\n1.*7345/p}' file
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
This works by reading the lines between ^XYZ.*7789 and ^9$ into the pattern
space, and then printing the whole thing if ^1.*7345 can be matched:
sed -n ' ' # -n disables printing
/^XYZ.*7789/{ } # Match line starting
# with XYZ that also contains 7789
:s; # Define label s
N; # Append next line to pattern space
/\n9$/!bs; # Goto s unless \n9$ matches
/\n1.*7345/p # Print whole pattern space
# if \n1.*7345 matches

I'd use awk:
awk -v rid=7789 -v fid=7345 -v RS='\n9\n' -F '\n' 'index($1, rid) { for(i = 2; i <= NF; ++i) { if(index($i, fid)) { print $i; next } } }' filename
This works as follows:
-v RS='\n9\n' is the meat of the whole thing. Awk separates its input into records (by default lines). This sets the record separator to \n9\n, which means that records are separated by lines with a single 9 on them. These records are further separated into fields, and
-F '\n' tells awk that fields in a record are separated by newlines, so that each line in a record becomes a field.
-v rid=7789 -v fid=7345 sets two awk variables rid and fid (meant by me as record identifier and field identifier, respectively; the names are arbitrary) to your search strings. You could encode these in the awk script directly, but this way makes it easier and safer to replace the values with those of shell variables (which I expect you'll want to do).
Then the code:
index($1, rid) { # In records whose first field contains rid
for(i = 2; i <= NF; ++i) { # Walk through the fields from the second to the last (NF)
if(index($i, fid)) { # When you find one that contains fid
print $i # Print it,
next # and continue with the next record.
} # Remove the "next" line if you want all matching
} # fields.
}
Note that multi-character record separators are not strictly required by POSIX awk, and I'm not certain if BSD awk accepts it. Both GNU awk and mawk do, though.
EDIT: Misread question the first time around.

An extendable awk script could be:
$ awk '/^9$/{s=0} s&&/7345/; /^XYZ/&&/7789/{s=1} ' file
This sets the flag s when a line starts with XYZ and contains 7789, resets it when a line is just 9, and prints a line when the flag is set and the line contains 7345.
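To get the header line as well, as in the expected output of the question, the flag idea can be extended along these lines. A sketch only: find_record and its argument order are hypothetical, both ids are matched as literal substrings with index(), and the header is printed once, before the first matching detail line.

```shell
# Print the XYZ header containing rid, followed by each detail line
# (starting with 1) of that record that contains fid.
# Hypothetical interface: find_record RID FID FILE
find_record() {
    awk -v rid="$1" -v fid="$2" '
        /^XYZ/ { hdr = (index($0, rid) ? $0 : ""); shown = 0; next }
        /^9$/  { hdr = ""; next }                  # end of record: reset
        hdr != "" && /^1/ && index($0, fid) {
            if (!shown) { print hdr; shown = 1 }   # header once per record
            print
        }
    ' "$3"
}
```

For the sample data, find_record 7789 7345 file would be expected to print the XYZNA0000778900Z header followed by the 17345... line.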

This might work for you (GNU sed):
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789/!b;/7345/p' file
Use the option -n for the grep-like nature of sed. Gather up records beginning with XYZ and ending in 9. Reject any records which do not have 7789 in the header. Print any remaining records that contain 7345.
If the 7345 will always follow the header, this could be shortened to:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789.*7345/p' file
If all records are well-formed (begin XYZ and end in 9) then use:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^[^\n]*7789.*7345/p' file

shell: how to read a certain column in a certain line into a variable

I want to extract the first column of the last line of a text file. Instead of writing the content of interest to another file and reading it in again, can I use some command to read it into a variable directly?
For example, if my file is like this:
...
123 456 789(this is the last line)
What I want is to read 123 into a variable in my shell script. How can I do that?

One approach is to extract the line you want, read its columns into an array, and emit the array element you want.
For the last line:
#!/bin/bash
# ^^^^- not /bin/sh, to enable arrays and process substitution
read -r -a columns < <(tail -n 1 "$filename") # put last line's columns into an array
echo "${columns[0]}" # emit the first column
Alternately, awk is an appropriate tool for the job:
line=2
column=1
var=$(awk -v line="$line" -v col="$column" 'NR == line { print $col }' <"$filename")
echo "Extracted the value: $var"
That said, if you're looking for a line close to the start of a file, it's often faster (in a runtime-performance sense) and easier to stick to shell builtins. For instance, to take the third column of the second line of a file:
{
read -r _ # throw away first line
read -r _ _ value _ # extract third value of second line
} <"$filename"
This works by using _s as placeholders for values you don't want to read.
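For the exact case in the question (first column of the last line), the awk variant can be reduced to a short sketch. The function name first_col_of_last_line is made up; the field is remembered on every line instead of reading $1 inside END, on the assumption that not every awk keeps the last record available there.

```shell
# Print the first whitespace-separated field of the last line of a file.
first_col_of_last_line() {
    awk '{ first = $1 } END { if (NR > 0) print first }' "$1"
}
```

It would be used as var=$(first_col_of_last_line "$filename").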

I guess by "first column" you mean "first word"?
If it is guaranteed that the last line doesn't start with a space, you can do
tail -n 1 YOUR_FILE | cut -d ' ' -f 1

You could also use sed:
$> var=$(sed -nr '$s/(^[^ ]*).*/\1/p' "file.txt")
The -nr tells sed not to print by default (-n) and to use extended regular expressions (-r, which avoids having to escape the parentheses as \( \)). The $ is an address that selects the last line. The regular expression anchors at the beginning of the line with ^, matches everything that is not a space with [^ ]*, and puts the result into a capture group ( ). It then gets rid of the rest of the line (.*) by replacing the whole line with the capture group \1, and the p flag prints the line.

Cut the first and the last part of a string in bash

I have strings in these formats:
aa_bb_cc_dd
aa_bb_cc_dd_ee_ff
I want to obtain:
bb_cc
bb_cc_dd_ee
I've tried 'cut', but I didn't manage to obtain what I wanted.

When using bash, you can use built-ins for this task:
strip_headtail() {
local s=$1
## strip the head
s=${s#*_}
## strip the tail
s=${s%_*}
echo "${s}"
}
strip_headtail aa_bb_cc_dd
strip_headtail aa_bb_cc_dd_ee_ff
You might want to check the bash manual (man bash) for more information on this.
Search for "Remove matching prefix pattern" and "Remove matching suffix pattern".
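A short illustration of how the four removal operators differ on one of the sample strings; the variable names are arbitrary. A single # or % removes the shortest match, while ## or %% removes the longest.

```shell
s=aa_bb_cc_dd_ee_ff

short_prefix=${s#*_}    # shortest prefix matching *_ removed
long_prefix=${s##*_}    # longest prefix matching *_ removed
short_suffix=${s%_*}    # shortest suffix matching _* removed
long_suffix=${s%%_*}    # longest suffix matching _* removed

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
# prints: bb_cc_dd_ee_ff, ff, aa_bb_cc_dd_ee, aa (one per line)
```

The function above therefore needs the short forms (# and %), so that only the first and the last component are removed.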

With awk:
$ echo "aa_bb_cc_dd
aa_bb_cc_dd_ee_ff" | awk -F_ '{for(i=1;i<NF;i++) $i=$(i+1); NF=NF-2}1' OFS=_
bb_cc
bb_cc_dd_ee
Explanation
-F_ and OFS=_ set input and output field separator as _.
{for(i=1;i<NF;i++) $i=$(i+1); NF=NF-2} sets each field to the next one, so the nth becomes the (n+1)th. Then it decreases the number of fields by 2.
With sed:
$ echo "aa_bb_cc_dd
aa_bb_cc_dd_ee_ff" | sed -e 's/^[^_]*_//' -e 's/_[^_]*$//'
bb_cc
bb_cc_dd_ee
Explanation
sed -e is used to run multiple commands.
's/^[^_]*_//' deletes from the beginning up to the first _.
's/_[^_]*$//' deletes from the last _ up to the end of the line.
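Assigning to NF is reportedly not handled the same way by every awk implementation, so a variant that builds the output string from fields 2 to NF-1 may be more portable. A sketch, with strip_first_last as a hypothetical name; lines with fewer than three fields produce an empty line.

```shell
# Join fields 2..NF-1 with '_' again, without modifying NF.
strip_first_last() {
    awk -F_ '{
        out = ""
        for (i = 2; i < NF; i++)
            out = out (i > 2 ? "_" : "") $i   # separator only between fields
        print out
    }'
}

printf '%s\n' aa_bb_cc_dd aa_bb_cc_dd_ee_ff | strip_first_last
```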
