Shell Removing part of a string with sed - bash

Good day.
I actually have 2 questions that are related to the sed command in shell and they are very similar.
The first question is how to use sed to take a file name and remove part of its name, like the example below:
Original file:
BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt
What I want the file name to look like:
BAT_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt
I just want the "MAN" part after the first underscore to be removed from the original file name.
The second question is about the following sed command that I found in a file a while ago:
random_string_var_name=$(echo $file_name | sed -r 's/^[^_]*_[^_]*_(.*_t[0-9]{1}).*(_[0-9]*)\.txt/_\1\2/')
This pretty much takes parts of a file name and saves them in a variable, like the example below:
Name of the file:
BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt
What that sed command gets:
_T_spades_proc_whatever_t6_12345
I get what it does, but I don't understand how that command works, so I would like to understand that.

I just want the "MAN" part after the first underline to be removed out of the original file name.
echo "BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt" | sed "s/MAN_//"
What if I want to always remove the first word after the first underscore and keep everything else?
echo "BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt" | sed -r 's/^([^_]*)_[^_]*(_.*)/\1\2/'
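What does this do: echo $file_name | sed -r 's/^[^_]*_[^_]*_(.*_t[0-9]{1}).*(_[0-9]*)\.txt/_\1\2/'
-r: runs sed in "extended regex" mode
^: matches the beginning of the line
[^_]* matches zero or more characters that are not underscores
_ matches a literal underscore
(.*_t[0-9]{1}) matches zero or more of anything, followed by _t and exactly one digit. This match is stored in capture group 1 (\1)
.* matches whatever lies between the two groups
(_[0-9]*) matches an underscore followed by zero or more digits. This match is stored in capture group 2 (\2); because the preceding .* is greedy, it ends up being the last such piece before .txt
\.txt matches the literal extension .txt
/_\1\2: replaces the whole filename with an underscore, followed by the match from the first pair of brackets and then the match from the second pair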
I recommend reading up on regular expressions. They are important and not really hard to get into
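To make the breakdown above concrete, here is a minimal sketch that prints each capture group separately and then the combined replacement. It assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do; -r is the older GNU spelling). The variable names re, group1, group2 and combined are only illustrative.

```shell
file_name='BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt'
re='^[^_]*_[^_]*_(.*_t[0-9]{1}).*(_[0-9]*)\.txt'

# Same regex three times, only the replacement differs.
group1=$(printf '%s\n' "$file_name" | sed -E "s/$re/\1/")
group2=$(printf '%s\n' "$file_name" | sed -E "s/$re/\2/")
combined=$(printf '%s\n' "$file_name" | sed -E "s/$re/_\1\2/")

printf 'group 1:  %s\n' "$group1"
printf 'group 2:  %s\n' "$group2"
printf 'combined: %s\n' "$combined"
```

Note that the combined result starts with an underscore, because the replacement itself begins with a literal _.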

I think that you may have something other than "MAN"; you may have "WOMAN", for example. So you can use:
file_name=BAT_WOMAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt
echo $file_name | sed 's/_[^_]*_/_/'
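As a quick check that this substitution does not depend on the word being removed, the sketch below wraps it in a small helper and runs it on both names. The function name strip_second_field is hypothetical and not part of the answer.

```shell
# Hypothetical helper: drop the second underscore-separated field of a name.
strip_second_field() {
    printf '%s\n' "$1" | sed 's/_[^_]*_/_/'
}

a=$(strip_second_field 'BAT_MAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt')
b=$(strip_second_field 'BAT_WOMAN_T_spades_proc_whatever_t6_12345_14785963214785_12345.txt')

printf '%s\n%s\n' "$a" "$b"
```

Without the g flag, sed only replaces the first match on the line, which is why the later underscores are left alone.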

Related

Use sed to count periods, commas, and numbers?

I have a file that looks like this:
19.217.179.33,175.176.12.8
253.149.205.57,174.210.221.195
222.118.178.218,255.99.100.202
241.55.199.243,167.98.204.104
38.224.198.117,21.11.184.68
Each line is 2 IP addresses, separated by a comma. So, each line should meet these requirements:
Has 1 comma.
Has 6 periods.
Has ONLY numbers, commas, and periods.
If a line is missing a period, has more or fewer than one comma, has a letter, is blank, or anything like that, it isn't correct. Basically I just want to use sed or something similar to loop through each line in the file and make sure each of them meets the above requirements.
Is this something that can be done with sed? I know you can use it to delete lines that do/don't have matching strings, but I wasn't sure about counting specific characters or verifying that a line only has certain characters.
Any help would be greatly appreciated. Thanks!
I think grep is a better tool for this. You just want to ensure that each line matches a particular regex, so invert the grep with -v and label the input invalid if any line gets output. Something like:
grep -qvE '^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$' input || echo input is valid
You can simplify that a bit:
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'
grep -qvE "^$IP,$IP$" input || echo input is valid
Or if you are more interested in invalid data:
grep -qvE "^$IP,$IP$" input && echo input is invalid
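If it is useful to see which lines fail, and not only whether any do, the same pattern can be run without -q. The sketch below is an illustration only: the sample lines (two from the question plus two made-up bad ones) and the temporary file are assumptions.

```shell
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'
pattern="^$IP,$IP\$"

tmp=$(mktemp)
printf '%s\n' \
    '19.217.179.33,175.176.12.8' \
    '38.224.198.117,21.11.184.68' \
    '1.2.3,4.5.6.7' \
    'host.example,1.2.3.4' > "$tmp"

# -v selects the lines that do NOT match; -n prefixes each with its line number.
bad_lines=$(grep -nvE "$pattern" "$tmp")

if [ -n "$bad_lines" ]; then
    printf 'invalid lines:\n%s\n' "$bad_lines"
else
    echo 'input is valid'
fi
rm -f "$tmp"
```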
What I'd do is to think up a regular expression that fits the 'proper' lines, and omits them from printing. Like this:
sed -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/d' file
Everything that remains is a wrong line.
Here's the recipe in more detail:
[0-9]{1,3} between one and three digits
\. literal period (just the period is a wildcard and matches any character)
(...){3} three repetitions of something, so together
([0-9]{1,3}\.){3}[0-9]{1,3} makes up something that looks like an IP address. (Though note that it doesn't enforce the <256 rule, so 999.999.999.999 matches.)
/^ ... $/ the match needs to start at the beginning of the line and run until its end.
'/ ... /d' print everything except lines that match what's inside the two slashes
-r is needed to recognise the {1,3} syntax.
This will find and print the lines that are wrong. If you want to delete the wrong lines, you can easily invert this:
sed -i.bak -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/p' file
-i.bak means keep a backup, but overwrite the input file
-n means don't output anything unless expressly directed to output, and
/ ... /p output all the lines that match this regex.
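The recipe above accepts octets such as 999. If the <256 rule matters, one possible tightening is to spell out the allowed octet ranges in the regex. This is a sketch, not part of the answer: the names OCTET and check are made up, and the pattern also rejects octets with leading zeros such as 01.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
IP="($OCTET\.){3}$OCTET"

# Hypothetical helper: succeeds when the line is exactly two valid IPs separated by a comma.
check() {
    printf '%s\n' "$1" | grep -qE "^$IP,$IP\$"
}

if check '222.118.178.218,255.99.100.202'; then r1=valid; else r1=invalid; fi
if check '999.999.999.999,1.2.3.4'; then r2=valid; else r2=invalid; fi

printf '%s %s\n' "$r1" "$r2"
```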
If you would like to display only whether the file contents are correct, you can use this command:
sed -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/!{a \
FILE IS INCORRECT
;q;};$aFILE IS OK'
It's a modified version of @chw21's answer, but it displays only the information text:
FILE IS INCORRECT, or
FILE IS OK.

Unix: Removing date from a string in single command

To satisfy some legacy code I had to append a date to a filename as shown below (it is definitely needed and I cannot modify the legacy code :( ). But I need to remove the date again within the same command, without going to a new line. The command is read from a text file, so everything has to happen within that single command.
$(echo "$file_name".`date +%Y%m%d` | sed 's/^prefix_//')
So here I am removing the prefix from the filename and appending a date to it. I also want to remove the date that I added. For example, prefix_filename.txt or prefix_filename.zip should give me the output below.
Expected output:
filename.txt
filename.zip
Current output:
filename.txt.20161002
filename.zip.20161002
Assuming all the files are formatted as filename.ext.date, you can pipe the output to the 'cut' command and keep only the 1st and 2nd fields:
~> X=filename.txt.20161002
~> echo $X | cut -d"." -f1,2
filename.txt
I am not sure that I understand your question correctly, but perhaps this does what you want:
$(echo "$file_name".`date +%Y%m%d` | sed -e 's/^prefix_//' -e 's/\.[^.]*$//')
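A slightly stricter variant of the same idea only strips the suffix when it really is a dot followed by eight digits, so a name without a date would be left untouched. This is a sketch under assumptions: the sample value of file_name is made up, and the eight-digit pattern presumes the %Y%m%d format from the question.

```shell
file_name='prefix_filename.zip'

# Append today's date, then strip the prefix and the 8-digit date in one pipeline.
result=$(echo "$file_name.$(date +%Y%m%d)" | sed -e 's/^prefix_//' -e 's/\.[0-9]\{8\}$//')

printf '%s\n' "$result"
```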
Sample input:
cat sample
prefix_original.txt.log.tgz.10032016
prefix_original.txt.log.10032016
prefix_original.txt.10032016
prefix_one.txt.10032016
prefix.txt.10032016
prefix.10032016
Use grep to match from the start of the line up to (but not including) a literal dot "." followed by a digit:
grep -oP '^.*(?=\.\d)' sample
prefix_original.txt.log.tgz
prefix_original.txt.log
prefix_original.txt
prefix_one.txt
prefix.txt
prefix
To also pass through lines that have no date suffix, perhaps the following should be used:
grep -oP '^.*(?=\.\d)|^.*$' sample
If I understand your question correctly, you want to remove the date part from a variable, AND you already know from the context that the variable DOES contain a date part and that this part comes after the last period in the name.
In this case, the question boils down to removing the last period and what comes after.
This can be done (POSIX shell, bash, zsh, ksh) by
filename_without=${filename_with%.*}
assuming that filename_with contains the filename which has the date part in the end.
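Put together with the prefix removal from the question, the parameter-expansion approach could look like the sketch below. The variable names stamped, no_date and result are illustrative; ${var%.*} removes the shortest trailing match of .* and ${var#prefix_} removes the leading prefix_.

```shell
file_name='prefix_filename.txt'

# Value as the legacy code needs it, e.g. prefix_filename.txt.20161002
stamped="$file_name.$(date +%Y%m%d)"

no_date=${stamped%.*}        # drop the last period and everything after it
result=${no_date#prefix_}    # drop the leading "prefix_"

printf '%s\n' "$result"
```

No external process is started here, which can matter when this runs once per file in a long loop.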
% cat example
filename.txt.20161002
filename.zip.20161002
% cat example | sed 's/\.[0-9]*$//'
filename.txt
filename.zip
%

Delete unknown amount of regexps using sed

I'm trying to read a bunch of regular expressions from a file (one per line) and then fit each of those regexps into something like /$regexp/d. I'm trying it this way:
while read line;do sed "/$line/d" to_delete.file >> output;done < to_delete.txt
But sed reports 'unknown command', even if I change the delimiter.
--- EDIT
The to_delete.txt file has slashes, but I'm already escaping them, and that's where I find the error.
To avoid problems with / in the regex, sed allows a different delimiter for the address, so you can use e.g. sed "\|$line|d".
Secondly, if you put the script in double quotes, you should add a space between the address and the action, e.g. "\|$line| d"
But I see a more general mistake in the script. On every iteration, the loop appends the whole of to_delete.file (except the lines matching that one regexp) to output. I suppose that is not what the OP wants.
If you'd like to exclude the content of to_delete.txt from to_delete.file, it can easily be done with grep (-F treats each line as a fixed string; drop it if the lines really are regular expressions):
grep -vFf "to_delete.txt" "to_delete.file" > output
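If sed is still preferred, one way to avoid the loop is to turn the pattern file into a sed script first and apply it in a single pass. The sketch below is an assumption-laden illustration: the sample patterns and lines are made up, it assumes no pattern contains a | and that there are no empty lines, and it uses grep -vf (without -F) only as a cross-check.

```shell
workdir=$(mktemp -d)
printf '%s\n' '^foo' 'bar/baz$' > "$workdir/to_delete.txt"
printf '%s\n' 'foo one' 'keep me' 'path bar/baz' 'another keeper' > "$workdir/to_delete.file"

# Turn every pattern line into a sed command of the form \|PATTERN|d
sed 's/.*/\\|&|d/' "$workdir/to_delete.txt" > "$workdir/delete.sed"

# Apply all delete commands in one pass over the data.
output=$(sed -f "$workdir/delete.sed" "$workdir/to_delete.file")

# Cross-check: grep with the same lines treated as regular expressions.
grep_output=$(grep -vf "$workdir/to_delete.txt" "$workdir/to_delete.file")

printf '%s\n' "$output"
rm -rf "$workdir"
```

Because | is used as the address delimiter, slashes inside the patterns need no escaping.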

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well; I just can't remember the syntax):
sed -n '/\(prod\|test\)\.us\.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param, we can restrict the substitution to the text before the equals sign:
sed -n 's/^\([^.]*\)\.us\.\([^=]*\)=/\1.\2=/p'
This only works on lines that start with any number of non-dot characters, followed by '.us.', followed by any number of characters before the '=' sign. This way we won't modify a '.us.' that happens to appear within a value.
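Since the question mentions a REGIONID variable, the region can be passed into the sed script instead of being hard-coded. The following is only a sketch: it assumes REGIONID holds an upper-case code such as US and that the config keys use the lower-case form, and the variable names region, config and result are illustrative.

```shell
REGIONID=US
region=$(printf '%s' "$REGIONID" | tr '[:upper:]' '[:lower:]')

config='test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value'

# Double quotes let the shell expand $region; the backslashes reach sed unchanged.
result=$(printf '%s\n' "$config" | sed -n "s/^\([^.]*\)\.$region\.\([^=]*\)=/\1.\2=/p")

printf '%s\n' "$result"
```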

How to apply two different sed commands on a line?

Q1:
I would like to edit a file containing a set of email ids such that all the domain names become generic.
Example,
peter#yahoo.com
peter#hotmail.co.in
philip#gmail.com
to
peter_yahoo#generic.com
peter_hotmail#generic.com
philip_gmail#generic.com
I used the following sed cmd to replace # with _
sed 's/#/_/' <filename>
Is there a way to append another sed cmd to the cmd mentioned above such that I can replace the last part of the domain names with #generic.com?
Q2:
so how do I approach this if I had text at the end of my domain names?
Example,
peter#yahoo.com,i am peter
peter#hotmail.co.in,i am also peter
To,
peter_yahoo.com#generic.com,i am peter
peter_hotmail.co.in#generic.com,i am also peter
I tried #(,) instead of #(.*)
It doesn't work and I can't think of any other solution.
Q3:
Suppose if my example is like this,
peter#yahoo.com
peter#hotmail.co.in,i am peter
I want my result to be as follows,
peter_yahoo.com#generic.com
peter_hotmail.co.in#generic.com,i am peter,i am peter
How do I do this with a single sed cmd?
The following cmd would result in,
sed -r 's!#(.*)!_\1#generic.com!' FILE
peter_yahoo.com#generic.com
peter_hotmail.co.in,i am peter,i am peter#generic.com
And the following cmd won't work on "peter#yahoo.com",
sed -r 's!#(.*)(,.*)!_\1#generic.com!' FILE
Thanks!!
Golfing =)
$ cat FILE
Example,
peter#yahoo.com
peter#hotmail.co.in
philip#gmail.com
$ sed -r 's!#(.*)!_\1#generic.com!' FILE
Example,
peter_yahoo.com#generic.com
peter_hotmail.co.in#generic.com
philip_gmail.com#generic.com
In reply to user1428900, here is some explanation:
sed -r # sed in extended regex mode
s # substitution
! # my delimiter; pick anything you want instead, as long as it is not part of the regex
#(.*) # a literal "#" + capture of the rest of the line
! # middle delimiter
_\1#generic.com # an "_" + the captured group N°1 + "#generic.com"
! # end delimiter
FILE # file-name
Extended mode isn't really needed there, consider the same following snippet in BRE (basic regex) mode :
sed 's!#\(.*\)!_\1#generic.com!' FILE
Edit to fit your new needs :
$ cat FILE
Example,
peter#yahoo.com,I am peter
peter#hotmail.co.in
philip#gmail.com
$ sed -r 's!#(.*),.*!_\1#generic.com!' FILE
Example,
peter_yahoo.com#generic.com
peter#hotmail.co.in
philip#gmail.com
If you want only email lines, you can do something like that :
sed -r '/#/s!#(.*),.*!_\1#generic.com!' FILE
The /#/ part means the substitution is only applied to lines containing the character #.
Edit2:
If you want to keep the end of the lines, as your new comments said:
sed -r 's!#(.*)(,.*)!_\1#generic.com\2!' FILE
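For the mixed case in Q3, where some lines have trailing text and some do not, one option is to capture only up to the first comma instead of requiring a comma. This is a sketch and not the answerer's command; it assumes the domain itself never contains a comma, and it uses -E, which GNU and BSD sed accept in place of -r.

```shell
input='peter#yahoo.com
peter#hotmail.co.in,i am peter'

# [^,]* stops at the first comma (or at end of line when there is none),
# so whatever follows the comma is left in place.
result=$(printf '%s\n' "$input" | sed -E 's!#([^,]*)!_\1#generic.com!')

printf '%s\n' "$result"
```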
You can run multiple commands with:
sed -e cmd -e cmd
or
sed -e 'cmd;cmd'
So, in your case you could do:
sed -e 's/#/_/' -e 's/_.*/_generic.com/' filename
but it seems easier to just do
sed 's/#.*/_generic.com/' filename
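To illustrate that the two ways of chaining commands are equivalent, here is a small sketch on one sample address. The second expression, which appends #generic.com at the end of the line, is an illustrative choice matching the output format asked for in Q2; it is not the substitution used in this answer.

```shell
addr='peter#yahoo.com'

# Two expressions given with separate -e options ...
with_e=$(printf '%s\n' "$addr" | sed -e 's/#/_/' -e 's/$/#generic.com/')

# ... and the same two commands in one script, separated by a semicolon.
with_semicolon=$(printf '%s\n' "$addr" | sed 's/#/_/;s/$/#generic.com/')

printf '%s\n%s\n' "$with_e" "$with_semicolon"
```

The commands run in order on each line, so the second one sees the result of the first.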
sed 's/\(.*\)#\([^.]*\)\..*/\1_\2#generic.com/'
An expression in escaped parentheses, \(...\), remembers the portion of the input it matches. The "\1" is the first remembered portion, and the "\2" is the second.
The expression \(.*\) before the # is used to remember the beginning of the email id (peter, peter, philip).
The expression \([^.]*\)\. after the # is used to remember the first part of the domain (yahoo, hotmail, gmail). In other words, it says: take everything between the # and the first period. (A greedy \(.*\)\. would run up to the last period instead and capture hotmail.co.)
The expression .* at the end is used to match the rest of the domain after that period (com, co.in, com).
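The choice of capture after the # matters for multi-part domains. The sketch below compares a greedy \(.*\) with \([^.]*\), which stops at the first period, on one address; the variable names greedy and first_dot are illustrative.

```shell
addr='peter#hotmail.co.in'

# Greedy: \(.*\)\. extends to the LAST period, so ".co" ends up in the capture.
greedy=$(printf '%s\n' "$addr" | sed 's/\(.*\)#\(.*\)\..*/\1_\2#generic.com/')

# [^.]* cannot cross a period, so the capture ends at the FIRST one.
first_dot=$(printf '%s\n' "$addr" | sed 's/\(.*\)#\([^.]*\)\..*/\1_\2#generic.com/')

printf 'greedy:    %s\n' "$greedy"
printf 'first dot: %s\n' "$first_dot"
```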
