Using Sed to pull out base URL of full website - windows

I'm looking to cat a file containing something like this:
http://www.site1.com/d23bdbd0fbc517d34, r N 4
https://www.site2.com/file/d23bdbd0fbc517d34, X
http://www.site3.com/file/d23bdbd0fbc517d34
https://www.site4.edu/site/d23bdbd0fbc517d34
and I need to use sed to get this kind of output:
www.site1.com
www.site2.com
www.site3.com
www.site4.edu
Help! I can't get it fully working right. Technically I'm using sed.exe for Windows but it's probably very similar.

$ cat file.txt
http://www.site1.com/d23bdbd0fbc517d34, r N 4
https://www.site2.com/file/d23bdbd0fbc517d34, X
http://www.site3.com/file/d23bdbd0fbc517d34
https://www.site4.edu/site/d23bdbd0fbc517d34
$ sed -r 's#.*//([^ /]+).*#\1#g' file.txt
www.site1.com
www.site2.com
www.site3.com
www.site4.edu
If your sed doesn't have the -r switch:
sed 's#.*//\([^ /]\+\)[/ ].*#\1#g' file.txt
Moreover, under Windows (IIRC), use double quotes instead of single quotes.
So maybe:
sed.exe "s#.*//\([^ /]\+\)[/ ].*#\1#g" file.txt
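For sed builds that have neither -r nor the GNU \+ operator, the same idea can probably be written with plain POSIX basic regular expressions. The sketch below is one way to do it, not the exact command from the answer; the function name extract_host is made up for illustration, and it assumes each line contains a single // right before the host.

```shell
# Hypothetical helper: print the host part of each URL read from stdin.
# [^ /][^ /]* is the portable spelling of "one or more characters
# that are neither a space nor a slash".
extract_host() {
  sed 's#.*//\([^ /][^ /]*\).*#\1#'
}

printf '%s\n' 'http://www.site1.com/d23bdbd0fbc517d34, r N 4' \
              'https://www.site4.edu/site/d23bdbd0fbc517d34' | extract_host
# prints www.site1.com and www.site4.edu, one per line
```

On Windows the sed script would need double quotes, as noted above.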

Another variant is:
sed '\#.*www[.]\([^/]*\).*# s::\1:'
will display
site1.com
site2.com
site3.com
site4.edu
tested with
#ThinkPad-T420:~$ sed --version
GNU sed version 4.2.1
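The variant above leans on two sed features: \#...# is an address written with # as its delimiter, and the empty pattern in s::\1: reuses the most recently used regular expression. Read that way, a single substitution should give the same result; the function name strip_www_host below is hypothetical.

```shell
# Hypothetical helper: same regex as the address above, used directly in s.
# Lines without "www." are passed through unchanged, as in the original.
strip_www_host() {
  sed 's#.*www[.]\([^/]*\).*#\1#'
}

printf '%s\n' 'http://www.site3.com/file/d23bdbd0fbc517d34' | strip_www_host
# prints site3.com
```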

Related

Combine multiple sed commands into one

I have a file example.txt, I want to delete and replace fields in it.
The following commands work, but in a very messy way; unfortunately I'm a rookie with sed.
The commands I used:
sed 's/\-I\.\.\/\.\.\/\.\.\//\n/g' example.txt > example.txt1
sed 's/\-I/\n/g' example.txt1 > example.txt2
sed '/^[[:space:]]*$/d' example.txt2 > example.txt3
sed 's/\.\.\/\.\.\/\.\.//g' example.txt3 > example.txt
and then I'm deleting all the unnecessary files.
I'm trying to get the following result:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
The file looks like this:
-I../../../Common/Component -I../../../Common/Component1 -I../../../Common/Component2 -I../../../Common/Component3 -I../../../Common/Component4 -I../../../Common/Component5 -I../../../Common/Component6 -IComp -IApp ../../../
I want to know the best way to transform the input format into the output format with a standard text-processing tool, in a single call to sed or awk.
With your shown samples, please try the following awk code. Written and tested in GNU awk.
awk -v RS='-I\\S+' 'RT{sub(/^-I.*Common\//,"Common/Components/",RT);sub(/^-I/,"",RT);print RT}' Input_file
output with samples will be as follows:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
Explanation: this relies on GNU awk. RS (the record separator) is set to the regex -I\\S+, that is, -I followed by everything up to the next whitespace, so the text matched by each separator is available in RT. In the main program, if RT is not empty, the leading -I through Common/ is replaced with Common/Components/ in RT, then any remaining leading -I is removed, and RT is printed.
If you don't REALLY want the string /Components to be added in the middle of some output lines then this may be what you want, using any awk in any shell on every Unix box:
$ awk -v RS=' ' 'sub("^-I[./]*","")' file
Common/Component
Common/Component1
Common/Component2
Common/Component3
Common/Component4
Common/Component5
Common/Component6
Comp
App
That would fail if any of the paths in your input contained blanks but you don't show that as a possibility in your question so I assume it can't happen.
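Since the question also asks about sed, the same split-on-blanks idea might be sketched with tr feeding a single sed call. This is an adaptation of the awk logic above rather than something taken from the answers; the function name list_includes is made up, and it shares the assumption that no path contains a blank.

```shell
# Hypothetical helper: read the flags line on stdin, print one path per line.
list_includes() {
  # Put each blank-separated field on its own line, then keep only the
  # fields that start with -I, stripping -I and any leading . and / characters.
  tr ' ' '\n' | sed -n 's#^-I[./]*##p'
}

printf '%s\n' '-I../../../Common/Component -IComp -IApp ../../../' | list_includes
# prints Common/Component, Comp and App, one per line
```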
What about
sed -i 's/\-I\.\.\/\.\.\/\.\.\//\n/g
s/\-I/\n/g
/^[[:space:]]*$/d
s/\.\.\/\.\.\/\.\.//g' example.txt

sed - remove dates from each line of file

I am trying to get rid of the dates, all of them, from 2015 to the present (2017).
I want to rename each foo_data_$date to just foo_data_*. I just need the file name, not all the individual dates.
I do not understand the regex for sed - I can do it in perl with perl -nle 'print /(foo_data_)\d+\.txt/' but can't figure out how to do it with sed.
I want to do it in sed because I have been using sed -i flag and changing the file in place.
cat /tmp/foo | head | sed -e 's/foo_data_20*txt/foo_data_\*/g'
foo_data_20150901.txt
foo_data_20150902.txt
foo_data_20150906.txt
foo_data_20150907.txt
foo_data_20150908.txt
foo_data_20150909.txt
foo_data_20150912.txt
You can just run sed like this.
sed -e 's/foo_data_[0-9]*/foo_data_/g'
Now, to match only the dates from 2015 through 2017, this will do it.
sed -e 's/foo_data_201\(5\|6\|7\)[0-9]*/foo_data_/g'
Then you will remove the dates from the file names in your file.
You don't need to mention foo_data:
sed -i 's/201[567][01][0-9][0-3][0-9]//'
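Your command was wrong: in /foo_data_20*txt/ the * applies only to the preceding 0, so the pattern matches foo_data_2, then zero or more 0s, then txt (something like foo_data_2000000000000txt), which is not what your file names look like.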
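To make the difference concrete, here is a small sketch of the substitution the question seems to be after, using only POSIX basic regular expressions. The function name strip_dates is hypothetical, and the literal * in the replacement is simply what the question asked to see in the output.

```shell
# Hypothetical helper: replace the digits and .txt suffix with a literal *.
strip_dates() {
  # [0-9][0-9]* matches one or more digits; \. matches the literal dot.
  # In the replacement part of s, * is an ordinary character.
  sed 's/foo_data_[0-9][0-9]*\.txt/foo_data_*/'
}

printf '%s\n' foo_data_20150901.txt foo_data_20150902.txt | strip_dates
# prints foo_data_* twice
```

Piping the result through sort -u would collapse the repeated lines into one.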
If you just want to rename the files, most Linux distros (assuming you're on Linux) have a rename utility that handles Perl regular expressions just fine:
pax> touch pax_100.txt ; touch pax_200.txt
pax> rename -n 's/_(\d)/_diablo_$1/' pax*
rename(pax_100.txt, pax_diablo_100.txt)
rename(pax_200.txt, pax_diablo_200.txt)
The -n option shows what will happen rather than doing the rename. Once you're satisfied, simply remove it.
Oh, and one final note. If you remove the dates from all those file names, they'll all have the same file name. Unless your file names are just test data, that's probably going to need some further thought on your part.

sunOS's sed in bash script - replace pattern: nothing changes

ANSWER:
As mlv and sorontar mentioned, my sed uses BRE and doesn't support | (pipe), so in my case it is possible to use something like:
sed "s/\( [namevlu]*=\"\)BASE\.$str1/\1BASE\.$str2/g"
which matches name=" and value=" but not other=" and values=". The bracket expression ( [namevlu]) contains only the characters I need. OK, it is not as safe as it could be, but I don't expect anything like valuenm=" to exist. If someone needs an exact regex, it takes two (or more) sed commands.
ORIGINAL QUESTION:
I need to replace one string with another, but to be safe I need to check a specific context.
for example:
blah val1="BASE.OLD_TEXT_OR_SO" blabla
blah val2="BASE.OLD_SOMETHING" ...
I want to change it to
blah val1="BASE.NEW_TEXT_OR_SO" blabla
blah val2="BASE.NEW_SOMETHING" ...
this script doesn't change anything:
#!/bin/bash
...
str1="OLD_"
str2="NEW_"
sed "s/\(name=\"|value=\"\)BASE\.$str1/\1BASE\.$str2/g" input.file > output.file
but a later, similar sed works OK:
sed "s/\(<Tag>\)[A-Z0-9\-\._|]*\(<\/Tag>\)/\1$otherStr\2/g" input.file > output.file
The output file still has BASE.OLD_ :/
Also, when I try it on the console, I get the same (no) result. I think there is something wrong in the search pattern, but I have no idea what.
$ str0='blah name="BASE.OLD_TEXT_OR_SO" blabla
> blah value="BASE.OLD_SOMETHING" ...
> blah other="BASE.OLD_SOMETHING" ...
> blah values="BASE.OLD_SOMETHING" ...'
$ echo $str0 | sed "s/\(val1=\"|val2=\"\)BASE\.$str1/\1BASE\.$str2/g"
The regex was tested on an online tester, where it works fine.
(name="|value=")BASE\.OLD_
\1BASE\.NEW_
system:
SunOS 5.11
GNU bash 4.1.11(1)-release
Sed 4.2.1
Thanks in advance.
I'm not sure SunOS sed includes |. That would be added with -r which I don't think SunOS sed supports.
But in this case, you should be able to do:
sed "s/\(val[12]=\"\)BASE.$str1/\1BASE.$str2/"
If you can't use val[12], then I don't think it can be done in sed. But it's not too hard in perl:
perl -pe "if (/val1=\"BASE\.$str1/ || /val2=\"BASE\.$str1/) {s/BASE\.$str1/BASE.$str2/;}"
Basic sed supports only POSIX basic regular expressions (BRE), in which the alternation operator (|) does not exist.
Your regex must be written as:
sed "s/\(val[12]=\"BASE\.\)${str1}/\1${str2}/g" input.file
If the version of sed you use supports extended regular expressions (ERE, which accept |), then you may use:
sed -E "s/(val(1|2)=\"BASE\.)${str1}/\1${str2}/g" input.file
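Where neither -E nor -r is available, one portable route is the one the question's author hints at: one BRE substitution per attribute name, passed as separate -e expressions. A minimal sketch, assuming the attribute names name and value from the question's console test and a space in front of each; the function name replace_base is made up.

```shell
str1='OLD_'
str2='NEW_'

# Hypothetical helper: rewrite BASE.$str1 to BASE.$str2, but only right
# after ' name="' or ' value="'. No alternation is needed, so this is plain BRE.
replace_base() {
  sed -e "s/\( name=\"BASE\.\)$str1/\1$str2/g" \
      -e "s/\( value=\"BASE\.\)$str1/\1$str2/g"
}

printf '%s\n' 'blah name="BASE.OLD_TEXT_OR_SO" blabla' \
              'blah values="BASE.OLD_SOMETHING" ...' | replace_base
# the name= line becomes BASE.NEW_TEXT_OR_SO; the values= line is left alone
```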

sed: Using a variable in sed

I am writing a script that changes the paths in a file. This is what I want to do. If I have a file that has the string "/path/to/incorrect/location", I want to change it to "/path/to/correct/location". I have tried using sed to do this by doing this...
sed -i "s/$badpath/$goodpath/g" file
Doing this though, does absolutely nothing and I cannot for the life of me figure it out. I do need to use the variable for the bad path, but the good path can be written out (since that one never changes). How would one do this?
MORE INFORMATION:
I am using GNU sed, so the -i flag is valid.
Seems to work for me, after fixing the delimiters:
$ cat sample
/path/to/incorrect/location
$ badpath="/path/to/incorrect/location"
$ goodpath="/path/to/correct/location"
$ sed -i "s|$badpath|$goodpath|" sample
$ cat sample
/path/to/correct/location
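The fix works because the paths contain /, which would otherwise end the pattern early when / is also the s delimiter; any character that does not occur in the paths can serve instead. A small sketch with a made-up function name replace_path, assuming neither path contains | or regex metacharacters that would need escaping:

```shell
# Hypothetical helper: $1 = path to find, $2 = replacement; reads stdin.
# | is used as the s delimiter so the slashes in the paths stay literal.
replace_path() {
  sed "s|$1|$2|g"
}

printf '%s\n' '/path/to/incorrect/location' |
  replace_path '/path/to/incorrect/location' '/path/to/correct/location'
# prints /path/to/correct/location
```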

Case-insensitive search and replace with sed

I'm trying to use SED to extract text from a log file. I can do a search-and-replace without too much trouble:
sed 's/foo/bar/' mylog.txt
However, I want to make the search case-insensitive. From what I've googled, it looks like appending i to the end of the command should work:
sed 's/foo/bar/i' mylog.txt
However, this gives me an error message:
sed: 1: "s/foo/bar/i": bad flag in substitute command: 'i'
What's going wrong here, and how do I fix it?
Update: Starting with macOS Big Sur (11.0), sed now does support the I flag for case-insensitive matching, so the command in the question should now work (BSD sed doesn't report its version, but you can go by the date at the bottom of the man page, which should be March 27, 2017 or more recent); a simple example:
# BSD sed on macOS Big Sur and above (and GNU sed, the default on Linux)
$ sed 's/ö/#/I' <<<'FÖO'
F#O # `I` matched the uppercase Ö correctly against its lowercase counterpart
Note: I (uppercase) is the documented form of the flag, but i works as well.
Similarly, starting with macOS Big Sur (11.0) awk now is locale-aware (awk --version should report 20200816 or more recent):
# BSD awk on macOS Big Sur and above (and GNU awk, the default on Linux)
$ awk 'tolower($0)' <<<'FÖO'
föo # non-ASCII character Ö was properly lowercased
The following applies to macOS up to Catalina (10.15):
To be clear: On macOS, sed - which is the BSD implementation - does NOT support case-insensitive matching - hard to believe, but true. The formerly accepted answer, which itself shows a GNU sed command, gained that status because of the perl-based solution mentioned in the comments.
To make that Perl solution work with foreign characters as well, via UTF-8, use something like:
perl -C -Mutf8 -pe 's/öœ/oo/i' <<< "FÖŒ" # -> "Foo"
-C turns on UTF-8 support for streams and files, assuming the current locale is UTF-8-based.
-Mutf8 tells Perl to interpret the source code as UTF-8 (in this case, the string passed to -pe) - this is the shorter equivalent of the more verbose -e 'use utf8;'. Thanks, Mark Reed.
(Note that using awk is not an option either, as awk on macOS (i.e., BWK awk and BSD awk) appears to be completely unaware of locales altogether - its tolower() and toupper() functions ignore foreign characters (and sub() / gsub() don't have case-insensitivity flags to begin with).)
A note on the relationship of sed and awk to the POSIX standard:
BSD sed and awk limit their functionality mostly to what the POSIX sed and
POSIX awk specs mandate, whereas their GNU counterparts implement many more extensions.
Editor's note: This solution doesn't work on macOS (out of the box), because it only applies to GNU sed, whereas macOS comes with BSD sed.
Capitalize the 'I'.
sed 's/foo/bar/I' file
Another work-around for sed on Mac OS X is to install gsed from MacPorts or HomeBrew and then create the alias sed='gsed'.
If you are doing pattern matching first, e.g.,
/pattern/s/xx/yy/g
then you want to put the I after the pattern:
/pattern/Is/xx/yy/g
Example:
echo Fred | sed '/fred/Is//willma/g'
returns willma; without the I, it returns the string untouched (Fred).
The sed FAQ addresses the closely related case-insensitive search. It points out that a) many versions of sed support a flag for it and b) it's awkward to do in sed, you should rather use awk or Perl.
But to do it in POSIX sed, they suggest three options (adapted for substitution here):
Convert to uppercase and store original line in hold space; this won't work for substitutions, though, as the original content will be restored before printing, so it's only good for insert or adding lines based on a case-insensitive match.
Maybe the possibilities are limited to FOO, Foo and foo. These can be covered by
s/FOO/bar/;s/[Ff]oo/bar/
To search for all possible matches, one can use bracket expressions for each character:
s/[Ff][Oo][Oo]/bar/
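Writing the bracket expressions by hand gets tedious for longer words, so the pattern can be generated instead. The sketch below is only an illustration: ci_pattern is a made-up helper, it handles ASCII letters only, and it passes any other character through unescaped, so it is meant for plain words.

```shell
# Hypothetical helper: turn a word such as foo into [Ff][Oo][Oo].
ci_pattern() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      # Wrap ASCII letters in a two-character bracket expression.
      if (c ~ /^[A-Za-z]$/) out = out "[" toupper(c) tolower(c) "]"
      else out = out c
    }
    print out
  }'
}

printf '%s\n' 'FOO Foo foo fOo' | sed "s/$(ci_pattern foo)/bar/g"
# prints bar bar bar bar
```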
The Mac version of sed seems a bit limited. One way to work around this is to use a linux container (via Docker) which has a useable version of sed:
cat your_file.txt | docker run -i busybox /bin/sed -r 's/[0-9]{4}/****/Ig'
Use following to replace all occurrences:
sed 's/foo/bar/gI' mylog.txt
I had a similar need, and came up with this:
this command to simply find all the files:
grep -i -l -r foo ./*
this one to exclude this_shell.sh (in case you put the command in a script called this_shell.sh), tee the output to the console to see what happened, and then use sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding sed the grep result means only the files containing the target text are touched (which likely improves performance as well).
Be sure to back up your files and test before using. It may not work in some environments for files with embedded spaces. (?)
Following should be fine:
sed -i 's/foo/bar/gi' mylog.txt

Resources