Strip only domain name out of input url string - bash

I did a bit of searching already but cannot seem to find an elegant way of doing this. I'd like to process a list like the one below and end up with a plain text output file containing only the domain name: no http:// and nothing from the first / after the domain onwards.
So a list like this:
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp
I want to end up with a plain text output file like this:
7wind.ru
aldersgatencsc.org
amunow.org

Given:
$ echo "$txt"
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp
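You can use cut. Splitting on /, field 1 is http:, field 2 is the empty string between the two slashes, and field 3 is the domain: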
$ echo "$txt" | cut -d'/' -f3
7wind.ru
aldersgatencsc.org
amunow.org
Or, if your content is in a file:
$ cut -d'/' -f3 file
7wind.ru
aldersgatencsc.org
amunow.org
Then redirect that to the file you want:
$ cut -d'/' -f3 file >new_file
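Since the question is tagged bash, the same result can also be sketched with bash parameter expansion alone, without starting an external command. This assumes one URL per line; url_host is a hypothetical helper name, and IPv6 literal hosts such as http://[::1]/ are not handled.

```shell
#!/usr/bin/env bash
# url_host (hypothetical helper): read URLs on stdin, print one host per line.
# Assumes one URL per line; IPv6 literal hosts are not handled.
url_host() {
  local url host
  while IFS= read -r url; do
    host=${url#*://}    # drop the scheme and "://" (no-op if there is none)
    host=${host%%/*}    # drop the path and everything after it
    host=${host%%\?*}   # drop a query string that directly follows the host
    host=${host##*@}    # drop optional user:password@ credentials
    host=${host%%:*}    # drop an optional :port
    printf '%s\n' "$host"
  done
}
```

Usage: url_host < file > new_file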

awk -F/ '{ print $3 }' Input_file > newfile
This prints the 3rd field, using / as the field delimiter.

$ sed -r 's#^[^/]*//([^/]*).*#\1#' Input_file
7wind.ru
aldersgatencsc.org
amunow.org

Try the following awk solutions.
Solution 1:
awk '{sub(/.*\/\//,"");sub(/\/.*/,"");print}' Input_file
Solution 2:
awk '{match($0,/\/\/[^\/]*/);print substr($0,RSTART+2,RLENGTH-2)}' Input_file
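Both awk answers assume every line has a scheme, and the greedy .*\/\/ in solution 1 also skips past any // that appears later in the path. A sketch that anchors the scheme match at the start of the line, and also tolerates a missing scheme, a query string right after the host, or a :port, might look like this (url_host_awk is a hypothetical name; user:password@ credentials are not handled):

```shell
# url_host_awk (hypothetical helper): print the host of each URL read on stdin.
url_host_awk() {
  awk '{
    sub(/^[A-Za-z][A-Za-z0-9+.-]*:\/\//, "")  # strip a leading scheme:// only
    sub(/[\/?#].*/, "")                       # strip path, query or fragment
    sub(/:[0-9]+$/, "")                       # strip an optional :port
    print
  }'
}
```

Usage: url_host_awk < Input_file > newfile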

This works by stripping the protocol and :// first, then everything from the next slash onwards:
sed "s|.*://||; s|/.*||" url-list.txt
Add -i (GNU sed) to edit the file in place.
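A list of URLs often contains the same domain several times. Assuming you want each domain only once, that sed command could be extended to drop an optional :port and piped through sort -u. unique_domains is a hypothetical helper name, and user:password@ credentials are not handled.

```shell
# unique_domains (hypothetical helper): read URLs on stdin, print each domain once.
unique_domains() {
  # strip scheme, then path, then :port; sort -u removes duplicates
  sed 's|.*://||; s|/.*||; s|:.*||' | sort -u
}
```

Usage: unique_domains < url-list.txt > domains.txt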

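Try this regular expression:
((http|https):\/\/)?([a-zA-Z0-9.-]+)(\/)?
Take the first match and use capture group 3. Digits and hyphens must be in the host character class, otherwise a host such as 7wind.ru is not captured.
Be careful, though: it will also accept invalid URLs.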
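In bash, that kind of regular expression can be applied with [[ =~ ]], and "group 3" is then ${BASH_REMATCH[3]}. This sketch anchors the pattern with ^ so the match starts at the beginning of the URL, allows digits and hyphens in the host, and keeps the pattern in a variable so no slashes need escaping. host_of is a hypothetical function name.

```shell
#!/usr/bin/env bash
# host_of (hypothetical helper): print the host of the URL given as $1.
host_of() {
  # anchored with ^; in a bash regex stored in a variable, / needs no escaping
  local re='^((http|https)://)?([a-zA-Z0-9.-]+)(/)?'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[3]}"   # capture group 3 is the host
  else
    return 1
  fi
}
```

Usage: host_of 'http://7wind.ru/file/Behind+the+dune/' prints 7wind.ru. Note that $re must stay unquoted inside [[ ]]; quoting it would turn the pattern into a literal string match.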

Related

Get only part of file using sed or awk

I have a file which contains text as follows:
Directory /home/user/ "test_user"
bunch of code
another bunch of code
How can I get from this file only the /home/user/ part?
I've managed to use awk -F '"' 'NR==1{print $1}' file.txt to get rid of the rest of the file, and I'm getting output like this:
Directory /home/user/
How can I change this command to get only /home/user/ part? I'd like to make it as simple as possible. Unfortunately, I can't modify this file to add/change the content.
This should be the fastest, which is noticeable if your file is large:
awk '{print $2; exit}' file
It prints the second field of the first line and stops without reading the rest of the file.
With awk it should be:
awk 'NR==1{print $2}' file.txt
Setting the field delimiter to " was wrong, since it splits the line into these fields:
$1 = 'Directory /home/user/ ' (note the trailing space)
$2 = 'test_user'
$3 = '' (empty)
The default field separator (a single space, which splits on runs of blanks and tabs and ignores leading and trailing whitespace) splits it like this:
$1 = 'Directory'
$2 = '/home/user/'
$3 = '"test_user"'
As an alternate, you can use head and cut:
$ head -n 1 file | cut -d' ' -f2
Not sure why you are using -F'"', as that changes the delimiter. If you remove it, $2 gets you what you want.
awk 'NR==1{print $2}' file.txt
You can also use awk to print only when the line contains /home/user/, instead of counting records:
awk '/\/home\/user\//{print $2}' file.txt
In this case, if the line were buried in the file, or if there were several matching lines, you would get the path for every occurrence, wherever it is.
Or combine grep with awk:
grep Directory file.txt|awk '{print $2}'
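Without awk, bash's read builtin can do the same whitespace split while reading only the first line of the file. first_path is a hypothetical function name, and the sketch assumes the path itself contains no spaces.

```shell
# first_path (hypothetical helper): print the 2nd whitespace-separated word
# of the first line of the file given as $1.
first_path() {
  local label path rest
  read -r label path rest < "$1"   # read consumes only the first line
  printf '%s\n' "$path"
}
```

Usage: first_path file.txt prints /home/user/ for the file in the question.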

How to find a word and copy the following word with the shell (Ubuntu)?

Is it possible to find a word in a file and then copy the word that follows it?
Example:
abc="def"
bla="no_need"
line_i_need="information_i_need"
still_no_use="blablabla"
So the third line is exactly the line I need!
Is it possible to find this word with shell commands?
Thanks for your support.
Using awk with a custom field separator is much simpler:
awk -F '[="]+' '$1=="line_i_need"{print $2}' file
information_i_need
-F '[="]+' sets the field separator to one or more = or " characters.
Use grep:
grep line_i_need file_name
It will print:
line_i_need="information_i_need"
This finds the line with grep and cuts out the second column, using " as the separator:
grep line_i_need file_name | cut -d '"' -f2
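sed can also find the line and extract the value in one step by capturing what is between the quotes. get_value is a hypothetical helper name; it assumes key="value" lines with no spaces around =, and a key that contains no regex metacharacters.

```shell
# get_value (hypothetical helper): print the quoted value of key $1 in file $2.
get_value() {
  # With key line_i_need, the sed script becomes: s/^line_i_need="\(.*\)"$/\1/p
  sed -n "s/^$1=\"\(.*\)\"\$/\1/p" "$2"
}
```

Usage: get_value line_i_need file prints information_i_need.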

Excluding '#' comments from a sed selection

I'm trying to get a config value from a yml file, but there is another line with the same key that is commented out. That is:
...
#database_name: prod
database_name: demo
database_user: root
database_password: password
...
I'm getting all values with this sed/awk command:
DATABASE_NAME=$(sed -n '/database_name/p' "$CONFIG_PATH" | awk -F' ' '{print $2}');
Now, if I do that, I get the right values for the user and password, but two values for the name (the commented-out one as well).
Question is:
How do I exclude '#' comments from my sed selection?
You might as well use awk for the whole operation:
DATABASE_NAME=$(awk -F' ' '$1!~/^#/ && /database_name/{print $2}' "$CONFIG_PATH")
This will exclude all lines that start with # (comments).
If there is always a character before the d, use /[^#]database_name/p.
If not, you can use /\(^\|[^#]\)database_name/p (the \| alternation is a GNU sed extension).
Grouping commands in braces is standard sed, but BSD/macOS sed wants a ; before the closing brace:
sed -n '/database_name/ {/^[[:blank:]]*#/!p;}'
For lines matching "database_name", print the line only if it does NOT begin with optional blanks followed by a hash.
If the lines may start with blank spaces, strip only the leading ones first:
sed 's/^[[:blank:]]*//' file.txt | awk '/^database/{print}'
I ended up using @etan-reisner's solution.
Here is another solution to my particular problem I found along the way:
DATABASE_NAME=$(cat "$CONFIG_PATH" | grep -v '^[[:space:]]*#' | sed -n '/database_host/p' | awk -F' ' '{print $2}');
This filters out every line that starts with optional whitespace followed by a hash.
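Another option is to compare the first field with the exact key, so a commented line (whose first field starts with #) and keys that merely contain database_name (a hypothetical database_name_old, say) are both ignored. get_config_value is a hypothetical helper; the sketch only handles flat key: value lines and prints just the first word of the value, so it is not a YAML parser.

```shell
# get_config_value (hypothetical helper): print the value of key $1 in file $2.
# Handles flat "key: value" lines only; prints the first word of the value.
get_config_value() {
  awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}
```

Usage: DATABASE_NAME=$(get_config_value database_name "$CONFIG_PATH")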

Display all fields except the last

I have a file as shown below:
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line; the delimiter is . and the number of fields is not constant.
Can anybody help me with an awk or sed solution? I can't use perl here.
Both the sed and the awk solutions below work regardless of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexps; on some versions it is -E, so check man sed. If your version of sed has no such flag, just escape the parentheses:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution does a greedy match up to the last . and captures everything before it, then replaces the whole line with only the captured part (n-1 fields). Use the -i option if you want the changes written back to the file.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution simply prints n-1 fields; to store the changes back in the file, use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character that is not a dot (or a newline). You need to exclude the dot because the repetition operator * is greedy: it selects the leftmost, longest possible match, so \..*$ would match from the first dot onwards.
With cut on the reversed string:
rev yourFile | cut -d "." -f 2- | rev
If you want to keep the trailing ".", use:
awk '{sub(/[^.]*$/, ""); print}' your_file
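In pure bash, the parameter expansion ${line%.*} removes the shortest suffix matching .*, which is exactly the last dot and the field after it. drop_last_field is a hypothetical helper; a line that contains no dot is printed unchanged.

```shell
# drop_last_field (hypothetical helper): read lines on stdin and print each
# one without its last "."-separated field.
drop_last_field() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "${line%.*}"   # strip the shortest suffix matching ".*"
  done
}
```

Usage: drop_last_field < file > newfile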

Basic stream/sed? bash script, perform substring on each line

I know this is basic, but I couldn't find the simplest way to iterate through a file with hundreds of lines and extract a substring.
If I have a file:
ABCY uuuu
UNUY uuuu
...
I want to end up with:
uuuu
uuuu
....
Ideally, take a substring:
skip the first 5 characters ({5}) and output the rest.
You don't need sed:
cut -c6-9 yourfile
It would be easier to use cut or awk. Assuming that your fields are separated by a space and you want the second field, you can use:
cut -d' ' -f2 file.txt
awk '{print $2}' file.txt
You can also use cut and awk to extract substrings:
cut -c6- file.txt
awk '{print substr($0,6);}' file.txt
However, if you really want to iterate through the file and extract substrings, you can use a while loop:
while IFS= read -r line
do
printf '%s\n' "${line:5}"
done < file.txt
If you really want to use sed, you could try:
sed -r 's/^.{5}//' file
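The tools above count positions differently, which is easy to trip over: bash substring offsets start at 0, while cut columns and awk's substr() start at 1. So skipping the first 5 characters is ${line:5} in bash but position 6 in cut and awk. A small sketch using the sample line from the question:

```shell
#!/usr/bin/env bash
line='ABCY uuuu'
printf '%s\n' "${line:5}"                              # bash: 0-based offset 5
printf '%s\n' "$line" | cut -c6-                       # cut: 1-based column 6 onwards
printf '%s\n' "$line" | awk '{ print substr($0, 6) }'  # awk: 1-based position 6
```

Each of the three commands prints uuuu.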
