nslookup/dig/drill commands on a file that contains websites, to add IP addresses - shell

UPDATE: Still open for solutions using nslookup without parallel, dig, or drill.
I need to write a script that scans a file containing a web page address on each line and appends to each line the IP address corresponding to that name, using the nslookup command. The script looks like this at the moment:
#!/bin/bash
while read ip
do
nslookup "$ip" |
awk '/Name:/{val=$NF;flag=1;next} /Address:/ &&
flag{print val,$NF;val=""}' |
sed -n 'p;n'
done < is8.input
The input file contains the following websites:
www.edu.ro
vega.unitbv.ro
www.wikipedia.org
The final output should look like:
www.edu.ro 193.169.21.181
vega.unitbv.ro 193.254.231.35
www.wikipedia.org 91.198.174.192
The main problem I have with the current state of the script is that it takes the canonical names from nslookup (which is fine for www.edu.ro) instead of the aliases when those are available. My output looks like this:
www.edu.ro 193.169.21.181
etc.unitbv.ro 193.254.231.35
dyna.wikimedia.org 91.198.174.192
I was thinking about implementing an if-else for the aliases, but I don't know how to do that with the current command. The script can also be changed entirely if anyone has a better understanding of how to format nslookup output to match the desired output above.

Minimalist workaround quasi-answer. Here's a one-liner replacement for the script using GNU parallel, host (less work to parse than nslookup), and sed:
parallel "host {} 2> /dev/null |
sed -n '/ has address /{s/.* /'{}' /p;q}'" < is8.input
...or using nslookup at the cost of added GNU sed complexity.
parallel "nslookup {} 2> /dev/null |
sed -n '/^A/{s/.* /'{}' /;T;p;q;}'" < is8.input
...or using xargs:
xargs -I '{}' sh -c \
"nslookup {} 2> /dev/null |
sed -n '/^A/{s/.* /'{}' /;T;p;q;}'" < is8.input
Output of any of those:
www.edu.ro 193.169.21.181
vega.unitbv.ro 193.254.231.35
www.wikipedia.org 208.80.154.224

Replace your complete nslookup line with:
echo "$IP $(dig +short "$IP" | grep -m 1 -E '^[0-9.]{7,15}$')"

This might work for you (GNU sed and host):
sed '/\S/{s#.*#host & | sed -n "/ has address/{s///p;q}"#e}' file
For all non-empty lines: invoke the host command on the supplied host name and pipe the result to another invocation of sed, which strips out the surrounding text and quits after the first result.
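If the GNU-specific e flag isn't available, the same idea can be written as a plain loop; this is only a sketch using host and awk on the same assumptions as above:
while read -r name
do
    [ -n "$name" ] || continue
    addr=$(host "$name" 2> /dev/null | awk '/ has address /{print $NF; exit}')
    [ -n "$addr" ] && echo "$name $addr"
done < is8.input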

Related

User input into variables and grep a file for pattern

Hi!
So I am trying to run a script which looks for a string pattern. For example, I want to find two words, located separately, in a file containing:
"I like toast, toast is amazing. Bread is just toast before it was toasted."
I want to invoke it from the command line using something like this:
./myscript.sh myfile.txt "toast bread"
My code so far:
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w "$keyword_first""$keyword_second" )
echo $find_keyword
I have tried a few different ways. Directly from the command line I can make it run using:
cat myfile.txt | grep -E 'toast|bread'
I'm trying to put the user input into variables and use the variables to grep the file.
You seem to be looking simply for
grep -E "$2|$3" "$1"
What works on the command line will also work in a script, though you will need to switch to double quotes for the shell to replace variables inside the quotes.
In this case, the -E option can be replaced with multiple -e options, too.
grep -e "$2" -e "$3" "$1"
You can pipe to grep twice:
find_keyword=$(cat $text_file | grep -w "$keyword_first" | grep -w "$keyword_second")
Note that your search word "bread" is not found because the string contains the uppercase "Bread". If you want to find the words regardless of this, you should use the case-insensitive option -i for grep:
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
In a full script:
#!/bin/bash
#
# usage: ./myscript.sh myfile.txt "toast" "bread"
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
echo $find_keyword

How to extract active domains

Is there any bash command/script in Linux that can extract the active domains from a long list?
For example, I have a CSV file (domains.csv) in which 55 million domains are listed, one per line; we need only the active domains in a CSV file (active.csv).
Here "active" means a domain that has at least a web page, not just a domain that is or is not expired. For example, whoisdatacenter.info is not expired, but it has no web page, so we consider it non-active.
I checked Google and Stack Overflow. I saw we can check a domain in two ways, like:
$ curl -Is google.com | grep -i location
Location: http://www.google.com/
or
nslookup google.com | grep -i name
Name: google.com
but I got no idea how can I write a program in bash for this for 55 million domains.
The commands below give no result, so I concluded that nslookup and curl are the way to get the result:
$ nslookup whoisdatacenter.info | grep -i name
$ curl -Is whoisdatacenter.info | grep -i location
The first 25 lines:
$ head -25 domains.csv
"
"0----0.info"
"0--0---------2lookup.com"
"0--0-------free2lookup.com"
"0--0-----2lookup.com"
"0--0----free2lookup.com"
"0--1.xyz"
"0--123456789.com"
"0--123456789.net"
"0--6.com"
"0--7.com"
"0--9.info"
"0--9.net"
"0--9.world"
"0--a.com"
"0--a.net"
"0--b.com"
"0--m.com"
"0--mm.com"
"0--reversephonelookup.com"
"0--z.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info"
The code I am running:
while read line;
do nslookup "$line" | awk '/Name/';
done < domains.csv > active3.csv
The result I am getting:
sh -x ravi2.sh
+ read line
+ nslookup ''
+ awk /Name/
nslookup: '' is not a legal name (unexpected end of input)
+ read line
+ nslookup '"'
+ awk /Name/
+ read line
+ nslookup '"0----0.info"'
+ awk /Name/
+ read line
+ nslookup '"0--0---------2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0-------free2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0-----2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0----free2lookup.com"'
+ awk /Name/
Still, active3.csv is empty.
The script below is working, but something is stopping the bulk lookup; it's either on my host or something else.
while read line
do
nslookup $(echo "$line" | awk '{gsub(/\r/,"");gsub(/.*-|"$/,"")} 1') | awk '/Name/{print}'
done < input.csv >> output.csv
The bulk nslookup shows errors like the one below:
server can't find facebook.com\013: NXDOMAIN
[Solved]
Ravi's script is working perfectly fine. I was running it on my Mac, which gave the nslookup error; on the CentOS Linux server I work on, nslookup works great with Ravi's script.
Thanks a lot!!
EDIT: Please try my edited solution below, based on the OP's samples.
while read line
do
nslookup $(echo "$line" | awk '{gsub(/\r/,"");gsub(/.*-|"$/,"")} 1') | awk '/Name/{found=1;next} found && /Address/{print $NF}'
done < "Input_file"
Could you please try the following.
The OP has Control-M characters in the Input_file, so run the following command to remove them first:
tr -d '\r' < Input_file > temp && mv temp Input_file
Then run the following code:
while read line
do
nslookup "$line" | awk '/Name/{found=1;next} found && /Address/{print $NF}'
done < "Input_file"
I am assuming that since you are passing domain names, you need to get their addresses (IP addresses) in the output. Also, since you are using a huge Input_file, it may be a bit slow in producing output, but trust me, this is a simpler way.
nslookup simply indicates whether or not the domain name has a record in DNS. Having one or more IP addresses does not automatically mean you have a web site; many IP addresses are allocated for different purposes altogether (but might coincidentally host a web site for another domain name entirely!)
(Also, nslookup is not particularly friendly to scripting; you will want to look at dig instead for automation.)
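As a rough sketch of what a dig-based pass could look like (this only tells you that the name resolves, not that a web site exists; resolved.csv is just an arbitrary output name, and it assumes the carriage-return problem mentioned below is already fixed):
while read -r domain
do
    ip=$(dig +short "$domain" A | grep -m 1 -E '^[0-9.]+$')
    [ -n "$ip" ] && printf '%s %s\n' "$domain" "$ip"
done < domains.csv > resolved.csv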
There is no simple way to visit 55 million possible web sites in a short time, and probably you should not be using Bash if you want to. See e.g. https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html for an exposition of various approaches based on Python.
The immediate error message indicates that you have DOS carriage returns in your input file; this is a common FAQ which is covered very well over at Are shell scripts sensitive to encoding and line endings?
You can run multiple curl instances in parallel, but you will probably saturate your network eventually. Experiment with various degrees of parallelism; maybe split your file into smaller pieces and run each piece on a separate host with a separate network connection (perhaps in the cloud). To quickly demonstrate,
tr -d '\r' <file |
xargs -P 256 -i sh -c 'curl -Is {} | grep Location'
to run 256 instances of curl in parallel. You will still need to figure out which output corresponds to which input, so maybe refactor to something like
tr -d '\r' <file |
xargs -P 256 -i sh -c 'curl -Is {} | sed -n "s/Location/{}:&/p"'
to print the input domain name in front of each output.
(Maybe also note that just a domain name is not a complete URL. curl will helpfully attempt to add a "http://" in front and then connect to that, but that still doesn't give you an accurate result if the domain only has a "https://" website and no redirect from the http:// one.)
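If you want to account for that, a sketch might probe both schemes explicitly (check_domain is a hypothetical helper, not part of the original answer; curl exits with status 0 whenever it gets any HTTP response, regardless of the status code):
check_domain() {
    d=$1
    # HEAD request, follow redirects, give up after 10 seconds
    if curl -sIL --max-time 10 -o /dev/null "https://$d" ||
       curl -sIL --max-time 10 -o /dev/null "http://$d"
    then
        echo "$d"
    fi
}
check_domain example.com >> active.csv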
If you are on a Mac, where xargs doesn't understand -i, try -I {} or something like
tr -d '\r' <file |
xargs -P 256 sh -c 'for url; do curl -Is "$url" | sed -n "s/Location/$url:&/p"; done' _
The examples assume you didn't already fix the DOS carriage returns once and for all; you probably really should (and consider dropping Windows from the equation entirely).

Using Bash Less and Grep together [duplicate]

Is that possible to use grep on a continuous stream?
What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.
I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.
Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)
tail -f file | grep --line-buffered my_pattern
It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).
I use the tail -f <file> | grep <pattern> all the time.
It will wait till grep flushes, not till it finishes (I'm using Ubuntu).
I think that your problem is that grep uses some output buffering. Try
tail -f file | stdbuf -o0 grep my_pattern
it will set output buffering mode of grep to unbuffered.
If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:
tail -c +0 -f <file> | grep --line-buffered <pattern>
The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.
In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.
If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:
tail -f /var/log/some.log | grep --line-buffered foo | grep bar
You may consider this answer an enhancement. Usually I am using
tail -F <fileName> | grep --line-buffered <pattern> -A 3 -B 5
-F is better in case the file is rotated (-f will not work properly if the file is rotated).
-A and -B are useful for getting the lines just before and after the pattern occurrence; these blocks will appear between dashed-line separators.
But personally I prefer doing the following:
tail -F <file> | less
This is very useful if you want to search inside streamed logs, i.e. go back and forward and look deeply.
Didn't see anyone offer my usual go-to for this:
less +F <file>
ctrl + c
/<search term>
<enter>
shift + f
I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.
sed (the stream editor) would be a better choice:
tail -n0 -f <file> | sed -n '/search string/p'
and then if you wanted the tail command to exit once you found a particular string:
tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'
Obviously a bashism: $BASHPID will be the process id of the tail command. The sed command is next after tail in the pipe, so the sed process id will be $BASHPID+1.
Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.
This one command works for me (SUSE):
mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN >> logins_to_mail
collecting logins to mail service
Coming somewhat late to this question, and considering this kind of work an important part of a monitoring job, here is my (not so short) answer...
Following logs using bash
1. The tail command
This command is a little more powerful than the already-published answers suggest.
Difference between follow option tail -f and tail -F, from manpage:
-f, --follow[={name|descriptor}]
output appended data as the file grows;
...
-F same as --follow=name --retry
...
--retry
keep trying to open a file if it is inaccessible
This means: by using -F instead of -f, tail will re-open the file(s) when they are removed (on log rotation, for example).
This is useful for watching log files over many days.
Ability to follow more than one file simultaneously
I've already used:
tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
/var/log/apache2/{,ssl_,other_vhosts_}access.log \
/var/log/pure-ftpd/transfer.log
For following events through hundreds of files... (consider the rest of this answer to understand how to make it readable... ;)
Using the -n switch (don't use -c, which counts bytes, not lines!). By default tail will show the last 10 lines. This can be tuned:
tail -n 0 -F file
will follow the file, but only new lines will be printed;
tail -n +0 -F file
will print the whole file before following its progression.
2. Buffer issues when piping
If you plan to filter the output, consider buffering! See the -u option for sed, --line-buffered for grep, or the stdbuf command:
tail -F /some/files | sed -une '/Regular Expression/p'
is (besides being a lot more efficient than using grep) a lot more reactive than if you don't use the -u switch in the sed command.
tail -F /some/files |
sed -une '/Regular Expression/p' |
stdbuf -i0 -o0 tee /some/resultfile
3. Recent journaling systems
On recent systems, instead of tail -f /var/log/syslog you have to run journalctl -xf, in nearly the same way...
journalctl -axf | sed -une '/Regular Expression/p'
But read the man page; this tool was built for log analysis!
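For example, to follow a single unit's log and filter it (nginx.service here is just a placeholder for whichever unit you care about):
journalctl -f -u nginx.service | grep --line-buffered 'Regular Expression'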
4. Integrating this in a bash script
Colored output of two files (or more)
Here is a sample script watching many files, coloring the output of the 1st file differently from the others:
#!/bin/bash
tail -F "$#" |
sed -une "
/^==> /{h;};
//!{
G;
s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
p;}"
They work fine on my host, running:
sudo ./myColoredTail /var/log/{kern.,sys}log
Interactive script
Are you watching logs in order to react to events?
Here is a little script that plays a sound when a USB device appears or disappears; the same script could send mail, or perform any other interaction, like powering on the coffee machine...
#!/bin/bash
exec {tailF}< <(tail -F /var/log/kern.log)
tailPid=$!
while :;do
read -rsn 1 -t .3 keyboard
[ "${keyboard,}" = "q" ] && break
if read -ru $tailF -t 0 _ ;then
read -ru $tailF line
case $line in
*New\ USB\ device\ found* ) play /some/sound.ogg ;;
*USB\ disconnect* ) play /some/othersound.ogg ;;
esac
printf "\r%s\e[K" "$line"
fi
done
echo
exec {tailF}<&-
kill $tailPid
You can quit by pressing the Q key.
You certainly won't succeed with
tail -f /var/log/foo.log | grep --line-buffered string2search
when you use "colortail" as an alias for tail, e.g. in bash:
alias tail='colortail -n 30'
You can check with
type tail
If this outputs something like
tail is aliased to `colortail -n 30'
then you have your culprit :)
Solution:
remove the alias with
unalias tail
Ensure that you're using the 'real' tail binary with this command:
type tail
which should output something like:
tail is /usr/bin/tail
and then you can run your command
tail -f foo.log |grep --line-buffered something
Good luck.
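As an alternative to removing the alias, you can also bypass it for a single invocation; either of these (standard shell behaviour, not part of the original answer) skips alias expansion:
command tail -f foo.log | grep --line-buffered something
\tail -f foo.log | grep --line-buffered something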
Use awk (another great utility) instead of grep where you don't have the line-buffered option! It will continuously stream your data from tail.
This is how you would use grep:
tail -f <file> | grep pattern
This is how you would use awk
tail -f <file> | awk '/pattern/{print $0}'
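One caveat worth adding (my own note, not from the original answer): awk buffers its own output when writing to a pipe, so if you send the awk output into yet another command, flush explicitly. Here /var/log/app.log and ERROR are just example values:
tail -f /var/log/app.log | awk '/ERROR/ { print; fflush() }' | tee errors.log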

bash: cURL from a file, increment filename if duplicate exists

I'm trying to curl a list of URLs to aggregate the tabular data on them from a set of 7000+ URLs. The URLs are in a .txt file. My goal was to cURL each line and save them to a local folder after which I would grep and parse out the HTML tables.
Unfortunately, because of the format of the URLs in the file, duplicates exist (example.com/State/City.html). When I ran a short while loop, I got back fewer than 5500 files, so there are at least 1500 dupes in the list. As a result, I tried to grep the "/State/City.html" section of the URL and pipe it to sed to remove the / and substitute a hyphen to use with curl -O. cURL was trying to grab
Here's a sample of what I tried:
while read line
do
FILENAME=$(grep -o -E '\/[A-z]+\/[A-z]+\.htm' | sed 's/^\///' | sed 's/\//-/')
curl $line -o '$FILENAME'
done < source-url-file.txt
It feels like I'm missing something fairly straightforward. I've scanned the man page because I worried I had confused -o and -O which I used to do a lot.
When I run the loop in the terminal, the output is:
Warning: Failed to create the file State-City.htm
I think you don't need a multitude of seds and greps; just one sed should suffice:
urls=$(echo -e 'example.com/s1/c1.html\nexample.com/s1/c2.html\nexample.com/s1/c1.html')
for u in $urls
do
FN=$(echo "$u" | sed -E 's/^(.*)\/([^\/]+)\/([^\/]+)$/\2-\3/')
if [[ ! -f "$FN" ]]
then
touch "$FN"
echo "$FN"
fi
done
This script should work and will also avoid downloading the same file multiple times.
Just replace the touch command with your curl one.
First: you didn't pass the URL info to grep.
Second: try this line instead:
FILENAME=$(echo $line | egrep -o '\/[^\/]+\/[^\/]+\.html' | sed 's/^\///' | sed 's/\//-/')
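Putting both fixes together (and adding the skip-if-exists check from the other answer), the loop might look like this; a sketch, with the variables quoted:
while read -r line
do
    FILENAME=$(echo "$line" | egrep -o '\/[^\/]+\/[^\/]+\.html' | sed 's/^\///' | sed 's/\//-/')
    if [ -n "$FILENAME" ] && [ ! -f "$FILENAME" ]
    then
        curl "$line" -o "$FILENAME"
    fi
done < source-url-file.txt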

Get all running ports using UNIX command and form a new command based on result

I need to get all the listening ports on the server, as with the Unix command 'netstat -an | grep tcp46'.
OUTPUT:
tcp46 0 0 *.8009 *.* LISTEN
tcp46 0 0 *.8080 *.* LISTEN
Then I need to iterate over the ports and form a command like the one below.
curl http://serverhost.com:${iterative ports}/app/version
eg.
curl http://serverhost.com:8080/app/version
Can anyone please help me with the shell script or any easy commands available?
My netstat -an output looks different from yours, so I'm going in blind here:
for i in $(netstat -an | grep tcp46 | cut -d' ' -f18 | sed 's/*.//g'); do echo curl http://serverhost.com:$i/app/version; done
That is a one-liner that should work, but it assumes the output is exactly as you showed; if the number of spaces changes, it won't work correctly. Just remove the echo if you want to run the command directly.
You can use regular expressions (REGEX) in sed to get to the output you need from the grep input stream. Then in a bash for loop, execute your curl command for every port you find. Note: the following doesn't check for duplicate ports.
for port in $(netstat -an | grep tcp46 | sed 's/[a-zA-Z]\{1,3\}[ 0-9.]*:\{1,3\}//g' | sed 's/ \+.*//g');
do echo "curl http://serverhost.com:$port/app/version" >> commandFile.txt;
done
I invoke sed twice: once to remove the first portion of the string and a second time to remove the trailing portion.
The output of this script is sent to commandFile.txt in your current directory.
If you would rather run each curl command rather than send to a file, simply remove the echo as follows:
for port in $(netstat -an | grep tcp46 | sed 's/[a-zA-Z]\{1,3\}[ 0-9.]*:\{1,3\}//g' | sed 's/ \+.*//g');
do curl http://serverhost.com:$port/app/version;
done
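A variant that is less sensitive to the exact column spacing, and that also drops duplicate ports, could use awk on the local-address column instead (a sketch assuming the same netstat -an output format shown in the question):
netstat -an | awk '$1 == "tcp46" && $NF == "LISTEN" { n = split($4, a, "."); print a[n] }' |
sort -un |
while read -r port
do
    curl "http://serverhost.com:$port/app/version"
done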
