I have a file with many (~2k) lines similar to:
117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
....
991 VALID|AUTHEN tcp:10.19.16.21:58332 uniqueID=smUNIX
I want only the IP address (10.19.16.21 shown above) and the value of the uniqueID (smUNIX shown above)
I am able to get close with:
cat t.txt|cut -f2- -d':'
10.22.36.69:46474 uniqueID=smwUNIX
...
I am on Linux using bash.
Using awk:
awk '{split($3,a,":"); split($4,b,"="); print a[2] " " b[2]}'
By default it splits on whitespace; with a little extra code you can also split the subfields.
Update:
Even easier, overriding the default delimiter:
awk -F '[:=]' '{print $2 " "$4}'
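For the first sample line, the [:=] separator yields these fields (a quick sketch against the example data, assuming the input file is t.txt as in the question):
# 117 VALID|AUTHEN tcp:10.92.163.5:64127 uniqueID=nwCelerra
# $1 = "117 VALID|AUTHEN tcp"   $2 = "10.92.163.5"
# $3 = "64127 uniqueID"         $4 = "nwCelerra"
awk -F '[:=]' '{print $2 " " $4}' t.txt
10.92.163.5 nwCelerra
10.19.16.21 smUNIX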
Using grep and sed:
grep -oP "^\d+ [A-Z]+\|[A-Z]+ \w+:\K(.*)" | sed "s/ uniqueID=/ /g"
outputs:
10.92.163.5:64127 nwCelerra
10.19.16.21:58332 smUNIX
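If you also want to drop the port (the question asks for just the IP), one extra substitution in the same sed call should do it; a sketch, checked only against the sample lines and again assuming t.txt from the question:
grep -oP "^\d+ [A-Z]+\|[A-Z]+ \w+:\K(.*)" t.txt | sed -e "s/:[0-9]* / /" -e "s/uniqueID=//"
10.92.163.5 nwCelerra
10.19.16.21 smUNIX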
I have a CSV file with increasing integers in the first column, followed by some text.
1,text1a,text1b
2,text2a,text2b
3,text3a,text3b
4,text4a,text4b
...
I would like to join lines 1+2, 3+4, etc. and write the result to a new CSV file.
The desired output would be
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
...
A second option without the numbers would be great as well. The actual input would be
1,text,text,,,text#text.com,2,text.text,text
2,text,text,,,text#text.com,3,text.text,text
3,text,text,,,text#text.com,2,text.text,text
4,text,text,,,text#text.com,3,text.text,text
Desired outcome
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
$ pr -2ats, file
gives you (-2 prints in two columns, -a fills them across rather than down, -t omits headers, and -s, joins the columns with a comma):
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
UPDATE
For the second part:
$ cut -d, -f2- file | pr -2ats,
will give you
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
awk solution:
awk '{ printf "%s%s",$0,(!(NR%2)? ORS:",") }' input.csv > output.csv
The output.csv content:
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
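The same one-liner spelled out with comments (identical logic, just easier to read):
awk '{
    printf "%s", $0                        # print the record without a newline
    printf "%s", (NR % 2 == 0 ? ORS : ",") # newline after every 2nd record, comma otherwise
}' input.csv > output.csv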
----------
Additional approach (to skip numbers):
awk -F',' '{ printf "%s%s",$2 FS $3,(!(NR%2)? ORS:FS) }' input.csv > output.csv
The output.csv content:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
3rd approach (for your extended input):
awk -F',' '{ sub(/^[0-9]+,/,"",$0); printf "%s%s",$0,(!(NR%2)? ORS:FS) }' input.csv > output.csv
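Against the extended input from the question, this should yield the desired pairing (a sketch, traced by hand rather than run):
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text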
With bash, cut, sed and paste (GNU sed: '2~2d' keeps the odd-numbered lines, '1~2d' keeps the even-numbered ones):
paste -d, <(cut -d, -f 2- file | sed '2~2d') <(cut -d, -f 2- file | sed '1~2d')
Output:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
I hoped to get started with something as simple as
printf '%s,%s\n' $(<inputfile)
This turns out wrong when you have spaces inside your text fields.
The improvement is rather a mess:
source <(echo "printf '%s,%s\n' $(sed 's/.*/"&"/' inputfile|tr '\n' ' ')")
Skipping the first field can be done in the same sed command:
source <(echo "printf '%s,%s\n' $(sed -r 's/([^,]*),(.*)/"\2"/' inputfile|tr '\n' ' ')")
EDIT:
This solution will fail when the text contains special characters, so you should use a simple solution such as
cut -d, -f2- file | paste -d, - -
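For the sample input that gives (sketch):
cut -d, -f2- file | paste -d, - -
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b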
Suppose there is a file.txt with the following content:
ABC/xyz
ABC/xyz/rst
EFG/ghi
I need to write a shell script that extracts the unique words before the first /.
As output, I want ABC and EFG written to a file.
You can extract the first word with cut (slash as delimiter), then pipe to sort with the -u (for "unique") option:
$ cut -d '/' -f 1 file.txt | sort -u
ABC
EFG
To get the output into a file, just redirect by appending > filename to the command. (Or pipe to tee filename to see the output and get it in a file.)
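For example (output.txt is just an illustrative name):
cut -d '/' -f 1 file.txt | sort -u > output.txt
cut -d '/' -f 1 file.txt | sort -u | tee output.txt    # also shows the result on screen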
Try this:
cat file.txt | tr -s "/" ' ' | awk -F " " '{print $1}' | sort | uniq > outfile.txt
Another interesting variation:
awk -F'/' '{print $1 | "sort -u"}' file.txt > outfile.txt
Not that it matters here, but being able to pipe and redirect within awk can be very handy.
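For instance, you can redirect to per-key files from inside awk; a hypothetical sketch (the out_*.txt names are made up):
awk -F'/' '{ print $0 > ("out_" $1 ".txt") }' file.txt    # writes ABC/... lines to out_ABC.txt, EFG/... lines to out_EFG.txt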
Another easy way (note that uniq only removes adjacent duplicates, so this works as-is only when equal prefixes are grouped together):
cut -d"/" -f1 file.txt | uniq > out.txt
You can use a mix of cut and sort like so:
cut -d '/' -f 1 file.txt | sort -u > newfile.txt
The cut part grabs everything before the first slash / on each line.
The sort -u part sorts those strings, removes any duplicates, and the result is redirected into newfile.txt.
I have a logfile continuously filling with stuff.
I wish to monitor this file, grep for a specific line and then extract and use parts of that line in a curl command.
I had a look at How to grep and execute a command (for every match)
This would work in a script, but I wonder if it is possible to achieve this with the one-liner below using xargs or something else?
Example:
Tue May 01|23:59:11.012|I|22|Event to process : [imsi=242010800195809, eventId = 242010800195809112112, msisdn=4798818181, inbound=false, homeMCC=242, homeMNC=01, visitedMCC=238, visitedMNC=01, timestamp=Tue May 12 11:21:12 CEST 2015,hlr=null,vlr=4540150021, msc=4540150021 eventtype=S, currentMCC=null, currentMNC=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wsms2.EventProcessor|processEvent|139
Extract the fields I want and separate them with semicolons:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2";"$4";"$12}' | tr -cd '[[:digit:].\n.;]'
Curl command, e.g. something like:
http://user:pass#www.some-url.com/services/myservice?msisdn=...&imsi=...&vlr=...
Thanks!
Try this:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2" "$4" "$12; }' | tr -cd '[[:digit:].\n. ]' |while read msisdn imsi vlr ; do curl "http://user:pass#www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr" ; done
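The same pipeline broken over lines for readability (the placeholder URL and credentials are kept from the question):
tail -f file.log \
  | grep "Event to process" \
  | awk -F'=' '{print $2" "$4" "$12}' \
  | tr -cd '[[:digit:].\n. ]' \
  | while read msisdn imsi vlr; do
      curl "http://user:pass#www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr"
    done
Note that grep block-buffers its output when writing into a pipe, so with a live tail -f you may want GNU grep's --line-buffered option (and possibly stdbuf -oL for awk) to see matches promptly.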
I need to extract domains from a file.
domains.txt:
eofjoejfej fjpejfe http://ejej.dm1.com dêkkde
ojdoed www.dm2.fr doejd eojd oedj eojdeo
http://dm3.org ieodhjied oejd oejdeo jd
ozjpdj eojdoê jdeojde jdejkd http://dm4.nu/
io d oed 234585 http://jehrhr.dm5.net/hjrehr
[2014-05-31 04:05] eohjpeo jdpiehd pe dpeoe www.dm6.uk/jehr
I need to get:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.co.uk
Try this sed command:
$ sed -r 's/.*(dm[^\.]*\.[^/ ]*).*/\1/g' file
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
This is a bit long, but should work:
grep -oE "http[^ ]*|www[^ ]*" file | sed -e 's|http://||g' -e 's/^www\.//g' -e 's|/.*$||g' -re 's/^.*\.([^\.]+\.[^\.]+$)/\1/g'
Output:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Unrefined method using grep and sed:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' file | sed 's/www.//'
Outputs:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
An answer with gawk:
LC_ALL=C gawk -v RS="[[:space:]]+" -v FS="." '
{
# Remove the http prefix if it exists
sub( /http:[/][/]/, "" )
# Remove the path
sub( /[/].*$/, "" )
# Does it look like a domain?
if ( /^([[:alnum:]]+[.])+[[:alnum:]]+$/ ) {
# Print the last 2 components of the domain name
print $(NF-1) "." $NF
}
}' file
Some notes:
Using RS="[[:space:]]+" allows us to process each whitespace-separated token independently.
LC_ALL=C forces [[:alnum:]] to be ASCII-only (this is no longer necessary with gawk 4+).
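Run against the sample file, this should print (a sketch, checked only against the example data):
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk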
To be able to remove subdomains you have to validate them first, because simply cutting columns would break multi-part TLDs. It takes three steps.
Step 1: clean domains.txt
grep -oiE '([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}' domains.txt | sed -r 's:(^\.*?(www|ftp|ftps|ftpes|sftp|pop|pop3|smtp|imap|http|https)[^.]*?\.|^\.\.?)::gi' | sort -u > capture
Content of capture:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
Step 2: Download and filter the TLD list:
wget https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
grep -v "//" public_suffix_list.dat | sed '/^$/d; /#/d' | grep -v -P "[^a-z0-9_.-]" | sed 's/^\.//' | awk '{print "." $1}' | sort -u > tlds.txt
So far you have two lists (capture and tlds.txt)
Step 3: Download and run this python script:
wget https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/tools/parse_domain_tld.py && chmod +x parse_domain_tld.py && python parse_domain_tld.py | sort -u
Output:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Source: blackweb
This can be useful:
grep -Pho "(?<=http://)[^(\"|'|[:space:])]*" file.txt | sed 's/www.//g' | grep -Eo '[[:alnum:]]{1,}\.[[:alnum:]]{1,}[.]{0,1}[[:alnum:]]{0,}' | sort | uniq
The first grep matches hosts like 'http://www.example.com', even when enclosed in single or double quotes, but extracts only the domain part. Second, sed removes 'www.'. The third command extracts domain names made of two or three dot-separated blocks of alphanumeric characters. Finally, the output is sorted so that only a single instance of each domain is displayed.