Adjustment to a grep / sed output - bash

I'm trying to edit a grep/sed output - currently it is as follows:
grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' aniko.sql | grep -Pao '\(([^,]*,){2}\K[^,]*'
'yscr_bbYcqN'
'yscr_bbS4kf'
'yscr_bbhrSZ'
'yscr_bbBl0C'
'yscr_bbrsKX'
I then want to add a test_ prefix to them all; however, I can only get it to amend one of them - so:
#!/bin/sh
read -p "enter the cPanel username: " cpuser
cd "/home/$cpuser/public_html"
dbusers=($(grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' aniko.sql | grep -Pao '\(([^,]*,){2}\K[^,]*' | grep -Pzo [^\']));
for dbuser in $dbusers; do sed -i "s/$dbuser/${cpuser}\_${dbuser}/g" aniko.sql; done;
The result of this only updates one of the lines (all should show sensationalhosti_)
sensationalhosti_yscr_bbYcqN
yscr_bbS4kf
yscr_bbhrSZ
yscr_bbBl0C
yscr_bbrsKX
The format of the file I am changing is as follows:
INSERT INTO `db` VALUES ('localhost','blog','sensationalhosti_yscr_bbYcqN','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog1','yscr_bbS4kf','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbhrSZ','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbBl0C','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbrsKX','Y','Y','Y','Y','Y','N','N','N','N','N');
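As a side note, the loop can be avoided entirely with a single sed pass that prefixes the third quoted value of every tuple on the INTO db VALUES line - a sketch, assuming $cpuser holds the prefix (as in the script above) and none of the usernames are already prefixed:
# Sketch: add the prefix to the third quoted field of each (...) tuple in one pass.
sed -i "/INTO .db. VALUES/ s/(\('[^']*',\)\{2\}'/&${cpuser}_/g" aniko.sql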

Related

Import of SQL Dump

From the below extract of data I am trying to view only the items in bold - is there a way to do this? The specific section of the file is what matters, which is why the 'INTO .db. VALUES' part is important - also assume I do NOT know the actual values of these
grep "INTO .db. VALUES" ./anikokiss_mysql4.sql
INSERT INTO db VALUES ('localhost','blog','yscr_bbYcqN','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog1','yscr_bbS4kf','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbhrSZ','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbBl0C','Y','Y','Y','Y','Y','N','N','N','N','N'),('localhost','blog','yscr_bbrsKX','Y','Y','Y','Y','Y','N','N','N','N','N');
Result of
cut -d',' -f2 <(awk '/.*INTO .?db.? VALUES.*/,/.*;/{if (NR > 2) nr[NR]}; NR in nr' ./anikokiss_mysql4.sql)
Is
'blog'
As such it appears to only pick the first batch of comma-separated values and ignore those after it.
Similarly with the grep command:
grep "INTO .db. VALUES" ./anikokiss_mysql4.sql | cut -d "'" -f4
We get the result
blog
When I was updating DB names
find . -name '*.sql' -exec sed -i "s/\\(CREATE DATABASE [^\`]*\`\\)/\\1${cpuser}_/" {} +
Current output
[root@uk01 public_html]# grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' aniko.sql | grep -Pao '\(([^,]*,){2}\K[^,]*' | sed -e 's/^/test_/'
test_'yscr_bbYcqN'
test_'yscr_bbS4kf'
test_'yscr_bbhrSZ'
test_'yscr_bbBl0C'
test_'yscr_bbrsKX'
This uses grep to achieve what you want:
grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' ./anikokiss_mysql4.sql | grep -Pao '\([^,]*,\K[^,]*'
First I use grep to get everything after INTO db VALUES. Then I pass the output of that to another instance of grep, which cuts it after the first ,.
To change which column you get, just adjust the repetition count {N} of the lookbehind in the last grep:
# This gets first column
grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' file | grep -Pao '\(([^,]*,){0}\K[^,]*'
# This gets fourth column
grep -Pzo '(?s)INTO .?db.? VALUES[^(]\K[^;]*' file | grep -Pao '\(([^,]*,){3}\K[^,]*'
The {3} skips the first three comma-separated fields, so the fourth column is printed.
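A quick way to see how the {N} count maps to columns (a toy one-liner, not taken from the dump):
# {2} skips two comma-separated fields, so the third value is printed
echo "('localhost','blog','yscr_bbYcqN','Y')" | grep -Pao '\(([^,]*,){2}\K[^,]*'
# prints: 'yscr_bbYcqN'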

Script to save bash output to YAML file

I can output the relevant details of my printer using the command line:
sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' | xargs -I % sh -c "lsusb -vvv -d %")
outputs:
idVendor (0x0483)
idProduct (0x5743)
endpointOUT (0x01)
endpointIN (0x81)
Now, I need to save those data to a YAML file in the following format:
printer:
  type: Usb
  idVendor: 0x0483
  idProduct: 0x5743
  in_ep: 0x81
  out_ep: 0x01
I just don't know how to achieve this formatting and save it to a file.
I've tried formatting the output, but couldn't get further than this snippet.
This reformats each line:
sed -E 's/(.*) \((.*)\)/ \1: \2/g' <<< "idVendor (0x0483)"
result:
idVendor: 0x0483
Combine this with your previous command and prepend the printer: line to get the desired output:
echo "printer:" > file.yaml
sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' \
  <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' | \
      xargs -I % sh -c "lsusb -vvv -d %" \
  ) | sed -E 's/(.*) \((.*)\)/ \1: \2/g' >> file.yaml
It is unclear how you want to get the type: Usb line; you can of course just write it into the file the same way as the printer: line. As for the different names, simply append sed -e 's/endpointOUT/out_ep/' -e 's/endpointIN/in_ep/'.
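Putting it all together - a sketch, assuming the device should always be reported as type: Usb and the endpoint renaming above (the two-space indentation is just a choice):
# Sketch: write the header lines, then append the reformatted and renamed lsusb output.
{
  echo "printer:"
  echo "  type: Usb"
  sed -En 's/[ \t]*b?(id[vp][^ \t]*|endpoint)(address)?[ \t]+([^ \t]*).* (out|in)?.*/\l\1\4 (\3)/Ip' \
    <(lsusb | awk '$0 ~ /STMicroelectronics printer-80/{print $6}' |
        xargs -I % sh -c "lsusb -vvv -d %") |
    sed -E -e 's/endpointOUT/out_ep/' -e 's/endpointIN/in_ep/' \
           -e 's/(.*) \((.*)\)/  \1: \2/'
} > file.yaml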

How to use awk to select text from a file starting from a line number until a certain string

I have a file that I want to read starting from a certain line number until a certain string. I already used
awk "NR>=$LINE && NR<=$((LINE + 121)) {print}" db_000022_model1.dlg
to read from a specific line until an incremented line number, but right now I need to make it stop by itself at a certain string in order to be able to use it on other files.
DOCKED: ENDBRANCH 7 22
DOCKED: TORSDOF 3
DOCKED: TER
DOCKED: ENDMDL
I want it to stop after it reaches
DOCKED: ENDMDL
#!/bin/bash
# This script is for extracting the pdb files from a sorted list of scored
# ligands
mkdir top_poses
for d in $(head -20 summary_2.0.sort | cut -d, -f1 | cut -d/ -f1)
do
cd "$d"||continue
# find the cluster with the highest population within the dlg
RUN=$(grep '###*' "$d.dlg" | sort -k10 -r | head -1 | cut -d\| -f3 | sed 's/ //g')
LINE=$(grep -ni "BEGINNING GENETIC ALGORITHM DOCKING $RUN of 100" "$d.dlg" | cut -d: -f1)
echo "$LINE"
# extract the best pose and correct the format
awk -v line="$((LINE + 14))" "NR>=line; /DOCKED: ENDMDL/{exit}" "$d.dlg" | sed 's/^........//' > "$d.pdbqt"
# convert the pdbqt file into pdb
#obabel -ipdbqt $d.pdbqt -opdb -O../top_poses/$d.pdb
cd ..
done
When I try the
awk -v line="$((LINE + 14))" "NR>=line; /DOCKED: ENDMDL/{exit}" "$d.dlg" | sed 's/^........//' > "$d.pdbqt"
Just like that in the shell terminal, it works. But in the script it outputs an empty file.
Depending on your requirements for handling DOCKED: ENDMDL occurring before your target line:
awk -v line="$LINE" 'NR>=line; /DOCKED: ENDMDL/{exit}' db_000022_model1.dlg
or:
awk -v line="$LINE" 'NR>=line{print; if (/DOCKED: ENDMDL/) exit}' db_000022_model1.dlg

bash (grep|awk|sed) - Extract domains from a file

I need to extract domains from a file.
domains.txt:
eofjoejfej fjpejfe http://ejej.dm1.com dêkkde
ojdoed www.dm2.fr doejd eojd oedj eojdeo
http://dm3.org ieodhjied oejd oejdeo jd
ozjpdj eojdoê jdeojde jdejkd http://dm4.nu/
io d oed 234585 http://jehrhr.dm5.net/hjrehr
[2014-05-31 04:05] eohjpeo jdpiehd pe dpeoe www.dm6.uk/jehr
I need to get:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.co.uk
Try this sed command:
$ sed -r 's/.*(dm[^\.]*\.[^/ ]*).*/\1/g' file
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
This is a bit long, but should work:
grep -oE "http[^ ]*|www[^ ]*" file | sed -e 's|http://||g' -e 's/^www\.//g' -e 's|/.*$||g' -re 's/^.*\.([^\.]+\.[^\.]+$)/\1/g'
Output:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Unrefined method using grep and sed:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' file | sed 's/www.//'
Outputs:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
An answer with gawk:
LC_ALL=C gawk -v RS="[[:space:]]+" -v FS="." '
{
    # Remove the http prefix if it exists
    sub( /http:[/][/]/, "" )
    # Remove the path
    sub( /[/].*$/, "" )
    # Does it look like a domain?
    if ( /^([[:alnum:]]+[.])+[[:alnum:]]+$/ ) {
        # Print the last 2 components of the domain name
        print $(NF-1) "." $NF
    }
}' file
Some notes:
Using RS="[[:space:]]+" allows us to process each whitespace-separated word independently.
LC_ALL=C forces [[:alnum:]] to be ASCII-only (this is not necessary any more with gawk 4+).
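Run against the sample domains.txt above, this should print:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk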
To be able to remove subdomains you have to validate them against a TLD list first, because simply cutting columns would break multi-part TLDs. You then have to do 3 steps.
Step 1: clean domains.txt
grep -oiE '([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}' domains.txt | sed -r 's:(^\.*?(www|ftp|ftps|ftpes|sftp|pop|pop3|smtp|imap|http|https)[^.]*?\.|^\.\.?)::gi' | sort -u > capture
Contents of capture:
ejej.dm1.com
dm2.fr
dm3.org
dm4.nu
jehrhr.dm5.net
dm6.uk
Step 2: download and filter TLD list:
wget https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
grep -v "//" public_suffix_list.dat | sed '/^$/d; /#/d' | grep -v -P "[^a-z0-9_.-]" | sed 's/^\.//' | awk '{print "." $1}' | sort -u > tlds.txt
So far you have two lists (capture and tlds.txt)
Step 3: Download and run this python script:
wget https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/tools/parse_domain_tld.py && chmod +x parse_domain_tld.py && python parse_domain_tld.py | sort -u
out:
dm1.com
dm2.fr
dm3.org
dm4.nu
dm5.net
dm6.uk
Source: blackweb
This can be useful:
grep -Pho "(?<=http://)[^(\"|'|[:space:])]*" file.txt | sed 's/www.//g' | grep -Eo '[[:alnum:]]{1,}\.[[:alnum:]]{1,}[.]{0,1}[[:alnum:]]{0,}' | sort | uniq
The first grep gets things like 'http://www.example.com' enclosed in single or double quotes, but extracts only the domain. Second, using sed I remove 'www.'. The third command extracts domain names separated by '.' in blocks of two or three alphanumeric components. At the end, the output is sorted to display only a single instance of each domain.

sed move text in .txt to next line

I am trying to parse out a text file that looks like the following:
EMPIRE,STATE,BLDG,CO,494202320000008,336,5,AVE,ENT,NEW,YORK,NY,10003,N,3/1/2012,TensionCode,VariableICAP,PFJICAP,Residential,%LBMPZone,L,9,146.0,,,10715.0956,,,--,,0,,,J,TripNumber,ServiceClass,PreviousAccountNumber,MinMonthlyDemand,TODCode,Profile,Tax,Muni,41,39,00000000000000,9952,54,Y,Non-Taxable,--,FromDate,ToDate,Use,Demand,BillAmt,12/29/2011,1/31/2012,4122520,6,936.00,$293,237.54
what I would like to see is the data stacked
- EMPIRE STATE BLDG CO
- 494202320000008
- 336 5 AVE ENT
- NEW YORK NY
and so on. If anything, after each comma I would want the following text to go to a new line. Ultimately, for the end of the line, from where it states FromDate onward, I would like to have it in a txt file like
- From Date ToDate use Demand BillAmt
- 12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
I am using cygwin on a windows XP machine. Thank you in advance for any assistance.
For getting the last line into a separate file:
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > lastlinefile.txt
cat originalfile.txt | sed 's/,FromDate/~FromDate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,Use,Demand,BillAmt,//' | sed 's/,/\t/g' >> lastlinefile.txt
For the rest:
cat originalfile.txt | sed -r 's/,FromDate[^\n]+//' | sed 's/,/\n/g' | sed -r 's/$/\n\n/' > nocommas.txt
Your mileage may vary as far as the first '\n' is concerned in the second command. If it doesn't work properly, replace it with a space (assuming your data doesn't have spaces).
Or, if you like, a shell script to operate on a file and split it:
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > "$1_lastline.txt"
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,use,Demand,BillAmt,//' | sed 's/,/\t/' >> "$1_lastline.txt"
cat "$1" | sed -r 's/,Fromdate[^\n]+//' | sed 's/,/\n/' | sed -r 's/$/\n\n' > "$1_fixed.txt"
Just paste it into a file and run it. It's been years since I used Cygwin... you may have to chmod +x it first.
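As an aside, if all you need is the plain one-field-per-line split (without the special handling of the FromDate block), a simpler sketch is:
# Minimal sketch: put every comma-separated field on its own line.
# Note this also splits thousands separators in amounts like 6,936.00.
tr ',' '\n' < originalfile.txt > nocommas.txt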
I'm providing you two answers depending on how you wanted the file. The previous answer split it into two files; this one keeps it all in one file, in the format:
EMPIRE
STATE
BLDG
CO
494202320000008
336
5
AVE
ENT
NEW
YORK
NY
From Date ToDate use Demand BillAmt
12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
That's the best I can do with the delimiters you have set in place. If you'd left it as something like "EMPIRE STATE BUILDING CO,494202320000008,336 5 AVE ENT,NEW YORK,NY" it'd be a lot easier.
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt"; exit; fi
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{gsub(",","\n",$1);print $1;print "FromDate\tToDate\tuse\tDemand\tBillAmt";gsub("FromDate,ToDate,use,Demand,BillAmt","",$2);gsub(",","\t",$2);print $2}' >> "$1_fixed.txt"
again, just paste it into a file and run it from Cygwin: ./filename.sh

Resources