Extract text between two strings with sed, excluding the strings - bash

I am trying to extract some text between two strings (which appear only once in the file).
Suppose the file is,
....Some Data
Your name is:
Dean/Winchester
You are male. Some data .....
I want to extract the text between 'Your name is:' and 'You are male.', both of which are unique and occur only once.
So, the output should be,
Dean/Winchester
I tried using sed,
sed -n 's/Your name is:\(.*\)You are male./\1/' abcd
But it doesn’t output anything.
Any help will be appreciated.
Thanks

$ sed -n '0,/Your name is/ d; /You are male/,$ d; /^$/d; p' abcd
Dean/Winchester
For variety, here is an awk solution:
$ awk '/Your name is/ {p=1; next} /You are male/ {exit} /^$/ {next} p==1 {print}' abcd
Dean/Winchester

$ sed -n -e '/^Your name is:/,/^You are male/{ /^Your name is:/d; /^You are male/d; p; }' test
Dean/Winchester
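The original substitution fails because sed processes the input line by line, so `.*` in a single `s///` can never span the newline between the two markers. If the name is always on the single line right after the first marker, this POSIX sed sketch prints just that line:

```shell
# Match the marker line, read the next line (n), print it, and quit.
sed -n '/Your name is:/{n;p;q;}' abcd
```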

Parsing and modifying csv with bash

Have a csv file with tons of rows, small example:
id,location_id,name,title,email,directorate
1,1, Amy lee,Singer,,
2,2,brad Pitt,Actor,,Production
3,5,Steven Spielberg,Producer,spielberg@my.com,Production
Need to:
capitalize the first and last name, for example Brad Pitt, Amy Lee.
create the email with the pattern first letter of first name + last name + the location_id value, all in lowercase, at @google.com, for example alee1@google.com, bpitt2@google.com
save it to new file.csv, with the same structure, example:
id,location_id,name,title,email,directorate
1,1, Amy Lee,Singer,alee1@google.com,
2,2,Brad Pitt,Actor,bpitt2@google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5@google.com,Production
I started by creating an array and iterating through it with a bunch of sed and awk calls, but it gives me random results.
Please give me advice on how to resolve this task.
while read -ra array; do
for i in ${array[@]};
do
awk -F ',' '{print tolower(substr($3,1,1))$2$3"@google.com"}'
done
for i in ${array[@]};
do
awk -F "\"*,\"*" '{print $3}' | sed -e "s/\b\(.\)/\u\1/g"
done
done < file.csv
awk -F ',' '{print tolower(substr($3,1,1))$2$3"@google.com"}' doesn't work correctly.
Using GNU sed
$ sed -E 's/([^,]*,([^,]*),) ?(([[:alpha:]])[^ ]* +)(([^,]*),[^,]*,)[^,]*/\1\u\3\u\5\L\4\6\2@google.com/' input_file
id,location_id,name,title,email,directorate
1,1,Amy Lee,Singer,alee1@google.com,
2,2,Brad Pitt,Actor,bpitt2@google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5@google.com,Production
With your shown samples, please try the following awk.
awk '
BEGIN{ FS=OFS="," }
{
split($3,arr," ")
val=(substr($3,1,1) arr[2]"@google.com,")
$NF=tolower(val) $NF
val=""
}
1
' Input_file
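For completeness, here is a sketch (not from the answers above) of a single awk pass that does all three steps: capitalizing each word of the name, building the email from the first initial, last name, and location_id, and writing a new CSV. It assumes the simple comma-free fields shown in the samples.

```shell
awk 'BEGIN { FS = OFS = "," }
NR == 1 { print; next }                       # keep the header as-is
{
    n = split($3, w, " ")                     # words of the name field
    name = ""
    for (i = 1; i <= n; i++) {                # Capitalize Each Word
        w[i] = toupper(substr(w[i], 1, 1)) tolower(substr(w[i], 2))
        name = name (i > 1 ? " " : "") w[i]
    }
    $3 = name                                 # note: drops any leading space
    $5 = tolower(substr(w[1], 1, 1) w[n]) $2 "@google.com"
    print
}' file.csv > new_file.csv
```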

How can I prefix the output of each match in grep with some text?

I have a file with a list of phrases
apples
banananas
oranges
I'm running cat file.txt | xargs -I% sh -c "grep -Eio '(an)' >> output.txt"
What I can't figure out, is that I want the output to contain the original line, for example:
banananas,an
oranges,an
How can I prefix the output of grep to also include the value being piped to it?
This should be a task for awk, could you please try following.
awk '/an/{print $0",an"}' Input_file
This will look for the string an in all lines of Input_file and append ,an to those that contain it.
Solution with sed:
sed '/an/s/$/,an/' input_file
This finds lines that match the pattern /an/ and appends ,an at the end of the pattern space ($ anchors the end).
Use awk instead of grep:
$ awk -v s="an" ' # search string
BEGIN {
OFS="," # separating comma
}
match($0,s) { # when there is a match
print $0,substr($0,RSTART,RLENGTH) # output
}' file
Output:
banananas,an
oranges,an
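If you specifically want to stay with grep, a shell-loop sketch is also possible (the file and pattern names here are illustrative, and only the first match per line is kept):

```shell
pat='an'
while IFS= read -r line; do
    # grab the first match on the line, if any
    match=$(printf '%s\n' "$line" | grep -Eio "$pat" | head -n 1)
    [ -n "$match" ] && printf '%s,%s\n' "$line" "$match"
done < file.txt
```

This is much slower than the awk answers (one grep per line), but it keeps the original line next to its match with plain grep.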

Match a particular letter and print word after that using SED

I have a file "Log.txt" which look like this:
bla bla.. line1
bla bla.. line2
bla bla.. lineN
:000000 ... 239e670... A bla1.txt
:000000 ... 76fd777... M bla2.txt
:000000 ... e69de29... A bla3.txt
Let's say that I am looking for the letter 'A' and 'M'.
How would I look for it ONLY in the 4th field, match the letters "A" and "M" there, and print the file name after them? I.e., I need to get the final output as below:
A bla1.txt
M bla2.txt
A bla3.txt
I used awk to match the 4th column against A and M and print the next word, but I'm not getting the expected output; I'm also getting the extra bla bla lines.
Anyone has idea how to achieve this using sed?
awk for this:
awk '$4 ~ /^[AM]$/ { print $4, $5 }' Log.txt
sed for it:
sed -En '/^([^ ]+ ){3}[AM]/ { s/^([^ ]+ ){3}([AM] .*)/\2/; p; }' Log.txt
Both of these confirm that the A or M is in the 4th field.
Awk actually can do your job, just need to add a condition:
awk '/ (A|M) /{print $4,$5}' Log.txt
As for sed, you can do this:
sed -nr '/ (A|M) /{s/.*((A|M)\s+.*)$/\1/;p}' Log.txt
Not sure what your real data looks like, but I guess you get the idea and can adjust the commands to suit it.
As per your input file and your expected output, please try the below awk:
awk '{if ($4 == "A" || $4 == "M") {print $4,$5}}' log.txt
Output:
A bla1.txt
M bla2.txt
A bla3.txt
This might work for you (GNU sed):
sed 's/^\(\S*\s\)\{3\}\([AM]\s\)/\2/p;d' file
Match the fourth field to be A or M and if so, remove the first three fields and print the remainder.
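As a rough grep alternative (a sketch; unlike the answers above it does not anchor the match to the 4th field, only to the end of the line):

```shell
# a standalone A or M followed by one final space-free word at line end
grep -oE '\b[AM] [^ ]+$' Log.txt
```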

Make grep output more readable

I'm using grep to find patterns in files with grep -orI "id=\"[^\"]\+\"" . | sort | uniq -d
Which gives an output like the following:
./myFile.html:id="matchingR"
./myFile.html:id="other"
./myFile.html:id="cas"
./otherFile.html:id="what"
./otherFile.html:id="wheras"
./otherFile.html:id="other"
./otherFile.html:id="whatever"
What would be a convenient way to pipe this an have the following as output:
./myFile.html
id="matchingR"
id="other"
id="cas"
./otherFile.html
id="what"
id="wheras"
id="other"
id="whatever"
Basically group results by filename.
Not the prettiest but it works.
awk -F : -v OFS=: 'f!=$1 {f=$1; print f} f==$1 {$1=""; $0=$0; sub(/^:/, " "); print}'
If none of your lines can ever contain a colon then this simpler version also works.
awk -F : 'f!=$1 {f=$1; print f} f==$1 {$1=""; print}'
These both split fields on colons (-F :), print the first field (the filename) when it differs from a saved value (saving the new value), and, when the first field matches the saved value, remove the first field and print. They differ in how they remove the field and print the output. The first attempts to preserve colons in the matched line; the second (and @fedorqui's version ... f==$1 {$0=$2; print}) assumes no other colons were on the line to begin with.
Pass output to this script:
#!/bin/sh
sed 's/:/ /' | while read FILE TEXT; do
if [ "$FILE" = "$GROUP" ]; then
echo " $TEXT"
else
GROUP="$FILE"
echo "$FILE"
echo " $TEXT"
fi
done
Here is a short awk:
awk -F: '{print ($1!=f?$1 RS:""),$2;f=$1}' file
./myFile.html
id="matchingR"
id="other"
id="cas"
./otherFile.html
id="what"
id="wheras"
id="other"
id="whatever"
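The one-liners above can also be spelled out long-hand; this sketch (pipe the grep | sort | uniq output into it) splits on the first colon only, so matches that themselves contain colons survive intact:

```shell
awk '{
    file = $0; sub(/:.*/, "", file)        # everything before the first colon
    text = $0; sub(/^[^:]*:/, "", text)    # everything after it
    if (file != prev) { print file; prev = file }
    print "  " text
}'
```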

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns and I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch "$TARGET"
echo "#userId,itemId" > "$TARGET"
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
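As another take on the explode step, a bash-only sketch using read -a instead of set -- (reads the file given as $1, or stdin if none):

```shell
#!/bin/bash
# Convert :: to , on each line, then split the line into an array on commas.
while IFS= read -r line; do
    IFS=',' read -r -a cols <<< "${line//::/,}"
    printf '%s,%s\n' "${cols[0]}" "${cols[1]}"
done < "${1:-/dev/stdin}"
```

read -a keeps the splitting inside the loop's own IFS assignment, so it doesn't clobber the positional parameters or the global IFS the way set -- does.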
