First of all, thank you so much for taking some time out to help me out!
I've been trying to figure out how to split a full name into first, middle, and last names using awk and sed in a bash shell, without any success. Here is an example of the names:
Kelly D. Hynes
Aketzalli Gamez Lizarraga
Shervin Rahimi
Theresa M. Collins
Vanessa L. Dawson
Behzad Garagozloo
James M. Skaalen
Shannon Marie Wenaa
These are the bash commands that I've been trying to use.
awk -F"." '{print $1}' listnames.csv > output.csv
sed -e 's/*.//g' < listnames.csv > output.csv
The output of the commands I used:
awk -F"." '{print $1}' listnames.csv > output.csv returns an empty output.csv
sed -e 's/*.//g' < listnames.csv > output.csv returns the exact same list:
Kelly D. Hynes
Aketzalli Gamez Lizarraga
Shervin Rahimi
Theresa M. Collins
Vanessa L. Dawson
Behzad Garagozloo
James M. Skaalen
Shannon Marie Wenaa
The desired output is to have at least two lists:
First name
Kelly
Aketzalli
Shervin
Theresa
Vanessa
Behzad
James
Shannon
Last name
Hynes
Lizarraga
Rahimi
Collins
Dawson
Garagozloo
Skaalen
Wenaa
I was thinking that maybe I could use the "." in the middle name to tell them apart, but that would not help distinguish last names from middle names.
Any help, insights, or feedback would be much appreciated.
Thanks! 🙏🏼
$ awk -v OFS=',' '{print $1, (NF>2 ? $2 : ""), $NF}' file
Kelly,D.,Hynes
Aketzalli,Gamez,Lizarraga
Shervin,,Rahimi
Theresa,M.,Collins
Vanessa,L.,Dawson
Behzad,,Garagozloo
James,M.,Skaalen
Shannon,Marie,Wenaa
To print the first and last names to two separate files, as your desired output now shows, it would just be:
awk '{print $1 > "first"; print $NF > "last"}' file
You could use awk to separate the names into comma separated values and then use sed to trim any whitespace characters.
awk '{if (NF == 3) {print $1",",$2",",$3} else {print $1",,"$2}}' listnames.csv |sed 's/[ .]//g'
if (NF == 3) ... This tests whether the name contains a middle name, so the values can be separated properly.
sed 's/[ .]//g' We use sed to remove whitespace and periods.
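Building on the two answers above, here is a sketch (my own; the file names names.txt, first.txt, middle.txt, and last.txt are invented) that writes first, middle, and last names to three separate files in one awk pass:

```shell
# Sketch combining both ideas: one awk pass, three output files.
printf '%s\n' 'Kelly D. Hynes' 'Shervin Rahimi' 'Shannon Marie Wenaa' > names.txt

awk '{
    print $1 > "first.txt"                    # first word: first name
    print (NF > 2 ? $2 : "") > "middle.txt"   # middle word only if present
    print $NF > "last.txt"                    # last word: last name
}' names.txt
```

A two-word name produces an empty line in middle.txt, which keeps the three files line-aligned with each other.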
I need to find two numbers in lines which look like this:
>Chr14:453901-458800
I have a large number of these lines mixed with lines that don't contain ":", so we can search for the colon to find the lines with numbers. Every line has different numbers.
I need to take the two numbers after ":", which are separated by "-", subtract the first number from the second, and print the result on the screen for each line.
I'd like this to be done using awk
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result
For the example I showed above, my result would be:
4899
Because it is the result of 458800 - 453901 = 4899
I can't figure it out on my own and would appreciate some help
With GNU awk: separate each row into multiple columns using the : and - separators. In each row containing :, subtract the contents of column 2 from the contents of column 3 and print the result.
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899
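If you also want to see which region each number came from, a small variation (my own sketch; the sample file name regions.txt is invented) keeps the label next to the difference:

```shell
# Sample input: two region lines plus one line without a colon.
printf '%s\n' '>Chr14:453901-458800' 'no colon here' '>Chr2:100-250' > regions.txt

# Split on ":" and "-", strip the leading ">", print label and length.
awk -F '[:-]' '/:/ { sub(/^>/, "", $1); print $1, $3 - $2 }' regions.txt
```

This prints Chr14 4899 and Chr2 150; the line without a colon is skipped.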
I have 2 files that are reports of database sizes (one file is from yesterday, one from today).
I want to see how the size of each database changed, so I want to calculate the difference.
Each file looks like this:
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8336089","797419","896120","server3"
"ASIA","3740156","3170088","570068","354000","server5"
"AFRICA","4871331","4101711","769620","318412","server4"
The other file is the same, only the numbers are different.
I want to see how each database's size changed (so ONLY the "Use MB" column).
I guess I cannot use "diff" or "awk" options since the numbers may change dramatically each day. The only good algorithm I can think of is to subtract the numbers between the 5th and 6th double quotes ("); how do I do that?
You can do this (using awk):
paste file1 file2 -d ',' |awk -F ',' '{gsub(/"/, "", $3); gsub(/"/, "", $9); print $3 - $9}'
paste puts the two files next to each other, separated by a comma (-d ','). So you will have:
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname","DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8336089","797419","896120","server3","EUROPE","9133508","8336089","797419","896120","server3"
...
gsub(/"/, "", $3) and gsub(/"/, "", $9) remove the quotes around columns 3 and 9
And finally we print column 3 minus column 9
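Note that the header row is also pasted and printed, as 0 ("Use MB" minus "Use MB" is treated as 0 by awk). If you want to skip it, adding NR > 1 is enough; a self-contained sketch with shortened, made-up sample files (day1.csv and day2.csv are invented names):

```shell
# Same paste+awk idea, with NR > 1 added to skip the header row.
cat > day1.csv <<'EOF'
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8336089","797419","896120","server3"
EOF
cat > day2.csv <<'EOF'
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8335089","797419","896120","server3"
EOF

paste -d ',' day1.csv day2.csv |
awk -F ',' 'NR > 1 { gsub(/"/, "", $3); gsub(/"/, "", $9); print $3 - $9 }'
```

For this sample the only output line is 1000, the drop in EUROPE's "Use MB".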
Maybe I missed something, but I don't see why you could not use awk, as it can totally do:
The only good 'algoritm' I can think of is to subtract numbers between
5th and 6th double quote ("), how do I do that?
Let's say that file1 is:
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8336089","797419","896120","server3"
"ASIA","3740156","3170088","570068","354000","server5"
"AFRICA","4871331","4101711","769620","318412","server4"
And file2 is:
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8335089","797419","896120","server3"
"ASIA","3740156","3170058","570068","354000","server5"
"AFRICA","4871331","4001711","769620","318412","server4"
Command
awk -F'[",]' 'NR>1&&NR==FNR{db[$2]=$8;next}FNR>1{print $2, db[$2]-$8}' file1 file2
gives you the result:
EUROPE 1000
ASIA 30
AFRICA 100000
There are also approaches that deal more properly with quote characters in awk.
If your awk version cannot support multiple field delimiters, you can try this :
awk -F, 'NR>1&&NR==FNR{db[$1]=$3;next}FNR>1{print $1, db[$1]-$3}' <(sed 's,",,g' file1) <(sed 's,",,g' file2)
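For reference, here is a self-contained way to try it, using here-documents to recreate the two sample files. Note the NR>1/FNR>1 conditions, which skip only the header line of each file:

```shell
# Recreate the sample reports from the question.
cat > file1 <<'EOF'
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8336089","797419","896120","server3"
"ASIA","3740156","3170088","570068","354000","server5"
"AFRICA","4871331","4101711","769620","318412","server4"
EOF
cat > file2 <<'EOF'
"DATABASE","Alloc MB","Use MB","Free MB","Temp MB","Hostname"
"EUROPE","9133508","8335089","797419","896120","server3"
"ASIA","3740156","3170058","570068","354000","server5"
"AFRICA","4871331","4001711","769620","318412","server4"
EOF

# Splitting on both " and , makes $2 the database name and $8 the "Use MB" value.
awk -F'[",]' 'NR>1 && NR==FNR { db[$2]=$8; next } FNR>1 { print $2, db[$2]-$8 }' file1 file2
```

This prints EUROPE 1000, ASIA 30, and AFRICA 100000.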
I have gawk at my disposal and I have the following text I wish to format:
Trip.to.Washington.2004.08...
Florida.1993.12...
Aunt.Rose.2011.06...
I would like it to appear as follows:
Trip to Washington (2004)
Florida (1993)
Aunt Rose (2011)
The number of words in the title varies, as does the number of words following the year. The separator is sometimes a white space rather than a period.
Edit:
I was able to achieve the desired output using:
echo Trip.to.Washington.2004.08... |
sed -n 's/\([0-9][0-9][0-9][0-9]\).*/\1/p' |
gawk 'BEGIN { FS="." } { print $1" "$2" "$3" ("$NF")" }'
Which returns:
Trip to Washington (2004)
The problem is that this will fail if there are more, or fewer, words in the title. It will also fail if the words are separated by anything other than a period.
I also found it is possible to print every field except the last using:
awk '{$NF=""; print $0}' file
Unfortunately, my experience with gawk is very limited. I haven't a clue how to correctly make use of this statement within my existing gawk command.
With sed:
$ sed 's/\([0-9]\{4\}\).*/(\1)/; s/\./ /g' foo
Trip to Washington (2004)
Florida (1993)
Aunt Rose (2011)
Explained:
first surround the 4-digit number and everything following it with parentheses, using back-referencing: s/\([0-9]\{4\}\).*/(\1)/
then replace all periods with space s/\./ /g
You can easily perform the substitution in awk as well. If the tokens you want to drop are always the last five on the dot-separated line, you could do something like
echo "Trip.to.Washington.2004.08..." |
gawk -F . '{ for(i=1; i<=NF-5; ++i) printf "%s ", $i; print "(" $(NF-4) ")" }'
We loop over the fields up to NF-5 and print each one followed by a space. Then, we print $(NF-4) inside a pair of parentheses, and never print the rest.
So when i is 1, we print $1, which in this case is Trip, followed by a space.
When i is 2, we print $2, which in this case is to, again followed by a space.
When i is 3, we print Washington, and the loop ends because i has reached NF-5 (the trailing dots produce empty fields, so NF is 8 for this line).
Then we print the field four places before the last, $(NF-4), surrounded by parentheses, which gets us the year.
The -F . is a shorthand for your BEGIN { FS="." } but I made this change just for brevity; either way works fine.
There is nothing gawk specific here so you could use the generic awk as well.
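Since the question mentions that the separator is sometimes a space rather than a period, here is one more sketch (mine, not from the answers above) that locates the first 4-digit run with match() instead of relying on field positions. match(), RSTART, and RLENGTH are standard POSIX awk:

```shell
# Find the first 4-digit run; everything before it is the title,
# the match itself is the year. Handles "." or " " separators.
printf '%s\n' 'Trip.to.Washington.2004.08...' 'Aunt Rose 2011 06' |
awk 'match($0, /[0-9][0-9][0-9][0-9]/) {
    title = substr($0, 1, RSTART - 1)    # everything before the year
    gsub(/[. ]+/, " ", title)            # normalize separators to spaces
    sub(/ $/, "", title)                 # drop the trailing separator
    print title " (" substr($0, RSTART, 4) ")"
}'
```

This prints Trip to Washington (2004) and Aunt Rose (2011).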
How would I go about printing the first line of given input before I start stepping through each of the lines with awk?
Say I wanted to run the command ps aux and return the column headings and a particular pattern I'm searching for. In the past I've done this:
ps aux | ggrep -Pi 'CPU|foo'
Where CPU is a value I know will be in the first line of input as it's one of the column headings and foo is the particular pattern I'm actually searching for.
I found an awk pattern that will pull the first line:
awk 'NR > 1 { exit }; 1'
Which makes sense, but I can't seem to figure out how to fire this before I do my pattern matching on the rest of the input. I thought I could put it in the BEGIN section of the awk command but that doesn't seem to work.
Any suggestions?
Use the following awk script:
ps aux | awk 'NR == 1 || /PATTERN/'
It prints the current line either if it is the first line of the output or if it contains the pattern.
Btw, the same result could be achieved using sed:
ps aux | sed -n '1p;/PATTERN/p'
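To try it without live ps output, here is the same idea on a small invented file (procs.txt and its contents are made up for illustration):

```shell
# Sample data standing in for `ps aux` output.
cat > procs.txt <<'EOF'
USER PID %CPU COMMAND
root 1 0.0 init
alice 42 1.3 foo-daemon
bob 77 0.2 bash
EOF

# Keep the header (line 1) plus any line matching the pattern.
awk 'NR == 1 || /foo/' procs.txt
```

Both the header and the matching line come through; everything else is dropped.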
If you want to read the first line in the BEGIN action, you can read it with getline, process it, and discard that line before moving on to the rest of your awk command. This kind of "stepping in" may be helpful if you're parsing a header or something similar first.
#input.txt
Name City
Megan Detroit
Jackson Phoenix
Pablo Charlotte
awk 'BEGIN { getline; col1=$1; col2=$2; } { print col1, $1; print col2, $2 }' input.txt
# output
Name Megan
City Detroit
Name Jackson
City Phoenix
Name Pablo
City Charlotte
Explaining awk BEGIN
I thought I could put it in the BEGIN section ...
In awk, you can have more than one BEGIN clause. These are executed in order, before awk reads any input.
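A quick illustration of the ordering (my own example):

```shell
# Two BEGIN blocks run in the order written, before any input is read.
printf 'hello\n' |
awk 'BEGIN { print "first BEGIN" } BEGIN { print "second BEGIN" } { print "body:", $0 }'
```

Both BEGIN lines appear before the body line, even though the input was already available on the pipe.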