Extract a portion of text from a file name - bash

I have files with the following names. Each file name contains an area code and a house number. I'm new to scripting; how do I write a bash script that extracts the area code and house number?
ID-Final_RDX_301_002-14_33_1992
Area code is 301
house number is 002
ID-Final_RDX-311-004-14_28_1992
Area code is 311
house number is 004
ID-Final_RDX311021-14_28_1992
Area code is 311
house number is 021
ID-Final_RDX-XT-Se3-14_28_1992
Area code is XT
house number is Se3
ID-Final_RDX-XT-Se11-14_28_1992
Area code is XT
house number is Se11

Your file names don't follow a consistent pattern, as mentioned in [this] comment, but I hope that is a typo. If so, you could do something along these lines:
find . -type f -name "ID-Final*" -exec awk -vfile={} 'BEGIN{
split(file,res,"-|_");
printf "Area Code : %s House Number : %s%s",res[4],res[5],ORS
}' \;
Area Code : 311 House Number : 004
Area Code : 14 House Number : 28
Area Code : XT House Number : Se3
Area Code : XT House Number : Se11
Area Code : 301 House Number : 002
Well, if the missing -/_ separators are intentional, then you need much more than this simple awk to solve this.
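If the missing separators are intentional, one possible approach is a regular expression in which the separators are optional. The sketch below is only a guess at the naming rules, inferred from the five sample names: it assumes that a purely numeric area code is always exactly three digits, that the house number is optional letters followed by digits, and that every name ends in a date-like -NN_NN_NNNN suffix. The function name parse_name is made up for illustration.

```shell
# Hypothetical helper: print area code and house number for one file name.
# Assumptions (not stated in the question): a numeric area code is exactly
# three digits, and the house number is optional letters followed by digits.
parse_name() {
  # ${1##*/} drops a leading directory such as "./" that find adds.
  printf '%s\n' "${1##*/}" |
    sed -nE 's/^ID-Final_RDX[-_]?([0-9]{3}|[A-Za-z]+)[-_]?([A-Za-z]*[0-9]+)-[0-9]+_[0-9]+_[0-9]+$/Area Code : \1 House Number : \2/p'
}

# Example: handle every matching file in the current directory.
for f in ./ID-Final*; do
  [ -e "$f" ] && parse_name "$f"
done
```

Names that do not fit the assumed pattern print nothing, so they are easy to spot and handle by hand.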

Related

How to grab the text after a newline and concatenate the lines into one, in a text file that has not been cleaned of spaces and tabs

I have a text like this:
Print <javascript:PrintThis();>
www.example.com
Order Number: *912343454656548 * Date of Order: November 54 2043
------------------------------------------------------------------------
*Dicders Folcisad:
* STACKOVERFLOW
*dum FWEFaadasdd:* ‎[U+200E] ‎
STACK OVERFLOW
BLVD OF SOMEPLACENICE 434
SANTA MONICA, COUNTY
LOS ANGEKES, CALI 90210
(SW)
*Order Totals:*
Subtotal Usd$789.75
Shipping Usd$87.64
Duties & Taxes Usd$0.00 ‎
Rewards Credit Usd$0.00
*Order Total * *Usd$877.39 *
*Wordskccds:*
STACKOVERFLOW
FasntAsia
xxxx-xxxx-xxxx-
*test Method / Welcome Info *
易客满x京配个人行邮税- 运输 + 关税 & 税费 / ADHHX15892013504555636
*Order Number: 916212582744342X*
*#* *Item* *Price* *Qty.* *Discount* *Subtotal*
1
Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535
Usd$141.92 4 -Usd$85.16 Usd$482.52
2
Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,
60 Stresss CXB-034251
Usd$192.24 1 -Usd$28.83 Usd$163.41
3
34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Tablesds XDF-38452
Usd$169.20 1 -Usd$25.38 Usd$143.82
*Extra Discounts:* Extra 15% discounts applied! Usd$139.37
*Stackoverflox Contact Information :*
*Web: *www.example.com
*Disclaimer:* something made, or service sold through this website,
have not been test by the sweden Spain norway and Dumrug
Advantage. They are not intended to treet, treat, forsee or
forshadow somw clover.
I'm trying to grab each line that starts with a number, then append the second line, and finally the third line. Example of the desired output:
1 Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535 Usd$141.92 4 -Usd$85.16 Usd$482.52
2 Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg, 60 Stresss CXB-034251 Usd$192.24 1 -Usd$28.83 Usd$163.41 <- 1 line
3 34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Wedscsd XDF-38452 Usd$169.20 1 -Usd$25.38 Usd$143.82 <- 1 lines as first
As you may notice, the second item spans 3 lines instead of 2, which makes it harder to grab.
Because of the newlines and whitespace, the next command only grabs the 1:
grep -E '1\s.+'
I have also been trying to extend it with alternation:
grep -E '1\s|[A-Z].+'
But it doesn't work; grep begins to select similar patterns in other parts of the text.
awk '{$1=$1}1' #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r" #done already
I'm trying to write a script that takes an unclean FILE as an ARGUMENT, grabs the table, and selects each number with its respective data. Sometimes an item has 4 lines, sometimes 3, so copy/paste doesn't work for me.
I think the last line to be joined is the line starting with "Usd". In that case you only need to change the formatting in
awk '
!orderfound && /^[0-9]/ {ordernr++; orderfound=1 }
orderfound { order[ordernr]=order[ordernr] " " $0 }
$1 ~ "Usd" { orderfound = 0 }
END {
for (i=1; i<=ordernr; i++) { print order[i] }
}' inputfile
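As one way of changing the formatting, the variant below prints each item as soon as its "Usd" line arrives and avoids the leading space that the array version puts in front of every record. It is a sketch under two assumptions that go beyond the answer: the item number sits alone on its line, and the price line begins with "Usd" once blanks are squeezed. The function name join_items is made up for illustration.

```shell
# Hypothetical helper: join each numbered item with its following lines,
# ending at the first line that starts with "Usd".
join_items() {
  awk '
    { $1 = $1 }                                   # squeeze blanks, trim both ends
    !inside && /^[0-9]+$/ { inside = 1; row = $0; next }   # item number starts a record
    inside { row = row " " $0 }                   # append every line of the item
    inside && /^Usd/ { print row; inside = 0 }    # price line closes the record
  ' "$@"
}

# Usage: join_items inputfile
```

Because the record is closed by the price line rather than by a line count, items that span three or four lines are handled the same way.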

How to copy particular lines below particular text?

Suppose I have long text files output1, output2, and output3. Somewhere in each output file is the line "My Name is Rock (static)",
and below that line are some values like
"My Name is Rock (static)"
10 20 30
-10 0.5 00
3.0 0.0 0.0 (different for all output file)
How can I copy the second column of the third line below the line ("My Name is Rock (static)") to a new file?
Remember that the line numbers are different in each output file.
awk 'c{c--;if(!c) print $2}/My Name is Rock \(static\)/{c=3}' ./output1 ./output2 ./output3
Explanation
/My Name is Rock \(static\)/{c=3}: When "My Name is Rock (static)" is seen, set c to 3
c{...}: If the counter c is non-zero, do ... (note, c starts off at 0)
c--;if(!c) print $2: Decrement counter, if counter is now zero, print 2nd field of current line
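The same countdown idea can be wrapped in a small function so that the marker text, the line offset and the column become parameters. This is only a sketch: the name value_below and its argument order are made up here, and it swaps the regular expression for index(), which matches the marker as a literal substring so the parentheses need no escaping.

```shell
# Hypothetical helper: value_below MARKER OFFSET COLUMN FILE...
# Prints field COLUMN of the line OFFSET lines below each line containing MARKER.
value_below() {
  marker=$1 offset=$2 column=$3
  shift 3
  awk -v marker="$marker" -v n="$offset" -v col="$column" '
    c && !--c { print $col }            # count down; print when the counter hits zero
    index($0, marker) { c = n + 0 }     # literal substring match starts the countdown
  ' "$@"
}

# Usage, writing the values to a new file:
# value_below 'My Name is Rock (static)' 3 2 output1 output2 output3 > newfile
```

Note that awk still interprets backslash escapes in a value passed with -v, so a marker containing backslashes would need extra care.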

How to find number of unique strings in a column followed by position of a given string

I need to get 2 things from a TSV input file:
1- Find how many unique strings there are in a given column, where the individual values are comma-separated. For this I used the command below, which gave me the unique values.
$ awk < input.tsv '{print $5}' | sort | uniq | wc -l
Input file example with header (6 columns) and 10 rows:
$ cat hum1003.tsv
p-Value Score Disease-Id Disease-Name Gene-Symbols Entrez-IDs
0.0463 4.6263 OMIM:117000 #117000 CENTRAL CORE DISEASE OF MUSCLE;;CCD;;CCOMINICORE MYOPATHY, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;MULTICORE MYOPATHY, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;MULTIMINICORE DISEASE, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;NEUROMUSCULAR DISEASE, CONGENITAL, WITH UNIFORM TYPE 1 FIBER, INCLUDED;CNMDU1, INCLUDED RYR1 (6261) 6261
0.0463 4.6263 OMIM:611705 MYOPATHY, EARLY-ONSET, WITH FATAL CARDIOMYOPATHY TTN (7273) 7273
0.0513 4.6263 OMIM:609283 PROGRESSIVE EXTERNAL OPHTHALMOPLEGIA WITH MITOCHONDRIAL DNA DELETIONS,AUTOSOMAL DOMINANT, 2 POLG2 (11232), SLC25A4 (291), POLG (5428), RRM2B (50484), C10ORF2 (56652) 11232, 291, 5428, 50484, 56652
0.0539 4.6263 OMIM:605637 #605637 MYOPATHY, PROXIMAL, AND OPHTHALMOPLEGIA; MYPOP;;MYOPATHY WITH CONGENITAL JOINT CONTRACTURES, OPHTHALMOPLEGIA, ANDRIMMED VACUOLES;;INCLUSION BODY MYOPATHY 3, AUTOSOMAL DOMINANT, FORMERLY; IBM3, FORMERLY MYH2 (4620) 4620
0.0577 4.6263 OMIM:609284 NEMALINE MYOPATHY 1 TPM2 (7169), TPM3 (7170) 7169, 7170
0.0707 4.6263 OMIM:608358 #608358 MYOPATHY, MYOSIN STORAGE;;MYOPATHY, HYALINE BODY, AUTOSOMAL DOMINANT MYH7 (4625) 4625
0.0801 4.6263 OMIM:255320 #255320 MINICORE MYOPATHY WITH EXTERNAL OPHTHALMOPLEGIA;;MINICORE MYOPATHY;;MULTICORE MYOPATHY;;MULTIMINICORE MYOPATHY MULTICORE MYOPATHY WITH EXTERNAL OPHTHALMOPLEGIA;;MULTIMINICORE DISEASE WITH EXTERNAL OPHTHALMOPLEGIA RYR1 (6261) 6261
0.0824 4.6263 OMIM:256030 #256030 NEMALINE MYOPATHY 2; NEM2 NEB (4703) 4703
0.0864 4.6263 OMIM:161800 #161800 NEMALINE MYOPATHY 3; NEM3MYOPATHY, ACTIN, CONGENITAL, WITH EXCESS OF THIN MYOFILAMENTS, INCLUDED;;NEMALINE MYOPATHY 3, WITH INTRANUCLEAR RODS, INCLUDED;;MYOPATHY, ACTIN, CONGENITAL, WITH CORES, INCLUDED ACTA1 (58) 58
0.0939 4.6263 OMIM:602771 RIGID SPINE MUSCULAR DYSTROPHY 1 MYH7 (4625), SEPN1 (57190), TTN (7273), ACTA1 (58) 4625, 57190, 7273, 58
So in this case the string is a gene name, and I want to count the unique strings across the whole of the 5th column, where they are separated by a comma and a space.
2- Next, the order of the data is fixed and follows the score in column 2. I want to know where the gene of interest is placed in this ranked list within column 5 (Gene-Symbols). This has to be done after removing duplicates: the same genes are repeated because of other parameters in the rest of the columns, but that doesn't concern my final output. I only need to focus on the list as ranked by column 2. How do I do that? Is there a command I can pipe the above command into to get the position of a given value?
Expected output:
The command in point 1 should give me the number of unique genes in column 5. There are 18 genes in total in column 5, but only 14 unique values. If the gene of interest is TTN, its first occurrence is at the second position in the original ranked list, so the expected answer for where my gene of interest is located is 2.
$14
$2
Thanks
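One thing to watch for: awk '{print $5}' splits on any whitespace, so with disease names that contain spaces the fifth word is usually not the Gene-Symbols column. A possible approach is sketched below, assuming the file really is tab-separated (the listing above shows the tabs flattened to spaces) and that every entry looks like "SYMBOL (ID)". The function name gene_rank is made up for illustration. It prints the number of unique genes, then the position of the gene's first occurrence in the de-duplicated list.

```shell
# Hypothetical helper: gene_rank GENE FILE
# Line 1 of the output: number of unique gene symbols in column 5.
# Line 2: position of GENE in order of first appearance, or "not found".
gene_rank() {
  awk -F'\t' -v gene="$1" '
    NR == 1 { next }                          # skip the header row
    {
      n = split($5, parts, /, */)             # entries such as "TTN (7273)"
      for (i = 1; i <= n; i++) {
        name = parts[i]
        sub(/ *\(.*$/, "", name)              # drop the " (Entrez-ID)" suffix
        if (name == "" || (name in seen)) continue
        seen[name] = ++count                  # remember rank of first appearance
      }
    }
    END {
      pos = (gene in seen) ? seen[gene] : "not found"
      print count + 0
      print pos
    }
  ' "$2"
}

# Usage: gene_rank TTN hum1003.tsv
```

Since the rows are already sorted by score, order of first appearance is the rank, so no separate sort is needed; piping through sort | uniq would lose that order.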

bash awk get numbers in two digits

I want to correct wrong metadata, or add missing metadata, for the 75 CDs I have ripped from disc.
I got the track info from AllMusic and stripped it down to almost usable "CSV" data.
Number";"1";"Piece";"Nocturne for piano No. 2 in E flat major, Op. 9/2, CT. 109";"Componist";"Frédéric Chopin
MainPiece";"";"Piece";"Symphony No. 9 in E minor ("From the New World"), B. 178 (Op. 95) (first published as No. 5)
Number";"2";"Piece";"Largo";"Componist";"Antonin Dvorák
Number";"3";"Piece";"La plus que lente, waltz for piano (or orchestra), L. 121";"Componist";"Claude Debussy
Number";"4";"Piece";"Waldesrauschen (Forest Murmurs), for piano (Zwei Konzertetuden No. 1), S. 145/1 (LW A218/1)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Oboe Concerto, for oboe, strings & continuo in D minor, Op. 8/9, RV 454
Number";"5";"Piece";"Allegro";"Componist";"Antonio Vivaldi
Number";"6";"Piece";"Largo";"Componist";"Antonio Vivaldi
Number";"7";"Piece";"Allegro";"Componist";"Antonio Vivaldi
MainPiece";"";"Piece";"Cello Concerto in A major, G. 475
Number";"8";"Piece";"1. Allegro";"Componist";"Luigi Boccherini
Number";"9";"Piece";"2. Adagio";"Componist";"Luigi Boccherini
Number";"10";"Piece";"3. Rondò - Allegro";"Componist";"Luigi Boccherini
MainPiece";"";"Piece";"Serenade No. 12 for winds in C minor ("Nacht Musique"), K. 388 (K. 384a)
Number";"11";"Piece";"Allegro";"Componist";"Wolfgang Amadeus Mozart
Number";"12";"Piece";"Liebesträume, notturno for piano No. 3 in A flat major ("O Lieb, so lang du lieben kannst"), S. 541/3 (LW A103/3)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Phantasiestücke (4) for violin, cello & piano in A minor, Op. 88
Number";"13";"Piece";"Romanze";"Componist";"Robert Schumann
MainPiece";"";"Piece";"Sinfonia Concertante for violin, cello, oboe, bassoon & orchestra, H. 1/105
Number";"14";"Piece";"Andante";"Componist";"Franz Joseph Haydn
I would like to use awk to turn this into a script that sets the metadata
eyeD3 -n 01 -a composer -t mainpiece piece 01*.mp3
And with awk to rename the files
mv 01*.mp3 01 [composer] mainpiece piece.mp3
The mainpiece / piece is a manual part, but I would like to rewrite 1 as 01.
I found something with printf ("%2d" ,$1,$2), but this complains about .mp3
Does anyone have suggestions for me?
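A possible starting point: in awk's printf, "%2d" pads with spaces while "%02d" pads with zeros, and literal text such as .mp3 has to sit inside the quoted format string, otherwise awk tries to parse it as code. The sketch below assumes the lines look exactly like the listing above, with ";" (quote, semicolon, quote) between fields and no quotes at the start or end of a line. It only handles the "Number" lines and leaves the main piece to be added by hand. The function name tag_commands is made up, and the single-quoting is naive, so a piece or composer containing an apostrophe would need manual fixing.

```shell
# Hypothetical helper: print one eyeD3 command per "Number" line.
# Fields after splitting on ";" : $2 = track, $4 = piece, $6 = composer.
tag_commands() {
  awk -F'";"' '
    $1 == "Number" {
      # %02d turns 1 into 01; \047 is a single quote inside the awk string.
      printf "eyeD3 -n %02d -a \047%s\047 -t \047%s\047 %02d*.mp3\n", $2, $6, $4, $2
    }
  ' "$@"
}

# Usage: tag_commands tracks.csv > set-tags.sh   (review the script before running it)
```

Writing the commands to a file first, instead of running them directly, leaves room to edit in the main piece titles before anything is changed.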

using variables in gsub

I have a variable address which for now is a long string containing some unnecessary info, e.g.: "Aboriginal Relations 11th Floor Commerce Place 10155 102 Street Edmonton AB T5J 4G8 Phone 780 427-9658 Fax 780 644-4939 Email gerry.kushlyk#gov.ab.ca"
Aboriginal Relations is in a variable called title, and I'm trying to call address.gsub!(title,''), but it's returning the original string.
I've also tried address.gsub!(/#{title}/,'') and address.gsub!("#{title}",'') but those won't work either. Any ideas?
Sorry, the typo occurred when I typed it into Stack Overflow. Here's the code and the output, copied and pasted:
(this is within a loop, so there will be multiple outputs)
p title
address.gsub!(title,'')
p address
output
"Aboriginal Relations "
"Aboriginal Relations 11th Floor Commerce Place 10155 102 Street Edmonton AB T5J 4G8 Phone 780 427-9658 Fax 780 644-4939 Email gerry.kushlyk#gov.ab.ca"
"Aboriginal Tourism Advisory Council "
"Aboriginal Tourism Advisory Council 5th Floor Terrace Building 9515 107 Street Edmonton AB T5K 2C3 Phone 780 427-9687 Fax 780 422-7235 Email foip.fintprccs#gov.ab.ca"
"Acadia Foundation "
"Acadia Foundation PO Box 96 Oyen AB T0J 2J0 Phone 403 664-3384 Fax 403 664-3316 Email acadiafoundation#telus.net"
"Access Advisory Council "
"Access Advisory Council 12th Floor Centre West Building 10035 108 Street Edmonton AB T5J 3E1 Phone 780 427-2805 Fax 780 422-3204 Email barb.joyner#gov.ab.ca"
"ACCM Benevolent Association "
"ACCM Benevolent Association Suite 100 9403 95 Avenue Edmonton AB T6C 4M7 Phone 780 468-4648 Fax 780 468-4648 Email accmmanor#shaw.ca"
"Acme Municipal Library "
"Acme Municipal Library PO Box 326 Acme AB T0M 0A0 Phone 403 546-3845 Fax 403 546-2248 Email aamlibrary#marigold.ab.ca"
Likewise, if I try address.match(/#{title}/) I get nil.
I'm assuming you're using ruby 1.9 or higher.
It's possible that the trailing whitespace is a non-breaking space:
p "Relations\u00a0" # looks like a trailing space, but strip won't remove it
To get rid of it:
"Relations\u00a0".gsub!(/^\u00a0|\u00a0$/, '') # => "Relations"
A more generic solution for all unicode whitespace:
"Relations\u00a0".gsub!(/^[[:space:]]|[[:space:]]$/, '') # => "Relations"
To see what the character is in your case:
title[-1].ord # => 160 (example only)
'%x' % title[-1].ord # => "a0" (hex equivalent; example only)
title = title[0..-2] seemed to solve it. For some reason strip and chomp wouldn't work.
