Suppose I have long text files output1, output2, and output3. Somewhere in each output file is the line "My Name is Rock (static)",
and below that text some values like
"My Name is Rock (static)"
10 20 30
-10 0.5 00
3.0 0.0 0.0 (different in each output file)
How can I copy the second column of the third line below the "My Name is Rock (static)" line to a new file?
Remember that the line numbers differ between the output files.
awk 'c{c--;if(!c) print $2}/My Name is Rock \(static\)/{c=3}' ./output1 ./output2 ./output3
Explanation
/My Name is Rock \(static\)/{c=3}: When "My Name is Rock (static)" is seen, set c to 3
c{...}: If the counter c is non-zero, do ... (note, c starts off at 0)
c--;if(!c) print $2: Decrement counter, if counter is now zero, print 2nd field of current line
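As a quick sanity check, running the one-liner against a file shaped like the sample above prints the second field of the third line after the marker (the file name sample.txt is just for illustration):

```shell
# Build a file shaped like the sample in the question
printf '%s\n' 'some text' '"My Name is Rock (static)"' '10 20 30' '-10 0.5 00' '3.0 0.0 0.0' > sample.txt

# Arm a 3-line countdown at the marker, print $2 when the countdown reaches zero
awk 'c{c--;if(!c) print $2}/My Name is Rock \(static\)/{c=3}' sample.txt
# prints: 0.0
```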
Related
I have a text like this:
Print <javascript:PrintThis();>
www.example.com
Order Number: *912343454656548 * Date of Order: November 54 2043
------------------------------------------------------------------------
*Dicders Folcisad:
* STACKOVERFLOW
*dum FWEFaadasdd:* [U+200E]
STACK OVERFLOW
BLVD OF SOMEPLACENICE 434
SANTA MONICA, COUNTY
LOS ANGEKES, CALI 90210
(SW)
*Order Totals:*
Subtotal Usd$789.75
Shipping Usd$87.64
Duties & Taxes Usd$0.00
Rewards Credit Usd$0.00
*Order Total * *Usd$877.39 *
*Wordskccds:*
STACKOVERFLOW
FasntAsia
xxxx-xxxx-xxxx-
*test Method / Welcome Info *
易客满x京配个人行邮税- 运输 + 关税 & 税费 / ADHHX15892013504555636
*Order Number: 916212582744342X*
*#* *Item* *Price* *Qty.* *Discount* *Subtotal*
1
Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535
Usd$141.92 4 -Usd$85.16 Usd$482.52
2
Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,
60 Stresss CXB-034251
Usd$192.24 1 -Usd$28.83 Usd$163.41
3
34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Tablesds XDF-38452
Usd$169.20 1 -Usd$25.38 Usd$143.82
*Extra Discounts:* Extra 15% discounts applied! Usd$139.37
*Stackoverflox Contact Information :*
*Web: *www.example.com
*Disclaimer:* something made, or service sold through this website,
have not been test by the sweden Spain norway and Dumrug
Advantage. They are not intended to treet, treat, forsee or
forshadow somw clover.
I'm trying to grab each line that starts with a number, then concatenate the second line, and finally the third one. Example of the desired output:
1 Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535 Usd$141.92 4 -Usd$85.16 Usd$482.52
2 Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg, 60 Stresss CXB-034251 Usd$192.24 1 -Usd$28.83 Usd$163.41 <- 1 line
3 34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Wedscsd XDF-38452 Usd$169.20 1 -Usd$25.38 Usd$143.82 <- 1 line, like the first
As you may notice, the second record spans three source lines instead of two, which makes it harder to grab.
Because of the newlines and whitespace, the next command only grabs the line containing the 1:
grep -E '1\s.+'
I have also been trying other concatenating patterns:
grep -E '1\s|[A-Z].+'
But that doesn't work either; grep starts selecting similar patterns in other parts of the text. Already tried for cleanup:
awk '{$1=$1}1' #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r" #done already
I'm trying to make a script that takes an uncleaned FILE as an ARGUMENT, then grabs the table and pairs each number with its respective data. Sometimes a record has 4 lines, sometimes 3, so copy/paste doesn't work for me.
I think the last line to be joined is the line starting with "Usd". In that case you only need to change the formatting in
awk '
!orderfound && /^[0-9]/ {ordernr++; orderfound=1 }
orderfound { order[ordernr]=order[ordernr] " " $0 }
$1 ~ "Usd" { orderfound = 0 }
END {
for (i=1; i<=ordernr; i++) { print order[i] }
}' inputfile
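As a quick check (using the file name inputfile from the command above), feeding the script just the four raw lines of item 2 yields a single joined record:

```shell
# Item 2 from the invoice spans four raw lines; the script glues them into one record
printf '%s\n' '2' \
  'Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,' \
  '60 Stresss CXB-034251' \
  'Usd$192.24 1 -Usd$28.83 Usd$163.41' > inputfile

awk '
!orderfound && /^[0-9]/ {ordernr++; orderfound=1 }
orderfound { order[ordernr]=order[ordernr] " " $0 }
$1 ~ "Usd" { orderfound = 0 }
END {
  for (i=1; i<=ordernr; i++) { print order[i] }
}' inputfile
# prints the record as one line (with a leading space from the unconditional concatenation)
```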
The following qb45 code partly operates my Zebra GC420D printer.
1 cls : locate 15, 30: input "type floppy code"; B$
3 cls : locate 12, 30: print "Is Barcode "; B$
5 locate 15, 30: input "Confirm Y/N"; E$: IF E$ = "Y" OR E$ = "y" THEN 20 ELSE 1
20 LPRINT"^XA"
23 LPRINT"B3,N,175,Y"
25 LPRINT"N^FD, B$, ^FS"
30 LPRINT "^PQ2"
35 LPRINT "^XZ"
40 LPRINT "end"
Two barcodes are produced, but they encode the literal string "B$", not what B$ contains. Do I need a text variable? How can I emulate one?
You have the variable inside a quoted string, so it is printed literally. I'm not sure which flavor of BASIC that is, but try:
LPRINT "^FD", B$, "^FS"
If that doesn't work, some BASICs use semicolon as the list separator to LPRINT:
LPRINT "^FD"; B$; "^FS"
I need to get 2 things from a tsv input file:
1- To find how many unique strings there are in a given column, where the individual values are comma separated. For this I used the command below, which gave me the unique values:
$ awk < input.tsv '{print $5}' | sort | uniq | wc -l
Input file example with header (6 columns) and 10 rows:
$ cat hum1003.tsv
p-Value Score Disease-Id Disease-Name Gene-Symbols Entrez-IDs
0.0463 4.6263 OMIM:117000 #117000 CENTRAL CORE DISEASE OF MUSCLE;;CCD;;CCOMINICORE MYOPATHY, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;MULTICORE MYOPATHY, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;MULTIMINICORE DISEASE, MODERATE, WITH HAND INVOLVEMENT, INCLUDED;;NEUROMUSCULAR DISEASE, CONGENITAL, WITH UNIFORM TYPE 1 FIBER, INCLUDED;CNMDU1, INCLUDED RYR1 (6261) 6261
0.0463 4.6263 OMIM:611705 MYOPATHY, EARLY-ONSET, WITH FATAL CARDIOMYOPATHY TTN (7273) 7273
0.0513 4.6263 OMIM:609283 PROGRESSIVE EXTERNAL OPHTHALMOPLEGIA WITH MITOCHONDRIAL DNA DELETIONS,AUTOSOMAL DOMINANT, 2 POLG2 (11232), SLC25A4 (291), POLG (5428), RRM2B (50484), C10ORF2 (56652) 11232, 291, 5428, 50484, 56652
0.0539 4.6263 OMIM:605637 #605637 MYOPATHY, PROXIMAL, AND OPHTHALMOPLEGIA; MYPOP;;MYOPATHY WITH CONGENITAL JOINT CONTRACTURES, OPHTHALMOPLEGIA, ANDRIMMED VACUOLES;;INCLUSION BODY MYOPATHY 3, AUTOSOMAL DOMINANT, FORMERLY; IBM3, FORMERLY MYH2 (4620) 4620
0.0577 4.6263 OMIM:609284 NEMALINE MYOPATHY 1 TPM2 (7169), TPM3 (7170) 7169, 7170
0.0707 4.6263 OMIM:608358 #608358 MYOPATHY, MYOSIN STORAGE;;MYOPATHY, HYALINE BODY, AUTOSOMAL DOMINANT MYH7 (4625) 4625
0.0801 4.6263 OMIM:255320 #255320 MINICORE MYOPATHY WITH EXTERNAL OPHTHALMOPLEGIA;;MINICORE MYOPATHY;;MULTICORE MYOPATHY;;MULTIMINICORE MYOPATHY MULTICORE MYOPATHY WITH EXTERNAL OPHTHALMOPLEGIA;;MULTIMINICORE DISEASE WITH EXTERNAL OPHTHALMOPLEGIA RYR1 (6261) 6261
0.0824 4.6263 OMIM:256030 #256030 NEMALINE MYOPATHY 2; NEM2 NEB (4703) 4703
0.0864 4.6263 OMIM:161800 #161800 NEMALINE MYOPATHY 3; NEM3MYOPATHY, ACTIN, CONGENITAL, WITH EXCESS OF THIN MYOFILAMENTS, INCLUDED;;NEMALINE MYOPATHY 3, WITH INTRANUCLEAR RODS, INCLUDED;;MYOPATHY, ACTIN, CONGENITAL, WITH CORES, INCLUDED ACTA1 (58) 58
0.0939 4.6263 OMIM:602771 RIGID SPINE MUSCULAR DYSTROPHY 1 MYH7 (4625), SEPN1 (57190), TTN (7273), ACTA1 (58) 4625, 57190, 7273, 58
So in this case the string is a gene name, and I want to count the unique strings within the entire stretch of the 5th column, where they are separated by a comma and a space.
2- Next, the order of the data is fixed, ranked by column 2's score. I want to know where the gene of interest is placed in this ranked list within column 5 (Gene-Symbols). This has to be done after removing duplicates, since the same genes are repeated because of parameters in the other columns that don't concern my final output; I only need the ranked list as per column 2. How do I do that? Is there a command I can pipe onto the one above to get the position of a given value?
Expected output:
If I type the command from point 1, it should give me the unique genes in column 5. I have 18 genes in total in column 5, but only 14 unique values. If the gene of interest is TTN, its first occurrence is at the second position in the original ranked list, so the expected answer for its location is 2.
$14
$2
Thanks
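A sketch of one possible approach, assuming the columns really are tab-separated and every entry in column 5 is a GENE (ID) pair separated by ", " (the example below rebuilds the sample with abbreviated disease names, since only column 5 matters here; the gene TTN and file name hum1003.tsv are taken from the question):

```shell
# Abbreviated copy of the sample: same tab-separated layout, real Gene-Symbols column
{
  printf 'p-Value\tScore\tDisease-Id\tDisease-Name\tGene-Symbols\tEntrez-IDs\n'
  printf '0.0463\t4.6263\tOMIM:117000\tCCD\tRYR1 (6261)\t6261\n'
  printf '0.0463\t4.6263\tOMIM:611705\tMYOPATHY\tTTN (7273)\t7273\n'
  printf '0.0513\t4.6263\tOMIM:609283\tPEO\tPOLG2 (11232), SLC25A4 (291), POLG (5428), RRM2B (50484), C10ORF2 (56652)\t11232\n'
  printf '0.0539\t4.6263\tOMIM:605637\tMYPOP\tMYH2 (4620)\t4620\n'
  printf '0.0577\t4.6263\tOMIM:609284\tNEM1\tTPM2 (7169), TPM3 (7170)\t7169\n'
  printf '0.0707\t4.6263\tOMIM:608358\tMSM\tMYH7 (4625)\t4625\n'
  printf '0.0801\t4.6263\tOMIM:255320\tMMD\tRYR1 (6261)\t6261\n'
  printf '0.0824\t4.6263\tOMIM:256030\tNEM2\tNEB (4703)\t4703\n'
  printf '0.0864\t4.6263\tOMIM:161800\tNEM3\tACTA1 (58)\t58\n'
  printf '0.0939\t4.6263\tOMIM:602771\tRSMD1\tMYH7 (4625), SEPN1 (57190), TTN (7273), ACTA1 (58)\t4625\n'
} > hum1003.tsv

# 1) unique gene symbols in column 5: split on ", ", strip the " (id)" part, dedupe
uniq_count=$(tail -n +2 hum1003.tsv \
  | awk -F'\t' '{n=split($5,a,", "); for(i=1;i<=n;i++){sub(/ \(.*/,"",a[i]); print a[i]}}' \
  | sort -u | wc -l)

# 2) rank of the first row whose Gene-Symbols contain the gene of interest
rank=$(tail -n +2 hum1003.tsv \
  | awk -F'\t' -v g='TTN' \
      '{n=split($5,a,", "); for(i=1;i<=n;i++){sub(/ \(.*/,"",a[i]); if(a[i]==g){print NR; exit}}}')
```

On this sample the two values come out as 14 and 2, matching the expected output above.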
I have about 6GB of various text files. The files have many lines, but each record is missing its commas, so all the fields run together in one block. I want to create a batch file that adds commas at the appropriate places in each "record", so I can then import the data into a database.
For example the file would be structured like this.
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
Each field has a unique length which I know, and it's static between all files.
For example
ID - 10 characters
NAME - 40 characters
ADDRESS - 30 characters
etc
This will need to be run on an ongoing basis as new files come in so I'm hoping for something I can give a non technical person they can just run.
Any quick way to do this in a bat file?
Using your example above. Note we count the characters starting from 0, then tell set to extract a substring starting at a certain offset, counting the field length from there. See the bottom for the layout.
@echo off
setlocal enabledelayedexpansion
for /F "tokens=* delims=" %%a in (filename.txt) do (
set str=%%a
set id=!str:~0,2!
set na=!str:~2,4!
set add=!str:~6,7!
set ph=!str:~13,5!
set em=!str:~18,5!
set etc=!str:~23,3!
echo !id!,!na!,!add!,!ph!,!em!,!etc!
)
Characters assigned in a string as:
I D n a m e A D D R E S S p h o n e E M A I L e t c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
ID starts at Character 0 and is 2 characters, including itself :~0,2
name starts at character 2 and is 4 characters long :~2,4
etc..
For many files just add another loop as a main loop or give a list of files.
Based on your provided example, here is a quick PowerShell command (despite there being no powershell tag):
(GC 'Report.txt' | Select -First 1).Insert(10,',').Insert(51,',').Insert(82,',') > 'Fixed.txt'
It takes the first line of Report.txt…
After 10 characters, insert a , (0 + 10 = 10); the inserted comma shifts the next start offset to 11.
After another 40 characters, insert a , (11 + 40 = 51); the next start becomes 52.
After another 30 characters, insert a , (52 + 30 = 82).
etc.
…then outputs the line complete with insertions to Fixed.txt
Just continue the .Insert(<number>,',') sequence for your other fixed width column sizes and ensure you've changed the filenames to suit your circumstances.
Edit
The following, updated per your comment and subsequent edit, should work for all lines in the file:
GC 'Report.txt' | % {($_).Insert(10,',').Insert(51,',').Insert(82,',')} | Out-File 'Fixed.txt'
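For comparison only (this is a Unix tool, not a .bat solution), the same fixed-width insertion for the stated 10/40/30-character fields can be sketched with sed; the file names Report.txt and Fixed.txt follow the answer above, and the sample record below is made up to show the widths:

```shell
# One made-up sample record: 10-char ID, 40-char NAME, 30-char ADDRESS, then the rest
id='ID00000001'                     # 10 chars
name=$(printf 'x%.0s' $(seq 40))    # 40 chars
addr=$(printf 'y%.0s' $(seq 30))    # 30 chars
printf '%s%s%s%s\n' "$id" "$name" "$addr" 'etc' > Report.txt

# Capture the three fixed-width fields and re-emit them comma-separated;
# everything past character 80 (the remaining fields) is left untouched
sed -E 's/^(.{10})(.{40})(.{30})/\1,\2,\3,/' Report.txt > Fixed.txt
cat Fixed.txt   # the record now has commas after the ID, NAME, and ADDRESS fields
```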
I'm trying to print the first 5 lines from a set of large (>500MB) csv files into small header files in order to inspect their content more easily.
I'm using Ruby code to do this, but each line comes out padded with extra Chinese characters, like this:
week_num type ID location total_qty A_qty B_qty count㌀㐀ऀ猀漀爀琀愀戀氀攀ऀ㤀㜀ऀ䐀䔀开伀渀氀礀ऀ㔀㐀㜀㈀ ㌀ऀ㔀㐀㜀㈀ ㌀ऀ ऀ㤀㈀㔀㌀ഀ
44 small 14 A 907859 907859 0 550360㐀ऀ猀漀爀琀愀戀氀攀ऀ㐀㈀ऀ䐀䔀开伀渀氀礀ऀ㌀ ㈀㜀㐀ऀ㌀ ㈀
The first few lines of input file are like so:
week_num type ID location total_qty A_qty B_qty count
34 small 197 A 547203 547203 0 91253
44 small 14 A 907859 907859 0 550360
41 small 421 A 302174 302174 0 18198
The strange characters appear to be lines 1 and 3 of the data.
Here's my Ruby code:
num_lines=ARGV[0]
fh = File.open(file_in,"r")
fw = File.open(file_out,"w")
until (line=fh.gets).nil? or num_lines==0
fw.puts line if outflag
num_lines = num_lines-1
end
Any idea what's going on and what I can do to simply stop at the line end character?
Looking at the input/output files in hex (a useful suggestion by user1934428):
Input file - each character looks to be two bytes.
Output file - notice the NULL (00) between each single byte character...
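The two-bytes-per-character pattern described above can be reproduced with iconv (a sketch; the file name is illustrative and iconv is assumed to be available):

```shell
# Recreate the symptom: encode a UTF-8 header line as UTF-16LE
printf 'week_num\ttype\n44\tsmall\n' | iconv -f utf-8 -t utf-16le > sample_utf16.csv

# Every ASCII character is followed by a 00 byte -- the NULs seen in the hex dump
od -An -tx1 sample_utf16.csv
```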
Ruby version 1.9.1
The problem is an encoding mismatch, which happens because the encoding is not explicitly specified in the read and write parts of the code. Read the input csv as a binary file ("rb") with utf-16le encoding, and write the output in the same format.
num_lines = ARGV[0].to_i   # convert: ARGV values are strings, and a String minus 1 would raise
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in, "rb:utf-16le")
fw = File.open(file_out, "wb:utf-16le")
until (line = fh.gets).nil? || num_lines == 0
  fw.puts line
  num_lines -= 1
end
Useful references:
Working with encodings in Ruby 1.9
CSV encodings
Determining the encoding of a CSV file