Parsing text in Ruby

I'm working on a script that imports component information into SketchUp. A very helpful individual on their help page assisted me in creating one that works, line by line, on an "edited" text file. Now I'm ready to take it to the next level: importing directly from the original file created by FreePCB.
The portion of the file I wish to use is below ("sample_1.txt"):
[parts]
part: C1
ref_text: 1270000 127000 0 -7620000 1270000 1
package: "CAP-AX-10X18-7X"
value: "4.7pF" 1270000 127000 0 1270000 1270000 1
shape: "CAP-AX-10X18-7"
pos: 10160000 10160000 0 0 0
part: IC1
ref_text: 1270000 177800 270 2540000 2286000 1
package: "DIP-8-3X"
value: "JRC 4558" 1270000 177800 270 10668000 508000 0
shape: "DIP-8-3"
pos: 2540000 27940000 0 90 0
part: R1
ref_text: 1270000 127000 0 3380000 -600000 1
package: "RES-CF-1/4W-4X"
value: "470" 1270000 127000 0 2180000 -2900000 0
shape: "RES-CF-1/4W-4"
pos: 15240000 20320000 0 270 0
The word [parts], in brackets, is just a section heading. The information I wish to extract is the reference designator, shape, position, and rotation. I already have code that does this from a reformatted text file, using IO.readlines(file).each{ |line| data = line.split(" ");.
My current method uses a text file reformatted like this ("sample_2.txt"):
C1 CAP-AX-10X18-7 10160000 10160000 0 0 0
IC1 DIP-8-3 2540000 27940000 0 90 0
R1 RES-CF-1/4W-4 15240000 20320000 0 270 0
I then use the array to extract data[0], data[1], data[2], data[3], and data[5], plus an additional step that appends ".skp" to the package name, so the script can insert components with the same name as the package.
I would like to extract the information from the first example without having to reformat the file, as I do for the second example. That is, I know how to pull information from a single string split by spaces; how do I do it when the data for one array is spread over more than one line?
Thanks in advance for any help ;-)
EDIT: Below is the full code to parse "sample_2.txt", that was re-formatted prior to running the script.
# import.rb - extracts component info from text file
# Launch file browser
file = UI.openpanel "Open Text File", "c:\\", "*.txt"
# Do for each line, what appears in braces {}
IO.readlines(file).each { |line|
  data = line.split(" ")
  # Append second element in array "data[1]", with SketchUp file extension
  data[1] += ".skp"
  # Search for component with same name as data[1], and insert in component browser
  component_path = Sketchup.find_support_file data[1], "Components"
  component_def = Sketchup.active_model.definitions.load component_path
  # Create transformation from "origin" to point "location", convert data[] to float
  location = [data[2].to_f, data[3].to_f, 0]
  translation = Geom::Transformation.new location
  # Convert rotation "data[5]" to radians, and into float
  angle = data[5].to_f * Math::PI / 180.to_f
  rotation = Geom::Transformation.rotation [0, 0, 0], [0, 0, 1], angle
  # Insert an instance of component in model, and apply transformation
  instance = Sketchup.active_model.entities.add_instance component_def, translation * rotation
  # Rename component
  instance.name = data[0]
# Ending brace for "IO.readlines(file).each{"
}
Results in the following output, from running "import.rb" to open "sample_2.txt".
C1 CAP-AX-10X18-7 10160000 10160000 0
IC1 DIP-8-3 2540000 27940000 90
R1 RES-CF-1/4W-4 15240000 20320000 270
I am trying to get the same results from the unedited original file "sample_1.txt", without the extra step of stripping information out in Notepad to produce "sample_2.txt". The keywords followed by a colon (part, shape, pos) appear only in this part of the document and nowhere else, but the document is rather lengthy, and I need the script to ignore everything before and after the [parts] section.

Your question is not clear, but this:
text.scan(/^\s+shape: "(.*?)"\s+pos: (\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)
will give you:
[["CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
["DIP-8-3", "2540000", "27940000", "0", "90", "0"],
["RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
Added after the question was edited:
This:
text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/m)
will give you
[["C1", "CAP-AX-10X18-7", "10160000", "10160000", "0", "0", "0"],
["IC1", "DIP-8-3", "2540000", "27940000", "0", "90", "0"],
["R1", "RES-CF-1/4W-4", "15240000", "20320000", "0", "270", "0"]]
Added after the question was edited a second time:
This:
text.scan(/^\s*part:\s*(.*?)$.*?\s+shape:\s*"(.*?)"\s+pos:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/m)
will let you capture numbers even if they are negative.
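The question also asks to ignore everything outside the [parts] section and to append ".skp" to the shape name. The following is only a sketch of how the last regular expression might be combined with those two steps. It assumes the section ends at the next bracketed heading or at the end of the file; the helper name extract_parts and the [options] and [nets] lines in the sample are made up for illustration.

```ruby
# Same pattern as above, with "part:" and "shape:" anchored to line starts.
PART_RE = /^\s*part:\s*(\S+).*?^\s*shape:\s*"(.*?)"\s+pos:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/m

# Hypothetical helper: returns one hash per part found in the [parts] section.
def extract_parts(text)
  # Keep only the text between "[parts]" and the next "[...]" heading (or the end).
  section = text[/^\[parts\](.*?)(?=^\[|\z)/m, 1] || ""
  section.scan(PART_RE).map do |ref, shape, x, y, _third, angle, _fifth|
    { name: ref, file: "#{shape}.skp", x: x.to_f, y: y.to_f, angle: angle.to_f }
  end
end

# The [options] and [nets] sections are invented stand-ins for the rest of the file.
sample = <<~EOS
  [options]
  units: NM
  [parts]
  part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0
  part: IC1
  ref_text: 1270000 177800 270 2540000 2286000 1
  package: "DIP-8-3X"
  value: "JRC 4558" 1270000 177800 270 10668000 508000 0
  shape: "DIP-8-3"
  pos: 2540000 27940000 0 90 0
  part: R1
  ref_text: 1270000 127000 0 3380000 -600000 1
  package: "RES-CF-1/4W-4X"
  value: "470" 1270000 127000 0 2180000 -2900000 0
  shape: "RES-CF-1/4W-4"
  pos: 15240000 20320000 0 270 0
  [nets]
  net: "GND" 2 0 0 0
EOS

extract_parts(sample).each do |part|
  puts "#{part[:name]} #{part[:file]} #{part[:x]} #{part[:y]} #{part[:angle]}"
end
```

In the SketchUp script, text would presumably come from File.read(file), and each hash would take the place of the data array.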

Not sure exactly what you're asking, but hopefully this helps you get what you're looking for.
parts_text = <<EOS
[parts]
part: C1
ref_text: 1270000 127000 0 -7620000 1270000 1
package: "CAP-AX-10X18-7X"
value: "4.7pF" 1270000 127000 0 1270000 1270000 1
shape: "CAP-AX-10X18-7"
pos: 10160000 10160000 0 0 0
part: IC1
ref_text: 1270000 177800 270 2540000 2286000 1
package: "DIP-8-3X"
value: "JRC 4558" 1270000 177800 270 10668000 508000 0
shape: "DIP-8-3"
pos: 2540000 27940000 0 90 0
part: R1
ref_text: 1270000 127000 0 3380000 -600000 1
package: "RES-CF-1/4W-4X"
value: "470" 1270000 127000 0 2180000 -2900000 0
shape: "RES-CF-1/4W-4"
pos: 15240000 20320000 0 270 0
EOS
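# Split into blank-line-separated chunks, then split each chunk into lines
parts = parts_text.split(/\n\n/)
split_parts = parts.map { |p| p.split(/\n/) }
split_parts.each do |part|
  # Remove leading and trailing whitespace from every line
  stripped = part.map { |p| p.strip }
  stripped.each do |line|
    # Print the line as an array of space-separated fields
    p line.split(" ")
  end
end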
This could be done much more efficiently with regular expressions, but I opted for methods that you might already be familiar with.
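Staying with split and strip, the same line-by-line idea can be taken one step further so that each part ends up as a single array laid out like a line of "sample_2.txt". This is a sketch under assumptions: the helper name parse_parts is invented, and it assumes every section of the file starts with a bracketed heading such as [parts].

```ruby
# Hypothetical helper: walks the lines once and collects one record per "part:".
def parse_parts(lines)
  parts = []
  in_parts = false
  lines.each do |raw|
    line = raw.strip
    if line.start_with?("[")
      # Any bracketed heading switches section; only [parts] is of interest.
      in_parts = (line == "[parts]")
      next
    end
    next unless in_parts

    key, rest = line.split(":", 2)
    next if rest.nil?
    case key
    when "part"
      parts << { name: rest.strip }
    when "shape"
      parts.last[:shape] = rest.strip.delete('"') if parts.last
    when "pos"
      parts.last[:pos] = rest.split if parts.last
    end
  end
  parts
end

sample_lines = <<~EOS.lines
  [parts]
  part: C1
  ref_text: 1270000 127000 0 -7620000 1270000 1
  package: "CAP-AX-10X18-7X"
  value: "4.7pF" 1270000 127000 0 1270000 1270000 1
  shape: "CAP-AX-10X18-7"
  pos: 10160000 10160000 0 0 0
  part: IC1
  ref_text: 1270000 177800 270 2540000 2286000 1
  package: "DIP-8-3X"
  value: "JRC 4558" 1270000 177800 270 10668000 508000 0
  shape: "DIP-8-3"
  pos: 2540000 27940000 0 90 0
EOS

rows = parse_parts(sample_lines).map do |part|
  # Same layout as one split line of sample_2.txt: name, shape, then the pos fields.
  [part[:name], part[:shape], *part[:pos]]
end
rows.each { |data| puts data.join(" ") }
```

Because each row has the same layout as the data array in the original script, data[0], data[1], data[2], data[3], and data[5] keep their meaning; in the real script the lines would come from IO.readlines(file).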

Related

How do I make this postscript code work again

I have the following PostScript snippet. It reads a PDF filename from the array ConvertItem, gets the page information for it through getfirstpagesize, and stores the resulting values back in ConvertItem (elements 8 and 9). This worked perfectly until Ghostscript 9.27, in which pdfdict is deprecated.
/ConvertItems
[
[ (...) (...) <...> (...) (...) () 0 0 0 0 0 0 ]
] def
/getfirstpagesize
{
1 1 1
{
pdfgetpage /MediaBox pget dup pop
{
/MediaBox exch def
/PageWidth MediaBox 2 get def
/PageHeight MediaBox 3 get def
} if
(First page - width [) print PageWidth 10 string cvs print
(], height [) print PageHeight 10 string cvs print
(]\n) print
} for
} bind def
------- snip -------
ConvertItem 0 get (r) file
pdfdict begin
pdfopen begin
getfirstpagesize
ConvertItem 8 PageWidth put
ConvertItem 9 PageHeight put
currentdict pdfclose
end % temporary dict
end % pdfdict
------- snip -------
I tried to solve it by using runpdfbegin and runpdfend like this:
------- snip -------
ConvertItem 0 get (r) file
runpdfbegin
getfirstpagesize
ConvertItem 8 PageWidth put
ConvertItem 9 PageHeight put
runpdfend
(*First page - width [) print ConvertItem 8 get 10 string cvs print
(], height [) print ConvertItem 9 get 10 string cvs print
(]\n) print
------- snip -------
But when I print ConvertItem 8 and 9, the values are always 0, so it seems that runpdfend clears the memory that holds these values. Is there some way around this?

It seems that runpdfbegin saves the interpreter memory and that runpdfend restores it. This is why the values get "reset" to their default. To work around the problem, PageWidth and PageHeight have to be stored in globaldict, which is not reset when the interpreter memory is restored.
/getfirstpagesize
{
1 1 1
{
pdfgetpage dup
/MediaBox pget
{
/MediaBox exch def
true setglobal
globaldict begin
/PageWidth MediaBox 2 get def
/PageHeight MediaBox 3 get def
false setglobal
end
} if
pop
(First page - width [) print PageWidth 10 string cvs print
(], height [) print PageHeight 10 string cvs print
(]\n) print
} for
} bind def

X10 reading from a file not as expected

I encountered the following behavior when reading from a text file.
val input = new File(inputFileName);
val inp = input.openRead();
Console.OUT.println(inp.lines().next());
if (inp.lines().hasNext())
Console.OUT.println(inp.lines().next());
My input file contains:
0 1
0 2
0 3
As a result I get
0 1
0 3
It seems that inp.lines().hasNext() has moved the pointer forward and as a result one line is skipped in the text file.
Is this a bug?

Yes, this looks like a bug. x10.io.FileReader.lines().hasNext() should not be skipping forward in the text file.
Could you please raise an issue in the X10 JIRA project?

How do I embed icons into text within Squib, the Ruby gem?

I added this block to my deck.rb:
text(str: 'Gain 1 :tribute:') do |embed|
embed.svg key: ':tribute:', file: 'tribute.svg'
end
However, this puts "Gain 1 [my icon here]" into the top left of every card, but not where the card text says "Gain 1 tribute."
If I add this line, in an attempt to make it specify the "Ability" column in my .csv file:
%w(Ability).each do |key|
Then I get an error message:
"Syntax error, unexpected end-of-input, expecting keyword_end."
What do I need to add to my deck.rb, exactly, in order to make it use the tribute.svg icon wherever cards within the Ability column have the text, "Gain 1 tribute"?
Here's my current deck.rb:
require 'squib'
require 'game_icons'
Squib::Deck.new(cards: 4, layout: %w(hand.yml layout.yml)) do
background color: '#FFFFFF'
data = csv file: 'country.csv'
png file: data['Art'], layout: 'Art'
%w(Title Ability Quote Type Subtype).each do |key|
text str: data[key], layout: key, markup: true
end
%w(Tribute Power Dominion).each do |key|
svg file: "#{key.downcase}.svg", layout: "#{key}Icon"
text str: data[key], layout: key
end
text(str: 'Gain 1 :tribute:', x: 275, y: 745) do |embed|
embed.svg key: ':tribute:', file: 'tribute.svg'
end
save_png prefix: 'country_'
end

The text method needs to have x and y specified. Like this:
text(str: 'Gain 1 :tribute:', x: 300, y: 500) do |embed|
embed.svg key: ':tribute:', file: 'tribute.svg'
end
As for the syntax error, every do needs an end, because you're defining a block, although that part seems unrelated to the first part of your question. The snippet %w(Ability).each seems silly to me because it just iterates over a one-element array.
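One gap remains between the question and the embed key: the cards say "Gain 1 tribute", while the key being replaced is ":tribute:". A possible way to bridge that, sketched here in plain Ruby, is to rewrite the Ability strings before handing them to text. The abilities array is a made-up stand-in for data['Ability'], and whether this fits your CSV is an assumption.

```ruby
# Stand-in for data['Ability'] as read from the CSV (values are made up).
abilities = ["Gain 1 tribute.", "Draw a card.", nil]

# Replace the plain word with the embed key; leave empty cells untouched.
with_keys = abilities.map do |ability|
  ability && ability.gsub(/\btribute\b/i, ":tribute:")
end

with_keys.each { |ability| puts ability.inspect }
```

The resulting array could then be passed as str: to the text call that carries the embed block, in place of the fixed 'Gain 1 :tribute:' string.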

yaml.scanner.ScannerError: while scanning a directive

I use PyYAML to read a file. The Python code is:
import yaml
with open('demo.yml') as f:
    dataMap = yaml.load(f)
demo.yml:
%YAML:1.0
my_svm: !!opencv-ml-svm
svm_type: C_SVC
kernel: { type:LINEAR }
C: 1.
Then error is:
yaml.scanner.ScannerError: while scanning a directive
in "demo.yml", line 1, column 1
expected alphabetic or numeric character, but found ':'
in "demo.yml", line 1, column 6
Can someone help me?

The directive should be %YAML 1.0 (with no colon). You also will need a "document start" (---) to separate your directives from the document. E.g.:
%YAML 1.0
---
my_svm: !!opencv-ml-svm
svm_type: C_SVC
kernel: { type: LINEAR }
C: 1.
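If many such files need fixing, the two textual changes can be scripted. The sketch below is written in Ruby only to match the other examples on this page; the helper name normalize_opencv_yaml is invented, and it only rewrites the header and the first key of each flow mapping. It does not try to load the result.

```ruby
# Hypothetical helper: rewrites the OpenCV-style header and "{ key:value" pairs.
def normalize_opencv_yaml(text)
  text
    # "%YAML:1.0" becomes "%YAML 1.0" followed by a document start marker.
    .sub(/\A%YAML:(\d+\.\d+)[ \t]*\r?\n/) { "%YAML #{$1}\n---\n" }
    # "{ type:LINEAR }" becomes "{ type: LINEAR }".
    .gsub(/\{\s*(\w+):(?=\S)/) { "{ #{$1}: " }
end

original = <<~EOS
  %YAML:1.0
  my_svm: !!opencv-ml-svm
  svm_type: C_SVC
  kernel: { type:LINEAR }
  C: 1.
EOS

fixed = normalize_opencv_yaml(original)
puts fixed
```

The custom !!opencv-ml-svm tag is left in place, so the loader still has to know what to do with it.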

You can modify the YAML file created by OpenCV 3.0.
file1, from OpenCV:
1 %YAML:1.0
2 my_svm: !!opencv-ml-svm
3 svm_type: C_SVC
4 kernel: { type:LINEAR }
5 C: 1.
6 ...
file2:
1 my_svm: opencv-ml-svm
2 svm_type: C_SVC
3 kernel: { type: LINEAR }
4 C: 1.
5 ...
To turn file1 into file2:
delete line 1
delete "!!opencv-ml-svm"
add a space after "type:" in line 4
Then you can load your data with yaml.load, passing it the opened file.

This worked for me:
from cv2 import cv
import numpy as np
filepath = "test.yml"
matrixA = np.array( cv.Load(filepath, cv.CreateMemStorage(), "matrixA") )
matrixB = np.array( cv.Load(filepath, cv.CreateMemStorage(), "matrixB") )
print "matrixA:", matrixA
print "matrixB:", matrixB
As seen in:
http://xudongai.blogspot.jp/2013/08/how-to-use-python-to-load-opencv-yml.html

Ruby data extraction from a text file

I have a relatively big text file with blocks of data layered like this:
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
(they contain more lines and then are repeated)
I would like first to extract the numerical value after TUNE X = and output these in a text file. Then I would like to extract the numerical value of LINE FREQUENCY and AMPLITUDE as a pair of values and output to a file.
My question is the following: although I could make something more or less work using a simple regexp, I'm not convinced that it's the right way to do it, and I would like some advice or example code showing how to do this efficiently in Ruby.

Generally, (not tested)
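toggle = 0
File.foreach("file") do |line|
  # A new block starts: stop treating lines as table rows
  toggle = 0 if line[/ANALYSIS/]
  if line[/TUNE/]
    puts line.split("=", 2)[-1].strip
    next
  end
  if line[/Line Frequency/]
    toggle = 1
    next
  end
  # In Ruby 0 is truthy, so compare against 1 explicitly
  if toggle == 1
    a = line.split
    puts "#{a[1]} #{a[2]}"
  end
end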
Go through the file line by line, check for /TUNE/, then split on "=" to get the last item.
Do the same for lines containing /Line Frequency/ and set the toggle flag to 1. This signifies that the lines that follow contain the data you want. Since the frequency and amplitude are fields 2 and 3, split each line and take those positions. That is the general idea. As for toggling, the flag should go back to 0 at the start of the next block, detected with a pattern (e.g. SIGNAL or ANALYSIS).

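tune_x = []
frequency = []
amplitude = []
File.foreach("data.dat") do |line|
  # TUNE X line: capture the value after the equals sign
  tune_x_scan = line.scan(/TUNE X = (\d*\.\d*)/)
  # Data line: capture every number written in exponent notation
  data_scan = line.scan(/(\d*\.\d*E[-+]\d*)/)
  tune_x << tune_x_scan[0][0] unless tune_x_scan.empty?
  # The first two exponent-format numbers are frequency and amplitude
  next if data_scan.size < 2
  frequency << data_scan[0][0]
  amplitude << data_scan[1][0]
end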

There are lots of ways to do it. This is a simple first pass at it:
text = 'ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 1.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 1.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 1.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 2.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 2.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 2.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
'
require 'stringio'
pretend_file = StringIO.new(text, 'r')
That gives us a StringIO object we can pretend is a file. We can read from it by lines.
I changed the numbers a bit just to make it easier to see that they are being captured in the output.
pretend_file.each_line do |li|
case
when li =~ /^TUNE.+?=\s+(.+)/
print $1.strip, "\n"
when li =~ /^\d+\s+(\S+)\s+(\S+)/
print $1, ' ', $2, "\n"
end
end
For real use you'd want to change the print statements to a file handle: fileh.print
The output looks like:
# >> 0.2561890123390808
# >> 0.2561890123391E+00 0.204316425208E-01
# >> 0.2562865535359E+00 0.288712798671E-01
# >> 1.2561890123390808
# >> 1.2561890123391E+00 0.204316425208E-01
# >> 1.2562865535359E+00 0.288712798671E-01
# >> 2.2561890123390808
# >> 2.2561890123391E+00 0.204316425208E-01
# >> 2.2562865535359E+00 0.288712798671E-01
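To send the tune values and the frequency/amplitude pairs to separate outputs, as the question asks, the same two patterns can write to two handles. This is a minimal sketch: the helper name extract is invented, and StringIO objects stand in for the output files so the example runs without touching the disk.

```ruby
require 'stringio'

# Hypothetical helper: tune_out and pairs_out are any objects that respond to puts.
def extract(input, tune_out, pairs_out)
  input.each_line do |li|
    case li
    when /^\s*TUNE.+?=\s*(\S+)/
      tune_out.puts $1
    when /^\s*\d+\s+(\S+)\s+(\S+)/
      pairs_out.puts "#{$1} #{$2}"
    end
  end
end

input = StringIO.new(<<~EOS)
  ANALYSIS OF X SIGNAL, CASE: 1
  TUNE X = 0.2561890123390808
  Line Frequency Amplitude Phase Error mx my ms p
  1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
  2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
EOS
tunes = StringIO.new
pairs = StringIO.new
extract(input, tunes, pairs)
print tunes.string
print pairs.string
```

With real files, the two StringIO objects would be replaced by handles opened for writing with File.open, and input by the opened data file.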

You can read the file line by line and cut each line by character position. For example, to extract TUNE X, take characters 10 through 27 of line 2; to extract the line frequency, take characters 3 through 22 of line 6+n.
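As a rough illustration of the fixed-position idea, Ruby string ranges can do the cutting. The column numbers are taken from the answer and happen to fit the sample lines as they appear in the question; the real file may use different spacing, so treat the offsets as assumptions to check against your data.

```ruby
tune_line = "TUNE X = 0.2561890123390808"
data_line = "1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03"

# Characters 10 through 27 (counting from 1) are indices 9..26 in Ruby.
tune_x = tune_line[9..26].strip
# Characters 3 through 22 (counting from 1) are indices 2..21.
frequency = data_line[2..21].strip

puts tune_x
puts frequency
```

Fixed positions are fast but brittle: one extra space in the file shifts every column, which is where the split and regexp approaches above are more forgiving.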
