X10 reading from a file not as expected - x10-language

I encountered the following behavior when reading from a text file:
val input = new File(inputFileName);
val inp = input.openRead();
Console.OUT.println(inp.lines().next());
if (inp.lines().hasNext())
    Console.OUT.println(inp.lines().next());
My input file contains:
0 1
0 2
0 3
As a result, I get:
0 1
0 3
It seems that inp.lines().hasNext() moved the read position forward, and as a result one line of the text file is skipped.
Is this a bug?

Yes, this looks like a bug. x10.io.FileReader.lines().hasNext() should not be skipping forward in the text file.
Could you please raise an issue in the X10 JIRA project?
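One plausible mechanism can be illustrated with a small model. The following is not X10 and not the real x10.io implementation; it is a hypothetical Python sketch, assuming that hasNext() reads the next line into a buffer owned by the iterator and that every call to lines() returns a new iterator over the same file. Under those assumptions, the iterator created only for hasNext() swallows "0 2" and is then discarded. The names LookaheadLines, ToyReader, read_like_question and read_with_one_iterator are all made up for illustration.

```python
import io


class LookaheadLines:
    """Toy line iterator: has_next() reads one line ahead into this object's buffer."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = None

    def has_next(self):
        if self._buffer is None:
            # readline() returns '' at end of file; store None in that case
            self._buffer = self._stream.readline() or None
        return self._buffer is not None

    def next(self):
        if not self.has_next():
            raise StopIteration
        line, self._buffer = self._buffer, None
        return line


class ToyReader:
    """Hypothetical reader whose lines() builds a NEW iterator on every call."""

    def __init__(self, stream):
        self._stream = stream

    def lines(self):
        return LookaheadLines(self._stream)


def read_like_question(text):
    """Mirror the question: a fresh lines() iterator for each call."""
    reader = ToyReader(io.StringIO(text))
    out = [reader.lines().next()]
    if reader.lines().has_next():  # this iterator buffers a line, then is thrown away
        out.append(reader.lines().next())
    return out


def read_with_one_iterator(text):
    """Call lines() once and keep using the same iterator."""
    it = ToyReader(io.StringIO(text)).lines()
    out = [it.next()]
    if it.has_next():
        out.append(it.next())
    return out
```

With the question's input, read_like_question returns the first and third lines, while read_with_one_iterator returns the first two. If X10 behaves like this model, calling lines() once and reusing that iterator would avoid the skip, but that is an assumption to check against the X10 library.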

Related

Error in parsing PostScript to PDF

I have a PostScript file. When I try to convert it to PDF, or open it with a PostScript interpreter, I get the following error:
undefined in execform
I am trying to fix this error, but I haven't found a solution. Kindly help me understand the issue.
This is the PostScript file:
OK, so a few observations to start:
The file is 8 pages long, uses many forms, and the first form it uses has nested forms. This really isn't suitable as an example file; you are expecting other programmers to dig through a lot of extraneous cruft to help you out. When you post an example, please try to reduce it to just the minimum required to reproduce the problem.
Have you actually tried to debug this problem yourself? If so, what did you do? (And why didn't you start by reducing the file complexity?)
I don't want to be offensive, but this is the third rather naive posting you've made recently. Do you have much experience of PostScript programming? Has anyone offered you any training in the language? It appears you are working on behalf of a commercial organisation; you should talk to your line manager and try to arrange some training if you haven't already been given some.
The PostScript program does not give the error you stated:
undefined in execform
In fact the error is a Ghostscript-specific error message:
Error: /undefined in --.execform1--
So that's the .execform1 operator (note the leading '.', which indicates a Ghostscript internal operator). That matters for two reasons: firstly, it's important to quote error messages accurately, and secondly, for someone familiar with Ghostscript, it tells you that the error occurs while executing the form's PaintProc, not while executing the execform operator itself.
After considerably reducing the complexity of the file, it turns out the problem has absolutely nothing to do with the use of Forms. The offending Form executes code like this:
2 RM
0.459396 w
[(\0\1\0\2)]435.529999 -791.02002 T
(That's the first occurrence, and it's where the error occurs.)
That executes the procedure named T which is defined as:
/T{neg _LY add /_y ed _LX add /_x ed/_BLSY _y _BLY sub D/_BLX _x D/_BLY _y D _x _y TT}bd
Obviously that's using a number of other functions defined in the prolog, but the important point is that it executes TT, which is defined as:
/TT{/_y ed/_x ed/_SX _x _LX sub D/_SY _y _LY sub D/_LX _x D/_LY _y D _x _y m 0 _rm eq{ dup type/stringtype eq{show}{{ dup type /stringtype eq{show}{ 0 rmoveto}?}forall}?} if
1 _rm eq {gsave 0 _scs eq { _sr setgray}if 1 _scs eq { _sr _sg _sb setrgbcolor}if 2 _scs eq { _sr _sg _sb _sk setcmykcolor} if dup type/stringtype eq{true charpath }{{dup type /stringtype eq{true charpath } { 0 rmoveto}?}forall}? S grestore} if
2 _rm eq {gsave 0 _fcs eq { _fr setgray}if 1 _fcs eq { _fr _fg _fb setrgbcolor}if 2 _fcs eq { _fr _fg _fb _fk setcmykcolor} if dup type/stringtype eq{true charpath }{{dup type /stringtype eq{true charpath } { 0 rmoveto}?}
forall}? gsave fill grestore 0 _scs eq { _sr setgray}if 1 _scs eq { _sr _sg _sb setrgbcolor}if 2 _scs eq { _sr _sg _sb _sk setcmykcolor}if S grestore} if
Under the conditions holding at the time TT is executed (RM sets _rm to 2), we go through this piece of code:
gsave 0 _fcs eq
However, _fcs is initially undefined, and is only defined when the /fcs function is executed. Your program never executes /fcs, so _fcs remains undefined, and looking it up raises the /undefined error.
Is there a reason why you are defining each page in a PostScript Form? This is not optimal: if the interpreter actually supports Forms, then you are using up VM for no useful purpose (since you only execute each Form once).
If it's because the original PDF input uses PDF Form XObjects, I would recommend that you don't try to reproduce those in PostScript. Reuse of Form XObjects in PDF is rather rare (it does happen, but non-reuse is much more common). The loss of efficiency from describing a PostScript Form for every PDF Form XObject, across all the files where the form isn't reused, exceeds the benefit in the rare cases where it would actually be valuable.

Python regex: extract specific blocks of text from a large text file

I'm new to Python and to this site, so thank you in advance for your understanding. This is my first attempt at a Python script.
I'm having what I think is a performance issue with this problem, and it means I don't get any data back.
This code works on a small text file of a couple of pages, but when I try it on my real 35MB data file it just maxes out the CPU and hasn't returned any data (>24 hours now).
Here's a snippet of the real data from the 35MB text file:
D)dddld
d00d90d
dd
ddd
vsddfgsdfgsf
dfsdfdsf
aAAAAAa
221546
29806916295
Meowing
fs:/mod/umbapp/umb/sentbox/221546.pdu
2013:10:4:22:11:31:4
sadfsdfsdf
sdfff
ff
f
29806916295
What's your cat doing?
fs:/mod/umbapp/umb/sentbox/10955.pdu
2013:10:4:22:10:15:4
aaa
aaa
aaaaa
What I'm trying to copy into a new file:
29806916295
Meowing
fs:/mod/umbapp/umb/sentbox/221546.pdu
2013:10:4:22:11:31:4
29806916295
What's your cat doing?
fs:/mod/umbapp/umb/sentbox/10955.pdu
2013:10:4:22:10:15:4
My Python code is:
import re
with open('testdata.txt') as myfile:
    content = myfile.read()
text = re.search(r'\d{11}.*\n.*\n.*(\d{4})\D+(\d{2})\D+(\d{1})\D+(\d{2})\D+(\d{2})\D+\d{2}\D+\d{1}', content, re.DOTALL).group()
with open("result.txt", "w") as myfile2:
    myfile2.write(text)
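Regex isn't the fastest way to search a string, and you compounded the problem by working on one very big string (35MB). Reading an entire file into memory is generally not recommended because you may run into memory issues. Your pattern is also prone to heavy backtracking: with re.DOTALL, each .* can match across line breaks, so the engine may scan far through the 35MB string for every candidate starting point, and re.search returns only the first match anyway.
Judging from your regex pattern, it seems like you want to capture 4-line groups that start with an 11-digit string and end with a timestamp line. Try this code: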
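If you prefer to keep a regex, one option (a sketch, not taken from the answer) is to drop re.DOTALL and use re.MULTILINE, so that '.' cannot run past a line break, and to use finditer to collect every block rather than only the first. The timestamp separators are written as ':' rather than \D+ because \D also matches newlines; the 1-or-2-digit fields are an assumption based on the two sample timestamps. extract_blocks is a hypothetical helper name, and this still needs the whole file in memory.

```python
import re

# One record: an 11-digit line, two arbitrary lines, then a timestamp line.
# Without re.DOTALL, '.' never matches '\n', so each '.*' stays on one line.
BLOCK = re.compile(
    r'^\d{11}\n'               # e.g. 29806916295
    r'.*\n'                    # message text, e.g. Meowing
    r'.*\n'                    # path, e.g. fs:/mod/umbapp/umb/sentbox/221546.pdu
    r'\d{4}(?::\d{1,2}){6}$',  # timestamp, e.g. 2013:10:4:22:11:31:4
    re.MULTILINE,              # ^ and $ match at every line boundary
)


def extract_blocks(content):
    """Return every matching 4-line block in content, in order of appearance."""
    return [m.group() for m in BLOCK.finditer(content)]
```

Usage would be along the lines of reading testdata.txt into a string, calling extract_blocks on it, and writing '\n'.join(blocks) to result.txt.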
import re
start_pattern = re.compile(r'^\d{11}$')
end_pattern = re.compile(r'^\d{4}\D+\d{2}\D+\d{1}\D+\d{2}\D+\d{2}\D+\d{2}\D+\d{1}$')
capturing = 0   # lines captured in the current group (0 = not capturing)
capture = ''
with open('output.txt', 'w') as output_file:
    with open('input.txt', 'r') as input_file:
        for line in input_file:
            if 0 < capturing < 4:
                # inside a group: append the next line
                capturing += 1
                capture += line
            elif start_pattern.match(line):
                # an 11-digit line starts a new group
                capturing = 1
                capture = line
            if capturing == 4:
                # keep the group only if its 4th line is a timestamp, then reset
                if end_pattern.match(line):
                    output_file.write(capture + '\n')
                capturing = 0
It iterates over the input file, line by line. If it finds a line matching the start_pattern, it will read in 3 more. If the 4th line matches the end_pattern, it will write the whole group to the output file.
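The same line-by-line idea can be packaged as a generator that works on any iterable of lines (an open file, or a list for testing). This is a hedged restructuring of the answer's loop that reuses its start_pattern and end_pattern; capture_groups is a hypothetical name. Keeping the group in a list makes it easy to reset after the 4th line, so a new 11-digit line directly after a timestamp is still picked up.

```python
import re

START = re.compile(r'^\d{11}$')
END = re.compile(r'^\d{4}\D+\d{2}\D+\d{1}\D+\d{2}\D+\d{2}\D+\d{2}\D+\d{1}$')


def capture_groups(lines):
    """Yield 4-line groups that start with an 11-digit line and end with a timestamp.

    Memory use stays proportional to one group, not to the whole file.
    """
    group = []
    for line in lines:
        if group:
            group.append(line)
            if len(group) == 4:
                if END.match(group[-1]):
                    yield ''.join(group)
                group = []          # reset whether or not the group matched
        elif START.match(line):
            group = [line]
```

To mirror the answer's output, something like out.writelines(g + '\n' for g in capture_groups(input_file)) inside the two with blocks would write each group followed by a blank line.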

Python: Can I grab the specific lines from a large file faster?

I have two large files. One of them is an info file (about 270MB and 16,000,000 lines) like this:
1101:10003:17729
1101:10003:19979
1101:10003:23319
1101:10003:24972
1101:10003:2539
1101:10003:28242
1101:10003:28804
The other is in standard FASTQ format (about 27GB and 280,000,000 lines) like this:
#ST-E00126:65:H3VJ2CCXX:7:1101:1416:1801 1:N:0:5
NTGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGG
+
AAAFFKKKKKKKKKFKKKKKKKFKKKKAFKKKKKAF7AAFFKFAAFFFKKF7FF<FKK
#ST-E00126:65:H3VJ2CCXX:7:1101:10003:75641:N:0:5
TAAGATAGATAGCCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGG
+
AAAFFKKKKKKKKKFKKKKKKKFKKKKAFKKKKKAF7AAFFKFAAFFFKKF7FF<FKK
The FASTQ file uses four lines per sequence. Line 1 begins with a '#' character and is followed by a sequence identifier. For each sequence, this part of Line 1 is unique:
1101:1416:1801 and 1101:10003:75641
I want to grab Line 1 and the next three lines from the FASTQ file according to the info file. Here is my code:
import gzip
import re
count = 0
with open('info_path') as info, open('grab_path','w') as grab:
    for i in info:
        sample = i.strip()
        with gzip.open('fq_path') as fq:
            for j in fq:
                count += 1
                if count%4 == 1:
                    line = j.strip()
                    m = re.search(sample,j)
                    if m != None:
                        grab.writelines(line+'\n'+fq.next()+fq.next()+fq.next())
                        count = 0
                        break
It works, but because both files have millions of lines, it's inefficient (running for one day only gets 20,000 lines).
UPDATE (July 6th):
I found that the info file can be read into memory (thanks to @tobias_k for reminding me), so I create a dictionary whose keys are the info lines and whose values are all 0. After that, I read the FASTQ file and, for each identifier line, use the identifier part as the key; if the value is 0, I write out that line and the next 3. Here is my code:
import gzip
dic = {}
with open('info_path') as info:
    for i in info:
        sample = i.strip()
        dic[sample] = 0
with gzip.open('fq_path') as fq, open('grap_path',"w") as grab:
    for j in fq:
        if j[:10] == '#ST-E00126':
            line = j.split(':')
            match = line[4] +':'+line[5]+':'+line[6][:-2]
            if dic.get(match) == 0:
                grab.writelines(j+fq.next()+fq.next()+fq.next())
This way is much faster: it takes 20 minutes to get all the matched lines (about 64,000,000 lines). I have also thought about sorting the FASTQ file first with an external sort. Splitting the file into pieces that fit in memory is OK; my trouble is how to keep the three lines following the identifier line together with it while sorting. The answer I found on Google is to linearize each four-line record into a single line first, but that takes 40 minutes.
Anyway thanks for your help.
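The update's in-memory lookup can be sketched in Python 3 as below (the question's code uses Python 2's fq.next()). This is a hedged sketch, not the asker's code. Assumptions: every record is exactly four lines, the header layout matches the two examples above, and a set replaces the dictionary of zeros, since only membership is tested. Instead of checking for the '#ST-E00126' prefix, it walks the file one four-line record at a time. The names load_ids, record_id, grab_records and grab_file are hypothetical.

```python
import gzip


def load_ids(info_path):
    """Read the info file (one identifier per line) into a set for fast lookups."""
    with open(info_path) as info:
        return {line.strip() for line in info if line.strip()}


def record_id(header):
    """Return the 'tile:x:y' part of a header line.

    Assumes the layout shown in the question, e.g.
    '#ST-E00126:65:H3VJ2CCXX:7:1101:1416:1801 1:N:0:5' -> '1101:1416:1801'.
    """
    name = header.split()[0]               # drop the ' 1:N:0:5' comment, if any
    return ':'.join(name.split(':')[4:7])  # fields 5-7 of the colon-separated name


def grab_records(fq_lines, wanted):
    """Yield each 4-line FASTQ record whose identifier is in `wanted`."""
    it = iter(fq_lines)
    # zip over the same iterator four times consumes consecutive 4-line records
    for record in zip(it, it, it, it):
        if record_id(record[0]) in wanted:
            yield ''.join(record)


def grab_file(info_path, fq_path, grab_path):
    """Stream the gzipped FASTQ file once, writing matching records to grab_path."""
    wanted = load_ids(info_path)
    with gzip.open(fq_path, 'rt') as fq, open(grab_path, 'w') as grab:
        grab.writelines(grab_records(fq, wanted))
```

The 'rt' mode opens the gzip stream as text, so lines are str rather than bytes; grab_file('info_path', 'fq_path', 'grab_path') reads the FASTQ file exactly once.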
You can sort both files by the identifier (the 1101:1416:1801 part). Even if the files do not fit into memory, you can use external sorting.
After this, you can apply a simple merge-like strategy: read both files together and do the matching in the meantime. Something like this (pseudocode):
entry1 = readFromFile1()
entry2 = readFromFile2()
while (neither file has ended)
    if (entry1.id == entry2.id)
        record match
        entry1 = readFromFile1()
        entry2 = readFromFile2()
    else if (entry1.id < entry2.id)
        entry1 = readFromFile1()
    else
        entry2 = readFromFile2()
This way entry1.id and entry2.id are always close to each other and you will not miss any matches. At the same time, this approach requires iterating over each file once.
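A Python rendering of this merge step, as a sketch under assumptions: both inputs are iterables already sorted with the same comparison (ordinary string order here, so the external sort must use that order too), identifiers are unique within each input, and each record is passed as one unit (for FASTQ, for example, a linearized four-line record). merge_match and its key parameter are hypothetical names.

```python
def merge_match(sorted_ids, sorted_records, key):
    """Yield the records whose key(record) appears in sorted_ids.

    Both inputs must already be sorted in the same (here: plain string) order,
    and identifiers are assumed unique within each input.
    """
    ids = iter(sorted_ids)
    records = iter(sorted_records)
    current_id = next(ids, None)
    record = next(records, None)
    while current_id is not None and record is not None:
        record_key = key(record)
        if current_id == record_key:
            yield record                    # match: advance both sides
            current_id = next(ids, None)
            record = next(records, None)
        elif current_id < record_key:
            current_id = next(ids, None)    # id side is behind
        else:
            record = next(records, None)    # record side is behind
```

For example, list(merge_match(['1101:1', '1101:3'], [('1101:1', 'a'), ('1101:2', 'b'), ('1101:3', 'c')], key=lambda r: r[0])) gives [('1101:1', 'a'), ('1101:3', 'c')]. Each input is read once, front to back.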

Getting fortran runtime error: end of file

I have recently learned how to work with basic files in Fortran, and I assumed it was as simple as:
open(unit=10,file="data.dat")
read(10,*) some_variable, somevar2
close(10)
So I can't understand why this function I wrote is not working.
It compiles fine, but when I run it, it prints:
fortran runtime error:end of file
Code:
Function Load_Names()
character(len=30) :: Staff_Name(65)
integer :: i = 1
open(unit=10, file="Staff_Names.txt")
do while(i < 65)
read(10,*) Staff_Name(i)
print*, Staff_Name(i)
i = i + 1
end do
close(10)
end Function Load_Names
I am using Fortran 2008 with gfortran.
A common reason for the error you report is that the program doesn't find the file it is trying to open. Sometimes your assumptions about the directory in which the program looks for files at run time will be wrong. Note that with the default status='unknown', gfortran creates a new, empty file if none exists, so the first read then hits end of file.
Try:
using status='old' together with the err= (or iostat=) option in the open statement, to write code that deals gracefully with a missing file; without this the program crashes, as you have observed;
or
using the inquire statement to figure out whether the file exists where your program is looking for it.
You can detect the end of the file with the IOSTAT= specifier of the read statement: a negative value means end of file, and a positive value means an error.
Try:
Function Load_Names()
  character(len=30) :: Staff_Name(65)
  integer :: i
  integer :: ios   ! I/O status: 0 = ok, <0 = end of file, >0 = error
  i = 1            ! assign here: an initializer in the declaration would imply SAVE
  open(unit=10, file="Staff_Names.txt")
  do while(i <= 65)
    read(10,*, IOSTAT=ios) Staff_Name(i)
    if( ios < 0 )then
      write(6,'(A)') 'Warning: File contains fewer than 65 entries'
      exit
    else if( ios > 0 )then
      write(6,'(A)') 'Error: error reading file'
      stop
    end if
    print*, Staff_Name(i)
    i = i + 1
  end do
  close(10)
end Function Load_Names
Using the Fortran 2003 iso_fortran_env module (and the Fortran 2008 newunit= specifier), one can do the following to check whether the end of file has been reached:
use :: iso_fortran_env
character(len=1024) :: line
integer :: u1,stat
open (newunit=u1,action='read',file='input.dat',status='old')
ef: do
  read(u1,'(A)',iostat=stat) line
  if (stat == iostat_end) exit ef ! end of file
  ...
end do ef
close(u1)
Thanks for all your help, I fixed the code:
Function Load_Names(Staff_Name)!Loads Staff Names
character(len=30) :: Staff_Name(65)
integer :: i
i = 1
open(unit=10, file="Staff_Names.txt", status='old', action='read')!opens file for reading
do while(i < 66)!Reads one name per line into Staff_Name(i)
read(10,*,end=100) Staff_Name(i)
i = i + 1
end do
100 close(10)!closes file
return
end Function Load_Names
I needed to change read(10,*) to read(10,*,END=100) so it knew what to do when it reached the end of the file, since it was in a loop, I assume.
Then your problem was that your file was a row vector, and it was likely giving you this error immediately after reading the first element, as @M.S.B. was suggesting.
If you have a file with a NxM matrix and you read it in this way (F77):
DO i=1,N
DO j=1,M
READ(UNIT,*) Matrix(i,j)
ENDDO
ENDDO
it will load the first column of your file into the first row of your matrix, and will give you an error as soon as it reaches the end of the file's first column: each READ statement starts a new line, so the loop forces it to read further lines when there are none left (for example, at j=N+1 if N<M). To read the different columns you should use an implied DO loop, which is why your solution worked:
DO i=1,N
READ(UNIT,*) (Matrix(i,j), j=1,M)
ENDDO
I am using GNU Fortran 5.4.0 on Ubuntu 16.04. Check that the file is the one you think it is: files with the same name are easy to confuse, and one of them may be empty. You should also check that the file is in the program's working directory.

Ruby data extraction from a text file

I have a relatively big text file with blocks of data layered like this:
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
(they contain more lines and then are repeated)
I would like first to extract the numerical value after TUNE X = and output these in a text file. Then I would like to extract the numerical value of LINE FREQUENCY and AMPLITUDE as a pair of values and output to a file.
My question is the following: although I could make something more or less work using a simple regexp, I'm not convinced that it's the right way to do it, and I would like some advice or code examples showing how to do it efficiently with Ruby.
Generally, (not tested)
toggle = false   # note: 0 is truthy in Ruby, so use false/true for the flag
File.foreach("file") do |line|
  if line[/TUNE/]
    puts line.split("=",2)[-1].strip
  end
  if line[/Line Frequency/]
    toggle = true
    next
  end
  if toggle
    a = line.split
    puts "#{a[1]} #{a[2]}"
  end
end
Go through the file line by line, check for /TUNE/, then split on "=" to get the last item.
Do the same for lines containing /Line Frequency/ and set the toggle flag. This signifies that the following lines contain the data you want. Since the frequency and amplitude are in fields 2 and 3, split each line and take those positions. That is the general idea. As for toggling, you might want to reset the toggle flag at the next block using a pattern (e.g. SIGNAL, CASE or ANALYSIS).
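@tune_x = []
@frequency = []
@amplitude = []
File.foreach("data.dat") do |line|
  tune_x_scan = line.scan(/TUNE X = (\d*\.\d*)/)
  data_scan = line.scan(/(\d*\.\d*E[-+]\d*)/)
  # scan returns [] (not nil) when nothing matches, and one array per match
  @tune_x << tune_x_scan[0][0] unless tune_x_scan.empty?
  unless data_scan.empty?
    @frequency << data_scan[0][0]   # first E-notation field: Line Frequency
    @amplitude << data_scan[1][0]   # second E-notation field: Amplitude
  end
end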
There are lots of ways to do it. This is a simple first pass at it:
text = 'ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 1.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 1.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 1.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 2.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 2.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 2.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
'
require 'stringio'
pretend_file = StringIO.new(text, 'r')
That gives us a StringIO object we can pretend is a file. We can read from it by lines.
I changed the numbers a bit just to make it easier to see that they are being captured in the output.
pretend_file.each_line do |li|
  case
  when li =~ /^TUNE.+?=\s+(.+)/
    print $1.strip, "\n"
  when li =~ /^\d+\s+(\S+)\s+(\S+)/
    print $1, ' ', $2, "\n"
  end
end
For real use, you'd want to change the print statements to write to a file handle, e.g. fileh.print.
The output looks like:
# >> 0.2561890123390808
# >> 0.2561890123391E+00 0.204316425208E-01
# >> 0.2562865535359E+00 0.288712798671E-01
# >> 1.2561890123390808
# >> 1.2561890123391E+00 0.204316425208E-01
# >> 1.2562865535359E+00 0.288712798671E-01
# >> 2.2561890123390808
# >> 2.2561890123391E+00 0.204316425208E-01
# >> 2.2562865535359E+00 0.288712798671E-01
You can read your file line by line and cut each line by character position, for example:
to extract TUNE X, take characters 10 to 27 on line 2;
to extract LINE FREQUENCY, take characters 3 to 22 on line 6+n.
