Difference between iterating lines with while and array assignment - macOS

I'm writing a Perl script that reads a file into an array. I wrote the program on Windows using Perl 5.16 (it also works on 5.14), and the script failed on a Mac with Perl 5.12.
The part that failed is this: my @array = <$file>. On the Mac, the array came back the correct size (the same as the number of lines in the file), but every element except the last one was empty. The code worked correctly when I switched to this instead:
my @array;
while (<$file>) {
    push @array, $_;
}
I'm not sure if it would have made a difference had I switched the line endings to LF instead of CRLF (Windows style). Though the problem is fixed, it leaves me puzzled. I thought those two code snippets I listed were exactly the same thing. What difference between them produces different results here?

The answer is that the two methods are exactly equivalent, as you suspected. Example:
my $start = tell DATA;  # store beginning filehandle position
my @array1 = <DATA>;
seek DATA, $start, 0;   # reset filehandle position
my @array2;
while (<DATA>) {
    push @array2, $_;
}
print "List assignment:\n @array1\n";
print "Looping through:\n @array2\n";
__DATA__
1
2
foo
bar
Your previous failure was likely something else. Perhaps some sort of problem with Perl on Mac or Mac's file IO was involved, but more likely it was some other part of your code (by this I mean nothing personal: I would make the same assumption about my own code).
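If CRLF line endings are the suspect, one way to rule them out is to strip either kind of ending explicitly after the list assignment. A minimal sketch, assuming $file is the already-opened filehandle from the question:

my @array = <$file>;
s/\r?\n\z// for @array;  # removes LF or CRLF, regardless of which OS wrote the file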

Related

Deleting contents of a file after a specific line in Ruby

Probably a simple question, but I need to delete the contents of a file after a specific line number. So I want to keep, e.g., the first 5 lines and delete the rest of the file's contents. I have been searching for a while and can't find a way to do this; I am an iOS developer, so Ruby is not a language I am very familiar with.
That is called truncate. The truncate method needs the byte position after which everything gets cut off, and the pos method delivers just that:
File.open("test.csv", "r+") do |f|
f.each_line.take(5)
f.truncate( f.pos )
end
The "r+" mode from File.open is read and write, without truncating existing files to zero size, like "w+" would.
The block form of File.open ensures that the file is closed when the block ends.
I'm not aware of any methods to delete from a file so my first thought was to read the file and then write back to it. Something like this:
path = '/path/to/thefile'
start_line = 0
end_line = 4
File.write(path, File.readlines(path)[start_line..end_line].join)
File.readlines reads the file and returns an array of strings, one element per line. You can then use the subscript operator with a range to pick the lines you want.
This isn't going to be very memory efficient for large files, so you may want to optimise if that's something you'll be doing.
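If memory is a concern, a sketch of a streaming variant (using the same illustrative path and range as above) writes the kept lines to a temp file and copies it back, so only one line is held in memory at a time:

require 'tempfile'
require 'fileutils'

path       = '/path/to/thefile'
start_line = 0
end_line   = 4

Tempfile.create('kept_lines') do |tmp|
  File.foreach(path).with_index do |line, i|
    break if i > end_line              # stop reading once past the range
    tmp.write(line) if i >= start_line
  end
  tmp.flush
  FileUtils.cp(tmp.path, path)         # overwrite the original with the kept lines
end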

Ruby - Files - gets method

I am following the Wicked Cool Ruby Scripts book.
Here, there are two files, file_output = file_list.txt and oldfile_output = file_list.old. These two files contain the list of all files the program has gone through and is going to go through.
Now, the file is renamed as the old file if a 'file_list.txt' file exists.
After that, I am not able to understand the code.
Apparently every line of the file is read and stored in the oldfile hash.
Can someone explain it from the fourth line?
Also, why is gets used here? Why can't an .each method be used to read through every line?
if File.exists?(file_output)
  File.rename(file_output, oldfile_output)
  File.open(oldfile_output, 'rb') do |infile|
    while (temp = infile.gets)
      line = /(.+)\s{5,5}(\w{32,32})/.match(temp)
      puts "#{line[1]} ---> #{line[2]}"
      oldfile_hash[line[1]] = line[2]
    end
  end
end
Judging from the redundant use of quantifiers ({5,5} and {32,32}) in the regex (which would be better written as {5} and {32}), it looks like the person who wrote that code is not a professional Ruby programmer. So you can assume that the choices taken in the code are not necessarily the best.
As you pointed out, the code could have used each instead of while with gets. The latter approach is sort of an old-school Ruby way of doing it. There is nothing wrong with using it. Until the end of file is reached, gets will return a string, and when it does reach the end of file, gets will return nil, so the while loop works the same as when you use each: in each iteration, it reads the next line.
It looks like each line is supposed to represent a key-value pair. The regex assumes that the key is not an empty string, that the key and the value are separated by exactly five whitespace characters, and that the value consists of exactly thirty-two word characters. Each key-value pair is printed (perhaps to monitor progress) and is stored in oldfile_hash, which is most likely a hash.
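For comparison, a minimal sketch of the same loop written with each_line instead of while/gets (using the simplified {5} and {32} quantifiers; oldfile_output and oldfile_hash are the names from the question):

File.open(oldfile_output, 'rb') do |infile|
  infile.each_line do |temp|
    line = /(.+)\s{5}(\w{32})/.match(temp)
    puts "#{line[1]} ---> #{line[2]}"
    oldfile_hash[line[1]] = line[2]
  end
end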
So the point of using gets is to tell when the file is finished being read. Essentially, it's tied to the
while (condition)
....
end
block. So gets serves as a little method that will keep giving Ruby the next line of the file until there are no more lines to give.

storage in awk when used within shell script

I am writing a shell script in which I internally call an awk script. Here is my script:
for FILE in `eval echo{0..$fileIterator}`
{
if(FILE == $fileIterator)
{
printindicator =1;
}
grep RECORD FILEARRAY[FILE]| awk 'for(i=1;i<=NF;i++) {if($i ~ XXXX) {XARRAY[$i]++}} END {if(printIndicator==1){for(element in XARRAY){print element >> FILE B}}'
I hope my code is clear. Please let me know if you need any other details.
ISSUE
My motivation in this program is to traverse all the files, get the lines that contain "XXXX" in each of them, and store those lines in an array. That is what I am doing here. Finally I need to store the contents of the array variable into a file. I could store the contents at each and every step, like below:
{if($i ~ XXXX) {XARRAY[$i]++; print XARRAY[$i] >> FILE B}}
But the reason for not taking that approach is that it requires an I/O operation at every step, which takes too much time; that is why I accumulate everything in memory and, at the end, dump the in-memory array (XARRAY) into the file.
The problem I am facing is this: the shell script calls awk on every iteration, and the data is stored in the array (XARRAY), but on the next iteration the previous contents of XARRAY are deleted and it is treated as a new array. Hence, at the end, it prints only the most recently updated XARRAY and not all the data expected.
SUGGESTIONS EXPECTED
1) How can I make the awk script treat XARRAY as the same old array, not a new one, each time it is called in an iteration?
2) One alternative is to do I/O every time, but I am not interested in that. Is there any alternative other than this? Thank you.
This post involves combining shell script and awk script to solve a problem. This is very often a useful approach, as it can leverage the strengths of each, and potentially keep the code from getting ugly in either!
You can indeed "preserve state" with awk, with a simple trick: leveraging a coprocess from the shell script (bash, ksh, etc. support coprocess).
Such a shell script launches one instance of awk as a coprocess. This awk instance runs your awk code, which continuously processes its lines of input, and accumulates stateful information as desired.
The shell script continues on, gathering up data as needed, and passes data to the awk coprocess whenever ready. This can run in a loop, potentially blocking or sleeping, potentially acting as a long-running background daemon. Highly versatile!
In your awk script, you need a strategy for triggering the output of the stateful data it has been accumulating. The simplest is to have an END{} action, which triggers when awk's stdin closes. If you need output data sooner than that, the awk code has a chance to output its data at each line of input.
I have successfully used this approach many times.
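For illustration, a minimal sketch of the coprocess approach in bash (the file names, fileB, and the XXXX pattern are placeholders echoing the question):

#!/usr/bin/env bash
# One long-lived awk process accumulates XARRAY across everything we feed it.
coproc AWK {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /XXXX/) XARRAY[$i]++ }
         END { for (element in XARRAY) print element }'
}
awk_in=${AWK[1]}    # fd for writing to awk's stdin
awk_out=${AWK[0]}   # fd for reading awk's stdout

for file in file0 file1 file2; do       # stand-ins for the FILEARRAY entries
    grep RECORD "$file" >&"$awk_in"     # stream matching lines to the coprocess
done

exec {awk_in}>&-                        # closing awk's stdin triggers its END block
cat <&"$awk_out" > fileB                # collect the accumulated results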
Ouch, can't tell if it is meant to be real or pseudocode!
You can't make awk preserve state between invocations. You would either have to save it to a temporary file or store it in a shell variable, the contents of which you'd pass to later invocations. But this is all too much hassle for what I understand you want to achieve.
I suggest you omit the loop, which will allow you to call awk only once, with just some reordering. I assume FILE A is the FILE in the loop and FILE B is something external. The reordering would end up very roughly like this:
grep RECORD ${FILEARRAY[@]:0:$fileIterator} | awk '{for(i=1;i<=NF;i++) {if($i ~ XXXX) {XARRAY[$i]++}}} END {for(element in XARRAY){print element >> FILEB}}'
I moved the filename expansion to the grep call and removed the whole printIndicator check.
It could all be done even more efficiently (the obvious one being removal of grep), but you provided too little detail to make early optimisation sensible.
EDIT: fixed the loop iteration with the info from the update. Here's a loopy solution, which is immune to whitespace issues and overly long command lines:
for FILE in $(seq 0 $fileIterator); do
    grep RECORD "${FILEARRAY[$FILE]}"
done |
awk '{for(i=1;i<=NF;i++) {if($i ~ XXXX) {XARRAY[$i]++}}} END {for(element in XARRAY){print element >> FILEB}}'
It still runs awk only once, constantly feeding it data from the loop.
If you want to load the results into an array UGUGU, do the following as well (requires bash 4):
mapfile UGUGU < FILEB
results=$(for loop | awk '{for(element in XARRAY) print element}')..
I declared results as an array, so for every "element" that is printed it should be stored in results[1], results[2], and so on.
But instead it is behaving as below.
Let's assume
element = "I am fine" (first iteration of the for loop),
element = "How are you" (second iteration of the for loop).
My expected result accordingly is
results[1] = "I am fine" and results[2] = "How are you",
but the output I am getting is results[1] = "I" and results[2] = "am". I don't know why it is delimiting by space. Any suggestions regarding this?
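The splitting happens because a $( ) capture assigned to an array is word-split on IFS (spaces as well as newlines). A hedged fix, reusing the loop from the answer above: read the output with mapfile -t, so each printed line lands in exactly one array element (bash 4+):

mapfile -t results < <(
    for FILE in $(seq 0 $fileIterator); do
        grep RECORD "${FILEARRAY[$FILE]}"
    done |
    awk '{for (i = 1; i <= NF; i++) if ($i ~ XXXX) XARRAY[$i]++}
         END {for (element in XARRAY) print element}'
)

Note that bash arrays are zero-indexed, so the first line is results[0], not results[1].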

Fastest way to skip lines while parsing files in Ruby?

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way would be to parse certain parts of a file in Ruby. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000-line file. (Obviously this kind of question is geared toward much larger files; I'm just using those smaller numbers for the sake of example.) Since I know it won't be in the first half, is there a quick way of disregarding that information?
Currently I'm using something along the lines of:
while buffer = file_in.gets and file_in.lineno < 600
  next unless file_in.lineno > 500
  if buffer.chomp.include? some_string
    do_func_whatever
  end
end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
file.lines.drop(500).take(100) # will get you lines 501-600
Generally, you can't avoid reading the file from the start up to the line you are interested in, as each line can have a different length. The one thing you can avoid, though, is loading the whole file into a big array. Just read line by line, counting, and discard lines until you reach what you are looking for. Pretty much like your own example. You can just make it more Rubyish.
PS. the Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
file.lines.select.with_index{|l,i| (501..600) === i}
PS2: Doh, the above code, while not making a huge array, iterates through the whole file, even the lines past 600. :( Here's a third version:
enum = file.lines
500.times{enum.next} # skip 500
enum.take(100) # take the next 100
or, if you prefer FP:
file.lines.tap{|enum| 500.times{enum.next}}.take(100)
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.
See IO#seek, or see IO#open for information on the offset argument.
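A small sketch of the byte-offset idea (the 1024 offset is arbitrary, and file_in is assumed here to be a filename; this only helps if you can compute the byte position some other way, since lines vary in length):

File.open(file_in) do |f|
  f.seek(1024, IO::SEEK_SET)  # jump 1024 bytes from the start of the file
  f.gets                      # discard the partial line we probably landed in
  f.each_line do |line|
    # process lines from here on
  end
end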
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO.readlines, which returns an array with all the lines:
IO.readlines(file_in)[500..600].each do |line|
  # line is each line in the file (including the trailing \n)
  # stuff
end
or
f = File.new(file_in)
f.readlines[500..600].each do |line|
  # line is each line in the file (including the trailing \n)
  # stuff
end

Reformatting text (or, better, LaTeX) in 80 columns in SciTE

I recently dived into LaTeX, starting with the help of a WYSIWYM editor like LyX. Now I'm starting to write .tex files in SciTE. It already has syntax highlighting, and I adapted the tex.properties file to work on Windows, showing a preview on Go [F5].
One pretty thing LyX does, and it's hard to achieve with a common text editor, is formatting text in 80 columns: I can write a paragraph and hit Return each time I get near the edge column, but if, after the first draft, I want to add or cut some words here and there, I end up breaking the layout and having to rearrange newlines.
It would be useful to have a tool in SciTE so I can select a paragraph of text in which I added or deleted some words and have it rearranged in 80 columns. Probably not something working on the whole document, since it could break some intended line breaks.
I could probably write a Python plugin for Geany easily (I saw Vim has something similar), but I'd like to know if it's possible in SciTE too.
I was a bit disappointed when I found no answer while searching for the same thing. No help from Google either, so I searched for Lua examples and syntax in the hope of crafting it myself. I don't know Lua, so this can perhaps be done differently or more efficiently, but it's better than nothing, I hope. Here is a Lua function which needs to be put in the SciTE start-up Lua script:
function wrap_text()
    local border = 80
    local t = {}
    local pos = editor.SelectionStart
    local sel = editor:GetSelText()
    if #sel == 0 then return end
    local para = {}
    -- collect each line of the selection into para
    local function helper(line) table.insert(para, line) return "" end
    helper((sel:gsub("(.-)\r?\n", helper)))
    for k, v in pairs(para) do
        local line = ""
        for token in string.gmatch(v, "[^%s]+") do
            if string.len(token .. line) >= border then
                t[#t + 1] = line
                line = token .. " "
            else
                line = line .. token .. " "
            end
        end
        t[#t + 1] = line:gsub("%s$", "") -- drop the trailing space
    end
    editor:ReplaceSel(table.concat(t, "\n"))
    editor:GotoPos(pos)
end
Usage is like any other function from the start-up script, but for completeness I'll paste my tool definition from the SciTE properties file:
command.name.8.*=Wrap Text
command.mode.8.*=subsystem:lua,savebefore:no,groupundo
command.8.*=wrap_text
command.replace.selection.8.*=2
It does respect paragraphs, so it can be used on broader selection, not just one paragraph.
This is one way to do it in SciTE: first, add this to your .SciTEUser.properties (Options / Open User Options File):
# Column guide, indicates long lines (https://wiki.archlinux.org/index.php/SciTE)
# this is what they call "margin line" in gedit (at right),
# in scite, "margin" is the area on left for line numbers
edge.mode=1
edge.column=80
... and save, so you can see a line at 80 characters.
Then scale the SciTE window, so the text you see is wrapped at the line.
Finally, select the long-line text which is to be broken into lines, and do Edit / Paragraph / Split (for me the shortcut Ctrl-K also works for that).
Unfortunately, there seems to be no "break-lines-as-you-type" facility in SciTE, like the "Line Breaking" facility in Geany. Update: not anymore; now there's a plugin - see the next answer.
Well, I was rather disappointed that there seemed to be no "break-lines-as-you-type" facility in SciTE, and I finally managed to code a small Lua plugin/add-on/extension for that, released here:
lua-users wiki: Scite Line Break
Installation and usage instructions are in the script itself.
Note that it's pretty much the same functionality as in geany - it inserts linebreaks upon typing text - but not on pressing backspace, nor upon copy/pasting.
The same, but easier, I think...
Put this in the user properties:
command.name.0.*=swrap
command.0.*=fold -s $(FileNameExt) > /tmp/scite_temp ; cat /tmp/scite_temp >$(FileNameExt)
command.is.filter.0.*=1
Ciao
Pietro
