I'm having trouble getting the same result in an output txt file that I get when I apply the sub method to a string. When I apply the following code to a single string, I get the \n before the capital letter in the middle of the string:
line3= "We were winning The Home Secretary played a
important role."
line3.sub(/(?<!^) *+(?=[A-Z])/, "\n")
=> "We were winning\nThe Home Secretary played a\n important role."
But if I apply the following code the txt file that I get doesn't have any \n before the capital letter.
old= File.readlines("Modificado word.txt")
second= old.join
third= second.sub(/(?<!^) *+(?=[A-Z])/, "\n")
new= IO.write("new.txt", third)
I've tried multiple ways of encoding (surely in the wrong way) because I thought the issue might be there, but none of them worked. I even tried gsub, but that didn't work either.
OK, I've got the solution. I don't know why, but the txt file was in an encoding that readlines could not even read, so I copied all the content into another txt file created from scratch and it worked :)
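If the file really was in a non-UTF-8 encoding, converting it explicitly while reading is an alternative to recreating it by hand. The post does not say which encoding the file had, so this is only a minimal sketch that assumes UTF-16LE with an optional byte order mark (a common result of saving as "Unicode" on Windows). read_as_utf8 is a hypothetical helper name.

```ruby
# Hypothetical helper: reads a file stored in +source_encoding+ and returns
# its content as a UTF-8 string. UTF-16LE is an assumption, not something
# stated in the original post.
def read_as_utf8(path, source_encoding = "UTF-16LE")
  raw  = File.binread(path)                                  # raw bytes, no decoding
  text = raw.force_encoding(source_encoding).encode("UTF-8") # transcode to UTF-8
  text.delete_prefix("\uFEFF")                               # drop the byte order mark, if any
end

# Possible use with the code from the question:
# text = read_as_utf8("Modificado word.txt")
# IO.write("new.txt", text.gsub(/(?<!^) *+(?=[A-Z])/, "\n"))
```

Once the text is valid UTF-8, the same regex should behave as it does on the string literal.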
I have a file. I want to go over it, and any time a line contains only "LINE_MATCH_START" (apart from preceding whitespace or a trailing comment), I want to print everything that follows until a line matches "LINE_MATCH_END" (which also has to be the only text on its line; comments are allowed, but it has to be the first thing on the line apart from whitespace). I want to go over the entire file and save every block that is found this way.
Example,
printf ("this is some text")
// comment LINE_MATCH_START this should be ignored
some_other_code
LINE_MATCH_START // can have spaces before it and comments after it
oh
this
should be
saved
LINE_MATCH_END
some_other_piece_of_code
LINE_MATCH_START
AGAIN
lets save this part as well
LINE_MATCH_END
From the above snippet, there can be space before the "LINE_MATCH_START" and it can have comments on the same line but no other piece of code.
I want my code to save all this part
oh
this
should be
saved
AGAIN
lets save this part as well
How can I do this in ruby?
This looks like it gets your output and maybe helps with an idea, but I would work on something more robust.
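visible = false
# The block form closes output.txt automatically.
File.open('output.txt', 'w') do |f|
  IO.foreach('file_name') do |line|
    case line
    # The marker must be the first thing on the line; only a // comment
    # (assumed comment syntax, as in the example) may follow it.
    when /^\s*LINE_MATCH_START\s*(\/\/.*)?$/
      visible = true
      next
    when /^\s*LINE_MATCH_END\s*(\/\/.*)?$/
      visible = false
    end
    f.write(line) if visible
  end
end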
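Since the question asks to save every block, one option is to collect the blocks into an array instead of writing lines straight through. This is a minimal sketch, assuming comments use // as in the example. extract_blocks and the two constants are hypothetical names.

```ruby
# A marker counts only when it is the first thing on the line and is followed
# by nothing except an optional // comment (assumed comment syntax).
START_RE = %r{\A\s*LINE_MATCH_START\s*(//.*)?\z}
END_RE   = %r{\A\s*LINE_MATCH_END\s*(//.*)?\z}

# Hypothetical helper: returns an array of blocks, each block being the array
# of lines found between a start marker and the next end marker.
def extract_blocks(lines)
  blocks  = []
  current = nil                      # nil means "not inside a block"
  lines.each do |raw|
    line = raw.chomp
    if current.nil?
      current = [] if line.match?(START_RE)
    elsif line.match?(END_RE)
      blocks << current
      current = nil
    else
      current << line
    end
  end
  blocks                             # a block with no end marker is dropped
end

# blocks = extract_blocks(File.foreach('file_name'))
```

Each block can then be written to its own file or joined into one, depending on what "saving" should mean.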
I'm writing tests that will be confirming a lot of text on a page. It's for Terms and Conditions pages, Cookies, Privacy Policy, etc. It's not what I'd like to do, but it's a requirement I can't avoid. I've heard that Cucumber can open a text file like .txt or .doc and compare it with the text on screen.
I've tried to find any reference to this but have come up short. Is anyone able to point me in the right direction please?
Thanks
Cucumber/aruba has a Given step that goes like this:
Given a file named "foo" with:
"""
hello world
"""
You would then be able to check that your web page has that content with:
Then I should see "hello world"
And your step definition:
Then /^I should see "([^"]*)"$/ do |text|
  page.should have_content(text)
end
I continued to play and found this works for a .txt file:
File.foreach(doc) do |line|
  line = line.strip
  within_window(->{ page.title == 'Cookies' }) do
    page_content = find('#main').text
    page_content.gsub!('‘',"'")
    page_content.gsub!('’',"'")
    expect(page_content).to have_text line
  end
end
doc is my filename variable. I used strip to remove the newline characters created when pasting into the txt file. The cookies open up in a new tab so I navigated to that. I had invalid UTF-8 characters so just used gsub on those.
I'm sure there's a much better way to do this but this works for now
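One way to tidy this up is to separate the text comparison from the browser code, so the failure message can list every line that is missing at once. This is a minimal sketch in plain Ruby; normalize_quotes and missing_lines are hypothetical helper names, and the quote characters handled are an assumption based on the gsub calls above.

```ruby
# Hypothetical helper: replaces typographic quotes with straight ones so that
# text pasted from a word processor compares equal to the page text.
def normalize_quotes(text)
  text.tr("\u2018\u2019", "'").tr("\u201C\u201D", '"')
end

# Hypothetical helper: returns the non-blank lines of +expected_text+ that do
# not appear in +page_text+ after quote normalization.
def missing_lines(expected_text, page_text)
  page = normalize_quotes(page_text)
  expected_text.each_line
               .map { |line| normalize_quotes(line.strip) }
               .reject(&:empty?)
               .reject { |line| page.include?(line) }
end
```

Inside the step, the page text would still come from find('#main').text, and the expectation could then check that missing_lines(File.read(doc), page_content) is empty.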
I'd recommend using approval tests. A person approves the original text, then the automated test verifies that it has not changed. If the license text changes, the test fails. If the change is intended, the tester approves the new text, and the check is automated again from then on.
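The approval workflow can be sketched by hand in a few lines. This is not the API of any approval-testing library; matches_approved?, approve! and the .approved.txt / .received.txt naming are assumptions made for illustration.

```ruby
require "fileutils"

# Hypothetical naming convention: terms.approved.txt -> terms.received.txt
def received_path_for(approved_path)
  approved_path.sub(/\.approved\.txt\z/, "") + ".received.txt"
end

# Hypothetical helper: true when +received+ equals the approved text on disk.
# Otherwise the received text is written next to the approved file so that a
# person can review it, and false is returned.
def matches_approved?(received, approved_path)
  received_path = received_path_for(approved_path)
  if File.exist?(approved_path) && File.read(approved_path) == received
    FileUtils.rm_f(received_path)    # nothing left to review
    true
  else
    File.write(received_path, received)
    false
  end
end

# Hypothetical helper: run by a person after reviewing the received text.
def approve!(approved_path)
  FileUtils.mv(received_path_for(approved_path), approved_path)
end
```

The test would call matches_approved? with the text read from the page, and approve! would be run manually whenever a change to the legal text is intended.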
I'm trying to create a regular (Ruby) expression which checks for multiple conditions. I use this regex to replace the content of my object. My regex is close to finished, except for two problems I'm facing with regard to Markdown.
First off, headers are giving me trouble. For example, I don't want to replace the word "Hi" with "Hello" if "Hi" is in a header.
Hi John <== # should not change
==================
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Or:
#### Hi Peter <== # should not change
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Question: How can I escape markdown headers within my regex? I've tried negative lookbehind and lookahead assertions, but to no avail.
My second problem should be quite easy, but somehow I'm struggling. If a word is italic, like _hi_, I want to find and replace it without changing the underscores. I can find the word with this regex:
\b[_]*hi[_]*\b
Question 2: But if I would replace it, I would also change the underscores. Is there a way to only detect the word itself and replace it, while still using word boundaries?
Code Example
@website.autolinks.all.each do |autolink|
  autolink.name # for example returns "Iphone5"
  autolink.url  # for example returns "http://www.apple.com"
  regex = /\b(?<!##\s)(?<![\d.\[])([_]*)#{autolink.name}([_]*)(?![\d'"<\/a>])\b/
  if @permalink.blog_entry.content.match(regex)
    @permalink.blog_entry.content.gsub!(regex, "[#{autolink.name}](#{autolink.url})")
  end
end
Example text
Iphone5
==============
Iphone5 is the best mobile phone there is, even though the people at Samsung probably think, or perhaps only hope that their Samsung Galaxy S3 is better.
#### Samsung Galaxy S3?
Yes, that's the name of the newest Samsung phone.
This would be fine for text that already contains HTML tags, but when my regex runs, the content is still in Markdown syntax (the regex is applied before the Markdown converter).
Regexes work best when they do one clear thing. If you have multiple conditions, your code should usually reflect that by dividing the processing into steps.
In this case, you have two clear steps:
Use a simple regex or other logic to skip over the header portion of the message.
Once you know you are in the content, use another regex to process the content.
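The two steps above can be sketched as follows. This is a minimal illustration rather than a full Markdown parser: it assumes headers are either lines starting with # or lines underlined with = or - on the next line, and replace_outside_headers is a hypothetical helper name.

```ruby
UNDERLINE_RE = /\A\s{0,3}(=+|-+)\s*\z/   # "=====" or "-----" under a header
ATX_RE       = /\A\s{0,3}#/              # "#### Header"

# Hypothetical helper: applies gsub(pattern, replacement) to every line that
# is not part of a header and returns the resulting text.
def replace_outside_headers(markdown, pattern, replacement)
  lines = markdown.lines
  lines.each_with_index.map do |line, i|
    next_line  = lines[i + 1]
    underlined = !line.strip.empty? && !next_line.nil? && next_line.match?(UNDERLINE_RE)
    # Step 1: leave header lines (and their underlines) untouched.
    if line.match?(ATX_RE) || line.match?(UNDERLINE_RE) || underlined
      line
    else
      # Step 2: only content lines reach the replacement regex.
      line.gsub(pattern, replacement)
    end
  end.join
end
```

With this split, the replacement regex no longer needs lookarounds for headers and only has to describe the word being replaced.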
I've found a solution:
regex = /(?<!##\s)(?<![\d.\[a-z])#{autolink.name}(?![\d'"a-z<\/a>])(?!.*\n(==|--))/i
if @permalink.blog_entry.content.match(regex)
  @permalink.blog_entry.content.gsub!(regex, "[\\0](#{autolink.url})")
end
I have a file that is comma delimited
Text File:
"some_key_1", "Translation 1"
"some_key_2", "Translation 2"
"some_key_3", "Translation 3"
"some_key_4", "Translation 4"
"some_key_5", "I am a very long line of text that has decided to cause an issue for the programmer, I have thus far laughed at his futile attempts to fix me."
Private Sub ImportFile()
Dim strEmpFileName As String
Dim intEmpFileNbr As Integer
Dim strTranslationKey As String
Dim strTranslation As String
Dim error As String
strEmpFileName = "C:\Files\test_file_1.asp"
intEmpFileNbr = FreeFile
Open strEmpFileName For Input As #intEmpFileNbr
Do Until EOF(intEmpFileNbr)
Input #intEmpFileNbr, strTranslationKey, strTranslation
Loop
End Sub
The code assigns the lines of text just fine until it gets to some_key_5. There it treats part of the text as being on a new line, even though the line break comes from word wrap and not from me hitting Enter.
Is there any way around this? Shortening the line is not really an option.
I think the likely problem is the comma in the translation element. I'd suggest your best bet is probably to read the file with the
Line Input #my_file, my_string
statement and pick it apart by hand.
You can use a regular expression to match all the strings inside "" and then group them into pairs.
As for line breaks, you can simply replace all of them with an empty string before doing the regexp match.
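The thread is about VB6, but the idea is independent of the language. As a minimal sketch in Ruby, assuming every key and translation is wrapped in double quotes and contains no escaped quotes (quoted_pairs is a hypothetical helper name):

```ruby
# Hypothetical helper: returns [[key, translation], ...] from text in which
# every value is wrapped in double quotes.
def quoted_pairs(text)
  text.delete("\r\n")            # remove every CR and LF, as suggested above
      .scan(/"([^"]*)"/)         # each run of characters between two quotes
      .flatten
      .each_slice(2)             # key, translation, key, translation, ...
      .to_a
end
```

Because the commas inside a translation sit between quotes, they no longer act as field separators.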
Most likely you have a line termination issue, and you actually have a CR where it stops reading. I would suggest recreating the file using Notepad, or looking at it with a text editor that has a hex mode (that might also tell you if you have a really long line; if your real file has a line that exceeds a few K, that might actually be a problem).
I ran your code against your supplied data, with a msgbox giving the length and right 10 characters of the string, and it worked. It also worked when I increased the field length of the second part to 721 chars.
I opted to use the file system object, and it works just fine with the same data. It couldn't have been the comma, as it would assign the data from the file to the variable just fine and include the information past the comma. It would then add the item on the next line as a new key. I couldn't get past this; I even tried removing the commas from the offending item and it would still break. But after I switched to the file system object, it works just fine with the same data; I just have to split it manually.
My script reads in large text files and grabs the first page with a regex. I need to remove the first two lines of each first page, or change the regex to start matching one line after the ==Page 1== string. I include the entire script here because I've been asked to in past questions, and because I'm new to Ruby and don't always know how to integrate snippets given as answers:
#!/usr/bin/env ruby -wKU
require 'fileutils'
source = File.open('list.txt')
source.readlines.each do |line|
line.strip!
if File.exists? line
file = File.open(line)
end
text = (File.read(line))
match = text.match(/==Page 1(.*)==Page 2==/m)
puts match
end
Now that you have updated your question, I had to delete a big part of such a good answer :-)
I guess the main point of your problem was that you wanted to use match[1] instead of match. The object returned by the match method (a MatchData) can be treated like an array that holds the whole matched string as the first element and each capture group in the following elements. So in your case the variable match (and match[0]) is the whole matched string (together with the '==Page..==' marks), but you wanted just the first capture group, which is in match[1].
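A small illustration of the difference, using a made-up string rather than one of the real files (match_parts and PAGE_RE are hypothetical names):

```ruby
PAGE_RE = /==Page 1==(.*)==Page 2==/m

# Hypothetical helper: returns [whole_match, first_capture], or nil when the
# page markers are missing from the text.
def match_parts(text)
  match = text.match(PAGE_RE)
  return nil unless match
  [match[0], match[1]]
end

whole, page = match_parts("header\n==Page 1==\nfirst page\n==Page 2==\nrest")
# whole includes both markers; page is only the text between them.
```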
Now about other, minor problems I sense in your code. Please, don't be offended in case you already know what I say, but maybe others will profit from the warnings.
The first part of your code (if File.exists? line) was checking whether the file exists, but it then just opened the file (without closing it!) and the code still tried to read the file a few lines later, whether it existed or not.
You may use this line instead (File.exist? is the current spelling; File.exists? was removed in Ruby 3.2):
next unless File.exist? line
The second thing is that the program should be prepared to handle the situation when the file has no page marks, so it does not match the pattern. (The variable match would then be nil)
The third suggestion is that a slightly more complicated pattern might be used. The current one (/==Page 1==(.*)==Page 2==/m) would return the page content with the end-of-line character as the first character. If you use this pattern:
/==Page 1==\s*\n(.*)==Page 2==/m
then the subexpression will not contain the whitespace placed on the same line as the '==Page 1==' text. And if you use this pattern:
/==Page 1==\s*\n(.*\n)==Page 2==/m
then you will be sure that the '==Page 2==' mark starts from the beginning of the line.
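To see how the three patterns differ, they can be run against the same made-up input. The constant names and the capture helper below are hypothetical and exist only for this comparison.

```ruby
LOOSE   = /==Page 1==(.*)==Page 2==/m
TRIMMED = /==Page 1==\s*\n(.*)==Page 2==/m
STRICT  = /==Page 1==\s*\n(.*\n)==Page 2==/m

# Hypothetical helper: returns the captured page text, or nil if no match.
def capture(pattern, text)
  m = text.match(pattern)
  m && m[1]
end

page   = "==Page 1==  \nline one\nline two\n==Page 2==\n"
inline = "==Page 1==\nbody ==Page 2=="

capture(LOOSE, page)     # keeps the trailing spaces and newline of the marker line
capture(TRIMMED, page)   # starts at "line one"
capture(STRICT, inline)  # nil, because '==Page 2==' is not at the start of a line
```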
And the fourth issue is that programmers (sometimes including me, of course) very often forget to close a file after opening it. In your case you opened the 'source' file, but there was no source.close statement after the loop. The safest way of handling files is to pass a block to the File.open method, which closes the file when the block ends, so you might use the following form for the first lines of your program:
File.open('list.txt') do |source|
source.readlines.each do |line|
...but in this case it would be cleaner to write just:
File.readlines('list.txt').each do |line|
Taking it all together, the code might look like this (I changed the variable line to fname for better code readability):
#!/usr/bin/env ruby -wKU
require 'fileutils'
File.readlines('list.txt').each do |fname|
  fname.strip!
  next unless File.exist? fname
  text = File.read(fname)
  if match = text.match(/==Page 1==\s*\n(.*\n)==Page 2==/m)
    # The whole 'page' (String):
    puts match[1].inspect
    # The 'page' without the first two lines:
    # (in case you really wanted to delete lines):
    puts match[1].split("\n")[2..-1].inspect
  else
    # What to do if the file does not match the pattern?
    raise "The file #{fname} does NOT include the page separators."
  end
end