I have some Ruby scripts that I run overnight. They output to a cronlog.txt as a report I can check daily. This is using the puts method. It has some characters output which don't show up in the terminal when run.
How can I rid these odd characters? I am assuming I need to force some kind of formatting.
Here's a sample of the output:
[H[2J[0;31;49m### wobbly/assign_city_pc.rb 2020_Sep_01 3:09[0m
[0;33;49mlocations count: 16484[0m
[0;33;49mgrabbed count: 150[0m
[0;33;49mtargets count: 16334[0m
Desired output:
### wobbly/assign_city_pc.rb 2020_Sep_01 3:09
locations count: 16484
grabbed count: 150
targets count: 16334
Solved:
As per #Stefan, ANSI escape codes.
I took out the gem 'colorize' from that script.
Now, as #Stefan properly noted, it was the resulting codes that gave the characters in my situation. There are other gems that also could do this, but I haven't used them. I don't know what their names are. I don't wish to use them. I will not research them. I am assuming there are others that accomplish this. I'm also thinking there are other gems that provide other markup as well. I haven't used them. I don't wish to use them.
None of the aforementioned gems I will research in order to post a novel about new markup. I used one, and I stopped using it and the characters went away. I can safely assume that this specific change is what brought on the fix.
As you can see, I am assuming that some gems providing "ANSI escape codes" would be a sufficient answer to this question, as it was evident more characters were being added. My original point was to say "I took out anything that generated more characters", but that wasn't good enough.
So the answer to the specific characters I had, was to take out colorize only. Any other new characters generated are outside of my knowledge and I will not be testing any further on something that was kindly answered by three words.
Related
I wrote a parser, which recognizes elements of text based on certain pattern.
My program is able to recognize paragraph, chapter etc. The problem is it shouldn't recognize elements, when there's a quote. For example:
Paragraph 1
Something here...
would be proceed as Paragraph.
And:
Paragraph 1
"Paragraph 2"
shouldn't. But as my program is based on regexp patterns, it looks for the word "Paragraph". I'm going line by line and recognize patterns for each line. I don't know how to tell my program: if you see quotes mark, leave text alone without doing anything? My mentor told me to use raise, but I'm not sure how to do it.
OK, so I'm still a bit of a beginner, I don't know if there is a way to direct the regex to ignore things inside quotes, but if I wanted to solve this problem, I would first make a copy of the text to be parsed, run a regex over that and delete everything inside quotes, then run the parser over the remaining text.
A bit kludgy and inelegant I admit, and may have performance issues over a large enough text, but it would get the job done.
See HERE for link to documentation of ruby regex. About a third of the way down it discusses quotes:
/\p{Pi}/ - 'Punctuation: Initial Quote'
/\p{Pf}/ - 'Punctuation: Final Quote'
You may be able to bake that into the regex with the ^ to direct it to ignore items in quotes.
I just want to be able to run a python script from the interpreter, so that I can work on my changes to my script in notepad or other editor, save, and then interactively test changed code in the python interpreter.
Also, IDLE is not a solution. I'm operating on a government computer that is blocking the port it uses to communicate interaction between console and module.
To clear up any confusion, here's a demonstration of what I'm trying to do:
So, how do I do it?
EDIT:
Okay so I found a statement that does what I want. exec(open('dir').read()). The problem I think is that the directory I want to refer to contains periods. But I'm sure this will work, because open('dir').read() produces a string of the contents of a document specified, as long as I reference the likes of C:\myTest.py, and exec() obviously runs strings as input. So how can I reference files from the location I want?
Okay so the problem seems to be that Windows addresses often contain what python sees as 'unicode exits'. I'm not sure what they do or how they work, but I know they start with \ and are followed by a single letter and that there are enough of them to use up half the alphabet. There are a few solutions but only one is worth a damn for this application. I came across an operator that can be used in conjunction with strings, similarly to how + can be used to concatenate multiple strings, it seems r or R if you prefer (interestingly), can be used immediately before a string to tell the interpreter to take the string 'literally' as a string, and nothing else.
One would think that the quotes would be enough to express this, but they aren't and I'll probably eventually find out why. But for now, here's the answer to my question. I hope someone else finds it useful:
In plain text: >>> exec(open(R'C:\Users\First.Last\Desktop\myScript.py').read())
I have a large source code where most of the documentation and source code comments are in english. But one of the minor contributors wrote comments in a different language, spread in various places.
Is there a simple trick that will let me find them ? I imagine first a way to extract all comments from the code and generate a single text file (with possible source file / line number info), then pipe this through some language detection app.
If that matters, I'm on Linux and the current compiler on this project is CLang.
The only thing that comes to mind is to go through all of the code manually and check it yourself. If it's a similar language, that doesn't contain foreign letters, consider using something with a spellchecker. This way, the text that isn't recognized will get underlined, and easy to spot.
Other than that, I don't see an easy way to go through with this.
You could make a program, that reads the files and only prints the comments out to another output file, where you then spell check that file, but this would seem to be a waste of time, as you would easily be able to spot the comments yourself.
If you do make a program for that, however, keep in mind that there are three things to check for:
If comment starts with /*, make sure it stops reading when encountering */
If comment starts with //, only read one line - unless:
If line starting with // ends with \, read next line as well
While it is possible to detect a language from a string automatically, you need way more words than fit in a usual comment to do so.
Solution: Use your own eyes and your own brain...
Open up irb and
type gets. It should work fine.
Then try system("choice /c YN") It should work as expected.
Now try gets again, it behaves oddly.
Can someone tell me why this is?
EDIT: For some clarification on the "odd" behavior, it allows me to type for gets, but doesn't show me the characters and I have to press the enter key twice.
Terminal input-output handling is dark and mysterious art. Anyone trying to make colorized output of bash work in windows PowerShell via ssh knows that. (And various shortcutting habits like Ctrl+Backspace only make things worse.)
One of the possible reasons for your problem is special characters handling. Every terminal out there can type characters in number of different modes, and it parses its own output in search for certain character sequences in order to toggle states.
F.e. here one can find ANSI escape code sequences, one of possible supported standards among different kind of terminals.
See there Esc[5;45m? That will make all the following output to blink on magenta background. And there is significantly more stuff like that out there.
So, the answer to your question taken literally is — your choice command messes something with output modes using special escape sequences, and ruby's gets breaks in that quirk special mode of terminal operation.
But more useful will be the link to HighLine gem documentation. Why one might want to implement platform-specific and obtrusive behavior when it is possible to implement the same with about 12 LOC? All the respect for the Gist goes to botimer, I've only stumbled into his code using search.
(Using Ruby 1.8)
I only have a brief understanding of encoding and such...but what I want to know is, in any given script handling any given text-file, is there some universal library or call I need to make to turn non-standard characters into their nearest printable equivalent. I realize there's no "all-in-one" fix, but this is for a English (U.S. gov't) text file, and so I'm wondering if there's something that mitigates what must be a relatively common issue in English text formatting.
For example, in a text file, I have an entry like this:
0-823
That hyphen is just literally a hyphen as I've typed it out. In the file though, it's something that looks like a hyphen (an n-dash?) but when copy and pasting it...for example, into this browser text box, it doesn't show up.
Printing it out via a Ruby script gets this:
08�23
How do I get my script to resolve it into a dash. Or something other than a gremlin?
It's very common to run into hyphen-like characters and dashes, especially in the output of word-processors. Converting them isn't too hard if you know what the byte is that represents the character, but gets to be a pain when you get a document with several different ones. It gets worse as you throw other accented characters into the mix.
Ruby 1.8 doesn't support multibyte and Unicode character sets as well as 1.9+, but you can work around that somewhat by using the Iconv library.
Iconv lets you convert between various character-sets, such as US-ASCII, ISO-8859-1 and WIN-1252. It's smarter than a regex, because it knows how to convert from accented characters, to similarly looking characters, or ignore them if nothing similar exists, allowing your transliteration to degrade gracefully.
I have some example code in an answer to a related question. Also read James Grey's article linked in the answer. It explains the problem and ways to fix it, ending up with recommending Iconv too.
You could whitelist with gsub:
string.gsub(/[^a-zA-Z0-9]/)
Without knowing more information, I can't build the perfect regex for you, but the general idea is to replace anything that's not what you're expecting (anything not a letter or number or expected symbols).