Split string into an array by comma, unless comma is inside quotes


Given a string in Ruby that lists several items, some of which are quoted and contain commas:
my_string.inspect
# => "\"hey, you\", 21"
How can I get an array of:
["hey, you", " 21"]

The Ruby standard library's CSV module provides String#parse_csv, which does exactly this.
require 'csv'
"\"hey, you\", 21".parse_csv
# => ["hey, you", " 21"]
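A slightly fuller sketch of the same idea, using CSV.parse_line (the method String#parse_csv delegates to) and optionally stripping the leading space from each field. The variable names are illustrative and not from the question.

```ruby
require 'csv'

line = "\"hey, you\", 21"

# CSV.parse_line treats a quoted field as one item, commas included.
fields = CSV.parse_line(line)

# Optional: strip surrounding whitespace if " 21" should become "21".
stripped = fields.map(&:strip)

p fields
p stripped
```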

Yes, using CSV::parse_line or String#parse_csv (which require 'csv' adds to String's instance methods) is the way to go here, but you could also do it with a regex:
r = /
(?: # Begin non-capture group
(?<=\") # Match a double-quote in a positive lookbehind
.+? # Match one or more characters lazily
(?=\") # Match a double quote in a positive lookahead.
) # End non-capture group
| # Or
\s\d+ # Match a whitespace character followed by one or more digits
/x # Extended mode
str = "\"hey, you\", 21"
str.scan(r)
#=> ["hey, you", " 21"]
If you'd prefer to have "21" rather than " 21", just remove \s.

Related

How to check with ruby if a word is repeated twice in a file

I have a large file, and I want to be able to check if a word is present twice.
puts "Enter a word: "
$word = gets.chomp
if File.read('worldcountry.txt') # do something if the word entered is present twice...
How can I check whether the file worldcountry.txt contains the $word I entered twice?

I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby
Gerry's answer there uses this code:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
f.each_line do |line|
line.split(' ').each do |word|
word_count += 1 if word == my_word
end
end
end
puts "\n" + word_count.to_s
Thanks, I will pay more attention next time.
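For a large file, the same line-by-line idea can stop as soon as the second occurrence is seen. This is only a sketch: word_at_least? is a hypothetical helper name, and the temporary file stands in for worldcountry.txt.

```ruby
require 'tempfile'

# Hypothetical helper: true if `word` occurs at least `times` times in the file.
def word_at_least?(path, word, times = 2)
  pattern = /\b#{Regexp.escape(word)}\b/
  count = 0
  File.foreach(path) do |line|
    count += line.scan(pattern).size
    return true if count >= times   # stop reading once enough were found
  end
  false
end

# Stand-in for the real file, so the example is self-contained.
Tempfile.create('countries') do |f|
  f.write("France\nSpain\nFrance\n")
  f.flush
  puts word_at_least?(f.path, 'France')  # prints true
  puts word_at_least?(f.path, 'Spain')   # prints false
end
```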

If the file is not overly large, it can be gulped into a string. Suppose:
str = File.read('cat')
#=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.
Suppose the given word is 'dog'.
Confirm the file contains at least two instances of the given word
One can attempt to match the regular expression
r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
#=> true
Confirm the file contains exactly two instances of the given word
Using a regular expression to determine whether the file contains exactly two instances of the given word is somewhat more complex. Let
r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
#=> false
The two regular expressions can be written in free-spacing mode to make them self-documenting.
r1 = /
\bdog\b # match 'dog' surrounded by word breaks
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
/m # cause . to match newlines
r2 = /
\A # match beginning of string
(?: # begin non-capture group
(?: # begin non-capture group
(?! # begin negative lookahead
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
. # match one character, provided 'dog' does not start here
) # end non-capture group
* # execute preceding non-capture group zero or more times
\bdog\b # match 'dog' surrounded by word breaks
) # end non-capture group
{2} # execute preceding non-capture group twice
(?! # begin negative lookahead
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
/xm # cause . to match newlines and invoke free-spacing mode
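If counting is acceptable instead of a single regex, String#scan answers both questions at once. A minimal sketch, assuming the same sample text as above; occurrences is a hypothetical helper name.

```ruby
# Hypothetical helper: number of whole-word occurrences of `word` in `text`.
def occurrences(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

count = occurrences(str, "dog")
puts count        # prints 3
puts count >= 2   # prints true  (at least two instances)
puts count == 2   # prints false (not exactly two)
```

Regexp.escape keeps a user-supplied word from being interpreted as regex syntax.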

Ruby regex to filter out word ending with a "string" suffix

I am trying to come up with a Ruby Regex that will match the following string:
MAINT: Refactor something
STRY-1: Add something
STRY-2: Update something
But should not match the following:
MAINT: Refactored something
STRY-1: Added something
STRY-2: Updated something
MAINT: Refactoring something
STRY-3: Adding something
STRY-4: Updating something
Basically, the first word after : should not end with either ed or ing
This is what I have currently:
^(MAINT|(STRY|PRB)-\d+):\s([A-Z][a-z]+)\s([a-zA-Z0-9._\-].*)
I have tried [^ed] and [^ing], but they do not work here since I am targeting more than a single character.
I am not able to come up with a proper solution to achieve this.

You could use
^[-\w]+:\s*(?:(?!(?:ed|ing)\b)\w)+\b.+
See a demo on regex101.com.
Broken down this says:
^ # start of the line/string
[-\w]+:\s* # match 1+ hyphens or word characters, then : and optional whitespace
(?: # non-capturing group
(?!(?:ed|ing)\b) # neg. lookahead: no ed or ing followed by a word boundary
\w # match a word character
)+\b # as long as possible, followed by a boundary
.+ # match the rest of the string (at least one character)
I have no experience in Ruby but I guess you could alternatively do a split and check if the second word ends with ed or ing. The latter approach might be easier to handle for future programmers/colleagues.
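The split-based alternative suggested above might look like this in Ruby. It is a sketch under assumptions: valid_subject? is a hypothetical name, and the prefix check (MAINT, STRY-n or PRB-n) is taken from the pattern in the question.

```ruby
# Hypothetical helper: true if the first word after the "PREFIX: " part
# does not end with "ed" or "ing".
def valid_subject?(message)
  prefix, rest = message.split(': ', 2)
  return false if rest.nil?
  return false unless prefix.match?(/\A(?:MAINT|(?:STRY|PRB)-\d+)\z/)

  first_word = rest.split.first
  !first_word.nil? && !first_word.end_with?('ed', 'ing')
end

puts valid_subject?("MAINT: Refactor something")   # prints true
puts valid_subject?("STRY-1: Added something")     # prints false
puts valid_subject?("STRY-3: Adding something")    # prints false
```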

r = /
\A # match beginning of string
(?: # begin a non-capture group
MAINT # match 'MAINT'
| # or
STRY\-\d+ # match 'STRY-' followed by one or more digits
) # end non-capture group
:[ ] # match a colon followed by a space
[[:alpha:]]+ # match one or more letters
(?<! # begin a negative lookbehind
ed # match 'ed'
| # or
ing # match 'ing'
) # end negative lookbehind
[ ] # match a space
/x # free-spacing regex definition mode
"MAINT: Refactor something".match?(r) #=> true
"STRY-1: Add something".match?(r) #=> true
"STRY-2: Update something".match?(r) #=> true
"MAINT: Refactored something".match?(r) #=> false
"STRY-1: Added something".match?(r) #=> false
"STRY-2: Updated something".match?(r) #=> false
"A MAINT: Refactor something".match?(r) #=> false
"STRY-1A: Add something".match?(r) #=> false
This regular expression is conventionally written as follows.
r = /\A(?:MAINT|STRY\-\d+): [[:alpha:]]+(?<!ed|ing) /
Expressed this way, each of the two spaces can be represented by a space character. In free-spacing mode, however, all whitespace outside character classes is removed, which is why I needed to enclose each space in a character class.

(Posted on behalf of the question author).
This is what I ended up using:
^(MAINT|(STRY|PRB)-\d+):\s(?:(?!(?:ed|ing)\b)[A-Za-z])+\s([a-zA-Z0-9._\-].*)
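A quick way to sanity-check a pattern like this in Ruby is to run it against the sample messages from the question. The sketch below assumes the pattern is used as a Ruby Regexp literal; the constant and array names are illustrative.

```ruby
PATTERN = /^(MAINT|(STRY|PRB)-\d+):\s(?:(?!(?:ed|ing)\b)[A-Za-z])+\s([a-zA-Z0-9._\-].*)/

should_match = [
  "MAINT: Refactor something",
  "STRY-1: Add something",
  "STRY-2: Update something"
]

should_not_match = [
  "MAINT: Refactored something",
  "STRY-1: Added something",
  "STRY-2: Updated something",
  "MAINT: Refactoring something",
  "STRY-3: Adding something",
  "STRY-4: Updating something"
]

# Print each message with the result of the match.
(should_match + should_not_match).each do |msg|
  puts "#{msg.inspect} => #{msg.match?(PATTERN)}"
end
```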

How to replace Perl-style regex with MatchData object

I am using the gsub method with a regular expression:
@text.gsub(/(-\n)(\S+)\s/) { "#{$2}\n" }
Example of input data:
"The wolverine is now es-
sentially absent from
the southern end
of its European range."
should return:
"The wolverine is now essentially
absent from
the southern end
of its European range."
The method works fine, but RuboCop reports an offense:
Avoid the use of Perl-style backrefs.
Any ideas how to rewrite it using a MatchData object instead of $2?

If you want to use Regexp.last_match:
@text.gsub(/(-\n)(\S+)\s/) { Regexp.last_match[2] + "\n" }
or:
@text.gsub(/-\n(\S+)\s/) { Regexp.last_match[1] + "\n" }
Note that the block form of gsub is only needed when logic is involved. Without logic, a second argument of "\\1\n" or '\1' + "\n" would do just fine.

You can use a backslash backreference without the block:
@text.gsub(/(-\n)(\S+)\s/, "\\2\n")
Also, it's a bit cleaner to use only one group, since the first one above isn't needed:
@text.gsub(/-\n(\S+)\s/, "\\1\n")
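For comparison, here are three equivalent forms run on the sample text from the question. This is a sketch: the local variable text stands in for the question's variable, and the named group rest is an illustrative choice.

```ruby
text = "The wolverine is now es-\nsentially absent from\nthe southern end\nof its European range."

# Block form, reading the capture through Regexp.last_match.
with_block = text.gsub(/-\n(\S+)\s/) { "#{Regexp.last_match(1)}\n" }

# Replacement-string form with a numbered backreference.
with_string = text.gsub(/-\n(\S+)\s/, "\\1\n")

# Replacement-string form with a named backreference.
with_named = text.gsub(/-\n(?<rest>\S+)\s/, "\\k<rest>\n")

puts with_block
```

All three produce the same string, so the choice is mostly about readability and what RuboCop is configured to accept.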

This solution accounts for errant spaces before newlines and split words that end a sentence or the string. It uses String#gsub with a block and no capture groups.
Code
R = /
[[:alpha:]]\- # match a letter followed by a hyphen
\s*\n # match a newline possibly preceded by whitespace
[[:alpha:]]+ # match one or more letters
[.?!]? # possibly match a sentence terminator
\n? # possibly match a newline
\s* # match zero or more whitespaces
/x # free-spacing regex definition mode
def remove_hyphens(str)
str.gsub(R) { |s| s.gsub(/[\n\s-]/, '') << "\n" }
end
Examples
str =<<_
The wolverine is now es-
sentially absent from
the south-
ern end of its
European range.
_
puts remove_hyphens(str)
The wolverine is now essentially
absent from
the southern
end of its
European range.
puts remove_hyphens("now es- \nsentially\nabsent")
now essentially
absent
puts remove_hyphens("now es-\nsentially.\nabsent")
now essentially.
absent
remove_hyphens("now es-\nsentially?\n")
#=> "now essentially?\n" (no extra \n at end)

Extract all words with # symbol from a string

I need to extract all #usernames from a string (for Twitter) using Rails/Ruby:
String Examples:
"#tom #john how are you?"
"how are you #john?"
"#tom hi"
The function should extract all usernames from a string, without any special characters that are disallowed in usernames (such as the "?" in the second example).

From "Why can't I register certain usernames?":
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
The \w metacharacter is equivalent to [a-zA-Z0-9_]:
/\w/ - A word character ([a-zA-Z0-9_])
Simply scanning for #\w+ will succeed according to that:
strings = [
"#tom #john how are you?",
"how are you #john?",
"#tom hi",
"#foo #_foo #foo_ #foo_bar #f123bar #f_123_bar"
]
strings.map { |s| s.scan(/#\w+/) }
# => [["#tom", "#john"],
# ["#john"],
# ["#tom"],
# ["#foo", "#_foo", "#foo_", "#foo_bar", "#f123bar", "#f_123_bar"]]
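If the names are wanted without the leading "#", a capture group does that. The sketch below also assumes that a "#" in the middle of a word (as in "mail#tom") should be ignored, which the question does not specify; extract_names is a hypothetical helper name.

```ruby
# Hypothetical helper: names that follow a '#' which is not preceded by a word character.
def extract_names(text)
  # scan returns one array per match when the regex has a capture group.
  text.scan(/(?<!\w)#(\w+)/).flatten
end

p extract_names("#tom #john how are you?")  # prints ["tom", "john"]
p extract_names("how are you #john?")       # prints ["john"]
p extract_names("mail#tom hi #ann")         # prints ["ann"]
```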

There are multiple ways to do it - here's one way:
string = "#tom #john how are you?"
words = string.split(" ")
twitter_handles = words.select do |word|
  # keep words that are '#' followed only by letters, digits, or underscores
  word.length > 1 && word.start_with?('#') &&
    word[1..-1].chars.all? { |char| char =~ /[a-zA-Z0-9_]/ }
end
The char =~ regex accepts only alphanumerics and the underscore.

r = /
\# # match a literal '#' (escaped because # begins a comment in free-spacing mode)
[[:alpha:]]+ # match one or more letters
\b # match word break
/x # free-spacing regex definition mode
"#tom #john how are you? And you, #andré?".scan(r)
#=> ["#tom", "#john", "#andré"]
If you wish to instead return
["tom", "john", "andré"]
change the first line of the regex from \# to
(?<=\#)
which is a positive lookbehind. It requires that the character "#" be present but it will not be part of the match.

Ruby Return the content of a text(.txt) file after replacing variables

Imagine I have 2 files:
A.txt
This is a sentence with a #{variable}.
and a ruby script.
Iamascript.rb
...
variable = "period"
...
Is there any way I can read the content of the .txt file and insert the variable before puts'ing it?
This means my output when running the rb-script shall be
This is a sentence with a period.
The .txt file is dynamic.

What you are looking for is commonly called templating, and you have basically defined a template language. Ruby actually ships with a template language called ERb in the standard library, so, if you are willing to change the syntax of your template language a bit, you can just use that instead of having to invent your own:
A.txt:
This is a sentence with a <%=variable%>.
Iamascript.rb
require 'erb'
variable = 'period'
puts ERB.new(File.read('A.txt')).result(binding)
# This is a sentence with a period.
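Passing binding exposes every local variable to the template. If only a few values should be visible, ERB#result_with_hash (available since Ruby 2.5) is an alternative. A minimal sketch, with the template inlined instead of read from A.txt so the example is self-contained:

```ruby
require 'erb'

# Stand-in for File.read('A.txt').
template = "This is a sentence with a <%= variable %>."

# Only the keys given here are visible inside the template.
output = ERB.new(template).result_with_hash(variable: "period")

puts output  # prints: This is a sentence with a period.
```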

There's one "obvious" (but bad) solution: eval, which runs whatever code you give it.
This is a security risk, but it may be what you're looking for if you need complex expressions in #{...}.
The more correct way to do it, if you care even a little about security, is to use Ruby's formatting operator, % (similar to Python's).
template = "the variable's value is %{var}"
puts template % {var: "some value"} # prints "the variable's value is some value"
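Applied to the sentence from the question, and assuming the file's placeholder syntax can be changed from #{variable} to %{variable}, the same idea looks like this. Kernel#format is equivalent to the % operator; a missing key raises KeyError, which the sketch shows as well.

```ruby
# Stand-in for the contents of A.txt, using %{...} placeholders.
template = "This is a sentence with a %{variable}."

output = format(template, variable: "period")
puts output  # prints: This is a sentence with a period.

# A placeholder with no matching key raises KeyError instead of silently passing.
missing_error =
  begin
    format(template, other: "x")
    nil
  rescue KeyError => e
    e.class
  end
puts missing_error.inspect  # prints: KeyError
```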

Suppose the file "A.txt" contains the single line of text (or this line is extracted from the file):
s1 = 'This is a sentence with a #{my_var}'
and the second file, "Iamascript.rb", contains:
s2 =<<_
line of code
line of code
my_var = "period"
line of code
line of code
_
#=> "line of code\n line of code\n my_var = 'period'\n line of code\nline of code\n"
Let's create those files:
File.write("A.txt", s1)
#=> 35
File.write("Iamascript.rb", s2)
#=> 78
Now read the first line of "A.txt" and extract the string beginning "\#{" and ending "}", then extract the variable name from that string.
r1 = /
\#\{ # match characters
[_a-z]+ # match one or more underscores or lowercase letters
\} # match character
/x # free-spacing regex definition mode
s1 = File.read("A.txt")
#=> "This is a sentence with a #{my_var}"
match = s1[r1]
#=> "\#{my_var}"
var_name = match[2..-2]
#=> "my_var"
Now read "Iamascript.rb" and look for a line that matches the following regex.
r2 = /
\A # match beginning of string
#{var_name} # value of var_name
\s*=\s* # match '=' and surrounding whitespace
([\"']) # match a single or double quote in capture group 1
([^\"']+) # match other than single or double quote in capture group 2
([\"']) # match a single or double quote in capture group 3
\z # match end of string
/x # free-spacing regex definition mode
#=> /
# \A # match beginning of string
# my_var # value of var_name
# \s*=\s* # match '=' and surrounding whitespace
# ([\"']) # match a single or double quote in capture group 1
# ([^\"']+) # match other than single or double quote in capture group 2
# ([\"']) # match a single or double quote in capture group 3
# \z # match end of string
# /x
If a match is found return the line from "A.txt" with the text substitution, else return nil.
if File.foreach("Iamascript.rb").find { |line| line.strip =~ r2 && $1==$3 }
s1.sub(match, $2)
else
nil
end
#=> "This is a sentence with a period"
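If the #{...} syntax in the text file has to stay and the values are already known to the script, a shorter route is to substitute from a hash with gsub. This is a sketch under those assumptions; the values hash and the inlined template are illustrative stand-ins for the script's variables and for File.read("A.txt").

```ruby
# Single quotes keep #{variable} as literal text, as it would be in the file.
template = 'This is a sentence with a #{variable}.'

# Hypothetical lookup table of the values the script wants to expose.
values = { "variable" => "period" }

# Replace each #{name} with its value; fetch raises KeyError for unknown names.
output = template.gsub(/\#\{(\w+)\}/) { values.fetch(Regexp.last_match(1)) }

puts output  # prints: This is a sentence with a period.
```

Unlike eval, this never executes anything from the file; it only looks names up.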
