Ruby Regex: efficient way of capturing and deleting simultaneously

I am deleting comments from a code file using regular expressions in Ruby. The code is C++ (but I think this is not relevant) and the file contains something like:
/*
Hello! I'm a comment!
*/
int main(int argc, char* argv[])
{
Foo foo;
foo.bar();
return 0;
}
My goal is to remove the comments from the code and, at the same time, to parse them. For now I can do this by capturing first and then deleting:
text.scan(UGLY_COMMENTS_REGEX).each do |m|
m.method_for_printing_matched_comment
end
text = text.gsub(UGLY_COMMENTS_REGEX, '')
Another alternative that occurs to me is doing the gsub for each regex match instead of doing it with the full regex, something like:
text.scan(UGLY_COMMENTS_REGEX).each do |m|
m.method_for_printing_matched_comment
text = text.gsub(m,'');
end
The problem with this (also suboptimal) alternative is that it is not straightforward when the match contains groups, e.g. m[0], m[1]...
Since this seems extremely inefficient, I was wondering whether there is any way of doing the match just once (for both capturing and deleting).

String#gsub! (like String#gsub, String#sub! and String#sub) accepts an optional block, which is called with each matched string; the block's return value is used as the replacement. So you can do something like this:
text.gsub!(UGLY_COMMENTS_REGEX) { |m|
puts m # to print the matched comment / OR m.method_for_printing_matched_comment
'' # Return value is used as a replacement string; effectively remove the comment
}
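This also covers the concern about groups: inside the block, Ruby sets $~ (also available as Regexp.last_match) to the MatchData of the current match, so numbered captures are available as $1, $2, and so on. Below is a minimal sketch, assuming a hypothetical COMMENT_RE that matches only /* ... */ block comments (the question never shows UGLY_COMMENTS_REGEX). It uses the non-bang gsub because gsub! returns nil when nothing matched.

```ruby
# Hypothetical pattern for C-style block comments; group 1 captures the body.
# The m flag lets . match newlines, and the lazy .*? stops at the first */.
COMMENT_RE = %r{/\*(.*?)\*/}m

# Removes block comments and collects their bodies in a single pass.
def strip_and_collect(text)
  comments = []
  cleaned = text.gsub(COMMENT_RE) do
    # Regexp.last_match ($~) is the MatchData for the current match,
    # so capture groups are reachable here; Regexp.last_match(1) is $1.
    comments << Regexp.last_match(1).strip
    '' # the block's value replaces the match, deleting the comment
  end
  [cleaned, comments]
end

code, comments = strip_and_collect("/*\nHello! I'm a comment!\n*/\nint main() { return 0; }\n")
# code     == "\nint main() { return 0; }\n"
# comments == ["Hello! I'm a comment!"]
```

Each comment is matched exactly once: the same match is both recorded and replaced.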

I believe the following should work.
Code
def strip_comments(str)
  comments = []
  [str.split(/[ \t]*\/\*|\*\/(?:[ \t]*\n)?/)
      .select.with_index { |ar, i| i.even? ? true : (comments << ar.strip; false) }
      .join,
   comments]
end
Example
str =<<_
/*
Hello! I'm a comment!
*/
int main(int argc, char* argv[])
{
Foo foo;
/* Let's get this one too */
foo.bar();
return 0;
}
_
cleaned_code, comments = strip_comments(str)
puts cleaned_code
# int main(int argc, char* argv[])
# {
# Foo foo;
# foo.bar();
# return 0;
# }
puts comments
# Hello! I'm a comment!
# Let's get this one too
Explanation
Using the example above:
comments = []
Splitting the string on /* or */ creates an array in which every other element is the text of a comment. The first element of the array is text to retain, which equals "" if the string begins with a comment. To retain correct formatting (I hope), the pattern also strips any spaces or tabs (but not newlines) that precede /*, and any spaces or tabs followed by a newline that come right after */.
b = str.split(/[ \t]*\/\*|\*\/(?:[ \t]*\n)?/)
#=> ["",
# "\n Hello! I'm a comment!\n",
# "int main(int argc, char* argv[])\n{\n Foo foo;\n",
# " Let's get this one too ",
# " foo.bar();\n return 0;\n}\n"]
We wish to select the elements that are not comments and, at the same time, save the comments:
enum0 = b.select
#=> #<Enumerator: [
# "",
# "\n Hello! I'm a comment!\n",
# "int main(int argc, char* argv[])\n{\n Foo foo;\n",
# " Let's get this one too ",
# " foo.bar();\n return 0;\n}\n"]:select>
Add the index so we'll be able to tell which elements are comments:
enum1 = enum0.with_index
#=> #<Enumerator: #<Enumerator: [
# "",
# "\n Hello! I'm a comment!\n",
# "int main(int argc, char* argv[])\n{\n Foo foo;\n",
# " Let's get this one too ",
# " foo.bar();\n return 0;\n}\n"]:select>:with_index>
You might think of enum1 as a "compound enumerator". To see what elements it will pass into its block, convert it to an array:
enum1.to_a
#=> [["", 0],
# ["\n Hello! I'm a comment!\n", 1],
# ["int main(int argc, char* argv[])\n{\n Foo foo;\n", 2],
# [" Let's get this one too ", 3],
# [" foo.bar();\n return 0;\n}\n", 4]]
Execute the enumerator with its block using Enumerator#each. Elements at even indices are kept; those at odd indices are comments, which are stripped and saved:
c = enum1.each {|ar,i| i.even? ? true : (comments << ar.strip; false)}
#=> ["",
# "int main(int argc, char* argv[])\n{\n Foo foo;\n",
# " foo.bar();\n return 0;\n}\n"]
Confirm that comments was constructed correctly:
puts comments
# Hello! I'm a comment!
# Let's get this one too
Join the elements of c:
cleaned_text = c.join
#=> "int main(int argc, char* argv[])\n{\n Foo foo;\n foo.bar();\n return 0;\n}\n"
and return both:
[cleaned_text, comments]
as shown above.
Edit: a little better, I think:
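def strip_comments(str)
  a = str.split(/[ \t]*\/\*|\*\/(?:[ \t]*\n)?/)
  return ["", []] if a.empty?   # splitting "" yields []
  a << nil if a.size.odd?       # give the last code chunk a (missing) comment partner
  cleaned, comments = a.each_slice(2).to_a.transpose
  [cleaned.join, comments.compact.map(&:strip)]
end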

Related

Attaching a Comment to a YAML::Node for Presentation in Output

I'm using yaml-cpp with C++11. I can create a YAML file using something simple like this:
#include <yaml-cpp/yaml.h>
#include <iostream>
int main(void)
{
YAML::Node topNode;
topNode["one"]["two"]["A"] = "foo";
topNode["one"]["two"]["B"] = 42;
std::cout << "%YAML 1.2\n---\n" << topNode;
return 0;
}
That will produce a YAML file like this:
%YAML 1.2
---
one:
  two:
    A: foo
    B: 42
Lovely!
I can also produce exactly the same YAML file like this:
#include <yaml-cpp/yaml.h>
#include <iostream>
int main(void)
{
YAML::Emitter out;
out << YAML::BeginMap // document {
<< "one"
<< YAML::BeginMap // one {
<< "two"
<< YAML::BeginMap // two {
<< YAML::Key << "A" << YAML::Value << "foo"
<< YAML::Key << "B" << YAML::Value << 42
<< YAML::EndMap // two }
<< YAML::EndMap // one }
<< YAML::EndMap // document }
;
std::cout << "%YAML 1.2\n---\n"
<< out.c_str();
return 0;
}
The nice thing about the second approach is that I can also add comments into the output file:
#include <yaml-cpp/yaml.h>
#include <iostream>
int main(void)
{
YAML::Emitter out;
out << YAML::BeginMap // document {
<< "one"
<< YAML::BeginMap // one {
<< "two"
<< YAML::BeginMap // two {
<< YAML::Key << "A" << YAML::Value << "foo"
<< YAML::Comment("A should be 'foo'")
<< YAML::Key << "B" << YAML::Value << 42
<< YAML::Comment("B is meaningful")
<< YAML::EndMap // two }
<< YAML::EndMap // one }
<< YAML::EndMap // document }
;
std::cout << "%YAML 1.2\n---\n"
<< out.c_str();
return 0;
}
to produce:
%YAML 1.2
---
one:
  two:
    A: foo # A should be 'foo'
    B: 42 # B is meaningful
My question is whether there is a way to add comments using the first approach. Perhaps something like this:
topNode["one"]["two"]["A"] = "foo";
topNode["one"]["two"]["A"].addComment("A should be 'foo'");
I could subclass YAML::Node and add my own addComment() method, but I don't want to rewrite all of YAML::Emitter just to get my comment appended in the right place. The code is there, but I don't know how to reach it. How can I do this? Can you point me to an example or an approach?
I understand that the YAML specification says that comments are not an integral part of a YAML file, and can be discarded. My users find them useful, so I don't relish a debate that begins with "Your question is stupid." :-)

That is not possible with the current API. The Emitter is driven through an EventHandler interface, which has no event for comments.
The Emit function that creates the events does not create any comment events via other means either.
Since operator<< on Node will internally use the Emitter class, there's no way to emit comments by adding them to a node, unless you rewrite the emitter yourself.

Generator method not working as expected in Ruby native extension

I work in simulation and often need multiple data generators which are structurally identical but have different parameterizations. I'm trying to write a gem using C extensions to implement a factory method, which determines and creates an appropriate type of enumerator based on the parameterization provided. The following is not my actual code, but is intended as a minimal reproducible example to illustrate the behavior I find confusing:
#include "ruby.h"
VALUE rb_mTest = Qnil;
VALUE rb_cTest = Qnil;
static VALUE super_initialize(VALUE self) {
return self;
}
static VALUE rand_enum(VALUE self) {
RETURN_SIZED_ENUMERATOR(self, 0, 0, 0);
VALUE ary[1] = {LL2NUM(42)};
for(;;) {
rb_yield(rb_funcallv(rb_mKernel, rb_intern("rand"), 1, ary));
}
return Qnil;
}
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
RETURN_SIZED_ENUMERATOR(self, 1, argv, 0); /* size_fn must be a function pointer (or 0 for an unknown size), not a number */
for(;;) {
rb_yield(LL2NUM(value));
value += 2;
}
return Qnil;
}
static VALUE enum_factory(int32_t argc, VALUE* argv, VALUE self) {
VALUE argument;
rb_scan_args(argc, argv, "01", &argument);
switch (TYPE(argument)) {
case T_NIL:
printf("It's a nil\n");
return rand_enum(self);
case T_FIXNUM:
printf("It's a FIXNUM\n");
return enum_by_2(1, argv, self);
default:
printf("Unrecognized argument type\n");
return Qnil;
}
}
void Init_test(void) {
rb_mTest = rb_define_module("Test");
rb_cTest = rb_define_class_under(rb_mTest, "Tester", rb_cObject);
rb_define_method(rb_cTest, "initialize", super_initialize, 0);
rb_define_method(rb_cTest, "enum_factory", enum_factory, -1);
}
This code works, but in a way that I didn't expect. Here's an irb session from a test run after compiling the code above:
irb(main):001:0> require_relative 'lib/test'
=> true
irb(main):002:0> t = Test::Tester.new
=> #<Test::Tester:0x0000000104b23c70>
irb(main):003:0> g1 = t.enum_factory
It's a nil
=> #<Enumerator: ...>
irb(main):004:0> g2 = t.enum_factory(10)
It's a FIXNUM
=> #<Enumerator: ...>
irb(main):005:0> g1.take(5).to_a
It's a nil
=> [35, 21, 30, 20, 0]
irb(main):006:0> g2.take(5).to_a
It's a FIXNUM
=> [10, 12, 14, 16, 18]
As you can see, the enum_factory switches to return different enumerators based on the parameterization, and the printf statements and results indicate that the correct parameterization is being applied. My confusion stems from the printf output which shows that subsequent method calls applied directly to g1 and g2 still seem to be going through the factory method. My work often generates samples into the millions or even tens of millions, so my intentions were to maximize speed by writing this in C and to avoid parsing the parameters after doing it once in the factory, but clearly that's not what's happening. I'm probably misreading the (sparse) C API documentation and missing something trivial or obvious. I'd appreciate any pointers to where my misunderstanding lies.

You've only tagged this ruby. I know Ruby, but my C knowledge is close to zero; still, maybe this will help.
I've added some extra output, because it was hard to keep track of when things actually execute in C versus Ruby.
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
printf("before RETURN_SIZED_ENUMERATOR\n");
RETURN_SIZED_ENUMERATOR(self, 1, argv, 0); /* size_fn must be a function pointer (or 0 for an unknown size) */
printf("after RETURN_SIZED_ENUMERATOR\n");
for(;;) {
printf("for\n");
rb_yield(LL2NUM(value));
printf("after rb_yield\n");
value += 2;
}
return Qnil;
}
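As best I can tell, this is how the RETURN_SIZED_ENUMERATOR macro behaves: when no block is given, it returns an Enumerator that remembers the Ruby method that was actually called (enum_factory) and its arguments. Whenever you chain an iterating method onto that Enumerator, enum_factory is called again, this time with a block: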
>> require_relative "test"; t = Test::Tester.new; g2 = t.enum_factory(1)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
=> #<Enumerator: ...>
# ^
# NOTE: not very helpful
>> g2.inspect
=> "#<Enumerator: #<Test::Tester:0x00007f0ddf6cecd0>:enum_factory(1)>"
# ^
# NOTE: a little more context -----------------------'
So g2 holds an Enumerator that calls enum_factory(1) each time it is iterated.
>> g2.take(1)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
=> [1]
# NOTE: take(2) goes around the for loop twice,
# and take(3) goes around it three times. It makes sense now.
>> g2.take(2)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
# after rb_yield
# for
=> [1, 3]
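It also depends on which Enumerator method you call. next, for example, behaves differently: it runs the method inside a Fiber and suspends it between calls, so enum_factory is entered only once: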
>> g2.next
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
=> 1
>> g2.next
# after rb_yield
# for
=> 3
From the Ruby side, this is analogous to each, or any other method that returns an enumerator when called without a block:
>> g3 = [1].each
=> #<Enumerator: ...>
>> g3.inspect
=> "#<Enumerator: [1]:each>"
# ^
# NOTE: looks like enum_factory
That's about as much as I could figure out.
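The same re-entry behavior can be reproduced in plain Ruby, which may make it easier to see. This is a sketch with a hypothetical Counter class whose evens_from method returns enum_for(__method__, value) when no block is given, roughly what RETURN_SIZED_ENUMERATOR does for a C method; the entries counter records how often the method body actually runs.

```ruby
# Hypothetical class that counts how many times its generator body is entered.
class Counter
  attr_reader :entries

  def initialize
    @entries = 0
  end

  def evens_from(value)
    # No block: hand back an Enumerator that will call evens_from(value) again.
    return enum_for(__method__, value) unless block_given?
    @entries += 1
    loop do
      yield value
      value += 2
    end
  end
end

c = Counter.new
e = c.evens_from(1)   # returns an Enumerator; the body has not run yet
e.take(2)             # => [1, 3]     (enters evens_from)
e.take(3)             # => [1, 3, 5]  (enters evens_from again, from the start)
e.next                # => 1          (enters once, then suspends in a Fiber)
e.next                # => 3          (resumes the same call)
c.entries             # => 3
```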
Update
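OK, I figured out a little more. The ID2SYM(rb_frame_this_func()) call inside the macro is what determines which method the returned enumerator calls: it is the Ruby method currently executing (enum_factory here, because enum_by_2 is called directly from C rather than through Ruby). So copying the rb_enumeratorize_with_size call out of the macro and passing the method name explicitly does the job: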
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
/* RETURN_SIZED_ENUMERATOR(self, 1, argv, (1ll << 63) - value); */
if (!rb_block_given_p()){
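return rb_enumeratorize_with_size(
(self), ID2SYM(rb_intern("enum_by_2")),
(argc), (argv), 0 /* size_fn: a function pointer, or 0 for an unknown size */
);
}
/* RETURN_SIZED_ENUMERATOR wraps its body in do { ... } while (0), the usual
C idiom that lets a multi-statement macro be used like a single statement;
it is not needed when the code is written out by hand */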
for(;;) {
rb_yield(LL2NUM(value));
value += 2;
}
return Qnil;
}
and register the method in Init_test, so that the enumerator can call it from Ruby:
rb_define_method(rb_cTest, "enum_by_2", enum_by_2, -1);
>> g2 = t.enum_factory(1)
# It's a FIXNUM
=> #<Enumerator: ...>
# NOTE: iterating below no longer prints "It's a FIXNUM"
>> g2.take(1)
=> [1]
>> g2.inspect
=> "#<Enumerator: #<Test::Tester:0x00007f6b11ba4158>:enum_by_2(1)>"
# ^
# NOTE: the final `enum_by_2` is returned in ruby ---'
This roughly resembles return enum_for(__callee__) unless block_given?.
...
#define RETURN_SIZED_ENUMERATOR(obj, argc, argv, size_fn) do {
...
https://github.com/ruby/ruby/blob/v3_1_2/include/ruby/internal/intern/enumerator.h#L198
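For comparison, the whole factory pattern can be sketched in plain Ruby. The Tester class and its rand_enum and enum_by_2 methods below are hypothetical stand-ins modeled on the question's C functions, and the size blocks ({ Float::INFINITY }) assume both generators are infinite. Because each generator builds its enumerator with enum_for(__method__, ...), the enumerator returned by enum_factory is bound to the generator itself, so iterating it never re-enters the factory.

```ruby
# Hypothetical pure-Ruby stand-in for the question's C extension.
class Tester
  attr_reader :factory_calls

  def initialize
    @factory_calls = 0
  end

  # Inspects the argument once and hands back an enumerator bound to a generator.
  def enum_factory(start = nil)
    @factory_calls += 1
    start.nil? ? rand_enum : enum_by_2(start)
  end

  # Without a block, returns an Enumerator that calls enum_by_2 itself
  # (__method__ is :enum_by_2 here), not enum_factory.
  def enum_by_2(value)
    return enum_for(__method__, value) { Float::INFINITY } unless block_given?
    loop do
      yield value
      value += 2
    end
  end

  def rand_enum
    return enum_for(__method__) { Float::INFINITY } unless block_given?
    loop { yield rand(42) }
  end
end

t = Tester.new
g2 = t.enum_factory(10)
g2.take(5)        # => [10, 12, 14, 16, 18]
g2.take(5)        # iterating again does not go through enum_factory
t.factory_calls   # => 1
```

The trade-off is that the generators become public methods; the C version has the same property once enum_by_2 is registered with rb_define_method.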

How can I capitalize a letter from a word one at a time, then add each instance of the word with a capitalized letter to an array?

My code:
def wave(str)
ary = []
increase_num = 0
str = str.chars
until increase_num > str.size
ary << str[increase_num].upcase && increase_num += 1
end
end
What it's supposed to do:
wave("hello") => ["Hello", "hEllo", "heLlo", "helLo", "hellO"]
I would really appreciate some help; as you can probably tell by looking at it, I'm relatively new.

str = "hello"
str.size.times.map { |i| str[0,i] << str[i].upcase << str[i+1..] }
#=> ["Hello", "hEllo", "heLlo", "helLo", "hellO"]
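For comparison, the loop from the question can also be repaired while keeping its structure. This is a sketch; the changes are noted in comments: the loop stops before str.size (the original > condition ran one step past the end), each iteration starts from a fresh lowercase copy so only one letter is capitalized, and the array is returned explicitly (a while or until loop itself evaluates to nil).

```ruby
def wave(str)
  ary = []
  increase_num = 0
  while increase_num < str.size   # stop at the last index, not one past it
    word = str.downcase           # fresh copy, so earlier capitals don't accumulate
    word[increase_num] = word[increase_num].upcase
    ary << word
    increase_num += 1
  end
  ary                             # return the array; the loop itself returns nil
end

wave("hello")
#=> ["Hello", "hEllo", "heLlo", "helLo", "hellO"]
```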

I would go about it as follows:
def wave(str)
  str = str.downcase # so we get a proper wave even if the original string had capitals
  str.each_char.with_index.map do |c, idx|
    str[0...idx].concat(c.upcase, str[idx + 1..-1])
  end
end
wave("hello")
#=> ["Hello", "hEllo", "heLlo", "helLo", "hellO"]
str.each_char.with_index.map do |c, idx| - This creates an Enumerator over the characters of the String and yields each character and its index to the map block.
str[0...idx] - In the block we take the substring from index 0 up to idx (exclusive).
.concat(c.upcase, str[idx + 1..-1]) - Then we append the current character upcased and the remaining portion of the String (idx + 1 through the end).
The first two passes look like:
# idx = 0
# c = "h"
# str[0...idx].concat(c.upcase, str[idx + 1..-1])
"".concat("H", "ello")
# idx = 1
# c = "e"
# str[0...idx].concat(c.upcase, str[idx + 1..-1])
"h".concat("E", "llo")

Reverse words of a sentence

I want to reverse the order of the words in a sentence. For example, my string is as follows:
str = "I am a good boy"
I want the result "boy good a am I". I can do this with built-in Ruby methods like:
str.split(" ").reverse.join(" ") #=> "boy good a am I"
Is there any way to do this without using Ruby's built-in methods?

Sure, there is a way. In fact, it's my favorite way. You said you don't want to use Ruby's built-ins. Well, we won't. What about native extensions? I know people love them.
First, create a reverse/reverse.c file. I took most of the source from an existing C implementation.
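#include "ruby/ruby.h"

/* Reverse the bytes from begin to end (inclusive) in place.
   Defined before reverseWords so the call there has a declaration in scope. */
static void reverse(char *begin, char *end)
{
    char temp;
    while (begin < end)
    {
        temp = *begin;
        *begin++ = *end;
        *end-- = temp;
    }
}

/* Reverse every word in place, then reverse the whole string,
   which leaves the words in reverse order. Works byte by byte,
   so it is only safe for single-byte (e.g. ASCII) text. */
static void reverseWords(char *s)
{
    char *word_begin = NULL;
    char *temp = s;
    while (*temp)
    {
        if ((word_begin == NULL) && (*temp != ' '))
        {
            word_begin = temp;
        }
        if (word_begin && ((*(temp + 1) == ' ') || (*(temp + 1) == '\0')))
        {
            reverse(word_begin, temp);
            word_begin = NULL;
        }
        temp++;
    }
    if (temp > s) /* skip the final pass for an empty string */
    {
        reverse(s, temp - 1);
    }
}

static VALUE reverse_words(VALUE str)
{
    /* Raise on frozen strings and unshare the buffer before writing into it. */
    rb_str_modify(str);
    reverseWords(RSTRING_PTR(str));
    return str;
}

void Init_reverse_words(void)
{
    VALUE string = rb_const_get(rb_cObject, rb_intern("String"));
    rb_define_method(string, "reverse_words!", reverse_words, 0);
}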
Second, create a reverse/extconf.rb file:
require 'mkmf'
create_makefile('reverse_words')
Third, in a terminal, cd into the reverse folder and run:
$ ruby extconf.rb
$ make && make install
Finally, test it at irb.
irb(main):001:0> require 'reverse_words'
=> true
irb(main):002:0> "foo bar baz".reverse_words!
=> "baz bar foo"
That's how to reverse the word order without using built-ins.
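The algorithm in the C extension above (reverse each word in place, then reverse the whole buffer) can also be written in plain Ruby on an array of characters. This sketch still calls built-in methods such as String#chars and Array#join, so it illustrates the algorithm rather than meeting the "no built-ins" constraint; reverse_range! and reverse_words are hypothetical names.

```ruby
# Swap elements between indices lo and hi (inclusive) in place.
def reverse_range!(chars, lo, hi)
  while lo < hi
    chars[lo], chars[hi] = chars[hi], chars[lo]
    lo += 1
    hi -= 1
  end
end

# Two-pass word reversal, mirroring reverseWords in the C code above.
def reverse_words(str)
  chars = str.chars
  word_begin = nil
  chars.each_index do |i|
    word_begin = i if word_begin.nil? && chars[i] != ' '
    if word_begin && (i + 1 == chars.size || chars[i + 1] == ' ')
      reverse_range!(chars, word_begin, i)   # pass 1: reverse this word
      word_begin = nil
    end
  end
  reverse_range!(chars, 0, chars.size - 1)   # pass 2: reverse everything
  chars.join
end

reverse_words("I am a good boy")  # => "boy good a am I"
```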

You can reverse an Array by copying its elements into a new Array, starting from the last one:
arr, new_arr = ["I", "am", "a", "good", "boy"], []
for i in 0...arr.length do
new_arr << arr[arr.length - 1 - i]
end
new_arr
# => ["boy", "good", "a", "am", "I"]

Try this:
def reverse(string)
reverse = ""
index = 0
while index < string.length
reverse = string[index] + reverse
index += 1
end
return reverse
end
reverse_sent = reverse("I am a good boy")
reverse_sent.split.map { |word| reverse(word) }.join(" ") #=> "boy good a am I"

You can't.
In Ruby, you'll always end up either calling other methods or reimplementing ones that already exist (as the other answers do).
In C you might manage it with a dynamically sized array, but in Ruby you must use built-in methods.

Is there any way to do this without using Ruby built in methods?
No.
Ruby is an object-oriented language. In OO, you do things by calling methods. There is no other way to perform any action.
Okay, technically there are some corners where Ruby is not object-oriented: if, &&, ||, "and" and "or" have no corresponding methods. So that's all you can use.

Difference between puts a and puts "#{a}"

I thought that doing puts "#{a}" would produce the same output as puts a, but found this not to be the case. Consider:
irb(main):001:0> a = [1,2]
=> [1, 2]
irb(main):002:0> puts a
1
2
=> nil
irb(main):003:0> puts "#{a}"
12
=> nil
irb(main):004:0>
In the above example it doesn't matter much, but it may matter when I want to print multiple variables on one line, such as (pseudocode):
puts "There are #{a.size} items in the whitelist: #{a}"
Why is the output different here? Do they actually do different things, or have different semantics?

That's because "#{a}" calls the #to_s method on the expression.
So:
puts a # equivalent to, well, puts a
puts "#{a}" # equivalent to the next line
puts a.to_s
Update:
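To elaborate, puts eventually calls #to_s too, but it adds logic in front of the actual output, including special handling for arrays: an array argument is flattened and each element is printed on its own line. Array#to_s does not use the same algorithm. In Ruby 1.8 it simply concatenated the elements, which is why "#{a}" printed 12; since Ruby 1.9 it is an alias for Array#inspect, so the same line prints [1, 2]. Here is exactly what puts does (from Ruby's io.c):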
VALUE
rb_io_puts(int argc, VALUE *argv, VALUE out)
{
int i;
VALUE line;
/* if no argument given, print newline. */
if (argc == 0) {
rb_io_write(out, rb_default_rs);
return Qnil;
}
for (i=0; i<argc; i++) {
if (TYPE(argv[i]) == T_STRING) {
line = argv[i];
goto string;
}
line = rb_check_array_type(argv[i]);
if (!NIL_P(line)) {
rb_exec_recursive(io_puts_ary, line, out);
continue;
}
line = rb_obj_as_string(argv[i]);
string:
rb_io_write(out, line);
if (RSTRING_LEN(line) == 0 ||
!str_end_with_asciichar(line, '\n')) {
rb_io_write(out, rb_default_rs);
}
}
return Qnil;
}
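One way to compare the two forms, and to handle the one-line case from the question, is to capture what puts writes. This sketch sends the output to a StringIO (whose puts follows the same rules as IO#puts) and uses Array#join to format the array inline; captured_puts is a hypothetical helper, and the interpolated result assumes Ruby 1.9 or later.

```ruby
require 'stringio'

a = [1, 2]

# Hypothetical helper: returns what puts would have printed.
def captured_puts(*args)
  io = StringIO.new
  io.puts(*args)
  io.string
end

captured_puts(a)        # => "1\n2\n"   (array flattened, one element per line)
captured_puts("#{a}")   # => "[1, 2]\n" (interpolation calls a.to_s)
captured_puts("There are #{a.size} items in the whitelist: #{a.join(', ')}")
# => "There are 2 items in the whitelist: 1, 2\n"
```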
