How do I find any space before "." - ruby

I have names "example .png" and "example 2.png". I am trying to convert any space to "_" and any space before "." should be removed.
So far I am doing it like this:
file.gsub(" .",".").gsub(" ", "_").gsub(".tif", "")

Use an rstripped File.basename(filename,File.extname(filename)) and replace spaces with underscores inside it then add an extname:
File.basename(filename,File.extname(filename)).rstrip.gsub(" ", "_") + File.extname(filename)
See the Ruby demo
Details:
File.basename(filename,File.extname(filename)) - get file name without extension
.rstrip - remove whitespace before the extension
.gsub(" ", "_") - replace spaces with underscores (use the /\s+/ regex instead to replace each run of any whitespace)
File.extname(filename) - a file extension.
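As a minimal sketch, the steps above can be wrapped in a small helper. The method name normalize_filename is hypothetical and not part of the original answer. Note that File.basename also drops any leading directory, so this assumes a bare file name.

```ruby
# Hypothetical helper wrapping the basename/extname approach described above.
def normalize_filename(filename)
  ext  = File.extname(filename)          # e.g. ".png"
  base = File.basename(filename, ext)    # name without the extension
  # Drop whitespace before the extension, then turn remaining spaces into "_".
  base.rstrip.gsub(" ", "_") + ext
end

puts normalize_filename("example .png")
puts normalize_filename("example 2.png")
```

With the two names from the question, this should print example.png and example_2.png.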
If you prefer a regex way:
s = 'some example 2 .png'
puts s.gsub(/\s+(\.[^.]+\z)|\s/) {
  Regexp.last_match(1) ?
    Regexp.last_match(1) :
    "_"
}
(can be shortened to s.gsub(/\s+(\.[^.]+\z)|\s/) { $1 || "_" } (see Jordan's remark)).
See this Ruby demo.
Here, the pattern matches:
\s+(\.[^.]+\z) - 1 or more whitespaces (\s+) before the extension (\.[^.]+ - a dot followed with 1+ chars other than a dot before the end of string \z), while capturing the extension into Group 1
| - or
\s - any other whitespace symbol (add + after it if you need to replace whole whitespace chunks with underscores).
In the gsub block, a check is performed to test Group 1, and if it matched, only the extension is inserted into the result. Else, a whitespace is replaced with an underscore.
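The regex variant can be sketched as a reusable method in the same way. The name underscore_spaces is made up for illustration; the pattern is the one explained above, using the shortened block form.

```ruby
# Hypothetical helper around the single-regex approach.
def underscore_spaces(name)
  # Group 1 is set only when the whitespace sits right before the extension;
  # in that case keep just the extension, otherwise emit an underscore.
  name.gsub(/\s+(\.[^.]+\z)|\s/) { $1 || "_" }
end

puts underscore_spaces("some example 2 .png")
puts underscore_spaces("example .png")
```

Unlike the File.basename version, this keeps any directory part of the string and would also replace whitespace inside it.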

Related

Ignore empty captures when splitting string

I have a string:
"Ayy ***lol* m8\nlol"
I would like to not include the empty capture and produce:
["Ayy ", "**", "*", "lol", "*", " m8", "\n", "lol"]
I am splitting the string by this regex:
/(?x)(\*\*|\*|\n|[.])/
This produces:
["Ayy ", "**", "", "*", "lol", "*", " m8", "\n", "lol"]
Here is a simplified version of your regex, chained with a method to remove empty strings -- which is inevitably necessary here when using String#split, since there is an 'empty result' in the middle of '***':
string = "Ayy ***lol* m8\nlol"
string.split(/(\*{1,2}|\n|\.)/).reject(&:empty?)
#=> ["Ayy ", "**", "*", "lol", "*", " m8", "\n", "lol"]
A few differences from your pattern:
I have removed the (?x); this served no purpose. Extended patterns are useful for ignoring spaces and comments within the regex - neither of which you are doing here.
\*\*|\* can be simplified to \*{1,2} (or \*\*? if you prefer).
[.] is technically fine, but \. is one character shorter and in my opinion shows clearer intent.
When splitting with a regex containing capturing groups, consecutive matches always produce empty array items.
Rather than switch to a matching approach, use
arr = arr.reject { |c| c.empty? }
Or any other method, see How do I remove blank elements from an array?
Otherwise, you will have to match the substrings with a regex that matches the delimiters first and then any text that does not start a delimiter (that is, you will need to build a tempered greedy token):
arr = s.scan(/(?x)\*{2}|[*\n.]|(?:(?!\*{2})[^*\n.])+/)
See the regex demo.
Here,
(?x) - a freespacing/comment modifier
\*{2} - ** substring
| - or
[*\n.] - a char that is either *, newline LF or a .
| - or
(?:(?!\*{2})[^*\n.])+ - 1 or more (+) chars that are not *, LF or . ([^*\n.]) that do not start a ** substring.
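For comparison, here is a short sketch that runs both approaches on the sample string and checks that they agree. The variable names are illustrative, and the (?x) prefix is left out of the scan pattern because the pattern contains no whitespace or comments to ignore.

```ruby
s = "Ayy ***lol* m8\nlol"

# Split on the delimiters (kept via the capture group), then drop empty items.
by_split = s.split(/(\*{1,2}|\n|\.)/).reject(&:empty?)

# Match delimiters first, then runs of text that contain no delimiter.
by_scan = s.scan(/\*{2}|[*\n.]|(?:(?!\*{2})[^*\n.])+/)

p by_split
p by_scan
p by_split == by_scan
```

Both arrays should equal the desired result from the question, so the choice is mostly one of readability.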
r = /
[ ]+ # match one or more spaces
| # or
(\*) # match one asterisk in capture group 1
[ ]* # match zero or more spaces
(?!\*) # not to be followed by an asterisk (negative lookahead)
| # or
(\n) # match "\n" in capture group 2
/x # free-spacing regex definition mode
str = "Ayy ***lol* m8\nlol"
str.split r
#=> ["Ayy", "**", "*", "lol", "*", "m8", "\n", "lol"]

Regex: match something except within arbitrary delimiters

My string:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
Using Ruby's regex flavor, I would like to do something like:
a.gsub /regex_pattern/, '_'
And obtain:
"Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>"
This should do it:
result = subject.gsub(/\s+(?![^<>]*>)/, '_')
This regex assumes there's nothing tricky like escaped angle brackets. Also be aware that \s matches newlines, TABs and other whitespace characters as well as spaces. That's probably what you want, but you have the option of matching only spaces:
/ +(?![^<>]*>)/
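A quick sketch of both patterns applied to the string from the question. The tabbed sample string is a hypothetical input added to show how \s and a literal space differ.

```ruby
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

any_whitespace = /\s+(?![^<>]*>)/   # whitespace not followed by a closing ">"
spaces_only    = / +(?![^<>]*>)/    # same idea, literal spaces only

result = a.gsub(any_whitespace, "_")
puts result

tabbed = "a\tb <c\td>"              # hypothetical input containing tabs
p tabbed.gsub(any_whitespace, "_")  # the tab outside the brackets is replaced
p tabbed.gsub(spaces_only, "_")     # the tab outside the brackets is kept
```

The lookahead only checks for an unpaired ">" ahead, so it relies on the brackets being balanced and not nested.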
I think, it works:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
pattern = /<(?:(?!<).)*>/
a.gsub(pattern, '')
# => "Please match spaces here . Again match here "

How to strip out \r\n in between a quoted string in between tabs when rows are also delimited by \r\n?

In Ruby 2.1.3, I have a string representing a title such as in a tab delimited csv file format:
string = "helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL \r\nRisk Effectiveness \r\nand Device Effectiveness In \r\nEbola Candidates \"\tData Collection only\t\t20\t"
I want to strip out the "\r\n" only in the tab delimited portion that starts with PROTOCOL so I can read a complete title as "PROTOCOL Risk Effectiveness and Device Effectiveness In Ebola Candidates". I want the end result to be:
"helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL Risk Effectiveness and Device Effectiveness In Ebola Candidates \"\tData Collection only\t\t20\t"
If I don't do this, trying to read it in via CSV truncates the title so I only end up reading "PROTOCOL" and not the rest of the title.
Keep in mind there may be an indeterminate number of \r\n characters I want to remove within a title (I'll be parsing through different titles). How do I accomplish this? I was thinking a regular expression might be the way...
Since a newline (outside of quotes) is treated as a delimiter,
you could use this regex to isolate quoted fields then replace any \r?\n just
within that field.
You would then pass the string into the CSV module.
There are 3 groups that together constitute the entire match.
1. Delimiter
2. Double quoted field
3. Non-quoted field
This needs a replace-with-callback implementation.
Within the callback, if group 2 is not empty, do a separate replace of all CRLFs in it.
Concatenate group 1 + replaced(group 2) + group 3, then return the result.
# ((?:^|\t|\r?\n)[^\S\r\n]*)(?:("[^"\\]*(?:\\[\S\s][^"\\]*)*"(?:[^\S\r\n]*(?=$|\t|\r?\n)))|([^\t\r\n]*(?:[^\S\r\n]*(?=$|\t|\r?\n))))
( # (1 start), Delimiter tab or newline
(?: ^ | \t | \r? \n )
[^\S\r\n]* # leading optional whitespaces
) # (1 end)
(?:
( # (2 start), Quoted string field
"
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
"
(?:
[^\S\r\n]* # trailing optional whitespaces
(?= $ | \t | \r? \n ) # Delimiter ahead, tab or newline
)
) # (2 end)
| # OR
( # (3 start), Non quoted field
[^\t\r\n]*
(?:
[^\S\r\n]* # trailing optional whitespaces
(?= $ | \t | \r? \n ) # Delimiter ahead, tab or newline
)
) # (3 end)
)
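In Ruby, the replace-with-callback step could be sketched with gsub and a block. This is only an illustration under stated assumptions: the constant FIELD, the method join_quoted_lines, and the shortened sample string are hypothetical names and data, and the pattern is the one above written in free-spacing mode.

```ruby
# Groups: 1 = delimiter plus leading blanks, 2 = double-quoted field,
# 3 = unquoted field.
FIELD = /
  ( (?: ^ | \t | \r?\n ) [^\S\r\n]* )
  (?:
    ( " [^"\\]* (?: \\ [\S\s] [^"\\]* )* " [^\S\r\n]* (?= $ | \t | \r?\n ) )
    |
    ( [^\t\r\n]* [^\S\r\n]* (?= $ | \t | \r?\n ) )
  )
/x

# Hypothetical helper: removes line breaks inside quoted fields only.
def join_quoted_lines(text)
  text.gsub(FIELD) do
    delim, quoted, plain = $1, $2, $3   # read groups before the inner gsub
    quoted = quoted.gsub(/\r?\n/, "") if quoted
    "#{delim}#{quoted}#{plain}"
  end
end

# Shortened, made-up sample in the shape of the question's data.
sample = "helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL \r\nRisk Effectiveness \r\nand Device Effectiveness \"\tData Collection only\t\t20\t"
fixed = join_quoted_lines(sample)
p fixed
```

The row-delimiting \r\n after helloworld is left alone because it is matched as a delimiter (group 1), not as part of a quoted field. The cleaned string can then be handed to a CSV parser configured for tab separators.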
Unfortunately I don't know ruby, and the solution I'm going to offer is not very nice, but here goes:
Since ruby's implementation of regex doesn't support dynamic width lookbehinds, I couldn't come up with a pattern that matches only the \r\n you want to remove. But you can replace all matches of this regex pattern
(\t"?PROTOCOL[^\t]*)[\r\n]+
with \1 (the text that has been matched by group 1), until the pattern no longer matches. A single substitution won't remove all occurrences of \r\n. See demo.
I hope you'll find a nicer solution.
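A rough Ruby sketch of the repeat-until-no-match idea follows. The constant and method names and the shortened sample string are hypothetical. It assumes the title field is followed by a tab on the same row; otherwise [^\t]* would run into the next row and remove its line break too.

```ruby
PROTOCOL_BREAK = /(\t"?PROTOCOL[^\t]*)[\r\n]+/

# Hypothetical helper: strips line breaks inside the PROTOCOL field.
def collapse_protocol_title(text)
  result = text.dup
  loop do
    # sub! returns nil once no line break is left inside the field.
    break unless result.sub!(PROTOCOL_BREAK, '\1')
  end
  result
end

# Shortened, made-up sample in the shape of the question's data.
sample = "helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL \r\nRisk Effectiveness \r\nand Device Effectiveness \"\tData Collection only\t\t20\t"
collapsed = collapse_protocol_title(sample)
p collapsed
```

Each pass removes only the last run of \r or \n characters in the field, which is why the substitution has to be repeated.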

Ruby: Regular expression to match which strings made up either space , tab or new line and nothing else

I am trying to formulate a regular expression which will match only those strings which are made up of only 3 types of characters: tab, space and new line. For example:
String1 = " \t "
String2 = "\n\n"
String3 = " \t \n \n \n "
All above strings should match the regular expression.
I tried this : %r/[ \n]+/
But this also matches strings that contain spaces and new lines alongside many other characters, such as
string4 = " I am a boy \n"
My expression also matches string4, which it should not match.
I am not able to fix it. It will be great if someone could come up with a solution to fix this.
You need to tell the regex that the WHOLE string must fit, rather than part of it. Do this with anchors. In Ruby, ^ and $ match at the start and end of each line, so use \A and \z, which match only at the start and end of the whole string:
/\A[\t\n ]+\z/
This site, and sites like it, can be useful:
http://regex101.com/
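To illustrate whole-string anchoring in Ruby, here is a small sketch using the strings from the question. It assumes Ruby 2.4 or later for String#match?; the constant name WHITESPACE_ONLY and the multi_line sample are made up. In Ruby, ^ and $ anchor to lines, so the sketch also shows a line-anchored pattern accepting a string that merely contains a blank line.

```ruby
WHITESPACE_ONLY = /\A[ \t\n]+\z/   # \A and \z anchor to the whole string

samples = [" \t ", "\n\n", " \t \n \n \n ", " I am a boy \n"]
results = samples.map { |s| s.match?(WHITESPACE_ONLY) }
p results

# Hypothetical multi-line input: only its middle line is blank.
multi_line = "abc\n \nxyz"
p multi_line.match?(/^[ \t\n]+$/)     # line anchors: matches the blank line
p multi_line.match?(WHITESPACE_ONLY)  # string anchors: no match
```

The first three strings should match and the fourth should not.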

Ruby - how to remove some chars from string?

I have following strings:
" asfagds gfdhd"sss dg "
"sdg "dsg "
desired output:
asfagds gfdhd"sss dg
sdg "dsg
(Empty spaces removed from the front and end of the strings, as well as leading and trailing double quotes.)
I have a big file with these lines and I need to format them to our needs. How can I remove the " from the start and end of each line, along with the white space at the start and end?
Use string.strip or string.strip!.
" asfagds gfdhd\"sss dg ".strip
#=> "asfagds gfdhd\"sss dg"
Be aware that strip removes all whitespace (e.g. tabs and newlines), not just spaces.
If you want to remove just spaces use:
string.gsub /^ *| *$/, ''
If you want to remove " as well:
string.gsub /^" *| *"$/, ''
If the data in the file is clean and uniform, then this should do
'" asfagds gfdhd"sss dg "'[1..-2].strip
If the data is not clean, you may need to strip before slicing too (i.e. if there are trailing spaces after the closing quotation mark).
'" asfagds gfdhd"sss dg "'.strip[1..-2].strip
Really depends on how clean the data in the file is.
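If it is unclear whether every line actually starts and ends with a quote, a slightly more defensive sketch removes the quote only when it is present instead of slicing by position. The method name unquote is hypothetical.

```ruby
# Hypothetical helper: trims whitespace, then one leading and one trailing
# double quote if present, then whitespace again.
def unquote(line)
  line.strip.sub(/\A"/, "").sub(/"\z/, "").strip
end

puts unquote('" asfagds gfdhd"sss dg "')
puts unquote('"sdg "dsg "')
puts unquote('  no quotes here  ')
```

Quotes in the middle of the line are left untouched because both patterns are anchored to the ends of the string.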
Use strip:
" Hello World ".strip #=> "Hello World"
Or to only strip from the left/right use lstrip and rstrip respectively.
One liner:
irb> '" asfagds gfdhd"sss dg "'[1..-2].strip
=> "asfagds gfdhd\"sss dg"
take the [1,n-1] substring, remove whitespace

Resources