What's the best way to truncate a string to the first n words?
n = 3
str = "your long long input string or whatever"
str.split[0...n].join(' ')
=> "your long long"
str.split[0...n] # note that there are three dots, which excludes n
=> ["your", "long", "long"]
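To make the two-dot versus three-dot distinction concrete, here is a minimal sketch. The sample words are made up for illustration and are not part of the original answer.

```ruby
# Hypothetical sample data, used only to show the two range forms.
words = %w[your long long input string]
n = 3

inclusive = words[0..n]   # two dots: index n is included, so n + 1 elements
exclusive = words[0...n]  # three dots: stops before index n, so n elements

puts inclusive.join(' ')  # your long long input
puts exclusive.join(' ')  # your long long
```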
You could do it like this:
s = "what's the best way to truncate a ruby string to the first n words?"
n = 6
trunc = s[/(\S+\s+){#{n}}/].strip
if you don't mind making a copy.
You could also apply Sawa's Improvement (wish I was still a mathematician, that would be a great name for a theorem) by adjusting the whitespace detection:
trunc = s[/(\s*\S+){#{n}}/]
If you have to deal with an n that is greater than the number of words in s then you could use this variant:
s[/(\S+(\s+)?){,#{n}}/].strip
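A small sketch of how the two regex variants above behave when n exceeds the word count. The helper names first_words_strict and first_words_lenient are hypothetical, chosen only for this example.

```ruby
# Hypothetical helper names; the regexes are the ones from the answer above.
def first_words_strict(s, n)
  match = s[/(\S+\s+){#{n}}/]     # nil unless n words are each followed by whitespace
  match && match.strip
end

def first_words_lenient(s, n)
  s[/(\S+(\s+)?){,#{n}}/].strip   # {,n} allows anywhere from zero to n repetitions
end

s = "only three words"
p first_words_strict(s, 2)   # "only three"
p first_words_strict(s, 5)   # nil
p first_words_lenient(s, 5)  # "only three words"
```

One caveat with the lenient form: because zero repetitions are allowed, a string that starts with whitespace matches the empty string at position 0, so it may be worth stripping the input first.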
You can use str.split.first(n).join(' ')
with n being any number.
Contiguous white spaces in the original string are replaced with a single white space in the returned string.
For example, try this in irb:
>> a='apple orange pear banana pineaple grapes'
=> "apple orange pear banana pineaple grapes"
>> b=a.split.first(2).join(' ')
=> "apple orange"
This syntax is very clear, since it uses neither a regular expression nor an array slice by index. If you program in Ruby, you know that clarity is an important stylistic choice.
A shorthand for join is *
So this syntax str.split.first(n) * ' ' is equivalent and shorter (more idiomatic, less clear for the uninitiated).
You can also use take instead of first
so the following would do the same thing
a.split.take(2) * ' '
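As a quick check that these spellings agree, here is a minimal sketch. The method name first_words and the sample text are assumptions for illustration.

```ruby
# first_words is a hypothetical wrapper around the split/first/join chain.
def first_words(str, n)
  str.split.first(n).join(' ')
end

text = "apple  orange\tpear banana"  # irregular whitespace on purpose

p first_words(text, 2)    # "apple orange"
p first_words(text, 10)   # "apple orange pear banana"

# The alternatives mentioned above give the same result.
with_take  = text.split.take(2) * ' '
with_slice = text.split[0...2].join(' ')
p with_take == with_slice # true
```

Note that first(n) and take(n) simply return the whole array when n is larger than the number of words, so no extra guard is needed.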
If you are on Rails 4.2 or later (which has truncate_words), this could be the following:
string_given.squish.truncate_words(number_given, omission: "")
Related
I'm using Ruby 2.2 and have a string that looks like this:
myvar = '{"myval1"=>"value1","mayval2"=>"value2"}'
How can I get this into a key-value pair and/or hash of some sort? When I do myvar['myval1'] I get back 'myval1', which isn't quite what I'm after. The answer's probably staring right at me but nothing's worked so far.
As I've seen time and time again - simply mentioning eval makes people instantly upset, even when it is a proper use case (which this is not).
So I'm going to go with another hate magnet - parsing nested structures with regexes.
Iteration (1) - a naive approach:
JSON.parse(myvar.gsub(/=>/, ':'))
Problem - will mess up your data if the string key/values contain =>.
Iteration (2) - an even number of "s remaining means you are not inside a string:
JSON.parse(myvar.gsub(/=>(?=(?:[^"]*"){2}*[^"]*$)/, ':'))
Problem - there might be a " inside a string that is escaped with a backslash.
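To see the difference between the first two iterations, here is a minimal sketch. The input string is hypothetical, and the lookahead is written with an explicit extra group, which I assume is equivalent to the {2}* form above.

```ruby
require 'json'

# Hypothetical input whose value itself contains "=>".
myvar = '{"arrow"=>"a=>b"}'

# Iteration (1): every "=>" is replaced, including the one inside the value.
naive = JSON.parse(myvar.gsub(/=>/, ':'))

# Iteration (2): replace "=>" only when an even number of double quotes follows.
outside_strings = /=>(?=(?:(?:[^"]*"){2})*[^"]*$)/
careful = JSON.parse(myvar.gsub(outside_strings, ':'))

p naive["arrow"]    # "a:b"
p careful["arrow"]  # "a=>b"
```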
Iteration (3) - like iteration (2), but count only the " that are not escaped. A " is escaped when it is preceded by an odd number of backslashes:
eq_gt_finder = /(?<non_quote>
(?:
[^"\\]|
\\{2}*\\.
)*
){0}
=>(?=
(?:
\g<non_quote>
"
\g<non_quote>
){2}*
$
)/x
JSON.parse(myvar.gsub(eq_gt_finder, ':'))
See it in action
Q: Are you an infallible divine creature that is absolutely certain this will work 100% of the time?
A: Nope.
Q: Isn't this slow and unreadable as shit?
A: Yep.
Q: Ok?
A: Yep.
You can change that string to valid JSON easily and use JSON.parse then:
require 'json'
myvar = '{"myval1"=>"value1","mayval2"=>"value2"}'
hash = JSON.parse(myvar.gsub(/=>/, ': '))
#=> { "myval1" => "value1", "mayval2" => "value2" }
hash['myval1']
#=> "value1"
I want to remove words from a string if they appear in some set. One way is to iterate over this set and remove each word using str.gsub("subString", ""). Does a function like this already exist?
Example string :
"Hotel Silver Stone Resorts"
Strings in set:
["Hotel" , "Resorts"]
Output should be:
" Silver Stone "
You can build a union of several patterns with Regexp::union:
words = ["Hotel" , "Resorts"]
re = Regexp.union(words)
#=> /Hotel|Resorts/
"Hotel Silver Stone Resorts".gsub(re, "")
#=> " Silver Stone "
Note that Regexp.union escapes string arguments for you; only patterns you pass in as Regexp objects are used as-is.
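A short sketch of how Regexp.union treats metacharacters in strings, plus one way to restrict the match to whole words. The extra word "C++" and the sample sentence are made up for illustration.

```ruby
words = ["Hotel", "Resorts", "C++"]  # "C++" is a hypothetical extra entry
re = Regexp.union(words)
puts re.source  # Hotel|Resorts|C\+\+

sentence = "Hotels near Hotel Silver"

# The plain union also matches inside longer words.
p sentence.gsub(Regexp.union(%w[Hotel Resorts]), "")  # "s near  Silver"

# Wrapping the union in \b...\b only removes whole words.
whole_words = /\b(?:#{Regexp.union(%w[Hotel Resorts]).source})\b/
p sentence.gsub(whole_words, "")                       # "Hotels near  Silver"
```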
You can subtract one array from another in Ruby. The result is the first array with every element that also appears in the second array removed.
Split the string on whitespace, remove all extra words in one swift move, rejoin the sentence.
s = "Hotel Silver Stone Resorts"
junk_words = ['Hotel', 'Resorts']
def strip_junk(original, junk)
  (original.split - junk).join(' ')
end
strip_junk(s, junk_words) # => "Silver Stone"
It certainly looks better (to my eye). Not sure about performance characteristics (too lazy to benchmark it)
I am not sure what you wanted, but as I understood it:
sentence = 'Hotel Silver Stone Resorts'
remove_words = ["Hotel" , "Resorts"] # you can add words to this array which you wanted to remove
sentence.split.delete_if{|x| remove_words.include?(x)}.join(' ')
=> "Silver Stone"
OR
if you have an array of strings, it's easier:
sentence = 'Hotel Silver Stone Resorts'
remove_words = ["Hotel" , "Resorts"]
(sentence.split - remove_words).join(' ')
=> "Silver Stone"
You could try something different, but I don't know whether it will be faster (that depends on the length of your strings and the size of your set)
require 'set'
str = "Hotel Silver Stone Resorts"
setStr = Set.new(str.split)
setToRemove = Set.new( ["Hotel", "Resorts"])
modifiedStr = (setStr.subtract setToRemove).to_a.join " "
Output
"Silver Stone"
It uses the Set class, which is faster for looking up a single element (it is built on Hash).
But again, the underlying conversion with to_a may cancel out the speed gain if your strings / sets are very big.
It also implicitly removes duplicates from your string and your set (when you create the sets).
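The duplicate-removal side effect is easy to miss, so here is a minimal sketch comparing the Set approach with plain array subtraction. The input with a repeated word is hypothetical.

```ruby
require 'set'

str  = "Hotel Silver Silver Stone Resorts"  # "Silver" repeated on purpose
junk = ["Hotel", "Resorts"]

with_array = (str.split - junk).join(" ")
with_set   = Set.new(str.split).subtract(Set.new(junk)).to_a.join(" ")

puts with_array  # Silver Silver Stone
puts with_set    # Silver Stone
```

If repeated words in the sentence must survive, array subtraction is the safer of the two.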
I have the following function, which accepts text and a word count; if the number of words in the text exceeds the word count, the text gets truncated with an ellipsis.
#Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
  thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two step process:
Truncate the string to a max length (no need for regex for this)
Using regex, find a max words quantity from the truncated string.
Edit:
Another approach is to split the string into words, loop through the array adding up
the lengths. When you find the overrun, join 0 .. index just before the overrun.
Hint: regex ^(\s*.+?\b){5} will match first 5 "words"
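The loop described in the edit above (add up word lengths, stop just before the overrun) could be sketched like this. The method name fit_words and its parameters are assumptions, not code from the answer.

```ruby
# fit_words is a hypothetical name; it keeps whole words while they fit in max_chars.
def fit_words(text, max_chars)
  kept = []
  length = 0
  text.split.each do |word|
    # Every word after the first also costs one joining space.
    extra = kept.empty? ? word.length : word.length + 1
    break if length + extra > max_chars
    kept << word
    length += extra
  end
  kept.join(' ')
end

p fit_words("Once upon a time in a world far far away", 17)  # "Once upon a time"
p fit_words("Helloooooooooooooooooooo world!", 10)            # ""
```

As the second call shows, a first word longer than the limit leaves nothing at all, so a real version would probably fall back to cutting that word by characters.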
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet str, max_words, max_chars, omission='...'
  max_chars = 1 + omission.size if max_chars <= omission.size # need at least one char plus the omission
  words = str.split
  omit = words.size > max_words || str.length > max_chars ? omission : ''
  snip = words[0...max_words].join ' '
  # leave room for the omission, whatever its length
  snip = snip[0...(max_chars - omission.size)] if snip.length > max_chars
  snip + omit
end
As others have pointed out, Rails' String#truncate offers almost the functionality you want (truncate to fit in a length at a natural boundary), but it doesn't let you independently state the max char length and word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0...20] + (m.size > 20 ? '...' : '') }
=> "hello world this is ..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0...5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the..."
I want my string in groups of 5 characters, e.g.
thisisastring => ["thisi", "sastr", "ing"]
but I also want the last group to be padded with _'s, e.g.
thisisastring => ["thisi", "sastr", "ing__"]
I have got as far as the string splitting:
"thisisastring".scan(/.{1,5}/)
["thisi", "sastr", "ing"]
but I'm not too sure how to do the padding for that last group to make it "ing__"
although starting to think that combinations of dividend (div()), modulus (%) and .ljust might do it.
Maybe number of padding characters would be: (length % 5) * "_" (if you can multiply that) ?
Perhaps something that uses:
ruby-1.9.2-p290 :023 > (len % 5).to_i.times { print '_' }
___ => 3
Not even close to efficient, but if you wanted to one-line it, something like this should work:
"thisisastring".scan(/.{1,5}/).collect {|x| x.ljust(5,"_")}
Since the adjustment is only required on the last element, it is more efficient to pad the string before splitting than to iterate over the elements and adjust each one.
s = "thisisastring"
(s + "_" * ((5 - s.length % 5) % 5)).scan(/.{5}/)
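Here is a sketch that generalizes the pad-before-splitting idea and checks it against the per-group ljust approach. The method names and the size and pad parameters are assumptions for illustration.

```ruby
# Hypothetical helpers: both return groups of `size` characters, padding the last one.
def padded_groups(str, size = 5, pad = "_")
  missing = (size - str.length % size) % size  # 0 when the length is already a multiple
  (str + pad * missing).scan(/.{#{size}}/m)    # /m lets . match newlines too
end

def padded_groups_by_slice(str, size = 5, pad = "_")
  str.each_char.each_slice(size).map { |chars| chars.join.ljust(size, pad) }
end

p padded_groups("thisisastring")           # ["thisi", "sastr", "ing__"]
p padded_groups("abcdefghijk")             # ["abcde", "fghij", "k____"]
p padded_groups_by_slice("thisisastring")  # ["thisi", "sastr", "ing__"]
```

The number of padding characters is the distance to the next multiple of the group size, not the remainder itself; the two only coincide for some lengths.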
I'm outputting a set of numbered files from a Ruby script. The numbers come from incrementing a counter, but to make them sort nicely in the directory, I'd like to use leading zeros in the filenames. In other words
file_001...
instead of
file_1
Is there a simple way to add leading zeros when converting a number to a string? (I know I can do "if less than 10.... if less than 100").
Use the % operator with a string:
irb(main):001:0> "%03d" % 5
=> "005"
The left-hand-side is a printf format string, and the right-hand side can be a list of values, so you could do something like:
irb(main):002:0> filename = "%s/%s.%04d.txt" % ["dirname", "filename", 23]
=> "dirname/filename.0023.txt"
Here's a printf format cheat sheet you might find useful in forming your format string. The printf format is originally from the C function printf, but similar formatting functions are available in Perl, Ruby, Python, Java, PHP, etc.
If the maximum number of digits in the counter is known (e.g., n = 3 for counters 1..876), you can do
str = "file_" + i.to_s.rjust(n, "0")
Can't you just use string formatting of the value before you concat the filename?
"%03d" % number
Use String#next as the counter.
>> n = "000"
>> 3.times { puts "file_#{n.next!}" }
file_001
file_002
file_003
next is relatively 'clever', meaning you can even go for
>> n = "file_000"
>> 3.times { puts n.next! }
file_001
file_002
file_003
As stated by the other answers, "%03d" % number works pretty well, but it goes against the rubocop ruby style guide:
Favor the use of sprintf and its alias format over the fairly
cryptic String#% method
We can obtain the same result in a more readable way using the following:
format('%03d', number)
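For comparison, a minimal sketch putting the zero-padding spellings from this thread side by side, with two edge cases where they differ. The variable names are made up for the example.

```ruby
number = 5

with_percent = "%03d" % number
with_format  = format('%03d', number)
with_rjust   = number.to_s.rjust(3, "0")
puts with_percent, with_format, with_rjust  # each line prints 005

# The width is a minimum: longer numbers are not truncated.
puts format('%03d', 1234)           # 1234

# Negative numbers: format keeps the sign in front, rjust pads blindly.
negative = -5
puts format('%03d', negative)       # -05
puts negative.to_s.rjust(3, "0")    # 0-5
```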
filenames = '000'.upto('100').map { |index| "file_#{index}" }
Outputs
["file_000", "file_001", "file_002", "file_003", ..., "file_098", "file_099", "file_100"]
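A shortened sketch of the same idea with a smaller range, next to an integer-based equivalent. The variable names are assumptions for illustration.

```ruby
# String#upto keeps the leading zeros when both endpoints are digit strings.
from_strings = '000'.upto('003').map { |index| "file_#{index}" }

# The same names built from integers and a format string.
from_integers = (0..3).map { |i| format('file_%03d', i) }

puts from_strings
puts from_strings == from_integers  # true
```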