Is Regex faster than array comparison in this case? - ruby

Say I have an incoming string that I want to scan to see if it contains any of the words I have chosen to be "bad." :)
Is it faster to split the string into an array, keep the bad words in an array as well, and then iterate through each bad word and each incoming word to see if there's a match, kind of like:
trigger = false
badwords.each do |badword|
  incoming.each do |word|
    trigger = true if badword == word
  end
end
OR is it faster to do this:
incoming.each do |word|
  trigger = true if badwords.include? word
end
OR is it faster to leave the string as it is and run a .match() with a regex that looks something like:
/\bbadword1\b|\bbadword2\b|\bbadword3\b/
Or is the performance difference almost completely negligible? Been wondering this for a while.

You're giving the regex an advantage by not stopping your loop when it finds a match. Try:
incoming.find { |word| badwords.include?(word) }
My money is still on the regex, though, which should be simplified to:
/\b(badword1|badword2|badword3)\b/
or, to make it a fair fight when testing one already-split word at a time:
/\A(badword1|badword2|badword3)\z/

Once it is compiled, the regex is the fastest in real life (i.e. a really long incoming string, many similar bad words, etc.), since it can run on incoming in situ and will handle overlapping parts of your "bad words" really well.

The answer probably depends on the number of bad words to check: if there is only one bad word it probably doesn't make a huge difference, but if there are 50, checking an array would probably get slow. On the other hand, with tens or hundreds of thousands of words the regexp probably won't be too fast either.
If you need to handle large numbers of bad words, you might want to consider splitting the string into individual words and then using a Bloom filter to test whether each word is likely to be bad.
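A minimal sketch of that idea, for illustration only: TinyBloomFilter is a hypothetical class, and deriving several bit positions from one SHA-256 digest is a simplification, not a tuned design. A Bloom filter never gives a false negative but can give false positives, so words that pass it are confirmed against an exact Set. For a list that fits comfortably in memory, the Set alone may already be enough.

```ruby
require 'digest'
require 'set'

# Hypothetical minimal Bloom filter: each word sets `hashes` bits in a fixed-size
# bit array. Positions come from slices of one SHA-256 digest (a simplification).
class TinyBloomFilter
  def initialize(size: 8192, hashes: 4)
    raise ArgumentError, 'at most 8 hashes per 32-byte digest' if hashes > 8
    @size = size
    @hashes = hashes
    @bits = Array.new(size, false)
  end

  def add(word)
    positions(word).each { |i| @bits[i] = true }
    self
  end

  # false => definitely never added; true => probably added (false positives possible)
  def maybe_include?(word)
    positions(word).all? { |i| @bits[i] }
  end

  private

  # Read k 32-bit big-endian integers from the digest and map them into the array.
  def positions(word)
    digest = Digest::SHA256.digest(word)
    Array.new(@hashes) { |k| digest.byteslice(k * 4, 4).unpack1('N') % @size }
  end
end

bad_words = %w[badword1 badword2 badword3]   # made-up list
filter = TinyBloomFilter.new
bad_words.each { |w| filter.add(w) }
exact = bad_words.to_set

incoming = 'this line mentions badword2 once'
# Cheap probabilistic check first; the exact Set check removes false positives.
hits = incoming.split.select { |w| filter.maybe_include?(w) && exact.include?(w) }
puts hits.inspect
```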

This does not exactly answer your question, but it will definitely help you solve it.
Take some examples of what you are trying to achieve and benchmark them.
You can find how to do benchmarking in Ruby here:
http://ruby.about.com/od/tasks/f/benchmark.htm
http://ruby-doc.org/stdlib-1.9.3/libdoc/benchmark/rdoc/Benchmark.html
Just put the various forms inside report blocks, run the benchmarks, and decide for yourself what suits you best.
For better results, test with real data.
Benchmarks are always better than discussions :)
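Here is a sketch of such a benchmark. It assumes made-up data, so replace BAD_WORDS and text with your real list and input. It compares the question's nested loops, Array#include?, and a single regex over the unsplit string. As an extra option not raised in the thread, it also tries a hash-based Set lookup. Every variant stops at the first hit, following the earlier answer's advice. No timings are claimed here, because they depend on your data and Ruby version.

```ruby
require 'benchmark'
require 'set'

# Made-up data; substitute your real word list and input.
BAD_WORDS = %w[badword1 badword2 badword3]
BAD_SET   = BAD_WORDS.to_set
BAD_RE    = /\b(?:#{BAD_WORDS.map { |w| Regexp.escape(w) }.join('|')})\b/

# 1. The question's nested loops, but stopping at the first hit.
def nested_loops?(words)
  BAD_WORDS.any? { |bad| words.any? { |w| w == bad } }
end

# 2. Array#include? for each incoming word (linear scan of BAD_WORDS).
def array_include?(words)
  words.any? { |w| BAD_WORDS.include?(w) }
end

# 3. Set#include? for each incoming word (hash lookup).
def set_include?(words)
  words.any? { |w| BAD_SET.include?(w) }
end

# 4. One regex over the unsplit string.
def regex_match?(text)
  BAD_RE.match?(text)
end

text  = (%w[lorem ipsum dolor sit amet] * 2_000).join(' ') + ' badword3'
words = text.split

Benchmark.bm(15) do |x|
  x.report('nested loops')   { 100.times { nested_loops?(words) } }
  x.report('Array#include?') { 100.times { array_include?(words) } }
  x.report('Set#include?')   { 100.times { set_include?(words) } }
  x.report('Regexp#match?')  { 100.times { regex_match?(text) } }
end
```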

If you want to scan a string for occurrences of words, use scan to find them.
Use Regexp.union to build a pattern that will find the strings in your black-list. You will want to wrap the result with \b to force matching word-boundaries, and use a case-insensitive search.
To give you an idea of how Regexp.union can help:
words = %w[foo bar]
Regexp.union(words)
=> /foo|bar/
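'Daniel Foo killed him a bar'.scan(/\b(?:#{Regexp.union(words).source})\b/i)
=> ["Foo", "bar"]
Interpolate Regexp.union(words).source rather than the Regexp itself: a Regexp interpolates as (?-mix:foo|bar), which switches the /i flag back off inside that group, so "Foo" would be missed.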
You could also build the pattern using Regexp.new or /.../ if you want a bit more control:
Regexp.new('\b(?:' + words.join('|') + ')\b', Regexp::IGNORECASE)
=> /\b(?:foo|bar)\b/i
/\b(?:#{words.join('|')})\b/i
=> /\b(?:foo|bar)\b/i
'Daniel Foo killed him a bar'.scan(/\b(?:#{words.join('|')})\b/i)
=> ["Foo", "bar"]
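One more caveat, shown as a small sketch: when you join the words yourself, as in the Regexp.new and /.../ examples, regex metacharacters in the list are not escaped. Regexp.escape handles that, and so does taking Regexp.union(words).source, which escapes each string. The word "a.b" is a made-up example.

```ruby
words = %w[foo a.b]

# Without escaping, "." in "a.b" is a wildcard, so the pattern also matches "axb".
loose  = /\b(?:#{words.join('|')})\b/i
# Regexp.escape turns "a.b" into the pattern source a\.b, matching only the literal text.
strict = /\b(?:#{words.map { |w| Regexp.escape(w) }.join('|')})\b/i

text = 'Foo wrote a.b, not axb'
p text.scan(loose)   # => ["Foo", "a.b", "axb"]
p text.scan(strict)  # => ["Foo", "a.b"]
```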
As a word of advice, blacklisting words you find offensive is easily tricked by users, and often gives wrong results because many "offensive" words are only offensive in certain contexts. A user can deliberately misspell them or use "l33t" speak, and has an almost inexhaustible supply of alternate spellings that will force you to constantly update your list. Some people enjoy fooling a system.
I was once given a similar task and wrote a translator to supply alternate spellings for "offensive" words. I started with a list of words and terms I'd gleaned from the Internet and set my code running. After several million alternates had been added to the database, I pulled the plug and showed management it was a fool's errand because the filter was trivial to fool.

Related

Slow Ruby Regex Becomes Fast with Odd Change

I've been debugging a site to find the source of long page loading times, and I've narrowed it down to a regex that's used to extract URLs from text:
/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/
This takes about 3 seconds to run on a large block of text. I found out that if I add the inverse of the first clause to the start of the regex ((?:[^\w+.-]|^)), it runs almost instantly:
/(?:[^\w+.-]|^)(?:([\w+.-]++):\/\/|(?:www\.))[^\s<]+/x
It seems to me like the added clause shouldn't affect the regex at all, since nothing could cause that clause to fail (as those characters would be matched by the "[\w+.-]++" clause). Why does this make the regex run so much faster?
Edit
Some people have asked for an example of what I'm trying to do. To simplify things and to address the concerns people had in the comments, I'll be using the following two regexes:
# slow one
/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/
# fast one
/[^\w+.-](?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/
Fire up IRB/Pry and throw some text in a variable (this is a scrubbed version of what is actually searched against):
text = <<END_OF_TEXT
Unable to deliver message to email#example.com. Error message: request: <soap:Envelope xmlns:soap=";http://schemas.xmlsoap.org/soap/envelope/" xmlns:t=";http://schemas.microsoft.com/exchange/services/year/types" xmlns:m=";http://schemas.microsoft.com/exchange/services/year/messages"><soap:Header><t:RequestServerVersion Version="ExchangeYear"/></soap:Header><soap:Body><m:CreateItem MessageDisposition="SendAndSaveCopy"><m:SavedItemFolderId><t:DistinguishedFolderId Id="stuff"/></m:SavedItemFolderId><m:Items><t:Message><t:MimeContent>RGF0ZTogRnJpLCAwMyBBcHIgMjAxNSAxNDo0MzozMCArMDMwMA0KRnJvbTogPT91dGYtOD9RPz1EMD05ND1EMD1BMT1EMD1BMl89M4j5Ba0fQrvz8atXqDIHQS4xT5dBOrGbeSsUfHFTfj6eP8blEZKl16Pgp4iA0AcFvJtCeC6s3Iq5GJbVXivtrpyKa5n3yB0f6xUQdFc95hTUleo12k0MH3rRwi0RX8wxyRaWUH81yjiXRmcjeTWAhtOCoDVb7oxOmIZAXNVQAMh05JitBFSUwVvZQuXPOo7BfGsIog4rjacpj743JsmuHxuYSl7WZQj7hV9wxvVSpE78Ey6uLAAL3yCBQ41EOSG5alLJeUOb7LVTlPQL6cauwRDUERZ5UYlJzYXj26hfrpzIVL15RlzQyLwt0cFFLcOsKNBwVXoyRyB784mqhJ7Ks9pFngzk9GZQ23M9ivSD5tDvc2083K7DPgfNThy9ev64jKZ7Ktex2ljsBovyDK9zr9RLWTViuoRjpltzZ8efRu6cBMppofm5DxbQbowvP5nRXSdS9ay9gfZ6Z2Sl3mO5W4LQh6xOE2uCqLNCeQSVWqUzf7dHyLp1RE6br76Rok1rhE8xi7NomNWViQb3ZA45gbUsY0UqvhrsgVfGZ5z5XuyzezQ9u8sHxSOGoK4XgfoZbOboOgxtB07JNx0CtGupENtOCH3Et4lGoNmJ1Mb7DaUAVNEf3m90bHm1M2d8QX2Bik6fvk9TguIKWH6syPFKJMKQrR5zIouNEyqqERYjZIJtRl6AQ4DZQd2iKt1ENZm8XZJXQhtNoE9h3DhWLDsN26cipUd6Y0abkZQX9ObR4J8ULscFmZvkh0MDS0Grpx8zwxn0Mg9bDGqbbJ97iGCb0DkhiU1TsqEZSfZzRB9c8PRba8QrlZ4FapbE1tRswDe2MPSEjJCLUJIdnHDNmBfA807NMVfwYS8FF5fG63XIruKACsXjxiKq8KF4ciXpEuM2jJcp2fCqOD0t4OUXCnGvodu7for9xKA8JWU9tJYZEAnNUKrWK8yh18pXltvElyVRNfnYXkaqFWA7Py6AmdoCoKPL3zBJh9AkoK8QpZfRgQeG5XQ4WzQlG2Zhsz7lcA9v3uyv2g4kh8HAzzZ1c6fyKP4M32yS09tN2N91aMyWTvBEic60FHZbZsnUowoSgi1kBeSE4IV3BWp6wly2Z149sUshIuBxC3IdVGA2cHHpk0aQgdIPJkqniWATqiMpNhsTtvqZAxlhTY4ELxJWuOBgfEkh5nGG3MfRcGirPRqLLFGiOn0i9HnUzsKZ4hc9jNVKUQfrQgUEbH7Y6Zck7gCgxP72JgBulHZYUPwJMWSDdlUl2LYROVFbGphx6CwCCKG0yiD1ImNu1JLr4J9fZRFLHlaDorSNCCNf6ERCBUlOIVYe2NLxQrUwMFnE6WDuNJI8WTO2jhm5PugypaQHBuUw
cIvz5bhQD3PRcmJJieDTw4tbakH8NKl2KYJ63Yvsi2TP1AEuCm6EW5AVlxaDr2HqEjUdisK1TuOv2Y3vOpHjxwQGwnDAi61K2leqkNkCB4La7y2OHQVrVcEjALKF9cdR0tcfrTWXqoTPWevti8QPxkZDdBlImvMLaN1JH1XVv0XQoE5Qbvvv4RyxhGmFQV36VsdwE1s794QHwUl3dBxIZAYe5jqXc6VK90wRgvU8rDByY5WB3F0s2Bi7UeuLZnzHddTTBMiMU0yTXCQTBfx2OKM5PoIZdDbCCALMGbMwFRldlt8Z61vZEKZDjiJOkXCphAyNoY1K0vZJIGVDZjzNPRMy9fvQvOliREksdaKGYhJrpUsXGsDJLv5w7DcFygeOZPAkuacXCtLwYExbRd9SUqgOsCKRUbw7ZA4ysEVL1D8rpRbJeKKYKYIyOd9W2BEf5uGQ10YRRSzl8l87n28XApT5d8vYIznXp1zxs2jU0uQKd0NgvslVVUGRo3o17xGOnNBJM5QvyYnEG0FZnhXbIExq2zr0H1Km0OgEU1W7pFYyUE0spZwVtTazYD6TZPXUfj4jYS2vgnGRnHYfVAEH6Ufyl7VfcGbqsF8C5wTBLCSB854DBOI65NWEfJPmypwTUU5dkpk34DY9PEtb0w1LnLPAYlG35wlKPx63QExGVJgBu9rPRYX33NH8lPVeuHIXxdCZdvthOly2hVs6PsDU5WJxyIhIhvIauwY7nBx2BkGNfpoDX7Qb21MOUwWXGmwGBXIThZpP5oiUJTkVq1p3QsMl9cIlFzmvJvGMFHIca4ZZGZq4ukbFXOZJkJ5UWP0CYccv0HQPZjhpDJfnJNSctArpNmYGDAlFgkgrb24deZj0pNE1qfByF1H092bb9ZNIiBwp3d3av1S2thX6sKycCWibrF0MhqaEPGP016E018I4GUCk4IkuQ32ZMEAwzLVpjd9eT0pWMC6eeGy7EVobYs9OdZazxJQUOYc2JXGbdBJ1WK0BNrFANa3MzPUZBpw7tG98Bz3htPIAYRURwPqWFMAfwz0TvJZrLrVQUoQFBDl2mmdI1Fw0CoBnhC6w7XtjNKxfYWroixWIM6knPS1XcEDZoxSEUQGoUIzO2fK3RU2z0h4dbqUNfXCM7tLmIj6xqLlpbWZTy8XEAWq0bmABYy1VFv90yF0HeEF4laiKmYhQ5TyMJzS4XX77xIrLQNzDO03zWr1JMyucczShje0tZptKMGawoqWUyPFmbgzxTyjWjiLSWO6vxFl27SKbiERw74bqnjZ0hUWXsHHAnlCwl6nCfFvY52q6skalWZGBdiG7O4NbG8WUZxpcvtEu7xI9qYkECV8ThoicSudB0LV7T5VUKZvAwsOFcQ6L5kGJGxCS4dIwARllkvXmdKUyAYXGxk7IPoSJWcb4kjoxrlhoIvs8TlM4LfKBBYjOkfuz5XKCNEWnfDGDeWkjVagatxpbiDmDl8OhHppg6VB7LapvLIHolfqEmgAelTjhrTNBMHVypFk1945p9gDH6szpHkW5AADAy5o0rmGlUisiOG9zzEaFqMhCYIhW7Iqjd1SxBRrCPXxpe08kH6cYKObmHo1cerTBVml5BMUbmaxVG3X6VVKjFRpZD6mHOiFXvt0zwr69RntM9YvaUk31cIXOjF8rZ0VAxsnvYXD1QWtjhtY55L1DEzcSFhjiQChmBK4vUEaWksCbxYCeN8lVvCGaZ7wxHRNMdwrG7d0QHrw0C39LPZHGSjKYzbgx1JpM88dmQu2yNkrOuXJUYSJS6LBYGrIHCzJzebmmAMVjWIiCb6B5jccssUgn0Yl14PaP3LxG8QBsRlWjxfyz9b3aP4nvg4Db39hIB9cBRoqdooaa9FAv9HiVPEfiIA2N6q2DzaJFQglEcLVWXOVTuT3EAmc0Q2OqJE6WNlHFwVPXylalY4IDjEquQH4TZLjFwrUcNx3lTgSymASFmrUipau4QkL1PCrXY3AfqusoU1WRczs2
ZRZhBkTu6SUGyQuyohomIXvD6AjRx6eKQwzd4VdfauUads6v7CsryyU2tJim4IQge9wLjJd5HbD2BeRjbUH918fL4BreR7Kv2wMNXydJkS7URtW3CH3VImZqUMwJCsGtD158GcDrST7A81MdjezzrBFpH43eQIXfVnKYiv7zBHLvtaJ2NaoSGC7FhzfpCjsbZFMlZkiaQykCZp5pUfH1izWvC5fpJ9Cwkc4lEMcKGLPpFq6vugPYpktDLNaAvyDZfVK8XixTzpR3DAQ5ehWsd98lAmUElUu4SlB7EuB2QglcstnnXZRCqA2jfYpGoKsMPRIYOswHEqJbiBQdvr6i8LNcyhlCQEcqufQJDSampQQTFTYlqn4XI80n9h2U3eAS5xIqCELMJaBKJIjDFBH8bTXB1RzGUL3OnIZ39tkJH1Uf2ygSlJKHTlpCSgcLxCnSFoYKzNEybkHIW6teq3lGtE0Asdsbb7ys3ictlJ7omJET0iJrSoZf9S4KCJUgBbRn3FDHTp75XWofO7kw6WpYHl3LZOAOvVabmUu9jAdtxwbsx0y5SnmtFsp40cM3uWAsIM7j6Nx3144hgKQD72VuHpeFCCj2CrtERhVxkDyAV2EmroGmhom3rlkdeyjS44eee36KR5xB6q4nxl7qdXvlchQKoWb24Q5DnDzdmIGWh6EeDHse8MLlvtlKZhsRAH3TiTdqKHN1uLDCvS9PUz7i7PzbmFC3sWVxeG2FtRzQEw6BzOlQizXBClak69r5oJ5t7Hcx1Rlv9ktHX3A1rwM8S3eTmzGuZIxXSCZRddtrbdMxAyiYuH7lhJRGbC3x5AZgcqgYHSOJa9mmh5ouVgMoxj7Pt5qHSlIJ8E4YWnLGNmQyRVou06G6Dtu3EYmtUcZ1VgnMeCdwkBBPe0bkFAbCiTRfYTgC3oyftSxW4HF5WhN5yORlTfg4rtrWOC5rWsJZiuZSvV0l9IwdkPA04n1f0ryz0Eo7EtHf5EQEIHc66340SBjKQrChypJIw77QRCBvT8bJHRHjnZytfj1clmK20A1cAJaEncUrm0FusmiQSAkZgO86abE1tmgGh27p0O71EpQ1jo13MONtsltNrsec99X2qKWec3BqalQZzss93iJYgRzKbOgf4xkOCT9xM44p55JtFA3n4AXI12yJvplL6QixN045qeqC6ssMhJE2Y8H2EhHRIYBfNpLZ3xzEXH6aeywGF3optfPN5STtwlPXFdwlMwMXgp7Av5zuhaE7SBjgb5yL9FsX1XyffxeTtXQp00Er5JJvOkeP6H1N97GX0oOj1pfC2mexPVbbgDmoY6gyTx3fw2kYFqS0laELEtium2Jy5G1QIyisvaXAUjkLS3zo1wnJPzM48VApY2OqTsZlwWZAZgk8BKMh4U7XKij7cRWJsnKAR0yIFmMWwkUmao0X8AMc4Ki114cNx1HIeGuvpIZvedCHazFBpCzwjg6DZXHFYCB0yMGILd7R5D7LHcaVtlCq4ezPCULVpD2ZsB7bSnnTj8boYmzAWaoTJwUM9h1sEuiP4klUAj4YjHj0gdBYSgGInkMC9v64iago5lxNHKsF1rIy5tWvXmEbRbcbTPmYfuy3z101KCzAfGZkWNVbWVnV0gAbMf4405gU1HYDgJFXWZgaeHdRYyhcEXkDwEBHetGF97e9GcbvMgV35n9kmWUmtAEVAmk3mNEP1dPBiuQmKVQdaJem6aOMskQEMfC8rH9ILNshh9wYqLCtpvSFlrUODHdE1dkJJUdZVfF9DJZwOZx47VFfIak4NZSaP9iGT56aY6OEqKGC4zqxDxMVWHhdE1QWpGjcEoZj0Tfn1cCZ6iTl4UHqDWK3o0MDuS16OkU0tTn7qYuBdokaVAWcdTFtoWdLeFK87ZUjax99jFeBKNEdubnWHohy4kRovMIcRZ2tq55Ix5Dmakg1juNBBlyCJ1QccOjPWIXgzsnbcgTKYuh2EwkWDoaFD3WjZMPkMLQZxr5L7z1O
BDtcOfwmxp6MOGemt9lBGpDwK5LZlmpHB0jF8q1Iovdfbv3Nkg44AAjqV3oqHE41QBd34xUVlNVpNzDAyeU9reNmTE7y6A2GqUNPjerMPFp2Rj6ksqDs8JMuz4pSX4r72NlNGyOcE0dRLysM2AQwV1hI7xNPc6Jx60tqIaDeStJ9f8rLb0TjymgmsDKOFIT2SQK6R1nbORU2vhgyunLpnaw9oX41qYNYV8cttRNqtTwmlBf6Bt6jv9rJI2HBajBuBwaDpuSrsLEoWARJMImuo4djeKMm8tJHv3aj4k9qe68umdbSxSTBpffb1846CbNWnvUIki4VDEYP85c9CrRat6sk2oWH1w5NGMe5AYsR5schIVEIpf4OfHpaWHPT316ip1r3dgsKD95KZ25FzmPdtRZajIL25f5ZMzgpXZliwrThurFBDPRZVpiiZB1uEPWIktHU3e8u1B7Ug8qg3IQkEeXk73ir1Npe2ZpI4JYKkNkksrQjjchMJOWLkHzHoAQfuXhPfdVVUDfvVteBgXV2KB2XJKe73pSPqIeQhL8ozu7VjpBr9qoXW2UhtWzTCSDUjokLNta7KnEmqEZbcVyaQMfS0cb0GpM1B0JGXimkCiiQm22P4Hj4dsVJenXUnjqpLHBvFFgRspNqsMLaIqBdE1dLqcREtGPhy3dFta6OppJ8q4IhmsEqkLLu43LMzDH1p9e1xatH5iSiUNn8S1jhEIKEe1LGAO5VcYvds0WcJS0JchSDLvlIwQsb5xTAvocdWFICvIfCWpnmCkoCKl777LZ74G78STTjPubAaME3x9cliLzMIVXuGqZZ8zwpM2D71TIAgBy49IFvqrTMm7Rw50I5hZOwEMmCna4lzEKAdIMYiSrD0XHOA9XdGj6xj91zch0qR05HRLfJeIZzuiLR80B7QjxI84OgD7nLZrVuWxjVoOCvx8sR8vQcGClXUmMpB3RZfamBEswBsvaUKVPiHDTp0QpMKUtMeX9LzNLSrq7WgNugolPz2MpsGSNKYKREvhvTFAcxBc7ZjpflosHH4OvaQ0DUzgxp3mbZpY7eeHLeLT2DMODiYBD8QPaLusJXTvRovJviw8v0DG7A7s0qTfhidyiFVoWJvnZ4n3QSOh79XDlx5fAL1oZjQdEHEOpzGtuhXqCnJhvhhPN3vybKfyIvyJzIOu0NNSPe7P8jFOyLzOKiHMMSR0QG4vZhp3winzD6yCuq8tFo5p0jktwjvArc1OME09KdXyEgIY1JNANsHJiSTmnvRkXg0UyoZX48SdjAnDxlKLzRfT5128hIMRQXpi8RDI4SraijcX91If4NX7nm4K2AruWqbLnUTFiaXzcLBPitp9Ij5KyH3sxspAnykxHFTLfqPDv2Q6QBAyMvDw2TkDMB4dsJAmbiDelw8B01xbdm0j8f4Kemcqy8mjJlEX6cb9lSdqJnYeDisEXEsbqgVnS1ZejTomV0sjFYV3BGZ7oFwtiZa7MjnLhcYQAaaacw3lpSiRqM4yFvPrbV2ZAMl1vpd3YULaO36WZHRvUi8qe1Xwwmj4CHBDeX2moaIdlKxDlksKwvLi9C0hvOXdEBQiBWLA3AUO3pXGs9qIYY0BHolqWQCnXDMUcJBHgGaiT1CRXLydzNk0A3i8QXINindQsCuive0xjpb7YJYzpu7zlXYgmctprr6szyhLBIditlsuTAgu832tLUrnnKc1W4JHh5i9892V0FoD8ct9DOKUlB804IH9douZ97giyttRaIQXPr4DsQyYS25sDsC1h6bFsCqPIqXXbVHHims4hrrWz8f9kPlkBoEeIs5wFDCdmSnGE1hZI95NIH0JYnoBIsTRKLA3pwOlp6M1hvsXr3MIUONrmoZbHUdGyGhuUqeCDC6WM9bfqakuSVEfmf5nO9ayGrrPH5jfnVcbhMapBWAqp59gjAMbpPgYyD5pqD7apEEM66gEhwTLGWIUmrNL2SRTxmkA5BjBPiwmAjMeQFxdi1fU3CONJA
R5vqL7mmOCs3nNjNDrMJrUztVBfcydUz5QKW0S045Qs2f8oskGtIolrChroR4zLkvFkW0EJOTNMw5H3ntK6QRDgJDFyfBFOx7r80oYpzPBT3kUi1E7glttb986fOE5nlEeUoK1a23u3gAuCVLfj8eeovwPUgIzvWMvzfKWHPoNoJ41I0FJgr6M59sskx93wX0Olvcm2Jg9Vn5kWIUvQ7A1OYx9p1iLa4UHtiS71l4SYEyiLJWayxixuUGqrnR43eFezXLwf9N0b6LwsPf1b0xkgtFKFF8WR9V1k5VsoQDSxPkn0bXOksiQuvbRdLIMLGSefKY4sUUGYD3I01KicMj8R0yU3hpmZJyqX5meQzZjGNvkTMQfTPTY5jWkPFdRTAR8Y0WGYbw2LSjQU1yH0cE02IhOoRbjrdvq6gVuHx1my8BVHbSNSlp3IfzV6KAfqZQ9OgmXgnndfQ1IE0vdhQUsY5OkAhxlqdEUSL9tA5m8RP3qsgOwiQcu6CKWlS26lK2AetDh2r1njq13KOWL4rKikQaP0qxpS1z8oWSNwCURLEUzeESLuIiuv9drIn5HvjfrmgVFdhfi03MmuI2LLA69UbMsA8GhFki07Ssx4W0WBgYuhYSgh84SZMRyd9n3R5mjqdIW4fUORe6Ql3P0JFYjJTRKeakoqupL6DaYVYGmaxDY77bepszXnDJnbhPi18NiGt11vygNPvw8ppijWSaXxP3pb94IdjATK4f5886oyTSkhhYidWK6qQfXwsDuf8hr3Zhv5Vd5IC7hjnK82XYFkMu0slr2LyqsVnvPvdLFvnxdazJMtwBG1f3ddCDWOgGEtXDLps3dQ6HOxJGSrT9WIGSiuii3ypKAUc1uesbVte228SDOZcAfVMut395Ulw4X8VRw3szjRKKmlOvRgTrC5yYONhXiLC0VLv74wTkLG3QTEPuqKGXJszO0mTjbUmIcGlX4q7sLxNwrZnwDOJPJXh12cwCYJBsqToxoL9tstwxB3x99QzOQMuCAR8BdywJcO6wmc1n0fwVnK7tBalXMiTv8Fns0qURWzagMEOiN488KS0Qj54TRG6AJxWULfa9kfCbXxZycud5yPbmUcE6IQFnNTaKmj7m0U6t9mA5fPveDDtyUqrstxyr4WMtw4FiBwjpNpraKWnzBfi4OBK4iYaCJdPJKRrpaAQyGSJyGedQ9AgFHYh9EZopKZgH5pcBnf2oHfhuBTq0NJGmrKusb4nC1PGjV1jFGpF3r1RgYtWMQte6KUQNCDSW0XGegmU2VVrboOaeAaMWM23WXRxa92e6zMsYxzLgMUpAmXnonfIr9ZPAjzwx0UXLqnWZP99s6Da9DewN2EKXEXQgzllbLdr61pNasr0KeyOq1sRWLuQzXrvergG5q4GKUopJBH02sAFfTUcaxgbTvRUxwqTbvQksLz7KwsirrtzuaxTk5WpL8yg51mjPrvyLUhCGFv7IvQible3seUkmqea9eKwDfvc9ZJNzOzWTkIL5VygG7onS1dlkp7bYFeLl6n2Iy9XKAVDrzSH5zzCEPoihK2NftnXrPwf4YoEstg1zl93WCMJC3mingiyZ3ILTq7hEDvJWzDdtKP5OIlMFHlrexwg2ej5aoa6YO7oi9PgzjfSVlGmGbqv0IggxHhe0FP43CcaqJUfTzYekLHOmhk4to5Lf7ttISxAda3bQOSkxYR4gS7z9GCCgsArwzfgfmw42YyGwydDTMQuSrnLvJXLXvaO8Yn22gkQ3XZ22axJhQaqcAb4lw3oDTeQwSxFDxly8J4U4Vm71rKMWxAyfzDYkRMQMMehpw5bCPLuYUOBYTBIWdQtr4HXze3FNHRDdWAud7EorALu9Q4IwNrvf0Fy0nPivrrLEjE5wBJDH1usLMMTizZSCvpKscVN6NBk2ll8PmEQG01D9lCTOUIXpbbOatipjSTSHgR3lHt30rkmskRXxz6aVYzYaLmBDucf5vhv9IWORxRsP0KfkgqzfoZi
9dJ3RZoPaVJ2WoRCwGsFIx8cVPsF6L67kSTNuci9B0TbFUeuaCS1bauz4IUaWB4UZnZOY0hMtDYxc1zOSBM2h8x1QVeOxAbI74Sr6d768RWzx6sSShJ0RIS36zlLFmJ106ogFEqfA81dhSVJflBuFcSHMPwyvkD9YNiIWJCkP8kdoHASwdde7hdnom1OVKZSvjJxibdAJhKGwYdSy4YUsBPGXCtemsMp8Zn0xoQc5nhI2fPkrLPwFMftO4J2IO0sj7rQvtlYylTZFMHzaG5VDzrZrFMRPEnnjAkoKDFhfiXY9iMAK8busZaHjhANoQ5l0ewEkDCqxaU5ej8EffvmEruywI3luXyGkEUpef3qKev3w4Wf5umX5mYQncTAfqmxp0kn1iedSIfJFVu7Tm0IBAuir0T3XakyRgCEyVvytFNwiuvuVDw2HCmfgAnB7NtRvG8045XyirFBpLbgefOVkYh4IK1svdL86Dlh5Jo5mRhBTNf1nEmHcylgMJUWVB0RaVa8d4KwR2Y7cag7BSlC6IXl2Bxh1TZau4TUpSnikx7JtC42nkCSYFJbq1FIHMu2xrNhlrjgvNSjjipLOyv7uOzHUrgoCTcYYFik77ncAbCbFSxC4sjYAWR7rIQBC6Ghyw8eBdCdQyVwznjeeUFWpcWMiJC0C8ZywQllwj2eU0gsZjFHxPsMlU9v5yHJ5un2HlqDMHknrlXWGKctXt0xwcVQx0agFVJh7uRVNiYU5eATpDUPj8MBHekFD9m65mVTbNFOcpxKHliJqjglUmz0TCqCTlrRyjiNM7csOq6BWDt2jyhp0JTpZPTL161cmwgQWfcVPqv0lyNKWeSb8zVr86H9jxBTRu4CfOW7KorFEtkIVqDzynx3c33ruKPapMe83M4rNeXc8enDEMJ34z7urZDCmqXhgghwV8jzG959csv6J2kvfBLoTCH4lm0RRpWVkkx1oGVPTA9UiqDjv10qNXYCF0RUn6vsqGcz213KSfC4vG3X2pSnmk0Au8dEjA0mzhILy0z72zHLStcIuH6ShW3R1xQYr6bdGnsGn34aiz5ztJaVcwszOg31QXUJ4nRCMrVmkdkC5LuhpQcl3vKnzXxPEDHF6jscBfSYCEVNRmV1x1eXtNePKJMfSTvn4NUCitvqYrXVhrlfEIdANYxy5Z8wTaZh2fn3G4jKj4356Ar5bmpRjgcWyGvGjEILm0mknGSNMD2498vzP2wQO9rnM1tTbcVBckAgZCOzW3eYs0tJ25u7uhbLxxJLK3Z5laGpYL3QSUxiaPX1Che7fnMIL20jC9cJ3kUfzuILxSaFXdTuPxz2xox4JZ4yfdvaGDozG4DsTA0o0ZYSQQ8i2l7zNYQ136wputv5lfVrDGZ2bniRuAx99qKFrdTaYy4RuGdBDqS1g9OBwS6Wol0TgA291cWuxgINnACTQT5jcGQ0Z1ovcCkXjYR3Uk2vSz5a7pWxT27ZOSyGZ4s7g48aBaQCQIX9W5P235Aksw3AF0t2FTzYaVLwe5gYnZEtmdeh8CZOH8ZoDh3hcmEKbAH1kxHYqUzileGXWtGU8sR171Wf5Q6CXVnVL3gKJwUyLjESokumSA3vAJXysiuowShTVu8YBj5qqMzq1Jkuc6pg8xAOxfSrQOmDp6ul2uCtQWk5GcGQxzIxwH8RFhWj5p9zFbjHwkgZYDQoBpaZjXGbLYQUbWm3eCp5OTayu5FBljTodojLXszR789I87EadRyK72h4sSaXP5xvOY55puL3JowcjGdR2WdUT2fFIi0Rnr9uvXhzIuLgvpThUXpwHerVwJY5fTKE8ousbnMbUOCIbMjgJXwfey7ukYiTpm7TiRKGdeX3VukYQzKnOnt68IGljXFSJdg6xFMCfMG1pMXIKVc2BSoRgv0mQwljkgd1kNNUzzxHL1qiiH3ZtTD28uRvkUqcFFuKRogN7wtx4TRSePhK0kktAkVPdR1NhpJ3XZvHdpvvPsv16E86p6jIPwlZ
tPTmJCXUO0CjMIofAmRG2pmcXzxbKwGqn8yIvVy1QkXGc4f81Cd3sKlQIPYIO9HpkVqX8VdilJYyDQNNxqTiU6OOjigP7yF6ND2wXZScCdpsIS7eMNwoZRKJb3ccoioR15fCuHE8WuxuRR0hTC4D2cgPaFhRDKSBNm4EreCCQMC7Y8Bz7w8lLkE2fu3dmgcS3lC0a9XlnprVL9cpKP2mq8BWrGnWz05fyu41iCwGLseWcnqL8wD7zYwymlO1ptmWIOXVFbD6bgfGclFb5mtGsytRzBccdNE2gKIWTShnXPzsCYWsGAyxTPY1k5LPXZLEAayzmxJExoORuG82I5Aqy1yzcr7ew7mUeejyeJfrWPqL2zQCi6c6AMmaVN8BkLrviWw9DYp2slT5QCzNdcJDgC8pV6epUc7QxBTttuU8zfjwQ4ORYNnpA27xngdO8yIYxCand8ajx5kXpcpBPAELg4LJcPDLZ4HqGSUnEE4cUxrEuSnO1dXWOaTlxaiKjiScTM3SpYZrlOm7yhdce4xxvReEVFkHw6ykbfgo1TEk8wDgmIHBtPOoFOvQFCgrRmi0zAYyYRFiRGX3OyHKGs1qAoEWt4cOG6UJZpSGK8jz9BYag57lTE1yck9r29Y8UbsvLvI1NSLhJQLNk9gmLHRr0iGV7QRpYCkdcTz7eWO6VrmfFq42ngWid4gKSBqi0ts7dGYiv361JyHOKA3soVgjJ51dyJ64zdSpuJoa7HpKeUFfmhRA7uq6Ztsg1vmoVBjRdZe6SOLCtux2Cw4HbxDSEJBlVVLr99atQ2POJjzC3p6H5bpJK2HJBFWQJgtHF1WorwFNeL855c3LdIbfS5gU3EGxrfqowdYcC3UdaoLrRBFIjOFlzHXohh7Bo0IWHrFZpSt9QjD5eIvHXoLH4EZCrfNLLzHpBP7IIrKNGpVkDjAJ9soXmcmIJ5xt1hyriyho5X3N8hbuandC8cGRhGH4ba182PxBEbnvIVbZ9jX9hKjjRgABwr3GSPSUvQGbm0aj9myAnieCLALeVmNNGH92sO9FBY2whV3rcJehz9Q0onTQi2ABBXxQYVJMS7xF62g9gTIYHKZAHfu3MlNTqBaYz2N8zVwWsC1hgJnjHcACjMDENgZaCa373ZbPYLPTHqCecOYTGDTOH1cxYsxyjbsJUBvh2OYGZ7ZmBonymiDV6qiaoKDU1RED6gJTKStyGQ8xdZafxDMxPJx1djx4QzpmJEXODrwRR05UpwxCkWH4nG0RqIOE4keNNRx18Xcx9e8DDWnzNeNOW9fQ6klmcLbZRIXsy7Tg7nGdONu8TJkgojnHFbZ3mIBrkmg5HXRRTRIVLKHmNV3JsfEAqMo3eu4d5f8EL1IbAl5sdbbcf9zLbCjJoS0uRxuAXBsEPqlMYiN5krG4dvUtgNzjBoddnPXhAl3OTx04KA2K2W5wtrfcuYD4uthnWbr9fchyvE99UkfthE8dsAuOIX4yiJyrJ3JjLZndVrYhjfb5HrgEXkLgpzywOGzGqzCRwyN1Nmnkpj8zZwxwhtSZkiyafemzBx7pspb0Hr1l2eEjXucPnvccxtAYzTK7fFHDGE3VAe3HawVUikXXPArgK0YfRZWUo3uwJYRe9UspKHswg2pCQgPOJeVryJWgwRfooZdwTzmJO2noRdVLBFrRUQweyOzU4lgVBxVx72fvr4Dj35kd9mXqUq9fzvRBmNBrsTAiOhGc1Xq5z4C0NUvZEOSJa9AEkvCoiMyvj4Q77mJjr8Y0SXAbujXFyL4pQPfiVk3pTLHNSy0UEj3NHdtCmDmn2k5AimwL3VaXXack396CsdgMqTYfeRlNrqyaz8cRAjBKBH3vfrzlxvLs2A7Hk1lMVxCI71YccSW3R6W9uAuWUREXTJpOtCgN4xtLTjQx7CrO4AiIB8XsHROooTmWHaRQZ254NO17hbtgfvUZNlHnRzF6hreYDkUSaKb9AWWQkxYCNKeayjJgrXplfvXQfH6B7jXV
XdvhjgOajJsbRCmx5POMmqQKrecNPRcPE9TMOAPLEyHIEd2qp9z576s91Jkiw3xg1RqlkKoY4L5yv8TdKe3BuN4SFNxmbI7KeAM2O9AVInsH37Hlw5cBgFRCoxrOfs9bv7TrjcusBQsSu5JWJ8xAlN4cJt66SHol1j7QHHXHRHZCkdNqGpIG0uwC8jNKVQN32UZxKwNBXJhGkopXu1z4JoWh5fh19ojUVoDVnTGaLqv7cSfofOTBEII4FdfEy04LAzEOMT8Hp1ZkBpHjwJ8IdJ2fX3XZ9b6tcCK2Fv37vAxijBFYaMgUW1Ll4GEgcft599wtnaU1ICeWMJM2tvjQwWUMw0JvtdV06wvBTmlnN35IhWn8wUUDGtwt62ItvhPqSgNHkKR8zaPKvB5Dclnfpw10zaHrQhwdtsxucTqrtdgCe4d5UZNCbZZvMIdgQscG6I0o72ksErRemzSMwW4i2Zejzuwr1N5Ow6jR1BI8EIRRWoDJOQRbMb41ETfiObhOAUOeMafT7NmNHbDncm2VelfwV56pJa1FuAjD9Y7rYvGoE7iJ7wDKLSbgBQN94r1gnRjjixsXr7RCEVz7BZ1YNxpD5BdKvfT4KsJcfh9gYYsLzAc1Hk5ZpBCpZO1QBl2LLeS7HKmWEaZhxVxB1B3tTJz2YGdYtLTByvZeiv3lffi5FTFoUmB1Re9j29FcgWGITRdOcdVG9hbVyZUVQOYKJUtfwRugsoKGsuQx3dvr5VgpAyYXO6ybEqtjgT2YxtsasAXToO5XViTRy3oVkUZvjfSYBdoUbPT2UYr0P4L42tFw4pWi2FEwIU0cKUC7LJSl1TjJB85ZL41J8qUFlwl8MCw1gDMpMnVIdW38R3XqtqUNbRbOvgayTWGiBO1kWDi40Mr8jzRIKxDKJAur0YySwaVocY37ZRelZftCeziaRHX3D3LBY31DiVVwHxHwU0ZLFBohb5KYA0s0W6zdxp7pDUg8KvVRB7l6RqbJnILOH2mUTrSUHBTTLkGh45LtBRwh9pH78Ay4ztrEqetR33Cv0jMy9lVibFxM7PMPPdu9m0HOJVm33Uh0NSV7yVobcp525djoX6vFdmKIJDh8N447bd1DCO28TXxYgfg7sjcVGbzHJNgsM1VHTl5PzYlZzWxxNVGTKK3I70GKY6b7SsksarR1Zk3oo8Qtzr3fq2gxQpF9lDO62IxwfJqyeQplNH7yTeVvgFpnCSLx00P74GwGsr4rmc76lYXpiacpMEjgYHNcKry03zGIc6zVtbtyVsOGFZiwQezl4U2h18iwLRVswJlFhED2cqeUX7cGfv5G15iL3xYBtsSE4dCqzikh8eupJBuZLPuRY4kcSbzLFJufOusgSvb7WRfyN2IhWA8mIPiQseUywNbxi3QXXFdAmPwMC1i9JQ8Hx2tWzyeM0MleFXLs0kYCJgrKqLuSIwhA3oc4uXzbSz58oGIMTW4aE25zZXLUwXCcSRf2NMjI6xOkoycrZ8NGiV6PATiSQkGq7MxW5qc7SqEsrDSJRviAf2q1mVegXqKBRxOY3zR3SMalTGPaeM9P9HJQISqlPzOX3hOwNi6vR5cypj91hRoVkUBbZaG9yAdzkfQLrnt9NXHl2E8j0mPAH39tQSlBIirKMcmc29GSHoTAsfA3XxQz4i5X5l45Oxn9tdhFtFnPDwfAh6kFWMpXpKCyguqQ9N4w7W2en9305aKhYkzn5yWPtOt6LnDh7wHupLrDP9chweXsqqtxr46XuORLctrWlwIYVLnOG80RYOLghr4F2dyfAer9aMBaSPpybx1JFgOOICmPCRG8SiQQpX3sPNmT3Cj36vTi2OC6hM0RxgWOFqrMmwAMm4wBB0ukft22DLX7QbKVXkiNdoVDq6Q0BIF9U8gV4E40kvqqnwM56loEfBuYiFePURX1aw9iR3rRhaeY0CHwfV7BEyqKNUFDDzjXzCSV2dtxcv0MJd6mu1XhkICEt1z288tjPrVv2j1qfe6U
9vHrO9jGJIFCAuwJe7ouLXAycZDqZ9iWvJETOU4sYGWJFFzFJeHsA63xfX96Q0E4WJTkhKE8OGm0WrhglV5ZAnwbNQBDZ52fuw3Y3rQXiU87KCZTJoWs8GP3QQKVMSR6sFtAGyEJ3dOIk8tMMYxhcZPj8cDQ3BIA4qIBRywGTmpiOZ7sYS7zO5Df55IAeNDUvmETy5QQku6kQjfeseFqgrGmvdE17mD6zCcDmspf6IFZWEBC2pb55VhOIfX3Q5tfnfQBmisyirJZKy9QTTNwl09MomyBKw8XeBBgoc7pd2CRh17QT194OyAlwYNDM4KBOAqOAPkTmUhipiFzDemBPUDN3mVuVNVx7ZaJpOJF6aEvolnT9PiFMLRbOkFgiZAsQRIEyqdlTLt4fDpD0U9SNfYDmm092bwmLhKSvfQNmnXaHUz21ooTAkFcIVPatIhayhRO5MCFPjEf3Fwdp2brMVMI6C4ic4Win5qw9WiEBn5NqNAS4HSmX7GnFYvhs1OpJ3wZIEyT0CDj0woaBen5j4aAqXdpzFNWvxDL2UrhRSIFiJ5ELoER2nWqFEK8fOd5mFSXcqJlcejevLOqmubhuYdVMqtQ2WlRDjUNJVgAQ5JxE5c49B9R3aLy1hNyedOS1ApHn0RJ6iAxWMLFOsiajlhu4vCP1uNe8EMbAnu0pmaRAs0e5OpPpJkS91DTcZQ822fZ5tjkY7ATkxJcLDDt5pVKLTIs6HnAlrzVFjnUEvMrQSIo9xbDptTNOMqQ4shGSYe8QiAW4vLLoOvjJkXrghwWUnvsYcLKvwAQ0cTNE7g1xX6KOBDdBT8rvGDrAW1803702BohvPMX3ES5elx5B4NkL3OUoAP2b9jdzzodGhl7IdyEUPtA9QqwLVV1rGMFpNbxow5E5jVAgJWvOnjoe12JQCMCXoSWWAPZDsE4mxD3HxshEChzpgKulSsr5eKwmLMAKaYz5GCzIXdfTP9r628oezuYzxSvevAx1K6k5JqGgiKcZ7kFDXfNrr7UnpzgMN0kpgmdjgnKMJvViedVmlwJ2VX5CtZWCg7VthFpQYkxj2xca1IfTUrb8rk01x9q2BO7j0FliN4fY09cTfuCzl9E49HKFpkLKznUa4itV7KUIx1KMpxI0Ya7pX0ahwXG3imxmOYxBN0PtilhKeQ97N7QwdQdmSvapAp4auFBSyfSWsVqDg0JRRODf6VRsLkkM7iiB94X2lIYiLA1EnNqAloGAHOltdg1F2FMfKRLh518xhJBdBegSfzAQbBGZ7f88mYi4Ei55FQW8p5bdGbcubd8M7sdWLcjavWNgtEfr1wr2SwV5FFRWZVu313BQtPrZ2ODBepRy3Tu0HZPE1GCsNJvZMiKIWpLSa5PJRl5HAPfJVbwCLFNrRpFyT1Ks8G6knrdjURUPLPfYITjIPH49yz7NTViCmiWkDc7GFSn6PRd1hh22c8GmWzIiYIORcIGzZrkwYu3L2ECIFhmH4CWMlrTztubS5UgFYdF9jREkAkjfLp76QPEqf4TABPSHtK7uucZw5rkjuoFQNdhauUWvJcleChUAsGlliLcxDajTkzvYBUunt7GCHuRPYanIIbSZiOLl5GtbxnspE5QYf582X4yWFz6ohCNFhBwfXUOTqKWhFMH6Z7QN2YqWpFkktii51bj4eDXXK7bstq9zoPKB7FW7KZtPGEAAr7VV5ALuKmkFkW60rcG6xHsmxiv1ZQp3ejvz5NquIzSLoCiGK08BIBVN8yTqJLRZM3aX9U4EcNPLHXXtC3pOdgwUa99n4fwYx1ddOIADFhQDVLHQy31RWFPipqRUWSQR2AYDlNdjmdyTqJBe5CO91cswO60hk7FAHjGlu48zZgco4oDD7tLn9BSafwh9uxhNjhTXsWbeMO1t4DJ8dAZd6BGThjcfLCPFawfBfSvTNt73Cq6AkmRMSRdxnwPjAYbL3Cm4wPRJ2sKK5raRuC1MNY126QD7GzDugGoOYgF0cTckbeu
6VaofNMkqyDqBdOVpQPXHZX68wbqTKoKsv3x95vxWAXVN4TyjtsWySFPKIlbQhojHJ3c975jlCo3rhyCquVnm9FiYnCY80aRqG0ABcSAlqUHQljbHPRqzVaIKXWCheEps4cSdb4RNRHyyrf84380almWd9DmTqfCOtJE3KI8dcuptL7KEtJe87Ggfls04N1syrYEY0ZOg6l4EQBc2J3aM5jquilOc2yeGtoPdb2dyv8mScjtE4msoH9GNrjZOMHDX3i9okVVwqnyqJhADSg5EzK5EpEfG7yxKgcYjrs9pOAnNs5OzhWlf6YcZ4VhflXm4amuY5Tf7AbhuxOjxiFglR1oxPLRMOBqEOlYwTe1Hb7u4f3pm4fX5w0GczNdojb7IffhYwZ2zju8gQEidySAoxgWQAiZmSxeJ21WCE4YcDulyL7NeqSeHN3tdmNfMQ9nwkjm8geu40n3YZBeNJZkuLlNvrxx0Ah8lVBuAC6BfWPkcKWsUQYX8rShiWIbHJJrYzO0D27TGWPZE1rbL8z4zuqebzAh5ZDy2JtQ217FvlFOTol0kJVjB6QE9Sm7P4jEKG4RNSdKO8Ixb6dAETYOKSoC3iFvjHyykZjovvDCACkUneTLgVhrAL8Cminu4Vsbsmz8SMHtVv3MTE6TlWJMksOT591LeJqLzdwc7gWZYRczzbY5btLbVyCUKkCShiiz8NU5xlLNtl0bXfxPrxyskz4ZzMZEFNM5ryjJUasT3pSnuYjcwdEK5ri6FaIArc24MWcmWJivb8NyLyabrksePJORoLbHRRnMbMmYM49JZPcUa2Hf7Jc4zIn0Mf4emMXWpKJPuR9qkhIIMaA8cWM9T3mgprc8Unb4LGqhDXptyr1dmEYKPZRSZv0ILKZSeSWZVYTZ1HWoSlVZFg5EiKj9QFBtthcCZdrz3sc8ugcdRboHXGEH5SSbfZVM9cy9JOtL2nO2ZDmMUx3dSZm8zXIBL5Upom8Z1PHkGIvXa8qgh4bJDzSTIyrLL9TCHSCiltWW2MzDkcZu0eBKyKrpJ9jiqnvLgBN6tsuDkhtBKk1IYi9MYEYE6YjiRSQhZubtUVYKopCpRM4igc8StqsjSRvZShI4ei2cYCW0yKQmhKq6QeiJFECrUHXrZK0fE3eAuXNunXrPcXK8daorknhmyOIXO2xa1CtEwCwAd4cfDx8sJ3RZIdSwnKTLqMQN38ovShWkiIGpFMpTi4LUYl32DMKwjodXGHslYJaSxv02qTo2kheqhcfqlQrEQQJ5tvdmLKGaoCKYiwAoVHgDS6QD6q21YTGzRzAYcdNR7RwVwY0LBH6Jd7hHRriexAG8uvEgLRdKEvVoAsowVVsgfaKNSQBQgV6bAHgVZYobCPbsY3OCbVxSyjcbWCtJ4u5dtpFrRE05hulVBaXXw24tsoT1139nWeiDllBxqFCOlWphpb77PhzUPegsI7YDPKiXFqQB7PsAvhHVNm9X04FN4PPHTRchrRmc8BQ1xfP36mVCMTN1W5zJovdFyvd5EG4kymc456KMefh7oOCZZP0JVDc4WiNmK1pdN492zBOTY1Jbsi5HQXSs5lpYHlH56AdvaAbeLSt1izQ2Lbk5V6UgiBL5AwgZmHRkdhAcZYW0elh5CjdKT1cg1ngJMbSHKl756Pyioc73ZQvdXOuwHK7ClMQ6VuNdJnoy0sExffYE1AKO4ORHJnbXwjqZnmNJuYP6FR3vzCv4OXgzg6sh24oy9ofUYX6VMdi90UabMOcA9YIzfTKCtA9dxgvVHnUigz97dkthRKk71br0wJTsv01Sk9KjD6CdNSoWaB1tm3T6ADV3dA9DKj2oX1YsFovUgcDtmkhjmC7hdKi6Up4PeIYlfvg8B3cZtncNy9bhWHHtOw1PtKLTcgPpJyW2a8SjsyNxnLLtJosf6gJLrPyYAA55q87bDTYS6iUHriTnrjlRTPGNqWRMsPWLRxKboaHKnCC8RBPAFOgiJhNROJ2rGDWrKSWn
aedxVczwoCswxbBmcCwr6YysIgDih5ewdgZvCWaR4gakVHyyMcVsFhOaR6UZRV4cXcREanMAbMPNnyrZXCybbVua5kgXMKbYehmM8qZZi8rhkbcjquAFQnEUjWYhL4vZk2PR8wk0TIMkRx0NWSAR8PeJkRx03JAj8aSsgMWtxCfX1yMR02IXYT5kEMKws1Cq8bUG1NMYfkzgcgKJeHpvZzWPmhhqSKUOZnqvng5RgkUjygToobecX80JbjD7rwezbeZaLy4Irl82fX1FUyrSu5xEwVTRpnkGG6BMLKqDdUhuTu7ZcmztjDuxP0w4MaE2yrbbkctbNsaHTEtIzjFbKdocRZvQjJlW2bn3uYKiGXoVo806B49oRmZToPgsUJ8i3wBXRDnfscilqCMfjsOcXo8i2PjLvXJbUZSlJbWeDAVzyBUrYQamdcZlKM1Aqnte2hwqus2J9fq4PbIxhXgDt2z6pEqhPHiiAdzoesKxW38g22DFSrnYwryekslCqlQhVkdAkoDULSlR5gz4SQyVobvWalGpAo2MEwmLaT70qItgCHv0TVGvmQRyTzK201LGrd8slwwmHl9yYiQgu2nrNC3WvetveHdAP7UlhnY1CtNG9tQFHzmefW6ohg9v0
END_OF_TEXT
Use the slow regex on it and note how slow it is:
text.gsub(/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/).to_a
Use the fast regex and note how fast it is:
text.gsub(/[^\w+.-](?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/).to_a
I figured out that this problem is specific to the type of data I used in the example (not a lot of spaces). If you run it against RFC 3986, which is much longer, both versions are equally fast.
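The first pattern is slow because it starts with an alternation whose first branch is very permissive: it allows any number of word characters, plus signs, dots, or hyphens. As a consequence, this alternation takes a lot of time/steps before failing. Inside a long run of such characters with no spaces (like the base64 payload in your sample), [\w+.-]+ first consumes the rest of the run at each start position, then gives characters back one at a time looking for ://, and the engine repeats all of that at the next position.
The second pattern is faster because (?:[^\w+.-]|^) (also an alternation) works as a kind of anchor. Even though it is an alternation, it is tested quickly: the first branch matches only one character and the second is a zero-width assertion. So it takes fewer steps to fail, in particular because it must be followed by a word character, a plus sign, a dot, or a hyphen, which is a binding condition. Inside a run of [\w+.-] characters, [^\w+.-] fails on the very first character, so almost every start position is rejected immediately.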
But you can write this pattern in a better way. Since you are looking for URLs, you can be more precise about the beginning: a URL can begin with, let's say, "http", "ftp", "sftp", "gopher", or "www" (feel free to add other schemes if needed).
So you can describe the start with:
(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)
To limit the cost of the alternation (five branches to test at each position in the string), you can use two tricks:
You can use a word boundary to quickly skip positions that are not the start or the end of a word:
\b(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)
You can add a lookahead with the first letter of each branch, to quickly skip unneeded positions in the string without testing all five branches:
\b(?=[fghsw])(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)
So you can write a more efficient pattern like this:
/\b(?=[fghsw])(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)[^\s<]+/
In short: a pattern is efficient when it fails fast at bad positions in the string.
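To see the effect without the original data, here is a rough sketch that times the question's slow pattern against the answer's anchored pattern on synthetic input. The input is one 5,000-character run with no spaces, loosely imitating the base64 payload, followed by one URL. The input size is an arbitrary assumption, and absolute figures depend on the machine and Ruby version. The point is only that the slow pattern retries every start position inside the run, while the anchored one rejects them almost immediately. Both should find the same single URL.

```ruby
require 'benchmark'

# The question's slow pattern and the answer's anchored pattern, written without
# /g (Ruby has no such flag; gsub and scan already find every match).
SLOW = /(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/
FAST = /\b(?=[fghsw])(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)[^\s<]+/

# Synthetic input (an assumption): a long space-free run, then one real URL.
text = ('A' * 5_000) + ' http://example.com'

slow_hits = fast_hits = nil
# gsub with only a pattern returns an Enumerator over the matched substrings.
slow_secs = Benchmark.realtime { slow_hits = text.gsub(SLOW).to_a }
fast_secs = Benchmark.realtime { fast_hits = text.gsub(FAST).to_a }

puts format('slow: %.4fs  fast: %.4fs', slow_secs, fast_secs)
p slow_hits, fast_hits
```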
Another possible design, which uses more memory and needs to check whether the capture group is set for each match, but which is faster:
/[^ghsfw]*+(?:\B[ghsfw][^ghsfw]*)*+|\b((?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)[^\s<"&]+)/
(The idea is to divide the pattern into two main branches: the first describes everything you want to skip, and the second describes the URLs. The effect is quick jumps to key positions in the string.)
Note: when patterns begin to be long, you can use the free-spacing mode (or comment mode...) for readability and maintainability:
/(?x)
\b (?=[fghsw])
(?:
https?:\/\/ |
ftp:\/\/ |
sftp:\/\/ |
gopher:\/\/ |
www\.
)
[^\s<]+/
or you can use a formatted string and a join as suggested by Cary Swoveland in comments.
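For example, here is a sketch of that approach, assuming a hypothetical PREFIXES list of regex fragments. The prefixes are joined with | and their distinct first letters are collected into the lookahead class, so adding a scheme only means editing the list. Because the entries are regex fragments (https? and www\.), they are deliberately not passed through Regexp.escape.

```ruby
# Hypothetical prefix list; each entry is a regex fragment, not a literal string.
PREFIXES = ['https?://', 'ftp://', 'sftp://', 'gopher://', 'www\.']

# Distinct first letters for the cheap lookahead; here "hfsgw".
first_letters = PREFIXES.map { |p| p[0] }.uniq.join

# Same shape as the hand-written pattern: boundary, lookahead, alternation, URL body.
URL_RE = /\b(?=[#{first_letters}])(?:#{PREFIXES.join('|')})[^\s<]+/

sample = 'See http://example.com and www.ruby-lang.org or ftp://files.example.org/pub'
p URL_RE                # inspect the assembled pattern
p sample.scan(URL_RE)   # => ["http://example.com", "www.ruby-lang.org", "ftp://files.example.org/pub"]
```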

scripting logic : matching patterns

I am trying to figure out regex/scripting logic to parse something like this:
RAW DATA
{CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0432243:271640:254100000}
Here, the values are:
MedGen = C0432243
OMIM = 271640
SNOMED_CT = 254100000
Result: 271640
I am envisaging a convoluted if-else loop to get the result. I just wanted to know if there is any simple way to get the same result. Much appreciate your answers.
Perhaps something like this (assuming there are always three fields):
(?<=[=:])(?<key>[^:;]+)(?=[:=;](?:[^:;=]+[=;:]){3}(?<val>[^:}]+))
The idea is to capture the field values inside a lookahead assertion so that they don't interfere with overlapping substrings.
However, there is probably a cleaner way that uses successive splits.
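A sketch of the split-based version, assuming the input is the single-line string shown in the question and that the first field lists the names while the second lists their values in the same order:

```ruby
str = '{CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0432243:271640:254100000}'

# 1. Drop the braces, split into "NAME=value" fields, and build a hash.
fields = str.delete('{}').split(';').map { |field| field.split('=', 2) }.to_h
# fields maps "CLNDSDB" to "MedGen:OMIM:SNOMED_CT"
# and "CLNDSDBID" to "C0432243:271640:254100000"

# 2. Pair each database name with the value in the same position.
names  = fields['CLNDSDB'].split(':')
values = fields['CLNDSDBID'].split(':')
lookup = names.zip(values).to_h

puts lookup['OMIM']  # prints 271640
```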
It's difficult to tell from the question whether the input string is two lines or one:
str = 'RAW DATA
{CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0432243:271640:254100000}
'
or
str = '{CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0432243:271640:254100000}'
But in either case, I'd use a simple pattern:
str = '{CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0432243:271640:254100000}'
medgen, omim, snomed_ct = str.match(/(\w+):(\w+):(\w+)}/).captures
medgen # => "C0432243"
omim # => "271640"
snomed_ct # => "254100000"
Here's the pattern at Rubular.
I am envisaging a convoluted if-else loop to get the result.
Well, don't do that. Most programming solutions are surprisingly simple, so start simple. As you learn, your programming toolbox will grow as you become familiar with new ways of doing things, and you'll find certain tools are more useful for certain tasks. Still, always start from "simple", get the basics working, then carefully add to handle the corner cases.
When using a regular expression, it's important to look for landmarks in the string that you can use to locate your target text. Here the trailing '}' is usable, so I wrote three simple captures to find \w strings separated by ':'.

How to build a regexp for matching word without `"."`

Suppose I have a string like this:
w="abc#name,xy.abc=abc"
I want to replace the first and the third "abc" with another string. I used this code:
puts w.gsub(/\babc\b/,"replacer");
# => replacer#name,xy.replacer=replacer
where the second "abc" is also replaced, which is not what I want. Then I changed to the following pattern:
puts w.gsub(/[^\.]\babc\b/,"replacer");
# => abc#name,xy.abcreplacer
where the first "abc" is not replaced, and the "=" before the last one is consumed. Now I have no idea how to fix it.
You can try
/\b(?<!\.)abc\b/
but it's a rather brute-force solution with negative look-behind.
Similar to Tass's answer, but using a negative lookahead:
/\babc\b(?!=)/
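Here is a quick check of both suggestions on the question's string. It is a sketch, and the replacement text is arbitrary. Note that the two patterns encode different rules: the lookbehind skips an "abc" preceded by ".", while the lookahead skips one followed by "=". For this input the two rules happen to coincide.

```ruby
w = "abc#name,xy.abc=abc"

# Lookbehind answer: skip "abc" when the character before it is "."
behind = w.gsub(/\b(?<!\.)abc\b/, "replacer")

# Lookahead answer: skip "abc" when the character after it is "="
ahead = w.gsub(/\babc\b(?!=)/, "replacer")

puts behind  # prints replacer#name,xy.abc=replacer
puts ahead   # prints replacer#name,xy.abc=replacer
```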
I'd simplify the regex and rely on gsub's ability to take a block:
target = 'abc'
replacement = 'foo'
'w="abc#name,xy.abc=abc"'.gsub(/#{ target }#|=#{ target }/) { |s| s.sub(target, replacement) }
=> "w=\"foo#name,xy.abc=foo\""
The patterns you want are simple:
<target>#
=<target>
Find those, then do a simple string substitution.
Doing it this way doesn't encapsulate all the logic in the regex; it breaks it into two separate steps, which simplifies the logic, speeds up development time, and results in code that's easier to maintain.
Regex is a powerful tool, but sometimes you don't need a complicated pneumatic hammer; you need a small, simple claw hammer and a screwdriver.

Matching keywords with sentence database, how to avoid duplicated keywords in results?

I'm very new to programming and am a beginner in Ruby. I've done a lot of searching to try to find the answers I need, but nothing seems to match what I'm looking for.
I need to make a program for work that will:
Get keywords from the user
Match those keywords with the same keywords in a database of sentences, and then
Spit out randomized sentences that:
contain all the keywords 1 time
do NOT contain keywords not listed
do NOT duplicate keywords
Important to know: Sentences all have a mix of several keywords, NOT one per sentence
1 & 2 are OK, I've been able to do those. My problem is with part 3. I've tried long lists of "if include?" parameters, but it never ends up working and I know there must be a better way to do this.
My grasp of Ruby (and programming generally) is basic and I don't really know what it can and can't do, so any tips or hints in what functions would be useful would be very very much appreciated.
If the match is found, why don't you consecutively pop it out of your array/db? It will ensure no duplication, since that record would not be present to be matched later. No?
Consider this snippet:
db = [%q(It is hot today), %q(It is going to rain), %q(Where are you, sonny?), %q(sentence contains is and are)]
keyw = %w(is am are)
de = []
keyw.each do |word|
  for index in 0...db.length
    if db[index].include?(word)
      puts "Matched #{word} with #{db[index]}"
      de << index
    end
  end
  # delete the matched sentences, highest index first, so earlier indices stay valid
  until de.empty?
    db.delete_at(de.pop)
  end
end
db is an example database and keyw contains the keywords.
Corresponding output:
Matched is with It is hot today
Matched is with It is going to rain
Matched is with sentence contains is and are
Matched are with Where are you, sonny?
No duplication. :)
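Here is a more idiomatic sketch of the same idea, using the answer's example data. It makes two changes beyond the answer. First, Array#partition splits off the matched sentences in one step instead of tracking indices. Second, keywords are matched as whole words with \b, so that, for example, "is" would not match inside "this".

```ruby
db = ['It is hot today', 'It is going to rain', 'Where are you, sonny?', 'sentence contains is and are']
keywords = %w[is am are]

remaining = db.dup   # leave the original list untouched
matches = []         # [keyword, sentence] pairs; each sentence is used at most once

keywords.each do |word|
  # Whole-word match so that "is" does not match inside, say, "this".
  pattern = /\b#{Regexp.escape(word)}\b/
  hit, remaining = remaining.partition { |sentence| sentence.match?(pattern) }
  hit.each do |sentence|
    puts "Matched #{word} with #{sentence}"
    matches << [word, sentence]
  end
end
```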

ALL CAPS to Normal case

I'm trying to find an elegant way to convert something like this
ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.
...to regular-case. I could more or less find all sentence-starting characters with:
(?<=^|(\. \"?)|(! ))[A-Z] #this regex sure should be more complex
but (standard) Ruby neither allows lookbehinds, nor is it possible to apply .capitalize to, say, gsub replacements. I wish I could do this:
"mytext".gsub(/my(regex)/, '\1'.capitalize)
but the current working solution would be to
"mytext".split(/\. /).each {|x| p x.capitalize } #but this solution sucks
First of all, notice that what you are trying to do will only be an approximation.
You cannot correctly tell where the sentence boundaries are. You can approximate them as the beginning of the entire string, or the position right after a period, question mark, or exclamation mark followed by spaces. But then, you will incorrectly capitalize "economy" in "U.S. economy".
You cannot correctly tell which words should be capitalized. For example, "John" will be "john".
You may want to do some natural language processing to get a close-to-correct result in many cases, but these methods are only probabilistically correct. You will never get a perfect result.
Understanding these limitations, you might want to do:
mytext.gsub(/.*?(?:[.?!]\s+|\z)/, &:capitalize)
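Running the one-liner on the question's sample shows the limitation noted above. String#capitalize upcases only the first character of each chunk, so a chunk that starts with a quote mark stays lowercase. The second line is a hypothetical variant that downcases the chunk and upcases its first letter instead. The sentence boundary after the closing quote is still not detected, so "this" stays lowercase.

```ruby
text = 'ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.'

# The answer's one-liner: each "sentence" chunk is passed to String#capitalize.
plain = text.gsub(/.*?(?:[.?!]\s+|\z)/, &:capitalize)

# Hypothetical variant: downcase the chunk, then upcase only its first letter,
# so a leading quote mark does not block the capitalization.
quoted = text.gsub(/.*?(?:[.?!]\s+|\z)/) { |s| s.downcase.sub(/[a-z]/, &:upcase) }

puts plain   # prints All caps text. "why anyone would use it?" this is ridiculous! Help.
puts quoted  # prints All caps text. "Why anyone would use it?" this is ridiculous! Help.
```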
