I have a project which receives a delimited string through SMS. I am tasked to split the string by colon (:) using the Split function. My SMS server receives the messages and my script processes it.
Sample code:
dim a
a = split(string,delimiter)
dim value
value = a(1)
Sample input (SMS message): abc:def ghi:jkl
Now when I split it, I was expecting value to return only def, but I get defghi instead. Why?
Your output is correct: Split() creates an array of substrings, which are determined by the delimiter provided.
The substring "def ghi" appears because "def" and "ghi" are separated by whitespace rather than by a colon.
If you don't want the whitespace, you can use Split again with no delimiter given; " " is the default used when one isn't provided.
e.g. split(value)
You could also try checking the received string for spaces and replacing any found with colons and then proceed as normal.
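The two suggestions above can be sketched as follows. This is an illustration in Ruby rather than VBScript, and the helper names are hypothetical; the VBScript version would call Split twice in the same way.

```ruby
# Hypothetical helper: first word of the n-th colon-delimited field.
def first_word_of_field(message, index)
  field = message.split(":")[index]   # "def ghi" when index is 1
  return nil if field.nil?
  field.split(" ").first              # split again on whitespace
end

# Alternative from the answer: turn spaces into colons, then split once.
def split_on_colon_or_space(message)
  message.tr(" ", ":").split(":")
end

puts first_word_of_field("abc:def ghi:jkl", 1)   # prints "def"
p split_on_colon_or_space("abc:def ghi:jkl")     # prints ["abc", "def", "ghi", "jkl"]
```

Note that the second approach changes the field positions: "jkl" moves from index 2 to index 3.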
Has there ever been an implementation of the field function (page 311) in the various flavors of Pick/UniBasic etc. that would operate on a delimiter of more than one character?
The documented implementations I can find stipulate one character as the delimiter argument and if the delimiter is presented with more than one character, the first character of the delimiter string is used instead of the entire string as a delimiter.
I am asking this because there are many instances in the commercial and custom software I maintain where I see attempts to use a multi-character delimiter with the field statement. It seems programmers were using it expecting a different result than is currently happening.
jBASE does allow for this. From the FIELD docs:
This function returns a multi-character delimited field from within a string. It takes the general form:
FIELD(string, delimiter, occurrence{, extractCount})
where:
string specifies the string, from which the field(s) is to be extracted.
delimiter specifies the character or characters that delimit the fields within the dynamic array.
occurrence should evaluate to an integer of value 1 or higher. It specifies the delimiter used as the starting point for the extraction.
extractCount is an integer that specifies the number of fields to extract. If omitted, assumes one.
Additionally, an example from the docs:
in_Value = "AAAA : BBjBASEBB : CCCCC"
CRT FIELD(in_Value , "jBASE", 1)
Producing output:
AAAA : BB
Update 2020-08-13 (adding context for OpenQM):
As an official comment since we maintain both jBASE and OpenQM, I felt it worth calling out that OpenQM does not allow multi-character delimiters for FIELD().
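To make the difference concrete, here is a rough Ruby emulation of the two behaviours discussed in this thread. It is only a sketch based on the description quoted above, not the actual jBASE or OpenQM implementation, and both method names are hypothetical.

```ruby
# Hypothetical emulation of FIELD(string, delimiter, occurrence, extractCount)
# with a multi-character delimiter, following the quoted description.
def field_multi(string, delimiter, occurrence, extract_count = 1)
  # Split on the literal delimiter; -1 keeps trailing empty fields.
  parts = string.split(Regexp.new(Regexp.escape(delimiter)), -1)
  (parts[occurrence - 1, extract_count] || []).join(delimiter)
end

# Variant mimicking the single-character behaviour described in the question:
# only the first character of the delimiter string is used.
def field_first_char(string, delimiter, occurrence, extract_count = 1)
  field_multi(string, delimiter[0], occurrence, extract_count)
end

puts field_multi("AAAA : BBjBASEBB : CCCCC", "jBASE", 1)   # prints AAAA : BB
puts field_multi("a::b::c", "::", 2)                        # prints b
puts field_first_char("a::b::c", "::", 2).inspect           # prints ""
```

With "::" as the delimiter, the first-character variant sees an empty field between the two colons, which is the kind of surprise a programmer expecting multi-character support would run into.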
I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like a case structure: the first alternative that matches consumes the corresponding string, and no further matching is done on it. Thus, ###.*?### will consume a whole marker (like ###hl###); nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then we match any other character (i.e. not inside a marker, and not a word character) and replace it with "X".
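As a small sketch of the same pattern on a different input, the method below wraps it in a helper. The name mask_non_word and the replacement parameter are hypothetical, added only for illustration.

```ruby
# Hypothetical helper wrapping the pattern from the answer.
def mask_non_word(text, replacement = "X")
  # At each position the alternatives are tried from left to right:
  #   1. ###.*?###  a whole marker, captured and kept
  #   2. \w+        a run of word characters, captured and kept
  #   3. .          any other single character, replaced
  text.gsub(/(###.*?###|\w+)|./) { $1 || replacement }
end

puts mask_non_word("a-b ###hl###c d###endhl### e#f")
# prints aXbX###hl###cXd###endhl###XeXf
```

Because . does not match a newline by default, line breaks in the input are left untouched by this pattern.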
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use (let's say your input never contains "\r", so we can use that character as a temporary replacement).
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
Could anyone help me here to understand when we need to consider the
below 4 methods:
strict_decode64(str)
strict_encode64(bin)
urlsafe_encode64(bin)
urlsafe_decode64(str)
From the doc also I didn't get any examples. So examples with
explanation might be helpful for me to understand.
Thanks in advance
An example of usage would be:
require "base64"
Base64.strict_encode64('Stuff to be encoded')
Base64.strict_decode64("U3R1ZmYgdG8gYmUgZW5jb2RlZA==")
Strict means that whitespace and CR/LF are rejected at decode and that CR/LF are not added at encode.
Note that the following is accepted:
Base64.decode64("U3R1ZmYgdG8gYmUgZW5jb2RlZA==\n")
With strict, the above is not accepted because of the trailing \n (linefeed), and the following line will raise ArgumentError (invalid base64):
Base64.strict_decode64("U3R1ZmYgdG8gYmUgZW5jb2RlZA==\n")
So strict accepts only characters from the Base64 alphabet (A-Z, a-z, 0-9, +, / and the = padding) at decode and returns only those characters at encode.
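A minimal sketch of that difference, using a hypothetical strict_base64? helper that only reports whether strict decoding accepts the input:

```ruby
require "base64"

# Hypothetical helper: true if strict_decode64 accepts the text.
def strict_base64?(text)
  Base64.strict_decode64(text)
  true
rescue ArgumentError
  # strict_decode64 raises ArgumentError on anything outside the alphabet
  false
end

encoded = Base64.strict_encode64("Stuff to be encoded")

puts encoded                           # prints U3R1ZmYgdG8gYmUgZW5jb2RlZA==
puts strict_base64?(encoded)           # prints true
puts strict_base64?(encoded + "\n")    # prints false
puts Base64.decode64(encoded + "\n")   # prints Stuff to be encoded
```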
Please try the following and see how encode64 wraps the output with '\n' (linefeed) after every 60 encoded characters, while strict_encode64 does not:
print Base64.encode64('I will not use spaces and new lines. I will not use spaces and new lines. I will not use spaces and new lines. I will not use spaces and new lines.I will not use spaces and new lines.')
print Base64.strict_encode64('I will not use spaces and new lines. I will not use spaces and new lines. I will not use spaces and new lines. I will not use spaces and new lines.I will not use spaces and new lines.')
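If you would rather check the wrapping programmatically than by eye, something like the following should work. The longest_line helper is hypothetical, and the text is just the sentence above repeated.

```ruby
require "base64"

# Hypothetical helper: length of the longest line in an encoded string.
def longest_line(encoded)
  encoded.lines.map { |line| line.chomp.length }.max
end

text = "I will not use spaces and new lines. " * 5

wrapped = Base64.encode64(text)          # line feeds inserted by encode64
strict  = Base64.strict_encode64(text)   # one unbroken line

puts longest_line(wrapped)               # prints 60
puts strict.include?("\n")               # prints false
puts wrapped.delete("\n") == strict      # prints true
```

Apart from the line feeds, both methods produce the same characters.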
The _encode and _decode do opposite things: the first one converts a normal string into an encoded string, and the second one converts an encoded string into a normal string.
str = "Hello!"
str == Base64.decode64(Base64.encode64(str)) # This is true
The difference between strict_ and urlsafe_ is the set of characters used in the encoded string. When you need to pass your string inside a URL, not all characters are allowed (/ for instance, because it has a special meaning in URLs), so you should use the urlsafe_ version, which uses - and _ in place of + and /.
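A short sketch of the alphabet difference. The two input bytes are chosen only because they happen to encode to + and / in the standard alphabet:

```ruby
require "base64"

# Two bytes picked so that the standard alphabet yields "+" and "/".
bytes = "\xFB\xFF".b

puts Base64.strict_encode64(bytes)              # prints +/8=
puts Base64.urlsafe_encode64(bytes)             # prints -_8=
puts Base64.urlsafe_decode64("-_8=") == bytes   # prints true
```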
A word processor program features a search and replace function. However, partial words (character combinations found within words) are also replaced. To fix this, I plan to remove extra spaces and use the split function to change the string into an array of words by using " " as a delimiter.
However, once I search through the array, replace the appropriate words, and put the array back into a string separated by spaces, the original formatting of the user will be lost. For example, if the original string was "This is a sentence." and the user wanted "a" to be replaced with "the", the output will be "This is the sentence.", with no additional spaces.
So, my question is whether there is any way to search and replace entire words only while still preserving the formatting (extra spaces) of the user in Visual Basic.
What about using a regex?
In a regex the code \b is a word boundary so for example the regex \ba\b will match a only when a is a whole word.
So for example your code would be:
' Assumes a project reference to "Microsoft VBScript Regular Expressions 5.5"
Dim strPattern As String: strPattern = "\ba\b"
Dim regex As New RegExp
regex.Global = True
regex.Pattern = strPattern
Dim result As String
result = regex.Replace("This is a sentence.", "the")
If you use the Split function without removing your extra spaces first, your array will have empty items in it, so you would not lose the extra spaces and can reconstruct your document with the original formatting intact.
Why is your formatting lost? If you split the text by space, just attach a space after each element when composing it back from an array. But you will also have to take into account words that end not with a space but punctuation.
In "This is a simple sentence, eh?", "eh" will be stored as "eh?" because you split by space. So you will have to program a complex punctuation-friendly formula or simply use regex. Be prepared - regex is... tricky.
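For comparison, the word-boundary idea can be sketched in a few lines. This example is in Ruby rather than Visual Basic, the helper name is hypothetical, and it assumes the search term starts and ends with a word character.

```ruby
# Hypothetical helper: replace whole-word matches only, leaving spacing as-is.
def replace_whole_word(text, word, replacement)
  # Regexp.escape guards against special characters in the search term;
  # the block form inserts the replacement literally.
  text.gsub(/\b#{Regexp.escape(word)}\b/) { replacement }
end

puts replace_whole_word("This  is   a    sentence, a big  banana.", "a", "the")
# prints This  is   the    sentence, the big  banana.
```

The extra spaces and the punctuation survive because the text is never split and rejoined.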
I want to fetch all the strings between _(" ") from my file.
How may I fetch that?
Assuming there are no quotation marks nested within the string you're looking for, you want to load the file into a string:
str=File.read("/path/to/file")
Then scan the string using a regular expression. The following regular expression should do the trick. It looks for the characters _(" (the opening parenthesis here is escaped, because parentheses have a special meaning in regular expressions). The next parenthesis starts a capturing group (so that the text of the string will be stored in the special variable $1). Then it matches a run of consecutive characters up to the first quotation mark. Then it ends the capturing group (with an unescaped closing parenthesis) and looks for ") to finish the expression.
/_\("([^"]*)"\)/
To use it:
str.scan( /_\("([^"]*)"\)/ ) do
  puts $1
end
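If you would rather collect the matches than print them, a small variation could look like this. The helper name and the sample line are made up for illustration; a real run would pass in the contents of your file.

```ruby
# Hypothetical helper: collect every string wrapped in _("...").
def gettext_strings(source)
  # scan returns one array per match (one element per capturing group),
  # so flatten turns [["Hello"], ["Goodbye, friend"]] into a flat list.
  source.scan(/_\("([^"]*)"\)/).flatten
end

sample = 'puts _("Hello") + name + _("Goodbye, friend")'
p gettext_strings(sample)   # prints ["Hello", "Goodbye, friend"]
```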