gsub or sub out the first '_' but not the following - gsub

I have mistakenly linked up some columns with _ but now I need to split them on just the first _.
For example, some names are plate_Vancouver and others are plate_1_Vancouver_east; I want to split them so that col 1 is plate and col 2 is Vancouver or 1_Vancouver_east.
I was going to attempt it by replacing the first _ with / then splitting on /.
sub("\\_", "_", combined, fixed = TRUE) #remove '_' and convert to '/'
but something is wrong with the code I think?

You can use stringr::str_split to split at the first underscore into two substrings:
x <- c("plate_Vancouver", "plate_1_Vancouver_east")
library(stringr)
str_split(x, "_", n = 2)
Output:
[[1]]
[1] "plate" "Vancouver"
[[2]]
[1] "plate" "1_Vancouver_east"
Or, you can use two sub calls:
sub("_.*", "", x)
## => [1] "plate" "plate"
sub("^[^_]*_", "", x)
## => [1] "Vancouver" "1_Vancouver_east"
See the R demo.
Here, sub("_.*", "", x) removes the first _ and everything after it, and sub("^[^_]*_", "", x) removes everything up to and including the first _.
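For comparison, the same "split on the first separator only" idea can be sketched in Python. This is an illustration rather than part of the R answer; the helper name split_on_first is made up for this example, and it relies only on the built-in str.partition.

```python
def split_on_first(name: str, sep: str = "_") -> tuple[str, str]:
    """Split `name` at the first occurrence of `sep` only (hypothetical helper)."""
    # str.partition returns (text before sep, sep, text after sep);
    # if sep is absent, the tail is an empty string.
    head, _, tail = name.partition(sep)
    return head, tail


names = ["plate_Vancouver", "plate_1_Vancouver_east"]
pairs = [split_on_first(n) for n in names]
print(pairs)  # [('plate', 'Vancouver'), ('plate', '1_Vancouver_east')]
```

Later underscores stay in the second part, which mirrors what str_split(x, "_", n = 2) does in the answer.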


Don't understand leetcode question about isomorphic strings

In the LeetCode problem "Isomorphic Strings", the input shown after the examples below is expected to output false.
Here is the question...
Given two strings s and t, determine if they are isomorphic.
Two strings s and t are isomorphic if the characters in s can be replaced to get t.
All occurrences of a character must be replaced with another character while preserving the order of characters. No two characters may map to the same character, but a character may map to itself.
Example 1:
Input: s = "egg", t = "add"
Output: true
Example 2:
Input: s = "foo", t = "bar"
Output: false
Example 3:
Input: s = "paper", t = "title"
Output: true
"bbbaaaba"
"aaabbbba"
This is not clear to me, because if we follow the instructions I believe we should get true as the answer.
My understanding of the problem is that any character x from s can only be replaced by a character y from t if y is said to replace x.
First off, strings s and t must be of the same length, otherwise return false
We know that x is replaced by ya if x has not been replaced by any other character yb where ya != yb
We also know that from the instructions we can freely replace any character with the same character from both strings.
The replacements must occur while preserving the order of characters
For the example I gave above, a walkthrough of my solution would be as follows...
s = "bbbaaaba"
t = "aaabbbba"
for index = 0, replace b with a, so b -> a
for index = 1, replace b with a, so b -> a # we can do this because a has already replaced b
for index = 2, replace b with a, so b -> a # same reasoning as above
for index = 3, replace a with b, so a -> b # we have not replaced any characters from s with the character b, so this is allowed
for index = 4, replace a with b, so a -> b
for index = 5, replace a with b, so a -> b
for index = 6, replace b with b # this is allowed because both characters from each string s and t match at the current index
for index = 7, replace a with a # same reasoning as before
After finishing we should return true successfully, so how come the answer is supposed to be false?
All occurrences of a character must be replaced with another character.
If you choose to replace b with a, all b in the original string need to be changed to a. Therefore, the string would be aaabbbab instead of the desired answer. The answer is therefore false.
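A minimal Python sketch of this rule, assuming the usual two-dictionary approach (one mapping in each direction); it is one possible way to check the condition, not the reference solution:

```python
def is_isomorphic(s: str, t: str) -> bool:
    """Return True if a consistent one-to-one character mapping turns s into t."""
    if len(s) != len(t):
        return False
    forward = {}   # character in s -> character in t
    backward = {}  # character in t -> character in s
    for x, y in zip(s, t):
        # setdefault stores the pairing the first time it is seen and
        # returns the stored pairing afterwards, so any mismatch means
        # the same character would need two different replacements.
        if forward.setdefault(x, y) != y:
            return False
        if backward.setdefault(y, x) != x:
            return False
    return True


print(is_isomorphic("egg", "add"))            # True
print(is_isomorphic("foo", "bar"))            # False
print(is_isomorphic("bbbaaaba", "aaabbbba"))  # False
```

For "bbbaaaba" and "aaabbbba" the check fails at index 6: b was already mapped to a, so it cannot also map to b.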

Find first unique character in a string in Elixir [closed]

Given a string that contains only lowercase English letters, I am writing an Elixir function that finds the first non-repeating character in it and returns its index or else -1.
Examples:
s = "leetcode"
should return 0 because "l" is the first character that does not repeat and the zero-based index is 0
s = "loveleetcode"
should return 2 because v is the first character that does not repeat and the zero-based index is 2
The following is my solution so far, can you make it better or fix it?
defmodule Algos do
def first_unique_char_index(str) do
arr = String.split(str, "", trim: true)
indexes = Enum.with_index(arr)
first = Enum.frequencies(arr)
|> Map.to_list
|> Enum.sort(fn ({a,_b}, {c,_d}) ->
{_char1, i1} = Enum.find(indexes, (fn {x,_i} -> x == a end))
{_char2, i2} = Enum.find(indexes, (fn {y,_j} -> y == c end))
i1 <= i2
end)
|> Enum.find(fn {_char, num} -> num == 1 end)
case first do
{char, _num} ->
result = Enum.find(indexes, fn {x, _i} -> char == x end)
{_letter, index} = result
index
nil ->
-1
end
end
end
Algos.first_unique_char_index("aabcc") # returns 2
Algos.first_unique_char_index("picadillo") # returns 0
Algos.first_unique_char_index("dood") # returns -1
As a side note, the problem is from the "first unique character in a string" LeetCode puzzle.
The below is probably the most performant solution; I decided to put it here because it reveals several interesting tricks.
"leetcode"
|> to_charlist()
|> Enum.with_index() # we need index to compare by
|> Enum.reduce(%{}, fn {e, i}, acc ->
# trick for the future: `:many > idx` for any integer `idx` :)
Map.update(acc, e, {e, i}, &{elem(&1, 0), :many})
end)
|> Enum.sort_by(&elem(elem(&1, 1), 1)) # sort to get a head
|> case do
[{_, {_, :many}} | _] -> "All dups"
[{_, {result, index}} | _] -> {<<result>>, index}
_ -> "Empty input"
end
#⇒ {"l", 0}
This is a good little puzzle, and one that could be solved via a couple accumulators. Instead of splitting the string, you could work with the internal binary representation, or (in order to skip the extra complexity involved with encoding) you could convert the string to a character list and focus on the integer components.
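Here's a possible solution (not thoroughly tested; note that it only compares each character with the characters after it, so an input such as "aabb", where the repeat comes earlier in the string, returns 1 rather than -1):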
defmodule FirstUniq do
def char(string) do
[first_char | rest] = to_charlist(string)
eval_char(first_char, 0, rest, rest)
end
# Case where we hit the end of the string without a duplicate!
defp eval_char(_char, index, [], _), do: index
# Case where a character repeats... increment the index and eval next char
defp eval_char(char, index, [x | _], [next_char | rest]) when char == x do
eval_char(next_char, index + 1, rest, rest)
end
# Case where the character does not repeat: keep looking
defp eval_char(char, index, [x | rest], acc2) when char != x do
eval_char(char, index, rest, acc2)
end
end
# should be 0 (because "l" does not occur more than once)
IO.puts(FirstUniq.char("leetcode"))
# should be 2 (because "v" is the first char that does not repeat)
IO.puts(FirstUniq.char("loveleetcode"))
The hard work is done by the eval_char/4 function, whose multiple clauses act something like a case statement. The trick is we have to keep two accumulators, which is analogous to having a nested loop.
I would recommend Exercism's Elixir Track for presenting many of the common patterns that you'll encounter in the language.
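For reference, the count-then-scan idea behind these answers can be sketched in Python. This is a minimal sketch assuming the problem statement from the question (return the zero-based index, or -1 if every character repeats); the function name simply mirrors the one in the question.

```python
from collections import Counter


def first_unique_char_index(s: str) -> int:
    """Return the index of the first character that occurs exactly once, or -1."""
    counts = Counter(s)  # first pass: how often each character occurs
    for index, ch in enumerate(s):  # second pass: first character seen once
        if counts[ch] == 1:
            return index
    return -1


print(first_unique_char_index("leetcode"))      # 0
print(first_unique_char_index("loveleetcode"))  # 2
print(first_unique_char_index("dood"))          # -1
```

Counting over the whole string first means that repeats both before and after a character are taken into account.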

How to remove multiple consecutive white space in Prolog using only get0 predicate

I have a program that reads a string from the input and replaces multiple blank spaces between words with a single white space, using both the get0 and get predicates:
squeeze :- get0(C),
put(C),
dorest(C).
dorest(46).
dorest(32) :- !,
get(C),
put(C),
dorest(C).
dorest(Letter) :- squeeze.
This is pretty simple. Now I have an exercise that asks me to create a new version of the previous program that uses only the get0 built-in predicate.
I am having some difficulties with this version.
This is my own solution to the problem (which doesn't work well):
squeeze2 :- NumBlank is 0, % At the beginning of a word the number of blank character is 0
get0(Char), % Read a character that could be a blank
%put(Char),
dorest2(Char, NumBlank).
dorest2(46, _) :- !. % If the read character is a full stop character the program end
% Read a white space but the number of white space is 1
dorest2(32, NumBlank) :- !,
NumBlankUpdated is NumBlank + 1, % Update the number of blanks
NumBlankUpdated < 2, % The number of blank have to be at most 1
put(32), % Print a white space
get0(Char), % Read a character
dorest2(Char, NumBlankUpdated). % Call dorest2
% Read a white space but the number of white space is > 1:
dorest2(32, NumBlank) :- !,
NumBlankUpdated is NumBlank + 1, % Update the number of blanks
NumBlankUpdated >= 2, % The number of blanks is >1
get0(Char), % Read a character and don't print anything
dorest2(Char, NumBlankUpdated). % Call dorest2
% If the character read is a letter (not a blank), print it
dorest2(Letter2, NumBlank) :- !,
put(Letter2),
squeeze2. % Read an other character that could be a blank
My idea for solving it using only the get0 predicate is to count the number of white spaces and do different things based on that value.
The squeeze2 predicate is called when a new word begins, so the number of consecutive white spaces found is 0. It reads a character from the input and calls the dorest2/2 predicate.
I have divided the dorest2/2 predicate into 4 different cases, and by using the cut operator these cases are meant to be mutually exclusive (like a procedural if):
1) The FIRST CASE handles reading a full stop character (the '.' character), which marks the end of the program.
2) The SECOND CASE handles reading the first blank character between 2 words, so this single white space has to be printed by the put predicate. In this case the counter of sequential white characters is updated.
Then another character is read.
3) The THIRD CASE handles the situation in which the program reads a second consecutive white character. In this case the white character is not printed, another character is read, and the white character counter is updated with the new number of sequential white characters found.
4) The FOURTH CASE handles the situation in which the program reads a character that is neither a white space nor a full stop, so this character has to be a letter, which means that a new word is beginning. So it simply prints this letter (using put) and calls the squeeze2 predicate, which resets the sequential white character counter to 0 and reads a new character.
The problem is that it doesn't work with multiple consecutive blank characters.
Queries like these work well:
STRING WITHOUT BLANK CHARACTERS:
[debug] [2] ?- squeeze2.
|: ciao.
ciao
true.
This works well.
STRING THAT CONTAINS ONLY SINGLE WHITE CHARACTERS BETWEEN WORDS:
[debug] [2] ?- squeeze2.
|: ciao.
ciao
true.
This also works well.
But in this situation I get an error:
STRING THAT CONTAINS MULTIPLE WHITE CHARACTERS BETWEEN WORDS:
[debug] [2] ?- squeeze2.
| multiple blanks characters.
multiple
false.
ERROR: Syntax error: Operator expected
ERROR: blanks
ERROR: ** here **
ERROR: characters .
It seems that the problem is in the THIRD CASE, but I don't understand where the error is, because this case seems very simple to me: if the counter of consecutive white characters is > 1, then don't print anything and continue reading until a new word begins.
Where is the problem? Can someone help me?
Thanks
Andrea
Here is some working code, which also shows an alternative way of representing character constants instead of numeric codes:
squeeze2 :-
get0(C),
squeeze2(C, 0).
squeeze2(C, N) :-
( [C] == "."
-> true
; ( [C] == " "
-> ( N == 0
-> put(C)
; true
),
get0(D),
squeeze2(D, 1)
; put(C),
get0(D),
squeeze2(D, 0)
)
).
It boils down to a single character lookahead.
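About your code: the problem is the cut in the first clause that matches a space. Once the cut commits to that clause, the test NumBlankUpdated < 2 fails on the second blank and Prolog can no longer backtrack into the next clause, so the whole call fails: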
dorest2(32, NumBlank) :- !, % remove this
...
After removing the cut it works:
?- squeeze2.
|: a b c .
a b c
true .
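The single-character lookahead idea can also be sketched outside Prolog. Below is a minimal Python version, assuming the same conventions as the answer's squeeze2 (input ends at the first full stop, which is not echoed); the flag previous_was_space plays the role of the N argument and is a name chosen for this illustration.

```python
def squeeze(text: str) -> str:
    """Collapse runs of spaces into one space, reading up to the first full stop."""
    out = []
    previous_was_space = False  # corresponds to N in squeeze2(C, N)
    for ch in text:
        if ch == ".":
            break  # the full stop ends the input and is not echoed
        if ch == " ":
            # Echo only the first space of a run, skip the rest.
            if not previous_was_space:
                out.append(ch)
            previous_was_space = True
        else:
            out.append(ch)
            previous_was_space = False
    return "".join(out)


print(squeeze("multiple   blanks    characters."))  # multiple blanks characters
```

Only one piece of state is needed: whether the previous character was a space.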

Very simple sexp parser

For an assignment, we had to implement something like a very basic sexp parser, such that for input like:
"((a b) ((c d) e) f)"
It would return:
[["a", "b"], [["c", "d"], "e"], "f"]
Since this was part of a larger assignment, the parser is only given valid input (matching parens &c). I came up with the following solution in Ruby:
def parse s, start, stop
tokens = s.scan(/#{Regexp.escape(start)}|#{Regexp.escape(stop)}|\w+/)
stack = [[]]
tokens.each do |tok|
case tok
when start
stack << []
when stop
stack[-2] << stack.pop
else
stack[-1] << tok
end
end
return stack[-1][-1]
end
Which may not be the best solution, but it does the job.
Now, I'm interested in an idiomatic Haskell solution for the core functionality (i.e. I don't care about the lexing or choice of delimiters, taking already lexed input would be fine), if possible using only "core" haskell, without extensions or libs like parsec.
Note that this is NOT part of the assignment, I'm just interested in the Haskell way of doing things.
[["a", "b"], [["c", "d"], "e"], "f"]
Does not have a valid type in Haskell (because all elements of a list need to be of the same type), so you'll need to define your own data structure for nested lists like this:
data NestedList = Value String | Nesting [NestedList]
Now if you have a list of Tokens where Token is defined as data Token = LPar | RPar | Symbol String, you can parse that into a NestedList like this:
parse = fst . parse'
parse' (LPar : tokens) =
let (inner, rest) = parse' tokens
(next, outer) = parse' rest
in
(Nesting inner : next, outer)
parse' (RPar : tokens) = ([], tokens)
parse' ((Symbol str) : tokens) =
let (next, outer) = parse' tokens in
(Value str : next, outer)
parse' [] = ([],[])
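The same recursive structure can be sketched in Python for comparison. This is an illustrative sketch that assumes valid input, as the question does; the names tokenize and parse_items are made up here, and plain strings stand in for the Token type. Like parse' above, the helper returns both the parsed items and the position where parsing stopped.

```python
import re


def tokenize(text):
    """Split the input into '(' , ')' and atom tokens (hypothetical lexer)."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse_items(tokens, pos):
    """Parse items starting at `pos`; return (items, position after them)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "(":
            # Recurse for the nested list, then continue after its ')'.
            inner, pos = parse_items(tokens, pos + 1)
            items.append(inner)
        elif tok == ")":
            # End of the current nesting level: hand back the rest.
            return items, pos + 1
        else:
            items.append(tok)
            pos += 1
    return items, pos


def parse(tokens):
    """Return the list of top-level items, discarding the final position."""
    items, _ = parse_items(tokens, 0)
    return items


print(parse(tokenize("((a b) ((c d) e) f)"))[0])
# [['a', 'b'], [['c', 'd'], 'e'], 'f']
```

As in the Haskell version, parse returns a list of top-level items, so the single outer expression is its first element.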
The idiomatic way in Haskell would be to use parsec, for combinator parsing.
There are lots of examples online, including,
This nice answer on SO.
Or here's another one.
While fancier parsers like Parsec are nice, you don't really need all that power
for this simple case. The classic way to parse is using the ReadS
type from the Prelude. That is also the way you would give your Sexp type a
Read instance.
It's good to be at least a little familiar with this style of
parsing, because there are quite a few examples of it in
the standard libraries.
Here's one simple solution, in the classic style:
import Data.Char (isSpace)
data Sexp = Atom String | List [Sexp]
deriving (Eq, Ord)
instance Show Sexp where
show (Atom a ) = a
show (List es) = '(' : unwords (map show es) ++ ")"
instance Read Sexp where
readsPrec n (c:cs) | isSpace c = readsPrec n cs
readsPrec n ('(':cs) = [(List es, cs') |
(es, cs') <- readMany n cs]
readsPrec _ (')':_) = error "Sexp: unmatched parens"
readsPrec _ cs = let (a, cs') = span isAtomChar cs
in [(Atom a, cs')]
readMany :: Int -> ReadS [Sexp]
readMany _ (')':cs) = [([], cs)]
readMany n cs = [(e : es, cs'') | (e, cs') <- readsPrec n cs,
(es, cs'') <- readMany n cs']
isAtomChar :: Char -> Bool
isAtomChar '(' = False
isAtomChar ')' = False
isAtomChar c = not $ isSpace c
Note that the Int parameter to readsPrec,
which usually indicates operator precedence, is not
used here.

In Erlang, when do I use ; or , or .?

I have been trying to learn Erlang and have been running into some problems with ending lines in functions and case statements.
When do I use a semicolon (;), comma (,), or period inside my functions or case statements?
I like to read semicolon as OR, comma as AND, full stop as END. So
foo(X) when X > 0; X < 7 ->
Y = X * 2,
case Y of
12 -> bar;
_ -> ook
end;
foo(0) -> zero.
reads as
foo(X) when X > 0 *OR* X < 7 ->
Y = X * 2 *AND*
case Y of
12 -> bar *OR*
_ -> ook
end *OR*
foo(0) -> zero *END*
This should make it clear why there is no ; after the last clause of a case.
Comma at the end of a line of normal code.
Semicolon at the end of each clause of a case statement, if statement, etc., except the last one.
The last clause of a case or if statement doesn't have anything at the end.
A period at the end of a function.
example (sorry for the random variable names, clearly this doesn't do anything, but illustrates a point):
case Something of
ok ->
R = 1, %% comma, end of a line inside a case
T = 2; %% semi colon, end of a case, but not the end of the last
error ->
P = 1, %% comma, end of a line inside a case
M = 2 %% nothing, end of the last case
end. %% period, assuming this is the end of the function, comma if not the end of the function
Period (.)
In modules, the period is used to terminate module attributes and function declarations (a.k.a. 'forms'). You can remember this because forms aren't expressions (no value is returned from them), and therefore the period represents the end of a statement.
Keep in mind that definitions of functions with different arities are considered separate statements, so each would be terminated by a period.
For example, the function definitions for hello/0 and hello/1:
hello() -> hello_world.
hello(Greeting) -> Greeting.
(Note that in the erlang shell the period is used to terminate and evaluate expressions, but that is an anomaly.)
Semicolon (;)
The semicolon acts as a clause separator, both for function clauses and expression branches.
Example 1, function clauses:
factorial(0) -> 1;
factorial(N) -> N * factorial(N-1).
Example 2, expression branches:
if X < 0 -> negative;
X > 0 -> positive;
X == 0 -> zero
end
Comma (,)
The comma is an expression separator. If a comma follows an expression, it means there's another expression after it in the clause.
hello(Greeting, Name) ->
FullGreeting = Greeting ++ ", " ++ Name,
FullGreeting.
You can think of it like English punctuation. Commas are used to separate things in a series, semicolons are used to separate two very closely related independent clauses (e.g. the different cases of the case statement, function clauses of the same name and arity that match different patterns), and periods are used to end a sentence (complete thought).
Or to prove you went to college. "Do not use semicolons. They are transvestite hermaphrodites representing absolutely nothing. All they do is show you've been to college." -- Kurt Vonnegut
The comma separates expressions, or arguments, or elements of a list/tuple or binary. It is overworked.
