Join a string using delimiters - algorithm

What is the best way to join a list of strings into a single delimited string? I'm mainly concerned with when to stop adding the delimiter. I'll use C# for my examples, but I would like this to be language-agnostic.
EDIT: I have not used StringBuilder, to keep the code slightly simpler.
Use a For Loop
for (int i = 0; i < list.Length; i++)
{
    result += list[i];
    if (i != list.Length - 1)
        result += delimiter;
}
Use a For Loop, setting the first item beforehand
result = list[0];
for (int i = 1; i < list.Length; i++)
    result += delimiter + list[i];
These won't work for an IEnumerable, where you don't know the length of the list beforehand, so:
Using a foreach loop
bool first = true;
foreach (string item in list)
{
    if (!first)
        result += delimiter;
    result += item;
    first = false;
}
Variation on a foreach loop
From Jon's solution
StringBuilder builder = new StringBuilder();
string delimiter = "";
foreach (string item in list)
{
    builder.Append(delimiter);
    builder.Append(item);
    delimiter = ",";
}
return builder.ToString();
Using an Iterator
Again from Jon
using (IEnumerator<string> iterator = list.GetEnumerator())
{
    if (!iterator.MoveNext())
        return "";
    StringBuilder builder = new StringBuilder(iterator.Current);
    while (iterator.MoveNext())
    {
        builder.Append(delimiter);
        builder.Append(iterator.Current);
    }
    return builder.ToString();
}
What other algorithms are there?

It's impossible to give a truly language-agnostic answer here as different languages and platforms handle strings differently, and provide different levels of built-in support for joining lists of strings. You could take pretty much identical code in two different languages, and it would be great in one and awful in another.
In C#, you could use:
StringBuilder builder = new StringBuilder();
string delimiter = "";
foreach (string item in list)
{
    builder.Append(delimiter);
    builder.Append(item);
    delimiter = ",";
}
return builder.ToString();
This will prepend a comma on all but the first item. Similar code would be good in Java too.
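The same "emit the current separator, then switch it to the real delimiter" trick carries over to other languages almost line for line. A minimal Python sketch, where a list of parts stands in for the StringBuilder and `join_with` is just an illustrative name:

```python
def join_with(items, delimiter=","):
    """Join strings by emitting a separator *before* each item.

    The separator starts out empty and becomes the real delimiter after the
    first item, so nothing is emitted before the first item or after the last.
    """
    parts = []  # plays the role of the StringBuilder
    sep = ""
    for item in items:
        parts.append(sep)
        parts.append(item)
        sep = delimiter
    return "".join(parts)
```

For example, `join_with(["a", "b", "c"])` gives `"a,b,c"`, and an empty input gives `""` without any special case.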
EDIT: Here's an alternative, a bit like Ian's later answer but working on a general IEnumerable<string>.
// Change to IEnumerator for the non-generic IEnumerable
using (IEnumerator<string> iterator = list.GetEnumerator())
{
    if (!iterator.MoveNext())
    {
        return "";
    }
    StringBuilder builder = new StringBuilder(iterator.Current);
    while (iterator.MoveNext())
    {
        builder.Append(delimiter);
        builder.Append(iterator.Current);
    }
    return builder.ToString();
}
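For comparison, a rough Python equivalent of the enumerator version: pull the first element by hand, then put a delimiter in front of everything that follows. It works on any iterable, including generators whose length is unknown up front; `join_iter` is a made-up name for this sketch.

```python
def join_iter(items, delimiter):
    """Join any iterable, taking the first element before entering the loop."""
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return ""  # empty input: nothing to join
    parts = [first]
    for item in iterator:  # every remaining item is preceded by a delimiter
        parts.append(delimiter)
        parts.append(item)
    return "".join(parts)
```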
EDIT nearly 5 years after the original answer...
In .NET 4, string.Join was overloaded pretty significantly. There's an overload taking IEnumerable<T> which automatically calls ToString, and there's an overload for IEnumerable<string>. So you don't need the code above any more... for .NET, anyway.

In .NET, you can use the String.Join method:
string concatenated = String.Join(",", list.ToArray());
Using .NET Reflector, we can find out how it does it:
public static unsafe string Join(string separator, string[] value, int startIndex, int count)
{
    if (separator == null)
    {
        separator = Empty;
    }
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (startIndex < 0)
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
    }
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
    }
    if (startIndex > (value.Length - count))
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_IndexCountBuffer"));
    }
    if (count == 0)
    {
        return Empty;
    }
    // First pass: add up the lengths of all (non-null) values and separators.
    int length = 0;
    int num2 = (startIndex + count) - 1;
    for (int i = startIndex; i <= num2; i++)
    {
        if (value[i] != null)
        {
            length += value[i].Length;
        }
    }
    length += (count - 1) * separator.Length;
    if ((length < 0) || ((length + 1) < 0))
    {
        throw new OutOfMemoryException();
    }
    if (length == 0)
    {
        return Empty;
    }
    // Second pass: allocate the result once and copy every piece into it.
    string str = FastAllocateString(length);
    fixed (char* chRef = &str.m_firstChar)
    {
        UnSafeCharBuffer buffer = new UnSafeCharBuffer(chRef, length);
        buffer.AppendString(value[startIndex]);
        for (int j = startIndex + 1; j <= num2; j++)
        {
            buffer.AppendString(separator);
            buffer.AppendString(value[j]);
        }
    }
    return str;
}
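The decompiled method makes two passes: one to compute the exact length of the result, and one to copy the pieces into a buffer that is allocated exactly once. Here is a rough Python sketch of that shape, using a preallocated list of characters as a stand-in for `FastAllocateString` (in real Python you would simply call `str.join`). It treats `None` entries as empty, mirroring the null checks in the length pass, and omits the `startIndex`/`count` parameters for brevity.

```python
def join_two_pass(separator, values):
    """Join values the way the decompiled method does: size first, then copy."""
    if separator is None:
        separator = ""
    if not values:
        return ""
    # Pass 1: compute the exact length of the final string.
    length = sum(len(v) for v in values if v is not None)
    length += (len(values) - 1) * len(separator)
    if length == 0:
        return ""
    # Allocate the buffer once (a list of single characters here).
    buf = [""] * length
    pos = 0

    def append(s):
        nonlocal pos
        if s:  # skips both None and ""
            buf[pos:pos + len(s)] = s  # same-size slice, so buf never grows
            pos += len(s)

    # Pass 2: copy the first value, then separator + value for the rest.
    append(values[0])
    for value in values[1:]:
        append(separator)
        append(value)
    return "".join(buf)
```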

There's little reason to make it language-agnostic when some languages provide support for this in one line, e.g., Python's
",".join(sequence)
See the join documentation for more info.

For Python, be sure you have a list of strings; otherwise ','.join(x) will fail with a TypeError.
For a safe method using Python 2.5+:
delimiter = '","'
delimiter.join('' if a is None else str(a) for a in list_object)
The "'' if a is None else str(a)" part is good for None values; otherwise str() ends up turning them into 'None', which isn't nice ;) (A plain "str(a) if a else ''" would also blank out 0 and False.)
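Whether you test truthiness or test explicitly for None matters once the list can contain values like 0 or False. A small comparison sketch (both helper names are made up for illustration):

```python
def join_truthy(delimiter, items):
    # Treats every falsy value (None, 0, False, "") as an empty string.
    return delimiter.join(str(a) if a else "" for a in items)


def join_none_safe(delimiter, items):
    # Only None becomes empty; 0 and False are kept as "0" and "False".
    return delimiter.join("" if a is None else str(a) for a in items)
```

With `[1, None, 0]` and a comma, the first gives `"1,,"` and the second `"1,,0"`.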

In PHP, use implode():
$string = implode($delim, $array);

I'd always add the delimiter and then remove it at the end if necessary. This way, you're not executing an if statement on every iteration of the loop when you only care about doing the work once.
StringBuilder sb = new StringBuilder();
foreach (string item in list)
{
    sb.Append(item);
    sb.Append(delimiter);
}
if (list.Count > 0)
{
    sb.Remove(sb.Length - delimiter.Length, delimiter.Length);
}
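The append-then-trim idea, sketched in Python under the assumption that a list of parts is used as the buffer: every item is followed by a delimiter, and the trailing one is dropped once at the end. Popping from the end of a list is cheap, so no length arithmetic is needed. `join_then_trim` is an illustrative name.

```python
def join_then_trim(items, delimiter):
    """Append a delimiter after every item, then remove the final one."""
    parts = []
    for item in items:
        parts.append(item)
        parts.append(delimiter)
    if parts:
        parts.pop()  # drop the trailing delimiter (only if anything was added)
    return "".join(parts)
```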

I would express this recursively.
Check if the number of string arguments is 1. If it is, return it.
Otherwise recurse, but combine the first two arguments with the delimiter between them.
Example in Common Lisp:
(defun join (delimiter &rest strings)
  (if (null (rest strings))
      (first strings)
      (apply #'join
             delimiter
             (concatenate 'string
                          (first strings)
                          delimiter
                          (second strings))
             (cddr strings))))
The more idiomatic way is to use reduce, but this expands to almost exactly the same instructions as the above:
(defun join (delimiter &rest strings)
  (reduce (lambda (a b)
            (concatenate 'string a delimiter b))
          strings))
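The same recursion in Python, as a sketch: with one string left, return it; otherwise glue the first two together with the delimiter and recurse on the shorter argument list. Unlike the Lisp version (which returns NIL for zero strings), this sketch returns an empty string in that case. It copies the growing prefix on every step and uses one stack frame per string, so treat it as an illustration rather than something for long lists.

```python
def join_recursive(delimiter, *strings):
    """Recursive join: merge the first two strings, then recurse."""
    if not strings:
        return ""  # no strings at all: return an empty string
    if len(strings) == 1:
        return strings[0]
    first, second, *rest = strings
    return join_recursive(delimiter, first + delimiter + second, *rest)
```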

List<string> aaa = new List<string> { "aaa", "bbb", "ccc" };
string mm = ";";
return aaa.Aggregate((a, b) => a + mm + b);
and you get
aaa;bbb;ccc
Lambdas are pretty handy. (Note that Aggregate without a seed throws an InvalidOperationException on an empty list.)
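Aggregate is a left fold, and Python's `functools.reduce` is the same idea. Without an initial value, `reduce` raises a TypeError on an empty sequence, so this sketch checks for that case first (`join_fold` is an illustrative name). Like the Aggregate version, it builds a new string at each step.

```python
from functools import reduce


def join_fold(items, delimiter):
    """Fold the items pairwise into a single delimited string."""
    items = list(items)
    if not items:
        return ""  # reduce() with no initial value fails on empty input
    return reduce(lambda a, b: a + delimiter + b, items)
```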

In C# you can just use String.Join(separator, stringList).

The problem is that computer languages rarely have "string booleans", that is, string-valued functions that do anything useful with a condition. SQL Server at least has ISNULL and NULLIF, which when combined solve the delimiter problem, by the way: ISNULL(',' + NULLIF(columnvalue, ''), '') (with the default CONCAT_NULL_YIELDS_NULL setting, ',' + NULL is NULL, so an empty value contributes no comma at all).
The problem is that in languages there are booleans, and there are strings, and never the twain shall meet except in ugly coding forms, e.g.
concatstring = string1 + "," + string2;
if (fubar)
    concatstring += string3;
concatstring += string4; // etc.
I've tried mightily to avoid all this ugliness, playing comma games and concatenating with joins, but I'm still left with some of it, including SQL Server errors when I've missed one of the commas and a variable is empty.
Jonathan

Since you tagged this language-agnostic, this is how you would do it in Python:
# delimiter can be multi-character, like "| trlalala |"
delimiter = ";"
# sequence can be any list, or any iterator/generator that yields strings
result = delimiter.join(sequence)
# result will NOT have a trailing delimiter
Edit: I see I got beaten to the answer by several people. Sorry for the duplication.

I think the best way to do something like that is the following (I'll use pseudocode, to make it truly language-agnostic):
function concat(<array> list, <string> separator, <boolean> strict):
    result = ""
    first = true
    for i in list:
        if the length of i is zero and strict is false:
            continue
        if first is false:
            result = result + separator
        result = result + i
        first = false
    return result
The strict argument to concat() is a flag that says whether empty strings should be included in the concatenation or skipped. Tracking "first" with a flag, rather than checking for the first element of the list, avoids a leading separator when the first element is skipped.
I don't append a final separator. On the other hand, if strict is false, the result won't contain runs like "A,B,,,F" (assuming the separator is a comma), but will read "A,B,F" instead.
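A runnable Python version of that concat() pseudocode, with the separator passed as a parameter. Instead of a separate flag, it checks whether anything has been appended to the parts list yet. That check is on the list, not on the string content, so in strict mode a leading empty string still counts as an element and still gets a separator after it.

```python
def concat(items, separator, strict):
    """Join items with separator; when strict is False, skip empty strings."""
    parts = []
    for item in items:
        if len(item) == 0 and not strict:
            continue  # non-strict mode drops empty entries entirely
        if parts:  # something was already appended, so add a separator first
            parts.append(separator)
        parts.append(item)
    return "".join(parts)
```

With `["A", "B", "", "", "F"]`, strict mode gives `"A,B,,,F"` and non-strict mode gives `"A,B,F"`.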

That's how Python solves the problem:
','.join(list_of_strings)
I've never understood the need for 'algorithms' in trivial cases, though.

This is a working solution in C#; in Java, you can use a similar for-each loop over an iterator.
string result = string.Empty;
// use a StringBuilder at some stage.
foreach (string item in list)
    result += "," + item;
if (result.Length > 0)
    result = result.Substring(1); // drop the leading comma (Substring(1) throws on an empty string)
// output: "item,item,item"
If using .NET, you might want to use an extension method so that you can do
list.ToString(",")
For details, check out Separator Delimited ToString for Array, List, Dictionary, Generic IEnumerable
// contains extension methods, so it must be a static class.
public static class ExtensionMethod
{
    // apply this extension to any generic IEnumerable object.
    public static string ToString<T>(this IEnumerable<T> source, string separator)
    {
        if (source == null)
            throw new ArgumentNullException("source", "source can not be null.");
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("separator can not be null or empty.");
        // A LINQ query to call ToString on each element
        // and construct a string array.
        string[] array =
            (from s in source
             select s.ToString()
            ).ToArray();
        // Use the built-in string.Join to concatenate the elements with
        // a customizable separator.
        return string.Join(separator, array);
    }
}
EDIT: For performance reasons, replace the concatenation code above with the StringBuilder solution mentioned elsewhere in this thread.
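The extension method boils down to "call ToString on every element, then join". A Python sketch of the same contract, keeping the argument checks (`to_delimited` is a made-up name, and ValueError is this sketch's choice of exception):

```python
def to_delimited(source, separator):
    """Join any iterable after converting each element with str()."""
    if source is None:
        raise ValueError("source can not be None.")
    if not separator:
        raise ValueError("separator can not be None or empty.")
    return separator.join(str(item) for item in source)
```

For example, `to_delimited([1, 2, 3], ",")` gives `"1,2,3"`.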

Seen the Python answer like 3 times, but no Ruby?!?!?
The first part of the code declares a new array. Then you just call the .join() method, passing the delimiter, and it returns a string with the delimiter between the elements. I believe join calls .to_s on each item before it concatenates them.
["ID", "Description", "Active"].join(",")
>> "ID,Description,Active"
This can be very useful when combining metaprogramming with database interaction.
Does anyone know if C# has something similar to this syntactic sugar?

In Java 8 we can use:
List<String> list = Arrays.asList(new String[] { "a", "b", "c" });
System.out.println(String.join(",", list)); //Output: a,b,c
To have a prefix and suffix we can do
StringJoiner joiner = new StringJoiner(",", "{", "}");
list.forEach(x -> joiner.add(x));
System.out.println(joiner.toString()); //Output: {a,b,c}
Prior to Java 8, you can do something like Jon's answer:
StringBuilder sb = new StringBuilder(prefix);
boolean and = false;
for (E e : iterable) {
    if (and) {
        sb.append(delimiter);
    }
    sb.append(e);
    and = true;
}
sb.append(suffix);
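For readers outside Java, here is a hypothetical Python class shaped like StringJoiner: it collects the parts, and on conversion to a string wraps the delimited parts in the prefix and suffix. Like StringJoiner's default behaviour, an empty joiner renders as just prefix plus suffix. The class name and method names are this sketch's own, not a standard Python API.

```python
class Joiner:
    """Hypothetical Python analogue of java.util.StringJoiner."""

    def __init__(self, delimiter, prefix="", suffix=""):
        self.delimiter = delimiter
        self.prefix = prefix
        self.suffix = suffix
        self.parts = []

    def add(self, item):
        self.parts.append(str(item))
        return self  # allow chaining, as StringJoiner.add does

    def __str__(self):
        return self.prefix + self.delimiter.join(self.parts) + self.suffix
```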

In .NET, I would use the String.Join method if possible, which lets you specify a separator and a string array. A list can be converted to an array with ToArray, but I don't know what the performance hit of that would be.
The three algorithms that you mention are what I would use (I like the second because it does not have an if statement in it, but if the length is not known I would use the third because it does not duplicate the code). The second will only work if the list is not empty, so that might take another if statement.
A fourth variant might be to put a separator in front of every element that is concatenated and then remove the first separator from the result.
If you do concatenate strings in a loop, note that for non-trivial cases a StringBuilder will vastly outperform repeated string concatenation.

You could write your own method AppendToString(string, delimiter) that appends the delimiter if and only if the string is not empty. Then you just call that method in any loop without having to worry about when to append the delimiter and when not to.
Edit: Better yet, of course, use some kind of StringBuilder/StringBuffer in the method, if available.
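A minimal Python sketch of such a helper (`append_to` is a hypothetical name). It also shows the one caveat of keying on "is the string empty?": if the first items are themselves empty strings, the result is still empty afterwards, so their positions are silently lost.

```python
def append_to(current, item, delimiter):
    """Return current + item, with a delimiter only if current is non-empty."""
    # Caveat: leading empty items leave current empty, so no delimiter
    # is ever emitted for them.
    if current:
        return current + delimiter + item
    return current + item
```

Usage: start with `result = ""` and call `result = append_to(result, word, ",")` for each word.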

string result = "";
foreach (string item in list)
{
    result += delimiter + item;
}
result = result.Substring(delimiter.Length);
Edit: Of course, you wouldn't use this or any one of your algorithms to concatenate strings. With C#/.NET, you'd probably use a StringBuilder:
StringBuilder sb = new StringBuilder();
foreach (string item in list)
{
    sb.Append(delimiter);
    sb.Append(item);
}
string result = sb.ToString(delimiter.Length, sb.Length - delimiter.Length);
And a variation of this solution:
StringBuilder sb = new StringBuilder(list[0]);
for (int i = 1; i < list.Count; i++)
{
    sb.Append(delimiter);
    sb.Append(list[i]);
}
string result = sb.ToString();
Neither solution includes any error checks; for example, both throw on an empty list.

From http://dogsblog.softwarehouse.co.zw/post/2009/02/11/IEnumerable-to-Comma-Separated-List-(and-more).aspx
A pet hate of mine when developing is making a list of comma-separated ids: it is SO simple, but it always ends up as ugly code... Common solutions are to loop through and put a comma after each item and then remove the last character, or to have an if statement to check whether you are at the beginning or end of the list. Below is a solution you can use on any IEnumerable, i.e. a List, Array, etc. It is also the most efficient way I can think of doing it, as it relies on assignment, which is better than editing a string or using an if.
public static class StringExtensions
{
    public static string Splice<T>(IEnumerable<T> args, string delimiter)
    {
        StringBuilder sb = new StringBuilder();
        string d = "";
        foreach (T t in args)
        {
            sb.Append(d);
            sb.Append(t.ToString());
            d = delimiter;
        }
        return sb.ToString();
    }
}
Now it can be used with any IEnumerable, e.g.
StringExtensions.Splice(billingTransactions.Select(t => t.id), ",")
to give us 31,32,35

For Java, a very complete answer has been given in this question or this question.
That is, use StringUtils.join from Apache Commons Lang:
String result = StringUtils.join(list, ", ");

In Clojure, you could just use clojure.contrib.str-utils/str-join (in Clojure 1.2 and later, clojure.string/join takes the same arguments):
(str-join ", " list)
But for the actual algorithm:
(reduce (fn [res cur] (str res ", " cur)) list)

Groovy also has a String join(String separator) method on collections and Object arrays.

Java (from Jon's solution):
StringBuilder sb = new StringBuilder();
String delimiter = "";
for (String item : items) {
    sb.append(delimiter).append(item);
    delimiter = ", ";
}
return sb.toString();

Here is my humble try:
public static string JoinWithDelimiter(List<string> words, string delimiter)
{
    string joinedString = "";
    if (words.Count > 0)
    {
        // Start with the first word, then put the delimiter in front of every later word.
        joinedString = words[0];
        for (var i = 1; i < words.Count; i++)
        {
            joinedString += delimiter + words[i];
        }
    }
    return joinedString;
}
Usage:
List<string> words = new List<string>() { "my", "name", "is", "Hari" };
Console.WriteLine(JoinWithDelimiter(words, " "));

Related

splitting up the contents of a single line

I just came across a problem where the input is a string written as a single word, with no spaces.
This makes the line unreadable.
For example, "I want to leave" is written as "Iwanttoleave".
The problem is to separate out each of the tokens (words, numbers, abbreviations, etc.).
I have no idea where to start.
The first thought that came to my mind was to make a dictionary and then map accordingly, but I think making a dictionary is not a good idea at all.
Can anyone suggest an algorithm to do it?
First of all, create a dictionary which helps you identify whether a string is a valid word.
boolean isValidString(String s) {
    return dictionary.contains(s);
}
Now, you can write recursive code to split the string and collect the actually useful words.
ArrayList<String> usefulWords = new ArrayList<String>(); // global declaration
void split(String s) {
    int l = s.length();
    for (int i = l - 1; i >= 0; i--) {
        // s.substring(i, l) returns the substring starting at index i and ending at l-1
        if (isValidString(s.substring(i, l))) {
            usefulWords.add(s.substring(i, l));
            split(s.substring(0, i));
        }
    }
}
Now, use these usefulWords to generate all possible strings. Maybe something like this:
List<String>[] splits = new ArrayList[10]; // assuming at most 10 possible outputs; each entry must be initialised before use
void allPossibleStrings(String s, int level) {
    for (int i = 1; i <= s.length(); i++) {
        if (usefulWords.contains(s.substring(0, i))) {
            splits[level].add(s.substring(0, i));
            allPossibleStrings(s.substring(i), level);
            level++;
        }
    }
}
Now, this code gives you all possible splits in a somewhat arbitrary manner, e.g.
dictionary = {cat, dog, i, am, pro, gram, program, programmer, grammer}
input:
string = program
output:
splits[0] = {pro, gram}
splits[1] = {program}
input:
string = iamprogram
output:
splits[0] = {i, am, pro, gram} //since `mer` is not in dictionary
splits[1] = {program}
I did not give much thought to the last part, but I think you should be able to write the code from there as per your requirements.
Also, since no language is tagged, I've taken the liberty of writing the code in Java-like syntax, as it is really easy to understand.
Instead of using a Dictionary, I'd suggest you use a Trie with all your valid words (the whole English dictionary?). Then you can start moving one letter at a time in your input line and the trie at the same time. If the letter leads to more results in the trie, you can continue expanding the current word, and if not, you can start looking for a new word in the trie.
This won't be a forward only search for sure, so you'll need some sort of backtracking.
// This method generates a list with all the matching phrases for the given input
// (needs System.Collections.Generic, System.Diagnostics and System.Linq)
List<string> CandidatePhrases(string input)
{
    Trie validWords = BuildTheTrieWithAllValidWords();
    List<string> currentWords = new List<string>();
    List<string> possiblePhrases = new List<string>();
    // The root of the trie has an empty key that points to all the first letters of all words
    Trie currentWord = validWords;
    int currentLetter = -1;
    // Calls a backtracking method that creates all possible phrases
    FindPossiblePhrases(input, validWords, currentWords, currentWord, currentLetter, possiblePhrases);
    return possiblePhrases;
}
// The Trie structure could be something like
class Trie
{
    public char key;
    public bool valid;
    public List<Trie> children;
    public Trie parent;
    public Trie Next(char nextLetter)
    {
        return children.FirstOrDefault(c => c.key == nextLetter);
    }
    public string WholeWord()
    {
        Debug.Assert(valid);
        string word = "";
        Trie current = this;
        // Walk up to the root (whose key is '\0'), prepending each letter
        while (current.key != '\0')
        {
            word = current.key + word;
            current = current.parent;
        }
        return word;
    }
}
void FindPossiblePhrases(string input, Trie validWords, List<string> currentWords, Trie currentWord, int currentLetter, List<string> possiblePhrases)
{
    if (currentLetter == input.Length - 1)
    {
        if (currentWord.valid)
        {
            string phrase = "";
            foreach (string word in currentWords)
            {
                phrase += word;
                phrase += " ";
            }
            phrase += currentWord.WholeWord();
            possiblePhrases.Add(phrase);
        }
    }
    else
    {
        // The currentWord may be a valid word. If that's the case, the next letter could be the first of a new word, or could be the next letter of a bigger word that begins with currentWord
        if (currentWord.valid)
        {
            // Try to match phrases when the currentWord is a valid word
            currentWords.Add(currentWord.WholeWord());
            FindPossiblePhrases(input, validWords, currentWords, validWords, currentLetter, possiblePhrases);
            currentWords.RemoveAt(currentWords.Count - 1);
        }
        // Whether or not the currentWord is a valid word, try to match a longer word that begins with currentWord
        int nextLetter = currentLetter + 1;
        Trie nextWord = currentWord.Next(input[nextLetter]);
        // If the nextWord is null, there was no matching word that begins with currentWord and has input[nextLetter] as the following letter.
        if (nextWord != null)
        {
            FindPossiblePhrases(input, validWords, currentWords, nextWord, nextLetter, possiblePhrases);
        }
    }
}
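As a runnable sketch of the search both answers describe, here is backtracking over prefixes in Python. It swaps the trie for a plain set of words (a simplification; a trie lets you stop early when no word starts with the current prefix) and caches the results for each start index, so each suffix is only solved once. The function name `segmentations` is made up, and the number of results can still grow exponentially for inputs with many valid splits.

```python
from functools import lru_cache


def segmentations(text, words):
    """Return every way to split text into a sequence of dictionary words."""
    max_len = max((len(w) for w in words), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], as tuples of words.
        if start == len(text):
            return [()]  # exactly one way to split the empty suffix: no words
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            prefix = text[start:end]
            if prefix in words:
                for rest in split_from(end):
                    results.append((prefix,) + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]
```

With the dictionary from the first answer, `segmentations("iamprogram", words)` returns both `"i am pro gram"` and `"i am program"`.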

Processing: create an array of the characters within the string

I am new to Processing and trying to figure out a way to create an array of all the characters within a string. Currently I have:
String[] words = {"hello", "devak", "road", "duck", "face"};
String theWord = words[int(random(0,words.length))];
I've been googling and haven't found a good solution yet. Thanks in advance.
In addition to the comment you posted (which perhaps should have been an answer), there are a ton of ways to split a String.
The most obvious solution might be the String.split() function. If you give that function an empty String "" as an argument, it will split every character:
void setup() {
  String myString = "testing testing 123";
  String[] chars = myString.split("");
  for (String c : chars) {
    println(c);
  }
}
You could also just use the String.charAt() function:
void setup() {
  String myString = "testing testing 123";
  for (int i = 0; i < myString.length(); i++) {
    char c = myString.charAt(i);
    println(c);
  }
}

Algorithm to generate all variants of a word

I would like to explain my problem with the following example.
Assume the word: abc
a has variants: ä, à
b has no variants.
c has variants: ç
so the possible words are:
abc
äbc
àbc
abç
äbç
àbç
Now I am looking for an algorithm that prints all word variations for arbitrary words with arbitrary letter variants.
I would recommend you to solve this recursively. Here's some Java code for you to get started:
static Map<Character, char[]> variants = new HashMap<Character, char[]>() {{
    put('a', new char[] { 'ä', 'à' });
    put('b', new char[] { });
    put('c', new char[] { 'ç' });
}};
public static Set<String> variation(String str) {
    Set<String> result = new HashSet<String>();
    if (str.isEmpty()) {
        result.add("");
        return result;
    }
    char c = str.charAt(0);
    for (String tailVariant : variation(str.substring(1))) {
        result.add(c + tailVariant);
        // Letters without an entry in the map simply have no variants
        for (char variant : variants.getOrDefault(c, new char[0]))
            result.add(variant + tailVariant);
    }
    return result;
}
Test:
public static void main(String[] args) {
    for (String str : variation("abc"))
        System.out.println(str);
}
Output (the order may vary, since a HashSet is used):
abc
àbç
äbc
àbc
äbç
abç
A quickly hacked solution in Python:
def word_variants(variants):
    print_variants("", 1, variants)
def print_variants(word, i, variants):
    if i > len(variants):
        print(word)
    else:
        for variant in variants[i]:
            print_variants(word + variant, i + 1, variants)
variants = dict()
variants[1] = ['a0', 'a1', 'a2']
variants[2] = ['b0']
variants[3] = ['c0', 'c1']
word_variants(variants)
Common part:
string[] letterEquiv = { "aäà", "b", "cç", "d", "eèé" };
// Here we make a dictionary where the key is the "base" letter and the value is an array of alternatives
var lookup = letterEquiv
    .Select(p => p.ToCharArray())
    .SelectMany(p => p, (p, q) => new { key = q, values = p })
    .ToDictionary(p => p.key, p => p.values);
A recursive variation written in C#.
List<string> resultsRecursive = new List<string>();
// I'm using an anonymous method that "closes" around resultsRecursive and lookup. You could make it a standard method that accepts as a parameter the two.
// Recursive anonymous methods must be declared in this way in C#. Nothing to see.
Action<string, int, char[]> recursive = null;
recursive = (str, ix, str2) =>
{
    // In the first loop str2 is null, so we create the place where the string will be built.
    if (str2 == null)
    {
        str2 = new char[str.Length];
    }
    // The possible variations for the current character
    var equivs = lookup[str[ix]];
    // For each variation
    foreach (var eq in equivs)
    {
        // We save the current variation for the current character
        str2[ix] = eq;
        // If we haven't reached the end of the string
        if (ix < str.Length - 1)
        {
            // We recurse, increasing the index
            recursive(str, ix + 1, str2);
        }
        else
        {
            // We save the string
            resultsRecursive.Add(new string(str2));
        }
    }
};
// We launch our function
recursive("abcdeabcde", 0, null);
// The results are in resultsRecursive
A non-recursive version
List<string> resultsNonRecursive = new List<string>();
// I'm using an anonymous method that "closes" around resultsNonRecursive and lookup. You could make it a standard method that accepts as a parameter the two.
Action<string> nonRecursive = (str) =>
{
    // We will have two arrays, of the same length of the string. One will contain
    // the possible variations for that letter, the other will contain the "current"
    // "chosen" variation of that letter
    char[][] equivs = new char[str.Length][];
    int[] ixes = new int[str.Length];
    for (int i = 0; i < ixes.Length; i++)
    {
        // We start with index -1 so that the first increase will bring it to 0
        equivs[i] = lookup[str[i]];
        ixes[i] = -1;
    }
    // The current "working" index of the original string
    int ix = 0;
    // The place where the string will be built.
    char[] str2 = new char[str.Length];
    // The loop will break when we will have to increment the letter with index -1
    while (ix >= 0)
    {
        // We select the next possible variation for the current character
        ixes[ix]++;
        // If we have exhausted the possible variations of the current character
        if (ixes[ix] == equivs[ix].Length)
        {
            // Reset the current character to -1
            ixes[ix] = -1;
            // And loop back to the previous character
            ix--;
            continue;
        }
        // We save the current variation for the current character
        str2[ix] = equivs[ix][ixes[ix]];
        // If we are setting the last character of the string, then the string
        // is complete
        if (ix == str.Length - 1)
        {
            // And we save it
            resultsNonRecursive.Add(new string(str2));
        }
        else
        {
            // Otherwise we have to do everything for the next character
            ix++;
        }
    }
};
// We launch our function
nonRecursive("abcdeabcde");
// The results are in resultsNonRecursive
Both heavily commented.
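All three answers enumerate a Cartesian product: one choice per letter, every combination. In Python, `itertools.product` does that enumeration directly, and unlike the HashSet version the output order is deterministic (the first letter varies slowest). A sketch, with `all_variants` and the shape of the `variants` mapping as this example's own assumptions:

```python
from itertools import product


def all_variants(word, variants):
    """Yield every spelling of word where each letter may be swapped for a variant.

    variants maps a letter to a string of alternative letters; letters that
    are missing from the mapping have no alternatives.
    """
    choices = [[ch] + list(variants.get(ch, "")) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)
```

With `{"a": "äà", "c": "ç"}`, `list(all_variants("abc", ...))` yields `abc, abç, äbc, äbç, àbc, àbç`.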

Refactoring many nested ifs or chained if statements

I have an object with a large number of similar fields (more than 10 of them), and I have to assign them values from an array of variable length. The solution would be either a huge bunch of nested ifs, checking the length of the array each time and assigning each field
OR
a chain of ifs, checking whether the length is out of bounds and assigning each field after that check.
Both seem repetitive. Is there a better solution?
If your language has switch/case with fallthrough, you could do it like this:
switch (array.length) {
    case 15: field14 = array[14];
    case 14: field13 = array[13];
    case 13: field12 = array[12];
    // etc.
    case 1: field0 = array[0];
    case 0: break;
    default: throw new Exception("array too long!");
}
for (int i = 0; i < fieldCount && i < array.length; i++)
    fields[i].value = array[i];
That is to say, maintain an array of fields that corresponds to your array of values.
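In a dynamic language the "array of fields" can simply be a list of attribute names. A Python sketch of that idea; the `Person` class and its field names are hypothetical. `zip` stops at the shorter sequence, so missing values leave the defaults in place and surplus values are ignored (check `len(values)` first if too many values should be an error).

```python
class Person:
    """Hypothetical record type with several similar fields."""

    FIELD_NAMES = ["id", "first_name", "last_name", "email"]

    def __init__(self):
        for name in self.FIELD_NAMES:
            setattr(self, name, None)  # default used when no value is supplied


def fill_fields(obj, values):
    # Pair each field name with the value at the same index.
    for name, value in zip(obj.FIELD_NAMES, values):
        setattr(obj, name, value)
    return obj
```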
If your language supports delegates, anonymous functions, that sort of thing, you can use those to clean it up. For example, in C# you could write this:
string[] values = GetValues();
SomeObject result = new SomeObject();
Apply(values, 0, v => result.ID = v);
Apply(values, 1, v => result.FirstName = v);
Apply(values, 2, v => result.LastName = v);
// etc.
The apply method would look like:
void Apply(string[] values, int index, Action<string> action)
{
    if (index < values.Length)
        action(values[index]);
}
This is obviously language-dependent, but something to think about regardless.
Another very simple option that we might be overlooking is, if you are actually trying to initialize an object from this value array (as opposed to update an existing object), to just accept the default values if the array isn't large enough.
C# example:
MyObject CreateMyObject(object[] values)
{
    MyObject o = new MyObject();
    o.ID = GetValueOrDefault<int>(values, 0);
    o.FirstName = GetValueOrDefault<string>(values, 1);
    o.LastName = GetValueOrDefault<string>(values, 2);
    // etc.
    return o;
}
T GetValueOrDefault<T>(object[] values, int index)
{
    if (index < values.Length)
        return (T)values[index];
    return default(T);
}
Sometimes the dumb solution is the smartest choice.
If your fields are declared in the same order as the array's elements, you could use reflection (if your language has it) to set these values. Here is an example of how you could do it in Java (note that getFields() returns only public fields, and its Javadoc makes no guarantee about their order, so check that it matches your array):
// obj is your object, values is the array of values
Field[] fields = obj.getClass().getFields();
for (int i = 0; i < fields.length && i < values.length; ++i) {
    fields[i].set(obj, values[i]);
}

Algorithm to format text to Pascal or camel casing

Using this question as the base, is there an algorithm or code example to change some text to Pascal or camel casing?
For example:
mynameisfred
becomes
Camel: myNameIsFred
Pascal: MyNameIsFred
I found a thread with a bunch of Perl guys arguing the toss on this question over at http://www.perlmonks.org/?node_id=336331.
I hope this isn't too much of a non-answer to the question, but I would say you have a bit of a problem in that it would be a very open-ended algorithm which could have a lot of 'misses' as well as hits. For example, say you inputted:-
camelCase("hithisisatest");
The output could be:-
"hiThisIsATest"
Or:-
"hitHisIsATest"
There's no way the algorithm would know which to prefer. You could add some extra code to specify that you'd prefer more common words, but again misses would occur (Peter Norvig wrote a very small spelling corrector over at http://norvig.com/spell-correct.html which might help algorithm-wise, I wrote a C# implementation if C#'s your language).
I'd agree with Mark and say you'd be better off having an algorithm that takes a delimited input, i.e. this_is_a_test and converts that. That'd be simple to implement, i.e. in pseudocode:-
SetPhraseCase(phrase, CamelOrPascal):
    if no delimiters
        if camelCase
            return lowerFirstLetter(phrase)
        else
            return capitaliseFirstLetter(phrase)
    words = splitOnDelimiter(phrase)
    if camelCase
        ret = lowerFirstLetter(first word)
    else
        ret = capitaliseFirstLetter(first word)
    for i in 2 to len(words): ret += capitaliseFirstLetter(words[i])
    return ret
capitaliseFirstLetter(word):
    if len(word) <= 1 return upper(word)
    return upper(word[0]) + word[1..len(word)]
lowerFirstLetter(word):
    if len(word) <= 1 return lower(word)
    return lower(word[0]) + word[1..len(word)]
You could also replace my capitaliseFirstLetter() function with a proper case algorithm if you so wished.
A C# implementation of the above described algorithm is as follows (complete console program with test harness):-
using System;
class Program {
    static void Main(string[] args) {
        var caseAlgorithm = new CaseAlgorithm('_');
        while (true) {
            string input = Console.ReadLine();
            if (string.IsNullOrEmpty(input)) return;
            Console.WriteLine("Input '{0}' in camel case: '{1}', pascal case: '{2}'",
                input,
                caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.CamelCase),
                caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.PascalCase));
        }
    }
}
public class CaseAlgorithm {
    public enum CaseMode { PascalCase, CamelCase }
    private char delimiterChar;
    public CaseAlgorithm(char inDelimiterChar) {
        delimiterChar = inDelimiterChar;
    }
    public string SetPhraseCase(string phrase, CaseMode caseMode) {
        // You might want to do some sanity checks here like making sure
        // there's no invalid characters, etc.
        if (string.IsNullOrEmpty(phrase)) return phrase;
        // .Split() will simply return a string[] of size 1 if no delimiter present so
        // no need to explicitly check this.
        var words = phrase.Split(delimiterChar);
        // Set first word accordingly.
        string ret = setWordCase(words[0], caseMode);
        // If there are other words, set them all to pascal case.
        if (words.Length > 1) {
            for (int i = 1; i < words.Length; ++i)
                ret += setWordCase(words[i], CaseMode.PascalCase);
        }
        return ret;
    }
    private string setWordCase(string word, CaseMode caseMode) {
        switch (caseMode) {
            case CaseMode.CamelCase:
                return lowerFirstLetter(word);
            case CaseMode.PascalCase:
                return capitaliseFirstLetter(word);
            default:
                throw new NotImplementedException(
                    string.Format("Case mode '{0}' is not recognised.", caseMode.ToString()));
        }
    }
    // Empty words (e.g. from "a__b" or a trailing '_') are returned unchanged instead of throwing.
    private string lowerFirstLetter(string word) {
        if (word.Length == 0) return word;
        return char.ToLower(word[0]) + word.Substring(1);
    }
    private string capitaliseFirstLetter(string word) {
        if (word.Length == 0) return word;
        return char.ToUpper(word[0]) + word.Substring(1);
    }
}
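The same delimited-input approach as a compact Python sketch (`set_phrase_case` and its `pascal` flag are this example's own naming). As a design choice of the sketch, empty pieces produced by doubled or trailing delimiters are dropped rather than kept, and `w[:1]` is used so that one-letter words need no special case.

```python
def set_phrase_case(phrase, pascal, delimiter="_"):
    """Convert e.g. "my_name_is_fred" to camelCase (pascal=False) or PascalCase."""
    words = [w for w in phrase.split(delimiter) if w]  # skip empty pieces
    if not words:
        return ""
    first = words[0]
    head = (first[:1].upper() if pascal else first[:1].lower()) + first[1:]
    # Every word after the first gets its first letter capitalised.
    return head + "".join(w[:1].upper() + w[1:] for w in words[1:])
```

For example, `set_phrase_case("my_name_is_fred", pascal=False)` gives `"myNameIsFred"`.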
The only way to do that would be to run each section of the word through a dictionary.
"mynameisfred" is just an array of characters, splitting it up into my Name Is Fred means understanding what the joining of each of those characters means.
You could do it easily if your input was separated in some way, e.g. "my name is fred" or "my_name_is_fred".
