Algorithm to format text to Pascal or camel casing

Algorithm to format text to Pascal or camel casing - algorithm

Using this question as the base is there an alogrithm or coding example to change some text to Pascal or Camel casing.
For example:
mynameisfred
becomes
Camel: myNameIsFred
Pascal: MyNameIsFred

I found a thread with a bunch of Perl guys arguing the toss on this question over at http://www.perlmonks.org/?node_id=336331.
I hope this isn't too much of a non-answer to the question, but I would say you have a bit of a problem in that it would be a very open-ended algorithm which could have a lot of 'misses' as well as hits. For example, say you inputted:-
camelCase("hithisisatest");
The output could be:-
"hiThisIsATest"
Or:-
"hitHisIsATest"
There's no way the algorithm would know which to prefer. You could add some extra code to specify that you'd prefer more common words, but again misses would occur (Peter Norvig wrote a very small spelling corrector over at http://norvig.com/spell-correct.html which might help algorithm-wise, I wrote a C# implementation if C#'s your language).
I'd agree with Mark and say you'd be better off having an algorithm that takes a delimited input, i.e. this_is_a_test and converts that. That'd be simple to implement, i.e. in pseudocode:-
SetPhraseCase(phrase, CamelOrPascal):
if no delimiters
if camelCase
return lowerFirstLetter(phrase)
else
return capitaliseFirstLetter(phrase)
words = splitOnDelimiter(phrase)
if camelCase
ret = lowerFirstLetter(first word)
else
ret = capitaliseFirstLetter(first word)
for i in 2 to len(words): ret += capitaliseFirstLetter(words[i])
return ret
capitaliseFirstLetter(word):
if len(word) <= 1 return upper(word)
return upper(word[0]) + word[1..len(word)]
lowerFirstLetter(word):
if len(word) <= 1 return lower(word)
return lower(word[0]) + word[1..len(word)]
You could also replace my capitaliseFirstLetter() function with a proper case algorithm if you so wished.
A C# implementation of the above described algorithm is as follows (complete console program with test harness):-
using System;
class Program {
static void Main(string[] args) {
var caseAlgorithm = new CaseAlgorithm('_');
while (true) {
string input = Console.ReadLine();
if (string.IsNullOrEmpty(input)) return;
Console.WriteLine("Input '{0}' in camel case: '{1}', pascal case: '{2}'",
input,
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.CamelCase),
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.PascalCase));
}
}
}
public class CaseAlgorithm {
public enum CaseMode { PascalCase, CamelCase }
private char delimiterChar;
public CaseAlgorithm(char inDelimiterChar) {
delimiterChar = inDelimiterChar;
}
public string SetPhraseCase(string phrase, CaseMode caseMode) {
// You might want to do some sanity checks here like making sure
// there's no invalid characters, etc.
if (string.IsNullOrEmpty(phrase)) return phrase;
// .Split() will simply return a string[] of size 1 if no delimiter present so
// no need to explicitly check this.
var words = phrase.Split(delimiterChar);
// Set first word accordingly.
string ret = setWordCase(words[0], caseMode);
// If there are other words, set them all to pascal case.
if (words.Length > 1) {
for (int i = 1; i < words.Length; ++i)
ret += setWordCase(words[i], CaseMode.PascalCase);
}
return ret;
}
private string setWordCase(string word, CaseMode caseMode) {
switch (caseMode) {
case CaseMode.CamelCase:
return lowerFirstLetter(word);
case CaseMode.PascalCase:
return capitaliseFirstLetter(word);
default:
throw new NotImplementedException(
string.Format("Case mode '{0}' is not recognised.", caseMode.ToString()));
}
}
private string lowerFirstLetter(string word) {
return char.ToLower(word[0]) + word.Substring(1);
}
private string capitaliseFirstLetter(string word) {
return char.ToUpper(word[0]) + word.Substring(1);
}
}

The only way to do that would be to run each section of the word through a dictionary.
"mynameisfred" is just an array of characters, splitting it up into my Name Is Fred means understanding what the joining of each of those characters means.
You could do it easily if your input was separated in some way, e.g. "my name is fred" or "my_name_is_fred".

Related

splitting up the contents of a single line

I just went through a problem, where input is a string which is a single word.
This line is not readable,
Like, I want to leave is written as Iwanttoleave.
The problem is of separating out each of the tokens(words, numbers, abbreviations, etc)
I have no idea where to start
The first thought that came to my mind is making a dictionary and then mapping accordingly but I think making a dictionary is not at all a good idea.
Can anyone suggest some algorithm to do it ?

First of all, create a dictionary which helps you to identify if some string is a valid word or not.
bool isValidString(String s){
if(dictionary.contains(s))
return true;
return false;
}
Now, you can write a recursive code to split the string and create an array of actually useful words.
ArrayList usefulWords = new ArrayList<String>; //global declaration
void split(String s){
int l = s.length();
int i,j;
for(i = l-1; i >= 0; i--){
if(isValidString(s.substr(i,l)){ //s.substr(i,l) will return substring starting from index `i` and ending at `l-1`
usefulWords.add(s.substr(i,l));
split(s.substr(0,i));
}
}
}
Now, use these usefulWords to generate all possible strings. Maybe something like this:
ArrayList<String> splits = new ArrayList<String>[10]; //assuming max 10 possible outputs
ArrayList<String>[] allPossibleStrings(String s, int level){
for(int i = 0; i < s.length(); i++){
if(usefulWords.contains(s.substr(0,i)){
splits[level].add(s.substr(0,i));
allPossibleStrings(s.substr(i,s.length()),level);
level++;
}
}
}
Now, this code gives you all possible splits in a somewhat arbitrary manner. eg.
dictionary = {cat, dog, i, am, pro, gram, program, programmer, grammer}
input:
string = program
output:
splits[0] = {pro, gram}
splits[1] = {program}
input:
string = iamprogram
output:
splits[0] = {i, am, pro, gram} //since `mer` is not in dictionary
splits[1] = {program}
I did not give much thought to the last part, but I think you should be able to formulate a code from there as per your requirement.
Also, since no language is tagged, I've taken the liberty of writing the code in JAVA-like syntax as it is really easy to understand.

Instead of using a Dictionary, I'd suggest you use a Trie with all your valid words (the whole English dictionary?). Then you can start moving one letter at a time in your input line and the trie at the same time. If the letter leads to more results in the trie, you can continue expanding the current word, and if not, you can start looking for a new word in the trie.
This won't be a forward only search for sure, so you'll need some sort of backtracking.
// This method Generates a list with all the matching phrases for the given input
List<string> CandidatePhrases(string input) {
Trie validWords = BuildTheTrieWithAllValidWords();
List<string> currentWords = new List<string>();
List<string> possiblePhrases = new List<string>();
// The root of the trie has an empty key that points to all the first letters of all words
Trie currentWord = validWords;
int currentLetter = -1;
// Calls a backtracking method that creates all possible phrases
FindPossiblePhrases(input, validWords, currentWords, currentWord, currentLetter, possiblePhrases);
return possiblePhrases;
}
// The Trie structure could be something like
class Trie {
char key;
bool valid;
List<Trie> children;
Trie parent;
Trie Next(char nextLetter) {
return children.FirstOrDefault(c => c.key == nextLetter);
}
string WholeWord() {
Debug.Assert(valid);
string word = "";
Trie current = this;
while (current.Key != '\0')
{
word = current.Key + word;
current = current.parent;
}
}
}
void FindPossiblePhrases(string input, Trie validWords, List<string> currentWords, Trie currentWord, int currentLetter, List<string> possiblePhrases) {
if (currentLetter == input.Length - 1) {
if (currentWord.valid) {
string phrase = ""
foreach (string word in currentWords) {
phrase += word;
phrase += " ";
}
phrase += currentWord.WholeWord();
possiblePhrases.Add(phrase);
}
}
else {
// The currentWord may be a valid word. If that's the case, the next letter could be the first of a new word, or could be the next letter of a bigger word that begins with currentWord
if (currentWord.valid) {
// Try to match phrases when the currentWord is a valid word
currentWords.Add(currentWord.WholeWord());
FindPossiblePhrases(input, validWords, currentWords, validWords, currentLetter, possiblePhrases);
currentWords.RemoveAt(currentWords.Length - 1);
}
// If either the currentWord is a valid word, or not, try to match a longer word that begins with current word
int nextLetter = currentLetter + 1;
Trie nextWord = currentWord.Next(input[nextLetter]);
// If the nextWord is null, there was no matching word that begins with currentWord and has input[nextLetter] as the following letter.
if (nextWord != null) {
FindPossiblePhrases(input, validWords, currentWords, nextWord, nextLetter, possiblePhrases);
}
}
}

Methods, more than one return?

I have the following method:
From what I learned methods which are not voids need a return. For the following examples I can see two returns, once after if(), and one at the end.
For this example if String s is not a digit it will return the boolean as false. Which makes sense. If it is a digit then it will check whether it is in the interval. I guess I am confused regarding whether we can have multiple returns in such cases and what the limitations are, if there are any. thank you.
private boolean ElementBienFormat(String s) {
for (int i = 0; i < s.length(); i++) {
if (!Character.isDigit(s.charAt(i))) {
return false;
}
}
int n = Integer.valueOf(s);
return (n>=0 && n <=255);

A method will "quit" (return) when control reaches a return. So in this case as soon as a character is not a digit in the input String control will go back to the caller (with the appropriate value).
boolean success = ElementBienFormat( "a" ); // <-- control would go back here with the value of false.
Another quick note is that a void method can have multiple return statements as well
private void Method( int n )
{
if( n < 0 )
return;
//...
//implicit
//return;
}

Build trie faster

I'm making an mobile app which needs thousands of fast string lookups and prefix checks. To speed this up, I made a Trie out of my word list, which has about 180,000 words.
Everything's great, but the only problem is that building this huge trie (it has about 400,000 nodes) takes about 10 seconds currently on my phone, which is really slow.
Here's the code that builds the trie.
public SimpleTrie makeTrie(String file) throws Exception {
String line;
SimpleTrie trie = new SimpleTrie();
BufferedReader br = new BufferedReader(new FileReader(file));
while( (line = br.readLine()) != null) {
trie.insert(line);
}
br.close();
return trie;
}
The insert method which runs on O(length of key)
public void insert(String key) {
TrieNode crawler = root;
for(int level=0 ; level < key.length() ; level++) {
int index = key.charAt(level) - 'A';
if(crawler.children[index] == null) {
crawler.children[index] = getNode();
}
crawler = crawler.children[index];
}
crawler.valid = true;
}
I'm looking for intuitive methods to build the trie faster. Maybe I build the trie just once on my laptop, store it somehow to the disk, and load it from a file in the phone? But I don't know how to implement this.
Or are there any other prefix data structures which will take less time to build, but have similar lookup time complexity?
Any suggestions are appreciated. Thanks in advance.
EDIT
Someone suggested using Java Serialization. I tried it, but it was very slow with this code:
public void serializeTrie(SimpleTrie trie, String file) {
try {
ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
out.writeObject(trie);
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public SimpleTrie deserializeTrie(String file) {
try {
ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
SimpleTrie trie = (SimpleTrie)in.readObject();
in.close();
return trie;
} catch (IOException | ClassNotFoundException e) {
e.printStackTrace();
return null;
}
}
Can this above code be made faster?
My trie: http://pastebin.com/QkFisi09
Word list: http://www.isc.ro/lists/twl06.zip
Android IDE used to run code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand

Double-Array tries are very fast to save/load because all data is stored in linear arrays. They are also very fast to lookup, but the insertions can be costly. I bet there is a Java implementation somewhere.
Also, if your data is static (i.e. you don't update it on phone) consider DAFSA for your task. It is one of the most efficient data structures for storing words (must be better than "standard" tries and radix tries both for size and for speed, better than succinct tries for speed, often better than succinct tries for size). There is a good C++ implementation: dawgdic - you can use it to build DAFSA from command line and then use a Java reader for the resulting data structure (example implementation is here).

You could store your trie as an array of nodes, with references to child nodes replaced with array indices. Your root node would be the first element. That way, you could easily store/load your trie from simple binary or text format.
public class SimpleTrie {
public class TrieNode {
boolean valid;
int[] children;
}
private TrieNode[] nodes;
private int numberOfNodes;
private TrieNode getNode() {
TrieNode t = nodes[++numberOnNodes];
return t;
}
}

Just build a large String[] and sort it. Then you can use binary search to find the location of a String. You can also do a query based on prefixes without too much work.
Prefix look-up example:
Compare method:
private static int compare(String string, String prefix) {
if (prefix.length()>string.length()) return Integer.MIN_VALUE;
for (int i=0; i<prefix.length(); i++) {
char s = string.charAt(i);
char p = prefix.charAt(i);
if (s!=p) {
if (p<s) {
// prefix is before string
return -1;
}
// prefix is after string
return 1;
}
}
return 0;
}
Finds an occurrence of the prefix in the array and returns it's location (MIN or MAX are mean not found)
private static int recursiveFind(String[] strings, String prefix, int start, int end) {
if (start == end) {
String lastValue = strings[start]; // start==end
if (compare(lastValue,prefix)==0)
return start; // start==end
return Integer.MAX_VALUE;
}
int low = start;
int high = end + 1; // zero indexed, so add one.
int middle = low + ((high - low) / 2);
String middleValue = strings[middle];
int comp = compare(middleValue,prefix);
if (comp == Integer.MIN_VALUE) return comp;
if (comp==0)
return middle;
if (comp>0)
return recursiveFind(strings, prefix, middle + 1, end);
return recursiveFind(strings, prefix, start, middle - 1);
}
Gets a String array and prefix, prints out occurrences of prefix in array
private static boolean testPrefix(String[] strings, String prefix) {
int i = recursiveFind(strings, prefix, 0, strings.length-1);
if (i==Integer.MAX_VALUE || i==Integer.MIN_VALUE) {
// not found
return false;
}
// Found an occurrence, now search up and down for other occurrences
int up = i+1;
int down = i;
while (down>=0) {
String string = strings[down];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
down--;
}
while (up<strings.length) {
String string = strings[up];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
up++;
}
return true;
}

Here's a reasonably compact format for storing a trie on disk. I'll specify it by its (efficient) deserialization algorithm. Initialize a stack whose initial contents are the root node of the trie. Read characters one by one and interpret them as follows. The meaning of a letter A-Z is "allocate a new node, make it a child of the current top of stack, and push the newly allocated node onto the stack". The letter indicates which position the child is in. The meaning of a space is "set the valid flag of the node on top of the stack to true". The meaning of a backspace (\b) is "pop the stack".
For example, the input
TREE \b\bIE \b\b\bOO \b\b\b
gives the word list
TREE
TRIE
TOO
. On your desktop, construct the trie using whichever method and then serialize by the following recursive algorithm (pseudocode).
serialize(node):
if node is valid: put(' ')
for letter in A-Z:
if node has a child under letter:
put(letter)
serialize(child)
put('\b')

This isn't a magic bullet, but you can probably reduce your runtime slightly by doing one big memory allocation instead of a bunch of little ones.
I saw a ~10% speedup in the test code below (C++, not Java, sorry) when I used a "node pool" instead of relying on individual allocations:
#include <string>
#include <fstream>
#define USE_NODE_POOL
#ifdef USE_NODE_POOL
struct Node;
Node *node_pool;
int node_pool_idx = 0;
#endif
struct Node {
void insert(const std::string &s) { insert_helper(s, 0); }
void insert_helper(const std::string &s, int idx) {
if (idx >= s.length()) return;
int char_idx = s[idx] - 'A';
if (children[char_idx] == nullptr) {
#ifdef USE_NODE_POOL
children[char_idx] = &node_pool[node_pool_idx++];
#else
children[char_idx] = new Node();
#endif
}
children[char_idx]->insert_helper(s, idx + 1);
}
Node *children[26] = {};
};
int main() {
#ifdef USE_NODE_POOL
node_pool = new Node[400000];
#endif
Node n;
std::ifstream fin("TWL06.txt");
std::string word;
while (fin >> word) n.insert(word);
}

Tries that prealloate space all possible children (256) have a huge amount of wasted space. You are making your cache cry. Store those pointers to children in a resizable data structure.
Some tries will optimize by having one node to represent a long string, and break that string up only when needed.

Instead of a simple file you can use a database like sqlite and a nested set or celko tree to store the trie and you can also build a faster and shorter (less nodes) trie with a ternary search trie.

I don't like the idea of addressing nodes by index in array, but only because it requires one more addition (index to the pointer). But with array of preallocated nodes you will maybe save some time on allocation and initialization. And you can also save a lot of space by reserving first 26 indices for leaf nodes. Thus you'll not need to allocate and initialize 180000 leaf nodes.
Also with indices you will be able to read the prepared nodes array from disk in binary format. This has to be several times faster. But I'm not sure how to do this on your language. Is this Java?
If you checked that your source vocabulary is sorted, you may also save some time by comparing some prefix of the current string with the previous one. E.g. first 4 characters. If they are equal you can start your
for(int level=0 ; level < key.length() ; level++) {
loop from the 5-th level.

Is it space inefficient or time inefficient? If you are rolling a plain trie then space may be part of the problem when dealing with a mobil device. Check out patricia/radix tries, especially if you are using it as a prefix look-up tool.
Trie:
http://en.wikipedia.org/wiki/Trie
Patricia/Radix trie:
http://en.wikipedia.org/wiki/Radix_tree
You didn't mention a language but here are two implementations of prefix tries in Java.
Regular trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/Trie.java
Patricia/Radix (space-effecient) trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/PatriciaTrie.java

Generally speaking, avoid using a lot of object creations from scratch in Java, which is both slow and it also has a massive overhead. Better implement your own pooling class for memory management that allocates e.g. half a million entries at a time in one go.
Also, serialization is too slow for large lexicons. Use a binary read to populate array-based representations proposed above quickly.

For loop construction and code complexity [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
My group is having some discussion and strong feelings about for loop construction.
I have favored loops like:
size_t x;
for (x = 0; x < LIMIT; ++x) {
if (something) {
break;
}
...
}
// If we found what we're looking for, process it.
if (x < LIMIT) {
...
}
But others seem to prefer a Boolean flag like:
size_t x;
bool found = false;
for (x = 0; x < LIMIT && !found; ++x) {
if (something) {
found = true;
}
else {
...
}
}
// If we found what we're looking for, process it.
if (found) {
...
}
(And, where the language allows, using "for (int x = 0; ...".)
The first style has one less variable to keep track of and a simpler loop header. Albeit at the cost of "overloading" the loop control variable and (some would complain), the use of break.
The second style has clearly defined roles for the variables but a more complex loop condition and loop body (either an else, or a continue after found is set, or a "if (!found)" in the balance of the loop).
I think that the first style wins on code complexity. I'm looking for opinions from a broader audience. Pointers to actual research on which is easier to read and maintain would be even better. "It doesn't matter, take it out of your standard" is a fine answer, too.
OTOH, this may be the wrong question. I'm beginning to think that the right rule is "if you have to break out of a for, it's really a while."
bool found = false;
x = 0;
while (!found && x < LIMIT) {
if (something) {
found = true;
...handle the thing...
}
else {
...
}
++x;
}
Does what the first two examples do but in fewer lines. It does divide the initialization, test, and increment of x across three lines, though.

I'd actually dare to suggest consideration of GOTO to break out of loops in such cases:
for (size_t x = 0; x < LIMIT && !found; ++x) {
if (something)
goto found;
else {
...
}
}
// not found
...
return;
found:
...
return;
I consider this form to be both succint and readable. It may do some good in many simple cases (say, when there is no common processing in this function, in both found/unfound cases).
And about the general frowning goto receives, I find it to be a common misinterpretation of Dijkstra's original claims: his arguments favoured structured loop clauses, as for or while, over a primitive loop-via-goto, that still had a lot of presence circa 1968. Even the almighty Knuth eventualy says -
The new morality that I propose may
perhaps be stated thus: "Certain go to
statements which arise in connection with
well-understood transformations are acceptable, provided that the program documentation explains what the transformation was."
Others here occasionaly think the same.

While I disagree that an extra else really makes the 2nd more complicated, I think it's primarily a matter of aesthetics and keeping to your standard.
Personally, I have a probably irrational dislike of breaks and continues, so I'm MUCH more likely to use the found variable.
Also, note that you CAN add the found variable to the 1st implementation and do
if(something)
{
found = true;
break;
}
if you want to avoid the variable overloading problem at the expense of the extra variable, but still want the simple loop terminator...

The former example duplicates the x < LIMIT condition, whereas the latter doesn't.
With the former, if you want to change that condition, you have to remember to do it in two places.

I would prefer a different one altogether:
for (int x = 0; x < LIMIT; ++x) {
if (something) {
// If we found what we're looking for, process it.
...
break;
}
...
}
It seems you have not any trouble you mention about one or the other... ;-)
no duplication of condition, or readability problem
no additional variable

I don't have any references to hand (-1! -1!), but I seem to recall that having multiple exit points (from a function, from a loop) has been shown to cause issues with maintainability (I used to know someone who wrote code for the UK military and it was Verboten to do so). But more importantly, as RichieHindle points out, having a duplicate condition is a Bad Thing, it cries out for introducing bugs by changing one and not the other.
If you weren't using the condition later, I wouldn't be bothered either way. Since you are, the second is the way to go.

This sort of argument has been fought out here before (probably many times) such as in this question.
There are those that will argue that purity of code is all-important and they'll complain bitterly that your first option doesn't have identical post-conditions for all cases.
What I would answer is "Twaddle!". I'm a pragmatist, not a purist. I'm as against too much spaghetti code as much as the next engineer but some of the hideous terminating conditions I've seen in for loops are far worse than using a couple of breaks within your loop.
I will always go for readability of code over "purity" simply because I have to maintain it.

This looks like a place for a while loop. For loops are Syntactic Sugar on top of a While loop anyway. The general rule is that if you have to break out of a For loop, then use a While loop instead.

package com.company;
import java.io.*;
import java.util.Scanner;
public class Main {
// "line.separator" is a system property that is a platform independent and it is one way
// of getting a newline from your environment.
private static String NEWLINE = System.getProperty("line.separator");
public static void main(String[] args) {
// write your code here
boolean itsdone = false;
String userInputFileName;
String FirstName = null;
String LastName = null;
String user_junk;
String userOutputFileName;
String outString;
int Age = -1;
int rint = 0;
int myMAX = 100;
int MyArr2[] = new int[myMAX];
int itemCount = 0;
double average = 0;
double total = 0;
boolean ageDone = false;
Scanner inScan = new Scanner(System.in);
System.out.println("Enter First Name");
FirstName = inScan.next();
System.out.println("Enter Last Name");
LastName = inScan.next();
ageDone = false;
while (!ageDone) {
System.out.println("Enter Your Age");
if (inScan.hasNextInt()) {
Age = inScan.nextInt();
System.out.println(FirstName + " " + LastName + " " + "is " + Age + " Years old");
ageDone = true;
} else {
System.out.println("Your Age Needs to Have an Integer Value... Enter an Integer Value");
user_junk = inScan.next();
ageDone = false;
}
}
try {
File outputFile = new File("firstOutFile.txt");
if (outputFile.createNewFile()){
System.out.println("firstOutFile.txt was created"); // if file was created
}
else {
System.out.println("firstOutFile.txt existed and is being overwritten."); // if file had already existed
}
// --------------------------------
// If the file creation of access permissions to write into it
// are incorrect the program throws an exception
//
if ((outputFile.isFile()|| outputFile.canWrite())){
BufferedWriter fileOut = new BufferedWriter(new FileWriter(outputFile));
fileOut.write("==================================================================");
fileOut.write(NEWLINE + NEWLINE +" You Information is..." + NEWLINE + NEWLINE);
fileOut.write(NEWLINE + FirstName + " " + LastName + " " + Age + NEWLINE);
fileOut.write("==================================================================");
fileOut.close();
}
else {
throw new IOException();
}
} // end of try
catch (IOException e) { // in case for some reason the output file could not be created
System.err.format("IOException: %s%n", e);
e.printStackTrace();
}
} // end main method
}

Join a string using delimiters

What is the best way to join a list of strings into a combined delimited string. I'm mainly concerned about when to stop adding the delimiter. I'll use C# for my examples but I would like this to be language agnostic.
EDIT: I have not used StringBuilder to make the code slightly simpler.
Use a For Loop
for(int i=0; i < list.Length; i++)
{
result += list[i];
if(i != list.Length - 1)
result += delimiter;
}
Use a For Loop setting the first item previously
result = list[0];
for(int i = 1; i < list.Length; i++)
result += delimiter + list[i];
These won't work for an IEnumerable where you don't know the length of the list beforehand so
Using a foreach loop
bool first = true;
foreach(string item in list)
{
if(!first)
result += delimiter;
result += item;
first = false;
}
Variation on a foreach loop
From Jon's solution
StringBuilder builder = new StringBuilder();
string delimiter = "";
foreach (string item in list)
{
builder.Append(delimiter);
builder.Append(item);
delimiter = ",";
}
return builder.ToString();
Using an Iterator
Again from Jon
using (IEnumerator<string> iterator = list.GetEnumerator())
{
if (!iterator.MoveNext())
return "";
StringBuilder builder = new StringBuilder(iterator.Current);
while (iterator.MoveNext())
{
builder.Append(delimiter);
builder.Append(iterator.Current);
}
return builder.ToString();
}
What other algorithms are there?

It's impossible to give a truly language-agnostic answer here as different languages and platforms handle strings differently, and provide different levels of built-in support for joining lists of strings. You could take pretty much identical code in two different languages, and it would be great in one and awful in another.
In C#, you could use:
StringBuilder builder = new StringBuilder();
string delimiter = "";
foreach (string item in list)
{
builder.Append(delimiter);
builder.Append(item);
delimiter = ",";
}
return builder.ToString();
This will prepend a comma on all but the first item. Similar code would be good in Java too.
EDIT: Here's an alternative, a bit like Ian's later answer but working on a general IEnumerable<string>.
// Change to IEnumerator for the non-generic IEnumerable
using (IEnumerator<string> iterator = list.GetEnumerator())
{
if (!iterator.MoveNext())
{
return "";
}
StringBuilder builder = new StringBuilder(iterator.Current);
while (iterator.MoveNext())
{
builder.Append(delimiter);
builder.Append(iterator.Current);
}
return builder.ToString();
}
EDIT nearly 5 years after the original answer...
In .NET 4, string.Join was overloaded pretty significantly. There's an overload taking IEnumerable<T> which automatically calls ToString, and there's an overload for IEnumerable<string>. So you don't need the code above any more... for .NET, anyway.

In .NET, you can use the String.Join method:
string concatenated = String.Join(",", list.ToArray());
Using .NET Reflector, we can find out how it does it:
public static unsafe string Join(string separator, string[] value, int startIndex, int count)
{
if (separator == null)
{
separator = Empty;
}
if (value == null)
{
throw new ArgumentNullException("value");
}
if (startIndex < 0)
{
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
}
if (startIndex > (value.Length - count))
{
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_IndexCountBuffer"));
}
if (count == 0)
{
return Empty;
}
int length = 0;
int num2 = (startIndex + count) - 1;
for (int i = startIndex; i <= num2; i++)
{
if (value[i] != null)
{
length += value[i].Length;
}
}
length += (count - 1) * separator.Length;
if ((length < 0) || ((length + 1) < 0))
{
throw new OutOfMemoryException();
}
if (length == 0)
{
return Empty;
}
string str = FastAllocateString(length);
fixed (char* chRef = &str.m_firstChar)
{
UnSafeCharBuffer buffer = new UnSafeCharBuffer(chRef, length);
buffer.AppendString(value[startIndex]);
for (int j = startIndex + 1; j <= num2; j++)
{
buffer.AppendString(separator);
buffer.AppendString(value[j]);
}
}
return str;
}

There's little reason to make it language-agnostic when some languages provide support for this in one line, e.g., Python's
",".join(sequence)
See the join documentation for more info.

For python be sure you have a list of strings, else ','.join(x) will fail.
For a safe method using 2.5+
delimiter = '","'
delimiter.join(str(a) if a else '' for a in list_object)
The "str(a) if a else ''" is good for None types otherwise str() ends up making then 'None' which isn't nice ;)

In PHP's implode():
$string = implode($delim, $array);

I'd always add the delimeter and then remove it at the end if necessary. This way, you're not executing an if statement for every iteration of the loop when you only care about doing the work once.
StringBuilder sb = new StringBuilder();
foreach(string item in list){
sb.Append(item);
sb.Append(delimeter);
}
if (list.Count > 0) {
sb.Remove(sb.Length - delimter.Length, delimeter.Length)
}

I would express this recursively.
Check if the number of string arguments is 1. If it is, return it.
Otherwise recurse, but combine the first two arguments with the delimiter between them.
Example in Common Lisp:
(defun join (delimiter &rest strings)
(if (null (rest strings))
(first strings)
(apply #'join
delimiter
(concatenate 'string
(first strings)
delimiter
(second strings))
(cddr strings))))
The more idiomatic way is to use reduce, but this expands to almost exactly the same instructions as the above:
(defun join (delimiter &rest strings)
(reduce (lambda (a b)
(concatenate 'string a delimiter b))
strings))

List<string> aaa = new List<string>{ "aaa", "bbb", "ccc" };
string mm = ";";
return aaa.Aggregate((a, b) => a + mm + b);
and you get
aaa;bbb;ccc
lambda is pretty handy

In C# you can just use String.Join(separator,string_list)

The problem is that computer languages rarely have string booleans, that is, methods that are of type string that do anything useful. SQL Server at least has is[not]null and nullif, which when combined solve the delimiter problem, by the way: isnotnull(nullif(columnvalue, ""),"," + columnvalue))
The problem is that in languages there are booleans, and there are strings, and never the twain shall meet except in ugly coding forms, e.g.
concatstring = string1 + "," + string2;
if (fubar)
concatstring += string3
concatstring += string4 etc
I've tried mightily to avoid all this ugliness, playing comma games and concatenating with joins, but I'm still left with some of it, including SQL Server errors when I've missed one of the commas and a variable is empty.
Jonathan

Since you tagged this language agnostic,
This is how you would do it in python
# delimiter can be multichar like "| trlalala |"
delimiter = ";"
# sequence can be any list, or iterator/generator that returns list of strings
result = delimiter.join(sequence)
#result will NOT have ending delimiter
Edit: I see I got beat to the answer by several people. Sorry for dupication

I thint the best way to do something like that is (I'll use pseudo-code, so we'll make it truly language agnostic):
function concat(<array> list, <boolean> strict):
for i in list:
if the length of i is zero and strict is false:
continue;
if i is not the first element:
result = result + separator;
result = result + i;
return result;
the second argument to concat(), strict, is a flag to know if eventual empty strings have to be considered in concatenation or not.
I'm used to not consider appending a final separator; on the other hand, if strict is false the resulting string could be free of stuff like "A,B,,,F", provided the separator is a comma, but would instead present as "A,B,F".

that's how python solves the problem:
','.join(list_of_strings)
I've never could understand the need for 'algorithms' in trivial cases though

This is a Working solution in C#, in Java, you can use similar for each on iterator.
string result = string.Empty;
// use stringbuilder at some stage.
foreach (string item in list)
result += "," + item ;
result = result.Substring(1);
// output: "item,item,item"
If using .NET, you might want to use extension method so that you can do
list.ToString(",")
For details, check out Separator Delimited ToString for Array, List, Dictionary, Generic IEnumerable
// contains extension methods, it must be a static class.
public static class ExtensionMethod
{
// apply this extension to any generic IEnumerable object.
public static string ToString<T>(this IEnumerable<T> source,
string separator)
{
if (source == null)
throw new ArgumentException("source can not be null.");
if (string.IsNullOrEmpty(separator))
throw new ArgumentException("separator can not be null or empty.");
// A LINQ query to call ToString on each elements
// and constructs a string array.
string[] array =
(from s in source
select s.ToString()
).ToArray();
// utilise builtin string.Join to concate elements with
// customizable separator.
return string.Join(separator, array);
}
}
EDIT:For performance reasons, replace the concatenation code with string builder solution that mentioned within this thread.

Seen the Python answer like 3 times, but no Ruby?!?!?
the first part of the code declares a new array. Then you can just call the .join() method and pass the delimiter and it will return a string with the delimiter in the middle. I believe the join method calls the .to_s method on each item before it concatenates.
["ID", "Description", "Active"].join(",")
>> "ID, Description, Active"
this can be very useful when combining meta-programming with with database interaction.
does anyone know if c# has something similar to this syntax sugar?

In Java 8 we can use:
List<String> list = Arrays.asList(new String[] { "a", "b", "c" });
System.out.println(String.join(",", list)); //Output: a,b,c
To have a prefix and suffix we can do
StringJoiner joiner = new StringJoiner(",", "{", "}");
list.forEach(x -> joiner.add(x));
System.out.println(joiner.toString()); //Output: {a,b,c}
Prior to Java 8 you can do like Jon's answer
StringBuilder sb = new StringBuilder(prefix);
boolean and = false;
for (E e : iterable) {
if (and) {
sb.append(delimiter);
}
sb.append(e);
and = true;
}
sb.append(suffix);

In .NET, I would use the String.join method if possible, which allows you to specify a separator and a string array. A list can be converted to an array with ToArray, but I don't know what the performance hit of that would be.
The three algorithms that you mention are what I would use (I like the second because it does not have an if statement in it, but if the length is not known I would use the third because it does not duplicate the code). The second will only work if the list is not empty, so that might take another if statement.
A fourth variant might be to put a seperator in front of every element that is concatenated and then remove the first separator from the result.
If you do concatenate strings in a loop, note that for non trivial cases the use of a stringbuilder will vastly outperform repeated string concatenations.

You could write your own method AppendTostring(string, delimiter) that appends the delimiter if and only if the string is not empty. Then you just call that method in any loop without having to worry when to append and when not to append.
Edit: better yet of course to use some kind of StringBuffer in the method if available.

string result = "";
foreach(string item in list)
{
result += delimiter + item;
}
result = result.Substring(1);
Edit: Of course, you wouldn't use this or any one of your algorithms to concatenate strings. With C#/.NET, you'd probably use a StringBuilder:
StringBuilder sb = new StringBuilder();
foreach(string item in list)
{
sb.Append(delimiter);
sb.Append(item);
}
string result = sb.ToString(1, sb.Length-1);
And a variation of this solution:
StringBuilder sb = new StringBuilder(list[0]);
for (int i=1; i<list.Count; i++)
{
sb.Append(delimiter);
sb.Append(list[i]);
}
string result = sb.ToString();
Both solutions do not include any error checks.

From http://dogsblog.softwarehouse.co.zw/post/2009/02/11/IEnumerable-to-Comma-Separated-List-(and-more).aspx
A pet hate of mine when developing is making a list of comma separated ids, it is SO simple but always has ugly code.... Common solutions are to loop through and put a comma after each item then remove the last character, or to have an if statement to check if you at the begining or end of the list. Below is a solution you can use on any IEnumberable ie a List, Array etc. It is also the most efficient way I can think of doing it as it relies on assignment which is better than editing a string or using an if.
public static class StringExtensions
{
public static string Splice<T>(IEnumerable<T> args, string delimiter)
{
StringBuilder sb = new StringBuilder();
string d = "";
foreach (T t in args)
{
sb.Append(d);
sb.Append(t.ToString());
d = delimiter;
}
return sb.ToString();
}
}
Now it can be used with any IEnumerable eg.
StringExtensions.Splice(billingTransactions.Select(t => t.id), ",")
to give us 31,32,35

For java a very complete answer has been given in this question or this question.
That is use StringUtils.join in Apache Commons
String result = StringUtils.join(list, ", ");

In Clojure, you could just use clojure.contrib.str-utils/str-join:
(str-join ", " list)
But for the actual algorithm:
(reduce (fn [res cur] (str res ", " cur)) list)

Groovy also has a String Object.join(String) method.

Java (from Jon's solution):
StringBuilder sb = new StringBuilder();
String delimiter = "";
for (String item : items) {
sb.append(delimiter).append(item);
delimeter = ", ";
}
return sb.toString();

Here is my humble try;
public static string JoinWithDelimiter(List<string> words, string delimiter){
string joinedString = "";
if (words.Count() > 0)
{
joinedString = words[0] + delimiter;
for (var i = 0; i < words.Count(); i++){
if (i > 0 && i < words.Count()){
if (joinedString.Length > 0)
{
joinedString += delimiter + words[i] + delimiter;
} else {
joinedString += words[i] + delimiter;
}
}
}
}
return joinedString;
}
Usage;
List<string> words = new List<string>(){"my", "name", "is", "Hari"};
Console.WriteLine(JoinWithDelimiter(words, " "));

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Algorithm to format text to Pascal or camel casing - algorithm

Using this question as the base is there an alogrithm or coding example to change some text to Pascal or Camel casing. For example: mynameisfred becomes Camel: myNameIsFred Pascal: MyNameIsFred

Related

splitting up the contents of a single line

Methods, more than one return?

Build trie faster

For loop construction and code complexity [closed]

Join a string using delimiters

Categories

Resources