Matching & Comparing Strings via Levenshtein Distance - sorting

I have one table with a static list of "template" strings, and another sheet that updates daily with new "populated" strings.
I want to be able to identify which "template" each string is based on by finding the string with the lowest distance, and then displaying the distance between the template and the new string as a percentile.
Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?
I found this function for calculating the distance, but I'm a novice with this kind of code, and I'm not sure how to include the step for locating the closest match before displaying the distance. It also seems like this might not work for long strings (200+ characters) like I'm dealing with.
I'm a sheets guy, not normally a code guy, and I'm up against a brick wall here in terms of my understanding of what to do next. Thank you for your help!

It is possible to calculate the Levenshtein distance between two text strings using a named function such as ztiaa's LEVDIST.
But it may be more practical to use a custom function like the one you link to, or the following implementation that fuzzy matches values to a corpus using fuzzyset.js which is a JavaScript port of the Python fuzzyset.
/**
* Finds the closest matches to words using the FuzzySet library.
*
* #param {A2:A} words_to_find The text strings to find matches for.
* #param {Dictionary!W2:W} dictionary A list of words to match words_to_find against.
* #param {0.5} match_score_min Optional. A threshold value for how good the match must be. 1 means an exact match. The default is 0.33.
* #return {Array} The closest match to words_to_find found in the dictionary, or a blank value if there is no match.
* #customfunction
*/
function FindFuzzyMatch(words_to_find, dictionary, match_score_min) {
'use strict';
// version 1.2, written by --Hyde, 14 September 2022
// - hat tip to Andreas Muller
// - uses fuzzyset.js by Glen Chiacchieri, a port of fuzzyset by Mike Axiak
if (arguments.length < 2 || arguments.length > 3) {
throw `Wrong number of arguments to FindFuzzyMatch. Expected 2 or 3 arguments, but got ${arguments.length} arguments.`;
}
const words = Array.isArray(words_to_find) ? words_to_find : [[words_to_find]];
const dict = Array.isArray(dictionary) ? dictionary.flat() : [dictionary];
const fuzzyDictionary = FuzzySet(dict);
return words.map(row => row.map(word => {
if (!word) {
return null;
}
const fuzzyMatches = fuzzyDictionary.get(word, null, match_score_min);
return fuzzyMatches ? fuzzyMatches[0][1] : null; // get just the first result, ignoring its score
}));
}
// include https://github.com/Glench/fuzzyset.js/blob/master/lib/fuzzyset.js below,
// and remove the final export {} block

Related

Office Script - Split strings in vector & use dynamic cell address - Run an excel script from Power Automate

I'm completely new on Office Script (with only old experience on Python and C++) and I'm trying to run a rather "simple" Office Script on excel from power automate. The goal is to fill specific cells (always the same, their position shouldn't change) on the excel file.
The power Automate part is working, the problem is managing to use the information sent to Excel, in excel.
The script take three variables from Power automate (all three strings) and should fill specific cells based on these. CMQ_Name: string to use as is.
Version: string to use as is.
PT_Name: String with names separated by a ";". The goal is to split it in as much string as needed (I'm stocking them in an Array) and write each name in cells on top of each other, always starting on the same position (cell A2).
I'm able to use CMQ_Names & Version and put them in the cell they're supposed to go in, I've already make it works.
However, I cannot make the last part (in bold above, part 2 in the code below) work.
Learning on this has been pretty frustrating as some elements seems to sometime works and sometimes not. Newbie me is probably having syntax issues more than anyting...  
function main(workbook: ExcelScript.Workbook,
CMQ_Name: string,
Version: string,
PT_Name: string )
{
// create reference for each sheet in the excel document
let NAMES = workbook.getWorksheet("CMQ_NAMES");
let TERMS = workbook.getWorksheet("CMQ_TERMS");
//------Part 1: Update entries in sheet CMQ_NAMES
NAMES.getRange("A2").setValues(CMQ_Name);
NAMES.getRange("D2").setValues(Version);
//Update entries in sheet CMQ_TERMS
TERMS.getRange("A2").setValues(CMQ_Name);
//-------Part 2: work with PT_Name
//Split PT_Name
let ARRAY1: string[] = PT_Name.split(";");
let CELL: string;
let B: string = "B"
for (var i = 0; i < ARRAY1.length; i++) {
CELL = B.concat(i.toString());
NAMES.getRange(CELL).setValues(ARRAY1[i]);
}
}
  I have several problems:
Some parts (basically anything with red are detected as a problem and I have no idea why. Some research indicated it could be false positive, other not. It's not the biggest problem either as it seems the code sometimes works despite these warnings.
Argument of type 'string' is not assignable to parameter of type '(string | number | boolean)[ ][ ]'.
I couldn't find a way to use a variable as address to select a specific cell to write in, which is preventing the for loop at the end from working.  I've been bashing my head against this for a week now without solving it.
Could you kindly take a look?
Thank you!!
I tried several workarounds and other syntaxes without much success. Writing the first two strings in cells work, working with the third string doesn't.
EDIT: Thanks to the below comment, I managed to make it work:
function main(
workbook: ExcelScript.Workbook,
CMQ_Name: string,
Version: string,
PT_Name: string )
{
// create reference for each table
let NAMES = workbook.getWorksheet("CMQ_NAMES");
let TERMS = workbook.getWorksheet("CMQ_TERMS");
//------Part 0: clear previous info
TERMS.getRange("B2:B200").clear()
//------Part 1: Update entries in sheet CMQ_NAMES
NAMES.getRange("A2").setValue(CMQ_Name);
NAMES.getRange("D2").setValue(Version);
//Update entries in sheet CMQ_TERMS
TERMS.getRange("A2").setValue(CMQ_Name);
//-------Part 2: work with PT_Name
//Split PT_Name
let ARRAY1: string[] = PT_Name.split(";");
let CELL: string;
let B: string = "B"
for (var i = 2; i < ARRAY1.length + 2; i++) {
CELL = B.concat(i.toString());
//console.log(CELL); //debugging
TERMS.getRange(CELL).setValue(ARRAY1[i - 2]);
}
}
You're using setValues() (plural) which accepts a 2 dimensional array of values that contains the data for the given rows and columns.
You need to look at using setValue() instead as that takes a single argument of type any.
https://learn.microsoft.com/en-us/javascript/api/office-scripts/excelscript/excelscript.range?view=office-scripts#excelscript-excelscript-range-setvalue-member(1)
As for using a variable to retrieve a single cell (or set of cells for that matter), you really just need to use the getRange() method to do that, this is a basic example ...
function main(workbook: ExcelScript.Workbook) {
let cellAddress: string = "A4";
let range: ExcelScript.Range = workbook.getWorksheet("Data").getRange(cellAddress);
console.log(range.getAddress());
}
If you want to specify multiple ranges, just change cellAddress to something like this ... A4:C10
That method also accepts named ranges.

Hashing a long integer ID into a smaller string

Here is the problem, where I need to transform an ID (defined as a long integer) to a smaller alfanumeric identifier. The details are the following:
Each individual on the problem as an unique ID, a long integer of size 13 (something like 123123412341234).
I need to generate a smaller representation of this unique ID, a alfanumeric string, something like A1CB3X. The problem is that 5 or 6 character length will not be enough to represent such a large integer.
The new ID (eg A1CB3X) should be valid in a context where we know that only a small number of individuals are present (less than 500). The new ID should be unique within that small set of individuals.
The new ID (eg A1CB3X) should be the result of a calculation made over the original ID. This means that taking the original ID elsewhere and applying the same calculation, we should get the same new ID (eg A1CB3X).
This calculation should occur when the individual is added to the set, meaning that not all individuals belonging to that set will be know at that time.
Any directions on how to solve such a problem?
Assuming that you don't need a formula that goes in both directions (which is impossible if you are reducing a 13-digit number to a 5 or 6-character alphanum string):
If you can have up to 6 alphanumeric characters that gives you 366 = 2,176,782,336 possibilities, assuming only numbers and uppercase letters.
To map your larger 13-digit number onto this space, you can take a modulo of some prime number slightly smaller than that, for example 2,176,782,317, the encode it with base-36 encoding.
alphanum_id = base36encode(longnumber_id % 2176782317)
For a set of 500, this gives you a
2176782317P500 / 2176782317500 chance of a collision
(P is permutation)
Best option is to change the base to 62 using case sensitive characters
If you want it to be shorter, you can add unicode characters. See below.
Here is javascript code for you: https://jsfiddle.net/vewmdt85/1/
function compress(n) {
var symbols = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïð'.split('');
var d = n;
var compressed = '';
while (d >= 1) {
compressed = symbols[(d - (symbols.length * Math.floor(d / symbols.length)))] + compressed;
d = Math.floor(d / symbols.length);
}
return compressed;
}
$('input').keyup(function() {
$('span').html(compress($(this).val()))
})
$('span').html(compress($('input').val()))
How about using some base-X conversion, for example 123123412341234 becomes 17N644R7CI in base-36 and 9999999999999 becomes 3JLXPT2PR?
If you need a mapping that works both directions, you can simply go for a larger base.
Meaning: using base 16, you can reduce 1 to 16 to a single character.
So, base36 is the "maximum" that allows for shorter strings (when 1-1 mapping is required)!

Determine if one string is a prefix of another

I have written down a simple function that determines if str1 is a prefix of str2. It's a very simple function, that looks like this (in JS):
function isPrefix(str1, str2) // determine if str1 is a prefix of a candidate string
{
if(str2.length < str1.length) // candidate string can't be smaller than prefix string
return false;
var i = 0;
while(str1.charAt(i) == str2.charAt(i) && i <= str1.length)
i++;
if(i < str1.length) // i terminated => str 1 is smaller than str 2
return false;
return true;
}
As you can see, it loops through the entire length of the prefix string to gauge if it is a prefix of the candidate string. This means it's complexity is O(N), which isn't bad but this becomes a problem when I have a huge data set to consider looping through to determine which strings have the prefix string as a part of the prefix. This makes the complexity multiple like O(M*N) where M is the total number of strings in a given data set. Not good.
I explored the Internet a bit to determine that the best answer would be a Patricia/Radix trie. Where strings are stored as prefixes. Even then, when I attempt to insert/look-up a string, there will be a considerable overhead in string matching if I use the aforementioned prefix gauging function.
Say I had a prefix string 'rom' and a set of candidate words
var dataset =["random","rapid","romance","romania","rome","rose"];
that would like this in a radix trie :
r
/ \
a o
/ \ / \
ndom pid se m
/ \
an e
/ \
ia ce
This means, for every node, I will be using the prefix match function, to determine which node has a value that matches the prefix string at the index. Somehow, this solution still seems arduous and does not sit too well with me. Is there something better or anyway I can improve the core prefix matching function ?
Looks like you've got two different problems.
One is to determine if a string is contained as a prefix in another string. For this I would suggest using a function already implemented in the language's string library. In JavaScript you could do this
if (str2.indexOf(str1) === 0) {
// string str1 is a prefix of str2
}
See documentation for String.indexOf here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf
For the other problem, in a bunch of strings, find out which ones have a given string as a prefix, building a data structure like a Trie or the one you mention seems like the way to go, if you want fast look-ups.
Check out this thread on stackoverflow - How to check if a string "StartsWith" another string? . Mark Byers solution seems to be very efficient. Also for Java there are built in String functions "endsWith" and "startsWith" - http://docs.oracle.com/javase/tutorial/java/data/comparestrings.html

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

What is the right way to split a string into words ?
(string doesn't contain any spaces or punctuation marks)
For example: "stringintowords" -> "String Into Words"
Could you please advise what algorithm should be used here ?
! Update: For those who think this question is just for curiosity. This algorithm could be used to camеlcase domain names ("sportandfishing .com" -> "SportAndFishing .com") and this algo is currently used by aboutus dot org to do this conversion dynamically.
Let's assume that you have a function isWord(w), which checks if w is a word using a dictionary. Let's for simplicity also assume for now that you only want to know whether for some word w such a splitting is possible. This can be easily done with dynamic programming.
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if the word w[1..i] can be split. Then set S[1] = isWord(w[1]) and for i=2 to length(w) calculate
S[i] = (isWord[w[1..i] or for any j in {2..i}: S[j-1] and isWord[j..i]).
This takes O(length(w)^2) time, if dictionary queries are constant time. To actually find the splitting, just store the winning split in each S[i] that is set to true. This can also be adapted to enumerate all solution by storing all such splits.
As mentioned by many people here, this is a standard, easy dynamic programming problem: the best solution is given by Falk Hüffner. Additional info though:
(a) you should consider implementing isWord with a trie, which will save you a lot of time if you use properly (that is by incrementally testing for words).
(b) typing "segmentation dynamic programming" yields a score of more detail answers, from university level lectures with pseudo-code algorithm, such as this lecture at Duke's (which even goes so far as to provide a simple probabilistic approach to deal with what to do when you have words that won't be contained in any dictionary).
There should be a fair bit in the academic literature on this. The key words you want to search for are word segmentation. This paper looks promising, for example.
In general, you'll probably want to learn about markov models and the viterbi algorithm. The latter is a dynamic programming algorithm that may allow you to find plausible segmentations for a string without exhaustively testing every possible segmentation. The essential insight here is that if you have n possible segmentations for the first m characters, and you only want to find the most likely segmentation, you don't need to evaluate every one of these against subsequent characters - you only need to continue evaluating the most likely one.
If you want to ensure that you get this right, you'll have to use a dictionary based approach and it'll be horrendously inefficient. You'll also have to expect to receive multiple results from your algorithm.
For example: windowsteamblog (of http://windowsteamblog.com/ fame)
windows team blog
window steam blog
Consider the sheer number of possible splittings for a given string. If you have n characters in the string, there are n-1 possible places to split. For example, for the string cat, you can split before the a and you can split before the t. This results in 4 possible splittings.
You could look at this problem as choosing where you need to split the string. You also need to choose how many splits there will be. So there are Sum(i = 0 to n - 1, n - 1 choose i) possible splittings. By the Binomial Coefficient Theorem, with x and y both being 1, this is equal to pow(2, n-1).
Granted, a lot of this computation rests on common subproblems, so Dynamic Programming might speed up your algorithm. Off the top of my head, computing a boolean matrix M such M[i,j] is true if and only if the substring of your given string from i to j is a word would help out quite a bit. You still have an exponential number of possible segmentations, but you would quickly be able to eliminate a segmentation if an early split did not form a word. A solution would then be a sequence of integers (i0, j0, i1, j1, ...) with the condition that j sub k = i sub (k + 1).
If your goal is correctly camel case URL's, I would sidestep the problem and go for something a little more direct: Get the homepage for the URL, remove any spaces and capitalization from the source HTML, and search for your string. If there is a match, find that section in the original HTML and return it. You'd need an array of NumSpaces that declares how much whitespace occurs in the original string like so:
Needle: isashort
Haystack: This is a short phrase
Preprocessed: thisisashortphrase
NumSpaces : 000011233333444444
And your answer would come from:
location = prepocessed.Search(Needle)
locationInOriginal = location + NumSpaces[location]
originalLength = Needle.length() + NumSpaces[location + needle.length()] - NumSpaces[location]
Haystack.substring(locationInOriginal, originalLength)
Of course, this would break if madduckets.com did not have "Mad Duckets" somewhere on the home page. Alas, that is the price you pay for avoiding an exponential problem.
This can be actually done (to a certain degree) without dictionary. Essentially, this is an unsupervised word segmentation problem. You need to collect a large list of domain names, apply an unsupervised segmentation learning algorithm (e.g. Morfessor) and apply the learned model for new domain names. I'm not sure how well it would work, though (but it would be interesting).
This is basically a variation of a knapsack problem, so what you need is a comprehensive list of words and any of the solutions covered in Wiki.
With fairly-sized dictionary this is going to be insanely resource-intensive and lengthy operation, and you cannot even be sure that this problem will be solved.
Create a list of possible words, sort it from long words to short words.
Check if each entry in the list against the first part of the string. If it equals, remove this and append it at your sentence with a space. Repeat this.
A simple Java solution which has O(n^2) running time.
public class Solution {
// should contain the list of all words, or you can use any other data structure (e.g. a Trie)
private HashSet<String> dictionary;
public String parse(String s) {
return parse(s, new HashMap<String, String>());
}
public String parse(String s, HashMap<String, String> map) {
if (map.containsKey(s)) {
return map.get(s);
}
if (dictionary.contains(s)) {
return s;
}
for (int left = 1; left < s.length(); left++) {
String leftSub = s.substring(0, left);
if (!dictionary.contains(leftSub)) {
continue;
}
String rightSub = s.substring(left);
String rightParsed = parse(rightSub, map);
if (rightParsed != null) {
String parsed = leftSub + " " + rightParsed;
map.put(s, parsed);
return parsed;
}
}
map.put(s, null);
return null;
}
}
I was looking at the problem and thought maybe I could share how I did it.
It's a little too hard to explain my algorithm in words so maybe I could share my optimized solution in pseudocode:
string mainword = "stringintowords";
array substrings = get_all_substrings(mainword);
/** this way, one does not check the dictionary to check for word validity
* on every substring; It would only be queried once and for all,
* eliminating multiple travels to the data storage
*/
string query = "select word from dictionary where word in " + substrings;
array validwords = execute(query).getArray();
validwords = validwords.sort(length, desc);
array segments = [];
while(mainword != ""){
for(x = 0; x < validwords.length; x++){
if(mainword.startswith(validwords[x])) {
segments.push(validwords[x]);
mainword = mainword.remove(v);
x = 0;
}
}
/**
* remove the first character if any of valid words do not match, then start again
* you may need to add the first character to the result if you want to
*/
mainword = mainword.substring(1);
}
string result = segments.join(" ");

Best way to write a conversion function

Let's say that I'm writing a function to convert between temperature scales. I want to support at least Celsius, Fahrenheit, and Kelvin. Is it better to pass the source scale and target scale as separate parameters of the function, or some sort of combined parameter?
Example 1 - separate parameters:
function convertTemperature("celsius", "fahrenheit", 22)
Example 2 - combined parameter:
function convertTemperature("c-f", 22)
The code inside the function is probably where it counts. With two parameters, the logic to determine what formula we're going to use is slightly more complicated, but a single parameter doesn't feel right somehow.
Thoughts?
Go with the first option, but rather than allow literal strings (which are error prone), take constant values or an enumeration if your language supports it, like this:
convertTemperature (TempScale.CELSIUS, TempScale.FAHRENHEIT, 22)
Depends on the language.
Generally, I'd use separate arguments with enums.
If it's an object oriented language, then I'd recommend a temperature class, with the temperature stored internally however you like and then functions to output it in whatever units are needed:
temp.celsius(); // returns the temperature of object temp in celsius
When writing such designs, I like to think to myself, "If I needed to add an extra unit, what would design would make it the easiest to do so?" Doing this, I come to the conclusion that enums would be easiest for the following reasons:
1) Adding new values is easy.
2) I avoid doing string comparison
However, how do you write the conversion method? 3p2 is 6. So that means there are 6 different combinations of celsius, Fahrenheit, and kelvin. What if I wanted to add a new temperate format "foo"? That would mean 4p2 which is 12! Two more? 5p2 = 20 combination. Three more? 6p2 = 30 combinations!
You can quickly see how each additional modification requires more and more changes to the code. For this reason I don't do direct conversions! Instead, I do an intermediate conversion. I'd pick one temperature, say Kelvin. And initially, I'd convert to kelvin. I'd then convert kelvin to the desired temperature. Yes, It does result in an extra calculation. However, it makes scalling the code a ton easier. adding adding a new temperature unit will always result in only two new modifications to the code. Easy.
A few things:
I'd use an enumerated type that a syntax checker or compiler can check rather than a string that can be mistyped. In Pseudo-PHP:
define ('kCelsius', 0); define ('kFarenheit', 1); define ('kKelvin', 2);
$a = ConvertTemperature(22, kCelsius, kFarenheit);
Also, it seems more natural to me to place the thing you operate on, in this case the temperature to be converted, first. It gives a logical ordering to your parameters (convert -- what? from? to?) and thus helps with mnemonics.
Your function will be much more robust if you use the first approach. If you need to add another scale, that's one more parameter value to handle. In the second approach, adding another scale means adding as many values as you already had scales on the list, times 2. (For example, to add K to C and F, you'd have to add K-C, K-F, C-K, and C-F.)
A decent way to structure your program would be to first convert whatever comes in to an arbitrarily chosen intermediate scale, and then convert from that intermediate scale to the outgoing scale.
A better way would be to have a little library of slopes and intercepts for the various scales, and just look up the numbers for the incoming and outgoing scales and do the calculation in one generic step.
In C# (and probaly Java) it would be best to create a Temperature class that stores temperatures privately as Celcius (or whatever) and which has Celcius, Fahrenheit, and Kelvin properties that do all the conversions for you in their get and set statements?
Depends how many conversions you are going to have. I'd probably choose one parameter, given as an enum: Consider this expanded version of conversion.
enum Conversion
{
CelsiusToFahrenheit,
FahrenheitToCelsius,
KilosToPounds
}
Convert(Conversion conversion, X from);
You now have sane type safety at point of call - one cannot give correctly typed parameters that give an incorrect runtime result. Consider the alternative.
enum Units
{
Pounds,
Kilos,
Celcius,
Farenheight
}
Convert(Unit from, Unit to, X fromAmount);
I can type safely call
Convert(Pounds, Celcius, 5, 10);
But the result is meaningless, and you'll have to fail at runtime. Yes, I know you're only dealing with temperature at the moment, but the general concept still holds (I believe).
I would choose
Example 1 - separate parameters: function convertTemperature("celsius", "fahrenheit", 22)
Otherwise within your function definition you would have to parse "c-f" into "celsius" and "fahrenheit" anyway to get the required conversion scales, which could get messy.
If you're providing something like Google's search box to users, having handy shortcuts like "c-f" is nice for them. Underneath, though, I would convert "c-f" into "celsius" and "fahrenheit" in an outer function before calling convertTemperature() as above.
In this case single parameters looks totally obscure;
Function convert temperature from one scale to another scale.
IMO it's more natural to pass source and target scales as separate parameters. I definitely don't want to try to grasp format of first argument.
I would make an enumeration out of the temperature types and pass in the 2 scale parameters. Something like (in c#):
public void ConvertTemperature(TemperatureTypeEnum SourceTemp,
TemperatureTypeEnum TargetTemp,
decimal Temperature)
{}
I'm always on the lookout for ways to use objects to solve my programming problems. I hope this means that I'm more OO than when I was only using functions to solve problems, but that remains to be seen.
In C#:
interface ITemperature
{
CelciusTemperature ToCelcius();
FarenheitTemperature ToFarenheit();
}
struct FarenheitTemperature : ITemperature
{
public readonly int Value;
public FarenheitTemperature(int value)
{
this.Value = value;
}
public FarenheitTemperature ToFarenheit() { return this; }
public CelciusTemperature ToCelcius()
{
return new CelciusTemperature((this.Value - 32) * 5 / 9);
}
}
struct CelciusTemperature
{
public readonly int Value;
public CelciusTemperature(int value)
{
this.Value = value;
}
public CelciusTemperature ToCelcius() { return this; }
public FarenheitTemperature ToFarenheit()
{
return new FarenheitTemperature(this.Value * 9 / 5 + 32);
}
}
and some tests:
// Freezing
Debug.Assert(new FarenheitTemperature(32).ToCelcius().Equals(new CelciusTemperature(0)));
Debug.Assert(new CelciusTemperature(0).ToFarenheit().Equals(new FarenheitTemperature(32)));
// crossover
Debug.Assert(new FarenheitTemperature(-40).ToCelcius().Equals(new CelciusTemperature(-40)));
Debug.Assert(new CelciusTemperature(-40).ToFarenheit().Equals(new FarenheitTemperature(-40)));
and an example of a bug that this approach avoids:
CelciusTemperature theOutbackInAMidnightOilSong = new CelciusTemperature(45);
FarenheitTemperature x = theOutbackInAMidnightOilSong; // ERROR: Cannot implicitly convert type 'CelciusTemperature' to 'FarenheitTemperature'
Adding Kelvin conversions is left as an exercise.
By the way, it doesn't have to be more work to implement the three-parameter version, as suggested in the question statement.
These are all linear functions, so you can implement something like
float LinearConvert(float in, float scale, float add, bool invert);
where the last bool indicates if you want to do the forward transform or reverse it.
Within your conversion technique, you can have a scale/add pair for X -> Kelvin. When you get a request to convert format X to Y, you can first run X -> Kelvin, then Kelvin -> Y by reversing the Y -> Kelvin process (by flipping the last bool to LinearConvert).
This technique gives you something like 4 lines of real code in your convert function, and one piece of data for every type you need to convert between.
Similar to what #Rob #wcm and #David explained...
public class Temperature
{
private double celcius;
public static Temperature FromFarenheit(double farenheit)
{
return new Temperature { Farhenheit = farenheit };
}
public static Temperature FromCelcius(double celcius)
{
return new Temperature { Celcius = celcius };
}
public static Temperature FromKelvin(double kelvin)
{
return new Temperature { Kelvin = kelvin };
}
private double kelvinToCelcius(double kelvin)
{
return 1; // insert formula here
}
private double celciusToKelvin(double celcius)
{
return 1; // insert formula here
}
private double farhenheitToCelcius(double farhenheit)
{
return 1; // insert formula here
}
private double celciusToFarenheit(double kelvin)
{
return 1; // insert formula here
}
public double Kelvin
{
get { return celciusToKelvin(celcius); }
set { celcius = kelvinToCelcius(value); }
}
public double Celcius
{
get { return celcius; }
set { celcius = value; }
}
public double Farhenheit
{
get { return celciusToFarenheit(celcius); }
set { celcius = farhenheitToCelcius(value); }
}
}
I think I'd go whole hog one direction or another. You could write a mini-language that does any sort of conversion like units does:
$ units 'tempF(-40)' tempC
-40
Or use individual functions like the recent Convert::Temperature Perl module does:
use Convert::Temperature;
my $c = new Convert::Temperature();
my $res = $c->from_fahr_to_cel('59');
But that brings up an important point---does the language you are using already have conversion functions? If so, what coding convention do they use? So if the language is C, it would be best to follow the example of the atoi and strtod library functions (untested):
double fahrtocel(double tempF){
return ((tempF-32)*(5/9));
}
double celtofahr(double tempC){
return ((9/5)*tempC + 32);
}
In writing this post, I ran across a very interesting post on using emacs to convert dates. The take-away for this topic is that it uses the one function-per-conversion style. Also, conversions can be very obscure. I tend to do date calculations using SQL because it seems unlikely there are many bugs in that code. In the future, I'm going to look into using emacs.
Here is my take on this (using PHP):
function Temperature($value, $input, $output)
{
$value = floatval($value);
if (isset($input, $output) === true)
{
switch ($input)
{
case 'K': $value = $value - 273.15; break; // Kelvin
case 'F': $value = ($value - 32) * (5 / 9); break; // Fahrenheit
case 'R': $value = ($value - 491.67) * (5 / 9); break; // Rankine
}
switch ($output)
{
case 'K': $value = $value + 273.15; break; // Kelvin
case 'F': $value = $value * (9 / 5) + 32; break; // Fahrenheit
case 'R': $value = ($value + 273.15) * (9 / 5); break; // Rankine
}
}
return $value;
}
Basically the $input value is converted to the standard Celsius scale and then converted back again to the $output scale - one function to rule them all. =)
My vote is two parameters for conversion types, one for the value (as in your first example). I would use enums instead of string literals, however.
Use enums, if your language allows it, for the unit specifications.
I'd say the code inside would be easier with two. I'd have a table with pre-add, multiplty, and post-add, and run the value through the item for one unit, and then through the item for the other unit in reverse. Basically converting the input temperature to a common base value inside, and then out to the other unit. This entire function would be table-driven.
I wish there was some way to accept multiple answers. Based on everyone's recommendations, I think I will stick with the multiple parameters, changing the strings to enums/constants, and moving the value to be converted to the first position in the parameter list. Inside the function, I'll use Kelvin as a common middle ground.
Previously I had written individual functions for each conversion and the overall convertTemperature() function was merely a wrapper with nested switch statements. I'm writing in both classic ASP and PHP, but I wanted to leave the question open to any language.

Resources