Visual Studio code metrics misreporting lines of code

The code metrics analyser in Visual Studio, as well as the code metrics power tool, report the number of lines of code in the TestMethod method of the following code as 8.
At the most, I would expect it to report lines of code as 3.
[TestClass]
public class UnitTest1
{
    private void Test(out string str)
    {
        str = null;
    }
    [TestMethod]
    public void TestMethod()
    {
        var mock = new Mock<UnitTest1>();
        string str;
        mock.Verify(m => m.Test(out str));
    }
}
Can anyone explain why this is the case?
Further info
After a little more digging I've found that removing the out parameter from the Test method and updating the test code causes LOC to be reported as 2, which I believe is correct. The addition of out causes the jump, so it's not because of braces or attributes.
Decompiling the DLL with dotPeek reveals a fair amount of additional code generated because of the out parameter which could be considered 8 LOC, but removing the parameter and decompiling also reveals generated code, which could be considered 5 LOC, so it's not simply a matter of VS counting compiler generated code (which I don't believe it should do anyway).
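For reference, here is a minimal sketch of the variant described above with the out parameter removed (the exact updated code isn't shown in the question, so this is an assumption):
// Hypothetical no-out variant of the code above; the author's exact update isn't shown.
private void Test()
{
}

[TestMethod]
public void TestMethod()
{
    var mock = new Mock<UnitTest1>();
    mock.Verify(m => m.Test());
}
With this shape, the two statements in TestMethod line up with the reported LOC of 2.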

There are several common definitions of 'Lines Of Code' (LOC). Each tries to bring some sense to what I think of as an almost meaningless metric. For example, google 'effective lines of code' (eLOC).
I think that VS is including the attribute as part of the method declaration and is trying to give eLOC by counting statements and even braces. One possibility is that 'm => m.Test(out str)' is being counted as a statement.
Consider this:
if (a > 1 &&
    b > 2)
{
    int result;
    result = GetAValue();
    return result;
}
and this:
if (a > 1 && b > 2)
    return GetAValue();
One definition of LOC is to count the lines that have any code on them. This may even include braces. With such a simplistic definition, the count varies hugely with coding style.
eLOC tries to reduce or eliminate the influence of coding style. For example, as may be the case here, a declaration may be counted as a 'line'. I'm not justifying it, just explaining it.
Consider this:
int varA = 0;
varA = GetAValue();
and this:
var varA = GetAValue();
Two lines or one?
It all comes down to what is the intent. If it is to measure how tall a monitor you need then perhaps use a simple LOC. If the intent is to measure complexity then perhaps counting code statements is better such as eLOC.
If you want to measure complexity then use a complexity metric like cyclomatic complexity. Don't worry about how VS is measuring LOC as, I think, it is a useless metric anyway.
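To make the contrast concrete, here is a small illustrative method (the name and body are invented for this example). Cyclomatic complexity counts independent paths through the code, so it is unaffected by formatting, whereas the various LOC counts above swing with layout:
// Cyclomatic complexity here is 3: the base path, plus one for the
// 'for' loop, plus one for the 'if' - no matter how many physical
// lines the method is spread across.
int CountPositives(int[] values)
{
    int count = 0;
    for (int i = 0; i < values.Length; i++)
    {
        if (values[i] > 0)
        {
            count++;
        }
    }
    return count;
}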

With the tool NDepend we get a # Lines of Code (LoC) of 2 for TestMethod(). (Disclaimer: I am one of the developers of this tool.) I wrote an article, How do you count your number of Lines Of Code (LOC)?, that sheds light on what logical LoC is, and on how all .NET LoC counting tools rely on the PDB sequence points technology.
My guess concerning this LoC value of 8 provided by the VS metric is that it includes the LoC of the method generated by the lambda expression, plus the PDB sequence points related to opening/closing braces (which NDepend doesn't count). The compiler also does a lot of gymnastics to do what is called capturing the local variable str, but this shouldn't impact the #LoC inferred from the PDB sequence points.
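To make the capture "gymnastics" concrete, here is a rough, hand-written approximation of the kind of type the compiler generates to hoist the local str (the names are invented; the real decompiled output seen in dotPeek is longer, not least because the lambda handed to Verify is compiled into expression-tree-building code that refers to the hoisted field):
// Hypothetical approximation of the compiler-generated closure class.
// The local 'str' disappears from TestMethod and becomes a field here.
[CompilerGenerated]
private sealed class CapturedLocals   // stands in for a real name like <>c__DisplayClass...
{
    public string str;                // the captured local, hoisted to a field
}
Every hoisted local and generated member like this is extra code that a naive counter could pick up, which is why any tool has to decide deliberately whether compiler-generated code counts.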
Btw, I wrote two other related LoC articles:
Why is it useful to count the number of Lines Of Code (LOC) ?
Mythical man month : 10 lines per developer day

I was wondering about the Visual Studio line counting and why what I was seeing wasn't what was being reported. So I wrote a small C# console program to count pure lines of code and write the results to a CSV file (see below).
Open a new solution, copy and paste it into the Program.cs file, build the executable, and then you're ready to go. It's a .NET 3.5 application. Copy it into the topmost directory of your code base. Open a command window and run the executable. You get two prompts, first for the name of the program/subsystem, then for any extra file types you want to analyze. It then writes the results to a CSV file in the current directory. Nice simple thing for your purposes or to hand to management.
Anyhoo, here it is, FWIW, and YMMV:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace CodeMetricsConsole
{
class Program
{
// Concept here is that the program has a list of file extensions to do line counts on; it
// gets any extra extensions at startup from the user. Then it gets a list of files based on
// each extension in the current directory and all subdirectories. Then it walks through
// each file line by line and will display counts for that file and for that file extension.
// It writes that information to a CSV file in the current directory. It uses regular expressions
// on each line of each file to figure out what it's looking at, and how to count it (i.e. is it
// a line of code, a single or multi line comment, a multi-line string, or a whitespace line).
//
static void Main(string[] args)
{
try
{
Console.WriteLine(); // spacing
// prompt user for subsystem or application name
String userInput_subSystemName;
Console.Write("Enter the name of this application or subsystem (required): ");
userInput_subSystemName = Console.ReadLine();
if (userInput_subSystemName.Length == 0)
{
Console.WriteLine("Application or subsystem name required, exiting.");
return;
}
Console.WriteLine(); // spacing
// prompt user for additional types
String userInput_additionalFileTypes;
Console.WriteLine("Default extensions are asax, css, cs, js, aspx, ascx, master, txt, jsp, java, php, bas");
Console.WriteLine("Enter a comma-separated list of additional file extensions (if any) you wish to analyze");
Console.Write(" --> ");
userInput_additionalFileTypes = Console.ReadLine();
// tell user processing is starting
Console.WriteLine();
Console.WriteLine("Getting LOC counts...");
Console.WriteLine();
// the default file types to analyze - hashset to avoid duplicates if the user supplies extensions
HashSet<String> allowedExtensions = new HashSet<String> { "asax", "css", "cs", "js", "aspx", "ascx", "master", "txt", "jsp", "java", "php", "bas" };
// Add user-supplied types to allowedExtensions if any
String[] additionalFileTypes;
String[] separator = { "," };
if (userInput_additionalFileTypes.Length > 0)
{
// split string into array of additional file types
additionalFileTypes = userInput_additionalFileTypes.Split(separator, StringSplitOptions.RemoveEmptyEntries);
// walk through user-provided file types and append to default file types
foreach (String ext in additionalFileTypes)
{
try
{
allowedExtensions.Add(ext.Trim()); // remove spaces
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
}
}
// summary file to write to
String summaryFile = userInput_subSystemName + "_Summary.csv";
String path = Directory.GetCurrentDirectory();
String pathAndFile = path + Path.DirectorySeparatorChar + summaryFile;
// regexes for the different line possibilities
Regex oneLineComment = new Regex(@"^\s*//"); // match whitespace to two slashes
Regex startBlockComment = new Regex(@"^\s*/\*.*"); // match whitespace to /*
Regex whiteSpaceOnly = new Regex(@"^\s*$"); // match whitespace only
Regex code = new Regex(@"\S*"); // match anything but whitespace
Regex endBlockComment = new Regex(@".*\*/"); // match anything and */ - only used after block comment detected
Regex oneLineBlockComment = new Regex(@"^\s*/\*.*\*/.*"); // match whitespace to /* ... */
Regex multiLineStringStart = new Regex("^[^\"]*@\".*"); // match @" - don't match "@"
Regex multiLineStringEnd = new Regex("^.*\".*"); // match double quotes - only used after multi line string start detected
Regex oneLineMLString = new Regex("^.*@\".*\""); // match @"..."
Regex vbaComment = new Regex(@"^\s*'"); // match whitespace to single quote
// Uncomment these two lines to test your regex with the function testRegex() below
//new Program().testRegex(oneLineMLString);
//return;
FileStream fs = null;
String line = null;
int codeLineCount = 0;
int commentLineCount = 0;
int wsLineCount = 0;
int multiLineStringCount = 0;
int fileCodeLineCount = 0;
int fileCommentLineCount = 0;
int fileWsLineCount = 0;
int fileMultiLineStringCount = 0;
Boolean inBlockComment = false;
Boolean inMultiLineString = false;
try
{
// write to summary CSV file, overwrite if exists, don't append
using (StreamWriter outFile = new StreamWriter(pathAndFile, false))
{
// outFile header
outFile.WriteLine("filename, codeLineCount, commentLineCount, wsLineCount, mlsLineCount");
// walk through files with specified extensions
foreach (String allowed_extension in allowedExtensions)
{
String extension = "*." + allowed_extension;
// reset accumulating values for extension
codeLineCount = 0;
commentLineCount = 0;
wsLineCount = 0;
multiLineStringCount = 0;
// Get all files in current directory and subdirectories with specified extension
String[] fileList = Directory.GetFiles(Directory.GetCurrentDirectory(), extension, SearchOption.AllDirectories);
// walk through all files of this type
for (int i = 0; i < fileList.Length; i++)
{
// reset values for this file
fileCodeLineCount = 0;
fileCommentLineCount = 0;
fileWsLineCount = 0;
fileMultiLineStringCount = 0;
inBlockComment = false;
inMultiLineString = false;
try
{
// open file
fs = new FileStream(fileList[i], FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (TextReader tr = new StreamReader(fs))
{
// walk through lines in file
while ((line = tr.ReadLine()) != null)
{
if (inBlockComment)
{
if (whiteSpaceOnly.IsMatch(line))
{
fileWsLineCount++;
}
else
{
fileCommentLineCount++;
}
if (endBlockComment.IsMatch(line)) inBlockComment = false;
}
else if (inMultiLineString)
{
fileMultiLineStringCount++;
if (multiLineStringEnd.IsMatch(line)) inMultiLineString = false;
}
else
{
// not in a block comment or multi-line string
if (oneLineComment.IsMatch(line))
{
fileCommentLineCount++;
}
else if (oneLineBlockComment.IsMatch(line))
{
fileCommentLineCount++;
}
else if ((startBlockComment.IsMatch(line)) && (!(oneLineBlockComment.IsMatch(line))))
{
fileCommentLineCount++;
inBlockComment = true;
}
else if (whiteSpaceOnly.IsMatch(line))
{
fileWsLineCount++;
}
else if (oneLineMLString.IsMatch(line))
{
fileCodeLineCount++;
}
else if ((multiLineStringStart.IsMatch(line)) && (!(oneLineMLString.IsMatch(line))))
{
fileCodeLineCount++;
inMultiLineString = true;
}
else if ((vbaComment.IsMatch(line)) && (allowed_extension.Equals("txt") || allowed_extension.Equals("bas")))
{
fileCommentLineCount++;
}
else
{
// none of the above, thus it is a code line
fileCodeLineCount++;
}
}
} // while
outFile.WriteLine(fileList[i] + ", " + fileCodeLineCount + ", " + fileCommentLineCount + ", " + fileWsLineCount + ", " + fileMultiLineStringCount);
fs.Close();
fs = null;
} // using
}
finally
{
if (fs != null) fs.Dispose();
}
// update accumulating values
codeLineCount = codeLineCount + fileCodeLineCount;
commentLineCount = commentLineCount + fileCommentLineCount;
wsLineCount = wsLineCount + fileWsLineCount;
multiLineStringCount = multiLineStringCount + fileMultiLineStringCount;
} // for (specific file)
outFile.WriteLine("Summary for: " + extension + ", " + codeLineCount + ", " + commentLineCount + ", " + wsLineCount + ", " + multiLineStringCount);
} // foreach (all files with specified extension)
} // using summary file streamwriter
Console.WriteLine("Analysis complete, file is: " + pathAndFile);
} // try block
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
}
catch (Exception e2)
{
Console.WriteLine("Error: " + e2.Message);
}
} // main
// local testing function for debugging purposes
private void testRegex(Regex rx)
{
String test = " asdfasd asdf @\" adf ++--// /*\" ";
if (rx.IsMatch(test))
{
Console.WriteLine(" -->| " + rx.ToString() + " | matched: " + test);
}
else
{
Console.WriteLine("No match");
}
}
} // class
} // namespace
Here's how it works:
the program has a set of the file extensions you want to analyze.
It walks through each extension in the set, getting all files of that type in the current and all subdirectories.
It takes each file, goes through it line by line, compares each line against the regexes to figure out what it's looking at, and increments the appropriate count.
If a line isn't whitespace, a single or multi-line comment, or a multi-line string, it counts it as a line of code. It reports all the counts for each of those types of lines (code, comments, whitespace, multi-line strings) and writes them to a CSV file. No need to explain why Visual Studio did or did not count something as a line of code.
Yes, there are three loops embedded in each other (O(n-cubed) O_O ) but it's just a simple, standalone developer tool, and the biggest code base I've run it on was about 350K lines and it took like 10 seconds to run on a Core i7.
Edit: Just ran it on the Firefox 12 code base, about 4.3 million lines (3.3M code, 1M comments), about 21K files, with an AMD Phenom processor - took 7 minutes, watched the performance tab in Task Manager, no stress. FYI.
My attitude is if I wrote it to be part of an instruction fed to a compiler, it's a line of code and should be counted.
It can easily be customized to ignore or count whatever you want (brackets, namespaces, the includes at the top of the file, etc). Just add the regex, test it with the function that's right there below the regexes, then update the if statement with that regex.
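For example, here is a minimal standalone sketch (the regex and names are invented, not part of the program above) of the kind of rule you could add to classify brace-only lines, so they could be excluded from the code count:
using System;
using System.Text.RegularExpressions;

class BraceRuleDemo
{
    static void Main()
    {
        // Hypothetical extra rule: a line consisting only of braces and whitespace.
        Regex braceOnly = new Regex(@"^\s*[{}]+\s*$");

        string[] samples = { "{", "    }", "int x = 0;" };
        foreach (string line in samples)
        {
            // In the counter above, this check would slot into the else-if chain
            // before the final "it is a code line" branch.
            Console.WriteLine("'" + line + "' brace-only: " + braceOnly.IsMatch(line));
        }
    }
}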

Related

Reading a record broken down into two lines because of \n in MapReduce

I am trying to write a custom reader that reads a record (spread across two lines) with a defined number of fields.
For example:
1,2,3,4("," can be there or not)
,5,6,7,8
My requirement is to read the record and push it into the mapper as a single record like {1,2,3,4,5,6,7,8}. Please give me some input.
UPDATE:
public boolean nextKeyValue() throws IOException, InterruptedException {
if(key == null) {
key = new LongWritable();
}
//Current offset is the key
key.set(pos);
if(value == null) {
value = new Text();
}
int newSize = 0;
int numFields = 0;
Text temp = new Text();
boolean firstRead = true;
while(numFields < reqFields) {
while(pos < end) {
//Read up to the '\n' character and store it in 'temp'
newSize = in.readLine( temp,
maxLineLength,
Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
maxLineLength));
//If 0 bytes were read, then we are at the end of the split
if(newSize == 0) {
break;
}
//Otherwise update 'pos' with the number of bytes read
pos += newSize;
//If the line is not too long, check number of fields
if(newSize < maxLineLength) {
break;
}
//Line too long, try again
LOG.info("Skipped line of size " + newSize + " at pos " +
(pos - newSize));
}
//Exit, since we're at the end of split
if(newSize == 0) {
break;
}
else {
String record = temp.toString();
StringTokenizer fields = new StringTokenizer(record,"|");
numFields += fields.countTokens();
//Reset 'value' if this is the first append
if(firstRead) {
value = new Text();
firstRead = false;
}
if(numFields != reqFields) {
value.append(temp.getBytes(), 0, temp.getLength());
}
else {
value.append(temp.getBytes(), 0, temp.getLength());
}
}
}
if(newSize == 0) {
key = null;
value = null;
return false;
}
else {
return true;
}
}
}
This is the nextKeyValue method which I am trying to get working, but the mappers are still not getting proper values.
reqFields is 4.
Look at how TextInputFormat is implemented. Look at its superclass, FileInputFormat, as well. You must subclass either TextInputFormat or FileInputFormat and implement your own record handling.
The thing to be aware of when implementing any kind of file input format is this:
The framework will split the file and give you the start offset and byte length of the piece of the file you have to read. It may very well happen that it splits the file right across some record. That is why your reader must skip the bytes of the record at the beginning of the split if that record is not fully contained in the split, as well as read past the last byte of the split to read the whole last record if that one is not fully contained in the split.
For example, TextInputFormat treats \n characters as record delimiters, so when it gets a split it skips the bytes up to the first \n character and reads past the end of the split up to the next \n character.
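As a language-agnostic illustration of that skip-then-overrun pattern (written here in C# to match the other sketches on this page, with invented names; it is not the Hadoop API):
using System;
using System.IO;

class SplitReaderSketch
{
    // Reads the lines "owned" by the byte range [start, end) of a file:
    // skip the partial line at the start of the split (the previous split owns it),
    // and read the line that begins before 'end' in full, even though it runs past the split.
    static void ReadSplit(string path, long start, long end)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            stream.Seek(start, SeekOrigin.Begin);
            var reader = new StreamReader(stream);
            long pos = start;

            if (start != 0)
            {
                string skipped = reader.ReadLine();          // discard the partial record
                pos += skipped == null ? 0 : skipped.Length + 1;
            }
            while (pos < end)
            {
                string line = reader.ReadLine();             // a full line, even if it straddles 'end'
                if (line == null) break;
                pos += line.Length + 1;                      // rough accounting: single-byte chars plus '\n'
                Console.WriteLine(line);
            }
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "1,2,3,4\n5,6,7,8\n9,10,11,12\n");
        ReadSplit(path, 0, 10);    // prints lines 1 and 2 (line 2 straddles the boundary)
        ReadSplit(path, 10, 27);   // skips the tail of line 2, prints line 3
        File.Delete(path);
    }
}
Each line is printed by exactly one split, which is the property a real record reader has to preserve.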
As for the code example:
You need to ask yourself the following question: say you open the file, seek to a random position, and start reading forward. How do you detect the start of a record? I don't see anything in your code that deals with that, and without it you cannot write a good input format, because you don't know where the record boundaries are.
It is still possible to make the input format read the whole file end to end by making the isSplittable(JobContext, Path) method return false. That makes the whole file be read by a single map task, which reduces parallelism.
Your inner while loop seems problematic, since it checks for lines that are too long and skips them. Given that your records are written across multiple lines, it can happen that you merge part of one record with part of another record when you read them.
The string had to be tokenized using StringTokenizer and not split. The code has been updated with the new implementation.

Script to rename files

I have about 2200 different files in a few different folders, and I need to rename about 1/3 of them, which are in their own subfolder. Those 700 are also spread across various folders.
For example, the top-most folder is Employees, which has a few files in it; then the folder 2002 has a few, 2003 has more files, 2004, and so on.
I just need to attach the word "Agreement" before the existing name of each file. So instead of it just being "Joe Schmoe.doc" It would be "Agreement Joe Schmoe.doc" instead.
I've tried googling such scripts, and I can find stuff similar to what I want but it all looks completely foreign to me so I can't understand how I'd modify it to suit my needs.
Oh, and this is for windows server '03.
I'd need about 2 minutes to write such a script for *NIX systems (maybe less), but for Windows it is a long song ... ))
I've written a simple VBS script for WSH; try it (save it to {script-name}.vbs, change the Path value on the first line of the script, and execute it). I recommend testing the script on a small amount of data the first time, just to be sure it works correctly.
Path = "C:\Users\rootDirectory"
Set FSO = CreateObject("Scripting.FileSystemObject")
Sub visitFolder(folderVar)
For Each fileToRename In folderVar.Files
fileToRename.Name = "Agreement " & fileToRename.Name
Next
For Each folderToVisit In folderVar.SubFolders
visitFolder(folderToVisit)
Next
End Sub
If FSO.FolderExists(Path) Then
visitFolder(FSO.getFolder(Path))
End If
I used to do bulk renaming with batch scripts under Windows. I know it's a snap on *nix (find . -maxdepth N -type f -name "$pattern" | sed -e 'p' -e "s/$str1/$str2/g" | xargs -n2 mv). But after some struggle in vain, I found that achieving the same effect with batch scripts is almost impossible. So I turned to JavaScript.
With this script, you can add a prefix to file names with 'rename.js "s/^/Agreement /" -r *.doc'. A caret (^) matches the beginning of the name. The '-r' option means 'recursively', i.e. including sub-folders. You can specify a max depth with the '-d N' option. If neither '-r' nor '-d N' is given, the script does not recurse.
If you know the *nix 'find' utility, you will notice that 'find' matches the full path (not just the file name part) against the specified regular expression. This behavior can be achieved by supplying the '-f' option. By default, this script matches only the file name part against the given regular expression.
If you are familiar with regular expressions, complicated renaming is possible. For example, 'rename.js "s/(\d+)/[$1]/" *' uses grouping to add brackets around number sequences in filenames.
// rename.js --- bulk file renaming utility (like *nix rename.pl)
// (c) Copyright 2012, Ji Han (hanji <at> outlook <dot> com)
// you are free to distribute it under the BSD license.
// oops... jscript doesn't have array.map
Array.prototype.map = function(f, t){
var o = Object(this);
var a = new Array(o.length >>> 0);
for (var i = 0; i < a.length; ++i){ if (i in o) a[i] = f.call(t, o[i], i, o) }
return a;
};
/// main
(function(){
if (WScript.Arguments.Length == 0){
WScript.Echo('rename "<operator>/<pattern>/<string>/[<modifiers>]" [-f] [-r] [-d <maxdepth>] [<files>]');
WScript.Quit(1);
}
var fso = new ActiveXObject('Scripting.FileSystemObject');
// folder is a Folder object [e.g. from fso.GetFolder()]
// fn is a function which operates on File/Folder object
var recurseFolder = function(folder, fn, depth, maxdepth){
if (folder.Files){
for (var e = new Enumerator(folder.Files); !e.atEnd(); e.moveNext()){
fn(e.item())
}
}
if (folder.Subfolders){
for (var e = new Enumerator(folder.SubFolders); !e.atEnd(); e.moveNext()){
fn(e.item());
if (depth < maxdepth){ arguments.callee(e.item(), fn, depth + 1, maxdepth) }
}
}
}
// expand wildcards (asterisk [*] and question mark [?]) recursively
// given path may be relative, and may contain environment variables.
// but wildcards only work for the filename part of a path.
// return an array of full paths of matched files.
// {{{
var expandWildcardsRecursively = function(n, md){
var pattern = fso.GetFileName(n);
// escape regex metacharacters (except \, /, * and ?)
// \ and / wouldn't appear in filename
// * and ? are treated as wildcards
pattern = pattern.replace(/([\[\](){}^$.+|-])/g, '\\$1');
pattern = pattern.replace(/\*/g, '.*'); // * matches zero or more characters
pattern = pattern.replace(/\?/g, '.'); // ? matches one character
pattern = pattern.replace(/^(.*)$/, '\^$1\$'); // matches the whole filename
var re = new RegExp(pattern, 'i'); // case insensitive
var folder = fso.GetFolder(fso.GetParentFolderName(fso.GetAbsolutePathName(n)));
var l = [];
recurseFolder(folder, function(i){ if (i.Name.match(re)) l.push(i.Path) }, 0, md);
return l;
}
// }}}
// parse "<operator>/<pattern>/<string>/[<modifiers>]"
// return an array splitted at unescaped forward slashes
// {{{
var parseExpr = function(s){
// javascript regex doesn't have lookbehind...
// reverse the string and lookahead to parse unescaped forward slashes.
var z = s.split('').reverse().join('');
// match unescaped forward slashes and get their positions.
var re = /\/(\\\\)*(?!\\)/g;
var l = [];
while (m = re.exec(z)){ l.push(m.index) }
// split s at unescaped forward slashes.
var b = [0].concat(l.map(function(x){ return s.length - x }).reverse());
var e = (l.map(function(x){ return s.length - x - 1 }).reverse()).concat([s.length]);
return b.map(function(_, i){ return s.substring(b[i], e[i]) });
}
// }}}
var expr = WScript.Arguments(0);
var args = [];
var options = {};
for (var i = 1; i < WScript.Arguments.Length; ++i){
if (WScript.Arguments(i).substring(0, 1) != '-'){
args.push(WScript.Arguments(i));
} else if (WScript.Arguments(i) == '-f'){
options['fullpath'] = true;
} else if (WScript.Arguments(i) == '-r'){
options['recursive'] = true;
} else if (WScript.Arguments(i) == '-d'){
options['maxdepth'] = WScript.Arguments(++i);
} else if (WScript.Arguments(i) == '--'){
continue;
} else {
WScript.Echo('invalid option \'' + WScript.Arguments(i) +'\'');
WScript.Quit(1);
}
}
if (options['maxdepth']){
var md = options['maxdepth'];
} else if (options['recursive']){
var md = 1<<31>>>0;
} else {
var md = 0;
}
var tokens = parseExpr(expr);
if (tokens.length != 4){
WScript.Echo('error parsing expression \'' + expr + '\'.');
WScript.Quit(1);
}
if (tokens[0] != 's'){
WScript.Echo('<operator> must be s.');
WScript.Quit(1);
}
var pattern = tokens[1];
var substr = tokens[2];
var modifiers = tokens[3];
var re = new RegExp(pattern, modifiers);
for (var i = 0; i < args.length; ++i){
var l = expandWildcardsRecursively(args[i], md);
for (var j = 0; j < l.length; ++j){
var original = l[j];
if (options['fullpath']){
var nouveau = original.replace(re, substr);
} else {
var nouveau = fso.GetParentFolderName(original) + '\\' + fso.GetFileName(original).replace(re, substr);
}
if (nouveau != original){
(fso.FileExists(original) && fso.GetFile(original) || fso.GetFolder(original)).Move(nouveau)
}
}
}
})();

Creating a unique filename from a list of alphanumeric strings

I apologize for creating a similar thread to many that are out there now, but I mainly wanted to also get some insight on some methods.
I have a list of Strings (could be just 1 or over 1000).
Format = XXX-XXXXX-XX, where each character is alphanumeric.
I am trying to generate a unique string (currently 18 characters long, but it could probably be longer as long as it doesn't approach the file name or path length limits) that I could reproduce if I have that same list. Order doesn't matter, although I may be interested if it's easier to restrict the order as well.
My current Java code is as follows (it failed today, hence why I am here):
public String createOutputFileName(ArrayList<String> alInput, EnumFPFunction efpf, boolean pHeaders) {
    /* create file name based on input list */
    String sFileName = "";
    long partNum = 0;
    for (String sGPN : alInput) {
        sGPN = sGPN.replaceAll("-", ""); //remove dashes
        partNum += Long.parseLong(sGPN, 36); //(base 36)
    }
    sFileName = Long.toString(partNum);
    if (sFileName.length() > 19) {
        sFileName = sFileName.substring(0, 18); //Max length of 19
    }
    return sFileName;
}
So obviously just adding them did not work out so well, I found (I also think I should take the last 18 digits and not the first 18).
Are there any good methods out there (possibly CRC related) that would work?
To assist with my key creation:
The first 3 characters are almost always numeric and would probably have many duplicates (out of 100, there may only be 10 different starting numbers)
These characters are not allowed - I,O
There will never be a character then a number in the last two alphachar subset.
I would use the system time. Here's how you might do it in Java:
public String createOutputFileName() {
long mills = System.currentTimeMillis();
long nanos = System.nanoTime();
return mills + " " + nanos;
}
If you want to add some information about the items and their part numbers, you can, of course!
======== EDIT: "What do I mean by batch object" =========
class Batch {
ArrayList<Item> itemsToProcess;
String inputFilename; // input to external process
boolean processingFinished;
public Batch(ArrayList<Item> itemsToProcess) {
this.itemsToProcess = itemsToProcess;
inputFilename = null;
processingFinished = false;
}
public void processWithExternal() throws IOException, InterruptedException {
if(inputFilename != null || processingFinished) {
throw new IllegalStateException("Cannot initiate process more than once!");
}
String base = System.currentTimeMillis() + " " + System.nanoTime();
this.inputFilename = base + "_input";
writeItemsToFile();
// however you build your process, do it here
Process p = new ProcessBuilder("myProcess", "myargs", inputFilename).start();
p.waitFor();
processingFinished = true;
}
private void writeItemsToFile() throws IOException {
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(inputFilename)));
int flushcount = 0;
for(Item item : itemsToProcess) {
String output = item.getFileRepresentation();
out.println(output);
if(++flushcount % 10 == 0) out.flush();
}
out.flush();
out.close();
}
}
In addition to GlowCoder's response, I have thought of another "decent one" that would work.
Instead of just adding the list in base 36, I would do two separate things to the same list.
In this case, since negative or decimal numbers cannot occur, adding every number and multiplying every number separately and concatenating the two base-36 number strings isn't a bad way either.
In my case, I would take the last nine digits of the added number and the last nine of the multiplied number. This would eliminate my previous errors and make it quite robust. Errors are obviously still possible once overflow starts occurring, but it could also work in this case. Extending the allowable string length would make it more robust as well.
Sample code:
public String createOutputFileName(ArrayList<String> alInput, EnumFPFunction efpf, boolean pHeaders) {
    /* create file name based on input list */
    String sFileName1 = "";
    String sFileName2 = "";
    long partNum1 = 0; // Starting point for addition
    long partNum2 = 1; // Starting point for multiplication
    for (String sGPN : alInput) {
        //remove dashes
        sGPN = sGPN.replaceAll("-", "");
        partNum1 += Long.parseLong(sGPN, 36); //(base 36)
        partNum2 *= Long.parseLong(sGPN, 36); //(base 36)
    }
    // Initial strings
    sFileName1 = "000000000" + Long.toString(partNum1, 36); // base 36
    sFileName2 = "000000000" + Long.toString(partNum2, 36); // base 36
    // Cropped strings
    sFileName1 = sFileName1.substring(sFileName1.length() - 9, sFileName1.length());
    sFileName2 = sFileName2.substring(sFileName2.length() - 9, sFileName2.length());
    return sFileName1 + sFileName2;
}

For loop construction and code complexity [closed]

My group is having some discussion and strong feelings about for loop construction.
I have favored loops like:
size_t x;
for (x = 0; x < LIMIT; ++x) {
    if (something) {
        break;
    }
    ...
}
// If we found what we're looking for, process it.
if (x < LIMIT) {
    ...
}
But others seem to prefer a Boolean flag like:
size_t x;
bool found = false;
for (x = 0; x < LIMIT && !found; ++x) {
    if (something) {
        found = true;
    }
    else {
        ...
    }
}
// If we found what we're looking for, process it.
if (found) {
    ...
}
(And, where the language allows, using "for (int x = 0; ...".)
The first style has one less variable to keep track of and a simpler loop header, albeit at the cost of "overloading" the loop control variable and (some would complain) the use of break.
The second style has clearly defined roles for the variables but a more complex loop condition and loop body (either an else, or a continue after found is set, or a "if (!found)" in the balance of the loop).
I think that the first style wins on code complexity. I'm looking for opinions from a broader audience. Pointers to actual research on which is easier to read and maintain would be even better. "It doesn't matter, take it out of your standard" is a fine answer, too.
OTOH, this may be the wrong question. I'm beginning to think that the right rule is "if you have to break out of a for, it's really a while."
bool found = false;
x = 0;
while (!found && x < LIMIT) {
    if (something) {
        found = true;
        ...handle the thing...
    }
    else {
        ...
    }
    ++x;
}
Does what the first two examples do but in fewer lines. It does divide the initialization, test, and increment of x across three lines, though.
I'd actually dare to suggest consideration of GOTO to break out of loops in such cases:
for (size_t x = 0; x < LIMIT; ++x) {
    if (something)
        goto found;
    else {
        ...
    }
}
// not found
...
return;
found:
...
return;
I consider this form to be both succinct and readable. It may do some good in many simple cases (say, when there is no common processing in this function for the found/not-found cases).
As for the general frowning that goto receives, I find it to be a common misinterpretation of Dijkstra's original claims: his arguments favoured structured loop clauses, such as for or while, over a primitive loop-via-goto, which still had a lot of presence circa 1968. Even the almighty Knuth eventually says:
"The new morality that I propose may perhaps be stated thus: 'Certain go to statements which arise in connection with well-understood transformations are acceptable, provided that the program documentation explains what the transformation was.'"
Others here occasionally think the same.
While I disagree that an extra else really makes the 2nd more complicated, I think it's primarily a matter of aesthetics and keeping to your standard.
Personally, I have a probably irrational dislike of breaks and continues, so I'm MUCH more likely to use the found variable.
Also, note that you CAN add the found variable to the 1st implementation and do
if(something)
{
    found = true;
    break;
}
if you want to avoid the variable overloading problem at the expense of the extra variable, but still want the simple loop terminator...
The former example duplicates the x < LIMIT condition, whereas the latter doesn't.
With the former, if you want to change that condition, you have to remember to do it in two places.
I would prefer a different one altogether:
for (int x = 0; x < LIMIT; ++x) {
    if (something) {
        // If we found what we're looking for, process it.
        ...
        break;
    }
    ...
}
It seems to have none of the troubles you mention about one or the other... ;-)
no duplication of condition, or readability problem
no additional variable
I don't have any references to hand (-1! -1!), but I seem to recall that having multiple exit points (from a function, from a loop) has been shown to cause issues with maintainability (I used to know someone who wrote code for the UK military and it was Verboten to do so). But more importantly, as RichieHindle points out, having a duplicate condition is a Bad Thing, it cries out for introducing bugs by changing one and not the other.
If you weren't using the condition later, I wouldn't be bothered either way. Since you are, the second is the way to go.
This sort of argument has been fought out here before (probably many times) such as in this question.
There are those that will argue that purity of code is all-important and they'll complain bitterly that your first option doesn't have identical post-conditions for all cases.
What I would answer is "Twaddle!". I'm a pragmatist, not a purist. I'm against too much spaghetti code as much as the next engineer, but some of the hideous terminating conditions I've seen in for loops are far worse than using a couple of breaks within your loop.
I will always go for readability of code over "purity" simply because I have to maintain it.
This looks like a place for a while loop. For loops are syntactic sugar on top of a while loop anyway. The general rule is that if you have to break out of a for loop, then use a while loop instead.
package com.company;
import java.io.*;
import java.util.Scanner;
public class Main {
// "line.separator" is a system property that is a platform independent and it is one way
// of getting a newline from your environment.
private static String NEWLINE = System.getProperty("line.separator");
public static void main(String[] args) {
// write your code here
boolean itsdone = false;
String userInputFileName;
String FirstName = null;
String LastName = null;
String user_junk;
String userOutputFileName;
String outString;
int Age = -1;
int rint = 0;
int myMAX = 100;
int MyArr2[] = new int[myMAX];
int itemCount = 0;
double average = 0;
double total = 0;
boolean ageDone = false;
Scanner inScan = new Scanner(System.in);
System.out.println("Enter First Name");
FirstName = inScan.next();
System.out.println("Enter Last Name");
LastName = inScan.next();
ageDone = false;
while (!ageDone) {
System.out.println("Enter Your Age");
if (inScan.hasNextInt()) {
Age = inScan.nextInt();
System.out.println(FirstName + " " + LastName + " " + "is " + Age + " Years old");
ageDone = true;
} else {
System.out.println("Your Age Needs to Have an Integer Value... Enter an Integer Value");
user_junk = inScan.next();
ageDone = false;
}
}
try {
File outputFile = new File("firstOutFile.txt");
if (outputFile.createNewFile()){
System.out.println("firstOutFile.txt was created"); // if file was created
}
else {
System.out.println("firstOutFile.txt existed and is being overwritten."); // if file had already existed
}
// --------------------------------
// If the file creation or the access permissions to write into it
// are incorrect, the program throws an exception
//
if ((outputFile.isFile()|| outputFile.canWrite())){
BufferedWriter fileOut = new BufferedWriter(new FileWriter(outputFile));
fileOut.write("==================================================================");
fileOut.write(NEWLINE + NEWLINE +" You Information is..." + NEWLINE + NEWLINE);
fileOut.write(NEWLINE + FirstName + " " + LastName + " " + Age + NEWLINE);
fileOut.write("==================================================================");
fileOut.close();
}
else {
throw new IOException();
}
} // end of try
catch (IOException e) { // in case for some reason the output file could not be created
System.err.format("IOException: %s%n", e);
e.printStackTrace();
}
} // end main method
}

Algorithm to format text to Pascal or camel casing

Using this question as the base, is there an algorithm or coding example to change some text to Pascal or camel casing?
For example:
mynameisfred
becomes
Camel: myNameIsFred
Pascal: MyNameIsFred
I found a thread with a bunch of Perl guys arguing the toss on this question over at http://www.perlmonks.org/?node_id=336331.
I hope this isn't too much of a non-answer to the question, but I would say you have a bit of a problem in that it would be a very open-ended algorithm which could have a lot of 'misses' as well as hits. For example, say you inputted:-
camelCase("hithisisatest");
The output could be:-
"hiThisIsATest"
Or:-
"hitHisIsATest"
There's no way the algorithm would know which to prefer. You could add some extra code to specify that you'd prefer more common words, but again misses would occur (Peter Norvig wrote a very small spelling corrector over at http://norvig.com/spell-correct.html which might help algorithm-wise; I wrote a C# implementation if C# is your language).
I'd agree with Mark and say you'd be better off having an algorithm that takes a delimited input, i.e. this_is_a_test and converts that. That'd be simple to implement, i.e. in pseudocode:-
SetPhraseCase(phrase, CamelOrPascal):
    if no delimiters
        if camelCase
            return lowerFirstLetter(phrase)
        else
            return capitaliseFirstLetter(phrase)
    words = splitOnDelimiter(phrase)
    if camelCase
        ret = lowerFirstLetter(first word)
    else
        ret = capitaliseFirstLetter(first word)
    for i in 2 to len(words): ret += capitaliseFirstLetter(words[i])
    return ret

capitaliseFirstLetter(word):
    if len(word) <= 1 return upper(word)
    return upper(word[0]) + word[1..len(word)]

lowerFirstLetter(word):
    if len(word) <= 1 return lower(word)
    return lower(word[0]) + word[1..len(word)]
You could also replace my capitaliseFirstLetter() function with a proper case algorithm if you so wished.
A C# implementation of the above described algorithm is as follows (complete console program with test harness):-
using System;
class Program {
static void Main(string[] args) {
var caseAlgorithm = new CaseAlgorithm('_');
while (true) {
string input = Console.ReadLine();
if (string.IsNullOrEmpty(input)) return;
Console.WriteLine("Input '{0}' in camel case: '{1}', pascal case: '{2}'",
input,
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.CamelCase),
caseAlgorithm.SetPhraseCase(input, CaseAlgorithm.CaseMode.PascalCase));
}
}
}
public class CaseAlgorithm {
public enum CaseMode { PascalCase, CamelCase }
private char delimiterChar;
public CaseAlgorithm(char inDelimiterChar) {
delimiterChar = inDelimiterChar;
}
public string SetPhraseCase(string phrase, CaseMode caseMode) {
// You might want to do some sanity checks here like making sure
// there's no invalid characters, etc.
if (string.IsNullOrEmpty(phrase)) return phrase;
// .Split() will simply return a string[] of size 1 if no delimiter present so
// no need to explicitly check this.
var words = phrase.Split(delimiterChar);
// Set first word accordingly.
string ret = setWordCase(words[0], caseMode);
// If there are other words, set them all to pascal case.
if (words.Length > 1) {
for (int i = 1; i < words.Length; ++i)
ret += setWordCase(words[i], CaseMode.PascalCase);
}
return ret;
}
private string setWordCase(string word, CaseMode caseMode) {
switch (caseMode) {
case CaseMode.CamelCase:
return lowerFirstLetter(word);
case CaseMode.PascalCase:
return capitaliseFirstLetter(word);
default:
throw new NotImplementedException(
string.Format("Case mode '{0}' is not recognised.", caseMode.ToString()));
}
}
private string lowerFirstLetter(string word) {
return char.ToLower(word[0]) + word.Substring(1);
}
private string capitaliseFirstLetter(string word) {
return char.ToUpper(word[0]) + word.Substring(1);
}
}
The only way to do that would be to run each section of the word through a dictionary.
"mynameisfred" is just an array of characters, splitting it up into my Name Is Fred means understanding what the joining of each of those characters means.
You could do it easily if your input was separated in some way, e.g. "my name is fred" or "my_name_is_fred".
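For what it's worth, here is a minimal sketch of that dictionary idea (the word list and names are invented for the example): greedily match the longest known word at each position, then apply the casing rules from the answer above.
using System;
using System.Collections.Generic;

class DictionarySplitDemo
{
    // Greedy longest-match segmentation against a known word list.
    // Only a sketch: real input needs a full dictionary, and ambiguous
    // strings like "hithisisatest" would also need backtracking or scoring.
    static List<string> Segment(string text, HashSet<string> dictionary, int maxWordLength)
    {
        var words = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            int len = Math.Min(maxWordLength, text.Length - pos);
            while (len > 1 && !dictionary.Contains(text.Substring(pos, len)))
            {
                len--;
            }
            words.Add(text.Substring(pos, len)); // falls back to a single character if nothing matches
            pos += len;
        }
        return words;
    }

    static void Main()
    {
        // Invented dictionary, just big enough for the example in the question.
        var dictionary = new HashSet<string> { "my", "name", "is", "fred" };
        var words = Segment("mynameisfred", dictionary, 4);

        var result = "";
        foreach (var w in words)
        {
            result += char.ToUpper(w[0]) + w.Substring(1); // Pascal-case each piece
        }
        Console.WriteLine(result); // prints "MyNameIsFred"; lower-case the first piece for camel case
    }
}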
