I am looking for a quick way to remove null characters from a text file in Windows.
The usual suggestion of using Notepad++ to replace "\0" with nothing throughout the document (as described here) does not work with very big files. Mine is about 180 MB, and Notepad++ hangs indefinitely trying to do the job.
I know that this is an old post, but I think this will be useful for others.
This approach works only if the nulls to delete are at the end of each line (in my case I have lines of 1000+ characters with 600 null characters at the end).
Just copy the whole thing and paste it into a new file tab; Notepad++ will automatically replace all the nulls with spaces. Then just save using ctrl+space+s to trim all lines.
Hope this helps.
Here is the solution I found for Windows. The idea is to bring this UNIX solution over to Windows.
1) Download and install CoreUtils, a collection of basic file, shell and text manipulation utilities for Windows.
On Windows 7 the executables will typically be installed in
c:\Program Files (x86)\GnuWin32\bin
2) Remove the NULL characters by running this command in a cmd window:
tr -d '\000' <input_file >output_file
example:
c:\Program Files (x86)\GnuWin32\bin>tr -d '\000' <putty_measurements_1.log >putty_measurements_2.log
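If installing GnuWin32 is not an option, the same streaming idea can be sketched in Python. This is only a minimal sketch and not part of the original answer: the function name strip_nul_bytes and the 1 MiB chunk size are assumptions. It reads the input in fixed-size chunks, so memory use stays constant however big the file is.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, dropping every NUL (0x00) byte.

    Returns the number of NUL bytes that were removed.
    """
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

It would be called as strip_nul_bytes("putty_measurements_1.log", "putty_measurements_2.log"), mirroring the tr example.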
I've been looking for tools to remove trailing NULLs from big files, and the solutions I found didn't work with 1GB+ files or took ages. Therefore I designed my own in C#, which works pretty well, and here it is:
private void CopyContentsUntilNull(string source, bool keepFileDate = true)
{
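// Path.Combine supplies the directory separator that plain concatenation would drop.
string destination = Path.Combine(Path.GetDirectoryName(source), $"{Path.GetFileNameWithoutExtension(source)}_fixed{Path.GetExtension(source)}");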
var sourceDate = File.GetLastWriteTime(source);
int bufferSize = 10000;
var buffer = new byte[bufferSize];
int nullCount = 0;
int readCount;
using (var srcStream = File.OpenRead(source))
using (var dstStream = File.OpenWrite(destination))
{
do
{
readCount = srcStream.Read(buffer, 0, bufferSize);
int bytesToCopy = FindTrailingNull(buffer, readCount);
if (bytesToCopy > 0)
{
if (nullCount > 0)
{
var block = Enumerable.Repeat((byte)0, nullCount).ToArray();
dstStream.Write(block, 0, nullCount);
nullCount = 0;
}
dstStream.Write(buffer, 0, bytesToCopy);
}
// Count only the NULLs that were actually read; a read can return fewer bytes than the buffer holds.
nullCount += readCount - bytesToCopy;
} while (readCount > 0);
}
if (keepFileDate)
File.SetLastWriteTime(destination, sourceDate);
}
private int FindTrailingNull(byte[] buffer, int readCount)
{
for (int i = readCount - 1; i >= 0; i--)
if (buffer[i] != 0)
return i + 1;
return 0;
}
Beware that some file formats legitimately end in NULLs, for example zip files (2 to 4 of them), so after trimming you might need to append some NULLs again until the file opens. The same applies to docx, xlsx, etc., as they are also zip files.
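For comparison, here is a rough Python sketch of the same goal using a different technique: instead of copying forward and holding back runs of NULLs, it scans backwards from the end of the file to find where the real content stops, then copies and truncates. The names find_content_end and copy_without_trailing_nuls, and the keep parameter (how many trailing NULLs to leave in place, e.g. for zip-based formats), are made up for illustration.

```python
import os
import shutil


def find_content_end(path, chunk_size=1 << 16):
    """Return the offset just past the last non-NUL byte (0 if the file is all NULs)."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start at the end of the file
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # Found real data: the content ends right after it.
                return start + len(stripped)
            pos = start  # this chunk was all NULs, keep scanning backwards
    return 0


def copy_without_trailing_nuls(src_path, dst_path, keep=0):
    """Copy src to dst and cut off the trailing NULs, leaving `keep` of them."""
    end = find_content_end(src_path)
    new_size = min(os.path.getsize(src_path), end + keep)
    shutil.copyfile(src_path, dst_path)
    with open(dst_path, "r+b") as f:
        f.truncate(new_size)
    return new_size
```

Only the tail of the file is inspected byte by byte, so the scan itself is cheap even when the file is mostly padding; the copy is left to shutil.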
This is a problem I encountered in a tech interview. You have 500,000 files in a directory, which is configured so that they are always listed in alphabetical order. They have names such as:
Afile
Bfile
File00000001
File00000002
...
You want to rename all the files while preserving their order, like so:
File00000001
File00000002
File00000003
...
You can probably see the obvious issue here. If you rename Afile to File00000001, it will collide with the existing file of the same name, and the order will be altered, which is not what we want.
The question is: how can you devise an algorithm that performs the renaming with the best possible run time?
You cannot go through the files in ascending order, nor in descending order; either could lead to a conflict. Renaming the files to something else first could also lead to a conflict. The goal seems to be to rename each file only once, so you can do something like the following:
private static File dir;
public static void renameFiles(String path) {
dir = new File(path);
File[] files = dir.listFiles();
Arrays.sort(files); // java.util.Arrays; listFiles() itself does not guarantee any order
Map<String, String> map = new HashMap<>();
int number = 1;
for (int i = 0; i < files.length; i++)
if (files[i].isFile())
map.put(files[i].getName(), "File" + pad(number++));
// so we created a map with original file names and the name it should get
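for (int i = 0; i < files.length; i++)
if (!files[i].getName().equals(map.get(files[i].getName()))) // not same name
renameFile(files[i].getName(), map);
}
// Zero-pads the running number to 8 digits, e.g. 1 -> "00000001".
private static String pad(int number) {
return String.format("%08d", number);
}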
private static void renameFile(String file, Map<String, String> map) {
String newName = map.get(file);
if (newName != null) {
if (map.containsKey(newName))
renameFile(newName, map);
File f = new File(dir, file);
f.renameTo(new File(dir, newName));
map.remove(file);
}
}
Time complexity O(n). We recursively go ahead until we don't have a renaming conflict any more and then start renaming from the tail. There won't be a conflict because it is possible that File004 becomes File007 or that File007 becomes File004 but not both, so no circular renaming. If there are too many files then recursion depth might not be sufficient and we have to implement it with a stack, but it is the same principle.
private static void renameFile(String file, Map<String, String> map) {
String newName = map.get(file);
if (newName != null) {
Stack<String> stack = new Stack<>();
do {
stack.push(file);
file = newName;
newName = map.get(file);
} while (newName != null);
while (!stack.empty()) {
file = stack.pop();
File f = new File(dir, file);
f.renameTo(new File(dir, map.get(file)));
map.remove(file);
}
}
}
This will work on Linux, but for Windows you could still have problems, because the file names are not case sensitive. You could store all the keys in the map as lower case and always call toLowerCase() when accessing the map.
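To make the ordering argument easier to check, here is a small Python sketch of the same chain-following idea. It only plans the renames and touches no files; the function name plan_renames and the width parameter are assumptions, and case-insensitive file systems are not handled.

```python
def plan_renames(names, width=8):
    """Return (old, new) pairs in an order that never overwrites a file
    that is still waiting to be renamed."""
    # Final name for every file, numbered in sorted order.
    targets = {old: f"File{i:0{width}d}"
               for i, old in enumerate(sorted(names), start=1)}
    # Files that already carry their final name need no rename.
    pending = {old: new for old, new in targets.items() if old != new}
    ordered = []
    for start in list(pending):
        chain = []
        name = start
        # Follow the chain: the target of `name` may itself still be pending.
        while name in pending:
            target = pending.pop(name)
            chain.append((name, target))
            name = target
        # Rename from the tail of the chain back to its head.
        ordered.extend(reversed(chain))
    return ordered
```

For the four example names (Afile, Bfile, File00000001, File00000002) the plan starts with File00000001 -> File00000003 and only then does Afile -> File00000001, so each file is renamed exactly once and nothing is overwritten.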
for i in {100..1..-1} ; do o=$(printf "File%04d" $i); n=$(printf "File%04d" $((i + 2))); echo mv $o $n; done;
or, more readably:
for i in {100..1..-1}
do
o=$(printf "File%04d" $i)
n=$(printf "File%04d" $((i + 2)))
echo mv $o $n
done
Afile and Bfile can be renamed by hand.
You have to adapt the range, but for testing, a manageable number of files seemed more appropriate to me.
Ah, yes, that's bash syntax; important to note. And it doesn't move any files yet; it only echoes the mv commands.
Don't try to run it in parallel. :)
But you could just as well move them in normal, ascending order into a new directory and then move them all back into the old one, which prevents overwriting. That would allow you to run it in parallel.
The for-statement is equivalent to what is elsewhere known as
for (i = 100; i >= 1; --i)
and
printf "File%04d" $i
prints i as a 4-digit number with leading zeros.
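The reason for counting down can be seen in a short Python sketch that generates the same (old, new) pairs as the bash loop without moving anything. The function name shift_pairs and its parameters are hypothetical.

```python
def shift_pairs(count, shift, width=4):
    """Yield (old, new) names from the highest number down to 1."""
    for i in range(count, 0, -1):
        yield (f"File{i:0{width}d}", f"File{i + shift:0{width}d}")
```

Because the highest number moves first, every target name is already free by the time it is needed, which is also why the loop must not be run in parallel.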
I am using an external .txt file to save the incrementing name index for whenever someone "takes a picture" in my app (i.e. image_1.jpg, image_2.jpg, etc...). I am trying to save the data externally so that a user does not overwrite their pictures each time they run the program. However, because of the way that Processing packages its contents for export I cannot both read and write to the same file. It reads the appropriate file located in the apps package contents, however, when it tries to write to that file, it creates a new folder in the same directory as the app itself and writes to a new file with the same name instead.
Essentially, it reads the proper file but refuses to write to it, instead making a copy and writing to that one. The app runs fine but every time you open it and take pictures you overwrite the images you already had.
I have tried setting the "write to" location to explicitly the same path as where the exported app stores the data folder inside the package contents (Contents/Resources/Java/data/assets), but this creates a copy of that directory in the same folder as the app.
I have also tried excluding the file I am trying to read/write from my data folder when I export the app, by changing the read code to ../storage/pictureNumber.txt and then putting this file next to the app itself. When I do this the app doesn't launch at all, because it looks in its own data folder for storage and refuses to go outside of itself with ../ . Has anyone had luck both reading from and writing to the same file in an exported Processing .app?
Here is the code for the class that is handling the loading and saving of the file:
class Camera {
PImage cameraImage;
int cameraPadding = 10;
int cameraWidth = 60;
int opacity = 0;
int flashDecrementer = 50; //higher number means quicker flash
int pictureName;
Camera() {
String[] pictureIndex = loadStrings("assets/pictureNumber.txt");
pictureName = int(pictureIndex[0]);
cameraImage = loadImage("assets/camera.jpg");
String _pictureName = "" + char(pictureName);
println(pictureName);
}
void display(float mx, float my) {
image(cameraImage, cameraPadding, cameraPadding,
cameraWidth, cameraWidth-cameraWidth/5);
}
boolean isOver(float mx, float my) {
if (mx >= cameraPadding &&
mx <= cameraPadding+cameraWidth &&
my >= cameraPadding &&
my <= cameraPadding+cameraWidth-cameraWidth/5) {
return true;
}
else {
return false;
}
}
void captureImage() {
save("pictures/"+lines.picturePrefix+"_"+pictureName+".jpg");
pictureName++;
String _null = "";
// String _tempPictureName = _null.valueOf(pictureName);
String[] _pictureName = {_null.valueOf(pictureName)};
saveStrings("assets/pictureNumber.txt", _pictureName);
println(_pictureName);
}
void flash() {
fill(255, opacity);
rect(0,0,width,height);
opacity -= flashDecrementer;
if(opacity <= 0) opacity = 0;
}
}
After a lot of searching I found that you have to use savePath() in order to read from a directory outside of the project.jar. The camera class constructor now looks like this:
path = savePath("storage");
println(path);
String[] pictureIndex = loadStrings(path+"/pictureNumber.txt");
pictureName = int(pictureIndex[0]);
cameraImage = loadImage("assets/camera.jpg");
String _pictureName = ""+char(pictureName);
and everything works!
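The underlying idea, stripped of anything Processing-specific, is to read and write the counter through one and the same path in a writable folder outside the app bundle. A minimal Python sketch of that idea follows; next_picture_index and its parameters are made-up names, and this is not Processing API.

```python
import os


def next_picture_index(storage_dir, filename="pictureNumber.txt"):
    """Read the saved index (0 if there is none yet), add one,
    write it back to the same file and return the new value."""
    os.makedirs(storage_dir, exist_ok=True)
    path = os.path.join(storage_dir, filename)
    try:
        with open(path, "r", encoding="utf-8") as f:
            current = int(f.readline().strip() or 0)
    except FileNotFoundError:
        current = 0  # first run: no counter file yet
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{current + 1}\n")
    return current + 1
```

Since the same path is used for loading and saving, the index survives restarts and earlier pictures are not overwritten.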
I'd like to read a file line-by-line. I have fgets() working okay, but am not sure what to do if a line is longer than the buffer sizes I've passed to fgets()? And furthermore, since fgets() doesn't seem to be Unicode-aware, and I want to allow UTF-8 files, it might miss line endings and read the whole file, no?
Then I thought I'd use getline(). However, I'm on Mac OS X, and while getline() is specified in /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/stdio.h, it's not in /usr/include/stdio.h, so gcc doesn't find it in the shell. And it's not particularly portable, obviously, and I'd like the library I'm developing to be generally useful.
So what's the best way to read a file line-by-line in C?
First of all, it's very unlikely that you need to worry about non-standard line terminators like U+2028. Normal text files are not expected to contain them, and the overwhelming majority of existing software that reads normal text files doesn't support them. You mention getline(), which is available in glibc but not in Mac OS X's libc, and it would surprise me if getline() supported such fancy line terminators. It's almost a certainty that you can get away with just supporting LF (U+000A) and maybe also CR+LF (U+000D U+000A). To do that, you don't need to care about UTF-8: no byte of a multi-byte UTF-8 sequence can be mistaken for an ASCII character such as LF. That's the beauty of UTF-8's ASCII compatibility, and it is by design.
As for supporting lines that are longer than the buffer you pass to fgets(), you can do this with a little extra logic around fgets. In pseudocode:
while true {
if fgets(buffer, size, stream) returns NULL: stop /* end of file */
dynamically_allocated_string = strdup(buffer);
while the last char (before the terminating NUL) in the buffer is not '\n' {
/* the current line is not finished. read more of it */
if fgets(buffer, size, stream) returns NULL: break /* file ended without a newline */
concatenate the contents of buffer to the dynamically allocated string
}
process the whole line, as found in the dynamically allocated string
}
But again, I think you will find that there's really quite a lot of software out there that simply doesn't bother with that, from software that parses system config files like /etc/passwd to (some) scripting languages. Depending on your use case, it may very well be good enough to use a "big enough" buffer (e.g. 4096 bytes) and declare that you don't support lines longer than that. You can even call it a security feature (a line length limit is protection against resource exhaustion attacks from a crafted input file).
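The "read a piece, keep going until it ends in a newline" loop can be tried out quickly in Python, where readline(size) behaves much like fgets in that it stops at '\n' or after size bytes. This is only a sketch of the technique, with a made-up function name and a deliberately tiny buffer; it works on raw bytes, which also shows that splitting on LF is safe for UTF-8 input.

```python
def read_long_lines(stream, buf_size=16):
    """Yield complete lines from a binary stream while never asking
    for more than buf_size bytes in a single read."""
    while True:
        piece = stream.readline(buf_size)  # stops at b"\n" or after buf_size bytes
        if not piece:
            return  # end of input
        parts = [piece]
        while not piece.endswith(b"\n"):
            # The current line is not finished: read more of it.
            piece = stream.readline(buf_size)
            if not piece:
                break  # input ended without a final newline
            parts.append(piece)
        yield b"".join(parts)
```

A piece may end in the middle of a multi-byte character, but the pieces are only joined, never decoded individually, so the reassembled line decodes cleanly.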
Based on this answer, here's what I've come up with:
#include <stdio.h>
#include <stdlib.h>
#define LINE_BUF_SIZE 1024
char * getline_from(FILE *fp) {
char * line = malloc(LINE_BUF_SIZE), * linep = line;
size_t lenmax = LINE_BUF_SIZE, len = lenmax;
int c;
if(line == NULL)
return NULL;
for(;;) {
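c = fgetc(fp);
if(c == EOF) {
if(line == linep) {
// Nothing was read before EOF: return NULL so the caller's loop ends.
free(linep);
return NULL;
}
break;
}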
if(--len == 0) {
len = lenmax;
char * linen = realloc(linep, lenmax *= 2);
if(linen == NULL) {
// Fail.
free(linep);
return NULL;
}
line = linen + (line - linep);
linep = linen;
}
if((*line++ = c) == '\n')
break;
}
*line = '\0';
return linep;
}
To read stdin:
char *line;
while ( line = getline_from(stdin) ) {
// do stuff
free(line);
}
To read some other file, I first open it with fopen():
FILE *fp;
fp = fopen ( filename, "rb" );
if (!fp) {
fprintf(stderr, "Cannot open %s: ", filename);
perror(NULL);
exit(1);
}
char *line;
while ( line = getline_from(fp) ) {
// do stuff
free(line);
}
This works very nicely for me. I'd love to see an alternative that uses fgets() as suggested by @paul-tomblin, but I don't have the energy to figure it out tonight.
The code metrics analyser in Visual Studio, as well as the code metrics power tool, report the number of lines of code in the TestMethod method of the following code as 8.
At the most, I would expect it to report lines of code as 3.
[TestClass]
public class UnitTest1
{
private void Test(out string str)
{
str = null;
}
[TestMethod]
public void TestMethod()
{
var mock = new Mock<UnitTest1>();
string str;
mock.Verify(m => m.Test(out str));
}
}
Can anyone explain why this is the case?
Further info
After a little more digging I've found that removing the out parameter from the Test method and updating the test code causes LOC to be reported as 2, which I believe is correct. The addition of out causes the jump, so it's not because of braces or attributes.
Decompiling the DLL with dotPeek reveals a fair amount of additional code generated because of the out parameter which could be considered 8 LOC, but removing the parameter and decompiling also reveals generated code, which could be considered 5 LOC, so it's not simply a matter of VS counting compiler generated code (which I don't believe it should do anyway).
There are several common definitions of 'Lines Of Code' (LOC). Each tries to bring some sense to what I think of as an almost meaningless metric. For example, search for 'effective lines of code' (eLOC).
I think that VS is including the attribute as part of the method declaration and is trying to give an eLOC by counting statements and even braces. One possibility is that 'm => m.Test(out str)' is being counted as a statement.
Consider this:
if (a > 1 &&
b > 2)
{
int result;
result = GetAValue();
return result;
}
and this:
if (a > 1 && b > 2)
return GetAValue();
One definition of LOC is to count the lines that have any code. This may even include braces. With such an extremely simplistic definition, the count varies hugely with coding style.
eLOC tries to reduce or eliminate the influence of code style. For example, as may be the case here, a declaration may be counted as a 'line'. Not justifying it, just explaining.
Consider this:
int varA = 0;
varA = GetAValue();
and this:
var varA = GetAValue();
Two lines or one?
It all comes down to what the intent is. If it is to measure how tall a monitor you need, then perhaps use a simple LOC. If the intent is to measure complexity, then counting code statements, as eLOC does, is perhaps better.
If you want to measure complexity, then use a complexity metric like cyclomatic complexity. Don't worry about how VS is measuring LOC as, I think, it is a useless metric anyway.
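To see how much the definition matters, here is a crude Python sketch that applies three different counting rules to the two if-snippets above. None of these is what Visual Studio does; the function names are made up, and counting semicolons is only a rough stand-in for counting statements.

```python
def physical_loc(source):
    """Count every line that contains anything other than whitespace."""
    return sum(1 for line in source.splitlines() if line.strip())


def loc_without_braces(source):
    """Like physical_loc, but ignore lines that hold only a brace."""
    return sum(1 for line in source.splitlines()
               if line.strip() and line.strip() not in ("{", "}"))


def statement_count(source):
    """Very rough statement count: one per semicolon."""
    return source.count(";")


LONG_FORM = """if (a > 1 &&
    b > 2)
{
    int result;
    result = GetAValue();
    return result;
}
"""

SHORT_FORM = """if (a > 1 && b > 2)
    return GetAValue();
"""
```

The long form scores 7, 5 and 3 under the three rules, and the short form scores 2, 2 and 1, even though both do the same thing.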
With the tool NDepend we get a # Lines of Code (LoC) of 2 for TestMethod(). (Disclaimer: I am one of the developers of this tool.) I wrote an article, "How do you count your number of Lines Of Code (LOC)?", that sheds light on what logical LoC is and how all .NET LoC counting tools rely on PDB sequence points.
My guess concerning the LoC value of 8 reported by the VS metric is that it includes the LoC of the method generated for the lambda expression, plus the PDB sequence points for the opening and closing braces (which NDepend doesn't count). Also, the compiler does a lot of gymnastics to capture the local variable str, but this shouldn't affect the #LoC that is inferred from the PDB sequence points.
Btw, I wrote 2 other related LoC articles:
Why is it useful to count the number of Lines Of Code (LOC) ?
Mythical man month : 10 lines per developer day
I was wondering about the Visual Studio line counting and why what I was seeing wasn't what was being reported. So I wrote a small C# console program to count pure lines of code and write the results to a CSV file (see below).
Open a new solution, copy and paste it into the Program.cs file, build the executable, and then you're ready to go. It's a .NET 3.5 application. Copy it into the topmost directory of your code base. Open a command window and run the executable. You get two prompts: first for the name of the program/subsystem, then for any extra file types you want to analyze. It then writes the results to a CSV file in the current directory. A nice, simple thing for your own purposes or to hand to management.
Anyhoo, here it is, FWIW, and YMMV:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace CodeMetricsConsole
{
class Program
{
// Concept here is that the program has a list of file extensions to do line counts on; it
// gets any extra extensions at startup from the user. Then it gets a list of files based on
// each extension in the current directory and all subdirectories. Then it walks through
// each file line by line and will display counts for that file and for that file extension.
// It writes that information to a CSV file in the current directory. It uses regular expressions
// on each line of each file to figure out what it's looking at, and how to count it (i.e. is it
// a line of code, a single or multi line comment, a multi-line string, or a whitespace line).
//
static void Main(string[] args)
{
try
{
Console.WriteLine(); // spacing
// prompt user for subsystem or application name
String userInput_subSystemName;
Console.Write("Enter the name of this application or subsystem (required): ");
userInput_subSystemName = Console.ReadLine();
if (userInput_subSystemName.Length == 0)
{
Console.WriteLine("Application or subsystem name required, exiting.");
return;
}
Console.WriteLine(); // spacing
// prompt user for additional types
String userInput_additionalFileTypes;
Console.WriteLine("Default extensions are asax, css, cs, js, aspx, ascx, master, txt, jsp, java, php, bas");
Console.WriteLine("Enter a comma-separated list of additional file extensions (if any) you wish to analyze");
Console.Write(" --> ");
userInput_additionalFileTypes = Console.ReadLine();
// tell user processing is starting
Console.WriteLine();
Console.WriteLine("Getting LOC counts...");
Console.WriteLine();
// the default file types to analyze - hashset to avoid duplicates if the user supplies extensions
HashSet<string> allowedExtensions = new HashSet<string> { "asax", "css", "cs", "js", "aspx", "ascx", "master", "txt", "jsp", "java", "php", "bas" };
// Add user-supplied types to allowedExtensions if any
String[] additionalFileTypes;
String[] separator = { "," };
if (userInput_additionalFileTypes.Length > 0)
{
// split string into array of additional file types
additionalFileTypes = userInput_additionalFileTypes.Split(separator, StringSplitOptions.RemoveEmptyEntries);
// walk through user-provided file types and append to default file types
foreach (String ext in additionalFileTypes)
{
try
{
allowedExtensions.Add(ext.Trim()); // remove spaces
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
}
}
// summary file to write to
String summaryFile = userInput_subSystemName + "_Summary.csv";
String path = Directory.GetCurrentDirectory();
String pathAndFile = path + Path.DirectorySeparatorChar + summaryFile;
// regexes for the different line possibilities
Regex oneLineComment = new Regex(@"^\s*//"); // match whitespace to two slashes
Regex startBlockComment = new Regex(@"^\s*/\*.*"); // match whitespace to /*
Regex whiteSpaceOnly = new Regex(@"^\s*$"); // match whitespace only
Regex code = new Regex(@"\S*"); // match anything but whitespace
Regex endBlockComment = new Regex(@".*\*/"); // match anything and */ - only used after block comment detected
Regex oneLineBlockComment = new Regex(@"^\s*/\*.*\*/.*"); // match whitespace to /* ... */
Regex multiLineStringStart = new Regex("^[^\"]*@\".*"); // match @" - don't match "@"
Regex multiLineStringEnd = new Regex("^.*\".*"); // match double quotes - only used after multi line string start detected
Regex oneLineMLString = new Regex("^.*@\".*\""); // match @"..."
Regex vbaComment = new Regex(@"^\s*'"); // match whitespace to single quote
// Uncomment these two lines to test your regex with the function testRegex() below
//new Program().testRegex(oneLineMLString);
//return;
FileStream fs = null;
String line = null;
int codeLineCount = 0;
int commentLineCount = 0;
int wsLineCount = 0;
int multiLineStringCount = 0;
int fileCodeLineCount = 0;
int fileCommentLineCount = 0;
int fileWsLineCount = 0;
int fileMultiLineStringCount = 0;
Boolean inBlockComment = false;
Boolean inMultiLineString = false;
try
{
// write to summary CSV file, overwrite if exists, don't append
using (StreamWriter outFile = new StreamWriter(pathAndFile, false))
{
// outFile header
outFile.WriteLine("filename, codeLineCount, commentLineCount, wsLineCount, mlsLineCount");
// walk through files with specified extensions
foreach (String allowed_extension in allowedExtensions)
{
String extension = "*." + allowed_extension;
// reset accumulating values for extension
codeLineCount = 0;
commentLineCount = 0;
wsLineCount = 0;
multiLineStringCount = 0;
// Get all files in current directory and subdirectories with specified extension
String[] fileList = Directory.GetFiles(Directory.GetCurrentDirectory(), extension, SearchOption.AllDirectories);
// walk through all files of this type
for (int i = 0; i < fileList.Length; i++)
{
// reset values for this file
fileCodeLineCount = 0;
fileCommentLineCount = 0;
fileWsLineCount = 0;
fileMultiLineStringCount = 0;
inBlockComment = false;
inMultiLineString = false;
try
{
// open file
fs = new FileStream(fileList[i], FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (TextReader tr = new StreamReader(fs))
{
// walk through lines in file
while ((line = tr.ReadLine()) != null)
{
if (inBlockComment)
{
if (whiteSpaceOnly.IsMatch(line))
{
fileWsLineCount++;
}
else
{
fileCommentLineCount++;
}
if (endBlockComment.IsMatch(line)) inBlockComment = false;
}
else if (inMultiLineString)
{
fileMultiLineStringCount++;
if (multiLineStringEnd.IsMatch(line)) inMultiLineString = false;
}
else
{
// not in a block comment or multi-line string
if (oneLineComment.IsMatch(line))
{
fileCommentLineCount++;
}
else if (oneLineBlockComment.IsMatch(line))
{
fileCommentLineCount++;
}
else if ((startBlockComment.IsMatch(line)) && (!(oneLineBlockComment.IsMatch(line))))
{
fileCommentLineCount++;
inBlockComment = true;
}
else if (whiteSpaceOnly.IsMatch(line))
{
fileWsLineCount++;
}
else if (oneLineMLString.IsMatch(line))
{
fileCodeLineCount++;
}
else if ((multiLineStringStart.IsMatch(line)) && (!(oneLineMLString.IsMatch(line))))
{
fileCodeLineCount++;
inMultiLineString = true;
}
else if ((vbaComment.IsMatch(line)) && (allowed_extension.Equals("txt") || allowed_extension.Equals("bas")))
{
fileCommentLineCount++;
}
else
{
// none of the above, thus it is a code line
fileCodeLineCount++;
}
}
} // while
outFile.WriteLine(fileList[i] + ", " + fileCodeLineCount + ", " + fileCommentLineCount + ", " + fileWsLineCount + ", " + fileMultiLineStringCount);
fs.Close();
fs = null;
} // using
}
finally
{
if (fs != null) fs.Dispose();
}
// update accumulating values
codeLineCount = codeLineCount + fileCodeLineCount;
commentLineCount = commentLineCount + fileCommentLineCount;
wsLineCount = wsLineCount + fileWsLineCount;
multiLineStringCount = multiLineStringCount + fileMultiLineStringCount;
} // for (specific file)
outFile.WriteLine("Summary for: " + extension + ", " + codeLineCount + ", " + commentLineCount + ", " + wsLineCount + ", " + multiLineStringCount);
} // foreach (all files with specified extension)
} // using summary file streamwriter
Console.WriteLine("Analysis complete, file is: " + pathAndFile);
} // try block
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
}
catch (Exception e2)
{
Console.WriteLine("Error: " + e2.Message);
}
} // main
// local testing function for debugging purposes
private void testRegex(Regex rx)
{
String test = " asdfasd asdf @\" adf ++--// /*\" ";
if (rx.IsMatch(test))
{
Console.WriteLine(" -->| " + rx.ToString() + " | matched: " + test);
}
else
{
Console.WriteLine("No match");
}
}
} // class
} // namespace
Here's how it works:
The program has a set of the file extensions you want to analyze.
It walks through each extension in the set, getting all files of that type in the current and all subdirectories.
It selects each file, goes through each line of that file, compares each line to a regex to figure out what it's looking at, and increments the line count after it figures out what it's looking at.
If a line isn't whitespace, a single or multi-line comment, or a multi-line string, it counts it as a line of code. It reports all the counts for each of those types of lines (code, comments, whitespace, multi-line strings) and writes them to a CSV file. No need to explain why Visual Studio did or did not count something as a line of code.
Yes, there are three loops nested inside each other (extensions, files, lines), which looks like O(n-cubed) O_O, but each line is still read only once. It's just a simple, standalone developer tool, and the biggest code base I've run it on was about 350K lines and it took about 10 seconds to run on a Core i7.
Edit: Just ran it on the Firefox 12 code base, about 4.3 million lines (3.3M code, 1M comments), about 21K files, with an AMD Phenom processor - took 7 minutes, watched the performance tab in Task Manager, no stress. FYI.
My attitude is if I wrote it to be part of an instruction fed to a compiler, it's a line of code and should be counted.
It can easily be customized to ignore or count whatever you want (brackets, namespaces, the includes at the top of the file, etc). Just add the regex, test it with the function that's right there below the regexes, then update the if statement with that regex.
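The walk-through above boils down to a small state machine over the lines of a file. Here is a cut-down Python sketch of that idea, not a port of the C# program: it only knows blank lines, // comments and /* */ block comments, it leaves out the multi-line string and VBA comment handling, and all names are made up.

```python
import re

BLANK = re.compile(r"^\s*$")
ONE_LINE_COMMENT = re.compile(r"^\s*//")
BLOCK_START = re.compile(r"^\s*/\*")
BLOCK_END = re.compile(r"\*/")


def count_lines(lines):
    """Classify each line as code, comment or blank and return the totals."""
    counts = {"code": 0, "comment": 0, "blank": 0}
    in_block = False
    for line in lines:
        if in_block:
            # Inside /* ... */ everything is a comment unless it is empty.
            counts["blank" if BLANK.match(line) else "comment"] += 1
            if BLOCK_END.search(line):
                in_block = False
        elif BLANK.match(line):
            counts["blank"] += 1
        elif ONE_LINE_COMMENT.match(line):
            counts["comment"] += 1
        elif BLOCK_START.match(line):
            counts["comment"] += 1
            # Stay in block-comment mode unless it closes on the same line.
            in_block = not BLOCK_END.search(line)
        else:
            # None of the above, so it is a line of code.
            counts["code"] += 1
    return counts
```

As in the original, a line with code followed by a trailing comment counts as code, and adding a new category means adding one regex and one branch.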
I want to remove extra spaces in a number of filepaths as the filepaths under scrutiny are rather long.
For example, I have this filepath:
C:\TEST Filepath\TEST Filepath\TEST Filepath\..\File.doc
and would like it to become:
C:\TEST Filepath\TEST Filepath\..\File.doc
I have hundreds of filepaths which are like this and would like to know if there is a quick and efficient way to remove the extra space from them?
Many thanks.
Tried with a small set on a spare disk. Please be careful.
void RemoveExtraSpace(string sourceDir)
{
var filePaths = Directory.GetDirectories(sourceDir, "*.*", SearchOption.AllDirectories);
Regex rx = new Regex(@"\s\s+");
for(int x = filePaths.Length - 1; x >= 0; x--)
{
string cur = filePaths[x];
DirectoryInfo di = new DirectoryInfo(cur);
if(rx.IsMatch(di.Name))
{
string result = Regex.Replace(di.Name, @"\s\s+", " ");
result = Path.Combine(di.Parent.FullName, result);
Directory.Move(di.FullName, result);
}
}
}
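If you would rather see what is going to be renamed before touching the disk, the same approach can be sketched in Python as a planning step. The names collapse_spaces and plan_directory_renames are assumptions; the walk is bottom-up so that deeper folders come first, matching the reverse loop in the C# code.

```python
import os
import re


def collapse_spaces(name):
    """Replace every run of two or more whitespace characters with one space."""
    return re.sub(r"\s\s+", " ", name)


def plan_directory_renames(root):
    """Return (old_path, new_path) pairs, deepest directories first."""
    plans = []
    # topdown=False visits children before their parents, so a parent is
    # only renamed after everything below it has been dealt with.
    for dirpath, dirnames, _filenames in os.walk(root, topdown=False):
        for name in dirnames:
            fixed = collapse_spaces(name)
            if fixed != name:
                plans.append((os.path.join(dirpath, name),
                              os.path.join(dirpath, fixed)))
    return plans
```

Print the plan first, and once it looks right apply it in the same order with os.rename(old, new) for each pair. As with the C# version, try it on a copy first.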