Is there any way to load a text file in Processing while ignoring the case of the file name? I am opening multiple csv files, and some have the extension capitalized, ".CSV" rather than the standard ".csv", which results in errors due to the loadStrings() function being case-sensitive.
String file = sketchPath("test.csv");
String[] array = loadStrings(file);
The above gives the error:
This file is named test.CSV not test.csv. Rename the file or change your code.
I need a way to make the case of the file name or extension not matter. Any thoughts?
Short answer: No. The case-sensitivity of files comes from the operating system itself.
Longer answer: you could create code that just tries to load from multiple places.
Another approach would be to use Java's File class, which has functions for listing various files under a directory, then iterating through them and finding the file that you want. More info is available in the Java reference, but it might look something like this:
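String[] array = null;
// list every entry in the sketch folder
File dir = new File(sketchPath(""));
for (String file : dir.list()) {
  // yourFileNameHere is the part of the name whose case is known, e.g. the name without its extension
  if (file.startsWith(yourFileNameHere)) {
    array = loadStrings(file);
    break;
  }
}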
I haven't tested this code so you might have to play with it a little bit, but that's the basic idea. Of course, you might just want to rename your files ahead of time to avoid this problem.
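The directory-listing idea is not specific to Processing. As a minimal sketch in Python (not Processing code; the helper name find_ignoring_case is hypothetical), the lookup can compare lower-cased names and return whatever spelling is actually on disk:

```python
import os
import tempfile

def find_ignoring_case(directory, wanted):
    """Return the path of the first entry in `directory` whose name equals
    `wanted` when case is ignored, or None if there is no such entry."""
    target = wanted.lower()
    for name in sorted(os.listdir(directory)):
        if name.lower() == target:
            return os.path.join(directory, name)
    return None

# Small demonstration in a throwaway folder
with tempfile.TemporaryDirectory() as folder:
    # Create a file whose extension is upper-case
    with open(os.path.join(folder, "test.CSV"), "w") as handle:
        handle.write("a,b\n")
    print(find_ignoring_case(folder, "test.csv"))
```

Comparing whole names, rather than prefixes, avoids picking up a different file that merely starts with the same characters.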
Why not get the new filename from the error itself? To get the error statement into a String, we need to wrap loadStrings in a try and catch statement.
String[] array;
String file = "heLlo.txt";
try {
//if all is good then we load the file
array = loadStrings(file);
}catch(Exception e){
//otherwise when we get the error, we store it in a String
String error = e.toString();
Then we use match with a regular expression to pull the filename out of the error statement. The regex is /named ([^ ]+)/ (the filename can be assumed not to have any spaces in it).
String[]matches = match(error, "named ([^ ]+)");
The capture group will be in element 1 of the array containing the matches. So that would be the "real" filename:
String realFile = matches[1];
Finally we load the real file and store it in our array.
array = loadStrings(realFile);
}
If you want, you can put all of this into a function so that you don't have to repeat this code every time you load a file. But obviously, it would be easier to just rename or check your filenames ahead of time.
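To show the extraction step in isolation, here is a minimal sketch in Python (not Processing; the function name real_name_from_error is hypothetical). It applies the same pattern to the error text quoted in the question and returns None when the message has a different shape, a case the snippet above does not guard against:

```python
import re

def real_name_from_error(message):
    """Pull the on-disk file name out of a message shaped like
    'This file is named X not Y. ...'; return None if it does not match."""
    found = re.search(r"named ([^ ]+)", message)
    return found.group(1) if found else None

message = "This file is named test.CSV not test.csv. Rename the file or change your code."
print(real_name_from_error(message))  # prints test.CSV
```

Parsing error text is fragile: if the wording of the message ever changes, the pattern stops matching, so the None case is worth handling.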
I am facing a problem importing a flat file into SSIS.
The columns are separated by "|" and each row ends with the delimiter ";;". However, the row delimiter is inconsistent: sometimes, at the end of a row, there is only ";" or nothing at all. When importing into SSIS I get this result:
Column 1 Column 2 Column 3 Column 4 Column 5
a b c d e;|a1|b1|c1|d1|e1
This should instead look like
Column 1 Column 2 Column 3 Column 4 Column 5
a b c d e
a1 b1 c1 d1 e1
The problem arises because the first data row ends with only one ";" or none at all.
Note that this is only an example; many of the rows are correct and end with ";;". I am only pointing out the problem.
The .csv file would look like
Column 1|Column 2|Column 3|Column 4|Column 5;;
a|b|c|d|e;
a1|b1|c1|d1|e1;;
and should instead look like
Column 1|Column 2|Column 3|Column 4|Column 5;;
a|b|c|d|e;;
a1|b1|c1|d1|e1;;
The data set is very big with almost 600.000 rows and 50 columns.
The first problem I face is when I import the file, since the default SSIS data type is string [DT_STR] with a length of 50. Because there are sometimes several consecutive rows with wrong delimiters, I get very long strings in the last column's cell. I use Visual Studio, and in the Advanced Editor I changed the length to something very big.
[Screenshot: Advanced Editor in Visual Studio, where I changed the length]
So the question is: how do I, in SSIS and Visual Studio Community, take the values that end up crammed into one cell and split them out into an entirely new row (using the already defined columns)?
I have tried manually finding all the cases where there is an error and correcting them in the .csv file. After this, SSIS works. However, this is not a durable solution because I get a new file every month.
I have tried reading suggestions such as:
Split a single column of data with comma delimiters into multiple columns in SSIS
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/49a764e7-1a6f-4a6f-9c92-2462ffa3add2/regarding-ssis-split-multi-value-column-into-multiple-records?forum=sqlintegrationservices
but their problem is not the same, since they have a column value to replicate, and I want an entirely new row.
Thanks for any help,
ss
!! EDIT trying using the answers from J Weezy and R M: !!
I try to create a script task and follow that solution.
In Visual Studio, I add a script task using a Script Component and I choose "Transformation". Under Input Columns I choose all.
After this I connect the flat file source to the script component and run the package. Running it like this (where the script component doesn't do anything) works.
Then I enter "Edit Script" in the script component, and under public override void Input0_ProcessInputRow(Input0Buffer Row) I enter (using the help from R M):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
public static string[] SplitLine(string input)
{
Regex lineSplit = new Regex("[0-9]\;$", RegexOptions.Compiled);
List<string> list = new List<string>();
string curr = null;
foreach (Match match in lineSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add("");
}
list.Add(curr.TrimStart(';'));
}
return list.ToArray();
}
}
However this doesn't work (I am not even allowed to execute the task).
I have never worked with C# before, so everything is new to me. As I understand the code, it searches each line for the pattern where a digit is followed by exactly one ";" at the end, so it will not match lines that end with a digit followed by ";;" (two semicolons).
When there is a match, one ";" is added.
Please let me know, what I am not understanding and doing wrong.
Maybe it is also wrong to put the script component after the flat file source, because adding ";" will not result in a new line, which is what I want.
Inconsistent row delimiters are bad data, and there really is no way to correct for this in either the connection manager or the data flow. Fixing bad data within the data flow is not what SSIS was designed for. Your best bet is to do one of the following two things:
Work with the data source provider to fix the issue on their end
Create a script task to first modify the file to correct the bad data
From there, you will be able to process the file normally in SSIS.
Update 1:
If the only problem is a duplicate delimiter (;;), then read in the row and use the Replace(";;",";"); function. If you have either multiple duplicate or invalid end of row delimiters, then you are better served by using StringBuilder(). For a solution on using StringBuilder(), see the weblink below.
https://stackoverflow.com/a/49949787/4630376
Update 2:
One thing that I just remembered, you will need to adjust for handling only those characters that are outside of double quotes, assuming that double quotes exist within the file as the text qualifier. This is important because without it you will remove any characters that are within quotes, which may be valid data.
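As a rough illustration of the "fix the file first" step, here is a minimal sketch in Python rather than a C# script task. It assumes, as in the sample shown in the question, that every record sits on its own line, that no field legitimately ends in ";", and that there are no quoted fields; the function name normalize_row_endings is hypothetical.

```python
def normalize_row_endings(lines, terminator=";;"):
    """Yield each non-empty line with any run of trailing ';' replaced by
    exactly one `terminator`."""
    for line in lines:
        # Drop the line break first, then whatever semicolons are left at the end
        body = line.rstrip("\r\n").rstrip(";")
        if body:
            yield body + terminator

sample = [
    "Column 1|Column 2|Column 3|Column 4|Column 5;;\n",
    "a|b|c|d|e;\n",
    "a1|b1|c1|d1|e1\n",
]
for row in normalize_row_endings(sample):
    print(row)
```

Because the function works line by line on any iterable, it can be fed an open file object, so a file of several hundred thousand rows never has to be held in memory at once. If the file does use double quotes as a text qualifier, the quoting caveat above still applies and this sketch would need extending.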
I would agree with J Weezy to create a script task to correct the bad data. In the script task you could possibly use regex to deal with the ";" and ";;" issue. The script task may be your only way of dealing with it.
While the code below will not work for your case in its current form, it could possibly be changed to do so. I have used it to process a text/CSV file and correct formatting issues with each line of data. Note that I got this from another post on Stack Overflow.
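using System.Collections.Generic;
using System.Text.RegularExpressions;

public static string[] SplitLine(string input)
{
    // Each match is one field: either a double-quoted value (with "" as an
    // escaped quote) or a run of characters that contains no comma.
    Regex lineSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
    List<string> list = new List<string>();
    foreach (Match match in lineSplit.Matches(input))
    {
        // The match includes the leading comma, so strip it.
        // An empty match becomes an empty field, added once.
        list.Add(match.Value.TrimStart(','));
    }
    return list.ToArray();
}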
I'm using Photoshop script. I get files from folders. My problem is that when I get the files and place them in an array the array contains hidden files that are in the folder for example ".DS_Store". I can get around this by using:
if (folders[i] != "~/Downloads/start/.DS_Store"){}
But I would like to use something better as I sometimes look in lots of folders and don't know the "~/Downloads/start/" part.
I tried to use indexOf but Photoshop script does not allow indexOf. Does anybody know of a way to check if ".DS_Store" is in the string "~/Downloads/start/.DS_Store" that works in Photoshop script?
I saw this answer, but I don't know how to use it to test: Photoshop script to ignore .ds_store
For anyone else looking for a solution to this problem, rather than explicitly trying to skip hidden files like .DS_Store, you can use the Folder Object's getFiles() method and pass an expression to build an array of file types you actually want to open. A simple way to use this method is as follows:
// this expression will match strings that end with .jpg, .tif, or .psd and ignore the case
var fileTypes = new RegExp(/\.(jpg|tif|psd)$/i);
// declare our path
var myFolder = new Folder("~/Downloads/start/");
// create array of files utilizing the expression to filter file types
var myFiles = myFolder.getFiles(fileTypes);
// loop through all the files in our array and do something
for (var i = 0; i < myFiles.length; i++) {
    var fileToOpen = myFiles[i];
    open(fileToOpen);
    // do stuff...
}
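The allow-list idea can be checked outside Photoshop as well. Here is a minimal sketch in Python (not Photoshop script; the function name keep_wanted is hypothetical) that applies the same extension pattern to plain path strings:

```python
import re

# Names ending in .jpg, .tif or .psd, ignoring case
WANTED = re.compile(r"\.(jpg|tif|psd)$", re.IGNORECASE)

def keep_wanted(paths):
    """Return only the paths whose names end in one of the wanted extensions."""
    return [path for path in paths if WANTED.search(path)]

paths = [
    "~/Downloads/start/.DS_Store",
    "~/Downloads/start/photo.JPG",
    "~/Downloads/start/layers.psd",
]
print(keep_wanted(paths))
```

Because .DS_Store has none of the listed extensions, it drops out without the folder path ever being mentioned.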
For anybody looking, I used the polyfill found here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/indexOf
indexOf() was added to the ECMA-262 standard in the 5th edition; as
such it may not be present in all browsers. You can work around this
by utilizing the following code at the beginning of your scripts. This
will allow you to use indexOf() when there is still no native support.
This algorithm matches the one specified in ECMA-262, 5th edition,
assuming TypeError and Math.abs() have their original values.
My problem is the same as the one mentioned in this answer. I've been trying to understand the code and this is what I learned:
The failure is in the file parse_xml.cgi, which tries to get messages (return $message{$name}) from a file named messages (located in the html_en directory).
The $messages value comes from the method GetMessageHash in file adminprotocol-lib.pl:
sub GetMessageHash
{
return $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"}
}
The $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} is set in the file streamingadminserver.pl:
$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} = $messages{"en"}
I don't know anything about Perl, so I have no idea what the problem could be. From what I saw, $messages{"en"} has the correct value (if I do print($messages{"en"}{'SunStr'}) I get the value "Sun").
However, if I try print($ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"}{'SunStr'}) I get nothing. It seems that $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} is not set.
I tried this simple example:
$ENV{"HELLO"} = "hello";
print($ENV{"HELLO"});
It works fine and prints "hello".
Any idea of what the problem can be?
Looks like $messages{"en"} is a HashRef: A pointer to some memory address holding a key-value-store. You could even print the associated memory address:
perl -le 'my $hashref = {}; print $hashref;'
HASH(0x1548e78)
0x1548e78 is the address, but it's only valid within the same running process. Re-run the sample command and you'll get different addresses each time.
HASH(0x1548e78) is also just a human-readable representation of the real stored value. Setting $hashref2="HASH(0x1548e78)"; won't create a real reference, just a copy of the human-readable string.
You could easily prove this theory by running print $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} in both scripts.
Data::Dumper is typically used to show the contents of the referenced hash (memory location):
use Data::Dumper;
print Dumper($messages{"en"});
# or
print Dumper($ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"});
This will also show if the pointer/reference could be dereferenced in both scripts.
The solution for your problem is probably passing the value instead of the HashRef:
$ENV{"QTSSADMINSERVER_EN_SUN"} = $messages{"en"}->{SunStr};
Best practice is to use -> between the two keys. The " or ' quotes around a key are also optional if the key is a plain word.
But passing everything through environment variables feels wrong. Environment variables hold plain strings, so a reference stored in one survives only as its HASH(0x...) text and cannot be dereferenced later. You might want to extract the string storage to an include file and load it via require.
See http://www.perlmaven.com/ or http://learn.perl.org for more about Perl.
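The strings-only nature of environment variables is easy to see in other languages too. The following is a minimal sketch in Python, not Perl, and is only an analogy to the scripts in the question: the variable name EXAMPLE_EN_MESSAGES and the MonStr entry are made up for illustration. It passes a whole lookup table through the environment by serializing it to text and parsing it back on the reading side:

```python
import json
import os

# Hypothetical message table standing in for $messages{"en"}
messages = {"en": {"SunStr": "Sun", "MonStr": "Mon"}}

# Environment values are plain strings, so the table is serialized first
os.environ["EXAMPLE_EN_MESSAGES"] = json.dumps(messages["en"])

# A reader (this process or a child process) parses the string back
restored = json.loads(os.environ["EXAMPLE_EN_MESSAGES"])
print(restored["SunStr"])  # prints Sun
```

Whether serializing is preferable to loading the messages from a shared file depends on how large the table is; environment blocks have size limits that vary by system.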
fix code:
$$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} = $messages{"en"};
sub GetMessageHash
{
return $$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"};
}
ref:
https://github.com/guangbin79/dss6.0.3-linux-patch
I've been writing a program in R that outputs randomization schemes for a research project I'm working on with a few other people this summer, and I'm done with the majority of it, except for one feature. Part of what I've been doing is making it really user friendly, so that the program will prompt the user for certain pieces of information, and therefore know what needs to be randomized. I have it set up to check every piece of user input to make sure it's a valid input, and give an error message/prompt the user again if it's not. The only thing I can't quite figure out is how to get it to check whether or not the file name for the .csv output is valid. Does anyone know if there is a way to get R to check if a string makes a valid windows file name? Thanks!
These characters aren't allowed: /\:*?"<>|. So warn the user if it contains any of those.
Some other names are also disallowed: CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9.
You probably want to check that the filename is valid using a regular expression. See this other answer for a Java example that should take minimal tweaking to work in R.
https://stackoverflow.com/a/6804755/134830
You may also want to check the filename length (260 characters for maximum portability, though longer names are allowed on some systems).
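The checks listed so far can be combined into one rough pre-check. This is a minimal sketch in Python rather than R, and it is not exhaustive (for example, it does not reject names that end in a space or a dot). The function name and the max_length default are assumptions taken from the figures above, and the reserved set used here is CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9.

```python
import re

# Characters that may not appear in a Windows file name
INVALID_CHARS = re.compile(r'[/\\:*?"<>|]')

# Reserved device names
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def looks_like_valid_windows_name(name, max_length=260):
    """Rough pre-check of a bare file name (no directory part)."""
    if not name or len(name) > max_length:
        return False
    if INVALID_CHARS.search(name):
        return False
    # Device names are reserved with or without an extension (e.g. nul.txt)
    stem = name.split(".")[0].upper()
    return stem not in RESERVED

print(looks_like_valid_windows_name("results.csv"))
print(looks_like_valid_windows_name("a:b.csv"))
```

A check like this is best used to give the user an early, friendly message; the attempt to create the file remains the final authority.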
Finally, in R, if you try to create a file in a directory that doesn't exist, it will still fail, so you need to split the name up into the filename and directory name (using basename and dirname) and try to create the directory first, if necessary.
That said, David Heffernan gives good advice in his comment to let Windows do the work in deciding whether or not it can create the file: you don't want to erroneously tell the user that a filename is invalid.
You want something a little like this:
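nice_file_create <- function(filename)
{
  directory_name <- dirname(filename)
  if(!file.exists(directory_name))
  {
    # recursive = TRUE also creates any missing parent directories
    ok <- dir.create(directory_name, recursive = TRUE)
    if(!ok)
    {
      warning("The directory of that path could not be created.")
      return(invisible())
    }
  }
  # file.create() reports failure by returning FALSE (with a warning),
  # not by throwing an error, so check the return value
  ok <- suppressWarnings(file.create(filename))
  if(!ok)
  {
    warning("The file could not be created.")
  }
  invisible(ok)
}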
But test it thoroughly first! There are all sorts of edge cases where things can fall over: try UNC network path names, "~", and paths with "." and ".." in them.
I'd suggest that the easiest way to make sure a filename is valid is to use fs::path_sanitize().
It removes control characters, reserved characters, and Windows-reserved filenames, truncating the string at 255 bytes in length.
I created a GUI and used uiimport to import a dataset into the MATLAB workspace. I would like to pass this imported data to another function in MATLAB. How do I do that? I tried the code below, but it doesn't pick up the data in the MATLAB workspace. Any ideas?
[file_input, pathname] = uigetfile( ...
{'*.txt', 'Text (*.txt)'; ...
'*.xls', 'Excel (*.xls)'; ...
'*.*', 'All Files (*.*)'}, ...
'Select files');
uiimport(file_input);
M = dlmread(file_input);
X = freed(M);
I think that you need to assign the result of this statement:
uiimport(file_input);
to a variable, like this
dataset = uiimport(file_input);
and then pass that to your next function:
M = dlmread(dataset);
This is a very basic feature of Matlab, which suggests to me that you would find it valuable to read some of the on-line help and some of the documentation for Matlab. When you've done that you'll probably find neater and quicker ways of doing this.
EDIT: Well, @Tim, if all else fails RTFM. So I did, and my previous answer is incorrect. What you need to pass to dlmread is the name of the file to read. So, you either use uiimport or dlmread to read the file, but not both. Which one you use depends on what you are trying to do and on the format of the input file. So, go RTFM and I'll do the same. If you are still having trouble, update your question and provide details of the contents of the file.
In your script you have three ways to read the file. Choose one of them depending on your file format. But first I would combine the file name with the path:
file_input = fullfile(pathname,file_input);
I wouldn't use UIIMPORT in a script, since the user can change the way the data is read, and the variable name depends on the file name and the user.
With DLMREAD you can only read numerical data from the file. You can also skip some number of rows or columns with
M = dlmread(file_input,'\t',1,1);
skipping the first row and one column on the left.
Or you can define a range in an Excel-like style. See the DLMREAD documentation for more details.
The filename you pass to DLMREAD must be a string. Don't pass a file handle or any data. You will get "Filename must be a string", if it's not a string. Easy.
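To make the row and column offsets concrete, here is a minimal sketch in Python, not MATLAB, of what reading tab-delimited numeric data while skipping the first row and the first column amounts to. The function name read_numeric and the sample text are hypothetical.

```python
import csv
import io

def read_numeric(handle, delimiter="\t", skip_rows=1, skip_cols=1):
    """Parse delimited text into rows of floats, ignoring the first
    `skip_rows` rows and the first `skip_cols` columns."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    return [[float(cell) for cell in row[skip_cols:]] for row in rows[skip_rows:]]

# One header row and one label column, both non-numeric
text = "id\tx\ty\nr1\t1.5\t2\nr2\t3\t4.25\n"
print(read_numeric(io.StringIO(text)))
```

The skipped row and column are the only places where non-numeric text is tolerated; any other cell that is not a number makes float() raise an error, which mirrors the numeric-only restriction described above.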
FREAD reads data from a binary file. See the documentation if you really have to do it.
There are many other functions to read the data from file. If you still have problems, show us an example of your file format, so we can suggest the best way to read it.