System.IO - Use Directory/LINQ to Replace Some Files in a List with Others

I get a list of files using the System.IO Directory thusly:
var srcFiles = Directory.GetFiles(remotePath);
I also have a comma-separated list of strings, each of which I want to check for NON-existence in the names of the above files. For example:
string[] filterOn = EndPoint.FileNameDoesNotContainFilter.Split(',');
gives me the following array:
filterOn contains ["GoodFile", "EvenBetterFile"]
Now, the files that contain neither of the two strings would replace all the files currently in the srcFiles list above (or go into a new list, if that makes more sense). I am trying to do this with LINQ, but can't quite get there. How is it done?
EDIT: The answer from #dvo gives me the correct files; however, sometimes the filter strings are contained in the remotePath passed in.
A typical path/file: C:\TEMP\APPS\AMS\Services\sc0189v\APPS\GoodFile\test.txt.
As you can see, "GoodFile" is in the path but not in the file name, yet this file should be rejected. I suppose I'm looking for something in System.IO.Directory that might help. Not sure, really.

Try this:
var noMatch = srcFiles.Where(file => !filterOn.Any(filter => file.ToUpperInvariant().Contains(filter.ToUpperInvariant()))).ToList();
This creates a new list containing only the files whose full paths contain none of the filters. It ignores case by converting both the path and the filter to uppercase. To store the results back in srcFiles instead of noMatch, end the query with .ToArray() rather than .ToList(), because Directory.GetFiles returns a string[]. You can make it case-sensitive by removing .ToUpperInvariant() from both parts of the Any clause. You can capture the filter matches instead by removing the ! from the Where clause.
Hope this helps!

Related

gcs bucket how to get objects using filters [golang]

Is there a way to get a list of objects whose names start and end with specific criteria?
For example:
a/b/c/id1.json
a/b/c/id2.json
a/b/c/id3.json
a/c/id1.json
and we want to query for
Prefix: "a/",
EndOffset: "id1.json"
expected output should be:
a/b/c/id1.json
We want to filter out the other objects, and we don't know what the b folder name will be.
So:
a is always a
b is a random unique string
c is always c
and we always want the specific json.
I am trying to achieve this with:
query := &storage.Query{
	Prefix: "a/",
	//StartOffset: "",
	EndOffset: "id1.json",
	//Delimiter:
}
query.SetAttrSelection([]string{"Name"})
or
Prefix: "c/id1.json",
Delimiter: "/",
IncludeTrailingDelimiter: true,
For some reason, I get all of those files back.
And of course I would like to limit the results as much as possible for better performance.
Maybe there is a way to use a regex in the Prefix definition,
like a/*/c/id1.json?
Thanks
EDIT:
Please note that I have already implemented storage_list_files_with_prefix-go, and it does not work the way I would like. So the main question is how to make this filtering work with the example I am showing.
Key points:
Cloud Storage buckets do not have directories.
The namespace is flat.
Object names are just strings.
The slash (/), which is often used to separate directory names in file systems, is just a character in a bucket object name. It has no special significance but can be used as a delimiter.
You can specify a prefix and a delimiter to reduce the returned object list.
StartOffset and EndOffset are lexicographic bounds on the whole object name, not prefix or suffix matches. Every name beginning with "a/" sorts before "id1.json", so EndOffset: "id1.json" excludes nothing.
A Cloud Storage prefix does not support regular expressions.
The asterisk * in a prefix is a literal character, not a wildcard.
Summary:
You must implement additional filtering in your code.
List the objects in a bucket using a prefix filter

how to make Google sheets REGEXREPLACE return empty string if it fails?

I'm using the REGEXREPLACE function in Google Sheets to extract some text contained in SKUs. If it doesn't find a match, it seems to return the entire original string. Is it possible to make it return an empty string instead? This is a problem because it doesn't actually return an error, so ISERROR doesn't work.
Example: my SKU SHOULD contain 5 separate groups delimited by the underscore character '_'. In this example it is missing the last group, so it returns the entire original string.
LDSS0107_SS-WH_5
=REGEXREPLACE($A3,"[^_]+_[^_]+_[^_]+_[^_]+_(.*)","$1")
It fails to find the fifth group, which is correct, but I need it to return an empty string when it fails; at present it returns the whole original string. Any ideas?
Perhaps the solution would be to add the missing groups:
=REGEXREPLACE($A1&REPT("_ ",MAX(0,4-(LEN($A1)-LEN(SUBSTITUTE($A1,"_",""))))),"[^_]+_[^_]+_[^_]+_[^_]+_(.*)","$1")
This returns a space for the missing group. If you don't want that, wrap the formula in TRIM.
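Another common approach is to test for a match before extracting, for example IF(REGEXMATCH(...), REGEXREPLACE(...), "") or IFERROR(REGEXEXTRACT($A3, "^[^_]+_[^_]+_[^_]+_[^_]+_(.*)"), ""), since REGEXEXTRACT returns an error when nothing matches. The same "empty on no match" logic is sketched below in Go, whose regexp package uses the same RE2 syntax as the Sheets REGEX functions; `extractFifth` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// fifthGroup skips four underscore-terminated groups and captures the rest.
var fifthGroup = regexp.MustCompile(`^[^_]+_[^_]+_[^_]+_[^_]+_(.*)`)

// extractFifth returns the fifth group of an underscore-delimited SKU,
// or "" when the SKU has fewer than five groups.
func extractFifth(sku string) string {
	m := fifthGroup.FindStringSubmatch(sku)
	if m == nil {
		return "" // no match: return empty instead of the original string
	}
	return m[1]
}

func main() {
	fmt.Printf("%q\n", extractFifth("LDSS0107_SS-WH_5"))       // ""
	fmt.Printf("%q\n", extractFifth("LDSS0107_SS-WH_5_X_RED")) // "RED"
}
```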

How do I filter file names out of a SQLite dump?

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very familiar with regex and need a way to read the dump and write the image file names it contains to another file. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.
You should be able to use .gsub('%2', ' ') to replace the %2 with a space; as long as it is quoted, it should be fine.
Split does remove the character it splits on, though. So you may not want to use it, or if you do, you can use Array#join with the same character as its argument to put it back.
I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns a string without the surrounding quotes.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Some things, actually, in my opinion, most things, are not candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the file name, use a method designed to return only the file name.
Pass the output of one of the above to File.basename, which returns only the file name and extension.

Filtering in Pig

I am trying to filter a relation in Pig: I need all the records in which the third field occurs in the first field's string.
I tried with:
(Assume my source relation is SRC)
Filtered= FILTER SRC BY $0 matches 'CONCAT(".*",$2,".")';
DUMP Filtered;
There is no syntax error but I am not getting any output for Filtered.
Pig's CONCAT takes only two arguments. See the documentation at http://pig.apache.org/docs/r0.10.0/func.html#concat
I'm not sure why it isn't complaining at runtime, but you're going to want to string together two CONCAT statements, like
CONCAT(".*", CONCAT($2, "."))
to get the string you want.
I don't think the CONCAT is resolving to what you're expecting; rather, matches is probably trying to match the entire unevaluated string CONCAT(".*",$2,"."), which is why you are not getting any results.
Can you break this out into two statements: the first creates a field containing the evaluated result of the CONCAT, and the second performs the matches operation:
TMP = FOREACH SRC GENERATE $0, CONCAT('.*', CONCAT($2, '.*'));
Filtered = FILTER TMP BY $0 matches $1;
DUMP Filtered;
Or something like that (completely untested)
I think you just have some syntax errors
As noted by A. Leistra, CONCAT only takes two arguments.
"." at the end should be ".*" if you want double sided wildcards
FILTER statement prefers parentheses around the argument
Pig has a lot of weird edge cases involving double quotes, so just use single when you can
Filtered= FILTER SRC BY ($0 matches CONCAT('.*', CONCAT($2, '.*')));
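Outside Pig, the condition itself is just a substring test. The Go sketch below, with hypothetical helper names, shows both a plain substring check and a regex built like CONCAT('.*', CONCAT($2, '.*')), anchored at both ends to mirror what .* on both sides achieves in the Pig expression. Escaping the third field with regexp.QuoteMeta matters when it can contain regex metacharacters such as '.'; the same concern applies to the Pig version, where $2 is spliced into the pattern unescaped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsThird reports whether field 0 contains field 2 as a plain substring.
func containsThird(rec []string) bool {
	return len(rec) >= 3 && strings.Contains(rec[0], rec[2])
}

// containsThirdRegex builds ".*" + field 2 + ".*" as a whole-string regex,
// escaping field 2 so metacharacters are matched literally. Compiling per
// record is simple but slow; a real job would cache compiled patterns.
func containsThirdRegex(rec []string) bool {
	if len(rec) < 3 {
		return false
	}
	re := regexp.MustCompile("^.*" + regexp.QuoteMeta(rec[2]) + ".*$")
	return re.MatchString(rec[0])
}

func main() {
	rec := []string{"abc", "x", "."}
	fmt.Println(containsThird(rec), containsThirdRegex(rec)) // false false
	// An unescaped pattern "^.*" + "." + ".*$" would report true for "abc".
}
```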
Try this,
Filtered= FILTER SRC BY $0 matches '(.*)$2(.*)';
DUMP Filtered;
If the first field contains the third field, the record is kept.
This is done using a regex.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try to do this, but the most I can do is find the start of a line with ^; I can't replace it.
I can find the first characters on a line and replace them, but I can't do it in a way that keeps them intact.
Unfortunately I don't have access to a tool like cut since I am on a Windows machine, so is there any way to do what I want with just a regexp?
Use Notepad++. It offers a way to record a sequence of actions, which can then be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
using System;
using System.Text.RegularExpressions;
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
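The same multiline replacement sketched in Go, for comparison: the (?m) flag plays the role of RegexOptions.Multiline. ReplaceAllLiteralString is used so that a prefix containing $ is inserted as-is rather than treated as a group reference; `prefixLines` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineStart matches the empty position at the start of every line.
var lineStart = regexp.MustCompile(`(?m)^`)

// prefixLines inserts prefix at the start of each line of s.
func prefixLines(s, prefix string) string {
	return lineStart.ReplaceAllLiteralString(s, prefix)
}

func main() {
	fmt.Println(prefixLines("test test\ntest2 test2", "foo"))
	// footest test
	// footest2 test2
}
```

With (?m), ^ also matches right after a trailing newline, so input ending in \n gets one extra prefix at the very end; trim the trailing newline first if that matters.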
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. It makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms to just the excluded or included lines, and do so inside column boundaries. You can even take the contents of one set of lines and overlay them onto another set of lines, either entirely or within column boundaries, which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most things but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap, but once you figure out how to use it, it pays for itself in time saved.
