Removing last characters in a string with LINQ - linq

I want to read a file that will eventually become a matrix, but one detail in the code below is bugging me.
Text file looks like this:
[[15,7,18,11,19,10,14,16,8,2,3,6,5,1,17,12,9,4,13],
[17,15,9,8,11,13,7,6,5,1,3,16,12,19,10,2,4,14,18],
[...],
[...]] // Ignore the line breaks; in my text file this is one single line. I split it across lines here for readability.
So the problem is that the text ends with ]] and at the end I get two empty list entries, which is really bugging me. I wonder if I can remove them during projection. I tried using SkipLast(), but then the result is empty. Any ideas?
var readFile = File.ReadLines("RawTestcases.txt")
.Select(x => x.Replace("[", string.Empty)
.Split("]"))
.ToList();
OK, I actually just put the ]] on a new line and did a SkipLast(1) before the projection, but can I do this if it is all on one line?

That looks like JSON, so use the right tool (here Json.NET's JsonConvert):
string file = File.ReadAllText("RawTestcases.txt");
string[][] matrix = JsonConvert.DeserializeObject<string[][]>(file)
.Where(array => array.Length > 0)
.ToArray();
If you need them as integers you can directly use the right type:
int[][] matrix = JsonConvert.DeserializeObject<int[][]>(file)
.Where(array => array.Length > 0)
.ToArray();

Related

What is the most efficient way to replace a list of words without touching html attributes?

I absolutely disagree that this question is a duplicate! I am asking for an efficient way to replace hundreds of words at once. This is an algorithm question! All the provided links are about replacing one word. Should I repeat that expensive operation hundreds of times? I'm sure there are better ways, such as a suffix tree where I sort out the HTML while building the tree. I removed the regex tag since, for no good reason, you are focusing on that part.
I want to translate a given set of words (more than 100). My first idea was to use a simple regular expression, which works better than expected. For example:
const input = "I like strawberry cheese cake with apple juice"
const en2de = {
apple: "Apfel",
strawberry: "Erdbeere",
banana: "Banane",
/* ... */}
input.replace(new RegExp(Object.keys(en2de).join("|"), "gi"), match => en2de[match.toLowerCase()])
This works fine at first glance. However, it becomes strange if you have words that contain each other: "pineapple" would return "pineApfel", which is total nonsense. So I was thinking about checking word boundaries and similar things. While playing around I created this test case:
Apple is a company
That created the output:
Apfel is a company.
The translation is wrong, which is somehow tolerable, but the link is broken. That must not happen.
So I was thinking about extending the regex to check whether there is a quote before the match. I know well that parsing HTML with regex is a bad idea, but I thought that this should work anyway. In the end I gave up and looked for solutions from other devs. I found a couple of questions on Stack Overflow, all without answers, so it seems to be a hard problem (or my search skills are bad).
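The word-boundary idea mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: escapeRegExp and translateWords are made-up helper names, the dictionary is a Map instead of a plain object, and \b in JavaScript only knows ASCII word characters, so keys containing umlauts or other non-ASCII letters would need a different boundary check. It fixes the "pineapple" case but does nothing to protect HTML attributes.

```javascript
// Hypothetical helper: escape regex metacharacters in dictionary keys.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")

// Assumed sample dictionary (lower-case keys).
const en2de = new Map([
  ["apple", "Apfel"],
  ["strawberry", "Erdbeere"],
  ["banana", "Banane"],
])

// Longest keys first, so overlapping keys prefer the longer match.
const keys = [...en2de.keys()]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)

// \b on both sides: only whole words match, so "pineapple" is left alone.
const wordPattern = new RegExp(`\\b(?:${keys.join("|")})\\b`, "gi")

const translateWords = text =>
  text.replace(wordPattern, match => en2de.get(match.toLowerCase()))

console.log(translateWords("I like pineapple and Apple juice"))
// prints "I like pineapple and Apfel juice"
```

All keys are still handled in a single pass over the input, so the cost does not grow by repeating the replace once per word.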
So I took two steps back and thought about implementing it myself with a parser or something like that. However, since I have multiple inputs and need to ignore case, I was wondering what the best way is.
Right now I am thinking of building a dictionary with pointers to the positions of the words. I would store the dictionary in lower case, so lookups should be fast, and I could also skip all words with the wrong prefix etc. to get my matches. In the end I would replace the words from the end to the beginning to avoid breaking the indices. But is that approach efficient? Is there a better way to achieve this?
While my sample is in JavaScript, the solution does not have to be in JS, as long as it doesn't include dozens of dependencies that cannot easily be translated to JS.
TL;DR:
I want to replace multiple words by other words in a case insensitive way without breaking html.
You may try a TreeWalker and replace the text in place.
To find the words, you can tokenize your text, lower-case each word and map it through the dictionary.
const mapText = (dic, s) => {
return s.replace(/[a-zA-Z-_]+/g, w => {
return dic[w.toLowerCase()] || w
})
}
const dic = {
'grodzi': 'grodzila',
'laaaa': 'forever',
}
const treeWalker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT
)
// skip body node
let currentNode = treeWalker.nextNode()
while(currentNode) {
const newS = mapText(dic, currentNode.data)
currentNode.data = newS
currentNode = treeWalker.nextNode()
}
p {background:#eeeeee;}
<p>
grodzi
LAAAA
</p>
The link stays untouched.
However, mapping each word into another language is bound to fail (be it a word with no equivalent, humour/irony, or simply grammatical constructs). For that part (which is a hard problem on its own) you may rely on tools that translate the data for you (neural networks, APIs, ...).
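Outside a browser there is no TreeWalker, so a rough string-only equivalent of the same idea might look like the sketch below. The assumptions are mine: translateHtml is a made-up name, the dictionary is a Map (a plain object lookup would also hit inherited properties such as "constructor"), the URL is a placeholder, and tags are assumed to contain no ">" inside attribute values. Comments and script or style contents are not handled.

```javascript
// Assumed sample dictionary with lower-case keys.
const dic = new Map([
  ["apple", "Apfel"],
  ["strawberry", "Erdbeere"],
])

// Replace every known word in a plain-text fragment, ignoring case.
const mapText = (dic, s) =>
  s.replace(/[a-zA-Z_-]+/g, w => dic.get(w.toLowerCase()) ?? w)

// split() with a capturing group keeps the tags in the result array:
// even indices are text between tags, odd indices are the tags themselves.
const translateHtml = (dic, html) =>
  html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : mapText(dic, part)))
    .join("")

console.log(translateHtml(dic, '<a href="https://apple.example">Apple</a> pie'))
// prints '<a href="https://apple.example">Apfel</a> pie'
```

Only the text between tags goes through the dictionary, so attribute values such as the href are copied through unchanged.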
Here is my current work in progress solution of a suffix tree (or at least how I interpreted it). I'm building a dictionary with all words, which are not inside of a tag, with their position. After sorting the dict I replace them all. This works for me without handling html at all.
function suffixTree(input) {
const dict = new Map()
let start = 0
let insideTag = false
// define word borders
const borders = ' .,<"\'(){}\r\n\t'.split('')
// build dictionary
for (let end = 0; end <= input.length; end++) {
const c = input[end]
if (c === '<') insideTag = true
if (c === '>') {
insideTag = false
start = end + 1
continue
}
if (insideTag && c !== '<') continue
if (borders.indexOf(c) >= 0) {
if(start !== end) {
const word = input.substring(start, end).toLowerCase()
const entry = dict.get(word) || []
// save the word and its position in an array so when the word is
// used multiple times that we can use this list
entry.push([start, end])
dict.set(word, entry)
}
start = end + 1
}
}
// last word handling
const word = input.substring(start).toLowerCase()
const entry = dict.get(word) || []
entry.push([start, input.length])
dict.set(word, entry)
// create a list of replace operations, we would break the
// indices if we do that directly
const words = Object.keys(en2de)
const replacements = []
words.forEach(word => {
(dict.get(word) || []).forEach(match => {
// [0] is start, [1] is end, [2] is the replacement
replacements.push([match[0], match[1], en2de[word]])
})
})
// converting the input to a char array and replacing the found ranges.
// beginning from the end and replace the ranges with the replacement
let output = [...input]
replacements.sort((a, b) => b[0] - a[0])
replacements.forEach(match => {
output.splice(match[0], match[1] - match[0], match[2])
})
return output.join('')
}
Feel free to leave a comment how this can be improved.

The sum of the numbers at the beginning of the line

I have a file : site.log
19 www.mysite.org
300 cod.mysite.org
100 www.newfile.com
199 py.mysite.org
45 python.mysite.org/als
45 mysite.org/als/4/d
I would like to go through all the lines containing the string mysite.org, take the number at the beginning of each of those lines, and sum those numbers.
File.ReadLines(filePath).Where(x => x.Contains("mysite.org")).SelectMany(...);
You are close. After the Where use Select to project only the beginning of the line:
var result = File.ReadLines(filePath)
.Where(x => x.Contains("mysite.org"))
.Select(x => int.Parse(x.Split()[0]))
.Sum();
Note that parsing to int might fail if the prefix is not an integer. You can use TryParse instead. If you want, have a look here: https://stackoverflow.com/a/46150189/6400526
I split on a space and TryParse the first value. If it succeeds, I take that value; if not, I take 0.
var result = File.ReadLines(filePath)
.Where(x => x.Contains("mysite.org"))
.Select(x => {int temp; return int.TryParse(x.Split(' ')[0], out temp) ? temp : 0;})
.Sum();

Sort files based upon variable length of numeric digits in their name C#

I'm writing a program to monitor Blender rendering projects.
But I have a problem sorting the output of Blender (a 3D program).
It generates files whose names are numeric, or some text ending in a number.
The problem is that the names look like this:
someprojectname-0001.png
someprojectname-0010.png
someprojectname-10104.png
someprojectname-10105.png
someprojectname-9999.png
There could be some text before the numbers, but the problem seems to be that any sorting I have tried so far sees 9999 as higher than 10104 (somehow the problem is the extra digits, and the number of digits isn't fixed either).
It should output
someprojectname-0001.png
someprojectname-0010.png
someprojectname-9999.png
someprojectname-10104.png
someprojectname-10105.png
Essentially: concatenate all digits (ignoring any text), then order by that number. So I tried:
d = new DirectoryInfo(renderpath);
// FileInfo[] files = d.GetFiles( rendertpye).OrderBy(p => p.Name).ToArray();
FileInfo[] files = d.GetFiles(rendertpye).OrderByDescending(o => string.IsNullOrEmpty(o.Name)).ThenBy(o =>
{
int result;
if (Int32.TryParse(o.Name, out result))
{ return result; }
else
{ return Int32.MaxValue; }
}).ToArray();
I also tried the numeric-before-alpha sort ( https://www.codeproject.com/Tips/543502/Numeric-before-alpha-sort-in-LINQ )
and alphanumeric sorting ( https://www.dotnetperls.com/alphanumeric-sorting ).
The problem is that neither of them can handle the non-fixed number of digits in Blender's file numbering.
update
My goal here is not to put rules on how people save their BlendFiles, the program requires an array that retrieves all saved files, and sorts based upon the number in the filename, which can be of variable digit size.
I tried it with LINQ, but I am beginning to doubt it can be done directly with LINQ.
Maybe it can, though, if we create an int list based on the file names, order that list, and use it to order the original list of names.
Solved it. I created a struct (which is not ideal for this) that adds an int counter to each FileInfo entry, and used a regex to scrub the text from the file names. The function keeps the original names and FileInfo objects but returns them sorted by the numbers they contain.
A future extension might be to also sort on the last-modified property, which might be easier. A slight improvement would be to only look for numbers at the end of the name, since a name like project2-frame33.blend would be problematic. Some clever regex could probably do that.
struct numericfile
{
public int filenr; public FileInfo file;
}
public List<FileInfo> SortedbyNumber(List<FileInfo> files)
{
List<numericfile> filesnrs = new List<numericfile>();
foreach (FileInfo file in files)
{
numericfile nf = new numericfile();
nf.filenr = Convert.ToInt32(Regex.Replace(file.Name, "[^0-9]+", string.Empty));
nf.file = file;
filesnrs.Add(nf);
}
filesnrs.Sort((s1, s2) => s1.filenr.CompareTo(s2.filenr));
files.Clear();
foreach (numericfile fnr in filesnrs)
{
files.Add(fnr.file);
}
return files;
}

How to use a LINQ Aggregate query to print a line sequence number along with some data from a List<string>?

I have a simple List like
List<string> Test=new List<string>() {"A","B","C"};
I am using a simple Linq.Aggregate, as shown below, to get a string from the elements of this list:
string Output= Test.Aggregate((First, Next) => First + "\r\n" + Next);
This gives me a result like this (newline separated):
A
B
C
However, I want the result with a sequence number on each line, like this:
1)A
2)B
3)C
How do I do this using LINQ?
Select has an overload that will give you the index of the element to work with so you could do Select((x,i)=>String.Format("{0}){1}", i+1,x)). Or in full:
string output= Test.Select((x,i)=>String.Format("{0}){1}", i+1,x)).Aggregate((First, Next) => First + "\r\n" + Next);
One thing worth mentioning, though, is that string concatenation in a loop (and Aggregate counts as a loop) is generally considered a bad idea for performance reasons. You should consider alternatives such as a StringBuilder. Note that this version needs its own counter and leaves a trailing line break after the last entry:
int lineCount = 1; // running line number, incremented inside the lambda
string output = Test
.Aggregate (new StringBuilder(), (sb, x) => sb.AppendFormat("{0}){1}\r\n", lineCount++, x))
.ToString();
I wouldn't use Aggregate here, just a Select to get the index and join the resulting list back together to make a single string, for example:
var output = string.Join("\r\n",
Test.Select((s, index) => $"{index+1}){s}"));

Array Intersection Based Upon A String Comparison

I've got two arrays where I want to find all the elements in Array0 where the full string from Array1 is contained in the string of Array0. Here is the scenario:
I've got a string array that contains the full path of all the xml files in a certain directory. I then get a list of locations and only want to return the subset of xml file paths where the filename of the xml file is the loc id.
So, my Array0 has something like:
c:\some\directory\6011044.xml
c:\some\directory\6028393.xml
c:\some\directory\6039938.xml
c:\some\directory\6028833.xml
And my Array1 has:
6011044
6028833
...and I only want to have the results from Array0 where the filepath string contains string from Array1.
Here is what I've got...
filesToLoad = (from f in Directory.GetFiles(Server.MapPath("App_Data"), "*.xml")
where f.Contains(from l in locs select l.CdsCode.ToString())
select f).ToArray();
...but I get the following compiler error...
Argument '1': cannot convert from 'System.Collections.Generic.IEnumerable<string>' to 'string'
...which I can understand from an English standpoint, but do not know how to resolve.
Am I coming at this from the wrong angle?
Am I missing just one piece?
EDIT
Here is what I've changed it to:
filesToLoad = (Directory.GetFiles(Server.MapPath("App_Data"), "*.xml"))
.Where(path => locs.Any(l => path.Contains(l.CdsCode.ToString()))
).ToArray();
...but this still gets me all the .xml files even though one of them is not in my locs entity collection. What did I put in the wrong place?
Obviously I'm missing the main concept so perhaps a little explanation as to what each piece is doing would be helpful too?
Edit 2
See Mark's comment below. The answer to my problem was me. I had one record in my locs collection with a zero as its CDS value, which therefore matched all records in my xml collection. If only I could find a way to code without myself, then I'd be the perfect developer!
You're missing Any:
string[] result = paths.Where(x => tests.Any(y => x.Contains(y))).ToArray();
You can also join them (note that a file matching more than one loc will then appear more than once):
var filesToLoad = (from f in Directory.GetFiles(Server.MapPath("App_Data"), "*.xml")
from l in locs
where f.Contains(l.CdsCode.ToString())
select f).ToArray();

Resources