Alternating information between two textfiles into a third textfile via stringlists - text-files

I have 2 files with information that looks like this
File 1
Show1
Show2
Show3
Show4
Show5
File2
Advert1
Advert2
I want to merge these two files into a 3rd textfile so that the output looks like this
Show1
Advert1
Show2
Advert2
Show3
Advert1
Show4
Advert2
Show5
The amount of information can vary, but file two will never have more lines than file 1
I have tried putting the information of the two text files into two different string lists and then reading them alternatively into a third string list via the indexes, I have tried hopping the information across different textiles and string lists and even arrays, but no matter what I try I always end up with an error, most often a list index error...
My latest attempt at this looks like this
For k := 0 to StringlistAdvert.Count - 1 do
Begin
StringListFinal.Add(StringlistShow[k]);
StringListFinal.Add(StringListAdvert[k]);
StringlistAdvert.Delete(k)
End;
I realise that this was a stupid idea as I'm deleting the items in the string list which I later would have to use in the loop again. My other attempts were not much better
This is a relatively simple program compared to others I have coded before yet the solution to this keeps eluding me, it is for a school project so any help would be greatly appreciated

Related

To build a flow using Power Automate to download linked csv report in gmail

I'm trying to create a flow using Power Automate (which I'm quite new to) that can get the link/URL in an email I receive daily, then download the .csv file that normally a click to the link would do, and then save the file to a given local folder.
An example of the email I get:
Screenshot of the email I get daily
I searched in Power Automate Community and found this insightful LINK post & answer almost solved it. However, after following the steps and built the flow, it kept failing at the Compose step.
Screenshot of the Flow & Error Message
The flow
Error message
Expression used:
substring(body('Html_to_text'),add(indexOf(body('Html_to_text'),'here'),5),sub(indexOf(body('Html_to_text'),'Name'),5))
Seems the expression couldn't really get the URL/Link? I'm not sure and searched but couldn't find any more posts that can help.
Please kindly share all insights on approaches or workarounds that you think may help me solve the problem and truly thanks!
PPPPPPPPisces
We need to breakdown the bits of the function here which needs 3 bits of info
substring(1 text to search, 2 starting position of the text you want, 3 length of text)
For example, if you were trying to return an unknown number from the text dog 4567 bird
Our function would have 3 parts.
body('Html_to_text'), this bit gets the text we are searching for
add(indexOf(body('Html_to_text'),'dog'),4), this bit finds the position in the text 4 characters after the start of the word dog (3 letters for dog + the space)
sub(sub(indexOf(body('Html_to_text'),'bird'),2)),add(indexOf(body('Html_to_text'),'dog'),4)), I've changed the structure of your code here because this part needs to return the length of the URL, not the ending position. So here, we take the position of the end of the URL (position of the word bird minus two spaces) and subtract it from the position of the start of the URL (position of the word dog + 4 spaces) to get the length.
In your HTML to text output, you need to check what the HTML looks like, and search for a word before the URL starts, and a word after the URL starts, and count the exact amount of spaces to reach the URL. You can then put those words and counts into your code.
More generally, when you have a complicated problem that you need to troubleshoot, you can break it down into steps. For example. Rather than putting that big mess of code into a single block, you can make each chunk of the code in its own compose, and then one final compose to bring them all together - that way when you run it you can see what information each bit is giving out, or where it is failing, and experiment from there to discover what is wrong.

Use cut and grep to separate data while printing multiple fields

I want to be able to separate data by weeks, and the week is stated in a specific field on every line and would like to know how to use grep, cut, or anything else that's relevant JUST on that field the week is specified in while still being able to save the rest of the data that's being given to me. I need to be able to pipe the information into it via | because that's how the rest of my program needs it to be.
as the output gets processed, it should look something like this
asset.14548.extension 0
asset.40795.extension 0
asset.98745.extension 1
I want to be able to sort those names by their week number while still being able to keep the asset name in my output because the number of times that asset shows up is counted up, but my problem is I can't make my program smart enough to take just the "1" from the week number but smart enough to ignore the "1" located in the asset name.
UPDATE
The closest answer I found was
grep "^.........................$week" ;
That's good, but it relies on every string being the same length. Is there a way I can have it start from the right instead of the left? Because if so then that'd answer my question.
^ tells grep to start checking from the left and . tells grep to ignore whatever's in that space
I found what I was looking for in some documentation. Anchor matches!
grep "$week$" file
would output this if $week was 0
asset.14548.extension 0
asset.40795.extension 0
I couldn't find my exact question or a closely similar question with a simple answer, so hopefully it helps the next person scratching their head on this.

How to fetch all records using NCBI Batch Entrez

I have over 200,000 accessions in a flat file, which need to retrieve relevant entry from NBCI.
I use Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez) to do the job. But encountered several problems:
The initial file was splitted into multiple sub-files, each containing 4000 lines. But it seems Batch Entrez has some size limitation on the returned file. For example: if the first 1000 accessions all have tens of thousands lines which reach the size limitation, then the rest 3000 accessions will be rejected and won't be searched.
One possible solution in my head is to split the file into more sub-files and search individually. However this requires too much manual effort.
So I am just wondering if there is any other solution, or any code could be used.
Thanks in advance
Your problem sounds a good fit for a Bio-star toolkit. This is a solution using BioSmalltalk
| giList gbReader |
giList := (BioObject openFullFileNamed: 'd:\Batch_entrez_1.txt') contents lines.
gbReader := BioNCBIGenBankReader new.
gbReader
genBankRecordsFrom: 'nuccore'
format: #setModeXML
uids: giList.
(BioGBSeqCollection newFromXMLCollection: gbReader searchResults)
collect: [: e | BioParser
tokenizeNcbiXmlBlast: e contents
nodes: #('GBAuthor' 'GBSeq_definition') ]
To execute/debug the script, just select it and a right-click will open the Smalltalk world-menu.
The API automatically split and fetch your accession list (in the script contained in Batch_entrez_1.txt) maintaining the NCBI Entrez post limits to avoid penalities.
The result format is XML (which is an "easy" format to parse or filter specific fields) although it could be any of the retrieval modes supported by Entrez, for example setting #setModeText will answer an ASN.1 representation. Replace 'nuccore' for the database you want to query. Finally choose the interesting fields, in the script I have choosed 'GBAuthor' and 'GBSeq_definition', but you are free to choose anyone of the available nodes.

two file comparison in hdfs

I want to write a map reduce to compare two large file in hdfs. any thoughts how to achieve that. Or if there is nay other way to do the comparison because file size is very large, so thought map-reduce would be an ideal approach.
Thanks for your help.
You may do this in 2 steps.
First make the line number to be the part of text files:
Say initial file looks like:
I am awesome
He is my best friend
Now, convert this to something like this:
1,I am awesome
2,He is my best friend
This may well be done by a MapReduce job itself or some other tool.
2. Now write a MapReduce step where in mapper emit the line number as the key and rest of the actual sentence as value. Then in reducer just compare the values. As and when it doesn't match emit out the line number (the key) and the payloads, whatever you may want here. Also if the count of the values is just 1 then also it is a mismatch.
EDIT: Better approach
Better still what you can do is, just emit the complete line read at a time in the mapper as the key and make the value a number, say 1. So taking my above example your mapper output would be as follows:
< I am awesome,1 >
< He is my best friend,1 >
And in reducer just check the count of values, if it isn't 2, you have a mismatch.
But there is one catch in this approach, if there is a possibility of exactly same line occurring at two different places then instead of checking for the length of values for a given key in reducer, you should be checking it to be a multiple of 2.
One possible solution could be, put line number as count in map job.
There are two files like below :
File 1:
I am here --Line 1
I am awesome -- Line 2
You are my best friend -- Line 3
File 2 also similar kind
Now your map job output should be like , < I am awesome, 2>...
Once you done with Map job for both the file, you have two record(key,value) which has same value to reduce.
At the time of reduce, you can either compare the counter or generate the output as , and so on. If the line is exist in the different location too than out put could be which indicates that this line is mismatch.
I have a solution for comparing files with keys. In your case if you know that your ID's are unique, you could emit the ID's as keys in the map, the entire record as value. Lets say your file has ID,Line1 then emit as key and as value from mapper.
In the shuffle and sort phase, the ID's will be sorted and you will get an iterator with data from both the files. ie, the records from both files with same ID will end up in same iterator.
Then in the reducer, compare both the values from the iterator and if they match move on with next record. Else, if they do not match write them to an output.
I have done this and it worked like a charm.
Scenario - No matching key
If there is no matching ID between two files, they will have only one iterator value.
Scenario 2 - Duplicate keys
If the files have duplicate keys, the iterator will have more than 2 values.
Note: You should compare the values only when the iterator has only 2 values.
**Tip:**The iterator will not have values in order always. To identify the value from a particular file, in the mapper add a small indicator at the end of the line like Line1;file1
Line1;file2
Then on the reducer you will be able to identify which value belongs to which mapper.

Fastest way to skip lines while parsing files in Ruby?

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000 line file. (obviously this kind of question is geared toward much large files, I'm just using those smaller numbers for the sake of example), since I know it won't be in the first half, is there a quick way of disregarding that information?
Currently I'm using something along the lines of:
while buffer = file_in.gets and file_in.lineno <600
next unless file_in.lineno > 500
if buffer.chomp!.include? some_string
do_func_whatever
end
end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
file.lines.drop(500).take(100) # will get you lines 501-600
Generally, you can't avoid reading file from the start until the line you are interested in, as each line can be of different length. The one thing you can avoid, though, is loading whole file into a big array. Just read line by line, counting, and discard them until you reach what you look for. Pretty much like your own example. You can just make it more Rubyish.
PS. the Tin Man's comment made me do some experimenting. While I didn't find any reason why would drop load whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
file.lines.select.with_index{|l,i| (501..600) === i}
PS2: Doh, above code, while not making a huge array, iterates through the whole file, even the lines below 600. :( Here's a third version:
enum = file.lines
500.times{enum.next} # skip 500
enum.take(100) # take the next 100
or, if you prefer FP:
file.lines.tap{|enum| 500.times{enum.next}}.take(100)
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.
See IO#seek, or see IO#open for information on the offset argument.
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO#readlines, that returns an array with all the lines
IO.readlines(file_in)[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end
or
f = File.new(file_in)
f.readlines[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end

Resources