Merge two text files

Merge two text files - windows

I'm using Windows and Notepad++ to split my data into text files. I have 2 files that I need to merge side by side (line by line) for my data analysis.
Here is the example:
file1.txt
Abcdefghijk
abcdefghijk
file2.txt
123456
123456
then the output I want is like this:
Abcdefghijk123456
abcdefghijk123456
in a new output file. Does anybody here know how to do this?

Your question was answered here by TheMadTechnician. Using PowerShell, read both source files (1 and 2) as arrays of lines. Then run a simple loop: join line x of file1 with line x of file2 for as long as file1 has lines left.
Unfortunately, this is much harder to do with pure cmd.

@riki: you could also write a batch program to do this programmatically. There should be no limit on the number of lines.

It may depend on the number of lines in each file. If there are fewer than 50 lines, I suggest simply copying and pasting them by hand.
Otherwise, use a general-purpose language such as Python, C or PHP, and run the script before performing the data analysis.
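For what it's worth, here is a minimal Python sketch of the line-by-line merge described in the answers above. The function names and the file names in the usage comment are only placeholders. Unlike the "as long as file1 has lines" loop, it keeps going until both files are exhausted and treats a missing line as empty, which is an assumption about what you want when the files differ in length.

```python
from itertools import zip_longest


def merge_lines(lines1, lines2):
    """Yield line x of the first input joined with line x of the second."""
    # zip_longest pads the shorter input with "" so no lines are dropped.
    for left, right in zip_longest(lines1, lines2, fillvalue=""):
        # Strip only the line terminators, then add a single newline back.
        yield left.rstrip("\r\n") + right.rstrip("\r\n") + "\n"


def merge_files(path1, path2, out_path):
    """Merge two text files side by side into out_path."""
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        out.writelines(merge_lines(f1, f2))


# Usage (hypothetical file names):
# merge_files("file1.txt", "file2.txt", "merged.txt")
```

Both files are read one line at a time, so the number of lines should not matter.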

There is a free utility you can download and run on your computer, called txtcollector. I read about it here. I used it because I had a whole folder of files to concatenate. It was a breeze. The only slight imperfection I noticed was that I couldn't paste in the path to the specific folder in the first step (choosing the folder where the files to be concatenated were). However, I could do this when choosing where to save the result.

Related

Archiving differences between time sequence of text files

There is a sensor network from which I download measurements every ten minutes or on demand. Each download is a text file consisting of several lines with a timestamp and values. The name of the text file also contains a timestamp of when the download occurred. So as time progresses I collect a lot of text files, which form a sequence. Because of the physical parameters that the values are taken from, there are little to no differences between adjacent text files.
I want to archive all of the downloaded text files into a (compressed) file in an efficient way, and I thought that archiving only the differences between adjacent text files would be one such way.
I want some ideas to work it out in Bash, using well-known tools like tar and diff. I also know about git, but it is not useful for creating an archive file.
I will try to clarify a bit. A text file consists of several lines in the following space-separated format:
timestamp sensor_uuid value_1 ... value_N
Not every line has exactly the same number of values (say N), but the number of tokens per line varies little. The values themselves also vary little over time. Since they come from sensors, with a single sensor per line, the number of lines in a text file depends on how many responses I got for each call. Zero lines is possible.
Finally, the text filename carries its own timestamp: it is the original name concatenated with a date-time string:
sensors_2019-12-11_153043.txt for today’s 15:30:43 request.
Needless to say, the timestamps in the lines of this example file are usually earlier than the one in the filename, and some lines and timestamps may even be repeated from text files created before.
So my idea for efficient archiving is to put the first text file into the archive and then only the updates, i.e. the differences between each pair of adjacent text files, which eventually trace back to the first text file actually archived. But on retrieval I need to get a complete text file, as if the file itself had been archived rather than its difference from the past.
Tar stores each text file whole, and the few differing lines between text files do not form a repeating pattern suitable for strong compression.

tar by itself only bundles files. It is the compressor applied to the archive (for example gzip, via tar -z) that identifies repeating patterns and compresses them. But if you want to eliminate the repeated parts yourself, you can use the diff command with some simple manipulation of its output and redirect the result to a file.
Let's say we have 2 files, file1.txt and file2.txt. You can use this command line to get only the lines added in the second file (file2.txt):
diff -u file1.txt file2.txt | tail -n +3 | sed -n 's/^+//p'
Then redirect the output to another file (not to file2.txt itself, because the shell would truncate it before diff reads it), and put that file into the archive in place of file2.txt. Note that this records only added lines, so if lines were removed relative to file1.txt, file2.txt cannot be rebuilt exactly from it.
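Since the question needs the complete file back on retrieval, here is a rough Python sketch of a delta that can be reversed. It is not a tar/diff solution, only an illustration of the idea using the standard difflib module. The names make_delta and apply_delta and the tuple layout are invented for this example. Each file is described as "copy these line ranges from the previous file, insert these new lines".

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus literal insertions."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged block: store only its position in the old file.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed lines must be stored literally.
            delta.append(("insert", new_lines[j1:j2]))
        # "delete" needs no entry: deleted lines are simply never copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer file from the older file and the delta."""
    rebuilt = []
    for op in delta:
        if op[0] == "copy":
            rebuilt.extend(old_lines[op[1]:op[2]])
        else:
            rebuilt.extend(op[1])
    return rebuilt
```

Only the first file and the chain of deltas would then need to be stored. Retrieving file k means applying deltas 1..k in order, so a long chain makes retrieval slower.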

How to extract specific lines from a huge data file?

I have a very large data file, about 32GB. The file is made up of about 130k lines, each of which mainly contains numbers but also a few other characters.
The task I need to perform is very clear: I have to extract 20 lines and write them to a new text file.
I know the exact line number for each of the 20 lines that I want to copy.
So the question is: how can I extract the content at a specific line number from the large file? I am on Windows. Is there a tool that can do this sort of operation, or do I need to write some code?
If there is no direct way of doing that, I was thinking that a possible approach is to first extract small blocks of the original file (so that each block contains one or more lines to extract) and then use a standard editor to find the lines within each block. In this case, the question would be: how can I split a large file into blocks by line on Windows? I use a tool named HJ-Split which works very well with large files, but it can only split by size, not by line.

Install [1] Babun Shell (or Cygwin, but I recommend Babun), and then use the sed command as described here: How can I extract a predetermined range of lines from a text file on Unix?
[1] Installing Babun actually just means unzipping it somewhere, so you don't need Administrator rights on the server.
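If writing a little code is acceptable, a short Python sketch along these lines would also work on Windows without installing a Unix shell. The function name and the file names in the usage comment are placeholders. It streams the file one line at a time, so it never holds the whole 32GB in memory, and it stops reading once the last wanted line has been seen.

```python
def extract_lines(lines, wanted):
    """Yield (line_number, line) for the 1-based line numbers in wanted."""
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    for number, line in enumerate(lines, start=1):
        if number in wanted:
            yield number, line
        if number >= last:
            break  # nothing left to find, skip the rest of the file


# Usage (hypothetical file names and line numbers):
# with open("huge.dat") as src, open("picked.txt", "w") as dst:
#     for _, line in extract_lines(src, [12, 4050, 99000]):
#         dst.write(line)
```

The lines come out in file order, not in the order the numbers were given.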

Is there a way to show the exact place where two files differ using the FC command? Or any other good way to compare 2 files in Windows?

I'm comparing two files with the command:
FC /W file1.txt file2.txt > log.txt
And I'm getting 4000 lines of differences...
But when it shows a difference, it just pastes 3~5 lines from each file and I have to compare them myself to find the place where the difference occurred.
Is there a way to make it easier to see where the difference is? Maybe something like adding a flag like "HERE" in one of them where it first differs from the other?

I found this online diff very useful and simple to use. It also displays the differences very clearly.
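As another option, a few lines of Python can point at the exact column where two lines first differ, which is roughly the "HERE" flag asked for. This is only a sketch with made-up function names. It compares line N of one file with line N of the other, so unlike FC it does not resynchronize after an inserted or deleted line, and it does not ignore whitespace the way FC /W does.

```python
from itertools import zip_longest


def first_difference(a, b):
    """Return the index of the first differing character, or None if equal."""
    for col, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return col
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a prefix of the other
    return None


def report_differences(lines1, lines2):
    """Yield (line_number, column) for every pair of lines that differ."""
    pairs = zip_longest(lines1, lines2, fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        col = first_difference(a.rstrip("\r\n"), b.rstrip("\r\n"))
        if col is not None:
            yield number, col


# Usage (hypothetical file names):
# with open("file1.txt") as f1, open("file2.txt") as f2:
#     for number, col in report_differences(f1, f2):
#         print("line %d differs at column %d" % (number, col + 1))
```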

Diff for 3 binary files

I have 3 binary files. Let's call them file1.bin, file2.bin and file3.bin.
file1.bin and file2.bin have some common parts.
file2.bin and file3.bin have some common parts.
I want to find the common parts between file1.bin and file2.bin that are different between file2.bin and file3.bin.
How would you recommend accomplishing that? I have already dumped the binary files to text files using xxd and then did a 3-way diff using vim -d file1.txt file2.txt file3.txt.
However, vim marks a part as changed in all the files even if it has only changed in one file and remains the same in the other two files. I want that special kind of occurrence to be marked differently.

Perhaps you can use the built-in Unix diff (I think it is part of OS X), but with the --unchanged-group-format option to list the similarities. Do that for file1 and file2. Then do it for file2 and file3. You can then do a regular diff on the two resulting files.
For an idea of how to get the similarities, have a look at this post.
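The same two-step idea can be sketched in Python directly on the bytes, without a hex dump. This is an illustration only: the function names are invented, and difflib.SequenceMatcher can be very slow on large inputs, so it is realistic for small files at best. It finds the byte ranges of file2 that also appear in file1, then removes from them the ranges of file2 that also appear in file3.

```python
from difflib import SequenceMatcher


def common_ranges(other, base):
    """Return (start, end) byte ranges of base that also occur in other."""
    matcher = SequenceMatcher(None, other, base, autojunk=False)
    return [(m.b, m.b + m.size) for m in matcher.get_matching_blocks() if m.size]


def subtract_ranges(ranges, holes):
    """Remove every part of ranges that is covered by holes (both sorted)."""
    result = []
    for start, end in ranges:
        pos = start
        for h_start, h_end in holes:
            if h_end <= pos or h_start >= end:
                continue  # this hole does not touch the remaining range
            if h_start > pos:
                result.append((pos, h_start))
            pos = max(pos, h_end)
        if pos < end:
            result.append((pos, end))
    return result


def shared_with_1_but_not_3(data1, data2, data3):
    """Ranges of data2 that match data1 but do not match data3."""
    return subtract_ranges(common_ranges(data1, data2),
                           common_ranges(data3, data2))
```

The offsets returned refer to positions in file2.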

The tool that I work on (ECMerge) does that. You just have to diff the 3 binary files: it presents equal portions side by side, with modified bytes placed appropriately in between. There is no need to get a hex dump first. You can script it in JavaScript to output whatever you like based on the diff results and the bytes in the files (it also works from the command line).

Chromium used bsdiff and then switched to Courgette for binary diffs, as explained in their blog here. You might find useful leads in their blog.

Find completely commented files

In a solution with lots of files and projects - how would you find all completely commented files? I assume that every line of code starts with // (EDIT: or is empty) in such files.
I am using VS 2008, C#, ReSharper is available.
I know, normally such files should not exist - that's what a source safe is for ...

To find all files in and under the current directory in which all lines begin with '//':
find . -type f -exec sh -c 'grep -vq "^//" "$1" || echo "$1"' sh {} \;
Note that this will report empty files.
The argument to grep can easily be expanded to account for whitespace, or generalized to match an arbitrary regex.

There is no way to achieve this with a simple search using the tools you've mentioned. It would require some interpretation of each file, but it could be done with a fairly simple script.
It sounds like you're looking for files without code, though, rather than files in which every line is a comment. For example, a file with 1000 lines where 900 are commented and 100 are blank seems to meet your criteria.
The script should be fairly straightforward to write, but you would need to look out for the following weird cases:
Block comments
if blocks which are always false. For example #if 0
Empty lines

Well, you could write a program (probably a console app) to recursively walk the directory and file tree. Read in all .cs files and check each line to see if its first non-space and non-tab characters are "//". If you wanted to get really fancy, you could count the total lines and the lines with "//" and display the percentages so you could catch files that didn't have absolutely every line commented out. You'll just need to understand a little bit about System.IO to get the files and string functions to look for the characters you are looking for. That should cover it.
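The answer above has a C# console app in mind. Purely to show the shape of the logic, here is the same walk sketched in Python. The function names and the threshold parameter are assumptions for this example. Blank lines are ignored, and block comments and #if 0 regions are not handled.

```python
import os


def comment_ratio(lines):
    """Return (commented, total), counting only non-blank lines."""
    commented = total = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not count either way
        total += 1
        if stripped.startswith("//"):
            commented += 1
    return commented, total


def find_commented_files(root, threshold=1.0):
    """Yield (path, commented, total) for .cs files at or above threshold."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".cs"):
                continue
            path = os.path.join(dirpath, name)
            # utf-8-sig drops a leading byte order mark if the file has one.
            with open(path, encoding="utf-8-sig", errors="replace") as src:
                commented, total = comment_ratio(src)
            if total and commented / total >= threshold:
                yield path, commented, total


# Usage (hypothetical solution folder):
# for path, commented, total in find_commented_files(r"C:\src\MySolution"):
#     print(path, commented, total)
```

Lowering threshold (for example to 0.9) catches files where almost, but not quite, every line is commented out.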

This should be close to what you're looking for: http://www.codeproject.com/KB/cs/csharplinecounter.aspx
Look for the method in the project that determines if a line is commented or not, and you can use that to build a count, etc.
