I need to concatenate some relatively large text files, and would prefer to do this via the command line. Unfortunately I only have Windows, and cannot install new software.
type file1.txt file2.txt > out.txt
allows me to almost get what I want, but I don't want the 1st line of file2.txt to be included in out.txt.
I have noticed that more has the +n option to specify a starting line, but I haven't managed to combine these to get the result I want. I'm aware that this may not be possible in Windows, and I can always edit out.txt by hand to get rid of the line, but is there a simple way of doing it from the command line?
more +2 file2.txt > temp
type temp file1.txt > out.txt
or you can use copy. See copy /? for more.
copy /b temp+file1.txt out.txt
I use this, and it works well for me:
TYPE \\Server\Share\Folder\*.csv >> C:\Folder\ConcatenatedFile.csv
Of course, before every run you have to DELETE C:\Folder\ConcatenatedFile.csv.
The only issue is that if all the files have headers, the header will be repeated for every file in the output.
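A hedged batch-file sketch of how to avoid the repeated headers, assuming every .csv shares the same single header line (paths as above; more +1 skips just that header line, as discussed elsewhere in this thread):
@echo off
rem assumes all files share one identical header line
del C:\Folder\ConcatenatedFile.csv 2>nul
for %%f in (\\Server\Share\Folder\*.csv) do (
    if not exist C:\Folder\ConcatenatedFile.csv (
        rem the first file is copied whole, so its header is kept
        type "%%f" > C:\Folder\ConcatenatedFile.csv
    ) else (
        rem every later file is appended with its header skipped
        more +1 "%%f" >> C:\Folder\ConcatenatedFile.csv
    )
)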
I don't have enough reputation points to comment on the recommendation to use *.csv >> ConcatenatedFile.csv, but I can add a warning:
If you create the ConcatenatedFile.csv file in the same directory that you are concatenating from, it will be appended to itself.
Use the FOR command to echo a file line by line, with the 'skip' option to miss a number of starting lines (delims= keeps each whole line instead of just its first word)...
FOR /F "skip=1 delims=" %i in (file2.txt) do @echo %i
You could redirect the output of a batch file, containing something like...
FOR /F "delims=" %%i in (file1.txt) do @echo %%i
FOR /F "skip=1 delims=" %%i in (file2.txt) do @echo %%i
Note the double % when a FOR variable is used within a batch file.
I would put this in a comment to ghostdog74, except my rep is too low, so here goes.
more +2 file2.txt > temp
This command actually skips rows 1 and 2 of the file. The OP wants to keep all rows from the first file (to preserve the header row) and exclude only the first row (presumably the same header row) from the second file, so to skip just the header the OP should use more +1.
type temp file1.txt > out.txt
It is unclear what order results from this code: is temp appended to file1.txt (as desired), or is file1.txt appended to temp (undesired, as the header row would end up buried in the middle of the resulting file)?
In addition, these operations take a REALLY LONG TIME with large files (e.g. 300 MB).
Here's how to do this:
(type file1.txt && more +1 file2.txt) > out.txt
In PowerShell:
Get-Content file1.txt | Out-File out.txt
Get-Content file2.txt | Select-Object -Skip 1 | Out-File -Append out.txt
I know you said that you couldn't install any software, but I'm not sure how tight that restriction is. Anyway, I had the same issue (trying to concatenate two files with presumably the same headers) and I thought I'd provide an alternative answer for others who arrive at this page, since it worked just great for me.
After trying a whole bunch of commands in Windows and being severely frustrated, and then trying all sorts of graphical editors that promised to open large files but couldn't, I finally got back to my Linux roots and opened my Cygwin prompt. Two commands:
cp file1.csv out.csv
tail -n+2 file2.csv >> out.csv
For an 800 MB file1.csv and a 400 MB file2.csv, those two commands took under 5 seconds on my machine, in a Cygwin prompt no less. I thought Linux commands were supposed to be slow in Cygwin, but this approach took far less effort and was way easier than any Windows approach I could find.
You can also simply try this
type file2.txt >> file1.txt
It will append the content of file2.txt at the end of file1.txt
If you need the original file1.txt, take a backup beforehand. Or you can do this:
type file1.txt > out.txt
type file2.txt >> out.txt
If you want a line break between the two files, append an empty line before appending the second file:
type file1.txt > out.txt
echo.>> out.txt
type file2.txt >> out.txt
The help for copy explains that wildcards can be used to concatenate multiple files into one.
For example, to copy all .txt files in the current folder that start with "abc" into a single file named xyz.txt:
copy abc*.txt xyz.txt
type file1.txt > out.txt && more +1 file2.txt >> out.txt
This takes Test.txt (with its header), appends Test1.txt and Test2.txt with their headers stripped, and writes the result to Testresult.txt:
type C:\Test.txt > C:\Testresult.txt && more +1 C:\Test1.txt >> C:\Testresult.txt && more +1 C:\Test2.txt >> C:\Testresult.txt
This should be fairly easy, and I understand the logic of it, but my shell scripting is rather beginner-level.
Basically, I have a directory with a hundred files or so, and I want to copy their filenames to a .txt file, one line per filename. I know I'd want a loop over all the files in the directory that copies each name to the text file until there are no more files, but I'm not sure how to write that out in a .sh file.
(Also, just out of pure curiosity, how would I omit the file extensions? In this case, they're all the same extension but potentially in the future they may not be, and while I need the extensions right now I may not in the future. I'm assuming there might be a flag for this or would I use '.' as a delimiter to stop copying at that point?)
Thanks in advance!
It could be very easy with ls:
ls -1 [directory] > filename.txt
Note the -1 flag: it tells ls to output one filename per line regardless of where the output goes. Usually ls acts like ls -C if stdout is a tty and like ls -1 otherwise; explicitly specifying the flag forces one-per-line output.
If you want to do it manually, this is an example:
#!/bin/sh
cd [directory]
for i in *
do
echo "$i"
done > filename.txt
To omit extensions, you can use string replacement:
echo "${i%.*}"
For the first part, you can do
ls <dirname> > files.txt
I alias ls to ls -F, so to avoid any extraneous characters in the output, you would do
printf "%s\n" * > ../filename.txt
I put the output txt file in a different directory so the list of files does not include "filename.txt"
If you want to omit file extensions:
printf "%s\n" * | sed 's/\.[^.]*$//' > ../filename.txt
I have a folder filled with ~300 files, named in the form username#mail.com.pdf. I need about 40 of them, and I have a list of the usernames I need (saved in a file called names.txt, one username per line). I would like to copy the files I need into a new folder that contains only those files.
The file names.txt contains just the username on each line (e.g. eternalmothra); the PDF file I want to copy over is then named eternalmothra#mail.com.pdf.
while read p; do
ls | grep $p > file_names.txt
done <names.txt
This seems like it should read from the list and, for each username, find the matching username#mail.com.pdf file. Unfortunately, it seems like only the last match is saved to file_names.txt.
The second part of this is to copy all the files over:
while read p; do
mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script that it won't copy everything over?
2) If I get that to work, will the second script correctly copy them over? (Actually, I think it's preferable if they just get copied, not moved over).
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from comment: Your problem is just that echo a > b overwrites the file, while echo a >> b appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task runs every day, but for a one-shot of 300 files your script is good.
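Putting that fix together with the second half of the question, a sketch that mirrors the two scripts from the question, using cp instead of mv since the OP prefers copying:
mkdir -p foldername
: > file_names.txt   # start with an empty list on each run
while read p; do
    ls | grep "$p" >> file_names.txt
done < names.txt

while read p; do
    cp -- "$p" foldername/
done < file_names.txt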
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp, obviously; both support the -t option so as to make it convenient to run them under xargs.) The function of xargs is to convert standard input into arguments to the program you run as the argument to xargs.
In order to simplify my work I usually do this:
for FILE in ./*.txt;
do ID=`echo ${FILE} | sed 's/^.*\///'`;
bin/Tool ${FILE} > ${ID}_output.txt;
done
Hence the process loops over all the *.txt files.
Now I have two file groups: my Tool uses two inputs (-a & -b). Is there any command to run Tool for every FILE_A over every FILE_B and name the output file as a combination of both of them?
I imagine it should look like something like this:
for FILE_A in ./filesA/*.txt; do
  for FILE_B in ./filesB/*.txt; do
    bin/Tool -a ${FILE_A} -b ${FILE_B} > output.txt;
  done
done
So the process would run every *.txt in filesA against every *.txt in filesB.
And there is also the naming issue, which I don't even know where to fit in...
Hope it is clear what I am asking. I've never had to do such a task before, and a command-line solution would be really helpful.
Looking forward!
NEWNAME="${FILE_A##*/}_${FILE_B##*/}_output.txt"
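For completeness, a sketch of how that line fits into the nested loop from the question (bin/Tool and the two directories are taken from the question; adjust as needed):
#!/bin/sh
for FILE_A in ./filesA/*.txt; do
    for FILE_B in ./filesB/*.txt; do
        # build the output name from the two base names (leading paths stripped)
        NEWNAME="${FILE_A##*/}_${FILE_B##*/}_output.txt"
        bin/Tool -a "${FILE_A}" -b "${FILE_B}" > "${NEWNAME}"
    done
done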
Using a for loop, I can merge all of the files in a directory that end with *.txt:
for filename in *.txt; do
cat "${filename}"
echo
done > output.txt
After doing this, I will run output.txt through various scripts, in which the text will be changed considerably. After that, I want to split the files, at the same places at which they were merged, into different files (output01.txt, output02.txt, etc.).
How can I split the files at the same place they were merged?
This cannot be based on line number, because the scripts will add \t in places.
I think a solution that might work is to place "#########" at the end of each of the initial *.txt files before merging them, but I don't know how to get BASH to split the files again at that mark.
Instead of that for loop for concatenating, you can just use cat *.txt.
Anyway, why don't you just perform the scripts on each file independently within the for loop?
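For example, something along these lines, where process.sh is a hypothetical stand-in for whatever scripts you run on the text:
for filename in *.txt; do
    # process.sh is a placeholder for your own scripts;
    # process each file on its own and keep a matching output file
    ./process.sh "${filename}" > "processed_${filename}"
done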
If you really want to combine and re-segregate, you can use:
for filename in *.txt; do
cat "${filename}"
echo "#####"
done > output.txt
# Pass output.txt through whatever
awk 'BEGIN { fileno = 1; file = sprintf("output%02d.txt", fileno) };
{ if($1 ~ /#####/) { fileno++;
file = sprintf("output%02d.txt", fileno);
next }
else print >file
}' output.txt
The canonical answer would be:
tar c *.txt > output.txt
You could split/unmerge them exactly by doing
tar xf output.txt # in the current directory
tar x -C /tmp/splitfiles/ -f output.txt
Now if you really want to do stuff like that in a loop and extract to stdout/a pipe, you could:
while read fname
do
    # extract the named entry to stdout and pipe it into the program
    tar -xOf output.txt "$fname" | myprogram "$fname"
done < <(tar tf output.txt)
However, that would possibly not be very efficient. You could consider just doing
while read fname
do
    # handle each file as soon as it has been extracted
    myprogram "/tmp/splitfiles/$fname"
    unlink "/tmp/splitfiles/$fname" # drop the temp file
done < <(tar x -v -C /tmp/splitfiles/ -f output.txt)
This will be completely asynchronous (so if extraction or even the transmission of the archive is slow, the first files can already be processed while waiting for more data to arrive).
See also my other answer https://stackoverflow.com/a/8341221/85371 (look for the older answer part, since that question was changed to be very specific later)
As Fredrik wrote here you can use csplit to split your merged file.
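A minimal csplit sketch, assuming GNU csplit and the "#####" marker lines suggested in the other answer (csplit splits before each matching line, so the marker ends up at the top of each following piece and may need to be stripped afterwards):
# split output.txt at every marker line; -z drops empty pieces,
# -f and -b control the output names (output00.txt, output01.txt, ...)
csplit -z -f output -b '%02d.txt' output.txt '/^#####$/' '{*}'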
What is the easiest way to add text to the beginning of another text file from the command line (Windows)?
echo "my line" > newFile.txt
type myOriginalFile.txt >> newFile.txt
type newFile.txt > myOriginalFile.txt
Untested.
Double >> means 'append'
Another variation on the theme.
(echo New Line 1) >file.txt.new
type file.txt >>file.txt.new
move /y file.txt.new file.txt
Advantages over other posted answers:
minimal number of steps
no temp file left over
the parentheses prevent an unwanted trailing space in the first line
the move command "instantaneously" replaces the old version with the new
the original file remains unchanged until the last instant when it is replaced
the final content is only written once - potentially important if the file is huge.
The following sequence will do what you want, adding the line "new first line" to the file.txt file.
ren file.txt temp.txt
echo.new first line>file.txt
type temp.txt >>file.txt
del temp.txt
Note the structure of the echo. "echo." allows you to put spaces at the beginning of the line if necessary and abutting the ">" redirection character ensures there's no trailing spaces (unless you want them, of course).
The following will also work:
echo "my line" > newFile.txt
type newfile.txt myOriginalFile.txt > myOriginalFile.txt
In the first line you are writing my line into newfile.txt. In the second line you are replacing the text from myOriginalFile.txt by overwriting it with the text from newfile.txt and myOriginalFile.txt, creating a new myOriginalFile.txt that contains both.
If you want to process bigger files, the accepted solution becomes pretty slow.
Then it's faster to use copy with a '+'
echo "my line" > newFile.txt
copy newFile.txt+myOriginalFile.txt combinedFile.txt
move /Y combinedFile.txt myOriginalFile.txt
del newFile.txt
If the first part of each line can be sorted, such as a date/time stamp, then use the SORT /R command to put the most recent entries at the top of the file.
The following will place a date/time stamp in the form "YYYY-MM-DD HH:MM:SS AM/PM" at the start of each line:
echo %DATE:~-4%-%DATE:~7,2%-%DATE:~4,2% %TIME% "my text line" >> myOriginalFile.txt
sort /R myOriginalFile.txt /O myOriginalFile.txt
However, as files grow very large this method (as well as the ones above) becomes slow. Consider sorting only when required instead of with each entry, or use another scripting/programming language.
Via PowerShell:
@("NEW Line 1","NEW Line 2") + (Get-Content "C:\Data\TestFile.txt") | Set-Content "C:\data\TestFile.txt"