concatenating .txt files into a csv file with a tab delimiter - windows

I am trying to concatenate a set of .txt files into a CSV file using the Windows command line, so I use:
type *.txt > me_new_file.csv
but the fields of each row, which are tab-delimited, end up in a single column. How do I take advantage of the tab separation in the original text files to create a CSV file whose fields are aligned correctly in columns, using one or more command lines? I am thinking there might be something like...
type *.txt > me_new_file.csv delim= ' '
but haven't been able to find anything yet.
Thank you for your help. I would also appreciate it if someone could direct me to a related answer.

From the command line you'd have a fairly complicated time of it. The Windows cmd.exe command processor is much, much simpler than dash, ash, bash, et al.
The best thing would be to concatenate all of your files into the .csv file, open it in a text editor, and do a global find and replace, replacing each tab character with a comma (,).
Be careful that your other data doesn't have any commas in it.

If the source files are tab delimited, then the output file is also tab delimited. Depending on the software you are using, you should be able to load the tab-delimited data properly.
Suppose you are using Excel. If the output file has a .csv extension, then Excel will default to comma delimited columns when it opens the file. Of course that does not work for you. But if you rename the file to have some other extension like .txt, then when you open it with Excel, it will open a series of dialog boxes where you can specify the format, including tab delimited.
If you want to keep the .csv extension and have Excel automatically open it properly, then you need to transform the data. This can be done very easily with JREPL.BAT - a hybrid JScript/batch utility that performs a regular expression search and replace on text data. JREPL.BAT is pure script that runs natively on any Windows machine from XP onward.
The following encloses each value in quotes, just in case a value contains a comma literal.
type *.txt 2>nul | jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /o output.csv
Beware: Your use of type *.txt will fail if the last line in any of your source .txt files does not end with a newline. In such a case, the first line of the next file will be appended to the last line of the previous file. Not good.
You can solve that problem by processing each file individually in a FOR loop.
(for %F in (*.txt) do jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /f "%F") >output.csv
The above is designed to run on the command line. If used in a batch script, then a few changes are needed:
(for %%F in (*.txt) do call jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /f "%%F") >output.csv
Note: My answer assumes none of the source files contain quotes. If they do contain quotes, then a more complicated search and replace is required. But it still can be done efficiently with JREPL.
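If a scripting language is available, the same conversion can be sketched in Python with the standard csv module. This is a minimal sketch, not part of the JREPL answer; it assumes the source files are UTF-8 (with or without a BOM) and that tabs only ever appear as delimiters. Because each file is read separately, a missing final newline cannot merge rows, and csv.writer quotes fields that contain commas or quotes (doubling embedded quotes), which also covers the quoted-data case mentioned in the note. The function names are hypothetical.

```python
import csv
import glob


def tabs_to_csv(sources, out):
    """Write every tab-delimited line from each text stream in `sources` to `out` as CSV.

    Each source is read on its own, so a file whose last line lacks a newline
    cannot run into the first line of the next file. Blank lines are skipped.
    """
    writer = csv.writer(out)  # quotes fields containing commas, quotes or newlines
    for src in sources:
        for line in src:
            line = line.rstrip("\r\n")
            if line:
                writer.writerow(line.split("\t"))


def _open_each(paths):
    """Yield each file in turn, closing it before the next one is opened."""
    for path in paths:
        with open(path, encoding="utf-8-sig") as f:  # assumption: UTF-8, BOM optional
            yield f


def tab_files_to_csv(pattern, out_path):
    """Convert all files matching `pattern` (e.g. "*.txt") into one CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        tabs_to_csv(_open_each(sorted(glob.glob(pattern))), out)
```

Usage, run from the folder that holds the .txt files: `tab_files_to_csv("*.txt", "output.csv")`. Keep the output name outside the glob pattern so the CSV is not read back in. The csv module writes CRLF line endings by default.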

Related

MS-DOS how to get output of command as variable

I've written a program for DOS that returns keycodes as integers, but I don't know how to get its output into a variable.
Note: I'm using MS-DOS 7 / Windows 98, so I can't use FOR /F or SET /P.
Does anyone know how i could do that?
A few solutions are described by Eric Pement here. However, for older command processors (COMMAND.COM rather than cmd.exe) the author was forced to use external tools.
For example, tools like STRINGS by Douglas Boling allow the following code:
echo Greetings! | STRINGS hi=ASK # puts "Greetings!" into %hi%
The same goes for ASET by Richard Breuer:
echo Greetings! | ASET hi=line # puts "Greetings!" into %hi%
An alternative, pure-DOS solution requires the program output to be redirected to a file (named ANSWER.DAT in the example below) and then uses a specially prepared batch file. To quote the aforementioned page:
[I]n the batch file we need to be able to issue the command
set MYVAR={the contents of ANSWER.DAT go here}. This is a difficult task, since MS-DOS doesn't offer an easy way to prepend "set MYVAR=" to a file [...]
Normal DOS text files and batch files end all lines with two consecutive bytes: a carriage return (Ctrl-M, hex 0D, or ASCII 13) and a linefeed (Ctrl-J, hex 0A or ASCII 10). In the batch file, you must be able to embed a Ctrl-J in the middle of a line.
Many text editors have a way to do this: via a Ctrl-P followed by Ctrl-J (DOS EDIT with Win95/98, VDE), via a Ctrl-Q prefix (Emacs, PFE), via direct entry with ALT and the numeric keypad (QEdit, Multi-Edit), or via a designated function key (Boxer). Other editors absolutely will not support this (Notepad, Editpad, EDIT from MS-DOS 6.22 or earlier; VIM can insert a linefeed only in binary mode, but not in its normal text mode).
If you can do it, your batch file might look like this:
@echo off
:: assume that the datafile exists already in ANSWER.DAT
echo set myvar=^J | find "set" >PREFIX.DAT
copy PREFIX.DAT+ANSWER.DAT VARIAB.BAT
call VARIAB.BAT
echo Success! The value of myvar is: [%myvar%].
:: erase temp files ...
for %%f in (PREFIX.DAT ANSWER.DAT VARIAB.BAT) do del %%f >NUL
Where you see the ^J on line 3 above, the linefeed should be embedded at that point. Your editor may display it as a square box with an embedded circle.
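The goal of the trick is a VARIAB.BAT whose first line is `set myvar=` immediately followed by the contents of ANSWER.DAT, with no line break in between. A small Python sketch of that byte layout follows; the helper make_variab_bat is hypothetical, and it models only the resulting bytes, not every detail of how copy combines files.

```python
def make_variab_bat(answer: bytes, var_name: str = "myvar") -> bytes:
    """Model the bytes of VARIAB.BAT built by 'copy PREFIX.DAT+ANSWER.DAT VARIAB.BAT'."""
    # PREFIX.DAT must contain "set myvar=" with no trailing CR/LF; otherwise the
    # SET line would end before the value and myvar would be left empty.
    prefix = b"set " + var_name.encode("ascii") + b"="
    return prefix + answer
```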

Renaming large amount of files

I need a script that will rename a large number of files. I have a folder with a lot of files, each named by ID. I also have a CSV file like this:
oldID;newID
oldID;newID
etc...
Every old and new ID is unique. What would be the best way to do this? A little help in bash or batch would be appreciated.
The solution for batch is very similar to e0k's solution for bash; you read the file in one line at a time, split the line on semicolons, and rename the file accordingly.
for /f "tokens=1,2 delims=;" %%A in (ids.csv) do ren "%%A" "%%B"
This assumes that your IDs are in a file called ids.csv and that the line is run from a batch file; typed directly at the command prompt, use %A and %B instead of %%A and %%B.
If you are using bash (the shell used in the world of Linux, UNIX, etc.), you can use the following short script based on this internal field separator answer. This assumes that you are using a semicolon (;) as the delimiter of your "CSV" file and that there is only one such delimiter per line.
#!/bin/bash
while IFS=';' read -ra names; do
mv "${names[0]}" "${names[1]}";
done < translation.csv
where translation.csv is your file containing the name translations with an oldname;newname format.
If you are instead asking for a batch file (i.e. for Windows, DOS, etc.) then that is a different animal in a different world.
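For comparison, here is a minimal Python sketch of the same idea, which runs on Windows and Unix alike. It assumes a mapping file with exactly one semicolon per line, as in the question; the function name rename_from_mapping is hypothetical. Unlike the one-liners above, it skips IDs whose file is missing and refuses to overwrite an existing target, because os.rename raises an error on Windows but silently replaces the target on POSIX systems when the destination already exists.

```python
import csv
import os


def rename_from_mapping(mapping_path, folder="."):
    """Rename files in `folder` according to 'oldID;newID' lines in `mapping_path`.

    Returns the (old, new) name pairs that were actually renamed. Lines without
    exactly two fields are ignored, missing source files are skipped, and an
    existing target file is never overwritten.
    """
    renamed = []
    # Assumption: the mapping file is UTF-8; "utf-8-sig" also tolerates a BOM.
    with open(mapping_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f, delimiter=";"):
            if len(row) != 2:
                continue
            old_name, new_name = (field.strip() for field in row)
            old_path = os.path.join(folder, old_name)
            new_path = os.path.join(folder, new_name)
            if os.path.isfile(old_path) and not os.path.exists(new_path):
                os.rename(old_path, new_path)
                renamed.append((old_name, new_name))
    return renamed
```

Usage: `rename_from_mapping("translation.csv", "path/to/files")`. The returned list makes it easy to check which IDs were not found.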
Given that your OS is some Unix (like Linux), and given that the use of CSV files was your own choice, there might be an easier way: mmv can rename many files in one go, using patterns to match the original files and allowing the matched strings to be used in the target file names. See http://ss64.com/bash/mmv.html.

How to Find and Replace file content in batch script

For example, I have a file, sample.txt, that contains:
1111101
2222203
3333303
44444A1
55555A1
66666A1
Now I want to replace user-defined patterns. The user defines the replacements in another file, for example replace.txt. This file contains two columns: the first column holds the pattern and the second column holds the replacement text.
Example:
replace.txt
2222203 2222203ADD
55555A1 55555A1SUB
Now, after the batch file has been executed, I would like sample.txt to have contents like this:
1111101
2222203ADD
3333303
44444A1
55555A1SUB
66666A1
Also, is it possible to have a space as part of the replacement text (column 2)?
You may use FindRepl.bat, a Batch-JScript hybrid application that performs these replacements very efficiently via regular expressions; it uses the JScript language, which is standard in all Windows versions from XP on. In the basic use of FindRepl.bat, you redirect the input file to it and pass two strings as parameters, a "search" string and a "replacement" string. For example:
< sample.txt FindRepl.bat "2222203" "2222203ADD"
The previous command replaces every occurrence of 2222203 in the file with 2222203ADD. To perform several replacements, you may include several alternatives in both the search and replacement strings, separated by a pipe character (this is called alternation), and include the /A switch to select this feature; for example:
< sample.txt FindRepl.bat "2222203|55555A1" /A "2222203ADD|55555A1SUB"
If you want to define the set of replacements in a separate file, you just need to load the strings from the file, assemble the alternations in two variables, and use them in FindRepl preceded by an equal sign to indicate that they are variables, not literal strings. If you want the strings to be able to contain spaces, then you must use a different character to separate the search and replacement parts in the file. For example, if you use a colon in replace.txt this way:
2222203:2222203 ADD
55555A1:55555A1 SUB
then the batch file below solves your problem:
@echo off
setlocal EnableDelayedExpansion
set "search="
set "replace="
for /F "tokens=1,2 delims=:" %%a in (replace.txt) do (
set "search=!search!|%%a"
set "replace=!replace!|%%b"
)
set "search=!search:~1!"
set "replace=!replace:~1!"
< sample.txt FindRepl.bat =search /A =replace
You may download FindRepl.bat and review an explanation of its use from this site; you must place it in the same folder as the previous program or, better yet, in a folder included in the PATH variable.
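The same approach can be sketched in Python, assuming the colon-separated replace.txt format shown above. One deliberate difference from FindRepl.bat: re.escape makes the search strings literal instead of regular expressions. Building a single alternation and substituting in one pass means that replacement text which itself contains a search string (such as 2222203 ADD) is not replaced a second time. The function names are hypothetical.

```python
import re


def load_replacements(lines):
    """Parse 'search:replacement' lines; text after the first colon may contain spaces."""
    pairs = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if ":" in line:
            search, replacement = line.split(":", 1)
            pairs[search] = replacement
    return pairs


def apply_replacements(text, pairs):
    """Replace every occurrence of each search string in a single pass."""
    if not pairs:
        return text
    # Longest keys first, so a shorter key cannot pre-empt a longer overlapping one.
    alternation = "|".join(re.escape(k) for k in sorted(pairs, key=len, reverse=True))
    return re.sub(alternation, lambda m: pairs[m.group(0)], text)
```

Usage: load the pairs with `load_replacements(open("replace.txt"))`, read sample.txt into a string, pass both to apply_replacements, and write the result back.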

Windows command line/shell - While appending file to another file, how to ignore lines that match a regex?

I'm not familiar with the Windows shell. Let's say my file looks like this:
DontAppend this line shouldn't be appended
DontAppend this line shouldn't be either
Some lines
more lines
And I'm appending like this:
type file.txt >> AppendHere.txt
This appends the whole file. How do I make it so it skips lines that begin with "DontAppend"?
The command findstr will let you search for lines not containing a string or regular expression so you can use:
findstr /vrc:"^[^A-Za-z0-9]*DontAppend" file.txt >> AppendHere.txt
The /v option prints only the lines that do not match, the /r option says the search string is a regular expression, and the caret (^) anchors the match to the beginning of the line.
Edit: added a filter for leading non-alphanumeric characters, which may solve the Unicode issues (Unicode files sometimes have a non-printable indicator character, the byte order mark, at the beginning).
Either get grep for Windows, or use Windows' own find command:
type so.txt|find /v "DontAppend" >> output.txt
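The /v option outputs the lines that don't contain your string. Note that find matches the string anywhere in a line, not only at its beginning.
find works for very simple cases like this, but for anything more complex you will need a real filtering tool like grep.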
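A Python sketch of the same filter, reusing the findstr regular expression: lines that start with DontAppend, optionally preceded by non-alphanumeric characters such as a BOM, are skipped. Unlike find /v, a line that merely contains DontAppend in the middle is kept. The function name append_filtered is hypothetical, and the input is assumed to be UTF-8 text.

```python
import re

# Same pattern as the findstr answer: optional leading non-alphanumeric
# characters (such as a byte order mark), then "DontAppend" at the line start.
SKIP = re.compile(r"^[^A-Za-z0-9]*DontAppend")


def append_filtered(src_lines, dest):
    """Write each line of `src_lines` to `dest` unless it matches SKIP."""
    for line in src_lines:
        if not SKIP.match(line):
            dest.write(line)
```

Usage: open file.txt for reading and AppendHere.txt in append mode ("a"), then call `append_filtered(src, dest)`.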

Why is this batch file producing extra, unexpected, unwanted characters?

I'm trying to use the following batch script to concatenate some files together:
copy NUL bin\translate.js
for %%f in (source\Libraries\sprintf.js, source\translate-namespace.js, source\util.js, source\translator.js, source\translate.js) do (
type %%f >> bin\translate.js
echo. >> bin\translate.js
)
However, when I do this, an extra character seems to be printed at the end of each file. When I view the file in ASCII, it is interpreted as these three characters:

Why is this happening? What can I do to fix it?
The three characters look like a Unicode byte order mark (BOM). Is it possible to start with files that are stored without the byte order mark? I am not aware of any command-line commands that can remove the mark.
The DOS copy command works like the UNIX cat command. That is, you can list multiple source files and one destination file, separated with + signs.
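If Python is available, the marks can be removed while concatenating. This minimal sketch assumes the files are UTF-8 with an optional BOM (codecs.BOM_UTF8, the bytes EF BB BF); it drops the BOM at the start of each file and adds a CRLF after any file that does not already end with a newline, playing the role of the echo. line in the question. The helper names are hypothetical.

```python
import codecs
from pathlib import Path


def concat_without_bom(chunks):
    """Join byte strings, dropping a leading UTF-8 BOM from each one and
    adding CRLF after any chunk that does not already end with a newline."""
    parts = []
    for data in chunks:
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        if data and not data.endswith(b"\n"):
            data += b"\r\n"
        parts.append(data)
    return b"".join(parts)


def concat_files(paths, out_path):
    """Concatenate the files in `paths`, in order, into `out_path`."""
    data = concat_without_bom(Path(p).read_bytes() for p in paths)
    Path(out_path).write_bytes(data)
```

Usage: pass the same list of sources as in the question, e.g. `concat_files([r"source\Libraries\sprintf.js", r"source\util.js"], r"bin\translate.js")`.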
copy source\Libraries\sprintf.js+source\translate-namespace.js bin\translate.js
