I have about 6GB of various text files, the files have many lines but each record is missing its commas so all the data is in 1 record. I want to create a batch file where I can add commas at the appropriate places in each "record". I'm hoping to add commas so I can then import this into a database.
For example the file would be structured like this.
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
Each field has a unique length which I know, and it's static between all files.
For example
ID - 10 characters
NAME - 40 characters
ADDRESS - 30 characters
etc
This will need to be run on an ongoing basis as new files come in so I'm hoping for something I can give a non technical person they can just run.
Any quick way to do this in a bat file?
Using your example above. Note we count the characters starting from 0, then tell the set to use letters starting at a certain count, counting the word length from there. See bottom for layout.
#echo off
setlocal enabledelayedexpansion
for /F "tokens=* delims=" %%a in (filename.txt) do (
set str=%%a
set id=!str:~0,2!
set na=!str:~2,4!
set add=!str:~6,7!
set ph=!str:~13,5!
set em=!str:~18,5!
set etc=!str:~23,3!
echo !id!,!na!,!add!,!ph!,!em!,!etc!
)
Characters assigned in a string as:
I D n a m e A D D R E S S p h o n e E M A I L e t c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
ID starts at Character 0 and is 2 characters, including itself :~0,2
name starts at character 2 and is 4 characters long :~2,4
etc..
For many files just add another loop as a main loop or give a list of files.
Based on your provided example, here is a quick powershell command, (despite no tag):
(GC 'Report.txt' | Select -First 1).Insert(10,',').Insert(51,',').Insert(82,',') > 'Fixed.txt'
It takes the first line of Report.txt…
After 10 characters insert ,(0 + 10 = 10) + 1
After another 40 characters insert ,(11 + 40 = 51) + 1
After another 30 characters insert ,(52 + 30 = 82) + 1
etc.
…then outputs the line complete with insertions to Fixed.txt
Just continue the .Insert(<number>,',') sequence for your other fixed width column sizes and ensure you've changed the filenames to suit your circumstances.
Edit
The following as an update to your comment and subsequent edit should work for all lines in the file.
GC 'Report.txt' | % {($_).Insert(10,',').Insert(51,',').Insert(82,',')} | Out-File 'Fixed.txt'
I use F95/90 and IBM compiler. I am trying to extract the numerical values from block and write in a file. I am facing a strange error in the output which I cannot understand. Every time I execute the program it skips the loop between 'Beta' and 'END'. I am trying to read and store the values.
The number of lines inside the Alpha- and Beta loops are not fixed. So a simple 'do loop' is of no use to me. I tried the 'do while' loop and also 'if-else' but it still skips the 'Beta' part.
Alpha Singles Amplitudes
15 3 23 4 -0.186952
15 3 26 4 0.599918
15 3 31 4 0.105048
15 3 23 4 0.186952
Beta Singles Amplitudes
15 3 23 4 0.186952
15 3 26 4 -0.599918
15 3 31 4 -0.105048
15 3 23 4 -0.186952
END `
The simple short code is :
program test_read
implicit none
integer::nop,a,b,c,d,e,i,j,k,l,m,ios
double precision::r,t,rr
character::dummy*300
character*15::du1,du2,du3
open (unit=10, file="1.txt", status='old',form='formatted')
100 read(10,'(a100)')dummy
if (dummy(1:3)=='END') goto 200
if(dummy(2:14)=='Alpha Singles') then
i=0
160 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
do while(du1.ne.' Bet')
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'AS',du1,b,du2,c,du3,d,du4,e,r
goto 160
end do
elseif (dummy(2:14)=='Beta Singles') then
170 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
if((du1=='END'))then
stop
else
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'BS',du1,b,du2,c,du3,d,du4,e,r
goto 170
end if
end if
goto 100
200 print*,'This is the end'
end program test_read
Your program never gets out of the loop which checks for Beta because when your while loop exits, it has already read the line with Beta. It then goes to 100 which reads the next line after Beta, so you never actually see Beta Singles. Try the following
character(len=2):: tag
read(10,'(a100)')dummy
do while (dummy(1:3).ne.'END')
if (dummy(2:14)=='Alpha Singles') then
tag = 'AS'
else if (dummy(2:14)=='Beta Singles') then
tag = 'BS'
else
read(dummy,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')tag,du1,b,du2,c,du3,d,du4,e,r
end if
read(10, '(a100)') dummy
end do
print*,'This is the end'
Following on from my query last week reading badly formed csv in R - mismatched quotes, these same CSV files also have embedded control characters such as the ASCII Substitute Character which is decimal 26 or 0x1A. Unfortunately readLines() seems to truncate the line at this character, so I am having difficulty in matching quotes - apart from losing the later fields in these lines!
I have tried to readBin() but I can't get it to read this file. I'm afraid I can't cleanly read this into R to give you an example and I'm having difficulty in creating these in R. Sorry not to be able to demonstrate with a clean example. Thoughts?
Update
Now I'm confused - when I use the code
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(as.integer(k1), 26, 65))), '",99')
identical(readLines(textConnection(h3)), h3)
I get TRUE which I find quite surprising!
Update 2
h3
[1] "1,34,44.4,\" HIJK\032A \",99"
> writeLines(h3, 'h3.txt')
> h3a <- readLines('h3.txt')
Warning message:
In readLines("h3.txt") : incomplete final line found on 'h3.txt'
> h3a
[1] "1,34,44.4,\" HIJK"
So readLines() reacts differently when coming from a textConnection() and it silently truncates at the SUB character.
I would be surprised if it makes a difference but I'm on 2.15.2 on Windows-64.
Update 3
Some vague success in solving this...
zb <- file('h3.txt', "rb")
tmp <- readBin(zb, raw(), size=1, n=400) # raw is always of size =1
nchar(tmp)
# [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
close(zb)
tmp
# [1] 31 2c 33 34 2c 34 34 2e 34 2c 22 20 48 49 4a 4b 1a 41 20 22 2c 39 39 0d 0a
rawToChar(tmp)
# [1] "1,34,44.4,\" HIJK\032A \",99\r\n"
i.e. if I read in the file as binary and convert to character() afterwards it seems to work... this will be tedious for large CSV files...
Could there be a bug in R in incorrectly detecting a Control-Z as end of file on windows??
I think I've figured out a solution - because there appears to be a problem reading a Control-Z in the middle of a file on Windows, we need to read the file in binary / raw mode.
fnam <- 'h3.txt'
tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(dfnam)$size, 100))=1
tmp.char <- rawToChar(tmp.bin)
txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
txt
[1] "1,34,44.4,\" HIJK\032A \",99"
Update
The following better answer was posted by Duncan Murdoch to R-Devel refer. Converting it into a function I get:
sReadLines <- function(fnam) {
f <- file(fnam, "rb")
res <- readLines(f)
close(f)
res
}
I also ran into this problem when I used read.csv with a csv file that contained the SUB or CTRL-Z in the middle of the file.
Solved it with the readr package (if your file is comma separated)
library(readr)
read_csv("h3.txt")
If you have a ; as a separator, then use:
library(readr)
read_csv2("h3.txt")
I have written the following function that find if a pixel belongs to an image in matlab.
At the beginning, I wanted to test it as if a number in a set belongs to a vector like the following:
function traverse_pixels(img)
for i:1:length(img)
c(i) = img(i)
end
But, when I run the following commands for example, I get the error shown at the end:
>> A = [ 34 565 456 535 34 54 5 5 4532 434 2345 234 32332434];
>> traverse_pixels(A);
??? Error: File: traverse_pixels.m Line: 2 Column: 6
Unexpected MATLAB operator.
Why is that? How can I fix the problem?
Thanks.
There is a syntax error in the head of your for loop, it's supposed to be:
for i = 1:length(img)
Also, to check if an array contains a specific value you could use:
A = [1 2 3]
if sum(A==2)>0
disp('there is at least one 2 in A')
end
This should be faster since no for loop is included.
for i = 1:length(image)
silly error, not : , it is =
I have a program dnapars
I execute the program from command line as following:
./dnapars
The program then prompts me some message as a user menu from where I have to select a series of options in the order R U Y R. And then I copy the output file (outfile) in another result file.
I wrote the following script, but the execution hangs where it is supposed to execute the R option
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
cp ../../../EditDistanceRandomParsimonator/RAxML_parsimonyTree.test4D20RI$i.0 intree
./dnapars
R <----- This doesn't execute
U
Y
R
cp outfile result$i
done
How can I make the script to run the options R U Y R under the dnapars program ?
You may be able to use a shell here document, for example:
./dnapars <<EndOfOptions
R
U
Y
R
EndOfOptions
This will generally work if the program reads its options from stdin.