I'm trying to read this string from a formatted file: " PARAMETER (NE_M=10,NL_M=12)".
I want to replace the 12 with 11.
I tried to read the string like this:
integer :: i
character(len=30) :: text
10 format(6x,24a,2i)
read(text_data,10) text, i
write(6,100) text, 11
But it doesn't work. Any idea?
The reading and writing you have written will not do what you want. The input line you are trying to read is 33 characters wide (6 leading blanks, the 24 characters up to and including "NL_M=", the 2-digit value, and the closing parenthesis), your format only accounts for 32 of those characters, and your write will not contain the closing ).
Consider the following code if you do not need to capture the 12 from the input.
program test
character(len=30) :: text
101 format(a30, i2, ')')
open(unit=10, file='testinput.f')
read(10,101) text
write(*,101) text, 11
end program
and the input (with 6 leading spaces) in file testinput.f:
PARAMETER (NE_M=10,NL_M=12)
when run, produces the output:
% ./test
PARAMETER (NE_M=10,NL_M=11)
This code was compiled and tested with GNU gfortran 4.8.2.
Alternatively, assuming text_data is the unit number of an open file and 100 is the label of a suitable format statement:
integer :: i
character(len=30) :: text
10 format(6x,a24,i2)
read(text_data,10) text, i
write(6,100) text(:24), i
Or, fixing those other issues:
integer :: i
character(len=30) :: text
open(unit=20,file='filename')
10 format(6x,a24,i2)
read(20,10) text, i
write(6,10) text(:24), i
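If the goal is really to force the value to 11 rather than echoing back whatever was read, here is a minimal complete sketch along the same lines (the file name testinput.f and the second format with the closing parenthesis are my additions, not from the original answer):
program replace_nl
  integer :: i
  character(len=30) :: text
  open(unit=20, file='testinput.f')   ! assumed input file name
10 format(6x,a24,i2)                  ! layout used for reading
20 format(6x,a24,i2,')')              ! same layout plus the closing parenthesis for writing
  read(20,10) text, i
  write(6,20) text(:24), 11           ! write 11 in place of the value that was read
end program replace_nl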
I need to convert my Oracle NUMBER into a string with this format: 999,999.
I'm trying TO_CHAR but I'm not getting the correct output.
This is the expected result:
9 --> 9,000
9,88 --> 9,880
0 --> 0,000
-1 --> -1,000
80 --> 80,000
Just use a format mask with TO_CHAR, assuming you are using , as the decimal character:
TO_CHAR(-1, 'FM999G999G990D000') -> -1,000
TO_CHAR(9.88, 'FM999G999G990D000') -> 9,880
...
And make sure your format mask is long enough to fit all possible lengths of the number.
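For instance, a sketch (assuming NLS settings where , is the decimal character and . the group separator) showing that a value with more integer digits than the mask allows comes back as # fill:
SELECT TO_CHAR(9.88,       'FM999G999G990D000') AS fits        -- '9,880'
     , TO_CHAR(123456.7,   'FM999G999G990D000') AS also_fits   -- '123.456,700'
     , TO_CHAR(1234567890, 'FM999G999G990D000') AS too_wide    -- '#' fill: the mask has only 9 digit positions
  FROM dual;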
I have about 6 GB of various text files. The files have many lines, but each record is missing its commas, so all of a record's fields run together. I want to create a batch file that adds commas at the appropriate places in each "record" so I can then import the data into a database.
For example the file would be structured like this.
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
Each field has a fixed length which I know, and the layout is the same across all files.
For example
ID - 10 characters
NAME - 40 characters
ADDRESS - 30 characters
etc
This will need to be run on an ongoing basis as new files come in, so I'm hoping for something I can give to a non-technical person that they can just run.
Any quick way to do this in a bat file?
Using your example above. Note that we count characters starting from 0, then tell set to take a substring starting at a given offset and running for that field's length. See the bottom for the layout.
#echo off
setlocal enabledelayedexpansion
for /F "tokens=* delims=" %%a in (filename.txt) do (
set str=%%a
set id=!str:~0,2!
set na=!str:~2,4!
set add=!str:~6,7!
set ph=!str:~13,5!
set em=!str:~18,5!
set etc=!str:~23,3!
echo !id!,!na!,!add!,!ph!,!em!,!etc!
)
Characters assigned in a string as:
I D n a m e A D D R E S S p h o n e E M A I L e t c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
ID starts at character 0 and is 2 characters long: :~0,2
name starts at character 2 and is 4 characters long: :~2,4
etc..
For many files just add another loop as a main loop or give a list of files.
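For example, a sketch of such an outer loop (the *.txt file mask and the <name>_fixed.csv output name are assumptions; adjust them to your files):
@echo off
setlocal enabledelayedexpansion
rem process every .txt file in the current folder
for %%f in (*.txt) do (
    for /F "usebackq tokens=* delims=" %%a in ("%%f") do (
        set "str=%%a"
        set "id=!str:~0,2!"
        set "na=!str:~2,4!"
        set "add=!str:~6,7!"
        set "ph=!str:~13,5!"
        set "em=!str:~18,5!"
        set "etc=!str:~23,3!"
        rem redirection comes first so a trailing digit in the data is not mistaken for a stream number
        >>"%%~nf_fixed.csv" echo !id!,!na!,!add!,!ph!,!em!,!etc!
    )
)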
Based on your provided example, here is a quick PowerShell command (despite the question having no powershell tag):
(GC 'Report.txt' | Select -First 1).Insert(10,',').Insert(51,',').Insert(82,',') > 'Fixed.txt'
It takes the first line of Report.txt…
After 10 characters insert , (0 + 10 = 10; the inserted comma shifts the next field by 1)
After another 40 characters insert , (11 + 40 = 51; again shifted by 1)
After another 30 characters insert , (52 + 30 = 82)
etc.
…then outputs the line complete with insertions to Fixed.txt
Just continue the .Insert(<number>,',') sequence for your other fixed width column sizes and ensure you've changed the filenames to suit your circumstances.
Edit
The following, updated for your comment and subsequent edit, should work for all lines in the file:
GC 'Report.txt' | % {($_).Insert(10,',').Insert(51,',').Insert(82,',')} | Out-File 'Fixed.txt'
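And if new reports keep arriving, a sketch that fixes every .txt file in a folder (the C:\Reports path and the _fixed.txt suffix are assumptions, not part of the original answer):
Get-ChildItem 'C:\Reports\*.txt' | ForEach-Object {
    $out = $_.FullName -replace '\.txt$', '_fixed.txt'
    Get-Content $_.FullName |
        ForEach-Object { $_.Insert(10,',').Insert(51,',').Insert(82,',') } |
        Out-File $out
}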
I'm trying to print the first 5 lines from a set of large (>500 MB) CSV files into small header files in order to inspect the content more easily.
I'm using Ruby code to do this but am getting each line padded out with extra Chinese characters, like this:
week_num type ID location total_qty A_qty B_qty count㌀㐀ऀ猀漀爀琀愀戀氀攀ऀ㤀㜀ऀ䐀䔀开伀渀氀礀ऀ㔀㐀㜀㈀ ㌀ऀ㔀㐀㜀㈀ ㌀ऀ ऀ㤀㈀㔀㌀ഀ
44 small 14 A 907859 907859 0 550360㐀ऀ猀漀爀琀愀戀氀攀ऀ㐀㈀ऀ䐀䔀开伀渀氀礀ऀ㌀ ㈀㜀㐀ऀ㌀ ㈀
The first few lines of input file are like so:
week_num type ID location total_qty A_qty B_qty count
34 small 197 A 547203 547203 0 91253
44 small 14 A 907859 907859 0 550360
41 small 421 A 302174 302174 0 18198
The strange characters appear to be Line 1 and Line 3 of the data.
Here's my Ruby code:
num_lines=ARGV[0]
fh = File.open(file_in,"r")
fw = File.open(file_out,"w")
until (line=fh.gets).nil? or num_lines==0
fw.puts line if outflag
num_lines = num_lines-1
end
Any idea what's going on and what I can do to simply stop at the line end character?
Looking at the input/output files in hex (a useful suggestion by @user1934428):
Input file - each character looks to be two bytes.
Output file - notice the NULL (00) between each single-byte character...
Ruby version 1.9.1
The problem is an encoding mismatch, which happens because the encoding is not explicitly specified in the read and write parts of the code. Read the input CSV in binary mode ("rb") with UTF-16LE encoding, and write the output in the same encoding.
num_lines = ARGV[0].to_i
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in, "rb:utf-16le")
fw = File.open(file_out, "wb:utf-16le")
until (line = fh.gets).nil? || num_lines == 0
  fw.puts line
  num_lines -= 1
end
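If you would rather have the extracted header readable as UTF-8 (an assumption about what you want downstream), Ruby can transcode on read by giving an external:internal encoding pair:
num_lines = ARGV[0].to_i
# read UTF-16LE from disk, hand each line to Ruby as UTF-8
fh = File.open(file_in, "rb:utf-16le:utf-8")
fw = File.open(file_out, "w:utf-8")
until (line = fh.gets).nil? || num_lines == 0
  fw.puts line
  num_lines -= 1
end
fh.close
fw.close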
Useful references:
Working with encodings in Ruby 1.9
CSV encodings
Determining the encoding of a CSV file
Following on from my query last week, reading badly formed csv in R - mismatched quotes, these same CSV files also have embedded control characters, such as the ASCII SUB (substitute) character, which is decimal 26 or 0x1A. Unfortunately readLines() seems to truncate the line at this character, so I am having difficulty matching quotes - apart from losing the later fields in these lines!
I have tried readBin(), but I can't get it to read this file. I'm afraid I can't cleanly read the data into R to give you an example, and I'm having difficulty creating these characters in R, so sorry not to be able to demonstrate with a clean example. Thoughts?
Update
Now I'm confused - when I use the code
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(as.integer(k1), 26, 65))), '",99')
identical(readLines(textConnection(h3)), h3)
I get TRUE which I find quite surprising!
Update 2
h3
[1] "1,34,44.4,\" HIJK\032A \",99"
> writeLines(h3, 'h3.txt')
> h3a <- readLines('h3.txt')
Warning message:
In readLines("h3.txt") : incomplete final line found on 'h3.txt'
> h3a
[1] "1,34,44.4,\" HIJK"
So readLines() reacts differently when reading from a textConnection(): when reading from a file it silently truncates at the SUB character.
I would be surprised if it makes a difference, but I'm on R 2.15.2 on 64-bit Windows.
Update 3
Some vague success in solving this...
zb <- file('h3.txt', "rb")
tmp <- readBin(zb, raw(), size=1, n=400) # raw is always of size 1
nchar(tmp)
# [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
close(zb)
tmp
# [1] 31 2c 33 34 2c 34 34 2e 34 2c 22 20 48 49 4a 4b 1a 41 20 22 2c 39 39 0d 0a
rawToChar(tmp)
# [1] "1,34,44.4,\" HIJK\032A \",99\r\n"
i.e. if I read the file in as binary and convert it to character afterwards, it seems to work... but this will be tedious for large CSV files...
Could there be a bug in R that incorrectly detects a Control-Z as end-of-file on Windows?
I think I've figured out a solution - because there appears to be a problem reading a Control-Z in the middle of a file on Windows, we need to read the file in binary / raw mode.
fnam <- 'h3.txt'
tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
tmp.char <- rawToChar(tmp.bin)
txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
txt
[1] "1,34,44.4,\" HIJK\032A \",99"
Update
The following better answer was posted by Duncan Murdoch to R-devel. Converting it into a function, I get:
sReadLines <- function(fnam) {
f <- file(fnam, "rb")
res <- readLines(f)
close(f)
res
}
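For example, reading the same test file through the wrapper (this usage example is my addition) should give the full line, SUB character and all:
txt <- sReadLines('h3.txt')
txt
# [1] "1,34,44.4,\" HIJK\032A \",99"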
I also ran into this problem when I used read.csv with a csv file that contained the SUB or CTRL-Z in the middle of the file.
I solved it with the readr package (if your file is comma-separated):
library(readr)
read_csv("h3.txt")
If you have a ; as a separator, then use:
library(readr)
read_csv2("h3.txt")
The gist of my question is this:
How can I display Unicode characters in MATLAB's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>>
As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly, which shows that the GUI is not fundamentally incapable of displaying these characters; but once MATLAB reads it in, it no longer displays it correctly. For example:
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>>
Thanks!
I present below my findings after doing some digging... Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
தமிழ்
First, we read the file (shown here for a.txt):
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';  %# read bytes
fclose(fid);
%# decode as unicode string
str = native2unicode(b,'UTF-8');
If you try to print the string, you get a bunch of nonsense:
>> str
str =
Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they are outside the ASCII range (the last two are the non-printable CR-LF line endings):
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
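As a quick sanity check (my addition, not in the original post), those code points map back to the expected Greek letters:
%# rebuild the first nine characters from their Unicode code points
greek = char([915 916 920 923 926 928 931 934 937]);
isequal(str(1:9), greek)   %# returns 1 (true)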
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
One trick I found is to use the embedded Java capability:
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
As I was preparing to write the above, I found an alternative solution. We can use the DefaultCharacterSet undocumented feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):
feature('DefaultCharacterSet','UTF-8');
Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
uicontrol('Style','text', 'String',str, ...
'Units','normalized', 'Position',[0 0 1 1], ...
'FontName','Arial Unicode MS', 'FontSize',30)
Unfortunately, TEXT, TITLE, XLABEL, etc. are still showing garbage.
As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.