I have a massive dataset that I divided into k mini datasets, where k = 100. Now I want to store these mini datasets in different files.
To store the massive dataset I used the following instructions:
using JLD, HDF5
X=rand(100000)
file = jldopen("path to my file/mydata.jld", "w") # the file extension is .jld, so the JLD and HDF5 packages are needed: Pkg.add("JLD"), Pkg.add("HDF5")
write(file, "X", X) # alternatively, say "@write file X"
close(file)
Now I divided my dataset into k sub-datasets, where k = 100:
function get_mini_batch(X)
    mini_batches = round(Int, ceil(length(X) / 100))
    for i = 1:mini_batches
        mini_batch = X[((i-1)*100 + 1):min(i*100, end)]
        file = jldopen("/path to my file/mydata.jld", "w")
        write(file, "mini_batch", mini_batch) # alternatively, say "@write file mini_batch"
        close(file)
    end
end
But this function stores every sub-dataset in the same file, which is overwritten at each iteration.
file= jldopen("/path to my file/mydata1.jld", "w") # at each iteration I want to get the files mydata1, mydata2, ..., mydata100
file= jldopen("/path to my file/mydata2.jld", "w")
file= jldopen("/path to my file/mydata3.jld", "w")
file= jldopen("/path to my file/mydata4.jld", "w")
.
.
.
file= jldopen("/path to my file/mydata100.jld", "w")
Alternatively, I tried this procedure:
function get_mini_batch(X)
    mini_batches = round(Int, ceil(length(X) / 100))
    for i = 1:mini_batches
        mini_batch[i] = X[((i-1)*100 + 1):min(i*100, end)]
        file[i] = jldopen("/path to my file/mydata.jld", "w")
        write(file, "mini_batch", mini_batch) # alternatively, say "@write file mini_batch"
        close(file)
    end
end
But I don't know how to use the loop variable i = 1, ..., 100 inside this line of code: file[i]= jldopen("/path to my file/mydata(i).jld", "w")
You are looking for string formatting.
To create the filenames, you can use @sprintf. Then you can use these strings to write your objects to disk.
julia> using Printf # Needed in Julia 1.0.0
julia> @sprintf("myfilename%02.d.jld", 5)
"myfilename05.jld"
Example in a loop:
julia> for i in 1:3
           println(@sprintf("myfilename%03.d.jl", i))
       end
myfilename001.jl
myfilename002.jl
myfilename003.jl
I used %03.d here to show how you can add leading zeros to your file names. This will help later on when it comes to sorting.
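The effect on sorting can be seen with a handful of names. The following is a minimal Python sketch rather than Julia; the base name `mydata` and the `.jld` extension come from the question, while the helper `batch_filename` and the sample indices are made up for illustration.

```python
def batch_filename(i, width=3):
    """Build a file name with a zero-padded index, e.g. mydata007.jld."""
    return f"mydata{i:0{width}d}.jld"

indices = [1, 2, 10, 100]

# Without padding, lexicographic order differs from numeric order.
unpadded = sorted(f"mydata{i}.jld" for i in indices)
# With padding, lexicographic order and numeric order agree.
padded = sorted(batch_filename(i) for i in indices)

print(unpadded)
print(padded)
```

With unpadded names, `mydata2.jld` sorts after `mydata100.jld`; with padding the files come back in numeric order.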
I agree with niczky12 that you are looking for string formatting. But I would personally write it this alternative way:
"/path to my file/mydata$i.jld"
instead of using @sprintf.
Example:
julia> i = 4
4
julia> "/path/mydata$i.jld"
"/path/mydata4.jld"
How can I read one number per loop iteration? It is important that this happens dynamically: rather than reading the entire line once and converting it to an array, each iteration should take one number from the line in the file and work with it. What is the right way to do this?
input.txt :
5
1 7 5 2 3
I need to work with the 2nd line of the file.
fin = File.open("input.txt", "r")
fout = File.open("output.txt", "w")
n = fin.readline.to_i
heap_min = Heap.new(:min)
heap_max = Heap.new(:max)
for i in 1..n
  a = fin.read.to_i #code here <--
  heap_max.push(a)
  if heap_max.size > heap_min.size
    tmp = heap_max.top
    heap_max.pop
    heap_min.push(tmp)
  end
  if heap_min.size > heap_max.size
    tmp = heap_min.top
    heap_min.pop
    heap_max.push(tmp)
  end
  if heap_max.size == heap_min.size
    heap_max.top > heap_min.top ? median = heap_min.top : median = heap_max.top
  else
    median = heap_max.top
  end
  fout.print(median, " ")
end
If you're 100% sure that the numbers in your file are separated by spaces, you can try this:
a = fin.gets(' ', -1).to_i
Read the 2nd line of a file:
line2 = File.readlines('input.txt')[1]
Convert it to an array of integers:
array = line2.split(' ').map(&:to_i).compact
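For comparison, the "one number per iteration" idea can be written as a generator that pulls characters from the stream only as needed. This is a minimal Python sketch, not Ruby; the function name `iter_ints` is made up for illustration, and an in-memory `io.StringIO` stands in for input.txt.

```python
import io

def iter_ints(stream):
    """Yield integers from a text stream one at a time, reading char by char."""
    token = []
    while True:
        ch = stream.read(1)          # read a single character
        if ch and not ch.isspace():
            token.append(ch)
            continue
        if token:                    # whitespace or EOF ends the current token
            yield int("".join(token))
            token = []
        if not ch:                   # empty string means end of file
            return

# Stand-in for input.txt from the question.
fin = io.StringIO("5\n1 7 5 2 3\n")
numbers = iter_ints(fin)
n = next(numbers)                    # first line: how many values follow
values = [next(numbers) for _ in range(n)]
print(n, values)
```

Because the generator is lazy, each call to `next(numbers)` consumes only as much of the file as one number requires, so the heap updates can run between reads.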
Hello, I am encountering a <<loop>> error in a Haskell program and I do not know where the loop is coming from. There are almost no IO functions that I could hook into to print partial results to the terminal.
I start with a file, read it, and after that there are only pure functions. How can I debug this?
Is there a way to attach to functions, or to create a helper, that does the following:
Given a function method :: a -> b, can I somehow wrap it as iomethod :: (a -> b) -> IO (a -> b) so that I can test it in GHCi (I want to insert some putStrLn calls, etc.)?
P.S. My data goes through transformations of the form IO a (-> b -> c -> d -> ...) -> IO x, and I do not know how to debug the part in the parentheses (that is, the code made of pure functions).
Types and typeclass definitions and implementations
data TCPFile=Rfile (Maybe Readme) | Dfile Samples | Empty
data Header=Header { ftype::Char}
newtype Samples=Samples{values::[Maybe Double]}deriving(Show)
data Readme=Readme{ maxClients::Int, minClients::Int,stepClients::Int,maxDelay::Int,minDelay::Int,stepDelay::Int}deriving(Show)
data FileData=FileData{ header::Header,rawContent::Text}
(>>?)::Maybe a->(a->Maybe b)->Maybe b
(Just t) >>? f=f t
Nothing >>? _=Nothing
class TextEncode a where
    fromText::Text-> a
getHeader::TCPFile->Header
getHeader (Rfile _ ) = Header { ftype='r'}
getHeader (Dfile _ )= Header{ftype='d'}
getHeader _ = Header {ftype='e'}
instance Show TCPFile where
    show (Rfile t)="Rfile " ++"{"++content++"}" where
        content=case t of
            Nothing->""
            Just c -> show c
    show (Dfile c)="Dfile " ++"{"++show c ++ "}"
instance TextEncode Samples where
    fromText text=Samples (map (readMaybe.unpack) cols) where
        cols=splitOn (pack ",") text
instance TextEncode Readme where
    fromText txt =let len= length dat
                      dat= case len of
                          6 ->Prelude.take 6 .readData $ txt
                          _ ->[0,0,0,0,0,0] in
        Readme{maxClients=Prelude.head dat,minClients=dat!!1,stepClients=dat!!2,maxDelay=dat!!3,minDelay=dat!!4,stepDelay=dat!!5} where
instance TextEncode TCPFile where
    fromText = textToFile
Main
module Main where
import Data.Text(Text,pack,unpack)
import Data.Text.IO(readFile,writeFile)
import TCPFile(TCPFile)
main::IO()
main=do
    dat<-readTcpFile "test.txt"
    print dat
readTcpFile::FilePath->IO TCPFile
readTcpFile path =fromText <$> Data.Text.IO.readFile path
textToFile::Text->TCPFile
textToFile input=case readHeader input >>? (\h -> Just (FileData h input)) >>? makeFile of
    Just r -> r
    Nothing ->Empty
readHeader::Text->Maybe Header
readHeader txt=case Data.Text.head txt of
    'r' ->Just (Header{ ftype='r'})
    'd' ->Just (Header {ftype ='d'})
    _ -> Nothing
makeFile::FileData->Maybe TCPFile
makeFile fd= case ftype.header $ fd of
    'r'->Just (Rfile (Just (fromText . rawContent $ fd)))
    'd'->Just (Dfile (fromText . rawContent $ fd))
    _ ->Nothing
readData::Text->[Int]
readData =catMaybes . maybeValues where
    maybeValues=mvalues.split.filterText "{}"
--all the methods under this line are used in the above method
mvalues::[Text]->[Maybe Int]
mvalues arr=map (\x->(readMaybe::String->Maybe Int).unpack $ x) arr
split::Text->[Text]
split =splitOn (pack ",")
filterText::[Char]->Text->Text
filterText chars tx=Data.Text.filter (\x -> not (x `elem` chars)) tx
First I want to clean the Text of the given characters (in our case } and {), then split it on commas. After the text is split, I want to parse the pieces and create either an Rfile, which contains 6 integers, or a Dfile (data file), which contains any number of integers.
Input
I have a file with the following content: r,1.22,3.45,6.66,5.55,6.33,2.32} and I am running runghc main 2>err.hs
Expected Output : Rfile (Just (Readme 1.22 3.45 6.66 5.55 6.33 2.32))
In the TextEncode Readme instance, len and dat depend on each other:
instance TextEncode Readme where
    fromText txt =let len= length dat
                      dat= case len of
To debug this kind of thing, other than staring at the code, you can compile with -prof -fprof-auto -rtsopts and run your program with the command-line options +RTS -xc. This should print a trace when the <<loop>> exception is raised (or, if the program loops instead, when you kill it with Ctrl+C). See the GHC manual: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag--xc
As Li-yao Xia said, part of the problem is the infinite recursion, but even if you try the following code, the problem remains.
instance TextEncode Readme where
    fromText txt =let len= length [1,2,3,4,5,6] --dat
                      dat= case len of
The second issue is that the file contains decimal numbers, while all the conversion functions expect Maybe Int. Changing the definitions of the following functions should give the expected results. On the other hand, the correct fix is probably for the file to contain integers rather than decimal numbers.
readData::Text->[Double]
--readData xs = [1,2,3,4,5,6,6]
readData =catMaybes . maybeValues where
    maybeValues = mvalues . split . filterText "{}"
--all the methods under this line are used in the above method
mvalues::[Text]->[Maybe Double]
mvalues arr=map (\x->(readMaybe::String->Maybe Double).unpack $ x) arr
data Readme=Readme{ maxClients::Double, minClients::Double,stepClients::Double,maxDelay::Double,minDelay::Double,stepDelay::Double}deriving(Show)
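The effect of parsing decimal text as integers can be reproduced outside Haskell. The following is a rough Python analogue of the readData pipeline (strip braces, split on commas, drop the fields that fail to parse); it is a sketch for illustration, not a translation of the original code, and the name `parse_fields` is made up.

```python
def parse_fields(text, convert):
    """Strip braces, split on commas, keep only fields that `convert` accepts."""
    cleaned = "".join(ch for ch in text if ch not in "{}")
    parsed = []
    for field in cleaned.split(","):
        try:
            parsed.append(convert(field))
        except ValueError:
            pass  # plays the role of dropping Nothing via catMaybes
    return parsed

line = "r,1.22,3.45,6.66,5.55,6.33,2.32}"

as_ints = parse_fields(line, int)      # every field is rejected
as_floats = parse_fields(line, float)  # only the leading "r" is rejected

print(as_ints)
print(as_floats)
```

With integer parsing nothing survives, so a length check against 6 can never succeed; with floating-point parsing exactly six values remain.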
I have 200 images in a folder, and each image may exist in several versions (for example: car_image#2, car_image#2, bike_image#2, etc.). My requirement is to build a utility that copies the latest version of every file from this directory to another.
My approach is:
Put the imagesNames (without containing version numbers) into a list
Eliminate the duplicates from the list
Iterate through the list and identify the latest version of each unique file (I am a little unclear on this step)
Can someone suggest a better idea or algorithm to achieve this?
My approach would be:
Make a list of unique names by getting each filename up to the #, only adding unique values.
Make a dictionary with filenames as keys, and set values to be the version number, updating when it's larger than the one stored.
Go through the dictionary and produce the filenames to grab.
My go-to would be a python script but you should be able to do this in pretty much whatever language you find suitable.
Example code for getting the filename list:
# Get the list of unique base names (the part before '#')
myList = []
for x in file_directory:
    fname = x.split("#")[0]
    if fname not in myList:
        myList.append(fname)
# Start every base name at version 0
myDict = {}
for x in myList:
    if x not in myDict:
        myDict[x] = 0
# Keep the highest version seen for each base name
for x in file_directory:
    fname = x.split("#")[0]
    fversion = int(x.split("#")[-1])
    if myDict[fname] < fversion:
        myDict[fname] = fversion
# Rebuild the filenames of the latest versions
flist = []
for x in myDict:
    fname = str(x) + "#" + str(myDict[x])
    flist.append(fname)
Then flist would be a list of filenames of the most recent versions
I didn't run this or anything but hopefully it helps!
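The copy step itself is not covered above. A minimal sketch, assuming the file names have the exact form name#version with nothing after the version number (a file such as car_image#2.png would be skipped as written); the function name `copy_latest` and the demo file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

def copy_latest(src_dir, dst_dir):
    """Copy the highest-numbered version of each 'name#version' file."""
    latest = {}  # base name -> (version, path)
    for path in Path(src_dir).iterdir():
        name, sep, version = path.name.partition("#")
        if not sep or not version.isdigit():
            continue  # skip files that do not follow the naming scheme
        if name not in latest or int(version) > latest[name][0]:
            latest[name] = (int(version), path)
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for version, path in latest.values():
        shutil.copy2(path, Path(dst_dir) / path.name)
    return sorted(path.name for _, path in latest.values())

# Demo on throwaway directories with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    src.mkdir()
    for fname in ["car_image#2", "car_image#7", "bike_image#5", "bike_image#10"]:
        (src / fname).touch()
    copied = copy_latest(src, dst)
    print(copied)
```

Versions are compared as integers, so bike_image#10 is treated as newer than bike_image#5, which a plain string comparison would get wrong.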
In Python 3
>>> import random
>>> images = sorted(set(sum([['%s_image#%i' % (nm, random.randint(1,9)) for i in range(random.randint(2,5))] for nm in 'car bike cat dog man tree'.split()], [])))
>>> print('\n'.join(images))
bike_image#2
bike_image#3
bike_image#4
bike_image#5
car_image#2
car_image#7
cat_image#3
dog_image#2
dog_image#5
dog_image#9
man_image#1
man_image#2
man_image#4
man_image#6
man_image#7
tree_image#3
tree_image#4
>>> from collections import defaultdict
>>> image2max = defaultdict(int)
>>> for image in images:
        name, _, version = image.partition('#')
        version = int(version)
        if version > image2max[name]:
            image2max[name] = version
>>> # Max version
>>> for image in sorted(image2max):
        print('%s#%i' % (image, image2max[image]))
bike_image#5
car_image#7
cat_image#3
dog_image#9
man_image#7
tree_image#4
>>>
I have 2 txt files containing different strings and numbers, separated by ;
Now I need to compute
((number on position 2 in file1) - (number on position 25 in file2)) = result
and then replace the (number on position 2 in file1) with the result.
I tried the code below, but it only appends a number to the end of the file, and the appended number is not the result of the calculation.
def calc
  f1 = File.open("./file1.txt", File::RDWR)
  f2 = File.open("./file2.txt", File::RDWR)
  f1.flock(File::LOCK_EX)
  f2.flock(File::LOCK_EX)
  f1.each.zip(f2.each).each do |line, line2|
    bg = line.split(";").compact.collect(&:strip)
    bd = line2.split(";").compact.collect(&:strip)
    n = bd[2].to_i - bg[25].to_i
    f2.print bd[2] << n
    #puts "#{n}" Only for testing
  end
  f1.flock(File::LOCK_UN)
  f2.flock(File::LOCK_UN)
  f1.close && f2.close
end
Use something like this:
lines1 = File.readlines('file1.txt').map(&:to_i)
lines2 = File.readlines('file2.txt').map(&:to_i)
result = lines1.zip(lines2).map { |value1, value2| value1 - value2 }
File.write('file1.txt', result.join(?\n))
This code loads both files into memory, calculates the result, and writes it to the first file.
FYI: if you want to keep your own code, save the result to another file (e.g. result.txt) and copy it over the original file at the end.
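The "write to another file, then replace the original" approach, applied to one field of a ;-separated line, could look like the following. This is a minimal Python sketch rather than Ruby; the function name `subtract_field`, the `.tmp` suffix and the short demo lines are made up for illustration, and positions are counted from 0 as in the question's code.

```python
import os
import tempfile

def subtract_field(path1, path2, pos1, pos2, sep=";"):
    """Replace field pos1 of each line in path1 with (that field - field pos2 of path2)."""
    with open(path1) as f1, open(path2) as f2:
        out_lines = []
        for line1, line2 in zip(f1, f2):
            fields1 = [s.strip() for s in line1.split(sep)]
            fields2 = [s.strip() for s in line2.split(sep)]
            fields1[pos1] = str(int(fields1[pos1]) - int(fields2[pos2]))
            out_lines.append(sep.join(fields1))
    # Write to a temporary file first, then swap it in place of the original.
    tmp_path = path1 + ".tmp"
    with open(tmp_path, "w") as out:
        out.write("\n".join(out_lines) + "\n")
    os.replace(tmp_path, path1)

# Demo with short made-up lines; the positions are 0-based here.
with tempfile.TemporaryDirectory() as tmp:
    p1, p2 = os.path.join(tmp, "file1.txt"), os.path.join(tmp, "file2.txt")
    with open(p1, "w") as f:
        f.write("a;b;10\nc;d;20\n")
    with open(p2, "w") as f:
        f.write("x;3\ny;5\n")
    subtract_field(p1, p2, pos1=2, pos2=1)
    with open(p1) as f:
        result = f.read()
    print(result)
```

For the files in the question the call would presumably use pos1=2 and pos2=25, assuming those positions are 0-based and every line has enough fields.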
I am writing a simple function that reads a sequence of images, re-sizes them and then saves each set of re-sized images to a new folder. Here is my code:
function [ image ] = FrameResize(Folder, ImgType)
    Frames = dir([Folder '/' ImgType]);
    NumFrames = size(Frames,1);
    new_size = 2;
    for i = 1 : NumFrames,
        image = double(imread([Folder '/' Frames(i).name]));
        for j = 2 : 10,
            new_size = power(new_size, j);
            % Creating a new folder called 'Low-Resolution' on the
            % previous directory
            mkdir ('.. Low-Resolution');
            image = imresize(image, [new_size new_size]);
            imwrite(image, 'Low-Resolution');
        end
    end
end
I mainly have two questions:
How can I save those images with specific names, like im_1_64, im_2_64, etc., according to the iteration and the resolution?
How can I make the name of the folder being created change with each iteration, so that images with the same resolution are saved in the same folder?
Since you know the resolution will be: new_size x new_size, you can use this in the imwrite function:
imwrite(image, ['im_' num2str(i) '_' num2str(new_size) '.' ImgType]);
Assuming that ImgType holds the extension.
To set up the folders you can do something like this:
mkdir(num2str(new_size))
cd(num2str(new_size))
imwrite(image, ['im_' num2str(i) '_' num2str(new_size) '.' ImgType]);
cd ..
You have an answer you are satisfied with, but I strongly suggest doing two things differently:
Use fullfile to create/concatenate file and path names.
For example, instead of:
imread([Folder '/' Frames(i).name])
do
imread(fullfile(Folder,Frames(i).name))
It's good for relative paths too:
fullfile('..','Low-Resolution')
ans =
..\Low-Resolution
Use sprintf to create strings containing numerical data from variables. Instead of:
['im_' num2str(i) '_' num2str(new_size) '.' ImgType]
do
sprintf('im_%d_%d.%s', i, new_size, ImgType)
You can even specify how many digits you want per integer. Compare:
K>> sprintf('im_%d_%d.%s', i, new_size, ImgType)
ans =
im_2_64.png
K>> sprintf('im_%02d_%d.%s', i, new_size, ImgType)
ans =
im_02_64.png
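Putting the two suggestions together (a path-joining helper plus a format string with a zero-padded index), the naming scheme can be sketched as follows. This is Python rather than MATLAB, so `pathlib` stands in for fullfile and an f-string for sprintf; the helper name `output_path`, the one-folder-per-resolution layout and the choice of sizes 2^j are assumptions made for illustration.

```python
from pathlib import Path

def output_path(root, index, size, ext):
    """Return root/<size>/im_<index>_<size>.<ext>, e.g. Low-Resolution/64/im_02_64.png."""
    return Path(root) / str(size) / f"im_{index:02d}_{size}.{ext}"

# Two frames, two resolutions (32 and 64 pixels).
paths = [output_path("Low-Resolution", i, 2 ** j, "png")
         for i in (1, 2) for j in (5, 6)]
names = [p.as_posix() for p in paths]
print(names)
```

Keeping the path construction in one small function means the folder layout and the file naming can be changed in a single place.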