How does crypto/rand generate a secure random number? - go

I started to geek out and I wanted to see how https://golang.org/src/crypto/rand/rand_unix.go works on the inside.
I wanted to see when it generates random bytes from /dev/random (more secure) and when from /dev/urandom (less secure).
It looks like if rand_batched.go is initialized (this initializes altGetRandom) and GOOS is not plan9 (so r.name == urandomDevice), it will return the length of the random slice and not its contents (which is surprising: why the length?).
see line 57:
if altGetRandom != nil && r.name == urandomDevice && altGetRandom(b) {
return len(b), nil
}
Otherwise it will simply read the device file and return its contents, and that file is /dev/random only if GOOS=plan9.
So why should it ever return len(b)?
Also, it looks to me like most of the time it will use /dev/urandom, which seems suboptimal... am I wrong? (I guess so, based on the docs, but help me understand.)

altGetRandom is used on systems where there is a system call to get random bytes, rather than depending on the existence of /dev/urandom. This is sometimes useful in special environments (a chroot where there is no /dev, Docker-ish systems with weird/wrong /dev, FreeBSD jails with an incorrect /dev setup, etc.), and also is a bit faster than opening the file (as it does not go through quite as many system call layers), though in general one should just use the file.
The call in question is in an io.Reader-style function, whose job is to return the length of the block of bytes read, and any error. When using the system call, the OS fills in—or is assumed to fill in—the array b completely, so len(b) is the correct result.
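For a concrete feel of that contract from the caller's side, here is a minimal sketch using the exported crypto/rand.Read (the unexported helpers discussed above follow the same fill-the-buffer-and-return-len(b) contract, so Read stands in for them):
package main

import (
	"crypto/rand"
	"fmt"
)

// readRandom follows the io.Reader contract: fill b completely and
// report how many bytes were written, which on success is simply len(b).
func readRandom(b []byte) (int, error) {
	return rand.Read(b)
}

func main() {
	b := make([]byte, 16)
	n, err := readRandom(b)
	fmt.Println(n, err) // prints "16 <nil>": the whole buffer was filled
}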

Related

Darwin Streaming Server install problems OS X

My problem is the same as the one mentioned in this answer. I've been trying to understand the code and this is what I learned:
It is failing in the file parse_xml.cgi, which tries to get messages (return $message{$name}) from a file named messages (located in the html_en directory).
The $messages value comes from the method GetMessageHash in the file adminprotocol-lib.pl:
sub GetMessageHash
{
return $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"}
}
The $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} is set in the file streamingadminserver.pl:
$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} = $messages{"en"}
I don't know anything about Perl, so I have no idea what the problem could be. From what I saw, $messages{"en"} has the correct value (if I do print($messages{"en"}{'SunStr'}) I get the value "Sun").
However, if I try to do print($ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"}{'SunStr'}) I get nothing. It seems like $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} is not set.
I tried this simple example:
$ENV{"HELLO"} = "hello";
print($ENV{"HELLO"});
and it works fine, prints "hello".
Any idea of what the problem can be?
Looks like $messages{"en"} is a HashRef: A pointer to some memory address holding a key-value-store. You could even print the associated memory address:
perl -le 'my $hashref = {}; print $hashref;'
HASH(0x1548e78)
0x1548e78 is the address, but it's only valid within the same running process. Re-run the sample command and you'll get different addresses each time.
HASH(0x1548e78) is also just a human-readable representation of the real stored value. Setting $hashref2="HASH(0x1548e78)"; won't create a real reference, just a copy of the human-readable string.
You could easily prove this theory using print $ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} in both scripts.
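A quick way to see that stringification in an isolated throwaway script (the key name MSG is just a placeholder):
use strict;
use warnings;

my $h = { SunStr => "Sun" };
$ENV{MSG} = $h;                        # %ENV can only store strings, so $h is stringified here
print $ENV{MSG}, "\n";                 # prints something like HASH(0x55d3f0a12345)
print "ref() says: '", ref($ENV{MSG}), "'\n";   # empty: the value is a plain string, not a reference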
Data::Dumper is typically used to show the contents of the referenced hash (memory location):
use Data::Dumper;
print Dumper($messages{"en"});
# or
print Dumper($ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"});
This will also show if the pointer/reference could be dereferenced in both scripts.
The solution for your problem is probably passing the value instead of the HashRef:
$ENV{"QTSSADMINSERVER_EN_SUN"} = $messages{"en"}->{SunStr};
Best practice is to use a -> between both keys. The " or ' quotes around the key are also optional if the key is a plain word.
But passing everything through environment variables feels wrong. Environment variables can only hold plain strings, so a reference is stringified as soon as it is assigned to %ENV. You might want to extract the string storage into an include file and load it via require.
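A minimal sketch of that require approach (the file name messages_en.pl and the variable %messages_en are assumed names, not part of the DSS code):
# messages_en.pl -- a plain include file holding the strings
our %messages_en = (
    SunStr => "Sun",
    MonStr => "Mon",
);
1;    # a required file must end with a true value

# in any script that needs the strings:
use strict;
use warnings;
require "./messages_en.pl";
our %messages_en;
print $messages_en{SunStr}, "\n";    # prints "Sun"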
See http://www.perlmaven.com/ or http://learn.perl.org for more about Perl.
fix code (note that $$ENV{...} is ${$ENV}{...}: it uses the scalar variable $ENV as a hash reference, so the message hashref is kept in an ordinary Perl variable rather than being stringified into the real %ENV):
$$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"} = $messages{"en"};
sub GetMessageHash
{
return $$ENV{"QTSSADMINSERVER_EN_MESSAGEHASH"};
}
ref:
https://github.com/guangbin79/dss6.0.3-linux-patch

How To Deal With MaxLineLengthExceeded With Indy TIdTCPClient

I'm new to Indy, using whatever version comes with C++Builder XE4. Here's the very simple code that works fine until what I'm reading exceeds the 16K limit....
String Ttcp_mgr::send(String data)
{
tcpClient->Socket->WriteLn(data);
return tcpClient->Socket->ReadLn();
}
The server is not using Indy; there is no length header, and the data going both ways is JSON terminated by \r\n. Blocking reads are fine: there's nothing for my app to do until it gets its response, and the response will be coming very quickly anyway. But the amount of data returned could be a few bytes, or 100K in a few cases. Generally the length will be < 500 bytes.
I've looked at IOHandler but I have no idea how to apply it to what I'm doing; I'm not even sure it's what I need. As you can probably tell, I'm not using the component on a form, which probably makes no difference.
TIdIOHandler::ReadLn() has an optional AMaxLineLength input parameter. If you do not specify a value for it, the TIdIOHandler::MaxLineLength property is used, which is set to 16K by default. The TIdIOHandler::MaxLineAction property specifies what happens if ReadLn() actually reaches the max line length.
If MaxLineAction is maException (the default), an EIdReadLnMaxLineLengthExceeded exception is raised.
If MaxLineAction is maSplit, the TIdIOHandler::ReadLnSplit property is set to true and ReadLn() returns what it can. You would have to call ReadLn() again to read more data for the current line. This can end up chopping the data incorrectly if it is using a multi-byte encoding for non-ASCII characters, like UTF-8 (which is JSON's default encoding), so I do not recommend this approach.
In your case, you should either:
set the TIdIOHandler::MaxLineLength property to MaxInt:
// TIdTCPClient::OnConnected event handler...
void __fastcall Ttcp_mgr::tcpClientConnected(TObject *Sender)
{
tcpClient->IOHandler->MaxLineLength = MaxInt;
}
or pass MaxInt as a parameter to TIdIOHandler::ReadLn():
String Ttcp_mgr::send(String data)
{
tcpClient->Socket->WriteLn(data);
return tcpClient->Socket->ReadLn(EOL, IdTimeoutDefault, MaxInt);
}
To all Delphi users: setting IOHandler.MaxLineLength to MaxInt must be done after starting the connection, otherwise you will get a memory error (by default the IOHandler does not exist until the connection has been opened).
IdPOP31.Connect;
IdPOP31.IOHandler.MaxLineLength := MaxInt;
Everything else works as above, and this solves the problem of parsing overly long emails in Delphi.

confirm conditional statement applies to >0 observations in Stata

This is something that has puzzled me for some time and I have yet to find an answer.
I am in a situation where I am applying a standardized data cleaning process to (supposedly) similarly structured files, one file for each year. I have a statement such as the following:
replace field="Plant" if field=="Plant & Machinery"
Which was a result of the original code-writing, based on the data file for year 1. Then I generalize the code to loop through the years of data. The problem arises if, in year 3, the analogous value in that variable was coded as "Plant and MachInery " instead: the line above would then not make the intended change because of the difference in the text string, but it would not produce an error alerting me that the change was not made.
What I am after is some sort of confirmation that >0 observations actually satisfied the condition each time the code is executed in the loop, and otherwise an error. Any combination of trimming, removing spaces, and standardizing the text case is not a workaround option. At the same time, I don't want to add a count if and then an assert statement before every conditional replace, as that becomes quite bulky.
Aside from going to the raw files to ensure the variable values are standardized, is there any way to do this validation "on the fly" as I have tried to describe? Maybe just write a custom program that combines a count if, assert and replace?
The idea has surfaced occasionally that replace should return the number of observations changed, but there are good reasons why not, notably that it is not an r-class or e-class command anyway, and it's quite important not to change the way it works because that could break innumerable programs and do-files.
So, I think the essence of any answer is that you have to set up your own monitoring process counting how many values have (or would be) changed.
One pattern, when working on a current variable, is:
gen was = .
foreach ... {
    ...
    replace was = current
    replace current = ...
    qui count if was != current
    <use the result>
}
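Applied to the replace in the question, a minimal sketch of that monitoring pattern could look like this; it aborts the do-file with an error if no observations were changed:
gen was = field
replace field = "Plant" if field == "Plant & Machinery"
quietly count if was != field
assert r(N) > 0    // stop with an error if no observations were actually changed
drop was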

How can I check if a string is a valid file name for windows using R?

I've been writing a program in R that outputs randomization schemes for a research project I'm working on with a few other people this summer, and I'm done with the majority of it, except for one feature. Part of what I've been doing is making it really user friendly, so that the program will prompt the user for certain pieces of information, and therefore know what needs to be randomized. I have it set up to check every piece of user input to make sure it's a valid input, and give an error message/prompt the user again if it's not. The only thing I can't quite figure out is how to get it to check whether or not the file name for the .csv output is valid. Does anyone know if there is a way to get R to check if a string makes a valid Windows file name? Thanks!
These characters aren't allowed: /\:*?"<>|. So warn the user if it contains any of those.
Some other names are also disallowed: CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9.
You probably want to check that the filename is valid using a regular expression. See this other answer for a Java example that should take minimal tweaking to work in R.
https://stackoverflow.com/a/6804755/134830
You may also want to check the filename length (260 characters for maximum portability, though longer names are allowed on some systems).
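Putting those rules together, a minimal sketch of such a check in R (the function name and exact rule set here are assumptions based on the rules above, not an established library function):
is_valid_windows_filename <- function(filename)
{
  # reject the forbidden characters / \ : * ? " < > | and control characters in the file part
  if(grepl('[<>:"/\\\\|?*[:cntrl:]]', basename(filename))) return(FALSE)
  # reject reserved device names (with or without an extension), case-insensitively
  reserved <- c("CON", "PRN", "AUX", "NUL", paste0("COM", 1:9), paste0("LPT", 1:9))
  if(toupper(sub("\\..*$", "", basename(filename))) %in% reserved) return(FALSE)
  # keep the whole path under the classic 260-character limit
  nchar(filename) <= 260
}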
Finally, in R, if you try to create a file in a directory that doesn't exist, it will fail, so you need to split the name into the file name and the directory name (using basename and dirname) and try to create the directory first, if necessary.
That said, David Heffernan gives good advice in his comment to let Windows do the work in deciding whether or not it can create the file: you don't want to erroneously tell the user that a filename is invalid.
You want something a little like this:
nice_file_create <- function(filename)
{
  directory_name <- dirname(filename)
  if(!file.exists(directory_name))
  {
    ok <- dir.create(directory_name)
    if(!ok)
    {
      warning("The directory of that path could not be created.")
      return(invisible())
    }
  }
  tryCatch(
    file.create(filename),
    error = function(e)
    {
      warning("The file could not be created.")
    }
  )
}
But test it thoroughly first! There are all sorts of edge cases where things can fall over: try UNC network path names, "~", and paths with "." and ".." in them.
I'd suggest that the easiest way to make sure a filename is valid is to use fs::path_sanitize().
It removes control characters, reserved characters, and Windows-reserved filenames, truncating the string at 255 bytes in length.
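For example, a minimal usage sketch (the file name here is made up, and the sanitized result depends on the input):
library(fs)
safe_name <- path_sanitize('plant & machinery: year 3?.csv', replacement = "_")
# the reserved characters are replaced, so the result can safely be used as a file name
write.csv(data.frame(x = 1), file = safe_name)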

Should I get a habit of removing unused variables in R?

Currently I'm working with relatively large data files, and my computer is not a supercomputer. I'm temporarily creating many subsets of these data sets and don't remove them from the workspace. Obviously those are making a clutter of many variables. But is there any effect of having many unused variables on the performance of R? (i.e. does the computer's memory fill up at some point?)
When writing code, should I get into the habit of removing unused variables? Is it worth it?
x <- rnorm(1e8)
y <- mean(x)
# After this point I will not use x anymore, but I will use y
# Should I add following line to my code? or
# Maybe there will not be any performance lag if I skip the following line:
rm(x)
I don't want to add another line to my code. Instead of my code to seem cluttered I prefer my workspace to be cluttered (if there will be no performance improvement).
Yes, having unused objects will affect your performance, since R stores all its objects in memory. Obviously small objects will have negligible impact, and you mostly need to remove only the really big ones (data frames with millions of rows, etc.), but having an uncluttered workspace won't hurt anything.
The only risk is removing something that you need later. Even when using a repo, as suggested, breaking stuff accidentally is something you want to avoid.
One way to get around these issues is to make extensive use of local. When you do a computation that scatters around lots of temporary objects, you can wrap it inside a local call, which will effectively dispose of those objects for you afterward. No more having to clean up lots of i, j, x, temp.var, and whatnot.
local({
    x <- something
    for(i in seq_along(obj))
        temp <- some_unvectorised_function(obj[[i]], x)
    for(j in 1:temp)
        temp2 <- some_other_unvectorised_function(temp, j)
    # x, i, j, temp, temp2 only exist for the duration of local(...)
})
Adding to the above suggestions, and to assist beginners like me, here is a list of steps to check on R memory (see the short sketch after this list):
List the objects in the workspace using ls().
Check the size of the objects of interest using object.size(object_name).
Remove unused/unnecessary objects using rm("object_name").
Run gc() to trigger garbage collection.
Check the memory in use with memory.size() (Windows only).
If you are starting a new session, use rm(list=ls()) followed by gc().
If you feel that the habit of removing unused variables can be dangerous, it is always good practice to save the objects into R images (.RData files) occasionally.
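A short sketch of those steps, reusing the rnorm example from the question:
x <- rnorm(1e8)    # a large object, as in the question
object.size(x)     # how much memory it occupies
rm(x)              # remove it once it is no longer needed
gc()               # run garbage collection and report memory usage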
I think it's a good programming practice to remove unused code, regardless of language.
It's also a good practice to use a version control system like Subversion or Git to track your change history. If you do that you can remove code without fear, because it's always possible to roll back to earlier versions if you need to.
That's fundamental to professional coding.
Show distribution of the largest objects and return their names, based on #Peter Raynham:
memory.biggest.objects <- function(n=10) { # Show distribution of the largest objects and return their names
  Sizes.of.objects.in.mem <- sapply(ls(envir = .GlobalEnv), FUN = function(name) { object.size(get(name)) })
  topX <- sort(Sizes.of.objects.in.mem, decreasing = TRUE)[1:n]
  Memory.usage.stat <- c(topX, 'Other' = sum(sort(Sizes.of.objects.in.mem, decreasing = TRUE)[-(1:n)]))
  pie(Memory.usage.stat, cex = .5, sub = make.names(date()))
  # wpie(Memory.usage.stat, cex = .5)
  # Use wpie if you have MarkdownReports, from https://github.com/vertesy/MarkdownReports
  print(topX)
  print("rm(list=c( 'objectA', 'objectB'))")
  # inline_vec.char(names(topX))
  # Use inline_vec.char if you have DataInCode, from https://github.com/vertesy/DataInCode
}
