Golang: hide flag options from PrintDefaults

This is my actual code:
package main

import (
    "flag"
)

var loadList = ""
var threads = 50
var skip = 0

func main() {
    // default variables
    flag.StringVar(&loadList, "f", "", "load links list file (required)")
    flag.IntVar(&threads, "t", 50, "run `N` attempts in parallel threads")
    flag.IntVar(&skip, "l", 0, "skip first `n` lines of input")
    flag.Parse()
    flag.PrintDefaults()
}
and this is the output:
-f string
      load links list file (required)
-l n
      skip first n lines of input
-t N
      run N attempts in parallel threads (default 50)
I want to hide -l and -t from PrintDefaults. How can I do this?

There are multiple ways of doing this. An easy one is to use VisitAll:
func VisitAll(fn func(*Flag))
In the function you pass, you can decide whether or not to output a flag based on any of the exported fields of Flag.
Example:
flag.VisitAll(func(f *flag.Flag) {
    if f.Name == "l" || f.Name == "t" {
        return
    }
    fmt.Println("Flag: ", f)
})
Run it at: https://play.golang.org/p/rsrKgWeAQf

Related

How to loop through a folder with conditional statement in Stata?

I have a folder with a bunch of csv files, and I want to loop through each file, check whether a list of variables in each file equals 0, and save the csv file as a .dta if they do not equal 0.
I'm having trouble finding online help for this, but here's what I've tried so far:
foreach file in `files' {
    import delimited using `file', clear
    if a & b & c & d != 0
    save "Desktop\myfolder\`file'.dta"
}
When I try this, though, Stata returns the error "{ required r(100)".
Any help appreciated. Thanks.
Stealing some code from the estimable @Wouter Wakker, let's first suppose that the criterion is that a non-zero value is found somewhere in a b c d:
foreach file in `files' {
    import delimited using `file', clear
    local OK = 0
    quietly foreach v in a b c d {
        count if `v' != 0
        if r(N) > 0 local OK = 1
    }
    if `OK' save "Desktop/myfolder/`file'.dta"
}
Whatever your precise criterion, I think you need to loop over a b c d and (say) count or summarize according to what you want or do not want.
From help ifcmd:
Syntax
    if exp single_command
or
    if exp {
        multiple_commands
    }
So you can do either
foreach file in `files' {
    import delimited using `file', clear
    if a & b & c & d != 0 save "Desktop\myfolder\`file'.dta"
}
or
foreach file in `files' {
    import delimited using `file', clear
    if a & b & c & d != 0 {
        save "Desktop\myfolder\`file'.dta"
    }
}
However, I don't think your if condition does what you think it does. What you're looking for would rather be:
if a != 0 & b != 0 & c != 0 & d != 0

Hidden flag default values

Go provides easy CLI switches aka flags.
var debug = flag.Bool("debug", false, "enable debugging")
var hostname = flag.String("hostname", "127.0.0.1", "hostname")
flag.Parse()
As expected, this yields:
> ./program -h
Usage:
  -debug
        enable debugging
  -hostname string
        hostname (default "127.0.0.1")
I would like to hide the (default "127.0.0.1") part of specific flags.
Searching on SO and around suggested the use of flag.FlagSet:
var shown flag.FlagSet
var hidden flag.FlagSet

var debug = shown.Bool("debug", false, "enable debugging")
var hostname = hidden.String("hostname", "127.0.0.1", "hostname")

flag.Usage = func() {
    shown.PrintDefaults()
}
flag.Parse()
//shown.Parse(os.Args[0:]) // tried to solve "flag provided but not defined"
The output now shows only the "debug" flag; however, this breaks actual flag parsing:
> ./program -debug
flag provided but not defined: -debug
Usage of ./program:
  -debug
        enable debugging
And this is not ideal either, since I would like to see the available flag, just hide the default value.
Desired output:
> ./program -h
Usage:
  -debug
        enable debugging
  -hostname string
        hostname
Best solution so far is the one Eugene proposed. Thanks!
var debug = flag.Bool("debug", false, "enable debugging")
var hostname = flag.String("hostname", "", "hostname")
flag.Parse()

defaultHostname := "127.0.0.1"
if *hostname == "" {
    *hostname = defaultHostname
}
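This works because PrintDefaults omits the "(default ...)" note when a flag's default is the zero value for its type, so registering with "" and filling in the real default after Parse hides it. A complete sketch (the `orDefault` helper name is my own):

```go
package main

import (
	"flag"
	"fmt"
)

// orDefault applies a fallback after flag.Parse, so the real default
// never appears in the usage text.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

func main() {
	debug := flag.Bool("debug", false, "enable debugging")
	// Register with an empty default: PrintDefaults omits the
	// "(default ...)" suffix for zero-valued string flags.
	hostname := flag.String("hostname", "", "hostname")
	flag.Parse()

	host := orDefault(*hostname, "127.0.0.1")
	fmt.Println(*debug, host)
}
```

The trade-off: code that inspects the Flag's DefValue field will see "" rather than the effective default.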
You can just copy and paste the code from the flag package source and remove the part that prints the default:
flag.Usage = func() {
    f := flag.CommandLine
    _, _ = fmt.Fprintf(f.Output(), "Usage of %s:\n", os.Args[0])
    flag.VisitAll(func(flag_ *flag.Flag) {
        if flag_.Usage == "" {
            return
        }
        s := fmt.Sprintf("  -%s", flag_.Name) // Two spaces before -; see next two comments.
        name, usage := flag.UnquoteUsage(flag_)
        if len(name) > 0 {
            s += " " + name
        }
        // Boolean flags of one ASCII letter are so common we
        // treat them specially, putting their usage on the same line.
        if len(s) <= 4 { // space, space, '-', 'x'.
            s += "\t"
        } else {
            // Four spaces before the tab triggers good alignment
            // for both 4- and 8-space tab stops.
            s += "\n    \t"
        }
        s += strings.ReplaceAll(usage, "\n", "\n    \t")
        _, _ = fmt.Fprint(f.Output(), s, "\n")
    })
}
flag.Parse()
flag.Parse()

Parsing file, ignoring comments and blank lines

As the title says, I am trying to parse a file while ignoring comments (starting with #) and blank lines. I have tried to write a check for this, yet the output still includes the comments and/or blank lines.
lines := strings.Split(d, "\n")
var output map[string]bool = make(map[string]bool)
for _, line := range lines {
    if strings.HasPrefix(line, "#") != true {
        output[line] = true
    } else if len(line) > 0 {
        output[line] = true
    }
}
When run (this is part of a function), it outputs the following.
This is the input (the d variable):
Minecraft
Zerg Rush
Pokemon
# Hello
This is the output when printed (the output variable):
map[Minecraft:true Zerg Rush:true Pokemon:true :true # Hello:true]
My issue here is that it still keeps the "" and "# Hello" values, meaning that something failed, something I haven't been able to figure out.
So, what am I doing wrong that makes this keep the improper values?
len(line) > 0 will be true for the "# Hello" line, so it will get added to output.
Currently, you are adding lines that either don't start with a # or are not empty. You need to only add lines that satisfy both conditions:
if !strings.HasPrefix(line, "#") && len(line) > 0 {
    output[line] = true
}
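The corrected condition as a small self-contained function (the `parseLines` name is mine):

```go
package main

import (
	"fmt"
	"strings"
)

// parseLines returns the set of lines in d that are neither blank
// nor comments (lines starting with "#").
func parseLines(d string) map[string]bool {
	output := make(map[string]bool)
	for _, line := range strings.Split(d, "\n") {
		// Keep a line only if it satisfies BOTH conditions.
		if !strings.HasPrefix(line, "#") && len(line) > 0 {
			output[line] = true
		}
	}
	return output
}

func main() {
	d := "Minecraft\nZerg Rush\nPokemon\n\n# Hello"
	fmt.Println(parseLines(d))
	// prints: map[Minecraft:true Pokemon:true Zerg Rush:true]
}
```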

How to define new commands or macros in awk

I'd like to define a new command that wraps an existing awk command, such as print. However, I do not want to use a function:
# wrap command with a function
function warn(text) { print text > "/dev/stderr" }
NR % 1e6 == 0 {
    warn("processed rows: " NR)
}
Instead, I'd like to define a new command that can be invoked without brackets:
# wrap command with a new command ???
define warn rest... { print rest... > "/dev/stderr" }
NR % 1e6 == 0 {
    warn "processed rows: " NR
}
One solution I can imagine is using a preprocessor, perhaps setting up the shebang of the awk script to invoke this preprocessor followed by awk. However, I was hoping for a pure awk solution.
Note: The solution should also work in mawk, which I use, because it is much faster than vanilla GNU/awk.
Update: The discussion revealed that gawk (GNU/awk) can be quite fast and mawk is not required.
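The preprocessor idea mentioned above can be sketched in a few lines. Here sed stands in for a real macro processor, and the rewrite rule is deliberately naive (it assumes `warn EXPR` is the last statement before a closing brace); file names are illustrative only:

```shell
# Write an awk script that uses the pseudo-command "warn".
cat > prog.awk <<'EOF'
NR % 5 == 0 { warn "processed rows: "NR }
EOF

# Naive preprocessor: rewrite 'warn EXPR }' into valid awk that
# prints EXPR to standard error.
sed 's/warn \(.*\)}/print \1 > "\/dev\/stderr" }/' prog.awk > prog.gen.awk

# Ten input lines -> two warnings on stderr.
printf 'a\nb\nc\nd\ne\nf\ng\nh\ni\nj\n' | awk -f prog.gen.awk
```

A robust version would need a real parser, which is exactly the objection raised in the answers below; cppawk (further down) takes the more principled route of reusing the C preprocessor.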
You cannot do this within any awk, and you cannot do it robustly outside of awk without writing an awk language parser. By that point you may as well write your own awk-like command, which would then no longer really be awk, inasmuch as it would not behave the same as any other command by that name.
It is odd that you refer to GNU awk as "vanilla" when it has many more useful features than any other currently available awk while mawk is simply a stripped down awk optimized for speed which is only necessary in very rare circumstances.
Looking at Mawk's source I see that commands are special and cannot be added at runtime. From kw.c:
keywords[] =
{
    { "print",      PRINT },
    { "printf",     PRINTF },
    { "do",         DO },
    { "while",      WHILE },
    { "for",        FOR },
    { "break",      BREAK },
    { "continue",   CONTINUE },
    { "if",         IF },
    { "else",       ELSE },
    { "in",         IN },
    { "delete",     DELETE },
    { "split",      SPLIT },
    { "match",      MATCH_FUNC },
    { "BEGIN",      BEGIN },
    { "END",        END },
    { "exit",       EXIT },
    { "next",       NEXT },
    { "nextfile",   NEXTFILE },
    { "return",     RETURN },
    { "getline",    GETLINE },
    { "sub",        SUB },
    { "gsub",       GSUB },
    { "function",   FUNCTION },
    { (char *) 0,   0 }
};
You could add a new command by patching Mawk's C code.
I created a shell wrapper script called cppawk which combines the C preprocessor (from GCC) with Awk.
BSD licensed, it comes with a man page, regression tests and simple install instructions.
Normally, the C preprocessor creates macros that look like functions; but using certain control-flow tricks, which work in Awk much as they do in C, we can pull off minor miracles of syntactic sugar:
function __warn(x)
{
    print x
    return 0
}

#define warn for (__w = 1; __w; __w = __warn(__x)) __x =

NR % 5 == 0 {
    warn "processed rows: "NR
}
Run:
$ cppawk -f warn.cwk
a
b
c
d
e
processed rows: 5
f
g
h
i
j
processed rows: 10
k
Because the entire for trick is in a single line of code, we could use the __LINE__ symbol to make the hidden variables quasi-unique:
function __warn(x)
{
    print x
    return 0
}

#define xcat(a, b, c) a ## b ## c
#define cat(a, b, c) xcat(a, b, c)
#define uq(sym) cat(__, __LINE__, sym)
#define warn for (uq(w) = 1; uq(w); uq(w) = __warn(uq(x))) uq(x) =

NR % 5 == 0 {
    warn "processed rows: "NR
}
The expansion is:
$ cppawk --prepro-only -f warn.cwk
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "<stdin>"
function __warn(x)
{
    print x
    return 0
}
NR % 5 == 0 {
    for (__13w = 1; __13w; __13w = __warn(__13x)) __13x = "processed rows: "NR
}
The uq() macro interpolated 13 into the variable names because warn is called on line 13.
Hope you like it.
PS, maybe don't do this, but find some less hacky way of using cppawk.
You can use C99/GNU C variadic macros, for instance:
#define warn(...) print __VA_ARGS__ >> "/dev/stderr"

NR % 5 == 0 {
    warn("processed rows:", NR)
}
We made a humble print wrapper which redirects to standard error. It seems like nothing, yet you can't do that with an Awk function: not without making it a one-argument function and passing the value of an expression that concatenates everything.

R: tm Textmining package: Doc-Level metadata generation is slow

I have a list of documents to process, and for each record I want to attach some metadata to the document "member" inside the "corpus" data structure that tm, the R package, generates (from reading in text files).
This for-loop works, but it is very slow. Performance seems to degrade as a function f ~ 1/n_docs.
for (i in seq(from = 1, to = length(corpus), by = 1)) {
    if (opts$options$verbose == TRUE || i %% 50 == 0) {
        print(paste(i, " ", substr(corpus[[i]], 1, 140), sep = " "))
    }
    DublinCore(corpus[[i]], "title") = csv[[i,10]]
    DublinCore(corpus[[i]], "Publisher") = csv[[i,16]] # institutions
}
This may do something to the corpus variable but I don't know what.
But when I put it inside tm_map() (similar to the lapply() function), it runs much faster, though the changes are not made persistent:
i = 0
corpus = tm_map(corpus, function(x) {
    i <<- i + 1
    if (opts$options$verbose == TRUE) {
        print(paste(i, " ", substr(x, 1, 140), sep = " "))
    }
    meta(x, tag = "Heading") = csv[[i,10]]
    meta(x, tag = "publisher") = csv[[i,16]]
})
The corpus variable has empty metadata fields after exiting tm_map; they should be filled. I have a few other things to do with the collection.
The R documentation for the meta() function says this:
Examples:
data("crude")
meta(crude[[1]])
DublinCore(crude[[1]])
meta(crude[[1]], tag = "Topics")
meta(crude[[1]], tag = "Comment") <- "A short comment."
meta(crude[[1]], tag = "Topics") <- NULL
DublinCore(crude[[1]], tag = "creator") <- "Ano Nymous"
DublinCore(crude[[1]], tag = "Format") <- "XML"
DublinCore(crude[[1]])
meta(crude[[1]])
meta(crude)
meta(crude, type = "corpus")
meta(crude, "labels") <- 21:40
meta(crude)
I tried many of these calls (with the variable corpus instead of crude), but they do not seem to work.
Someone else seems to have had the same problem with a similar data set (a forum post from 2009, no response).
Here's a bit of benchmarking...
With the for loop :
expr.for <- function() {
    for (i in seq(from = 1, to = length(corpus), by = 1)) {
        DublinCore(corpus[[i]], "title") = LETTERS[round(runif(26))]
        DublinCore(corpus[[i]], "Publisher") = LETTERS[round(runif(26))]
    }
}
microbenchmark(expr.for())
# Unit: milliseconds
# expr min lq median uq max
# 1 expr.for() 21.50504 22.40111 23.56246 23.90446 70.12398
With tm_map :
corpus <- crude
expr.map <- function() {
    tm_map(corpus, function(x) {
        meta(x, "title") = LETTERS[round(runif(26))]
        meta(x, "Publisher") = LETTERS[round(runif(26))]
        x
    })
}
microbenchmark(expr.map())
# Unit: milliseconds
# expr min lq median uq max
# 1 expr.map() 5.575842 5.700616 5.796284 5.886589 8.753482
So the tm_map version, as you noticed, is about 4 times faster.
In your question you say that the changes in the tm_map version are not persistent. That is because you don't return x at the end of your anonymous function. It should end with:
meta(x, tag = "Heading") = csv[[i,10]]
meta(x, tag = "publisher" ) = csv[[i,16]]
x
