How to view full dependency tree for nested Go dependencies - go

I'm trying to debug the following build error in our CI where "A depends on B which can't build because it depends on C." I'm building my data service which doesn't directly depend on kafkaAvailMonitor.go which makes this error hard to trace. In other words:
data (what I'm building) depends on (?) which depends on
kafkaAvailMonitor.go
For a developer this may seem trivial to fix (they just run "go get whatever"), but I can't do that as part of the release process. I have to find the person who added the dependency and ask them to fix it.
I'm aware that there are tools to visualize the dependency tree and other more sophisticated build systems, but this seems like a pretty basic issue: is there any way I can view the full dependency tree to see what's causing the build issue?
go build -a -v
../../../msgq/kafkaAvailMonitor.go:8:2: cannot find package
"github.com/Shopify/sarama/tz/breaker" in any of:
/usr/lib/go-1.6/src/github.com/Shopify/sarama/tz/breaker (from $GOROOT)
/home/jenkins/go/src/github.com/Shopify/sarama/tz/breaker (from $GOPATH)
/home/jenkins/vendor-library/src/github.com/Shopify/sarama/tz/breaker
/home/jenkins/go/src/github.com/Shopify/sarama/tz/breaker
/home/jenkins/vendor-library/src/github.com/Shopify/sarama/tz/breaker

When using modules you may be able to get what you need from go mod graph.
usage: go mod graph
Graph prints the module requirement graph (with replacements applied)
in text form. Each line in the output has two space-separated fields: a module
and one of its requirements. Each module is identified as a string of the form
path@version, except for the main module, which has no @version suffix.
I.e., for the original question, run go mod graph | grep github.com/Shopify/sarama then look more closely at each entry on the left-hand side.
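If grep returns many lines, the same lookup can be scripted. The following is a minimal sketch, not part of the original answer: it assumes the text of go mod graph has been captured into a string, and the module names in the sample (example.com/data, example.com/msgq) are made up for illustration. It searches breadth-first for the shortest requirement chain from the main module to any version of a target module.

```python
from collections import deque


def parse_mod_graph(text):
    """Parse `go mod graph` output into {module: [requirements]}."""
    graph = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        parent, child = parts
        graph.setdefault(parent, []).append(child)
    return graph


def find_chain(graph, root, target_path):
    """Return the shortest chain of modules from root to any version of
    target_path, or None if the target is not reachable."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        chain = queue.popleft()
        node = chain[-1]
        if node.split("@")[0] == target_path:
            return chain
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(chain + [child])
    return None


# Hypothetical sample in the same "module requirement" format.
sample = """\
example.com/data example.com/msgq@v0.1.0
example.com/data golang.org/x/sys@v0.1.0
example.com/msgq@v0.1.0 github.com/Shopify/sarama@v1.19.0
"""

chain = find_chain(parse_mod_graph(sample), "example.com/data",
                   "github.com/Shopify/sarama")
print(" -> ".join(chain))
```

The first element of the chain is the main module and the element just before the target is the module that requires it directly.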

if the following isn't a stack trace what is it?
It is the list of paths where Go is looking for your missing package.
I have no idea who is importing kafkaAvailMonitor.go
It is not "imported", just part of your sources and compiled.
Except it cannot compile, because it needs github.com/Shopify/sarama/tz/breaker, which is not in GOROOT or GOPATH.
Still, check what go list would return on your direct package, to see if kafkaAvailMonitor is mentioned.
go list can show both the packages that your package directly depends on and its complete set of transitive dependencies.
% go list -f '{{ .Imports }}' github.com/davecheney/profile
[io/ioutil log os os/signal path/filepath runtime runtime/pprof]
% go list -f '{{ .Deps }}' github.com/davecheney/profile
[bufio bytes errors fmt io io/ioutil log math os os/signal path/filepath reflect run
You can then script go list in order to list all dependencies.
See this bash script for instance, by Noel Cower (nilium)
#!/usr/bin/env bash
# Usage: lsdep [PACKAGE...]
#
# Example (list github.com/foo/bar and package dir deps [the . argument])
# $ lsdep github.com/foo/bar .
#
# By default, this will list dependencies (imports), test imports, and test
# dependencies (imports made by test imports). You can recurse further by
# setting TESTIMPORTS to an integer greater than one, or to skip test
# dependencies, set TESTIMPORTS to 0 or a negative integer.
: "${TESTIMPORTS:=1}"
lsdep_impl__ () {
    local txtestimps='{{range $v := .TestImports}}{{print . "\n"}}{{end}}'
    local txdeps='{{range $v := .Deps}}{{print . "\n"}}{{end}}'
    {
        go list -f "${txtestimps}${txdeps}" "$@"
        if [[ -n "${TESTIMPORTS}" ]] && [[ "${TESTIMPORTS:-1}" -gt 0 ]]
        then
            go list -f "${txtestimps}" "$@" |
            sort | uniq |
            comm -23 - <(go list std | sort) |
            TESTIMPORTS=$((TESTIMPORTS - 1)) xargs bash -c 'lsdep_impl__ "$@"' "$0"
        fi
    } |
    sort | uniq |
    comm -23 - <(go list std | sort)
}
export -f lsdep_impl__
lsdep_impl__ "$@"
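The flat list produced above can also be turned into an indented tree. This is only a sketch under assumptions: it expects one line per package in the form "import-path import import ...", which on Go 1.11 or later could come from something like go list -deps -f '{{.ImportPath}} {{join .Imports " "}}' ./... (the -deps flag does not exist in older toolchains such as the Go 1.6 in the question). The package names in the sample are shortened, illustrative paths.

```python
def parse_imports(text):
    """Each line is "<import path> <import> <import> ...";
    returns {package: [direct imports]}."""
    edges = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            edges[fields[0]] = fields[1:]
    return edges


def render_tree(edges, root, skip=frozenset()):
    """Depth-first walk from root. A package that was already expanded is
    printed again but marked, so shared subtrees are not repeated."""
    lines = []
    seen = set()

    def walk(pkg, depth):
        marker = " (seen above)" if pkg in seen else ""
        lines.append("  " * depth + pkg + marker)
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in edges.get(pkg, []):
            if dep not in skip:
                walk(dep, depth + 1)

    walk(root, 0)
    return lines


# Illustrative input, not real output from the project in the question.
sample = """\
npd/auth npd/auth/service npd/middleware
npd/auth/service npd/kafka
npd/middleware npd/kafka
npd/kafka github.com/Shopify/sarama
"""

for line in render_tree(parse_imports(sample), "npd/auth"):
    print(line)
```

Passing the standard-library package names as skip (for example, the output of go list std) keeps the tree focused on third-party and in-house packages.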

go mod why can also help. It cannot display the whole tree, but it can trace a single branch from a child dependency back up to the root.
Example:
$ go mod why github.com/childdep
# github.com/childdep
github.com/arepo.git/service
github.com/arepo.git/service.test
github.com/anotherrepo.git/mocks
github.com/childdep
That means 'childdep' is ultimately imported by 'anotherrepo.git/mocks'.

You can try deptree: https://github.com/vc60er/deptree
redis git:(master) go mod graph | deptree -d 3
package: github.com/go-redis/redis/v9
dependence tree:
┌── github.com/cespare/xxhash/v2@v2.1.2
├── github.com/dgryski/go-rendezvous@v0.0.0-20200823014737-9f7001d12a5f
├── github.com/fsnotify/fsnotify@v1.4.9
│    └── golang.org/x/sys@v0.0.0-20191005200804-aed5e4c7ecf9
├── github.com/nxadm/tail@v1.4.8
│    ├── github.com/fsnotify/fsnotify@v1.4.9
│    │    └── golang.org/x/sys@v0.0.0-20191005200804-aed5e4c7ecf9
│    └── gopkg.in/tomb.v1@v1.0.0-20141024135613-dd632973f1e7
├── github.com/onsi/ginkgo@v1.16.5
│    ├── github.com/go-task/slim-sprig@v0.0.0-20210107165309-348f09dbbbc0
│    │    ├── github.com/davecgh/go-spew@v1.1.1
│    │    └── github.com/stretchr/testify@v1.5.1
│    │         └── ...

The above answer still doesn't show me a dependency tree so I've taken the time to write a Python script to do what I need - hopefully that helps other people.
The issue with the above solution (and the others proposed, like go list) is that it only tells me the top level; it doesn't "traverse the tree." This is the output I get, which doesn't help any more than what go build gives me.
.../npd/auth/
.../mon/mlog
.../auth/service
This is what I'm trying to get - I know that auth is broken (top) and that breaker is broken (bottom) from go build but I have no idea what's in between - my script below gives me this output.
.../npd/auth/
.../npd/auth/service
.../npd/auth/resource
.../npd/auth/storage
.../npd/middleware
.../npd/metrics/persist
.../npd/kafka
.../vendor-library/src/github.com/Shopify/sarama
.../vendor-library/src/github.com/Shopify/sarama/vz/breaker
My Python script:
import subprocess
import os

folder_locations=['.../go/src','.../vendor-library/src']

def getImports(_cwd):
    #When the commands were combined they overflowed the buffer and I couldn't find a workaround
    cmd1 = ["go", "list", "-f", " {{.ImportPath}}","./..."]
    cmd2 = ["go", "list", "-f", " {{.Imports}}","./..."]
    process = subprocess.Popen(' '.join(cmd1), cwd=_cwd,shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out1, err = process.communicate()
    process = subprocess.Popen(' '.join(cmd2), cwd=_cwd,shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out2, err = process.communicate()
    out2clean=str(out2).replace("b'",'').replace('[','').replace(']','').replace("'",'')
    return str(out1).split('\\n'),out2clean.split('\\n')

def getFullPath(rel_path):
    # Return the first folder location that contains the package, if any
    for i in folder_locations:
        if os.path.exists(i+'/'+rel_path):
            return i+'/'+rel_path
    return None

def getNextImports(start,depth):
    depth=depth+1
    indent = '\t'*(depth+1)
    for i,val in enumerate(start.keys()):
        if depth==1:
            print (val)
        out1,out2=getImports(val)
        noDeps=True
        for j in out2[i].split(' '):
            noDeps=False
            _cwd2=getFullPath(j)
            if _cwd2 is None:
                # Not under folder_locations (e.g. standard library): skip it
                continue
            new_tree = {_cwd2:[]}
            not_exists = (not _cwd2 in alltmp)
            if not_exists:
                print(indent+_cwd2)
                start[val].append(new_tree)
                getNextImports(new_tree,depth)
                alltmp.append(_cwd2)
        if noDeps:
            print(indent+'No deps')

_cwd = '/Users/.../npd/auth'
alltmp=[]
start_root={_cwd:[]}
getNextImports(start_root,0)

The dependencies of a Go project form a directed graph. This graph can have anywhere from several to hundreds or thousands of layers. Here is a dependency graph of redis. The cascaded tree can be difficult to understand because it contains many duplicated subtrees. To make the layout easier to view, the tree can be flattened via gomoddeps to fit the width of the screen.
tzhang:~/github.com/redis/go-redis$ go mod graph | gomoddeps
├─ github.com/redis/go-redis/v9
│  └─ dependencies
│     ├─ github.com/bsm/ginkgo/v2@v2.5.0
│     ├─ github.com/bsm/gomega@v1.20.0
│     ├─ github.com/cespare/xxhash/v2@v2.2.0
│     ├─ github.com/davecgh/go-spew@v1.1.1
│     ├─ github.com/dgryski/go-rendezvous@v0.0.0-20200823014737-9f7001d12a5f
│     ├─ github.com/pmezard/go-difflib@v1.0.0
│     ├─ github.com/stretchr/testify@v1.8.1
│     └─ gopkg.in/yaml.v3@v3.0.1
│
├─ github.com/bsm/ginkgo/v2@v2.5.0
│  └─ dependents
│     └─ github.com/redis/go-redis/v9
│
├─ github.com/bsm/gomega@v1.20.0
│  └─ dependents
│     └─ github.com/redis/go-redis/v9
│
├─ github.com/cespare/xxhash/v2@v2.2.0
│  └─ dependents
│     └─ github.com/redis/go-redis/v9
│
├─ github.com/davecgh/go-spew@v1.1.1
│  └─ dependents
│     ├─ github.com/redis/go-redis/v9
│     ├─ github.com/stretchr/testify@v1.8.1
│     ├─ github.com/stretchr/testify@v1.8.0
│     └─ github.com/stretchr/objx@v0.4.0
...

Related

Windows tree command that follows .lnk links

I have a folder structure that I want to make a tree of (I'm currently using the tree command). The current output looks like this:
C:.
└───example
    ├───example2
    └───folder with link
        │   link to example 2.lnk
        │
        └───other folder
Would it be possible to show the destination of the link?
example:
C:.
└───example
    ├───example2
    └───folder with link
        │   link to example 2.lnk -> example2
        │
        └───other folder
It doesn't have to look exactly like that; I just want to see the link destination.
I tried to find something on the internet, but the only thing I found was a Linux solution that looked like this:
tree -l
.
├── aaa
│   └── bbb
│       └── ccc
└── slink -> /home/kaa/test/aaa/bbb
Sadly, -l or /l doesn't exist on Windows.

bash: How do I selectively copy directories from one tree to another?

My directory tree looks somewhat like this:
/Volumes/Data/TEMP/DROP
├───R1
│   ├───morestuff
│   │   └───stuff2
│   │       └───C.tool
│   └───stuff
│       ├───A.tool
│       └───B.Tool
└───R2
    ├───morestuff
    │   └───stuff2
    │       └───C.tool
    └───stuff
        ├───A.tool
        └───B.Tool
How do I copy the *.tool directories recursively from R1 to (overwrite) those in R2? My bash has about 20 years of rust on it.
This will work (expanding on the idea of @Maxim Egorushkin):
# The trailing slash is important in the next line
SOURCE=/Volumes/Data/TEMP/DROP/R1/
DEST=/Volumes/Data/TEMP/DROP/R2
rsync -zarv --include "*/" --include="*.tool" --exclude="*" "$SOURCE" "$DEST"
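For comparison, the same selective copy can be sketched in Python. This is an illustration under assumptions rather than a drop-in replacement for the rsync call: copy_matching_dirs is a hypothetical helper name, the match is case-sensitive (so a directory named B.Tool would not match "*.tool"), and dirs_exist_ok requires Python 3.8 or later.

```python
import fnmatch
import os
import shutil
import tempfile


def copy_matching_dirs(source, dest, pattern="*.tool"):
    """Copy every directory under source whose name matches pattern to the
    same relative location under dest, overwriting existing files."""
    copied = []
    for current, dirnames, _ in os.walk(source):
        for name in list(dirnames):
            if fnmatch.fnmatchcase(name, pattern):
                src_dir = os.path.join(current, name)
                rel = os.path.relpath(src_dir, source)
                shutil.copytree(src_dir, os.path.join(dest, rel),
                                dirs_exist_ok=True)
                copied.append(rel)
                dirnames.remove(name)  # do not descend into a copied directory
    return sorted(copied)


# Small demonstration on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "R1")
    dst = os.path.join(tmp, "R2")
    os.makedirs(os.path.join(src, "stuff", "A.tool"))
    os.makedirs(os.path.join(src, "stuff", "notes"))
    with open(os.path.join(src, "stuff", "A.tool", "data.txt"), "w") as fh:
        fh.write("new")
    print(copy_matching_dirs(src, dst))
```

Directories that do not match the pattern (notes in the demonstration) are walked but never created in the destination.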

How do I reach end of file in less terminal without ...skipping

I view tree output in the terminal through less with this function:
function tre() {
    tree -aC -I '.git|node_modules|bower_components' --dirsfirst "$@" | less -FRNX;
}
It scrolls one line per key press.
I need a shortcut or command to reach the end of the file.
If I press "G", the output contains "...skipping...":
19 │ │ │ └── someotherfile.db
20 │ │ ├── static
...skipping...
62 │ │ ├── user
63 │ │ │ ├── admin.py
How do I get to the end of file with all lines loaded without "...skipping..."?
The issue was with this
less -FRNX;
The last option (X) forced the output to be written line by line, so the solution was to drop it:
less -FRN;
(Why I use less for tree output: for the same folder, compared with the default tree output, the output through less has colors, line numbers, and directories first.)

snakemake manage pair sample and indelrealigner

I want to connect the realigner target creation step with the indel realignment.
These are the rules:
rule gatk_IndelRealigner:
input:
tumor="mapped_reads/merged_samples/{tumor}.sorted.dup.reca.bam",
normal="mapped_reads/merged_samples/{normal}.sorted.dup.reca.bam",
id="mapped_reads/merged_samples/operation/{tumor}_{normal}.realign.intervals"
output:
"mapped_reads/merged_sample/CoClean/{tumor}.sorted.dup.reca.cleaned.bam",
"mapped_reads/merged_sample/CoClean/{normal}.sorted.dup.reca.cleaned.bam",
params:
genome=config['reference']['genome_fasta'],
mills= config['mills'],
ph1_indels= config['know_phy'],
log:
"mapped_reads/merged_samples/logs/{tumor}.indel_realign_2.log"
threads: 8
shell:
"gatk -T IndelRealigner -R {params.genome} "
"-nt {threads} "
"-I {input.tumor} -I {input.normal} -known {params.ph1_indels} -known {params.mills} -nWayOut .cleaned.bam --maxReadsInMemory 500000 --noOriginalAligmentTags --targetIntervals {input.id} >& {log} "
This is the error:
Not all output files of rule gatk_IndelRealigner contain the same wildcards.
I suppose I also need to use {tumor}_{normal} in the output, but I can't.
Snakemake:
rule all:
input:expand("mapped_reads/merged_samples/CoClean/{sample}.sorted.dup.reca.cleaned.bam",sample=config['samples']),
expand("mapped_reads/merged_samples/operation/{sample[1][tumor]}_{sample[1][normal]}.realign.intervals", sample=read_table(config["conditions"], ",").iterrows())
config.yml
conditions: "conditions.csv"
conditions.csv
tumor,normal
411,412
Here is a reduced example (for testing purposes) that gives the same error:
directory
$ tree prova/
prova/
├── condition.csv
├── config.yaml
├── output
│   ├── ABC.bam
│   ├── pippa.bam
│   ├── Pippo.bam
│   ├── TimBorn.bam
│   ├── TimNorm.bam
│   ├── TimTum.bam
│   └── XYZ.bam
└── Snakefile
this is snakemake
$ cat prova/Snakefile
from pandas import read_table
configfile: "config.yaml"
rule all:
input:
expand("{pathDIR}/{sample[1][tumor]}_{sample[1][normal]}.bam", pathDIR=config["pathDIR"], sample=read_table(config["sampleFILE"], " ").iterrows()),
expand("CoClean/{sample[1][tumor]}.bam", sample=read_table(config["sampleFILE"], " ").iterrows()),
expand("CoClean/{sample[1][normal]}.bam", sample=read_table(config["sampleFILE"], " ").iterrows())
rule gatk_RealignerTargetCreator:
input:
"{pathGRTC}/{normal}.bam",
"{pathGRTC}/{tumor}.bam",
output:
"{pathGRTC}/{tumor}_{normal}.bam"
# wildcard_constraints:
# tumor = '[^_|-|\/][0-9a-zA-Z]*',
# normal = '[^_|-|\/][0-9a-zA-Z]*'
run:
call('touch ' + str(wildcard.tumor) + '_' + str(wildcard.normal) + '.bam', shell=True)
rule gatk_IndelRealigner:
input:
t1="output/{tumor}.bam",
n1="output/{normal}.bam",
output:
"CoClean/{tumor}.sorted.dup.reca.cleaned.bam",
"CoClean/{normal}.sorted.dup.reca.cleaned.bam",
log:
"mapped_reads/merged_samples/logs/{tumor}.indel_realign_2.log"
threads: 8
shell:
"gatk -T IndelRealigner -R {params.genome} "
"-nt {threads} -I {input.t1} -I {input.n1} & {log} "
conditions.csv
$ more condition.csv
tumor normal
TimTum TimBorn
XYZ ABC
Pippo pippa
Thanks for any suggestion
I'm not convinced you have to pass two input files to the GATK IndelRealigner. Building on that assumption, you can alter the rule so that it is indifferent to the type (tumor vs. normal) of the file it is processing. I read the specs here. Please, if I am wrong, stop reading and correct me.
rule gatk_IndelRealigner:
    input:
        inputBAM="output/{sampleGATKIR}.bam",
    output:
        "CoClean/{sampleGATKIR}.sorted.dup.reca.cleaned.bam",
    log:
        "mapped_reads/merged_samples/logs/{sampleGATKIR}.indel_realign_2.log"
    params:
        genome="**DONT FORGET TO ADD THIS**"
    threads: 8
    shell:
        "gatk -T IndelRealigner -R {params.genome} "
        "-nt {threads} -I {input.inputBAM} >& {log} "
By changing the rule to be bam-type agnostic (made up word) you gain two advantages, and there is one main disadvantage.
Advantages:
Now we only have a single wild-card
We can run the alignment of each .bam file independently, which with a devoted CPU should hopefully make things faster.
Disadvantage:
We are now likely loading two copies of the genome into memory, since the work now runs as separate processes and the genome file is no longer shared in memory. (In my previous position, hardware availability wasn't typically an issue, so I am heavily biased towards splitting everything up.)
I think the GATK documentation is set up to accept multiple 'bam' files because, when you use it as a one-off call, you want to list all the files at the same time. We don't need that, since we are automating the calls and are indifferent to 1 call or 100 calls.
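With a single wildcard, the targets requested in rule all no longer need to know about tumor/normal pairs. The following plain-Python sketch shows one way the target list could be derived from the conditions table; it is an assumption about how the pieces fit together, not code from the question, and cleaned_bam_targets is a hypothetical helper. It assumes the space-separated condition.csv shown in the test example.

```python
import csv
import io


def cleaned_bam_targets(table_text, delimiter=" "):
    """Flatten the tumor/normal columns into one ordered, de-duplicated list
    of sample names and build one output path per sample."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    samples = []
    for row in reader:
        for column in ("tumor", "normal"):
            if row[column] not in samples:
                samples.append(row[column])
    return [f"CoClean/{s}.sorted.dup.reca.cleaned.bam" for s in samples]


# Same content as the condition.csv in the question's test example.
table = """\
tumor normal
TimTum TimBorn
XYZ ABC
Pippo pippa
"""

for target in cleaned_bam_targets(table):
    print(target)
```

In a Snakefile, a list built this way could be passed as the input of rule all, so that each sample resolves the {sampleGATKIR} wildcard on its own.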

Creating Impala external table from a partitioned file structure

Provided a partitioned fs structure like the following:
logs
└── log_type
    └── 2013
        ├── 07
        │   ├── 28
        │   │   ├── host1
        │   │   │   └── log_file_1.csv
        │   │   └── host2
        │   │       ├── log_file_1.csv
        │   │       └── log_file_2.csv
        │   └── 29
        │       ├── host1
        │       │   └── log_file_1.csv
        │       └── host2
        │           └── log_file_1.csv
        └── 08
I've been trying to create an external table in Impala:
create external table log_type (
field1 string,
field2 string,
...
)
row format delimited fields terminated by '|' location '/logs/log_type/2013/08';
I wish Impala would recurse into the subdirs and load all the csv files; but no cigar.
No errors are thrown but no data is loaded into the table.
Different globs like /logs/log_type/2013/08/*/* or /logs/log_type/2013/08/*/*/* did not work either.
Is there a way to do this? Or should I restructure the fs - any advice on that?
In case you are still searching for an answer:
You need to register each individual partition manually.
See "Registering External Table" for details.
The schema for your table needs to be adjusted:
create external table log_type (
field1 string,
field2 string,
...)
partitioned by (year int, month int, day int, host string)
row format delimited fields terminated by '|';
After you have changed your schema to include year, month, day and host, you have to add each partition to the table, one directory at a time.
Something like this:
ALTER TABLE log_type ADD PARTITION (year=2013, month=07, day=28, host="host1")
LOCATION '/logs/log_type/2013/07/28/host1';
Afterwards you need to refresh the table in impala.
invalidate metadata log_type;
refresh log_type;
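Since one ALTER TABLE statement is needed per leaf directory, the statements can be generated rather than typed by hand. This is a minimal sketch under assumptions: add_partition_statements is a hypothetical helper, the directory layout is exactly root/year/month/day/host as in the question, and the list of leaf directories is supplied by the caller (for example, from a directory listing).

```python
def add_partition_statements(table, root, leaf_dirs):
    """Build one ALTER TABLE ... ADD PARTITION statement per leaf directory
    laid out as <root>/<year>/<month>/<day>/<host>."""
    statements = []
    for path in leaf_dirs:
        relative = path[len(root):].strip("/")
        year, month, day, host = relative.split("/")
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION "
            f"(year={int(year)}, month={int(month)}, day={int(day)}, "
            f'host="{host}") '
            f"LOCATION '{path}';"
        )
    return statements


# Leaf directories taken from the tree in the question.
dirs = [
    "/logs/log_type/2013/07/28/host1",
    "/logs/log_type/2013/07/28/host2",
    "/logs/log_type/2013/07/29/host1",
    "/logs/log_type/2013/07/29/host2",
]

for statement in add_partition_statements("log_type", "/logs/log_type", dirs):
    print(statement)
```

The integer conversion drops leading zeros, so month 07 is emitted as month=7; the LOCATION keeps the original directory name.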
Another way to do this might be to use the LOAD DATA statement in Impala. If your data is in a SequenceFile or another less Impala-friendly format (see the Impala file formats documentation), you can create your external table like Joey does above, but instead of ALTER TABLE you can do something like:
LOAD DATA INPATH '/logs/log_type/2013/07/28/host1/log_file_1.csv' INTO TABLE log_type PARTITION (year=2013, month=07, day=28, host="host1");
With newer versions of Impala you can use the following command:
ALTER TABLE name RECOVER PARTITIONS
Be careful that the partitioning field names are lowercase, because the directory structure is case-sensitive but Impala queries are not.
