Bash Scripting - Search files for line above search criteria

I have hundreds of config files, each 10,000 to 20,000 lines long. These are config files for hardware. I need to search through all the config files to find the "profile" associated with a given cert name. There are several different versions of the hardware's software, so the config files differ somewhat. However, the profile name is always above the cert. The profile name does not necessarily contain the cert name.
Example of Profile Names:
clientssl_www.profile-cert
clientssl_www.example.com-cert
Example of Cert Name:
www.example.com.crt
Example sections of config:
profile clientssl clientssl_www.profile-cert {
defaults from clientssl
key "www.example.com.key"
cert "www.example.com.crt"
chain "Intermediate-bundle.crt"
options {
cipher server preference
dont insert empty fragments
no sslv2
}
}
ltm profile client-ssl /Common/clientssl_www.example.com-cert {
app-service none
cert /Common/www.example.com.crt
cert-key-chain {
www.example.com_www.example.com {
cert /Common/www.example.com.crt
chain /Common/Intermediate-bundle.crt
key /Common/www.example.com.key
}
}
chain /Common/Intermediate-bundle.crt
ciphers
key /Common/www.example.com.key
options { dont-insert-empty-fragments cipher-server-preference no-sslv2 }
}
I cannot read the config files line by line as there are millions of lines and it simply takes too long.
I can find the cert name with grep using something like this:
$ grep www.example.com *file.conf | egrep 'cert "|cert /Common'
Which gives me something like this:
cert "www.example.com.crt"
cert /Common/www.example.com.crt
cert /Common/www.example.com.crt
I need to find the 'profile name' that is above my search for a given cert name.
Any suggestions?
Thanks!

You can use grep's -B option, which comes in handy in such cases. From the man page for grep:
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups
of matches. With the -o or --only-matching option, this has no effect and a warning is given.
So, the pattern match will now be:
$ grep -h -B3 -E 'cert ("|/Common/)www\.example\.com' *file.conf
Output:
profile clientssl clientssl_www.profile-cert {
defaults from clientssl
key "www.example.com.key"
cert "www.example.com.crt"
--
ltm profile client-ssl /Common/clientssl_www.example.com-cert {
app-service none
cert /Common/www.example.com.crt
cert-key-chain {
www.example.com_www.example.com {
cert /Common/www.example.com.crt
However, you will still need to figure out some common pattern in the line containing profile name to single them out. It becomes difficult in your example to filter it further because in the first case, the profile name is three lines before the cert " pattern whereas in the second example, it is two lines before cert / pattern.
Another approach, which I find better, is to look for some pattern in the profile name itself and use it to filter the context lines. If all profile lines contain the string profile, or if the names follow a pattern such as clientssl.*-cert, then the following pattern match will do what you need:
$ grep -h -B3 -E 'cert ("|/Common/)www\.example\.com' *file.conf | grep -E 'profile|clientssl.*-cert'
Output:
profile clientssl clientssl_www.profile-cert {
ltm profile client-ssl /Common/clientssl_www.example.com-cert {
Even better, if you know that the profile name starts with clientssl_ and ends with -cert, then
$ grep -h -B3 -E 'cert ("|/Common/)www\.example\.com' *file.conf | grep -o 'clientssl_.*-cert'
Output:
clientssl_www.profile-cert
clientssl_www.example.com-cert
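Because the distance between the profile line and the cert line varies, a fixed -B count is fragile. Another way to sketch this is a single awk pass per file that remembers the most recent profile header and prints its name when a matching cert line turns up. The function name find_profiles is made up for illustration, and the assumption that every profile header contains the word "profile" and ends with "{" comes only from the two samples in the question; adjust the header pattern to your configs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file: profile-name" for every profile block
# that contains a cert line mentioning the given certificate name.
# Assumes profile headers look like "... profile ... NAME {".
find_profiles() {
    local cert="$1"; shift
    awk -v cert="$cert" '
        # Forget the previous profile when a new file starts.
        FNR == 1 { name = ""; reported = 0 }
        # Remember the most recent profile header: the name is the field
        # just before the opening brace, minus any /Common/ style path.
        /(^|[ \t])profile[ \t].*[{][ \t]*$/ {
            name = $(NF - 1)
            sub(/.*\//, "", name)
            reported = 0
            next
        }
        # A cert line for the wanted certificate: report the profile once.
        $1 == "cert" && index($0, cert) && name != "" && !reported {
            print FILENAME ": " name
            reported = 1
        }
    ' "$@"
}
```

It would be called as find_profiles www.example.com.crt *file.conf. A cert line that sits in some non-profile block after a profile would be attributed to that earlier profile, so treat the output as candidates to verify.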

This may be madness, but whenever I see sample data that fits Tcl's syntax rules, I look to produce a Tcl solution:
#!/usr/bin/env tclsh
proc unknown {cmdname args} {
set data [lindex $args end]
if {[set idx [lsearch -exact $data "cert"]] != -1 && [string match $::cert_pattern [lindex $data [incr idx]]]} {
set idx [expr {$cmdname eq "profile" ? 1 : [lsearch -exact $args "profile"] + 2}]
puts [lindex [split [lindex $args $idx] /] end]
}
}
set cert_pattern "*[lindex $argv 0]*"
foreach file [lrange $argv 1 end] {
source $file
}
Then
$ ./cert.tcl www.example.com file.conf
file.conf
clientssl_www.profile-cert
clientssl_www.example.com-cert
I won't bother to explain how it works unless there's a hue and cry.

Related

Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then input this into fasta_files channel.
The process trimming and its scripts takes this channel as the input, and then ideally, I would output all the $sample".trimmed.fq.gz into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#! /usr/bin/env nextflow
params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"
process trimming {
input:
file fasta_file from fasta_files
output:
path trimmed_files into trimmed_channel
// the shell script to be run:
"""
#!/usr/bin/env bash
mkdir trimming_report
cd /home/usr/Nextflow
#Finding and renaming my FASTQ files
for file in FASTQ/*.fastq.gz; do
[ -f "\$file" ] || continue
name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files.
sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
echo "Found" "\$name" "from:" "\$sample"
if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
trim_galore -j 8 "\$file" -o FASTQ #trim the files
mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
else
echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
fi
done
trimmed_files="FASTQ/*_trimmed.fq.gz"
echo \$trimmed_files
"""
}
The script in the process works fine. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know. Any help is appreciated!

Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier, however doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you do, but probably want to avoid:
Changing directory inside your NF process: don't do this, it entirely breaks the whole concept of Nextflow's /work folder setup.
Writing a bash loop inside an NF process: if you set up your channels correctly, Nextflow spawns one task per input file, so no loop is needed.

Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default and when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs, will cause the process execution to be skipped and will produce the stored data as the actual results.
This example uses the Nextflow DSL 2 for my convenience, but is not strictly required:
nextflow.enable.dsl=2
params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"
process trim_galore {
tag { "${sample}:${fastq_file}" }
publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
}
cpus 8
input:
tuple val(sample), path(fastq_file)
output:
tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
path "${fastq_file}_trimming_report.txt", emit: trimming_report
"""
trim_galore \\
-j ${task.cpus} \\
"${fastq_file}"
"""
}
workflow {
Channel.fromPath( params.fastq_files )
| map { tuple( it.getSimpleName(), it ) }
| set { sample_fastq_files }
results = trim_galore( sample_fastq_files )
results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'

Reorder cert in cert store within shell [duplicate]

I'm working with OpenSSL and need a sane default list of CAs. I'm using Mozilla's list of trusted CAs, as bundled by cURL. However, I need to split this bundle of CA certs, because the OpenSSL documentation says:
If CApath is not NULL, it points to a directory containing CA certificates in PEM format. The files each contain one CA certificate. The files are looked up by the CA subject name hash value, which must hence be available.
For example, using the ca-bundle.crt file directly works fine:
openssl-1.0.1g> ./apps/openssl s_client -connect www.google.com:443 -CAfile /home/user/certs/ca-bundle.crt
...
Verify return code: 0 (ok)
---
DONE
But specifying the directory containing the ca-bundle.crt file does not work:
openssl-1.0.1g> ./apps/openssl s_client -connect www.google.com:443 -CApath /opt/aspera/certs
Verify return code: 20 (unable to get local issuer certificate)
---
DONE
I presume this is because my folder doesn't adhere to what the documentation asks for (namely, a directory containing CA certs in PEM format, with each file containing one cert, named by hash value). My directory just has the single bundle of certs.
How can I split my bundle of certs to adhere to OpenSSL's request that each cert be in an individual file? Bonus points if the hashing can be done too (though if needed I could write a script to do that myself if all the certs are in individual files).

You can split the bundle with awk, like this, in an appropriate directory:
awk 'BEGIN {c=0;} /BEGIN CERT/{c++} { print > "cert." c ".pem"}' < ca-bundle.pem
Then, create the links OpenSSL wants by running the c_rehash utility that comes with OpenSSL:
c_rehash .
Note: use 'gawk' on non linux-platforms - as above relies on a GNU specific feature.
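The GNU-specific part is most likely the unparenthesised file name expression after the > redirection, which other awks may parse differently. A sketch that should behave the same under any POSIX awk builds the file name first and closes each file as it goes. The function name split_bundle is only illustrative, and unlike the one-liner above it drops any text outside the BEGIN/END markers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a PEM bundle into cert.1.pem, cert.2.pem, ...
# in the current directory. Text outside BEGIN/END markers is dropped.
split_bundle() {
    awk '
        /-----BEGIN CERTIFICATE-----/ {
            n++
            out = "cert." n ".pem"   # build the name before redirecting
            inside = 1
        }
        inside { print > out }
        /-----END CERTIFICATE-----/ {
            close(out)               # avoid running out of open files
            inside = 0
        }
    ' "$1"
}
```

Usage would be split_bundle ca-bundle.pem followed by c_rehash . as before.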

Just to give an alternative; facing the same issue I ended up with csplit:
csplit -k -f bar foo.pem '/END CERTIFICATE/+1' {10}

If you want to get a single certificate out of a multi-certificate PEM, try:
$ awk '/subject.*CN=host.domain.com/,/END CERTIFICATE/' INPUT.PEM

The following Ruby-script will split the bundle (with one or more certificates in it) into files named after the hashes -- side-stepping the c_rehash step in most cases.
To use, cd into the right directory (such as /etc/ssl/certs/) and run the script with the path to your certificate bundle as the sole argument. For example: ruby /tmp/split-certificates.rb ca-root-nss.crt.
#!/usr/bin/env ruby
require 'openssl'
blob = IO.binread(ARGV[0]) # Read the entire file at once
DELIMITER = "\n-----END CERTIFICATE-----\n"
blobs = blob.split(DELIMITER)
blobs.each do |blob|
blob.strip!
blob += DELIMITER # Does not break DER
begin
cert = OpenSSL::X509::Certificate.new blob
rescue
puts "Skipping what seems like junk"
next
end
begin
# XXX Need to handle clashes, suffix other than 0
filename=sprintf("%08x.0", cert.subject.hash) # zero-pad to 8 hex digits, as c_rehash does
File.open(filename,
File::WRONLY|File::CREAT|File::EXCL) do |f|
f.write(blob)
end
rescue Errno::EEXIST
puts "#{filename} already exists, skipping"
end
end

Here is mine in Perl (so much code, but I like gonzo programming):
#!/usr/bin/perl -w
# -------
# Split "certificate bundles" like those found in /etc/pki/tls/certs into
# individual files and append the X509 cleartext description to each file.
#
# The file to split is given on the command line or piped via STDIN.
#
# Files are simply created in the current directory!
#
# Created files are named "certificate.XX" or "trusted-certificate.XX",
# with XX an index value.
#
# If a file with the same name as the output file already exists, it is not
# overwritten. Instead a new name with a higher index is tried.
#
# This works for bundles of both trusted and non-trusted certificates.
#
# See http://tygerclan.net/?q=node/49 for another program of this kind,
# which sets the name of the split-off files in function of the subject
# -------
my @lines = <> or die "Could not slurp: $!";
my $state = "outside"; # reader state machine state
my $count = 0; # index of the certificate file we create
my $fh; # file handle of the certificate file we create
my $fn; # file name of the certificate file we create
my $trusted; # either undef or "TRUSTED" depend on type of certificate
for my $line (@lines) {
chomp $line;
if ($state eq "outside") {
if ($line =~ /^(-----BEGIN (TRUSTED )?CERTIFICATE-----)\s*$/) {
my $marker = $1;
$trusted = $2;
$state = "inside";
my $created = 0;
my $prefix = "";
if ($trusted) {
$prefix = "trusted-"
}
while (!$created) {
$fn = "${prefix}certificate.$count";
$count++;
if (-f $fn) {
# print STDERR "File '$fn' exists; increasing version number to $count\n";
}
else {
print STDERR "Certificate data goes to file '$fn'\n";
open($fh,">$fn") || die "Could not create file '$fn': $!\n";
$created = 1;
print $fh "$marker\n"
}
}
}
else {
print STDERR "Skipping line '$line'\n"
}
}
else {
if ($line =~ /^(-----END (TRUSTED )?CERTIFICATE-----)\s*$/) {
my $marker = $1;
my $trustedCheck = $2;
if (!((($trusted && $trustedCheck) || (!$trusted && !$trustedCheck)))) {
die "Trusted flag difference detected\n"
}
$state = "outside";
print $fh "$marker\n";
print STDERR "Closing file '$fn'\n";
close $fh;
# Append x509 cleartext output by calling openssl tool
`openssl x509 -noout -text -in '$fn' >> '$fn'`;
if ($? != 0) {
die "Could not run 'openssl x509' command: $!\n";
}
}
else {
print $fh "$line\n"
}
}
}
if ($state eq "inside") {
die "Last certificate was not properly terminated\n";
}

How can I edit a .conf file easily?

So I read the easiest way to use .conf files for bash scripts is to use source to load such files. Now, what if I want to edit this file ?
Some code I found does a really good job :
function set_config(){
sed -i "s/^\($1\s*=\s*\).*\$/\1$2/" $conf_file
}
But, if the variable is not yet defined, it doesn't define it, nor does it check if the parameters are passed well, isn't secure, doesn't handle default values etc...
Does reliable tools/code already exists to edit .conf file which contain key="value" pairs ? For instance, I would like to be able to do things like this :
conf_file="my_script.conf"
conf_load $conf_file #should create the file if it doesn't exist !
read=$(conf_get_value "data" "default_value") #should read the value with key "data", defaulting to "default_value"
if [[ $? = 0 ]] #we should be able to know if the read was successful
then
echo "Successfully read value for field \"data\" : $read"
else
echo "Default value for field \"data\" : $read"
fi
conf_set "something_new" "a great value!" #should add the key "something_new" as it doesn't exist
conf_set "data" "new_value" #should edit the value with key "data"
if [[ $? = 0 ]]
then
echo "Edit successful !"
else #something went wrong :-/
echo "Edit failed !"
fi
before running this code, the conf file would contain
data="some_value"
and after it would be
data="new_value"
something_new="a great value!"
and the code should output
Successfully read value for field "data" : some_value
Edit successful !
I am using bash version 4.3.30 .
Thanks for your help.

I'd do that with awk since it's rather good at tokenizing:
# overwrite config's entries for KEY with VALUE or else appends the definition
# Usage: set_config KEY VALUE
set_config() {
[ -n "$1" ] && awk -F= -v key="$1" -v new="$1=\"$2\"" '
$1 == key { $0 = new; key_found = 1; }
{ print }
END { if (!key_found) { print new; } }
' "$conf_file" > "$conf_file.new" \
&& cat "$conf_file.new" > "$conf_file" && rm "$conf_file.new"
}
If run without arguments, set_config() will do nothing and return false. If run with only one argument, it will create an empty value (outputting KEY="").
The awk command parses the .conf file line by line, looking for each definition of the given key and altering it to the new value. All lines are then printed (with or without modification), preserving the original order. If the key hasn't yet been found by the end of the file, this appends the new definition.
Because you can't pipe a file atop itself, this gets saved with a ".new" extension and then copied atop the original in a manner that preserves permissions. The ".new" copy is then removed. I used && to ensure that these never happen if an error occurred earlier in the function.
Also note that the type of ".conf file" you're referring to (the type you source with a POSIX shell) will never have spaces around its equals signs, so the \s* parts of your sed command aren't needed.
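For the reading half of the question, a matching getter could be sketched along the same lines. conf_get_value is the name the question asks for; the exit-status convention (0 if the key was found, 1 if the default was printed) and the stripping of one pair of surrounding double quotes are my assumptions, not something the question specifies.

```shell
#!/usr/bin/env bash
# Sketch: print the value stored for KEY in "$conf_file", or DEFAULT.
# Usage: conf_get_value KEY [DEFAULT]
# Returns 0 if KEY was found, 1 if the default was printed instead.
conf_get_value() {
    [ -n "$1" ] && [ -r "$conf_file" ] || { printf '%s\n' "$2"; return 1; }
    awk -v key="$1" -v def="$2" '
        # Split on the first "=" only, so values may contain "=".
        {
            eq = index($0, "=")
            if (eq && substr($0, 1, eq - 1) == key) {
                val = substr($0, eq + 1)
                found = 1            # keep going: the last definition wins
            }
        }
        END {
            if (!found) { print def; exit 1 }
            # Strip one pair of surrounding double quotes, if present.
            if (val ~ /^".*"$/) val = substr(val, 2, length(val) - 2)
            print val
        }
    ' "$conf_file"
}
```

With this, read=$(conf_get_value "data" "default_value") followed by a test of $? behaves as in the question's wish list. Note that awk interprets backslash escapes in -v values, so keys or defaults containing backslashes would need extra care.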

How to remove multiple lines after string in powershell?

I have been at my internship for a couple days and recently was asked to do some Powershell scripting and was excited to learn some new things!
However, it has been a time-consuming task, because it is hard to find what you want when searching.
Anyway, I was tasked with removing all sensitive data from a word file. It had not been too bad until now. For example, from a text file:
User pass created now moving on..
Password 7 ##########
All done
Here I had to remove all the numbers after searching the file for "Password 7". That and similar tasks did not take me too long.
Now I have things like the following, which are a fixed length:
Self-Service certificate ####### ######## #######
######## ######## ######## ########## #########
########## ##### ######## ########## ##########
With strings on multiple lines. I can remove the top line, but cannot figure out the next lines because they are just random numbers and I have nothing to search for. I have tried things like nr \n \r and many combinations. I am stumped.
$configFiles=get-childitem . *.txt -rec
foreach ($file in $configFiles)
{
$readIn = @(Get-Content $file.PSPath)
$readIn -replace "Password 7.*" , "Password 7 <REMOVED>" -replace "Secret 5.*" , "Secret 5 <REMOVED>" -replace "snmp-server community\s\S*" , "snmp-server community <REMOVED>" |
Set-Content $file.PSPath
}
That is my current code and it is working well so far. I have been messing around with the multi-line removal in a separate script. Thanks for the help.

Sounds like you are trying to delete a certificate from a Cisco configuration.
$config = @"
!
crypto ca certificate chain TP-self-signed-12345678
certificate self-signed 01
3082022B 30820194 A0030201 02020101 300D0609 2A864886 F70D0101 04050030
4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 37363538
528BD5A8 E7E26C51 10BAB609 5B60228F C8DE0299 7BE85C2D 9769FF05 C295706F
3082022B 30820194 A0030201 02020101 300D0609 2A864886 F70D0101 04050030
4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 37363538
528BD5A8 E7E26C51 10BAB609 5B60228F C8DE0299 7BE85C2D 9769FF05 C295706F
3082022B 30820194 A0030201 02020101 300D0609 2A864886 F70D0101 04050030
4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 37363538
528BD5A8 E7E26C51 10BAB609 5B60228F C8DE0299 7BE85C2D 9769FF05 C295706F
quit
Username joe password bloggs
!
"@
$regex = [regex] '(?sm)certificate self-signed 01\s+([0-9A-F\s]+?)\s+quit'
$result = $config | Select-String -Pattern $regex
$cert = $result.Matches.Groups[1].Value
$censored = $config -replace $cert, '<REMOVED>'
Write-Output $censored
Output:
!
crypto ca certificate chain TP-self-signed-12345678
certificate self-signed 01
<REMOVED>
quit
Username joe password bloggs
!

The trick is to parse the entire text as a single string block and construct a regular expression that can detect the carriage returns embedded in the text.
By default, Get-Content returns an array of strings split at the line breaks. To load the text as a single blob, use the -Raw parameter:
$readIn = Get-Content $file.PSPath -Raw
Then construct a regular expression that can detect the portion you want to remove, carriage returns and all. In this example, I am assuming that the sensitive bit is 13 blocks of at least 5 characters each, separated by whitespace or carriage returns:
$readIn -replace 'Self-Service certificate (\S{5,}[\s\n]+){13}', "Self-Service certificate <removed>`n" | Set-Content $file.PSPath

Bash Scripting - Using Entry immediately after certain phrase

In my project, addgpg-apt (https://launchpad.net/addgpg-apt), I'd like to be able to have input sent into the program. From that input, say...
Unable to verify signatures in PPA. Check that this is fixed: NO_PUBKEY <PGPkeyID>
... how can I, using basic Bash, grep, etc. get that <PGPkeyID> from the string and ignore everything else in the string? (Note that these errors are generated by apt-get/apt, and as such the end of the string is always NO_PUBKEY <PGPkeyID>)
In Java, this could be done with substring, and grab only that PGPkeyID based on the location of the phrase NO_PUBKEY, but I want this to be done in Bash only, so any solutions would be appreciated.

Piping to grep -o "NO_PUBKEY <.*>" | sed -e 's/.*<\(.*\)>.*/\1/'
will yield you:
PGPkeyID
Update
Assuming your input is like this:
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CAFE0123DEADBEEF
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0123DEADBEEFCAFE
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DEADBEEFCAFE0123
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BEEFCAFE0123DEAD
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CAFE0123DEADBEEF
The following command will extract the keys:
grep -o 'PUBKEY [A-F0-9]\{16\}' | cut -f2 -d" " | sort -u
Like this:
0123DEADBEEFCAFE
BEEFCAFE0123DEAD
CAFE0123DEADBEEF
DEADBEEFCAFE0123

You can do that in pure shell without spawning a process:
cat << EOF > file
...
Unable to verify signatures in PPA. Check that this is fixed: NO_PUBKEY <PGPkeyID1>
...
Unable to verify signatures in PPA. Check that this is fixed: NO_PUBKEY <PGPkeyID2>
...
EOF
while read -r line; do
if [[ $line == *\ NO_PUBKEY\ * ]]
then
echo "${line#* NO_PUBKEY }"
fi
done < file
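If the key ID should also be validated while it is extracted, bash's [[ =~ ]] operator is another option that stays inside the shell. In this sketch the function name pubkey_ids and the assumption that an ID is a run of hex digits are mine; it also loops over each line so that several NO_PUBKEY entries on one line are all picked up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print each key ID that
# follows "NO_PUBKEY ", one per line. Assumes IDs are hex strings.
pubkey_ids() {
    local line re='NO_PUBKEY ([0-9A-Fa-f]+)'
    while IFS= read -r line; do
        # Loop so that several NO_PUBKEY entries on one line are all found.
        while [[ $line =~ $re ]]; do
            printf '%s\n' "${BASH_REMATCH[1]}"
            # Drop everything up to and including this match.
            line=${line#*"${BASH_REMATCH[0]}"}
        done
    done
}
```

It could be fed from apt directly, for example sudo apt-get update 2>&1 | pubkey_ids | sort -u.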
