AWK - match multiple patterns at once - shell

I'm trying to process a CSV file to find patterns like 'duser=', 'dhost=' and 'dproc=' and, once one is found, print the string that follows it. I have to use pattern matching first because the content of the CSV file is not constant, and neither are the field separators. Please take into consideration that the CSV file contains logs in CEF format with many other patterns and values. Sample log format:
CEF:0|Microsoft|Microsoft Windows|Windows 7|Microsoft-Windows-Security-Auditing:4688|A new process has been created.|Low| eventId=1010044130 externalId=4688 msg=Token Elevation Type indicates the type of token that was assigned to the new process in accordance with User Account Control policy.Type 1 is a full token with no privileges removed or groups disabled. Type 2 is an elevated token with no privileges removed or groups disabled.Type 3 is a limited token with administrative privileges removed and administrative groups disabled. type=1 start=1523950846517 categorySignificance=/Informational categoryBehavior=/Execute/Start categoryDeviceGroup=/Operating System catdt=Operating System categoryOutcome=/Success categoryObject=/Host/Resource/Process art=1523950885975 cat=Security deviceSeverity=Audit_success rt=1523950863727 dhost=A-Win7Test.*****.net dst=**.**.**.46 destinationZoneURI=/All Zones/ArcSight System/Public Address Space Zones/******* dntdom=****** oldFileHash=en_US|UTF-8 cnt=5 cs2=Process Creation cs6=TokenElevationTypeDefault (1) cs1Label=Mandatory Label cs2Label=EventlogCategory cs3Label=New Process ID cs4Label=Process Command Line cs5Label=Creator Process ID cs6Label=Token Elevation Type ahost=a-server09.****.net agt=**.**.**.9 agentZoneURI=/All Zones/ArcSight System/Public Address Space Zones/******** amac=00-50-56-B8-4F-BB av=7.7.0.8044.0 atz=GMT at=winc dvchost=A-Win7Test.*****.net dvc=**.**.**.46 deviceZoneURI=/All Zones/ArcSight System/Public Address Space Zones/********** deviceNtDomain=***** dtz=GMT _cefVer=0.1 aid=3AaTkhlEBABCABcfWDDqDbw\=\=
Ref: https://community.softwaregrp.com/t5/ArcSight-User-Discussions/Issue-with-Windows-Event-4688/td-p/1641345
The command below seems to work:
... | awk 'sub(/.*duser=/,""){print "User:",$1}'
However, it works only for the first pattern: once the sub() has stripped everything up to and including duser=, there is, as you can guess, nothing left on the line to match the other patterns against. Is there any way to run the above command three times with different patterns to get a list of three columns?
I would like to achieve:
duser=AAA dhost=BBB dproc=CCC
duser=DDD dhost=EEE dproc=FFF
duser=GGG dhost=HHH dproc=III
Appreciate your help, thank you

Like this?
$ cat file
duser=AAA dhost=BBB dproc=CCC
duser=DDD dhost=EEE dproc=FFF
duser=GGG dhost=HHH dproc=III
$ awk '{print gensub("duser=([^ \t,]+)[ \t,]+dhost=([^ \t,]+)[ \t,]+dproc=([^ \t,]+)", "User: \\1, Host: \\2, Proc: \\3", 1);}' file
User: AAA, Host: BBB, Proc: CCC
User: DDD, Host: EEE, Proc: FFF
User: GGG, Host: HHH, Proc: III
If the three parts appear at different positions or in a different order, then try this:
awk '{match($0,"duser=([^ \t,]+)",user); match($0,"dhost=([^ \t,]+)",host); match($0,"dproc=([^ \t,]+)",proc); print "User: " user[1] ", Host: " host[1] ", Proc: " proc[1];}' file
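Note that gensub() and the three-argument form of match() are GNU awk extensions. If you are stuck on a POSIX awk (e.g. mawk or BSD awk), a minimal sketch using RSTART/RLENGTH does the same job (the key names are taken from the question, and the output labels are the raw key names):
awk '{
  out = ""
  n = split("duser dhost dproc", keys, " ")
  for (i = 1; i <= n; i++) {
    # Build a dynamic regex like "duser=[^ \t,]+" and extract the value part
    if (match($0, keys[i] "=[^ \t,]+")) {
      val = substr($0, RSTART + length(keys[i]) + 1, RLENGTH - length(keys[i]) - 1)
      out = out (i > 1 ? ", " : "") keys[i] ": " val
    }
  }
  print out
}' file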
Please read mcve before you ask another question.

You can try Perl.
$ cat lack_of_threat.txt
duser=AAA dhost=BBB dproc=CCC
duser=DDD dhost=EEE dproc=FFF
duser=GGG dhost=HHH dproc=III
$ perl -ne ' /duser=(\S+)\s*dhost=(\S+)\s*dproc=(\S+)/; print "User:$1, Host:$2, Proc:$3\n" ' lack_of_threat.txt
User:AAA, Host:BBB, Proc:CCC
User:DDD, Host:EEE, Proc:FFF
User:GGG, Host:HHH, Proc:III
$
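One caveat with this one-liner: if a line does not match, $1..$3 keep their values from the previous successful match, so a non-matching line would reprint stale data. Guarding the print avoids that:
$ perl -ne ' print "User:$1, Host:$2, Proc:$3\n" if /duser=(\S+)\s*dhost=(\S+)\s*dproc=(\S+)/ ' lack_of_threat.txt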

Related

How to run a command in parallel within a shell on a single ssh connection in a for loop

I am running a for loop over around 3000 volumes within an ssh connection to a storage server, and it processes them one by one. I want the command vol show-footprint "$vols" -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1 to run in parallel over multiple volumes at a time, let's say 10 volumes in one go.
Here myTotalVol contains all 3000 volume names, like below:
myvol001
myvol002
myvol003
myvol004
myvol005
Below is the small for loop, which works:
for vols in $(cat myTotalVol); do
  echo -n "$vols "
  ssh storageServer01 "row 0; set -unit MB; \
    vol show-footprint ${vols} -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1"
done
Please suggest how I can run the mentioned command over multiple volumes at a time, taken from the myTotalVol file.
Edit:
As asked by Mark Setchell in the comment section, below is a view of how it works:
$ ssh store01
Last login time: 6/30/2022 10:49:41
store01::> row 0;set -unit MB
(rows)
store01::> vol show-footprint myvol001 -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1
vserver volume volume-blocks-footprint volume-blocks-footprint-bin0 volume-blocks-footprint-bin1
-------- -------------------- ----------------------- ---------------------------- ----------------------------
myvol001 myvol001             98703MB                 48272MB                      51988MB
As you can see, the command here is vol show-footprint myvol001 -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1. I have to run this command over 3000 volumes: the volume name (myvol001 in this example) goes into the variable used in the for loop, where "$vols" refers to the 3000 volumes from the file.
I'm not sure exactly what you are driving at, but you should be able to make a compound statement that generates the commands you want and then send that via ssh like this:
{ printf "row 0; set -unit MB;\n"
while read -r vol ; do
printf "vol show-footprint ${vol} -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1\n"
done < myTotalVol } | ssh store01
You can try it out and see what it produces if you put a comment character before the | like this:
{ ...
  ...
  done < myTotalVol
} # | ssh store01
Or you can do it with awk:
awk 'BEGIN{print "row 0; set -unit MB"} {print "vol show-footprint", $1, "-fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1"}' myTotalVol | ssh store01
Again, put # in front of | ssh store01 in order to see and check the output without sending it to ssh.
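If you want actual parallelism rather than batching everything into one connection, a sketch with GNU xargs would be (a sketch, assuming your server tolerates around 10 concurrent ssh sessions; -a and -P are GNU extensions):
# Run up to 10 ssh sessions at a time, one volume per session
xargs -a myTotalVol -P 10 -I {} \
  ssh storageServer01 "row 0; set -unit MB; vol show-footprint {} -fields volume-blocks-footprint,volume-blocks-footprint-bin0,volume-blocks-footprint-bin1"
Note that this opens multiple connections, unlike the single-connection approaches above.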

Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then fed it into the fasta_files channel.
The process trimming and its script take this channel as the input; ideally, I would then output all the "$sample".trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#!/usr/bin/env nextflow

params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming: $params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"

process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path trimmed_files into trimmed_channel

    // the shell script to be run:
    """
    #!/usr/bin/env bash
    mkdir trimming_report
    cd /home/usr/Nextflow

    # Finding and renaming my FASTQ files
    for file in FASTQ/*.fastq.gz; do
        [ -f "\$file" ] || continue
        name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }')   # strip the FASTQ/ directory prefix
        sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') # strip the file extension
        echo "Found \$name from: \$sample"
        if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
            trim_galore -j 8 "\$file" -o FASTQ                 # trim the files
            mv "\$file"_trimming_report.txt trimming_report    # move the report into trimming_report
        else
            echo "\$sample.trimmed.fq.gz exists, skipping trim_galore"
        fi
    done
    trimmed_files="FASTQ/*_trimmed.fq.gz"
    echo \$trimmed_files
    """
}
The script inside the process works fine on its own. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know; any help is appreciated!
Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier; however, doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you do, but probably want to avoid:
Changing directory inside your NF process: don't do this, as it entirely breaks the concept of Nextflow's work folder setup.
Writing a bash loop inside a NF process: if you set up your channels correctly, there should be only one task per spawned process.
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default and, when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs will cause the process execution to be skipped and the stored data to be produced as the actual results.
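For example, after a first successful run, relaunching the same script with unchanged inputs skips the cached tasks:
nextflow run script.nf -resume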
This example uses the Nextflow DSL 2 for my convenience, but it is not strictly required:
nextflow.enable.dsl=2

params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"

process trim_galore {
    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {
    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )
    results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'

Echo printing variables in a completely wrong order

I am trying to create a string with a query that will be saved/sent to another location; this string contains several variables.
The issue I am having is that the echoed variables come out completely jumbled and mixed together.
See code below:
tokenID=$(docker exec -ti $dockerContainerID /bin/sh -c "cat /tempdir/tokenfile.txt")
serverName="asdasd"
attQuery="$tokenID $serverName"
agentRegQuery="$./opt/mule/bin/amc_setup -H $attQuery"
echo TOKEN ID $tokenID
echo SERVER NAME $serverName
echo $attQuery
echo $agentRegQuery
Find below the output I am receiving:
TOKEN ID 29a6966f-fa0e-4f08-87eb-418722872d80---46407
SERVER NAME asdasd
asdasdf-fa0e-4f08-87eb-418722872d80---46407
asdasdmule/bin/amc_setup -H 29a6966f-fa0e-4f08-87eb-418722872d80---46407
There's a carriage return character at the end of the tokenID variable, probably because /tempdir/tokenfile.txt is in DOS/Windows format (lines end with carriage return+linefeed), not unix (lines end with just linefeed). When you print tokenID by itself, it looks ok, but if you print something else after that on the same line, it winds up overwriting the first part of the line. So when you print $attQuery, it prints this:
29a6966f-fa0e-4f08-87eb-418722872d80---46407[carriage return]
asdasd
...but with the second line printed on top of the first, so it comes out as:
asdasdf-fa0e-4f08-87eb-418722872d80---46407
The solution is to either convert the file to unix format (dos2unix will do this if you have it), or remove the carriage return in your script. You can do it like this:
tokenID=$(docker exec -ti $dockerContainerID /bin/sh -c "cat /tempdir/tokenfile.txt" | tr -d '\r')
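Alternatively, if you'd rather not change the docker command, a bash-only cleanup on a separate line (this removes at most one trailing carriage return via shell parameter expansion) is:
tokenID=${tokenID%$'\r'}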
I think everything works as it should
echo TOKEN ID $tokenID -> TOKEN ID 29a6966f-fa0e-4f08-87eb-418722872d80---46407
echo SERVER NAME $serverName -> SERVER NAME asdasd
echo $attQuery -> asdasdf-fa0e-4f08-87eb-418722872d80---46407
echo $agentRegQuery -> asdasdmule/bin/amc_setup -H 29a6966f-fa0e-4f08-87eb-418722872d80---46407
Why do you think something is wrong here?
Best regards, Georg

Custom fact with home directories as domains for Puppet

I'm trying to generate a custom fact called domains.
The idea is to list all the directories within /home but remove some default directories such as centos, ec2-user and myadmin.
I'm using bash as I don't know Ruby. So far my script outputs the list into a txt file, which it then cats as the answer for Facter, but the result is treated as one long answer and not as multiple values like an array.
My script is as follows:
#!/bin/bash
ls -m /home/ | sed -e 's/, /,/g' | tr -d '\n' > /tmp/domains.txt
cat /tmp/domains.txt | awk '{gsub("it_support,", "");print}'| awk '{gsub("ec2-user,", "");print}'| awk '{gsub("myadmin,", "");print}'| awk '{gsub("nginx", "");print}'| awk '{gsub("lost+found,", "");print}' > /tmp/domains1.txt
echo "domains={$(cat /tmp/domains1.txt)}"
exit
Foreman sees my domains as:
facts.domains = "{domain1,domain2,domain3,domain4,lost+found,}"
I also need to remove lost+found somehow.
Any help or advice would be appreciated
Kevin
I'm also not familiar with Ruby, but I have an idea for a workaround:
Please look at the following example about returning an array of network interfaces. To create the domain_array fact, use the following code:
Facter.add(:domain_array) do
  setcode do
    domains = Facter.value(:domains)
    domain_array = domains.split(',')
    domain_array
  end
end
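Once the fact file has been distributed to the agent (for example via pluginsync), you should be able to check it with Facter's -p flag, which loads Puppet custom facts (depending on your Facter version, puppet facts may be needed instead):
facter -p domain_array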
You can write a parser function to do this. Parser functions go inside:
modules/<modulename>/lib/puppet/parser/functions/getdomain.rb
Note: parser functions are compiled only on the puppet master. See below for a custom fact that will run on the agent.
getdomain.rb can contain something like the following for your purpose:
module Puppet::Parser::Functions
  newfunction(:getdomain, :type => :rvalue) do |args|
    dnames = Array.new
    Dir.foreach("/home/") do |d|
      # Avoid listing directories that start with . or ..
      if !d.start_with?('.') then
        # You can put more names inside the [...] that you want to avoid
        dnames.push(d) unless ['lost+found','centos'].include?(d)
      end
    end
    domainlist = dnames.join(',')
    return domainlist
  end
end
You can call it from a manifest and assign it to a variable:
$myhomedomains=getdomain()
$myhomedomains should return something similar to this : user1,user2,user3
For a custom fact with similar code, you can put it in:
modules/<modulename>/lib/facter/getdomain.rb
Content of getdomain.rb:
Facter.add(:getdomain) do
  setcode do
    dnames = Array.new
    Dir.foreach("/home/") do |d|
      # Avoid listing directories that start with . or ..
      if !d.start_with?('.') then
        # You can put more names inside the [...] that you want to avoid
        dnames.push(d) unless ['lost+found','centos'].include?(d)
      end
    end
    getdomain = dnames.join(',')
    getdomain
  end
end
You can call the getdomain fact in any manifest, for example, calling it from the same module's init.pp :
notify { "$::getdomain" : }
will result in something similar to:
Notice: /Stage[main]/Testmodule/Notify[user1,user2,user3]

Puppet: How can I wrap a command into two line if >80 characters?

In Puppet, if a command definition is longer than 80 characters, how can I wrap it onto two lines?
exec { 'create_domain':
command => "some command exceed 80 character...........................................................how to do how to do?.......",
}
It's sort of ugly, but if the last character in a string is a \ followed by a newline, then the string is continued on the next line. My sample.pp manifest is below:
exec { 'wrapped_string_example':
command => "/bin/echo 12345678901234567890123456789012345678901234567890\
wrapped > /var/tmp/test.txt";
}
Running this with puppet apply sample.pp gives the following output
$ puppet apply sample.pp
notice: /Stage[main]/Exec[wrapped_string_example]/returns: executed successfully
notice: Finished catalog run in 0.10 seconds
And catting the created file shows the lines have wrapped:
$ cat /var/tmp/test.txt
12345678901234567890123456789012345678901234567890wrapped
See https://github.com/puppetlabs/puppet/blob/9fbb36de/lib/puppet/parser/lexer.rb#L537 (as of Puppet v2.7.0)
Also this is sort of a known issue: http://projects.puppetlabs.com/issues/5022
For big chunks of data, heredocs are the best way of dealing with long lines in Puppet manifests. The /L interpolation option is particularly useful. /L causes \ at the end of a line to remove newlines. For example, the following does what you'd expect, stripping indentation and newlines, including the trailing newline.
sshkey { 'example.com':
  ensure => present,
  type   => 'ssh-rsa',
  key    => @(KEY/L),
    RfrXBrU1T6qMNllnhXsJdaud9yBgWWm6OprdEQ3rpkTvCc9kJKH0k8MNfKxeBiGZVsUn435q\
    e83opnamtGBz17gUOrzjfmpRuBaDDGmGGTPcO8Dohwz1zYuir93bJmxkNldjogbjAWPfrX10\
    8aoDw26K12sK61lOt6GTdR9yjDPdG4zL5G3ZjXCuDyQ6mzcNHdAPPFRQdlRRyCtG2sQWpWan\
    3AlYe6h6bG48thlo6vyNvOD8s9K0YBnwl596DJiNCY6EsxnSAhA3Uf9jeKqlVqqrxhEzHufx\
    07iP1nXIXCMUV
    |-KEY
  target => '/home/user/.ssh/authorized_keys',
}
Or to keep the final newline, leave out the - before the end text:
exec { 'create_domain':
  command => @(CMD/L),
    /bin/echo 123456789012345678901234567890123456789012345678901234567890123456\
    wrapped > /var/tmp/test.txt
    | CMD
}
As of Puppet 3.5 you have a couple of options that I have used. Ruby allows you to concatenate strings over several lines:
string = "line #1"\
         "line #2"\
         "line #3"
p string # => "line #1line #2line #3"
Another option: as of Puppet 3.5, heredoc functionality was added. This allows you to put the string in a section of a source code file that is treated as if it were a separate file:
$mytext = @(EOT)
  This block of text is
  visibly separated from
  everything around it.
  | EOT
The puppet documentation is here: https://docs.puppet.com/puppet/4.9/lang_data_string.html#heredocs
If you really care about the 80-column limit, you can always abuse a template to achieve that goal:
exec {'VeryLongExec':
command => template("${module}/verylongexec")
}
Then put the actual command in that template file
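For example (a sketch; the module name here is a placeholder and the command is copied from the earlier answers), modules/mymodule/templates/verylongexec could contain:
/bin/echo 123456789012345678901234567890123456789012345678901234567890 \
wrapped > /var/tmp/test.txt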
Credit should go to Jan Vansteenkiste for figuring this out.
