I tried to automate the process of cleaning up various wordlists I am working with. Here is the code for it:
#!/bin/bash
# Removes spaces and duplicates in a wordlist
echo "Please be in the same directory as wordlist!"
read -p "Enter Wordlist: " WORDLIST
RESULT=$( awk '{print length, $0}' $WORDLIST | sort -n | cut -d " " -f2- )
awk '!(count[$0]++)' $RESULT > better-$RESULT
This is the error I receive after running the program:
./wordlist-cleaner.sh: fork: Cannot allocate memory
First post, I hope I formatted it correctly.
You didn't describe your intentions or desired output, but I guess this may do what you want:
awk '{print length, $0}' "$WORDLIST" | sort -n | cut -d " " -f2- | uniq > better-RESULT
Notice that it's better-RESULT instead of better-$RESULT as you don't want that as a filename.
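For what it's worth, the original awk '!(count[$0]++)' idea is sound once the file is passed as an argument instead of expanded onto the command line (the fork error came from passing the file's contents as arguments). A minimal sketch with a made-up wordlist name:

```shell
#!/bin/sh
# A tiny sample wordlist with duplicates (hypothetical file name):
printf 'banana\napple\nbanana\ncherry\napple\n' > wordlist.txt

# Print only the first occurrence of each line, preserving input order:
awk '!seen[$0]++' wordlist.txt > better-wordlist.txt

cat better-wordlist.txt    # banana, apple, cherry
```

Unlike sort | uniq, this keeps the original line order, which may or may not matter for a wordlist.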
Yeah okay, I got it to run successfully. I was trying to clean up wordlists I was downloading off the net. I have some knowledge of basic variable usage in Bash, but not enough of the text manipulation commands like sed or awk. Thanks for the support.
Related
Ok, so with the new High Sierra, I am trying to write a script to automatically delete the local snapshots that eat up HDD space. I know you can shrink them with tmutil thinlocalsnapshots / 1000000000 4, but I feel like that is only a band-aid.
So what I am trying to do is extract the date 2018-02-##-###### from:
sudo tmutil listlocalsnapshots /
com.apple.TimeMachine.2018-02-15-170531
com.apple.TimeMachine.2018-02-15-181655
com.apple.TimeMachine.2018-02-15-223352
com.apple.TimeMachine.2018-02-16-000403
com.apple.TimeMachine.2018-02-16-013400
com.apple.TimeMachine.2018-02-16-033621
com.apple.TimeMachine.2018-02-16-063811
com.apple.TimeMachine.2018-02-16-080812
com.apple.TimeMachine.2018-02-16-090939
com.apple.TimeMachine.2018-02-16-100459
com.apple.TimeMachine.2018-02-16-110325
com.apple.TimeMachine.2018-02-16-122954
com.apple.TimeMachine.2018-02-16-141223
com.apple.TimeMachine.2018-02-16-151309
com.apple.TimeMachine.2018-02-16-161040
I have tried variations of
| awk '{print $N}' (inserting a field number after the $)
along with
| cut -d ' ' -f 10-.
Please if you know what I am missing here I would greatly appreciate it
edit: Here is the script that will get rid of those pesky local snapshots, if anyone is interested. Thanks again:
#!/bin/bash
dates=$(tmutil listlocalsnapshots / | awk -F "." 'NR>1 {print $4}')
for d in $dates
do
    tmutil deletelocalsnapshots "$d"
done
You were close:
somecommand | cut -d"." -f4-
# or
somecommand | awk -F"." '{print $4}'
You can also try sed, but cut is made for this.
1- awk: you can either specify the field separator with the -F option, or print a substring
awk -F. '{print $4}'
awk '{print substr($0,23)}'
2- cut: equivalently.
cut -d. -f4
cut -c23-
3- Pure bash (sloooooow!): same as above.
while IFS=. read s1 s2 s3 d; do echo "$d"; done
while read line; do echo "${line:23}"; done
In practice, with a small number of records as in your use case, speed is not an issue and even pure bash or regexps (as in other answers) can be used. As the number of records grows, the higher speed of awk and cut becomes noticeable.
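To make the equivalence concrete, here is one of the tmutil lines run through both tools; each prints the same fourth dot-separated field:

```shell
#!/bin/sh
line='com.apple.TimeMachine.2018-02-15-170531'

# Field 4 when splitting on "." -- both print 2018-02-15-170531:
echo "$line" | cut -d. -f4
echo "$line" | awk -F. '{print $4}'
```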
Using grep and a regex :
$ grep -oP '\d{4}-\d{2}-\d{2}-\d{6}$'
2018-02-15-170531
2018-02-15-181655
2018-02-15-223352
2018-02-16-000403
2018-02-16-013400
2018-02-16-033621
2018-02-16-063811
2018-02-16-080812
2018-02-16-090939
2018-02-16-100459
2018-02-16-110325
2018-02-16-122954
2018-02-16-141223
2018-02-16-151309
2018-02-16-161040
I am archiving with split to produce several parts while also printing the output file names (split reports them on STDERR, which I am redirecting to STDOUT). However, the loop over the output doesn't happen until after the command returns.
Is there any way to actively loop over the STDOUT output of a command before it returns?
The following is what I currently have, but it only prints the list of filenames after the command returns:
export IFS=$'\n'
for line in `data_producing_command | split -d -b $CHUNK_SIZE --verbose - $ARCHIVE_PREFIX 2>&1`; do
FILENAME=`echo $line | awk '{ print $3 }'`
echo " - $FILENAME"
done
Try this:
data_producing_command | split -d -b $CHUNK_SIZE --verbose - $ARCHIVE_PREFIX 2>&1 | while read -r line
do
FILENAME=`echo $line | awk '{ print $3 }'`
echo " - $FILENAME"
done
Note however that any variables set in the while loop will not preserve their values after the loop (the while loop runs in a subshell).
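The subshell caveat is easy to demonstrate, and in bash you can avoid it by feeding the loop with process substitution so the loop body runs in the current shell. A small sketch (bash-specific, not POSIX sh):

```bash
#!/bin/bash
# A pipeline runs its while loop in a subshell, so changes are lost:
count=0
printf 'a\nb\nc\n' | while read -r line; do count=$((count + 1)); done
echo "after pipeline: $count"               # 0

# Process substitution keeps the loop in the current shell:
count=0
while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
echo "after process substitution: $count"   # 3
```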
There's no reason for the for loop or the read or the echo. Just pipe the stream to awk:
... | split -d -b $CHUNK_SIZE --verbose - test 2>&1 |
awk '{printf " - %s\n", $3 }'
You are going to see some delay from buffering, but unless your system is very slow or you are very perceptive, you're not likely to notice it.
The command substitution needs¹ to run to completion before the for loop can start:
for item in $(command which produces items); do ...
whereas a while read -r can start consuming output as soon as the first line is produced (or, more realistically, as soon as the output buffer is full):
command which produces items |
while read -r item; do ...
¹ Well, it doesn't absolutely need to, from a design point of view, I suppose, but that's how it currently works.
As William Pursell already noted, there is no particular reason to run Awk inside a while read loop, because that's something Awk does quite well on its own, actually.
command which produces items |
awk '{ print " - " $3 }'
Of course, with a reasonably recent GNU Coreutils split, you could simply do
split --filter='printf " - %s\n" "$FILE"; cat >"$FILE"' ... options
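The --filter approach can be seen end to end in a small sketch (assuming a GNU coreutils split; chunk. is a made-up prefix). The filter command runs once per chunk with $FILE set to the chunk's name, so the names are printed as the chunks are written rather than after the whole command returns:

```shell
#!/bin/sh
# Split 10 bytes into 4-byte chunks; the filter prints each name and then
# writes the chunk data to that file itself.
printf 'abcdefghij' |
    split -b 4 --filter='printf " - %s\n" "$FILE"; cat >"$FILE"' - chunk.

cat chunk.aa    # abcd
```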
I'm getting confused by this script that I am trying to write. Any help would be appreciated, I searched around and wasn't able to come up with a solution. I'm sure it's right in front of my eyes.
I have a Python script that makes an API call which returns a value. I have a file (exampleFile.txt) whose lines I would like to submit, one at a time, to the Python script, saving the returned text in test.txt.
Here is what I came up with, but isn't working. The script appears to run correctly, and I see all of my submitted values from the exampleFile.txt, but nothing is being saved to the test.txt file
cat exampleFile.txt | while read line; do ./apiCall.py -v $line | cut -f2 -d, > test.txt |; done
Any ideas on how to fix?
ANSWERED THANKS!:
cat exampleFile.txt | while read line; do ./apiCall.py -v "$line" | cut -f2 -d, >> test.txt; done
Also could use
while read line; do ./apiCall.py -v "$line" | cut -f2 -d,; done < exampleFile.txt >> test.txt
This is a shell scripting question more than it is a Python one. However, I think your issue is the "> test.txt": ">" truncates the file on every iteration of the loop, so only the last result survives. Try ">> test.txt" to append instead.
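The truncate-versus-append difference is easy to see in isolation (the file names here are made up for the sketch):

```shell
#!/bin/sh
# '>' truncates the target on every redirection; '>>' appends.
for word in one two three; do echo "$word" > trunc.txt; done
for word in one two three; do echo "$word" >> append.txt; done

cat trunc.txt        # only "three" survived the repeated truncation
wc -l < append.txt   # 3
```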
Protagonists
The Admin
Pipes
The Cron Daemon
A bunch of text processing utilities
netstat
>> the Scribe
Setting
The Cron Daemon is repeatedly performing the same job where he forces an innocent netstat to show the network status (netstat -n). Pipes then have to pick up the information and deliver it to bystanding text processing utilities (| grep tcp | awk '{ print $5 }' | cut -d "." -f-4). >> has to scribe the important results to a file. As his highness, The Admin, is a lazy and easily annoyed ruler, >> only wants to scribe new information to the file.
*/1 * * * * netstat -n | grep tcp | awk '{ print $5 }' | cut -d "." -f-4 >> /tmp/file
Soliloquy by >>
To append, or not append, that is the question:
Whether 'tis new information to bother The Admin with
and earn an outrageous Fortune,
Or to take Arms against `netstat` and the others,
And by opposing, ignore them? To die: to sleep;
note by the publisher: For all those that had problems understanding Hamlet, like I did, the question is, how do I check if the string is already included in the file and if not, append it to the file?
Unless you are dealing with a very big file, you can use the uniq command to remove the duplicate lines from the file. This means you will also have the file sorted, I don't know if this is an advantage or disadvantage for you:
netstat -n | grep tcp | awk '{ print $5 }' | cut -d "." -f-4 >> /tmp/file && sort /tmp/file | uniq > /tmp/file.uniq
This will give you the sorted results without duplicates in /tmp/file.uniq
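As a side note, the sort | uniq pair can be collapsed into a single sort -u; both forms print the same deduplicated, sorted output, as a quick check with sample data shows:

```shell
#!/bin/sh
# Three lines, one duplicate -- both pipelines print the same two addresses:
printf '10.0.0.2\n10.0.0.1\n10.0.0.2\n' | sort | uniq
printf '10.0.0.2\n10.0.0.1\n10.0.0.2\n' | sort -u
```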
What a piece of work is piping, how easy to reason about,
how infinite in use cases, in bash and script,
how elegant and admirable in action,
how like a vim in flexibility,
how like a gnu!
Here is a slightly different take:
netstat -n | awk -F"[\t .]+" '/tcp/ {print $9"."$10"."$11"."$12}' | sort -nu | while read ip; do if ! grep -q "$ip" /tmp/file; then echo "$ip" >> /tmp/file; fi; done
Explanation:
awk -F"[\t .]+" '/tcp/ {print $9"."$10"."$11"."$12}'
Awk splits the input string by tabs and ".". The input string is filtered (instead of using a separate grep invocation) by lines containing "tcp". Finally the resulting output fields are concatenated with dots and printed out.
sort -nu
Sorts the IPs numerically and creates a set of unique entries. This eliminates the need for the separate uniq command.
if ! grep -q $ip /tmp/file; then echo $ip >> /tmp/file; fi;
Greps for the IP in the file; if it is not found, the IP gets appended.
Note: This solution does not remove old entries and clean up the file after each run - it merely appends - as your question implied.
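One refinement worth mentioning: plain grep treats the dots in an IP as "any character", and a bare pattern can also match a prefix of a longer address. grep -qxF does a fixed-string, whole-line match instead. A sketch of the append-if-missing step as its own function (function and file names are made up):

```shell
#!/bin/sh
# Append an entry only if it is not already a line of the file.
# -F: fixed-string match (dots in IPs are regex metacharacters)
# -x: match the whole line, so 10.0.0.1 does not match 10.0.0.10
append_unique() {
    grep -qxF "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

file=$(mktemp)
append_unique 10.0.0.1 "$file"
append_unique 10.0.0.1 "$file"   # no-op: already present
append_unique 10.0.0.2 "$file"
wc -l < "$file"                  # 2
```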
How can I replace a column with its hash value (like MD5) in awk or sed?
The original file is super huge, so I need this to be really efficient.
So, you don't really want to be doing this with awk. Any of the popular high-level scripting languages -- Perl, Python, Ruby, etc. -- would do this in a way that was simpler and more robust. Having said that, something like this will work.
Given input like this:
this is a test
(E.g., a row with four columns), we can replace a given column with its md5 checksum like this:
awk '{
tmp="echo " $2 " | openssl md5 | cut -f2 -d\" \""
tmp | getline cksum
$2=cksum
print
}' < sample
This relies on GNU awk (you'll probably have this by default on a Linux system), and it uses openssl to generate the md5 checksum. We first build a shell command line in tmp to pass the selected column to the md5 command. Then we pipe the output into the cksum variable, and replace column 2 with the checksum. Given the sample input above, the output of this awk script would be:
this 7e1b6dbfa824d5d114e96981cededd00 a test
I copy pasted larsks's response, but I have added the close line, to avoid the problem indicated in this post: gawk / awk: piping date to getline *sometimes* won't work
awk '{
tmp="echo " $2 " | openssl md5 | cut -f2 -d\" \""
tmp | getline cksum
close(tmp)
$2=cksum
print
}' < sample
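The same pattern works without openssl, assuming coreutils md5sum is available; this condensed sketch reproduces the output shown above (the hash is of "is" plus the newline echo adds):

```shell
#!/bin/sh
printf 'this is a test\n' |
awk '{
    # Build the command for this line, read one line of its output, then
    # close the pipe so the next record re-runs it instead of reading a
    # stale stream.
    cmd = "echo " $2 " | md5sum"
    cmd | getline out
    close(cmd)
    split(out, a, " ")   # md5sum prints "hash  -"; keep only the hash
    $2 = a[1]
    print
}'
```

This prints "this 7e1b6dbfa824d5d114e96981cededd00 a test", matching the earlier result.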
This might work using Bash/GNU sed:
<<<"this is a test" sed -r 's/(\S+\s)(\S+)(.*)/echo "\1 $(md5sum <<<"\2") \3"/e;s/ - //'
this 7e1b6dbfa824d5d114e96981cededd00 a test
or a mostly sed solution:
<<<"this is a test" sed -r 'h;s/^\S+\s(\S+).*/md5sum <<<"\1"/e;G;s/^(\S+).*\n(\S+)\s\S+\s(.*)/\2 \1 \3/'
this 7e1b6dbfa824d5d114e96981cededd00 a test
Replaces `is` (the second column of `this is a test`) with its md5sum.
Explanation:
In the first: identify the columns and use back references as parameters in the Bash command, which is substituted and evaluated; then make cosmetic changes to lose the file name field (`-`, denoting standard input) that md5sum appends to its output.
In the second: similar to the first, but hive the input string off into the hold space; then, after evaluating the md5sum command, use G to append the hold space to the pattern space (which holds the md5sum result) and rearrange with a final substitution.
You can also do that with perl :
echo "aze qsd wxc" | perl -MDigest::MD5 -ne 'print "$1 ".Digest::MD5::md5_hex($2)." $3" if /([^ ]+) ([^ ]+) ([^ ]+)/'
aze 511e33b4b0fe4bf75aa3bbac63311e5a wxc
If you want to obfuscate a large amount of data, it might be faster than the sed and awk solutions, which need to fork an md5sum process for each line.
You might have a better time with read than awk, though I haven't done any benchmarking.
the input (scratch001.txt):
foo|bar|foobar|baz|bang|bazbang
baz|bang|bazbang|foo|bar|foobar
transformed using read:
while IFS="|" read -r one fish twofish red fishy bluefishy; do
    twofish=$(echo -n "$twofish" | md5sum | tr -d " -")
    echo "$one|$fish|$twofish|$red|$fishy|$bluefishy"
done < scratch001.txt
produces the output:
foo|bar|3858f62230ac3c915f300c664312c63f|baz|bang|bazbang
baz|bang|19e737ea1f14d36fc0a85fbe0c3e76f9|foo|bar|foobar