Curl and xargs in piped commands - bash

I want to process an old database where password are plain text (comma separated ; passwd is the 5th field in the csv file where the database has been exported) to crypt them for further use by dokuwiki. Here is my bash command (grep and sed are there to extract the crypted passwd from curl output) :
cat users.csv | awk 'FS="," { print $4 }' | xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' | xargs | grep -o '<tt.*tt>' | sed -e 's/tt//g' | sed -e 's/<[^>]*>//g'
I get the following comment from xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
And only the first line of the file is processed, and nothing appends then.
Using the -0 option, and playing around with quotes, doesn't solve anything. Where am I wrong in the command line ? May be a more advanced language will be more adequate to do this.
Thank for help, LM

In general, if you have such a long pipe of commands, it is better to split them if things go wrong. Going through your pipe:
cat users.csv |
Nothing unexpected there.
awk 'FS="," { print $4 }' |
You probably wanted to do awk 'BEGIN {FS=","} { print $4 }'. Try the first two commands in the pipe and see if they produce the correct answer.
xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' |
Nothing wrong there, although there might be better ways to do an MD5 hash.
xargs |
What is this xargs doing in the pipe? It should be removed.
grep -o '<tt.*tt>' |
Note that this will produce two lines:
<tt>$1$17ab075e$0VQMuM3cr5CtElvMxrPcE0</tt>
<tt><your_docuwiki_root>/conf/users.auth.php</tt>
which is probably not what you expected.
sed -e 's/tt//g' |
sed -e 's/<[^>]*>//g'
which will remove the html-tags, though
sed 's/<tt>//;s/<.tt>//'
will do the same.
So I'd say a wrong awk and an xargs too many.

Related

How can I deduplicate filenames across directories?

I run the following gsutil command:
gsutil ls -d gs://mybucket/v${version}/folder1/*/*.whl |
sort -V |
grep -e "/*.whl"
I get:
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561595893/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561654308/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319372/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319400/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563329633/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563411368/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565916833/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565921265/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1566258114/file1-cp27-cp27mu-linux_x86_64.whl
Since some files in different folders have the same names, how can I retrieve unique filenames ignoring the path?
I would do it like this:
blabla_your_command | rev | sort -t'/' -u -k1,1 | rev
rev reverses lines. Then I unique sort using / as a separator on the first field. After the line is reversed, the first field will be the filename, so sorting -u on it would return only unique filenames. Then the line needs to be reversed back.
The following command:
cat <<EOF |
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561595893/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561654308/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319372/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319400/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563329633/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563411368/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565916833/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565921265/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1566258114/file1-cp27-cp27mu-linux_x86_64.whl
EOF
rev | sort -t'/' -u -k1,1 | rev
outputs:
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
Please check awk option given below, this will print the last occurrence of delimiter '/', it worked for me
example:
gsutil ls gs://mybucket/v1.0.0/folder1/1560930522 | awk -F/ '{print $(NF)}'
print all the file names under '1560930522'
your_command|awk -F/ '!($NF in a){a[$NF]; print}'
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
4 different ways of saying the same thing
nawk -F'^.+/' '++_[$NF]<NF'
gawk -F'/' '__[$NF]++<!_'
mawk -F/ '_^__[$NF]++'
mawk2 -F/ '!_[$NF]--'
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
Here's a simple, straightforward solution:
$ your_gsutil_command | xargs -L 1 basename | sort -u
The easiest way to remove paths is with basename. Unfortunately it accepts only a single filename, which must be on the command line (not from stdin), so we need to take the following steps:
Create the list of files.
We do this with your_gsutil_command, but you can use any command that generates a list of files.
Send each one to basename to remove its path.
The xargs command does this for us by reading its stdin and invoking basename repeatedly, passing the data as command-line arguments. But xargs efficiently tries to reduce the number of invocations by passing multiple filenames on each command line, and that breaks basename. We prevent that with -L 1, limiting it to only one line (that is, one filename) at a time.
Remove duplicates.
The sort -u command does this.
Using your example data:
$ gsutil ls -d gs://mybucket/v${version}/folder1/*/*.whl |
xargs -L 1 basename | sort -u
file1-cp27-cp27mu-linux_x86_64.whl
file1-cp35-cp35m-linux_x86_64.whl
file1-cp36-cp36m-linux_x86_64.whl
file1-cp37-cp37m-linux_x86_64.whl
Caveat: Spaces break everything. 😡
So far we've assumed the filenames and folders do not contain spaces. Spaces break basename because needs exactly one filename, and it would interpret spaces as separators between multiple filenames. We can get around this in two ways:
ls -Q: If you're deduplicating local filenames, you can use the (non-gsutil) ls command with the -Q flag to put the filenames in quotes, so basename will interpret spaces as part of the filenames rather than separators.
gsutil: The -Q flag is unfortunately not supported, so we'll need to escape the spaces manually:
$ your_gsutil_command | sed 's/ /\\ /g' | xargs -L 1 basename | sort -u
Here we use the sed command to escape each space by inserting a backslash before it. (That is, we replace with \ . Note that we also need to escape the backslash in the sed command, which is why we use \\ and not just \.)

Execute piped shell commands in Tcl

I want to execute these piped shell commands in Tcl:
grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head
I try:
exec grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head
and get an error:
Error: grep: invalid option -- 'k'
When I try to pipe only 2 of the commands:
exec grep -v "#" inputfile | grep -v ">"
I get:
Error: can't specify ">" as last word in command
Update: I also tried {} and {bash -c '...'}:
exec {bash -c 'grep -v "#" inputfile | grep -v ">"'}
Error: couldn't execute "bash -c 'grep -v "#" inputfile | grep -v ">"'": no such file or directory
My question: how can I execute the initial piped commands in a tcl script?
Thanks
The problem is that exec does “special things” when it sees a > on its own (or at the start of a word) as that indicates a redirection. Unfortunately, there's no practical way to avoid this directly; this is an area where Tcl's syntax system doesn't help. You end up having to do something like this:
exec grep -v "#" inputfile | sh -c {exec grep -v ">"} | sort -r -nk7 | head
You can also move the entire pipeline to the Unix shell side:
exec sh -c {grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head}
Though to be frank this is something that you can do in pure Tcl, which will then make it portable to Windows too…
The > is causing problems here.
You need to escape it from tcl and the shell to make it work here.
exec grep -v "#" inputfile | grep -v {\\>} | sort -r -nk7 | head
or (and this is better since you have one less grep)
exec grep -Ev {#|>} inputfile | sort -r -nk7 | head
If you look in the directory you were running this from (assuming tclsh or similar) you'll probably see that you created an oddly named file (i.e. |) before.
In pure Tcl:
package require fileutil
set lines {}
::fileutil::foreachLine line inputfile {
if {![regexp #|> $line]} {
lappend lines $line
}
}
set lines [lsort -decreasing -integer -index 6 $lines]
set lines [lrange $lines 0 9]
puts [join $lines \n]\n
(-double might be more appropriate than -integer)
Edit: I mistranslated the (1-based) -k index for the command sort when writing the (0-based) -index option for lsort. It is now corrected.
Documentation: fileutil package, if, join, lappend, lrange, lsort, package, puts, regexp, set

Bash code error unexpected syntax error

I am not sure why i am getting the unexpected syntax '( err
#!/bin/bash
DirBogoDict=$1
BogoFilter=/home/nikhilkulkarni/Downloads/bogofilter-1.2.4/src/bogofilter
echo "spam.."
for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
echo "ham.."
for i in 'cat full/index | fgrep ham | awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
Error:
./score.bash: line 7: syntax error near unexpected token `('
./score.bash: line 7: `for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500''
Uh, because you have massive syntax errors.
The immediate problem is that you have an unpaired single quote before the cat which exposes the Awk script to the shell, which of course cannot parse it as shell script code.
Presumably you want to use backticks instead of single quotes, although you should actually not read input with for.
With a fair bit of refactoring, you might want something like
for type in spam ham; do
awk -F"/" -v type="$type" '$0 ~ type && NR>1000 && i++<500 {
print $2"/"$3 }' full/index |
xargs $BogoFilter -d $DirBogoDict -M -k 1024 -v
done
This refactors the useless cat | grep | awk | head into a single Awk script, and avoids the silly loop over each output line. I assume bogofilter can read file name arguments; if not, you will need to refactor the xargs slightly. If you can pipe all the files in one go, try
... xargs cat | $BogoFilter -d $DirBogoDict -M -k 1024 -v
or if you really need to pass in one at a time, maybe
... xargs sh -c 'for f; do $BogoFilter -d $DirBogoDict -M -k 1024 -v <"$f"; done' _
... in which case you will need to export the variables BogoFilter and DirBogoDict to expose them to the subshell (or just inline them -- why do you need them to be variables in the first place? Putting command names in variables is particularly weird; just update your PATH and then simply use the command's name).
In general, if you find yourself typing the same commands more than once, you should think about how to avoid that. This is called the DRY principle.
The syntax error is due to bad quoting. The expression whose output you want to loop over should be in command substitution syntax ($(...) or backticks), not single quotes.

grep pipe with sed

This is my bash command
grep -rl "System.out.print" Project1/ |
xargs -I{} grep -H -n "System.out.print" {} |
cut -f-2 -d: |
sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/
What I'm trying to do here is,I'm trying to iterate through sub folders and check what files contains "System.out.print" (using grep)
using 2nd grep trying to get file names and line numbers
using sed command I display those to console.
from here I want to remove "System.out.print" with "XXXXX" how I can pipe sed command to this?
pls help me
thanxx
GNU sed has an option to change files in place:
find Project1/ -type f | xargs sed -i 's/System\.out\.print/XXXXX/g'
Btw, your script could be written as:
grep -rsn 'root' /etc/ |
awk -F: '{ print "filename is", $1, "and line number is", $2 }'
I'm just building on hop's answer, which I found to be more useful than find -exec. I had search_text dispersed all over my computer, in logs, config files and so on, but I didn't want to search (or especially change) anything in /dev, /sys, /proc, and so on. One note, read man xargs; it doesn't like file names with spaces.
grep -HriIl --exclude-dir=dev --exclude-dir=proc --exclude-dir=sys search_text / | xargs sed -i 's/search_text/replace_text/g'

How do you pipe input through grep to another utility?

I am using 'tail -f' to follow a log file as it's updated; next I pipe the output of that to grep to show only the lines containing a search term ("org.springframework" in this case); finally I'd like to make is piping the output from grep to a third command, 'cut':
tail -f logfile | grep org.springframework | cut -c 25-
The cut command would remove the first 25 characters of each line for me if it could get the input from grep! (It works as expected if I eliminate 'grep' from the chain.)
I'm using cygwin with bash.
Actual results: When I add the second pipe to connect to the 'cut' command, the result is that it hangs, as if it's waiting for input (in case you were wondering).
Assuming GNU grep, add --line-buffered to your command line, eg.
tail -f logfile | grep --line-buffered org.springframework | cut -c 25-
Edit:
I see grep buffering isn't the only problem here, as cut doesn't allow linewise buffering.
you might want to try replacing it with something you can control, such as sed:
tail -f logfile | sed -u -n -e '/org\.springframework/ s/\(.\{0,25\}\).*$/\1/p'
or awk
tail -f logfile | awk '/org\.springframework/ {print substr($0, 0, 25);fflush("")}'
On my system, about 8K was buffered before I got any output. This sequence worked to follow the file immediately:
tail -f logfile | while read line ; do echo "$line"| grep 'org.springframework'|cut -c 25- ; done
What you have should work fine -- that's the whole idea of pipelines. The only problem I see is that, in the version of cut I have (GNU coreutiles 6.10), you should use the syntax cut -c 25- (i.e. use a minus sign instead of a plus sign) to remove the first 24 characters.
You're also searching for different patterns in your two examples, in case that's relevant.

Resources