why shell for expression cannot parse xargs parameter correctly - shell

I have a black list to save tag id list, e.g. 1-3,7-9, actually it represents 1,2,3,7,8,9. And could expand it by below shell
for i in {1..3,7..9}; do for j in {$i}; do echo -n "$j,"; done; done
1,2,3,7,8,9
but first I should convert - to ..
echo -n "1-3,7-9" | sed 's/-/../g'
1..3,7..9
then put it into for expression as a parameter
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # for i in {#}; do for j in {$i}; do echo -n "$j,"; done; done
zsh: parse error near `do'
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # echo #
1..3,7..9
but for expression cannot parse it correctly, why is so?

Because you didn't do anything to stop the outermost shell from picking up the special keywords and characters ( do, for, $, etc ) that you mean to be run by xargs.
xargs isn't a shell built-in; it gets the command line you want it to run for each element on stdin, from its arguments. just like any other program, if you want ; or any other sequence special to be bash in an argument, you need to somehow escape it.
It seems like what you really want here, in my mind, is to invoke in a subshell a command ( your nested for loops ) for each input element.
I've come up with this; it seems to to the job:
echo -n "1-3,7-9" \
| sed 's/-/../g' \
| xargs -I # \
bash -c "for i in {#}; do for j in {\$i}; do echo -n \"\$j,\"; done; done;"
which gives:
{1..3},{7..9},

Could use below shell to achieve this
# Mac newline need special treatment
echo "1-3,7-9" | sed -e 's/-/../g' -e $'s/,/\\\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,%
#Linux
echo "1-3,7-9" | sed -e 's/-/../g' -e 's/,/\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,
but use this way is a little complicated maybe awk is more intuitive
# awk
echo "1-3,7-9,11,13-17" | awk '{n=split($0,a,","); for(i=1;i<=n;i++){m=split(a[i],a2,"-");for(j=a2[1];j<=a2[m];j++){print j}}}' | tr '\n' ','
1,2,3,7,8,9,11,13,14,15,16,17,%

echo -n "1-3,7-9" | perl -ne 's/-/../g;$,=",";print eval $_'

Related

How to find all non-dictionary words in a file in bash/zsh?

I'm trying to find all words in a file that don't exist in the dictionary. If I look for a single word the following works
b=ther; look $b | grep -i "^$b$" | ifne -n echo $b => ther
b=there; look $b | grep -i "^$b$" | ifne -n echo $b => [no output]
However if I try to run a "while read" loop
while read a; do look $a | grep -i "^$a$" | ifne -n echo "$a"; done < <(tr -s '[[:punct:][:space:]]' '\n' <lotr.txt |tr '[:upper:]' '[:lower:]')
The output seems to contain all (?) words in the file. Why doesn't this loop only output non-dictionary words?
Regarding ifne
If stdin is non-empty, ifne -n reprints stdin to stdout. From the manpage:
-n Reverse operation. Run the command if the standard input is empty
Note that if the standard input is not empty, it is passed through
ifne in this case.
strace on ifne confirms this behavior.
Alternative
Perhaps, as an alternative:
1 #!/bin/bash -e
2
3 export PATH=/bin:/sbin:/usr/bin:/usr/sbin
4
5 while read a; do
6 look "$a" | grep -qi "^$a$" || echo "$a"
7 done < <(
8 tr -s '[[:punct:][:space:]]' '\n' < lotr.txt \
9 | tr '[A-Z]' '[a-z]' \
10 | sort -u \
11 | grep .
12 )

GNU parallel with custom script doing string comparison

The follwoing script.sh compares part of a string (coming from stdin by cating a csv file) to a defined string and reports the differences in a certain format
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intendet to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when i use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, i get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to obtain XYZ,C3M I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
-k keeps the order.
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
#or
parallel --pipepart -a large_file --block -10 -k doit

count all the lines in all folders in bash [duplicate]

wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version, I don't have a GNU version to check):
If no files are specified, the standard input is used and no file
name is
displayed. The prompt will accept input until receiving EOF, or [^D] in
most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This only outputs the number of lines, but the trailing newline character (\n) is present, if you don't want that either, replace [:blank:] with [:space:].
Another way to strip the leading zeros without invoking an external command is to use Arithmetic expansion $((exp))
echo $(($(wc -l < file.txt)))
Best way would be first of all find all files in directory then use AWK NR (Number of Records Variable)
below is the command :
find <directory path> -type f | awk 'END{print NR}'
example : - find /tmp/ -type f | awk 'END{print NR}'
This works for me using the normal wc -l and sed to strip any char what is not a number.
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133

echo -e cat: argument line too long

I have bash script that would merge a huge list of text files and filter it. However I'll encounter 'argument line too long' error due to the huge list.
echo -e "`cat $dir/*.txt`" | sed '/^$/d' | grep -v "\-\-\-" | sed '/</d' | tr -d \' | tr -d '\\\/<>(){}!?~;.:+`*-_ͱ' | tr -s ' ' | sed 's/^[ \t]*//' | sort -us -o $output
I have seen some similar answers here and i know i could rectify it using find and cat the files 1st. However, i would i like to know what is the best way to run a one liner code using echo -e and cat without breaking the code and to avoid the argument line too long error. Thanks.
First, with respect to the most immediate problem: Using find ... -exec cat -- {} + or find ... -print0 | xargs -0 cat -- will prevent more arguments from being put on the command line to cat than it can handle.
The more portable (POSIX-specified) alternative to echo -e is printf '%b\n'; this is available even in configurations of bash where echo -e prints -e on output (as when the xpg_echo and posix flags are set).
However, if you use read without the -r argument, the backslashes in your input string are removed, so neither echo -e nor printf %b will be able to process them later.
Fixing this can look like:
while IFS= read -r line; do
printf '%b\n' "$line"
done \
< <(find "$dir" -name '*.txt' -exec cat -- '{}' +) \
| sed [...]
grep -v '^$' $dir/*.txt | grep -v "\-\-\-" | sed '/</d' | tr -d \' \
| tr -d '\\\/<>(){}!?~;.:+`*-_ͱ' | tr -s ' ' | sed 's/^[ \t]*//' \
| sort -us -o $output
If you think about it some more you can probably get rid of a lot more stuff and turn it into a single sed and sort, roughly:
sed -e '/^$/d' -e '/\-\-\-/d' -e '/</d' -e 's/\'\\\/<>(){}!?~;.:+`*-_ͱ//g' \
-e 's/ / /g' -e 's/^[ \t]*//' $dir/*.txt | sort -us -o $output

Remove all chars that are not a digit from a string

I'm trying to make a small function that removes all the chars that are not digits.
123a45a ---> will become ---> 12345
I've came up with :
temp=$word | grep -o [[:digit:]]
echo $temp
But instead of 12345 I get 1 2 3 4 5. How to I get rid of the spaces?
Pure bash:
word=123a45a
number=${word//[^0-9]}
Here's a pure bash solution
var='123a45a'
echo ${var//[^0-9]/}
12345
is this what you are looking for?
kent$ echo "123a45a"|sed 's/[^0-9]//g'
12345
grep & tr
echo "123a45a"|grep -o '[0-9]'|tr -d '\n'
12345
I would recommend using sed or perl instead:
temp="$(sed -e 's/[^0-9]//g' <<< "$word")"
temp="$(perl -pe 's/\D//g' <<< "$word")"
Edited to add: If you really need to use grep, then this is the only way I can think of:
temp="$( grep -o '[0-9]' <<< "$word" \
| while IFS= read -r ; do echo -n "$REPLY" ; done
)"
. . . but there's probably a better way. (It uses grep -o, like your solution, then runs over the lines that it outputs and re-outputs them without line-breaks.)
Edited again to add: Now that you've mentioned that you use can use tr instead, this is much easier:
temp="$(tr -cd 0-9 <<< "$word")"
What about using sed?
$ echo "123a45a" | sed -r 's/[^0-9]//g'
12345
As I read you are just allowed to use grep and tr, this can make the trick:
$ echo "123a45a" | grep -o [[:digit:]] | tr -d '\n'
12345
In your case,
temp=$(echo $word | grep -o [[:digit:]] | tr -d '\n')
tr will also work:
echo "123a45a" | tr -cd '[:digit:]'
# output: 12345
Grep returns the result on different lines:
$ echo -e "$temp"
1
2
3
4
5
So you cannot remove those spaces during the filtering, but you can afterwards, since $temp can transform itself like this:
temp=`echo $temp | tr -d ' '`
$ echo "$temp"
12345

Resources