Bash remove until null - bash

I'm trying to remove all characters up to and including the first null character in an input stream.
I've tried this, which:
uses echo to create an input with two lines; each line has a 0 in the middle.
uses tr to convert the 0s to nulls, so both lines now have a null in the middle.
uses cut (the attempted solution).
uses od to view the output.
$ echo $'ab012\nab012' | tr '0' '\0' | cut -f2- -d '' | od -An -t uC
49 50 10 49 50 10
The output from od reads as: 1, 2, \n, 1, 2, \n.
This solution is incorrect since cut applied this operation to each line, instead of just from the beginning. The correct output should instead be: 1, 2, \n, a, b, \0, 1, 2, \n.
How do I achieve this?

If you have GNU sed:
echo $'ab012\nab012' | tr '0' '\0' | sed -z '1d' | od -tx1
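The -z option of sed is a GNU extension. It makes sed treat the input as NUL-separated records, so 1d deletes everything up to and including the first NUL.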
Or, a pure bash way, as stated in the comments:
echo $'ab012\nab012' | tr '0' '\0' | { read -r -d ''; cat; } | od -tx1
When used with the option -d '', the bash builtin read stops at the first NUL character instead of at a newline; cat then passes the rest of the stream through unchanged.
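The read/cat approach can be wrapped in a small function. This is only a minimal sketch: the name skip_through_first_nul is made up for illustration, and it assumes bash, since the -d option of read is not POSIX.

```bash
# Hypothetical helper: discard everything on stdin up to and including
# the first NUL byte, then copy the remainder to stdout unchanged.
skip_through_first_nul() {
    # read consumes input up to the first NUL; "|| true" covers the case
    # where no NUL is found and read hits end-of-input instead.
    IFS= read -r -d '' _ || true
    cat
}
```

With the input from the question, printf 'ab\00012\nab\00012\n' | skip_through_first_nul | od -An -t uC should show the bytes 49 50 10 97 98 0 49 50 10, which is the output the question asks for.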

Related

Argument in bash script

I have the following bash script called countscript.sh
#!/bin/bash
echo "Running" $0
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed $1 q
But I don't understand how to pass the argument correctly: ( "3" should be the argument $1 of sed).
$ echo " one two two three three three" | ./countscript.sh 3
Running ./countscript.sh
sed: -e expression #1, char 1: missing command
This works fine:
$ echo "one two three four one one four" | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 3q
3 one
2 four
1 two
Thanks.
PS: Has anybody else noticed the bug in this script on page 10 of https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-2.pdf?
In the quoted paper, I think you are misreading
sed ${1}q
as
sed ${1} q
and sed does not consider 3 by itself a valid command. The separate argument q is treated as an input file name. If the value of $1 did result in a single valid sed script, you would have likely gotten an error for the missing input file q.
Proper shell programming would dictate this be written as
sed "${1}q"
or
sed "${1} q"
instead; with the space as part of the script, sed correctly outputs the first $1 lines of input and exits.
It's somewhat curious that the authors used sed instead of head -n "$1" to output the first few lines, as one of them (McIlroy) essentially invented the idea of the Unix pipeline as a series of special-purpose, narrowly focused tools. Not having read the full paper, I don't know what Knuth's and McIlroy's contributions to the paper were; perhaps Bentley just likes sed. :)
When running the following command:
$ echo " one two two three three three" | ./countscript.sh 3
the special variable $1 will be replaced by 3, your first argument. Hence, the script runs:
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 3 q
Notice the space between the 3 and the q. sed does not know what to do, because you give it no command (3 is not a command).
Remove the space, and you should be fine.
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed "${1}q"
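Putting the quoted form to work, the pipeline could be wrapped in a function along these lines. This is a sketch rather than the paper's script: the name top_words and the default of 10 lines are assumptions made for the example.

```bash
# Hypothetical helper: print the N most frequent words read from stdin.
# N defaults to 10 here; that default is an arbitrary choice for the sketch.
top_words() {
    local n=${1:-10}
    tr -cs 'A-Za-z' '\n' |   # one word per line
        tr 'A-Z' 'a-z' |     # fold to lower case
        sort |
        uniq -c |            # count identical adjacent lines
        sort -rn |           # most frequent first
        sed "${n}q"          # print n lines, then quit
}
```

For example, echo "one two three four one one four" | top_words 2 should list "one" with a count of 3, followed by "four" with a count of 2. Because ${n}q is a single quoted word, sed always receives the address and the q command as one script.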

BASH Finding palindromes in a .txt file

I have been given a .txt file in which we have to find all the palindromes in the text (each must have at least 3 letters, and the letters can't all be the same, e.g. AAA).
The output should have the number of occurrences in the first column and the word in the second, e.g.
123 kayak
3 bob
1 dad
#!/bin/bash
tmp='mktemp'
awk '{for(x=1;$x;++x)print $x}' "${1}" | tr -d [[:punct:]] | tr -s [:space:] | sed -e 's/#//g' -e 's/[0-9]*//g'| sed -r '/^.{,2}$/d' | sort | uniq -c -i > tmp1
This produces the file as it should, ignoring case, words of fewer than 3 letters, punctuation and digits.
However, I am now stumped on how to pull the palindromes out of this. I thought a temp file might be the way, but I don't know where to take it.
Any help or guidance is much appreciated.
# Modify this to your needs; it should take your input on stdin and return one word per
# line on stdout, in the same order if called more than once with the same input.
preprocess() {
    tr -d '[:punct:][:digit:]' \
        | tr '[:upper:]' '[:lower:]' \
        | tr -s '[:space:]' '\n' \
        | sed -E -e '/^(.)\1+$/d'    # drop words made of one repeated letter
}
# Pair each word with its reverse, keep the palindromes of 3+ letters, and count them.
paste <(preprocess <"$1") <(preprocess <"$1" | rev) \
    | awk '$1 == $2 && (length($1) >= 3) { print $1 }' \
    | sort | uniq -c
The critical thing here is to paste together your input file with a stream that has each line from that input file reversed. This gives you two separate columns you can compare.
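The same comparison can be made one word at a time in pure bash, which may help when testing individual words. This is a minimal sketch, not part of the answer above: the name is_palindrome is hypothetical, and ${var,,} assumes bash 4 or later.

```bash
# Hypothetical helper: succeed if $1 is a palindrome of at least three
# letters that is not one letter repeated. Needs bash 4+ for ${var,,}.
is_palindrome() {
    local word=${1,,} reversed='' i
    (( ${#word} >= 3 )) || return 1
    # Build the reversed word one character at a time.
    for (( i = ${#word} - 1; i >= 0; i-- )); do
        reversed+=${word:i:1}
    done
    [[ $word == "$reversed" ]] || return 1
    # Removing every copy of the first letter leaves something behind
    # only if the word contains at least two distinct letters.
    [[ -n ${word//"${word:0:1}"/} ]]
}
```

Under these rules, is_palindrome kayak and is_palindrome Bob succeed, while is_palindrome aaa and is_palindrome hello fail.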

Bash - stdin words to file

I am trying to store the whole user input in a bash variable by appending to it line by line,
then sort the words, etc.
The problem is that for input such as:
sdsd fff sss
asdasds
It creates this output:
fff
sdsd
sssasdasds
Expected output is:
asdasds
fff
sdsd
sss
Code follows:
content=''
while read line
do
content+=$(echo "$line")
done
result=`echo "$content" | sed -r 's/[^a-zA-Z ]+/ /g' | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' | sort -u | sed '/^$/d' | sed 's/[^[:alpha:]]/\n/g'`
echo "$result" >> "$dictionary"
You aren't providing a space when you are appending.
content+=$(echo "$line")
You need to make sure there is a space between the end of the old value and the new value.
content+=" $line"
(There's no need for echo for this either, as @gniourf_gniourf correctly pointed out.)
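The effect of the missing separator is easy to reproduce in isolation. The sketch below uses a made-up helper, join_lines, that joins the lines of stdin with whatever separator it is given; it is an illustration of the appending problem, not the asker's full script.

```bash
# Hypothetical helper: join the lines of stdin into one string, putting
# the separator given as $1 between consecutive lines.
join_lines() {
    local sep=$1 content='' line
    while IFS= read -r line; do
        # ${content:+$sep} expands to the separator only once content is
        # non-empty, so no leading separator is added.
        content+="${content:+$sep}$line"
    done
    printf '%s\n' "$content"
}
```

With the input from the question, join_lines '' yields "sdsd fff sssasdasds" (the last word of one line glued to the first word of the next), while join_lines ' ' yields "sdsd fff sss asdasds".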
Something that will achieve what you're showing in your example:
words_ary=()
while read -r -a line_ary; do
    (( ${#line_ary[@]} )) || continue # skip empty lines
    words_ary+=( "${line_ary[@],,}" ) # The ,, is to convert to lower-case
done
printf '%s\n' "${words_ary[@]}" | sort -u >> "$dictionary"
We split each input line into words at whitespace and put these words in the array line_ary.
We check that the line is non-empty.
We append each word, converted to lowercase, to the array words_ary.
Finally, we sort the words from words_ary, drop duplicates, and append the result to the file $dictionary.

Parsing Strings in Bash w/out a Delimiter

I've got a piece of a script I'm trying to figure out, so maybe it's a simple question for someone more experienced out there.
Here is the code:
#!/bin/bash
echo "obase=2;$1" | bc
Used like:
$ ./script 12
Outputs:
1100
My question is, how can I parse this 4 digit number into separate digits? (to then delimit with cut -d ' ' and input those into an array...)
I'd like to be able to get the following output:
1 1 0 0
Is this even possible in Bash? I know it's easier in other languages.
You can use sed (note that this leaves a trailing space after the last digit):
echo "obase=2;$1" | bc | sed 's/./& /g'
or if you prefer longer form:
echo "obase=2;$1" | bc | sed 's/\(.\)/\1 /g'
or, if your sed supports -r:
echo "obase=2;$1" | bc | sed -r 's/(.)/\1 /g'
To print individual digits from a string you can use fold:
s=1100
fold -w1 <<< "$s"
1
1
0
0
To create an array:
arr=( $(fold -w1 <<< "$s") )
set|grep arr
arr=([0]="1" [1]="1" [2]="0" [3]="0")
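The array can also be filled without an external command by using bash substring expansion. A minimal sketch follows; the function name split_digits and the global array name digits are assumptions made for the example.

```bash
# Hypothetical helper: split the string in $1 into single characters and
# store them in the global array "digits" (the array name is an assumption).
split_digits() {
    local s=$1 i
    digits=()
    for (( i = 0; i < ${#s}; i++ )); do
        digits+=( "${s:i:1}" )   # one character starting at offset i
    done
}
```

After split_digits 1100, echo "${digits[*]}" prints the digits separated by single spaces, and ${#digits[@]} gives their count.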

sed: interpolating variables in timestamp format

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this in a script, and my two strings are variables.
The strings will be in a sort of time stamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck, tried also to replace / as delimiter with ;.
I'm pretty sure the problem is due to the / and blank in input variables, but I can't figure out the proper syntax.
The syntax to use alternate regex delimiters is:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
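If the timestamps should be matched as plain text rather than as regular expressions, awk's index() function avoids the delimiter question entirely. The following is a sketch under assumptions: the name lines_between is hypothetical, and because values are passed with -v, awk still interprets backslash escapes in them.

```bash
# Hypothetical helper: print the lines of file $3 from the first line
# containing the literal string $1 through the next line containing $2.
lines_between() {
    awk -v start="$1" -v end="$2" '
        index($0, start) { printing = 1 }   # literal substring match
        printing                            # print while inside the range
        index($0, end)   { printing = 0 }
    ' "$3"
}
```

A call such as lines_between '2014/01/01 08:01' '2014/01/02' "$file" needs no escaping for the slashes or the space. As with the awk one-liner above, a line that contains both strings is printed on its own and ends the range immediately.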
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17
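Where tac is not available, a similar first-to-last range can be had by letting awk read the file twice. This is a sketch, not the method above: the name first_to_last is made up, it needs a real file rather than a pipe, and it treats both values as awk regular expressions.

```bash
# Hypothetical helper: print file $3 from the first line matching $1
# through the last line matching $2, reading the file twice.
first_to_last() {
    awk -v start="$1" -v end="$2" '
        NR == FNR {                      # first pass: locate the boundaries
            if (!first && $0 ~ start) first = FNR
            if ($0 ~ end) last = FNR
            next
        }
        first && FNR >= first && FNR <= last   # second pass: print the range
    ' "$3" "$3"
}
```

Run against a file holding the output of seq 20, first_to_last 5 7 prints 5 through 17, the same range as the tac pipeline.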
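Try using double quotes instead of single ones, so that the shell expands the variables before sed sees the script:
sed -n "/$1/,/$2/p" "$file"
Note that with / as the delimiter this breaks as soon as a value itself contains a /, as dates like 2014/01/01 do.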
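With / kept as the delimiter, any / inside the values has to be escaped before the script reaches sed. A minimal sketch using bash pattern substitution is shown below; the names escape_slashes and print_range are hypothetical, and other regex metacharacters in the values are left untouched.

```bash
# Hypothetical helper: put a backslash in front of every / in $1.
escape_slashes() {
    printf '%s\n' "${1//\//\\/}"
}

# Hypothetical helper: print the lines of file $3 between the first
# match of $1 and the next match of $2, using / as the sed delimiter.
print_range() {
    local start end
    start=$(escape_slashes "$1")
    end=$(escape_slashes "$2")
    sed -n "/${start}/,/${end}/p" "$3"
}
```

For example, escape_slashes '2014/01/01 08:01' prints 2014\/01\/01 08:01, which sed then reads as a pattern containing literal slashes.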
