Sorting a list with multi-part items - bash

Third try to understand what I'm doing wrong.
I've got a list like this:
array[0] = 1111 Here is much text
array[1] = 2222 Here is even more text
array[2] = 1111.1 Here is special text
Now I want to sort the list to have it like this:
1111 Here is much text
1111.1 Here is special text
2222 Here is even more text
Using
for j in ${array[@]}; do echo $j; done | sort -n
it separates every item into single words because of the spaces.
Using
for j in "${array[#]}"; do echo "$j"; done | sort -n
I get a sorted list like 1111 2222 1111.1

array=(
"1111 Here is much text"
"2222 Here is even more text"
"1111.1 Here is special text"
)
printf "%s\n" "${array[#]}" | sort -n
1111 Here is much text
1111.1 Here is special text
2222 Here is even more text
To save it:
sorted=()
while IFS= read -r line; do
sorted+=("$line")
done < <( printf "%s\n" "${array[@]}" | sort -n )
printf "%s\n" "${sorted[@]}"
# same output as above
or
source <( echo 'sorted=('; printf '"%s"\n' "${array[@]}" | sort -n; echo ')' )
printf "%s\n" "${sorted[@]}"
Carriage returns in your file will mess you up. Consider a file named "t" with DOS-style line endings:
$ cat -e t
line1^M$
line2^M$
line3^M$
$ for n in {1..3} ; do array[n]="$(echo $n $(cat t))"; done
$ printf "%s\n" "${array[#]}"|od -c
0000000   1       l   i   n   e   1  \r       l   i   n   e   2  \r
0000020   l   i   n   e   3  \r  \n   2       l   i   n   e   1  \r
0000040   l   i   n   e   2  \r       l   i   n   e   3  \r  \n   3
0000060   l   i   n   e   1  \r       l   i   n   e   2  \r       l   i
0000100   n   e   3  \r  \n
0000105
$ printf "%s\n" "${array[#]}"
line31
line31
line31
Clearly this is going to mess up anything you feed this input to. Fix the carriage returns.
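For example, one way to get rid of them, either in the file up front or in an array element after the fact (a small sketch; t.clean is just a scratch name):
tr -d '\r' < t > t.clean && mv t.clean t    # strip CRs from the file
array[n]=${array[n]//$'\r'/}                # or strip CRs already stored in an element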

Your locale is set in such a way that the . is interpreted as a thousands separator rather than a decimal point, and the numeric values are sorted accordingly (1111.1 is interpreted as 11111, e.g. with LC_ALL=de_DE). Use
export LC_ALL=C
before you execute sort (and, of course, use proper quoting, as in glenn's and fedorqui's answers).
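For example (assuming a de_DE locale is installed and that sort honors its thousands separator, as described above):
$ LC_ALL=de_DE.UTF-8 sort -n <<< $'1111 a\n2222 b\n1111.1 c'
1111 a
2222 b
1111.1 c
$ LC_ALL=C sort -n <<< $'1111 a\n2222 b\n1111.1 c'
1111 a
1111.1 c
2222 b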

Related

How to add a space after special characters in bash script?

I have a text file with something like,
!aa
@bb
#cc
$dd
%ee
expected output is,
! aa
@ bb
# cc
$ dd
% ee
What I have tried, echo "${foo//#/# }".
This works fine with a single string, but it does not work for all the lines in the file. I have tried this while loop to read all the lines of the file and do the same with echo, but it does not work:
while IFS= read -r line; do
foo=$line
sep="!##$%"
echo "${foo//$sep/$sep }"
done < $1
I have tried with awk split but it does not give the expected output. Is there any workaround for this, using awk or sed?
The following assumes you want to add a space after every character in the !@#$% set (even if it is the last character in a line). Test file:
$ cat file.txt
a!a
@bb
c#c
$dd
ee%
foo
%b%r
$ sep='!@#$%'
With sed:
$ sed 's/['"$sep"']/& /g' file.txt
a! a
@ bb
c# c
$ dd
ee%
foo
% b% r
With awk:
$ awk '{gsub(/['"$sep"']/,"& "); print}' file.txt
a! a
@ bb
c# c
$ dd
ee%
foo
% b% r
With plain bash (not recommended, it is too slow):
$ while IFS= read -r line; do
str=""
for (( i=0; i<${#line}; i++ )); do
char="${line:i:1}"
str="$str$char"
[[ "$char" =~ [$sep] ]] && str="$str "
done
printf '%s\n' "$str"
done < file.txt
a! a
@ bb
c# c
$ dd
ee%
foo
% b% r
Or (not sure which one is worse):
$ while IFS= read -r line; do
for (( i=0; i<${#sep}; i++ )); do
char="${sep:i:1}"
line="${line//$char/$char }"
done
printf '%s\n' "$line"
done < file.txt
a! a
@ bb
c# c
$ dd
ee%
foo
% b% r
The characters you call special in your example seem to be a subset of the characters known as [[:punct:]] to GNU sed, so I propose the following solution:
sed 's/\([[:punct:]]\)/\1 /g' file.txt
which with file.txt content being
!aa
@bb
#cc
$dd
%ee
output
! aa
@ bb
# cc
$ dd
% ee
Explanation: I use a capturing group \(...\) that matches any character belonging to [:punct:], then I replace what was captured with the content of that capture followed by a space. I use g to apply it to all occurrences in each line, though this has no visible impact for the data above. You might elect to drop g if you are sure there will be at most one character to replace in every line.
If you want to know more about [:punct:] or other similar character sets, read about Character Classes on Regular-Expressions.info.
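As a side note, the capture group is not strictly required here: & in the replacement stands for the whole match (the same trick the earlier answer uses), so this is equivalent:
sed 's/[[:punct:]]/& /g' file.txt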
If the file always contains a symbol at the start of the line like that, then use this:
sed -Ei 's/^(.)/\1 /g' yourfile.txt
The -E option tells sed to use extended regular expressions. -i modifies the file in place; you can remove it if you want to output to the console or another file. The ^(.) regex captures the first character on the line, and the replacement adds a space after it (\1 ).
Assuming that special characters are non-numeric and non-alphabetic characters, and special characters can appear anywhere in the line, use the following regular expression to replace them.
sed 's/[^a-zA-Z0-9]/& /g' urfile

How does grep handle DOS end of line?

I have a Windows text file which contains a line (with ending CRLF)
aline
The following is several commands' output:
[root@panel ~]# grep aline file.txt
aline
[root@panel ~]# grep aline$'\r' file.txt
[root@panel ~]# grep aline$'\r'$'\n' file.txt
[root@panel ~]# grep aline$'\n' file.txt
aline
The first command's output is normal. I'm curious about the second and third: why is their output empty? And the last one, I thought it could not find the string, but it actually finds it. Why? The commands are run on CentOS/bash.
In this case grep really matches the string "aline\r" but you just don't see it because it was overwritten by the ANSI sequence that prints color. Pass the output to od -c and you'll see
$ grep aline file.txt
aline
$ grep aline$'\r' file.txt
$ grep aline$'\r' --color=never file.txt
aline
$ grep aline$'\r' --color=never file.txt | od -c
0000000 a l i n e \r \n
0000007
$ grep aline$'\r' --color=always file.txt | od -c
0000000 033 [ 0 1 ; 3 1 m 033 [ K a l i n e
0000020 \r 033 [ m 033 [ K \n
0000030
With --color=never you can see the output string because grep doesn't print the color codes: \r simply resets the cursor to the start of the line, then a newline is printed, and nothing is overwritten. But by default grep checks whether it's running on a terminal or whether its output is being piped, and prints the matched string in color if supported; it seems that resetting the color and then printing \n clears the rest of the line.
To match \n you can use the -z option to make null bytes the line separator
$ grep -z aline$'\r'$'\n' --color=never file.txt
aline
$ grep -z aline$'\r'$'\n' --color=never file.txt | od -c
0000000 a l i n e \r \n \0
0000010
$ grep -z aline$'\r'$'\n' --color=always file.txt | od -c
0000000 033 [ 0 1 ; 3 1 m 033 [ K a l i n e
0000020 \r 033 [ m 033 [ K \n \0
0000031
Your last command, grep aline$'\n' file.txt, works because \n is simply a word separator in bash, so the command is just the same as grep aline file.txt. Exactly the same thing happened in the 3rd command: grep aline$'\r'$'\n' file.txt. To pass a newline, you must quote the argument to prevent word splitting:
$ echo "aline" | grep -z "aline$(echo $'\n')"
aline
To demonstrate the effect of the quoting on the 3rd command, I added another line to the file:
$ cat file.txt
aline
another line
$ grep -z "aline$(echo $'\n')" file.txt | od -c
0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
0000020   i   n   e  \n  \0
0000025
$ grep -z "aline$(echo $'\n')" file.txt
aline
another line
$
If the input is not well-formed, the behavior is undefined.
In practice, some versions of GNU grep use CR for internal purposes, so attempting to match it does not work at all, or produces really bizarre results.
For not entirely different reasons, passing in a literal newline as part of the regular expression could have some odd interpretations, including, but not limited to, interpreting the argument as two separate patterns. (Look at how grep -F reads from a file, and imagine that at least some implementations use the same logic to parse the command line.)
In the grand scheme of things, the sane solution is to fix the input so it's a valid text file before attempting to run Unix line-oriented tools on it.
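For example, stripping the CRs up front with standard tools (dos2unix, where installed, does the same job; fixed.txt is just a scratch name):
$ tr -d '\r' < file.txt > fixed.txt
$ grep 'aline$' fixed.txt
aline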
For quick and dirty solutions, some tools have well-defined semantics for random binary input. Perl is a model citizen in this respect.
bash$ perl -ne 'print if /aline\r$/' <<<$'aline\r'
aline
Awk also tends to work amicably, though there are several implementations, so the risk that somebody somewhere has a version which doesn't behave identically to AT&T Awk is higher.
Maybe notice also how \r is the last character before the end of the line (the DOS line ending is the sequence CR LF, where LF is the standard Unix line terminator for text files).
At least for me phuclv's answer doesn't completely cover the last case, i.e. grep aline$'\n' file.txt.
Your mileage may vary depending on which shell and which version and implementation of grep you are using, but for me grep -z "aline$(echo $'\n')" and grep -z aline$'\n' both just match the same pattern as grep -z aline.
This becomes more apparent if the -o switch is used, so that grep outputs only the matched string and not the entire line (which is the entire file for most text files when the -z option is used).
If you use the same file.txt as in phuclv's second example:
$ cat file.txt
aline
another line
$ grep -z "aline$(echo $'\n')" file.txt | od -c
0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
0000020   i   n   e  \n  \0
0000025
$ grep -z -o "aline$(echo $'\n')" file.txt | od -c
0000000 a l i n e \0
0000006
$ grep -z -o aline$'\n' file.txt | od -c
0000000 a l i n e \0
0000006
$ grep -z -o aline file.txt | od -c
0000000 a l i n e \0
0000006
To actually match a \n as part of the pattern I had to use the -P switch to turn on "Perl-compatible regular expression"
$ grep -z -o -P 'aline\r\n' file.txt | od -c
0000000 a l i n e \r \n \0
0000010
$ grep -z -o -P 'aline\r\nanother' file.txt | od -c
0000000 a l i n e \r \n a n o t h e r \0
0000017
For reference:
grep --version|head -n1
grep (GNU grep) 3.1
bash --version|head -n1
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)

Two seemingly identical strings with newlines not equal

I am trying to convert a list of quoted strings, separated by commas, into a list of strings separated by newlines, using bash and sed.
Here is an example of what I am doing:
#!/bin/bash
comma_to_newline() {
sed -En $'s/[ \t]*"([^"]*)",?[ \t]*/\\1\\\n/gp'
}
input='"one","two","three"'
expected="one\ntwo\nthree"
result="$( echo "${input}" | comma_to_newline )"
echo "Expected: <${expected}>"
echo "Result: <${result}>"
if [ "${result}" = "${expected}" ]; then
echo "EQUAL!"
else
echo "NOT EQUAL!"
fi
And the output I am getting is:
Expected: <one
two
three>
Result: <one
two
three>
NOT EQUAL!
I know it has something to do with the newline characters, but I can't work out what. If I replace the newlines with some other string, such as XXX, it works fine and bash reports the strings as equal.
Prompted by the comments on my question, I managed to work out what was going on. I was so focussed on coming up with a working sed expression and ensuring that the result was correct that I failed to notice that the expected string was incorrect.
In order to use \n newlines in a bash string, you have to use the $'one\ntwo\nthree' syntax - see How can I have a newline in a string in sh? for other solutions.
I was developing against bash version 3.2.57 (the version that comes with Mac OS 10.14.6). When assigning a variable using expected="one\ntwo\nthree" then echoing it, they were being displayed as newlines in the console. Newer versions of bash display these strings as escaped - so I assume it is a bug that has been fixed in later versions of bash.
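Concretely, the only change the script above needs is the assignment of expected (a sketch of the fix described here):
expected=$'one\ntwo\nthree'    # ANSI-C quoting: real newline characters
With that, the comparison should print EQUAL!, since the command substitution already strips the trailing newlines from the sed output.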
For diagnosing seemingly identical strings, try combining side-by-side diff output with a one char per line hexdump format. Replace:
else
echo "NOT EQUAL!"
fi
...with:
else
echo "NOT EQUAL!"
diff -y \
<(hexdump -v -e '/1 "%_ad# "' -e '/1 " _%_u\_\n"' <<< "${expected}") \
<(hexdump -v -e '/1 "%_ad# "' -e '/1 " _%_u\_\n"' <<< "${result}")
fi
There is an extra newline character \n in the string returned by your function.
Octal dump
$echo '"one","two","three"' | sed -En $'s/[ \t]*"([^"]*)",?[ \t]*/\\1\\\n/gp' | od -c
0000000 o n e \n t w o \n t h r e e \n \n
0000017
$echo "one\ntwo\nthree" | od -c
0000000 o n e \ n t w o \ n t h r e e \n
0000020
$
Also, use echo -e
$echo "one\ntwo\nthree"
one\ntwo\nthree
$echo -e "one\ntwo\nthree"
one
two
three
$
From man page
-e enable interpretation of backslash escapes
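As a portability note, echo -e does not behave the same in every shell or echo implementation; printf interprets the escapes itself, either via %b or directly in the format string:
$ printf '%b\n' "one\ntwo\nthree"
one
two
three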

How to generate all ASCII characters with a brace expansion?

This lists all the uppercase English letters:
$ echo {A..Z}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
But how to list all ASCII characters?
I tried this:
$ echo {\!..\~}
{!..~}
and this:
$ echo {$'!'..$'~'}
{!..~}
But neither worked. Is it possible?
This uses only one printf but a more complicated brace expansion.
printf '%b' \\x{0..7}{{0..9},{a..f}}
It also works, but not as nicely (it outputs a lot of whitespace):
echo -e \\x{0..7}{{0..9},{a..f}}
$ printf '%b\n' "$(printf '\%03o' {0..127})"
123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
To see a representation of the non-printable characters in the output above (and of the characters hidden by the effect of printing them as-is), you can pipe it to cat -v:
$ printf '%b\n' "$(printf '\%03o' {0..127})" | cat -v
^@^A^B^C^D^E^F^G^H
^K^L^M^N^O^P^Q^R^S^T^U^V^W^X^Y^Z^[^\^]^^^_ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~^?
To print just from the ASCII code for ! (33) to the ASCII code for ~ (126):
$ printf '%b\n' "$(printf '\%03o' {33..126})"
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
and to print from ! to ~ without having to know their numeric values:
$ printf '%b\n' "$(eval printf '\\%03o' $(printf '{%d..%d}' "'!" "'~"))"
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
which you can use with shell variables to hold the beginning and ending chars:
$ beg='!'; end='~';
$ printf '%b\n' "$(eval printf '\\%03o' $(printf '{%d..%d}' "'$beg" "'$end"))"
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

How to display a file with multiple lines as a single string with escape chars (\n)

In bash, how can I display the content of a file with multiple lines as a single string where the newlines appear as \n?
Example:
$ echo "line 1
line 2" >> file.txt
I need to get the content like this, "line 1\nline 2", with bash commands.
I tried using a combination of cat/printf/echo with no success.
You can use bash's printf to get something close:
$ printf "%q" "$(< file.txt)"
$'line1\nline2'
and in bash 4.4 there is a new parameter expansion operator to produce the same:
$ foo=$(<file.txt)
$ echo "${foo#Q}"
$'line1\nline2'
$ cat file.txt
line 1
line 2
$ paste -s -d '~' file.txt | sed 's/~/\\n/g'
line 1\nline 2
You can use the paste command to join all the lines of the file serially with a delimiter, say ~, and then replace every ~ with \n using sed.
Without a '\n' after line 2, you need to use echo -n:
echo -n "line 1
line 2" > file.txt
od -cv file.txt
0000000   l   i   n   e       1  \n   l   i   n   e       2
sed -z 's/\n/\\n/g' file.txt
line 1\nline 2
With '\n' after line 2
echo "line 1
line 2" > file.txt
od -cv file.txt
0000000   l   i   n   e       1  \n   l   i   n   e       2  \n
sed -z 's/\n/\\n/g' file.txt
line 1\nline 2\n
These tools may display the character codes also:
$ hexdump -v -e '/1 "%_c"' file.txt ; echo
line 1\nline 2\n
$ od -vAn -tc file.txt
   l   i   n   e       1  \n   l   i   n   e       2  \n
You could try piping the text from stdin or a file and translating away the pattern you don't want.
Try this:
cat file|tr '\n' ' '
where file is the file name containing the \n characters. This will return a string with all the text on a single line.
If you want to write the result to a file, just redirect the output of the command, like this:
cat file|tr '\n' ' ' >> file2
Here is another example:
How to remove carriage return from a string in Bash
