sort multiple tabs but ignoring spaces - bash

I have a data file like this (\t represents tabs):
short line\t \t \t \t \t 3
very long line with lots of text\t\t 2
How could I sort it by the second column using sort? In other words I want to set the delimiter to be multiple tabs, but not spaces.

It seems that the field separator for sort must be a single character, so this command:
sort -t $'\t' -k2 file
will not handle multiple tabs as a single separator: it will sort the empty 2nd field for both lines.
This command will successfully find the second field, but it modifies the text:
tr -s '\t' < file | sort -t $'\t' -k2
Note that tr interprets the 2-character string "\t" as a tab character, while sort -t does not. That is just a foible of how different commands are implemented.
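As a minimal sketch of the squeeze-then-sort approach, the snippet below uses made-up sample data (the fruit names and the temporary file are assumptions, not the asker's file). It also uses -k2,2n to sort numerically on the second field only, which is my choice rather than the -k2 shown above.

```shell
# Hypothetical sample data: columns separated by runs of one or more tabs.
tab=$(printf '\t')   # a literal tab, for shells that lack the $'\t' notation
data=$(mktemp)
printf 'pear\t\t\t3\napple\t2\nfig\t\t1\n' > "$data"

# Squeeze each run of tabs into a single tab, then sort numerically on field 2.
sorted=$(tr -s '\t' < "$data" | sort -t "$tab" -k2,2n)
printf '%s\n' "$sorted"
rm -f "$data"
```

As noted above, the squeezed tabs are not restored, so the output has exactly one tab between columns.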

sort -k2 -t' ' test.txt
worked out of the box for me. Enter the tab inside ' ' by pressing Ctrl-V and then Tab in bash.

Setting the field delimiter to something else is accomplished with the -t parameter. But passing a tab character can be tricky, so the solution may look like:
sort -t "$(echo -e '\t')" -k 2 file.txt

Related

Users who are logged on, in alphabetical order, printed on one line

Users who are logged on, in alphabetical order, printed on one line.
What is the minimum number of changes needed to get this to work in a bash script?
this is the given script:
for name in $@
do
who | grep -w "^name" | sed 's/ .*//' | uniq
done | sort | tr '\n' ' '
echo
Single line commands:
who | awk '{print $1}' | sort | uniq | tr '\n' ' '
who: list of logged in users.
awk '{print $1}': keep only the first word of each line, which is the username.
sort: put the usernames in alphabetical order.
uniq: remove duplicates.
tr '\n' ' ': replace newlines with spaces.
Example:
$ who
steve tty7 Mar 5 16:25 (:0)
bernard tty7 Mar 5 16:25 (:0)
sarah tty7 Mar 5 16:25 (:0)
$ who | awk '{print $1}' | sort | uniq | tr '\n' ' '
bernard sarah steve
Your code did grep -w "^name", which tells grep to output the lines that start with the literal word "name", not the lines that begin with the value of the variable name. For that you would need grep -w "^$name".
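To illustrate the difference, here is a small sketch. The fake_who function and its three users are hypothetical stand-ins for who, so the result does not depend on who is actually logged in.

```shell
# Hypothetical stand-in for `who`, with a user that happens to be called "name".
fake_who() {
    printf '%s\n' \
        'steve    tty7  Mar 5 16:25 (:0)' \
        'bernard  tty8  Mar 5 16:25 (:0)' \
        'name     tty9  Mar 5 16:25 (:0)'
}

name=steve
# Without the $, grep looks for the literal word "name".
literal=$(fake_who | grep -w "^name" | sed 's/ .*//')
# With the $, the shell substitutes the variable's value first.
expanded=$(fake_who | grep -w "^$name" | sed 's/ .*//')
printf 'literal: %s\nexpanded: %s\n' "$literal" "$expanded"
```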
Try this Shellcheck-clean code:
for name in "$@"
do
who | sed 's/[[:space:]].*//' | grep -xF -- "$name"
done | sort -u | paste -sd ' '
$@ should always have double quotes on it ("$@"). See Accessing bash command line args $@ vs $*. Shellcheck correctly complains if double quotes are not used.
sed 's/[[:space:]].*//' removes the first whitespace character, and everything after it, on every input line. Using [[:space:]] instead of a literal space character means that the code will still work if the who output uses tabs as separators. It may be easier to read too. The sed command is run first to ensure that usernames occupy whole lines so it's easier to avoid spurious matches at the next pipeline stage.
grep -xF -- "$name" searches for whole lines in the input that are the "$name" string. The -x option forces matching of whole lines. That prevents, for instance, the username mary matching the username mary.jane (a valid username on at least some Linux systems). The -F option means that regular expression patterns in "$name" are treated as literal strings. That prevents, for instance, the name t.m matching the name tim. The -- prevents a leading hyphen in "$name" being treated as a grep option. No system that I know of allows usernames to have leading hyphens, but there's nothing to stop such an invalid name being provided as a command line argument to the code. The -w option to grep wouldn't be useful here because valid names may contain non-word characters (e.g. t.m).
sort -u takes the output of the for loop (an unsorted list of usernames, one per line, possibly with repetitions) and sorts it. The -u option causes it to remove duplicates (like piping to uniq, but saves a process creation).
paste -sd ' ' puts all the lines of its input on a single line, separated by spaces (specified by the -d option and its ' ' (space) option argument), and terminated with a newline character. tr '\n' ' ' would have a similar effect but it produces an unterminated line with a trailing space character.
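The loop can be tried without depending on real logins. In this sketch, fake_who and list_users are hypothetical names introduced for illustration, and an explicit - operand is passed to paste on the assumption that some implementations want one to read standard input.

```shell
# Hypothetical stand-in for `who`; steve is logged in twice.
fake_who() {
    printf '%s\n' \
        'steve   tty7  Mar 5 16:25 (:0)' \
        'bernard tty8  Mar 5 16:25 (:0)' \
        'steve   pts/0 Mar 5 16:30 (:0)'
}

# Same pipeline as above, wrapped in a function so "$@" holds the names to check.
list_users() {
    for name in "$@"
    do
        fake_who | sed 's/[[:space:]].*//' | grep -xF -- "$name"
    done | sort -u | paste -sd ' ' -
}

with_paste=$(list_users steve bernard mary)
with_tr=$(printf 'bernard\nsteve\n' | tr '\n' ' ')

# Brackets make the trailing space left by tr visible.
printf '[%s]\n[%s]\n' "$with_paste" "$with_tr"
```

The name mary is not in the sample data, so it simply produces no output.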
All you need is:
who | sort -k1,1 -u | awk '{u=u s $1; s=OFS} END{print u}'
That will output a blank-separated list of all logged in users, all on 1 line, with a terminating newline to make it a valid POSIX text file, and without an undesirable trailing blank char.

Replace part of a string Shell Scripting

I have lines on which I want to run a sed operation, applied only to the string that comes after the third '|' character. How can I do this in a shell script?
Input: aaaa|bbbbb|ccccc|hello
Desired Output: aaaa|bbbbb|ccccc|hel
This should be done on hello, which comes after three '|' characters.
-> sed 's/({.3}).*/\1/g'
You don't specify what you want to do with the last field to transform "hello" into "hel". Here's one way:
sed -r 's/^(([^|]+\|){3})(...).*/\1\3/' file
([^|]+\|) denotes a pipe delimited field (with the pipe)
(([^|]+\|){3}) denotes three such fields
requires sed's -r option (on OSX or BSD-ish implementations of sed, use -E instead)
I capture the next three characters with (...)
then replace all with the first and third set of capturing parentheses
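A quick way to check the expression against the sample input from the question is sketched below. It uses -E rather than -r, on the assumption that the sed in use accepts -E (current GNU and BSD versions do).

```shell
line='aaaa|bbbbb|ccccc|hello'

# \1 is the first three pipe-terminated fields, \3 the next three characters.
trimmed=$(printf '%s\n' "$line" | sed -E 's/^(([^|]+\|){3})(...).*/\1\3/')
printf '%s\n' "$trimmed"
```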
Use the cut command instead of sed:
$ echo "aaaa|bbbbb|ccccc|hello" | cut -d '|' -f 4
hello
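The cut command above only extracts the fourth field. To get the desired output, two possible follow-ups are sketched here: a second cut that keeps the first three characters of that field, and an awk variant (my addition, not part of the answer above) that shortens the field in place and keeps the rest of the line.

```shell
line='aaaa|bbbbb|ccccc|hello'

# Field 4 only, then its first three characters.
fourth=$(printf '%s\n' "$line" | cut -d '|' -f 4 | cut -c 1-3)

# Whole line, with field 4 shortened and the '|' separators preserved.
whole=$(printf '%s\n' "$line" | awk -F'|' -v OFS='|' '{ $4 = substr($4, 1, 3) } 1')

printf '%s\n%s\n' "$fourth" "$whole"
```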

Cut command with delimiter Control-A

I am trying to get some columns from file1 to file2 using cut command with delimiter Control A.
This is what I tried:
cut -d^A -f2-8 a.dat > b.dat
If my records are like this:
A^AB^AC^AD^AE^AF^AG^AH^A$
my command gives:
AB^AC^AD^AE^AF^AG^AH
Is my command wrong or am I putting the delimiter in a wrong way?
So it leaves Control-A's A in the starting point.
^A is character number 1 in the ASCII table a.k.a Start of Heading character. If you're using bash, you can have this:
cut -f 2-8 -d $'\x01'
Or use printf (can be builtin or an external binary):
CTRL_A=$(printf '\001')
cut -f 2-8 -d "$CTRL_A"
You can also verify your output with hexdump:
hexdump -C b.dat
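As a self-contained sketch, the record from the question can be rebuilt with printf and cut with a real Control-A delimiter. The variable names are mine, and the final tr is only there to make the result readable on a terminal.

```shell
ctrl_a=$(printf '\001')   # the actual Control-A byte, not the two characters ^ and A

# Rebuild the sample record: eight letters, each followed by Control-A.
record=$(printf 'A\001B\001C\001D\001E\001F\001G\001H\001')

fields=$(printf '%s\n' "$record" | cut -d "$ctrl_a" -f 2-8)

# Show the delimiters as commas for display purposes only.
readable=$(printf '%s\n' "$fields" | tr '\001' ',')
printf '%s\n' "$readable"
```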
I can't really understand your question, but would suggest you use tr to change your Control-As into something else more workable and maybe then change them back when you are finished:
tr '\001' ',' < yourfile | do some cutting using commas | tr ',' '\001' > newfile

printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
That's the code I have so far, but it only prints the first character of each line.
Thanks
To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
Printing the last two words is not possible with cut, AFAIK, because it can only count from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
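Both commands can be tried on a small throwaway file. The sample text and the temporary file in this sketch are assumptions made for illustration.

```shell
text=$(mktemp)
printf 'the quick brown fox\nhello world\n' > "$text"

# First word of each line, using a space as the field delimiter.
first_words=$(cut -d ' ' -f 1 "$text")

# Last two words of each line; NF is the number of fields on the line.
last_two=$(awk '{ print $(NF-1), $NF }' "$text")

printf '%s\n' "$first_words" "$last_two"
rm -f "$text"
```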
You can try:
awk '{print $1}' your_file
read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read -r word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
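One behavioural difference may be worth knowing: read strips leading whitespace before splitting, while cut treats a leading space as an empty first field. The sketch below uses made-up input with an indented first line, and the variable name rest stands in for the _ placeholder used above.

```shell
# First line starts with two spaces.
words=$(printf '  alpha beta\ngamma delta\n' |
    while read -r word rest; do printf '%s\n' "$word"; done)

# cut sees an empty field before the first space on the indented line.
cut_words=$(printf '  alpha beta\ngamma delta\n' | cut -d ' ' -f 1)

printf 'read loop:\n%s\ncut:\n%s\n' "$words" "$cut_words"
```

For input where words really are separated by exactly one blank with no indentation, as in the question, both approaches agree.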
You can use:
cut -d\  -f1 file
Where:
-d is the delimiter (here using \ for a space)
-f is the field selector
Notice that there is a space after the \.
-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

count number of tab characters in linux

I want to count the numbers of hard tab characters in my documents in unix shell.
How can I do it?
I tried something like
grep -c \t foo
but it gives counts of t in file foo.
Use tr to discard everything except tabs, and then count:
< input-file tr -dc \\t | wc -c
Bash uses a $'...' notation for specifying special characters. Note that grep -c counts the lines that contain a tab, not the individual tab characters:
grep -c $'\t' foo
Use a perl regex (-P option) to grep tab characters.
So, to count the number of tab characters in a file:
grep -o -P '\t' foo | wc -l
You can insert a literal TAB character between the quotes by pressing Ctrl+V and then TAB.
In general you can insert any character at all by prefixing it with Ctrl+V; even control characters such as Enter or Ctrl+C that the shell would otherwise interpret.
You can use awk in a tricky way: use tab as the record separator, then the number of tab characters is the total number of records minus 1:
ntabs=$(awk 'BEGIN {RS="\t"} END {print NR-1}' foo)
My first thought was to use sed to strip out all non-tab characters, then use wc to count the number of characters left.
< foo.txt sed 's/[^\t]//g' | wc -c
However, this also counts newlines, which sed won't touch because it is line-based. So, let's use tr to translate all the newlines into spaces, so it is one line for sed.
< foo.txt tr '\n' ' ' | sed 's/[^\t]//g' | wc -c
Depending on your shell and implementation of sed, you may have to use a literal tab instead of \t, however, with Bash and GNU sed, the above works.
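To compare a few of the approaches above, here is a sketch on a made-up three-line file that contains three tabs spread over two lines. The file contents and variable names are assumptions, and the trailing tr -d ' ' is there only because some wc implementations pad their output with spaces.

```shell
tab=$(printf '\t')
sample=$(mktemp)
printf 'a\tb\tc\nno tabs here\n\td\n' > "$sample"

# Delete everything except tabs, then count the bytes that remain.
tab_chars=$(tr -dc '\t' < "$sample" | wc -c | tr -d ' ')

# grep -c counts matching lines, so a line with two tabs counts once.
tab_lines=$(grep -c "$tab" "$sample")

# Tab as record separator: records minus one equals the number of tabs.
awk_count=$(awk 'BEGIN { RS = "\t" } END { print NR - 1 }' "$sample")

printf 'tab characters: %s\nlines with tabs: %s\nawk count: %s\n' \
    "$tab_chars" "$tab_lines" "$awk_count"
rm -f "$sample"
```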
