Cut command does not appear to be working - bash

I'm piping a command to cut and nothing appears to be happening.
The output of the command looks like this:
Name File Info OS
11 FileName1 OS1
12 FileName2 OS2
13 FileName3 OS3
I'm trying to extract columns 1 and 2 from all rows (starting with row 2) using the following:
my_command | cut -f1,2
but the output is exactly the same as the original input.

cut doesn't behave well with multiple spaces as a delimiter, and its default delimiter is a tab; since your input contains no tabs, every line passes through unchanged. Use awk instead:
mycommand | awk 'NR>1{print $1,$2}'

Use tr -s to squeeze repeating spaces into a single space. cut can then be used with a single space as the delimiter separating columns:
mycommand | tr -s ' ' | cut -d' ' -f1,2
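Note that this pipeline keeps the header row. If you also want to start at row 2, as the awk version above does, tail can drop the first line (a small sketch, assuming the header is exactly one line):
mycommand | tail -n +2 | tr -s ' ' | cut -d' ' -f1,2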

If multiple spaces are used as a delimiter and the column positions are fixed, you can select character positions with cut:
mycommand | cut -c1-27
Or you could lose the front spaces with:
mycommand | cut -c5-27
This will work even if your fields have embedded spaces, whereas the awk method will fail if your fields have embedded spaces.

Users who are logged on, in alphabetical order, printed on one line

What is the minimum number of changes needed to get this to work in a bash script?
this is the given script:
for name in $@
do
who | grep -w "^name" | sed 's/ .*//' | uniq
done | sort | tr '\n' ' '
echo
Single line commands:
who | awk '{print $1}' | sort | uniq | tr '\n' ' '
who: list of logged in users.
awk '{print $1}': keep only the first word of each line, which is the username.
sort: put the usernames in alphabetical order.
uniq: remove duplicates.
tr '\n' ' ': replace each newline with a space.
Example:
$ who
steve tty7 Mar 5 16:25 (:0)
bernard tty7 Mar 5 16:25 (:0)
sarah tty7 Mar 5 16:25 (:0)
$ who | awk '{print $1}' | sort | uniq | tr '\n' ' '
bernard sarah steve
Your code did grep -w "^name", which tells grep to output the lines that start with the literal string "name", not the lines that begin with the value of the variable name. For that you would need grep -w "^$name".
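So, to directly answer the question, the smallest change is one character in the loop body:
who | grep -w "^$name" | sed 's/ .*//' | uniq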
Try this Shellcheck-clean code:
for name in "$@"
do
who | sed 's/[[:space:]].*//' | grep -xF -- "$name"
done | sort -u | paste -sd ' '
$@ should always have double quotes on it ("$@"). See Accessing bash command line args $@ vs $*. Shellcheck correctly complains if double quotes are not used.
sed 's/[[:space:]].*//' removes the first whitespace character, and everything after it, on every input line. Using [[:space:]] instead of a literal space character means that the code will still work if the who output uses tabs as separators. It may be easier to read too. The sed command is run first to ensure that usernames occupy whole lines so it's easier to avoid spurious matches at the next pipeline stage.
grep -xF -- "$name" searches for whole lines in the input that are the "$name" string. The -x option forces matching of whole lines. That prevents, for instance, the username mary matching the username mary.jane (a valid username on at least some Linux systems). The -F option means that regular expression patterns in "$name" are treated as literal strings. That prevents, for instance, the name t.m matching the name tim. The -- prevents a leading hyphen in "$name" being treated as a grep option. No system that I know of allows usernames to have leading hyphens, but there's nothing to stop such an invalid name being provided as a command line argument to the code. The -w option to grep wouldn't be useful here because valid names may contain non-word characters (e.g. t.m).
sort -u takes the output of the for loop (an unsorted list of usernames, one per line, possibly with repetitions) and sorts it. The -u option causes it to remove duplicates (like piping to uniq, but saves a process creation).
paste -sd ' ' puts all the lines of its input on a single line, separated by spaces (specified by the -d option and its ' ' (space) option argument), and terminated with a newline character. tr '\n' ' ' would have a similar effect but it produces an unterminated line with a trailing space character.
All you need is:
who | sort -k1,1 -u | awk '{u=u s $1; s=OFS} END{print u}'
That will output a blank-separated list of all logged in users, all on 1 line, with a terminating newline to make it a valid POSIX text file, and without an undesirable trailing blank char.
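For readers less familiar with awk, here is the same pipeline with the logic spelled out as comments (behaviour unchanged):
who |
  sort -k1,1 -u |                # sort by username (field 1), dropping duplicates
  awk '{ u = u s $1; s = OFS }   # append each name; s is empty the first time, then a space
       END { print u }'          # print the accumulated line with a terminating newline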

Shell: Counting lines per column while ignoring empty ones

I am trying to simply count the lines in the .CSV per column, while at the same time ignoring empty lines.
I use below and it works for the 1st column:
cat /path/test.csv | cut -d, -f1 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 8
And below for the 2nd column:
cat /path/test.csv | cut -d, -f2 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 6
But when I try to count the 3rd column, it simply outputs the total number of lines in the whole .CSV.
cat /path/test.csv | cut -d, -f3 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 33
#Should be: 19?
I've also tried to use awk instead of cut, but get the same issue.
I have also tried creating a new file, thinking maybe it had some spaces in the lines; still the same.
Can someone clarify what the difference is between reading columns 1-2 and the rest?
20355570_01.tif,,
20355570_02.tif,,
21377804_01.tif,,
21377804_02.tif,,
21404518_01.tif,,
21404518_02.tif,,
21404521_01.tif,,
21404521_02.tif,,
,22043764_01.tif,
,22043764_02.tif,
,22095060_01.tif,
,22095060_02.tif,
,23507574_01.tif,
,23507574_02.tif,
,,23507574_03.tif
,,23507804_01.tif
,,23507804_02.tif
,,23507804_03.tif
,,23509247_01.tif
,,23509247_02.tif
,,23509247_03.tif
,,23527663_01.tif
,,23527663_02.tif
,,23527663_03.tif
,,23527908_01.tif
,,23527908_02.tif
,,23527908_03.tif
,,23535506_01.tif
,,23535506_02.tif
,,23535562_01.tif
,,23535562_02.tif
,,23535636_01.tif
,,23535636_02.tif
That happens when the input file has DOS line endings (\r\n). The \r sits at the end of every line and becomes part of the last field, so an "empty" 3rd column actually contains \r and grep . matches it. Fix your file using dos2unix and your command will work for the 3rd column too.
dos2unix /path/test.csv
Or, you can remove the \r at the end while counting non-empty columns using awk:
awk -F, '{sub(/\r/,"")} $3!=""{n++} END{print n}' /path/test.csv
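If you want to confirm the diagnosis first, cat -v makes the carriage returns visible as ^M at the end of each line:
cat -v /path/test.csv | head -3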
The problem is in the grep command: the way you wrote it will return 33 lines when you count the 3rd column.
It's better instead to use the following command to count number of lines in .CSV for each column (example below is for the 3rd column):
cat /path/test.csv | cut -d , -f3 | grep -cve '^\s*$'
This will return the exact number of lines for each column and avoids piping into wc.
See previous post here:
count (non-blank) lines-of-code in bash
edit: I think oguz ismail found the actual reason in their answer. If they are right and your file has Windows line endings, you can use one of the following commands without having to convert the file:
cut -d, -f3 yourFile.csv | tr -d '\r' | grep -c .
cut -d, -f3 yourFile.csv | grep -c $'[^\r]' # bash only
old answer: Since I cannot reproduce your problem with the provided input I take a wild guess:
The "empty" fields in the last column contain spaces. A field containing a space is not empty altough it looks like it is empty as you cannot see spaces.
To count only fields that contain something other than a space, adapt your regex from . (any character) to [^ ] (any character other than space).
cut -d, -f3 yourFile.csv | grep -c '[^ ]'
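As an aside, all three columns can be counted in a single pass (a sketch, assuming a three-column comma-separated file, and tolerant of DOS line endings):
awk -F, '{ sub(/\r$/, "")                # strip a trailing CR if present
           for (i = 1; i <= NF; i++)
               if ($i != "") n[i]++ }    # count non-empty fields per column
     END { for (i = 1; i <= 3; i++)
               print "column " i ": " n[i]+0 }' /path/test.csv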

shell script to extract text from a variable separated by forward slashes

I am trying to find a way to extract text from a variable with words separated by a forward slash. I attempted it using cut, so here's an example:
set variable = '/one/two/three/four'
Say I just want to extract three from this, I used:
cut -d/ -f3 <<<"${variable}"
But this seems to not work. Any ideas of what I'm doing wrong? Or is there a way of using AWK to do this?
You need to remove the spaces before and after the = in the variable assignment, and tell the cut command to print the 4th field.
$ variable='/one/two/three/four'
$ cut -d/ -f4 <<<"${variable}"
three
With the delimiter /, the cut command splits the input into these fields:
field 1: (empty)
field 2: one
field 3: two
field 4: three
field 5: four
That is, because the line starts with a slash, the first field is an empty string.
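You can confirm that the first field is empty; its only output is the newline (one byte):
$ cut -d/ -f1 <<<'/one/two/three/four' | wc -c
1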
I think that the main problem here is in your assignment. Try this:
var='/one/two/three/four'
cut -d/ -f4 <<<"$var"
Here is an awk version:
awk -F\/ '{print $4}' <<< "$variable"
three
or
echo "$variable" | awk -F\/ '{print $4}'
three
P.S. To set a variable there is no need for set, and remove the spaces around =:
variable='/one/two/three/four'

Cut command with delimiter Control-A

I am trying to get some columns from file1 to file2 using the cut command with the delimiter Control-A.
This is what I tried:
cut -d^A -f2-8 a.dat > b.dat
If my records are like this:
A^AB^AC^AD^AE^AF^AG^AH^A$
my command gives:
AB^AC^AD^AE^AF^AG^AH
Is my command wrong, or am I specifying the delimiter the wrong way? It behaves as if only the ^ were treated as the delimiter, leaving the A of each Control-A at the start of the fields.
^A is character number 1 in the ASCII table, a.k.a. the Start of Heading character. Typed literally on the command line, -d^A passes the two ordinary characters ^ and A rather than the control character. If you're using bash, you can pass the real Control-A with ANSI-C quoting:
cut -f 2-8 -d $'\x01'
Or use printf (can be builtin or an external binary):
CTRL_A=$(printf '\x01')
cut -f 2-8 -d "$CTRL_A"
You can also verify your output with hexdump:
hexdump -C b.dat
I can't really understand your question, but I would suggest using tr to change your Control-As into something more workable and then change them back when you are finished. Note that '^A' written as two literal characters would translate ^ and A; use the octal escape '\001' for the real Control-A:
tr '\001' ',' < yourfile | do some cutting using commas | tr ',' '\001' > newfile
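If the round trip feels risky (the data might already contain commas, for instance), awk can use the control character as its field separator directly; a sketch that extracts fields 2-8 and keeps the original delimiter:
awk -F'\001' -v OFS='\001' '{ print $2, $3, $4, $5, $6, $7, $8 }' a.dat > b.dat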

How to make the 'cut' command treat same sequential delimiters as one?

I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk
awk '{ printf $4; }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know if there any way to deal with cut and several delimiters natively?
Try:
tr -s ' ' <text.txt | cut -d ' ' -f4
From the tr man page:
-s, --squeeze-repeats
        replace each input sequence of a repeated character that is
        listed in SET1 with a single occurrence of that character
As you comment in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.
tr | cut
tr -s ' ' < file | cut -d' ' -f4
awk
awk '{print $4}' file
bash
while read -r _ _ _ myfield _
do
echo "forth field: $myfield"
done < file
sed
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Tests
Given this file, let's test the commands:
$ cat a
this is line 1 more text
this is line 2 more text
this is line 3 more text
this is line 4 more text
tr | cut
$ cut -d' ' -f4 a
is
# it does not show what we want!
$ tr -s ' ' < a | cut -d' ' -f4
1
2 # this makes it!
3
4
$
awk
$ awk '{print $4}' a
1
2
3
4
bash
This reads the fields sequentially. By using _ as a throwaway ("junk") variable we ignore the fields we don't need. This way, the 4th field of each line is stored in the named variable, no matter the spaces in between.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
sed
This matches three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then, it captures whatever comes until the next space as the 4th field, which is finally printed with \2.
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
shortest/friendliest solution
After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts for "cut on steroids".
cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.
One example, out of many, addressing this particular question:
$ cat text.txt
0 1 2 3
0 1 2 3 4
$ cuts 2 text.txt
2
2
cuts supports:
auto-detection of most common field-delimiters in files (+ ability to override defaults)
multi-char, mixed-char, and regex matched delimiters
extracting columns from multiple files with mixed delimiters
offsets from end of line (using negative numbers) in addition to start of line
automatic side-by-side pasting of columns (no need to invoke paste separately)
support for field reordering
a config file where users can change their personal preferences
great emphasis on user friendliness & minimalist required typing
and much more. None of which is provided by standard cut.
See also: https://stackoverflow.com/a/24543231/1296044
Source and documentation (free software): http://arielf.github.io/cuts/
This Perl one-liner shows how closely Perl is related to awk:
perl -lane 'print $F[3]' text.txt
However, the @F autosplit array starts at index $F[0] while awk fields start with $1.
With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
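You can see that behaviour directly: with a doubled space, field 2 is empty and the data lands in field 3:
$ printf 'a  b\n' | cut -d' ' -f3
b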
