shell script to extract text from a variable separated by forward slashes - shell

I am trying to find a way to extract text from a variable whose words are separated by forward slashes. I attempted it using cut; here's an example:
set variable = '/one/two/three/four'
Say I just want to extract three from this, I used:
cut -d/ -f3 <<<"${variable}"
But this does not seem to work. Any idea what I'm doing wrong? Or is there a way of using AWK to do this?

You need to remove the spaces before and after the = in the variable assignment (and drop set, which does not assign variables in sh or bash). Also tell cut to print the 4th field, not the 3rd.
$ variable='/one/two/three/four'
$ cut -d/ -f4 <<<"${variable}"
three
With the delimiter /, cut splits the input like this:
/one/two/three/four
field 1: (empty)
field 2: one
field 3: two
field 4: three
field 5: four
That is, because the string starts with a slash, splitting on that first slash gives an empty string as the first field.
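The field numbering can be checked directly. This is a minimal sketch using the same sample value; it pipes through printf instead of a here-string so that it also runs in a plain POSIX sh.

```shell
#!/bin/sh
variable='/one/two/three/four'

# Field 1 is the empty string in front of the leading slash.
first=$(printf '%s\n' "$variable" | cut -d/ -f1)

# Field 4 is the third word, because the numbering is shifted by that empty field.
fourth=$(printf '%s\n' "$variable" | cut -d/ -f4)

printf 'field 1: [%s]\n' "$first"
printf 'field 4: [%s]\n' "$fourth"
```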

I think that the main problem here is in your assignment. Try this:
var='/one/two/three/four'
cut -d/ -f4 <<<"$var"

Here is an awk version:
awk -F\/ '{print $4}' <<< "$variable"
three
or
echo "$variable" | awk -F\/ '{print $4}'
three
PS: to set a variable there is no need for set, and you must remove the spaces around =:
variable='/one/two/three/four'
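If starting cut or awk is not wanted, the shell's own field splitting can do the same job. The following is only a sketch, assuming a POSIX shell; nth_field is a hypothetical helper name, and the function overwrites its own positional parameters, which is why the splitting is kept inside a function.

```shell
#!/bin/sh
# nth_field STRING N: print the N-th slash-separated field (cut-style numbering).
# Hypothetical helper, not part of any standard tool.
nth_field() {
    str=$1
    n=$2
    old_ifs=$IFS
    IFS=/
    set -f              # disable globbing while the string is split
    set -- $str         # a leading slash yields an empty first field, as with cut
    set +f              # note: this re-enables globbing even if the caller had it off
    IFS=$old_ifs
    [ "$n" -le "$#" ] || return 1
    shift $((n - 1))
    printf '%s\n' "$1"
}

third_word=$(nth_field '/one/two/three/four' 4)
printf '%s\n' "$third_word"
```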

Related

Shell: Counting lines per column while ignoring empty ones

I am trying to simply count the lines in the .CSV per column, while at the same time ignoring empty lines.
I use the command below, and it works for the 1st column:
cat /path/test.csv | cut -d, -f1 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 8
And the one below for the 2nd column:
cat /path/test.csv | cut -d, -f2 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 6
But when I try to count the 3rd column, it simply outputs the total number of lines in the whole .CSV.
cat /path/test.csv | cut -d, -f3 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 33
#Should be: 19?
I've also tried to use awk instead of cut, but I get the same issue.
I have tried creating a new file, thinking maybe it had some spaces in the lines, but the result is still the same.
Can someone clarify what the difference is between reading columns 1-2 and reading the rest?
20355570_01.tif,,
20355570_02.tif,,
21377804_01.tif,,
21377804_02.tif,,
21404518_01.tif,,
21404518_02.tif,,
21404521_01.tif,,
21404521_02.tif,,
,22043764_01.tif,
,22043764_02.tif,
,22095060_01.tif,
,22095060_02.tif,
,23507574_01.tif,
,23507574_02.tif,
,,23507574_03.tif
,,23507804_01.tif
,,23507804_02.tif
,,23507804_03.tif
,,23509247_01.tif
,,23509247_02.tif
,,23509247_03.tif
,,23527663_01.tif
,,23527663_02.tif
,,23527663_03.tif
,,23527908_01.tif
,,23527908_02.tif
,,23527908_03.tif
,,23535506_01.tif
,,23535506_02.tif
,,23535562_01.tif
,,23535562_02.tif
,,23535636_01.tif
,,23535636_02.tif
That happens when the input file has DOS line endings (\r\n). The third field is the last one on the line, so it always ends with \r and is never empty. Fix your file using dos2unix and your command will work for the 3rd column too.
dos2unix /path/test.csv
Or, you can remove the \r at the end while counting non-empty columns using awk:
awk -F, '{sub(/\r/,"")} $3!=""{n++} END{print n}' /path/test.csv
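The effect is easy to reproduce on a small file. The sketch below builds a four-line sample with \r\n endings (the file contents are made up for illustration), shows that the naive pipeline counts every line for column 3, and then counts all three columns in one awk pass after stripping the trailing \r.

```shell
#!/bin/sh
# Hypothetical four-line sample with DOS (CRLF) line endings.
tmp=$(mktemp)
printf 'a.tif,,\r\n,b.tif,\r\n,,c.tif\r\n,,d.tif\r\n' > "$tmp"

# Naive count: field 3 always contains at least the \r, so every line matches.
naive=$(cut -d, -f3 "$tmp" | grep -c .)

# Strip the trailing \r first, then count non-empty fields per column.
per_column=$(awk -F, '
    { sub(/\r$/, "")                       # changing $0 re-splits the fields
      for (i = 1; i <= 3; i++) if ($i != "") c[i]++ }
    END { for (i = 1; i <= 3; i++) print "column " i ": " (c[i] + 0) }
' "$tmp")

rm -f "$tmp"
printf 'naive count for column 3: %s\n' "$naive"
printf '%s\n' "$per_column"
```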
The problem is in the grep command: grep . matches any line that contains at least one character, including whitespace, so as written it returns 33 lines when you count the 3rd column.
It is better to use the following command to count the number of non-blank lines in the .CSV for each column (the example below is for the 3rd column):
cat /path/test.csv | cut -d , -f3 | grep -cve '^\s*$'
This returns the exact number of lines for each column and avoids piping into wc.
See the previous post here:
count (non-blank) lines-of-code in bash
edit: I think oguz ismail found the actual reason in their answer. If they are right and your file has Windows line endings, you can use one of the following commands without having to convert the file.
cut -d, -f3 yourFile.csv | tr -d \\r | grep -c .
cut -d, -f3 yourFile.csv | grep -c $'[^\r]' # bash only
old answer: Since I cannot reproduce your problem with the provided input, I'll take a wild guess:
The "empty" fields in the last column contain spaces. A field containing a space is not empty, although it looks empty because you cannot see spaces.
To count only fields that contain something other than a space, change your regex from . (any character) to [^ ] (any character other than a space).
cut -d, -f3 yourFile.csv | grep -c '[^ ]'
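Both guesses (stray spaces and a trailing \r) can be covered by one pattern, since the POSIX character class [:space:] includes the space, tab, and carriage return. A minimal sketch on a made-up three-line sample:

```shell
#!/bin/sh
# Hypothetical sample: field 3 is only "\r", only spaces plus "\r", and a real value.
tmp=$(mktemp)
printf ',,\r\n,,   \r\n,,23507574_03.tif\r\n' > "$tmp"

# Count lines whose third field has at least one non-whitespace character.
count=$(cut -d, -f3 "$tmp" | grep -c '[^[:space:]]')

rm -f "$tmp"
printf '%s\n' "$count"
```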

Shell - How can I grep a particular word in a sentence?

I want to grep some part of the sentence, for example: /hana/new/register. In this I need to grep the first element between the / characters, so here I want to get hana.
How can I do that in shell?
You can use sed and capture whatever is between the first /.../ using a character class and back reference. For example:
echo '/samarth/new/register' | sed 's/^\/\([^/]*\).*$/\1/'
samarth
Here sed runs its basic substitute command, which has the form sed 's/find/replace/'. The find pattern is anchored to the beginning of the line with ^ and matches the leading / (escaped as \/). A capture group \(...\) then captures the character class [^/]* (any run of characters that are not /), and .*$ consumes the rest of the line. On the replace side, the backreference \1 stands for the captured text, so the whole line is replaced by its first element.
To get the first word on a slash-separated line, we can use cut:
$ echo '/samarth/new/register then i want to grep samarth' | cut -d/ -f 2
samarth
$ echo '/hana/new/register' | cut -d/ -f 2
hana
Or, we can use awk:
$ echo '/samarth/new/register then i want to grep samarth' | awk -F/ '{print $2}'
samarth
$ echo '/hana/new/register' | awk -F/ '{print $2}'
hana
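For a value that is already in a shell variable, POSIX parameter expansion gives the same result without running sed, cut, or awk. This is a different technique from the answers above, shown only as a sketch; the variable names are illustrative.

```shell
#!/bin/sh
path='/hana/new/register'

rest=${path#/}        # remove the single leading slash: hana/new/register
first=${rest%%/*}     # remove everything from the first remaining slash onward

printf '%s\n' "$first"
```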

How to extract text by unspecified spaces

I'm trying to extract usernames from a text file that has one user per line. From my research, it seems like the only way to do it is to split on the spaces. Here's the format:
1 user 3
2 fusrfff 4
3 usrf 12
The only problem is that the usernames all have different lengths, so I can't rely on a fixed number of spaces. The UIDs (the first numbers) also range from 1 to 40k, and there is a bunch of other information after the user group number too. Can anyone point me in the right direction? Thanks.
awk does not care about the amount of space between fields:
awk '{print $2}' your_file.txt
If you want to go with bash only, read does not care either:
while read -r uid username other_stuff; do
    printf '%s\n' "$username"
done < your_file.txt
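To see that neither approach depends on the amount of spacing, here is a small sketch on made-up data with irregular gaps, leading spaces, and trailing extra columns. Both awk and read split on runs of whitespace by default, so they should agree.

```shell
#!/bin/sh
# Hypothetical sample with irregular spacing; not the asker's real file.
tmp=$(mktemp)
printf '1   user      3\n2 fusrfff 4 extra info\n    3  usrf   12\n' > "$tmp"

# Shell only: read splits on runs of whitespace and ignores leading blanks.
names=$(while read -r uid username rest; do
    printf '%s\n' "$username"
done < "$tmp")

# awk: default field splitting behaves the same way.
awk_names=$(awk '{print $2}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$names"
```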
First squeeze each run of spaces into a single space. You can use sed -E 's/ +/ /g' or
tr -s " " < file.txt| cut -d" " -f2
This is the same approach using sed (the pattern is two spaces followed by *, meaning one or more spaces):
$ cat file.txt | sed 's/  */ /g' | cut -d' ' -f2
user
fusrfff
usrf

Shell cut delimiter before last

I'm trying to cut a string (the name of a file) to extract a variable part of the name.
I also have to put the result in a shell variable, which is fine so far.
Here is an example of what I have to do:
NAME_OF_THE_FILE_VARIABLEiWANTtoGET_DATE
NAMEfile_VARIABLEiWANT_DATE
NAME_FILE_VARIABLEiWANT_DATE
The position of the part I want can change, but it will always be the next-to-last field. The delimiter is "_".
Is there a way to count the size of the array to get size-1 or something like that?
Note: when I cut strings I always use something like this:
VARIABLEiWANT=`echo "$FILENAME" | cut -f 1 -d "_"`
awk -F'_' '{print $(NF-1)}' file
or, if you have the name in a string:
awk -F'_' '{print $(NF-1)}' <<< "$FILENAME"
Then save the output of the one-liner above into your variable.
IFS=_ read -ra array <<< "$FILENAME"
variable_i_want=${array[${#array[@]}-2]}
It's a bit of a mess visually, but it's more efficient than starting a new process. ${#array[@]} is the number of elements read from FILENAME, so the indices for the array range from 0 to ${#array[@]}-1.
As of bash 4.3, though, you can use a negative index instead of computing it.
variable_i_want=${array[-2]}
If you need POSIX compatibility (no arrays), then
tmp=${FILENAME%_${FILENAME##*_}} # FILENAME with last field removed
variable_i_want=${tmp##*_} # last field of tmp
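The POSIX version can be checked against the three sample names from the question. In this sketch, second_to_last is a hypothetical helper name, and ${name%_*} is used as a shorter way to drop the last field (it removes the shortest suffix that starts with an underscore).

```shell
#!/bin/sh
# second_to_last NAME: print the next-to-last "_"-separated field of NAME.
# Hypothetical helper for illustration only.
second_to_last() {
    name=$1
    tmp=${name%_*}               # NAME with the last field removed
    printf '%s\n' "${tmp##*_}"   # last field of what remains
}

for f in NAME_OF_THE_FILE_VARIABLEiWANTtoGET_DATE \
         NAMEfile_VARIABLEiWANT_DATE \
         NAME_FILE_VARIABLEiWANT_DATE
do
    second_to_last "$f"
done
```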
Just got it. I found someone using cat together with rev, and I adapted it to work with echo. I didn't fully understand rev, but I think it reverses each line, so the field I want becomes the second one from the start; the second rev puts its characters back in order.
CODIGO=`echo "$ARQ_NAME" | rev | cut -d "_" -f 2 | rev `
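Breaking the rev pipeline into its stages makes the trick visible. This sketch assumes the rev utility is installed (it ships with util-linux on most Linux systems) and uses one of the sample names from the question.

```shell
#!/bin/sh
name='NAME_FILE_VARIABLEiWANT_DATE'

# Stage 1: reverse the characters of the line, so the last field comes first.
reversed=$(printf '%s\n' "$name" | rev)

# Stage 2: the next-to-last field is now field 2, still spelled backwards.
field=$(printf '%s\n' "$reversed" | cut -d_ -f2)

# Stage 3: reverse again to restore the original spelling.
result=$(printf '%s\n' "$field" | rev)

printf '%s\n%s\n%s\n' "$reversed" "$field" "$result"
```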

Cut command does not appear to be working

I'm piping a command to cut and nothing appears to be happening.
The output of the command looks like this:
Name File Info OS
11 FileName1 OS1
12 FileName2 OS2
13 FileName3 OS3
I'm trying to extract columns 1 and 2 from all rows (starting with row 2) using the following:
my_command | cut -f1,2
The output is exactly the same as the original.
cut uses TAB as its default delimiter and doesn't behave well with multiple spaces as a delimiter. Use awk instead:
mycommand | awk 'NR>1{print $1,$2}'
Use tr -s to squeeze repeated spaces into a single space. cut can then be used with a single space as the delimiter separating the columns:
mycommand | tr -s ' ' | cut -d' ' -f1,2
If multiple spaces are used for a delimiter and the column positions are fixed, you would use column numbers with cut:
mycommand | cut -c1-27
Or you could lose the front spaces with:
mycommand | cut -c5-27
This will work even if your fields have embedded spaces. The awk method will fail if you have embedded spaces in your fields.
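The behaviour in the question can be reproduced on a small sample. cut's default delimiter is TAB, and lines without the delimiter are printed unchanged, which is why the original command appeared to do nothing. The sketch below uses made-up, space-aligned data and compares it with the awk and tr -s approaches; tail -n +2 is added here only to skip the header row.

```shell
#!/bin/sh
# Hypothetical space-aligned output standing in for my_command.
sample=$(printf '%s\n' \
    'Name  File Info   OS' \
    '11    FileName1   OS1' \
    '12    FileName2   OS2')

# No TAB characters in the input, so cut -f passes every line through untouched.
unchanged=$(printf '%s\n' "$sample" | cut -f1,2)

# awk splits on runs of whitespace; NR>1 skips the header.
awk_out=$(printf '%s\n' "$sample" | awk 'NR>1{print $1,$2}')

# Squeeze spaces first, then cut on a single space and drop the header.
tr_out=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f1,2 | tail -n +2)

printf '%s\n' "$awk_out"
```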
