shell cut command to remove characters

The pipeline below (the grep and cut stages) outputs 51315123&category_id. I need to remove the trailing &category_id. Can cut be used to do that?
... | tr '?' '\n' | grep "^recipient_dvd_id=" | cut -d '=' -f 2 >> dvdIDs.txt

Yes. Pipe the output through a second cut that splits on '&' and keeps the first field:
... | cut -d '&' -f 1

If you're open to using AWK:
... | tr '?' '\n' | awk -F'[=&]' '/^recipient_dvd_id=/ {print $2}' >> dvdIDs.txt
AWK handles the regex and splitting fields, in this case using both '=' and '&' as field separators and printing the second column. Otherwise, you will need a second cut statement.
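
To compare the two answers side by side, here is a minimal sketch. The URL is a hypothetical stand-in, since the real input is elided ("...") in the question; only the parameter names come from the original.

```shell
# Hypothetical sample input; the question does not show the real URL.
url='http://example.com/page?recipient_dvd_id=51315123&category_id=7'

# Approach 1: the original pipeline plus a second cut that splits on '&'.
id_cut=$(printf '%s\n' "$url" | tr '?' '\n' | grep '^recipient_dvd_id=' |
  cut -d '=' -f 2 | cut -d '&' -f 1)

# Approach 2: one awk stage that treats both '=' and '&' as separators.
id_awk=$(printf '%s\n' "$url" | tr '?' '\n' |
  awk -F'[=&]' '/^recipient_dvd_id=/ {print $2}')

echo "$id_cut $id_awk"
```

Both variables should hold 51315123 for this sample. Note that both approaches assume recipient_dvd_id is the first parameter after the '?'.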

Related

Extract words within curly quotes but keep it when used as apostrophe

I have a UTF-8 file in which curly quotes wrap some words, as in ‘Awaara’, while in other places a curly or straight quote serves as an apostrophe, as in don’t and don't. The issue arises when converting the curly quotes to straight single quotes: afterwards, I cannot strip the quotes around words such as 'Awaara' without also removing the apostrophes in don't and I'm.
GOAL: Convert curly quotes to single quotes, then remove the quoting single quotes while keeping the apostrophes.
Here's the code I have written, which converts the quotes but fails to remove the ones around words:
#!/bin/bash
cat $1 | sed -e "s/\’/'/g" -e "s/\‘/'/g" | sed -e "s/^'/ /g" -e "s/'$/ /g" | sed "s/\…/ /g" | tr '>' ' ' | tr '?' ' ' | tr ',' ' ' | tr ';' ' ' | tr '.' ' ' | tr '!' ' ' | tr '′' ' ' | tr ':' ' ' | sed -e "s/\[/ /g" -e "s/\]/ /g" -e 's/(/ /g' -e "s/)/ /g" | tr ' ' '\n' | sort -u | uniq | tr 'a-z' 'A-Z' >our_vocab.txt
The output is:
'AWAARA ---> Should be AWAARA
25
50
70
800
A
AD
AI
AMITABH
AND
ANYWAY
ARE
BACHCHAN
BECAUSE
BUT
C++
CAN
CHECK
COMPUTER
DEVAKI
DIFFICULT
.
.
.
HOON' --> Should be HOON

You can use
sed -E -e "s/([[:alpha:]]['’][[:alpha:]])|['‘’]/\\1/g" \
-e 's/[][()>?,;.!:]|′|…/ /g' "$1" | tr ' ' '\n' | sort -u | \
tr 'a-z' 'A-Z' > our_vocab.txt
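I merged the several tr commands into a single (second) sed expression. In the first expression, the regex ([[:alpha:]]['’][[:alpha:]])|['‘’] either captures an apostrophe between two letters, which \1 then puts back, or matches any other ', ‘ or ’, which is removed because \1 is empty in that case.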
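
The capture-or-delete trick can be tried in isolation. The sketch below is a reduced version that handles ASCII apostrophes only, so it does not depend on a UTF-8 locale; the function name strip_wrapping_quotes is made up for illustration.

```shell
# Hypothetical helper; a reduced, ASCII-only version of the regex above.
strip_wrapping_quotes() {
  # Alternative 1 captures letter-apostrophe-letter and \1 puts it back.
  # Alternative 2 matches any other apostrophe; \1 is empty, so it vanishes.
  sed -E "s/([[:alpha:]]'[[:alpha:]])|'/\\1/g"
}

printf '%s\n' "'Awaara' don't hoon'" | strip_wrapping_quotes
# prints: Awaara don't hoon
```

Matching the curly forms as single characters inside a bracket expression, as the full answer does, presumably requires sed to run in a UTF-8 locale.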

How to print before and after dot text Unix

I am trying to print only a specific part of a sentence like the one below: the word that contains a dot, that is, the text immediately before and after the dot.
InputVar="ABC SDFSG XYZ.AFGAJK JKK"
Expected output :
XYZ.AFGAJK
I am using the cut command, but it is not working:
echo "$InputVar" | cut -d'' -f2
Is there any other approach?

Here are a few suggestions. awk with RS set to a space seems easiest. YMMV
$ echo "$InputVar" | cut -d ' ' -f 3
XYZ.AFGAJK
$ echo "$InputVar" | awk '/\./' RS=' '
XYZ.AFGAJK
$ echo "$InputVar" | awk '{for(i=1;i<=NF;i++) if(match($i,"\\.")) print $i}'
XYZ.AFGAJK
$ echo "$InputVar" | sed -n 's/.* \([^ .]*[.][^ .]*\) .*/\1/p'
XYZ.AFGAJK

Using cut:
If you really want to use cut, then you could try:
echo "$InputVar" | cut -d' ' -f3
Which uses a space character as a delimiter (you originally had an empty string, which is not allowed), and extracts field 3 rather than field 2.
Using grep:
You can use grep rather than cut, to match & extract specifically what you want:
echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
Explanation:
The -E option is for extended regex
The -o option is for extracting the matched component only
The regex matches a literal ., surrounded by a non-empty sequence of non-space characters
Comparing the two methods:
Either of these will work with your shown example. But, suppose the input string was instead:
InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
The version using grep would give all the matches (a literal . with non-space characters on either side).
Using cut however, you would need to specify the specific fields you want, i.e.
$ echo "$InputVar" | cut -d' ' -f3,5
XYZ.AFGAJK
XYZ.ABC
If you instead wanted just the n-th match, using the grep approach, you could use sed to select the n-th match, e.g.
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
XYZ.AFGAJK
XYZ.ABC
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '1q;d'
XYZ.AFGAJK
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '2q;d'
XYZ.ABC
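
If no external tool is wanted at all, the same selection can be sketched in plain shell with word splitting and a case pattern. This is an alternative to the answers above, not a method taken from them, and the function name dotted_words is hypothetical.

```shell
# Hypothetical helper: print every whitespace-separated word that has at
# least one character on each side of a dot.
dotted_words() {
  # $1 is left unquoted on purpose so the shell splits it into words.
  # Caveat: words containing glob characters such as * would be expanded.
  for word in $1; do
    case $word in
      *?.?*) printf '%s\n' "$word" ;;
    esac
  done
}

InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
dotted_words "$InputVar"
```

For this input the function prints XYZ.AFGAJK and XYZ.ABC on separate lines, matching the grep -Eo result.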

what does this bash script line of code mean

I am new to shell scripting and I found following line of code in a given script.
Could someone explain me with an example what the following line of code means
Path=`echo $line | awk -F '|' '{print $1}'`

echo $line prints the value of the variable line. The | symbol means that this output is passed (piped) to another program, command or script. I will not attempt to explain awk in full here, but in the line above the output of echo $line is taken and processed by awk.
The -F option, as per the awk man page, means:
-F fs Use fs for the input field separator
so the string after it is used to split the input given to awk into fields. For example, if your variable line has the value a|b, it is split into two fields, a and b. What is done with the fields is specified within the '{}' expression.
What can be done in there is next to infinite; here the only thing done is to print the first field, which is accessed with $1 (a in the example above; $2 would be b).
Finally, the output of this whole operation is then stored in the variable Path.
To summarize:
line="a|b"
echo $line | awk -F '|' '{print $1}'
> a
Path=`echo $line | awk -F '|' '{print $1}'`
echo $Path
> a

echo $line | awk -F '|' '{print $1}'
Explanation:
echo -> display a line of text
$line -> parameter expansion: substitutes the value of the variable line
| -> pipe: a pipeline is a sequence of one or more commands separated by the control operator |
awk -> Invoke awk program
-F '|' -> Field separator as | for the data feed
'{print $1}' -> Print the first field
Example
echo 'a|b|c' | awk -F '|' '{print $1}'
will print a

I think this is just a complicated way to express
echo ${line%%|*}
i.e. write to stdout the part of the content of the variable line which goes up to - but not including - the first vertical bar.
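
For reference, here is a short sketch of the four related parameter expansions on a sample value. The variable names other than line are chosen for illustration only.

```shell
line='a|b|c'

first=${line%%|*}     # remove the longest suffix matching '|*'  -> a
not_last=${line%|*}   # remove the shortest suffix matching '|*' -> a|b
rest=${line#*|}       # remove the shortest prefix matching '*|' -> b|c
last=${line##*|}      # remove the longest prefix matching '*|'  -> c

printf '%s\n' "$first" "$not_last" "$rest" "$last"
```

Unlike the awk version, these expansions run inside the shell itself and start no extra process.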

Path=`echo $line | awk -F '|' '{print $1}'`
^     ^                ^      ^
|     |                |      |
|     |                |      print 1st column
|     |                |
|     |                input field separator
|     |
|     echo variable line
|
variable Path
-F'|' - by default awk splits each record (line) into columns on runs of spaces and tabs; with -F'|' it splits on the pipe character instead.
The line above can also be written as
Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
Suppose:
$ line="1|2|3"
$ Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
$ echo $Path; # you get first column
1
Same as
$ Path=$( cut -d'|' -f1 <<< "$line" )
$ echo $Path;
1

The default field separator is whitespace (' '). The -F option changes it; here -F '|' sets the separator to '|'.
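
The difference is easy to see on a value that contains both a space and a pipe. A minimal sketch, with a made-up sample value:

```shell
line='x y|z'

# Default separator: fields are split on whitespace, so $1 is "x".
default_first=$(printf '%s\n' "$line" | awk '{print $1}')

# With -F '|': fields are split on the pipe, so $1 is "x y".
pipe_first=$(printf '%s\n' "$line" | awk -F '|' '{print $1}')

printf '%s\n' "$default_first" "$pipe_first"
```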

Retrieve an exact word- Unix

I have a file which contains the same headings for different information. I want to extract the information for one of them. How to do it?
Specifically, I want to extract the number 234874 from /membership_number="ID:234874 for the person named Sarah, but not the corresponding ID for John. The number can be anything, so I cannot simply use grep '234874'; I need to extract it without knowing its value in advance.

Try this:
grep -v '^$' <filename> | awk '/Information \/Name="Sarah"/ {getline; getline; print $1}' | cut -d':' -f2 | tr -d '"'
Here:
grep -v '^$' <filename>: This removes the blank lines.
awk '/Information \/Name="Sarah"/ {getline; getline; print $1}': This finds the line with Sarah's name, reads two more lines with getline, and prints the first field of the membership line.
cut -d':' -f2 | tr -d '"': This fetches the exact number.

Something like
grep -E "Name=\"Sarah\"" inputfile | grep -Eo "membership_number=\"[^\"]*" | cut -d: -f2
or put things together with
sed -n 's/.*Name="Sarah".*membership_number="ID:\([^"]*\).*/\1/p' inputfile
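
The sample file from the question is not shown here, so the sketch below invents a layout (one field per line, with the name line first) purely for illustration. The function name member_id and the sample records are assumptions, not the asker's data. Unlike the getline approach, it does not depend on the membership line being exactly two lines below the name.

```shell
# Hypothetical sample data; the real file layout is not shown in the question.
sample='Information /Name="John"
/age="30"
/membership_number="ID:111111"

Information /Name="Sarah"
/age="28"
/membership_number="ID:234874"'

# Hypothetical helper: print the membership number of the named person.
member_id() {
  awk -v name="$1" '
    # Each name line switches the "inside the wanted record" flag on or off.
    /Name="/ { inblock = (index($0, "Name=\"" name "\"") > 0) }
    # First membership line inside the wanted record: keep only the digits.
    inblock && /membership_number="ID:/ {
      sub(/.*membership_number="ID:/, "")
      sub(/".*/, "")
      print
      exit
    }'
}

printf '%s\n' "$sample" | member_id Sarah
```

With the invented sample this prints 234874, and member_id John would print 111111.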

Bash output (trimming string)

Can someone explain to me what exactly I am trimming here? That is, what each part means: the \n, the head, the cut, the -d, and so on.
classnumber=$(cat "ClassTimetable.aspx?CourseId=156784&TermCode=1620" | tr '\n' '\r' | head -n 1 | cut -d '>' -f1235- | cut -d '<' -f1)
Thanks

classnumber=$(cat "ClassTimetable.aspx?CourseId=156784&TermCode=1620" |
tr '\n' '\r' | # replace every line feed with a carriage return, so the file becomes one line
head -n 1 | # take the first line (now the whole file)
cut -d '>' -f1235- | # split on '>' and keep field 1235 through the end of the line
cut -d '<' -f1) # from that, keep the text before the first '<'
Experiment with it to understand what it does. For example, try reducing it to
echo "1>2>3>4>5>6>7>8>9>10>11>12>13><14>15" | cut -d '>' -f12- | cut -d '<' -f1
and work out an explanation of how that works.
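
The whole pipeline can also be tried on a tiny input. The three-line page below is a hypothetical stand-in for the downloaded .aspx file, and field 5 plays the role that field 1235 plays in the real command.

```shell
# Hypothetical three-line page standing in for the downloaded file.
page='<html>
<td>Class</td><td>4242</td>
</html>'

value=$(printf '%s\n' "$page" |
  tr '\n' '\r' |      # join all lines into one by turning LF into CR
  head -n 1 |         # the whole input is now a single line
  cut -d '>' -f5- |   # keep field 5 onward, everything after the 4th >
  cut -d '<' -f1)     # keep what precedes the next <

echo "$value"
```

Here the fifth '>'-separated field starts with 4242</td, so the final cut leaves 4242. The approach is fragile: any change to the page markup shifts the field number.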

You are:
1. Replacing every \n with \r
2. Taking the first line
3. Getting fields 1235 onward (until end of line), with '>' as the delimiter
4. From the result of step 3, getting the first field, with '<' as the delimiter
