The code below (the grep and cut part) currently outputs 51315123&category_id. I need to remove &category_id. Can cut be used to do that?
... | tr '?' '\n' | grep "^recipient_dvd_id=" | cut -d '=' -f 2 >> dvdIDs.txt
Yes, I would think so. Append a second cut that splits on '&' and keeps the first field:
... | cut -d '&' -f 1
If you're open to using AWK:
... | tr '?' '\n' | awk -F'[=&]' '/^recipient_dvd_id=/ {print $2}' >> dvdIDs.txt
AWK handles the regex and splitting fields, in this case using both '=' and '&' as field separators and printing the second column. Otherwise, you will need a second cut statement.
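To compare the two approaches side by side, here is a minimal sketch. The sample URL is made up for illustration (only the recipient_dvd_id and category_id parameter names come from the question), so adjust it to your real input.

```shell
# Hypothetical input line; the real data format is an assumption.
url='http://example.com/page?recipient_dvd_id=51315123&category_id=7'

# Approach 1: the original pipeline plus a second cut that splits on '&'.
with_cut=$(printf '%s\n' "$url" | tr '?' '\n' | grep '^recipient_dvd_id=' |
  cut -d '=' -f 2 | cut -d '&' -f 1)

# Approach 2: awk splits on both '=' and '&' in one step.
with_awk=$(printf '%s\n' "$url" | tr '?' '\n' |
  awk -F'[=&]' '/^recipient_dvd_id=/ {print $2}')

echo "$with_cut"   # 51315123
echo "$with_awk"   # 51315123
```

Both variables should end up holding just the numeric ID, without the trailing &category_id.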
I have a UTF-8 file that uses curly quotes around words, like ‘Awaara’, and in some places uses them as apostrophes, as in don’t, alongside straight apostrophes as in don't. The issue arises when I convert the curly quotes to straight single quotes: after the conversion, I cannot strip the quotes around words such as 'Awaara' without also removing the apostrophes in don't and I'm.
GOAL: Convert curly quotes to straight single quotes, then remove the quoting single quotes while keeping the apostrophes inside words.
Here is the code I have written, which converts the quotes but fails to strip them from quoted words:
#!/bin/bash
cat $1 | sed -e "s/\’/'/g" -e "s/\‘/'/g" | sed -e "s/^'/ /g" -e "s/'$/ /g" | sed "s/\…/ /g" | tr '>' ' ' | tr '?' ' ' | tr ',' ' ' | tr ';' ' ' | tr '.' ' ' | tr '!' ' ' | tr '′' ' ' | tr ':' ' ' | sed -e "s/\[/ /g" -e "s/\]/ /g" -e 's/(/ /g' -e "s/)/ /g" | tr ' ' '\n' | sort -u | uniq | tr 'a-z' 'A-Z' >our_vocab.txt
The output is:
'AWAARA ---> Should be AWAARA
25
50
70
800
A
AD
AI
AMITABH
AND
ANYWAY
ARE
BACHCHAN
BECAUSE
BUT
C++
CAN
CHECK
COMPUTER
DEVAKI
DIFFICULT
.
.
.
HOON' --> Should be HOON
You can use
sed -E -e "s/([[:alpha:]]['’][[:alpha:]])|['‘’]/\\1/g" \
-e 's/[][()>?,;.!:]|′|…/ /g' "$1" | tr ' ' '\n' | sort -u | \
tr 'a-z' 'A-Z' > our_vocab.txt
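I merged several tr commands into a single (second) sed command, and the ([[:alpha:]]['’][[:alpha:]])|['‘’] regex removes all '‘’ apostrophes other than those in between letters. It works because a quote with a letter on each side is captured in group 1 and written back by \1, while any other quote matches the second alternative, where group 1 is empty, so it is replaced with nothing.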
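As a quick check of the first sed expression on its own, here is a small sketch. The sample string is invented and deliberately ASCII-only, so the result does not depend on the locale; matching the curly quotes themselves assumes sed runs in a UTF-8 locale.

```shell
# ASCII-only sample (hypothetical) so the result does not depend on the locale.
sample="'Awaara' don't I'm 'hoon'"

# Letter-quote-letter is captured and written back via \1;
# any other quote matches the second alternative, where \1 is empty.
cleaned=$(printf '%s\n' "$sample" |
  sed -E "s/([[:alpha:]]['’][[:alpha:]])|['‘’]/\\1/g")

echo "$cleaned"   # Awaara don't I'm hoon
```

The quotes wrapping Awaara and hoon disappear, while the apostrophes in don't and I'm survive.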
I am trying to print only a specific part of a string like the one below.
The text immediately before and after the dot should be printed.
InputVar="ABC SDFSG XYZ.AFGAJK JKK"
Expected output:
XYZ.AFGAJK
I am using the cut command, but it is not working:
echo "$InputVar" | cut -d'' -f2
Is there any other approach?
Here are a few suggestions. awk with RS (the record separator) set to a space seems easiest to me, but your mileage may vary.
$ echo "$InputVar" | cut -d ' ' -f 3
XYZ.AFGAJK
$ echo "$InputVar" | awk '/\./' RS=' '
XYZ.AFGAJK
$ echo "$InputVar" | awk '{for(i=1;i<=NF;i++) if(match($i,"\\.")) print $i}'
XYZ.AFGAJK
$ echo "$InputVar" | sed -n 's/.* \([^ .]*[.][^ .]*\) .*/\1/p'
XYZ.AFGAJK
Using cut:
If you really want to use cut, then you could try:
echo "$InputVar" | cut -d' ' -f3
This uses a space character as the delimiter and extracts field 3 rather than field 2. Your original -d'' does not work because the shell passes it to cut as a bare -d, so cut takes the next argument, -f2, as the delimiter and fails.
Using grep:
You can use grep rather than cut, to match & extract specifically what you want:
echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
Explanation:
The -E option is for extended regex
The -o option is for extracting the matched component only
The regex matches a literal ., surrounded by a non-empty sequence of non-space characters
Comparing the two methods:
Either of these will work with your shown example. But, suppose the input string was instead:
InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
The version using grep would give all the matches (a literal . with non-space characters on either side).
Using cut, however, you would need to list the specific fields you want, and cut prints them on one line joined by the delimiter, i.e.
$ echo "$InputVar" | cut -d' ' -f3,5
XYZ.AFGAJK XYZ.ABC
If you instead wanted just the n-th match with the grep approach, you could use sed to select it (sed 'Nq;d' deletes every line before line N, then prints line N and quits), e.g.
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
XYZ.AFGAJK
XYZ.ABC
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '1q;d'
XYZ.AFGAJK
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '2q;d'
XYZ.ABC
I am new to shell scripting and I found the following line of code in a script I was given.
Could someone explain to me, with an example, what this line of code means?
Path=`echo $line | awk -F '|' '{print $1}'`
echo $line prints the value of the variable line. The | symbol means that this output is passed (piped) to another program, command, or script. I will not attempt to explain awk in full here; what happens above is that the output of echo $line is handed to awk for processing.
The -F option, as per the awk man page, means:
-F fs Use fs for the input field separator
so the string after it is used to split awk's input into fields. For example, if your variable line has the value a|b, it will be split into two fields, a and b. What is done with those fields is specified within the '{}' expression.
Again, what can be done in there is nearly unlimited; here the only thing done is to print the first field, which is accessed with $1, i.e. a in the above example ($2 would be b, as you might guess).
Finally, the output of this whole operation is then stored in the variable Path.
To summarize:
line="a|b"
echo $line | awk -F '|' '{print $1}'
> a
Path=`echo $line | awk -F '|' '{print $1}'`
echo $Path
> a
echo $line | awk -F '|' '{print $1}'
Explanation:
echo -> display a line of text
$line -> parameter expansion; expands to the contents of the variable line
| -> pipe; a pipeline is a sequence of one or more commands separated by the control operator |
awk -> Invoke the awk program
-F '|' -> Use | as the field separator for the input
'{print $1}' -> Print the first field
Example
echo 'a|b|c' | awk -F '|' '{print $1}'
will print a
I think this is just a complicated way to express
echo ${line%%|*}
i.e. write to stdout the part of the content of the variable line which goes up to - but not including - the first vertical bar.
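A small sketch to confirm that the two forms agree, assuming a sample value of a|b|c for line (the variable names via_awk and via_expansion are just for illustration):

```shell
line='a|b|c'

# External process: awk splits on '|' and prints the first field.
via_awk=$(echo "$line" | awk -F '|' '{print $1}')

# Pure shell: remove the longest suffix that starts at a '|'.
via_expansion=${line%%|*}

echo "$via_awk"         # a
echo "$via_expansion"   # a
```

The parameter expansion avoids starting a subshell and an awk process, which can matter inside a loop over many lines.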
Path=`echo $line | awk -F '|' '{print $1}'`
^     ^                ^      ^
|     |                |      |
|     |                |      print 1st column
|     |                |
|     |                input field separator
|     |
|     echo variable line
|
variable Path
-F '|' - by default awk splits each record (line) into fields on runs of blanks (spaces and tabs); with -F '|' it splits on the pipe character instead.
The above can also be written with a bash here-string instead of echo:
Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
Suppose, say,
$ line="1|2|3"
$ Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
$ echo $Path; # you get first column
1
Same as
$ Path=$( cut -d'|' -f1 <<< "$line" )
$ echo $Path;
1
The default field separator is whitespace (' '); passing -F '|' changes the separator to '|'.
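The difference between the default separator and -F '|' can be seen in a short sketch. The sample value is made up, with two spaces between x and y to show that the default splitting collapses runs of blanks:

```shell
line='x  y|z'   # hypothetical value; note the two spaces after x

# Default separator: fields are "x" and "y|z".
default_f2=$(echo "$line" | awk '{print $2}')

# With -F '|': fields are "x  y" and "z".
pipe_f1=$(echo "$line" | awk -F '|' '{print $1}')
pipe_f2=$(echo "$line" | awk -F '|' '{print $2}')

echo "$default_f2"   # y|z
echo "$pipe_f1"      # x  y
echo "$pipe_f2"      # z
```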
I have a file that contains the same headings for different people's information. I want to extract the information for just one of them. How can I do it?
Specifically, I want to extract the number 234874 from /membership_number="ID:234874 for the person named Sarah, but not the ID that belongs to John. The number can be anything, so I cannot simply use grep '234874'; I need to extract it without knowing it in advance.
Try this:
grep -v '^$' <filename> | awk '/Information \/Name="Sarah"/ {getline; getline; print $1}' | cut -d':' -f2 | tr -d '"'
Here:
grep -v '^$' <filename>: This removes the blank lines.
awk '/Information \/Name="Sarah"/ {getline; getline; print $1}': This finds the line with the name, reads the next two lines, and prints the first field of the second one (the membership line).
cut -d':' -f2 | tr -d '"': This fetches the exact number.
Something like this, if the name and the membership number are on the same line:
grep -E "Name=\"Sarah\"" inputfile | grep -Eo "membership_number=\"[^\"]*" | cut -d: -f2
or put things together with
sed -n 's/.*Name="Sarah".*membership_number="ID:\([^"]*\).*/\1/p' inputfile
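If the membership number sits on a later line than the name, a flag-based awk is another option that does not depend on counting lines. This is only a sketch: the file layout below is a guess, and the /age lines and John's ID 111111 are invented for the demo.

```shell
# Hypothetical sample: the real file layout is an assumption,
# and the /age lines and the ID 111111 are invented for the demo.
sample='Information /Name="John"
/age="40"
/membership_number="ID:111111"

Information /Name="Sarah"
/age="35"
/membership_number="ID:234874"'

sarah_id=$(printf '%s\n' "$sample" | awk -F'ID:' '
  /Name="Sarah"/ { found = 1 }        # start looking once Sarah is seen
  found && /membership_number=/ {     # first membership line after that
    gsub(/"/, "", $2)                 # drop the trailing double quote
    print $2
    exit
  }')

echo "$sarah_id"   # 234874
```

Splitting on the literal ID: leaves the number (plus a closing quote) in the second field, so no separate cut or tr step is needed.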
Can someone explain to me what exactly I am trimming here, and what everything means: the \n, the head, the cut, the -d, etc.?
classnumber=$(cat "ClassTimetable.aspx?CourseId=156784&TermCode=1620" | tr '\n' '\r' | head -n 1 | cut -d '>' -f1235- | cut -d '<' -f1)
Thanks
classnumber=$(cat "ClassTimetable.aspx?CourseId=156784&TermCode=1620" | \
tr '\n' '\r' | # replace every line feed with a carriage return, so the file becomes one line
head -n 1 | # take the first line (now the whole file)
cut -d '>' -f1235- | # take fields 1235 onward, i.e. everything after the 1234th '>'
cut -d '<' -f1) # of that, keep the part before the first '<'
Experiment with it to understand what it does, for example try to reduce it to
echo "1>2>3>4>5>6>7>8>9>10>11>12>13><14>15" | cut -d '>' -f12- | cut -d '<' -f1
Then come up with an explanation of how it works.
You are:
1. Replacing every \n with \r
2. Taking the first line
3. Getting fields 1235 through the end of the line ('>' is the delimiter)
4. From the result of 3, getting the first field ('<' is the delimiter)
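The two cut steps can be tried on a short string to see what each one keeps. This sketch uses field 12 instead of 1235 so that it fits on one line; the variable names are just for illustration.

```shell
sample='1>2>3>4>5>6>7>8>9>10>11>12>13><14>15'

# Fields 12 through the end, still joined by '>'.
from_12=$(echo "$sample" | cut -d '>' -f12-)

# Of that, only the part before the first '<'.
before_lt=$(echo "$from_12" | cut -d '<' -f1)

echo "$from_12"     # 12>13><14>15
echo "$before_lt"   # 12>13>
```

In the real command the same thing happens on an HTML page: the first cut skips past a fixed number of tags, and the second stops at the start of the next tag.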