Remove the first word in a text stream - bash

How would I remove the first word from each line of text in a stream?
For example,
$ cat myfile
some text 1
some text 2
some text 3
I want:
$ cat myfile | magiccommand
text 1
text 2
text 3
How would I go about this using Bash? I could use awk '{print $2 $3 $4 $5 ....}', but that's messy and would result in extra spaces for all null arguments. I was thinking that sed might be able to do this, but I could not find any examples of this.

Based on your example text,
cut -d' ' -f2- yourFile
should do the job.

That should work:
$ cat test.txt
some text 1
some text 2
some text 3
$ sed -e 's/^\w*\ *//' test.txt
text 1
text 2
text 3

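Here is a solution using awk. Note that it leaves a leading space on each line, because $1 is emptied but the separator after it remains:
awk '{$1= ""; print $0}' yourfile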

Run this:
sed "s/^some\s//g" myfile
You don't even need a pipe. Note that this only removes the literal word "some", not an arbitrary first word.

To remove the first word together with however many spaces follow it, use: sed 's/[^ ]* *//'
Example:
$ cat myfile
some text 1
some text 2
some text 3
$ cat myfile | sed 's/[^ ]* *//'
text 1
text 2
text 3
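The sed answers above assume the first word starts at column one and is followed by plain spaces. A minimal sketch that also tolerates leading blanks and tabs, assuming a sed that supports POSIX character classes; drop_first_word is a hypothetical helper name, not something from the answers:

```shell
# drop_first_word: hypothetical helper, reads stdin or the files given as arguments.
drop_first_word() {
    # Remove optional leading blanks, the first run of non-blank characters,
    # and the blanks (spaces or tabs) that follow it.
    sed 's/^[[:blank:]]*[^[:blank:]]*[[:blank:]]*//' "$@"
}

# Sample input: the second line has leading and repeated spaces.
printf 'some text 1\n  some   text 2\n' | drop_first_word
```

With the sample input this prints "text 1" and "text 2". Only the first word is touched, so spacing later in the line is preserved.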

Related

How to insert a generated value by a loop while you open a file in bash

Let's say that I have:
cat FILENAME1.txt
Definition john
cat FILENAME2.txt
Definition mary
cat FILENAME3.txt
Definition gary
cat textfile.edited
text
text
text
I want to obtain an output like:
1 john text
2 mary text
3 gary text
I tried to use "stored" values from FILENAMES "generated" by a loop. I wrote this:
for file in $(ls *.txt); do
name=$(cat $file| grep -i Definition|awk '{$1="";print $0}')
#echo $name --> this command works as it gives the names
done
cat textfile.edited| awk '{printf "%s\t%s\n",NR,$0}'
which is very close to what I want to get:
1 text
2 text
3 text
My issue was coming through when I tried to add the "stored" value. I tried the following with no success.
cat textfile.edited| awk '{printf "%s\t%s\n",$name,NR,$0}'
cat textfile.edited| awk '{printf "%s\t%s\n",name,NR,$0}'
cat textfile.edited| awk -v name=$name '{printf "%s\t%s\n",NR,$0}'
Sorry if the terminology used is not the best, but I started scripting recently.
Thank you in advance!!!
One solution using paste and awk ...
We'll append a count to the lines in textfile.edited (so we can see which lines are matched by paste):
$ cat textfile.edited
text1
text2
text3
First we'll look at the paste component:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited
Definition john text1
Definition mary text2
Definition gary text3
From here awk can do the final slicing-n-dicing-n-numbering:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited | awk 'BEGIN {OFS="\t"} {print NR,$2,$3}'
1 john text1
2 mary text2
3 gary text3
NOTE: It's not clear (to me) if the requirement is for a space or tab between the 2nd and 3rd columns; above solution assumes a tab, while using a space would be doable via a (awk) printf call.
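The printf variant mentioned in the note could look roughly like this. It is only a sketch: it assumes a tab after the line number and a single space between the name and the text, and it recreates the sample files in a scratch directory. number_names and names_dir are hypothetical names.

```shell
# Recreate the sample files from the question in a scratch directory.
names_dir=$(mktemp -d)
printf 'Definition john\n' > "$names_dir/FILENAME1.txt"
printf 'Definition mary\n' > "$names_dir/FILENAME2.txt"
printf 'Definition gary\n' > "$names_dir/FILENAME3.txt"
printf 'text1\ntext2\ntext3\n' > "$names_dir/textfile.edited"

# number_names: hypothetical helper; $1 is the directory holding the files.
number_names() {
    # grep prints the Definition lines, paste joins them with the text lines,
    # and awk numbers each row: tab after the number, space before the text.
    grep -hi Definition "$1"/FILENAME*.txt |
        paste - "$1/textfile.edited" |
        awk '{printf "%d\t%s %s\n", NR, $2, $3}'
}

number_names "$names_dir"
```

Feeding grep's output to paste through "-" (standard input) avoids the process substitution, so the same pipeline also runs in a plain POSIX shell.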
You can do all with one awk command.
First file is the textfile.edited, other files are mentioned last.
awk 'NR==FNR {text[NR]=$0;next}
/^Definition/ {namenr++; names[namenr]=$2}
END { for (i=1;i<=namenr;i++) printf("%s %s %s\n", i, names[i], text[i]);}
' textfile.edited FILENAME*.txt
You can avoid awk with
paste -d' ' <(seq $(wc -l <textfile.edited)) \
<(sed -n 's/^Definition //p' FILE*) \
textfile.edited
Another version of the paste solution, with a slightly careless grep:
$: paste -d' ' <( grep -ho '[^ ]*$' FILENAME?.txt ) textfile.edited
john text
mary text
gary text
Or, one more way to look at it...
$: a=( $(sed '/^Definition /s/.* //;' FILENAME[123].txt) )
$: echo "${a[@]}"
john mary gary
$: b=( $(<textfile.edited) )
$: echo "${b[@]}"
text text text
$: c=-1 # initialize so that the first pre-increment returns 0
$: while [[ -n "${a[++c]}" ]]; do echo "${a[c]} ${b[c]}"; done
john text
mary text
gary text
This will put all the values in memory before printing anything, so if the lists are really large it might not be your best bet. If they are fairly small, it's pretty efficient, and a single parallel index will keep them in order.
If the lines are not the same as the number of files, what did you want to do? As long as there aren't more files than lines, and any extra lines are ok to ignore, this still works. If there are more files than lines, then we need to know how you'd prefer to handle that.
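If holding both lists in memory is a concern, the two inputs can be read in lockstep instead. This is a sketch under assumptions: the names have already been extracted one per line (for example with the sed -n 's/^Definition //p' shown earlier), extra lines in either file are silently ignored, and pair_lines, pair_dir, names and texts are hypothetical names.

```shell
# Sample data: three names and four text lines (one extra on purpose).
pair_dir=$(mktemp -d)
printf 'john\nmary\ngary\n' > "$pair_dir/names"
printf 'text\ntext\ntext\nleftover\n' > "$pair_dir/texts"

# pair_lines: hypothetical helper; $1 is the names file, $2 the text file.
pair_lines() {
    # Read one line from each file per iteration through file descriptors 3 and 4.
    # The loop ends as soon as either file runs out of lines.
    while IFS= read -r name <&3 && IFS= read -r text <&4; do
        printf '%s %s\n' "$name" "$text"
    done 3< "$1" 4< "$2"
}

pair_lines "$pair_dir/names" "$pair_dir/texts"
```

Only one line from each file is held at a time, and the unmatched "leftover" line is never printed.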
A one-liner using GNU utilities:
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/\sDefinition//') textfile.edited
Or,
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/^\s*//;s/\sDefinition//') textfile.edited
if the leading white spaces are not desired.
Alternatively:
paste -d ' ' <(sed 's/^Definition\s//' FILENAME*.txt | cat -n) textfile.edited

Adjusting column padding in bash

Any idea how I can format the output as follows?
Input:
1 GATTT
2 ATCGT
Desired output:
1 GATTT
2 ATCGT
I tried the following, and it did not work:
cut -c7,1-6,8-
$ awk -v OFS='\t' '{print $1,$2}' input
1 GATTT
2 ATCGT
or
$ awk '{print $1 "\t" $2}' input
sed can also be used:
sed "s/[[:digit:]]* .*/ &/g" input
1 GATTT
2 ATCGT
I'm assuming that the original whitespace was 6 spaces, based on your cut command. The easiest way to knock this out with simple bash commands is to use a tab as the separator in the output.
echo "      1 GATTT" | cut -d ' ' -f 7- | tr ' ' '\t'
The cut command sets the delimiter to a space character and takes everything from field 7 on (the six leading spaces produce six empty fields). Then the tr (translate) command converts the remaining space to a tab.
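If space padding is preferred over a tab, awk's printf can left-justify the first column. A minimal sketch, assuming six leading spaces in the input and an arbitrary column width of 8 characters; pad_columns is a hypothetical helper name:

```shell
# pad_columns: hypothetical helper, reads stdin or the files given as arguments.
pad_columns() {
    # %-8s left-justifies the first field in an 8-character column
    # (the width is an assumption), then the second field follows.
    awk '{printf "%-8s%s\n", $1, $2}' "$@"
}

printf '      1 GATTT\n      2 ATCGT\n' | pad_columns
```

Because awk splits on runs of whitespace by default, the amount of leading whitespace in the input does not matter here.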

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
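Explanation: -F sets the field separator to |, and substr($3,length($3)) prints the last character of the 3rd field. Note that this yields 3 for the sample line only because its third field ends in 3; it does not count the delimiters. OFS (the output field separator) is set to |, and Input_file is the name of the input file.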
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
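A small sketch combining the octal separator with the empty-line guard from the earlier answer. The sample record is built with printf so the real \036 byte is present; count_sep_fields is a hypothetical helper name.

```shell
# count_sep_fields: hypothetical helper, reads stdin or the files given as arguments.
count_sep_fields() {
    # With \036 as the field separator, a line with N separators has N+1 fields.
    # An empty line has NF == 0, so report 0 for it instead of -1.
    awk -F '\036' '{print (NF ? NF-1 : 0)}' "$@"
}

# One record with three separators, followed by an empty line.
printf 'Example\036Running\036123\036\n\n' | count_sep_fields
```

For this input the sketch prints 3 for the record and 0 for the empty line.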
Awk may not be the best tool for this. GNU grep has a handy -o option that prints each match on a separate line. You can then count how many matches are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works: you'll get "1:" printed twice because there are two matches on the first line. Or try it with some regular ASCII characters and it becomes clearer what the -o and -n options are doing.
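To see the same mechanics with a visible character, here is a sketch that uses a comma in place of hex 1e (the comma and the helper name count_commas_per_line are assumptions for illustration):

```shell
# Each match is printed on its own line, prefixed with the input line number.
printf 'a,b,c\nd,e,f,g\n' | grep -n -o ','

# count_commas_per_line: hypothetical helper that prints "line_number count".
count_commas_per_line() {
    # Keep only the line-number prefix, count repeats, then swap the columns.
    grep -n -o ',' | cut -d: -f1 | uniq -c | awk '{print $2, $1}'
}

printf 'a,b,c\nd,e,f,g\n' | count_commas_per_line
```

The first command prints "1:," twice and "2:," three times; the helper condenses that to "1 2" and "2 3".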
If you want to print the line number followed by the field count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
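The tr approach can be wrapped up as follows. This is a sketch: the octal form \036 is used instead of $'\x1e' so it does not depend on bash quoting, line numbers come from awk's NR rather than cat -n, and count_delims is a hypothetical helper name.

```shell
# count_delims: hypothetical helper, reads stdin and prints "line_number count".
count_delims() {
    # Delete every byte that is not the 0x1e separator or a newline,
    # then let awk report how many characters remain on each line.
    LC_ALL=C tr -cd '\036\n' | awk '{print NR, length($0)}'
}

# One line with three separators and one line with none.
printf 'Example\036Running\036123\036\nno delimiters here\n' | count_delims
```

Lines without any delimiter still produce a row, with a count of 0.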

sed, capture only the number

I have this text file:
some text A=10 some text
some more text A more text
some other text A=30 other text
I'm trying to use sed to capture only the numeric value of A. Using this
cat textfile | sed -r 's/.*A=(\S+).*/\1/'
I get:
10
some more text A more text
30
But what I really need is:
10
0
30
If the string A= does not exist output a 0. How can I accomplish this?
I cannot think of a one-liner, so this is my approach:
while IFS= read -r line
do
    grep -Po '(?<=A=)\d+' <<< "$line" || echo "0"
done < file
I am using grep with a look-behind to get any number after A=. If there is none, grep fails and the || (or) branch prints a 0.
I love code-golf!
sed -e 's/^/A=0 /; s/.*\<A=\([0-9]\+\).*/\1/'
This prepends A=0 to the line before substituting.
Try this one-liner:
awk -F'A=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}' file
with your data:
kent$ echo "some text A=10 some text
some more text A more text
some other text A=30 other text"|awk -F'A=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}'
10
0
30
gawk
awk '{$0=gensub(/^.*A=?([[:digit:]]+).*$/, "\\1", "g"); print($0+0)}' file.txt
This might work for you (GNU sed):
sed '/.*A=\([0-9][0-9]*\).*/s//\1/;t;s/.*/0/' file
Look for the string A= followed by one or more numbers and if it occurs replace the whole line by the back reference. Otherwise replace the whole of the line by 0.
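The same substitute-then-branch idea can be written with separate -e expressions, which should also work in sed implementations that do not accept "t;" on one line. A sketch using the sample data from the question; extract_a is a hypothetical helper name:

```shell
# extract_a: hypothetical helper, reads stdin or the files given as arguments.
extract_a() {
    # 1) If the line contains A=<digits>, replace the whole line with the digits.
    # 2) t jumps to the end of the script when that substitution succeeded.
    # 3) Otherwise the line is replaced by 0.
    sed -e 's/.*A=\([0-9][0-9]*\).*/\1/' -e t -e 's/.*/0/' "$@"
}

printf '%s\n' \
    'some text A=10 some text' \
    'some more text A more text' \
    'some other text A=30 other text' | extract_a
```

With the three sample lines this prints 10, 0 and 30.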
I think the best way is to do two different commands - the first replaces lines without 'A=' with the line 'A=0', the second does what you did.
So
cat textfile | sed -r 's/^([^A]|A[^=])*$/A=0/' | sed -r 's/.*A=(\S+).*/\1/'
How about:
sed -r -e 's/.*A=(\S+).*/\1/' -e 's/.*A.*/0/'
Some grep-sed-cut combination:
grep -o 'A=\?[0-9]*' input | sed 's/A$/A=0/' | cut -d= -f2
Produces:
10
0
30

How to pick specific words from a file and build a new string from them without spaces

I want to read a string from a file.
this string is for example
&0001 = 1234 5678 9abc
now I want to take this string and build another string from it which is
123456789abc
I managed to read the string from the end of the file with:
read_addr="`awk "END {print}" file.txt`"
echo ${read_addr}
How should I continue in order to create the string 123456789abc from the above?
How about this instead:
tail -n 1 file.txt | sed 's/ //g' | sed 's/.*=//'
The tail -n 1 gives you the last line of the file, sed 's/ //g' removes the spaces, and sed 's/.*=//' strips everything up to and including the equals sign.
You can just change your awk line a little bit:
awk -F= 'END{gsub(/ /,"",$2);print $2}' file.txt
This awk line does the whole task in a single process.
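For comparison, the same single-process result can be sketched with sed alone, using the $ address to act only on the last line. The sample file is created here for illustration; join_last_value and last_line_file are hypothetical names.

```shell
# Sample file whose last line matches the example from the question.
last_line_file=$(mktemp)
printf 'first line\n&0001 = 1234 5678 9abc\n' > "$last_line_file"

# join_last_value: hypothetical helper; $1 is the file to read.
join_last_value() {
    # -n suppresses default output; on the last line ($) strip everything
    # up to the equals sign, delete all spaces, and print the result.
    sed -n '${s/.*=//;s/ //g;p;}' "$1"
}

join_last_value "$last_line_file"
```

For the sample file this prints 123456789abc.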
