Cut first appearing pattern from line - shell

I have a file, say xyz, containing records like:
$cat xyz
ABC
ABCABC
ABCABCABC
I want to cut first pattern so result should be like:
AC
ACABC
ACABCABC
I am trying to cut pattern using awk like:
$ cat xyz|awk -F 'B' '{print $1,$2}'
A C
A CA
A CA
Of course, B is the delimiter, so I am getting the above result. How can I remove only the first B?
Thanks

I understand you want to delete the first B in each line. If so, this will work:
sed 's/B//' xyz
Output:
AC
ACABC
ACABCABC
If you want the file to be edited in place, add -i:
sed -i 's/B//' xyz
I see you tried to edit my answer to add a new question. Note that you should do that by updating your question or writing in the comments.
Thanks. One more case: I want to delete the first pattern only if the pattern occurs more than once in the line, like:
$cat xyz
ABC
ABCABC
ABCABCABC
Output should be:
ABC
ACABC
ACABCABC
This can be a way to do it:
while read -r line
do
    if [ "$(echo "$line" | grep -o "B" | wc -l)" -ge 2 ]
    then
        echo "$line" | sed 's/B//'
    else
        echo "$line"
    fi
done < xyz
Output:
ABC
ACABC
ACABCABC
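For what it's worth, the loop can be avoided entirely: a single sed substitution whose pattern can only match when the line contains at least two Bs does the same job.
sed 's/B\(.*B\)/\1/' xyz
On ABC there is no second B, so the line passes through unchanged; on the longer lines only the first B is removed.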


How to insert a generated value by a loop while you open a file in bash

Let's say that I have:
cat FILENAME1.txt
Definition john
cat FILENAME2.txt
Definition mary
cat FILENAME3.txt
Definition gary
cat textfile.edited
text
text
text
I want to obtain an output like:
1 john text
2 mary text
3 gary text
I tried to use "stored" values from FILENAMES "generated" by a loop. I wrote this:
for file in $(ls *.txt); do
    name=$(cat $file | grep -i Definition | awk '{$1="";print $0}')
    #echo $name --> this command works as it gives the names
done
cat textfile.edited| awk '{printf "%s\t%s\n",NR,$0}'
which is very close to what I want to get:
1 text
2 text
3 text
My issue came when I tried to add the "stored" value. I tried the following with no success:
cat textfile.edited| awk '{printf "%s\t%s\n",$name,NR,$0}'
cat textfile.edited| awk '{printf "%s\t%s\n",name,NR,$0}'
cat textfile.edited| awk -v name=$name '{printf "%s\t%s\n",NR,$0}'
Sorry if the terminology used is not the best, but I started scripting recently.
Thank you in advance!!!
One solution using paste and awk ...
We'll append a count to the lines in textfile.edited (so we can see which lines are matched by paste):
$ cat textfile.edited
text1
text2
text3
First we'll look at the paste component:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited
Definition john text1
Definition mary text2
Definition gary text3
From here awk can do the final slicing-n-dicing-n-numbering:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited | awk 'BEGIN {OFS="\t"} {print NR,$2,$3}'
1 john text1
2 mary text2
3 gary text3
NOTE: It's not clear (to me) if the requirement is for a space or tab between the 2nd and 3rd columns; the above solution assumes a tab, while a space would be doable via an awk printf call, as sketched below.
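For completeness, a sketch of that space-separated variant (tab after the line number, space between name and text):
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited | awk '{printf "%d\t%s %s\n", NR, $2, $3}'
1 john text1
2 mary text2
3 gary text3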
You can do it all with one awk command.
The first file is textfile.edited; the other files are listed after it.
awk 'NR==FNR {text[NR]=$0;next}
/^Definition/ {namenr++; names[namenr]=$2}
END { for (i=1;i<=namenr;i++) printf("%s %s %s\n", i, names[i], text[i]);}
' textfile.edited FILENAME*.txt
You can avoid awk with
paste -d' ' <(seq $(wc -l <textfile.edited)) \
<(sed -n 's/^Definition //p' FILE*) \
textfile.edited
Another version of the paste solution with a slightly careless grep -
$: paste -d\ <( grep -ho '[^ ]*$' FILENAME?.txt ) textfile.edited
john text
mary text
gary text
Or, one more way to look at it...
$: a=( $(sed '/^Definition /s/.* //;' FILENAME[123].txt) )
$: echo "${a[#]}"
john mary gary
$: b=( $(<textfile.edited) )
$: echo "${b[#]}"
text text text
$: c=-1 # initialize so that the first pre-increment returns 0
$: while [[ -n "${a[++c]}" ]]; do echo "${a[c]} ${b[c]}"; done
john text
mary text
gary text
This will put all the values in memory before printing anything, so if the lists are really large it might not be your best bet. If they are fairly small, it's pretty efficient, and a single parallel index will keep them in order.
If the number of lines is not the same as the number of files, what did you want to do? As long as there aren't more files than lines, and any extra lines are OK to ignore, this still works. If there are more files than lines, then we need to know how you'd prefer to handle that.
A one-liner using GNU utilities:
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/\sDefinition//') textfile.edited
Or,
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/^\s*//;s/\sDefinition//') textfile.edited
if the leading white spaces are not desired.
Alternatively:
paste -d ' ' <(sed 's/^Definition\s//' FILENAME*.txt | cat -n) textfile.edited

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk may help you here:
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: We check whether $3, the 3rd field, is equal to the string xxx; if so, we set $3 to the string yyy. The trailing 1 is an always-true condition with no action attached, so awk's default action applies: print the current line (with the 3rd field changed or not).
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File"; i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File"; for(i=1;i<=NR;i++) print rec[i]}' file
should solve that.
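Run against the posted input, this prints:
Original File
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
Modified File
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx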
Explanation
$3 is the third space-delimited field in awk. If it matches "xxx", it is replaced. The unmodified lines are printed first, while the (possibly) modified lines are stored in an array; at the end, the stored lines are printed. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable that holds the number of records processed so far; since it is used in the END block, it gives the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
    read -r -a a <<<"$line"
    [[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
    printf '%s ' "${a[@]}"
    printf '\n'
done < input.txt

How to make cat start a new line

I have four files:
one_file.txt
abc | def
two_file.txt
ghi | jkl
three_file.txt
mno | pqr
four_WORD.txt
xyz| xyz
I want to concatenate all of the files ending with "file.txt" (i.e. all except four_WORD.txt) in order to get:
abc | def
ghi | jkl
mno | pqr
To accomplish this, I run:
cat *file.txt > full_set.txt
However, full_set.txt comes out as:
abc | defmno | pqrghi | jkl
Any ideas how to do this correctly and efficiently so that each ends up on its own line? In reality, I need to do the above for a lot of very large files. Thank you in advance for your help.
Try:
awk 1 *file.txt > full_set.txt
This is less efficient than a bare cat, but it will add the missing \n at the end of each file: awk treats 1 as an always-true condition with the default action of printing the record, and awk always terminates each output record with a newline.
Many tools will add newlines if they are missing. Try e.g.
sed '' *file.txt >full_set.txt
but this depends on your sed version. Others to try include awk, grep -ho '.*' file*.txt, and so on.
this works for me:
for file in *file.txt; do cat "$file"; echo; done > full_set.txt
I hope this will help you.
You can loop over each file and do a check to see if the last line ends in a new line, outputting one if it doesn't.
for file in *file.txt; do
cat "$file"
[[ $(tail -c 1 "$file") == "" ]] || echo
done > full_set.txt
You can use a one-line for loop for this. The following line:
for f in *_file.txt; do (cat "${f}") >> full_set.txt; done
Yields the desired output:
$ cat full_set.txt
abc | def
mno | pqr
ghi | jkl
find . -name "*file.txt" | xargs cat > full_set.txt
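Note that xargs cat inherits the original problem: cat still won't add the missing newlines. Substituting the awk 1 trick from the first answer keeps the find but fixes that:
find . -name "*file.txt" | xargs awk 1 > full_set.txt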

Reorder lines of file by given sequence

I have a document A which contains n lines. I also have a sequence of n integers all of which are unique and <n. My goal is to create a document B which has the same contents as A, but with reordered lines, based on the given sequence.
Example:
A:
Foo
Bar
Bat
sequence: 2,0,1 (meaning: First line 2, then line 0, then line 1)
Output (B):
Bat
Foo
Bar
Thanks in advance for the help
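One straightforward take, assuming the sequence is stored one index per line in a file (order.txt is a hypothetical name): read A into an array on the first pass, then print by index on the second.
awk 'NR==FNR{line[NR-1]=$0; next} {print line[$0]}' A.txt order.txt
With A as above and order.txt holding 2, 0 and 1 on separate lines, this prints Bat, Foo, Bar.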
Another solution:
You can create a sequence file by doing (assuming the sequence is comma-delimited):
echo $sequence | sed 's/,/\n/g' > seq.txt
Then, just do:
paste seq.txt A.txt | sort -n | sed 's/^[0-9]*\s//'
Note that this pairs the i-th number with the i-th line and sorts on it, i.e. it treats each number as the destination position of its line, which is the inverse of the indexing used in the question.
Here's a bash function. The order can be delimited by anything.
Usage: schwartzianTransform "A.txt" 2 0 1
function schwartzianTransform {
    local file="$1"
    shift
    local sequence="$@"
    echo -n "$sequence" | sed 's/[^[:digit:]][^[:digit:]]*/\
/g' | paste -d ' ' - "$file" | sort -n | sed 's/^[[:digit:]]* //'
}
Read the file into an array and then use the power of indexing :
echo "Enter the input file name"
read ip
index=0
while read line ; do
NAME[$index]="$line"
index=$(($index+1))
done < $ip
echo "Enter the file having order"
read od
while read line ; do
echo "${NAME[$line]}";
done < $od
[aman@aman sh]$ cat test
Foo
Bar
Bat
[aman@aman sh]$ cat od
2
0
1
[aman@aman sh]$ ./order.sh
Enter the input file name
test
Enter the file having order
od
Bat
Foo
Bar
An awk one-liner could do the job:
awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
$s is your sequence.
Take a look at this example:
kent$ seq 10 >file #get a 10 lines file
kent$ s=$(seq 0 9 |shuf|tr '\n' ','|sed 's/,$//') # get a random sequence by shuf
kent$ echo $s #check the sequence in var $s
7,9,1,0,5,4,3,8,6,2
kent$ awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
8
10
2
1
6
5
4
9
7
3
One way (though not an efficient one for big files):
$ seq="2 0 1"
$ for i in $seq
> do
> awk -v l="$i" 'NR==l+1' file
> done
Bat
Foo
Bar
If your file is a big one, you can use this one:
$ seq='2,0,1'
$ x=$(echo $seq | awk '{printf "%dp;", $0+1;print $0+1> "tn.txt"}' RS=,)
$ sed -n "$x" file | awk 'NR==FNR{a[++i]=$0;next}{print a[$0]}' - tn.txt
The 2nd line builds a sed print-instruction string, which the 3rd line passes to sed. That prints only the lines whose numbers appear in the sequence, though not in the sequence's order; the awk command then reorders the sed output according to the sequence.
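With the example file (Foo, Bar, Bat) and seq='2,0,1', the intermediate values look like this:
$ echo "$x"
3p;1p;2p;
$ cat tn.txt
3
1
2
$ sed -n "$x" file | awk 'NR==FNR{a[++i]=$0;next}{print a[$0]}' - tn.txt
Bat
Foo
Bar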

sed move text in .txt to next line

I am trying to parse out a text file that looks like the following:
EMPIRE,STATE,BLDG,CO,494202320000008,336,5,AVE,ENT,NEW,YORK,NY,10003,N,3/1/2012,TensionCode,VariableICAP,PFJICAP,Residential,%LBMPZone,L,9,146.0,,,10715.0956,,,--,,0,,,J,TripNumber,ServiceClass,PreviousAccountNumber,MinMonthlyDemand,TODCode,Profile,Tax,Muni,41,39,00000000000000,9952,54,Y,Non-Taxable,--,FromDate,ToDate,Use,Demand,BillAmt,12/29/2011,1/31/2012,4122520,6,936.00,$293,237.54
what I would like to see is the data stacked
- EMPIRE STATE BLDG CO
- 494202320000008
- 336 5 AVE ENT
- NEW YORK NY
and so on. If anything, after each comma I would want the following text to go to a new line. Ultimately, for the last part of the line, from FromDate onward, I would like to have it in a txt file like
- From Date ToDate use Demand BillAmt
- 12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
I am using cygwin on a windows XP machine. Thank you in advance for any assistance.
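If all you need is the "new line after each comma" part, tr alone covers that much (stacked.txt is just an example output name), though beware it will also split values like 6,936.00 and $293,237.54 that contain commas:
tr ',' '\n' < originalfile.txt > stacked.txt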
For getting the last line into a separate file:
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > lastlinefile.txt
cat originalfile.txt | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,use,Demand,BillAmt,//' | sed 's/,/\t/' >> lastlinefile.txt
For the rest:
cat originalfile.txt | sed 's/,FromDate.*//; s/,/\n/g; s/$/\n/' > nocommas.txt
Your mileage may vary as far as the '\n' in the replacements is concerned (it works with GNU sed). If it doesn't work properly, replace it with a space (assuming your data doesn't have spaces).
Or, if you like, a shell script to operate on a file and split it:
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt; exit; fi
echo -e "From Date\tToDate\tuse\tDemand\tBillAmt" > "$1_lastline.txt"
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{print $2}' | sed 's/FromDate,ToDate,use,Demand,BillAmt,//' | sed 's/,/\t/' >> "$1_lastline.txt"
cat "$1" | sed -r 's/,Fromdate[^\n]+//' | sed 's/,/\n/' | sed -r 's/$/\n\n' > "$1_fixed.txt"
Just paste it into a file and run it. It's been years since I used Cygwin... you may have to chmod +x it first.
I'm providing two answers depending on how you want the file. The previous answer split it into two files; this one keeps it all in one file, in the format:
EMPIRE
STATE
BLDG
CO
494202320000008
336
5
AVE
ENT
NEW
YORK
NY
From Date ToDate use Demand BillAmt
12/29/2011 1/31/2012 4122520 6,936.00 $293,237.54.
That's the best I can do with the delimiters you have set in place. If you'd left it as something like "EMPIRE STATE BUILDING CO,494202320000008,336 5 AVE ENT,NEW YORK,NY" it'd be a lot easier.
#!/bin/bash
if [ -z "$1" ]
then echo "Usage: $0 filename.txt; exit; fi
cat "$1" | sed 's/,FromDate/~Fromdate/' | awk -v FS="~" '{gsub(",","\n",$1);print $1;print "FromDate\tToDate\tuse\tDemand\tBillAmt";gsub("FromDate,ToDate,use,Demand,BillAmt","",$2);gsub(",","\t",$2);print $2}' >> "$1_fixed.txt"
again, just paste it into a file and run it from Cygwin: ./filename.sh
