Bash - How to remove similar (not identical) lines? Grep? Sed? Awk?

I need to remove similar (not identical) lines from a file. e.g.
file.txt
/Bota-Espaco-Fashion-Com-Tachas-e-Cristais-Bege-1472878.html#botas
/Bota-Raphaella-Booz-Cano-Curto-2-Fivelas-Vermelha-1458535.html#botas
/Bota-Dumond-Country-3-Fivelas-Caramelo-1481004.html#botas
/Bota-Espaco-Fashion-Com-Tachas-e-Cristais-Bege-1472878.html
/Bota-Raphaella-Booz-Cano-Curto-2-Fivelas-Vermelha-1458535.html
/Bota-Dumond-Country-3-Fivelas-Caramelo-1481004.html
Wanted result (unique lines ending with #botas):
/Bota-Espaco-Fashion-Com-Tachas-e-Cristais-Bege-1472878.html#botas
/Bota-Raphaella-Booz-Cano-Curto-2-Fivelas-Vermelha-1458535.html#botas
/Bota-Dumond-Country-3-Fivelas-Caramelo-1481004.html#botas
Any handy solution?

With awk:
awk -F\# '!a[$1]++' your_file.txt
Output:
/Bota-Espaco-Fashion-Com-Tachas-e-Cristais-Bege-1472878.html#botas
/Bota-Raphaella-Booz-Cano-Curto-2-Fivelas-Vermelha-1458535.html#botas
/Bota-Dumond-Country-3-Fivelas-Caramelo-1481004.html#botas
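Here -F\# splits each line on #, and !a[$1]++ is true only the first time a given first field (the part before the #) is seen, so the first of each similar pair is kept; since the #botas lines come first in your file, those are the ones that survive. If the order were not guaranteed, a sketch that selects the wanted lines explicitly (assuming every wanted line literally ends in #botas):
grep '#botas$' file.txt | awk '!seen[$0]++'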

Related

Removing all the characters from a string after pattern+2

I am trying to remove all the characters from a string after a specific pattern plus two characters, in bash.
In this case I have for example:
3434.586909
3434.58690932454
3434.5869093232r3353
I'd like to keep just 3434.58
I tried with awk and a wildcard, but my tests haven't worked yet.
You can use sed:
sed 's/\(\...\).*/\1/'
It means: capture a dot and the two characters that follow it, then replace that capture and everything after it with just the captured part.
How about using floating-point logic? Note that plain printf rounds (%.2f would turn 3434.586909 into 3434.59), so truncate to two decimals first:
awk '{printf("%.2f\n", int($0*100)/100)}' Input_file
awk '{print substr($0,1,7)}' file
3434.58
3434.58
3434.58
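If you want to stay in pure bash, here is a minimal sketch using the shell's built-in regex matching, assuming each line contains a dot followed by at least two characters (Input_file is the name used above):
while IFS= read -r line; do
  # keep everything up to and including the first dot plus two characters
  [[ $line =~ ^([^.]*\...) ]] && printf '%s\n' "${BASH_REMATCH[1]}"
done < Input_file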

How to enumerate a one-column csv file using bash?

I have a list like this:
6.53143.S
6.47643.S
6.53161.S
(dots are just for presentation) and, after some bash scripting, I'd like it enumerated:
1 6.53143.S
2 6.47643.S
3 6.53161.S
Try this:
awk '{print NR, $0}' file
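which prefixes each line with its number:
1 6.53143.S
2 6.47643.S
3 6.53161.S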
If your data actually looks like this:
- 6.53143.S
- 6.47643.S
- 6.53161.S
use:
$ awk '$1=NR' file
1 6.53143.S
2 6.47643.S
3 6.53161.S
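This works because assigning NR to $1 overwrites the leading dash with the line number, and the assignment's value (a nonzero number) acts as a true pattern, so awk prints the rebuilt line.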
If you only want to print the line numbers along with the lines, then simple cat can do the same:
cat -n Input_file
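Note that cat -n right-aligns the numbers and separates them from each line with a tab, so the spacing differs slightly from the awk output.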

Replacing newlines with commas at every third occurrence using AWK?

For example: a given file has the following lines:
1
alpha
beta
2
charlie
delta
10
text
test
I'm trying to get the following output using awk:
1,alpha,beta
2,charlie,delta
10,text,test
Fairly simple. Use the output record separator as follows. Specify the comma delimiter when the line number is not divisible by 3 and the newline otherwise:
awk 'ORS=NR%3?",":"\n"' file
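This works because the assignment itself is the pattern: its value is either "," or "\n", both non-empty strings, which awk treats as true, so every line is printed with the freshly chosen record separator.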
awk can handle this easily by manipulating ORS:
awk '{ORS=","} !(NR%3){ORS="\n"} 1' file
1,alpha,beta
2,charlie,delta
10,text,test
There is a tool for this kind of text processing: pr.
$ pr -3ats, file
1,alpha,beta
2,charlie,delta
10,text,test
You can also use xargs with sed to coalesce multiple lines into single lines, which is useful to know:
cat file | xargs -n3 | sed 's/ /,/g'
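Another classic idiom for joining fixed-size groups of lines is paste, which reads one input line per - operand:
paste -d, - - - < file
1,alpha,beta
2,charlie,delta
10,text,test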

How to join every three UIDs from a file with commas in the shell?

I have a question:
file:
154891
145690
165211
190189
135901
290134
I want the output to look like this (every three UIDs separated by commas):
154891,145690,165211
190189,135901,290134
How can I do it?
You can use pr:
pr -3 -s, -l 1
Print in 3 columns, with commas as separators, with a 'page length' of 1.
154891,145690,165211
190189,135901,290134
sed ':1;N;s/\n/,/;0~3b;t1' file
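(This one needs GNU sed: N appends the next input line, s/\n/,/ turns the embedded newline into a comma, the 0~3 step address, a GNU extension, prints the record after every third line, and t1 loops otherwise.)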
or
awk 'ORS=NR%3?",":"\n"' file
There are many ways to do that; pick the one you like, with or without the comma:
$ awk '{printf "%s%s",$0,(NR%3?",":RS)}' file
154891,145690,165211
190189,135901,290134
$ xargs -n3 -a file
154891 145690 165211
190189 135901 290134
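If you want the comma-separated form from xargs as well, pipe it through tr:
$ xargs -n3 -a file | tr ' ' ','
154891,145690,165211
190189,135901,290134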

Remove lines from a csv file using bash, sed, or awk

I'm looking for a way to remove lines within multiple csv files, in bash using sed, awk or anything appropriate, where the line ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between multi-digit values that merely end in zero (like 60 or 120) and a final field that is exactly 0.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line if column 3 is not 0.
Use sed to remove only the lines ending with ",0":
sed '/,0$/d'
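Since the question mentions multiple csv files, GNU sed can edit them all in place (BSD/macOS sed wants -i '' instead):
sed -i '/,0$/d' *.csv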
You can also use awk:
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
This just says: check the last field for 0, and don't print the line if it's found.
sed '/,[ \t]*0$/d' file
I would lean toward sed, but there is an egrep (or grep -E) solution too:
egrep -v ",0$" example.csv
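The pattern uses no extended-regex features, so plain grep -v ',0$' example.csv works too; modern grep documentation also prefers grep -E over the historical egrep.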
