Remove duplicate lines on two columns on a .csv [closed] - bash

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I'd like to find duplicate values in a CSV file with bash, using a pipe as the field separator.
Here is an example:
Input:
W14|E75
Z20|K60
R59|R59
K60|O74
A08|M10
Expected output :
Z20|K60
R59|R59
K60|O74
Else other expected output :
Z20|K60
R59|R59
What I mean is: when a value already exists elsewhere in the first column, keep the line, and the same for the second column; otherwise, I can accept keeping only the first such line.
What I tried is :
awk -F "|" 'FNR==NR { x[$1,$2]++; next } x[$1,$2] > 1' file.csv file.csv
I thought about using grep, but I'm not quite sure how to do it.
Sorry for my bad English, and thank you in advance.

I think, based on the output, you want the non-unique entries regardless of their position in the lines:
$ awk -F'|' 'NR==FNR{a[$1]++;a[$2]++;next} a[$1]*a[$2]>1' file{,}
should give you your first output.
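To try that against the sample data, you can write the example rows to a scratch file.csv first; the first pass over the file counts every value from both columns, and the second pass keeps lines whose values occur more than once overall:

```shell
# Sample input from the question
cat > file.csv <<'EOF'
W14|E75
Z20|K60
R59|R59
K60|O74
A08|M10
EOF

# Pass 1 (NR==FNR) counts each field value; pass 2 prints a line when
# either of its values was seen more than once in the whole file.
awk -F'|' 'NR==FNR{a[$1]++;a[$2]++;next} a[$1]*a[$2]>1' file.csv file.csv
```

This prints the three lines of the first expected output (Z20|K60, R59|R59, K60|O74).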

List folder contents and only show unique value [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Is there a way to use the ls command and then pipe the output to show only the unique item?
Here is the example:
ls /dev/disk/by-id
ata-ST500DM002-1BD142_S2AFE0JP
ata-ST500DM002-1BD142_W2AEDMQK
ata-ST500DM002-1BD142_W2AEDMQK-part1
ata-ST500DM002-1BD142_W2AEDMQK-part2
ata-ST500DM002-1BD142_W2AEDMQK-part3
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804-part1
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804-part9
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EKK60289
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EKK60289-part1
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EKK60289-part9
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491-part1
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491-part9
ata-WDC_WD5003AZEX-00MK2A0_WD-WCC3F2HAD1XC
ata-WDC_WD5003AZEX-00MK2A0_WD-WCC3F2HAD1XC-part1
ata-WDC_WD5003AZEX-00MK2A0_WD-WCC3F2HAD1XC-part9
As you can see in the output, all but one of the items have a serial number with a -part# suffix, but the serial on top:
'ata-ST500DM002-1BD142_S2AFE0JP'
does not. What I am trying to do is get the output to show me only the results that do not have any duplicate serial entries. The output would be just the unique serial number:
ata-ST500DM002-1BD142_S2AFE0JP
Thank you.
Assuming these entries don't contain newlines, you may use:
printf '%s\n' /dev/disk/by-id/* |
awk '!/-part[0-9]*$/{arr[$0]; next} { sub(/-[^-]+$/, ""); delete arr[$0] }
END { for (i in arr) print i }'
ata-ST500DM002-1BD142_S2AFE0JP
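The same idea can be tried on made-up names (the ata-AAA/ata-BBB names below are stand-ins, not real device IDs): every name without a -partN suffix is remembered as a candidate, every -partN entry deletes its base name, and whatever survives had no partition entries:

```shell
printf '%s\n' ata-AAA ata-BBB ata-BBB-part1 ata-BBB-part2 |
awk '!/-part[0-9]*$/ { arr[$0]; next }       # remember candidate base names
     { sub(/-[^-]+$/, ""); delete arr[$0] }  # a -partN line removes its base
     END { for (i in arr) print i }'
```

Only ata-AAA is printed, since ata-BBB has partition entries.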
Use grep with a regular expression to filter out the partition entries: ls | grep -Ev -e '-part[0-9]+$'. Note that this keeps every base device name, not only the ones that appear exactly once.
$ ls test
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804 ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804-part9 ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491-part1
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804-part1 ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491 ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491-part9
$ ls test | grep -Ev -e '-part[0-9]+$'
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EEV65804
ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2ET092491

Get contents from file into bash variables [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm trying to get the contents of a text file into separate bash variables, but I can't get my head around it.
In total, 3 variables need to be fetched from the 2nd line:
;File created: 20200727
user.details 184 16 John Smith
Output needs to be:
USERID="184"
GROUPID="16"
FULLNAME="John Smith"
Any ideas? I've tried to separate it via grep, but since the values change, that's not an option; same with awk. The problem I'm struggling with is that the last variable can be of any length, including spaces, which I previously used as the delimiter. Unfortunately, the source file can't be changed.
Like this with awk:
awk '/user\.details/ {
print "USERID=\""$2"\""
print "GROUPID=\""$3"\""
print "FULLNAME=\""substr($0, index($0,$4))"\""}' file.txt
With bash:
while read -r foo USERID GROUPID FULLNAME; do [[ $foo != ";File" ]] && echo "$USERID $GROUPID $FULLNAME"; done < file
Output:
184 16 John Smith
The variable $FULLNAME contains everything from the fourth column onwards.
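A sketch putting that together, assuming the two sample lines from the question are in file.txt: the last variable named in read soaks up the rest of the line, so FULLNAME keeps its embedded spaces:

```shell
# Recreate the sample input from the question
cat > file.txt <<'EOF'
;File created: 20200727
user.details 184 16 John Smith
EOF

# `read` assigns one word per variable, and the final variable receives
# everything left on the line, spaces included.
while read -r tag USERID GROUPID FULLNAME; do
  [ "$tag" = "user.details" ] || continue
  printf 'USERID="%s"\nGROUPID="%s"\nFULLNAME="%s"\n' "$USERID" "$GROUPID" "$FULLNAME"
done < file.txt
```

This prints exactly the three USERID/GROUPID/FULLNAME lines the question asks for.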

Replace and remove characters in string, and add output as new column [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have output from a program that I would like to process. If I pipe it to a file, I get:
file/path#backup2018
file2/path/more/path/path#backup2019
file3/path#backup2017
And I want to process it so it looks like this:
file/path file.path
file2/path/more/path/path file.path.more.path.path
file3/path file.path
I have figured out how to make it with separate commands but would like a one liner.
$ awk -F# '{s=$1; gsub("/", ".", s); print $1, s}' file | column -t
file/path file.path
file2/path/more/path/path file2.path.more.path.path
file3/path file3.path
Using sed (GNU sed; the branch loop converts the slashes in the duplicated column to dots):
sed -E 's/([^#]*)#.*/\1 \1/; :a; s/( [^ ]*)\//\1./; ta' file | column -t
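To try the awk one-liner above on the sample input (writing it to a scratch file, name assumed, first):

```shell
cat > paths.txt <<'EOF'
file/path#backup2018
file2/path/more/path/path#backup2019
file3/path#backup2017
EOF

# Keep the part before '#', and append a copy with '/' replaced by '.'
awk -F'#' '{s=$1; gsub("/", ".", s); print $1, s}' paths.txt | column -t
```

The gsub works on a copy (s) of the first field, so the original path is printed unchanged next to the dotted version.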

How can I remove digits from these strings? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a text file containing a few string values:
PIZZA_123
CHEESE_PIZZA_785
CHEESE_PANEER_PIZZA_256
I need to remove the numeric values and keep only the string parts, writing them to a file. The tricky part for me is that the numeric values are random every time. The expected output is:
PIZZA
CHEESE_PIZZA
CHEESE_PANEER_PIZZA
What is an easy way to do this?
sed 's/_[0-9]*$//' file > file2
Will do it.
There's more than one way to do it. For example, since the numbers always seem to be in the last field, we can just cut off the last field with a little help from the rev util. Suppose the input is pizza.txt:
rev pizza.txt | cut -d _ -f 2- | rev
Since this uses two utils and two pipes, it's not more efficient than sed. The sole advantage for students is that regex isn't necessary -- the only text needed is the _ as a field separator.
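A sketch of that rev/cut pipeline, using the sample strings in pizza.txt:

```shell
cat > pizza.txt <<'EOF'
PIZZA_123
CHEESE_PIZZA_785
CHEESE_PANEER_PIZZA_256
EOF

# Reverse each line, drop what is now the leading numeric field,
# then reverse back to restore the original orientation.
rev pizza.txt | cut -d _ -f 2- | rev
```

This yields PIZZA, CHEESE_PIZZA and CHEESE_PANEER_PIZZA, one per line.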
You can use a small script with parameter expansion for this:
#!/bin/bash
V1=PIZZA_123
V2=CHEESE_PIZZA_785
V3=CHEESE_PANEER_PIZZA_256

echo "here are the values:"
for v in "$V1" "$V2" "$V3"; do
  # strip the trailing _<digits> suffix
  echo "${v%_[0-9]*}"
done

Bash: Grep Line Numbers to Correspond to AWK NR [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I suspect I am going about this the long way, but please bear with me; I am new to Bash, grep and awk...
The summary of my problem is that the line numbers grep reports do not correspond to the actual line numbers in the file. From what I gather, empty lines are discarded in the numbering. I would prefer not to iterate through every line in the file to ensure 100% coverage.
What I am trying to do is grab a segment of lines from a file and process them using grep and awk
The grep call gets a list of line numbers since there could be more than one instance of a 'starting position' in a file:
startLnPOSs=($(cat "$c" | grep -e '^[^#]' | grep --label=root -e '^[[:space:]]start-pattern' -n -T | awk '{print $1}'))
Then using awk I iterate from a starting point until an 'end' token is encountered.
declarations=($(cat "$c" | awk "$startLnPos;/end-pattern/{exit}" ))
To me this looks a bit like an XY problem, as you are showing us what you are doing to solve a problem but not actually outlining the problem itself.
So on a guess I am thinking you want to return all the items between the start/end patterns to your array (which may also be erroneous, but again we do not know the overall picture).
So what you could do is:
declarations=($(awk '/start-pattern/,/end-pattern/' "$c"))
Or with sed (exactly the same):
declarations=($(sed -n '/start-pattern/,/end-pattern/p' "$c"))
Depending if you want those actual lines included or not the commands may need to be altered a little.
Was this the kind of thing you were looking to do?
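As a minimal check of the range form (the file contents and patterns below are made up for illustration):

```shell
cat > sample.txt <<'EOF'
before
start-pattern
alpha
beta
end-pattern
after
EOF

# Print everything from the first start-pattern line through end-pattern,
# boundary lines included
awk '/start-pattern/,/end-pattern/' sample.txt
```

The sed form from above, sed -n '/start-pattern/,/end-pattern/p' sample.txt, prints the same four lines.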
