How to extract lines containing unique text in a column

I have a text file similar to
"3"|"0001"
"1"|"0003"
"1"|"0001"
"2"|"0001"
"1"|"0002"
i.e. a pipe-delimited text file containing quoted strings.
What I need to do is:
First, extract the first line which contains each value in the first column, producing
"3"|"0001"
"1"|"0003"
"2"|"0001"
Then, sort by the values in the first column, producing
"1"|"0003"
"2"|"0001"
"3"|"0001"
Performing the sort is easy - sort -k 1,1 -t \| - but I'm stuck on extracting the first line in the file which contains each value in the first column. I thought of using uniq, but it doesn't do what I want, and its "column-handling" abilities are limited to ignoring the first 'x' columns of space-or-tab delimited text.
Using the Posix shell (/usr/bin/sh) under HP-UX.
I'm kind of drawing a blank here. Any suggestions welcomed.

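You can do:
awk -F'|' '!a[$1]++' file|sort...
The awk part drops every line whose first-column value has already been seen, so only the first occurrence of each value is kept. a[$1]++ evaluates to 0 (false) the first time a value appears, and the ! negates it, so that line is printed.
I don't have an HP-UX box, so I can't test this, but I think it should work.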
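For reference, here is a minimal sketch of the whole pipeline run against the sample data from the question, with the sort command from the question filled in. The function name first_per_key and the array name seen are made up for illustration, and it has only been reasoned through for a generic POSIX shell, not tested on HP-UX.

```shell
# Keep the first line seen for each first-column value, then sort on that column.
# Reads pipe-delimited records on stdin (hypothetical helper name).
first_per_key() {
    awk -F'|' '!seen[$1]++' | sort -t '|' -k 1,1
}

result=$(first_per_key <<'EOF'
"3"|"0001"
"1"|"0003"
"1"|"0001"
"2"|"0001"
"1"|"0002"
EOF
)

printf '%s\n' "$result"
```

Doing the awk step before the sort matters: sorting first would reorder the lines, so "first occurrence" would no longer mean first in the original file.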

Related

Bash: Sort file numerically, but only where the first field matches a pattern

Due to poor past naming practices, I'm left with a list of names that is proving to be a challenge to work with. The bottom line is that I want the most current name (by date) to be placed in a variable. All the names are listed (unsorted) in a file called bar.txt.
In this case I can't rename, and there's no way to get the actual dates of the images; these names are all I have to go on. The names can follow one of several patterns;
foo
YYYYMMDD-foo
YYYYMMDD##-foo
foo can be anything from a single character to a long string of letters/numbers/symbols. I am interested only in the names matching the second pattern, YYYYMMDD-foo, as those are from after we started tagging consistently.
I would like to end up with a variable containing the name with the most recent date that follows the pattern YYYYMMDD-foo.
I know sort -k1 -n < bar.txt, but then I'm not sure how to isolate the second pattern's results to extract what I need.
How do I sort the file to ignore anything but the second pattern, and return the most current date?
Sample
Given that bar.txt looks like this;
test
2017120901-develop-BUILD-31
20170326-TEST-1.2.0
20170406-BUILD-40-1.2.0-test
2010818_001
I would want to extract 20170406-BUILD-40-1.2.0-test

Your requirement has two parts: 1) keep only the names in a certain format, and 2) sort them and take the latest one. I am using awk and GNU sort together to achieve it:
awk -F'-' 'length($1) == 8' file | sort -nrk1 | head -1
20170406-BUILD-40-1.2.0-test
The solution keeps only those lines whose first hyphen-separated field is exactly 8 characters long, matching the YYYYMMDD layout. The filtered lines are then sorted in reverse numeric order on the first field, and head returns the first line, which is the most recent.
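Since the question asks for the result in a variable, here is a sketch that captures it with command substitution. It also tightens the filter a little, as an assumption about what is wanted: the first field must be all digits and a hyphen must be present, so that an 8-character non-date prefix is not picked up. The function name latest_dated_name is hypothetical.

```shell
# Print the name with the most recent YYYYMMDD- prefix from the names on stdin.
latest_dated_name() {
    # NF > 1 requires at least one hyphen; the first field must be exactly 8 digits.
    awk -F'-' 'NF > 1 && length($1) == 8 && $1 ~ /^[0-9]+$/' |
        sort -t '-' -k 1,1nr |
        head -n 1
}

latest=$(latest_dated_name <<'EOF'
test
2017120901-develop-BUILD-31
20170326-TEST-1.2.0
20170406-BUILD-40-1.2.0-test
2010818_001
EOF
)

printf '%s\n' "$latest"
```

With the real file, the same function would be called as latest=$(latest_dated_name < bar.txt).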

Extracting data from text file after 2 conditions have been met

I'm working on a bash script at the moment which extracts data from a text file called carslist.txt, with each car (and its corresponding characteristics) on a separate line. I've been able to extract and save data from the text file when it meets a single condition (example below), but I can't figure out how to do it for two conditions.
Single condition example:
grep 'Vauxhall' $CARFILE > output/Vauxhall_Cars.txt
output:
Vauxhall:Vectra:1999:White:2
Vauxhall:Corsa:1999:White:5
Vauxhall:Cavalier:1995:White:2
Vauxhall:Nova:1994:Black:8
From the example above, how would I extract data only when both conditions, Vauxhall and White, are met?
The grep example above requires Vauxhall to match before pulling and saving the data, but I have no idea how to do it for two conditions. I've tried piping the command as Vauxhall | White, but after that I was out of ideas.
Thanks in advance.

I would recommend using awk, like this:
awk -F: '$1=="Vauxhall" && $4=="White"' input.file
Since : is the field separator, I simply need to check the values of fields 1 and 4.
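As a sketch of how this could be parameterised, the make and colour can be passed in with awk -v instead of being hard-coded. The function name filter_cars and the Ford line in the sample are invented for illustration; the other lines are from the question.

```shell
# Print records whose make (field 1) and colour (field 4) both match.
# $1 = make, $2 = colour; reads colon-separated records on stdin.
filter_cars() {
    awk -F: -v make="$1" -v colour="$2" '$1 == make && $4 == colour'
}

white_vauxhalls=$(filter_cars Vauxhall White <<'EOF'
Vauxhall:Vectra:1999:White:2
Vauxhall:Corsa:1999:White:5
Vauxhall:Cavalier:1995:White:2
Vauxhall:Nova:1994:Black:8
Ford:Fiesta:1998:White:3
EOF
)

printf '%s\n' "$white_vauxhalls"
```

Comparing whole fields is stricter than chaining two greps: a grep for White would also match a line where White appears in some other field, such as the model name.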

Unix Shell Programming

I have an input file with 200 lines; each line has just one field, which is a number.
E.g.
89970060122507635800
I need to create one output file that contains, for every input line, a line like the following:
INSERT,89970060122507635800,425062250763580,,0000,29514215,0000,29514215,,,,NORMAL,425062260621583,Blank,sim,9877
where:
All the fields have constant values (including the empty values between commas) except the second and the third.
The second field is taken from the input file; the third is obtained by removing the last digit from the second field and replacing the leading 899700601 with 42506 (as in the example).
I'm sure I can find ways to do that (and I will try before getting answers), but I'm more interested in knowing which would be the most efficient in your opinion. Awk, sed, a shell script using both?

This builds the third field by trimming the last digit from the input and replacing the leading "123" with "AAA"; the second field is the unmodified input.
awk -v OFS="," '{$2=substr($1,1,length($1)-1); gsub(/^123/,"AAA",$2); print "bla bla bla",$1,$2,"bla bla bla"}'
Replace the magic values and add the proper template for the print statement.
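Filled in with the values from the question, the template might look like the sketch below. The function name make_insert_lines and the variable third are hypothetical; the constant fields are copied from the example output line in the question.

```shell
# For each number on stdin, emit one INSERT line.
# The third field is the input minus its last digit, with the leading
# 899700601 replaced by 42506.
make_insert_lines() {
    awk '{
        third = substr($1, 1, length($1) - 1)   # drop the last digit
        sub(/^899700601/, "42506", third)       # swap the leading prefix
        print "INSERT," $1 "," third ",,0000,29514215,0000,29514215,,,,NORMAL,425062260621583,Blank,sim,9877"
    }'
}

line=$(printf '%s\n' 89970060122507635800 | make_insert_lines)
printf '%s\n' "$line"
```

A single awk process handles the whole file in one pass, which is likely cheaper than a shell loop that starts sed or awk once per line, although for 200 lines the difference is negligible.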

Sorting lines with vim by lines chunk

Can I sort lines in vim depending on a part of line and not the complete line?
e.g
My Name is Deus Deceit
I would like to sort on the column where the name starts plus the next 6 columns,
for example
sort by columns 19-25, so that vim only checks those characters when sorting.
If it can be done without a plugin that would be great. Thanks.

Check out :help :sort. The command takes an optional {pattern} whose matched text is skipped (i.e. sorting happens on what comes after the match).
For example, to sort by column 19 onward (see :help /\%c and the related regexp atoms):
:sort /.*\%19c/

Sorting a text file & removing duplicates

I have a large text file with 4-digit codes and some information about them in every row. It looks something like this:
3456 information
1234 info
2222 Some ohter info
I need to sort this file, so the codes are in ascending order in the file. Also, some codes appear more than once, so I need to remove duplicates. Can I do this with perl, awk or some other scripting language?
Thanks in advance,
-skazhy

sort happybirthday.txt | uniq
From IBM.
1st result for Google: unix remove duplicate lines.

You can create a hash, then read the file line by line, and for each line:
split at the first space
check whether val(0), the number that you just split off, is in the hash
if not, then insert val(1), the rest of the line, into the hash with the key val(0)
continue
Then print the (sorted) hash to the file.
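One possible way to express the hash idea is with an awk associative array, followed by a numeric sort on the code. This is a sketch rather than a literal translation of the steps above: it prints each first occurrence straight away instead of storing the rest of the line. The function name dedupe_by_code and the sample lines are made up for illustration.

```shell
# Keep the first line seen for each leading code, then sort by code.
# Reads "CODE rest of line" records on stdin.
dedupe_by_code() {
    # seen[$1]++ is 0 the first time a code appears, so !seen[$1]++ is true once per code.
    awk '!seen[$1]++' | sort -n -k 1,1
}

cleaned=$(dedupe_by_code <<'EOF'
3456 information
1234 info
2222 some info
1234 repeated code
EOF
)

printf '%s\n' "$cleaned"
```

Unlike sort | uniq, which only removes lines that are identical in full, this drops any later line that repeats a code even when the rest of the line differs.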
