Retaining n duplicate entries in a file - bash

This feels like a Stack Overflow question that folks have already answered but I can't find an appropriate thread.
There are tools to sort a file and retain unique entries.
There are tools to sort a file and retain only duplicate entries.
But what if I want to keep the first N entries of each duplicated value and discard the rest?
For example, in the list below, I'd like to be able to print up to N duplicates for the first field. Here's the original list:
apple toledo
apple omaha
apple butte
apple freeport
peach saginaw
peach frakenmuth
pears wichita
So, for example, the standard uniq way of doing things could generate (on an unsorted list):
apple toledo
peach saginaw
pears wichita
But I might want to keep up to 2 duplicate entries:
apple toledo
apple omaha
peach saginaw
peach frakenmuth
pears wichita
Or if I was feeling crazy, even 3 duplicate entries:
apple toledo
apple omaha
apple butte
peach saginaw
peach frakenmuth
pears wichita
Is there a sensible way to go about doing this in bash?

How about:
awk 'count[$1]++ < 2' list
You can change the number 2 to 3 or whatever.
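The one-liner works because count[$1]++ evaluates to the number of earlier lines that had the same first field, so the condition holds only for the first N lines of each key. A minimal sketch that makes N a parameter, where the function name keep_n and its argument order are hypothetical:

```shell
# keep_n N [FILE...]: print at most N lines per distinct first field,
# preserving input order. keep_n is an illustrative name, not a standard tool.
keep_n() {
    local n=$1
    shift
    # count[$1]++ yields how many earlier lines shared this key,
    # so the pattern is true only for the first n occurrences.
    awk -v n="$n" 'count[$1]++ < n' "$@"
}

# Reads stdin when no file is given.
printf '%s\n' 'apple toledo' 'apple omaha' 'apple butte' 'peach saginaw' | keep_n 2
```

With the sample input in the last line, this keeps the first two apple lines and the single peach line. No sorting is needed, since awk only counts keys as it reads.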

Related

Google Sheets: using Query in ArrayFormula

I have a sheet that uses the Query function to fetch the matching search data row by row. I need to apply ArrayFormula to automate this search, and I want to know how to do it.
Expected Result
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple | Custard apple/Sugar apple/Sweetsop | Rose apple/Water apple
berry | Cape gooseberry/Inca berry/Physalis
man | Mango | Mangosteen
mom
fruit | Dragon fruit | Egg fruit | Passion fruit | Black sapote/Chocolate pudding fruit
j | Jackfruit | Jujube | Jenipapo
nake | Snake fruit/Salak
me | Horned Melon | Honeydew melon | Medlar fruit | Mouse melon
Currently
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple
berry | Apple | Ice Apple
man | Apple | Ice Apple
mom | Apple | Ice Apple
fruit | Apple | Ice Apple
j | Apple | Ice Apple
nake | Apple | Ice Apple
me | Apple | Ice Apple
What I currently have works only for a single row, using this:
=IF(LEN(F2:F)=0, IFERROR(1/0), IF(LEN(F2:F)>0, Query(TRANSPOSE(QUERY(Fruits!B:B, "select B where B contains '" & F2:F & "'")),"select * limit 12")))
How should I do this? Please advise. Here is a link to my file:
My Google Sheet file: https://docs.google.com/spreadsheets/d/1QDfruKtwJjmRQWqTlO3sBM-e9vp9QKwmla23ss0U1sY/edit#gid=1411907513
Use:
=ARRAY_CONSTRAIN(LAMBDA(a, b, BYROW(a, LAMBDA(x,
TRANSPOSE(IFNA(FILTER(b, SEARCH(IF(x="", "×", x), b)))))))
(F2:INDEX(F:F, MAX(ROW(F:F)*(F:F<>""))), Fruits!B2:B), 9^9, 12)
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(FILTER(FRUITS,REGEXMATCH(FRUITS,FRUIT)))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
Put this formula into G2; the result should be the same as your expected result.
What we are doing here is:
1. Use QUERY to get rid of blanks in the range Current!F2:F, and name the array PHRASES with LAMBDA.
2. Use QUERY to get rid of blanks in the range Fruits!B:B, and name the array FRUITS with LAMBDA.
3. Use BYROW to work on the single-column array PHRASES value by value, with...
4. a LAMBDA inside BYROW to name the value of each row FRUIT.
5. Use FILTER to filter the array FRUITS.
6. Use REGEXMATCH to set the condition of the FILTER function in step 5; it returns TRUE for strings that match.
7. TRANSPOSE the result of each filter to meet your display format.
The FILTER can also be replaced by another QUERY function if you want; the outputs should be identical in this case.
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(QUERY(FRUITS,"WHERE Col1 CONTAINS '"&FRUIT&"'"))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
According to your request in the comments, this is the updated code:
to make it case-insensitive, apply UPPER() to Col1 and PHRASE inside the transposed query,
to show a blank instead of #N/A when there is no output on that row, apply IFNA() to the whole QUERY() inside the TRANSPOSE(),
to limit the length of the output array, wrap the TRANSPOSE() in ARRAY_CONSTRAIN().
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(
TRANSPOSE(IFNA(
QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'"),
"")),
1,12)
))
)(QUERY({Current!F2:F},NOTNULL),QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The code will leave an empty row if no match is found, which is what you asked for in your comment: * Show blank if no valid return. (instead of #N/A).
What do you mean by 'When there is no phrase match, that row skipped'?
It isn't skipped in my test environment.
But if you mean that leaving part of the 'Check phrase' column empty breaks the calculation, it does: the question never mentioned that the Check phrase column may contain blanks, so I simply didn't handle that case.
If that is the case, you should always include such conditions in the sample data you provide at the very beginning. Otherwise this is a separate issue, and it may be better to open another question after trying to work it out on your own.
Anyway, here is a quick solution if you need to handle blanks in the Check phrase column:
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(TRANSPOSE(IFNA(IF(PHRASE="","",QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'")),"")),1,12)
))
)({Current!F2:F},QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The output shifts upward when there are blanks in the 'Check phrase' column because, as I said, I use QUERY to remove extra blanks from the two source ranges. This speeds things up a bit, but blanks between array values are also removed, which shortens the reference array.
The easiest way to handle this is to leave the blanks in place and, inside IFNA(), use an IF() to skip any empty PHRASE by returning an empty string, which leaves a blank row.

Bash swap array location

I need to swap the order of elements in a string array.
I'm not sure whether this can be done by switching indexes; any suggestions are welcome.
Example:
default=("apple" "banana" "mango" "orange" "peach")
The default output will be:
apple
banana
mango
orange
peach
The output I need:
orange <--- switch with apple
banana
mango
apple <--- switch with orange
peach
Thanks in advance
Save "${default[0]}" and "${default[3]}" to two separate variables and update the relevant indexes with the values of those variables swapped around.
(Since this looks like an assignment and you haven't included your code I've responded with high level instructions.)
This saves the two values in variables and assigns each one to the other's position in the array, as the earlier answer described:
firstValue=${default[0]}
fourthValue=${default[3]}
default[3]=$firstValue
default[0]=$fourthValue
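The same swap can be done with a single temporary variable. A minimal sketch, assuming bash (plain POSIX sh has no arrays); the variable name tmp is illustrative:

```shell
default=("apple" "banana" "mango" "orange" "peach")

# Swap elements 0 and 3 using one temporary variable.
tmp=${default[0]}
default[0]=${default[3]}
default[3]=$tmp

# Print one element per line.
printf '%s\n' "${default[@]}"
```

After the swap the array prints orange, banana, mango, apple, peach, one per line.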

Saving data in a command-line game, no database [Ruby]

I'm making a simple command line game with Ruby and I'm having trouble saving some information without a database/HTTP dynamic.
For example, let's say I have to make a sandwich (in the game). I am presented with an array of ingredients to choose from, like so:
[1] Carrot
[2] Banana
[3] Cheese
[4] Tomato
I cannot hardcode a direct correspondence between number and ingredient because, before that, I was forbidden to use a couple of ingredients, at random (so the complete ingredients array is two items longer). And I don't want to present a list numbered like [1] [2] [4] [6] because it gets confusing.
What I'm doing right now is hardcoding a direct correspondence between a letter and an item, so for Banana press B, for Cheese press C and so on. But it's less than ideal, particularly because this is a pattern used throughout the game, and in some contexts it will get very inconvenient, both for me and the player.
So, is there a better way for me to do this? How can I associate an input with an item of a randomly generated list, and also save that information for further use down the line?
Here's how I solved it:
Mario Zannone's comment made me realize I could use the index of the array elements as an id, whereas I had been looking at the whole thing as if it was sort of just text.
So here's the code I came up with to take advantage of that:
(0...@ingredients.length).each do |i|
  puts "[#{i+1}] #{@ingredients[i]}"
end
That way I now have a direct correspondence between element and input with:
choice = gets.chomp.to_i - 1
@selected_ingredient = @ingredients[choice]

Sort after sort while keeping first sort?

I use sort in Vim to sort on my first column:
apple bear
apple zoo
apple bar
banana hockey
banana football
But then I want to sort on the second column as well, so that it becomes this:
apple bar
apple bear
apple zoo
banana football
banana hockey
Any ideas how I can achieve this in vim?
First move to the start of the file:
gg
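Then filter the buffer through sort, using the first column as the primary key and the second as the tie-breaker ('sort -k 2' alone would order by the second column only and break up the first-column groups):
!Gsort -k1,1 -k2<ENTER>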
The following worked for me, using Vim's built in sort function:
:sort! r/ /|sort
This works even if the buffer is not sorted to begin with.
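For comparison, the two-level ordering can be sketched outside Vim with the external sort command, assuming a POSIX sort and whitespace-separated fields. The key -k1,1 restricts the primary key to column one, and -k2 breaks ties on the rest of the line from column two onward:

```shell
# Sort by the first column, then by the second column within each group.
sorted=$(printf '%s\n' \
    'apple bear' 'apple zoo' 'apple bar' \
    'banana hockey' 'banana football' | sort -k1,1 -k2)
printf '%s\n' "$sorted"
```

From within Vim, the same filter can be applied to the whole buffer with :%!sort -k1,1 -k2.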

Merging lists together UNIX Bash

For example I have these 2 lists:
List A
Dog Tamal301*
Iguana Tamal345
Cat Tamal405
Snake Tamal408*
Cocodrile Tamal420
Bird Tamal467*
Parrot Tamal578*
List B
Tamal301* Orchid
Tamal320 Daisy
Tamal408* Poinsettia
Tamal467* Tulip
Tamal490 Rose
Tamal578* Chrysanthemums
(The * is just to emphasize the matching entries; it isn't actually in the data.)
I want to merge together list A and B, with only the matches.
Like this:
Dog Tamal301 Orchid
Snake Tamal408 Poinsettia
Bird Tamal467 Tulip
Parrot Tamal578 Chrysanthemums
I have a way to do it, but it is clumsy: greps and for loops.
So I want to know if there is a better way to do it.
Thanks guys =D
Check the manual for the join command: man 1 join
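A minimal sketch of that approach, assuming the two lists are saved without the asterisks in files named listA.txt and listB.txt (hypothetical names, created here in a scratch directory). join needs both inputs sorted on the join field, which is column 2 of list A and column 1 of list B:

```shell
# Work in a scratch directory; the file names are illustrative.
dir=$(mktemp -d)
printf '%s\n' 'Dog Tamal301' 'Iguana Tamal345' 'Cat Tamal405' 'Snake Tamal408' \
    'Cocodrile Tamal420' 'Bird Tamal467' 'Parrot Tamal578' > "$dir/listA.txt"
printf '%s\n' 'Tamal301 Orchid' 'Tamal320 Daisy' 'Tamal408 Poinsettia' \
    'Tamal467 Tulip' 'Tamal490 Rose' 'Tamal578 Chrysanthemums' > "$dir/listB.txt"

# join requires both inputs sorted on the join field.
sort -k2,2 "$dir/listA.txt" > "$dir/a.sorted"
sort -k1,1 "$dir/listB.txt" > "$dir/b.sorted"

# -1 2: join on field 2 of the first file; -2 1: on field 1 of the second.
# -o selects the output columns: animal, id, flower.
merged=$(join -1 2 -2 1 -o 1.1,1.2,2.2 "$dir/a.sorted" "$dir/b.sorted")
printf '%s\n' "$merged"
rm -r "$dir"
```

Only ids present in both files appear in the output, so Iguana, Cat, Cocodrile, Daisy and Rose are dropped. Without -o, join would print the join field first, followed by the remaining fields of each file.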
