Checking for duplicate values on a line [duplicate]

Checking for duplicate values on a line [duplicate] - bash

This question already has answers here:
Bash simplified sudoku
(2 answers)
Closed 7 years ago.
So I am making a script to check whether a Sudoku is correct or not.
The task is to check for duplicate numbers in the lines and columns, now for the columns the job was fairly easy using awk with sort and uniq. But to check for the lines is another thing..Since I can not use sort and the likes of that, keep in mind that my sed knowledge is pretty much nothing.
Input:
5 3 4 6 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 3 8 3 4 2 5 6 7
8 5 9 7 6 1 4 2 3
4 2 6 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 1 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 1 7 8
Now on some of the lines there are duplicate numbers, and I want the output to tell something like
line 1 : good
line 2 : good
line 3 : bad
line 4 : good
line 5 : good
line 6 : good
line 7 : bad
line 8 : good
line 9 : bad

Assuming you've used awk for checking the columns, and that each column just have numbers from 1 to 9, you can do the check also with awk with something like this:
{ delete arr;
okline = 1;
for (i = 1; i <= NF; ++i) {
if (arr[$i] == 1) {
okline = 0;
break;
}
arr[$i] = 1;
}
if (okline == 1)
{
print "Line: OK";
}
else
{
print "Line: error";
}
}
It does not count the lines, but you can get the idea.

Related

How to remove variables in large dataset i Rstudio?

I'm trying to permanently remove some of the variables in the dataframe 'd' individually, as they are no longer useful.
New to Rstudio and coding. Using Rstudio, version 0.99.491 on Windows. I am using a secure server, so downloading packages are not an option.
I have a very large dataset 'd' containing 122 variables of ~450.000 rows.
I'm using a Danish version of the program, so the error messages have been translated by me and might be incorrect.
I have tried:
Option 1:
> rm (d$variable121)
Error in rm(d$variable121):... must contain name or character strings
Option 2:
> rm('d$variable121')
Warning meaasage:
in rm('d$variable121'): object 'd$variable121' not found
Option 3:
> rm (list=c('d$variable121', 'd$variable122'))
Warning messages:
1: in rm (list=c('d$variable121', 'd$variable122')) object 'variable 121' not found.
2: in rm (list=c('d$variable121', 'd$variable122')) object 'variable 122' not found.
I'm able to remove other dataframes, but not any variable in the 'd' dataframe.
Does anyone know how to do this?

Suppose your data frame is called d with four columns and you want to delete variables with names var1 and var3. You can do
> d <- data.frame(var1=1:10, var2=2:11, var3=3:12, var4=4:13)
> d
var1 var2 var3 var4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
6 6 7 8 9
7 7 8 9 10
8 8 9 10 11
9 9 10 11 12
10 10 11 12 13
> dropped <- c("var1", "var3")
> d[, !(names(d) %in% dropped)]
var2 var4
1 2 4
2 3 5
3 4 6
4 5 7
5 6 8
6 7 9
7 8 10
8 9 11
9 10 12
10 11 13

Pandas Sort Order of Columns by Row

Given the following data frame:
df = pd.DataFrame({'A' : ['1','2','3','7'],
'B' : [7,6,5,4],
'C' : [5,6,7,1],
'D' : [1,9,9,8]})
df=df.set_index('A')
df
B C D
A
1 7 5 1
2 6 6 9
3 5 7 9
7 4 1 8
I want to sort the order of the columns descendingly on the bottom row like this:
D B C
A
1 1 7 5
2 9 6 6
3 9 5 7
7 8 4 1
Thanks in advance!

Easiest way is to take the transpose, then sort_values, then transpose back.
df.T.sort_values('7', ascending=False).T
or
df.T.sort_values(df.index[-1], ascending=False).T
Gives:
D B C
A
1 1 7 5
2 9 6 6
3 9 5 7
7 8 4 1
Testing
my solution
%%timeit
df.T.sort_values(df.index[-1], ascending=False).T
1000 loops, best of 3: 444 µs per loop
alternative solution
%%timeit
df[[c for c in sorted(list(df.columns), key=df.iloc[-1].get, reverse=True)]]
1000 loops, best of 3: 525 µs per loop

You can use sort_values (by the index position of your row) with axis=1:
df.sort_values(by=df.index[-1],axis=1,inplace=True)

Here is a variation that does not involve transposition:
df = df[[c for c in sorted(list(df.columns), key=df.iloc[-1].get, reverse=True)]]

Sort on specific columns, output only one of those identical but having the highest number in another column

I have records like these:
1 4 6 4 2 4 8
2 3 5 4 6 7 1
5 4 6 4 3 8 4
1 4 6 4 5 7 1
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I want to sort them on columns 2,3,4, and 6 and keep just one of those identical in column 2,3,4 and having the biggest number in column 6 such as:
1 4 6 4 5 7 1
2 3 5 4 6 7 1
5 4 6 4 3 8 4
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I have tried all kinds of combinations between sort and uniq but everything fails because uniq cannot be applied onto a specific column. The only thing I came up with is to change the order of the columns as to first sort as above then move records 2,3,and 4 to the end and then run uniq with -w as to focus only on the last 3 records. This seems quite inefficient to me.
Thanks for help!

You can achieve this with two passes of sort(assuming in the first place I understand your requirement correctly, seeing that the desired data snippet posted above does not match your description of it) . The first pass sorts by field 2 through 4 ascending and field 6 descending, the second pass sorts on fields 2 through 4 only but passing in the "stable sort" and unique flags in addition to pick out those rows for each combination of fields 2-4 that have the highest value from field 6
sort -k2,4n -k6,6nr file.txt | sort -k2,4n -s -u
2 3 5 4 6 7 1
5 4 6 4 3 8 4
6 7 3 3 4 8 4

Solving a recreational square packing problem

I was asked to find a 11x11-grid containing the digits such that one can read the squares of 1,...,100. Here read means that you fix the starting position and direction (8 possibilities) and if you can find for example the digits 1,0,0,0,0,4 consecutively, you have found the squares of 1, 2, 10, 100 and 20. I made a program (the algorithm is not my own. I modified slightly a program which uses best-first search to find a solution but it is too slow. Does anyone know a better algorithm to solve the problem?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <vector>
#include <algorithm>
using namespace std;
int val[21][21];//number which is present on position
int vnum[21][21];//number of times the position is used - useful if you want to backtrack
//5 unit borders
int mx[4]={-1,0,1,0};//movement arrays
int my[4]={0,-1,0,1};
int check(int x,int y,int v,int m)//check if you can place number - if you can, return number of overlaps
{
int c=1;
while(v)//extract digits one by one
{
if(vnum[x][y] && (v%10)!=val[x][y])
return 0;
if(vnum[x][y])
c++;
v/=10;
x+=mx[m];
y+=my[m];
}
return c;
}
void apply(int x,int y,int v,int m)//place number - no sanity checks
{
while(v)//extract digits one by one
{
val[x][y]=v%10;
vnum[x][y]++;
v/=10;
x+=mx[m];
y+=my[m];
}
}
void deapply(int x,int y,int v,int m)//remove number - no sanity checks
{
while(v)
{
vnum[x][y]--;
v/=10;
x+=mx[m];
y+=my[m];
}
}
int best=100;
void recur(int num)//go down a semi-random path
{
if(num<best)
{
best=num;
if(best)
printf("FAILED AT %d\n",best);
else
printf("SUCCESS\n");
for(int x=5;x<16;x++) // 16 and 16
{
for(int y=5;y<16;y++)
{
if(vnum[x][y]==0)
putchar('.');
else
putchar(val[x][y]+'0');
}
putchar('\n');
}
fflush(stdout);
}
if(num==0)
return;
int s=num*num,t;
vector<int> poss;
for(int x=5;x<16;x++)
for(int y=5;y<16;y++)
for(int m=0;m<4;m++)
if(t=check(x,y,s,m))
poss.push_back((x)|(y<<8)|(m<<16)|(t<<24));//compress four numbers into an int
if(poss.size()==0)
return;
sort(poss.begin(),poss.end());//essentially sorting by t
t=poss.size()-1;
while(t>=0 && (poss[t]>>24)==(poss.back()>>24))
t--;
t++;
//t is now equal to the smallest index which has the maximal overlap
t=poss[rand()%(poss.size()-t)+t];//select random index>=t
apply(t%256,(t>>8)%256,s,(t>>16)%256);//extract random number
recur(num-1);//continue down path
}
int main()
{
srand((unsigned)time(0));//seed
while(true)
{
for(int i=0;i<21;i++)//reset board
{
memset(val[i],-1,21*sizeof(int));
memset(vnum[i],-1,21*sizeof(int));
}
for(int i=5;i<16;i++)
{
memset(val[i]+5,0,11*sizeof(int));
memset(vnum[i]+5,0,11*sizeof(int));
}
recur(100);
}
}

Using a random search so far I only got to 92 squares with one unused spot (8 missing numbers: 5041 9025 289 10000 4356 8464 3364 3249)
1 5 2 1 2 9 7 5 6 9 5
6 1 0 8 9 3 8 4 4 1 2
9 7 2 2 5 0 0 4 8 8 2
1 6 5 9 6 0 4 4 7 7 4
4 4 2 7 6 1 2 9 0 2 2
2 9 6 1 7 8 4 4 0 9 3
6 5 5 3 2 6 0 1 4 0 6
4 7 6 1 8 1 1 8 2 8 1
8 0 1 3 4 8 1 5 3 2 9
0 5 9 6 9 8 8 6 7 4 5
6 6 2 9 1 7 3 9 6 9
The algorithm basically uses as solution encoding a permutation on the input (search space is 100!) and then places each number in the "topmost" legal position. The solution value is measured as the sum of the squares of the lengths of the numbers placed (to give more importance to long numbers) and the number of "holes" remaining (IMO increasing the number of holes should raise the likehood that another number will fit in).
The code has not been optimized at all and is only able to decode a few hundred solutions per second. Current solution has been found after 196k attempts.
UPDATE
Current best solution with this approach is 93 without free holes (7 missing numbers: 676 7225 3481 10000 3364 7744 5776):
9 6 0 4 8 1 0 0 9 3 6
6 4 0 0 2 2 5 6 8 8 9
1 7 2 9 4 1 5 4 7 6 3
5 8 2 3 8 6 4 9 6 5 7
2 4 4 4 1 8 2 8 2 7 2
1 0 8 9 9 1 3 4 4 9 1
2 1 2 9 6 1 0 6 2 4 1
2 3 5 5 3 9 9 4 0 9 6
5 0 0 6 1 0 3 5 2 0 3
2 7 0 4 2 2 5 2 8 0 9
9 8 2 2 6 5 3 4 7 6 1
This is a solution (all 100 numbers placed) however using a 12x12 grid (MUCH easier)
9 4 6 8 7 7 4 4 5 5 1 7
8 3 0 5 5 9 2 9 6 7 6 4
4 4 8 3 6 2 6 0 1 7 8 4
4 8 4 2 9 1 4 0 5 6 1 4
9 1 6 9 4 8 1 5 4 2 0 1
9 4 4 7 2 2 5 2 2 5 0 0
4 6 2 2 5 8 4 2 7 4 0 2
0 3 3 3 6 4 0 0 6 3 0 9
9 8 0 1 2 1 7 9 5 5 9 1
6 8 4 2 3 5 2 6 3 2 0 6
9 9 8 2 5 2 9 9 4 2 2 7
1 1 5 6 6 1 9 3 6 1 5 4
It has been found using a truly "brute force" approach, starting from a random matrix and keeping randomly changing digits when that improved the coverage.
This solution has been found by an highly unrolled C++ program automatically generated by a Python script.
Update 2
Using an incremental approach (i.e. keeping a more complex data structure so that when changing a matrix element the number of targets covered can be updated instead than recomputed) I got a much faster search (about 15k matrices/second investigated with a Python implementation running with PyPy).
In a few minutes this version was able to find a 99 quasi-solution (a number is still missing):
7 0 5 6 5 1 1 5 7 1 6
4 6 3 3 9 8 8 6 7 6 1
3 9 0 8 2 6 1 1 4 7 8
1 1 0 8 9 9 0 0 4 4 6
3 4 9 0 4 9 0 4 6 7 1
6 4 4 6 8 6 3 2 5 2 9
9 7 8 4 1 1 4 0 5 4 2
6 2 4 1 5 2 2 1 2 9 7
9 8 2 5 2 2 7 3 6 5 0
3 1 2 5 0 0 6 3 0 5 4
7 5 6 9 2 1 6 5 3 4 6
UPDATE 3
Ok. After a some time (no idea how much) the same Python program actually found a complete solution (several ones indeed)... here is one
6 4 6 9 4 1 2 9 7 3 6
9 2 7 7 4 4 8 1 2 1 7
1 0 6 2 7 0 4 4 8 3 4
2 1 2 2 5 5 9 2 9 6 5
9 2 5 5 2 0 2 6 3 9 1
1 6 3 6 0 0 9 3 7 0 6
6 0 0 4 9 0 1 6 0 0 4
9 8 4 4 8 0 1 4 5 2 3
2 4 8 2 8 1 6 8 6 7 5
1 7 6 9 2 4 5 4 2 7 6
6 6 3 8 8 5 6 1 5 2 1
The searching program can be found here...

You've got 100 numbers and 121 cells to work with, so you'll need to be very efficient. We should try to build up the grid, so that each time we fill a cell, we attain a new number in our list.
For now, let's only worry about 68 4-digit numbers. I think a good chunk of the shorter numbers will be in our grid without any effort.
Start with a 3x3 or 4x4 set of numbers in the top-left of your grid. It can be arbitrary, or fine-tune for slightly better results. Now let's fill in the rest of the grid one square at a time.
Repeat these steps:
Fill an empty cell with a digit
Check which numbers that knocked off the list
If it didn't knock off any 4-digit numbers, try a different digit or cell
Eventually you may need to fill 2 cells or even 3 cells to achieve a new 4-digit number, but this should be uncommon, except at the end (at which point, hopefully there's a lot of empty space). Continue the process for the (few?) remaining 3-digit numbers.
There's a lot room for optimizations and tweaks, but I think this technique is fast and promising and a good starting point. If you get an answer, share it with us! :)
Update
I tried my approach and only got 87 out of the 100:
10894688943
60213136008
56252211674
61444925224
59409675697
02180334817
73260193640
.5476685202
0052034645.
...4.948156
......4671.

My guess is that both algorithms are too slow. Some optimization algorithm might work like best-first search or simulated annealing but my I don't have much experience on programming those.

Have you tried any primary research on Two-Dimensional Bin Packing (2DBP) algorithms? Google Scholars is a good start. I did this a while ago when building an application to generate mosaics.
All rectangular bin packing algorithms can be divided into 4 groups based on their support for the following constraints:
Must the resulting bin be guillotine cuttable? I.e. do you have to later slice the bin(s) in half until all the pieces are unpacked?
Can the pieces be rotated to fit into the bin? Not an issue with square pieces, so this makes more algorithms available to you.
Out of all the algorithms I looked into, the most efficient solution is an Alternate Directions (AD) algorithm with a Tabu Search optimization layer. There are dissertations which prove this. I may be able to dig-up some links if this helps.

Some ideas off the top of my head, without investing much time into thinking about details.
I would start by counting the number of occurrences of each digit in all squares 1..100. The total number of digits will be obviously larger than 121, but by analyzing individual frequencies you can deduce which digits must be grouped on a single line to form as many different squares as possible. For example, if 0 has the highest frequency, you have to try to put as many squares containing a 0 on the same line.
You could maintain a count of digits for each line, and each time you place a digit, you update the count. This lets you easily compute which square numbers have been covered by that particular line.
So, the program will still be brute-force, but it will exploit the problem structure much better.
PS: Counting digit frequencies is the easiest way to decide whether a certain permutation of digits constitutes a square.

How can I sort a 2-D array in MATLAB with respect to 2nd row?

I have array say "a"
a =
1 4 5
6 7 2
if i use function
b=sort(a)
gives ans
b =
1 4 2
6 7 5
but i want ans like
b =
5 1 4
2 6 7
mean 2nd row should be sorted but elements of ist row should remain unchanged and should be correspondent to row 2nd.

sortrows(a',2)'
Pulling this apart:
a = 1 4 5
6 7 2
a' = 1 6
4 7
5 2
sortrows(a',2) = 5 2
1 6
4 7
sortrows(a',2)' = 5 1 4
2 6 7
The key here is sortrows sorts by a specified row, all the others follow its order.

You can use the SORT function on just the second row, then use the index output to sort the whole array:
[junk,sortIndex] = sort(a(2,:));
b = a(:,sortIndex);

How about
a = [1 4 5; 6 7 2]
a =
1 4 5
6 7 2
>> [s,idx] = sort(a(2,:))
s =
2 6 7
idx =
3 1 2
>> b = a(:,idx)
b =
5 1 4
2 6 7
in other words, you use the second argument of sort to get the sort order you want, and then you apply it to the whole thing.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Checking for duplicate values on a line [duplicate] - bash

Related

How to remove variables in large dataset i Rstudio?

Pandas Sort Order of Columns by Row

Sort on specific columns, output only one of those identical but having the highest number in another column

Solving a recreational square packing problem

How can I sort a 2-D array in MATLAB with respect to 2nd row?

Categories

Resources