How to sort text files, in Python, in rows - sorting

I would like to know how to sort numbers in ascending order when outputting a text file in Python.
At the moment I can sort the names of my text file alphabetically; and I could sort my numbers in ascending order, however I want to keep the scores corresponding to the student in the same row.
import operator
class1 = open('sample.txt','r')
sort = sorted(class1,key=operator.itemgetter(0))
for eachline in sort:
    print(eachline)
This is the code so far and the output is this:
Holy = 10 ,6 ,10
Jhnn = 9 ,0 ,1
Oli = 5 ,7 ,6
But how would you sort the numbers so it would look like this:
Holy = 6 ,10 ,10
Jhnn = 0 ,1 ,9
Oli = 5 ,6 ,7

Try this:
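for eachline in sort:
    name, scores = eachline.split("=")
    sorted_scores = sorted(int(s) for s in scores.split(","))
    print(name.strip() + " = " + " ,".join(map(str, sorted_scores)))
We first split each line on "=" to separate the name from the scores; otherwise int() would fail on the "Holy = 10" part. We then split the scores on the comma, convert them to integers, sort that list, and join it back into a comma-separated string after the name.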
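As a self-contained illustration of the same idea, here is a minimal sketch that sorts the rows by name and the scores within each row. The helper name sort_rows and the in-memory sample (standing in for sample.txt) are assumptions made for this example, and it assumes every non-empty line has the form "Name = n ,n ,n".

```python
import io

def sort_rows(lines):
    """Sort lines alphabetically, then sort the scores inside each line."""
    result = []
    for line in sorted(lines):
        if not line.strip():
            continue  # skip blank lines
        name, _, scores = line.partition("=")
        ordered = sorted(int(s) for s in scores.split(","))
        result.append(name.strip() + " = " + " ,".join(str(n) for n in ordered))
    return result

# In-memory stand-in for open('sample.txt', 'r')
sample = io.StringIO("Oli = 5 ,7 ,6\nHoly = 10 ,6 ,10\nJhnn = 9 ,0 ,1\n")
rows = sort_rows(sample)
for row in rows:
    print(row)
```

Converting to int before sorting matters: sorting the raw strings would place "10" before "6".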

Related

Sorting/ordering values from smallest to biggest in an array

I have a formula like this : =ArrayFormula(sort(INDEX($B$1:$B$10,MATCH(E1,$A$1:$A$10,0))))
in columns A:B:
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
j 10
and
the data to convert in E:H
a c f e
f a c b
b a c d
I get the following results using the above formula
in columns L:O:
1 3 6 5
6 1 3 2
2 1 3 4
My desired output is like this:
1 3 5 6
1 2 3 6
1 2 3 4
I'd like to arrange the numbers in each row from smallest to biggest. I can do this with additional helper cells, but if possible I'd like to get the same result without any additional cells. Can I get a little help please? Thanks.
To sort by row, use SORT inside BYROW. Unfortunately, nested array results aren't supported in BYROW, so we need to JOIN each sorted row and then SPLIT the resulting array.
=ARRAYFORMULA(SPLIT(BYROW(your_formula,LAMBDA(row,JOIN("🌆",SORT(TRANSPOSE(row))))),"🌆"))
Here's another way using Makearray with Index to get the current row and Small to get the smallest, next smallest etc. within the row:
=ArrayFormula(makearray(3,4,lambda(r,c,small(index(vlookup(E1:H3,A1:B10,2,false),r,0),c))))
Or you could change the order (might be a little faster) as you don't need to vlookup the entire array, just the current row:
=ArrayFormula(makearray(3,4,lambda(r,c,small(vlookup(index(E1:H3,r,0),A1:B10,2,false),c))))
It's interesting (to me at any rate) that you can interrogate the row and column number of the current cell using Map or Scan, so this is also possible:
=ArrayFormula(map(E1:H3,lambda(cell,small(vlookup(index(E1:H3,row(cell),0),A1:B10,2,false),column(cell)-column(E:E)+1))))
Thanks to @JvdV for this insight (which may be obvious to some but wasn't to me) shown here in Excel.
try:
=INDEX(TRIM(SPLIT(FLATTEN(QUERY(QUERY(QUERY(SPLIT(FLATTEN(E1:H3&"×​"&ROW(E1:H3)), "​"),
"select max(Col1) group by Col1 pivot Col2"), "offset 1", 0),,9^9)), "×")))
or if you want numbers:
=INDEX(IFNA(VLOOKUP(TRIM(SPLIT(FLATTEN(QUERY(QUERY(QUERY(SPLIT(FLATTEN(E1:H3&"×​"&ROW(E1:H3)), "​"),
"select max(Col1) group by Col1 pivot Col2"), "offset 1", 0),,9^9)), "×")), A:B, 2, 0)))
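For checking results outside Sheets, the logic that these formulas aim for (look up each cell, then sort within each row) can be sketched in Python. The variable names are assumptions for this example; the data is taken from the question.

```python
# Lookup table from columns A:B of the question: a -> 1, b -> 2, ..., j -> 10
lookup = {letter: i + 1 for i, letter in enumerate("abcdefghij")}

# Data to convert, from E1:H3
data = [
    ["a", "c", "f", "e"],
    ["f", "a", "c", "b"],
    ["b", "a", "c", "d"],
]

# Look up every cell, then sort each row independently
result = [sorted(lookup[cell] for cell in row) for row in data]
for row in result:
    print(*row)
```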

Power Query check if any number is a linear combination of a subset of remaining numbers

I have a dataset made of prod names (coming from local countries' databases) and in columns, all numbers parsed out of their names.
I am building a data mapper that would reconstruct product names to the standard used in the central database, i.e. ProdName Size PackSize [mix optional]
Prod Size = SUM ( Size(i) x NumPacks(i) ), i=[1,10]
Example of data (number of columns can be anything between 1-10):
ProdName|num1| num 2 |num 3 |num 4|num 5 | num 6 | num 7 | Desired Output
Prod1 | 5 | 20 | 2 | 25 | 2 | 30 | 120 | Prod1 120g pack of 5 (mix)
Prod2 | 2 | 200 | | 400 | | | | Prod2 200g pack of 2
The challenge is that some numbers will be irrelevant, i.e. traces of barcode, discount, parts of brand names.
I need to find a way to identify:
Whether any of the numbers is a linear combination of a subset of the others.
If so, and the number of regressors is more than 1, then return the total size, the total count of packs, and "mix".
Prod 1 scenario would return Prod1 120g pack of 5 mix (because 120 = 20x2+25x2+30, and 5 is the total number of packs 2+2+1).
If number of regressors is only 1 (i.e. count of numbers 3 or less), I want to return the regressor's (not total!) size and packsize.
I.e. Prod 2 scenario: return name is Prod2 200g pack of 2 (and not ProdName 400 pack of 2)
I am building a set of helper columns. For now I only have an idea how to work out the case of Prod2, when there are 3 or less numbers. I'm searching for a solution, but the plan is:
get numbers for each row into a list excluding blanks (this is what I'm at now)
calc the MAX and check if other numbers when multiplied = MAX
assign smaller num to pack size, 2nd large num to Size (and ignore the MAX)
I am not sure yet about Prod1's complex case.
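The three-step plan for the simple case can be sketched as follows. This is Python rather than Power Query, purely to illustrate the logic; the function name simple_pack is an assumption for this example, and it covers only the case of exactly three numbers, not the Prod1 mix case.

```python
def simple_pack(values):
    """Return (size, pack_size) if the largest number equals the product
    of the other two, otherwise None. Handles only the 3-number case."""
    nums = [v for v in values if v is not None]  # drop blanks
    if len(nums) != 3:
        return None
    smallest, middle, largest = sorted(nums)
    if smallest * middle != largest:
        return None
    # The smaller number is the pack size, the second largest is the size;
    # the maximum (the total) is ignored.
    return middle, smallest

prod2 = simple_pack([2, 200, None, 400, None, None, None])
prod1 = simple_pack([5, 20, 2, 25, 2, 30, 120])
print(prod2, prod1)
```

For Prod2 this yields size 200 and pack size 2, matching the desired "Prod2 200g pack of 2"; Prod1 falls through because it has more than three numbers.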
For a good answer, one should know all types of possible combinations.
For the 2 examples provided, this will work:
let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "NonNullCount", each List.NonNullCount(Record.FieldValues(_))),
    #"Added Conditional Column" = Table.AddColumn(#"Added Custom", "Output", each if [NonNullCount] = 4 then [ProdName] & " " & Text.From([num2]) & "g pack of "&Text.From([num1]) else [ProdName] & " "& Text.From(List.Max(List.Skip(Record.FieldValues(_),1))) & "g pack of "&Text.From([num1]) & " (mix)")
in
    #"Added Conditional Column"
Here is another couple of ideas.
Getting a list of values, adding max and min values, quickly analyzing list for a value (you can add your logic here):
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Collect the non-null numbers of each row into a list
    AddValuesList = Table.AddColumn(Source, "ValList", (x)=>
        List.Select(Record.ToList(Record.RemoveFields(x, "ProdName")),
            (y)=> y <> null)),
    AddMax = Table.AddColumn(AddValuesList, "MaxVal", each List.Max([ValList])),
    AddMin = Table.AddColumn(AddMax, "MinVal", each List.Min([ValList])),
    // "each" is required here: the third argument must be a function
    AddCnt = Table.AddColumn(AddMin, "ItemCount", each
        if List.Contains(List.Select([ValList], (x)=> x <> [MaxVal] and x <> [MinVal])
            , [MaxVal]/[MinVal])
        then [MaxVal]/[MinVal]
        else null)
in
    AddCnt

Maximizing difference of sums

Given a list of n distinct positive integers, partition the list into two
sublists of size n/2 such that the difference between sums of the sublists
is maximized.
Assume that n is even and determine the time complexity.
I know, I know, it's a homework question. But the issue is not necessarily in solving it, but in understanding what exactly is being asked. I can safely say that half of the problem is simple to solve, but I don't think I get what is meant by
such that the difference between sums of the sublists
is maximized.
Any help in illustrating the "plan of attack" on this would be appreciated.
Assume you have this list:
list : 1, 1, 2, 3, 1, 5, 6, 1, 2, 20
You can split it into two sublists of size n/2 in many ways,
such as this:
sub list 1 : 3, 1, 5, 6, 1
sub list 2 : 1, 1, 2, 2, 20
Now calculate the sum of each sublist:
sum of sub list 1 is 16
sum of sub list 2 is 26
difference between them is : 10
But the question wants the two sublists that satisfy this condition:
question condition : difference between sums of the sublists is maximized.
That means: among all the ways of splitting the main list into two sublists, choose the one whose difference is the largest.
For example, if we split the above list into these sublists:
sub list 1 : 1, 1, 1, 1, 2
sub list 2 : 2, 3, 5, 6, 20
sum of sub list 1 is 6
sum of sub list 2 is 36
difference between them is : 30
This is more than the previous result, and it is also the maximum, because sub list 1 holds the n/2 smallest values and sub list 2 holds the n/2 largest.
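One straightforward way to find such a split is to sort the list and cut it in the middle. The sketch below assumes an even-length list; the function name is an assumption for this example.

```python
def max_difference_partition(values):
    """Split values into the n/2 smallest and n/2 largest elements."""
    ordered = sorted(values)          # O(n log n)
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]
    return low, high, sum(high) - sum(low)

low, high, diff = max_difference_partition([1, 1, 2, 3, 1, 5, 6, 1, 2, 20])
print(low, high, diff)
```

With sorting, the running time is O(n log n). Since only the split around the median matters, not the full order, a selection algorithm could in principle bring this down to linear time.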

Pick random from top 30% of int array

Suppose I had an int array
array=( 1 2 3 4 5 6 7 8 9 10 )
How would I pick a random number from the top 30% of the array, i.e. one of the numbers 8, 9, or 10?
I know that picking a number completely at random from the array is:
${array[RANDOM % ${#array[@]}]}
However, I don't know how to pick a random element from a percentage of the array.
Sort the array, in reverse:
IFS=$'\n' sorted=($(sort -rn <<<"${array[*]}"))
Figure out the number of eligible elements:
n=$((${#sorted[@]}*3/10))
Pick a random element:
val=${sorted[RANDOM % $n]}
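The same three steps (sort descending, count the eligible elements, pick one at random) can be sketched in Python for comparison. The function name and the percent parameter are assumptions for this example; integer arithmetic is used to mirror the shell's *3/10, and at least one element is always kept eligible.

```python
import random

def pick_from_top(values, percent=30):
    """Pick a random element from the top `percent` percent of values."""
    ordered = sorted(values, reverse=True)
    # Integer arithmetic, rounded down, but never fewer than one element
    count = max(1, len(ordered) * percent // 100)
    return random.choice(ordered[:count])

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
val = pick_from_top(array)
print(val)
```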
Working off nneonneo's example...
So if I want to do something more dynamic I can do this:
percentage=0.3
IFS=$'\n' sorted=($(sort -rn <<<"${array[*]}"))
s=$(bc <<< "$percentage*${#array[@]}")
round=${s/.*}
round_ceil=$((round+1))
val=${sorted[RANDOM % $round_ceil]}
or do you see any bugs?
EDIT: I had to make a ceiling round instead of a floor round as floor rounds sometimes didn't produce a number.

Separate matrix based on cumulative sum of column criteria (MATLAB)

I have a dataset something like the following:
a = [1 11; 2 16; 3 9; 4 13; 5 8; 6 14];
I am looking to separate into several Matrices by the following criteria:
Starting with the first row, construct sets where the sum of the second column is in the range 19-to-25.
So the output would be something like this:
a1 = [1 11; 3 9]
a2 = [2 16; 5 8]
a3 = [6 14]
Where a1=20, a2=24, and a3 does not meet criteria but is the last.
Could this be contained and output from a FOR loop?
Edit: Criteria of how to combine: I am looking to start at the beginning (first row) and add to the next row. If the sum is greater than 25, that row would be skipped till the next iteration. Each iteration should output a separate matrix (a1, a2, a3).
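I think I have a useful sketch for you.
For one, I would not modify the matrix by removing rows. Rather, I would keep a list of used rows.
I would do the summing like this (a row is skipped when adding it would push the sum above 25, as described in your edit):
used = false(1, size(a,1));
for i = 1:size(a,1)
    if used(i), continue; end
    curr_use = i;                          % rows in the current set
    for j = i+1:size(a,1)
        if used(j), continue; end
        if sum(a([curr_use j],2)) > 25, continue; end   % cannot add row j
        curr_use = [curr_use j];           % concat j to curr_use
    end
    used(curr_use) = true;
    disp(a(curr_use,:))                    % one output matrix per iteration
end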
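The same greedy bookkeeping can be sketched in Python to make the steps easy to trace. The function name group_rows and the upper parameter are assumptions for this example; only the "skip if the sum would exceed 25" rule from the question's edit is applied, and the lower bound of 19 is not enforced.

```python
def group_rows(rows, upper=25):
    """Greedily group rows so each group's second-column sum stays <= upper."""
    used = [False] * len(rows)
    groups = []
    for i in range(len(rows)):
        if used[i]:
            continue
        current = [i]
        total = rows[i][1]
        for j in range(i + 1, len(rows)):
            # Skip rows already used or rows that would exceed the limit
            if used[j] or total + rows[j][1] > upper:
                continue
            current.append(j)
            total += rows[j][1]
        for k in current:
            used[k] = True
        groups.append([rows[k] for k in current])
    return groups

a = [(1, 11), (2, 16), (3, 9), (4, 13), (5, 8), (6, 14)]
groups = group_rows(a)
for g in groups:
    print(g)
```

On the sample data this reproduces a1 and a2 from the question; rows 4 and 6 each end up in a group of their own, since 13 + 14 exceeds 25.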
