Stack multiple columns into one - sorting

I want to do a simple task but somehow I'm unable to do it. Assume that I have one column like:
a
z
e
r
t
How can I create a new column with the same value twice with the following result:
a
a
z
z
e
e
r
r
t
t
I've already tried to double my column and do something like :
=TRANSPOSE(SPLIT(JOIN(";",A:A,B:B),";"))
but it creates:
a
z
e
r
t
a
z
e
r
t
I get inspired by this answer so far.

Try this:
=SORT({A1:A5;A1:A5})
Here we use:
sort
{} to combine data
Accounting your comment, then you may use this formula:
=QUERY(SORT(ArrayFormula({row(A1:A5),A1:A5;row(A1:A5),A1:A5})),"select Col2")
The idea is to use additional column of data with number of row, then sort by row, then query to get only values.
And join→split method will do the same:
=TRANSPOSE(SPLIT(JOIN(",",ARRAYFORMULA(CONCAT(A1:A5&",",A1:A5))),","))
Here we use range only two times, so this is easier to use. Also see Concat + ArrayFormula sample.

Few hundreds rows is nothing :)
I created index from 1 to n, then pasted it twice and sorted by index. But it's obviously fancier to do it with a formula :)

Assuming Your list is in column A and (for now) the times of repeat are in C1 (can be changed to a number in the formula), then something simple like this will do (starting in B1):
=INDEX(A:A,(INT(ROW()-1)/$C$1)+1)
Simply copy down as you need it (will give just 0 after the last item). No sorting. No array. No sheets/excel problems. No heavy calculations.

Related

Highlighting mininimum row value in Pander

I am trying to display a dataframe in an RMarkdown document using the Pander package.
I would like to highlight the minimum value in each row of values. Here's what I have tried:
df <- replicate(4, rnorm(5))
df <- as.data.frame(df)
df$min <- apply(df, 1, min)
emphasize.strong.cells(which(df == df$min, arr.ind = T))
pander(df[1:4])
When I do this I get the error:
Error in check.highlight.parameters(emphasize.strong.cells, nrow(t), ncol(t)) :
Too high number passed for column indexes that should be kept below 6
I can print out the whole table (with the min column) without any trouble or I can print out a partial table without emphasis, but neither of these is ideal. I want the highlighting, but I do not wish to include the 'min' column.
I imagine the fact that I am leaving some highlighted cells out of the pander command is causing the error.
Is there a way around this? Or a better way to do this?
Thanks.
Subquestion: What if I wanted to highlight the minimum in the first few rows and the maximum in the next few. Is that possible in a single table?
Instead of the which lookup, with the possibility to match row minimums in the wrong rows, you can easily construct those array indices with a simple sequence (1:N) and calling which.min on each row, eg with apply:
> df <- replicate(4, rnorm(5))
> df <- as.data.frame(df)
> emphasize.strong.cells(cbind(1:nrow(df), apply(df, 1, which.min)))
> pander(df)
----------------------------------------------
V1 V2 V3 V4
----------- ----------- ----------- ----------
0.6802 0.1409 **-0.7992** 0.1997
0.6797 **-0.2212** 1.016 0.6874
2.031 -0.009855 0.3881 **-1.275**
1.376 0.2619 **-2.337** -0.1066
**-0.4541** 1.135 -0.1566 0.2912
----------------------------------------------
About your next question: you could of course do that in a single table, eg rbind two matrices created similarly as described above with which.min and which.max.

How to filter one list of items from another list of items?

I have a huge list of items in Column A (1,000 items) and a smaller list of items in Column B (510 items).
I want to put a formula in Column C to show only the Column A items not in Column B.
How to achieve this through a formula, preferably a FILTER formula?
Select the list in column A
Right-Click and select Name a Range...
Enter "ColumnToSearch"
Click cell C1
Enter this formula: =MATCH(B1,ColumnToSearch,0)
Drag the formula down for all items in B
If the formula fails to find a match, it will be marked "#N/A", otherwise it will be a number.
If you'd like it to be TRUE for match and FALSE for no match, use this formula instead:
=ISNUMBER(MATCH(B1,ColumnToSearch,0))
If you'd like to return the unfound value and return empty string for found values
=IF(ISNUMBER(MATCH(B1,ColumnToSearch,0)),"",B1)
Alternative method is simply =
FILTER(A1:A,if(COUNTIF(B1:B,A1:A),0,1))
It's much more efficient.
It uses countif to get a 0 or a 1 as an array if the values in B are in A, then it reverses the 0 and 1 to get the values that are missing instead of only the values that are in there. It then filters based on that.
Columns look like this
A B
1 2
2 5
3
4
5
ARE formulae:
=FILTER(A1:A, MATCH(A1:A, B1:B, 0))
=FILTER(A1:A, COUNTIF(B1:B, A1:A))
ARE NOT formulae:
=FILTER(A1:A, ISNA(MATCH(A1:A, B1:B, 0)))
=FILTER(A1:A, NOT(COUNTIF(B1:B, A1:A)))
in your case:
=FILTER(A1:A; ISNA(MATCH(A:A; B:B; )))
if you face a mismatch of ranges see: https://stackoverflow.com/a/54795616/5632629

Difficulty Accessing Members of Tuple in Apache Pig

I have a variable titled F.
Describe F returns:
F: {group: bytearray,indexkey: {(indexkey: chararray)}}
Dump F returns:
(321,{(CHOW),(DREW)})
(5011,{(CHOW),(DREW)})
(5825,{(TANNER),(SPITZENBERGER)})
(16631,{(CHOW),(DREW)})
(34299,{(CHOW),(DREW)})
(35044,{(TANNER),(SPITZENBERGER)})
(65623,{(CHOW),(DREW)})
(74597,{(SPITZENBERGER),(TANNER)})
(83499,{(SPITZENBERGER),(TANNER)})
(90257,{(SPITZENBERGER),(TANNER)})
What I need is to produce an output that looks like this (only 1st row as an example):
(321,DREW,{(CHOW)})
I've tried using deference to pull out the first element by using this:
G = FOREACH F generate indexkey.$0;
But, this still returns the whole tuple.
Can anyone suggest a method for doing this? I was under the impression that the deference operator should allow me to do this.
Thanks in advance!
Daniel
You can't index into bags like that. The reason for that is bags don't have any notion of ordering. Selecting the first item in a bag should be treated as picking a random one.
Either way, if you want only one item instead of all of them you can used a nested FOREACH to pull a LIMIT of 1:
first = FOREACH F {
lim = LIMIT indexkey 1;
GENERATE group, lim;
}
(disclaimer: I can't test this code right now, if it doesn't work let me know. Hopefully you can get the gist)
You can take this a bit further and FLATTEN it to remove the bag of one item entirely, but be careful in that if the bag is empty i think you throw away the entire record in this case.
first = FOREACH F {
lim = LIMIT indexkey 1;
GENERATE group, FLATTEN(lim);
}

How to I create a loop that uses the last column in an array as the starting point for the next loop? In matlab

I have output = A(:,Nout) Nout = points along the array..... : = all points in column
So, it is saying the values in the last column.
How do I use output as A at the first column for the next iteration?
Your question is not clear. You may mean a variety of things.
If you want to loop through values in the first column in some order specified by your last column you can:
Asort = A ( A (:, end), :);
and then loop through Asort.
You may also mean to loop N times for each row where N is defined by last column of A.
You can do it using a nested loop:
for Arow = A(:, end)
for ii = 1:Arow
% your code here
end
end
You may also mean several other things, but instead me guessing you could try clarifing a bit. :)
(it should be a comment but I can't add comments yet, sorry)

mathematica: PadRight[] and \[PlusMinus]

Is there any way that
PadRight[a \[PlusMinus] b,2,""]
Returns
{a \[PlusMinus] b,""}
Instead of
a \[PlusMinus] b \[PlusMinus] ""
?
I believe that i need to somehow deactivate the operator properties of [PlusMinus].
Why do i need this?
I'm creating a program to display tables with physical quantities. To me, that means tables with entries like
(value of a) [PlusMinus] (uncertainty of a)
When i have several columns with different heights, i'm stuffing the shorter ones with "", so i can use Transpose the numeric part of the table.
If the column has more than one entrie, there's no problem:
PadRight[{a \[PlusMinus] b,c \[PlusMinus] d},4,""]
gives what i want:
{a \[PlusMinus] b,c \[PlusMinus] d,"",""}
It is when the column has only one entrie that my problem appears.
This is the code that constructs the body stuffed with "":
If[tested[Sbody],1,
body = PadRight[body, {Length[a], Max[Map[Length, body]]
With
tested[a__] :=
If[Length[DeleteDuplicates[Map[Dimensions, {a}]]] != 1, False,
True];
, a function that discovers if is arguments have the same dimension
and
a={Quantity1,Quantity2,...}
Where the quantities are the one's that i want on my table.
Thanks
First you need to be aware of that any expression in Mathematica is in the form of Head[Body]
where body may be empty, a single expression or a sequence of expressions separated by commas
Length operate on expressions, not necessarily lists
so
Length[PlusMinus[a,b]]
returns 2 since the body of the expression contains to expressions (atoms in this case) that are a and b
Read the documentation on PadRight. The second argument define the final length of the expression
so
PadRight[{a,b},4,c] results with a list of length 4 with the last two elements equal to
PadRight[{a,b},2,c] results with the original list since it is already of length 2
Therefore
PadRight[PlusMinus[a,b],2,anything] just returns the same PlusMinus[a,b] unchanged since it is already of length 2
so, youר first example is wrong. You are not able to get a result with head List using PadRight when you try to pad to an expression with head PlusMinus
There is no problem of executing
PadRight[PlusMinus[a,b],3,""]
but the result looks funny (at best) and logically meaningless, but if this is what you wanted in the first place you get it, and following my explanations above you can figure out why
HTH
best
yehuda

Resources