Merge, join, and concatenate data with Power Query - powerquery

For example, I have this data:
Name | m / f / x
Peter | m
Jack | m
Mary | f
Tim | m
Olivia | f
Cindy | f
Walter | m
Ronald | m
Patty | x
And I want to use Power Query to do the following:
m / f / x | Surname
m | Peter, Jack, Tim, Walter, Ronald
f | Mary, Olivia, Cindy
x | Patty
I tried a lot but did not get a result. The best but still wrong result:
let
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],
    #"Modified type" = Table.TransformColumnTypes(Source, {{"Name", type text}, {"m / f / x", type text}}),
    #"Renamed columns" = Table.RenameColumns(#"Modified type", {{"m / f / x", "m, f or x"}}),
    #"Grouped rows" = Table.Group(#"Renamed columns", {"m, f or x"}, {{"sex", each _, type table [Name = text, #"m, f or x" = text]}}),
    #"Added custom column" = Table.AddColumn(#"Grouped rows", "Custom", each Table.TransformColumns(#"Grouped rows", {"sex", each Text.Combine([Name], "|")})),
    #"Advanced Custom" = Table.ExpandTableColumn(#"Added custom column", "Custom", {"sex"}, {"Custom.sex"})
in
    #"Advanced Custom"
I'm sure this line / formula (editor) is wrong:
Table.TransformColumns(#"Grouped rows", {"sex", each Text.Combine([Name], "|")})
What is the correct code?
A reference to an existing solution would also be welcome (I searched but did not find one). Many of the examples in the Microsoft M reference seem to me far from practice and hard to follow, because I am accessing existing fields in a query rather than constructing individual cell values. I would understand it better if you gave me not only the complete M code but also an explanation of the important lines. At more than 70 years of age, my ability to think abstractly is no longer so pronounced ...
Thank you,
Guenther

This code should work:
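let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Group by the "m / f / x" column and join the names of each group into one text value
    group = Table.Group(Source, {"m / f / x"}, {{"Names", each Text.Combine([Name], ", "), type text}})
in
    group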
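To make the two steps inside that Table.Group call easier to follow, here is a minimal sketch of the same logic in plain Python rather than M. It is only an illustration of the idea (collect the names per group, then join them), not Power Query code; the function name group_names and the variable names are made up for this example.

```python
from collections import defaultdict

# Sample rows copied from the question: (Name, "m / f / x").
rows = [
    ("Peter", "m"), ("Jack", "m"), ("Mary", "f"), ("Tim", "m"),
    ("Olivia", "f"), ("Cindy", "f"), ("Walter", "m"), ("Ronald", "m"),
    ("Patty", "x"),
]

def group_names(rows, separator=", "):
    """Group names by category and join each group into one string."""
    groups = defaultdict(list)            # dicts keep first-seen key order
    for name, category in rows:
        groups[category].append(name)     # step 1: collect the names of one group
    # step 2: this join plays the role of Text.Combine([Name], ", ")
    return {category: separator.join(names) for category, names in groups.items()}

result = group_names(rows)
for category, names in result.items():
    print(category, "|", names)
```

In the M version, [Name] inside the each function is the list of names belonging to one group, which is why Text.Combine can be applied to it directly.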

Related

Steps to generate parse tree from CYK Algorithm (Natural Language Processing)

I am currently working on a project involving NLP. I have implemented a CKY recognizer as given in Jurafsky and Martin (algorithm on page 450). The table it produces stores the nonterminals themselves (instead of the usual boolean values). However, the one problem I have is retrieving the parse tree.
Here is an illustration of what my CKY identifier does:
This is my grammar
S -> NP VP
S -> VP
NP -> MODAL PRON | DET NP | NOUN VF | NOUN | DET NOUN | DET FILENAME
MODAL -> 'MD'
PRON -> 'PPSS' | 'PPO'
VP -> VERB NP
VP -> VERB VP
VP -> ADVERB VP
VP -> VF
VERB -> 'VB' | 'VBN'
NOUN -> 'NN' | 'NP'
VF -> VERB FILENAME
FILENAME -> 'NN' | 'NP'
ADVERB -> 'RB'
DET -> 'AT'
And this is the algorithm:
for j from 1 to LENGTH(words) do
    table[j-1,j] = A where A -> POS(word[j])
    for i from j-2 downto 0
        for k from i+1 to j-1
            table[i,j] = Union(table[i,j], A such that A->BC)
                where B is in table[i,k] and C is in table[k,j]
And this is what my parsing table looks like after being filled:
Since S resides in [0,5], I know that the string has been parsed, and that for k = 1 (as per the algorithm given in Martin and Jurafsky) we have S -> table[0][2] table[2][5]
i.e. S -> NP VP
The problem is that I can retrieve the rules that were used, but they come out in a jumbled order, i.e. not in the order in which they appear in the parse tree. Can someone suggest an algorithm to retrieve the correct parse tree?
Thank you.
You should visit recursively the cells of your table and unfold them in the same way you did for the S node until everything is a terminal (so you don't have anything else to unfold). In your example, you first go to cell [0][2]; this is a terminal, you don't have to do anything. Next you go to [2][5], this is a non-terminal made by [2][3] and [3][5]. You visit [2][3], it's a terminal. [3][5] is a non-terminal, made by two terminals. You are done. Here is a demo in Python:
class Node:
    '''Think of this as a cell in your table'''
    def __init__(self, left, right, type, word):
        self.left = left
        self.right = right
        self.type = type
        self.word = word
# Declare terminals
t1 = Node(None,None,'MOD','can')
t2 = Node(None,None,'PRON','you')
t3 = Node(None,None,'VERB', 'eat')
t4 = Node(None,None,'DET', 'a')
t5 = Node(None,None,'NOUN','flower')
# Declare non-terminals
nt1 = Node(t1,t2, 'NP', None)
nt2 = Node(t4,t5, 'NP', None)
nt3 = Node(t3,nt2,'VP', None)
nt4 = Node(nt1,nt3,'S', None)
def unfold(node):
    # Check for a terminal
    if node.left is None and node.right is None:
        return node.word+"_"+node.type
    # Otherwise unfold both children and label the bracket with this node's type
    return "["+unfold(node.left)+" "+unfold(node.right)+"]_"+node.type
print(unfold(nt4))
And the output:
[[can_MOD you_PRON]_NP [eat_VERB [a_DET flower_NOUN]_NP]_VP]_S
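The demo above builds the nodes by hand. One possible way to get them out of the table itself is to store a backpointer next to every nonterminal while filling the table, and then unfold recursively from S. The following is a minimal sketch under stated assumptions: the toy grammar in LEXICON and BINARY is hypothetical, already in Chomsky normal form, and much smaller than the grammar in the question; only the first derivation found for each cell entry is kept.

```python
# Hypothetical toy grammar in Chomsky normal form (not the asker's full grammar).
LEXICON = {"can": ["MOD"], "you": ["PRON"], "eat": ["VERB"],
           "a": ["DET"], "flower": ["NOUN"]}
BINARY = [("S", "NP", "VP"), ("NP", "MOD", "PRON"),
          ("NP", "DET", "NOUN"), ("VP", "VERB", "NP")]

def cky(words):
    """Fill the CKY table, remembering how each nonterminal was built."""
    n = len(words)
    # back[i][j] maps a nonterminal to either the word it covers (terminal case)
    # or a triple (k, B, C): B spans words[i:k] and C spans words[k:j].
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for tag in LEXICON.get(words[j - 1], []):
            back[j - 1][j][tag] = words[j - 1]
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in back[i][k] and c in back[k][j] and a not in back[i][j]:
                        back[i][j][a] = (k, b, c)   # keep the first derivation found
    return back

def build(back, i, j, symbol):
    """Follow the backpointers from (i, j, symbol) down to the words."""
    entry = back[i][j][symbol]
    if isinstance(entry, str):                      # terminal: nothing to unfold
        return f"{entry}_{symbol}"
    k, b, c = entry
    return f"[{build(back, i, k, b)} {build(back, k, j, c)}]_{symbol}"

words = "can you eat a flower".split()
back = cky(words)
tree = build(back, 0, len(words), "S") if "S" in back[0][len(words)] else None
print(tree)
```

Because each recursive call knows its own span and split point k, the rules come out in tree order rather than in the order the table was filled. For an ambiguous sentence, a list of backpointers per entry would be needed instead of a single one.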

F# Tree: Node Insertion

This is a question that extends F# Recursive Tree Validation, which was nicely answered yesterday.
This question concerns inserting a child in an existing tree. This is the updated type I'd like to use:
type Name = string
type BirthYear = int
type FamilyTree = Person of Name * BirthYear * Children
and Children = FamilyTree list
My last question concerned checking the validity of the tree. This was the solution I decided to go with:
let rec checkAges minBirth = function
    | Person(_,b,_) :: t -> b >= minBirth && checkAges b t
    | [] -> true
let rec validate (Person(_,b,c)) =
    List.forall validate c && checkAges (b + 16) c
Now I would like to be able to insert a Person Simon as a child of specific Person Hans in the following form
insertChildOf "Hans" simon:Person casperFamily:FamilyTree;;
So the input should be the parent name, the child, and the family tree. Ideally it should then return a modified family tree, that is, a FamilyTree option.
What I am struggling with is incorporating the validate function to make sure the result is legal, and finding a way to insert the person properly in the list of children if the inserted Person is already a parent, maybe as a separate function.
All help is welcome and very appreciated - thanks! :)
After your comment, here is code that will behave as expected:
let insert pntName (Person(_, newPrsnYear, _) as newPrsn) (Person (n,y,ch)) =
    let rec ins n y = function
        | [] -> if y < newPrsnYear && n = pntName then Some [newPrsn] else None
        | (Person (name, year, childs) as person) :: bros ->
            let tryNxtBros() = Option.map (fun x -> person::x) (ins n y bros)
            if y < newPrsnYear && n = pntName then // father OK
                if newPrsnYear < year then // brother OK -> insert here
                    Some (newPrsn::person::bros)
                else tryNxtBros()
            else // keep looking, first into eldest child ...
                match ins name year childs with
                | Some i -> Some (Person (name, year, i) :: bros)
                | _ -> tryNxtBros() // ... then into other childs
    Option.map (fun x -> Person (n, y, x)) (ins n y ch)
As in my previous answer I keep avoiding using List functions since I don't think they are a good fit in a tree structure unless the tree provides a traverse.
I might be a bit purist in the sense I use either List functions (with lambdas and combinators) or pure recursion, but in general I don't like mixing them.
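For readers who want to see the general shape of such an insertion outside F#, here is a rough Python sketch of the same idea: search for the parent, and rebuild only the path from the root to the changed node. It is a simplification and an assumption on my part, not a translation of the code above: it checks only that the parent is older than the child (the 16-year rule from validate is left out), it places the child among its siblings by birth year, and the names Larry and Anna and all years are made up.

```python
from typing import NamedTuple, Optional, Tuple

class Person(NamedTuple):
    name: str
    birth_year: int
    children: Tuple["Person", ...] = ()

def insert_child_of(parent_name: str, child: Person, tree: Person) -> Optional[Person]:
    """Return a new tree with child added under parent_name, or None if impossible."""
    if tree.name == parent_name and tree.birth_year < child.birth_year:
        # Rebuild the child list, keeping siblings ordered by birth year.
        siblings = sorted(tree.children + (child,), key=lambda p: p.birth_year)
        return tree._replace(children=tuple(siblings))
    for index, sub in enumerate(tree.children):
        updated = insert_child_of(parent_name, child, sub)
        if updated is not None:
            # Copy the path from the root down to the change; share everything else.
            new_children = tree.children[:index] + (updated,) + tree.children[index + 1:]
            return tree._replace(children=new_children)
    return None                                     # parent not found, or too young

# Hypothetical family used only for this demo.
family = Person("Larry", 1920, (Person("Hans", 1950, (Person("Anna", 1975),)),))
simon = Person("Simon", 1980)
result = insert_child_of("Hans", simon, family)
print(result)
```

Returning None here plays the role of the None case of FamilyTree option; the original family value is left untouched.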

Re-generate random numbers in a loop

I would like to create a basic genetic algorithm that outputs a set of inputs to feed into an emulator. Basically, what it does is:
Generate an input sheet
Run said input
slightly modify it
Run it
See which input set performed better, "fork" it, and repeat until the problem is solved
So here is my code to generate the first set of inputs:
(* RNG initialization
 * unit *)
Random.self_init();;
(* Generating a starting input file
 * array
 * 500 inputs long *)
let first_input =
  let first_array = Array.make 500 "START" in
  for i = 1 to 499 do
    let input =
      match Random.int(5) with
      | 0 -> "A "
      | 1 -> "B "
      | 2 -> "DOWN "
      | 3 -> "LEFT "
      | 4 -> "RIGHT "
      | _ -> "START " in
    first_array.(i) <- input
  done;
  first_array;;
And here is my "mutation" function that randomly alters some inputs:
(* Mutating input_file
 * Rate : in percent, must be positive and <= 100
 * a must be an array of strings *)
let mutation a n =
  let mutation_rate = n in
  for i = 0 to ((Array.length(a) * mutation_rate / 100) - 1) do
    let input =
      match Random.int(5) with
      | 0 -> "A "
      | 1 -> "B "
      | 2 -> "DOWN "
      | 3 -> "LEFT "
      | 4 -> "RIGHT "
      | _ -> "START " in
    a.( Random.int(498) + 1) <- input
  done;;
However, I don't feel my code is efficient, because I had to paste the pattern matching part into the mutation function, and I think there has to be a smarter way to proceed. If I define "input" as a global value, then it is only evaluated once (let's say as "RIGHT"), and all occurrences of "input" will return "RIGHT", which is not really useful.
Thanks.
There isn't anything wrong with putting that into its own function. What you are missing is an argument, so that the body, with its Random.int side effect, is evaluated again on every call. Since the argument itself is not used, the usual convention is to take unit.
let random_input () = match Random.int 5 with
  | 0 -> "A "
  | 1 -> "B "
  | 2 -> "DOWN "
  | 3 -> "LEFT "
  | 4 -> "RIGHT "
  | _ -> "START "
What you are doing here is pattern matching on the argument, and since unit has only one constructor, (), the match is exhaustive. Technically, you can replace the () above with _. That matches anything, making the function polymorphic in its argument, 'a -> string. In this case it is bad form, since it may lead to confusion about what the parameter is for.
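The difference between a value that is computed once and a function that is evaluated on every call is not specific to OCaml. As a rough illustration in Python (the names INPUTS, fixed_input, and mutate are made up for this sketch, and the fixed seed is there only to make the demo repeatable):

```python
import random

INPUTS = ["A ", "B ", "DOWN ", "LEFT ", "RIGHT "]

# Evaluated once, when this line runs: the value never changes afterwards.
# This is what happens with a global "input" value.
fixed_input = random.choice(INPUTS)

def random_input():
    """Evaluated on every call, so each call draws a fresh value."""
    return random.choice(INPUTS)

def mutate(inputs, rate_percent):
    """Overwrite roughly rate_percent % of the positions (index 0 is left alone)."""
    for _ in range(len(inputs) * rate_percent // 100):
        inputs[random.randrange(1, len(inputs))] = random_input()

random.seed(0)  # fixed seed only to make the demo repeatable
sheet = ["START"] + [random_input() for _ in range(499)]
mutate(sheet, 10)
print(sheet[:5])
```

Both the initial sheet and the mutation now share the one random_input function, which is the deduplication the question asks for.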

Difference Between J48 and Markov Chains

I am attempting to do some evaluation of the relative rates of different algorithms in the C# and F# realms using WekaSharp and one of the algorithms I was interested in was Markov Chains. I know Weka has an HMM application but I have not been able to implement this into WekaSharp and was wondering if there was a way to modify the J48 Algorithm to suit this purpose. I know there is some similarity between J48 and first order Markov chains but am trying to determine what needs to be modified and if this is a reasonable thing to do. Here is the J48 as implemented in Yin Zhu's WekaSharp:
type J48() =
    static member DefaultPara = "-C 0.25 -M 2"
    static member MakePara(?binarySplits, ?confidenceFactor, ?minNumObj, ?unpruned, ?useLaplace) =
        let binarySplitsStr =
            let b = match binarySplits with
                    | Some (v) -> v
                    | None -> false
            if not b then "-B" else ""
        let confidenceFactorStr =
            let c = match confidenceFactor with
                    | Some (v) -> v
                    | None -> 0.25 // default confi
            "-C " + c.ToString()
        let minNumObjStr =
            let m = match minNumObj with
                    | Some (v) -> v
                    | None -> 2
            "-M " + m.ToString()
        let unprunedStr =
            let u = match unpruned with
                    | Some (v) -> v
                    | None -> false
            if u then "-U" else ""
        let useLaplaceStr =
            let u = match useLaplace with
                    | Some (v) -> v
                    | None -> false
            if u then "-A" else ""
        binarySplitsStr + " " + confidenceFactorStr + " " + minNumObjStr + " " + unprunedStr + " " + useLaplaceStr
Thank you very much.
J48 is just an implementation of the C4.5 algorithm, which learns decision trees by evaluating the entropy of the class labels with respect to each attribute (dimension) and taking the attribute that gives the largest entropy reduction (the highest information gain, normalized as gain ratio in C4.5) as the root of the current subtree. This algorithm does not need reinforcement.
I guess that by Markov chains you mean a hidden Markov model, which is used in reinforcement learning.
You should take a look at HMMWeka.
A related question is:
What is the equivalent for a Hidden Markov Model in the WEKA toolkit?
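To make the entropy-based attribute choice of a decision tree learner more concrete, here is a small Python sketch. It is a simplification under stated assumptions: it computes plain information gain (C4.5 additionally normalizes this into a gain ratio and handles numeric attributes and pruning), and the tiny data set is invented for the example. It is not Weka or WekaSharp code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Invented toy data: "outlook" fully determines "play", "windy" tells nothing.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)   # attribute chosen as the root of the (sub)tree
print(gains, best)
```

Note that every row is scored independently of the rows before it, which is the main structural difference from a Markov chain, where the next state depends on the current one.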

Create (pseudo) Cyclic Discriminated Unions in F#

I've run into a small problem here. I wrote the Tortoise and Hare cycle detection algorithm.
type Node =
    | DataNode of int * Node
    | LastNode of int
let next node =
    match node with
    | DataNode(_,n) -> n
    | LastNode(_) -> failwith "Error"
let findCycle(first) =
    try
        let rec fc slow fast =
            match (slow,fast) with
            | LastNode(a),LastNode(b) when a=b -> true
            | DataNode(_,a), DataNode(_,b) when a=b -> true
            | _ -> fc (next slow) (next <| next fast)
        fc first <| next first
    with
    | _ -> false
This is working great for
let first = DataNode(1, DataNode(2, DataNode(3, DataNode(4, LastNode(5)))))
findCycle(first)
It shows false, which is right. Now when I try to test it for a cycle, I'm unable to create a loop!
Obviously this would never work:
let first = DataNode(1, DataNode(2, DataNode(3, DataNode(4, first))))
But I need something of that kind! Can you tell me how to create one?
You can't do this with your type as you've defined it. See How to create a recursive data structure value in (functional) F#? for some alternative approaches which would work.
As an alternative to Brian's solution, you might try something like:
type Node =
    | DataNode of int * NodeRec
    | LastNode of int
and NodeRec = { node : Node }
let rec cycle = DataNode(1, { node =
                DataNode(2, { node =
                DataNode(3, { node =
                DataNode(4, { node = cycle}) }) }) })
Here is one way:
type Node =
    | DataNode of int * Lazy<Node>
    | LastNode of int
let next node = match node with |DataNode(_,n) -> n.Value |LastNode(_) -> failwith "Error"
let findCycle(first) =
    try
        let rec fc slow fast =
            match (slow,fast) with
            | LastNode(a),LastNode(b) when a=b->true
            | DataNode(a,_), DataNode(b,_) when a=b -> true
            | _ -> fc (next slow) (next <| next fast)
        fc first <| next first
    with
    | _ -> false
let first = DataNode(1, lazy DataNode(2, lazy DataNode(3, lazy DataNode(4, lazy LastNode(5)))))
printfn "%A" (findCycle(first))
let rec first2 = lazy DataNode(1, lazy DataNode(2, lazy DataNode(3, lazy DataNode(4, first2))))
printfn "%A" (findCycle(first2.Value))
Even though both Brian and kvb posted answers that work, I still felt I needed to see if it was possible to achieve the same thing in a different way. This code will give you a cyclic structure wrapped as a Seq<'a>
type Node<'a> = Empty | Node of 'a * Node<'a>
let cyclic (n:Node<_>) : _ =
    let rn = ref n
    let rec next _ =
        match !rn with
        | Empty -> rn := n; next Unchecked.defaultof<_>
        | Node(v, x) -> rn := x; v
    Seq.initInfinite next
let nodes = Node(1, Node(2, Node(3, Empty)))
cyclic <| nodes |> Seq.take 40 // val it : seq<int> = seq [1; 2; 3; 1; ...]
The structure itself is not cyclic, but it looks like it from the outside.
Or you could do this:
//removes warning about x being recursive
#nowarn "40"
type Node<'a> = Empty | Node of 'a * Lazy<Node<'a>>
let rec x = Node(1, lazy Node(2, lazy x))
let first =
    match x with
    | Node(1, Lazy(Node(2,first))) -> first.Value
    | _ -> Empty
Can you tell me how to create one?
There are various hacks to get a directly cyclic value in F# (as Brian and kvb have shown) but I'd note that this is rarely what you actually want. Directly cyclic data structures are a pig to debug and are usually used for performance and, therefore, made mutable.
For example, your cyclic graph might be represented as:
> Map[1, 2; 2, 3; 3, 4; 4, 1];;
val it : Map<int,int> = map [(1, 2); (2, 3); (3, 4); (4, 1)]
The idiomatic way to represent a graph in F# is to store a dictionary that maps from handles to vertices and, if necessary, another for edges. This approach is much easier to debug because you traverse indirect recursion via lookup tables that are comprehensible as opposed to trying to decipher a graph in the heap. However, if you want to have the GC collect unreachable subgraphs for you then a purely functional alternative to a weak hash map is apparently an unsolved problem in computer science.
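With the graph stored as a lookup table, the tortoise-and-hare check from the question needs no cyclic value at all. Here is a minimal Python sketch of that idea, assuming each node has at most one successor and that a node missing from the table marks the end of the chain; the function name has_cycle and both sample tables are made up for this example.

```python
def has_cycle(successor, start):
    """Floyd's tortoise-and-hare over a dict mapping each node to its next node."""
    slow = fast = start
    while True:
        # The hare moves two steps; a missing key means the chain has ended.
        for _ in range(2):
            if fast not in successor:
                return False
            fast = successor[fast]
        slow = successor[slow]          # the tortoise moves one step
        if slow == fast:                # they can only meet inside a cycle
            return True

cyclic_graph = {1: 2, 2: 3, 3: 4, 4: 1}   # same edges as the Map example above
linear_graph = {1: 2, 2: 3, 3: 4, 4: 5}   # 5 has no successor

print(has_cycle(cyclic_graph, 1))
print(has_cycle(linear_graph, 1))
```

Because nodes are compared by their handles (plain integers here), the check does not depend on structural equality of nested values, which is what makes the directly recursive version awkward.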

Resources