How to have parametrizable "methods" in Elm data-structures

I'm stuck refactoring a large data-structure in Elm. I know how I would implement this in OO languages, but have no experience in a functional setting. I can't express well what my problem is, because I can only frame it in OOP terms which don't apply. So I go by way of example, which is a simplified version of the module I'm refactoring.
Suppose I have this type:
type alias Book = { title : String, text : String }
and I have two Books:
englishBook = { title = "An English Book", text = "This book is an English book." }
frenchBook = { title = "Un livre Francais", text = "Ce livre est un livre Francais." }
There is an associated index function to compute which words are in a Book:
index = String.words >> Set.fromList
Here's already my first problem. When index takes a String, the user of the module must know how to take the text from the book. Instead, my habits say that the function should do this for us. So index could behave like a method and take a Book as first argument: index = .text >> String.words >> Set.fromList. But that also feels weird.
That's not the end of it though, because the index generator should be parametrizable. Depending on the book, it should do different things. So I could add the index function like this:
englishBook = { title = "...", text = "...", index = englishIndex }
frenchBook = { title = "...", text = "...", index = frenchIndex }
Now each book has the function to build its index. But the caller still has to supply the record when it wants the index:
wordsInEnglishBook = englishBook.index englishBook.text
which is not a nice solution to me because it burdens the caller with internals of the module. Well what if that part is encapsulated with a constructor?
book title text index = { title = title, text = text, index = \_ -> index text }
Now I've come full circle and have implemented a method. So what is the idiomatic solution for this in Elm?

You can use a custom type to represent the language, then pattern match on the language to build the index.
type Language
    = English
    | French
type alias Book =
    { title : String
    , text : String
    , language : Language
    }
wordsInBook : Book -> Set String
wordsInBook { language, text } =
    case language of
        English ->
            doSomethingWithEnglish text
        French ->
            doSomethingWithFrench text
or
type Book
    = Book Language Data
type Language
    = English
    | French
type alias Data =
    { title : String
    , text : String
    }
wordsInBook : Book -> Set String
wordsInBook (Book language data) =
    case language of
        English ->
            doSomethingWithEnglish data
        French ->
            doSomethingWithFrench data

Related

Entity Framework Core translation capabilities [duplicate]

This question already has an answer here:
Can I reuse code for selecting a custom DTO object for a child property with EF Core?
I have a rather theoretical issue with Entity Framework Core on SQLite.
I have an entity, Person { ID, FirstName, LastName, ... }, a class PersonReference { ID, Representation : string }, and an extension method that takes a Person and composes a reference from it like this:
public static PersonReference ComposeReference(this Person from) => new PersonReference
{
    ID = from.ID,
    Representation = from.FirstName + " " + from.LastName
};
I need to compose references on the sql side. So I do the following:
var result = dbContext.People.Select(p => p.ComposeReference());
The result is an IQueryable, and the program gets past that line and materializes the collection successfully. But when I look at the generated query, I see that it selects every column of Person and nothing more.
If I rewrite the EF expression to do the projection directly
var result = dbContext.People.Select(p => new PersonReference
{
    ID = p.ID,
    Representation = p.FirstName + " " + p.LastName
});
it gives me the query I want, with a compact select and the string concatenation done in SQL.
Is there a way to keep the composition logic in extension method but still do calculations on the SQL side?
The trick is to use System.Linq.Expressions.Expression.
I came across it at work and didn't understand what it was for at first, but it is designed for exactly the purpose I needed.
Declaration:
Expression<Func<Person, PersonReference>> ComposeReference => from => new PersonReference
{
    ID = from.ID,
    Representation = from.FirstName + " " + from.LastName
};
Usage:
var result = dbContext.People.Select(ComposeReference);
Note that expressions can be compiled into delegates, but never do that in this case: passing a compiled delegate to Select makes it treat your DbSet as an IEnumerable, so the projection runs in memory instead of in SQL.
The answer from Svyatoslav's comment referred to some libraries, but I think vanilla EF does well enough on its own.

Getting output in the desired format using TokensRegex

I am using TokensRegex for rule-based entity extraction. It works well, but I am having trouble getting my output in the desired format. The following snippet gives me the output shown below for this sentence:
Earlier this month Trump targeted Toyota, threatening to impose a
hefty fee on the world's largest automaker if it builds its Corolla
cars for the U.S. market at a plant in Mexico.
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    if (matched != null) {
        matched = MatchedExpression.removeNested(matched);
        matched = MatchedExpression.removeNullValues(matched);
        System.out.print("FOR SENTENCE:" + sentence);
    }
    for (MatchedExpression phrase : matched) {
        // Print out matched text and value
        System.out.print("MATCHED ENTITY: " + phrase.getText() + "\t" + "VALUE: " + phrase.getValue());
    }
}
OUTPUT
MATCHED ENTITY: Donald Trump targeted Toyota, threatening to impose a hefty fee on the world's largest automaker if it builds its Corolla cars for the U.S. market
VALUE: LIST([PERSON])
I know that if I iterate over the tokens using:
for (CoreLabel token : cm.get(TokensAnnotation.class)) {
    String word = token.get(TextAnnotation.class);
    String lemma = token.get(LemmaAnnotation.class);
    String pos = token.get(PartOfSpeechAnnotation.class);
    String ne = token.get(NamedEntityTagAnnotation.class);
    System.out.println("matched token: " + "word=" + word + ", lemma=" + lemma + ", pos=" + pos + ", NE=" + ne);
}
I can get output that gives the annotations for each token. However, I am using my own rules to detect named entities, and I have sometimes seen cases where one word of a multi-token entity is tagged as a person when the whole multi-token expression should have been an organization (mostly with organization and location names).
So the output I am expecting is:
MATCHED ENTITY: Donald Trump VALUE: PERSON
MATCHED ENTITY: Toyota VALUE: ORGANIZATION
How do I change the above code to get the desired output? Do I need to use custom annotations?
I produced a jar of the latest build a week or so ago. Use that jar available from GitHub.
This sample code will run the rules and apply the appropriate ner tags.
package edu.stanford.nlp.examples;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import java.util.*;
public class TokensRegexExampleTwo {
    public static void main(String[] args) {
        // set up properties
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
        props.setProperty("tokensregex.rules", "multi-step-per-org.rules");
        props.setProperty("tokensregex.caseInsensitive", "true");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // set up text to annotate
        Annotation annotation = new Annotation("...text to annotate...");
        // annotate text
        pipeline.annotate(annotation);
        // print out found entities
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word() + "\t" + token.ner());
            }
        }
    }
}
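If the goal is one line per entity rather than one line per token, the per-token tags printed above can be grouped by merging runs of adjacent tokens that share the same tag. Here is a minimal sketch of that grouping. It uses a hypothetical Token record as a stand-in for CoreLabel (its word and ner fields mirror token.word() and token.ner()) so that it runs without CoreNLP on the classpath, and it assumes that non-entity tokens carry the tag "O" and that no tag is null:

```java
import java.util.ArrayList;
import java.util.List;

class EntityMerger {
    // Hypothetical stand-in for CoreLabel: just the word and its NER tag.
    record Token(String word, String ner) {}

    // A merged mention: the joined text and the tag shared by its tokens.
    record Mention(String text, String ner) {}

    // Merge runs of adjacent tokens with the same non-"O" tag into one mention.
    static List<Mention> merge(List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String currentTag = "O";
        for (Token token : tokens) {
            if (!token.ner().equals(currentTag)) {
                // Tag changed: close the mention in progress, if any.
                if (!currentTag.equals("O")) {
                    mentions.add(new Mention(text.toString(), currentTag));
                }
                text.setLength(0);
                currentTag = token.ner();
            }
            if (!currentTag.equals("O")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(token.word());
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(new Mention(text.toString(), currentTag));
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token("Donald", "PERSON"),
            new Token("Trump", "PERSON"),
            new Token("targeted", "O"),
            new Token("Toyota", "ORGANIZATION"));
        for (Mention m : merge(tokens)) {
            System.out.println("MATCHED ENTITY: " + m.text() + " VALUE: " + m.ner());
        }
    }
}
```

One limitation of this approach is that two distinct entities with the same tag that sit directly next to each other would be merged into a single mention.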
I managed to get the output in the desired format.
Annotation document = new Annotation(<Sentence to annotate>);
// use the pipeline to annotate the document we created
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
// Note: I don't put environment-related settings in the rule file.
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor
        .createExtractorFromFiles(env, "test_degree.rules");
// iterate over the sentences of the annotated document
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    for (MatchedExpression phrase : matched) {
        // Print out matched text and value
        System.out.println("MATCHED ENTITY: " + phrase.getText() + " VALUE: " + phrase.getValue().get());
    }
}
Output:
MATCHED ENTITY: Technical Skill VALUE: SKILL
You might want to have a look at my rule file in this question.
Hope this helps!
Answering my own question for those struggling with a similar issue. The key to getting your output in the correct format lies in how you define your rules in the rules file. Here's what I changed in the rules to change the output:
Old Rule:
{ ruleType: "tokens",
  pattern: (([pos:/NNP.*/ | pos:/NN.*/]+) ($LocWords)),
  result: Annotate($1, ner, "LOCATION"),
}
New Rule:
{ ruleType: "tokens",
  pattern: (([pos:/NNP.*/ | pos:/NN.*/]+) ($LocWords)),
  action: Annotate($1, ner, "LOCATION"),
  result: "LOCATION"
}
How you define your result field defines the output format of your data.
Hope this helps!

Using openIE to extract negation

I am trying to test OpenIE with Stanford CoreNLP
http://nlp.stanford.edu/software/openie.html
I am using the following code based on one of the demos available on http://stanfordnlp.github.io/CoreNLP/openie.html
public static void main(String[] args) throws Exception {
    // Create the Stanford CoreNLP pipeline
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    props.setProperty("openie.triple.strict", "false");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // Annotate an example document.
    //File inputFile = new File("src/test/resources/0.txt");
    //String text = Files.toString(inputFile, Charset.forName("UTF-8"));
    String text = "Cats do not drink milk.";
    Annotation doc = new Annotation(text);
    pipeline.annotate(doc);
    // Loop over sentences in the document
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
        // Get the OpenIE triples for the sentence
        Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
        // Print the triples
        for (RelationTriple triple : triples) {
            System.out.println(triple.confidence + "|\t" +
                triple.subjectLemmaGloss() + "|\t" +
                triple.relationLemmaGloss() + "|\t" +
                triple.objectLemmaGloss());
        }
    }
}
This counter-intuitively results in the triple
1.0| cat| drink| milk
being extracted, which is the same result I get using input text "Cats drink milk." If I set "openie.triple.strict" to "true" no triples are extracted at all. Is there a way to extract a triple like cats | do not drink | milk ?
I think you want to set "openie.triple.strict" to true to ensure that only logically warranted triples are extracted. OpenIE does not extract negative relations; it is only designed to find positive ones.
So you are getting the correct behavior when "openie.triple.strict" is set to true (i.e. no relation being extracted). Note that a relation is extracted for "Cats drink milk." when "openie.triple.strict" is set to true.

How to search with dynamic entity names with linq

Basically all I'm looking to do is something like the following:
string EntityFrameworkType = "Product";
string searchField = "ProductName";
string searchValue = "My Product";
using (var context = new entitycontext())
{
    var result = (from x in context.EntityFrameworkType.Where(l => l.searchField == searchValue) select x).FirstOrDefault();
}
Of course this syntax won't work, because context does not contain an entity called "EntityFrameworkType".
Is it possible to do this another way? What I'm looking to do is generalize my database duplicate check. In this example, I'm searching for any Product with the ProductName "My Product". But I'd like to be able to pass in these strings for, say, ProductCategory with ProductCategoryId = 1, and so on.
You can have a look here to get an idea of how it is done.
You'll need to learn about Expression (the expression tree types in System.Linq.Expressions).

How do I programmatically translate a LINQ query to readable English text that correctly describes the linq expression?

I am working on a project that uses Albahari's PredicateBuilder library http://www.albahari.com/nutshell/ to create a LINQ expression dynamically at run time. I would like to find a way to translate this dynamically created LINQ predicate of type Expression<Func<T, bool>> into a readable English statement at runtime.
I'll give a statically created LINQ statement as an example:
from p in Purchases
where p.Price > 100 && p.Description != "Bike"
select p
For this LINQ statement I would want to generate, at runtime, an English description along the lines of:
"You are searching for purchases where the price is greater than 100 and the description is not bike".
Are there any existing libraries that accomplish this goal? Keep in mind that I am using PredicateBuilder to dynamically generate the where predicate. If no solution exists, how would you go about building one?
Thanks!
This caught my attention so I downloaded ExpressionSerializationTypeResolver.cs and ExpressionSerializer.cs and then I:
class Purchase
{
    public decimal Price { get; set; }
    public string Description { get; set; }
}
...
var purchases = new List<Purchase>() { new Purchase() { Price = 150, Description = "Flute" }, new Purchase() { Price = 4711, Description = "Bike" } };
Expression<Func<IEnumerable<Purchase>>> queryExp = () => from p in purchases
                                                         where p.Price > 100 && p.Description != "Bike"
                                                         select p;
ExpressionSerializer serializer = new ExpressionSerializer();
XElement queryXml = serializer.Serialize(queryExp);
and then I got into problems, but maybe you could do something with the pretty big expression tree of your query? You can find it here.
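One way to approach building this without a library is to walk the predicate's expression tree recursively and emit an English phrase for each node (in .NET, System.Linq.Expressions.ExpressionVisitor is the usual starting point for such a walk). The sketch below shows only the traversal idea, on a tiny hand-rolled tree written in Java so that it runs standalone. The Expr, Compare, and And types and the phrase table are all hypothetical; they are not part of PredicateBuilder or LINQ, and lowercasing field names and values is just an assumption made to match the example sentence in the question.

```java
import java.util.Map;

class PredicateDescriber {
    // Hypothetical mini expression tree; a real solution would walk the LINQ tree instead.
    interface Expr {
        String toEnglish();
    }

    // Operator symbols mapped to English phrases (an assumed, incomplete table).
    static final Map<String, String> PHRASES = Map.of(
        ">", "is greater than",
        "<", "is less than",
        "==", "is",
        "!=", "is not");

    // Leaf comparison such as Price > 100.
    record Compare(String field, String op, Object value) implements Expr {
        public String toEnglish() {
            // Operators missing from PHRASES would need their own entry.
            return "the " + field.toLowerCase() + " " + PHRASES.get(op) + " "
                + String.valueOf(value).toLowerCase();
        }
    }

    // Conjunction of two sub-predicates, described by recursing into both sides.
    record And(Expr left, Expr right) implements Expr {
        public String toEnglish() {
            return left.toEnglish() + " and " + right.toEnglish();
        }
    }

    // Wrap the predicate description in the sentence frame from the question.
    static String describe(String entity, Expr predicate) {
        return "You are searching for " + entity + " where " + predicate.toEnglish();
    }

    public static void main(String[] args) {
        Expr predicate = new And(
            new Compare("Price", ">", 100),
            new Compare("Description", "!=", "Bike"));
        System.out.println(describe("purchases", predicate));
    }
}
```

A real translator would also need cases for Or, Not, method calls such as Contains, and nested member access, each handled the same way: one node type, one phrase template.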
