How to map summarized article's sentences and original article's sentences - huggingface-transformers

If I use a transformer model to generate a summary from an article, is there any method to find out which part of the original article each generated sentence comes from?
I am just looking for some ideas on this kind of topic.
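One simple idea, as a rough sketch rather than an established method: split both texts into sentences and, for each summary sentence, pick the source sentence with the highest word overlap (Jaccard similarity). The function names `split_sentences`, `tokens` and `align` are made up for this illustration, the sentence splitter is deliberately naive, and the article is assumed to be non-empty. Overlap works best for extractive-style summaries; heavily paraphrased output would probably need sentence embeddings instead.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased word tokens of a sentence, as a set."""
    return set(re.findall(r"\w+", sentence.lower()))


def align(summary, article):
    """Map each summary sentence to its most similar article sentence.

    Returns a list of (summary_sentence, source_index, jaccard_score).
    Assumes the article contains at least one sentence.
    """
    source = split_sentences(article)
    source_tokens = [tokens(s) for s in source]
    result = []
    for sent in split_sentences(summary):
        st = tokens(sent)
        scores = [
            len(st & src) / len(st | src) if (st | src) else 0.0
            for src in source_tokens
        ]
        best = max(range(len(source)), key=scores.__getitem__)
        result.append((sent, best, scores[best]))
    return result
```

A low best score is a hint that the summary sentence merges several source sentences or was paraphrased, so it may be worth keeping the top few candidates rather than only the best one.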

Related

Are there any alternate ways other than Named Entity Recognition to extract event names from sentences?

I'm a newbie to NLP and I'm working on NER using OpenNLP. I have a sentence like "We have a dinner party today". Here "dinner party" is an event type. Similarly, in the sentence "we have a room reservation", "room reservation" is an event type. My goal is to extract such phrases from sentences and label them as "Event_types" in the final output. This can be achieved fairly well by creating a custom NER model, annotating sentences with the proper tags in the training dataset. But event types can be heterogeneous and arbitrary, so it is very hard to label all possible patterns (i.e. event types can be anything like "security meeting", "family function", "parents teachers meeting", etc.). So I'm looking for an alternative way to solve this problem. An immediate response would be appreciated. Thanks! :)
Basically you have two options:
1) A list-based approach, where you have lists of entities that you extract from the text. To handle the heterogeneous language use, you can train an embedding (e.g. Word2Vec or FastText) to identify phrases that are contextually similar to the ones on your list.
2) Train a custom CRF with data you have annotated (this obviously requires that you annotate a bunch of sentences with the corresponding tags).
I guess the ideal solution really depends on the data and on people's willingness to annotate it.
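To make the list-based option a bit more concrete, here is a minimal sketch of the lookup part only. The function name `extract_events` and the phrase list are invented for illustration, and the embedding step (expanding the list with similar phrases from Word2Vec or FastText) is left out. Longer phrases are tried first so that "dinner party" wins over a bare "party".

```python
import re


def extract_events(sentence, event_phrases):
    """Return the listed event phrases found in the sentence, in text order.

    Longer phrases are matched first; a shorter phrase is ignored when it
    falls entirely inside a longer match that was already found.
    """
    text = sentence.lower()
    found = []  # (start, end, phrase)
    for phrase in sorted(event_phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        for m in re.finditer(pattern, text):
            nested = any(s <= m.start() and m.end() <= e for s, e, _ in found)
            if not nested:
                found.append((m.start(), m.end(), phrase))
    return [phrase for _, _, phrase in sorted(found)]
```

Called with the list `["dinner party", "room reservation", "party", "security meeting"]`, the sentence "We have a dinner party today" yields only "dinner party". The quality of this approach depends entirely on how complete the list is, which is where the embedding-based expansion would come in.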

How to improve the accuracy of NER in StanfordCoreNLP?

I used the NER of StanfordCoreNLP to recognize entities, including organizations, locations and persons. But something weird happens. For example, when I input "Cleveland Cavaliers", it recognizes "Cleveland" as a location instead of recognizing "Cleveland Cavaliers" as an organization.
I am not very familiar with NER and I don't know how it works. My task is to get all the company names in the text, and the results I have got so far are not very satisfactory. Two ways of solving the problem occur to me. The first is to modify the dictionary and insert the correct data. The second is to train the model. But I still have some questions.
1. Will the first way work effectively?
2. If the answer to question 1 is yes, how do I modify the dictionary?
3. Furthermore, the FAQ at https://nlp.stanford.edu/software/crf-faq.shtml#a describes how to train the NER model, but what confuses me most is what I will get if I train my own model. If I create a dataset containing something like "organization 'Cleveland Cavaliers'" to train the model, what will happen inside the model? Will the dictionary inside the CRFClassifier change?
4. Will the CRFClassifier fix the error when I input "Cleveland Cavaliers", and recognize "Cleveland Cavaliers" as an organization entity?
These are all my questions, and I am preparing the dataset to try the second way. Can anybody answer the 4 questions above?
Thanks
I think the first solution is not a very practical one, because every time you want to tag a new company you need to update your dictionary.
I prefer your second solution. I have done this before and trained a new model to tag my sentences.
If you have a good corpus that is big enough and properly tagged, it may take some time to train, but it is worth the effort.
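As far as I understand the FAQ linked in the question, the training data is not a dictionary entry like "organization 'Cleveland Cavaliers'" but tokenized text with one token per line and a tab-separated label, with O for tokens outside any entity. Treat the exact layout and label names as assumptions and check them against the FAQ. Below is a small sketch that produces such lines from token-level annotations; `to_training_lines` is a made-up helper, not part of Stanford NER.

```python
def to_training_lines(tokens, spans):
    """Build 'token<TAB>label' lines for one sentence.

    tokens: list of token strings.
    spans:  list of (start, end, label) token-index ranges, end exclusive.
    Tokens not covered by any span get the background label 'O'.
    """
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        for i in range(start, end):
            labels[i] = label
    return [f"{tok}\t{lab}" for tok, lab in zip(tokens, labels)]


# Example: both tokens of the team name carry the same label.
lines = to_training_lines(
    ["The", "Cleveland", "Cavaliers", "won", "."],
    [(1, 3, "ORGANIZATION")],
)
```

The point is that the model learns from labeled tokens in context rather than from a lookup table, so many varied sentences containing organization names are needed, not a single entry.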

LUIS sometimes doesn't understand the intent

We are developing a library bot using Microsoft BOT.
We have created one intent, BookSearch, and two entities, BookName and BookAuthor.
We trained LUIS with simple questions, but it only works for questions that match exactly.
For example, I trained LUIS with "I need book", and that works properly.
But when we write the same question as "I need a book", it fails to match the book intent.
Can anyone help us here? There are many scenarios like this where LUIS only works with exactly matching questions.
One more problem: we have book names made of three words, and we are unable to tag the three words as one BookName entity.
It sounds like your model just needs more training with a variety of sentence structures.
LUIS will match the exact intent when it's been trained but needs more examples to get better with novel utterances. So "I need book" vs "I need a book" should be pretty easy for it to learn with more properly labeled utterances.
As for the title with three words, highlighting them all by clicking and dragging across all three is possible.
You have to add more of the possible questions that a user can ask about books.
Let me give an example to explain this:
I need book
I need a book
What latest book you have?
Can you recommend a book to me
I am looking for book
I am looking for a book
In the above examples, the intent should be FindBook.
I am looking for C# book?
In the above example, the intent should be FindBook; here the user has also mentioned the subject (C#). Subject will be an entity.
I am looking for C# book written by Joseph Albahari, Ben Albahari
In the above example, the intent should again be FindBook; here the user has mentioned both the subject and the writer.
Subject and Writer will be entities.
You have to train your model with more of the possible questions; only then will LUIS work properly.
You can highlight a full sentence as well.
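If you keep your example utterances in a script before entering them, something like the following sketch may help to label multi-word entities consistently. The helper `label_utterance` and the field names (`text`, `intent`, `entities`, `start`, `end`) are hypothetical and are not claimed to be LUIS's import format; the sketch only shows that an entity is a character span of the utterance, so it can cover as many words as needed.

```python
def label_utterance(text, intent, entities):
    """Attach an intent and entity character spans to one utterance.

    entities: dict mapping entity name -> exact substring of `text`.
    Spans use Python slicing conventions (end is exclusive).
    """
    labels = []
    for name, value in entities.items():
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        labels.append({"entity": name, "start": start, "end": start + len(value)})
    return {"text": text, "intent": intent, "entities": labels}


example = label_utterance(
    "I am looking for C# book written by Joseph Albahari",
    "FindBook",
    {"Subject": "C#", "Writer": "Joseph Albahari"},
)
```

Here the Writer entity spans two words, and a three-word book title would be labeled the same way.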

Topic Modelling - Assign human readable labels to topic

I want to assign human readable labels to the results of my topic modelling.
Is there any software library or data set that takes these keywords as input and returns a title describing the topic?
Example:
Input: ["Church","Priest","God","Prayer"]
Output: "Religion"
Note: I want automatic label creation - Not manual like others have asked before.
See this paper by Jey Han Lau. He describes how to automatically generate labels using different sources and features.
"We generate a set of label candidates from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and also a filtered set of sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures, lexical features and an Information Retrieval feature."

Algorithms to recognize misspelled names in texts

I need to develop an application that will index several texts and I need to search for people’s names inside these texts. The problem is that, while a person’s correct name is “Gregory Jackson Junior”, inside the text, the name might be written as:
- Greg Jackson Jr
- Gegory Jackson Jr
- Gregory Jackson
- Gregory J. Junior
I plan to index the texts on a nightly basis and build a database index to speed up the search. I would like recommendations for good books and/or good articles on the subject.
Thanks
Check these related questions.
Algorithm to find articles with similar text
How to search for a person's name in a text? (heuristic)
Your question is incorrectly phrased. The examples indicate not misspellings but variations in the way a full name is written.
Also:
Would your search expect to match on words like "son", with reference to the example?
Would it expect to match "Bob" when looking for the name Robert?
Are you looking for things like this and this?
OK, reading your comment suggests you do not want to venture into that.
For the record: use a Bayesian filter. You may use Mechanical Turk to initialize your algorithm.
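As a simpler starting point than a Bayesian filter, and only as a sketch, the variants from the question can be caught with token-level fuzzy matching from Python's standard `difflib` module. The suffix table, the 0.8 threshold and the function names are assumptions chosen for this example. A token matches if it is an initial, a prefix, or a close spelling of some token in the canonical name.

```python
from difflib import SequenceMatcher

# Assumed abbreviation table; extend as needed.
SUFFIXES = {"jr": "junior", "sr": "senior"}


def normalize(name):
    """Lowercase, strip punctuation and expand known suffix abbreviations."""
    parts = [p.strip(".,").lower() for p in name.split()]
    return [SUFFIXES.get(p, p) for p in parts if p]


def token_match(a, b, threshold=0.8):
    """True if tokens agree as initial, prefix, or near-identical spelling."""
    if len(a) == 1 or len(b) == 1:  # initials such as "J."
        return a[0] == b[0]
    if a.startswith(b) or b.startswith(a):  # "greg" vs "gregory"
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold  # "gegory"


def name_score(candidate, canonical):
    """Fraction of candidate tokens that match some canonical token."""
    cand, canon = normalize(candidate), normalize(canonical)
    if not cand:
        return 0.0
    matched = sum(1 for c in cand if any(token_match(c, k) for k in canon))
    return matched / len(cand)
```

With "Gregory Jackson Junior" as the canonical name, all four variants listed in the question score 1.0, while "son" does not match "jackson" because only prefixes are accepted. Nicknames such as Bob for Robert would still need a separate lookup table.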

Resources