This blog post on NLP builds on a request from a Reddit user, and I have tried my best to address the most pressing bottlenecks that the NLP community faces.
## 5. Evaluation Metrics
This is not an explicit bottleneck as such; rather, many researchers in the community want to shed light on often-neglected practices, such as blindly following certain architectures, datasets, and evaluation metrics.
As the way is paved toward higher-level cognitive tasks, understanding why certain techniques and architectures work well in certain scenarios will certainly help.
Another concern is the evaluation metrics themselves: how well do these techniques fit and generalise to the true variability of human language, and, better yet, how can we come up with more interesting natural language inference datasets? A toy check illustrating the concern is sketched after the quote below.
"I'd be much happier with benchmarks that understated progress than the current ones that seem to overstate it." - George Dahl
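To make that concern concrete, here is a minimal sketch (toy data and a deliberately naive model, all hypothetical) of the kind of check that exposes overstated progress: comparing accuracy on an in-distribution test set against out-of-distribution examples.

```python
# Minimal sketch (hypothetical data/model): a single benchmark score can
# overstate progress; comparing in-distribution and out-of-distribution
# accuracy gives a rough robustness signal. Labels: 1 = negative, 0 = positive.
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, gold label)

def accuracy(model: Callable[[str], int], data: List[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    correct = sum(1 for text, gold in data if model(text) == gold)
    return correct / len(data)

# Toy "model" that keys on a surface cue instead of meaning -- the kind
# of shortcut that benchmark test sets often fail to penalise.
def shortcut_model(text: str) -> int:
    return 1 if "not" in text else 0

in_domain = [("this is not good", 1), ("great movie", 0)]
out_of_domain = [("hardly watchable", 1), ("not bad at all", 0)]

print("in-domain accuracy:     ", accuracy(shortcut_model, in_domain))   # 1.0
print("out-of-domain accuracy: ", accuracy(shortcut_model, out_of_domain))  # 0.0
```

The shortcut model looks perfect in-distribution and collapses off-distribution, which is exactly the gap a benchmark that "understates progress" would surface.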
## 4. Lifelong Learning
The other hindrances faced in the community are:
- Lifelong adaptation of low-level models for downstream tasks
- Applying transfer learning (a minimal fine-tuning sketch follows after the quote below)
- Seamlessly integrating language-related modalities such as vision, text, and audio
- Effective cross-domain transfer in low-resource scenarios
"Ability to transfer models to new conditions, which includes learning under limited (or absence of) annotated resources. To be able to learn truly robust NLP models, for many more than the tiny amount of languages and domains which we can currently support" - Barbara Plank
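As a rough illustration of the transfer-learning item above (not any particular system's method), here is a minimal PyTorch sketch with hypothetical shapes, random stand-in data, and an assumed pretrained checkpoint: freeze a pretrained encoder and train only a small task head on the downstream task.

```python
# Minimal transfer-learning sketch in PyTorch (all shapes/data hypothetical):
# reuse a "pretrained" encoder for a new task by freezing its weights and
# training only a small task head.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for a pretrained sentence encoder."""
    def __init__(self, vocab_size=10_000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):                    # (batch, seq_len)
        return self.embed(token_ids).mean(dim=1)     # mean-pooled (batch, dim)

encoder = Encoder()
# encoder.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint

# Freeze the pretrained weights; only the new head will be updated.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(128, 2)                  # new head for a 2-class target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on random data standing in for the target domain.
tokens = torch.randint(0, 10_000, (4, 12))
labels = torch.randint(0, 2, (4,))
logits = head(encoder(tokens))
loss = loss_fn(logits, labels)
loss.backward()
opt.step()
print("fine-tuning loss:", loss.item())
```

Freezing everything but the head is the simplest point on the transfer spectrum; gradually unfreezing encoder layers is the usual next step when more target-domain data is available.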
## 3. Goal-oriented Dialogue Systems
A recent spike in the number of papers addressing goal-oriented dialogue systems at ACL and EMNLP is evident in the ACL Anthology.
This leads to the next open problem: how to support longer goal/task-oriented human-machine conversations that require real-world context and a knowledge base.
Task-driven dialogue systems with state tracking, dialogue systems using reinforcement learning, and a number of other novel techniques are part of current active research; a toy state-tracking sketch follows below.
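As a rough illustration of what "state tracking" means here (the slot schema and regex patterns are hypothetical, not any published system's), here is a minimal slot-filling tracker that accumulates task-relevant information across turns until it can query a knowledge base.

```python
# Minimal sketch of slot-based dialogue state tracking (hypothetical slots
# and patterns): the system folds each user turn into a persistent state
# until all required slots are filled and it can act against a KB.
import re

REQUIRED_SLOTS = ("cuisine", "area")           # assumed task schema
PATTERNS = {                                   # toy rule-based slot fillers
    "cuisine": re.compile(r"\b(italian|indian|thai)\b"),
    "area": re.compile(r"\b(north|south|centre)\b"),
}

def update_state(state: dict, user_utterance: str) -> dict:
    """Fold one user turn into the dialogue state."""
    text = user_utterance.lower()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            state[slot] = match.group(1)
    return state

state: dict = {}
for turn in ["I want italian food", "somewhere in the north please"]:
    state = update_state(state, turn)
    missing = [s for s in REQUIRED_SLOTS if s not in state]
    print(f"state={state}  next={'ask ' + missing[0] if missing else 'query KB'}")
```

Research systems replace the regexes with learned trackers and the "ask next slot" rule with a learned (often RL-trained) policy, but the state/policy decomposition is the same.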
## 2. Low-resource Languages
This is arguably the most pressing problem to be tackled.
There are approximately 7,000 languages in the world, but only a small fraction of these (about 20 languages) are considered high-resource.
This is a direction into which the community's effort has to be channelled, and where the low-hanging fruit lies: insights to be explored and transferred to other languages.
There are hardly six papers on this topic on Papers with Code.
Some pointers to work on, given by community experts:
- Methods for gathering data and training language models for under-resourced languages (a toy baseline is sketched below)
- Effective cross-domain transfer in low-resource scenarios
Check out this link for a much more thorough understanding of the problem, along with some pointers on African MT.
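As a toy baseline for the language-modelling pointer above, here is a sketch of a character-level bigram model with add-alpha smoothing; the two-sentence corpus is a hypothetical stand-in for a genuinely under-resourced setting, where such simple models remain trainable when neural ones are not.

```python
# Minimal sketch (toy corpus): a character-level bigram language model,
# the kind of baseline one can still estimate when a language has only a
# handful of digitised sentences.
from collections import Counter, defaultdict

corpus = ["habari ya asubuhi", "asante sana"]   # tiny stand-in corpus

counts: defaultdict = defaultdict(Counter)
for sentence in corpus:
    chars = ["<s>"] + list(sentence) + ["</s>"]
    for prev, cur in zip(chars, chars[1:]):
        counts[prev][cur] += 1

def prob(prev: str, cur: str, alpha: float = 1.0) -> float:
    """Add-alpha smoothed bigram probability P(cur | prev)."""
    vocab = {c for ctr in counts.values() for c in ctr} | set(counts)
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * len(vocab))

print(f"P('s' | 'a') = {prob('a', 's'):.3f}")
```

Smoothing is what keeps the model usable at this scale: without it, any character pair unseen in the tiny corpus would get probability zero.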
## 1. Natural Language Understanding
This is the most open-ended problem, tied to many other connected problems that the community needs to solve to achieve this higher-level cognitive task; it may very well require insights and methodologies from other ML areas such as reinforcement learning, domain adaptation, and zero-shot learning.
Below are some open pointers one could work on.
- Coreference resolution, polysemy, and text/document summarization (a toy word-sense sketch follows this list)
- Arguments/reasoning, sarcasm, and humour
- Representing large contexts efficiently
- Grounded language learning, i.e. jointly learning a world model and how to refer to it in natural language
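For the polysemy pointer above, here is a minimal sketch of the classic simplified Lesk algorithm; the sense glosses are hypothetical miniature dictionary entries, not a real lexicon, and modern contextual embeddings largely replace this approach in practice.

```python
# Minimal sketch of the (simplified) Lesk algorithm for word-sense
# disambiguation, one small corner of the polysemy problem: pick the
# sense whose gloss shares the most words with the usage context.
SENSES = {
    "bank": {
        "financial": "institution that accepts deposits and lends money",
        "river": "sloping land beside a body of water",
    }
}

def lesk(word: str, context: str) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(lesk("bank", "he sat on the bank of the river and watched the water"))  # river
print(lesk("bank", "she went to the bank to deposit money"))                  # financial
```

Even this crude overlap heuristic separates the two senses here, which hints at why context, rather than the word form alone, is the unit NLU systems must reason over.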
Table of topics:
- Natural language understanding
  - Ambiguity
  - Synonymy
  - Syntax
  - Coherence
  - Coreference
  - Personality, intention and style
  - Text summarization
  - Humour and ambiguity
  - Polysemy
  - Keyphrase extraction
  - Knowledgebase population (KBP)
- NLP for low-resource scenarios
  - Cross-lingual learning
  - Bilingual dictionary induction
- Reasoning about large or multiple documents
  - Multi-task learning
  - Discourse parsing
  - Task-independent architecture improvements
- Datasets, problems, and evaluation
  - Task-independent data augmentation for NLP
  - Few-shot learning for NLP
  - Transfer learning for NLP
  - Semi-supervised learning
  - Frame-semantic parsing
References:
- Frontiers in Natural Language Processing: Expert Responses
- http://ruder.io/4-biggest-open-problems-in-nlp/