Data is an incredibly valuable resource – if we are able to make use of it. Analyzing text corpora becomes increasingly important for Data Scientists, not least because of user generated content with the advent of web 2.0. In order to facilitate the access to NLP for Data Scientists, Philipp Thomann and Jürgen Schwärzler from D ONE are offering an easy approach with their open source library NLPeasy. At the conferences SDS 2019 and SwissText 2019, they presented NLPeasy to the public for the first time.
Data makes the world go round – if we are able to process and interpret it, that is. Text Analytics and Natural Language Processing is used in many commercial and industrial sectors; for process optimization, decision support, but also for the development of new products and services. For several years now, this field is rapidly advancing and adapting. And not without reason. Not only the methods are progressing, but also the data needed to feed the systems is growing at an incredible pace.
The age of contribution: everybody is a publisher
Since web 2.0, we are living in the age of contribution – everybody has become a publisher: the user no longer remains in the role of consumer, but becomes a content provider. We are also given the opportunity to not only generate and contribute our own content, but react to that of others: expressing our opinions and sharing our thoughts – we post updates on Social Media or write reviews on platforms such as TripAdvisor or comment on blogs or newspaper websites.
How to make use of the data available
This means that there is a vast amount of data available – how to harness the power of this user-generated content with Search and Machine Learning is another question. NLPeasy, developed by Philipp Thomann and Jürgen Schwärzler from D ONE, is a perfect tool for Data Scientists not particularly familiar with Natural Language Processing on an expert level. The open source library facilitates access to NLP by building the needed pipelines in an easy way.
NLPeasy in action
In one of the demo sessions at SwissText 2019, Philipp Thomann and Jürgen Schwärzler showed how exactly NLPeasy works. The crowd gathering around the booth of D ONE could watch in real time how the application works, as well as with an uploaded data set from a file as by analyzing restaurant reviews on TripAdvisor for Swiss cities Zurich and Berne and comments on articles on the digital news platform “Inside Paradeplatz”.
NLPeasy is a product of the commitment of D ONE to facilitate rapid prototyping. For this goal, the company is open-sourcing their work spanning the ElasticSearch/Kibana stack (index, search, visualization), spaCy (named entity recognition, syntax analysis, word embeddings), Vader (sentiment analysis), and others.
An auto-generated Dashboard out of the NIPS-example.
The conference SwissText 2019 brought together practitioners, analysts, end users, software vendors, Data Scientists and researchers in the field of Automatic Text Analytics, to give an overview of already existing solutions and applications and to inspire ideas for new and innovative projects. SwissText 2019 featured 3 high-profile keynotes and over 40 presentations and posters form industry and research and also hands-on Demo Sessions. D ONE is one of the proud sponsors of this yearly conference.
NLPeasy on GitHub:
Video: https://youtu.be/2DZx3sXKbVw (Audio only available after minute 7)