Together with Philipp and Jürgen from D ONE, I had the chance to teach 20 participants at this year’s virtual SDS conference about Natural Language Processing (NLP) with the Python package NLPeasy.
Philipp wrote this Python package to make a data scientist’s work with NLP easier. It is a wrapper for other NLP packages such as VaderSentiment and SpaCy, and it builds a bridge to Elasticsearch and Kibana. Elasticsearch is a distributed full-text search engine over schema-free JSON documents, while Kibana is a dashboarding tool that reads data from Elasticsearch efficiently. Setting up Elasticsearch and Kibana is made simple by running them in Docker containers, so the package can be used without tedious pre-installation.
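Elasticsearch itself was not the focus of the workshop, but the core idea behind it, full-text search over JSON documents, can be illustrated in a few lines of plain Python. The toy inverted index below is only a sketch for intuition; Elasticsearch adds text analysis, relevance scoring, and distribution on top of this:

```python
# Toy illustration of full-text search over JSON-like documents.
# Elasticsearch does far more (analysis, scoring, sharding); this only
# shows the basic inverted-index idea behind full-text search.
from collections import defaultdict

docs = [
    {"id": 1, "text": "Kibana reads data from Elasticsearch"},
    {"id": 2, "text": "Elasticsearch indexes schema-free JSON documents"},
]

# Build an inverted index: token -> set of ids of documents containing it.
index = defaultdict(set)
for doc in docs:
    for token in doc["text"].lower().split():
        index[token].add(doc["id"])

def search(term):
    """Return the ids of all documents containing the (lowercased) term."""
    return sorted(index[term.lower()])

print(search("Elasticsearch"))  # → [1, 2]
print(search("JSON"))           # → [2]
```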
NLPeasy is a great starting point for any data exploration journey that works with text data.
In this pre-conference workshop, we taught participants the main methods used in NLP. Because of the virtual setup, it was not straightforward to involve participants in discussions. This is why we chose to deliver the content in two modes. First, Philipp, who is trained in both mathematics and linguistics, explained the theoretical aspects in a classroom setting. Then, the participants split into smaller groups and entered breakout rooms, where we helped them try out how NLP methods work in practice. For example, participants looked into how different words are represented by vectors, and they created and visualised syntax trees of different sentences.
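The word-vector exercise can be mimicked without any NLP library. With hand-made toy vectors (invented here purely for illustration; real embeddings, such as those shipped with SpaCy, have hundreds of learned dimensions), cosine similarity already shows why related words end up close together:

```python
import math

# Hand-made 3-dimensional toy vectors, invented for illustration only.
# Real word embeddings have hundreds of dimensions learned from data.
vectors = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.8, 0.9, 0.2],
    "pizza": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 for similar directions, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# "cat" is much closer to "dog" than to "pizza":
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["pizza"]))
```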
We did not want participants to get lost during the workshop, so we tried to limit technical problems as much as possible. We can highly recommend the approach we took: D ONE hosted VMs on a BinderHub, so participants did not have to follow lengthy installation protocols before the workshop. With a single link, people could connect to the hub, where their own machine was instantiated, and off they went to start with NLP! For those interested, the NLPeasy tutorial can also be found on D ONE’s GitHub account.
After discovering NLP basics, participants explored in the first tutorial how NLPeasy can be set up. To this end, we used a freely available, anonymised dataset from a dating website. It included free-text answers to profile sections of the app, such as “Describe yourself” and “What are you doing on a typical Friday night?”, as well as more structured information like city, languages, and age. The participants did some basic feature engineering and then defined the pipeline steps they wanted to include, such as which text columns to use for sentiment analysis. Then they ran the enrichment on the dataset and loaded the result into Elasticsearch, and a basic Kibana dashboard was created automatically, all with NLPeasy’s out-of-the-box functions.
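That whole flow fits in a handful of lines. The sketch below is adapted from the NLPeasy README; treat the exact function and parameter names (`connect_elastic`, `Pipeline`, `VaderSentiment`, `SpacyEnrichment`), the file path, and the column names as assumptions that may differ from the current API:

```python
# Sketch adapted from the NLPeasy README; names and signatures may have
# changed, and the file path and column names are hypothetical.
import pandas as pd
import nlpeasy as ne

# Connect to a running Elasticsearch, or start an Elastic Stack on Docker.
elk = ne.connect_elastic(dockerPrefix='nlp')

profiles = pd.read_csv('profiles.csv')  # hypothetical path to the dataset

# Define the pipeline: the target index and the text columns to enrich.
pipeline = ne.Pipeline(index='profiles', textCols=['essay0'], elk=elk)
pipeline += ne.VaderSentiment('essay0', 'sentiment')  # sentiment analysis
pipeline += ne.SpacyEnrichment(cols=['essay0'])       # entities, tags, ...

# Run the enrichment and write the result to Elasticsearch.
enriched = pipeline.process(profiles, writeElastic=True)

# Generate a basic Kibana dashboard for the enriched columns and open it.
pipeline.create_kibana_dashboard()
elk.show_kibana()
```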
Overall, the NLPeasy workshop was a great experience for both participants and hosts. Because the workshop was held remotely, people from farther away could participate as well, such as one participant joining from King Abdulaziz University in Saudi Arabia. It was fun to have such a diverse audience. Participants left the workshop equipped with a new toolset, no longer afraid of textual data analysis but ready to dive right into it. We will host this workshop again in the future, on site or remotely.
Sparked your interest?
- Introduction to NLP: We are happy to tailor an introductory NLP course to your (or your company’s) needs, so feel free to reach out to us any time.
- Remote workshop hosting: If you are planning to host a virtual workshop yourself, ask us for details on how to enable a stress-free and productive technical set-up.
- Contributing to NLPeasy: Review our code in the GitHub repo and share your feedback with us.
- Follow us to stay up to date on where we might host the workshop next.