Sixteen years after it was originally coined (see van Rijmenam, 2013), the term “Big Data” has slowly faded from our vocabulary. There was a time, though, in the early 2010s, when it was all the rage. But make no mistake: “Big Data” is still here and it is not going anywhere. Behemoths like Google or Facebook (or totalitarian regimes like China, for that matter) still use unfathomable amounts of data to gain insights and distill knowledge from the billions of data points that people leave behind like breadcrumbs as they live their everyday lives.

Most Machine Learning (ML) algorithms, and…

A realization

The growing interest in researching and developing tools to detect and combat the spread of abusive online behavior and discourse is clearly visible in the amount of academic work produced over the past two decades.

When I started preparing to develop Codeq’s abuse detection module, I had to spend a considerable amount of time reviewing as much of the existing literature on the topic as possible. Today, when I was going over all the papers that I had gathered to prepare for this post, I realized that, whereas in the early 2000s and 2010s there were…

Leverage the past to build an awesome future

In late 2017 I wrote an article about the project we at Codeq were working on at the time: Courier.

Courier was an email processing system that analyzed incoming email messages to extract relevant information and summarize their content. An email client with magic NLP powers, Courier had one main objective: to ease the burden of email information overload by presenting relevant bits of information and short, informative summaries that condensed the meaning of messages, without the need to open them.

Internally, Courier was powered by a myriad of NLP modules, some of which…

Courier features a complex Natural Language Processing and Understanding system that allows users to spend less time reading email.

When we started working on this project we only had a handful of tools, such as our tokenizer and sentence splitter, so we had to build pretty much the entire system from the ground up. Our guiding philosophy is that we build all modules that are essential and for which we have the necessary resources and knowledge. …

As an extension to our speech act classifier that we introduced in a previous blog post, in Courier we have implemented a module that further categorizes information requests, that is, questions, into different types.

This module has three main purposes:

1) identify questions in emails that need to be answered by a user when responding to email conversations;

2) provide finer-grained information about the nature of sentences to our email conversation summarizer; and

3) allow for more advanced post-processing rules that select relevant sentences to be included in email summaries.

Training Corpora

To train this question type classifier we extracted information requests…

As discussed in a previous post, capturing pragmatic phenomena, that is, phenomena that go beyond the realms of morphology, syntax, and distributional lexical and compositional semantics, is crucial for the success of natural language understanding (NLU) projects.

In this post we would like to discuss our effort to train a Machine Learning classifier that is able to detect sarcasm in textual conversations.

Sarcasm is an indirect act of speech “in which speakers convey their message in an implicit way. […] The inherently ambiguous nature of sarcasm sometimes makes it hard even for humans to decide whether an utterance is sarcastic…


Last year (2016) we worked on training an emotion classifier that we integrated into Courier. Initially this classifier was trained using the Support Vector Machine (SVM) algorithm provided in scikit-learn. Since then, the classifier has evolved, and it is now trained using a Deep Learning architecture that we put together with the fantastic combo of Keras and TensorFlow. For a detailed discussion of our emotion classifier you can read my previous post entitled “Learning Emotions from Reddit.”

Even though this post is based on the SVM version of the classifier, we still think that it would be fun to tell the…

As pointed out in a previous post, conversational emails are defined as text-based asynchronous conversations. Drawing from John L. Austin’s and John R. Searle’s body of work on speech act theory, conversation utterances can be analyzed and classified, from a pragmatic point of view, according to their illocutionary force, that is, their intention and their effect in the world.

Using Stolcke et al.’s (2000) taxonomy of speech acts, we have created a simplified and refined list of speech acts that we can apply to sentences in emails so that we can exploit this information in order to produce email summaries that…


Every day, our inboxes are flooded with a ton of computer-generated messages, which we internally call “botmails”, letting us know about updates to services we use, purchases we have made, promotions trying to convince us to buy the latest and shiniest products, etc. However, many of us also use email to communicate with our family and friends as well as our co-workers. This type of communication is what we call “conversational email.”

In conversational email, just as in any other type of human-to-human form of communication, emotion is an essential aspect that needs to be taken into account when trying…

Pragmatics in Courier

When we started working on Courier’s conversational email summarizer, we knew that just using an extractive summarization approach wasn’t enough. At Codeq, we understand that the way people construct a discourse to communicate meaning is very complex because there are several levels of linguistic phenomena that interact transversally.

The computational modeling of morphology and syntax, even though far from being “solved”, is a well-researched and well-understood area of natural language processing (NLP). …

Paulo Malvar

Chief Computational Scientist @ Codeq
