Courier: Many emails, one killer app

Paulo Malvar
Oct 4, 2017

The beginning

When Dane Baker and I started thinking about building a system to combat email bankruptcy, I knew the project would require a significant amount of development effort to succeed.

- “So how long do you think it will take us to build something like this?” I remember Dane asking me sometime in mid-2013.

- “I’m not sure, Dane. Three years?”

- “Three years?!?” My answer must have been terrifying for Dane.

- “Yeah, I’m sure there’s a lot of tech that I can’t even think of right now that will need to be built.”

We had other projects on our hands, so we decided to focus on those and slowly start building Courier’s infrastructure. Of course, we knew this project had the potential to be the jewel in the crown, but the prospect of spending years developing the tech Courier needed was scary.

Email is complicated

“OK, so let’s start from the beginning. What is email?” I asked myself many times during the summer of 2013.

Four years later, after many lessons learned, it turns out that email is many things. There’s no magic formula for building a system that is able to comprehensively extract important information from “email”.

For starters, email is a channel of communication that imposes a set of inherent constraints and collectively accepted norms. Humans and machines use this channel to send documents back and forth, and those documents contain information formatted in every imaginable way.

How can one apply a universal set of rules or mathematical operations to summarize emails as different as the daily Medium digest of the most popular stories and the email your boss sent you last month, as long as a day without bread, updating you on the progress of an important project?

Is there a common strategy or technique that can summarize both a promotional email from Monoprice and an Amazon purchase confirmation for the nicest slow cooker money can buy?

The answer to these questions is always the same: “No.”

Once we establish that email is just a medium of communication, the next step is to realize that working with email means working with a diverse collection of genres and text types.

Many genres, many summarizers

Back in 2013, when I started working on email summarization, I focused on developing a strategy to summarize conversational emails, that is, text-based interactions among humans in which conversation turns are not synchronized in space and time.

The natural language processing (NLP) team started to grow as I slowly began to realize how much tech had to be built from scratch. We needed more horsepower!

Distribution of botmail and conversational emails in Courier’s database as of September 29th, 2017.

After a year of development, with many NLP analysis modules for conversational email in the works, Russ Smith joined Codeq. Russ had a ton of experience working on email at Yahoo!, and he made it clear from the beginning that we had been working on only a small subset of the whole email challenge. Around 80% to 90% of all email in the world is botmail (the name we came up with at Codeq for machine-produced email), and since we needed to provide a comprehensive experience, we had to develop strategies to process and summarize this dominant type of email as well.

We couldn’t apply the strategy we had come up with for conversational email to botmail. We just couldn’t. It was clear to us that an extractive summarization algorithm that modeled email as free-form, asynchronous conversational text wouldn’t work for Twitter notifications, Meetup weekly digests, or App Store purchase confirmations.
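To make the term concrete, here is a generic, frequency-based extractive summarizer in Python. It is a textbook-style sketch, not Codeq’s conversational-email algorithm, which the post does not describe; the stopword list, the naive sentence splitter and the scoring rule are all illustrative assumptions. Extractive summarization selects whole sentences verbatim from the text, which suits a human thread but leaves little to select from in a notification or a receipt.

```python
import re
from collections import Counter
from typing import List

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
             "is", "are", "we", "i", "you", "it", "that", "this", "be", "will"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> List[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokens(s))

    def score(s: str) -> float:
        # Average corpus frequency of the sentence's content words.
        words = tokens(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


thread = "The launch is moving to Friday. The launch deck still needs charts. Also, lunch?"
print(extractive_summary(thread, k=1))
# → ['The launch is moving to Friday.']
```

Every sentence in the output is copied from the input unchanged, which is exactly what makes this family of methods a poor fit for a purchase confirmation, where the useful content is scattered across labels and fields rather than sentences.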

Many botmail text types required a paradigm shift if we wanted to make sense of them. So we made that shift. We came up with an abstractive summarization approach designed to find the pieces of information in a botmail that, put together correctly, tell the story it is trying to convey. Product names, transaction numbers, shipment dates, flight legs and prices are small pieces of information that can be collected to build summarization cards, which help users see at a glance everything they need to know about certain botmails.
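The collect-then-assemble idea behind summarization cards can be sketched in Python for a single, made-up purchase-confirmation template. This is an illustration under stated assumptions, not Courier’s implementation: simple regular expressions stand in for whatever extraction Codeq actually used, and the `PurchaseCard` fields, the patterns and the sample email are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PurchaseCard:
    """Hypothetical summary card for a purchase-confirmation botmail."""
    product: Optional[str] = None
    order_number: Optional[str] = None
    price: Optional[str] = None
    ship_date: Optional[str] = None


# Hypothetical patterns for one made-up sender template. Real botmail
# layouts vary by sender, so each template would need its own extractors.
PATTERNS = {
    "product": re.compile(r"^Item:\s*(.+)$", re.MULTILINE),
    "order_number": re.compile(r"Order\s*#\s*([A-Z0-9-]+)"),
    "price": re.compile(r"Total:\s*(\$\d+(?:\.\d{2})?)"),
    "ship_date": re.compile(r"Ships on\s+([A-Z][a-z]+ \d{1,2}, \d{4})"),
}


def build_card(body: str) -> PurchaseCard:
    """Collect each piece of information independently, then assemble the card."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(body)
        values[name] = match.group(1).strip() if match else None
    return PurchaseCard(**values)


def render(card: PurchaseCard) -> str:
    """Join whichever fields were found into a one-line, at-a-glance summary."""
    parts = [card.product or "Your order"]
    if card.order_number:
        parts.append(f"Order {card.order_number}")
    if card.price:
        parts.append(card.price)
    if card.ship_date:
        parts.append(f"Ships {card.ship_date}")
    return " | ".join(parts)


# Made-up example email, not real Amazon output.
sample = """Thanks for your purchase!
Order # 112-4477
Item: 6-Quart Slow Cooker
Total: $49.99
Ships on October 6, 2017
"""

print(render(build_card(sample)))
# → 6-Quart Slow Cooker | Order 112-4477 | $49.99 | Ships October 6, 2017
```

Because each field is extracted independently, a card degrades gracefully: if a sender omits the shipping date, the card simply leaves it out instead of failing as a whole.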

But we still needed to deal with another type of machine email: promotional emails that contain a short narrative, whether trying to persuade users of the benefits of upgrading to the new version of their favorite app or letting them know about the latest update to the privacy policy of a cloud service they use. Another type of summarizer had to be built for that.

Since we didn’t have the option of leaving any type of email unsummarized, what can one do if an email doesn’t fit any of the predetermined categories, or if the workflow fails for whatever reason? You generate a dummy summary. In other words, you just take the first X characters and call it a day.
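The routing of each email to a genre-specific summarizer, with a first-X-characters fallback, can be sketched in Python. Everything here is a hypothetical illustration: the post doesn’t say how Courier classified emails or what X was, so the genre labels, the toy classifier, the deliberately failing summarizer and the 80-character cutoff are placeholders.

```python
from typing import Callable, Dict, Optional

Summarizer = Callable[[str], str]

# Hypothetical value for "the first X characters".
DUMMY_SUMMARY_CHARS = 80


def dummy_summary(body: str, limit: int = DUMMY_SUMMARY_CHARS) -> str:
    """Fallback: collapse whitespace and keep the first `limit` characters."""
    text = " ".join(body.split())
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


def summarize(
    body: str,
    classify: Callable[[str], Optional[str]],
    summarizers: Dict[str, Summarizer],
) -> str:
    """Route an email to its genre's summarizer, or fall back to a dummy summary."""
    genre = classify(body)
    summarizer = summarizers.get(genre) if genre else None
    if summarizer is None:
        # The email doesn't fit any of the predetermined categories.
        return dummy_summary(body)
    try:
        return summarizer(body)
    except Exception:
        # The genre-specific workflow failed for whatever reason.
        return dummy_summary(body)


# Toy usage: the classifier and summarizers below are hypothetical stand-ins.
def toy_classify(body: str) -> Optional[str]:
    if "Order #" in body:
        return "purchase"
    if "Unsubscribe" in body:
        return "promotional"
    return None


def failing_promo_summarizer(body: str) -> str:
    raise ValueError("simulated workflow failure")


toy_summarizers: Dict[str, Summarizer] = {
    "purchase": lambda body: "Purchase confirmation",
    "promotional": failing_promo_summarizer,
}

print(summarize("Order # 42\nItem: Lamp", toy_classify, toy_summarizers))
# → Purchase confirmation
print(summarize("Big sale this week!\nUnsubscribe", toy_classify, toy_summarizers))
# → Big sale this week! Unsubscribe
```

Catching failures at the dispatcher means a broken genre-specific workflow degrades to a rough summary instead of no summary at all, which is the guarantee every email needs when none can be skipped.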

“It can’t be done.”

Working on Courier for the past four years has been an amazing journey. But getting to where we stand today wasn’t easy.

Learning is sometimes painful, and oversimplifying problems leads to many dead ends (the opposite, overcomplicating already complex problems, does too).

But research, experimentation, patience and laser-focused determination pay off, even when others have said, “It can’t be done.”

Thanks, NLP team!

Get Courier!
