As a data scientist, I’m passionate about taking complicated data and building models that can deliver insights and form the basis for new, innovative products. At Funnel Leasing, an industry-leading provider of software for the multifamily industry, I’m helping to build a new virtual leasing agent that assists prospective renters in finding their next home. By answering renters’ questions faster or booking their viewings sooner, the virtual agent can deliver a lot of value to renters and management companies, but developing it presents its own unique challenges. The biggest challenge relates to the fact that, as we’re all aware, there are many ways people can ask questions or inquire for more information. For this reason, instead of handcrafting logic to account for the many ways messages can be phrased, modern approaches rely on machine learning, which builds a model automatically from data. For the virtual agent, the data consists of a collection of renters’ messages, which are annotated by hand to indicate their intent and other valuable information. From there, an algorithm optimizes a model that relates messages to the information they contain, so that when a new message arrives, the intent behind it can be inferred. Using the model, the virtual agent is able to respond successfully to some of the most common inquiries renters make, freeing agents to perform more complex tasks.
At this point, it’s tempting to conclude that, because machine learning accepts data and uses math to generate the best model, the result is free from bias or human error. Yet with recent events drawing increased attention to racial inequality and discrimination prevalent in society, companies are taking greater steps to eliminate biases both within their organizations and in their products, and that includes AI-based products. In this spirit, Funnel Leasing is launching a new initiative, called H.O.M.E., to advance our commitment to provide underprivileged renters with access to safe and healthy housing. To start, H.O.M.E. has committed to donating a dollar for every online application completed on Funnel’s platform, but its larger goal is to combat systemic oppression present in the leasing process. In this blog post, I’ll focus on related efforts to address bias in the virtual agent. I’ll show that although machine learning makes effective use of the data it’s provided, biases implicit in that data can make their way into the resulting model. Here’s a preview of the issues we encountered and how we tackled them, which other organizations can draw on as they build their own AI systems with an eye toward fairness:
- Foreign language recognition. When it comes to machine learning, it’s all about the data. If important messages to respond to aren’t present in the data used to build the model, the virtual agent won’t be able to respond to those messages. We found that to be the case for foreign language messages, which only constituted 0.6% of our data (561 in 100K messages). To address this, we built in foreign language detection, together with the ability to disable the virtual agent and hand off the conversation to a real agent to handle.
- Affordable housing. We made sure to add support for inquiries about affordable housing, as well as the least expensive apartments, by specifically annotating messages that express those intents. Again, these kinds of messages only occur about 2% of the time in our dataset (see the figure below), but it’s important to be able to respond to them, especially given the economic uncertainties generated by the pandemic.
- Other ways to address bias, like fairness metrics and topic modeling. There are many other ways of tackling bias that will be explored in the future, including establishing explicit fairness metrics, and using machine learning to automatically identify major topics on the minds of renters over time. Addressing bias in virtual agents related to the leasing process is never finished, but we hope that some of the efforts so far have gotten the conversation going about this important topic.
Funnel CRM and Virtual Agent Overview
Funnel’s new virtual agent builds on the company’s decade-long success at delivering an intuitive, modern platform that enables leasing agents to manage prospective renters and guide them through the often stressful process of leasing a new home. This platform offers tools for agents to call and email prospects, but it also provides some basic ways to respond to communications automatically, including an auto-response capability for first-touch messages and a chatbot that can be embedded into management companies’ websites. To further deliver on Funnel’s promise to provide an automated leasing experience, the virtual agent will facilitate longer conversations with prospects by email and text, supporting a greater range of inquiries and providing a seamless transition to a real agent when it can’t answer a question. See the screenshot below for an example of what a conversation with the virtual agent looks like within Funnel’s messaging platform.
Pitfalls with Machine Learning and How to Tackle Them
The virtual agent is designed to recognize the most common requests and inquiries prospective renters make, like scheduling a tour or inquiring about utilities. In this way, it’s expected to deliver significant benefit to management companies on day one. However, the most straightforward use of machine learning to build the virtual agent can introduce biases in terms of the kinds of messages, and therefore renters, the agent recognizes and responds to. Before discussing these biases, I’ll briefly review how the virtual agent recognizes the various kinds of inquiries renters make.
There are many ways of extracting information from messages using machine learning, but broadly speaking, these methods are distinguished based on whether they require the messages to be annotated for specific content by hand first, or whether they only use the raw messages. Because the virtual agent is mainly intended to respond to the most common inquiries using specific information provided by management companies, it made sense to annotate the messages first to identify all instances of these inquiries and other valuable information. Below is a figure of the frequency of the inquiries representing some of the information gathered by the virtual agent.
Using the annotated messages, the machine learning algorithm optimizes a model that relates messages to their corresponding inquiries. The details vary on the exact algorithm used, but generally, successful algorithms learn patterns in the way different inquiries are phrased, while disregarding irrelevant details in the message.
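To make the idea of learning from annotated messages concrete, here is a minimal sketch that uses a multinomial Naive Bayes classifier written with only the Python standard library. It is a stand-in chosen for brevity, not the algorithm behind the virtual agent, and the intent names and example messages are hypothetical.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

class NaiveBayesIntentModel:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, messages, intents):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies
        self.intent_counts = Counter(intents)    # intent -> number of messages
        self.vocab = set()
        for text, intent in zip(messages, intents):
            for word in tokenize(text):
                self.word_counts[intent][word] += 1
                self.vocab.add(word)
        return self

    def predict_proba(self, text):
        total = sum(self.intent_counts.values())
        log_scores = {}
        for intent, n in self.intent_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[intent] = score
        # Convert log scores into probabilities that sum to one.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)

# Hypothetical annotated examples; a real dataset would be far larger.
train_messages = [
    "Can I schedule a tour this weekend?",
    "I would like to book a viewing tomorrow",
    "Are utilities included in the rent?",
    "Which utilities do I pay for?",
]
train_intents = ["schedule_tour", "schedule_tour", "utilities", "utilities"]

model = NaiveBayesIntentModel().fit(train_messages, train_intents)
print(model.predict("Could I book a tour on Friday?"))
```

Even in this toy version, words the model never saw during training contribute nothing to the prediction, which is exactly why messages that are rare in the data end up poorly served.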
As the predictions of the model ultimately reflect the data it was provided, if there are biases in the messages or in the way the annotators labeled them, those biases will make their way into the model. An example of this involving the virtual agent had to do with foreign language detection. The overwhelming majority of renter messages in our system are in English, but there are some messages written in other languages, most notably Spanish. Because these messages only constitute a small percentage overall, the model can’t learn the relationship between the foreign-language words and their corresponding content. Additionally, because annotators sourced by third-party annotation services likely don’t have the requisite knowledge to annotate these messages, foreign-language messages may get skipped and not included in the final annotated dataset. Hiring specialized annotators for different languages may be possible, but it’s difficult and can extend the data collection process. Again, even if it’s addressed, the messages still constitute a small percentage of the data, steering the model away from improving to recognize those examples. The resulting virtual agent may help most customers, but it’ll fail to service a specific, considerable part of the community. Though it might not be reflected in the final performance numbers, this is a major failing that needs to be addressed.
From a theoretical perspective, there are several ways to improve the model, like artificially inflating the proportion of foreign-language messages, but in light of the larger application in which the model is embedded, we landed on a different approach. The philosophy behind the virtual agent is that it should provide immediate feedback on inquiries it’s confident about, but disengage and hand the conversation over to a real agent when it’s not. That way, instead of taking a prospect down the wrong road and detracting from the experience, the virtual agent makes sure that prospect gets personalized service. This approach of handing over the conversation ensures the virtual agent provides relevant feedback while building the real agent’s confidence in the system. It also offers a natural way of addressing foreign-language and other specialized or difficult messages. By building foreign language detection into the virtual agent, we’ve enabled it to address those messages by escalating them to a real agent through a handoff. See the screenshot below.
As you can see, when a foreign language is detected, the virtual agent is disabled, and the handoff reason is displayed to the real agent. Because the handoff reason is tracked throughout the application, the agent can also filter prospective renters by specific handoff reason. That way, if there’s a foreign-language speaker on the leasing team, that agent is able to provide those prospects appropriate service.
In addition to foreign language detection, the virtual agent supports handing off the conversation for various other reasons. These include instances in which frustration or profanity is detected, as well as those in which the agent doesn’t recognize any inquiries or relevant information at all. Many of these scenarios may not correspond to any recognizable bias, but the hope is that this handoff mechanism can cover many instances of messages with non-standard grammar or spelling that may not be well reflected in the training dataset.
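The respond-or-hand-off logic described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the word lists, the confidence threshold, the reason names, and the order of the checks are hypothetical, and a production system would rely on a trained language identifier rather than a keyword heuristic.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny hypothetical word lists, used only to keep the sketch self-contained.
SPANISH_HINTS = {"hola", "gracias", "quiero", "apartamento",
                 "cuanto", "renta", "para", "una"}
PROFANITY = {"damn", "hell"}
CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off, not a value from this post

@dataclass
class Decision:
    action: str                          # "respond" or "handoff"
    intent: Optional[str] = None         # set when the agent responds
    handoff_reason: Optional[str] = None # set when the agent hands off

def words_in(text):
    """Return the set of lowercase words with basic punctuation removed."""
    return {w.strip(".,!?¿¡") for w in text.lower().split()}

def looks_foreign(text, min_hits=2):
    """Crude heuristic: enough Spanish hint words appear in the message."""
    return len(words_in(text) & SPANISH_HINTS) >= min_hits

def decide(text, intent_probs):
    """intent_probs maps intent name -> model probability (may be empty)."""
    if looks_foreign(text):
        return Decision("handoff", handoff_reason="foreign_language")
    if words_in(text) & PROFANITY:
        return Decision("handoff", handoff_reason="profanity")
    if not intent_probs:
        return Decision("handoff", handoff_reason="no_inquiry_recognized")
    intent = max(intent_probs, key=intent_probs.get)
    if intent_probs[intent] < CONFIDENCE_THRESHOLD:
        return Decision("handoff", handoff_reason="low_confidence")
    return Decision("respond", intent=intent)

print(decide("Hola, quiero una cita para ver el apartamento", {}))
print(decide("Can I schedule a tour?", {"schedule_tour": 0.92, "utilities": 0.08}))
```

Recording the handoff reason as a plain field is what would let a leasing team filter conversations by reason, for example to route foreign-language prospects to an agent who speaks the language.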
However appropriate these measures are, they admittedly only represent ad-hoc additions to a virtual agent that uses data or processes that involve recognizable biases. Ultimately, a more principled approach would involve addressing these issues at the root source of the data, or in the training process itself. I’ll discuss one instance of this in the virtual agent, then provide more general ways biases can be addressed in the future.
Recall that one of the goals of Funnel’s H.O.M.E. initiative is to provide underprivileged renters with access to good housing. Because the ability to execute on this requires good and timely service from the moment a prospective renter first reaches out, in many ways the H.O.M.E. initiative begins with the virtual agent. Because the training requires a dataset annotated for specific content, this meant making sure that one of the categories provided to the annotators relates to affordable housing. While this inquiry also constitutes a relatively small percentage of messages (see the figure above), it’s important to capture these inquiries regardless of their overall frequency. The screenshot below shows how the agent handles these messages. Additionally, since affordable housing usually refers only to subsidized housing, we’ve also added another category to address a general concern over finding the most affordable apartment. To inquiries like, “What are your cheapest one bedrooms?”, the agent can provide the most relevant feedback, ordered by price. These kinds of inquiries are especially important to respond to given the uncertainty and hardship related to the pandemic.
By providing support for common messages from possibly underprivileged renters and mechanisms for handing over to a real agent, the virtual agent tries to ensure all prospects get the service they deserve, but there are several ways of improving it. Again, the best way to address biases in the machine learning model is to de-bias the dataset by accurately annotating more of the messages the model currently tends to answer incorrectly. These messages include not only foreign-language messages and those about affordable housing, but all sorts of non-standard messages the agent should be receptive to. In the absence of this careful process of annotation, the final proportions of underrepresented messages can be artificially increased before training, but in general, there’s no good substitute for assessing vulnerabilities in the dataset and re-annotating to address them. In the remainder of this blog, I’ll discuss two general tools that can be used in combination with that strategy to reduce bias in a conversational agent.
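As a rough illustration of artificially increasing the proportion of underrepresented messages before training, the sketch below performs simple random oversampling with the standard library. The function name, the `min_fraction` parameter, and the example labels are hypothetical, and duplicating messages adds no new information, so it complements careful re-annotation rather than replacing it.

```python
import random
from collections import Counter

def oversample(examples, min_fraction, seed=0):
    """Duplicate examples of rare labels until each label has at least
    int(min_fraction * original dataset size) examples.

    `examples` is a list of (message, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for message, label in examples:
        by_label.setdefault(label, []).append((message, label))
    target = int(min_fraction * len(examples))
    result = list(examples)
    for label, items in by_label.items():
        shortfall = target - len(items)
        if shortfall > 0:
            # Sample with replacement from the rare label's own examples.
            result.extend(rng.choices(items, k=shortfall))
    rng.shuffle(result)
    return result

# Hypothetical dataset: 98 common messages and 2 rare ones.
dataset = [("Can I book a tour?", "schedule_tour")] * 98
dataset += [
    ("Do you accept housing vouchers?", "affordable_housing"),
    ("Is this an income-restricted community?", "affordable_housing"),
]

balanced = oversample(dataset, min_fraction=0.25)
print(Counter(label for _, label in balanced))
```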
Introduce Fairness Metrics
While we all possess an instinctive sense of what’s fair, it’s more difficult to distill it down into explicit criteria for what constitutes fairness that a computer can understand. Traditionally, machine learning algorithms optimize performance in terms of overall accuracy, or the number of false positives and negatives, but if they are to produce models that lead to fair treatment across different groups, it’s likely they’ll have to take into account other metrics that explicitly calculate fairness. What makes this difficult is that there isn’t a single definition of what constitutes fairness even within a given application. For the virtual agent, one way of assessing fairness might involve penalizing poor performance on messages from underrepresented groups, but which messages to penalize and to what extent is up for debate. The point is not to devise one authoritative definition of fairness for all time, but to bring awareness to the issue, open up a debate about how to tackle it, and generate consensus about possible solutions. What if presentations of machine learning results included “fairness” as a metric alongside the more common “F1” score? More importantly, what would that mean for future renters in terms of broadening housing access for all?
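One possible way to report such a number, among many, is to measure performance separately for each group of messages and track the gap between the best- and worst-served groups. The sketch below does this with plain accuracy. The group names, intents, and evaluation records are hypothetical, and this is only one candidate definition rather than an agreed-upon fairness metric.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_intent, predicted_intent)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

def fairness_gap(records):
    """Accuracy of the best-served group minus that of the worst-served."""
    scores = accuracy_by_group(records)
    return max(scores.values()) - min(scores.values())

# Hypothetical evaluation records: (group, annotated intent, model output).
records = [
    ("english", "schedule_tour", "schedule_tour"),
    ("english", "utilities", "utilities"),
    ("english", "pricing", "pricing"),
    ("english", "schedule_tour", "utilities"),
    ("spanish", "schedule_tour", "utilities"),
    ("spanish", "pricing", "pricing"),
]

print(accuracy_by_group(records))  # {'english': 0.75, 'spanish': 0.5}
print(fairness_gap(records))       # 0.25
```

Overall accuracy on these six records is about 67%, which hides the fact that one group is served noticeably worse than the other. Reporting the gap alongside the headline score makes that difference visible.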
Discover What Renters Are Talking About
In many ways, the bias problem outlined above was actually introduced by the process of annotating messages with a given set of labels. As we’ve seen, annotators may fail to label certain classes of messages correctly, or the labels they’re provided may not reflect the most important categories from the standpoint of fairness. To address this, the second major approach to machine learning touched on above can be used. Recall that in this approach, information is extracted from the raw messages directly, without annotating them beforehand. Instead of mapping messages to labels, the algorithm groups together messages with similar content, allowing distinct topics to emerge directly from the data. Below is an example of 20 topics extracted from a collection of 100 thousand messages from prospective renters. The circles on the left indicate the topics, with sizes proportional to their frequency in the data. For the selected topic in red, the most common words contained in messages tagged as that topic are shown on the right. As you can see, this topic is dominated by occurrences of words like “job” and “employment,” as well as “pandemic,” “medical,” and “family.” The learned topics don’t always have obvious interpretations, but the appearances of these words are a strong indication that this topic relates to a general concern about the ability to pay rent or engage in the leasing process during the pandemic.
While there may not be an appropriate way for the virtual agent to respond in all instances of a given topic, extracting information from messages directly shows us what the concerns of actual renters are. That kind of information will be important as the virtual agent matures and takes on more kinds of conversations. For example, the topics can be used directly by the virtual agent to provide the most relevant information, or they can be used to create a revised set of labels that annotators can use to more faithfully label messages. This hybrid approach of using results from both labeled and unlabeled data has the potential to improve overall performance, while expanding the range of support the virtual agent can provide. A virtual agent that knows about the concerns of real renters is better able to provide support for all renters.
I just touched on two methods that can help alleviate biases in machine learning models, but there are many other philosophies and tools for tackling this problem. For more information, please see the references at the end of this blog.
Inspired by recent events, Funnel Leasing launched H.O.M.E. to advance the commitment to provide underprivileged renters with access to safe and healthy housing. H.O.M.E. has agreed to donate a dollar for every online application completed on Funnel’s platform, but it’s also helping to make sure Funnel’s other products help underprivileged renters and don’t exhibit biases. The virtual agent is a new Funnel product that is helping to make the vision of an automated leasing process a reality. It has the potential of assisting the majority of prospective renters, but care must be taken not to introduce bias toward any single group. Unfortunately, a straightforward application of machine learning leads to biases in which kinds of messages the virtual agent recognizes. In this blog, I showed a few ways we addressed these in the cases of foreign languages and affordable housing. However, there are many other ways of reducing bias in future versions of the virtual agent, including optimizing fairness metrics and incorporating feedback from dominant topics discussed in renters’ messages. Since access to safe and healthy housing begins from the very first message, in many ways the H.O.M.E. initiative begins with the virtual agent. As that initiative makes clear, eliminating bias is a continual process. As the virtual agent serves more and more users, we’ll continue to reduce biases to ensure that Funnel’s virtual leasing agent is an agent for all renters.
Jake Silberg and James Manyika, “Tackling bias in artificial intelligence (and in humans),” McKinsey Global Institute, June 6, 2019. https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-bias-in-artificial-intelligence-and-in-humans
Jerry Wei, “Bias in Natural Language Processing (NLP): A Dangerous But Fixable Problem,” Towards Data Science, September 1, 2020. https://towardsdatascience.com/bias-in-natural-language-processing-nlp-a-dangerous-but-fixable-problem-7d01a12cf0f7
Marco Peixeiro, “Introduction to Natural Language Processing (NLP) and Bias in AI,” Towards Data Science, April 22, 2019. https://towardsdatascience.com/introduction-to-natural-language-processing-nlp-and-bias-in-ai-877d3f3ee680
Charlton McIlwain, “AI has exacerbated racial bias in housing. Could it help eliminate it instead?” MIT Technology Review, October 20, 2020. https://www.technologyreview.com/2020/10/20/1009452/ai-has-exacerbated-racial-bias-in-housing-could-it-help-eliminate-it-instead
Kawin Ethayarajh, “Measuring Bias in NLP (with Confidence!),” The Stanford AI Lab Blog, November 11, 2020. http://ai.stanford.edu/blog/bias-nlp