Analyzing big data is big science at Change.org

For Change.org, managing big data and analyzing results is a big science problem that technology is helping to solve.

It can be challenging for organizations to weave big data into new applications in a meaningful way. It's one thing to create a new dashboard for executives, but quite another to drive user interaction toward change in the real world. At the Chief Analytics Officer Forum in San Francisco, Andy Veluswami, director of Data Science and Analytics at Change.org, talked about how a small team of developers and data scientists built a new kind of social network for effecting change.

Change.org allows anyone to start a petition to make the world a better place. Its community has grown to more than 150 million people in eight years. Many of these petitions have had a real impact, including protecting the right to vote, ending genital mutilation in Uganda, overturning unfair convictions and providing medical treatment to children in Spain. The platform makes it easy for supporters to spread a petition virally through social media channels like Facebook. "Every victory inspires more petitions, creating a virtuous cycle of change," Veluswami said.

The service grows by analyzing big data, connecting people interested in specific types of causes and bringing them back when related causes emerge. For example, a petition last summer protesting the killing of 10,000 dogs at a festival in China quickly attracted 10 million signers. The campaign didn't stop the dog massacre, but many of the same signers were later quickly enrolled in another, successful campaign to end the transport of exotic hunting trophies.

The big science of analyzing big data

The data science team at Change.org is divided into groups for quantitative analysis, machine learning, data engineering and content science. The quantitative analysis group focuses on gathering statistical data. The machine learning group focuses on modeling. The data engineering group cleans and enriches the data in the data warehouse. The content science group handles the big science task of maintaining the taxonomy needed to correlate different words for the same concept.

The core application relies on a campaign analytics service for analyzing petitions, which can be queried via APIs. For example, the team built an integration with the Slack chat service that shares alerts about spikes in campaigns. Veluswami said, "When we put it online, everyone in the company started joining in. It was a great way for us to do analytics evangelism as it was happening."
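As a rough illustration of how a spike alert of this kind might work, the Python sketch below flags any hour whose signature count far exceeds the recent average. The article does not describe Change.org's actual detection method, so the function names, the six-hour window, the threshold and the sample numbers are all assumptions, and the sketch only formats an alert message rather than calling any chat service.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=6, threshold=3.0):
    """Return indices whose count exceeds the trailing mean by
    `threshold` standard deviations (hypothetical rule, not Change.org's)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 so a perfectly flat history
        # does not turn tiny fluctuations into alerts.
        sigma = max(pstdev(history), 1.0)
        if counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


def format_alert(petition, hour, count):
    """Build the text a chat integration could post (message format is assumed)."""
    return f"Spike on '{petition}': {count} signatures in hour {hour}"


# Made-up hourly signature counts for one petition.
hourly = [12, 15, 11, 14, 13, 12, 14, 95, 13]
spikes = find_spikes(hourly)
alerts = [format_alert("Example petition", i, hourly[i]) for i in spikes]
print(alerts)
```

A trailing window keeps the rule cheap to evaluate as new counts arrive; a production system would likely also account for time-of-day patterns.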

Once a new spike has been identified, different people in the company can start to think of how to drive the campaign or connect it to related ones. The information about topic growth also helps guide the Change.org management team in identifying verticals to invest in. For example, they know that animal rights issues do well in terms of virality. This allows them to enhance their focus on these areas to encourage virality and the growth of the platform in general.

The back-end engine also makes it easy for users to connect others to a campaign. Change.org uses a collaborative filter to look at a social graph of how users are connected by the petitions they have both signed. Users are then grouped into clusters to identify others who might be interested in a similar petition. A similar algorithm powers the recommendation engine, which helps petitioners identify related causes that might be important to them.
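The idea of connecting users through petitions they have both signed can be sketched in a few lines of Python. This is a minimal neighborhood-style collaborative filter under stated assumptions: the user names, petition IDs and the choice of Jaccard similarity are hypothetical, and the clustering step the article mentions is left out for brevity. It is not a description of Change.org's engine.

```python
from collections import Counter


def recommend(signatures, user, top_n=3):
    """Suggest petitions signed by users who overlap with `user`.

    signatures: dict mapping a user name to the set of petition IDs signed.
    """
    mine = signatures[user]
    scores = Counter()
    for other, theirs in signatures.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared petitions, so no edge in the co-signing graph
        # Jaccard similarity: shared petitions over all petitions either signed.
        similarity = overlap / len(mine | theirs)
        for petition in theirs - mine:
            scores[petition] += similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [petition for petition, _ in ranked[:top_n]]


# Made-up data for illustration only.
signatures = {
    "ana": {"save-dogs", "ban-trophies"},
    "ben": {"save-dogs", "ban-trophies", "protect-dolphins"},
    "cy": {"save-dogs", "stadium-funding"},
    "dee": {"stadium-funding", "term-limits"},
}
print(recommend(signatures, "ana"))
```

Here "ana" shares two petitions with "ben" and one with "cy", so the petition only "ben" signed ranks first.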

Implement rich taxonomies to connect the dots

A key part of analyzing big data is developing a rich taxonomy. Building the core taxonomy involves analytics, natural language processing and human curation. Veluswami said that different areas like sports, politics, human rights, animal rights and entertainment have different behaviors. Politics and sports fans tend to have long-term retention, but their engagement comes and goes. Entertainment petitioners tend to create petitions that grow quickly. Some topics are easy to classify. If a petition has the word "dog" or "dolphin," it is likely related to animal rights.

But other words are harder to classify. For example, the data science team has broken human rights down into more specific terms that change over time, such as "migrants," which is now being replaced by "refugees." There are also polarized topics that are important to distinguish. "Gun rights" and "gun safety" petitions appeal to very different audiences. Veluswami said enterprises can apply many of these principles to improve product recommendation engines and customer experience.
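The taxonomy ideas in this section can be illustrated with a small keyword tagger in Python. The table below is hypothetical: it maps several surface terms to one concept (so "migrants" and "refugees" land together) and keeps polarized phrases such as "gun rights" and "gun safety" as separate concepts. The topic assigned to each term is an assumption for the example, and a real taxonomy would also depend on the natural language processing and human curation described above.

```python
import re

# Hypothetical taxonomy: term or phrase -> (topic, concept).
TAXONOMY = {
    "dog": ("animal rights", "dogs"),
    "dolphin": ("animal rights", "dolphins"),
    "migrant": ("human rights", "displaced people"),
    "refugee": ("human rights", "displaced people"),
    "gun rights": ("politics", "gun rights"),
    "gun safety": ("politics", "gun safety"),
}


def tag_petition(title):
    """Return the set of (topic, concept) tags whose terms appear in the title."""
    text = title.lower()
    tags = set()
    for term, tag in TAXONOMY.items():
        # Match whole words only, allowing a simple plural "s".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        if re.search(pattern, text):
            tags.add(tag)
    return tags


print(tag_petition("Stop the dog festival"))
print(tag_petition("Welcome refugees") == tag_petition("Protect migrants"))
print(tag_petition("Defend gun rights") == tag_petition("Pass gun safety laws"))
```

Keeping the concept separate from the surface term means that when vocabulary shifts, only the table needs a new entry, while everything keyed on the concept stays unchanged.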
