Data in its organic form is, well, just data: neither good, bad, nor stimulating. In fact, it's meaningless without interpretation. But when collected on a massive scale, with purpose, it takes on a much more significant title: big data. Big data projects are now more widely accessible to the public, and many people are using their contents for social good.
Big data becomes valuable once it is processed, analyzed, and applied to a project or initiative. It offers power to those who use it strategically. Trained professionals analyze big data before creating products and services that have a lasting impact on society and business alike. We call this good data.
Data, like organisms in the natural world, must go through a transformation before it becomes valuable to an ecosystem, just as a caterpillar becomes a butterfly. We can't see the beauty of its mature form until we craft it into something meaningful.
Even so, data, much like the rest of the earth, is currently under the influence of mankind in a way that most often benefits a company's bottom line. But through discovery and learning, more and more businesses support and harness the power of big data as an act of social responsibility. Now, this is what we call good data.
We recognize that good data is subjective, but these modern projects serve as a glowing north star for the positive change big data can offer both society and business. The following ten examples of big data for good offer a glimpse into decision-making inspired by data, and into insights that might otherwise be missed.
Monitoring the Numbala Reserve With Big Data Projects
In 2006, Nature and Culture International (NCI) purchased 1,260 acres of rainforest in Ecuador that had previously been approved for logging. Since the purchase, NCI has roughly doubled the size of this reserve, to an estimated 2,552 acres. Within this dense forest lives a tree found in only one region of the world: the "romerillo." The native wood fetches such high prices on the illegal market that it is valued at more than ten times the average monthly income of rural Ecuadorians. And although government intervention has established legal boundaries, the illegal timber trade continues in this forest.
Global Forest Watch was created in 1997 to serve as a monitoring network over the affected region. Organized by the World Resources Institute, the project has been live for nearly twenty years, relying on data sharing and visualization. Global Forest Watch allows NCI to monitor the Numbala Reserve with interactive maps. Collaborations such as this are a testament to the power of data sharing among conservation groups.
The project is made possible by partner satellites that cover the Earth's surface and transmit sorted data to maps on the Global Forest Watch website within seconds. It's a firsthand resource for users who would otherwise rely on foot patrols to gather data and then slowly analyze it over time, often too late to act against threats.
Not only can this satellite data be accessed on a global scale, but it also reduces risk to both the public and business sectors. And it’s cheap! To read the full story, visit Western Digital’s “Data Makes Possible” blog, and see how big data is driving change for conservation organizations.
Stabilizing Fisheries Management with “Aquagenomics”
A leader in genomics research and solutions, Illumina is accustomed to working with big data. But did you know that its solutions extend into fisheries management?
As the company explains in its article "Seafood Offers Opportunity to Feed Growing Human Population," the global population is predicted to reach 9.7 billion people within the next thirty years, and our ocean environment will play a key role in sustaining a food source for that many people. To meet this global challenge, Illumina is using an approach it calls "aquagenomics."
Commercial fishing currently relies on data to estimate fish stocks around the globe, but this system has proven mediocre at best. With a DNA-barcoding approach built on next-generation sequencing (NGS), Illumina offers a faster and more accurate way to identify fish eggs. This technology has benefited both the wild-caught and farm-raised fishing industries by increasing the number of samples that can be processed, opening the door for more data to be collected and analyzed.
Illumina's genomics solutions produce massive amounts of data, with tens of thousands of genes processed at any given time. Considering the gene interactions biologists must account for in the field, it's easy to see how older methods of data collection and analysis could be overwhelmed when searching for favorable traits among fish stocks. With the human population showing no signs of slowing its growth, this modern approach to big data is a positive sign for fisheries management. As Illumina's scientists put it, "While the value of an individual fish is nowhere near the value of a cow or a pig, the value of broodstock families is quite high, and the use of genomic tools on the broodstock shows great promise."
Street View Cameras Sniff Out Pollution
Google commissioned three Street View vehicles in the greater Denver area to do more than take street-view photos. Along with cameras, the cars were outfitted with environmental sensors to detect nine major air pollutants. With the project already heading to San Francisco, Google and Aclima (the sensor maker) intend to make the collected data openly available to the public.
The sensors run for months on end, collecting large quantities of environmental samples to measure air quality as accurately as possible. During the company's testing period alone, Aclima deployed 500 sensor units (each composed of 12 sensors) to detect and record a wide range of air-quality components. Aclima's CEO recognizes that users need reliable data before they will adopt it as a daily decision-making tool, so the company partnered with the EPA and the Environmental Defense Fund to refine its instruments.
As this mapping project continues, all parties hope to build a vast network of sensors open to the general public and to anyone interested in helping monitor pollution. With tens of thousands of sensors in the field, if not more, the project could begin crowdsourcing this big data for numerous causes, mapping the findings for display on digital devices. This is yet another example of data being used for 'good,' or in this case, data for climate action.
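How might thousands of crowdsourced readings become a map? One common approach, sketched here in Python purely as an illustration, snaps each reading to a grid cell and summarizes each cell with a median, which resists outliers from a single faulty sensor. The reading format (latitude, longitude, value), the 0.01-degree cell size, and the `min_count` threshold are all assumptions for this example, not details of the Google and Aclima pipeline.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical grid resolution in degrees (about 1.1 km north-south); not from the source.
CELL_DEG = 0.01

def cell_of(lat, lon, size=CELL_DEG):
    """Snap a coordinate to the integer index of the grid cell that contains it."""
    return (math.floor(lat / size), math.floor(lon / size))

def map_cells(readings, size=CELL_DEG, min_count=3):
    """Group (lat, lon, value) readings by cell and return each cell's median value.

    Cells with fewer than `min_count` readings are dropped, on the assumption that
    a single pass by one sensor is a weak basis for a map.
    """
    bins = defaultdict(list)
    for lat, lon, value in readings:
        bins[cell_of(lat, lon, size)].append(value)
    return {cell: median(values) for cell, values in bins.items() if len(values) >= min_count}

# Hypothetical data: three nearby readings and one stray reading elsewhere.
readings = [
    (39.7391, -104.9901, 10.0),
    (39.7395, -104.9905, 20.0),
    (39.7398, -104.9902, 30.0),
    (39.8000, -105.0000, 99.0),
]
print(map_cells(readings))
```

The three nearby readings share one cell and are summarized by their median, while the lone stray reading falls below `min_count` and is left off the map.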
Big Data Projects for Cancer Research
What if data scientists could lead researchers on the path to curing cancer? After all, is the gap between biology and IT really that large? For Dr. Bissan Al-Lazikani, head of data science at The Institute of Cancer Research, the answer is no. In her research, Al-Lazikani has found that biology has produced unimaginable amounts of data over time, and that computational analysis of this data has never been more important to cancer discovery.
In her role, Al-Lazikani has access to huge datasets collected from cancer patients during treatment. In a discussion with I-CIO by Fujitsu, Al-Lazikani explains, "We are now capable of collecting amounts of data that we never thought were possible before." In fact, to tailor treatment to each patient, her team estimates it could collect 50 terabytes of data per person, more data than the Hubble telescope produces in years. Fortunately, Al-Lazikani and her team describe themselves as "data hungry," and look forward to innovating cancer treatment through their discoveries. To learn what the team is doing with this much data, continue reading "How big data analytics is transforming cancer research" from I-CIO.
Turning The Tide Against Modern-Day Slavery
Technology is commonly exploited by criminals. It's a fact. But as Teradata points out, our world is interconnected, and big data, along with data analytics, presents a prime opportunity to monitor vast criminal networks that would otherwise go undetected. Human trafficking, some of the worst criminal activity on earth, is one example of a crime that can now be detected with the help of data. Recent news out of San Antonio, Texas, reminded us just how real a problem human smuggling is in this country.
In its latest article on big data, Teradata reminded us of the immense power of big data projects for social good. Human trafficking, in particular, leaves massive data trails for people and machines to analyze and act upon, and a handful of organizations are now doing just that.
Aided by a grant from Google, the Polaris Project in the U.S., La Strada International in Eastern Europe, and Liberty Asia launched the Global Human Trafficking Hotline Network, which uses big data technology to collect, analyze, and warehouse critical information in order to prevent crime and disseminate information to the public. As a leader in data and analytics, Teradata is helping such organizations leverage data for good.
Predicting Pollution Hazards In New York
Predicting which facilities will violate hazardous waste rules in New York is a daunting task for a single agency. Although we tend to picture this waste in cartoonish terms, it is often far more subtle, including many hard-to-detect toxins that quietly permeate the environment.
Just two years ago, the Animas River in Durango, Colorado, was flooded with three million gallons of toxic mine waste after a miscalculation by the EPA at a nearby cleanup site. Although this spill changed the ecosystem irreversibly, it also highlighted an urgent need for technology-driven monitoring systems, especially in densely populated cities. It was a direct call for big data tools.
Most recently, New York State's Department of Environmental Conservation (NYSDEC) pointed out that events like the one in Colorado happen more often than we hear about in the news, and usually go undetected. As a state regulatory agency, NYSDEC currently performs 700 inspections a year on waste facilities across the state. That might sound acceptable until you learn that more than 25,000 facilities in New York alone must be inspected; at 700 inspections a year, visiting each one even once would take more than 35 years. And the agency employs only three inspectors! Clearly, this isn't adequate public protection against water and air contamination.
In response, NYSDEC has turned to machine learning, tapping multiple public data sources for facility attributes that are then fed into a model inspectors can use. Not only can environmental agencies use data modeling to predict where the next threat will come from (in reality, thousands of threats), but they can also use this data to plan hundreds of visits, with data on each at-risk facility loaded in advance. A brief presentation from DSSG Data Fest walks through an example.
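The heart of this approach is ranking: with capacity for only 700 inspections a year, the agency wants each visit to go where a violation is most likely. The Python sketch below illustrates risk-based prioritization under loudly stated assumptions: the facility attributes and weights are hypothetical stand-ins for a trained model's inputs and coefficients, and the logistic score is one common choice, not a description of the model NYSDEC or the DSSG team actually built.

```python
import math
from dataclasses import dataclass

@dataclass
class Facility:
    # Hypothetical attributes; the real model's features are not described in the source.
    name: str
    past_violations: int
    years_since_inspection: float
    handles_acutely_toxic_waste: bool

# Hypothetical weights standing in for coefficients a trained model would learn.
BIAS = -3.0
W_PAST_VIOLATIONS = 0.8
W_YEARS_SINCE_INSPECTION = 0.15
W_ACUTELY_TOXIC = 1.2

def violation_risk(f: Facility) -> float:
    """Logistic score in (0, 1), read as a predicted probability of a violation."""
    z = (BIAS
         + W_PAST_VIOLATIONS * f.past_violations
         + W_YEARS_SINCE_INSPECTION * f.years_since_inspection
         + W_ACUTELY_TOXIC * (1.0 if f.handles_acutely_toxic_waste else 0.0))
    return 1.0 / (1.0 + math.exp(-z))

def plan_inspections(facilities, capacity):
    """Return up to `capacity` facilities, highest predicted risk first."""
    return sorted(facilities, key=violation_risk, reverse=True)[:capacity]

# Hypothetical facilities for illustration.
facilities = [
    Facility("A", past_violations=0, years_since_inspection=2, handles_acutely_toxic_waste=False),
    Facility("B", past_violations=3, years_since_inspection=6, handles_acutely_toxic_waste=True),
    Facility("C", past_violations=1, years_since_inspection=10, handles_acutely_toxic_waste=False),
]

for f in plan_inspections(facilities, capacity=2):
    print(f.name, round(violation_risk(f), 2))
```

With these made-up numbers, facility B ranks first and C second, so with `capacity=2` facility A is deferred to a later cycle.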
Mapping Risk With The Aqueduct Global Maps
Water quantity, water variability, and water quality; public awareness of water issues, access to water, and ecosystem vulnerability are all critical factors for nearly every community and business around the globe. World Resources Institute has observed a need for robust data on the physical, regulatory, and reputational water risks facing companies and their investors. More importantly, as the world's resources grow scarcer, organizations need comprehensive data to properly assess the water-related risks in their supply chains.
Responding to this demand, World Resources Institute created the Aqueduct Water Risk Atlas, a digital service mapping twelve key indicators of global water risk. The risk atlas aggregates massive amounts of data from public and research-based sources to model both current and potential hazards across the world. The interactive tool is freely available to the public and visualizes the data so users can better understand regional differences in risk and opportunity.
Learn more about the project: http://www.wri.org/our-work/project/aqueduct/about
University Students Design a Better Mass Transit
When you think of Seattle, a bustling mass-transit system probably doesn't come to mind; we save that thought for New York City. But that may change as students use 'good' code and 'good' data to create Seattle's next mass-transit solution. Data Science for Social Good, a summer program at the University of Washington, brings students to the city to work with experts who connect them to data and tools.
With a theme of "urban science," students analyzed years of data collected by authorities, municipalities, and contractors to build a modern-day transit solution around ORCA (One Regional Card for All), the region's transit fare card. Even more impressive, the students created this solution in just ten weeks, and it produced more data than any prior project, all of it useful data that can serve passengers and transportation employees alike. For example, companies can see what percentage of their employees commute via mass transit and track this trend over time. Projects like this remind us how much impact data can have on existing systems, as long as we keep designing new applications and visualization tools.
Read the full story on TechCrunch: "Student projects leapfrog governments and industry in 'Data Science for Social Good' program"
Saving the Amazon With Wisdom of Crowds
Tracking rampant deforestation takes an army, but not the army you might be picturing. It takes large volumes of data, which anyone who shares a common goal can create: protecting their rainforest, or better yet, their livelihood, from permanent destruction.
An initial crowdsourcing website deployed by International Business Machines (IBM) served as the foundation for strategy and marketing initiatives aimed at helping landowners comply with government regulations. Although the site was helpful, Brazilian officials needed more to prevent illegal activity from happening in the first place. This led to the introduction of PAM (Municipal Environmental Portal App), a system for tracking land-ownership records and land use.
But implementing PAM meant two things: Brazilian officials now had more data than they were accustomed to, and they had to find innovative ways to build a remote Wi-Fi network. Using the system's data collection capabilities, they surveyed 400,000 colleagues to gather feedback on a possible communication network. The result: a solar-powered drone broadcasting network that handles nearly all of the region's land-ownership and conservation-related data needs. Although it was only a starting point for conservation, it's a huge leap for technological innovation in the region, and a great example of what good data can accomplish.
The Animal Kingdom Needs Even Bigger Data
Collecting, managing, and analyzing biodiversity and climate data from 16 sites on four continents is a tall order for any team of scientists. When Hewlett Packard (HP) spoke with researchers and scientists at Conservation International (CI), they named these as their biggest challenges. Upon hearing this, the HP team, which was interested in supporting biodiversity research, realized it could design a first-of-its-kind data system to assist CI. For example, the system would excel at data collection, working nine times faster than the scientists could on their own. In a Q&A on the project, HP Vice President and Chief Progress Officer Gabi Zedlmayer voiced her support for the big data tool, aptly named "Earth Insights."
Collecting good data from more than 1,000 cameras and sensors on the ground, Earth Insights pushes out warning signals to scientists before it's too late to act. Not long after launch, Earth Insights had already produced 3 terabytes of data and more than 1.4 million photos, not to mention all the climate readings recorded along the way. Offloading the burden of recording and managing the data has freed researchers to do what they do best: design solutions in response to data. And with the HP Vertica platform, data processing speeds increased by nearly 90%. It's no secret that big data can have an immense impact on society and our business-driven world. But this collaboration shows us that the environment can also benefit from technology.
Big data projects are no longer exclusive to IT teams. They are ripe for the taking, ready to solve a wide array of problems. From machine learning to aggregation and analytics, big data makes once-impossible strategies possible. And with these possibilities comes a growing use of big data for social 'good.'