As big data initiatives gain steam at organizations, many companies are creating “data lakes” to provide a large number of users with access to the data they need. And as with almost every type of new IT initiative, this comes with a variety of security risks that enterprises must address.
Data lakes are storage repositories that hold huge volumes of raw data kept in its native format until it’s needed. They’re becoming more common as organizations gather enormous amounts of data from a variety of sources.
The growing business demand for analytics is helping to fuel the move to large repositories of data. And data lakes are likely to take on even more significance with the growth of the internet of things (IoT), in which companies will gather data from and about countless networked objects.
“Businesses and consumers are creating data like never before,” says Mohit Aron, founder and CEO of data storage company Cohesity. “In turn, the number of siloed data lakes has exploded, meaning that enterprises are faced with the challenge of protecting separate security perimeters around each data lake.”
For many companies, the promise of data science “is something that simply cannot be overlooked,” says Roger Hockenberry, CEO and founder of technology and management consulting firm Cognitio Group.
“For the executive, the idea of gaining competitive advantage, unique insight and anticipatory intelligence is compelling,” Hockenberry says. “However, in order to generate these outcomes the data scientist is advocating for a data lake. This lake is a combination of proprietary, open source and other datasets that can be analyzed in unique ways.”
It can also be a major target for cyber criminals. “Hacks into data lakes are a continual threat, one that is exacerbated by the large number of data lakes that enterprises have,” says Aron, who as a former Google engineer and lead architect of Google File System 2 has helped build and maintain some of the biggest data lakes in the world.
Considering the high business value of these information resources and the growing risks, security and IT executives need to make data lake security a high priority. To begin with, people at the highest levels of the organization need to understand why data stores must be protected to the greatest extent possible.
Unfortunately, this doesn’t always happen.
“The appeal of increased agility, reduced costs and removal of silos cause many organizations to jump head first into the data lake and ignore basic information governance best practices at their own peril,” says Jonathan Steenland, principal at Zyston CISO Advisory Services, where he is responsible for co-leading CISO advisory and consulting.
“Since data lakes are such a data rich target, hackers will prioritize their efforts at exploiting these types of technologies and the users who connect to them,” says Steenland, who previously served as CISO at Fujitsu.
Data lakes should be managed as a highly valuable corporate asset, Hockenberry says. “In many cases, executives look at this as a ‘tech problem,’” he says. “However, a data lake should be seen as corporate IP [intellectual property] and if someone gains access to it, they could see strategic information that could affect shareholder value, compromise [research and development], and reveal plans and intentions that can create issues for a company.”
The best way to address these issues is to understand what data the enterprise is collecting and how it’s being analyzed, protected and disseminated, Hockenberry says. Business, IT and security executives need to build data-centric risk management strategies to ensure information is protected no matter where it resides, he says.
Hackers, cyber criminals and other bad actors are sure to go after large data stores if they think there is something to gain from these resources and if they sense they are not adequately protected.
“Because of the data they contain, they may be seen as a great target: someone could steal much of the most important and sensitive data that a company owns by stealing the contents of a data lake,” Hockenberry says. Theft is not the only concern. One of the biggest risks companies need to be aware of is ransomware, which can amount to a costly denial-of-service attack by locking a company out of its own data. “The denial of use of corporate data can be far more damaging than simply stealing it,” he says.
The most important security functions with regard to data lakes are authorization and access. Research firm Gartner has warned companies not to overlook the inherent weaknesses of lakes. Data can be placed into a data lake with no oversight of the contents, Gartner analyst Nick Heudecker noted at the firm’s Business Intelligence & Analytics Summit last year.
Many data lakes are being used by organizations for data whose privacy and regulatory requirements are likely to represent risk exposure, Heudecker said. The security capabilities of central data lake technologies are still emerging, and the issues of data protection will not be addressed if they’re left to non-IT personnel, he said.
Many of the current data lake technologies on the market “don’t have fine-grained security controls that allow for multi-faceted control at the object level,” Hockenberry says.
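To make the idea of control “at the object level” concrete, here is a minimal sketch in Python. It is not drawn from any particular data lake product; the class names, labels and object keys are all hypothetical, and it assumes each object in the lake carries a set of sensitivity labels. Access is denied by default, and every decision is recorded so it can be audited later.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """One object in the lake, tagged with hypothetical sensitivity labels."""
    key: str
    labels: frozenset


@dataclass(frozen=True)
class User:
    """A user and the labels that user is cleared to read (hypothetical)."""
    name: str
    clearances: frozenset


@dataclass
class ObjectLevelGuard:
    """Deny-by-default check that records every decision for later audit."""
    audit_log: list = field(default_factory=list)

    def can_read(self, user: User, obj: DataObject) -> bool:
        # Grant access only if the user holds every label on the object.
        allowed = obj.labels <= user.clearances
        self.audit_log.append(
            (user.name, obj.key, "ALLOW" if allowed else "DENY")
        )
        return allowed


guard = ObjectLevelGuard()
analyst = User("analyst", frozenset({"public", "sales"}))
sales_data = DataObject("raw/sales/orders.csv", frozenset({"sales"}))
rd_notes = DataObject("raw/rd/prototype-notes.txt", frozenset({"rd", "restricted"}))

print(guard.can_read(analyst, sales_data))  # analyst holds the "sales" label
print(guard.can_read(analyst, rd_notes))    # analyst lacks "rd" and "restricted"
print(guard.audit_log)
```

The point of the sketch is the granularity: the decision is made per object and per user rather than once at the perimeter of the lake, which is the kind of control Hockenberry says many current products lack.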
The promise of data science and the data lake can only be realized by the free flow and joining of very large data sets. “This freedom creates opportunity, but is also harder to manage from a security perspective,” Hockenberry says. “Executives should ask questions about access, encryption, and tracking of data throughout its lifecycle in the enterprise.”
Organizations need to ensure that they have appropriate access and authorization controls, strong identity management and audit processes in place. Most importantly, they need a robust and well-tested incident response plan “that can quickly determine what, how much, and to what extent data has been compromised in the enterprise, and how to quickly restore not only functionality but trust in data once an attack has been successfully executed,” Hockenberry says.
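One way an incident response team might determine what has been compromised, sketched here in Python under stated assumptions, is to compare the lake’s current contents against a manifest of cryptographic hashes recorded before the attack. The function names and sample objects below are hypothetical, the lake is modeled as a simple in-memory dictionary, and the sketch assumes the baseline manifest is stored somewhere an attacker who reaches the lake cannot alter it.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of an object's bytes."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(objects: dict) -> dict:
    """Map each object key to the fingerprint of its current content."""
    return {key: fingerprint(data) for key, data in objects.items()}


def diff_against_manifest(manifest: dict, objects: dict) -> dict:
    """Report which objects changed, vanished or appeared since the baseline."""
    current = build_manifest(objects)
    return {
        "modified": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        "missing": sorted(k for k in manifest if k not in current),
        "unexpected": sorted(k for k in current if k not in manifest),
    }


# Hypothetical lake contents and a baseline taken while the data was trusted.
lake = {
    "raw/a.csv": b"id,amount\n1,10\n",
    "raw/b.csv": b"id,amount\n2,20\n",
}
baseline = build_manifest(lake)

# Simulate an attack: one object altered, one deleted, one planted.
lake["raw/a.csv"] = b"id,amount\n1,9999\n"
del lake["raw/b.csv"]
lake["raw/readme.locked"] = b"unexpected file"

report = diff_against_manifest(baseline, lake)
print(report)
```

A report like this only shows which objects differ from the baseline; it does not reveal whether data was read or copied, which is why access logging and audit processes remain necessary alongside it.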
Deploying data encryption where it makes sense is another key step. “Each data lake becomes an endpoint with unique vulnerabilities,” Aron says. “Data at rest should always be encrypted, without exception. Self-encrypting drives make it easier to ensure data is secure from the get-go.”
The recent string of high-profile hacks is serving to remind organizations that security should remain a top concern in any data architecture, Aron says. “The world is producing exponentially more data, and inevitably enterprises are creating more and more data lakes to house these new streams of data,” he says. “These disparate data silos create a headache for the security community because there are inevitably more doors for hackers to try and penetrate.”
It’s safe to assume that threats against data lake technologies will increase significantly as they become more mainstream, Steenland says. “However, the biggest threat will likely be insider threats due to inadequate deployment and configuration of these technologies,” he says.
All the more reason for executives to add data lakes to their list of key resources to protect.
“Companies should take the same types of steps as they would securing any type of data to include giving consideration as to who needs access to the data and how it will be used, ensuring strong access controls exist and logging is in place,” Steenland says. “Some level of information governance is still required, especially if the data includes regulated data.”