Hortonworks : 4 essential steps for managing sensitive data in your data lake

January 05, 2018 at 01:39 pm

By: Balaji Ganesan, CEO of Privacera

How to leverage data discovery, control, anonymization and monitoring using Privacera, Apache Atlas and Ranger

Data is growing in data lakes, and so are security and compliance risks. These risks stem from storing and processing sensitive data.

Forrester defines toxic data (its term for sensitive data) as a combination of 3P + IP. The 3Ps are PII, PHI and PCI data, while IP refers to intellectual property. Essentially, sensitive data is the data that carries the biggest risk if it is compromised, leaked or accessed inappropriately.

So how do companies manage sensitive data in their growing data lakes?
Option A - Do not bring any sensitive data into the data lake. This option limits exposure but also limits the use cases that can be built on the data lake.
Option B - Bring any kind of data into the data lake but institute rigorous standards for managing data risks. This option unlocks the power of big data but also requires teams to spend time building security and governance standards.

This blog is intended for companies that plan to ingest any kind of data into the data lake and enable business teams to use that data. Here are the four essential steps data teams should follow to manage sensitive data and the associated security and compliance risks:

Automated data classification. Without proper data classification, security and governance teams cannot institute proper controls or get visibility into risks. Data classification is the foundation for the modern data lake.
Access control. Companies need to put controls in place to restrict access to sensitive data and ensure access is granted on an as-needed basis.
Anonymization. To reduce exposure, institute policies to anonymize data as it is ingested into the data lake. Different methods of data protection are available for different use cases.
Monitoring. Big data gives users the power to combine, transform and move data. Institute monitoring to detect potential data loss or any behavior that leads to a compliance or security violation.

Privacera is a fast-growing data security and governance startup and a leading Hortonworks partner. The Privacera platform integrates with Apache Atlas and Apache Ranger and extends the security controls available in HDP to provide comprehensive functionality for data teams managing sensitive data.

Here is how Privacera + HDP can help with the four steps outlined above to manage data-related risks effectively.

Automated Data Classification

Privacera combines machine learning and NLP with built-in rules to discover and classify sensitive data precisely. Privacera connects to HDFS, Hive and other data stores, and analyzes content, context and metadata to identify and classify data. It can scan structured and unstructured data, either as the data lands in the data lake or while it is stored in HDFS.
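
To make the rule-based part of classification concrete, here is a minimal sketch in Python. It is not Privacera's implementation: the rule set, the tag names, the function names and the 50% column threshold are all assumptions chosen for illustration, and a production classifier would add the machine learning, NLP and context signals described above.

```python
import re

# Hypothetical rule set: tag name -> pattern (illustrative only).
RULES = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def classify_value(value):
    """Return the set of tags whose rule matches a single value."""
    tags = set()
    for tag, pattern in RULES.items():
        for match in pattern.finditer(value):
            # Content check beyond the pattern: reject card-like numbers
            # that fail the checksum, to cut false positives.
            if tag == "CREDIT_CARD" and not luhn_valid(match.group()):
                continue
            tags.add(tag)
    return tags


def classify_column(values, threshold=0.5):
    """Tag a column when at least `threshold` of its non-empty values match."""
    non_empty = [v for v in values if v]
    counts = {}
    for v in non_empty:
        for tag in classify_value(v):
            counts[tag] = counts.get(tag, 0) + 1
    return {t for t, c in counts.items() if c / len(non_empty) >= threshold}
```

Calling classify_column on a sample of a Hive column or on lines from an HDFS file yields the set of tags to attach to that column or file. Sampling and thresholding keep a single stray match from tagging a whole column.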

Privacera then pushes the resulting tags and metadata into Apache Atlas, where they can be searched and queried through the Atlas UI or APIs.
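
As a rough illustration of what pushing and querying tags through the Atlas APIs can look like, the sketch below builds (but does not send) two requests against the Atlas v2 REST API: one that attaches classifications to an entity and one that searches by classification. The host name, the entity GUID and the helper function names are placeholders, the tag names are assumed to exist already as classification types in Atlas, and the endpoint paths should be checked against the Atlas version in use. How Privacera itself talks to Atlas is not described in this post.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL; replace with the real Atlas host and port.
ATLAS_BASE = "http://atlas.example.com:21000/api/atlas/v2"


def build_classification_request(entity_guid, tags):
    """Build the URL and JSON body for a POST that tags one entity.

    The body is a list of classification objects, one per tag name.
    """
    url = f"{ATLAS_BASE}/entity/guid/{entity_guid}/classifications"
    body = [{"typeName": tag} for tag in sorted(tags)]
    return url, json.dumps(body)


def build_search_url(tag):
    """Build the URL for a basic search (GET) by classification name."""
    return f"{ATLAS_BASE}/search/basic?" + urlencode({"classification": tag})
```

The returned URL and body can be handed to any HTTP client together with the cluster's authentication settings.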

Access Control

Once data is ingested into the data lake, fine-grained access control policies need to be implemented to ensure users get access to data only on an as-needed basis.
Using Apache Ranger, data teams can construct policies based on data sensitivity levels. Through the Apache Atlas and Ranger integration, metadata discovered by Privacera is pushed into Apache Atlas and Ranger. Administrators can then construct tag-based policies that enable or restrict access to any sensitive data.
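
The idea behind tag-based policies can be sketched in a few lines of Python. This is a simplified model, not Ranger's policy engine: the TagPolicy class, the group names and the rule that a matching deny always overrides an allow are assumptions made for the example, and the sketch denies by default where a real deployment would fall back to resource-based policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical tag-based policy: who may do what to data with a tag."""
    tag: str
    groups: frozenset
    access_types: frozenset
    allow: bool = True


def is_allowed(policies, user_groups, resource_tags, access_type):
    """Evaluate tag policies for one request.

    A matching deny policy wins immediately; otherwise at least one
    matching allow policy is required (default deny).
    """
    matched_allow = False
    for policy in policies:
        if policy.tag not in resource_tags:
            continue
        if access_type not in policy.access_types:
            continue
        if not policy.groups & set(user_groups):
            continue
        if not policy.allow:
            return False
        matched_allow = True
    return matched_allow


POLICIES = [
    TagPolicy("PII", frozenset({"compliance"}), frozenset({"select"})),
    TagPolicy("PII", frozenset({"contractors"}), frozenset({"select"}), allow=False),
]
```

Because the policy is attached to the tag rather than to a table or path, any newly discovered column tagged PII is covered without writing a new policy.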

Anonymization

Compliance and privacy regulations mandate that personal information be anonymized and encrypted at rest and while being accessed. As data lakes grow, sensitive data may need to be anonymized to reduce exposure and manage compliance and security risks.

Privacera extends the dynamic anonymization feature available in Ranger with the ability to apply format-preserving encryption and tokenization. Privacera can help with:

Anonymizing or tokenizing data as it is ingested or while it is stored within the data lake. Privacera can preserve the format of the data so that it can still be used for analytics while maintaining confidentiality and privacy.
Anonymizing or de-anonymizing data only for specific users, depending on the business need.
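
The two capabilities above can be illustrated with a small Python sketch of format-preserving tokenization. To be clear about what it is: this is HMAC-based deterministic substitution with an in-memory lookup table for de-tokenization, not a standardized format-preserving encryption mode such as NIST FF1, and not Privacera's implementation. The TokenVault class, the key and the "compliance" group are hypothetical.

```python
import hashlib
import hmac
import string


class TokenVault:
    """Hypothetical vault: format-preserving tokens plus a reverse lookup."""

    def __init__(self, key):
        self._key = key
        self._reverse = {}  # token -> original value (in memory, demo only)

    def tokenize(self, value):
        """Replace each digit with a digit and each ASCII letter with a
        letter of the same case; keep separators such as '-' in place."""
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).digest()
        out = []
        i = 0
        for ch in value:
            if ch in string.digits:
                out.append(str(digest[i % len(digest)] % 10))
                i += 1
            elif ch in string.ascii_letters:
                base = ord("A") if ch.isupper() else ord("a")
                out.append(chr(base + digest[i % len(digest)] % 26))
                i += 1
            else:
                out.append(ch)
        token = "".join(out)
        # Caveat: distinct values can collide on the same token, especially
        # short ones; a real vault must detect and resolve collisions.
        self._reverse[token] = value
        return token

    def detokenize(self, token, user_groups, allowed_groups=frozenset({"compliance"})):
        """Return the original value only for users in an allowed group."""
        if not allowed_groups & set(user_groups):
            raise PermissionError("user is not allowed to de-tokenize")
        return self._reverse[token]
```

A value shaped like 123-45-6789 comes back as another string of three digits, two digits and four digits with the dashes in place, so downstream schemas and format checks keep working. The same input always yields the same token under the same key, which keeps joins and group-by analytics usable on tokenized columns.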

Monitoring

Beyond access policy enforcement, auditing all user activity for compliance and legal purposes is a recommended step. Compliance and security teams often use audit data to analyze how users are using data. As data and user bases grow, it becomes challenging for compliance teams to manually analyze reports and audit logs to measure adherence to compliance and legal regulations.

Privacera collects and analyzes audit data and monitors data use across various parameters. Privacera monitoring can detect security risks and compliance violations proactively. The monitoring module stitches together user information and detects data movements or unusual user behavior. The end result is a set of alerts that administrators can view in the Privacera portal and then act on.
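
One simple way to detect unusual user behavior from audit data is to compare each user's activity with that user's own history. The Python sketch below flags users whose count of sensitive-data accesses today exceeds their historical mean by a number of standard deviations. The function name, the input shapes and the thresholds are assumptions for illustration; the post does not describe which detection methods Privacera actually uses.

```python
from statistics import mean, pstdev


def find_anomalies(history, today, sigma=3.0, min_stdev=1.0):
    """Flag users whose sensitive-data access count is unusually high.

    history: dict mapping user -> list of past daily access counts
    today:   dict mapping user -> today's access count
    Returns a list of (user, count, threshold) tuples.
    """
    alerts = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            # Not enough baseline; a real system would handle new users
            # separately rather than silently skipping them.
            continue
        # The floor on the deviation avoids alerting on tiny fluctuations
        # for users with a very steady history.
        threshold = mean(past) + sigma * max(pstdev(past), min_stdev)
        if count > threshold:
            alerts.append((user, count, threshold))
    return alerts
```

The daily counts would come from aggregating audit records, for example Ranger audit events filtered to resources that carry sensitive tags, and the resulting tuples are the raw material for alerts shown to administrators.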

Summary

Data lakes are growing and data teams are embracing new use cases. Enterprises should embrace security and governance best practices while building the data lake. Data teams must look at automated data classification, building controls based on data content, and implementing data protection and monitoring to ensure sensitive data is protected at all times.

For more information, please visit us at www.privacera.com or reach out through email at info@privacera.com.

Hortonworks Inc. published this content on 05 January 2018 and is solely responsible for the information contained herein.
Distributed by Public, unedited and unaltered, on 05 January 2018 18:39:09 UTC.

Original document: http://feedproxy.google.com/~r/hortonworks/feed/~3/M5nj-MpMYbY/

Public permalink: http://www.publicnow.com/view/748E237641DFDB6E3CC555F4309B6161EF879120
