Is your Data Science team increasing your cyber exposure?

Published in

Towards Data Science

4 min read · Oct 14, 2018

With cyber-attacks and data breaches making the headlines almost every week, it is surprising to see how poorly prepared technology companies continue to be. Many of these breaches are the result of poor security measures, human error, or both: systems are not updated fast enough, and sensitive data is either left unencrypted or encrypted with easily broken algorithms. Every year Verizon publishes a report, the Data Breach Investigations Report, analyzing the cyber-attacks their clients experienced. The 2018 edition reflected on over 2,200 such breaches, many of which could have been easily prevented. The full report is available from Verizon.

If the problem starts with poor IT practices, it definitely does not end there. Phishing and ransomware campaigns target unsuspecting employees, using deception and social engineering methods. Some companies have invested in training. But this knowledge stays relatively abstract, and most employees don’t apply these teachings to their everyday work and life. For example, they exchange data in Excel spreadsheets by email, without encrypting it and without any concern for the sensitive information it holds.

Now, consider your data science team. Their daily activities require them to handle large amounts of data. This data is usually stored in systems that are managed by your IT organization. The data science team either gets direct access to it or receives an export of it in a text file. In either case, the data is prepared for modeling and ends up unencrypted on a file system somewhere, without any IT control or security measures. To compound the issue, this data sometimes gets sent via unencrypted email.

To change this sad state of affairs, companies need to create a culture where everyone is conscious of cyber exposure and knows what to do on a daily basis to mitigate it.

Secure your IT infrastructure

It might be time to change the way you think about security. The traditional perimeter security model relies on network segmentation as the primary mechanism for protecting sensitive resources: devices inside your firewall are supposed to be more trusted than the ones outside of it.

After Operation Aurora, a series of attacks in the second half of 2009 that targeted dozens of organizations including Google, Google developed a zero trust security model called BeyondCorp. In this new model, applications are reachable over the internet, and no connection is trusted more just because it originates in a supposedly safer zone defined by a firewall. Access rights are managed at the user and device levels, irrespective of where the user and device are located.
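To make the idea concrete, here is a minimal sketch of an access decision that depends only on the user and the device, never on the network the request comes from. It is a toy illustration, not how BeyondCorp is implemented: the `AccessRequest` type, the `USER_ENTITLEMENTS` and `TRUSTED_DEVICES` tables, and all names in them are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    device_id: str
    source_ip: str   # recorded for auditing, deliberately not used for trust
    resource: str


# Hypothetical inventories; a real deployment would query an identity
# provider and a device inventory instead of in-memory tables.
USER_ENTITLEMENTS = {"alice": {"modeling-datasets"}}
TRUSTED_DEVICES = {"laptop-0042"}


def is_allowed(request: AccessRequest) -> bool:
    """Grant access based on who is asking and from which device only."""
    device_ok = request.device_id in TRUSTED_DEVICES
    user_ok = request.resource in USER_ENTITLEMENTS.get(request.user, set())
    return device_ok and user_ok


if __name__ == "__main__":
    office = AccessRequest("alice", "laptop-0042", "10.0.0.5", "modeling-datasets")
    cafe = AccessRequest("alice", "laptop-0042", "203.0.113.9", "modeling-datasets")
    # Same user and device: the decision is identical inside and outside the firewall.
    print(is_allowed(office), is_allowed(cafe))
```

The point of the sketch is what is missing: nowhere does the decision look at `source_ip`, so being "inside" the corporate network grants nothing by itself.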

Decide with IT where files should be stored and determine the best encryption solutions

Agreeing with IT on where modeling datasets are to be stored is a first step towards better managing your cyber risks. It is impossible for IT to secure data they don’t even know exists.

Remove any sensitive information

Employees need to know the different types of sensitive information and understand that they should be treated differently depending on their sensitivity. The main ones are:

Personally Identifiable Information (PII) is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context.

Protected Health Information (PHI), under US law (HIPAA), is any information about health status, provision of health care, or payment for health care that is created or collected by a Covered Entity (or a Business Associate of a Covered Entity) and can be linked to a specific individual. This is interpreted rather broadly and includes any part of a patient’s medical record or payment history.

Payment card information, often referred to as PCI after the Payment Card Industry Data Security Standard (PCI DSS), is information relevant to payment cards, like credit cards.

Companies usually own more specialized sensitive information and need to make it available to their employees. That information needs to have restricted access and should never be stored in an unencrypted database or file.
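As an illustration of removing sensitive fields before modeling, the sketch below drops direct identifiers from a CSV export and replaces a join key with a keyed hash (HMAC-SHA256) so records can still be linked without exposing the original value. It uses only the Python standard library. The column names, the sample data, and the `scrub_csv` and `pseudonymize` helpers are assumptions made for this example; which columns count as sensitive is something to decide with your IT and compliance teams.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical column names, chosen for illustration only.
DROP_COLUMNS = {"name", "email", "card_number"}   # removed outright
PSEUDONYMIZE_COLUMNS = {"customer_id"}            # replaced by a keyed hash


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of value under secret_key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_csv(raw_csv: str, secret_key: bytes) -> str:
    """Drop direct identifiers and pseudonymize join keys in a CSV string."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept = [col for col in reader.fieldnames if col not in DROP_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        clean = {col: row[col] for col in kept}
        for col in PSEUDONYMIZE_COLUMNS & set(kept):
            clean[col] = pseudonymize(row[col], secret_key)
        writer.writerow(clean)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "customer_id,name,email,basket_value\n"
        "42,Ada Smith,ada@example.com,19.90\n"
    )
    # The key must live outside the dataset, e.g. in a secrets manager.
    print(scrub_csv(sample, secret_key=b"replace-with-a-managed-secret"))
```

A keyed hash is used rather than a plain hash because low-entropy identifiers are easy to recover from an unkeyed digest by brute force. Pseudonymized data can still be sensitive, so this step complements restricted access and encryption rather than replacing them.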

Make sure your modeling datasets are stored in an encrypted format

Even when all sensitive information has been removed and the data anonymized, it is good practice to encrypt the data and any intermediate products to protect them from unauthorized access.
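One possible way to do this from Python is sketched below, assuming the third-party `cryptography` package is installed and using its Fernet recipe for authenticated symmetric encryption. This is an illustration rather than a recommendation of a specific tool: the helper names and the sample dataset are made up for the example, and key storage, which is the hard part, should follow whatever your IT department prescribes.

```python
from cryptography.fernet import Fernet, InvalidToken


def encrypt_bytes(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt and authenticate plaintext with a Fernet key."""
    return Fernet(key).encrypt(plaintext)


def decrypt_bytes(token: bytes, key: bytes) -> bytes:
    """Decrypt a Fernet token; raises InvalidToken on a wrong key or tampering."""
    return Fernet(key).decrypt(token)


if __name__ == "__main__":
    # In practice the key comes from a secrets manager, not from the script.
    key = Fernet.generate_key()
    dataset = b"feature_a,feature_b,label\n0.1,0.7,1\n"  # stand-in for a real file

    token = encrypt_bytes(dataset, key)      # this is what gets written to disk
    assert decrypt_bytes(token, key) == dataset

    try:
        decrypt_bytes(token, Fernet.generate_key())
    except InvalidToken:
        print("A different key cannot read the dataset")
```

The same two helpers can wrap intermediate products such as feature matrices or serialized models, as long as they are read into bytes first.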

Avoid sending data by email

Email is an insecure channel: it is easy for hackers to intercept or spoof messages, so it should never be used to exchange unencrypted data.

If the only way you have to exchange data is by email, you should always use highly secure encryption tools. There are secure solutions that allow you to encrypt and sign messages and attachments, like GPG Tools, an open source product. You should check with your IT department for the best practices at your company.

Implementing these very simple steps would go a long way in reducing companies’ exposure to cyber risks. Security is everyone’s responsibility and should not be taken lightly.

Stephane is the founder & CEO of Prometheus Ax, a company that helps CEOs optimize the success rate and ROI of their data science projects.

Written by Stef Caraguel