Security, Privacy and Safety

Introduction

This section outlines security, privacy and safety concerns that matter when applying machine learning for real business use.

The complexity of ML technologies has fuelled fears that machine learning applications will cause harm in unforeseen circumstances, or that they will be manipulated to act in harmful ways. Think of a self-driving car that follows its own ethics, or of algorithms that make genuinely frightening predictions from your personal data, e.g. predicting which diseases will hit you based on data from your grocery store.

Technology is never neutral. Before you start designing, you have to think about the values you implicitly build into your new technology. All technology can and will be misused, so it is up to the designers to think through the risks of misuse, whether on purpose or by accident.

Machine learning systems should operate reliably, safely and consistently, not only under normal circumstances but also in unexpected conditions or when under attack.

Machine learning software differs from traditional software because:

  • The outcome is not easily predictable.
  • The trained models in use are a black box, with very few options for transparency.
  • Logical reasoning (cause and effect) is not present. Predictions are made by statistical number crunching with complex, non-linear algorithms.
  • Both non-IT people and trained IT people have a hard time figuring out machine learning systems, due to the new paradigms in use.

What makes security and safety more than routine concerns for machine learning driven applications is that neural networks are, by design, not built to make their inner workings easy to understand for humans, including quality and risk managers.

Without a solid background in mathematics and software engineering, evaluating whether most machine learning applications work correctly is impossible for security researchers and safety auditors.
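
Some tooling does exist for probing a model from the outside. A minimal sketch, using scikit-learn's permutation_importance on an illustrative dataset and model (not a full audit method), shows how a reviewer can at least check which inputs a black-box model actually relies on:

    # Minimal sketch: probing a black-box model with permutation importance.
    # Dataset and model are illustrative.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle one feature at a time and measure the drop in validation
    # accuracy: a large drop means the model leans heavily on that feature.
    result = permutation_importance(model, X_val, y_val, n_repeats=10,
                                    random_state=0)
    ranked = sorted(zip(X.columns, result.importances_mean),
                    key=lambda item: -item[1])
    for name, mean in ranked[:5]:
        print(f"{name}: {mean:.3f}")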

However, more and more people will depend on the correct outcome of decisions made by machine learning software. So we should ask some critical questions:

  • Is the system making any mistakes?
  • How do you know which alternatives were considered? (A small sketch after this list shows one way to inspect them.)
  • What is the risk of trusting the outcome blindly?
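
For classifiers that expose class probabilities, the second question can at least be probed directly. A minimal sketch, assuming a scikit-learn style model; the dataset and model are illustrative:

    # Minimal sketch: listing the alternatives a classifier considered,
    # via its class probabilities. Dataset and model are illustrative.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    probs = model.predict_proba(X[0:1])[0]

    # Rank all classes, not just the winner, so a reviewer can see how
    # close the runner-up alternatives were.
    for idx in np.argsort(probs)[::-1]:
        print(f"class {model.classes_[idx]}: p = {probs[idx]:.3f}")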

Understanding how the output of machine learning software is produced will make more people comfortable with self-driving cars and other safety-critical systems that are machine learning enabled. In the end, systems that can kill you must be secure and safe to use. So how do we get the process and trust chain to a level where we are no longer dependent on:

  • Software bugs
  • Machine learning experts
  • Auditors
  • A proprietary certification process that ends with a stamp (if you pay enough)

From other sectors, like finance or the oil industry, we know that there is no simple solution. But given the risks involved, only FOSS machine learning applications have the elements needed to start building processes that create enough trust to use machine learning systems in society at large.

Security

Using machine learning technology introduces serious new threats, and more and more ways of exploiting the technology are being published. IT security has proven hard and complex to control and manage, and machine learning technology makes the problem even worse, because specially crafted machine learning exploits are very hard to detect.

Machine learning challenges many current security measures, because machine learning software:

  • Lowers the cost of applying currently known attacks to all devices that depend on software, which means almost all modern technology devices.
  • Enables the easy creation of new threats and vulnerabilities for existing systems. E.g. you could take the CVE security vulnerability database (https://www.cvedetails.com/) and train a machine learning model to create attacks for the published weaknesses.
  • Opens up a complete new attack surface, beyond that of traditional software, once machine learning is embedded in hospitals, traffic control systems, chemical plants and IoT devices.

Security aspects of machine learning concern not only the application in which machine learning is used, but also the developed algorithms themselves. Machine learning security therefore falls into three main categories:

  1. Machine learning attacks aimed at fooling the developed machine learning systems. Since machine learning is often a ‘black box’, these attacks are very hard to detect.
  2. Machine learning offers new opportunities to break existing traditional software systems.
  3. Usage threats. The outcome of many machine learning systems is far from correct. If you base decisions or trust on a machine learning application, you can make serious mistakes. This holds e.g. for self-driving vehicles, health care systems and surveillance systems. Machine learning systems are known for producing racially biased results, often caused by biased data sets (a simple screening sketch follows this list). Think about problematic forms of ‘profiling’ based on surveillance cameras with face detection.
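
Dataset bias, named under usage threats above, can at least be screened for before training. A minimal sketch in Python; the DataFrame and its 'group' and 'label' columns are hypothetical:

    # Minimal sketch: screening a labelled dataset for group imbalance
    # before training. Columns 'group' and 'label' are hypothetical.
    import pandas as pd

    df = pd.DataFrame({
        "group": ["a", "a", "a", "b", "b", "b", "b", "b"],
        "label": [1, 1, 0, 0, 0, 0, 1, 0],
    })

    # Positive-label rate per group; a large gap hints at bias that a
    # model trained on this data is likely to reproduce.
    rates = df.groupby("group")["label"].mean()
    print(rates)
    print("demographic parity gap:", rates.max() - rates.min())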

Some examples of machine learning exploits:

  • Google’s cloud-based image recognition service can be tricked into seeing things that are not there. In one test it perceived a rifle as a helicopter (a sketch of this kind of adversarial attack follows this list).
  • Fake videos made with the help of machine learning software are spreading online, and the law can do little about it. E.g. videos of speeches that political leaders never gave are created and shared; a fake video in which a president declares war on another country is obviously dangerous. Even more dangerous is that these machine-learning-generated videos are very hard to identify as fakes, since common Hollywood special effects are used alongside machine learning to blur the line between real and fake footage. Creating a fake porn video from a photo of a celebrity, or of someone you really do not like, is nowadays only three mouse clicks away, and both practically and legally there is very little you can do against these kinds of damaging threats.
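
The rifle/helicopter case above is an adversarial example: an input perturbed just enough to change the model's output. A minimal sketch of the underlying idea, using the fast gradient sign method (FGSM) on a tiny hand-written logistic regression rather than any production API; weights and input are illustrative:

    # Minimal sketch of an FGSM-style adversarial example against a tiny
    # logistic regression. Weights and input are illustrative.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([1.5, -2.0, 0.5])   # "trained" weights (illustrative)
    b = 0.1
    x = np.array([0.4, -0.3, 0.8])   # a correctly classified input
    y = 1.0                          # its true label

    # Gradient of the cross-entropy loss with respect to the *input*.
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w

    # FGSM: step in the direction that increases the loss the most.
    # eps is chosen large enough to flip this toy model's prediction.
    eps = 0.6
    x_adv = x + eps * np.sign(grad_x)

    print(f"original score:    {sigmoid(w @ x + b):.3f}")      # ~0.85 -> class 1
    print(f"adversarial score: {sigmoid(w @ x_adv + b):.3f}")  # ~0.33 -> class 0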

Users, and especially developers, of machine learning applications must be more paranoid from a security point of view. Unfortunately security costs a lot of effort and money, and a lot of specialist expertise is needed to minimize the risks.

Privacy

Machine learning raises serious privacy concerns, since it uses massive amounts of data that often contain personal information.

It is a common belief that personal information is needed to experiment with machine learning before you can create good and meaningful applications, e.g. for health, travel, eCommerce and of course marketing applications. Machine learning models are therefore often loaded with massive amounts of personal data, both for training and for making meaningful predictions in the end.
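
A common first step before any training pipeline sees such data is to strip or hash the direct identifiers. A minimal sketch in Python; the column names are hypothetical, and note that under the GDPR pseudonymized data like this still counts as personal data:

    # Minimal sketch: dropping and hashing direct identifiers before a
    # dataset is handed to a training pipeline. Column names hypothetical.
    import hashlib
    import pandas as pd

    df = pd.DataFrame({
        "name":  ["Alice", "Bob"],
        "email": ["alice@example.com", "bob@example.com"],
        "age":   [34, 29],
        "spend": [120.0, 80.5],
    })

    def pseudonymize(value: str, salt: str = "rotate-this-salt") -> str:
        # One-way hash keeps records linkable without exposing identities.
        return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

    df["user_id"] = df["email"].map(pseudonymize)
    train_df = df.drop(columns=["name", "email"])  # keep non-identifying fields
    print(train_df)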

The belief that personal data is needed for machine learning creates a tension between developers and privacy-aware consumers. Developers want the ability to create innovative new products and services and need to experiment, while consumers and GDPR regulators are concerned about the privacy risks involved.

The applicability of machine learning models is hindered in settings where the risk of data leakage raises serious privacy concerns. Examples of such applications include scenarios where clients hold sensitive private information, e.g., medical records, financial data, or location.

In particular, it is commonly believed that individuals must hand over a copy of their personal information for a model to train or predict on it, while consumers want to avoid sending developers a copy of their data at all.

That belief is not entirely accurate: machine learning models can be trained in environments that are not fully trusted, on data the developer never has direct access to. Secure machine learning on anonymized data sets is still an obscure and unpaved path, but some companies and organizations are already working on deep learning technology that works on encrypted data. Using encryption on training data raises the complexity in various ways: it is already hard to get inside the ‘black box’ of machine learning, and advanced data encryption demands even more knowledge and competences from all engineers involved in developing machine learning applications.
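
One of the approaches hinted at above, often called federated learning, keeps the raw data on the clients and shares only model updates. A minimal sketch of federated averaging with NumPy; the data, model and protocol are illustrative, not a production design:

    # Minimal sketch of federated averaging: clients fit a linear model on
    # their own private data and share only weights. Purely illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])

    def client_update(w, n=50, lr=0.1, steps=20):
        # Each client's data is generated (and stays) locally.
        X = rng.normal(size=(n, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=n)
        for _ in range(steps):
            grad = 2 * X.T @ (X @ w - y) / n   # gradient of mean squared error
            w = w - lr * grad
        return w

    w_global = np.zeros(2)
    for _ in range(5):
        # The server only ever sees weights, never the training records.
        local = [client_update(w_global.copy()) for _ in range(3)]
        w_global = np.mean(local, axis=0)

    print("global weights after federated averaging:", w_global)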

In the EU the use of personal data is protected by a single law across all member states: the General Data Protection Regulation (GDPR). The GDPR does not prohibit the use of machine learning, but when you use personal data you face a serious challenge in explaining to DPOs (Data Protection Officers) and consumers what you actually do with the data and how you comply with the GDPR.

When you apply machine learning in your business application, you should consider the following questions:

  • In what way will your customers be happy with the use of their data, for their benefit and yours?
  • Do you really have a clear overview of all GDPR implications of using personal data in your machine learning model? And what happens if you invite other companies to use your model?
  • What are the ethical concerns of using massive amounts of your customers' data to develop new products? Is the way you use the data to train your model congruent with your business vision and morals?
  • What are the privacy risks involved in your machine learning development chain and application?