Why Free and Open Machine Learning¶
Free and Open machine learning is close to open source (FOSS, Free and Open Source Software), but openness requires more than open source software alone. That is why we advocate Free and Open machine learning. In this publication the term open source (OSS) means FOSS. Open source software is sometimes also called “Free software”, “libre software”, “Free/open source software (FOSS or F/OSS)”, or “Free/Libre/Open Source Software (FLOSS)”. The term “Free software” has sometimes been misinterpreted as meaning “no cost”, which is not the intended meaning. It is all about freedom, so “Freedom Software” would have been a better term. The “Free” in Free and Open Source Software (FOSS) refers to freedom, not price.
The Freedom part makes a great difference in ensuring that machine learning technology, and all other aspects involved, really preserve freedom in a sustainable way.
FOSS machine learning is crucial for all. In our view machine learning technology must be inclusive. This means that, besides using FOSS machine learning frameworks like TensorFlow, more aspects must be open and transparent. Only then does machine learning become a truly open and inclusive technology that can be used to everyone's advantage. Everyone should be able to experiment, play and create new machine learning applications without major obstacles in terms of technology cost or required hardware.
Free and Open machine learning means that everyone must be able to develop, test, play with and deploy machine learning based solutions. Large investments should not be needed to use and apply machine learning. Not only the companies or people who can afford enormous investments in specialized GPU hardware should benefit from machine learning technology; everyone should be able to create meaningful applications for a better world.
The following aspects are needed for real Free and Open Machine Learning:
- FOSS (Free and Open Source software)
- Open data
- Open algorithms (machine learning libraries)
- Open architectures
- Open Science
Open Source (FOSS)¶
Free and open-source software (FOSS) is software that can be classified as both free software and open-source software. FOSS is an inclusive term that covers both free software and open-source software (OSS).
Open source is an approach to the design, development and distribution of new products and knowledge that offers practical access to the source. Real open source solutions have a license that is approved by the Free Software Foundation (FSF) (https://www.fsf.org/) or the Open Source Initiative (OSI) (https://opensource.org/). Open source is all about collaboration and freedom. Collaboration is key for developing, applying and using machine learning functionality.
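The license criterion above can be checked mechanically. The following is only a minimal sketch, assuming a project declares its license as an SPDX identifier. The set `APPROVED_LICENSES` and the function `is_approved_license` are hypothetical names introduced for illustration, and the set is a small hand-picked subset; the lists published by the FSF and the OSI remain the authoritative source.

```python
# Hypothetical, hand-picked subset of SPDX identifiers for licenses that are
# both OSI-approved and considered free software licenses by the FSF.
# This is NOT a complete list; consult the FSF and OSI lists for real decisions.
APPROVED_LICENSES = frozenset({
    "Apache-2.0",
    "BSD-3-Clause",
    "GPL-3.0-or-later",
    "MIT",
    "MPL-2.0",
})

# Lower-cased lookup table so the comparison ignores letter case.
_APPROVED_LOWER = frozenset(name.lower() for name in APPROVED_LICENSES)


def is_approved_license(spdx_id: str) -> bool:
    """Return True if the given SPDX identifier is in the illustrative allowlist."""
    return spdx_id.strip().lower() in _APPROVED_LOWER


if __name__ == "__main__":
    # TensorFlow is distributed under the Apache License 2.0.
    for license_id in ("Apache-2.0", "Proprietary-EULA"):
        status = "approved" if is_approved_license(license_id) else "not in allowlist"
        print(f"{license_id}: {status}")
```

A check like this only tells you what a project declares. It does not replace reading the license text itself.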
Open source software (FOSS) is the norm for machine learning. However, using open source software will still be new and innovative for a lot of companies. If you really want to benefit from new machine learning software you must go for a solid FOSS machine learning ecosystem. This keeps you flexible and independent, and you can still choose from thousands of consultancy firms and (cloud) hosting companies that can help you or provide hosting facilities.
A transition towards FOSS can be very hard and disruptive for many companies. It takes the right mindset, attitude and culture within a company. Applying machine learning to real business cases is complex and challenging in itself, so taking advantage of it requires an innovative mindset. Using machine learning without the benefits that come with the FOSS ecosystem of your choice is like learning to swim without getting into the water. So get into the water as soon as possible; after a while you will see and use the benefits.
Machine learning applications are expensive to develop and to adopt. This holds for the development process itself, since skilled IT engineers and scientists are expensive, but even more for the infrastructure and resources needed to develop meaningful applications for your business. As a result, big firms like Google, IBM, Microsoft, Facebook and Amazon are currently at the front of the queue and smaller counterparts get left behind. Since most of the scientific knowledge is freely available and more and more of the required infrastructure is available within the open source domain, this book is entirely focussed on openness and open source. The technique behind machine learning is great fun and often requires adjustments and tweaking, which is hard when you are using black-box solutions.
OSS developments in the machine learning field are absolutely no hobby projects. Almost all major OSS machine learning developments are backed by small or large companies (e.g. Google, Microsoft, Facebook, Uber) active in the deep learning ecosystem. Many great FOSS machine learning frameworks are also backed by university research groups or by research communities organized by universities. Small machine learning OSS projects are often developed by PhD researchers and are supported by a strong scientific foundation or by universities.
A focus on open source (FOSS) software is crucial for applying machine learning for real. FOSS machine learning applications and frameworks have the following benefits:
- Create software solutions faster, better and with less friction. You can adjust what you want without limitations.
- Lower cost for creating your first pilot project. Mind: your first attempts will fail, and the faster your pilot projects fail, the better. Applying new machine learning capabilities involves a learning curve, not only technically but also from an organizational and business point of view.
- Flexibility and changeability.
- No vendor lock-in. Of course the machine learning cloud offerings of the major tech companies are great (Azure ML, IBM Watson, Amazon, Google etc). But playing around without any strings attached and without limitations set for you gives you a head start.
FOSS machine learning is very popular. See e.g. the diagram below. You need very strong arguments, also from a business perspective, to choose anything else. Investments in real world applications always carry business risk. Choosing a commercial black-box solution often increases that risk and makes it harder to mitigate. For example, mitigating security and privacy risks is hard with black-box solutions.
All IT companies advertise machine learning powered software products nowadays. This also means that existing software that has been sold for decades is now suddenly re-branded with the new machine learning buzzwords. Terms like cognitive, artificial intelligence (AI) powered and data driven are also used to sell you old solutions on the back of this trend. You can easily be fooled, since massive marketing efforts (time, money, material) have been invested to sell old, buggy solutions as new, innovative machine learning powered solutions. In reality, black-box solutions from small or large vendors that seem too good to be true for your use case are almost always based on fads. This is why you should be very suspicious of cloud based machine learning offerings that promise you instant new business and customers. Make sure to do a fast and cheap hands-on innovation project yourself first, to check if and how your business use case can benefit from machine learning. If a new solution looks too good to be true, beware.
To use ML for real business applications you should use and reuse good FOSS tools, frameworks and knowledge available. But you should also take the quality aspects (technical and non-technical) that come with a framework choice into account.
When using FOSS machine learning solutions you can inspect how they work and evaluate all risks involved. You can also ask any IT company or consultant with the right skills to audit the application. Because in the end, the security, safety and privacy of your customers are at risk and you will be held accountable.
For Free and Open machine learning we need not only FOSS software, but also open data sets. Data is one of the most important aspects of making machine learning work. Without data, and without open and transparent insight into the various quality aspects of that data, machine learning is not open.
Trust in the outcome of applications powered by machine learning technology is only possible when the input data is fully available.
Open and reusable quality datasets are crucial for creating machine learning driven applications.
Creating a data set to test and develop machine learning algorithms is hard and time consuming. Many current machine learning algorithms are developed and proven by using open data sets. In https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research a short overview can be found of various data sets used for scientific machine learning research.
Free and open machine learning means that everyone should be able to access and use the data that is used to train machine learning applications. However, Google, Facebook and many other companies that donate a lot of machine learning knowledge and frameworks to the open source domain rarely release the datasets used for their commercial machine learning offerings. Without details about the datasets, claims cannot be verified, which matters especially for life-saving systems powered by machine learning technology. There can also be large privacy risks involved, since training machine learning algorithms requires large datasets. People seldom give permission for their valuable data to be used for developing applications that are not beneficial to them. For example, why should a government use your data to develop an application that is not in your interest?
Machine learning is a challenging science. Many researchers at universities worldwide are working to develop new knowledge for solving a range of complex problems.
Universities are funded by taxpayers. So in an ideal world everyone should benefit from the knowledge developed. Almost all new knowledge is also based on earlier work by others. This is how science works: we build upon the knowledge of others to develop new knowledge and insights.
Open science represents an approach to the scientific process based on cooperative work and new ways of diffusing knowledge by using digital technologies and new collaborative tools. The idea captures a systemic change to the way science and research have been carried out for the last fifty years: shifting from the standard practices of publishing research results in scientific publications towards sharing and using all available knowledge at an earlier stage in the research process.
Developing machine learning knowledge using open science means that publications, data, results and software are accessible without borders for everyone to learn from and build upon. Key pillars of open science that are important for open machine learning are:
- Open data
- Open source software
- Open access
This way everyone can validate claims, inspect the algorithms used, and create and repeat ML experiments without large upfront costs. Transparency is needed for trust. This also holds for the machine learning applications, algorithms and frameworks used. But even for truly open machine learning applications, providing real transparency by explaining how results are created is a complex problem. This is a result of how some types of machine learning algorithms work.
Only when the basic principles of open science are followed is trust in machine learning algorithms and software frameworks possible.
Work in progress
Applying new technology brings new responsibilities. The computational power needed for deep learning research has been doubling every few months. Machine learning computations can have a very large carbon footprint. This is a result of the way most algorithms are designed: they only give good results when large amounts of data are used and an enormous number of calculations are performed. Computers use a lot of energy when calculations are performed at that scale.
Ironically, deep learning was inspired by the human brain, which is remarkably energy efficient. Moreover, the financial cost of the computations can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research.
Green ML means machine learning that is optimized to minimize resource utilization and environmental impact. This can be done by optimizing data center resources, balancing training data requirements against accuracy, choosing less resource-intensive models or, in some cases, using transfer learning instead of training new models.
Besides cost, green machine learning is an important factor for Free and Open machine learning, since the benefits machine learning can bring should not harm the environment of all living things that have no direct relationship with your machine learning application. The freedom to use this powerful technology should not limit the freedom of others to live in good health. Green ML is therefore a difficult but important aspect of machine learning development. So choose algorithms that perform well without weeks of calculation on datasets, or make sure that expensive and time-consuming calculations can easily be reused by others.
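To make the trade-off concrete, the energy use and emissions of a training run can be roughly estimated from hardware power draw and run time. The sketch below is a back-of-the-envelope calculation, not a measurement method. The function name `training_footprint` and every number in it (average power draw, data center overhead factor, grid carbon intensity, run times) are hypothetical assumptions chosen for illustration; replace them with figures for your own hardware and electricity supply.

```python
def training_footprint(avg_power_watts, hours, pue=1.5, grid_kg_co2_per_kwh=0.4):
    """Estimate (energy in kWh, emissions in kg CO2) for one training run.

    avg_power_watts:      average power draw of the training hardware (assumed)
    hours:                wall-clock duration of the run
    pue:                  data center overhead factor (assumed default)
    grid_kg_co2_per_kwh:  carbon intensity of the electricity used (assumed default)
    """
    if min(avg_power_watts, hours, pue, grid_kg_co2_per_kwh) < 0:
        raise ValueError("all inputs must be non-negative")
    energy_kwh = avg_power_watts * hours / 1000.0 * pue
    return energy_kwh, energy_kwh * grid_kg_co2_per_kwh


if __name__ == "__main__":
    # Hypothetical scenario: 8 accelerators drawing 300 W each.
    power = 8 * 300

    # Training a new model for a week versus fine-tuning an existing,
    # shared model for a few hours (transfer learning).
    scenarios = {"train from scratch (168 h)": 168, "fine-tune (6 h)": 6}
    for label, hours in scenarios.items():
        energy, co2 = training_footprint(power, hours)
        print(f"{label}: {energy:.1f} kWh, {co2:.1f} kg CO2")
```

Under these assumptions the estimate scales linearly with run time, which is why reusing calculations that others have already paid for is one of the most effective ways to reduce the footprint.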