As data-driven and AI-first applications are on the advance, we extend our best practices for DevOps and agile development with new concepts and tools. The corresponding buzz words would be continuous intelligence and continuous delivery for machine learning (CD4ML).

For our current project, we researched, tried different approaches and build a proof of concept for a continuously improved machine learning model. That’s why I got interested in this topic and went to a meet up at Thoughtwork’s office. Christoph Windheuser (Global Head of Artificial Intelligence) shared their experience in this field and gave a lot of insights. The following post summarizes these thoughts [1] with some notes from our learning process.

CD4ML continuous intelligence cycle

The continuous intelligence cycle

1- Acquire data

Get your hands on data sets. There are multiple ways, most likely the data is bought, collected or generated.

2- Store, clean, curate, featurize information

Use statistical and explorative data analysis. Clean and connect your data. At the end, it needs to be consumable information.

3- Explore models and gain insights

You are going to create mathematical models. Explore them, try to understand them and gain insights in your domain. These models will forecast events, predict values and discover patterns.

4- Productionize your decision-making

Bring your models and machine learning services into production. Apply your insights and test your hypothesizes.

5- Derive real life actions and execute upon

Take actions on your gained knowledge. Follow up with your business and gain value. This generates new (feedback) data. With this data and knowledge, you follow up with step one of the intelligence cycle.

Productionizing machine learning is hard

There are multiple experts collaborating in this process circle. We have data hunters, data scientists, data engineers, software engineers, (Dev)Ops specialists, QA engineers, business domain experts, data analysts, software and enterprise architects… For software components, we mastered these challenges with CI/CD pipelines, iterative and incremental development approaches and tools like GIT and Docker (orchestrators). However, in continuous delivery for machine learning we need to overcome additional issues:

  • When we have changing components in software development, we talk about source code and configuration. In machine learning and AI products, we have huge data sets and multiple types and permutations of parameters and hyperparameters. GitHub for example denies git pushes with files bigger than 100mb. Additionally, copying data sets around to build/training agents is more consuming than copying some .json or .yml files.
  • A very long and distributed value chain may result in a "throw over the fence" attitude.
  • Depending on your current and past history, you might need to think more about parallelism in building, testing and deploying. You might need to train different models (e.g. a random forest and an ANN) in parallel, wait for both to finish, compare their test results and only select the better performing.
  • Like software components, models must be monitored and improved.

The software engineer’s approach

In software development, the answer to this are pipelines with build-steps and automated tests, deployments, continuous monitoring and feedback control. For CD4ML the cycle looks like this [1]:

CD4ML Pipelines

There is a profusely growing demand on the market for tools to implement this process. While there are plenty of tools, here are examples of well-fitting tool chains.

stack discoverable and accessible data version control artifact repositories cd orchestration (to combine pipelines)
Microsoft Azure Azure Blobstorage / Azure Data lake Storage (ADLS) Azure DevOps Repos & ADLS Azure DevOps Pipelines
open source with google cloud platform [1] Google cloud storage Git & DVC GoCD
stack infrastructure (for multiple environments and experiments) model performance assessment monitoring and observability
Microsoft Azure Azure Kubernetes Service (AKS) Azure machine learning services / ml flow Azure Monitor / EPG *
open source with google cloud platform [1] GCP / Docker ml flow EFK *

* Aside from general infrastructure (cluster) and application monitoring, you want to:

  • Keep track of experiments and hypothesises.
  • Remember what algorithms and code version was used.
  • Measure duration of experiments and learning speed of your models.
  • Store parameters and hyperparameters.

The solutions used for this are the same as for other systems:

search engine log collector visual layer
EFK stack elasticsearch fluentd kibana
EPG stack elasticsearch prometheus grafana
ELK stack elasticsearch logstash kibana

[1]: C.Windheuser, Thoughtworks, Slideshare: https://www.slideshare.net/ChristophWindheuser/cd4ml-thoughtworks-meetup-munich-christoph-windheuser-may-8th-2019

Introduction

Most people want to learn new things. It could be a new skill, a new hobby or simply to broaden your general knowledge horizons. We have this desire to learn and grow, but yet we struggle to find the discipline to achieve our learning goals. I’m sure all of us can attest to learning intentions – be it from a new year’s resolution or some other source of inspiration – that died a silent death along the road side.

So, why can’t we achieve our learning goals? A complex question indeed, but a part of the problem is that we have to actively do something in order to get where we want to be. For example, if you want to learn a new language, then you have to open a book and read; or log onto a website and complete the lessons and tests; or go to evening classes at your local school or college. And there are more than enough challenges in our lives that prevent us from doing this diligently! But what if we could still learn without actually doing something actively, just by engaging in our normal daily routine?

Active vs. Passive Learning

As already mentioned, Active Learning means that you have to initiate an action by yourself to achieve a desired learning objective. It is a decision and a discipline that you have to set into motion by yourself.

Passive learning, on the other hand, means that you learn without initiating something by yourself. To explain this in more detail, let’s continue with the example of learning a new language. Learning a new language actively means that you have to read a book or go to a class. Now, let’s try to find an example of how you could learn a new language passively.

Let’s say that it takes you an hour to drive to work every day. In this time, you could listen to audio tapes or CDs that help you to learn a new language. You are going to drive to work in any ways, so why not use this time to learn something new? This is a good initial example to dive into the idea of passive learning, but it still has some shortcomings: you have to make the decision to switch the language CD on instead of listening to your favorite music or the radio (even though you are not doing anything actively once it is switched on); audio alone is not necessarily enough to learn a new language, since you might also want to look at grammar structures and alphabet of the new language. But at least we have made some progress. We don’t have to open a book or go to a class anymore. In the next section, we look at how we can use technology to further expand the idea of passive learning.

Technology and Passive Learning

The digital era is upon us. Technology is pervasive throughout society. As a result, we also consume large amounts of information electronically. We surf the web to inform ourselves about topics that interest us; we read the news online; and we use a variety messaging systems and social media – to name just a few! These are things that we are going to do every day as part of our routine. So, can we build in a Passive Learning experience while going about our daily routine? The answer is yes, and in the next section we illustrate how this can be achieved by means of a practical example.

Technology and Passive Learning: A Practical Example

In this section we will look at a practical example, again in the context of learning a language. Vocabulary is an important building block in the language learning process. Within a learning context, it is important for us to map words from one language to another so that we can learn the vocabulary of the new language. Flash cards often get used to achieve this goal. The idea is simple: you have a word on one side of a card and you flip the card to see the meaning of this word in another language.

Flash card software (flipping the card with a mouse click) has also been around for a long time. The problem is that this still requires the Learner to be motivated and do something actively. So, we need to find a way in which the Learner can get exposed to the new vocabulary in a passive way.

As mentioned in the previous section, we consume large amounts of information electronically these days. Let’s say that we consume online information in English and we want to learn German. Our proposal is to now develop a web browser plugin that will replace some of the English words on websites with German words. Selecting the correct amount of words to replace is important, since it should still be easy for the reader to understand the text without too much effort. As a starting point, our suggestion is to replace only 10% of the nouns. The image below has some sample text that shows the difference between the original website and the transformed website. You should still be able to understand the content of the transformed website without too much additional effort. Try it out for yourself!
ArtOuput1

From a programmatic point of view, it is not difficult to tokenize and extract the nouns in a piece of text. Most programming languages have either built-in capability or 3rd party Libraries that does just that. Below is a JavaScript code-snippet (using the pos-tag lib) illustrating this concept.

fs.readFile('input.html', 'utf8', (err, data) => {  
    if (err) throw err; 
    const result = pos(data);
    //extract all the nouns – pos stands for ‘part of speech’
    const nouns = result.filter(item => item.pos === 'NN');        
    nouns.forEach((item) => {
        //get the translation of the extracted nouns    
        var trResult = getTranslation(item.word, 'en', 'de');
        data = data.replace(item.word, '<strong>' + trResult.translation + '</strong>');

    });
});

Completing the circle: Reintroducing Active Learning into the Passive Learning Experience

So far, we have been making good progress in creating a passive language learning experience. But we can go even further!

The idea is to reintroduce a form of active learning back into our current model. The words that we replaced in our original source text will be created as hyperlinks; when the Learner clicks on one of these words, we will provide more information about the word.
ArtOutput2
In our case, we will link to a WordNet browser. Wordnet is kind of like an intelligent electronic dictionary that, amongst others, provides synonyms and word-meanings in context. The image below is an example of a popup WordNet browser that would display once the Learner clicks on one of the hyperlinked words in the source text.
WordnetBrowser

The active learning that takes place here is different from the active learning as described earlier. In this case, the Learner would click on the hyperlink out of curiosity and consequently also learn something. It differs from the scenario described earlier, in the sense that the Learner does not have to find some kind of internal motivation to set the learning process into motion. The learning happens as a result of curiosity that was generated by the embedded Passive Learning experience.

Conclusion

Passive Learning ideas can be embedded into technology that we are using on a daily basis. We illustrated how passive learning can be used in the context of language learning as part of our daily web browsing experience. We also showed how Active Learning could be reintroduced into the learning process as a result of the Passive Learning context in which the Learner is operating. This example only scratches the surface of what is really possible when combing passive learning and technology. Some questions – to name but a few – that come to mind for possible future work in this area, are the following:

  • Can the idea be introduced into messaging platforms such as Skype, Slack and WhatsApp? These messaging technologies are pervasive and get used by millions of people on a daily basis.
  • We should also be able to expand the idea so that it applies to a variety of language pairs. Also, we only looked at replacing a certain percentage of nouns in the text, but we can also include adjectives, adverbs and verbs, and make it configurable to suit the Learner’s needs. The image below illustrates how such a configurable setup could look like.
    Configuration
  • And finally, what about other learning domains? Can we make adjustments so that the Passive Learning experience is also possible in other domains such as Math, Engineering, Biology and Social Sciences?

I went to a great session about CQRS, Event Sourcing and domain-driven Design (DDD) on the Software Architecture Summit. The speaker Golo Roden (@goloroden) did a fantastic job selling these concepts to his audience with a great storytelling approach. He explained why CQRS, Event Sourcing and DDD fit together perfectly while replicating the www.nevercompletedgame.com for his daughter. This is what he shared with us.

Domain-driven Design

The more enterprise-y your customer the weirder the neologisms get.

We – as software engineers – struggle to understand business and domain experts. Once we understand something, we try to map it to technical concepts. Understood the word "ferret"? Guess we need a database table called "ferret" somehow. We then proceed to inform our business colleagues, "deploying a new schema is easy as we use Entity Framework or Hibernate as OR-mapper". He thinks we understood, we think he understood. Actually, nobody understood anything.
As software engineers we tend to fit every trivial and every complex problem into CRUD-operations. Why? Because its "easy" and everyone does it. If it was that easy, software development would be effortless. Rather trying to fit problems in a crud pattern, we should transform business stories into software.
That’s why we should use domain-driven design and ubiquitous language.
Golo Roden proceeds to create a view on the nevercompletedgame with ubiquitous language. So nobody asks, "what does open a game mean" and there is no mental mapping.
I won’t go into detail here, but an example can show why we need this.

  • Many words have one meaning: When developing a software for a group of people, sometimes we call them users, sometimes end-users, sometimes customers etc. If we use different words in the code or documentation and developers join a project later – they might think there is a difference between these entities.
  • One word has many meanings: Every insurance software has "policies" somewhere in its system. Sometimes it describes a template for a group of coverages, sometimes it’s a contract underwritten by an insurance, sometimes a set of government rules. You don’t need to be an expert to guess this can go wrong horribly.

CQRS

Asking a question should not change the answer

Golo Roden jokes, "CQRS is CQS on application level", but actually it’s easy to understand this way, once you read a single article about CQS. Basically, it’s a pattern where you separate commands (writes) and queries (reads): CQS.

  • Writes do not return any values and change the state of an object.
    stack.push(23); // pushes value 23 onto the stack; returns nothing
  • Reads return a value and don’t change the state.
    stack.isEmpty() // does not change state; returns a isEmpty boolean
  • But don’t be fooled! Stacks are not following the CQS pattern.
    stack.pop() // returns a value and changes state

Separating them on application level means: exposing different APIs for reading (return a value; do not change state) and writing (change state; do not return value *). Or phrased differently: Segregate responsibilities for commands and queries: CQRS.

* for http: always returns 200 before doing anything

Enforcing CQRS could have this effect on your application:

For synchronizing patterns see patterns like the saga pattern or 2 phase commit . For more reference see: Starbucks Does Not Use Two-Phase Commit

Event Sourcing

When talking about databases (be it relational or NoSQL) often we save the current state of some business item persistently. When we are ambitious we save a history of these states. Event sourcing follows a different approach. There is only one initial state, change requests to this state (commands) and following manipulating operations (events). When we want to change the state of an object, we set up a command. This triggers an event (that’s published so some kind of queue) and most likely is persistent in a database.

Bank account example: we start with 0 € and do not change this initial value when we add or withdraw money. We save the events something like this:

Date EventId Amount Message
2019-01-07 e5f9e618-39ad-4979-99a7-342cb1962266 0 account created
2019-01-11 f2e98590-7795-4cf7-bdc2-1794ad39874d 1000 manual payment received
2019-01-29 cbf44bfc-7a5e-4514-a906-a313a6e0fb5e 2000 saylary received
2019-02-01 32bc638c-4783-45b8-8c1e-bebe2b4528a1 -1500 rent payed

When we want to see the current balance, we read all the events and replay what happened.

const accountEvents = [0, 1000, 2000, -1500];
const replayBalance = (total, val) =>  total + val;
const accountBalance = accountEvents.reduce(replayBalance);

Once every n (e.g. 100) values we save a snapshot to not have to replay too many events. Aside from the increased complexity this has some side effects which should not be unadressed.

  • As we append more and more events, the data usage increases endless. There are ways around removing "old" events etc. and replacing them with snapshots, but this destroys the intention of the concept.
  • Additionally, as more events are stored, the system gets slower as it has to replay more events to get the current state of an object. Though, snapshotting every n events can get deterministic maximum execution time.

While there are many contra arguments there is one the key benefit why its worth: your application is future proof, as you save "everything" for upcoming changes and new requirements. Think of the account example from the previous step. You can implement/analyze everything of the following:

  • "How long does it take people to pay their rent once they got their salary"
  • "How many of our customers have two apartments? How much is the difference between both rents?"
  • "How many of our customers with two apartments with at least 50% in price difference need longer to pay off their car credit?"

To sum it up and coming back to our initial challenge, our simple CRUD application with domain-driven Design, CQRS and event sourcing would have transformed our architecture to something like this:

While this might solve some problems in application and system development this is neither a cookie-cutter approach nor "the right way" to do things. Be aware of the rising complexity of your application, system and enterprise ecosystem and the risk of over-engineering!