AI / ML in enterprises: Technology Platform

As an organization embarks on leveraging AI / ML at enterprise scale, it is important to establish a flexible technology platform that caters well to the different needs of data scientists and the engineering teams supporting them. The technology platform here includes the hardware architecture and software frameworks that allow ML algorithms to run at scale.

Before getting into the software stack directly used by data scientists, let's understand the hardware and software components required to enable machine learning.

  • Hardware layer: x86-based servers (typically Intel) with acceleration using GPUs (typically NVIDIA)
  • Operating systems: Linux (typically Red Hat)
  • Enterprise Data Lake (EDL): a Hadoop-based repository like Cloudera or MapR, along with supporting stacks for data processing:
    • Batch ingestion & processing: example – Apache Spark
    • Stream ingestion & processing: example – Apache Spark Streaming
    • Serving: example – Apache Drill
    • Search & browsing: example – Splunk
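To make the batch-versus-stream distinction above concrete, here is a minimal pure-Python sketch of the same aggregation done both ways. It is only an illustration of the processing model: in practice an engine like Apache Spark runs the batch version distributed across a cluster, and Spark Streaming applies the micro-batch version to data arriving over time.

```python
from collections import defaultdict

def batch_word_count(records):
    """Batch processing: the full dataset is available up front."""
    counts = defaultdict(int)
    for record in records:
        for word in record.split():
            counts[word] += 1
    return dict(counts)

def streaming_word_count(micro_batches):
    """Micro-batch streaming: running state is updated as each
    small batch of records arrives, yielding an updated result."""
    counts = defaultdict(int)
    for batch in micro_batches:          # each batch arrives over time
        for record in batch:
            for word in record.split():
                counts[word] += 1
        yield dict(counts)               # running totals after each batch

# Same records, consumed all at once vs. as two micro-batches:
full = batch_word_count(["spark streams data", "spark scales"])
running = list(streaming_word_count([["spark streams data"], ["spark scales"]]))
```

The streaming version's final yielded state matches the batch result; the difference is that intermediate results are available as data arrives.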

Once the necessary hardware and data platforms are set up, the focus shifts to providing an effective end-user computing experience to data scientists:

  • Notebook frameworks for data manipulation and visualization: Jupyter Notebooks or Apache Zeppelin, which support the programming languages most commonly used for ML, such as Python and R.
  • Data collection & visualization tools: for example, the Elastic Stack and Tableau.
  • An integrated application and data-optimized platform like IBM Spectrum simplifies things for enterprises by addressing all the needs listed above (its components include an enterprise grid orchestrator along with a notebook framework and the Elastic Stack).
  • Machine learning platforms: specialized platforms like DataRobot, H2O, etc. simplify the ML development lifecycle and let data scientists and engineers focus on creating business value.

There are numerous other popular platforms like TensorFlow, Anaconda and RStudio, and evergreen ones like IBM SPSS and MATLAB. Given the number of options available, particularly open-source ones, attempting a comprehensive list would be difficult. My objective is to capture the high-level components required in a technology platform for an enterprise to get started with AI / ML development.

AI / ML in enterprises: Lifecycle & Departments

Many start-ups are built on AI / ML competence and require this expertise across the organization. In established enterprises, AI / ML is fast becoming pervasive given the disruption from start-ups and rising customer expectations. Depending on an enterprise's size and the level of regulation in its industry, machine learning activities might be embedded within existing technology teams, or dedicated "horizontal" teams might be responsible for them.

ML activities that people readily recognize are the ones performed by data scientists, data engineers and the like. However, there are other business and technology teams that are essential to enable ML development. Given the potential bias and ethics implications with business decisions made by AI / ML, governance to ensure risk and regulatory compliance will be required too. In this blog, I will cover AI / ML lifecycle along with the functions and departments in an enterprise that are critical for successful ML adoption.

AI Model inventory: There is an increasing regulatory expectation that organizations be aware of all AI / ML models used across the enterprise so they can manage risks effectively. This McKinsey article provides an overview of the risk management expected in the banking industry. As an organization establishes its AI / ML development process, a good starting point is to define what constitutes an AI model, ensure a common understanding of that definition across the organization, and create a comprehensive inventory.
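As a sketch of what such an inventory might look like as a data structure, here is a minimal Python example. The fields (owner, risk tier, regulatory scope) are illustrative assumptions; each enterprise would define its own schema based on its regulatory obligations.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    # Illustrative fields; a real schema is defined by the enterprise.
    model_id: str
    owner: str
    business_use: str
    risk_tier: str                       # e.g. "high", "medium", "low"
    regulatory_scope: list = field(default_factory=list)

class ModelInventory:
    """A single registry of all AI / ML models in the enterprise."""

    def __init__(self):
        self._models = {}

    def register(self, record: ModelRecord):
        if record.model_id in self._models:
            raise ValueError(f"{record.model_id} already registered")
        self._models[record.model_id] = record

    def high_risk(self):
        """Models that warrant the closest governance attention."""
        return [m for m in self._models.values() if m.risk_tier == "high"]

inventory = ModelInventory()
inventory.register(ModelRecord("credit-001", "risk-team",
                               "credit scoring", "high", ["fair lending"]))
inventory.register(ModelRecord("churn-002", "marketing",
                               "churn prediction", "low"))
```

Even this simple structure supports the governance questions above: what models exist, who owns them, and which ones are high risk.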

Intake and prioritization: To avoid indiscriminate and inappropriate AI / ML development and use, it is important that any such development go through an intake process that evaluates risk, regulatory considerations and return on investment. It is good practice to define certain organization-wide expectations, and preferable to federate the responsibility for agility.

Data Management: Once an AI model is approved for development, business and technology teams work together to identify the required data, source it from different systems across the organization and convert it into a feature set for model development.

  • Data administrators manage the various data sources, which are typically data lakes (Apache Hadoop implementations), warehouses (like Teradata) or RDBMSs (like SQL Server / Oracle).
  • Data engineers handle data preparation, wrangling, munging and feature engineering using a variety of tools (like Talend) and make the feature set available for model development.
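A minimal sketch of what feature engineering means in practice: turning raw columns into purely numeric feature vectors a model can consume. This pure-Python example (scaling a numeric column and one-hot encoding a categorical one) is illustrative; data engineers typically do this at scale with tools like the ones named above.

```python
def min_max_scale(values):
    """Scale numeric values into the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0              # avoid division by zero
    return [(v - lo) / span for v in values]

def one_hot(categories):
    """One-hot encode a categorical column (levels sorted for stability)."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

# Hypothetical raw columns from a source system:
ages = [25, 35, 45]
segments = ["retail", "corporate", "retail"]

# Each row becomes a numeric feature vector ready for model development.
features = [[a] + oh for a, oh in zip(min_max_scale(ages), one_hot(segments))]
```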

Model Development: Data scientists use AI / ML platforms (like Anaconda, H2O or Jupyter) to develop AI models. While model development is federated in most enterprises, AI / ML governance requires data scientists to adhere to defined risk and regulatory guidelines.
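At its core, model development means fitting parameters to the feature set prepared earlier. As a self-contained sketch, here is a tiny logistic regression trained with gradient descent on toy data; in practice data scientists would use the platforms named above rather than hand-rolled code.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression trained with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid: predicted probability
            err = p - yi                     # gradient of log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if z > 0 else 0

# Toy, linearly separable data: label 1 when the single feature is large.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
```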

Model Validation: An enterprise risk team usually validates models before production use, particularly those that are external facing or deemed high risk.

Deployment & Monitoring: The technology team packages approved models with the necessary controls, integrates them into the appropriate business systems and monitors them for stability and resilience.
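One concrete monitoring concern is data drift: production inputs diverging from what the model saw in training. As a simple illustrative proxy (real systems use richer tests such as PSI or Kolmogorov–Smirnov), this sketch flags a feature whose live mean shifts far from the training distribution:

```python
import statistics

def drift_alert(train_values, live_values, threshold=3.0):
    """Flag drift when the live mean shifts more than `threshold`
    training standard deviations away from the training mean.
    A deliberately simple heuristic for illustration only."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values) or 1.0
    shift = abs(statistics.mean(live_values) - mu) / sigma
    return shift > threshold

train = [100, 102, 98, 101, 99]    # feature values seen during training
stable = [100, 103, 97]            # production inputs look similar
shifted = [250, 260, 255]          # an upstream data change
```

Wired into the deployment pipeline, an alert like this tells the technology team a model may need revalidation or retraining before its predictions quietly degrade.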

Enterprises strive to automate the entire lifecycle so that the focus can be on adding business value effectively and efficiently. Open-source platforms like Airflow, MLflow and Kubeflow help automate orchestration and provide seamless end-to-end integration for all teams across the AI / ML lifecycle.
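The core idea these orchestrators share is running pipeline steps in dependency order. As a sketch of that idea (using Python's standard-library graphlib; real platforms add scheduling, retries, distribution and a UI on top), here is a toy ingest → features → train pipeline:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_pipeline(tasks, deps):
    """Run tasks in dependency order, passing prior results forward.
    A toy version of what Airflow-style orchestrators automate."""
    order = TopologicalSorter(deps).static_order()
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return results

# Hypothetical lifecycle steps, each a function of earlier results:
tasks = {
    "ingest":   lambda r: [3, 1, 2],            # pull raw records
    "features": lambda r: sorted(r["ingest"]),  # prepare the feature set
    "train":    lambda r: sum(r["features"]),   # stand-in for model fitting
}
deps = {"features": {"ingest"}, "train": {"features"}}
results = run_pipeline(tasks, deps)
```

Each step only runs once everything it depends on has completed, which is exactly the guarantee that lets data engineers, data scientists and technology teams plug their stages into one shared pipeline.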