Moving faster with MLOps: Bridging the gap between AI development and deployment

Artificial intelligence (AI) recently got a boost with the release of ChatGPT from OpenAI. Now, everyone’s talking about the potential of AI to transform enterprise processes across business functions.

Less often discussed is what it takes to move from developing AI models to productionizing them with machine learning (ML) to deliver significant enterprise value. To date, data scientists have spent much of their time preparing and managing the data used in AI models, a slow, costly process that has limited machine learning gains. In addition, many ML models have failed to make it out of development into full production deployment.

Enterprises that set up MLOps disciplines can productionize and scale ML models, gaining new advantages such as improved forecasting and operational efficiencies. As a result, there’s never been a better time for enterprises to evolve their MLOps capabilities to move faster with AI and leverage it for more use cases.

To date, just 50 percent of enterprises have adopted AI in at least one business unit, deploying an average of 3.8 capabilities. That journey can accelerate with MLOps.

What is an ML model?

A machine learning (ML) model is a computer program that has been trained to recognize patterns. By doing so, it can then make predictions about what will happen next.
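To make this concrete, here is a minimal sketch of the idea, assuming scikit-learn is available: the model is shown labeled examples, learns the pattern, and then predicts the label of an input it has never seen. The data here is a toy illustration, not a real use case.

```python
# A minimal sketch of an ML model: learn a pattern from labeled
# examples, then predict the label of unseen data (scikit-learn assumed).
from sklearn.linear_model import LogisticRegression

# Toy pattern: the label is 1 when the two feature values are both high.
X_train = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.7, 0.9]]
y_train = [0, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)  # "training": the model learns the pattern

prediction = model.predict([[0.8, 0.9]])  # an input the model has never seen
print(prediction)
```

Everything that follows in this article — MLOps, pipelines, monitoring — exists to take models like this from a notebook experiment to a reliable production asset.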

What is MLOps and why do you need it?

Machine learning operations (MLOps) combines data engineering, machine learning, and DevOps into a single discipline. MLOps encompasses the skills, frameworks, technologies, and best practices that equip data engineering, data science, and IT teams to industrialize ML models and evolve processes over time. MLOps also integrates data sources and data sets, an AI model repository, an automated ML platform, and software containers.

“MLOps is the key to making machine learning projects successful at scale,” says John 'JG' Chirapurath, VP of Azure at Microsoft.

What does MLOps mean? Put simply, MLOps is short for "machine learning and information technology operations."


MLOps versus DevOps and AIOps: What’s the difference?

IT innovation and operations have gotten more complicated with the industry trend of naming every discipline “ops.” Let’s explore the difference between three similar sounding terms: DevOps, AIOps, and MLOps.

Development operations (DevOps) is the practice of using Agile, lean processes to increase the pace of IT development, while taking a systems approach. Organizations using DevOps seek to create a strong development culture, use automation to streamline processes, and provide iterative value with each code release.

Gartner coined the term AIOps, short for AI operations. According to the analyst firm, “AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” Or to put it more simply, AIOps automates IT processes to make them more efficient.

Machine learning operations, or MLOps, standardizes the process of developing, deploying, and maintaining ML models. MLOps seeks to industrialize or productionize these capabilities, so that teams can progress from deploying ML models using bespoke processes to ones built for scale.


What is the MLOps market size?

The global MLOps market size was valued at $983.6 million in 2021, but is growing at a heady CAGR of 37.5%. As a result, it’s expected to reach $23.1 billion by 2031, according to Allied Market Research. That’s because enterprises are investing in developing MLOps architectures, deploying MLOps tools and platforms, and leveraging MLOps best practices to rapidly evolve their competency with this important emerging discipline.

Explore the key MLOps themes for enterprises to consider in 2023

Who needs MLOps?

Any organization that uses big-data analytics to drive decision-making and optimize processes needs ML models and machine learning ops. To date, because of the cost and commitment of standing up an MLOps function, this discipline has been embraced by well-funded research organizations, startups, and enterprises.

Why and when to employ MLOps?

Organizational leaders should consider investing in MLOps when they are ready to develop and productionize ML models for multiple use cases. Developing a strong MLOps practice can help organizations overcome the following issues:

  • ML models are not making it to production (a common industry problem).
  • Data scientists are using customized processes to create ML models, increasing time and cost.
  • The data science team has no way to consistently deploy ML models at pace, resulting in a large backlog.
  • ML model monitoring is inconsistent, harming model accuracy over time.

Why is MLOps important?

Enterprises that deploy ML models unlock significant new business advantages, including:

Gaining a repeatable process for ML deployment: As with DevOps, MLOps teams seek to automate as much as possible, creating standardized processes that are easy to reproduce and scale.

Focusing time on activities that generate business value: Teams focus less time on data preparation and more time on ML model deployment and optimization, activities that improve enterprise ML value creation.

Ensuring ML models are deployed: Industry statistics show that up to 90 percent of ML models don’t make it to production, representing wasted investment. With such low success rates, business stakeholders lose interest in funding ML projects. Using standardized processes and automation decreases project risk and error, ensuring that ML models are deployed and realize intended business value.

Increasing operational efficiency: Teams that use machine learning ops become faster at producing and deploying ML models. While many enterprises may currently have a low number of models in production, leaders want to be able to deploy and manage more and more ML use cases.

Reducing ML development costs: Currently, ML models are resource-intensive, requiring large teams of data engineers, data scientists, and IT professionals to develop, deploy, and maintain. Organizations can reduce costs by using standardized processes; cloud-based platforms that can scale with business growth; and partners who provide critical capabilities, such as frameworks, MLOps platforms, and teams to manage ML programs.

Accelerating time to market: Being able to industrialize ML capabilities and launch new models in days to weeks is critical to achieving business transformation goals.

By 2024, IDC predicts that 60% of enterprises will have operationalized their MLOps workflows. Those that move ahead now can create competitive advantage by using data, analytics, and automation to predict and respond to business changes.

The current state of AI and ML adoption

So, who’s adopting AI and ML and what have they accomplished thus far with this exciting new discipline?

The worldwide market for AI is growing fast. Sales of AI software, hardware, and services are expected to break the $500 billion mark in 2023.

However, deploying AI and ML is still an emerging discipline. Just one in three enterprises have adopted clear frameworks for AI governance (38%), follow standard protocols for building and delivering AI tools (36%), and have capability-building programs to enhance team member skills (36%).

Despite challenges, such as insufficient talent and difficulties productionizing ML models, more than half (56 percent) have adopted AI in at least one business function, finds McKinsey.


Source: McKinsey

Check out our guide, "MLOps: A set of essential practices for scaling ML-powered applications," and start implementing these best practices today.

Addressing AI and ML challenges with MLOps

Machine learning operations helps organizations overcome challenges that have plagued AI and ML initiatives to date. These issues include:

Competing for skilled talent: Enterprises’ appetite to pursue AI and ML programs is creating an enormous demand for data scientists. LinkedIn job postings for data science have skyrocketed 650%. However, only 20% of enterprise leaders believe their data science teams are ready for AI. Enterprises need to hire, develop, and retain data engineers and data scientists, necessitating new cross-skilling and upskilling programs.

Overcoming data challenges: Enterprises have a wealth of data that’s siloed in repositories and business processes. Migrating, ingesting, preparing and provisioning data on modern cloud platforms like Databricks and Snowflake is a necessary prerequisite to moving forward with AI and ML.

Download our eBook, Migrate to Modernize Your Data and Analytics, to learn how you can do this.


Source: IDC/Snowflake

Controlling deployment costs: ML models are costly to deploy, in part due to the high cost of talent and the lack of repeatable processes. The number of data scientists employed by the average organization grew from 28 to 50 in 2021. With a one-off approach, this expense is incurred with every model.

Machine learning ops enables enterprises to develop scalable frameworks that reduce costs over time. Enterprises need to be able to control costs and achieve efficiencies as they scale models.

ML models can be slow to deploy: According to IDC, deploying AI and ML solutions can take up to nine months on average, by which time the business has almost certainly changed.

“MLOps brings model velocity down to weeks, sometimes days,” says IDC analyst Sriram Subramanian. “Just like the average time to build an application is accelerated with DevOps, this is why you need MLOps.”

Too many ML models fail in production: The ability to use AI and ML across industry challenges will be critical for future success in business. Those organizations that can automate end-to-end processes, predict events, and respond to demand changes, as well as continuously learn from data, will outpace those that can’t. However, the majority of ML models fail in production, representing wasted time and cost. The reasons why include losing long-term executive support, choosing manual processes over automation, being unable to motivate functions to change operational processes, or being unable to deploy and manage ML models over the long-term.

“Many organizations are constrained by artisanal development and deployment techniques, with star data scientists frequently treated as virtuosos and given considerable creative control.” Source: Deloitte

Machine learning operations provides a framework, standardized processes, and automation and other tools for the entire ML model lifecycle, including development, training, packaging, validation, deployment, monitoring, and evolution.

Ensuring ML models are reproducible: Even the best ML models will be of limited use if they aren’t reproducible. Manual processes often aren’t repeatable, which is another argument for MLOps. MLOps centralizes resources, helps orchestrate the many pipelines required to develop and deploy ML models, and automates critical processes, among other gains.

Planning for monitoring and training: ML models and data drift over time, so they need to be closely monitored and trained on new data. After being retrained, models will need to be revalidated and redeployed. So, enterprises need to ensure they have the teams, processes, and automated capabilities in place to move to a continuous improvement model.
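One hedged way to sketch this monitoring loop, assuming scipy and numpy are available, is a two-sample Kolmogorov-Smirnov test that compares a live feature's distribution against the one the model was trained on, flagging the model for retraining when they diverge (the threshold here is illustrative):

```python
# Hedged sketch: detect data drift with a two-sample Kolmogorov-Smirnov
# test, then flag the model for retraining when drift is significant.
# (scipy and numpy assumed; the threshold and data are illustrative.)
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # data the model saw
live_feature = rng.normal(loc=0.6, scale=1.0, size=5_000)      # shifted production data

statistic, p_value = ks_2samp(training_feature, live_feature)

DRIFT_P_THRESHOLD = 0.01  # illustrative significance level
needs_retraining = p_value < DRIFT_P_THRESHOLD
print(f"KS statistic={statistic:.3f}, drift detected={needs_retraining}")
```

In a production setting, a check like this would run on a schedule per feature, with alerts and automated retraining wired to the result.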

Scaling ML requires close collaboration: Collaborators want to work together to explore data, conduct model experiments, perform feature engineering, manage models and get them ready for deployment and ongoing monitoring. With an MLOps framework, everyone uses the same processes, ensuring consistency across work.

Ensuring enterprise readiness: Enterprise teams need to lay some important groundwork before they deploy ML models, including ensuring adequate security for data and models; applying governance; and capturing data on decision making and processes for compliance. The top ML model risks include adversarial attacks that cause models to make false predictions; data poisoning, or manipulation; training manipulation; transfer learning attacks; and malicious data extraction from models. In addition, enterprises need a clear audit trail for all ML activities, to demonstrate compliance with relevant requirements and regulations.

Types of ML Models

Criterion | Generic | Custom | Custom adaptive
Applicability | Universal (one fits all) | Specific (one fits one) | Specific for most (one fits one)
Scalability | Limited | Models can be scaled (extra effort) | Models can be scaled
Deployment (ease and speed) | Fast | Slow (MLOps) | Fast (MLOps)
Cost (development and deployment) | Low (across the board) | High (high development + high deployment costs) | Medium (medium development + low deployment costs)
Accuracy | Low | High | Medium to high
Support | Medium | High | Low

Source: VentureBeat

Key components of MLOps

MLOps spans myriad use cases and strategies for developing models. However, they all have the following phases in common:

Navigating an exploratory phase: Teams prepare or acquire data, create AI models, test algorithms to see which ones work, and create models that can be productionized.

Productionizing the model: Next, it’s time to reproduce the process of creating the model, by leveraging CI/CD pipelines and training models. After that, teams will freeze the ML pipeline to prepare for deployment.

Deploying the model: Teams will then push the ML model to a centralized store, package it to run in different environments, validate it, and deploy it to the target system/server.

Implementing MLOps: Teams may use manual processes to develop and deploy a few models. However, to improve speed and scale of deployments, they’ll need to automate the ML lifecycle.
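As a rough sketch of the productionizing and deployment phases above, assuming scikit-learn is available, a trained pipeline can be frozen into a single serialized artifact and validated before it is pushed to a serving environment:

```python
# Hedged sketch of "freezing" and packaging a trained pipeline, then
# validating the reloaded artifact before deployment (scikit-learn assumed;
# a real setup would likely use a model registry rather than raw pickle).
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=42)
pipeline = Pipeline([("scale", StandardScaler()),
                     ("model", LogisticRegression())])
pipeline.fit(X, y)

artifact = pickle.dumps(pipeline)   # package: one serialized artifact
restored = pickle.loads(artifact)   # e.g. loaded in the serving environment

# Validation gate: the restored model must reproduce the frozen model's outputs.
assert (restored.predict(X) == pipeline.predict(X)).all()
print("artifact validated, ready to deploy")
```

The same freeze-package-validate pattern underlies tools like MLflow's model registry; the pickle round trip here just makes the idea visible.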

Who’s who in terms of MLOps startups and MLOps companies

Nearly all (88%) of MLOps providers are startups. MLOps startups have already received $3.8 billion and provide data governance, data monitoring, ML monitoring, ML platforms, and serving platforms. Providers include hyperscalers and data platforms (listed below), as well as Tredence, which recently received $175M from private equity firm Advent International to accelerate data-fueled growth and AI value realization for industry companies.

All of the major cloud providers, including AWS, Microsoft Azure, and Google Cloud Platform provide MLOps tools.

Using MLOps with Databricks: Databricks Machine Learning is a data-native, collaborative and full-lifecycle ML platform that does data engineering work, so that you can focus on building scalable and replicable models. Built natively with MLflow and Delta Lake — two of the world’s most popular open-source projects — Databricks Machine Learning accelerates machine learning efforts all the way from featurization to training, tuning, serving and monitoring.

Using MLOps with Microsoft Azure: Microsoft Azure provides tools for data scientists and IT professionals to use to accelerate the automation, collaboration, and reproducibility of ML workflows. Teams can build MLOps workflows and models with MLFlow and Azure Machine Learning, easily deploy highly accurate models anywhere, efficiently manage the entire ML lifecycle, and improve team collaboration. In addition, Microsoft Azure helps implement governance across all ML assets.

Using MLOps with AWS: Amazon SageMaker enables teams to deliver high-performance ML models at scale. SageMaker provides repeatable training workflows to accelerate model development, catalogues ML artifacts centrally to enable model reproducibility and governance, integrates ML workflows with CI/CD pipelines for faster time to production, and continuously monitors data and models in production to maintain quality.

Using MLOps with Google Cloud Platform: Google Cloud Platform provides Vertex AI, a machine learning (ML) platform that empowers teams to train and deploy ML models and AI applications. Teams can use the platform to orchestrate workflows, track metadata, identify the best model for a use case, manage model versions and features, and monitor model quality.

What are MLOps best practices?

Ready to get started with MLOps and learn how to deploy a ML model? Here are some strategies from Databricks and Tredence to set up your practice for success and industrialize ML model deployment.

Getting started with MLOps

Here are some general recommendations on capabilities you need to have in place to get started with machine learning ops.

  • Manage ML models: An MLOps platform will provide a central repository for all your frameworks, models, experiment summaries, and metadata, making it easy to find and share model information.
  • Pursue MLOps certifications: Ensure your team has the latest skills and knows how to use MLOps frameworks, platforms, tools, and workflow. Many leading companies are offering MLOps training and certifications, including Amazon, Coursera with Google Cloud, LinkedIn, and Udemy’s Azure Machine Learning and Operations course.
  • Build for reproducibility: Data, experiments, and processes must all be reproducible, enabling you to scale, troubleshoot, and more.
  • Design for scalability: You’ll want to ensure that your people, platform and processes can scale with business growth. You’ll need to ensure that you have adequate talent, computing, storage, and other key resources to support your ML strategy.
  • Plan for security: By considering and implementing security controls throughout your ML pipeline, you can protect data from unauthorized access. Most organizations apply defense in depth strategies, using multiple tools across different layers to secure data and processes.
  • Ensure governance and compliance: Many ML models use sensitive data, including PII and PCI. An MLOps platform can help ensure that your ML program meets corporate, industry, and regulatory requirements by centralizing all assets and activity in a single location.
  • Ensure high availability and disaster recovery: ML models used for real-time applications need to meet high availability requirements. Plan for redundancy and recovery to avoid business impacts.
  • Use a CI/CD pipeline: A CI/CD platform will enable you to automate the development, testing, and deployment of ML models.
  • Automate configuration management: An MLOps platform enables you to automate and configure ML infrastructure, reducing errors and improving consistency.
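The CI/CD recommendation above can be sketched as an automated quality gate that only promotes a candidate model when its holdout accuracy clears a minimum bar. The threshold and data here are illustrative, with scikit-learn assumed:

```python
# Hedged sketch of a CI/CD quality gate: a candidate model is deployed
# only if its holdout accuracy clears a minimum bar (threshold illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

MIN_ACCURACY = 0.80  # illustrative release criterion


def passes_quality_gate(model, X_test, y_test) -> bool:
    """Return True if the candidate model may be promoted to production."""
    return model.score(X_test, y_test) >= MIN_ACCURACY


X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
candidate = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print("promote to production:", passes_quality_gate(candidate, X_test, y_test))
```

In a pipeline, a check like this would run as an automated test stage, failing the build (and blocking deployment) when the gate is not met.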

How to deploy an ML model

Now, you’re ready to build. Here’s how to develop and deploy an ML model.

  • Prepare data: Your data engineering team will get data ready for production and consumption. This process involves harmonizing data from multiple sources.
  • Explore data: You and your colleagues can explore, share, and get data ready for the entire ML lifecycle by creating data sets, tables, and visualizations that can be reproduced, edited, and shared.
  • Prepare data and conduct feature engineering: You can transform, aggregate, and de-duplicate data, refining features and making them visible and shareable to your team.
  • Train and tune models: Use automated MLOps tools to train and improve model performance by optimizing weights and biases and creating code that’s ready to be reviewed, optimized, and then deployed.
  • Set parameters: Your ML platforms will set some parameters, but your team will need to set others. Data scientists become skilled at configuring parameters so that ML models can learn.
  • Verify model: Test the model to see if it can accomplish key goals and maintain its integrity when new data is introduced.
  • Review all models: Track ML models across their entire lifecycle, including lineage, versions, artifacts, and transitions. Use your MLOps platform to share and collaborate on models.
  • Optimize model inference and serving: Use a CI/CD tool to automate key aspects of model inference and serving, such as refresh frequency, inference request times, and other testing and QA requirements.
  • Orchestrate deployments: MLOps platforms enable you to deploy ML models on a server and make them available as a REST endpoint, A/B test them, and roll back unwanted changes. You can automate permissions and the creation of clusters to productionize registered ML models.
  • Perform ongoing monitoring: You’ll want to identify problems with your models, such as drift, AI biases, and anomalies, early and often. You can use REST APIs to monitor models.
  • Create an audit trail: Capture and store information about events that occur during ML model lifecycles. You can use this information to monitor performance, debug issues, and support compliance processes.
  • Automate ML model retraining: ML models drift. By creating alerts and automating retraining, you can correct model drift before it harms the business.
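The audit-trail step above can be sketched with a minimal, standard-library-only event log that records lifecycle events per model version, so that performance and compliance questions can be answered later. The field names and events are purely illustrative:

```python
# Hedged, stdlib-only sketch of an ML audit trail: record lifecycle
# events per model version for later debugging and compliance review.
# (Model names, events, and fields are illustrative.)
from datetime import datetime, timezone

audit_trail: list[dict] = []


def record_event(model_name: str, version: int, event: str, detail: str = "") -> None:
    """Append one timestamped lifecycle event to the audit trail."""
    audit_trail.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "version": version,
        "event": event,
        "detail": detail,
    })


record_event("churn_model", 1, "trained", "AUC=0.91 on holdout")
record_event("churn_model", 1, "validated", "passed quality gate")
record_event("churn_model", 1, "deployed", "REST endpoint /predict")
record_event("churn_model", 1, "drift_alert", "feature 'tenure' shifted")

# Compliance review: reconstruct what happened to this model.
history = [e["event"] for e in audit_trail if e["model"] == "churn_model"]
print(history)
```

A production system would write these events to durable, queryable storage; the point is that every lifecycle transition leaves a timestamped record.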

How to improve ML model accuracy?

Data scientists use a variety of strategies to improve ML model accuracy. Only highly accurate ML models can be used for critical processes, such as improving decision making. Here are some strategies from Towards Data Science.

  • Ask the right questions: If models aren’t delivering expected answers, it may not be a data or configuration issue: Your team may be asking the wrong questions. Challenge your assumptions to see if they’re valid.
  • Increase training data and context: ML models that train on more data or have greater context improve their forecasting accuracy.
  • Address missing and outlier values: Missing or outlier values can greatly skew algorithm results, because the ML model will reach the wrong conclusions. Add or fix these values to get better results.
  • Do more feature engineering: Create new features that enhance your ML models’ prediction capabilities. The most common way to do this is to combine two or more existing features into new ones.
  • Remove unnecessary features: If features aren’t important, consider removing them. By doing so, you’ll help reduce ML model noise and data redundancy.
  • Try multiple algorithms: Use a process of trial and error to determine which model produces the best data results.
  • Adjust and cross-validate hyperparameters: Create different training data sets to test how different combinations of hyperparameters perform. Or use grid search to try every combination of parameters and discover which set optimizes model accuracy.
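The last strategy can be sketched with scikit-learn's GridSearchCV, which tries every combination in a parameter grid and scores each with cross-validation (the grid and data here are illustrative):

```python
# Hedged sketch of hyperparameter tuning: grid search over every
# combination of parameters, scored with 5-fold cross-validation
# (scikit-learn assumed; the grid itself is illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=1)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)  # tries all 6 combinations
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Cross-validation guards against choosing hyperparameters that merely overfit one train/test split.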

Are you ready to take your ML operations to the next level? Learn more about MLOps and its benefits by reading our latest article, "MLOps: Only Way to Eat the Elephant?".

Key MLOps roles and responsibilities

ML roles: functional responsibilities before and after MLOps

Data scientists
Before MLOps:
  • Frame KPIs to meet business needs.
  • Build and deliver working models.
  • Test and validate models to meet KPIs.
After MLOps:
  • Automate ML model packaging and delivery, streamlining deployment.
  • Develop test cases to enhance and control ML model quality, enabling continual improvement.
  • Gain centralized visibility into ML models, allowing teams to make fast adjustments to the model pipeline.

ML engineers
Before MLOps:
  • Retrieve data.
  • Integrate ML models.
  • Streamline ML models to make them compatible with non-ML applications.
After MLOps:
  • Use visibility to enhance the performance of deployed models.
  • Source model versioning and tests.
  • Ensure seamless compatibility with non-ML applications.

Data architects
Before MLOps:
  • Create a scalable framework for ML model pipelines.
  • Improve ML model performance using the available tools.
After MLOps:
  • Centralize visibility into ML models to reduce waste and improve resource management.
  • Enhance the team’s ability to assess and adjust infrastructure requirements.

Visualization experts
Before MLOps:
  • Create dashboards and visualizations for business users.
After MLOps:
  • Create dashboards that synthesize insights from large data sets.
  • Develop dashboards across more industry use cases.

Domain experts
Before MLOps:
  • Frame business KPIs and goals.
  • Monitor ML model performance.
After MLOps:
  • Gain measurable KPIs with easy-to-understand visualizations.
  • Benefit from model monitoring processes that are intuitive, fast to execute, and hassle-free.

Business stakeholders
Before MLOps:
  • Wait long times for the data and analytics needed for decision making.
After MLOps:
  • Benefit from fast processes that deliver new models at pace.
  • Use ML models to make decisions, such as where to place new investments.

Data governance teams
Before MLOps:
  • Tackle data issues on a limited basis with just a few ML models.
After MLOps:
  • Set strategy for data use in ML model development.
  • Oversee data governance, compliance, and privacy issues.

How industry companies are putting MLOps to work

Many industries are experimenting with MLOps, but just a few have an extensive number of models in production. The software and technology (13.7 percent) and consumer packaged goods (8.2 percent) industries lead other verticals in having 100 or more ML models in production.


Source: HBR/Capital One report

Here are some examples of how industry companies are using ML and MLOps to create competitive advantage.

All industries: Analyzing data to uncover new customer needs and product development opportunities. Using insights into operational processes to create new efficiencies.

Consumer packaged goods (CPG): CPG companies need to understand customer demand to produce enough goods, place them in the right locations, and innovate products. By deploying more ML models with MLOps, CPG manufacturers can improve forecasting, focus investments on the right product development opportunities, and dynamically adjust inventory levels and pricing across markets as demand fluctuates. They do this by leveraging reinforcement learning, which seeks to optimize behaviors to improve cumulative rewards, or supervised learning models, which teach models to achieve a desired result.

Tredence partnered with a global CPG company to automate 80 percent of ML model development, deploy 100k+ ML models across 20+ markets, reduce model support costs by over 50 percent, and reduce weekly execution time by 22 percent.

E-Commerce: eCommerce companies can use ML models to categorize products correctly and recommend products to customers. This can mean making product recommendations to customers based on their search history, offering related products, or providing reordering prompts. Companies can improve their personalization strategies by using natural language processing (NLP) algorithms that detect customers’ preferences as they interact with their website and app.

Supply chain and logistics: Supply chain and logistics companies can use ML models to determine the right amount of inventory to keep on hand and automate picking and packing, enabling them to offer same-day shipping. They also can optimize warehouse processes, determine the best routes for delivery trucks to take to minimize traffic and track shipments. MLOps processes bring new visibility and efficiencies to a business that often has razor-thin margins.

Retail: Retailers can use ML to solve a wide array of challenges, such as improving customer segmentation, predicting which customers will churn, cross-selling and upselling goods, detecting and preventing fraud, reducing product returns, optimizing pricing, and more. MLOps enables retailers to manage their businesses strategically, driving profitability.

Tredence helped a global retailer scale its MLOps practice across multiple business functions. The retailer reduced the time to build new models and complete feature engineering by 40 percent and the time to onboard new models by 60 percent.

Develop MLOps capabilities with Tredence

There are numerous MLOps tools, platforms and partners in the market. So, why work with Tredence?

The 15 percent of companies leading on AI get 3.4X greater returns than laggards.
Source: McKinsey

Tredence offers some compelling advantages to its partners:

We are a data analytics company: Many companies offer data analytics as one of many services. We focus deeply and exclusively on data analytics, which gives us unique insights into enterprises’ pain points with data and the problems they want to solve.

We offer a centralized monitoring tool to industrialize processes: Our MLOps platform, MLWorks, provides enterprises with an ML framework, including automated workflows, pre-built accelerators, and a centralized monitoring tool for MLOps observability. Using MLWorks makes model management simpler and more accessible for teams.

  • MLWorks not only manages MLOps successfully but also industrializes it. It provides automated workflows and pre-built accelerators that help track model degradation, manage code workflows, and fast-track model operations management across the enterprise. It can scale to thousands of machine learning models, reduce outages, and simplify model monitoring.
  • Enterprises can use MLWorks to ensure their ML models in production are current, contextual, and provide deep insight into model health. By doing so, they can realize value faster.

Leverage AI/ML accelerators: Tredence offers ATOM.AI, an accelerator ecosystem with pre-built ML models, a standardized data architecture, implementable use cases, and automated infrastructure provisioning solutions to speed the path to value.


Other MLOps accelerators include templatized notebooks with integrated feature store support, standardized frameworks for tracking experimentation, extendable ML libraries that provide a simplified view of model predictions, along with metrics customized for user personas and the most relevant information for key business functions. A visual workflow graph provides end-to-end model visibility and pipeline traceability, while automated alerts enhance production model monitoring and optimization.

Benefit from plug and play capabilities: MLWorks is completely native to all leading cloud platforms. As a result, it can seamlessly integrate into your data and ML ecosystem and automate model monitoring.

Use managed services: When you work with Tredence, you can focus your team on innovating. Tredence data science teams can provide 24/7 model management, to meet production SLAs and improve operational efficiency.

Leverage insights to create a data culture: MLWorks provides a simplified view of model predictions, along with metrics that are customized for user personas and provide the most relevant information for business functions. Additional tools improve end-to-end model visibility, pipeline traceability, and production model monitoring.

Did you know that the best way to start with machine learning is to leverage the Glass Box AutoML that Databricks provides out of the box?

The time to industrialize ML is now

The past few years have witnessed incredible advances, as companies have digitized business models, processes, products and services. Now, they stand ready to reap the gains of generative AI, ML, and MLOps.

Now is the time to set up and evolve an MLOps practice with the right tools, processes, and partner. You can move ahead of competitors, creating new visibility into your business and the ability to sense and respond to demand and other changes. Drive revenues, create cost savings, and continually enhance profitability with new capabilities.

Contact Tredence to learn more about how you can create competitive advantage with MLOps, Atom.AI, and MLWorks.

Unleash your business potential with Tredence