Concurrently, Google is eliminating "blind spots" in its infrastructure: a new `AI Telemetry Collector Agent` based on the OpenTelemetry standard was introduced. TPU metrics are now routed directly to Prometheus and Grafana. And the cherry on top for ML developers is the release of the open-source `Workbench Notebooks` extension for VS Code. The development environment now links seamlessly with managed cloud environments. MLOps infrastructure is maturing: instead of blindly increasing CAPEX, engineers are learning to squeeze the maximum ROI out of every leased gigaflop.
Source: Google Cloud
MLOpsGoogle CloudTPUInfrastructureDeveloper Tools