What is Vertex AI?
- Sharon Rajendra Manmothe
- Dec 2, 2024
- 9 min read
When we think of Google, the first thing that comes to mind is its dominance in the search engine space. However, Google has made substantial contributions to the data science industry, consistently delivering state-of-the-art products and solutions to help users unlock the full potential of their data.
One of Google’s standout contributions is Vertex AI, a platform launched in 2021 to simplify the machine learning (ML) process on an enterprise scale.
In this tutorial, we will explore how to get started with Google’s Vertex AI platform and use it to manage the various stages of the ML lifecycle. By the end, we’ll have a deployed model ready to generate predictions for a classification task.
A Comprehensive Solution for the Machine Learning Life Cycle
The machine learning lifecycle is a multi-step process that encompasses the following stages:
Data Preparation, Ingestion, and Exploration
Feature Engineering and Selection
Model Training and Tuning
Deployment and Model Monitoring
Each stage comes with its own set of tools, techniques, and challenges. Typically, implementing ML solutions requires a team of specialists with expertise in diverse areas.
This is where Vertex AI by Google Cloud stands out—it unifies and streamlines the entire ML lifecycle within a single, cohesive platform.
Simplifying Machine Learning for Everyone
Vertex AI is designed to cater to users of all skill levels, democratizing machine learning in the following ways:
1. AutoML for Beginners
No Coding Required: With AutoML, users with minimal technical expertise can create high-quality ML models.
Automated Workflows: From data preparation to hyperparameter tuning, everything is handled seamlessly under the hood.
Quick Results: Users simply upload their data and follow a few simple steps to build a model.
2. Custom Model Training for Experts
Flexibility and Freedom: Experienced data scientists can train models using their preferred frameworks, such as TensorFlow, PyTorch, or XGBoost.
Advanced Capabilities: Vertex AI provides robust tools to handle complex ML workflows, allowing experts to push the boundaries of what’s possible.
3. Simplified Deployment for Everyone
Real-Time APIs: Easily deploy models to serve real-time predictions, integrating them into live applications.
Batch Predictions: Perform large-scale inference tasks efficiently.
Unified ML Workflow in Action
Vertex AI’s holistic approach not only saves time but also bridges the gap between technical experts and business users, enabling organizations to scale their ML initiatives efficiently.
In the sections to follow, we will dive deeper into the features and practical applications of Vertex AI, walking through how it simplifies the end-to-end machine learning lifecycle.
Stay tuned as we unravel the full potential of Vertex AI—a tool that’s reshaping how enterprises leverage machine learning!
What Are Google Cloud Services?
Before diving into the specifics of Vertex AI, it’s essential to understand the broader ecosystem it belongs to—Google Cloud Services. This suite of cloud computing solutions provides an extensive range of tools to support storage, networking, databases, analytics, and machine learning tasks.
Google Cloud Services seamlessly integrate with Vertex AI, offering a unified platform for managing the end-to-end machine learning workflow. Let’s explore some of the key services that complement Vertex AI:
Key Google Cloud Services
1. Data Storage and Management
Efficient machine learning requires robust data storage and management capabilities, and Google Cloud offers powerful tools to handle this:
Cloud Storage
Acts as the central repository for raw data, making it easily accessible to Vertex AI.
Stores datasets required for model training and analysis, ensuring scalability and security.
BigQuery
A high-performance, serverless data warehouse designed for large-scale datasets.
Supports SQL querying and analytics at scale, and with BigQuery ML you can train models directly on the data stored in the warehouse.
2. Compute Resources
Machine learning workloads, especially at scale, demand significant computational power. Google Cloud offers versatile compute solutions:
Compute Engine
Provides customizable virtual machines (VMs) to handle resource-intensive ML tasks.
Vertex AI can utilize these VMs for custom model training, allowing you to scale resources as needed.
Vertex AI Pipelines
Orchestrates complex ML workflows across multiple compute resources, improving efficiency and traceability.
Facilitates automation, ensuring your ML processes are optimized for both time and cost.
Getting Started with Google Cloud Services
To explore these services and begin your journey with Vertex AI, follow these steps:
Visit the Homepage
Navigate to cloud.google.com to explore the available services.
If you’re new, click “Get started for free” to activate a free trial and access introductory credits.
Access the Cloud Console
Go directly to console.cloud.google.com for hands-on access to Google Cloud tools.
This is your control center for managing projects, accessing services, and monitoring resource usage.
Used together with Vertex AI, these services keep your data, compute resources, and workflows in one place, which simplifies each stage of the machine learning workflow.
Setting Up Your Google Cloud Console for Vertex AI
Before you begin your journey with Vertex AI, you’ll need to set up your Google Cloud Console. Follow this step-by-step guide to configure the console and prepare it for your Vertex AI projects. Note that you should be prepared to spend approximately $25–30 for this tutorial, using the most affordable configurations.
Step 1: Visit the Google Cloud Console
Navigate to the Google Cloud Console. You’ll typically land on the welcome page, which displays your workspace name (e.g., ibexprogramming.com).
Step 2: Create a Project
Projects in Google Cloud are organizational units used to manage resources for specific tasks. Here’s how to create one:
Go to the Console
On the welcome page, look for the option to create a new project.
Create the Project
Follow the prompts to name your project and configure basic settings.
Once created, select your project. You’ll notice its name displayed on the top bar of the page.
Step 3: Set Up a Billing Account
Vertex AI requires billing information to enable its services. Don’t worry—you’ll only be charged for paid resources you use.
Visit the Billing Console
Head over to the Google Cloud Billing Console.
Create an Account
If you don’t have an existing billing account, click on “Create account” and follow the setup instructions.
Link Your Billing Account
Once your billing account is created, link it to your project to enable payment for any resources Vertex AI might use.
Step 4: Access Vertex AI
View All Products
Return to the Google Cloud Console and click on “View all products” at the bottom of the page.
Find and Pin Vertex AI
Use the search function (Ctrl + F) to locate Vertex AI in the list of services.
Pin it to your menu for quick access.
Enable the Vertex AI API
Click on Vertex AI in the menu to navigate to its dashboard.
If prompted, enable the Vertex AI API by clicking “Enable.”
Alternatively, if no prompt appears, use the “Enable all recommended APIs” option on the dashboard to ensure Vertex AI services are activated.
Step 5: Ready to Upload Your Dataset
With your project created, billing account linked, and Vertex AI API enabled, your console is now set up for your machine learning tasks. Next, you’ll upload a dataset to Vertex AI to begin the process of building and deploying models.
Uploading a Dataset in Vertex AI
Adding a dataset is one of the foundational steps in building machine learning models using Vertex AI. For simplicity, we’ll use a local CSV file in this tutorial. Let’s walk through the process step by step.
Step 1: Download a Sample Dataset
For this example, we’ll use the Dry Bean Dataset from the UCI Machine Learning Repository.
About the Dataset
Description: Contains numeric measurements of 13,611 instances of beans.
Task: Classify beans into seven categories:
Seker
Barbunya
Bombay
Cali
Dermason
Horoz
Sira
Save the Dataset
Download the dataset as a ZIP file from the UCI repository.
Extract the file, and locate the Excel (.xlsx) file in the ZIP archive.
Save the Excel file in a suitable directory on your local machine (e.g., in a data folder within your working directory).
Step 2: Read and Convert the Dataset
To use the dataset with Vertex AI, we need to convert it from Excel format to CSV. Here’s how to do it:
Import Required Libraries
First, ensure you have the necessary libraries installed. The pandas library is used for data manipulation, and openpyxl is required for reading Excel files:
pip install pandas openpyxl
Code for Reading and Saving the Dataset
from pathlib import Path
import pandas as pd
# Set the file paths
cwd = Path.cwd()
data_path = cwd / "data" / "Dry_Bean_Dataset.xlsx"
# Read the Excel file
beans = pd.read_excel(data_path)
# Display the shape of the dataset
print(beans.shape) # Output: (13611, 17)
# Save the dataset as a CSV file
csv_path = cwd / "data" / "dry_bean.csv"
beans.to_csv(csv_path, index=False)
Step 3: Validate the CSV File
Ensure the CSV file has been saved correctly by navigating to the directory and opening the file. The dataset should now be ready for upload into Vertex AI.
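As an alternative to opening the file by hand, the check can be sketched with the standard library alone. The helper name summarize_csv is hypothetical, and the label column name Class is an assumption about the UCI file's header; adjust it if yours differs.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_csv(csv_path, label_column="Class"):
    """Return (row_count, column_count, label_counts) for a CSV file.

    label_column is an assumed name for the target column.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        if label_column not in columns:
            raise ValueError(f"Column {label_column!r} not found in {columns}")
        labels = Counter()
        rows = 0
        for row in reader:
            rows += 1
            labels[row[label_column]] += 1
    return rows, len(columns), labels


if __name__ == "__main__":
    path = Path.cwd() / "data" / "dry_bean.csv"
    if path.exists():
        rows, cols, labels = summarize_csv(path)
        # The tutorial's file should report 13611 rows and 17 columns.
        print(rows, cols)
        print(labels.most_common())
```

If the row or column count differs from the shape printed in Step 2, the conversion likely went wrong and is worth repeating before uploading.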
Next Steps: Uploading the Dataset to Vertex AI
In the next sections, we’ll create a Cloud Storage bucket, upload this CSV file to it, and register the file as a dataset in Vertex AI so it is ready for model training.
Create a Cloud Storage Bucket
To manage and store raw data for machine learning tasks in Vertex AI, we need to set up a Google Cloud Storage Bucket. This serves as a centralized repository for files like our dataset. Follow these steps to create and configure your bucket.
Step 1: Link a Billing Account
To use storage services, your Google Cloud project must have a billing account linked. Here's how to do that:
Navigate to your Google Cloud Console.
Go to the Billing section from the navigation menu.
Link the billing account you created earlier to your current project.
Step 2: Create a Cloud Storage Bucket
Buckets must have globally unique names, so take care when naming them. Follow these steps:
In the Cloud Console, locate Storage in the navigation menu and click on Buckets.
Click the “Create” button.
Provide a unique name for your bucket.
For example: my-vertex-ai-dataset-storage.
Select the default options for the remaining fields (e.g., region, storage class) and click “Continue” until the bucket is created.
Congratulations! Your bucket is now ready to store data.
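The same step can also be scripted. The sketch below is not part of the console walkthrough: it assumes the google-cloud-storage client library is installed and that you are authenticated, and the helper names are hypothetical. The name check covers only a simplified subset of the Cloud Storage naming rules and cannot tell whether a name is already taken.

```python
import re

# Simplified subset of the Cloud Storage naming rules: 3-63 characters,
# lowercase letters, digits, dashes, underscores, and dots, starting and
# ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Cheap local check; global uniqueness is only known at creation time."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    return not name.startswith("goog") and "google" not in name


def create_bucket(project_id, bucket_name, location="us-central1"):
    """Create the bucket with the google-cloud-storage client library."""
    from google.cloud import storage  # pip install google-cloud-storage

    if not is_plausible_bucket_name(bucket_name):
        raise ValueError(f"Invalid bucket name: {bucket_name!r}")
    client = storage.Client(project=project_id)
    return client.create_bucket(bucket_name, location=location)


print(is_plausible_bucket_name("my-vertex-ai-dataset-storage"))  # True
print(is_plausible_bucket_name("My_Bucket"))                     # False
```

Calling create_bucket needs valid credentials and a linked billing account, and it fails if another user already owns the name.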
Ingest a Local CSV into Vertex AI
With your Cloud Storage Bucket created and the CSV file prepared in the previous step, let’s upload the file into Vertex AI.
Step 1: Upload the CSV to the Cloud Bucket
Open the Storage > Buckets section in the Google Cloud Console.
Click on your bucket’s name to open it.
Use the Upload Files button to upload your dry_bean.csv file into the bucket.
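If you prefer to upload from code rather than the console, a minimal sketch with the google-cloud-storage client library might look like this. The function names are hypothetical, and the call assumes you are authenticated against the project that owns the bucket.

```python
from pathlib import Path


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// path used to reference an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name.lstrip('/')}"


def upload_csv(project_id, bucket_name, local_path):
    """Upload a local file to the bucket and return its gs:// URI."""
    from google.cloud import storage  # pip install google-cloud-storage

    local_path = Path(local_path)
    client = storage.Client(project=project_id)
    blob = client.bucket(bucket_name).blob(local_path.name)
    blob.upload_from_filename(str(local_path))
    return gcs_uri(bucket_name, local_path.name)


print(gcs_uri("my-vertex-ai-dataset-storage", "dry_bean.csv"))
# gs://my-vertex-ai-dataset-storage/dry_bean.csv
```

The returned URI is the value you enter as the Cloud Storage path when creating the dataset.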
Step 2: Create a Dataset in Vertex AI
Go to the Vertex AI Dashboard.
Navigate to the Datasets tab in the left-hand menu.
Click + Create Dataset.
Step 3: Configure the Dataset
Dataset Name: Provide a unique name for the dataset, such as dry-bean-classification.
Data Type: Choose Tabular (since we’re working with structured data).
Source Options: Select Upload from Cloud Storage.
Enter the Cloud Storage path for your uploaded CSV file (e.g., gs://my-vertex-ai-dataset-storage/dry_bean.csv).
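The same configuration can also be expressed with the Vertex AI Python SDK (the google-cloud-aiplatform package installed later in this tutorial). This is a sketch under the assumption that the CSV is already in the bucket; the helper names are hypothetical, while TabularDataset.create is the SDK call that registers the file.

```python
def tabular_dataset_args(display_name, gcs_source):
    """Validate inputs and return keyword arguments for TabularDataset.create."""
    if not gcs_source.startswith("gs://") or not gcs_source.endswith(".csv"):
        raise ValueError(f"Expected a gs:// path to a CSV file, got {gcs_source!r}")
    return {"display_name": display_name, "gcs_source": [gcs_source]}


def create_tabular_dataset(project_id, region, display_name, gcs_source):
    """Register the CSV in Cloud Storage as a Vertex AI tabular dataset."""
    from google.cloud import aiplatform  # pip install google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region)
    return aiplatform.TabularDataset.create(
        **tabular_dataset_args(display_name, gcs_source)
    )


print(tabular_dataset_args(
    "dry-bean-classification",
    "gs://my-vertex-ai-dataset-storage/dry_bean.csv",
))
```

Validating the path locally first catches a common mistake, passing a local file path instead of a gs:// URI, before any API call is made.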
Step 4: Analyze the Dataset
Once uploaded, Vertex AI will analyze the dataset. The Analyze tab displays metadata such as the number of rows and columns, the data distribution, and basic statistics.
Configuring JupyterLab and Compute Resources in Vertex AI Workbench
Vertex AI Workbench provides a powerful, integrated development environment for building, training, and deploying machine learning models. Here’s how you can set it up for your project:
Step 1: Creating a Workbench
Navigate to the Workbench tab in the Vertex AI Dashboard.
Click on the + New Notebook button.
Provide a name for your notebook, such as vertexai-tutorial-notebook.
Configure the hardware:
Machine Type: Choose a smaller instance like n1-standard-2 for cost efficiency (~$0.12/hour).
Disk Size: Default options are sufficient for small datasets.
Set the idle shutdown duration to 10 minutes to ensure the environment automatically stops if left idle.
Click Create to initialize the workbench.
The notebook will take a few minutes to provision. Once ready, the OPEN JUPYTERLAB button will appear.
Step 2: Setting Up JupyterLab
Click OPEN JUPYTERLAB to launch the environment.
Create a new Python 3 notebook:
Go to File > New Notebook.
Rename the notebook to something meaningful (e.g., vertex_ai_setup).
Step 3: Install the Required SDK
In the first cell of your notebook, install or upgrade the Vertex AI SDK for Python, which is distributed as the google-cloud-aiplatform package:
!pip3 install --upgrade --quiet google-cloud-aiplatform
This SDK allows you to interact with Vertex AI services directly.
Step 4: Check the Active Project Configuration
Verify your active project configuration using the gcloud command-line tool:
!gcloud config list
The output will display details like the current project ID, region, and compute configurations:
[compute]
region = us-central1
[core]
account = your-email@example.com
project = vertexai-tutorial-423010
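Because this output is INI-style, it can be read with Python's configparser instead of being copied by hand. The sketch below parses the sample output shown above; the function name parse_gcloud_config is hypothetical, and the sample text stands in for whatever your own command prints.

```python
import configparser

SAMPLE_OUTPUT = """\
[compute]
region = us-central1
[core]
account = your-email@example.com
project = vertexai-tutorial-423010
"""


def parse_gcloud_config(text):
    """Parse the INI-style text printed by `gcloud config list` into a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = parse_gcloud_config(SAMPLE_OUTPUT)
PROJECT_ID = config["core"]["project"]
REGION = config["compute"]["region"]
print(PROJECT_ID, REGION)  # vertexai-tutorial-423010 us-central1
```

When capturing real output, pass only the command's standard output to the parser, since any extra status lines would not be valid INI.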
Step 5: Save Configuration Details
From the output, save the following details as Python variables:
PROJECT_ID = 'vertexai-tutorial-423010'
BUCKET_URI = 'gs://your-unique-bucket-name'
REGION = 'us-central1'
Note: Replace the BUCKET_URI value with the path to your actual Google Cloud Storage bucket, and use the project ID and region from your own output.
Step 6: Initialize the SDK
Import the aiplatform module and initialize it using the saved configurations:
from google.cloud import aiplatform as ai
ai.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)
This sets up the Vertex AI SDK for use in your project.
Step 7: Manage the Workbench Instance
When you finish working or need a break, stop the notebook instance to avoid unnecessary costs:
Return to the Vertex AI Workbench tab.
Locate your active notebook instance.
Click Stop to shut it down.
Tip: Always monitor your active instances to manage expenses effectively.
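To see why stopping the instance matters, here is a rough cost estimate based on the ~$0.12/hour figure quoted above for n1-standard-2. The usage pattern is an assumption for illustration, and the calculation ignores disk, storage, and training charges.

```python
def estimate_cost(hourly_rate_usd, hours_per_day, days):
    """Rough compute cost: rate x hours x days (ignores disk and other fees)."""
    return round(hourly_rate_usd * hours_per_day * days, 2)


# Assumed usage: 2 hours a day for 5 days, stopping the instance in between.
print(estimate_cost(0.12, 2, 5))    # 1.2
# The same instance left running around the clock for those 5 days.
print(estimate_cost(0.12, 24, 5))   # 14.4
```

Under these assumptions, an instance that is never stopped costs twelve times as much as one that runs only while you work.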
With JupyterLab configured, you're ready to explore Vertex AI's capabilities further, including custom training and AutoML workflows!