Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can handle different types of input data, including tabular, image, and text data.
The SageMaker XGBoost algorithm allows you to easily run XGBoost training and inference on SageMaker. XGBoost (eXtreme Gradient Boosting) is a popular and efficient open source implementation of the gradient boosted tree algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm works well in ML competitions because of its powerful handling of different data types, relationships, distributions, and the variety of hyperparameters you can specify. You can use XGBoost for regression, classification (binary and multi-class) and ranking problems. You can use GPUs to speed up training on large datasets.
Today we are excited to announce that SageMaker XGBoost now offers fully distributed GPU training.
Starting with version 1.5-1 and higher, you can now use all GPUs when training on multi-GPU instances. The new feature addresses your need for fully distributed GPU training when dealing with large datasets. This means you can use multiple Amazon Elastic Compute Cloud (Amazon EC2) GPU instances and utilize all GPUs on each instance.
Distributed GPU training with multiple GPU instances
With SageMaker XGBoost version 1.2-2 or later, you can use one or more single-GPU instances for training. The `tree_method` hyperparameter must be set to `gpu_hist`. When using more than one instance (a distributed setup), the data needs to be split among the instances (the same as the non-GPU distributed training steps described for the XGBoost algorithm). Although this option is performant and can be used in various training setups, it doesn't extend to using all the GPUs when you choose a multi-GPU instance such as g5.12xlarge.
With SageMaker XGBoost version 1.5-1 and later, you can now use all the GPUs on each instance when using multi-GPU instances. The ability to use all GPUs on multi-GPU instances is offered by integrating the Dask framework.
You can use this option to finish training quickly. In addition to saving time, this option is also useful for working around blockers such as (soft) limits on the maximum number of usable instances, or if the training job can't provision a large number of single-GPU instances for some reason.
The configuration for using this option is the same as the previous option, except for the following differences:
- Add the new hyperparameter `use_dask_gpu_training` with the string value `true`.
- When creating the TrainingInput, set the distribution parameter to `FullyReplicated`, whether using single or multiple instances. The underlying Dask framework carries out the data load and splits the data among the Dask workers. This is different from the data distribution setting used for all other distributed training with SageMaker XGBoost.
Note that splitting the data into smaller files still applies to Parquet, where Dask reads each file as a partition. Because you'll have one Dask worker per GPU, the number of files should be greater than instance count * GPU count per instance. Also, making each file very small and having a very large number of files can degrade performance. For more information, see Avoiding Very Large Graphs. For CSV, we still recommend splitting large files into smaller ones to reduce data download times and enable quicker reads. However, it's not a requirement.
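For illustration, the following is a minimal sketch, assuming Dask with S3 support (s3fs) is installed and using placeholder S3 paths and illustrative instance counts, of writing a CSV dataset out as enough Parquet files for the Dask workers:

```python
# Hypothetical data preparation sketch: write the dataset as Parquet so that the
# number of files comfortably exceeds instance_count * gpus_per_instance.
import dask.dataframe as dd

instance_count = 10      # illustrative: number of training instances
gpus_per_instance = 4    # illustrative: for example, g4dn.12xlarge has 4 GPUs
min_files = instance_count * gpus_per_instance  # one Dask worker per GPU

# Read the raw CSV data (placeholder path); blocksize controls partition size,
# so roughly one ~100 MB Parquet file is written per partition below.
df = dd.read_csv("s3://<your-bucket>/raw-data/*.csv", blocksize="100MB")

print(f"{df.npartitions} output files for {min_files} Dask workers")
assert df.npartitions > min_files, "use more partitions or fewer GPUs"

# Each partition is written as one Parquet file that Dask later reads as a partition.
df.to_parquet("s3://<your-bucket>/train-parquet/", write_index=False)
```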
Currently, the input formats supported by this option are:
- text/csv
- application/x-parquet
The following input modes are supported:
The code looks like this:
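What follows is a minimal sketch rather than the original notebook code: it assumes the SageMaker Python SDK, placeholder S3 paths and bucket names, and an illustrative instance type and count. Only the `use_dask_gpu_training` hyperparameter and the `FullyReplicated` distribution differ from a regular GPU training job.

```python
# Minimal sketch of SageMaker XGBoost fully distributed GPU training with Dask.
# Bucket names, paths, and instance settings are placeholders/assumptions.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# Retrieve the SageMaker XGBoost container, version 1.5-1 or later.
image_uri = sagemaker.image_uris.retrieve("xgboost", region=region, version="1.5-1")

hyperparameters = {
    "objective": "binary:logistic",
    "num_round": "1000",
    "tree_method": "gpu_hist",        # required for GPU training
    "use_dask_gpu_training": "true",  # enables the fully distributed multi-GPU option
}

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=2,                  # illustrative: multiple multi-GPU instances
    instance_type="ml.g4dn.12xlarge",  # illustrative: 4 GPUs per instance
    hyperparameters=hyperparameters,
    output_path="s3://<your-bucket>/xgboost-dask-output",  # placeholder
    sagemaker_session=session,
)

# With the Dask option, the distribution must be FullyReplicated;
# Dask handles loading and splitting the data among its workers.
train_input = TrainingInput(
    "s3://<your-bucket>/train/",       # placeholder: prefix containing Parquet files
    content_type="application/x-parquet",
    distribution="FullyReplicated",
)
validation_input = TrainingInput(
    "s3://<your-bucket>/validation/",  # placeholder
    content_type="application/x-parquet",
    distribution="FullyReplicated",
)

estimator.fit({"train": train_input, "validation": validation_input})
```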
The following screenshots show a successful training job log from a notebook.
Benchmarking
We compared the evaluation metrics to ensure that the model quality did not deteriorate in the multi-GPU training path compared to the single GPU training. We also benchmarked large data sets to ensure that our distributed GPU setups were efficient and scalable.
Billable time refers to the absolute wall-clock time. Training time is only the XGBoost training time, measured from the `train()` call until the model is saved to Amazon Simple Storage Service (Amazon S3).
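As an aside, the billable time can be read back from the training job description; the following is a small sketch assuming boto3 and a placeholder job name (the training time in the tables below is instead measured around the XGBoost `train()` call itself, as described above).

```python
# Sketch: retrieve the billable and total training time that SageMaker reports
# for a completed training job. The job name is a placeholder.
import boto3

sm_client = boto3.client("sagemaker")
job_name = "<your-training-job-name>"

desc = sm_client.describe_training_job(TrainingJobName=job_name)
print("Billable time per instance (s):", desc["BillableTimeInSeconds"])
print("Total training time (s):", desc["TrainingTimeInSeconds"])
```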
Performance benchmarks on large datasets
Using multiple GPUs is usually appropriate for large datasets with computationally heavy training. For testing, we created a dummy dataset with 2,497,248,278 rows and 28 features. The dataset was 150 GB and consisted of 1,419 files, each 105-115 MB in size. We saved the data in Parquet format in an S3 bucket. To simulate somewhat heavy training, we used this dataset for a binary classification task with 1,000 rounds to compare performance between the single GPU training path and the multi-GPU training path.
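The exact dummy dataset isn't reproduced here; purely as an illustration, a synthetic dataset of a similar shape could be generated and written to Parquet with Dask along these lines (row counts, chunk sizes, and paths are placeholders):

```python
# Hypothetical sketch of generating a large synthetic binary-classification dataset
# in Parquet format; sizes and S3 paths are illustrative assumptions only.
import dask.array as da
import dask.dataframe as dd

n_rows, n_features = 2_497_248_278, 28
chunk_rows = 2_000_000  # controls partition (and output file) size

X = da.random.random((n_rows, n_features), chunks=(chunk_rows, n_features))
y = da.random.randint(0, 2, size=(n_rows, 1), chunks=(chunk_rows, 1))

columns = ["label"] + [f"f{i}" for i in range(n_features)]
ddf = dd.from_dask_array(da.concatenate([y, X], axis=1), columns=columns)

# One Parquet file is written per partition into the placeholder S3 prefix.
ddf.to_parquet("s3://<your-bucket>/synthetic-data/", write_index=False)
```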
The following tables compare the billable time and training time between the single GPU training path and the multi-GPU training path.
Single GPU training path

| Instance type | Number of instances | Billable time/instance (s) | Training time (s) |
| --- | --- | --- | --- |
| g4dn.xlarge | 20 | Out of memory | |
| g4dn.2xlarge | 20 | Out of memory | |
| g4dn.4xlarge | 15 | 1710 | 1551.9 |
| g4dn.4xlarge | 16 | 1592 | 1412.2 |
| g4dn.4xlarge | 17 | 1542 | 1352.2 |
| g4dn.4xlarge | 18 | 1423 | 1281.2 |
| g4dn.4xlarge | 19 | 1346 | 1220.3 |
Multi-GPU training path

| Instance type | Number of instances | Billable time/instance (s) | Training time (s) |
| --- | --- | --- | --- |
| g4dn.12xlarge | 7 | Out of memory | |
| g4dn.12xlarge | 8 | 1143 | 784.7 |
| g4dn.12xlarge | 9 | 1039 | 710.73 |
| g4dn.12xlarge | 10 | 978 | 676.7 |
| g4dn.12xlarge | 12 | 940 | 614.35 |
We can see that using multi-GPU instances results in lower training time and lower overall billable time. The single GPU training path still has some advantages: each instance downloads and reads only part of the data, so its data download time is lower, and it doesn't incur Dask's overhead. As a result, the gap between training time and total time is smaller. However, because more GPUs are used in parallel, the multi-GPU setup can reduce training time significantly.

You should use an EC2 instance type that has enough compute power and memory to avoid out-of-memory errors when dealing with large datasets.

It's possible to reduce total time even further with the single GPU setup by using more instances or more powerful instance types. However, it can cost more. For example, the following tables compare the training time and cost of single GPU training with g4dn.8xlarge instances and multi-GPU training with g4dn.12xlarge instances.
Single GPU training path

| Instance type | Number of instances | Billable time/instance (s) | Cost ($) |
| --- | --- | --- | --- |
| g4dn.8xlarge | 15 | 1679 | 15.22 |
| g4dn.8xlarge | 17 | 1509 | 15.51 |
| g4dn.8xlarge | 19 | 1326 | 15.22 |
Multi-GPU training path

| Instance type | Number of instances | Billable time/instance (s) | Cost ($) |
| --- | --- | --- | --- |
| g4dn.12xlarge | 8 | 1143 | 9.93 |
| g4dn.12xlarge | 10 | 978 | 10.63 |
| g4dn.12xlarge | 12 | 940 | 12.26 |
The cost calculation is based on the On-Demand price for each instance type. For more information, see Amazon EC2 G4 Instances.
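As a sanity check, the costs above are roughly the billable seconds per instance, times the instance count, times the hourly price. The following sketch assumes us-east-1 On-Demand prices of about $2.176/hour for g4dn.8xlarge and $3.912/hour for g4dn.12xlarge; actual prices vary by Region and over time.

```python
# Sketch of the cost calculation implied by the tables above. The hourly prices are
# assumed us-east-1 On-Demand rates and may change; check the Amazon EC2 G4 page.
def training_cost(billable_seconds, instance_count, price_per_hour):
    """Cost = billable seconds per instance * instance count * hourly price."""
    return billable_seconds / 3600 * instance_count * price_per_hour

print(training_cost(1679, 15, 2.176))  # ~15.2: 15 x g4dn.8xlarge, single GPU path
print(training_cost(1143, 8, 3.912))   # ~9.9:  8 x g4dn.12xlarge, multi-GPU path
```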
Model quality benchmarks
For model quality, we compared evaluation metrics between the Dask GPU option and the single-GPU option, training on different instance types and instance counts. For different tasks, we used different datasets and hyperparameters, with each dataset split into training, validation, and test sets.
For the binary classification (`binary:logistic`) task, we used the HIGGS dataset in CSV format. The training partition of the dataset has 9,348,181 rows and 28 features. The number of rounds used was 1,000. The following table summarizes the results.
Multi-GPU training with Dask

| Instance type | GPUs per instance | Number of instances | Billable time/instance (s) | Accuracy (%) | F1 (%) | ROC AUC (%) |
| --- | --- | --- | --- | --- | --- | --- |
| g4dn.2xlarge | 1 | 1 | 343 | 75.97 | 77.61 | 84.34 |
| g4dn.4xlarge | 1 | 1 | 413 | 76.16 | 77.75 | 84.51 |
| g4dn.8xlarge | 1 | 1 | 413 | 76.16 | 77.75 | 84.51 |
| g4dn.12xlarge | 4 | 1 | 157 | 76.16 | 77.74 | 84.52 |
For the regression (`reg:squarederror`) task, we used the NYC green taxi trip dataset (with some modifications) in Parquet format. The training partition of the dataset has 72,921,051 rows and 8 features. The number of rounds used was 500. The following table shows the results.
Multi-GPU training with Dask

| Instance type | GPUs per instance | Number of instances | Billable time/instance (s) | MSE | R² | MAE |
| --- | --- | --- | --- | --- | --- | --- |
| g4dn.2xlarge | 1 | 1 | 775 | 21.92 | 0.7787 | 2.43 |
| g4dn.4xlarge | 1 | 1 | 770 | 21.92 | 0.7787 | 2.43 |
| g4dn.8xlarge | 1 | 1 | 705 | 21.92 | 0.7787 | 2.43 |
| g4dn.12xlarge | 4 | 1 | 253 | 21.93 | 0.7787 | 2.44 |
Model quality metrics are similar between the multi-GPU (Dask) training variant and the existing training variant. Model quality remains consistent when using a distributed configuration with multiple instances or GPUs.
Conclusion
In this post, we reviewed how you can use different combinations of instance types and instance counts for distributed GPU training with SageMaker XGBoost. For most use cases, you can use single GPU instances; this option covers a wide range of use cases and is very performant. For training on large datasets with many rounds, you can use multi-GPU instances, which can deliver fast training with a smaller number of instances. Overall, you can use SageMaker XGBoost's distributed GPU setup to speed up your XGBoost training immensely.
To learn more about distributed training using SageMaker and Dask, see Amazon SageMaker built-in LightGBM now offers distributed training using Dask.
About the authors
Dheeraj Thakur is a solutions architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and loves building and experimenting in the analytics and AI/ML space.
Dewan Chowdhury is a software development engineer with Amazon Web Services. He works on Amazon SageMaker algorithms and JumpStart offerings. Apart from building AI/ML infrastructure, he is also passionate about building scalable distributed systems.
Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric spatio-temporal clustering. He has published numerous papers in ACL, ICDM, and KDD conferences, and in the Royal Statistical Society: Series A journal.
Tony Cruz