Cortex is an open source platform that takes machine learning models, trained with nearly any framework, and turns them into production web APIs in a single command.
Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to install Cortex on your AWS account before getting started.
Step 1: Configure your deployment
Define a `deployment` and an `api` resource. A `deployment` specifies a set of APIs that are deployed together. An `api` makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the model here.
```yaml
# cortex.yaml

- kind: deployment
  name: text

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
```
Step 2: Add request handling
The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:
```python
# handler.py

from encoder import get_encoder

encoder = get_encoder()


def pre_inference(sample, metadata):
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    response = prediction["sample"]
    return encoder.decode(response)
```
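To see the handler contract in isolation, here is a minimal round-trip sketch. The `StubEncoder` below is invented purely for illustration; the real encoder comes from `get_encoder` in the GPT-2 example code:

```python
# Stand-in encoder to illustrate the pre/post inference round trip.
# The real encoder ships with the GPT-2 example's encoder module.
class StubEncoder:
    def encode(self, text):
        return [ord(c) for c in text]  # toy "tokenization"

    def decode(self, token_ids):
        return "".join(chr(t) for t in token_ids)


encoder = StubEncoder()


def pre_inference(sample, metadata):
    # Turn the raw request body into the tensor input the model expects.
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # Turn the model's output tokens back into text.
    return encoder.decode(prediction["sample"])


inp = pre_inference({"text": "hi"}, metadata=None)
out = post_inference({"sample": inp["context"][0]}, metadata=None)
```

With a real model, the tokens in `prediction["sample"]` would be generated text rather than an echo of the input, but the encode/decode contract is the same.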
Step 3: Deploy to AWS
Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
```bash
$ cortex deploy

deployment started
```
You can track the status of a deployment using `cortex get`. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
```bash
$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***.amazonaws.com/text/generator
```
Step 4: Serve real-time predictions
Once you have your endpoint, you can make requests:
```bash
$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d ''

Machine learning, with a few thousand researchers around the world today, want to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
```
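The same request can be made from Python. This is a sketch rather than part of the Cortex docs: the URL below is a placeholder, and the `{"text": ...}` payload shape is assumed from the `pre_inference` handler above, which reads `sample["text"]`:

```python
import json
import urllib.request


def build_request(url: str, prompt: str) -> urllib.request.Request:
    # JSON body matching what pre_inference expects: {"text": ...}
    payload = json.dumps({"text": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(url: str, prompt: str) -> str:
    # Send the request and return the generated text.
    with urllib.request.urlopen(build_request(url, prompt)) as resp:
        return resp.read().decode("utf-8")
```

For example, `generate("http://***.amazonaws.com/text/generator", "machine learning")` would return the generated continuation as a string.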
Any questions? Chat with us.
Sentiment analysis with BERT
Image classification with Inception v3 and AlexNet
Autoscaling: Cortex automatically scales APIs to handle production workloads.
Multi framework: Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
Rolling updates: Cortex updates deployed APIs without any downtime.
Log streaming: Cortex streams logs from deployed models to your CLI.
Prediction monitoring: Cortex monitors network metrics and tracks predictions.
Minimal declarative configuration: Deployments are defined in a single `cortex.yaml` file.