Cortex: An open source alternative to SageMaker

Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command.

Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to install Cortex on your AWS account before getting started.

Step 1: Configure your deployment

Define a deployment and an api resource. A deployment specifies a set of APIs that are deployed together. An api makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the cortex-examples S3 bucket. You can run the code that generated the model here.

# cortex.yaml

- kind: deployment
  name: text

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M

Step 2: Add request handling

The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the pre_inference and post_inference functions:


from encoder import get_encoder
encoder = get_encoder()

def pre_inference(sample, metadata):
    context = encoder.encode(sample["text"])
    return {"context": [context]}

def post_inference(prediction, metadata):
    response = prediction["sample"]
    return encoder.decode(response)
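To sanity-check the handler logic locally before deploying, you can stub the encoder and pipe a sample through both functions. This is a minimal sketch, assuming a toy character-level encoder in place of the real GPT-2 encoder (same encode/decode interface), with the model call faked as an echo:

```python
# Toy stand-in for the GPT-2 encoder: same encode/decode interface,
# but maps characters to integer codes instead of BPE tokens.
class StubEncoder:
    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)

encoder = StubEncoder()

def pre_inference(sample, metadata):
    context = encoder.encode(sample["text"])
    return {"context": [context]}

def post_inference(prediction, metadata):
    response = prediction["sample"]
    return encoder.decode(response)

# Fake the model as an echo: the "prediction" is just the encoded input.
model_input = pre_inference({"text": "hello"}, metadata=None)
fake_prediction = {"sample": model_input["context"][0]}
print(post_inference(fake_prediction, metadata=None))  # → hello
```

A round trip through both handlers recovering the input string is a quick way to catch key-name or shape mistakes before the model is in the loop.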

Step 3: Deploy to AWS

Deploying to AWS is as simple as running cortex deploy from your CLI. cortex deploy takes the declarative configuration from cortex.yaml and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.

$ cortex deploy

deployment started

You can track the status of a deployment using cortex get. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.

$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***/text/generator

Step 4: Serve real-time predictions

Once you have your endpoint, you can make requests:

$ curl http://***/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d ''

Machine learning, with a few thousand researchers around the world today, would like to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
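The same request can be made from Python. Here is a minimal sketch using only the standard library; the endpoint URL is a placeholder (use the url printed by cortex get), and the JSON body carries the prompt under a "text" key, since that is the key pre_inference reads:

```python
import json
import urllib.request

# Placeholder endpoint; substitute the url printed by `cortex get`.
API_URL = "http://localhost:8888/text/generator"

def build_request(prompt: str) -> urllib.request.Request:
    # pre_inference() reads sample["text"], so the JSON body
    # carries the prompt under a "text" key.
    body = json.dumps({"text": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("machine learning")
# urllib.request.urlopen(req) sends the request once the API is live.
```

The response body is the decoded string returned by post_inference, so no client-side decoding is needed.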

Any questions? Chat with us.

More examples

Key features

  • Autoscaling: Cortex automatically scales APIs to handle production workloads.

  • Multi framework: Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.

  • CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.

  • Rolling updates: Cortex updates deployed APIs without any downtime.

  • Log streaming: Cortex streams logs from deployed models to your CLI.

  • Prediction monitoring: Cortex monitors network metrics and tracks predictions.

  • Minimal declarative configuration: Deployments are defined in a single cortex.yaml file.
