Update docs for release 0.14.0 (#420)
* Generate ref docs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix jinja template error

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Update developer guide

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add Helm installation docs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix input is missing in torchserve llm docs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Update kserve installation docs to use server side apply

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Update Alibi explainer docs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* change to comment

Signed-off-by: Dan Sun <[email protected]>

* change to comment

Signed-off-by: Dan Sun <[email protected]>

* Update ingress controller

Signed-off-by: Dan Sun <[email protected]>

* Update installation guide

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
sivanantha321 and yuzisun authored Nov 24, 2024
1 parent 71c5049 commit d1ee184
Showing 18 changed files with 1,056 additions and 156 deletions.
54 changes: 35 additions & 19 deletions docs/admin/kubernetes_deployment.md
@@ -1,19 +1,19 @@
# Kubernetes Deployment Installation Guide
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with the Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it removes Knative limitations such as the inability to mount multiple volumes; on the other hand, `Scale to and from Zero` is not supported in `RawDeployment` mode.
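As a sketch of what this looks like per service, the deployment mode can also be selected with the standard `serving.kserve.io/deploymentMode` annotation (the model name and storage URI below are illustrative, taken from KServe's public example bucket):

```bash
# Minimal InferenceService pinned to RawDeployment mode via annotation
# (sketch; name and storageUri are illustrative placeholders).
cat <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
EOF
```

Save the printed manifest to a file and `kubectl apply -f` it once the controller is installed.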

Kubernetes 1.22 is the minimally required version; please check the following recommended Istio versions for the corresponding Kubernetes version.
Kubernetes 1.28 is the minimally required version; please check the following recommended Istio versions for the corresponding Kubernetes version.

## Recommended Version Matrix
| Kubernetes Version | Recommended Istio Version |
| :----------------- | :------------------------ |
| 1.27 | 1.18, 1.19 |
| 1.28 | 1.19, 1.20 |
| 1.29 | 1.20, 1.21 |
| 1.28 | 1.22 |
| 1.29 | 1.22, 1.23 |
| 1.30 | 1.22, 1.23 |

## 1. Install Istio
## 1. Install Ingress Controller

The minimally required Istio version is 1.13 and you can refer to the [Istio install guide](https://istio.io/latest/docs/setup/install).
In this guide, we choose to install Istio as the ingress controller. The minimally required Istio version is 1.22; you can refer to the [Istio install guide](https://istio.io/latest/docs/setup/install).

Once Istio is installed, create an `IngressClass` resource for Istio.
```yaml
@@ -25,11 +25,9 @@ spec:
controller: istio.io/ingress-controller
```
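The snippet above is abridged by the diff view; a complete manifest is sketched below (the class name `istio` is an assumption, any name works as long as you reference the same name from the KServe configuration later):

```bash
# Full IngressClass manifest (sketch; metadata.name "istio" is a placeholder).
cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
EOF
```

On a running cluster, pipe the output into `kubectl apply -f -`.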
!!! note
Istio ingress is recommended, but you can choose to install other [Ingress controllers](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) and create an `IngressClass` resource for your ingress option.




## 2. Install Cert Manager
The minimally required Cert Manager version is 1.15.0; you can refer to the [Cert Manager installation guide](https://cert-manager.io/docs/installation/).
@@ -41,31 +39,49 @@ The minimally required Cert Manager version is 1.15.0 and you can refer to [Cert
!!! note
The default KServe deployment mode is `Serverless` which depends on Knative. The following step changes the default deployment mode to `RawDeployment` before installing KServe.

=== "Install using Helm"

I. Install KServe CRDs

```shell
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v{{ kserve_release_version }}
```

II. Install KServe Resources

Set the `kserve.controller.deploymentMode` to `RawDeployment` and `kserve.controller.gateway.ingressGateway.className` to point to the `IngressClass`
name created in [step 1](#1-install-ingress-controller).

```shell
helm install kserve oci://ghcr.io/kserve/charts/kserve --version v{{ kserve_release_version }} \
--set kserve.controller.deploymentMode=RawDeployment \
--set kserve.controller.gateway.ingressGateway.className=your-ingress-class
```
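The two `--set` flags map one-to-one onto nested chart values, so an equivalent values file looks like the following sketch (the key layout mirrors the flag paths shown above; `your-ingress-class` is a placeholder):

```bash
# Print a values.yaml equivalent to the two --set flags above
# (sketch; key layout mirrors the --set paths, className is a placeholder).
cat <<'EOF'
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      ingressGateway:
        className: your-ingress-class
EOF
```

Save the output as `kserve-values.yaml` and pass `-f kserve-values.yaml` to `helm install` instead of the `--set` flags.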

=== "Install using YAML"

**i. Install KServe**
I. Install KServe:
The `--server-side` option is required because the InferenceService CRD is large; see [this issue](https://github.com/kserve/kserve/issues/3487) for details.

=== "kubectl"
```bash
kubectl apply -f https://github.com/kserve/kserve/releases/download/v{{ kserve_release_version }}/kserve.yaml
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v{{kserve_release_version}}/kserve.yaml
```

Install KServe default serving runtimes:
II. Install KServe default serving runtimes:

=== "kubectl"
```bash
kubectl apply -f https://github.com/kserve/kserve/releases/download/v{{ kserve_release_version }}/kserve-cluster-resources.yaml
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v{{kserve_release_version}}/kserve-cluster-resources.yaml
```

**ii. Change default deployment mode and ingress option**
III. Change default deployment mode and ingress option

First in ConfigMap `inferenceservice-config` modify the `defaultDeploymentMode` in the `deploy` section,
First, in the ConfigMap `inferenceservice-config`, modify the `defaultDeploymentMode` in the `deploy` section to `RawDeployment`,

=== "kubectl"
```bash
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'
```
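Note that the `deploy` key stores JSON as a string, so the inner object has to be quoted and escaped inside the outer patch document. The shell sketch below shows how that nesting is built:

```bash
# Build the nested patch payload: the inner JSON object is escaped so it can
# be embedded as a string value under the "deploy" key.
inner='{"defaultDeploymentMode": "RawDeployment"}'
escaped=$(printf '%s' "$inner" | sed 's/"/\\"/g')
printf '{"data": {"deploy": "%s"}}\n' "$escaped"
```

The printed string is exactly the `-p` payload used in the `kubectl patch` command above.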

then modify the `ingressClassName` in `ingress` section to point to `IngressClass` name created in [step 1](#1-install-istio).
then modify the `ingressClassName` in the `ingress` section to the `IngressClass` name created in [step 1](#1-install-ingress-controller).
```yaml
ingress: |-
{
25 changes: 19 additions & 6 deletions docs/admin/serverless/serverless.md
@@ -33,16 +33,29 @@ The minimally required Cert Manager version is 1.15.0 and you can refer to [Cert
Cert Manager is required to provision webhook certs for a production-grade installation; alternatively, you can run the self-signed certs generation script.

## 4. Install KServe
=== "kubectl"

=== "Install using Helm"

Install KServe CRDs
```bash
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v{{ kserve_release_version }}
```

Install KServe Resources
```bash
kubectl apply -f https://github.com/kserve/kserve/releases/download/v{{ kserve_release_version }}/kserve.yaml
helm install kserve oci://ghcr.io/kserve/charts/kserve --version v{{ kserve_release_version }}
```

=== "Install using YAML"
Install the KServe CRDs and controller. The `--server-side` option is required because the InferenceService CRD is large; see [this issue](https://github.com/kserve/kserve/issues/3487) for details.

## 5. Install KServe Built-in ClusterServingRuntimes
=== "kubectl"
```bash
kubectl apply -f https://github.com/kserve/kserve/releases/download/v{{ kserve_release_version }}/kserve-cluster-resources.yaml
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v{{kserve_release_version}}/kserve.yaml
```

Install KServe Built-in ClusterServingRuntimes
```bash
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v{{ kserve_release_version }}/kserve-cluster-resources.yaml
```

!!! note
8 changes: 4 additions & 4 deletions docs/admin/serverless/servicemesh/README.md
@@ -65,7 +65,7 @@ Apply the `PeerAuthentication` and `AuthorizationPolicy` rules with [auth.yaml](
kubectl apply -f auth.yaml
```

### Disable Top Level Virtual Service {#disable-top-level-vs}
### Disable Top Level Virtual Service {{ '{' }}#disable-top-level-vs{{ '}' }}
KServe currently creates an Istio top-level virtual service to support routing between InferenceService components like the predictor, transformer and explainer, as well as to support path-based routing as an alternative to routing with service hosts.
In serverless service mesh mode this creates a problem: to route through the underlying virtual service created by the Knative Service, the top-level virtual service must route to the `Istio Gateway` instead of directly to the InferenceService component on the service mesh.

@@ -80,7 +80,7 @@ ingress : |- {
}
```

## Turn on strict mTLS on the entire service mesh {#mesh-wide-mtls}
## Turn on strict mTLS on the entire service mesh {{ '{' }}#mesh-wide-mtls{{ '}' }}
In the previous section, turning on strict mTLS on a namespace is discussed. For users who need to lock down all workloads in the service mesh, Istio can be configured with [strict mTLS on the whole mesh](https://istio.io/latest/docs/tasks/security/authentication/mtls-migration/#lock-down-mutual-tls-for-the-entire-mesh).

Istio's Mutual TLS Migration docs use `PeerAuthentication` resources to lock down the mesh, which act on the server side. This means that Istio sidecars will only reject non-mTLS incoming connections, while non-TLS _outgoing_ connections will still be allowed ([reference](https://istio.io/latest/docs/concepts/security/#authentication-policies)). To further lock down the mesh, you can also create a `DestinationRule` resource to require mTLS on outgoing connections. However, under this very strict mTLS configuration, you may notice that the KServe top level virtual service stops working and inference requests are blocked. You can fix this either by disabling the top level virtual service [as mentioned above](#disable-top-level-vs), or by configuring Knative's local gateway with mTLS on its listening port.
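As a sketch (the resource name and `istio-system` namespace are placeholders; `*.local` matches cluster-local services), such a mesh-wide `DestinationRule` could look like:

```bash
# Mesh-wide DestinationRule requiring mTLS for outgoing connections
# (sketch; name and namespace are placeholders).
cat <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: require-mtls
  namespace: istio-system
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
EOF
```

Apply the printed manifest with `kubectl apply -f -` on a cluster where Istio is installed.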
@@ -109,7 +109,7 @@ spec:

After patching Knative's local gateway resource, the KServe top level virtual service will work again.

## Deploy InferenceService with Istio sidecar injection {#isvc-inject-sidecar}
## Deploy InferenceService with Istio sidecar injection {{ '{' }}#isvc-inject-sidecar{{ '}' }}
First label the namespace with `istio-injection=enabled` to turn on the sidecar injection for the namespace.

```bash
@@ -294,7 +294,7 @@ kubectl exec -it sleep-6d6b49d8b8-6ths6 -- curl -v sklearn-iris-burst-predictor-
{"name":"sklearn-iris-burst","ready":true}
```

## Invoking InferenceServices from workloads that are not part of the mesh {#invoking-isvc-non-mesh}
## Invoking InferenceServices from workloads that are not part of the mesh {{ '{' }}#invoking-isvc-non-mesh{{ '}' }}
Ideally, when using service mesh, all involved workloads should belong to the service mesh. This allows enabling strict mTLS in Istio and ensures policies are correctly applied. However, given the diverse requirements of applications, it is not always possible to migrate all workloads to the service mesh.

When using KServe, you may successfully migrate InferenceServices to the service mesh (i.e. by [injecting the Istio sidecar](#isvc-inject-sidecar)), while workloads that invoke inference remain outside the mesh. In this hybrid environment, workloads that are not part of the service mesh need to use an Istio ingress gateway as a port of entry to InferenceServices. In the default setup, KServe integrates with the gateway used by Knative. Both Knative and KServe apply the needed configurations to allow for these hybrid environments; however, this only works transparently if you haven't enabled strict TLS on Istio.
2 changes: 1 addition & 1 deletion docs/blog/articles/2022-02-18-KServe-0.8-release.md
@@ -71,7 +71,7 @@ spec:
- name: kserve-container
image: kserve/sklearnserver:latest
args:
- --model_name={{.Name}}
- --model_name={{ '{{' }}.Name{{ '}}'}}
- --model_dir=/mnt/models
- --http_port=8080
resources:
13 changes: 11 additions & 2 deletions docs/developer/developer.md
@@ -15,7 +15,7 @@ Before submitting a PR, see also [CONTRIBUTING.md](https://github.com/kserve/kse

You must install these tools:

1. [`go`](https://golang.org/doc/install): KServe controller is written in Go and requires Go 1.20.0+.
1. [`go`](https://golang.org/doc/install): KServe controller is written in Go and requires Go 1.22.7+.
1. [`git`](https://help.github.com/articles/set-up-git/): For source control.
1. [`Go Module`](https://blog.golang.org/using-go-modules): Go's dependency management system.
1. [`ko`](https://github.com/google/ko):
@@ -24,6 +24,8 @@ You must install these tools:
managing development environments.
1. [`kustomize`](https://github.com/kubernetes-sigs/kustomize/) To customize YAMLs for different environments; requires v5.0.0+.
1. [`yq`](https://github.com/mikefarah/yq) yq is used in the project makefiles to parse and display YAML output; requires yq `4.*`.
1. [`pre-commit`](https://pre-commit.com/) pre-commit is used to run checks on the codebase before committing changes.
1. [`helm`](https://helm.sh/docs/intro/install/) Helm is used to install KServe.

### Install Knative on a Kubernetes cluster

@@ -33,7 +35,7 @@ KServe currently requires `Knative Serving` for auto-scaling, canary rollout, `I
use the [Knative Operators](https://knative.dev/docs/install/operator/knative-with-operators/) to manage your installation. Observability, tracing and logging are optional but are often very valuable tools for troubleshooting difficult issues,
they can be installed via the [directions here](https://github.com/knative/docs/blob/release-0.15/docs/serving/installing-logging-metrics-traces.md).

* If you start from scratch, KServe requires Kubernetes 1.25+, Knative 1.7+, Istio 1.15+.
* If you start from scratch, KServe requires Kubernetes, Knative, and Istio. You can find the recommended [version matrix](../admin/serverless/serverless.md#recommended-version-matrix) in the installation documentation.

* If you already have `Istio` or `Knative` (e.g. from a Kubeflow install) then you don't need to install them explicitly, as long as version dependencies are satisfied.

@@ -97,6 +99,13 @@ _Adding the `upstream` remote sets you up nicely for regularly
Once you reach this point you are ready to do a full build and deploy as
described below.

### Install pre-commit hooks
Configuring pre-commit hooks runs checks on the codebase before committing changes. This helps you catch lint errors, formatting issues, and other common problems before they reach the repository.

```shell
pre-commit install --install-hooks
```
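If you want to see what a hook configuration looks like before working in the repository, the sketch below prints a minimal one. The hook ids are illustrative; the KServe repository provides its own `.pre-commit-config.yaml` at the repository root, which the install command above picks up automatically.

```bash
# Print a minimal pre-commit configuration (sketch; repo and hook ids are
# illustrative examples, not KServe's actual configuration).
cat <<'EOF'
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
EOF
```

With a file like this saved as `.pre-commit-config.yaml`, `pre-commit run --all-files` runs every hook once over the whole tree.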

## Deploy KServe

### Check Knative Serving installation
11 changes: 3 additions & 8 deletions docs/get_started/README.md
@@ -13,6 +13,8 @@ You can use [`kind`](https://kind.sigs.k8s.io/docs/user/quick-start){target=_bla

The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-kubectl){target=_blank}, allows you to run commands against Kubernetes clusters. You can use `kubectl` to deploy applications, inspect and manage cluster resources, and view logs.

### Install Helm
The [Helm](https://helm.sh/docs/intro/install/){target=_blank} package manager for Kubernetes helps you define, install and upgrade software built for Kubernetes.

## Install the KServe "Quickstart" environment
1. After having kind installed, create a `kind` cluster with:
@@ -31,17 +33,10 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
```bash
kubectl config use-context kind-kind
```

to use this context.

3. You can then get started with a local deployment of KServe by using the _KServe Quick installation script on Kind_:

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.13/hack/quick_install.sh" | bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-{{ kserve_release_version | replace('.0', '') }}/hack/quick_install.sh" | bash
```

or install via our published Helm Charts:
```bash
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v{{ kserve_release_version }}
helm install kserve oci://ghcr.io/kserve/charts/kserve --version v{{ kserve_release_version }}
```
30 changes: 17 additions & 13 deletions docs/modelserving/explainer/alibi/cifar10/README.md
@@ -16,7 +16,7 @@ spec:
resources:
requests:
cpu: 0.1
memory: 5Gi
memory: 5Gi
limits:
memory: 10Gi
explainer:
@@ -25,18 +25,22 @@ spec:
image: kserve/alibi-explainer:v0.12.1
args:
- --model_name=cifar10
alibi:
type: AnchorImages
storageUri: "gs://kfserving-examples/models/tensorflow/cifar/explainer-0.9.1"
config:
batch_size: "40"
stop_on_first: "True"
resources:
requests:
cpu: 0.1
memory: 5Gi
limits:
memory: 10Gi
- --http_port=8080
- --predictor_host=cifar10-predictor.default
- --storage_uri=/mnt/models
- AnchorImages
- --batch_size=40
- --stop_on_first=True
env:
- name: STORAGE_URI
value: "gs://kfserving-examples/models/tensorflow/cifar/explainer-0.9.1"
resources:
requests:
cpu: 0.1
memory: 5Gi
limits:
cpu: 1
memory: 10Gi
```
!!! Note
The InferenceService resource describes:
35 changes: 22 additions & 13 deletions docs/modelserving/explainer/alibi/cifar10/cifar10.yaml
@@ -9,19 +9,28 @@ spec:
resources:
requests:
cpu: 0.1
memory: 5Gi
memory: 5Gi
limits:
memory: 10Gi
explainer:
alibi:
type: AnchorImages
storageUri: "gs://kfserving-examples/models/tensorflow/cifar/explainer-0.9.1"
config:
batch_size: "40"
stop_on_first: "True"
resources:
requests:
cpu: 0.1
memory: 5Gi
limits:
memory: 10Gi
containers:
- name: kserve-container
image: kserve/alibi-explainer:v0.12.1
args:
- --model_name=cifar10
- --http_port=8080
- --predictor_host=cifar10-predictor.default
- --storage_uri=/mnt/models
- AnchorImages
- --batch_size=40
- --stop_on_first=True
env:
- name: STORAGE_URI
value: "gs://kfserving-examples/models/tensorflow/cifar/explainer-0.9.1"
resources:
requests:
cpu: 0.1
memory: 5Gi
limits:
cpu: 1
memory: 10Gi
29 changes: 19 additions & 10 deletions docs/modelserving/explainer/alibi/income/README.md
@@ -28,16 +28,25 @@ spec:
memory: 1Gi
explainer:
minReplicas: 1
alibi:
type: AnchorTabular
storageUri: "gs://kfserving-examples/models/sklearn/1.3/income/explainer"
resources:
requests:
cpu: 0.1
memory: 1Gi
limits:
cpu: 1
memory: 4Gi
containers:
- name: kserve-container
image: kserve/alibi-explainer:v0.12.1
args:
- --model_name=income
- --http_port=8080
- --predictor_host=income-predictor.default
- --storage_uri=/mnt/models
- AnchorTabular
env:
- name: STORAGE_URI
value: "gs://kfserving-examples/models/sklearn/1.3/income/explainer"
resources:
requests:
cpu: 0.1
memory: 1Gi
limits:
cpu: 1
memory: 4Gi
```
Create the InferenceService with the above yaml: