Policies

What is a policy?

A policy is a set of configuration used to generate the data plane proxy configuration. Kuma combines policies with the Dataplane resource to generate the Envoy configuration of a data plane proxy.

What do policies look like?

Like all resources in Kuma, a policy has two parts: the metadata and the spec.

Metadata

Metadata identifies a policy by its name, its type, and the mesh it belongs to.

In Kubernetes, all policies are implemented as custom resource definitions (CRDs) with apiVersion kuma.io/v1alpha1.

apiVersion: kuma.io/v1alpha1
kind: ExamplePolicy
metadata:
  name: my-policy-name
  namespace: kuma-system
spec: ... # spec data specific to the policy kind

By default the policy is created in the default mesh. You can specify the mesh by using the kuma.io/mesh label.

For example:

apiVersion: kuma.io/v1alpha1
kind: ExamplePolicy
metadata:
  name: my-policy-name
  namespace: kuma-system
  labels:
    kuma.io/mesh: "my-mesh"
spec: ... # spec data specific to the policy kind

Spec

The spec field contains the actual configuration of the policy.

Some policies apply to only a subset of the configuration of the proxy.

  • Inbound policies apply only to incoming traffic. The spec.from[].targetRef field defines the subset of clients that are impacted by this policy.
  • Outbound policies apply only to outgoing traffic. The spec.to[].targetRef field defines the outbounds that are impacted by this policy.

The actual configuration is defined under the default field.

For example:

apiVersion: kuma.io/v1alpha1
kind: ExamplePolicy
metadata:
  name: my-example
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh
  to:
  - targetRef:
      kind: Mesh
    default:
      key: value
  from:
  - targetRef:
      kind: Mesh
    default:
      key: value

While some policies can have both a to and a from section, it is strongly advised to create 2 different policies, one for to and one for from.

Some policies are not directional and do not have to and from sections. Examples of such policies are MeshTrace and MeshProxyPatch. For example:

apiVersion: kuma.io/v1alpha1
kind: NonDirectionalPolicy
metadata:
  name: my-example
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh
  default:
    key: value

All specs have a top level targetRef which identifies which proxies this policy applies to. In particular, it defines which proxies have their Envoy configuration modified.

One of the benefits of targetRef policies is that the spec is always the same between Kubernetes and Universal.

This means that converting policies between Universal and Kubernetes only means rewriting the metadata.

Writing a targetRef

targetRef is a concept borrowed from the Kubernetes Gateway API. Its goal is to select subsets of proxies with maximum flexibility.

It looks like:

targetRef:
  kind: Mesh | MeshSubset | MeshService | MeshGateway
  name: "my-name" # Required for kinds MeshService and MeshGateway
  tags:
    key: value # For kinds MeshSubset and MeshGateway, tags that the target must match
  proxyTypes: ["Sidecar", "Gateway"] # For kinds Mesh and MeshSubset, the data plane proxy types to match
  labels:
    key: value # For policies that apply to labeled resources, applies the policy to each resource with matching labels
  sectionName: ASection # Attaches to a specific part of a resource (for example a port of a `MeshService`)
  namespace: ns # Valid when the policy is applied by a Kubernetes control plane

Here’s an explanation of each kind and its scope:

  • Mesh: applies to all proxies running in the mesh
  • MeshSubset: same as Mesh but only selects proxies that have matching targetRef.tags
  • MeshService: all proxies with a tag kuma.io/service equal to targetRef.name. This can work differently when using explicit services.
  • MeshGateway: targets proxies matched by the named MeshGateway
  • MeshServiceSubset: same as MeshService but further refined to proxies that have matching targetRef.tags. ⚠️ Deprecated since version 2.9.x ⚠️
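The selection rules in this list can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Kuma's implementation: the function name `matches` and the dictionary shapes are hypothetical, MeshGateway is left out because it depends on a separate resource, and MeshService is matched the non-explicit way (by the kuma.io/service tag).

```python
def matches(target_ref, proxy_tags):
    """Return True if a proxy with the given tags is selected by target_ref."""
    # A missing targetRef is treated like kind: Mesh.
    ref = target_ref or {}
    kind = ref.get("kind", "Mesh")
    wanted = ref.get("tags", {})
    has_tags = all(proxy_tags.get(k) == v for k, v in wanted.items())

    if kind == "Mesh":
        return True
    if kind == "MeshSubset":
        return has_tags
    if kind == "MeshService":
        return proxy_tags.get("kuma.io/service") == ref.get("name")
    if kind == "MeshServiceSubset":
        return proxy_tags.get("kuma.io/service") == ref.get("name") and has_tags
    raise ValueError(f"kind {kind!r} is not covered by this sketch")


# Hypothetical proxy tags, used only for illustration.
frontend = {"kuma.io/service": "web-frontend", "app": "web-frontend"}
print(matches({"kind": "MeshSubset", "tags": {"app": "web-frontend"}}, frontend))
print(matches({"kind": "MeshService", "name": "web-backend"}, frontend))
```

The first call selects the proxy because its tags include app: web-frontend; the second does not, because the proxy's kuma.io/service tag differs from the requested name.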

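Consider the two example policies below, one outbound and one inbound. The outbound policy is shown in two variants: the first selects the destination by its kuma.io/service value, the second selects an explicit MeshService resource by name, namespace, and sectionName.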

apiVersion: kuma.io/v1alpha1
kind: MeshAccessLog
metadata:
  name: example-outbound
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      app: web-frontend
  to:
  - targetRef:
      kind: MeshService
      name: web-backend_kuma-demo_svc_8080
    default:
      backends:
      - file:
          format:
            plain: '{"start_time": "%START_TIME%"}'
          path: "/tmp/logs.txt"
---
apiVersion: kuma.io/v1alpha1
kind: MeshAccessLog
metadata:
  name: example-outbound
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      app: web-frontend
  to:
  - targetRef:
      kind: MeshService
      name: web-backend
      namespace: kuma-demo
      sectionName: httpport
    default:
      backends:
      - file:
          format:
            plain: '{"start_time": "%START_TIME%"}'
          path: "/tmp/logs.txt"
---
apiVersion: kuma.io/v1alpha1
kind: MeshAccessLog
metadata:
  name: example-inbound
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      app: web-frontend
  from:
  - targetRef:
      kind: Mesh
    default:
      backends:
      - file:
          format:
            plain: '{"start_time": "%START_TIME%"}'
          path: "/tmp/logs.txt"

Using spec.targetRef, these policies target all proxies that have the tag app:web-frontend. This scopes the policies to traffic either from or to data plane proxies with the tag app:web-frontend.

The spec.to[].targetRef section enables logging for any traffic going to web-backend. The spec.from[].targetRef section enables logging for any traffic coming from anywhere in the Mesh.

Omitting targetRef

When a targetRef is not present, it is semantically equivalent to targetRef.kind: Mesh and refers to everything inside the Mesh.

Applying to specific proxy types

The top level targetRef field can select a specific subset of data plane proxies. The field named proxyTypes can restrict policies to specific types of data plane proxies:

  • Sidecar: Targets data plane proxies acting as sidecars to applications (including delegated gateways).
  • Gateway: Applies to data plane proxies operating in built-in Gateway mode.
  • Empty list: Defaults to targeting all data plane proxies.
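As a minimal sketch of this filter (the function name is hypothetical and the logic only mirrors the three bullets above, not Kuma's code), an empty or missing proxyTypes list places no restriction:

```python
def proxy_type_selected(proxy_types, proxy_type):
    """proxy_types is targetRef.proxyTypes; proxy_type is "Sidecar" or "Gateway"."""
    # An empty (or missing) list means every data plane proxy is targeted.
    if not proxy_types:
        return True
    return proxy_type in proxy_types


print(proxy_type_selected(["Gateway"], "Sidecar"))  # False
print(proxy_type_selected([], "Sidecar"))           # True
```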

Example

The following policy only applies to gateway data plane proxies:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: gateway-only-timeout
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh
    proxyTypes:
    - Gateway
  to:
  - targetRef:
      kind: Mesh
    default:
      idleTimeout: 10s

Targeting gateways

Given a MeshGateway:

apiVersion: kuma.io/v1alpha1
kind: MeshGateway
mesh: default
metadata:
  name: edge
  namespace: kuma-system
conf:
  listeners:
  - port: 80
    protocol: HTTP
    tags:
      port: http-80
  - port: 443
    protocol: HTTPS
    tags:
      port: https-443

Policies can attach to all listeners:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: timeout-all
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshGateway
    name: edge
  to:
  - targetRef:
      kind: Mesh
    default:
      idleTimeout: 10s

With this policy, requests to either port 80 or 443 have an idle timeout of 10 seconds. Policies can also attach to just some listeners:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: timeout-8080
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshGateway
    name: edge
    tags:
      port: http-80
  to:
  - targetRef:
      kind: Mesh
    default:
      idleTimeout: 10s

With this policy, only requests to port 80 have the idle timeout.

Note that depending on the policy, there may be restrictions on whether or not specific listeners can be selected.

Routes

Read the MeshHTTPRoute docs and MeshTCPRoute docs for more on how to target gateways for routing traffic.

Target kind support for different policies

Not every policy supports to and from levels. Additionally, not every resource can appear at every supported level. The specified top level resource can also affect which resources can appear in to or from.

To help users, each policy's documentation includes tables showing which targetRef kinds are supported at each targetRef level, for each type of proxy (sidecar or built-in gateway).

Example tables

These are just examples, remember to check the docs specific to your policy.

targetRef               Allowed kinds
targetRef.kind          Mesh, MeshSubset
to[].targetRef.kind     Mesh, MeshService
from[].targetRef.kind   Mesh

The table above shows that we can select sidecar proxies via Mesh or MeshSubset.

We can use the policy as an outbound policy with:

  • to[].targetRef.kind: Mesh, which applies to all traffic originating at the sidecar to anywhere
  • to[].targetRef.kind: MeshService, which applies to all traffic to specific services

We can also apply the policy as an inbound policy with:

  • from[].targetRef.kind: Mesh, which applies to all traffic received by the sidecar from anywhere in the mesh

Merging configuration

A proxy can be targeted by multiple targetRefs. To define how policies are merged together, the following strategy is used.

We define a total order of policy priority:

  • MeshServiceSubset > MeshService > MeshSubset > Mesh (the more focused a targetRef, the higher its priority)
  • If levels are equal, the lexicographic order of policy names is used

Remember: the broader a targetRef, the lower its priority.
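The kind-based part of this ordering can be illustrated with a short Python sketch. The table `KIND_PRIORITY`, the function name, and the policy dictionaries are hypothetical. The name tie-break is deliberately left out, since the text above does not say which direction of the lexicographic order wins; policies of equal kind simply keep their input order here.

```python
# Higher number = more focused targetRef = higher priority, per the order above.
KIND_PRIORITY = {"Mesh": 0, "MeshSubset": 1, "MeshService": 2, "MeshServiceSubset": 3}


def by_increasing_priority(policies):
    """Sort policies from the broadest to the most focused top-level targetRef kind."""
    return sorted(policies, key=lambda p: KIND_PRIORITY[p["kind"]])


# Illustrative policies: only the name and the top-level targetRef kind matter here.
policies = [
    {"name": "per-service", "kind": "MeshService"},
    {"name": "mesh-wide", "kind": "Mesh"},
    {"name": "per-team", "kind": "MeshSubset"},
]
print([p["name"] for p in by_increasing_priority(policies)])
# ['mesh-wide', 'per-team', 'per-service']
```

Sorting from lowest to highest priority is convenient because the configurations can then be merged in that order, with later (more focused) entries overriding earlier ones.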

For to and from policies, we concatenate the arrays of all matching policies. We then build the configuration by merging each level using JSON merge patch.

For example, given two default entries ordered this way:

default:
  conf: 1
  sub:
    array: [1, 2, 3]
    other: 50
    other-array: [3, 4, 5]
---
default:
  sub:
    array: []
    other: null
    other-array: [5, 6]
    extra: 2

The merge result is:

default:
  conf: 1
  sub:
    array: []
    other-array: [5, 6]
    extra: 2
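The result above is what JSON-merge-patch-style rules (as in RFC 7396) produce: objects are merged key by key, null removes a key, and arrays and scalars replace the previous value wholesale. The following Python sketch reproduces the example under that assumption; `merge_patch` is an illustrative helper, not a Kuma function.

```python
def merge_patch(target, patch):
    """Apply `patch` on top of `target` with JSON-merge-patch-style rules."""
    if not isinstance(patch, dict):
        # Scalars and arrays replace the previous value wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null removes the key from the result.
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


first = {"conf": 1, "sub": {"array": [1, 2, 3], "other": 50, "other-array": [3, 4, 5]}}
second = {"sub": {"array": [], "other": None, "other-array": [5, 6], "extra": 2}}
merged = merge_patch(first, second)
print(merged)
# {'conf': 1, 'sub': {'array': [], 'other-array': [5, 6], 'extra': 2}}
```

Note that arrays are never merged element by element: `other-array` ends up as [5, 6], not [3, 4, 5, 6].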

Using policies with MeshService, MeshMultizoneService and MeshExternalService

MeshService is a feature to define services explicitly in Kuma. It can be selectively enabled or disabled depending on the value of meshServices.mode on your Mesh object.

When using explicit services, MeshServiceSubset is no longer a valid kind and MeshService can only be used to select an actual MeshService resource (it can no longer select a kuma.io/service).

In the following example we’ll assume we have a MeshService:

apiVersion: kuma.io/v1alpha1
kind: MeshService
metadata:
  name: redis
  namespace: kuma-demo
  labels:
    k8s.kuma.io/namespace: kuma-demo
    kuma.io/zone: my-zone
    app: redis
    kuma.io/mesh: 
spec:
  selector:
    dataplaneTags:
      app: redis
      k8s.kuma.io/namespace: kuma-demo
  ports:
  - port: 6739
    targetPort: 6739
    appProtocol: tcp

There are 2 ways to select a MeshService:

If you are in the same namespace (or same zone in Universal) you can select one specific service by using its explicit name:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: timeout-to-redis
  namespace: kuma-demo
spec:
  to:
  - targetRef:
      kind: MeshService
      name: redis
    default:
      connectionTimeout: 10s

Selecting all matching MeshServices by labels:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: all-in-my-namespace 
  namespace: kuma-demo
spec:
  to:
  - targetRef:
      kind: MeshService
      labels:
        k8s.kuma.io/namespace: kuma-demo
    default:
      connectionTimeout: 10s

This is equivalent to writing a specific policy for each service that matches this label (in our example, for each service in this namespace in each zone).

When a MeshService has multiple ports, you can use sectionName to restrict the policy to a single port.

Global, zonal, producer and consumer policies

Policies can be applied to a zone or to a namespace when using Kubernetes. Policies will always impact at most the scope at which they are defined. In other words:

  1. A policy applied to the global control plane applies to all proxies in all zones.
  2. A policy applied to a zone only applies to proxies inside this zone. It is equivalent to having:
    spec:
      targetRef:
        kind: MeshSubset
        tags:
          kuma.io/zone: "my-zone"
    
  3. A policy applied to a namespace only applies to proxies inside this namespace. It is equivalent to having:
    spec:
      targetRef:
        kind: MeshSubset
        tags:
          kuma.io/zone: "my-zone"
          k8s.kuma.io/namespace: "my-ns"
    

There is, however, one exception when using MeshService with outbound policies (policies with spec.to[].targetRef). If you define a policy in the same namespace as the MeshService it targets, that policy is considered a producer policy. This means that all clients of this service (even in different zones) are impacted by this policy.

An example of a producer policy is:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: timeout-to-redis
  namespace: kuma-demo
spec:
  to:
  - targetRef:
      kind: MeshService
      name: redis
    default:
      connectionTimeout: 10s

The other type of policy is a consumer policy, which most commonly uses labels to match a service.

An example of a consumer policy which would override the previous producer policy:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: timeout-to-redis-consumer
  namespace: kuma-demo
spec:
  to:
    - targetRef:
        kind: MeshService
        labels:
          k8s.kuma.io/service-name: redis
      default:
        connectionTimeout: 10s

Remember that a labels selector applies to every matching MeshService. To communicate with services that share a name across namespaces or zones but need different configuration, use a more specific set of labels.

Kuma adds a label kuma.io/policy-role to identify the type of the policy. The values of the label are:

  • system: Policies defined on global or in the zone’s system namespace
  • workload-owner: Policies defined in a non-system namespace that do not have spec.to entries, or have both spec.from and spec.to entries
  • consumer: Policies defined in a non-system namespace that have spec.to entries which either do not use name or have a different namespace
  • producer: Policies defined in the same namespace as the services identified in the spec.to[].targetRef

The merging order of the different policy scopes is: workload-owner > consumer > producer > zonal > global.

Example

We have two clients, client1 and client2, running in different namespaces, ns1 and ns2 respectively.

 
flowchart LR
subgraph ns1
    client1(client)
end
subgraph ns2
  client2(client)
  server(MeshService: server)
end
client1 --> server
client2 --> server
  

We’re going to define a producer policy first:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: producer-policy
  namespace: ns2
spec:
  to:
    - targetRef:
        kind: MeshService
        name: server
      default:
        idleTimeout: 20s

We know it’s a producer policy because it is defined in the same namespace as the MeshService: server and names this server in its spec.to[].targetRef. So both client1 and client2 will receive the timeout of 20 seconds.

We now create a consumer policy:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: consumer-policy
  namespace: ns1
spec:
  to:
    - targetRef:
        kind: MeshService
        labels:
          k8s.kuma.io/service-name: server
      default:
        idleTimeout: 30s

Here the policy only impacts client1, as client2 doesn’t run in ns1. As consumer policies have a higher priority than producer policies, client1 will have an idleTimeout of 30s.

We can define another policy to impact client2:

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: consumer-policy
  namespace: ns2
spec:
  to:
    - targetRef:
        kind: MeshService
        labels:
          k8s.kuma.io/service-name: server
      default:
        idleTimeout: 40s

Note that the only difference here is the namespace: we now define a consumer policy inside ns2.
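The outcome of this walkthrough can be checked with a small Python model. It is a simplified sketch, not Kuma's resolution logic: the `role` values are assigned by hand following the rules described earlier, `ROLE_ORDER` covers only the first three entries of the merging order, and all names are hypothetical.

```python
# Lower index = higher priority, following the merging order stated earlier.
ROLE_ORDER = ["workload-owner", "consumer", "producer"]

# The three MeshTimeout policies from the example, reduced to what matters here.
POLICIES = [
    {"name": "producer-policy", "namespace": "ns2", "role": "producer", "idleTimeout": "20s"},
    {"name": "consumer-policy", "namespace": "ns1", "role": "consumer", "idleTimeout": "30s"},
    {"name": "consumer-policy", "namespace": "ns2", "role": "consumer", "idleTimeout": "40s"},
]


def applies(policy, client_namespace):
    # A producer policy reaches every client of the service;
    # other policies only reach proxies in their own namespace.
    return policy["role"] == "producer" or policy["namespace"] == client_namespace


def effective_idle_timeout(client_namespace):
    candidates = [p for p in POLICIES if applies(p, client_namespace)]
    winner = min(candidates, key=lambda p: ROLE_ORDER.index(p["role"]))
    return winner["idleTimeout"]


print(effective_idle_timeout("ns1"))  # 30s
print(effective_idle_timeout("ns2"))  # 40s
```

A client in a third namespace with no consumer policy of its own would fall back to the producer policy's 20s.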

Use labels for consumer policies and name for producer policies. It will be easier to differentiate between producer and consumer policies.

Examples

Applying a global default

type: ExamplePolicy
name: example
mesh: default
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: Mesh
      default:
        key: value

All traffic from any proxy (top level targetRef) going to any proxy (to targetRef) will have this policy applied with value key=value.

Recommending to users

type: ExamplePolicy
name: example
mesh: default
spec:
  targetRef:
    kind: Mesh
  to:
    - targetRef:
        kind: MeshService
        name: my-service
      default:
        key: value

All traffic from any proxy (top level targetRef) going to the service “my-service” (to targetRef) will have this policy applied with value key=value.

This is useful when a service owner wants to suggest a set of configurations to its clients.

Configuring all proxies of a team

type: ExamplePolicy
name: example
mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      team: "my-team"
  from:
    - targetRef:
        kind: Mesh
      default:
        key: value

All traffic from any proxy (from targetRef) going to any proxy that has the tag team=my-team (top level targetRef) will have this policy applied with value key=value.

This is a useful way to define coarse-grained rules, for example.

Configuring all proxies in a zone

type: ExamplePolicy
name: example
mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/zone: "east"
  default:
    key: value

All proxies in zone east (top level targetRef) will have this policy configured with key=value.

This can be very useful when observability stores are different for each zone for example.

Configuring all gateways in a Mesh

type: ExamplePolicy
name: example
mesh: default
spec:
  targetRef:
    kind: Mesh
    proxyTypes: ["Gateway"]
  default:
    key: value

All gateway proxies in mesh default will have this policy configured with key=value.

This can be very useful when timeout configurations for gateways need to differ from those of other proxies.

Applying policies in shadow mode

Overview

Shadow mode lets you mark a policy with a specific label to simulate its configuration changes without affecting the live environment. You can observe the potential impact on Envoy proxy configurations and test, validate, and fine-tune settings before actually deploying them. This is useful for learning, debugging, and migrating.

Although not required, CLI tools like jq and jd make it much easier to work with Kuma resources.

How to use shadow mode

  1. Before applying the policy, add a kuma.io/effect: shadow label.

  2. Check the proxy config with shadow policies taken into account through the Kuma API. By using HTTP API:
     curl "http://localhost:5681/meshes/${mesh}/dataplane/${dataplane}/_config?shadow=true"
    

    or by using kumactl:

     kumactl inspect dataplane ${name} --type=config --shadow
    
  3. Check the diff in JSONPatch format through the Kuma API. By using HTTP API:
     curl "http://localhost:5681/meshes/${mesh}/dataplane/${dataplane}/_config?shadow=true&include=diff"
    

    or by using kumactl:

     kumactl inspect dataplane ${name} --type=config --shadow --include=diff
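When scripting these checks, it can help to build the URL in code rather than in the shell, where characters such as ? and & have special meaning. The following Python sketch only assembles the URL from the steps above; the path and query parameters are copied from those steps, while the function name and its arguments are hypothetical.

```python
from urllib.parse import quote, urlencode


def shadow_config_url(base, mesh, dataplane, include_diff=False):
    """Build the _config URL used in the steps above."""
    params = {"shadow": "true"}
    if include_diff:
        params["include"] = "diff"
    # Percent-encode the names so they are safe as single path segments.
    path = f"/meshes/{quote(mesh, safe='')}/dataplane/{quote(dataplane, safe='')}/_config"
    return f"{base}{path}?{urlencode(params)}"


print(shadow_config_url("http://localhost:5681", "default", "frontend-dpp", include_diff=True))
# http://localhost:5681/meshes/default/dataplane/frontend-dpp/_config?shadow=true&include=diff
```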
    

Limitations and Considerations

Currently, the Kuma API mentioned above works only on Zone CP. Attempts to use it on Global CP lead to 405 Method Not Allowed. This might change in the future.

Examples

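Apply a policy with the kuma.io/effect: shadow label. The policy is shown in two variants: the first selects the destination by its kuma.io/service value, the second selects an explicit MeshService resource by name, namespace, and sectionName.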

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: frontend-timeouts
  namespace: kuma-system
  labels:
    kuma.io/effect: shadow
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: frontend
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      idleTimeout: 23s
---
apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: frontend-timeouts
  namespace: kuma-system
  labels:
    kuma.io/effect: shadow
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: frontend
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: httpport
    default:
      idleTimeout: 23s

Check the diff using kumactl:

$ kumactl inspect dataplane frontend-dpp --type=config --include=diff --shadow | jq '.diff' | jd -t patch2jd
@ ["type.googleapis.com/envoy.config.cluster.v3.Cluster","backend_kuma-demo_svc_3001","typedExtensionProtocolOptions","envoy.extensions.upstreams.http.v3.HttpProtocolOptions","commonHttpProtocolOptions","idleTimeout"]
- "3600s"
@ ["type.googleapis.com/envoy.config.cluster.v3.Cluster","backend_kuma-demo_svc_3001","typedExtensionProtocolOptions","envoy.extensions.upstreams.http.v3.HttpProtocolOptions","commonHttpProtocolOptions","idleTimeout"]
+ "23s"

The output not only identifies the exact location in Envoy where the change will occur, but also shows the current timeout value that we’re planning to replace.