At Ginkgo we ensure that actions taken on software running in production are recorded and auditable. This is generally good practice and there are compliance regimes that require this level of logging and auditability.
We also want to enable our software engineering teams to easily troubleshoot their production applications. When running applications in our Kubernetes (K8s) clusters, we can use standard RBAC (Role-Based Access Control) and the cluster audit logs to capture actions taken on cluster resources, which lets us adhere to these best practices and policies.
This post explains how we used OPA Gatekeeper policies to resolve the tension between engineers wanting to execute shell commands in running containers when troubleshooting and our need to capture those actions in the K8s cluster audit logs.
While we hope to provide all the visibility a software developer could want with our observability tooling, sometimes instrumentation is missing. Developers understandably want the ability to execute commands within running containers when under pressure to quickly resolve a production issue.
Kubernetes provides an exec API, which allows for executing shell commands within a Pod container.
Unfortunately, once an interactive shell session is initiated, any commands issued within the container are no longer captured by the K8s audit logs. The audit logs record who issued an exec on which pod container, and that’s it.
Using standard RBAC resources we can deny any exec command entirely, but developers would feel the loss of that capability. What we really want is to prevent interactive shell sessions, to which the audit logs are blind. Standard RBAC resources are not able to differentiate between interactive and non-interactive exec calls. With non-interactive exec commands, the shell commands are captured in the audit logs. While it may slow developers to have to construct individual exec commands, they can get the troubleshooting capabilities they need while satisfying logging and auditability constraints.
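To make the non-interactive workflow concrete, here is a minimal Python sketch that assembles a kubectl exec command line without the -i or -t flags, so the full command is part of the exec request itself. The helper name build_exec_argv and the example pod and namespace names are hypothetical and only for illustration; the kubectl flags used (-n, -c, and the -- separator) are standard.

```python
from typing import List, Optional


def build_exec_argv(
    pod: str,
    command: List[str],
    namespace: Optional[str] = None,
    container: Optional[str] = None,
) -> List[str]:
    """Build a non-interactive `kubectl exec` argv (hypothetical helper).

    No -i/--stdin or -t/--tty flag is added, so the request does not
    open an interactive session.
    """
    if not command:
        raise ValueError("a command is required for a non-interactive exec")
    argv = ["kubectl", "exec"]
    if namespace:
        argv += ["-n", namespace]
    argv.append(pod)
    if container:
        argv += ["-c", container]
    # Everything after "--" is the command run inside the container.
    argv += ["--", *command]
    return argv


# Example with made-up names; pass the list to subprocess.run to execute it.
example = build_exec_argv("my-pod", ["cat", "/etc/hostname"], namespace="my-team")
print(" ".join(example))
```

A wrapper like this could be handed to subprocess.run, but the same effect is achieved by simply typing the kubectl command without -it.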
Open Policy Agent (OPA) is an open-source policy engine. OPA Gatekeeper is built on top of OPA to provide K8s-specific policy enforcement features. The Software Developer Acceleration (SDA) team at Ginkgo is responsible for operating Elastic Kubernetes Service (EKS) clusters. SDA was already considering implementing OPA Gatekeeper for a few cluster policy enforcement use cases.
When concerns about allowing interactive exec arose, we thought OPA Gatekeeper might provide a solution. We stumbled upon a Gatekeeper GitHub issue that described our exact use case. The issue suggested that it should be feasible to implement an OPA Gatekeeper constraint that acts on the PodExecOptions, which determine whether an exec is interactive or not.
OPA Gatekeeper uses the OPA policy engine to enforce policy in K8s by defining Custom Resource Definitions: ConstraintTemplates and Constraints. Those resources integrate with K8s admission controllers to reject API calls and resources which violate a constraint. K8s admission controls are implemented using validating and mutating webhooks.
OPA Gatekeeper also provides a library of ConstraintTemplates for many common policy use cases. Unfortunately, preventing interactive exec is not one of the already implemented ConstraintTemplates in the community library.
SDA set up OPA Gatekeeper and then started experimenting with ConstraintTemplates and Constraints based on the examples in the library. OPA policies are expressed in Rego, a Domain-Specific Language (DSL) that members of the SDA team had to learn.
The first challenge we faced was ensuring that the OPA Gatekeeper ValidatingWebhookConfiguration could validate exec operations. Validating webhook rules match on the following API features: operations, apiGroups, apiVersions, resources, and scope.
To act on exec calls, the webhook must include the pods/exec subresource in its resources and CONNECT in its operations. We discovered that the Helm chart released for OPA Gatekeeper at the time specified only the CREATE and UPDATE operations and omitted CONNECT. After we modified our OPA Gatekeeper install to add the CONNECT operation, our constraints were able to act upon exec calls.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
  namespace: gatekeeper-system
webhooks:
# Excerpt: only the rules of the webhook entry are shown here.
- rules:
  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    - CONNECT  # required for the webhook to see exec calls
    resources:
    - '*'
    - pods/ephemeralcontainers
    - pods/exec
    - pods/log
    - pods/eviction
    - pods/portforward
    - pods/proxy
    - pods/attach
    - pods/binding
    - deployments/scale
    - replicasets/scale
    - statefulsets/scale
    - replicationcontrollers/scale
    - services/proxy
    - nodes/proxy
    - services/status
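To illustrate why the missing CONNECT operation mattered, the following Python sketch models a simplified version of how a webhook rule is matched against a request. It is an approximation for explanation only: it checks just operations and resources, ignores apiGroups, apiVersions, and scope, and the function names and the two sample rules are hypothetical rather than taken from Kubernetes or the Gatekeeper chart.

```python
from typing import Dict, List


def resource_matches(pattern: str, resource: str, subresource: str = "") -> bool:
    """Simplified resource matching for a webhook rule.

    '*' matches bare resources only, 'pods/exec' matches that one
    subresource, 'pods/*' and '*/scale' use a wildcard on one side,
    and '*/*' matches every resource and subresource.
    """
    if pattern == "*/*":
        return True
    if "/" in pattern:
        pat_res, pat_sub = pattern.split("/", 1)
        if not subresource:
            return False
        return pat_res in ("*", resource) and pat_sub in ("*", subresource)
    # A pattern without a slash never matches a subresource request.
    return not subresource and pattern in ("*", resource)


def rule_matches(
    rule: Dict[str, List[str]], operation: str, resource: str, subresource: str = ""
) -> bool:
    """Return True if the rule's operations and resources cover the request."""
    operations = rule.get("operations", [])
    if "*" not in operations and operation not in operations:
        return False
    return any(
        resource_matches(p, resource, subresource) for p in rule.get("resources", [])
    )


# Hypothetical rules: one without CONNECT, one with it added.
without_connect = {"operations": ["CREATE", "UPDATE"], "resources": ["*", "pods/exec"]}
with_connect = {"operations": ["CREATE", "UPDATE", "CONNECT"], "resources": ["*", "pods/exec"]}

print(rule_matches(without_connect, "CONNECT", "pods", "exec"))  # False
print(rule_matches(with_connect, "CONNECT", "pods", "exec"))     # True
```

In this model, an exec call arrives as a CONNECT on the pods resource with the exec subresource, so a rule needs both the operation and the pods/exec entry before the webhook is consulted.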
ConstraintTemplates contain policy violation rules, which can then be used by multiple different Constraints.
The PodExecOptions field that determines whether an exec is interactive is stdin. In the following ConstraintTemplate, the Rego rule inspects the PodExecOptions object under review to determine whether stdin is true. If it is, the request violates the Constraint.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenyinteractiveexec
  namespace: gatekeeper-system
spec:
  crd:
    spec:
      names:
        kind: K8sDenyInteractiveExec
  targets:
  - rego: |
      package k8sdenyinteractiveexec
      violation[{"msg": msg}] {
        input.review.object.stdin == true
        msg := sprintf("Interactive exec is not permitted in production constrained environments. REVIEW OBJECT: %v", [input.review])
      }
    target: admission.k8s.gatekeeper.sh
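For readers new to Rego, the logic of the violation rule can be sketched in Python. This is only an illustrative translation, not how Gatekeeper evaluates policy: the violations function is hypothetical, and the two sample review dictionaries are trimmed-down stand-ins for the review object Gatekeeper passes to the rule.

```python
from typing import Any, Dict, List

MESSAGE = "Interactive exec is not permitted in production constrained environments."


def violations(review: Dict[str, Any]) -> List[Dict[str, str]]:
    """Mirror the Rego rule: a violation exists only when stdin is exactly true.

    As in Rego, a missing `object` or a missing `stdin` field yields no
    violation rather than an error.
    """
    obj = review.get("object") or {}
    if obj.get("stdin") is True:
        return [{"msg": f"{MESSAGE} REVIEW OBJECT: {review}"}]
    return []


# Trimmed-down, illustrative review objects.
interactive = {
    "operation": "CONNECT",
    "object": {"kind": "PodExecOptions", "command": ["/bin/bash"], "stdin": True, "tty": True},
}
non_interactive = {
    "operation": "CONNECT",
    "object": {"kind": "PodExecOptions", "command": ["echo", "foo"], "stdout": True},
}

print(len(violations(interactive)))      # 1
print(len(violations(non_interactive)))  # 0
```

The key point carried over from the Rego is that the check is on stdin alone; the tty flag is not consulted.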
The Constraint determines the objects to which the specified ConstraintTemplate should be applied and any enforcement action to take.
SDA provides namespaces for teams operating applications in the EKS clusters. Namespaces containing applications subject to constraints are labeled.
The following Constraint applies the K8sDenyInteractiveExec ConstraintTemplate above to the PodExecOptions object. It also uses a namespaceSelector to only apply the ConstraintTemplate in namespaces bearing the label. The default enforcement action is to deny.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyInteractiveExec
metadata:
  name: k8sdenyinteractiveexec
  namespace: gatekeeper-system
spec:
  match:
    kinds:
    - apiGroups:
      - ""
      kinds:
      - PodExecOptions
    namespaceSelector:
      matchExpressions:
      - key: <label to constrain the environment goes here>
        operator: In
        values:
        - "true"
    scope: Namespaced
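The namespaceSelector above follows standard Kubernetes label selector semantics. As a rough sketch of how the In expression decides which namespaces are constrained, the Python below evaluates matchExpressions against a namespace's labels. The function names are hypothetical, the label key example.com/exec-constrained is a made-up placeholder (the real key is not given here), and matchLabels is left out for brevity.

```python
from typing import Any, Dict


def expression_matches(expr: Dict[str, Any], labels: Dict[str, str]) -> bool:
    """Evaluate one label selector requirement against a set of labels."""
    key, operator = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def selector_matches(selector: Dict[str, Any], labels: Dict[str, str]) -> bool:
    """All matchExpressions must hold for the selector to match."""
    return all(
        expression_matches(expr, labels)
        for expr in selector.get("matchExpressions", [])
    )


# Placeholder label key, for illustration only.
selector = {
    "matchExpressions": [
        {"key": "example.com/exec-constrained", "operator": "In", "values": ["true"]}
    ]
}

print(selector_matches(selector, {"example.com/exec-constrained": "true"}))  # True
print(selector_matches(selector, {"team": "sda"}))                           # False
```

Under these semantics, a namespace that lacks the label, or carries it with any value other than "true", falls outside the Constraint and interactive exec remains allowed there.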
Once this Constraint was in place, we tested by issuing kubectl exec commands against some test Pods in the labeled namespace with and without the stdin option (-i).
% kubectl exec -it test-679bdcc64b-gnjll -- /bin/bash
Error from server (Forbidden): admission webhook "validation.gatekeeper.sh" denied the request: [k8sdenyinteractiveexec] Interactive exec are not permitted in production constrained environment. REVIEW OBJECT: {"dryRun": false, "kind": {"group": "", "kind": "PodExecOptions", "version": "v1"}, "name": "test-679bdcc64b-gnjll", "namespace": "default", "object": {"apiVersion": "v1", "command": ["/bin/bash"], "container": "efs-csi-test-deployment-nginx", "kind": "PodExecOptions", "stdin": true, "stdout": true, "tty": true}, "oldObject": null, "operation": "CONNECT", "options": null, "requestKind": {"group": "", "kind": "PodExecOptions", "version": "v1"}, "requestResource": {"group": "", "resource": "pods", "version": "v1"}, "requestSubResource": "exec", "resource": {"group": "", "resource": "pods", "version": "v1"}, "subResource": "exec"}
% kubectl exec test-679bdcc64b-gnjll -- echo foo
foo
Posted by Gigi Jackson