Configuration
1 - Containerd
The base containerd configuration expects to merge in any additional configs present in /var/cri/conf.d/*.toml.
An example of exposing metrics
Into each machine config, add the following:
machine:
  ...
  files:
    - content: |
        [metrics]
          address = "0.0.0.0:11234"
      path: /var/cri/conf.d/metrics.toml
      op: create
Create the cluster as usual and verify that metrics are now served on this port:
$ curl 127.0.0.1:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
...
2 - Custom Certificate Authorities
Appending the Certificate Authority
Add the PEM encoded certificate to each machine's configuration:
machine:
  ...
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      permissions: 0644
      path: /etc/ssl/certs/ca-certificates
      op: append
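Once the machine config is applied, the appended bundle can be read back from the node to confirm the change (a quick check; it reuses the path from the config above):
talosctl -n <node ip> read /etc/ssl/certs/ca-certificates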
3 - Disk Encryption
It is possible to enable encryption for system disks at the OS level.
As of this writing, only STATE and EPHEMERAL partitions can be encrypted.
STATE contains the most sensitive node data: secrets and certs.
EPHEMERAL partition may contain some sensitive workload data.
Data is encrypted using LUKS2, which is provided by the Linux kernel modules and cryptsetup
utility.
The operating system will run additional setup steps when encryption is enabled.
If the disk encryption is enabled for the STATE partition, the system will:
- Save STATE encryption config as JSON in the META partition.
- Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition. Note that the machine config is always preferred over the META one.
- Before mounting the STATE partition, format and encrypt it. This occurs only if the STATE partition is empty and has no filesystem.
If the disk encryption is enabled for the EPHEMERAL partition, the system will:
- Get the encryption config from the machine config.
- Before mounting the EPHEMERAL partition, encrypt and format it. This occurs only if the EPHEMERAL partition is empty and has no filesystem.
Configuration
Right now this encryption is disabled by default. To enable disk encryption you should modify the machine configuration with the following options:
machine:
  ...
  systemDiskEncryption:
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
Encryption Keys
Note: What the LUKS2 docs call “keys” are, in reality, a passphrase. When this passphrase is added, LUKS2 runs argon2 to create an actual key from that passphrase.
LUKS2 supports up to 32 encryption keys, and it is possible to specify all of them in the machine configuration. Talos always tries to sync the keys list defined in the machine config with the actual keys defined for the LUKS2 partition. So when you update the keys list, keep at least one key unchanged; it will be used to perform the key management operations.
When you define a key you should specify the key kind and the slot:
machine:
  ...
  state:
    keys:
      - nodeID: {} # key kind
        slot: 1
  ephemeral:
    keys:
      - static:
          passphrase: supersecret
        slot: 0
Note that key order does not play any role in which key slot is used. Every key must always have a slot defined.
Encryption Key Kinds
Talos supports two kinds of keys:
- nodeID, which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
- static, which you define right in the configuration.
Note: Use static keys only if your STATE partition is encrypted and only for the EPHEMERAL partition. For the STATE partition it will be stored in the META partition, which is not encrypted.
Key Rotation
It is necessary to run talosctl apply-config a couple of times to rotate keys, since at least one working key must be kept in place while the other keys are changed around it.
So, for example, first add a new key:
machine:
  ...
  ephemeral:
    keys:
      - static:
          passphrase: oldkey
        slot: 0
      - static:
          passphrase: newkey
        slot: 1
  ...
Run:
talosctl apply-config -n <node> -f config.yaml
Then remove the old key:
machine:
  ...
  ephemeral:
    keys:
      - static:
          passphrase: newkey
        slot: 1
  ...
Run:
talosctl apply-config -n <node> -f config.yaml
Going from Unencrypted to Encrypted and Vice Versa
Ephemeral Partition
There is no in-place encryption support for the partitions right now, so to avoid losing any data only empty partitions can be encrypted.
As such, migration from unencrypted to encrypted needs some additional handling, especially around explicitly wiping partitions.
- apply-config should be called with --mode=staged.
- Partition should be wiped after apply-config, but before the reboot.
Edit your machine config and add the encryption configuration:
vim config.yaml
Apply the configuration with --mode=staged
:
talosctl apply-config -f config.yaml -n <node ip> --mode=staged
Wipe the partition you’re going to encrypt:
talosctl reset --system-labels-to-wipe EPHEMERAL -n <node ip> --reboot=true
That’s it! After you run the last command, the partition will be wiped and the node will reboot. During the next boot the system will encrypt the partition.
State Partition
Calling wipe against the STATE partition will make the node lose the config, so the previous flow is not going to work.
The flow should be to first wipe the STATE partition:
talosctl reset --system-labels-to-wipe STATE -n <node ip> --reboot=true
The node will enter maintenance mode; then run apply-config with the --insecure flag:
talosctl apply-config --insecure -n <node ip> -f config.yaml
After installation is complete the node should encrypt the STATE partition.
4 - Editing Machine Configuration
Talos node state is fully defined by machine configuration. Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.
Note: Be sure that config is persisted so that configuration updates are not overwritten on reboots. Configuration persistence has been enabled by default since Talos 0.5 (persist: true in machine configuration).
There are three talosctl commands which facilitate machine configuration updates:
- talosctl apply-config to apply configuration from the file
- talosctl edit machineconfig to launch an editor with existing node configuration, make changes and apply configuration back
- talosctl patch machineconfig to apply automated machine configuration via JSON patch
Each of these commands can operate in one of the following modes:
- apply change in automatic mode (default): reboot if the change can't be applied without a reboot, otherwise apply the change immediately
- apply change with a reboot (--mode=reboot): update configuration, reboot Talos node to apply configuration change
- apply change immediately (--mode=no-reboot flag): change is applied immediately without a reboot, fails if the change contains any fields that cannot be updated without a reboot
- apply change on next reboot (--mode=staged): change is staged to be applied after a reboot, but node is not rebooted
- apply change in the interactive mode (--mode=interactive; only for talosctl apply-config): launches TUI based interactive installer
Note: applying change on next reboot (--mode=staged) doesn't modify the current node configuration, so the next call to talosctl edit machineconfig --mode=staged will not see the changes.
Additionally, there is also talosctl get machineconfig, which retrieves the current node configuration API resource and contains the machine configuration in the .spec field.
It can be used to modify the configuration locally before applying it back to the node.
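For example, the configuration can be pulled down, edited locally, and re-applied (a sketch; it assumes yq is available and that the resource ID is v1alpha1):
talosctl -n <IP> get machineconfig v1alpha1 -o yaml | yq eval '.spec' - > config.yaml
# edit config.yaml locally, then re-apply it
talosctl -n <IP> apply-config -f config.yaml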
The list of config changes allowed to be applied immediately in Talos v1.1.1:
- .debug
- .cluster
- .machine.time
- .machine.certSANs
- .machine.install (configuration is only applied during install/upgrade)
- .machine.network
- .machine.sysfs
- .machine.sysctls
- .machine.logging
- .machine.controlplane
- .machine.kubelet
- .machine.pods
- .machine.kernel
- .machine.registries (CRI containerd plugin will not pick up the registry authentication settings without a reboot)
talosctl apply-config
This command is mostly used to submit initial machine configuration to the node (generated by talosctl gen config).
It can be used to apply new configuration from the file to the running node as well, but most of the time it’s not convenient, as it doesn’t operate on the current node machine configuration.
Example:
talosctl -n <IP> apply-config -f config.yaml
Command apply-config can also be invoked as apply machineconfig:
talosctl -n <IP> apply machineconfig -f config.yaml
Applying machine configuration immediately (without a reboot):
talosctl -n <IP> apply machineconfig -f config.yaml --mode=no-reboot
Starting the interactive installer:
talosctl -n <IP> apply machineconfig --mode=interactive
Note: when a Talos node is running in the maintenance mode it's necessary to provide the --insecure (-i) flag to connect to the API and apply the config.
talosctl edit machineconfig
Command talosctl edit loads the current machine configuration from the node and launches the configured editor to modify the config.
If the config hasn't been changed in the editor (or if the updated config is empty), the update is not applied.
Note: Talos uses the TALOS_EDITOR and EDITOR environment variables to pick up the editor preference. If both environment variables are missing, the vi editor is used by default.
Example:
talosctl -n <IP> edit machineconfig
Configuration can be edited for multiple nodes if multiple IP addresses are specified:
talosctl -n <IP1>,<IP2>,... edit machineconfig
Applying machine configuration change immediately (without a reboot):
talosctl -n <IP> edit machineconfig --mode=no-reboot
talosctl patch machineconfig
Command talosctl patch works similarly to the talosctl edit command: it loads the current machine configuration, but instead of launching the configured editor it applies a set of JSON patches to the configuration and writes the result back to the node.
Example: updating the kubelet version (in auto mode):
$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v1.24.2"}]'
patched mc at the node <IP>
Updating kube-apiserver version in immediate mode (without a reboot):
$ talosctl -n <IP> patch machineconfig --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.24.2"}]'
patched mc at the node <IP>
A patch might be applied to multiple nodes when multiple IPs are specified:
talosctl -n <IP1>,<IP2>,... patch machineconfig -p '[{...}]'
Patches can also be sourced from files using the @file syntax:
talosctl -n <IP> patch machineconfig -p @kubelet-patch.json -p @manifest-patch.json
It might be easier to store patches in YAML format vs. the default JSON format. Talos can detect file format automatically:
# kubelet-patch.yaml
- op: replace
  path: /machine/kubelet/image
  value: ghcr.io/siderolabs/kubelet:v1.24.2
talosctl -n <IP> patch machineconfig -p @kubelet-patch.yaml
Recovering from Node Boot Failures
If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue.
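If the node is still reachable over the Talos API, the corrected configuration can simply be re-applied; if the node comes up in maintenance mode instead, add the --insecure flag (a sketch reusing the commands shown earlier):
talosctl apply-config -n <node ip> -f config.yaml
# or, when the node is in maintenance mode:
talosctl apply-config --insecure -n <node ip> -f config.yaml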
5 - Logging
Viewing logs
Kernel messages can be retrieved with the talosctl dmesg command:
$ talosctl -n 172.20.1.2 dmesg
172.20.1.2: kern: info: [2021-11-10T10:09:37.662764956Z]: Command line: init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 random.trust_cpu=on printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 console=ttyS0 reboot=k panic=1 talos.shutdown=halt talos.platform=metal talos.config=http://172.20.1.1:40101/config.yaml
[...]
Service logs can be retrieved with the talosctl logs command:
$ talosctl -n 172.20.1.2 services
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
172.20.1.2 apid Running OK 19m27s ago Health check successful
172.20.1.2 containerd Running OK 19m29s ago Health check successful
172.20.1.2 cri Running OK 19m27s ago Health check successful
172.20.1.2 etcd Running OK 19m22s ago Health check successful
172.20.1.2 kubelet Running OK 19m20s ago Health check successful
172.20.1.2 machined Running ? 19m30s ago Service started as goroutine
172.20.1.2 trustd Running OK 19m27s ago Health check successful
172.20.1.2 udevd Running OK 19m28s ago Health check successful
$ talosctl -n 172.20.1.2 logs machined
172.20.1.2: [talos] task setupLogger (1/1): done, 106.109µs
172.20.1.2: [talos] phase logger (1/7): done, 564.476µs
[...]
Container logs for Kubernetes pods can be retrieved with the talosctl logs -k command:
$ talosctl -n 172.20.1.2 containers -k
NODE NAMESPACE ID IMAGE PID STATUS
172.20.1.2 k8s.io kube-system/kube-flannel-dk6d5 k8s.gcr.io/pause:3.5 1329 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-cni ghcr.io/siderolabs/install-cni:v0.7.0-alpha.0-1-g2bb2efc 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-config quay.io/coreos/flannel:v0.13.0 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:kube-flannel quay.io/coreos/flannel:v0.13.0 1610 CONTAINER_RUNNING
172.20.1.2 k8s.io kube-system/kube-proxy-gfkqj k8s.gcr.io/pause:3.5 1311 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-proxy-gfkqj:kube-proxy k8s.gcr.io/kube-proxy:v1.24.2 1379 CONTAINER_RUNNING
$ talosctl -n 172.20.1.2 logs -k kube-system/kube-proxy-gfkqj:kube-proxy
172.20.1.2: 2021-11-30T19:13:20.567825192Z stderr F I1130 19:13:20.567737 1 server_others.go:138] "Detected node IP" address="172.20.0.3"
172.20.1.2: 2021-11-30T19:13:20.599684397Z stderr F I1130 19:13:20.599613 1 server_others.go:206] "Using iptables Proxier"
[...]
Sending logs
Service logs
You can enable sending service logs in the machine configuration:
machine:
  logging:
    destinations:
      - endpoint: "udp://127.0.0.1:12345/"
        format: "json_lines"
      - endpoint: "tcp://host:5044/"
        format: "json_lines"
Several destinations can be specified.
Supported protocols are UDP and TCP.
The only currently supported format is json_lines:
{
  "msg": "[talos] apply config request: immediate true, on reboot false",
  "talos-level": "info",
  "talos-service": "machined",
  "talos-time": "2021-11-10T10:48:49.294858021Z"
}
Messages are newline-separated when sent over TCP.
Over UDP messages are sent with one message per packet.
The msg, talos-level, talos-service, and talos-time fields are always present; there may be additional fields.
Kernel logs
Kernel log delivery can be enabled with the talos.logging.kernel kernel command line argument, which can be specified in .machine.install.extraKernelArgs:
machine:
  install:
    extraKernelArgs:
      - talos.logging.kernel=tcp://host:5044/
The kernel log destination is specified in the same way as the service log endpoint.
The only supported format is json_lines.
Sample message:
{
  "clock":6252819, // time relative to the kernel boot time
  "facility":"user",
  "msg":"[talos] task startAllServices (1/1): waiting for 6 services\n",
  "priority":"warning",
  "seq":711,
  "talos-level":"warn", // Talos-translated `priority` into common logging level
  "talos-time":"2021-11-26T16:53:21.3258698Z" // Talos-translated `clock` using current time
}
extraKernelArgs in the machine configuration are only applied on Talos upgrades, not just by applying the config. (Upgrading to the same version is fine.)
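So, after adding the kernel argument, trigger an upgrade to make it take effect (a sketch; substitute the installer image matching your Talos version, and upgrading to the same version is fine):
talosctl -n <node ip> upgrade --image ghcr.io/siderolabs/installer:v1.1.1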
Filebeat example
To forward logs to other Log collection services, one way to do this is sending them to a Filebeat running in the cluster itself (in the host network), which takes care of forwarding it to other endpoints (and the necessary transformations).
If Elastic Cloud on Kubernetes is being used, the following Beat (custom resource) configuration might be helpful:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: talos
spec:
  type: filebeat
  version: 7.15.1
  elasticsearchRef:
    name: talos
  config:
    filebeat.inputs:
      - type: "udp"
        host: "127.0.0.1:12345"
        processors:
          - decode_json_fields:
              fields: ["message"]
              target: ""
          - timestamp:
              field: "talos-time"
              layouts:
                - "2006-01-02T15:04:05.999999999Z07:00"
          - drop_fields:
              fields: ["message", "talos-time"]
          - rename:
              fields:
                - from: "msg"
                  to: "message"
  daemonSet:
    updateStrategy:
      rollingUpdate:
        maxUnavailable: 100%
    podTemplate:
      spec:
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            ports:
              - protocol: UDP
                containerPort: 12345
                hostPort: 12345
The input configuration ensures that messages and timestamps are extracted properly. Refer to the Filebeat documentation on how to forward logs to other outputs.
Also note the hostNetwork: true in the daemonSet configuration.
This ensures filebeat uses the host network and listens on 127.0.0.1:12345 (UDP) on every machine, which can then be specified as a logging endpoint in the machine configuration.
Fluent-bit example
First, we'll create a values file for the fluent-bit Helm chart.
# fluentd-bit.yaml
podAnnotations:
  fluentbit.io/exclude: 'true'
extraPorts:
  - port: 12345
    containerPort: 12345
    protocol: TCP
    name: talos
config:
  service: |
    [SERVICE]
      Flush         5
      Daemon        Off
      Log_Level     warn
      Parsers_File  custom_parsers.conf
  inputs: |
    [INPUT]
      Name          tcp
      Listen        0.0.0.0
      Port          12345
      Format        json
      Tag           talos.*
    [INPUT]
      Name          tail
      Alias         kubernetes
      Path          /var/log/containers/*.log
      Parser        containerd
      Tag           kubernetes.*
    [INPUT]
      Name          tail
      Alias         audit
      Path          /var/log/audit/kube/*.log
      Parser        audit
      Tag           audit.*
  filters: |
    [FILTER]
      Name                kubernetes
      Alias               kubernetes
      Match               kubernetes.*
      Kube_Tag_Prefix     kubernetes.var.log.containers.
      Use_Kubelet         Off
      Merge_Log           On
      Merge_Log_Trim      On
      Keep_Log            Off
      K8S-Logging.Parser  Off
      K8S-Logging.Exclude On
      Annotations         Off
      Labels              On
    [FILTER]
      Name    modify
      Match   kubernetes.*
      Add     source kubernetes
      Remove  logtag
  customParsers: |
    [PARSER]
      Name         audit
      Format       json
      Time_Key     requestReceivedTimestamp
      Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
      Name         containerd
      Format       regex
      Regex        ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
  outputs: |
    [OUTPUT]
      Name    stdout
      Alias   stdout
      Match   *
      Format  json_lines
    # If you wish to ship directly to Loki from Fluentbit,
    # Uncomment the following output, updating the Host with your Loki DNS/IP info as necessary.
    # [OUTPUT]
    # Name   loki
    # Match  *
    # Host   loki.loki.svc
    # Port   3100
    # Labels job=fluentbit
    # Auto_Kubernetes_Labels on
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
tolerations:
  - operator: Exists
    effect: NoSchedule
Next, we will add the helm repo for FluentBit, and deploy it to the cluster.
helm repo add fluent https://fluent.github.io/helm-charts
helm upgrade -i --namespace=kube-system -f fluentd-bit.yaml fluent-bit fluent/fluent-bit
Now we need to find the service IP.
$ kubectl -n kube-system get svc -l app.kubernetes.io/name=fluent-bit
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluent-bit ClusterIP 10.200.0.138 <none> 2020/TCP,5170/TCP 108m
Finally, we will change the Talos log destination with the talosctl edit mc command.
machine:
  logging:
    destinations:
      - endpoint: "tcp://10.200.0.138:5170"
        format: "json_lines"
This example configuration was well tested with Cilium CNI, and it should work with iptables/ipvs based CNI plugins too.
Vector example
Vector is a lightweight observability pipeline ideal for a Kubernetes environment. It can ingest (source) logs from multiple sources, perform remapping on the logs (transform), and forward the resulting pipeline to multiple destinations (sinks). As it is an end-to-end platform, it can be run as a single-deployment 'aggregator' as well as a replicaSet of 'Agents' that run on each node.
As Talos can be set up as above to send logs to a destination, we can run Vector as an Aggregator and forward both kernel and service logs to a UDP socket in-cluster.
Below is an excerpt of a source/sink setup for Talos, with a ‘sink’ destination of an in-cluster Grafana Loki log aggregation service. As Loki can create labels from the log input, we have set up the Loki sink to create labels based on the host IP, service and facility of the inbound logs.
Note that a method of exposing the Vector service will be required which may vary depending on your setup - a LoadBalancer is a good option.
role: "Stateless-Aggregator"

# Sources
sources:
  talos_kernel_logs:
    address: 0.0.0.0:6050
    type: socket
    mode: udp
    max_length: 102400
    decoding:
      codec: json
    host_key: __host

  talos_service_logs:
    address: 0.0.0.0:6051
    type: socket
    mode: udp
    max_length: 102400
    decoding:
      codec: json
    host_key: __host

# Sinks
sinks:
  talos_kernel:
    type: loki
    inputs:
      - talos_kernel_logs_xform
    endpoint: http://loki.system-monitoring:3100
    encoding:
      codec: json
      except_fields:
        - __host
    batch:
      max_bytes: 1048576
    out_of_order_action: rewrite_timestamp
    labels:
      hostname: >-
        {{`{{ __host }}`}}
      facility: >-
        {{`{{ facility }}`}}

  talos_service:
    type: loki
    inputs:
      - talos_service_logs_xform
    endpoint: http://loki.system-monitoring:3100
    encoding:
      codec: json
      except_fields:
        - __host
    batch:
      max_bytes: 400000
    out_of_order_action: rewrite_timestamp
    labels:
      hostname: >-
        {{`{{ __host }}`}}
      service: >-
        {{`{{ "talos-service" }}`}}
6 - Managing PKI
Generating an Administrator Key Pair
In order to create a key pair, you will need the root CA.
Save the CA public key and CA private key as ca.crt and ca.key, respectively.
Now, run the following commands to generate a certificate:
talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
Now, base64 encode admin.crt and admin.key:
cat admin.crt | base64
cat admin.key | base64
You can now set the crt and key fields in the talosconfig to the base64 encoded strings.
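For reference, a talosconfig with these fields filled in looks roughly like this (a sketch; the context name and endpoint are placeholders):
context: my-cluster
contexts:
  my-cluster:
    endpoints:
      - <control plane IP>
    ca: <base64 encoded ca.crt>
    crt: <base64 encoded admin.crt>
    key: <base64 encoded admin.key>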
Renewing an Expired Administrator Certificate
In order to renew the certificate, you will need the root CA, and the admin private key. The base64 encoded key can be found in any one of the control plane node’s configuration file. Where it is exactly will depend on the specific version of the configuration file you are using.
Save the CA public key, CA private key, and admin private key as ca.crt, ca.key, and admin.key respectively.
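If you have a control plane machine configuration file at hand, the CA material can be extracted from it (a sketch; it assumes yq is installed and that the fields live at .machine.ca, which may vary between configuration versions):
yq eval '.machine.ca.crt' controlplane.yaml | base64 -d > ca.crt
yq eval '.machine.ca.key' controlplane.yaml | base64 -d > ca.key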
Now, run the following commands to generate a certificate:
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
You should see admin.crt in your current directory.
Now, base64 encode admin.crt:
cat admin.crt | base64
You can now set the certificate in the talosconfig to the base64 encoded string.
7 - NVIDIA GPU
Enabling NVIDIA GPU support on Talos is bound by the NVIDIA EULA. Talos GPU support is an alpha feature.
These are the steps to enable NVIDIA support in Talos:
- Talos pre-installed on a node with NVIDIA GPU installed.
- Building a custom Talos installer image with NVIDIA modules
- Building the NVIDIA container toolkit system extension, which allows registering a custom runtime with containerd
- Upgrading Talos with the custom installer and enabling NVIDIA modules and the system extension
Both of these components require the user to build and maintain their own Talos installer image and NVIDIA container toolkit Talos System Extension.
Prerequisites
This guide assumes the user has access to a container registry with push permissions, Docker installed on the build machine, and that the Talos host has pull access to the container registry.
Set the local registry and username environment variables:
export USERNAME=<username>
export REGISTRY=<registry>
For example:
export USERNAME=talos-user
export REGISTRY=ghcr.io
The examples below will use the sample variables set above. Modify accordingly for your environment.
Building the installer image
Start by cloning the pkgs repository.
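A minimal sketch of that step, assuming the pkgs repository lives at github.com/siderolabs/pkgs and that a release branch matching your Talos version exists (the branch name below is an assumption):
git clone https://github.com/siderolabs/pkgs.git
cd pkgs
git checkout release-1.1   # assumed branch name; pick the one matching your Talos release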
Now run the following command to build and push the custom Talos kernel image and the NVIDIA kernel modules image, with the modules signed by the kernel built alongside it.
make kernel nonfree-kmod-nvidia PLATFORM=linux/amd64 PUSH=true
Replace the platform with linux/arm64 if building for ARM64.
Now we need to create a custom Talos installer image.
Start by creating a Dockerfile with the following content:
FROM scratch as customization
COPY --from=ghcr.io/talos-user/nonfree-kmod-nvidia:v1.1.1-nvidia /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:v1.1.1
COPY --from=ghcr.io/talos-user/kernel:v1.1.1-nvidia /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
Now build the image and push it to the registry.
DOCKER_BUILDKIT=0 docker build --squash --build-arg RM="/lib/modules" -t ghcr.io/talos-user/installer:v1.1.1-nvidia .
docker push ghcr.io/talos-user/installer:v1.1.1-nvidia
Note: BuildKit has a bug (#816); to disable it, use DOCKER_BUILDKIT=0.
Building the system extension
Start by cloning the extensions repository.
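A minimal sketch of that step, assuming the extensions repository lives at github.com/siderolabs/extensions:
git clone https://github.com/siderolabs/extensions.git
cd extensions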
Now run the following command to build and push the system extension.
make nvidia-container-toolkit PLATFORM=linux/amd64 PUSH=true TAG=510.60.02-v1.9.0
Replace the platform with linux/arm64 if building for ARM64.
Upgrading Talos and enabling the NVIDIA modules and the system extension
Make sure to use talosctl version v1.1.1 or later.
First create a patch yaml gpu-worker-patch.yaml to update the machine config, similar to below:
- op: add
  path: /machine/install/extensions
  value:
    - image: ghcr.io/talos-user/nvidia-container-toolkit:510.60.02-v1.9.0
- op: add
  path: /machine/kernel
  value:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
- op: add
  path: /machine/sysctls
  value:
    net.core.bpf_jit_harden: 1
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
talosctl patch mc --patch @gpu-worker-patch.yaml
Now we can proceed to upgrading Talos with the installer built previously:
talosctl upgrade --image=ghcr.io/talos-user/installer:v1.1.1-nvidia
Once the node reboots, the NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
talosctl read /proc/modules
which should produce an output similar to below:
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
talosctl get extensions
which should produce an output similar to below:
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-frezbo-nvidia-container-toolkit-510.60.02-v1.9.0 1 nvidia-container-toolkit 510.60.02-v1.9.0
talosctl read /proc/driver/nvidia/version
which should produce an output similar to below:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.60.02 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 11.2.0 (GCC)
Deploying NVIDIA device plugin
First we need to create the RuntimeClass.
Apply the following manifest to create a runtime class that uses the extension:
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
Install the NVIDIA device plugin:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.11.0 --set=runtimeClassName=nvidia
Apply the following manifest to run a CUDA pod via the nvidia runtime:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda11.6.0"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
The status can be viewed by running:
kubectl get pods
which should produce an output similar to below:
NAME READY STATUS RESTARTS AGE
gpu-operator-test 0/1 Completed 0 13s
kubectl logs gpu-operator-test
which should produce an output similar to below:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
8 - Pull Through Image Cache
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from Docker registries might take a significant amount of time. We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies. A similar approach might be used to run Talos in production in air-gapped environments. It can be also used to verify that all the images are available in local registries.
Requirements
Creating the set of caching proxies requires Docker on the host, plus the usual requirements for running a local Talos cluster with either the docker or QEMU provisioner.
Launch the Caching Docker Registry Proxies
Talos pulls from docker.io, k8s.gcr.io, quay.io, gcr.io, and ghcr.io by default.
If your configuration is different, you might need to modify the commands below:
docker run -d -p 5000:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
--restart always \
--name registry-docker.io registry:2
docker run -d -p 5001:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://k8s.gcr.io \
--restart always \
--name registry-k8s.gcr.io registry:2
docker run -d -p 5002:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://quay.io \
--restart always \
--name registry-quay.io registry:2.5
docker run -d -p 5003:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://gcr.io \
--restart always \
--name registry-gcr.io registry:2
docker run -d -p 5004:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://ghcr.io \
--restart always \
--name registry-ghcr.io registry:2
Note: Proxies are started as docker containers, and they're automatically configured to start with the Docker daemon. Please note that the quay.io proxy doesn't support the recent Docker image schema, so we run an older registry image version (2.5).
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own host port (5000, 5001, 5002, 5003 and 5004).
Using Caching Registries with QEMU
Local Cluster
With a QEMU local cluster, a bridge interface is created on the host. As registry containers expose their ports on the host, we can use the bridge IP to direct proxy requests.
sudo talosctl cluster create --provisioner qemu \
--registry-mirror docker.io=http://10.5.0.1:5000 \
--registry-mirror k8s.gcr.io=http://10.5.0.1:5001 \
--registry-mirror quay.io=http://10.5.0.1:5002 \
--registry-mirror gcr.io=http://10.5.0.1:5003 \
--registry-mirror ghcr.io=http://10.5.0.1:5004
The Talos local cluster should now start pulling via caching registries.
This can be verified via the registry logs, e.g. docker logs -f registry-docker.io.
The first time the cluster boots, images are pulled and cached, so the next cluster boot should be much faster.
Note: 10.5.0.1 is the bridge IP with the default network (10.5.0.0/24); if using a custom --cidr, the value should be adjusted accordingly.
Using Caching Registries with docker
Local Cluster
With a docker local cluster we can use the docker bridge IP; the default value for that IP is 172.17.0.1.
On Linux, the docker bridge address can be inspected with ip addr show docker0.
talosctl cluster create --provisioner docker \
--registry-mirror docker.io=http://172.17.0.1:5000 \
--registry-mirror k8s.gcr.io=http://172.17.0.1:5001 \
--registry-mirror quay.io=http://172.17.0.1:5002 \
--registry-mirror gcr.io=http://172.17.0.1:5003 \
--registry-mirror ghcr.io=http://172.17.0.1:5004
Cleaning Up
To cleanup, run:
docker rm -f registry-docker.io
docker rm -f registry-k8s.gcr.io
docker rm -f registry-quay.io
docker rm -f registry-gcr.io
docker rm -f registry-ghcr.io
Note: Removing docker registry containers also removes the image cache. So if you plan to use caching registries, keep the containers running.
9 - Role-based access control (RBAC)
Talos v0.11 introduced initial support for role-based access control (RBAC). This guide will explain what that is and how to enable it without losing access to the cluster.
RBAC in Talos
Talos uses certificates to authorize users. The certificate subject’s organization field is used to encode user roles. There is a set of predefined roles that allow access to different API methods:
- os:admin grants access to all methods;
- os:reader grants access to "safe" methods (for example, that includes the ability to list files, but does not include the ability to read file contents);
- os:etcd:backup grants access to the /machine.MachineService/EtcdSnapshot method.
Roles in the current talosconfig can be checked with the following command:
$ talosctl config info
[...]
Roles: os:admin
[...]
RBAC is enabled by default in new clusters created with talosctl v0.11+ and disabled otherwise.
Enabling RBAC
First, both the Talos cluster and the talosctl tool should be upgraded.
Then the talosctl config new command should be used to generate a new client configuration with the os:admin role.
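For example (a sketch; the output file name is arbitrary and the command is assumed to default to the os:admin role):
talosctl config new talosconfig-admin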
Additional configurations and certificates for different roles can be generated by passing the --roles flag:
talosctl config new --roles=os:reader reader
That command will create a new client configuration file reader with a new certificate with the os:reader role.
After that, RBAC should be enabled in the machine configuration:
machine:
  features:
    rbac: true
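One way to apply this is with talosctl patch (a sketch mirroring the patch examples earlier; note that the add operation replaces an existing .machine.features section, so prefer talosctl edit machineconfig if other features are already configured):
talosctl -n <IP> patch machineconfig -p '[{"op": "add", "path": "/machine/features", "value": {"rbac": true}}]'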
10 - System Extensions
System extensions allow extending the Talos root filesystem, which enables a variety of features, such as including custom container runtimes, loading additional firmware, etc.
System extensions are only activated during the installation or upgrade of Talos Linux. With system extensions installed, the Talos root filesystem is still immutable and read-only.
Configuration
System extensions are configured in the .machine.install section:
machine:
  install:
    extensions:
      - image: ghcr.io/siderolabs/gvisor:33f613e
During the initial install (e.g. when PXE booting or booting from an ISO), Talos will pull down container images for system extensions, validate them, and include them into the Talos initramfs image.
System extensions will be activated on boot and overlaid on top of the Talos root filesystem.
In order to update the system extensions for a running instance, update .machine.install.extensions and upgrade Talos.
(Note: upgrading to the same version of Talos is fine.)
Building a Talos Image with System Extensions
System extensions can be installed into the Talos disk image (e.g. AWS AMI or VMWare OVF) by running the following command to generate the image from the Talos source tree:
make image-metal IMAGER_SYSTEM_EXTENSIONS="ghcr.io/siderolabs/amd-ucode:20220411 ghcr.io/siderolabs/gvisor:20220405.0-v1.0.0-10-g82b41ad"
Authoring System Extensions
A Talos system extension is a container image with the specific folder structure.
System extensions can be built and managed using any tool that produces container images, e.g. docker build.
Sidero Labs maintains a repository of system extensions.
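As a rough sketch of that folder structure (an assumption based on the siderolabs/extensions repository and the metadata fields shown under Resource Definitions below; consult the repository for the authoritative layout), an extension image contains a manifest plus a rootfs to overlay:
manifest.yaml   # extension metadata
rootfs/         # files overlaid onto the Talos root filesystem
A hypothetical manifest.yaml would mirror the fields reported by talosctl get extensions:
version: v1alpha1
metadata:
  name: my-extension
  version: 1.0.0
  author: Example Author
  description: |
    Example system extension.
  compatibility:
    talos:
      version: '>= v1.0.0'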
Resource Definitions
Use talosctl get extensions to get a list of system extensions:
$ talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.20.0.2 runtime ExtensionStatus 000.ghcr.io-talos-systems-gvisor-54b831d 1 gvisor 20220117.0-v1.0.0
172.20.0.2 runtime ExtensionStatus 001.ghcr.io-talos-systems-intel-ucode-54b831d 1 intel-ucode microcode-20210608-v1.0.0
Use YAML or JSON format to see additional details about the extension:
$ talosctl -n 172.20.0.2 get extensions 001.ghcr.io-talos-systems-intel-ucode-54b831d -o yaml
node: 172.20.0.2
metadata:
    namespace: runtime
    type: ExtensionStatuses.runtime.talos.dev
    id: 001.ghcr.io-talos-systems-intel-ucode-54b831d
    version: 1
    owner: runtime.ExtensionStatusController
    phase: running
    created: 2022-02-10T18:25:04Z
    updated: 2022-02-10T18:25:04Z
spec:
    image: 001.ghcr.io-talos-systems-intel-ucode-54b831d.sqsh
    metadata:
        name: intel-ucode
        version: microcode-20210608-v1.0.0
        author: Spencer Smith
        description: |
            This system extension provides Intel microcode binaries.
        compatibility:
            talos:
                version: '>= v1.0.0'