Spec

Spec describes all the information about a complete container.

type Spec struct {
	// Version of the Open Container Initiative Runtime Specification with which the bundle complies.
	Version string `json:"ociVersion"`
	// Process configures the container process.
	Process *Process `json:"process,omitempty"`
	// Root configures the container's root filesystem.
	Root *Root `json:"root,omitempty"`
	// Hostname configures the container's hostname.
	Hostname string `json:"hostname,omitempty"`
	// Domainname configures the container's domainname.
	Domainname string `json:"domainname,omitempty"`
	// Mounts configures additional mounts (on top of Root).
	Mounts []Mount `json:"mounts,omitempty"`
	// Hooks configures callbacks for container lifecycle events.
	Hooks *Hooks `json:"hooks,omitempty" platform:"linux,solaris,zos"`
	// Annotations contains arbitrary metadata for the container.
	Annotations map[string]string `json:"annotations,omitempty"`
 
	// Linux is platform-specific configuration for Linux based containers.
	Linux *Linux `json:"linux,omitempty" platform:"linux"`
	// Solaris is platform-specific configuration for Solaris based containers.
	Solaris *Solaris `json:"solaris,omitempty" platform:"solaris"`
	// Windows is platform-specific configuration for Windows based containers.
	Windows *Windows `json:"windows,omitempty" platform:"windows"`
	// VM specifies configuration for virtual-machine-based containers.
	VM *VM `json:"vm,omitempty" platform:"vm"`
	// ZOS is platform-specific configuration for z/OS based containers.
	ZOS *ZOS `json:"zos,omitempty" platform:"zos"`
}

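Here we mainly care about the following fields:

  1. Process: information about the process that runs inside the container
  2. Root: information about the container's root filesystem
  3. Mounts: the paths that will be mounted into the container
  4. Hooks: callbacks for the events triggered as the container moves between lifecycle stages
  5. Linux: configuration specific to the Linux platform; the other platforms are not covered here.

Once these fields are filled in according to the spec, that is enough to manage a container.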

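Root and Mounts

Together, these two make up the container's complete filesystem.

Root configures the container's root directory. Apart from the path, it has only one other option: whether the root filesystem is read-only for the container.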

// Root contains information about the container's root filesystem on the host.
type Root struct {
	// Path is the absolute path to the container's root filesystem.
	Path string `json:"path"`
	// Readonly makes the root filesystem for the container readonly before the process is executed.
	Readonly bool `json:"readonly,omitempty"`
}

Mount mounts additional paths under the container's root directory.

// Mount specifies a mount for a container.
type Mount struct {
	// Destination is the absolute path where the mount will be placed in the container.
	Destination string `json:"destination"`
	// Type specifies the mount kind.
	Type string `json:"type,omitempty" platform:"linux,solaris,zos"`
	// Source specifies the source path of the mount.
	Source string `json:"source,omitempty"`
	// Options are fstab style mount options.
	Options []string `json:"options,omitempty"`
 
	// UID/GID mappings used for changing file owners w/o calling chown, fs should support it.
	// Every mount point could have its own mapping.
	UIDMappings []LinuxIDMapping `json:"uidMappings,omitempty" platform:"linux"`
	GIDMappings []LinuxIDMapping `json:"gidMappings,omitempty" platform:"linux"`
}
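
  • Type: the filesystem type to mount. Common values are
    • bind - bind mount, which binds a host directory or file into the container
    • tmpfs - in-memory filesystem
    • proc - the /proc filesystem
    • sysfs - the /sys filesystem
    • devpts - pseudo-terminal device filesystem
    • ext4, xfs, btrfs, and other conventional filesystem types
  • Options: mount options that control the mount's behavior
    • ro - read-only mount
    • rw - read-write mount
    • nosuid - ignore setuid bits
    • nodev - do not allow device files
    • noexec - do not allow executing files
    • bind - bind mount flag
    • rbind - recursive bind mount
    • shared, private, slave - mount propagation modes
  • UIDMappings and GIDMappings: map user IDs and group IDs inside the container to IDs on the host. This avoids a large number of chown operations at container startup, but it requires the filesystem to support ID-mapped mounts.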
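
To make the Type and Options fields more concrete, here is a small sketch that declares three typical mounts and inspects them. The Mount struct is a trimmed local copy of the type above, and isBind, hostTarget, exampleMounts, and every path are hypothetical names and values introduced only for this illustration. The naive path join is also an assumption made for brevity: a real runtime has to resolve the destination safely so that symlinks inside the rootfs cannot escape it.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Mount is a trimmed local copy of the type shown above.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// exampleMounts returns a proc mount, a tmpfs mount, and a read-only
// recursive bind mount of a host directory. All paths are illustrative.
func exampleMounts() []Mount {
	return []Mount{
		{Destination: "/proc", Type: "proc", Source: "proc"},
		{Destination: "/tmp", Type: "tmpfs", Source: "tmpfs", Options: []string{"nosuid", "nodev"}},
		{Destination: "/data", Source: "/srv/demo-data", Options: []string{"rbind", "ro"}},
	}
}

// isBind reports whether m is a bind mount, indicated either by its Type or
// by a bind/rbind entry in Options.
func isBind(m Mount) bool {
	if m.Type == "bind" {
		return true
	}
	for _, o := range m.Options {
		if o == "bind" || o == "rbind" {
			return true
		}
	}
	return false
}

// hostTarget naively computes where the mount lands on the host, by joining
// the rootfs path and the in-container destination.
func hostTarget(rootfs string, m Mount) string {
	return filepath.Join(rootfs, m.Destination)
}

func main() {
	for _, m := range exampleMounts() {
		fmt.Printf("%-6s bind=%-5v target=%s\n", m.Destination, isBind(m), hostTarget("/tmp/demo/rootfs", m))
	}
}
```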

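Hooks: lifecycle management

A container goes through the following lifecycle stages:

  1. CreateRuntime: the container has been created. Hooks still run in the host (runtime) namespace. Used to set up host-level resources such as network configuration and devices.
  2. CreateContainer: the container has been created. Hooks run in the container namespace and perform initialization inside the container, such as mounting filesystems and setting up the container environment. This happens before pivot_root, which can be thought of as before the chroot-style switch of the root directory.
  3. StartContainer: the start operation has been called, but the container process has not started yet. Hooks run in the container namespace and do the last pieces of work before the container starts. This happens after pivot_root, so what the hooks see is the container's view of the filesystem.
  4. Poststart: after the container process has started. Hooks run in the host (runtime) namespace and handle post-start cleanup and monitoring.
  5. Poststop: after the container process has exited. Hooks run in the host (runtime) namespace and handle resource cleanup after exit.
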
type Hooks struct {
	// Prestart is Deprecated. Prestart is a list of hooks to be run before the container process is executed.
	// It is called in the Runtime Namespace
	//
	// Deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and
	// [Hooks.StartContainer] instead, which allow more granular hook control
	// during the create and start phase.
	Prestart []Hook `json:"prestart,omitempty"`
	// CreateRuntime is a list of hooks to be run after the container has been created but before pivot_root or any equivalent operation has been called
	// It is called in the Runtime Namespace
	CreateRuntime []Hook `json:"createRuntime,omitempty"`
	// CreateContainer is a list of hooks to be run after the container has been created but before pivot_root or any equivalent operation has been called
	// It is called in the Container Namespace
	CreateContainer []Hook `json:"createContainer,omitempty"`
	// StartContainer is a list of hooks to be run after the start operation is called but before the container process is started
	// It is called in the Container Namespace
	StartContainer []Hook `json:"startContainer,omitempty"`
	// Poststart is a list of hooks to be run after the container process is started.
	// It is called in the Runtime Namespace
	Poststart []Hook `json:"poststart,omitempty"`
	// Poststop is a list of hooks to be run after the container process exits.
	// It is called in the Runtime Namespace
	Poststop []Hook `json:"poststop,omitempty"`
}

Each lifecycle stage can be given a list of callback commands. The Hook struct looks like this:

type Hook struct {
	Path    string   `json:"path"`
	Args    []string `json:"args,omitempty"`
	Env     []string `json:"env,omitempty"`
	Timeout *int     `json:"timeout,omitempty"`
}

As the definition shows, a hook is simply an executable that gets run.

Process

This one is more complex: it describes everything a process needs in order to run inside the container.

type Process struct {
	// Terminal creates an interactive terminal for the container.
	Terminal bool `json:"terminal,omitempty"`
	// ConsoleSize specifies the size of the console.
	ConsoleSize *Box `json:"consoleSize,omitempty"`
	// User specifies user information for the process.
	User User `json:"user"`
	// Args specifies the binary and arguments for the application to execute.
	Args []string `json:"args,omitempty"`
	// CommandLine specifies the full command line for the application to execute on Windows.
	CommandLine string `json:"commandLine,omitempty" platform:"windows"`
	// Env populates the process environment for the process.
	Env []string `json:"env,omitempty"`
	// Cwd is the current working directory for the process and must be
	// relative to the container's root.
	Cwd string `json:"cwd"`
	// Capabilities are Linux capabilities that are kept for the process.
	Capabilities *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
	// Rlimits specifies rlimit options to apply to the process.
	Rlimits []POSIXRlimit `json:"rlimits,omitempty" platform:"linux,solaris,zos"`
	// NoNewPrivileges controls whether additional privileges could be gained by processes in the container.
	NoNewPrivileges bool `json:"noNewPrivileges,omitempty" platform:"linux,zos"`
	// ApparmorProfile specifies the apparmor profile for the container.
	ApparmorProfile string `json:"apparmorProfile,omitempty" platform:"linux"`
	// Specify an oom_score_adj for the container.
	OOMScoreAdj *int `json:"oomScoreAdj,omitempty" platform:"linux"`
	// Scheduler specifies the scheduling attributes for a process
	Scheduler *Scheduler `json:"scheduler,omitempty" platform:"linux"`
	// SelinuxLabel specifies the selinux context that the container process is run as.
	SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"`
	// IOPriority contains the I/O priority settings for the cgroup.
	IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"`
	// ExecCPUAffinity specifies CPU affinity for exec processes.
	ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
}

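This covers the command, its arguments, the environment variables, and a number of Linux-specific process settings. We won't go through them in detail here; the important ones will be explained individually when they come up in the container run flow.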
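
As a small illustration of the basic fields, the sketch below checks a Process for a few constraints before it would be handed to a runtime: at least one entry in Args, an absolute Cwd inside the container, and KEY=value entries in Env. The Process struct is a trimmed local copy, validateProcess is a hypothetical helper, and the checks are only a sample chosen for this example, not the complete set a real runtime enforces.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// Process is a trimmed local copy of the type shown above.
type Process struct {
	Args []string
	Env  []string
	Cwd  string
}

// validateProcess applies a few illustrative sanity checks to p.
func validateProcess(p Process) error {
	if len(p.Args) == 0 {
		return errors.New("process.args must contain at least the binary to execute")
	}
	// Cwd is interpreted inside the container, so it uses slash-separated paths.
	if !path.IsAbs(p.Cwd) {
		return fmt.Errorf("process.cwd %q must be an absolute path inside the container", p.Cwd)
	}
	for _, kv := range p.Env {
		if !strings.Contains(kv, "=") {
			return fmt.Errorf("process.env entry %q is not in KEY=value form", kv)
		}
	}
	return nil
}

func main() {
	ok := Process{Args: []string{"sh", "-c", "echo hi"}, Env: []string{"PATH=/usr/bin:/bin"}, Cwd: "/"}
	bad := Process{Args: []string{"sh"}, Cwd: "work"}
	fmt.Println(validateProcess(ok))
	fmt.Println(validateProcess(bad))
}
```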

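Linux platform configuration

runc is built entirely on the container technologies that the Linux platform provides. Namespace isolation, cgroup limits, and the rest are all backed by Linux.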

type Linux struct {
	// UIDMapping specifies user mappings for supporting user namespaces.
	UIDMappings []LinuxIDMapping `json:"uidMappings,omitempty"`
	// GIDMapping specifies group mappings for supporting user namespaces.
	GIDMappings []LinuxIDMapping `json:"gidMappings,omitempty"`
	// Sysctl are a set of key value pairs that are set for the container on start
	Sysctl map[string]string `json:"sysctl,omitempty"`
	// Resources contain cgroup information for handling resource constraints
	// for the container
	Resources *LinuxResources `json:"resources,omitempty"`
	// CgroupsPath specifies the path to cgroups that are created and/or joined by the container.
	// The path is expected to be relative to the cgroups mountpoint.
	// If resources are specified, the cgroups at CgroupsPath will be updated based on resources.
	CgroupsPath string `json:"cgroupsPath,omitempty"`
	// Namespaces contains the namespaces that are created and/or joined by the container
	Namespaces []LinuxNamespace `json:"namespaces,omitempty"`
	// Devices are a list of device nodes that are created for the container
	Devices []LinuxDevice `json:"devices,omitempty"`
	// NetDevices are key-value pairs, keyed by network device name on the host, moved to the container's network namespace.
	NetDevices map[string]LinuxNetDevice `json:"netDevices,omitempty"`
	// Seccomp specifies the seccomp security settings for the container.
	Seccomp *LinuxSeccomp `json:"seccomp,omitempty"`
	// RootfsPropagation is the rootfs mount propagation mode for the container.
	RootfsPropagation string `json:"rootfsPropagation,omitempty"`
	// MaskedPaths masks over the provided paths inside the container.
	MaskedPaths []string `json:"maskedPaths,omitempty"`
	// ReadonlyPaths sets the provided paths as RO inside the container.
	ReadonlyPaths []string `json:"readonlyPaths,omitempty"`
	// MountLabel specifies the selinux context for the mounts in the container.
	MountLabel string `json:"mountLabel,omitempty"`
	// IntelRdt contains Intel Resource Director Technology (RDT) information for
	// handling resource constraints and monitoring metrics (e.g., L3 cache, memory bandwidth) for the container
	IntelRdt *LinuxIntelRdt `json:"intelRdt,omitempty"`
	// Personality contains configuration for the Linux personality syscall
	Personality *LinuxPersonality `json:"personality,omitempty"`
	// TimeOffsets specifies the offset for supporting time namespaces.
	TimeOffsets map[string]LinuxTimeOffset `json:"timeOffsets,omitempty"`
}

The next step is to move on to the container run flow.