Why Docker? namespace, cgroups, aufs

김지밍 2015. 12. 23. 19:24

##### Namespaces, Cgroups and Docker

### 0. System Design

1. Container

- Host provides Kernel

- Filesystem, network interface, etc are already there

- Guest starts from /sbin/init

2. Application Container

- Host provides Kernel

- User data, socket fd, etc are already there

- starts from application not init

### 1. Namespaces

: A technique that virtualizes system resources within a single system and creates independent spaces, each identified by a namespace

<Namespaces provided by Linux>

Namespace	flags	Isolates
IPC	CLONE_NEWIPC	System V IPC, POSIX message queues (isolation of inter-process communication)
Network	CLONE_NEWNET	Network devices, stacks, ports, etc. (network interfaces, iptables, ..)
Mount	CLONE_NEWNS	Mount points (splits and isolates the filesystem mount points)
PID	CLONE_NEWPID	Process IDs
User	CLONE_NEWUSER	User and group IDs
UTS	CLONE_NEWUTS	Hostname and NIS domain name

* Linux namespaces are used through the namespaces API,

which consists of the system calls "clone", "setns" and "unshare"; flags passed to these calls select each namespace

(See the Linux man pages: http://linux.die.net/man/2/)
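
As a rough illustration of how the flags in the table are used, here is a minimal Python sketch. The flag values are the ones from `<linux/sched.h>`; the helper names `combine_flags` and `unshare` are made up for this note, and calling libc's `unshare` through `ctypes` is just one possible way to reach the system call (it is Linux-only and usually needs root).

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def combine_flags(names):
    """OR together the clone flags for the given namespace names."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


def unshare(names):
    """Move the calling process into new namespaces (hypothetical helper)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    # Only computes the flag word; nothing is unshared here.
    print(hex(combine_flags(["uts", "net"])))
```

Calling `unshare(["uts"])` as root would give the process its own hostname, without affecting the host.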

1. File System

a. read-only, mount RO /usr inside a container

b. shared - sharing data across containers via binds

c. slave

d. private - /tmp per service

=> Docker uses aufs
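
The read-only / shared / slave / private settings above can be observed in `/proc/<pid>/mountinfo`. The sketch below parses one line of that file, assuming the field layout described in proc(5); the function name `propagation` and the sample lines are made up for illustration.

```python
def propagation(mountinfo_line):
    """Return (mount_point, propagation, read_only) for one mountinfo line."""
    fields = mountinfo_line.split()
    mount_point = fields[4]
    read_only = "ro" in fields[5].split(",")
    # Optional fields sit between the mount options and the "-" separator.
    optional = fields[6:fields.index("-")]
    tags = {field.split(":")[0] for field in optional}
    if "shared" in tags and "master" in tags:
        kind = "slave+shared"
    elif "shared" in tags:
        kind = "shared"
    elif "master" in tags:
        kind = "slave"
    elif "unbindable" in tags:
        kind = "unbindable"
    else:
        kind = "private"  # no propagation tag at all
    return mount_point, kind, read_only


if __name__ == "__main__":
    sample = "22 1 8:1 / /usr ro,relatime shared:1 - ext4 /dev/sda1 rw"
    print(propagation(sample))
```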

2. Networking

a. Root namespace

- Inside the docker container, you can see that the network interface is assigned an address of 172.17.0.1/24

$ ifconfig

eth0 ~

- Full access to the machine interfaces

fast, easy to set up, network looks normal to the container

but no separation of concerns: the container has full control of the interfaces (e.g. MAC addresses)

b. Bridging

- Outside docker (on the host), a virtual bridge is created for the bridge network

$ ifconfig

veth952NTB ~

$ brctl show

docker0 veth952NTB

- More complex to set up

- Network looks normal to the container

but slower, NAT to the internet, iptables needed to expose a public socket

c. Private namespace with socket activation

- No interface

- Sockets are passed via stdin (inetd)

- systemd style listen fd API
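
A minimal sketch of the systemd-style listen fd API from the receiving side, assuming the documented protocol: the service manager sets `LISTEN_PID` and `LISTEN_FDS`, and the passed descriptors start at fd 3. The function name `listen_fds` is made up for this note.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def listen_fds(environ, pid):
    """Return the descriptors passed by the service manager, or [] if none."""
    if environ.get("LISTEN_PID") != str(pid):
        return []  # the variables were meant for another process
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    for fd in listen_fds(os.environ, os.getpid()):
        sock = socket.socket(fileno=fd)  # wrap the already-listening socket
        print("inherited listening socket on fd", fd, sock.getsockname())
```

The service never opens a network interface itself; it only accepts connections on sockets it was handed, which is why a private network namespace with no interface is enough.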

3. Process Namespace

: PID 1 inside the namespace is a different PID when seen from outside the namespace
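
One way to see both PIDs of the same process is the `NSpid` line in `/proc/<pid>/status`, which newer kernels (4.1 and later) provide. A small sketch, with a made-up function name and a made-up sample; the line lists the PID in each nested PID namespace, outermost first.

```python
def ns_pids(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(pid) for pid in line.split()[1:]]
    return []  # field absent on older kernels


if __name__ == "__main__":
    # Hypothetical status excerpt for a container's init process.
    sample = "Name:\tinit\nPid:\t4321\nNSpid:\t4321\t1\n"
    print(ns_pids(sample))
```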

### 3. Cgroups

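- Control Groups (cgroups) are a Linux kernel feature that limits, monitors, and accounts for the resources (block I/O, CPU, memory, etc.) used by running processes

- They organize running processes into groups, and these groups form a hierarchy

- They provide control functions such as allocating, prioritizing, denying, managing, and monitoring system resources, which improves resource efficiency

- Since cgroups by themselves only provide grouping, actual resource distribution requires a subsystem for each resource

: blkio, cpu, cpuacct, cpuset, devices, memory, net_cls, ns (namespace subsystem), ..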
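
Which group a process belongs to in each subsystem's hierarchy is listed in `/proc/<pid>/cgroup`, one line per hierarchy in the form `hierarchy-ID:subsystems:path`. A minimal parsing sketch; the function name and the sample content (including the `/docker/abc` path) are made up for illustration.

```python
def parse_proc_cgroup(text):
    """Map each subsystem name to the cgroup path of the process."""
    membership = {}
    for line in text.splitlines():
        if not line:
            continue
        # Format: hierarchy-ID:comma-separated-subsystems:cgroup-path
        _hierarchy_id, subsystems, path = line.split(":", 2)
        for name in subsystems.split(","):
            if name:
                membership[name] = path
    return membership


if __name__ == "__main__":
    sample = "4:cpu,cpuacct:/docker/abc\n3:memory:/docker/abc\n2:blkio:/\n"
    print(parse_proc_cgroup(sample))
```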

1. Block I/O

weighting system

iops serviced, waiting and queued

2. CPU

shares system

cpuacct.stat: user and system

3. Memory

total RSS memory limit

swap, total rss, # page ins/outs
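
The "weighting" and "shares" systems above are relative: under contention a group gets its weight divided by the sum of the competing weights. A small sketch of that arithmetic (the function name and group names are made up; 1024 is the usual default for `cpu.shares`).

```python
def share_fractions(shares):
    """Fraction of a contended resource each group gets, given its weight."""
    total = sum(shares.values())
    return {group: weight / total for group, weight in shares.items()}


if __name__ == "__main__":
    # Hypothetical groups: "batch" has twice the default weight.
    print(share_fractions({"web": 1024, "db": 1024, "batch": 2048}))
```

The fractions only matter when the groups actually compete; an idle system lets a single group use everything.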

### 4. tools

1. docker

2. nspawn

3. nsenter

4. /sys/fs/cgroup

5. systemd units

References.

Modern Linux Servers with cgroups - Brandon Philips, CoreOS, YouTube

https://wiki.archlinux.org/index.php/Cgroups

http://man7.org/linux/man-pages/man7/namespaces.7.html

https://access.redhat.com/documentation/ko-KR/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html

--------------------------------------------------------------------------------------------------------

#####

The following post was the starting point.

http://idchowto.com/?p=19919

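### Summary of the post above

To improve security and performance, Linux set out to limit resource usage and isolate it per user.

Cgroups strengthened security and made resource quotas adjustable, but they were not effective for virtualization or user isolation.

So work began on strengthening namespaces (kernel 2.6.x through 3.8, a long-running effort).

From the networking point of view in particular,

the more traffic is spread across network namespaces, the smaller each firewall ruleset becomes, and latency drops as a result.

In addition, the conntrack entries belonging to a network namespace are managed through the slab allocator,

and since the number of conntrack entries shrinks as well, this is very effective in terms of resource efficiency.

(Because it bounds the range of conntrack entries that must be searched to find a single ruleset.)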
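
A toy model of the ruleset argument, not real kernel behavior: if rules are matched by a first-match linear scan, a packet in a per-namespace ruleset is compared against far fewer rules than in one shared ruleset. The tenants, ports and function name are all made up.

```python
def comparisons_until_match(rules, packet_port):
    """Count rules inspected by a first-match linear scan."""
    for inspected, rule_port in enumerate(rules, start=1):
        if rule_port == packet_port:
            return inspected
    return len(rules)  # no match: every rule was inspected


# Hypothetical setup: 4 tenants with 100 port rules each.
tenants = {t: list(range(t * 1000, t * 1000 + 100)) for t in range(4)}
global_rules = [port for rules in tenants.values() for port in rules]

packet = 3099  # matches the last rule of tenant 3
print(comparisons_until_match(global_rules, packet))  # 400, one shared ruleset
print(comparisons_until_match(tenants[3], packet))    # 100, per-namespace ruleset
```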

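Docker was also specialized to provide lxc + aufs, and as a result the docker image repository took off.

Storing and distributing user images became very simple, which enables fast deployment.

This is why docker is more popular than other existing virtualization approaches.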

###

# slab allocator

http://jiming.tistory.com/131

# conntrack

http://manpages.ubuntu.com/manpages/trusty/man8/conntrack.8.html

# VFS and UnionFS

Linux supports a variety of filesystems through the VFS.

* VFS : http://jiming.tistory.com/127

- UnionFS has several branches and supports multiple filesystem types.

- Each branch within UnionFS is assigned a precedence, and a higher-precedence branch overrides a lower-precedence branch.

- Branches operate on directories.

: If a directory exists in two branches,

the contents and attributes of that (high-level) UnionFS directory are the combination of the two (low-level) directories.

- UnionFS automatically handles duplicate directory entries, transparently to the user.

- If a file exists in two branches,

the contents and attributes of the UnionFS file are those of the file in the higher-priority branch, and the file in the lower-priority branch is ignored.

(http://www.slideshare.net/endhrk/introduction-to-docker-36472476)
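
The precedence rules above can be mimicked with plain dictionaries. This is only a sketch of the lookup semantics, not of how UnionFS is implemented; the branch contents and the function name are made up.

```python
def union_listing(branches):
    """Merge branches (highest precedence first) into one directory view."""
    merged = {}
    # Apply the lowest precedence first so higher branches overwrite duplicates.
    for branch in reversed(branches):
        merged.update(branch)
    return merged


upper = {"etc/hosts": "upper hosts", "app/run.sh": "v2"}
lower = {"etc/hosts": "lower hosts", "bin/sh": "shell"}
view = union_listing([upper, lower])

print(sorted(view))        # entries from both branches, duplicates merged
print(view["etc/hosts"])   # content comes from the higher-precedence branch
```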

# AUFS

: short for Advanced multi layered Unification File System

- Developed by Junjiro Okajima in 2006

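- A complete rewrite of the earlier UnionFS

- Introduced new concepts such as writable branch balancing, with the aim of improving reliability and performance

- Implements union mounts for Linux filesystems

- AUFS originally stood for "Another UnionFS", but from version 2 it became "Advanced multi layered Unification File System"

- aufs provides writable branch balancing

Several writable branches can be set up across multiple partitions

=> several different filesystems can be combined and used as one

New or modified files are distributed across the writable branches based on criteria such as free space, existence of the parent directory, or a random choice

=> Changes to an image can be kept in memory, and when a newly written or modified file is read, the saved copy is loaded from main memory

This data can also be stored on a USB memory stick, so it can be put to a variety of uses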

* Why use AuFS instead of UnionFS?

: http://www.unionfs.org/

(TODO: look further into the differences between UnionFS and AUFS)

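# Docker and AuFS

: Docker started out in order to provide LXC and AuFS; that is, it uses AuFS as its filesystem

- Provides copy-on-write

=> When Docker creates an image, it first links to the base image by pointer and then creates the image, so no disk write occurs; content is written to disk only when the filesystem diverges from the base image

- Stacking: committing a modified image is implemented as a new filesystem layer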

(http://www.slideshare.net/colinsurprenant/docker-introduction-dev-ops-mtl)
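
Copy-on-write and stacking can be pictured with Python's `collections.ChainMap`, where reads fall through the layers and writes only touch the top one. This is an analogy rather than how aufs or Docker work internally, and the file names are made up.

```python
from collections import ChainMap

base_image = {"/bin/sh": "shell", "/etc/motd": "hello"}

# A container starts as an empty writable layer stacked on the base image.
container = ChainMap({}, base_image)
container["/etc/motd"] = "changed"      # the write lands in the top layer only
container["/app/run.sh"] = "new file"

# "Commit": the top layer is kept as part of the image, and new_child()
# stacks a fresh empty writable layer above it.
next_container = container.new_child()

print(base_image["/etc/motd"])       # the base image is untouched
print(next_container["/etc/motd"])   # reads fall through to the committed layer
```

Creating the container copies nothing from the base image; only the files that differ end up in the new layer.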

Lightweight images gave rise to Docker Hub, a GitHub-like repository dedicated to docker,

and it has become a major pillar of the Docker ecosystem.

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
