[Capnp] What is Cap'n Proto

Computer Programs 2022. 11. 7. 11:31

What is Cap'n Proto

Cap'n Proto is a format for interchanging structured data. Its approach and schema are similar to Google ProtoBuf, but in terms of encoding/decoding speed it claims to be "∞% faster".

Why is it fast? → There is no encoding/decoding step.

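The encoding is defined at the byte level. Data is laid out with fixed widths at fixed, aligned offsets, so the bytes on the wire can be used in memory as they are.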
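To make the "no decoding step" idea concrete, here is a minimal sketch that reads two fields straight out of a raw buffer. The layout (a UInt32 at byte offset 0 and a UInt16 at byte offset 4) and the names `kIdOffset`, `kAgeOffset`, `loadLE`, `readId`, and `readAge` are assumptions made up for illustration; they are not part of the Cap'n Proto API or of any generated code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout, for illustration only: the data section holds
// a UInt32 at byte offset 0 and a UInt16 at byte offset 4.
constexpr std::size_t kIdOffset = 0;
constexpr std::size_t kAgeOffset = 4;

// Assemble a little-endian unsigned integer of `width` bytes starting at `p`.
inline std::uint64_t loadLE(const unsigned char* p, std::size_t width) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i) {
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Reading a field is just a load at a known offset; nothing is parsed first.
inline std::uint32_t readId(const unsigned char* data) {
  return static_cast<std::uint32_t>(loadLE(data + kIdOffset, 4));
}

inline std::uint16_t readAge(const unsigned char* data) {
  return static_cast<std::uint16_t>(loadLE(data + kAgeOffset, 2));
}
```

Because every field has a fixed width and offset, the cost of accessing one field does not depend on how many other fields the message contains.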

Schema Language

As with Google Protobuf, you first define the message format in a schema file, → then use the capnp compiler to generate code for the language you want.

# myproto.capnp
@0xdbb9ad1f14bf0b36;  # unique file ID, generated by `capnp id`

struct Person {
  id @0 :UInt32;
  name @1 :Text;
  email @2 :Text;
  phones @3 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }

  employment :union {
    unemployed @4 :Void;
    employer @5 :Text;
    school @6 :Text;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}

struct AddressBook {
  people @0 :List(Person);
}

sample.cpp

// sample.cpp
#include "myproto.capnp.h"  // header generated from myproto.capnp
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>

void writeAddressBook(int fd) {
  ::capnp::MallocMessageBuilder message;

  AddressBook::Builder addressBook = message.initRoot<AddressBook>();
  ::capnp::List<Person>::Builder people = addressBook.initPeople(2);

  Person::Builder alice = people[0];
  alice.setId(123);
  alice.setName("Alice");
  alice.setEmail("alice@example.com");
  // Type shown for explanation purposes; normally you'd use auto.
  ::capnp::List<Person::PhoneNumber>::Builder alicePhones =
      alice.initPhones(1);
  alicePhones[0].setNumber("555-1212");
  alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE);
  alice.getEmployment().setSchool("MIT");

  Person::Builder bob = people[1];
  bob.setId(456);
  bob.setName("Bob");
  bob.setEmail("bob@example.com");
  auto bobPhones = bob.initPhones(2);
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.getEmployment().setUnemployed();

  writePackedMessageToFd(fd, message);
}

void printAddressBook(int fd) {
  ::capnp::PackedFdMessageReader message(fd);

  AddressBook::Reader addressBook = message.getRoot<AddressBook>();

  for (Person::Reader person : addressBook.getPeople()) {
    std::cout << person.getName().cStr() << ": "
              << person.getEmail().cStr() << std::endl;
    for (Person::PhoneNumber::Reader phone: person.getPhones()) {
      const char* typeName = "UNKNOWN";
      switch (phone.getType()) {
        case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break;
        case Person::PhoneNumber::Type::HOME: typeName = "home"; break;
        case Person::PhoneNumber::Type::WORK: typeName = "work"; break;
      }
      std::cout << "  " << typeName << " phone: "
                << phone.getNumber().cStr() << std::endl;
    }
    Person::Employment::Reader employment = person.getEmployment();
    switch (employment.which()) {
      case Person::Employment::UNEMPLOYED:
        std::cout << "  unemployed" << std::endl;
        break;
      case Person::Employment::EMPLOYER:
        std::cout << "  employer: "
                  << employment.getEmployer().cStr() << std::endl;
        break;
      case Person::Employment::SCHOOL:
        std::cout << "  student at: "
                  << employment.getSchool().cStr() << std::endl;
        break;
      case Person::Employment::SELF_EMPLOYED:
        std::cout << "  self-employed" << std::endl;
        break;
    }
  }
}

For comparison, see ProtoBuf's schema and encoding method.

capnp compile -oc++ myproto.capnp

Compiling the schema generates myproto.capnp.h and myproto.capnp.c++.

Include the generated myproto.capnp.h header in your own code and program against the generated classes.

Encoding

A word is defined as 8 bytes, i.e. 64 bits. Because data alignment matters, objects are aligned to word boundaries and sizes are expressed in words.

The unit of communication in capnp is the Message. A Message may be split into several Segments, each of which is a flat blob of bytes.

Each Segment of a Message contains objects. An object is any value that can have a pointer pointing to it. ( 😒 )

Value Encoding

Primitive Values

Void : not encoded ❌, conveys no information ❌
Bool : 1 bit (1: true, 0: false)
Integers : little-endian format, two’s complement.
Float : little-endian, IEEE-754 format [sign|exponent|fraction]
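The primitive rules above can be sketched as three small readers. This is an illustrative sketch, not Cap'n Proto library code: the function names are hypothetical, and the bit numbering in `readBool` (bit i lives in byte i/8, counted from the least-significant bit) is an assumption about how packed bits are addressed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load 4 bytes as a little-endian unsigned 32-bit value.
inline std::uint32_t loadU32LE(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Bool: a single bit, 1 = true, 0 = false.
// Assumption: bits are numbered from the least-significant bit of each byte.
inline bool readBool(const unsigned char* data, std::size_t bitIndex) {
  return ((data[bitIndex / 8] >> (bitIndex % 8)) & 1u) != 0;
}

// Integer: little-endian two's complement. Copying the bit pattern into a
// signed type reinterprets it on a two's-complement host.
inline std::int32_t readInt32(const unsigned char* p) {
  const std::uint32_t bits = loadU32LE(p);
  std::int32_t v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// Float: IEEE-754 bits stored little-endian.
inline float readFloat32(const unsigned char* p) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  const std::uint32_t bits = loadU32LE(p);
  float v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}
```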

Object Encoding

Blobs

Data : encoded as a pointer, exactly like List(UInt8).
Text : like Data, but the content must be UTF-8 and the last byte must be 0.

Structs

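A struct is encoded as a pointer to its content. The content is split into two sections, data and pointers. (Fields default to 0; an all-zero pointer is a null pointer, not a pointer to a zero-sized struct.)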

lsb                      struct pointer                       msb
+-+-----------------------------+---------------+---------------+
|A|             B               |       C       |       D       |
+-+-----------------------------+---------------+---------------+

A (2 bits) = 0, to indicate that this is a struct pointer.
B (30 bits) = Offset, in words, from the end of the pointer to the start of the struct's data section.  Signed.
C (16 bits) = Size of the struct's data section, in words.
D (16 bits) = Size of the struct's pointer section, in words.
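The field layout above can be unpacked with a few shifts and masks. The sketch below assumes the pointer has already been loaded as a little-endian 64-bit value; `StructPointer` and `decodeStructPointer` are hypothetical names for illustration, not part of the Cap'n Proto library.

```cpp
#include <cstdint>

// Decoded fields of a struct pointer (names are illustrative).
struct StructPointer {
  std::int32_t offsetWords;    // B: signed offset from the end of the pointer
  std::uint16_t dataWords;     // C: size of the data section, in words
  std::uint16_t pointerWords;  // D: size of the pointer section, in words
};

// Returns false if the low two bits (A) do not mark a struct pointer.
inline bool decodeStructPointer(std::uint64_t word, StructPointer& out) {
  if ((word & 3u) != 0) return false;  // A must be 0

  const std::uint32_t lower = static_cast<std::uint32_t>(word);
  // B occupies bits 2..31; sign-extend the 30-bit value by hand.
  std::int32_t offset = static_cast<std::int32_t>(lower >> 2);
  if (offset & (1 << 29)) offset -= (1 << 30);

  out.offsetWords = offset;
  out.dataWords = static_cast<std::uint16_t>((word >> 32) & 0xffffu);
  out.pointerWords = static_cast<std::uint16_t>((word >> 48) & 0xffffu);
  return true;
}
```

For instance, the value 0x0001000200000004 decodes to an offset of 1 word, a 2-word data section, and a 1-word pointer section.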

Lists

A list is encoded as a pointer to an array of values.

lsb                       list pointer                        msb
+-+-----------------------------+--+----------------------------+
|A|             B               |C |             D              |
+-+-----------------------------+--+----------------------------+

A (2 bits) = 1, to indicate that this is a list pointer.
B (30 bits) = Offset, in words, from the end of the pointer to the start of the first element of the list.  Signed.
C (3 bits) = Size of each element:
    0 = 0 (e.g. List(Void))
    1 = 1 bit
    2 = 1 byte
    3 = 2 bytes
    4 = 4 bytes
    5 = 8 bytes (non-pointer)
    6 = 8 bytes (pointer)
    7 = composite (see below)
D (29 bits) = Size of the list:
    when C <> 7: Number of elements in the list.
    when C = 7: Number of words in the list, not counting the tag word
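A list pointer can be decoded the same way. As before, this is a sketch under the assumption that the pointer is available as a little-endian 64-bit value; `ListPointer`, `decodeListPointer`, and `bitsPerElement` are hypothetical helper names.

```cpp
#include <cstdint>

// Decoded fields of a list pointer (names are illustrative).
struct ListPointer {
  std::int32_t offsetWords;   // B: signed offset to the first element
  std::uint8_t elementSize;   // C: element size code, 0..7
  std::uint32_t count;        // D: element count, or word count when C == 7
};

// Returns false if the low two bits (A) do not mark a list pointer.
inline bool decodeListPointer(std::uint64_t word, ListPointer& out) {
  if ((word & 3u) != 1) return false;  // A must be 1

  const std::uint32_t lower = static_cast<std::uint32_t>(word);
  const std::uint32_t upper = static_cast<std::uint32_t>(word >> 32);

  // Sign-extend the 30-bit offset by hand.
  std::int32_t offset = static_cast<std::int32_t>(lower >> 2);
  if (offset & (1 << 29)) offset -= (1 << 30);

  out.offsetWords = offset;
  out.elementSize = static_cast<std::uint8_t>(upper & 7u);
  out.count = upper >> 3;
  return true;
}

// Bits per element for size codes 0..6; returns -1 for 7 (composite),
// where the element size comes from the tag word instead.
inline int bitsPerElement(std::uint8_t code) {
  static const int kBits[7] = {0, 1, 8, 16, 32, 64, 64};
  return code < 7 ? kBits[code] : -1;
}
```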

Inter-Segment Pointers

When a pointer needs to point into a different segment, a far pointer is used.

lsb                        far pointer                        msb
+-+-+---------------------------+-------------------------------+
|A|B|            C              |               D               |
+-+-+---------------------------+-------------------------------+

A (2 bits) = 2, to indicate that this is a far pointer.
B (1 bit) = 0 if the landing pad is one word, 1 if it is two words.
C (29 bits) = Offset, in words, from the start of the target segment to the location of the far-pointer landing-pad within that segment.  Unsigned.
D (32 bits) = ID of the target segment.  (Segments are numbered sequentially starting from zero.)
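Decoding a far pointer follows the same pattern, except that the offset is unsigned and the upper half holds a segment ID. The names `FarPointer` and `decodeFarPointer` are hypothetical, and the input is assumed to be the pointer loaded as a little-endian 64-bit value.

```cpp
#include <cstdint>

// Decoded fields of a far pointer (names are illustrative).
struct FarPointer {
  bool doubleLandingPad;      // B: false = one-word pad, true = two-word pad
  std::uint32_t offsetWords;  // C: unsigned offset from the segment start
  std::uint32_t segmentId;    // D: ID of the target segment
};

// Returns false if the low two bits (A) do not mark a far pointer.
inline bool decodeFarPointer(std::uint64_t word, FarPointer& out) {
  if ((word & 3u) != 2) return false;  // A must be 2

  const std::uint32_t lower = static_cast<std::uint32_t>(word);
  out.doubleLandingPad = ((lower >> 2) & 1u) != 0;
  out.offsetWords = lower >> 3;  // 29 bits, unsigned
  out.segmentId = static_cast<std::uint32_t>(word >> 32);
  return true;
}
```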

Serialization Over a Stream

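When a Message is sent, its segments must be framed before the actual data is transmitted. → The framed message is then sent over a byte stream. All integers in the frame are unsigned and little-endian.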

4 bytes : number of segments minus 1
N * 4 bytes : size of each segment, in words
0 or 4 bytes : padding up to the next word boundary
The content of each segment, in order
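The framing header described above can be built in a few lines. This sketch only produces the segment table (the part before the segment contents); `appendU32LE` and `buildSegmentTable` are hypothetical names, and returning an empty table for an empty input is a choice made here, since a real message always has at least one segment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian byte order.
inline void appendU32LE(std::vector<unsigned char>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xffu));
  }
}

// Build the header that precedes the segment contents on the stream.
// `segmentSizesInWords` holds the size of each segment, in 8-byte words.
inline std::vector<unsigned char> buildSegmentTable(
    const std::vector<std::uint32_t>& segmentSizesInWords) {
  std::vector<unsigned char> out;
  if (segmentSizesInWords.empty()) return out;  // not a valid message

  // 4 bytes: number of segments minus 1.
  appendU32LE(out, static_cast<std::uint32_t>(segmentSizesInWords.size() - 1));
  // N * 4 bytes: size of each segment, in words.
  for (std::uint32_t size : segmentSizesInWords) appendU32LE(out, size);
  // 0 or 4 bytes: pad so the segment data starts on a word boundary.
  if (out.size() % 8 != 0) appendU32LE(out, 0);
  return out;
}
```

With one segment the header is exactly 8 bytes and needs no padding; with two segments it is 12 bytes and gets 4 bytes of padding.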

Packing

A simple compression scheme. It removes the zero bytes that come from padding and small values.

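When packed, each word of the message is reduced to a tag byte followed by 0 to 8 content bytes. Each bit of the tag byte corresponds to one byte of the unpacked word, with the least-significant bit corresponding to the first byte. A 0 bit means that the corresponding byte is zero.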

The nonzero bytes are packed right after the tag.

unpacked (hex):  08 00 00 00 03 00 02 00   19 00 00 00 aa 01 00 00
packed (hex):  51 08 03 02   31 19 aa 01

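Special cases → the tag bytes 0x00 and 0xff. A 0x00 tag is followed by one byte giving the number of additional all-zero words in the run. A 0xff tag is followed by the eight bytes of the word and then one byte giving the number of following words that are copied as-is, unpacked.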
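The basic tag-byte rule can be sketched as follows. This is a simplified illustration, not the library's packer: `packWord` and `packWords` are hypothetical names, and the sketch deliberately leaves out the extra count byte that the real format requires after the special tags 0x00 and 0xff, so it is only valid for words that contain both zero and nonzero bytes.

```cpp
#include <cstddef>
#include <vector>

// Pack one 8-byte word: emit a tag byte, then the word's nonzero bytes.
// Bit i of the tag is 1 when byte i of the word is nonzero.
inline void packWord(const unsigned char* word, std::vector<unsigned char>& out) {
  const std::size_t tagPos = out.size();
  out.push_back(0);  // placeholder for the tag
  unsigned char tag = 0;
  for (int i = 0; i < 8; ++i) {
    if (word[i] != 0) {
      tag = static_cast<unsigned char>(tag | (1u << i));
      out.push_back(word[i]);
    }
  }
  out[tagPos] = tag;
}

// Pack a buffer whose size is a multiple of 8 bytes.
// Simplification: no run-length handling for tags 0x00 and 0xff.
inline std::vector<unsigned char> packWords(const std::vector<unsigned char>& unpacked) {
  std::vector<unsigned char> out;
  for (std::size_t i = 0; i + 8 <= unpacked.size(); i += 8) {
    packWord(unpacked.data() + i, out);
  }
  return out;
}
```

Feeding the 16 unpacked bytes from the example above into `packWords` yields the 8 packed bytes shown there: the first word has nonzero bytes at positions 0, 4, and 6, giving the tag 0x51, and the second at positions 0, 4, and 5, giving 0x31.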
