tags: ceph
For AsyncMessenger, when a new connection request arrives, epoll reports the event and invokes listen_handler (C_processor_accept), the callback registered for listenfd. listen_handler picks the worker with the fewest references (for load balancing) and hands the connection request to that worker. The steps are:
(1) Call listen_socket.accept(&cli_socket, opts, &addr, w); to accept the connection request.
(2) Call msgr->add_accept(w, std::move(cli_socket), addr). add_accept is implemented as follows:
AsyncConnectionRef conn = new AsyncConnection(cct, this, &dispatch_queue, w);
conn->accept(std::move(cli_socket), addr);
accepting_conns.insert(conn);
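The load-balancing choice made in listen_handler can be sketched as below. This is only an illustration of "pick the worker with the fewest references"; WorkerSketch and pick_least_referenced are hypothetical names, not the Ceph types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a messenger worker; only the reference
// count matters for this sketch.
struct WorkerSketch {
  int id;
  unsigned references;  // number of connections currently bound to it
};

// Return the worker with the fewest references (the first one wins on
// ties) and bump its count, since the new connection will be bound to it.
inline WorkerSketch* pick_least_referenced(std::vector<WorkerSketch>& workers) {
  assert(!workers.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < workers.size(); ++i) {
    if (workers[i].references < workers[best].references) best = i;
  }
  ++workers[best].references;
  return &workers[best];
}
```

Because the count is incremented at selection time, a burst of incoming connections is spread across workers instead of all landing on the same one.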
The constructor of AsyncConnection is as follows:
AsyncConnection::AsyncConnection(CephContext *cct, AsyncMessenger *m, DispatchQueue *q,
Worker *w)
: Connection(cct, m), delay_state(NULL), async_msgr(m), conn_id(q->get_id()),
logger(w->get_perf_counter()), global_seq(0), connect_seq(0), peer_global_seq(0),
state(STATE_NONE), state_after_send(STATE_NONE), port(-1),
dispatch_queue(q), can_write(WriteStatus::NOWRITE),
keepalive(false), recv_buf(NULL),
recv_max_prefetch(std::max<int64_t>(msgr->cct->_conf->ms_tcp_prefetch_max_size, TCP_PREFETCH_MIN_SIZE)),
recv_start(0), recv_end(0),
last_active(ceph::coarse_mono_clock::now()),
inactive_timeout_us(cct->_conf->ms_tcp_read_timeout*1000*1000),
msg_left(0), cur_msg_size(0), got_bad_auth(false), authorizer(NULL), replacing(false),
is_reset_from_peer(false), once_ready(false), state_buffer(NULL), state_offset(0),
worker(w), center(&w->center)
{
read_handler = new C_handle_read(this);
write_handler = new C_handle_write(this);
wakeup_handler = new C_time_wakeup(this);
tick_handler = new C_tick_wakeup(this);
recv_buf = new char[2*recv_max_prefetch];
state_buffer = new char[4096];
}
As you can see, an AsyncConnection is bound to a worker and to that worker's center (center(&w->center)). The constructor also creates the callback handlers: read_handler, write_handler, wakeup_handler and tick_handler.
add_accept also calls AsyncConnection::accept, which notifies the worker's thread that the connection has been established. accept is implemented as follows:
cs = std::move(socket);
socket_addr = addr;
state = STATE_ACCEPTING;
center->dispatch_event_external(read_handler);
dispatch_event_external adds the read_handler callback to the center's external_events and then wakes up the worker thread so that it processes external_events. The read_handler callback eventually calls AsyncConnection::process, so the event is generated in one thread and handled in another: it is handed from the thread that accepted the connection request to the thread that sends and receives data.
AsyncConnection::process chooses what to do based on the state of the connection.
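The hand-off between threads can be pictured with a small queue protected by a mutex. This is a minimal sketch of the idea only: MiniCenter and process_external_events are hypothetical, and the wake-up mechanism of the real event loop is left out.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical miniature of an event center: any thread may queue a
// callback, and the owning worker thread drains and runs the queue.
class MiniCenter {
 public:
  // May be called from any thread, e.g. the one that accepted the socket.
  void dispatch_event_external(std::function<void()> cb) {
    assert(cb && "callback must be callable");
    std::lock_guard<std::mutex> lock(mutex_);
    external_events_.push_back(std::move(cb));
    // A real event loop would also wake the worker here, for example by
    // writing to a descriptor that its poller is watching.
  }

  // Called on the worker thread; returns how many callbacks were run.
  int process_external_events() {
    std::deque<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending.swap(external_events_);  // run callbacks outside the lock
    }
    for (auto& cb : pending) cb();
    return static_cast<int>(pending.size());
  }

 private:
  std::mutex mutex_;
  std::deque<std::function<void()>> external_events_;
};
```

Swapping the queue out under the lock keeps the critical section short, so a producer thread is never blocked while a callback runs.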
(1) The initial state is STATE_ACCEPTING. In this state, center->create_file_event(cs.fd(), EVENT_READABLE, read_handler) is called to add the connected descriptor to the set that epoll monitors. Then try_send(bl); is called to send the CEPH_BANNER string ("ceph v027") to the peer. Finally, the state is set to STATE_ACCEPTING_WAIT_BANNER_ADDR.
(2) In the STATE_ACCEPTING_WAIT_BANNER_ADDR state, the CEPH_BANNER and the address information (ceph_entity_addr) sent back by the peer are read as follows:
read_until(strlen(CEPH_BANNER) + sizeof(ceph_entity_addr), state_buffer);
Then the state is set to STATE_ACCEPTING_WAIT_CONNECT_MSG.
(3) In the STATE_ACCEPTING_WAIT_CONNECT_MSG state, connect_msg is read as follows:
read_until(sizeof(connect_msg), state_buffer)
Then the state is set to STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH.
(4) In the STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH state, the authentication information in the message is read first and placed in authorizer_bl, a bufferlist dedicated to the authorizer data. handle_connect_msg is then called with authorizer_bl and authorizer_reply to handle the connection.
AsyncConnection::handle_connect_msg() first checks, based on peer_addr, whether a connection to the peer already exists. If it does, the existing connection is processed accordingly. Then AsyncConnection::_reply_accept() is called to send the reply to the peer. The reply carries a tag: if the connection can be accepted, CEPH_MSGR_TAG_SEQ is sent as the tag and the state is set to STATE_ACCEPTING_WAIT_SEQ.
(5) In the STATE_ACCEPTING_WAIT_SEQ state, the acknowledgment information is read into state_buffer, and the messages are then ordered according to it so that high-priority messages are processed first. Finally, the state is set to STATE_ACCEPTING_READY, meaning the connection is ready to accept messages.
(6) In the STATE_ACCEPTING_READY state, the main work is to log that the accept has completed and to clear the connection's connect_msg structure. Finally, the state is set to STATE_OPEN.
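The accept phase described in steps (1) to (6) can be summarized as a small state machine. The sketch below models only the successful path: the state names come from the walkthrough, while AcceptState, next_accept_state and steps_until_open are hypothetical helpers that ignore I/O, partial reads and error handling.

```cpp
#include <cassert>

// State names follow the walkthrough; everything else is a simplified,
// hypothetical model of the successful accept path.
enum class AcceptState {
  STATE_ACCEPTING,
  STATE_ACCEPTING_WAIT_BANNER_ADDR,
  STATE_ACCEPTING_WAIT_CONNECT_MSG,
  STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH,
  STATE_ACCEPTING_WAIT_SEQ,
  STATE_ACCEPTING_READY,
  STATE_OPEN
};

// Next state once the work of the current state has finished successfully.
inline AcceptState next_accept_state(AcceptState s) {
  switch (s) {
    case AcceptState::STATE_ACCEPTING:
      return AcceptState::STATE_ACCEPTING_WAIT_BANNER_ADDR;  // banner sent
    case AcceptState::STATE_ACCEPTING_WAIT_BANNER_ADDR:
      return AcceptState::STATE_ACCEPTING_WAIT_CONNECT_MSG;  // banner + addr read
    case AcceptState::STATE_ACCEPTING_WAIT_CONNECT_MSG:
      return AcceptState::STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH;
    case AcceptState::STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH:
      return AcceptState::STATE_ACCEPTING_WAIT_SEQ;  // reply tagged CEPH_MSGR_TAG_SEQ
    case AcceptState::STATE_ACCEPTING_WAIT_SEQ:
      return AcceptState::STATE_ACCEPTING_READY;
    case AcceptState::STATE_ACCEPTING_READY:
      return AcceptState::STATE_OPEN;
    case AcceptState::STATE_OPEN:
      return AcceptState::STATE_OPEN;  // accept phase is over
  }
  assert(false && "unknown state");
  return s;
}

// Number of transitions needed to reach STATE_OPEN from state s.
inline int steps_until_open(AcceptState s) {
  int steps = 0;
  while (s != AcceptState::STATE_OPEN) {
    s = next_accept_state(s);
    ++steps;
  }
  return steps;
}
```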
(7) In the STATE_OPEN state, the tag is read first. If the tag is CEPH_MSGR_TAG_MSG, a message follows and the state is set to STATE_OPEN_MESSAGE_HEADER; other tags are handled separately.
(8) In the STATE_OPEN_MESSAGE_HEADER state, the message header is read and then verified (a CRC check). If a bad message is received, the current operation is aborted and an error is returned. If there is no problem, the state is set to STATE_OPEN_MESSAGE_THROTTLE_MESSAGE and processing moves on to the next stage of reading the message.
(9) In the STATE_OPEN_MESSAGE_THROTTLE_MESSAGE state, get_or_fail is called to check whether the policy's throttler_messages can admit one more message. If it cannot, the wakeup_handler callback is registered through create_time_event to fire 1000 microseconds (1 ms) later. wakeup_handler eventually calls AsyncConnection::process again, which re-enters this branch because the state has not changed.
If the check passes, the state is set to STATE_OPEN_MESSAGE_THROTTLE_BYTES.
(10) In the STATE_OPEN_MESSAGE_THROTTLE_BYTES state, the size of the current message (cur_msg_size) is computed from the received header and the timestamp is recorded. Then
policy.throttler_bytes->get_or_fail(cur_msg_size)
is called to check whether throttler_bytes can admit cur_msg_size bytes. Finally, the state is set to STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE.
(11) In the STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE state,
dispatch_queue->dispatch_throttler.get_or_fail(cur_msg_size)
is called to check whether the dispatch_throttler of dispatch_queue can admit cur_msg_size bytes. Then the state is set to STATE_OPEN_MESSAGE_READ_FRONT.
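The three throttle checks in steps (9) to (11) share one pattern: try to take budget without blocking, and retry later if it does not fit. The sketch below illustrates that pattern with a hypothetical, single-threaded ThrottleSketch class. It is not Ceph's throttle implementation, and the put method is an assumption added to show how budget would be returned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, single-threaded stand-in for a throttle: a fixed budget
// that callers take from without ever blocking.
class ThrottleSketch {
 public:
  explicit ThrottleSketch(std::uint64_t max) : max_(max), current_(0) {}

  // Take `count` units if they fit. Returns false when they do not, so
  // the caller can stay in its current state and retry after a timer.
  bool get_or_fail(std::uint64_t count = 1) {
    if (count > max_ - current_) return false;  // current_ <= max_ always holds
    current_ += count;
    return true;
  }

  // Return units once the message has been handled.
  void put(std::uint64_t count = 1) {
    assert(count <= current_);
    current_ -= count;
  }

  std::uint64_t current() const { return current_; }

 private:
  std::uint64_t max_;
  std::uint64_t current_;
};
```

With a budget of 10, for example, taking 6 succeeds, a further request for 5 fails and leaves the count at 6, and a request for 4 then succeeds. A failed call changes nothing, which is what makes "retry from the same state" safe.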
(12) In the STATE_OPEN_MESSAGE_READ_FRONT state, read_until() is called to read the front section of the message into front. This is not the header that was verified earlier but the front part of the message payload. front is a bufferlist defined in AsyncConnection specifically for storing the front section. When this is done, the state is set to STATE_OPEN_MESSAGE_READ_MIDDLE.
(13) In the STATE_OPEN_MESSAGE_READ_MIDDLE state, just as for the front section, read_until() is called to read the middle section of the message into middle. middle is also a bufferlist defined in AsyncConnection and stores the middle section of the message. When this is done, the state is set to STATE_OPEN_MESSAGE_READ_DATA_PREPARE.
(14) The STATE_OPEN_MESSAGE_READ_DATA_PREPARE state prepares for reading the data section of the message. For example, it checks whether the buffer allocated for the data section is large enough to hold the incoming data. If it is not, space of the required size is allocated; otherwise nothing is done. Finally, the state is set to STATE_OPEN_MESSAGE_READ_DATA, in which the actual data section of the message is read.
(15) In the STATE_OPEN_MESSAGE_READ_DATA state, a while loop reads the data carried by the message and exits once all of it has been read. Inside the loop the data is read into data, a bufferlist defined in AsyncConnection that is dedicated to the data section of the message. If the data has not all arrived yet, the current operation is interrupted and resumes when more data can be read. Finally, the state is set to STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH.
(16) The STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH state mainly reads the footer of the message, then processes the message that has been read and hands it to the registered Dispatcher. If the current AsyncMessenger instance can fast-dispatch the message, dispatch_queue->fast_dispatch(message) is called to deliver it immediately. Otherwise dispatch_queue->enqueue(message, message->get_priority(), conn_id) is called to put it on the queue.
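The choice between the two delivery paths in step (16) can be sketched as follows. This is an illustration under simplifying assumptions: MessageSketch, DispatchQueueSketch, dispatch_one and deliver are hypothetical, and the queue here is a plain priority map rather than Ceph's DispatchQueue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message: a payload plus a priority.
struct MessageSketch {
  std::string payload;
  int priority;
};

// Hypothetical dispatch queue with a fast path and a queued path.
class DispatchQueueSketch {
 public:
  // Fast path: deliver immediately on the calling thread.
  void fast_dispatch(const MessageSketch& m) { delivered_.push_back(m.payload); }

  // Slow path: park the message until a dispatch thread picks it up.
  void enqueue(MessageSketch m, int priority, std::uint64_t /*conn_id*/) {
    queued_[priority].push_back(std::move(m));
  }

  // Deliver one queued message, highest priority first and FIFO within a
  // priority. Returns false when nothing is queued.
  bool dispatch_one() {
    if (queued_.empty()) return false;
    auto it = queued_.begin();
    assert(!it->second.empty());
    delivered_.push_back(std::move(it->second.front().payload));
    it->second.pop_front();
    if (it->second.empty()) queued_.erase(it);
    return true;
  }

  const std::vector<std::string>& delivered() const { return delivered_; }

 private:
  std::map<int, std::deque<MessageSketch>, std::greater<int>> queued_;
  std::vector<std::string> delivered_;
};

// Mirror of the decision in step (16): fast dispatch if possible, else enqueue.
inline void deliver(DispatchQueueSketch& q, MessageSketch m,
                    bool can_fast_dispatch, std::uint64_t conn_id) {
  if (can_fast_dispatch) {
    q.fast_dispatch(m);
  } else {
    const int priority = m.priority;  // read before m is moved from
    q.enqueue(std::move(m), priority, conn_id);
  }
}
```

The fast path avoids a thread hand-off for messages that are cheap to handle, while the queued path lets higher-priority messages overtake lower-priority ones that arrived earlier.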