With the development of the Internet, new business application models have kept emerging, and demand for Internet data center construction has grown accordingly, driving data centers to evolve continuously in both architecture and technology. Search is a representative of these new models: it is a necessary functional component not only of large dedicated search sites but of any application that serves massive amounts of information, and it poses real challenges to the traditional data center network architecture.
Challenges the Search Traffic Model Poses to the Data Center Network
1. Search business structure
The search business is a high-performance computing application with a Browser/Server/Server structure. The following analysis takes the business model of a large search service provider as an example.
In a search service, the servers are generally divided into two tiers: UI (User Interface) servers and BS (Back Server) servers. The UI server maintains TCP long connections to all BS servers; after receiving a search request from a user, it unicasts the request to every BS server over these connections.
When a server in the BS cluster receives the request, it searches its local database and returns the result to the UI server, which processes the responses further and returns the final result to the user.
A layer of aggregation servers can also be deployed between the UI and BS servers to reduce the fan-in ratio. However, this adds response latency, so the aggregation layer is often omitted in actual deployments, and the ratio of UI servers to BS servers can be as high as 1:300.
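To make the structure concrete, the following is a minimal sketch of the UI server's scatter-gather logic in Python. The BS addresses, port, per-request connections, and read size are illustrative assumptions, not the provider's actual implementation; in practice the long connections would stay open across requests and responses would be length-framed.

```python
import asyncio

# Hypothetical BS server pool reflecting the ~1:300 fan-out mentioned
# above; the address plan is purely illustrative.
BS_SERVERS = [("10.0.%d.%d" % (1 + i // 250, 1 + i % 250), 9000)
              for i in range(300)]

async def query_bs(host: str, port: int, request: bytes) -> bytes:
    # A real UI server keeps one persistent TCP connection per BS
    # server; we connect per request here only to keep the sketch short.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(request)
    await writer.drain()
    response = await reader.read(64 * 1024)  # responses are ~30 KB
    writer.close()
    await writer.wait_closed()
    return response

async def handle_search(request: bytes) -> list:
    # Unicast the same request to every BS server, then gather all
    # answers; this simultaneous fan-out is what creates the burst
    # behavior analyzed in the next section.
    tasks = [query_bs(h, p, request) for h, p in BS_SERVERS]
    done = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in done if isinstance(r, bytes)]
```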
2. Search traffic characteristics
Packet loss in the search service is defined at the application layer. After the UI server sends a request, it starts a timer when the first response packet from the BS cluster arrives and waits 200 ms; if not all response packets arrive within that window, the request is counted as having suffered packet loss.
The 200 ms window effectively excludes the TCP retransmission mechanism, since a segment recovered by retransmission generally arrives outside it. This definition derives from the performance requirements of search engine applications, and for this reason search service providers generally develop monitoring programs to track application-layer packet loss in real time.
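Such a monitoring program might implement the loss rule roughly as sketched below. The response iterator and function names are hypothetical; only the timing logic, starting the clock at the first response, follows the 200 ms definition above.

```python
import time

LOSS_WINDOW = 0.200  # seconds, per the application-layer definition

def detect_loss(responses, expected: int) -> bool:
    """responses yields one item per BS answer as it arrives; returns
    True if the request counts as lossy under the 200 ms rule. A real
    monitor would also time out if the iterator stalls entirely."""
    received = 0
    deadline = None
    for _ in responses:                  # blocks until the next answer
        received += 1
        if deadline is None:             # first response: start timing
            deadline = time.monotonic() + LOSS_WINDOW
        elif time.monotonic() > deadline:
            return True                  # a late answer counts as loss
        if received == expected:
            return False                 # all answers in time: no loss
    return True                          # iterator ended short: loss
```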
Generally, search services have the following key traffic characteristics:
1) Concurrency
As described in the business process above, every BS server, after receiving a request from the UI server, sends its search results back to the UI server. When the BS servers are busy, their responses are spread out by small timing differences; when they are all idle, they return their results almost simultaneously. The more idle the BS cluster is, therefore, the more pronounced the concurrency.
2) Burstiness
Burstiness is likewise tied to how busy the BS servers are, and it usually accompanies concurrency. The servers all use GE uplinks, and each response carries an application-layer message of about 30 KB, roughly 20 frames of 1500 bytes, which a GE link can transmit in well under a millisecond. When a large number of BS servers respond to a UI query at once, the instantaneous traffic burst is extremely severe.
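A quick back-of-the-envelope check of these figures, using only the numbers stated in this article (GE line rate, ~30 KB responses, the 1:300 fan-out mentioned earlier):

```python
# Burst magnitude implied by the stated figures.
RESPONSE_BYTES = 30 * 1024          # ~30 KB application message
FRAME_PAYLOAD  = 1500               # bytes carried per Ethernet frame
GE_RATE        = 1_000_000_000      # 1 Gb/s uplink per server
BS_COUNT       = 300                # 1:300 UI-to-BS ratio

frames = -(-RESPONSE_BYTES // FRAME_PAYLOAD)      # ceil -> ~21 frames
tx_time_ms = RESPONSE_BYTES * 8 / GE_RATE * 1e3   # ~0.25 ms per response

# If all BS servers answer while idle, roughly 9 MB converges on the
# UI server's single GE link within about a quarter of a millisecond,
# and draining it at GE line rate takes ~74 ms:
burst_bytes = RESPONSE_BYTES * BS_COUNT
drain_ms = burst_bytes * 8 / GE_RATE * 1e3

print(frames, round(tx_time_ms, 3), burst_bytes, round(drain_ms, 1))
```

The mismatch between the sub-millisecond arrival and the tens of milliseconds needed to drain the burst is exactly what stresses switch buffers.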
3) Intolerance of packet loss
Depending on the popularity of the search term, the content returned by the BS servers varies slightly. Results for popular terms typically reach about 30 KB, so the response is segmented at the TCP layer. Under the service's packet-loss definition, if a network device drops even one segment of such a response, the query to that BS server is considered to have failed. When the BS servers are idle and the volume of concurrent data is very large, a network device whose performance cannot keep up with the concurrent forwarding load will drop many packets, and the user's browser may fail to open the search result page at all.
3. The challenge of bursty search traffic
Every UI request therefore generates a traffic spike. Some domestic search service providers also perform a secondary search, which doubles the traffic.
When traffic in the switching system suddenly congests and the system's buffering and scheduling capability is limited, concentrated service access inevitably causes packet loss under bursts. At the transport layer this triggers window shrinkage and retransmission, further degrading the traffic environment and reducing service responsiveness.
However, the traditional switching approach can distinguish and schedule at most 8 classes of traffic, so its service capability is limited, and its buffering cannot solve the high-throughput burst-access problem of key applications.
Data Center Network Solutions for the Search Traffic Model
To solve the key performance problems of this model, namely the scheduling of dense applications in the data center and the impact of sudden traffic bursts, technical innovation is needed in the basic network architecture design of the data center switching platform.
1. Hardware flow control and service scheduling
The first is to provide hardware-based traffic management on the switching platform. Today the industry mainly deploys forwarding chips that implement this function on the interface boards, precisely controlling and managing inbound and outbound traffic.
Large-capacity buffers and dense hardware scheduling expand the scheduling capability to tens of thousands of queues. Once upper-layer application flows are mapped into their own hardware queues, the platform can deliver data-center-grade service scheduling across a range far beyond the traditional 8 queues.
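The following sketch models the idea in software, assuming a hash-based classifier and a deficit-round-robin scheduler. Real platforms implement this in the traffic manager hardware, so the code illustrates only the queueing concept, not any vendor's API; the queue count and quantum are illustrative values.

```python
from collections import deque

NUM_QUEUES = 16384                  # "tens of thousands" of queues
QUANTUM    = 1500                   # bytes of credit per scheduling pass

queues  = [deque() for _ in range(NUM_QUEUES)]
deficit = [0] * NUM_QUEUES

def enqueue(packet: bytes, flow_id: int) -> None:
    # Per-flow queue selection; an 8-queue switch would collapse many
    # unrelated flows into the same class at this step.
    queues[hash(flow_id) % NUM_QUEUES].append(packet)

def schedule_round() -> list:
    """One deficit-round-robin pass: each backlogged queue may send
    packets up to its accumulated byte credit, so no single flow can
    starve the others."""
    sent = []
    for i, q in enumerate(queues):
        if not q:
            continue
        deficit[i] += QUANTUM
        while q and len(q[0]) <= deficit[i]:
            pkt = q.popleft()
            deficit[i] -= len(pkt)
            sent.append(pkt)
        if not q:
            deficit[i] = 0          # idle queues carry no credit over
    return sent
```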
Another technical change is to replace the traditional egress-buffered switching design with a distributed ingress-buffering architecture. With traditional egress buffering, the burst tolerance of the whole system is determined solely by the buffer that can be allocated at the egress port, which is a fixed capacity; once the instantaneous burst exceeds the egress buffer size, the system drops packets. This problem stems from the design itself and can only be solved completely by fundamentally changing the buffering scheme.
Distributed buffering technology differs from the traditional architecture. As shown in Figure 4, during normal forwarding the egress port sends at 10GE line rate. When a burst exceeds what the egress can drain, the packet descriptor count of the egress queue grows rapidly; once it crosses a threshold, end-to-end flow-control messages are sent to the ingress interface of each contributing flow.
Each ingress interface then buffers the burst traffic locally and stops or slows its transmission toward the egress, while the egress continues to send at 10GE line rate. When the egress port leaves the congested state, flow-control messages tell the ingress interfaces to release their buffers and resume normal forwarding of the buffered traffic.
The entire distributed buffering mechanism is coordinated by the traffic manager in hardware, with no software involvement, so it operates at system-clock timescales. Moreover, each ingress interface is sized to buffer 200 ms of burst traffic at full 10GE line rate. Under an instantaneous burst, the buffering available when N ports forward to one port is therefore N x 200 ms, a substantial improvement over the fixed egress buffer of the traditional design.
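The sizing arithmetic behind that claim can be checked directly from the stated figures; the port counts below are illustrative:

```python
# Buffer capacity implied by "200 ms at full 10GE line rate" per
# ingress port, and the aggregate when N ports converge on one egress.
LINE_RATE   = 10_000_000_000        # 10 Gb/s
BUFFER_TIME = 0.200                 # 200 ms of burst absorption

per_port_bytes = LINE_RATE * BUFFER_TIME / 8    # 250 MB per ingress
for n_ingress in (4, 16, 48):
    total = n_ingress * per_port_bytes
    print(f"{n_ingress} ingress ports -> {total / 2**20:.0f} MiB "
          f"aggregate, {n_ingress * 200} ms of 10GE burst absorbed")
```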
Concluding remarks
The search business is only one typical representative of the new business models. Other new applications, including P2P, video sharing, Web 2.0, e-books, and cloud computing, likewise require the data center network to provide solid support for the stable operation of their business models. From the perspective of network technology, the main directions for meeting these new business challenges will be: higher speed, more comprehensive service capability, faster response, and greater stability, reliability, and precision.