tags: computer vision

main idea:
After patch embedding is performed on a C x H x W image, the spatial (global/local) information fusion part and the channel information fusion part are separated.
In MLP-Mixer:
Assume the image is finally divided into 7x7 = 49 patches. The patches do not overlap, so the split can be regarded as a convolution whose kernel size and stride both equal the patch size. The per-patch fully-connected layer then converts the data in each patch into channel information; assume the number of channels is 512. The MLP1 part is token mixing (cross-token mixing). I personally feel it plays the same role as a depth-wise conv (which can be seen as nn.Conv2d with groups = number of input feature map channels = number of output feature map channels): both fuse spatial information within each channel separately. If the preceding patch embedding is regarded as local information fusion, then passing the 7x7 tokens (feature map) of each channel through the MLP is the fusion of global information.
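The comparison above can be checked numerically. The following is a minimal PyTorch sketch, not the reference implementation. It assumes a 224x224 RGB input and a patch size of 32 (chosen only so that the grid is 7x7), and it uses a single linear layer for token mixing instead of the two-layer MLP with GELU used in Mixer. One difference worth noting: the token-mixing weights are shared by all channels, while an ordinary depth-wise conv learns a separate kernel per channel, so the depth-wise kernels below are simply copies of the same weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed sizes: 224x224 input, 32x32 patches -> 7x7 = 49 tokens, 512 channels.
patch, dim, grid = 32, 512, 7
x = torch.randn(2, 3, 224, 224)

with torch.no_grad():
    # Patch embedding: kernel size == stride == patch size, so patches do not overlap.
    patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
    feat = patch_embed(x)                          # (2, 512, 7, 7)

    # Token mixing: one linear layer along the 49-token axis, shared by every channel.
    tokens = feat.flatten(2)                       # (2, 512, 49)
    token_mix = nn.Linear(grid * grid, grid * grid, bias=False)
    out_mlp = token_mix(tokens)                    # (2, 512, 49)

    # Same computation as a depth-wise conv with a full 7x7 receptive field.
    # Each row of the linear weight becomes one 7x7 kernel; every channel (group)
    # gets the same 49 kernels, one per output token.
    kernels = token_mix.weight.view(grid * grid, 1, grid, grid).repeat(dim, 1, 1, 1)
    out_conv = F.conv2d(feat, kernels, groups=dim)  # (2, 512 * 49, 1, 1)
    out_conv = out_conv.view(2, dim, grid * grid)   # (2, 512, 49)

    same = torch.allclose(out_mlp, out_conv, atol=1e-4)
```

Here `same` compares the two outputs. The depth-wise form needs 49 output kernels per channel because each output token sees the whole 7x7 map, which is why the token-mixing step is described as global rather than local fusion.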
The channel mixing that follows can be regarded as a point-wise conv (1x1 convolution), which fuses channel information at each fixed position.
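The correspondence between channel mixing and a 1x1 convolution can be sketched the same way. This is an illustration under assumed sizes (512 channels, 7x7 tokens) with a single linear layer standing in for the two-layer channel-mixing MLP; the variable names are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes: 512 channels on a 7x7 token grid.
dim, grid = 512, 7
feat = torch.randn(2, dim, grid, grid)

with torch.no_grad():
    # Channel mixing: a linear layer over the channel axis, applied to each token.
    channel_mix = nn.Linear(dim, dim)
    tokens = feat.flatten(2).transpose(1, 2)       # (2, 49, 512)
    out_mlp = channel_mix(tokens)                  # (2, 49, 512)

    # The same weights used as a 1x1 (point-wise) convolution on the feature map.
    pointwise = nn.Conv2d(dim, dim, kernel_size=1)
    pointwise.weight.copy_(channel_mix.weight.view(dim, dim, 1, 1))
    pointwise.bias.copy_(channel_mix.bias)
    out_conv = pointwise(feat).flatten(2).transpose(1, 2)   # (2, 49, 512)

    same = torch.allclose(out_mlp, out_conv, atol=1e-4)
```

Each spatial position is processed independently with the same weights, so no information moves between tokens in this step; only channels are mixed.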