tags: Natural Language Processing Deep Learning
The models for semantic similarity calculation mentioned above are basically double tower structures. The main advantage of the double tower structure is the fast similarity calculation. The fast here refers not to the inference speed of the model's single data, but to the calculation in a large number of question scenarios, such as the recall scenarios of similar questions. Because the double tower model actually produces a single question expression, the similarity calculation is just a simple calculation at the end, and the most time-consuming question expression operation can be completed offline. Cross-encoder performs splicing input when inputting the model, so that the two questions interact deeper, and similar sentence tasks are completed directly in the model, rather than just the semantic representation model of the questions. Therefore, the similar calculation effect of cross-encoder is also significantly better than that of bi-encoder with a double tower structure, but cross-encoder cannot get the vector representation of the question. Similar calculations between a large number of questions require real-time model inference calculations and consume more time. Cross-encoder is suitable for a small number of candidate question scenarios, so we can use it in the sorting stage of question questions to obtain better similar recognition effects.
The model structure is as follows. The bi-encoder with a double tower structure on the left is the cross-encoder structure on the right.

Source of the picture:21 classic deep learning inter-sentence relationship models|Code & Skills - Zhihu
| hfl/chinese-roberta-wwm-ext(5 epoch) | 96.25% | 82.03% | 87.46% |
| hfl/chinese-roberta-wwm-ext-large(5 epoch) | 96.78% | 81.98% | 87.71% |
| hfl/chinese-electra-180g-large-discriminator(5 epoch) | 97.11% | 81.56% | 87.63% |
| hfl/chinese-roberta-wwm-ext(q1,q2 exchange)(1 epoch) | 96.69% | 83.30% | 88.41% |
| hfl/chinese-roberta-wwm-ext(q1,q2 exchange)(5 epoch) | 97.11% | 81.60% | 87.66% |
Tests were carried out on different pre-trained models. As in the table, hfl/chinese-roberta-wwm-ext(390M), hfl/chinese-roberta-wwm-ext-large(1.2G), and hfl/chinese-electra-180g-base-discriminator(1.2G) were used respectively. Perhaps because the training set data volume is already relatively large (50w+), the benefits of the pre-trained model after it becomes larger are not particularly obvious. During training, it was found that the effect of cross-encoder will also improve after exchanging two questions and adding them to the training set again, because the difference in sentence position was eliminated. Another point is that cross-encoder does not require too much training time. It has achieved better results with 1 epoch, which is better than 5 epoch.
The answer is pre-stored, and one or one is given. head -20 vocab.txt The CLS itself is the output of the CLS corresponding to the CLS, which is the result of the interaction of the model for the rear...
1 Fine-tune with Pretrained Models 2 9.2. Fine tuning A common technique in migration learning: fine tuning. As shown in Figure 9.1, fine tuning consists of the follow...
Java's word meaning similarity calculation (semantic recognition, words emotional trend, similarity forest similarity, pinyin similarity, conceptual similarity, literal similarity) 1. Calculation of w...
tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection The article was published in ACL2020. Here is a brief record of the main content of this article. Model...
Original link: The goal of this repo is: to help to reproduce research papers results (transfer learning setups for instance), to access pretrained ConvNets with a unique interface/API inspired by tor...
Write the directory title here Pre-trained model classification system Typical Model Bert SpanBert StructBert XLNet T5 GPT-3 Extension of pre-trained models Knowledge-Enriched PTMs Multilingual and La...
github:https://github.com/makeplanetoheaven/NlpModel/tree/master/SimNet/TransformerDSSM background knowledge byKnowledge Based Questions and Answers (KBQA) - Semantic Dependency Analysis and Code Open...
WordNet Introduction WordNet is a cognitive linguistic English dictionary designed by PRINCETON psychologists, linguists and computer engineers. It is not light to arrange words in alphabetical order,...
Quote: Fan W, Shang J, Li F, Sun Y, Yuan S, Liu JX. IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method. BMC Bioinformatics. 2020 Jul 31;2...
Table of contents Introduce the actual combat code of Sentence_TRANSFORMERS: Semantic similarity calculation: Semantic search Sentence cluster, similar sentence cluster Picture content understanding: ...