tags: text match natural language processing deep learning machine learning
The paper "Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring" proposes a new transformer architecture, the Poly-encoder.
The Poly-encoder learns global rather than token-level self-attention features. It targets two problems at once: the low matching quality of the Bi-encoder and the slow matching speed of interaction-based cross-encoders such as ARC-II and BERT.

As shown in the figure, the Bi-encoder uses two identical networks (such as BERT) to obtain a feature vector for each of the two sentences, and then computes the similarity between those vectors.
Calculate the context vector: $y_{cxt} = red(T_1(CTXT))$
Calculate the candidate vector: $y_{cand} = red(T_1(CAND))$
red() can be computed in three ways:
Take the vector at the first position (CLS)
Average the vectors at all positions
Average the vectors at the first n positions
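The three choices for red() can be sketched in a few lines. This is only an illustration in plain Python: the function names and the toy `outputs` matrix (one row per token position) are assumptions made for the example, not code from the paper.

```python
from typing import List

Vector = List[float]


def red_first(outputs: List[Vector]) -> Vector:
    """Take the output vector at the first position (the CLS token)."""
    return list(outputs[0])


def red_mean(outputs: List[Vector]) -> Vector:
    """Average the output vectors over all positions."""
    count = len(outputs)
    # zip(*outputs) iterates over hidden dimensions instead of positions.
    return [sum(column) / count for column in zip(*outputs)]


def red_mean_first_n(outputs: List[Vector], n: int) -> Vector:
    """Average the output vectors over the first n positions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return red_mean(outputs[:n])


# Toy encoder output (hypothetical): 3 token positions, hidden size 2.
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(red_first(outputs))            # [1.0, 2.0]
print(red_mean(outputs))             # [3.0, 4.0]
print(red_mean_first_n(outputs, 2))  # [2.0, 3.0]
```

Whichever variant is used, the result is a single fixed-size vector per sentence, which is what makes the later dot-product comparison possible.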
Similarity score: dot product, i.e. $s(CTXT, CAND) = y_{cxt} \cdot y_{cand}$
Training: cross-entropy loss
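As a rough sketch of how the score and the loss fit together, the snippet below scores one context vector against several candidate vectors with the dot product and applies a softmax cross-entropy over those scores. The toy vectors, the helper names, and the choice to treat the other candidates as negatives are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity s(CTXT, CAND) between two encoded vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_entropy(scores: List[float], target: int) -> float:
    """Softmax cross-entropy: negative log-probability of the target candidate."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[target]


# Hypothetical encoded vectors; index 0 is assumed to be the correct candidate.
y_cxt = [1.0, 0.0]
y_cands = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

scores = [dot(y_cxt, y_cand) for y_cand in y_cands]
loss = cross_entropy(scores, target=0)
print(scores)  # [2.0, 0.0, -1.0]
print(loss)
```

The loss is small when the correct candidate already has the highest score and grows as a wrong candidate overtakes it.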
In general, two networks encode the context and the candidate into vectors, and a similarity function (such as cosine similarity or the dot product) then scores the pair of vectors.
The defining trait of the Bi-encoder is that the context and the candidate are encoded independently: the two encoding passes do not interact, and the only interaction happens in the final similarity step.
Because the encodings are independent, the vectors of all candidates can be computed offline. At serving time, only the query vector has to be computed before similarity matching. Matching is therefore fast, but the quality falls short of models that let the two inputs interact during encoding.
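The offline/online split can be sketched as follows. Here `toy_encode` is a hypothetical stand-in for red(T_1(·)) that simply counts letters, and `build_index` and `rank` are illustrative names; a real system would call the trained encoder and usually a vector-search library instead.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def toy_encode(text: str) -> Vector:
    """Hypothetical encoder: counts of the letters a, b, c in the text."""
    return [float(text.count(ch)) for ch in "abc"]


def build_index(candidates: List[str], encode: Encoder) -> Dict[str, Vector]:
    """Offline step: encode every candidate once and cache the vectors."""
    return {cand: encode(cand) for cand in candidates}


def rank(query: str, index: Dict[str, Vector], encode: Encoder,
         top_k: int) -> List[Tuple[str, float]]:
    """Online step: encode only the query, then score it against the cache."""
    y_query = encode(query)
    scored = [
        (cand, sum(q * c for q, c in zip(y_query, y_cand)))
        for cand, y_cand in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


index = build_index(["aab", "bcc", "ccc"], toy_encode)  # done ahead of time
print(rank("aa", index, toy_encode, top_k=1))           # [('aab', 4.0)]
```

Only one encoder pass runs per query, which is where the Bi-encoder's serving speed comes from; the cost is that the query never influences how a candidate is encoded.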
(Figure: Bi-encoder architecture.)

