tags: Natural language processing, PyTorch, Deep learning
Training a classification model with fastText
import fasttext
import pandas as pd
import jieba
# Training data path
train_data_path = "./name_question.train"
# Validation data path
valid_data_path = "./name_question.valid"
# Output paths for the word-segmented training and validation data
word_train_data_path = "./name_question.train.2"
word_valid_data_path = "./name_question.valid.2"
# Read the training and validation data; columns are separated by a single space
df1 = pd.read_csv(train_data_path, header=None, sep=" ")
df2 = pd.read_csv(valid_data_path, header=None, sep=" ")
# Segment the Chinese sentences in column 1 into space-separated words with jieba
df1[1] = df1[1].map(lambda x: " ".join(jieba.cut(x)))
df2[1] = df2[1].map(lambda x: " ".join(jieba.cut(x)))
# Write the segmented training data (tab-separated)
df1.to_csv(word_train_data_path, header=False, index=False, sep="\t")
# Write the segmented validation data
df2.to_csv(word_valid_data_path, header=False, index=False, sep="\t")
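The same preprocessing can be sketched without pandas. This minimal sketch assumes each raw line holds a label and an unsegmented sentence separated by the first space (the `sep=" "` in `read_csv` implies this layout) and that labels already carry fastText's default `__label__` prefix. The `tokenize` argument is a stand-in for `jieba.cut`, and the helper names `segment_line` and `segment_file` are hypothetical.

```python
from typing import Callable, Iterable


def segment_line(line: str, tokenize: Callable[[str], Iterable[str]] = list) -> str:
    """Turn 'label sentence' into 'label<TAB>tok tok tok' for fastText.

    `tokenize` stands in for jieba.cut; the default (list) splits the
    sentence into single characters.
    """
    # Split only on the first space: the label comes first, the sentence follows.
    label, text = line.rstrip("\n").split(" ", 1)
    # Drop whitespace-only tokens so the output has single separators.
    tokens = [tok for tok in tokenize(text) if tok.strip()]
    # fastText treats any whitespace (space or tab) as a token separator.
    return label + "\t" + " ".join(tokens)


def segment_file(src_path: str, dst_path: str,
                 tokenize: Callable[[str], Iterable[str]] = list) -> None:
    """Apply segment_line to every non-empty line of src_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(segment_line(line, tokenize) + "\n")


print(segment_line("__label__1 你好吗"))  # prints "__label__1", a tab, then "你 好 吗"
```

With jieba installed, passing `tokenize=jieba.cut` would reproduce word-level segmentation instead of the character-level default.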
# Train the model with word n-gram features set to 2 (wordNgrams=2).
# All other parameters keep their defaults, e.g. embedding size dim=100 and epoch=5 training epochs.
# model = fasttext.train_supervised(input=word_train_data_path, wordNgrams=2)
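To illustrate what `wordNgrams=2` adds: besides each single word, every pair of adjacent words becomes an extra feature. fastText hashes these n-grams into a fixed number of buckets (its `bucket` parameter, 2,000,000 by default) instead of storing them in the vocabulary. The sketch below shows the idea only; `zlib.crc32` is a stand-in for fastText's own hash function, and the function names are hypothetical.

```python
import zlib
from typing import List


def word_ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """Return the unigrams followed by every contiguous n-gram of length 2..n."""
    features = list(tokens)
    for size in range(2, n + 1):
        for start in range(len(tokens) - size + 1):
            features.append(" ".join(tokens[start:start + size]))
    return features


def hash_ngram(ngram: str, buckets: int = 2_000_000) -> int:
    """Map an n-gram to a bucket index (crc32 is a stand-in, not fastText's hash)."""
    return zlib.crc32(ngram.encode("utf-8")) % buckets


tokens = ["how", "to", "say", "hello"]
feats = word_ngrams(tokens, n=2)
print(feats)  # ['how', 'to', 'say', 'hello', 'how to', 'to say', 'say hello']
# Bucket indices for the bigram features only.
print([hash_ngram(f) for f in feats[len(tokens):]])
```

Because bigrams capture local word order, they often help short-text classification at little extra cost; hashing keeps memory bounded at the price of occasional collisions.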
# Alternatively, tune the model with fastText's automatic hyperparameter optimization (autotune).
# autotuneValidationFile gives the path of the validation set;
# autotune runs a random search over hyperparameters, scoring each candidate on that validation set.
# autotuneDuration limits the search time in seconds (default 300);
# lengthen or shorten it as needed.
model = fasttext.train_supervised(
    input=word_train_data_path,
    autotuneValidationFile=word_valid_data_path,
    autotuneDuration=600,
    wordNgrams=2,
)
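The idea behind a time-bounded random search can be sketched as follows. This is a simplified illustration, not fastText's actual autotune strategy. The search space, the `evaluate` callback, and the toy scoring function are all hypothetical stand-ins for "train on the training file, then score on the validation file".

```python
import random
import time
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def random_search(
    space: Dict[str, List[float]],
    evaluate: Callable[[Params], float],
    duration_s: float,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample random configurations until the time budget runs out.

    Returns the best configuration and its score (higher is better).
    At least one configuration is always evaluated.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + duration_s
    best_params: Params = {}
    best_score = float("-inf")
    while True:
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
        if time.monotonic() >= deadline:
            break
    return best_params, best_score


# Hypothetical search space and toy scoring function.
space = {"lr": [0.05, 0.1, 0.5, 1.0], "dim": [50, 100, 200], "epoch": [5, 10, 25]}


def toy_score(p: Params) -> float:
    return -abs(p["lr"] - 0.5) - abs(p["epoch"] - 25) / 100


best, score = random_search(space, toy_score, duration_s=0.05)
print(best, score)
```

A longer budget (like `autotuneDuration=600` above) simply allows more candidates to be tried before the best one is kept.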
# Evaluate the model on the validation set
valid_result = model.test(word_valid_data_path)
print(valid_result)
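`model.test` returns a tuple of (number of samples, precision at 1, recall at 1). To make these numbers concrete, this sketch computes precision@k and recall@k as they are commonly defined: correct predicted labels divided by all predicted labels, and correct predicted labels divided by all true labels. The function and example data are hypothetical; for single-label data with k = 1, the two values coincide.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    predictions: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
    k: int = 1,
) -> Tuple[int, float, float]:
    """Return (n_samples, precision@k, recall@k).

    predictions[i] is a ranked list of predicted labels for sample i;
    gold[i] is the set of true labels for sample i.
    """
    n_correct = 0
    n_predicted = 0
    n_gold = 0
    for ranked, true_labels in zip(predictions, gold):
        top_k = list(ranked)[:k]
        n_predicted += len(top_k)
        n_gold += len(true_labels)
        n_correct += sum(1 for label in top_k if label in true_labels)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return len(predictions), precision, recall


preds = [["__label__1", "__label__0"], ["__label__0", "__label__1"], ["__label__1", "__label__0"]]
truth = [{"__label__1"}, {"__label__1"}, {"__label__1"}]
print(precision_recall_at_k(preds, truth, k=1))  # (3, 0.6666666666666666, 0.6666666666666666)
```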
# Save the model
import time
time_ = int(time.time())
model_save_path = "./name_question_{}.bin".format(time_)
model.save_model(model_save_path)
# Load the saved model
model = fasttext.load_model(model_save_path)
# Predict labels for sample inputs, segmenting them the same way as the training data
result = model.predict(" ".join(jieba.cut("Is it still classmate?")))
print(result)
result = model.predict(" ".join(jieba.cut("surname?")))
print(result)
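`model.predict` returns a pair: a tuple of labels (still carrying the `__label__` prefix) and an array of their probabilities; its `k` argument requests the top-k labels. A small helper like the hypothetical `format_prediction` below can turn that pair into readable (label, probability) tuples. The sample values are made up for illustration and are not output of the model above.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def format_prediction(
    labels: Sequence[str], probs: Sequence[float], ndigits: int = 4
) -> List[Tuple[str, float]]:
    """Strip the label prefix and pair each label with its rounded probability."""
    pairs = []
    for label, prob in zip(labels, probs):
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((name, round(float(prob), ndigits)))
    return pairs


# Made-up values shaped like the output of model.predict(text, k=2).
sample = (("__label__1", "__label__0"), [0.91234, 0.08766])
print(format_prediction(*sample))  # [('1', 0.9123), ('0', 0.0877)]
```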