Training with fastText

tags: Natural language processing pytorch Deep learning

Train a classification model with fastText

import fasttext
import pandas as pd
import jieba

# Training data path
train_data_path = "./name_question.train"
# Validation data path
valid_data_path = "./name_question.valid"

# Paths for the word-segmented copies of the data
word_train_data_path = "./name_question.train.2"
word_valid_data_path = "./name_question.valid.2"


# Read the training and validation data; columns are separated by a space
df1 = pd.read_csv(train_data_path, header=None, sep=" ")
df2 = pd.read_csv(valid_data_path, header=None, sep=" ")

# Segment the text column into space-separated words with jieba
df1[1] = df1[1].apply(lambda x: " ".join(jieba.cut(x)))
df2[1] = df2[1].apply(lambda x: " ".join(jieba.cut(x)))


# Write the segmented training data
df1.to_csv(word_train_data_path, header=False, index=False, sep="\t")
# Write the segmented validation data
df2.to_csv(word_valid_data_path, header=False, index=False, sep="\t")
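
For reference, fastText's supervised mode reads one example per line: a label carrying the `__label__` prefix, followed by whitespace-separated tokens. The sketch below builds such a line by hand. `to_fasttext_line` and the sample label are hypothetical names used only for illustration, and the helper assumes the label may or may not already carry the prefix.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label, tokens):
    """Hypothetical helper: format one example as '<prefix><label> tok1 tok2 ...'."""
    if not label.startswith(LABEL_PREFIX):
        label = LABEL_PREFIX + label
    return label + " " + " ".join(tokens)


# Illustrative label and tokens, not taken from the dataset above.
print(to_fasttext_line("name", ["what", "is", "your", "surname"]))
```

Inspecting the first few lines of the generated `.2` files is a quick way to confirm they follow this layout before training.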

# Train the model; here the n-gram length (wordNgrams) is set to 2.
# Other parameters keep their defaults, e.g. the embedding size (dim) is 100 and the number of training epochs (epoch) is 5.
# model = fasttext.train_supervised(input=word_train_data_path, wordNgrams=2)

# Tune the model with fastText's automatic hyperparameter search.
# autotuneValidationFile must point to the validation dataset;
# fastText runs a random search and scores each candidate on that validation set.
# autotuneDuration controls how long the search runs, in seconds (default 300);
# lengthen or shorten it as needed.
model = fasttext.train_supervised(
    input=word_train_data_path,
    autotuneValidationFile=word_valid_data_path,
    autotuneDuration=600,
    wordNgrams=2,
)

# Evaluate the tuned model on the validation set
valid_result = model.test(word_valid_data_path)
print(valid_result)
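
`model.test` returns a tuple holding the number of examples, the precision at k, and the recall at k (k defaults to 1). A small sketch for turning that tuple into an F1 score follows. `f1_from_test_result` is a hypothetical helper, and the numbers in the example are made up for illustration, not results from this model.

```python
def f1_from_test_result(test_result):
    """Hypothetical helper: compute F1 from an (n_examples, precision, recall) tuple."""
    _, precision, recall = test_result
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative tuple only; in the script above this would be valid_result.
example = (1000, 0.9, 0.9)
print(f1_from_test_result(example))
```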

# Save the model
import time
time_ = int(time.time())
model_save_path = "./name_question_{}.bin".format(time_)
model.save_model(model_save_path)

# Load the saved model
model = fasttext.load_model(model_save_path)
# Predict on sample inputs (each input is split into space-separated characters)
result = model.predict(" ".join(list("Is it still classmate?")))
print(result)
result = model.predict(" ".join(list("surname?")))
print(result)
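
For a single input string, `model.predict` returns a pair: a tuple of labels and an array of matching probabilities. The sketch below pulls out the top label without its `__label__` prefix. `top_prediction` is a hypothetical helper, and the example tuple is hand-written to mimic that shape rather than real model output.

```python
def top_prediction(prediction, prefix="__label__"):
    """Hypothetical helper: return (label without prefix, probability) for the top result."""
    labels, probs = prediction
    label = labels[0]
    if label.startswith(prefix):
        label = label[len(prefix):]
    return label, float(probs[0])


# Hand-written stand-in for the value returned by model.predict(...).
example = (("__label__name",), [0.98])
print(top_prediction(example))
```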
