Training with fastText

tags: Natural language processing  pytorch  Deep learning

Train a classification model with fastText

import fasttext
import pandas as pd
import jieba

# Training data path
train_data_path = "./name_question.train"
# Validation data path
valid_data_path = "./name_question.valid"

# Paths for the word-segmented training and validation data
word_train_data_path = "./name_question.train.2"
word_valid_data_path = "./name_question.valid.2"


# Read the training and validation data; the separator is a space
df1 = pd.read_csv(train_data_path, header=None, sep=" ")
df2 = pd.read_csv(valid_data_path, header=None, sep=" ")

# Segment the text column into space-separated words with jieba
df1[1] = df1[1].map(lambda x: " ".join(jieba.cut(x)))
df2[1] = df2[1].map(lambda x: " ".join(jieba.cut(x)))


# Generate the segmented training data
df1.to_csv(word_train_data_path, header=False, index=False, sep="\t")
# Generate the segmented validation data
df2.to_csv(word_valid_data_path, header=False, index=False, sep="\t")
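
The files written above are what `train_supervised` reads: one example per line, with the label first and the segmented text after it. The sketch below shows what such a line might look like, assuming the label column uses fastText's default `__label__` prefix (the source does not show the file contents). The helper `to_fasttext_line` and the labels `name` and `question` are hypothetical and only for illustration.

```python
def to_fasttext_line(label: str, text: str, prefix: str = "__label__") -> str:
    """Format one example as '<prefix><label> <space-separated tokens>'."""
    # Assumes the text is already segmented (for example by jieba);
    # split() + join() only normalizes repeated whitespace.
    tokens = text.split()
    return f"{prefix}{label} {' '.join(tokens)}"


# Hypothetical examples; real rows would come from the segmented DataFrame.
examples = [
    ("name", "what is your surname"),
    ("question", "are you  still a classmate"),
]
lines = [to_fasttext_line(label, text) for label, text in examples]
print("\n".join(lines))
```

If the label column already carries the prefix, as the script above appears to assume, no extra formatting step is needed.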

# Train the model; here we set the word n-gram length (wordNgrams) to 2.
# Other parameters keep their defaults, e.g. the embedding size (dim) is 100 and the number of training epochs (epoch) is 5.
# model = fasttext.train_supervised(input=word_train_data_path, wordNgrams=2)

# Tune the model with automatic hyperparameter optimization.
# The autotuneValidationFile parameter must point to the validation dataset;
# fastText then uses a random search to find hyperparameters that perform well on that set.
# The autotuneDuration parameter controls the search time in seconds; the default is 300 s,
# and it can be lengthened or shortened as needed.
model = fasttext.train_supervised(
    input=word_train_data_path,
    autotuneValidationFile=word_valid_data_path,
    autotuneDuration=600,
    wordNgrams=2,
)

# Evaluate the trained model on the validation set
valid_result = model.test(word_valid_data_path)
print(valid_result)
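
The value printed above is a tuple of the form `(number of samples, precision, recall)`. A minimal sketch of turning it into a more readable summary with an F1 score follows; the helper `summarize_test_result` is hypothetical, and the numbers passed to it are made up rather than results from this model.

```python
def summarize_test_result(result):
    """Unpack a (n_samples, precision, recall) tuple and add an F1 score."""
    n_samples, precision, recall = result
    if precision + recall == 0:
        f1 = 0.0  # avoid division by zero when both scores are 0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "samples": n_samples,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative values only; in the script you would pass valid_result.
summary = summarize_test_result((1000, 0.9, 0.9))
print(summary)
```
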
# Save the model
import time
time_ = int(time.time())
model_save_path = "./name_question_{}.bin".format(time_)
model.save_model(model_save_path)
# Load the saved model
model = fasttext.load_model(model_save_path)
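# Predict on sample inputs, split into space-separated characters.
# Note: the training data was segmented with jieba, so " ".join(jieba.cut(text))
# would match the training tokenization more closely.
result = model.predict(" ".join(list("Is it still classmate?")))
print(result)
result = model.predict(" ".join(list("surname?")))
print(result)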
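
For a single input string, `model.predict` returns a pair: a tuple of labels and a matching sequence of probabilities. The sketch below shows one way to turn that pair into `(label, probability)` tuples with the default `__label__` prefix removed. The helper `parse_prediction` is hypothetical, and the sample prediction is invented for illustration, not output from this model.

```python
def parse_prediction(prediction, prefix="__label__"):
    """Convert a (labels, probabilities) pair into [(label, prob), ...]."""
    labels, probs = prediction
    parsed = []
    for label, prob in zip(labels, probs):
        # Strip the label prefix if present; leave other labels untouched.
        if label.startswith(prefix):
            label = label[len(prefix):]
        parsed.append((label, float(prob)))
    return parsed


# Made-up prediction in the same shape as the result of model.predict.
fake_result = (("__label__name",), [0.97])
parsed = parse_prediction(fake_result)
print(parsed)
```

In the script above, the same helper could be called as `parse_prediction(result)`.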
