Pretrained Embeddings


We provide pretrained embeddings for 12 languages in binary and text formats. The binary files can be loaded using the Wikipedia2Vec.load() method (see API Usage). The text files follow the Word2vec text format, so they can also be loaded with other libraries such as Gensim's load_word2vec_format(). In the text files, every entity carries the prefix ENTITY/ to distinguish it from words. Note that the files must be decompressed before use.
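
The text files can also be read without any extra library. The following is a minimal sketch in Python, assuming the standard Word2vec text layout (a header line giving the number of rows and the vector dimension, then one row per token with its values separated by single spaces) and a file that has already been decompressed. `load_word2vec_text` is a hypothetical helper written for illustration, not part of Wikipedia2Vec or Gensim; it splits the rows into words and entities using the ENTITY/ prefix.

```python
import io
from typing import Dict, List, TextIO, Tuple

# Prefix that marks entity rows in the text files.
ENTITY_PREFIX = "ENTITY/"

Vectors = Dict[str, List[float]]


def load_word2vec_text(f: TextIO) -> Tuple[Vectors, Vectors]:
    """Hypothetical helper: read a decompressed Word2vec-style text file and
    split it into word vectors and entity vectors by the ENTITY/ prefix."""
    # Header line of the Word2vec text format: "<number of rows> <dimension>".
    header = f.readline().split()
    num_rows, dim = int(header[0]), int(header[1])

    words: Vectors = {}
    entities: Vectors = {}
    rows_read = 0
    for line_no, line in enumerate(f, start=2):
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            raise ValueError(f"line {line_no}: expected a token and {dim} values")
        # The last `dim` fields are the vector; everything before them is the token.
        token = " ".join(parts[:-dim])
        vector = [float(x) for x in parts[-dim:]]
        if token.startswith(ENTITY_PREFIX):
            entities[token[len(ENTITY_PREFIX):]] = vector
        else:
            words[token] = vector
        rows_read += 1

    if rows_read != num_rows:
        raise ValueError(f"header declares {num_rows} rows, found {rows_read}")
    return words, entities


# Usage on a tiny in-memory sample (made-up values, not real embeddings).
sample = io.StringIO(
    "3 2\n"
    "the 0.1 0.2\n"
    "ENTITY/Tokyo 0.5 -0.3\n"
    "tokyo 0.4 -0.1\n"
)
words, entities = load_word2vec_text(sample)
print(sorted(words))       # ['the', 'tokyo']
print(entities["Tokyo"])   # [0.5, -0.3]
```

In practice, Gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` reads the same files; there the entity rows simply keep their ENTITY/ prefix as part of the key, so words and entities share one lookup table.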

English

Arabic

Chinese

Dutch

French

German

Italian

Japanese

Polish

Portuguese

Russian

Spanish