word2vec for trading with SEC filings
In this section, we will learn word and phrase vectors from annual SEC filings using Gensim to illustrate the potential value of word embeddings for algorithmic trading. In the following sections, we will combine these vectors as features with price returns to train neural networks that predict equity prices from the content of securities filings.
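Learning phrase vectors first requires detecting multi-word expressions such as "net income" and merging them into single tokens before word2vec training. The following is a minimal pure-Python sketch of that step, assuming the bigram scoring rule from Mikolov et al. (2013), score(a, b) = (count(a b) - delta) / (count(a) * count(b)). Gensim's `Phrases` class uses a variant of this score (scaled by vocabulary size) by default. The helpers `score_bigrams` and `merge_phrases`, the toy corpus, and the threshold value are hypothetical and for illustration only; they are not Gensim's API.

```python
from collections import Counter

# Hypothetical helpers (not Gensim's API): a pure-Python sketch of
# bigram phrase scoring in the style of Mikolov et al. (2013).


def score_bigrams(sentences, delta=1):
    """Score each adjacent word pair (a, b) as
    (count(a b) - delta) / (count(a) * count(b)).
    The discount delta keeps rare pairs from scoring highly."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {
        (a, b): (n_ab - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n_ab in bigrams.items()
    }


def merge_phrases(tokens, scores, threshold):
    """Join adjacent pairs scoring above threshold into 'a_b' tokens, left to right."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, 0.0) > threshold:
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy "filing" sentences, already tokenized and lowercased (illustrative only).
corpus = [
    ["net", "income", "rose"],
    ["net", "income", "fell"],
    ["revenue", "rose"],
    ["revenue", "fell"],
    ["net", "income", "was", "flat"],
]

scores = score_bigrams(corpus, delta=1)
phrased = [merge_phrases(s, scores, threshold=0.1) for s in corpus]
print(phrased[0])  # ['net_income', 'rose']
```

A second pass over the merged output can join a phrase with a neighboring word into longer expressions. The phrased sentences would then serve as training input for Gensim's `Word2Vec` model.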
In particular, we will use a dataset of over 22,000 10-K annual reports from the period 2013-2016, filed by more than 6,500 listed companies, that contain both financial information and management commentary (see Chapter 2, Market and Fundamental Data – Sources and Techniques).
For about 3,000 companies, corresponding to 11,000 filings, we have stock prices to label the data for predictive modeling. (See the sec_preprocessing notebook in the sec-filings folder for data source details, download instructions, and preprocessing code samples.)
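Labeling pairs each filing with a price-based target. The target is not defined at this point, so the following sketch assumes a simple forward return over a fixed number of trading days, starting from the first close on or after the filing date. The function `forward_return`, its `horizon` parameter, and the toy prices are hypothetical and for illustration only. A filing whose price history cannot cover the window gets `None` and would be dropped from the labeled set, much as filings without price data are.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical labeling rule (an assumption, not the book's definition):
# the simple return over `horizon` trading days, starting from the first
# close on or after the filing date.


def forward_return(prices, filing_date, horizon):
    """prices: list of (date, close) tuples sorted by date.
    Returns None when the price history cannot cover the window,
    i.e., the filing cannot be labeled."""
    dates = [d for d, _ in prices]
    start = bisect_left(dates, filing_date)  # first trading day >= filing_date
    end = start + horizon
    if end >= len(prices):
        return None
    return prices[end][1] / prices[start][1] - 1.0


# Toy price history (illustrative values only).
prices = [
    (date(2014, 3, 3), 100.0),
    (date(2014, 3, 4), 102.0),
    (date(2014, 3, 5), 101.0),
    (date(2014, 3, 6), 105.0),
]

print(forward_return(prices, date(2014, 3, 1), horizon=1))  # filed on a Saturday
print(forward_return(prices, date(2014, 3, 6), horizon=1))  # None: window too short
```

Using `bisect_left` on the sorted dates maps a filing made on a weekend or holiday to the next available trading day, so the label never uses prices from before the filing became public.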