Exploring GDELT
A large part of the EDA journey is obtaining the data and documenting its sources, and GDELT content is no exception. After researching the GKG datasets, we found that it was challenging just to document which data sources we should be using. In the following sections, we provide a comprehensive listing of the resources we located, which are needed to run the examples.
Note
A cautionary note on download times: over a typical 5 Mbps home broadband connection, 2,000 GKG files take approximately 3.5 hours to download. Given that the GKG English-language feed alone comprises over 40,000 files, a full download could take around 70 hours at that rate.
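To put the figures in the note into perspective, the following is a rough back-of-the-envelope sketch in Python. It assumes that download time scales linearly with the number of files, which ignores variation in file size and connection speed. The function name `estimate_download_hours` and its parameters are hypothetical and used only for illustration.

```python
def estimate_download_hours(total_files, sample_files=2000, sample_hours=3.5):
    """Linearly extrapolate download time from a measured sample.

    Assumption: every file takes about as long as the average file in
    the sample (2,000 files in roughly 3.5 hours, as quoted in the note).
    """
    hours_per_file = sample_hours / sample_files
    return total_files * hours_per_file


# Roughly 40,000 English-language GKG files, as quoted in the note.
full_feed_hours = estimate_download_hours(40_000)
print(f"{full_feed_hours:.1f} hours (~{full_feed_hours / 24:.1f} days)")
# prints: 70.0 hours (~2.9 days)
```

Under these assumptions, the full English-language feed would need close to three days of continuous downloading, so it is worth starting with a small subset of files.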
GDELT GKG datasets
We use the latest GDELT data feed, which is version 2.1 as of December 2016. The main documentation for this feed is here:
http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf
In the following section, we have included the data and secondary references to lookup tables and further documentation...