Earlier, we mentioned that the time intervals need to be equally sized. Before performing any time series analysis, we also need to check the number of non-missing time intervals. So, let's count the enrollment years for each category.
Using the dplyr package, we can use summarise(n = n()) to count the number of entries for each category:
# -- summarize and sort by the number of years
yr.count <- x2 %>%
  group_by(cat) %>%
  summarise(n = n()) %>%
  arrange(n)

# - we can see that there are 14 years for all of the groups. That is good!
print(yr.count, 10)

> Source: local data frame [24 x 2]
>
>                  cat     n
>               (fctr) (int)
> 1     18 to 24 YEARS    14
> 2     25 to 34 YEARS    14
> 3     35 to 44 YEARS    14
> 4     45 to 54 YEARS    14
> 5     55 to 64 YEARS    14
> 6  65 YEARS AND OVER    14
> 7           ALL AGES    14
> 8 ...
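An equal row count per category does not by itself show that the years are consecutive. A category could have the same number of rows and still contain a gap or a repeated year. The following is a minimal sketch of one way to check this, not part of the original analysis. It uses a small made-up data frame (`toy`) in place of x2, and it assumes the year is stored in a numeric column named `year`; both names are hypothetical, so adjust them to match your data.

```r
library(dplyr)

# Toy stand-in for x2 (made-up values). `year` is an assumed column name.
toy <- data.frame(
  cat  = rep(c("A", "B"), each = 4),
  year = c(2001, 2002, 2003, 2004,   # category A: no gaps
           2001, 2002, 2004, 2005),  # category B: 2003 is missing
  stringsAsFactors = FALSE
)

# For each category, compare the number of rows with the number of
# years spanned between the first and last observation.
gap.check <- toy %>%
  group_by(cat) %>%
  summarise(
    n.rows   = n(),                        # rows per category
    n.years  = n_distinct(year),           # distinct years (catches duplicates)
    expected = max(year) - min(year) + 1   # years spanned, first to last
  ) %>%
  mutate(complete = (n.rows == expected) & (n.years == n.rows))

print(gap.check)
```

A category is flagged as complete only when its row count, its count of distinct years, and the span from its first to its last year all agree. In the toy data, category B has four rows but spans five years, so it would be flagged as incomplete.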