Okay, my last post wasn’t really rocket science but did you ever heard about the James Martin Center for Nonproliferation Studies Database?

This is the “first database to record flight tests of all missiles launched by North Korea capable of delivering a payload of at least 500 kilograms (1102.31 pounds) a distance of at least 300 kilometers (186.4 miles).”

And as usual, plotting things really helps to make one's mind up.

Unfortunately, it has become quite clear that the North Korea’s missile program has accelerated and made substantial progress over the past couple of years. And as usual, plotting things really helps to make one’s mind up. By documenting all such tests since the first one occurred in April 1984, this kind of dataset helps us to take the measure of North Korea’s missile program recent acceleration.

One last detail, as North Korea never stops to experiment, the database is being routinely updated.

This post was first published on my previous website under the excerpt North Korea is well know for its hairstyle fashion and rocket science and has been slightly revamped for this new Hugo blog.

Table of Contents

How to recover the most current data

Many people still save their data into Microsoft Excel files. This is an unhappy choice for many reasons but a lot were already written about this topic. Unfortunately, this is what we have to deal with here. So choices to read the file are somehow limited. I personally like to use the xlsx package.

I told you the database is being regularly updated. If we want to get the latest version of the file we need to use the curl package to download it directly from the website.

# Some settings
temp <- tempfile()
url <- "https://www.nti.org/documents/2137/north_korea_missile_test_database.xlsx"

# Then download it
curl_download(url, temp)

# And read it
nkMissile <- read.xlsx(temp, sheetIndex = 1, startRow = 2)

We get the following information about the tests.

##  [1] "F1"                      "Date"                   
##  [3] "Date.Entered.Updated"    "Missile.Name"           
##  [5] "Missile.Type"            "Launch.Agency.Authority"
##  [7] "Facility.Name"           "Facility.Location"      
##  [9] "Other.Name"              "Facility.Latitude"      
## [11] "Facility.Longitude"      "Landing.Location"       
## [13] "Apogee"                  "Distance.Travelled"     
## [15] "Confirmation.Status"     "Success"                
## [17] "Additional.Information"  "Source.s."

The F1 is the test identification number, then comes the test date, the missile type, the facility where it was fired from, flight information like apogee and distance, and some additional information which consists of a short written summary of the test.

Is the frequency of tests increasing over time?

Quite a hot topic, right? Let’s add a Year tag.

nkMissile$Year <- year(nkMissile$Date)

Also interesting would be to know who was in charge at the time of the test. It requires us to add a Leader tag and for that we’ll need to create intervals with lubridate. Date where found on wikipedia, as usual.

KimIlsung <- ymd("1972-12-28") %--% ymd("1994-07-08")
KimJongil <- ymd("1994-07-08") %--% ymd("2011-12-17")
KimJongun <- ymd("2011-12-17") %--% today()

Let’s match it to the test date.

nkMissile$Leader <- ifelse(nkMissile$Date %within% KimIlsung == TRUE, "Kim Il-sung",
                           ifelse(nkMissile$Date %within% KimJongil == TRUE, "Kim Jong-il",
                           ifelse(nkMissile$Date %within% KimJongun == TRUE, "Kim Jong-un", NA)))

And here we go.

nkMissile %>%
  ddply(.(Leader, Year), summarise,
        total = length(F1)) %>%
  ggplot(aes(x = Year, y = total, fill = Leader)) +
  geom_bar(stat = "identity", alpha = .7, position = position_dodge()) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  labs(x = NULL, y = "Number of missiles tested",
       title = "North Korea's missile program over time",
       subtitle = "A family story") +
  theme(legend.title = element_blank())


We can easily get the total number of tests per leader if we reorganize.

nkMissile %>%
  ddply(.(Leader), summarise,
        Missile = length(F1))
##        Leader Missile
## 1 Kim Il-sung      15
## 2 Kim Jong-il      16
## 3 Kim Jong-un      78

I think we have a clear winner! Kim Jong-un scores a truly impressive 78 missiles tested. And this truly rocket lover made it in half the time his predecessors reigned. That is what I’d call program acceleration. But what about potential progress over the past couple of years?

Did you say rocket science?

Let’s now have a look at the missiles themselves. How many different models do they have. Easy one.

##  [1] Scud-B      Unknown     Nodong      Scud-C      Taepodong-1 Unha       
##  [7] Unha-3      KN-02       Polaris-1   Musudan     ER Scud     Polaris-2  
## [13] KN-17       Hwasong-12
## 14 Levels: ER Scud Hwasong-12 KN-02 KN-17 Musudan Nodong ... Unknown

Mmm, 14 unique names. This is something none of the brewer palette chart can handle. We need to create some more.

# Define the length
colourCount <- length(unique(nkMissile$Missile.Name))

# And adapt the 'Set3' palette
getPalette <- colorRampPalette(brewer.pal(9, "Set3"))

This now allows us to visualize all the 13 missiles and the unknown type. So we can visualize which missile was tested by all three leaders.

nkMissile %>%
  ddply(.(Leader, Missile.Type, Missile.Name), summarise,
        total = length(F1)) %>%
  ggplot(aes(x = Leader, y = total, fill = Missile.Name)) +
  geom_bar(stat = "identity", alpha = .7, position = position_dodge()) +
  theme_minimal() +
  scale_fill_manual(values = getPalette(colourCount)) +
  labs(x = NULL, y = NULL,
       title = "Missiles tested by North Korean leaders",
       subtitle = "(One is more creative!)") +
  theme(legend.title = element_blank())


I must confess, probably like most of you, I never built a rocket myself and I do not really know a lot about missile science. I think we could try to organize them into types to get the big picture easily.

nkMissile %>%
  ddply(.(Leader, Missile.Type, Missile.Name), summarise,
        total = length(F1)) %>%
  ggplot(aes(x = Missile.Type, y = total, fill = Missile.Name)) +
  geom_bar(stat = "identity", alpha = .7, position = position_dodge()) +
  theme_minimal() +
  scale_fill_manual(values = getPalette(colourCount)) +
  labs(x = NULL, y = NULL,
       title = "Tested types of missile",
       subtitle = "And respective names") +
  theme(legend.title = element_blank(), axis.text.x = element_text(angle = 45)) +
  facet_wrap(~Leader, ncol = 3) +
  theme(strip.text.x = element_text(face="bold"),
        strip.background = element_blank())


Well okay, one can reasonably assume that the unknown name would most certainly belong to one of those five types. It clearly appears that Kim Jong-un is far more creative and explores many more types of missiles than its predecessors. Whereas, Kim Il-sung and Kim Jong-il restricted themselves to ~2-3 types only, he is the only one testing all of them. He also seems to have a clear preference for the SRBM type. Is there maybe a reason for that?

How far can they travel?

I’m afraid I have some bad news for you now. Distance aren’t available for all of them. So we can only work on a subset here. In addition to that, we have to do some cleaning as they all contain ‘km’, and therefore must have been handled as factors. Lastly, we have to convert to numeric. As I told you the .xlsx file is an unhappy choice. So it looks weird, I know.

# Select missiles for which we have travel distance only
nkDistance <- subset(nkMissile, grepl("km", Distance.Travelled))

# Get rid of the ' km' and ','
nkDistance$Distance.Travelled <- gsub(" km", "", nkDistance$Distance.Travelled)
nkDistance$Distance.Travelled <- gsub(",", "", nkDistance$Distance.Travelled)

# Convert to numeric
nkDistance$Distance.Travelled <- as.numeric(as.character(nkDistance$Distance.Travelled))

After this preliminary work we can use the distance and compare missile types.

nkDistance %>%
  ddply(.(Missile.Type, Missile.Name), summarise,
        dist = mean(Distance.Travelled)) %>%
  ggplot(aes(x = Missile.Type, y = dist, fill = Missile.Name)) +
  geom_bar(stat = "identity", alpha = .7, position = position_dodge()) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  labs(x = NULL, y = NULL,
       title = "What is the distance they can travel",
       subtitle = "Based on the North Korean tests") +
  theme(legend.title = element_blank())


No surprise here, they all seem to travel short, intermediate, and medium distance as one could guess as per their name. Of course, if less tested, we rely on less data to estimate the mean, which therefore could not reflect the reality at all. Kim Jong-un favorite flavor of missile, the SRBM type, isn’t traveling long distance. Could this possibly be due the the numerous failures?

ddply(nkDistance, .(Missile.Type, Missile.Name), summarise,
      Max.Distance = max(Distance.Travelled))
##    Missile.Type Missile.Name Max.Distance
## 1          IRBM   Hwasong-12          787
## 2          IRBM      Musudan          400
## 3          MRBM      ER Scud         1000
## 4          MRBM       Nodong         1000
## 5          MRBM    Polaris-2          500
## 6          SLBM    Polaris-1          500
## 7           SLV  Taepodong-1         1380
## 8          SRBM        KN-17          400
## 9          SRBM       Scud-B          200
## 10         SRBM       Scud-C          500

No! On the contrary they all seem to work pretty well.

Okay, planning to visit the country anyway?

Not a good idea, I warn you! If you really have to, what are the places to avoid the most. All of them, yes I know… But if you really, really have to, or want to and even if I doubt someone wants to do tourism there.

If we want to find the places the missile tests are conducted we have to reorganize per Facility (and we will keep the Leader tag for the facet).

geoMissile <- ddply(nkMissile, .(Leader, Facility.Name, Facility.Longitude, Facility.Latitude), summarise,
                    total = length(Facility.Location))

We get a list of all the different sites used by the three leaders and their location.

geoMissile[, 2:4]
##                         Facility.Name Facility.Longitude Facility.Latitude
## 1                Chihari Missile Base           126.6847          38.62333
## 2  Tonghae Satellite Launching Ground         129.666664        40.8499966
## 3            Kittaeryong Missile Base           127.6236          38.99083
## 4  Tonghae Satellite Launching Ground         129.666664        40.8499966
## 5                             Hwangju         125.702005         38.686834
## 6                             Kaesong           126.5878           37.9382
## 7            Kittaeryong Missile Base           127.6236          38.99083
## 8               Kusong Testing Ground          125.22302          40.01325
## 9                        Lake Yonpung        125.8035851         39.618283
## 10                              Nampo           125.3247           38.7523
## 11        North Kusong Testing Ground           125.2099           40.0659
## 12                       North Wonsan           127.5369          39.40167
## 13                   Panghyon Airbase         125.207889         39.927472
## 14                  Pukchang Airfield         125.964333         39.504417
## 15                     Sinpo Shipyard           128.1839           40.0368
## 16  Sohae Satellite Launching Station           124.7057           39.6596
## 17                    Sunchon Airbase          125.89031         39.412594
## 18                            Unknown            Unknown           Unknown
## 19 Wonsan Kalma International Airport           127.4817           39.1677

One Unknown though. See the issue coming? Rather delete it right away.

geoMissile <- geoMissile[-which(geoMissile$Facility.Name == "Unknown"),]

But that’s not enough! See the other issue coming?

## [1] "factor"

Yes, you know the trick. Better do something right now.

geoMissile$Facility.Longitude <- as.numeric(as.character(geoMissile$Facility.Longitude))
geoMissile$Facility.Latitude <- as.numeric(as.character(geoMissile$Facility.Latitude))

Okay, at least we can plot now. Let’s do it per Leader to really judge the difference.

qmplot(data = geoMissile,
       x = Facility.Longitude, y = Facility.Latitude,
       size = total, maptype = "terrain", col = Leader, alpha = .25) +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set1") +
  facet_wrap(~Leader, ncol = 1) +
  theme(strip.text.x = element_text(face="bold"),
        strip.background = element_blank())


This time again, and even if Kim Jong-un does not seem to beat Kim Il-sung and Kim Jong-il in all categories, he is astonishingly perseverant. He is the only one testing all over the country!


Alright, at a first glance it looks pretty clear. Since Kim Jong-un took over the leadership, missile tests are booming all over the country. He was able to test almost three times more missiles than both previous leaders together. That’s clearly an acceleration.

Of course, I’ll have to check wikipedia now as something is bothering me. What is the difference among all missiles? MRBM and SRBM were used by the first leaders and are still the most tested by Kim Jong-un. Successfully fired some can already reach Japan.

Among Kim Jong-un new toys are some IRBM, SLBM, and SLV. Not really good news for the rest of us as these clearly could lead to greater range and speed missiles than the North Korean currently possess, like ICBM.