Having subtitle scripts for the TV shows you are watching is an excellent study aid, not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people watch Netflix more than traditional media. I've also seen numerous people say that the Netflix Original "Terrace House" is great for Japanese listening practice, because it is unscripted and captures natural dialogue.
When I originally wrote this post, I had discovered a way of ripping Japanese subtitles from Netflix that, to my knowledge, no one else had figured out at the time. My method was long and clunky, though. Eventually, a user named posted in the comments section with details of a new method that was far superior to the one I had come up with. Shortly after that, TITHEN-FIRION posted a tool he had created that largely automates the process. So now, ripping subtitles from Netflix is simple enough that just about anyone can do it.
Download Subtitles I’ve Already Ripped
All of the Japanese subtitles that I have ripped have been OCR'ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still contains a few mistakes here and there. Keep this in mind if you are using these subtitles to study: if something looks wrong, it probably is. Watch the scene on Netflix to see what the correct subtitle should be.
Download Netflix Subtitle Pack [updated 12/23/2017]. (left click, then click the download button in the top right)
This package contains subtitles for 26 different series and movies. Click the link to see which shows are included.
If Netflix has another show that you would like Japanese subtitles for, or if you want subtitles in another language, then you will have to rip it for yourself using the guide below.
How to rip Japanese Subtitles from Netflix
Getting the subtitles from Netflix is quite simple now, due to a tool that does all the hard work for us!
First, you will need to install a browser add-on that lets you run userscripts. One such add-on is Violentmonkey, which works with Firefox, Chrome, and some other browsers. Similar add-ons include Tampermonkey and Greasemonkey. They mostly do the same thing, so just pick one. A quick Google search for any of those names should lead you to a page where you can install it in your browser.
Next, install the Netflix Subtitle Downloader. After installing it, you will notice some new options in the subtitle selection menu on the Netflix website. Select the subtitle language that you want, then click one of the download buttons. It's that simple! The download may take a moment to start after you click the button.
Note: On my system, the subtitle downloader sometimes tries to download the subtitles for the previous video I was watching. If you run into this issue, refresh the page in your browser after loading the video.
For many languages, especially those with simple character sets like English and Spanish, the subtitles are downloaded as SRT files. However, for languages with more complex character sets like Japanese, Chinese, or Korean, the subtitles are stored as images. To convert these images into text, you need to perform OCR (optical character recognition).
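If you are unsure which kind of file you ended up with, a quick check is to look inside the download. The sketch below assumes that text subtitles arrive as a single `.srt` file and image subtitles arrive as a `.zip` of image files (PNG or JPEG); `needs_ocr` is a hypothetical helper, not part of the downloader.

```python
import zipfile
from pathlib import Path

# Assumed image formats inside an image-subtitle zip.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def needs_ocr(download: Path) -> bool:
    """Return True if a downloaded subtitle file holds images rather than text."""
    suffix = download.suffix.lower()
    if suffix == ".srt":
        return False  # already text, nothing to OCR
    if suffix == ".zip":
        # Look at the member names only; nothing is extracted here.
        with zipfile.ZipFile(download) as zf:
            return any(Path(name).suffix.lower() in IMAGE_EXTS for name in zf.namelist())
    raise ValueError(f"unexpected download type: {download.suffix}")
```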
Create an API Key for Google Cloud Vision API
There are several OCR tools out there that can handle Japanese text. Most of them are poor and produce a lot of errors. Google's OCR is by far the most accurate I have seen, and it works quite well. Unfortunately, it's only partly free. According to their pricing at the time of writing, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial (at least at the time I write this): you get $300 of free credit when you sign up, with no obligation to pay anything or continue using the service. I opted for this, and was able to OCR all of the episodes in the download above for free.
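To put the free tier into numbers, here is a rough estimate. It assumes Google bills one unit per image it receives and uses the 1,000-units-per-month figure quoted above (check current pricing). The `stitched_per_image` parameter is hypothetical: it models a tool that combines several subtitle images into one image before sending.

```python
import math

FREE_UNITS_PER_MONTH = 1_000  # free tier quoted in the post; may have changed


def episodes_per_month(images_per_episode: int, stitched_per_image: int = 1) -> int:
    """Estimate how many whole episodes fit in the monthly free tier.

    Assumes one billed unit per image sent to the API. With the default of 1,
    every subtitle image is sent on its own; a larger value models a
    (hypothetical) tool that stitches several subtitle images into one.
    """
    units_per_episode = math.ceil(images_per_episode / stitched_per_image)
    return FREE_UNITS_PER_MONTH // units_per_episode
```

For example, `episodes_per_month(300)` gives 3, which matches "a few episodes each month" for an episode of about 300 images.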
After you sign up for Google Cloud Platform and log in, you need to enable the Cloud Vision API and generate an API key:
- In the left hand menu, select APIs & Services > Dashboard
- Select Enable APIs & Services
- In the search box, type “vision”, and then select Google Cloud Vision API.
- Select Enable. It may walk you through setting up a billing profile at this point if one has not been created already. Again, there is no obligation to actually pay anything, as you can use this API a certain amount for free each month, and you may get free credits when signing up.
- Back at the APIs & Services Dashboard, select Credentials > Create Credentials > API Key.
- Once you have generated the API key, be sure to copy it or keep it open in your browser so you can access it later.
Use generate_srt_from_netflix tool to OCR the images
Now we can use a tool to send the subtitle images to the Cloud Vision API. A user named "zx573" on the Kanji Koohii forums originally wrote a Python script that sends the images to Google and generates a text-based subtitle file. I have updated the tool to make it more user-friendly and to fix a few issues it had.
Updated 1/22/18: added Vietnamese language support and a Mac OS X version.
(left click, then click the download button in the top right)
Linux: generate_srt_from_netflix.Linux.tar.gz (tested on Ubuntu x64)
Mac: generate_srt_from_netflix.OSX.zip (untested)
Source: Python 3 source code
Next, paste your API key into a text file named API_KEY.txt in the same folder as the application.
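A common slip when pasting the key is a stray space or trailing newline. As a sketch only (the tool's actual loading code may differ), a script could read the file like this, where `load_api_key` is a hypothetical helper and `folder` is the application's folder:

```python
from pathlib import Path


def load_api_key(folder: Path) -> str:
    """Read API_KEY.txt from `folder`, stripping stray whitespace and newlines."""
    key = (folder / "API_KEY.txt").read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError("API_KEY.txt is empty")
    return key
```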
When you run the application, it should look like this:
First, make sure that your API key is displayed correctly in the top area. If it is not, check that you completed the previous step correctly.
Then, select a folder containing the Netflix subtitle images. (Note: the subtitles you downloaded earlier came in a zip file, which must be extracted to a folder before you can load it here.)
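Any unzip tool works for the extraction. If you prefer to script it, here is a minimal sketch using Python's standard `zipfile` module; `extract_subtitle_zip` is a hypothetical helper that extracts into a folder next to the zip, named after it.

```python
import zipfile
from pathlib import Path


def extract_subtitle_zip(zip_path: Path) -> Path:
    """Extract the downloaded zip into a sibling folder named after it."""
    out_dir = zip_path.with_suffix("")  # e.g. subs.zip -> subs/
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return out_dir
```

Point the application at the returned folder.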
There is also an option to select the language that you want Google to recognize. I included Japanese, Korean, and Chinese in the selection box, but you can type in a different language code if you require another language. You can find a full list of language codes here.
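To show where the language code ends up, here is a minimal sketch of a single OCR request to the Cloud Vision REST endpoint using only the Python standard library. It assumes the public `images:annotate` endpoint with an API key, the `TEXT_DETECTION` feature, and the chosen code passed as a `languageHints` entry. This is not the code of generate_srt_from_netflix; `build_request` and `ocr_image` are hypothetical names, and error handling is omitted.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes, language: str = "ja") -> dict:
    """Build an images:annotate body asking for text detection on one image."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            # The language code chosen in the tool would go here.
            "imageContext": {"languageHints": [language]},
        }]
    }


def ocr_image(image_bytes: bytes, api_key: str, language: str = "ja") -> str:
    """Send one image to Cloud Vision and return the detected text ('' if none)."""
    body = json.dumps(build_request(image_bytes, language)).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    annotation = result["responses"][0].get("fullTextAnnotation", {})
    return annotation.get("text", "")
```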
The only other option is the chunk size. The default of 15 is usually fine. If you press Start and the program begins working but then shows an error message partway through, try decreasing the chunk size to a smaller value like 10 or even 5. Larger values should use less of your credit, but smaller values are more likely to complete successfully.
After you press Start, if all goes well, the program will run and output an SRT file inside your input folder.
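The advice above amounts to "retry with a smaller chunk size until it works". The sketch below automates that idea under stated assumptions: `process` is a hypothetical callable that OCRs one chunk and raises on failure, and `run_with_fallback` is a hypothetical helper, not something the tool provides. Like restarting the tool by hand, it reprocesses everything at each smaller size, so a failed attempt still costs credit.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunks(items: Sequence[T], size: int) -> list[list[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_with_fallback(
    items: Sequence[T],
    process: Callable[[list[T]], R],
    sizes: Sequence[int] = (15, 10, 5),
) -> list[R]:
    """Process all chunks at each size in turn, falling back to smaller sizes on error."""
    last_error = None
    for size in sizes:
        try:
            return [process(chunk) for chunk in chunks(items, size)]
        except Exception as err:  # e.g. a request rejected partway through
            last_error = err
    raise RuntimeError("all chunk sizes failed") from last_error
```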
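For reference, an SRT file is plain UTF-8 text made of numbered cues, each with a start and end timestamp in `HH:MM:SS,mmm` form, separated by blank lines. The sketch below writes that format; the helper names are hypothetical, and the timings are assumed to come from the timing data that accompanies the downloaded subtitle images.

```python
from pathlib import Path


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Render one numbered cue: index line, timing line, then the subtitle text."""
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"


def write_srt(cues: list[tuple[int, int, str]], path: Path) -> None:
    """Write (start_ms, end_ms, text) cues to `path`, separated by blank lines."""
    blocks = [srt_cue(i, s, e, t) for i, (s, e, t) in enumerate(cues, start=1)]
    path.write_text("\n".join(blocks), encoding="utf-8")
```

For example, `srt_timestamp(3_723_456)` returns `"01:02:03,456"`.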