Extracting Subtitles from Netflix

Updated 11/05/2017

Having subtitle scripts from TV shows that you are watching is an excellent study aid. Not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people tend to watch Netflix more than a lot of the traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

When I originally wrote this post, it was because I had discovered a way of ripping Japanese subtitles from Netflix, which, to my knowledge, no one else had figured out how to do at the time. My method was long and clunky, though. Eventually, a user named ahlawy posted in the comments section with details of a new method that was far superior to the one I had come up with. Shortly after that, TITHEN-FIRION posted a tool he had created that largely automates the process. So now it is really quite simple to rip subtitles from Netflix, to the point that just about anyone can do it.

However, there is still one caveat: Netflix stores Japanese subtitles as images rather than text (though English and most other languages with simple character sets are already stored as text). So if you want Japanese subtitles in a text-based format, you will have to use OCR to convert them. This process is a bit more technical and complicated, so it might be a little difficult for some people.

Download Subtitles I’ve Already Ripped

All of the Japanese subtitles that I have ripped have been OCR’ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still does contain a few mistakes here and there. So please keep this in mind if you are using these subtitles to study. If something looks wrong, it probably is. Go watch it on Netflix to see what the correct subtitle would look like.

You can download my Netflix subtitle pack from here [updated 11/05/2017].

It contains subtitles in both English and Japanese for all of the following shows:

  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee’s Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)
  • Stranger Things 2 (Japanese subs for English language series)

If Netflix has another show that you would like Japanese subtitles for, then you will have to rip them yourself using the guide below. If you successfully rip and OCR them, post a link to them in the comments and I will be glad to add them to my package.

How to rip Japanese Subtitles from Netflix

Getting the subtitles from Netflix is quite simple now, due to a tool that does all the hard work for us! 

First, you will need to download an addon for your web browser which allows you to run userscripts. One such addon is called ViolentMonkey, and it works with either Firefox or Chrome. There are several other similar addons as well, such as TamperMonkey and GreaseMonkey. These all mostly do the same thing, so just pick one. A simple Google search for any of those titles should easily lead you to a page that lets you install it in your web browser.

Next, you want to install the Netflix Subtitle Downloader. After installing it, you will notice some new options appear inside the subtitle selection menu on the Netflix website. Simply select the subtitle language that you want, and then click on one of the download buttons. It’s that simple! You might need to give it a moment after clicking the button while it begins downloading.

 

How to OCR using Google Cloud Vision API

Note: the following guide assumes knowledge of how to use the command prompt.

There are several OCR tools out there that can handle Japanese text. Most of them suck and result in a lot of errors. Google’s OCR is by far the most accurate I have seen, and works quite well. Unfortunately, it’s only sort of free. According to their current pricing structure, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service. I went with this option, and was able to OCR all of the episodes that you find in the download above while still having a lot of credit left over. The free credit does expire if you don’t use it within a certain time.

If you sign up for the Google Cloud Platform, then after logging in, you first need to enable the Cloud Vision API. Just click the “Enable API” link at the top of your Dashboard, and then find “Vision API” under the “Google Cloud Machine Learning” heading. After that, you will also need to create an API key. Click “credentials” on the left side menu, and then click “create credentials”, and select “API Key”.

Now, we can use a Python script created by “zx573” from the Kanji Koohii forums to actually perform the work of sending the images to Google and generating a text-based subtitle file. You will need a 2.7.x version of Python (I don’t think it works on 3.x). You also need to install the packages Pillow and requests. These can be installed from the command line by typing:

pip install pillow
pip install requests

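Next, you will need the Python script, which you can grab from here. You will then need to open up the file in a text editor and insert your API Key into the line that says AUTH_KEY = "YOUR API KEY HERE". Make sure the quotes around the key are plain straight quotes, since curly quotes are not valid in Python.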

Now, we can run this python script from the command line, with the path of the folder containing your subtitle images as an argument, like so:

python generate_srt_from_netflix.py "Terrace House – Boys & Girls in the City 01"

If all goes well, you should see it processing the images, and then it will finally spit out an SRT file named “output.srt” for you! However, these srt files will contain some errors which we need to fix up before they can be opened in other applications.
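For readers curious about what happens behind the scenes, here is a minimal sketch of a single OCR request to the Cloud Vision REST endpoint. This is not zx573’s script: it is a separate illustration written for Python 3 using only the standard library, and the function names (build_request, extract_text, ocr_image) are made up for this example. It assumes the API key is passed as the key query parameter and that Japanese is given as a language hint.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes, language="ja"):
    """Build the JSON body for one image (TEXT_DETECTION plus a language hint)."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": [language]},
            }
        ]
    }


def extract_text(response):
    """Return the recognised text for the first image, or "" if nothing was found."""
    annotations = response["responses"][0].get("textAnnotations", [])
    # The first annotation holds the full text; later ones are individual words.
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """Send one subtitle image to the API and return the recognised text."""
    with open(path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Calling ocr_image("some_subtitle.png", "YOUR API KEY HERE") would use up one image of your quota, so try it on a single file before looping over a whole episode. The file name here is only a placeholder.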

Note: if zx573’s script starts working, but throws out an error message before it completes, you may need to edit the Python script to change the REQUEST_CHUNK_SIZE from 15 to a smaller value like 10 or even 5. Larger values should use up less of your credit, but smaller values have a greater chance of completing successfully.
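To give a rough idea of what a chunk size means, here is a small sketch of splitting a list of image files into batches. I am assuming REQUEST_CHUNK_SIZE is the number of images bundled into each request; the chunked helper and the file names below are hypothetical and are not taken from the script itself.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Twelve made-up image names, split into batches of five.
image_names = ["%03d.png" % i for i in range(1, 13)]
batches = list(chunked(image_names, 5))
```

With twelve images and a chunk size of 5, this gives three batches (5, 5, and 2 images). A smaller chunk size means more requests, each carrying less data.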

Additional Processing

The srt files will have a problem in that their timestamps do not always include milliseconds, and most applications that edit srt files will expect the milliseconds to be there. However, this is an easy fix, using software that lets you do search and replace using regular expressions. I use Notepad++.

If you choose the Search > Find in Files menu option, you can search across all of your subtitles at once!

Set the directory to the location where your srt files are, and then if you want, you can set the filters to *.srt to avoid accidentally picking up any other files. Make sure the search mode is set to “regular expression” and the checkbox beside it is not checked.

Then, in the “Find what” field, you want to put: (\d\d:\d\d:\d\d,\d?\d?)(\s)
Replace with: \10\2

Press “replace in files”. Do this twice.

Then change “Find what” to: (\d\d:\d\d:\d\d)(\s)
Replace with: \1,000\2

Finally, press “replace in files” again. We have now corrected the srt files!
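If you would rather not use a text editor, the same padding can be sketched in Python 3. This is my own rough equivalent of the search-and-replace steps above, not part of zx573’s script. The names pad_milliseconds and fix_folder are made up for this example, and it assumes the srt files are saved as UTF-8. One small difference: it pads everything in a single pass instead of repeating the replacement.

```python
import re
from pathlib import Path

# hh:mm:ss, optionally followed by a comma and up to three digits,
# and then whitespace or the end of the text.
TIMESTAMP = re.compile(r"(\d\d:\d\d:\d\d)(?:,(\d{0,3}))?(?=\s|$)")


def pad_milliseconds(text):
    """Pad every timestamp in `text` so it ends in exactly three digits."""
    def fix(match):
        fraction = match.group(2) or ""
        # Append zeros on the right, as the editor-based replacement does.
        return match.group(1) + "," + fraction.ljust(3, "0")
    return TIMESTAMP.sub(fix, text)


def fix_folder(folder):
    """Rewrite every .srt file in `folder` in place (assumes UTF-8 files)."""
    for path in Path(folder).glob("*.srt"):
        original = path.read_text(encoding="utf-8")
        path.write_text(pad_milliseconds(original), encoding="utf-8")
```

For example, pad_milliseconds("00:00:01,5 --> 00:00:03\n") gives "00:00:01,500 --> 00:00:03,000\n". Keep a backup of your srt files before running fix_folder, since it overwrites them.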

31 thoughts to “Extracting Subtitles from Netflix”

    1. Newer root methods are invisible, so wont be detected by that. So not really a problem. (my phone is rooted with an older method though, so I really ought to get around to updating this myself)

  1. i don’t understand anything. im just here for the subtitles and i just want you to know that terrace house subs seems to be wrong? tried syncing it but most parts just doesnt fit in the video

    1. I’m not sure what video you are trying to sync it to, but if you are trying to use it together with some video file you have downloaded, it will be unlikely to sync up properly. These subtitles and sync were created by Netflix. I have merely published them for people to be able to use as a readable script, and no attempt has been made to see if they sync to any videos that may have been published elsewhere.

      1. actually it’s not just the synchronization because i can easily fix that. but the problem for me is that the subtitles itself says different words. i know because i understand few japanese words and they do not match

  2. Thanks for sharing. I successfully use Subtitle Edit to convert manifest_ttml2.xml and PNG to Blu-ray sup files. But I have too many subtitles, do you know how to batch convert them to Blu-ray sup files? Thanks very much.

    1. When you open the xml file in subtitle edit, it shows the import window which will list all of the subtitle lines. From here, you can right click on any line and choose export>”bluray Sup”. Then a new menu appears where you can choose to “export all lines”. I think this might be what you are looking for.

      1. Thanks for your response. That is the most convenient way, but it will be heavy workload when I have many episodes to do. Is there any other better way? Thanks. (T_T)

    2. You can do it using Subtitle Edit too! Tools -> Batch convert. Add input files, select output folder, format and hit “convert”. Keep in mind that output filename is the same as input. So change names from `manifest_ttml2.xml` to something else so they don’t overwrite each other. 😉

  3. Great information and thorough instructions. I am finally able to get the subtitles I want, and your method never failed.

    Thank you very much!

    But I still have one question, is there a way to get that “image subtitles” when the films ain’t allowed to be downloaded?

    1. Unfortunately, I have not found any way to get the subtitles for shows that aren’t available for download. It is possible to get text subtitles from any video for simple languages like English or Spanish though.

    1. Thanks for your feedback! I remember finding this thread back when I was originally trying to figure this out, but at the time, it seemed that the method mentioned for arabic subtitles didn’t seem to be working for Japanese. Once I get some time I will review what you have posted and see if I can’t get this working for Japanese as well. That certainly looks like it would be easier than my method.

    2. Looks like I got it working, thank you! The method is slightly different for Japanese than for arabic, but very similar. I’ll make a new post here soon to share this method with others!

  4. And what about text-based subtitles such as English, Spanish, or French? These will also show up in the network monitor, but there doesn’t seem to be any obvious thing to filter out. You will just have to look at all of the items that come over when you load the subtitles and try to locate the correct one.

    Try this /?o= on your monitor filters, subtitles usually start with this.

    1. Downloaded them with my Userscript. Google “netflix subtitle downloader” and there should be a link from Greasyfork. I just updated it to support image based subs after reading this post. 😉

      1. Awesome! Just checked this out and it works great! I’ll need to completely rewrite my guide again now :p

        By the way, I’m not sure if you can add any sort of feedback after clicking on the download link, but I was a bit confused if it was working or not, because it took about a minute for the download to pop up. It would be cool if it just let me know it was doing something.

    1. I just happened to OCR them today myself and posted them up just right before you. If you are manually correcting the errors, that is still very useful though. You might want to just download mine and then correct from there.
