Extracting Subtitles from Netflix

Updated 1/22/2018

Having subtitle scripts from the TV shows that you are watching is an excellent study aid, not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people watch Netflix more than traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

When I originally wrote this post, it was because I had discovered a way of ripping Japanese subtitles from Netflix which, to my knowledge, no one else had figured out how to do at the time. My method was long and clunky though. Eventually, a user named ahlawy posted in the comments section with details for a new method which was far superior to the one I had come up with. And shortly after that, TITHEN-FIRION posted a tool that he had created which can largely automate the process altogether. So now, it is really quite simple to rip subtitles from Netflix, to the point that just about anyone can do it.

Download Subtitles I’ve Already Ripped

All of the Japanese subtitles that I have ripped have been OCR’ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still does contain a few mistakes here and there. So please keep this in mind if you are using these subtitles to study. If something looks wrong, it probably is. Go watch it on Netflix to see what the correct subtitle would look like.

Download Netflix Subtitle Pack [updated 12/23/2017]. (left click, then click the download button in the top right)

This package contains subtitles for 26 different series and movies. Just click the link to see which shows are contained.

If Netflix has another show that you would like Japanese subtitles for, or if you want subtitles in another language, then you will have to rip it for yourself using the guide below.

How to rip Japanese Subtitles from Netflix

Getting the subtitles from Netflix is quite simple now, due to a tool that does all the hard work for us! 

First, you will need to download an addon for your web browser which allows you to run userscripts. One such addon is called ViolentMonkey, and it works with either Firefox or Chrome (as well as some other browsers). There are several other similar addons as well, such as TamperMonkey and GreaseMonkey. These all mostly do the same thing, so just pick one. A simple Google search for any of those titles should easily lead you to a page that lets you install it in your web browser.

Next, you want to install the Netflix Subtitle Downloader. After installing it, you will notice some new options appear inside the subtitle selection menu on the Netflix website. Simply select the subtitle language that you want, and then click on one of the download buttons. It’s that simple! You might need to give it a moment after clicking the button while it begins downloading.

Note: On my system, I have run into some issues where the subtitle downloader will sometimes try to download the subtitle for the previous video that I was looking at. If you run into this issue, this can be resolved by hitting the “refresh” button in your browser after loading a video.

For many languages, especially ones with simple character sets like English and Spanish, the subtitles are downloaded as SRT files. However, for languages with more complex character sets like Japanese, Chinese, or Korean, the subtitles are stored as images. So in order to convert these into a text format, you need to perform OCR (optical character recognition).

Create an API Key for Google Cloud Vision API

There are several OCR tools out there that can handle Japanese text. Most of them suck and result in a lot of errors. Google’s OCR is by far the most accurate I have seen, and works quite well. Unfortunately, it’s only sort of free. According to their current pricing structure, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service. I opted for this option, and was able to OCR all of the episodes that you find in the download above for free.

If you sign up for the Google Cloud Platform, then after logging in, you need to enable the Cloud Vision API and generate an API key.

  1. In the left hand menu, select APIs & Services > Dashboard
  2. Select Enable APIs & Services
  3. In the search box, type “vision”, and then select Google Cloud Vision API.
  4. Select Enable. It may walk you through setting up a billing profile at this point if one has not been created already. Again, there is no obligation to actually pay anything, as you can use this API a certain amount for free each month, and you may get free credits when signing up.
  5. Back at the APIs & Services Dashboard, select Credentials > Create Credentials > API Key.
  6. Once you have generated the API key, be sure to copy it or keep it open in your browser so you can access it later.

Use the generate_srt_from_netflix tool to OCR the images

Now, we can use a tool to send the subtitle images through the Cloud Vision API. Someone by the name of “zx573” from the Kanji Koohii forums originally wrote a Python script to perform the work of sending the images to Google and generating a text-based subtitle file. I have updated his tool to make it more user friendly and to fix a few issues it had.
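For the curious, the kind of request such a script sends can be sketched in Python as follows. This is only an illustration and not the tool's actual code: the function names are made up for this example, and it assumes the public REST endpoint `images:annotate` with the `TEXT_DETECTION` feature and a language hint.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes_list, language="ja"):
    """Build the JSON body for one batched images:annotate call.

    Hypothetical helper: each image is base64-encoded and tagged with a
    language hint so the OCR favors that language.
    """
    requests = []
    for image_bytes in image_bytes_list:
        requests.append({
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            "imageContext": {"languageHints": [language]},
        })
    return {"requests": requests}


def send_ocr_request(api_key, body):
    """POST the body to the Vision API and return the parsed JSON reply."""
    url = f"{VISION_ENDPOINT}?key={api_key}"
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The reply should contain one entry per image under `responses`, with the recognized text in its `fullTextAnnotation` field. Calling `send_ocr_request` counts against your quota, so try it on a handful of images first.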

Download

Updated 1/22/18: added Vietnamese language support and a Mac OS X version.

(left click, then click the download button in the top right)
Windows: generate_srt_from_netflix.Win.x86.zip
Linux: generate_srt_from_netflix.Linux.tar.gz (tested on Ubuntu x64)
Mac: generate_srt_from_netflix.OSX.zip (untested)
Source: python3 source code

Next, you need to paste your API Key into a text file named API_KEY.txt located in the same folder as the application.
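If you are working from the Python source instead, reading the key back could look something like the sketch below. The function name is hypothetical and the real tool may read the file differently. Stripping whitespace matters because a stray newline or space pasted along with the key is an easy way to end up with a key that does not work.

```python
from pathlib import Path


def load_api_key(folder):
    """Read the API key from API_KEY.txt in the given folder.

    Hypothetical helper: surrounding whitespace is removed, and an empty
    file is reported as an error rather than silently accepted.
    """
    key_file = Path(folder) / "API_KEY.txt"
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty; paste your API key into it")
    return key
```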

When you run the application, it should look like this:

First, you need to make sure that your API Key is displayed correctly in the top area. If not, make sure you did the previous step correctly.

Then, you just select a folder containing Netflix subtitle images (note: when you first downloaded the subtitles, they were in a zip file. This zip file must be extracted to a folder before loading here).
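Any unzip tool will do for this step. If you have many episodes to extract, a small Python sketch like this one can save some clicking. The function name and the choice of output folder (the zip's name without the extension) are just assumptions for the example.

```python
import zipfile
from pathlib import Path


def extract_subtitle_zip(zip_path, dest_folder=None):
    """Extract a downloaded subtitle zip and return the output folder.

    Hypothetical helper: by default the files go into a folder next to
    the zip, named after the zip without its .zip extension.
    """
    zip_path = Path(zip_path)
    dest = Path(dest_folder) if dest_folder else zip_path.with_suffix("")
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```

The returned folder is the one you would then select in the application.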

There is also an option to select the language that you want Google to recognize. I included Japanese, Korean, and Chinese in the selection box, but you can type in a different language code if you require another language. You can find a full list of language codes here.

The only other option is the chunk size. The default of 15 is usually fine. If you press the start button, and the program appears to begin working but then gives you an error message part way through, you might need to decrease the chunk size to a smaller value like 10 or even 5. Larger values should use up less of your credit, but smaller values have a greater chance of completing successfully.
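To make the idea of a chunk concrete, here is a minimal sketch of splitting a list of subtitle images into groups of a given size. It only shows the grouping. What the tool actually does with each group is not shown here, and the function name is made up for the example.

```python
def chunked(items, chunk_size=15):
    """Yield consecutive groups of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```

With 300 images, a chunk size of 15 gives 20 chunks, while a chunk size of 5 gives 60. That is the trade-off described above: fewer, larger chunks versus more, smaller ones.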

After you press start, if all goes well, the program should run and it will output an SRT file inside your input folder.
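For reference, an SRT file is just a series of numbered blocks, each with a start time, an end time, and the subtitle text. Below is a minimal sketch of producing one, assuming you already have the timing (in milliseconds) and the OCR’ed text for each subtitle. The function names are hypothetical, and the tool itself may do this differently.

```python
def srt_timestamp(ms):
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start_ms, end_ms, text) tuples into the text of an SRT file."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

When saving the result, write it out as UTF-8 so that the Japanese text survives.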

 

Hirogaru – Yet another source of beginner’s reading material!

I can’t believe how much reading material I have been finding recently. I remember my early days in Japanese, struggling to find anything at all that was on my level, but now I keep seeing more and more material becoming available. This latest resource is the newest website from The Japan Foundation. Called ひろがる, it launched in 2016 and seems to have at least 50-60 easy articles on it so far.

The level of the material seems to be aimed at those who have perhaps completed about one year of studying (able to pass JLPT N5, or completed the first Genki textbook), but may still be somewhat challenging for more advanced students as well, due to the diverse range of topics that the articles cover. Topics include:

  • Astronomy
  • Outdoors
  • Martial Arts
  • Tea
  • Sweets
  • Shopping
  • Calligraphy
  • Anime/Manga
  • Books
  • Temples
  • Music
  • Aquarium

Each topic generally contains about 4-5 articles that you can read. I believe that they may be adding new articles from time to time, but it does not seem to be at a very fast pace. Besides just the articles, there is usually a short video about each topic, as well as some short commentary from Japanese people saying what that topic means to them. For some reason, most topics also have a section containing pictures of food. There is also a comment section in each topic, which allows you to write a Japanese response to three different questions.

The articles are really the main attraction of this site, so let’s talk about those for a bit. Each article is fairly short, so a beginning student could probably read it in 5 or 10 minutes. The articles are broken up into several paragraphs, and each paragraph has audio so you can hear it read aloud. At the end of each article you will find a quiz with a couple of multiple choice questions to test your comprehension. At the top of the site, there are some controls which can assist you in reading the articles. One is a “Ruby” toggle, which turns furigana on or off for all of the kanji in the article. The other is an “English/Japanese” toggle. This seems to be poorly named, because it does not function how you might expect. If you set it to “English”, the articles remain fully in Japanese. The only things that really change are the navigation buttons and, when it is set to English, a button that appears under each paragraph which you can press to see a list of the difficult vocabulary. As such, I would recommend keeping it set to “English” at all times so you have access to the vocabulary words.

Overall, it’s a nice site, and certainly worth spending some time on. My only real gripe is that the articles are kinda lame and boring (to me at least), but that’s sort of hard to avoid with these kinds of generic topics. But all in all, it’s a fantastic source of reading material at a level where such material has often been overlooked. Check it out!

What you need to know to learn a foreign language by Paul Nation (book)

I finally got around to reading this great book by Paul Nation, What you need to know to learn a foreign language. The book is offered as a free PDF from his website. If you are unfamiliar with Nation, he is a leading researcher in Foreign Language Education with an interest in vocabulary acquisition and teaching methodology. While most of his research is aimed at the classroom, with this book he attempts to bring the results of his research to the student who might be trying to learn a language on their own.

It’s a somewhat short and easy-to-read book that just gets right to the point rather than giving you long-winded anecdotes and motivational stories. It could easily be read in a single afternoon. Much of the book is influenced by his “Four Strands Principle”, in which he believes that the most effective way of learning a language involves balancing your study across four different types of learning.

The Four Strands consist of:

  1. learning from meaning-focused input (listening and reading)
  2. learning from meaning-focused output (speaking and writing)
  3. language-focused learning (studying pronunciation, vocabulary, grammar etc)
  4. fluency development (getting good at using what you already know).

The main meat of the book consists of descriptions of twenty different learning activities that you can do, with different activities fitting into each of the different strands. He also spends a short bit of time explaining exactly WHY certain activities can be helpful. For instance, did you know that doing just a bit of timed reading can quickly improve your overall reading speed by 50-200%?

Here is a list of the different types of activities described in the book:

  • Reading while listening
  • Extensive reading
  • Narrow reading
  • Role play
  • Prepared talks
  • Read and write
  • Transcription
  • Intensive reading
  • Memorized sentences or dialogues
  • Delayed copying
  • Repeated listening
  • 4/3/2
  • Repeated reading
  • Speed reading
  • 10 minute writing
  • Repeated writing
  • Word cards
  • Linked skills
  • Issue logs
  • Spelling practice

I mention this just to give you a general idea of what you can expect to read about in the book. For the details of what each activity actually entails, you’ll need to read the book (which again, is free).

There are a lot of different opinions out there about how to learn a language. There is one camp which advocates focusing solely on input, and not worrying about anything else. Nation, on the other hand, argues that a fully balanced course is the way to go. While there is research out there to argue a lot of different opinions, we may never know for sure exactly what is truly optimal. With that said, nothing that Nation writes in this book feels terribly controversial, and it all just seems to make sense. I can’t imagine that these ideas could really steer anyone wrong, so I highly recommend this book for anyone who is currently learning a language.

Pibo – Even more children’s books on your smart device

So I recently wrote about EhonNavi, which lets you read thousands of Japanese Children’s books for free, but did you know that there is also another service called Pibo which has hundreds more completely different children’s books which can also be read for free?

Pibo is a completely separate service from EhonNavi, and offers some different pros and cons. First of all, while EhonNavi is primarily a site for desktop computers, Pibo is designed primarily for phones and tablets. Upon visiting their website, you will see prominent links to get the app from either the iTunes App Store or the Google Play store. There is also no signup procedure: just download the app and you are ready to start using it!

While EhonNavi shows you scans of physical books, the books on Pibo are all digital. As such, the artwork is much more crisp and clear. The books on Pibo are also completely voiced. That’s right, you can follow along as the book is read aloud to you! There is also no limit to how many times a certain book can be read, unlike on EhonNavi, where you only get to read each book once. Also, like EhonNavi, books can be browsed according to their age level (although I feel that many books fall into too large of an age range).

There are a few downsides to the service as well. For one thing, there is no apparent way to see which books you have already read. So if your goal is to read every book that is offered, you might need to keep a list yourself. The books are also always displayed in a completely random order, which exacerbates the problem. To help with this, I have created a list of every book title, which you can grab here (updated Feb 5, 2017). The number of books available is also significantly smaller than what you could find on EhonNavi. However, with nearly 400 available already (and growing!), that isn’t a huge problem.

So now, it’s worth mentioning how the service operates. When you first install the app, you get a 1 week free trial to read as much as you want. After that free trial is up, you can still read up to 3 books for free every day, which seems quite generous. You can also purchase a subscription for less than $5 per month, which allows you to read all you want. Seems like a pretty fair price to me.

All in all, I think this is a good complement to EhonNavi. You don’t have to choose either-or. They both work great together! When I am at my desktop, I use EhonNavi, and when I am on my phone, I read 3 books on Pibo. I urge everyone to check out both of these free services to try them out and get some reading practice!

Using the EhonNavi app to read children’s books on your phone or tablet

Last week I wrote about the site EhonNavi, which lets you read thousands of Japanese Children’s books for free through their website. However, since the website relies on Adobe Flash to display the books, you might be wondering if it is even possible to view the website on a phone or tablet. Well, it is possible, in fact, through the use of a free viewer that is available.

Here’s a quick rundown of how to install the app to view the books, and how to navigate the mobile version of EhonNavi’s site. Please note that I assume you have already set up an account on EhonNavi. If not, you can follow my instructions for that here.

First, simply navigate to the website on your mobile device and then press the menu button on the right hand side of the screen. Then scroll down to find the item that says 全ページためしよみ and press it. This will take you to a page where you can browse through the books that are available to be read in their entirety.

After clicking the link, if you scroll down a good ways, you will find the area where the books are sorted by age level. I recommend starting with books for 0歳 and working your way up.

After selecting an age level, you will be browsing all of the books in that category. However, there is a caveat. While browsing on the desktop site, you can easily see at a glance which books you have already read. On the mobile site, however, you have to click on a book and go to its information page to see if you have already read it or not.

After clicking on a book to go to the info page, if you see a yellow button, that means you can read it, so go ahead and press that.

Now at this point, you will be taken to another screen. First, you have to install the app to view the books, if you haven’t done so already. By clicking on the grey button that says インストールする you will be taken to either the Google Play store or the iTunes store to download the free app. Once it’s installed, press the orange button that says アプリを起動する to open the book in the app and start reading.

A couple of things worth mentioning: the app itself is just a viewer. You still need to use the mobile web site to browse and search for the books that you want to read. The Android app also seems to be pretty unstable and has crashed on me several times. Because you only have a short time to read the book (I believe about 15 minutes) before it is locked away, the app crashing could mean you don’t get to finish the book that you are in the middle of. Also, it can be a bit hard to read the text if you are on a smaller device like a phone, but it is possible to zoom in by tapping in the center of the screen. You can read books in either a landscape or portrait orientation (dependent on your device’s orientation setting), but I strongly recommend the landscape orientation because many books have images that span two pages.

I think it’s pretty cool to be able to read the books on a variety of devices, but it does feel a bit clumsy at times. Reading on a desktop or laptop is a better experience overall, but sometimes you can’t beat the convenience that you get from a phone or tablet.