Better method of getting subtitles from Netflix

My previous post on ripping Japanese subtitles from Netflix has been quite popular, although the method I proposed was fairly limited and quite difficult to follow. But thanks to user ahlawy, who left a comment on that post, a much better method has been discovered that only requires a web browser!

Update 11/05/2017: Thanks to another user by the name of TITHEN-FIRION, it is EVEN EASIER now.

I have updated my original guide to include this new method. So if you are interested in getting subtitles from Netflix, check it out!

Extracting Subtitles from Netflix

Updated 11/05/2017

Having subtitle scripts from TV shows that you are watching is an excellent study aid. Not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people tend to watch Netflix more than a lot of the traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

When I originally wrote this post, it was because I had discovered a way of ripping Japanese subtitles from Netflix, which, to my knowledge, no one else had figured out how to do at the time. My method was long and clunky though. Eventually, a user named ahlawy posted in the comments section with details for a new method which was far superior to the one I had come up with. And shortly after that, TITHEN-FIRION posted a tool that he had created which can largely automate the process altogether. So now, it is really quite simple to rip subtitles from Netflix, to the point that just about anyone can do it.

However, there is still one caveat: Netflix stores Japanese subtitles as images rather than text (though English and most other languages with simple character sets are already stored as text). So if you want Japanese subtitles in a text-based format, you will have to use OCR to convert them. This process is a bit more technical and complicated, so it might be a little difficult for some people.

Download Subtitles I’ve Already Ripped

All of the Japanese subtitles that I have ripped have been OCR’ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still does contain a few mistakes here and there. So please keep this in mind if you are using these subtitles to study. If something looks wrong, it probably is. Go watch it on Netflix to see what the correct subtitle would look like.

You can download my Netflix subtitle pack from here [updated 11/05/2017].

It contains subtitles in both English and Japanese for all of the following shows:

  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee’s Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)
  • Stranger Things 2 (Japanese subs for English language series)

If Netflix has another show that you would like Japanese subtitles for, then you will have to rip it for yourself using the guide below. If you successfully rip and OCR them, post a link to them in the comments and I will be glad to add them to my package.

How to rip Japanese Subtitles from Netflix

Getting the subtitles from Netflix is quite simple now, due to a tool that does all the hard work for us! 

First, you will need to download an addon for your web browser which allows you to run userscripts. One such addon is called ViolentMonkey, and it works with either Firefox or Chrome. There are several other similar addons as well, such as TamperMonkey and GreaseMonkey. These all mostly do the same thing, so just pick one. A simple Google search for any of those titles should easily lead you to a page that lets you install it in your web browser.

Next, you want to install the Netflix Subtitle Downloader. After installing it, you will notice some new options appear inside the subtitle selection menu on the Netflix website. Simply select the subtitle language that you want, and then click on one of the download buttons. It’s that simple! You might need to give it a moment after clicking the button while it begins downloading.

 

How to OCR using Google Cloud Vision API

Note: the following guide assumes knowledge of how to use the command prompt.

There are several OCR tools out there that can handle Japanese text. Most of them suck and result in a lot of errors. Google’s OCR is by far the most accurate I have seen, and works quite well. Unfortunately, it’s only sort of free. According to their current pricing structure, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service. I opted for this option, and was able to OCR all of the episodes that you find in the download above while still having a lot of credit left over. The free credit does expire if you don’t use it within a certain time.

If you sign up for the Google Cloud Platform, then after logging in, you first need to enable the Cloud Vision API. Just click the “Enable API” link at the top of your Dashboard, and then find “Vision API” under the “Google Cloud Machine Learning” heading. After that, you will also need to create an API key. Click “credentials” on the left side menu, and then click “create credentials”, and select “API Key”.

Now, we can use a Python script created by “zx573” from the Kanji Koohii forums to actually perform the work of sending the images to Google and generating a text-based subtitle file. You will need a 2.7.x version of Python (I don’t think it works on 3.x). You also need to install the packages Pillow and requests. These can be installed from the command line by typing:

pip install pillow
pip install requests

Next, you will need the Python script, which you can grab from here. You will then need to open up the file in a text editor and insert your API Key into the line that says AUTH_KEY = "YOUR API KEY HERE"

Now, we can run this python script from the command line, with the path of the folder containing your subtitle images as an argument, like so:

python generate_srt_from_netflix.py "Terrace House – Boys & Girls in the City 01"

If all goes well, you should see it processing the images, and then it will finally spit out an SRT file named “output.srt” for you! However, these srt files will contain some errors which we need to fix up before they can be opened in other applications.

Note: if the script starts working, but throws out an error message before it completes, you may need to edit the Python script to change the REQUEST_CHUNK_SIZE from 15 to a smaller value like 10 or even 5. Larger values should use up less of your credit, but smaller values have a greater chance of completing successfully.
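
To make the role of REQUEST_CHUNK_SIZE a bit more concrete, here is a minimal sketch of how a batch of images might be packaged for the Vision API's images:annotate endpoint. This is not zx573's script. The helper names (chunked, build_request_body) and the fake image data are made up for illustration, and the code only builds the JSON bodies rather than sending anything to Google.

```python
import base64

# REST endpoint for Cloud Vision; the API key is normally passed as ?key=...
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_request_body(images):
    """Build one images:annotate body from a list of raw image bytes."""
    requests_list = []
    for data in images:
        requests_list.append({
            # Image bytes are sent as base64 text inside the JSON body.
            "image": {"content": base64.b64encode(data).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            # Hint that the text is Japanese.
            "imageContext": {"languageHints": ["ja"]},
        })
    return {"requests": requests_list}


# Hypothetical example: 12 stand-in "images", sent 5 per request.
fake_images = [("image-%d" % i).encode("ascii") for i in range(12)]
bodies = [build_request_body(chunk) for chunk in chunked(fake_images, 5)]
print(len(bodies), [len(b["requests"]) for b in bodies])
```

With a chunk size of 5, the 12 stand-in images end up in three request bodies. A smaller chunk size means each request carries fewer images, so a single failed request affects less of the episode.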

Additional Processing

The srt files will have a problem in that they do not always contain timestamps that include milliseconds, and most applications that edit srt files will expect there to be milliseconds. However, this is an easy fix, using software that lets you do search and replace using regular expressions. I use Notepad++.

If you choose the Search > Find in Files menu option, you can search across all of your subtitles at once!

Set the directory to the location where your srt files are, and then if you want, you can set the filters to *.srt to avoid accidentally picking up any other files. Make sure the search mode is set to “regular expression” and the checkbox beside it is not checked.

Then, in the “Find what” field, you want to put: (\d\d:\d\d:\d\d,\d?\d?)(\s)
Replace with: \10\2

Press “replace in files”. Do this twice.

Then change “Find what” to: (\d\d:\d\d:\d\d)(\s)
Replace with: \1,000\2

Finally, press “replace in files” again. We have now corrected the srt files!
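
If you would rather not do this by hand, the same three replacements could be scripted. The following is a rough Python sketch that mirrors the regular expressions above. The function name fix_timestamps and the sample line are my own assumptions, and I have only thought it through for the timestamp shapes described here.

```python
import re

# Same patterns as the Notepad++ steps above.
SHORT_MS = re.compile(r"(\d\d:\d\d:\d\d,\d?\d?)(\s)")
NO_MS = re.compile(r"(\d\d:\d\d:\d\d)(\s)")


def fix_timestamps(text):
    """Pad SRT timestamps so each one ends in three millisecond digits."""
    # Append a zero to short millisecond fields, twice, as in the manual steps.
    for _ in range(2):
        text = SHORT_MS.sub(r"\g<1>0\2", text)
    # Then add ",000" where the milliseconds are missing entirely.
    return NO_MS.sub(r"\g<1>,000\2", text)


# Hypothetical sample line with one short and one missing millisecond field.
sample = "00:00:01,5 --> 00:00:03\n"
print(fix_timestamps(sample))
```

To apply it to a real file, you would read the srt file as UTF-8 text, pass the contents through fix_timestamps, and write the result back out. Like the Notepad++ approach, it will also touch anything in the dialog itself that happens to look like a timestamp.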

Hukumusume Fairy Tale Collection

In my previous post, I had mentioned a website called the Hukumusume Fairy Tale Collection. While I suppose this site is fairly well-known among students of Japanese, I would like to take a bit of time to talk about it, because it is an absolutely massive site with a ton of content, and it can be easy to get lost, because the navigation menus change depending on what part of the site you are on. I think a lot of people might not even know about all of the different things offered on the site.

Put simply, this is a site with a lot of classic children’s stories. They have stories with text-only, stories with audio, and even picture-book stories. There is also a section with many stories that have English translations. They also have multiple different sections which all contain different stories for every single day of the year. There are thousands of stories here. Now, this might not necessarily be an ideal resource for absolute beginners in Japanese, because a lot of the stories may use some somewhat old words and ideas that you aren’t familiar with. However, with the sheer amount of content offered, there are plenty of really basic and easy stories to find if you are willing to dig around for a bit.

I don’t want to go into too much detail about exactly how to navigate the site and all, but I did find a pretty good write-up on another site here: http://nihongo-e-na.com/eng/site/id522.html

I’ll also point out a few direct links to what I find to be the most useful pages on the site–the daily stories:

Each of the categories above has a story for every day of the year! They are mostly Japanese-only though.

For those whose Japanese is more at the beginner level, I would suggest starting with the stories that have an English translation available, though there are fewer than 50 of them at the time of this writing. You can find all the English-translated stories catalogued here: http://hukumusume.com/douwa/English/index.html

I also came across a page which lists stories according to (Japanese) grade level: http://www.hukumusume.com/douwa/0_6/0nen.html

For the past several weeks, I have been making it a point to read at least one story every day. I found it sort of annoying to navigate through a bunch of links every day, particularly if I was reading on my phone, so I wrote a simple script that will automatically take me to today’s story for the “日本昔話 – Japanese Classical Stories”. If that story doesn’t look interesting to me, the right hand sidebar beside the story will contain links to today’s story from all of the other categories as well, and it even has links to a lot of daily trivia that you can read (the first section of the sidebar is trivia, the second section is the stories).

Here is a link to my script which takes you directly to today’s story: http://www.nihongonobaka.com/Files/fairytale.php

If you want to use it, just bookmark that link. On my phone, I had to temporarily disable my internet connection in order to bookmark it, because it immediately redirects once you click the link.

 

PotPlayer

Study Subtitled Videos Using PotPlayer

I previously wrote about studying Japanese through the use of Anime, Dramas, and Movies, but I always felt that there was still a step missing from the equation. I mean, sure, you can use great tools like Subs2SRS to ease the creation of Anki cards, but what about the process of actually watching the video? How do you efficiently look up words and try to understand sentences while you are watching it? This was a question that bugged me for a long time. While there are some solutions, such as opening up the script in a text file and following along, loading the video and script into Aegisub to go line by line, or even rigging up AGTH to capture the text output from the player, all of these methods are pretty clunky and leave something to be desired.

But just recently, I came across PotPlayer, and discovered that it actually makes the whole process as smooth as you could ever imagine! It feels like some of the features in this player were practically designed for someone who is learning a language! A few great features that I love about it:

  • Click on words to either perform a search or copy it to the clipboard
  • Copy the entire subtitle line to the clipboard, can be assigned to a shortcut key
  • Shortcuts to seek to the next/previous subtitle, allowing you to easily replay a line
  • Subtitle explorer displays all lines in a separate window for you to browse and seek to a particular line
  • Load multiple subtitle streams, so you can have Japanese and English at the same time
  • It remembers the last file you had open as well as your position within it, making it easy to pick up where you left off
  • Has options for adjusting the synchronization of subtitles, as well as the font
  • Is an otherwise completely full featured player, with tons of options and advanced features

I honestly don’t know what else I could want or expect when it comes to watching subtitled video. This works great in conjunction with JGlossator, which will automatically look up helpful information on any Japanese subtitles that get copied to the clipboard.

I’ve put together a short video showing how to get started with using PotPlayer to study Japanese from subtitles:

Do you know any other software or tools to help with studying Japanese while watching videos? Let me know in the comments!

echo.html – Rikai-chan assistant

Several years back, I put together a simple html page that I have found very helpful over the years. All it does is let you type or paste text into a box, and it outputs that same text in a larger font size below the text box.

This serves two purposes. Mainly, it gives you a place to paste text so you can use Rikai-chan on it. This is helpful when you are copying and pasting from a PDF or Word document, or some random app with Japanese text. I also use it a lot when I am writing, because I often forget if some of the words I am writing are correct or not. This can be simpler and more convenient than pulling up gmail or pastebin or something, and you can use it without an internet connection.

The other function is to simply make the text big enough that you can easily read it. Japanese text (kanji particularly) is, in my opinion, quite hard to read in comparison to English text. If you don’t immediately recognize a kanji, you might have to strain to discern the strokes or radicals that it is made up of. Sometimes I wonder if the reason so many Japanese people have bad eyesight is due to having to strain to read kanji?

But anyways, here it is. You can just right-click on the link and save it to your desktop.