Extracting Subtitles from Netflix

Having subtitle scripts from TV shows that you are watching is an excellent study aid. Not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people tend to watch Netflix more than a lot of the traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

While it’s long been lamented that there was no way to download or rip the Japanese subtitles from Netflix (I even said as much in a previous post about Netflix), I have recently discovered a way!

In this post, I will provide a download for the subtitles that I have ripped, and I will also provide instructions on how to rip them yourself from other shows. However, the process of ripping subtitles is quite technical, and is probably not something that everyone can do.

Already Ripped Subtitles

First, some important details about Netflix subtitles: In most languages, the subtitles are just in a standard text-based subtitle format. So if you get English subtitles, you can just open them up in a text editor. With Japanese subtitles, however, the subtitles are all stored as images. This means that in order to do things like copy and paste the text to look up words, they would first have to be converted into text by OCR, which is unfortunately not perfect.

All of the subtitles that I have ripped have been OCR’ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still does contain a few mistakes here and there. So please keep this in mind if you are using these subtitles to study. If something looks wrong, it probably is. Go watch it on Netflix to see what the correct subtitle would look like.

You can download my Netflix subtitle pack from here (Mega) or here (MediaFire).

It contains subtitles in both English and Japanese for all of the following shows:

  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee’s Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)

If Netflix has another show that you would like Japanese subtitles for, or if you would like audio to accompany the subtitles, then you will have to rip it yourself. As I said, it is also quite technical, so I wouldn’t attempt it unless the instructions below at least halfway make sense to you.

How to rip Japanese Subtitles from Netflix

First, you need to choose a show that actually has Japanese subtitles available. I show how to do this in my previous post about Netflix.

Once you have found a show that has Japanese subtitles, you need to see if you will be able to download it. At the time of this writing, Netflix allows Android and iOS devices to download select shows to your device. You cannot download shows directly on a PC. I have also tested several Android emulators on PC, and did not have any success. I have no experience with iOS devices either, so I cannot say for certain whether this would be possible there. So basically I can only confirm that the following steps can be done on an Android device. I also believe your Android device will need to be rooted, but I’m not certain. If anyone manages to do this without a rooted device, please let me know.

So, assuming you have a rooted Android device, you will want to find a show that both has Japanese subtitles and allows the episodes to be downloaded. Then just download all of the episodes onto your device. Don’t download more than one series at a time! This is because the filenames do not contain the show title, so it’s difficult to figure out which files go with which show. If you stick to one show at a time, you won’t run into this problem.

Next, you will want to have the ADB tool, which lets you transfer files between your Android device and your PC. These files will be hidden from a standard file browser, so that’s why you need this tool. ADB can be downloaded as part of Android’s standalone SDK Platform Tools. You also need to enable USB Debugging on your Android device.

Now, you need to find where Netflix downloaded the files onto your Android device. A file manager app such as Amaze should let you find them. On my device (it’s probably the same on most devices) the files are located at /sdcard/Android/data/com.netflix.mediaclient/files/Download/.of
Inside that folder there will be a separate subfolder for each episode, and each one of those subfolders will have a name made up of seemingly random numbers. You can use the ADB tool to copy all of the subfolders to your PC. You first need to open up a command prompt in the folder that adb.exe is stored in, and then do something like this:

adb pull /sdcard/Android/data/com.netflix.mediaclient/files/Download/.of

After some time, all of the files will be copied to your PC (unless you get an error message). So now just browse into the “.of” folder on your PC to find all of the subfolders for each episode. The folders should sort in the correct order, as long as you sort by name. So the first folder should be the first episode, the second folder will be the second episode, and so on. Let’s take a look at the different types of files that you can find in each folder.
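If you would rather not number the folders by hand, the sort-by-name idea can be scripted. The following is only a rough Python sketch: the function name episode_folders is made up for illustration, and it simply assumes, as described above, that sorting the subfolder names puts the episodes in order.

```python
from pathlib import Path


def episode_folders(root):
    """Pair each subfolder of `root` with an episode number, sorted by name."""
    folders = sorted(
        (p for p in Path(root).iterdir() if p.is_dir()),
        key=lambda p: p.name,  # plain name order, as a file browser would sort
    )
    # Number the folders starting from episode 1.
    return list(enumerate(folders, start=1))
```

Calling episode_folders(".of") in the folder you pulled to gives you (episode number, folder) pairs that you can print out or use to rename things. It is worth checking the first couple of episodes by ear before trusting the order.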

  • .manifest – Contains some metadata about the files. Not really useful.
  • .nfi – Unknown – I’m not sure about the contents of this file, but it does not appear to be useful.
  • .nfv – Netflix Video – Contains the video stream. It is encrypted so it is not much use to us.
  • .nfa – Netflix Audio – Contains an AAC audio stream. Change the file extension to .m4a and you should be able to play it. Can be used with Subs2SRS.
  • .nfs – Netflix Subtitle – Contains the subtitles. If the file size is smaller (about 10-100 KB) it is usually a text file and may contain the subtitles for English or some other language. Change the extension to .xml and you can open it in a text editor. If the file size is larger (a few MB), it is the Japanese subtitles. Change the extension to .zip, and you will be able to extract the contents.
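The renaming described in the list above can also be sketched in Python. This is an illustration under assumptions rather than a tested tool: the function names are hypothetical, and instead of judging .nfs files by size it checks whether the file is actually a zip archive, which is a swapped-in test that should give the same split between image-based and text-based subtitles.

```python
import shutil
import zipfile
from pathlib import Path


def suggested_extension(path):
    """Guess a more useful extension for one downloaded file (None = skip it)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".nfa":
        return ".m4a"  # AAC audio stream
    if suffix == ".nfs":
        # Image-based subtitles are zip archives; text subtitles are not.
        return ".zip" if zipfile.is_zipfile(path) else ".xml"
    return None  # .manifest, .nfi and .nfv are left alone


def copy_renamed(folder, out_dir):
    """Copy audio and subtitle files from `folder` to `out_dir` with new extensions."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(folder).iterdir()):
        ext = suggested_extension(src) if src.is_file() else None
        if ext is not None:
            dest = out_dir / (src.stem + ext)
            shutil.copyfile(src, dest)  # originals stay untouched
            copied.append(dest)
    return copied
```

Copying rather than renaming in place keeps the pulled files intact in case something goes wrong.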

After finding the Japanese subs and changing the extension to .zip, extract them into a folder, and then rename the folder so you know what episode it is. You will have many PNG files which are the subtitle images, and you will also have a file named “manifest_ttml2.xml” which has all of the timing data. Congratulations, you have successfully extracted the subtitles! But for them to be a little more useful, we will need to OCR them.

How to OCR using Google Cloud Vision API

There are several OCR tools out there that can handle Japanese text. Most of them suck and result in a lot of errors. Google’s OCR is by far the most accurate I have seen, and works quite well. Unfortunately, it’s only sort of free. According to their current pricing structure, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service. I opted for this option, and was able to OCR all of the episodes that you find in the download above. The free credit does expire if you don’t use it within a certain time.

If you sign up for the Google Cloud Platform, then after logging in, you first need to enable the Cloud Vision API. Just click the “Enable API” link at the top of your Dashboard, and then find “Vision API” under the “Google Cloud Machine Learning” heading. After that, you will also need to create an API key. Click “credentials” on the left side menu, and then click “create credentials”, and select “API Key”.

Now, we can use a Python script created by “zx573” from the Kanji Koohii forums to actually perform the work of sending the images to Google and generating a text-based subtitle file. You will need a 2.7.x version of Python (I don’t think it works on 3.x). You also need to install the packages Pillow and requests. These can be installed from the command line by typing:

pip install pillow
pip install requests

Next, you will need the Python script, which you can grab from here. You will then need to open up the file in a text editor and insert your API Key into the line that says AUTH_KEY = "YOUR API KEY HERE"

Now, we can run this python script from the command line, with the path of the folder containing your subtitle images as an argument, like so:

python generate_srt_from_netflix.py "Terrace House - Boys & Girls in the City 01"

If all goes well, you should see it processing the images, and then it will finally spit out an SRT file named “output.srt” for you! However, these srt files will contain some errors which we need to fix up before they can be opened in other applications.
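For the curious, here is roughly what a request to the Cloud Vision API looks like for a single subtitle image. This is not zx573’s script, just a minimal Python 3 sketch using only the standard library; the function names are invented for illustration, and the Japanese language hint is an assumption about what helps with these images.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes):
    """Build the JSON payload for one TEXT_DETECTION request with a Japanese hint."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": ["ja"]},
            }
        ]
    }


def ocr_image(png_path, api_key):
    """Send one subtitle image to the Vision API and return the recognized text."""
    with open(png_path, "rb") as f:
        body = json.dumps(build_request_body(f.read())).encode("utf-8")
    request = urllib.request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    annotations = result["responses"][0].get("textAnnotations", [])
    # The first annotation holds the full text found in the image, if any.
    return annotations[0]["description"] if annotations else ""
```

Each call to ocr_image counts as one image against your quota, so a full episode means a few hundred requests.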

Additional Processing

The srt files will have a problem in that they do not always contain timestamps that include milliseconds, and most applications that edit srt files will expect there to be milliseconds. However, this is an easy fix, using software that lets you do search and replace using regular expressions. I use Notepad++.

If you choose the Search > Find in Files menu option, you can search across all your subtitles at once!

Set the directory to the location where your srt files are, and then if you want, you can set the filters to *.srt to avoid accidentally picking up any other files. Make sure the search mode is set to “regular expression” and the checkbox beside it (“. matches newline”) is not checked.

Then, in the “Find what” field, you want to put: (\d\d:\d\d:\d\d,\d?\d?)(\s)
Replace with: \10\2

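Press “replace in files”. Do this twice: each pass adds a single zero, so a timestamp with only one millisecond digit needs two passes to reach three digits.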

Then change “Find what” to: (\d\d:\d\d:\d\d)(\s)
Replace with: \1,000\2

Finally, press “replace in files” again. We have now corrected the srt files!
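If you prefer to script this instead of using an editor, the same padding can be done in one pass. The sketch below is a Python approximation of the replacements above, not an exact copy of them: the names are illustrative, and as an extra precaution it only touches lines containing “-->”, so dialogue that happens to mention a time is left alone.

```python
import re

# HH:MM:SS, optionally followed by a comma and up to three millisecond digits.
TIMESTAMP = re.compile(r"(\d\d:\d\d:\d\d)(?:,(\d{0,3}))?(?=\s|$)")


def _pad(match):
    """Right-pad the millisecond part with zeros to exactly three digits."""
    millis = (match.group(2) or "").ljust(3, "0")
    return match.group(1) + "," + millis


def fix_timestamps(srt_text):
    """Pad every timestamp on a timing line out to HH:MM:SS,mmm."""
    fixed = []
    for line in srt_text.splitlines(keepends=True):
        if "-->" in line:  # only timing lines, never dialogue
            line = TIMESTAMP.sub(_pad, line)
        fixed.append(line)
    return "".join(fixed)
```

For example, a timing line of 00:00:01 --> 00:00:02,5 comes out as 00:00:01,000 --> 00:00:02,500, and timestamps that already have three digits are unchanged.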

After that, there are some further optional things that you can do. The tool Subtitle Edit is quite helpful for fixing up your subtitles. You can use it to batch convert the English .xml files into SRT files (Tools > Batch Convert). It can also remove hearing impaired text from the subtitles (text that describes sounds, or names which character is speaking). Sometimes it doesn’t work so well for removing hearing impaired text from the Japanese files, because the text is enclosed in Japanese parentheses rather than the expected English parentheses, but you can still accomplish it using the Search and Replace tool (or the same tool in Notepad++). After loading a Japanese subtitle file, go to Edit > Replace. Then select the “Regular Expression” option, type （.+?） as your search term (make sure you use full-width Japanese parentheses, not English parentheses; the ? keeps the match from running on to a second set of parentheses on the same line), leave the replacement empty, and press “Replace All.” That should get rid of any remaining Japanese hearing impaired text.
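The same cleanup can be sketched in Python if you are already processing the files with a script. This is only an illustration; the function name is hypothetical, and it assumes the OCR output really does use full-width parentheses around the hearing impaired text.

```python
import re

# Full-width parentheses, as used around sound and speaker notes in Japanese subtitles.
BRACKETED = re.compile(r"（[^（）]*）")


def strip_hearing_impaired(text):
    """Remove every full-width parenthesized note from the subtitle text."""
    return BRACKETED.sub("", text)
```

A subtitle that consisted of nothing but a bracketed note will be left empty by this, so you may still want Subtitle Edit to clear out blank entries afterwards.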

Hukumusume Fairy Tale Collection

In my previous post, I had mentioned a website called the Hukumusume Fairy Tale Collection. While I suppose this site is fairly well-known among students of Japanese, I would like to take a bit of time to talk about it, because it is an absolutely massive site with a ton of content, and it can be easy to get lost, because the navigation menus change depending on what part of the site you are on. I think a lot of people might not even know about all of the different things offered on the site.

Put simply, this is a site with a lot of classic children’s stories. They have stories with text-only, stories with audio, and even picture-book stories. There is also a section with many stories that have English translations. They also have multiple different sections which all contain different stories for every single day of the year. There are thousands of stories here. Now, this might not necessarily be an ideal resource for absolute beginners in Japanese, because a lot of the stories may use somewhat old-fashioned words and ideas that you aren’t familiar with. However, with the sheer amount of content offered, there are plenty of really basic and easy stories to find if you are willing to dig around for a bit.

I don’t want to go into too much detail about exactly how to navigate the site and all, but I did find a pretty good write-up on another site here: http://nihongo-e-na.com/eng/site/id522.html

I’ll also point out a few direct links to what I find to be the most useful pages on the site–the daily stories:

Each of the categories above has a story for every day of the year! They are mostly Japanese-only though.

For those whose Japanese is more at the beginner level, I would suggest starting with the stories that have an English translation available, though there are fewer than 50 of them at the time of this writing. You can find all the English-translated stories catalogued here: http://hukumusume.com/douwa/English/index.html

I also came across a page which lists stories according to (Japanese) grade level: http://www.hukumusume.com/douwa/0_6/0nen.html

For the past several weeks, I have been making it a point to read at least one story every day. I found it sort of annoying to navigate through a bunch of links every day, particularly if I was reading on my phone, so I wrote a simple script that will automatically take me to today’s story for the “日本昔話 – Japanese Classical Stories”. If that story doesn’t look interesting to me, the right-hand sidebar beside the story will contain links to today’s story from all of the other categories as well, and it even has links to a lot of daily trivia that you can read (the first section of the sidebar is trivia, the second section is the stories).

Here is a link to my script which takes you directly to today’s story: http://www.nihongonobaka.com/Files/fairytale.php

If you want to use it, just bookmark that link. On my phone, I had to temporarily disable my internet connection in order to bookmark it, because it immediately redirects once you click the link.
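The idea behind a redirect script like this is simple: build an address from today’s month and day, then send the browser there. My script is PHP, but here is a rough Python sketch of the address-building part. The URL template is a made-up placeholder, not the site’s real address pattern, and the function name is only for illustration.

```python
from datetime import date

# Hypothetical placeholder; substitute the real pattern of the daily-story pages.
URL_TEMPLATE = "http://example.com/stories/{month:02d}/{day:02d}.htm"


def todays_story_url(today=None, template=URL_TEMPLATE):
    """Return the story address for `today` (defaults to the current date)."""
    today = today or date.today()
    # Zero-padded month and day, e.g. March 5th becomes 03/05.
    return template.format(month=today.month, day=today.day)
```

A real version would then answer the request with a redirect to that address, which is why the link jumps away as soon as you click it.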



Study Subtitled Videos Using PotPlayer

I previously wrote about studying Japanese through the use of Anime, Dramas, and Movies, but I always felt that there was still a step missing from the equation. I mean, sure, you can use great tools like Subs2SRS to ease the creation of Anki cards, but what about the process of actually watching the video? How do you efficiently look up words and try to understand sentences while you are watching it? This was a question that bugged me for a long time. While there are some solutions, such as opening up the script in a text file and following along, loading the video and script into Aegisub to go line by line, or even rigging up AGTH to capture the text output from the player, all of these methods are pretty clunky and leave something to be desired.

But just recently, I came across PotPlayer, and discovered that it actually makes the whole process as smooth as you could ever imagine! It feels like some of the features in this player were practically designed for someone who is learning a language! A few great features that I love about it:

  • Click on words to either perform a search or copy it to the clipboard
  • Copy the entire subtitle line to the clipboard, can be assigned to a shortcut key
  • Shortcuts to seek to the next/previous subtitle, allowing you to easily replay a line
  • Subtitle explorer displays all lines in a separate window for you to browse and seek to a particular line
  • Load multiple subtitle streams, so you can have Japanese and English at the same time
  • It remembers the last file you had open as well as your position within it, making it easy to pick up where you left off
  • Has options for adjusting the synchronization of subtitles, as well as the font
  • Is an otherwise completely full featured player, with tons of options and advanced features

I honestly don’t know what else I could want or expect in regards to watching subtitled video. This works great in conjunction with JGlossator, which will automatically look up helpful information on any Japanese subtitles that get copied to the clipboard.

I’ve put together a short video showing how to get started using PotPlayer to study Japanese from subtitles:

Do you know any other software or tools to help with studying Japanese while watching videos? Let me know in the comments!

echo.html – Rikai-chan assistant

Several years back, I put together a simple html page that I have found very helpful over the years. All it does is let you type or paste text into a box, and it outputs that same text in a larger font size below the text box.

This serves two purposes. Mainly, it gives you a place to paste text so you can use Rikai-chan on it. Helpful for when you are copying and pasting from a PDF or Word document, or some random app with Japanese text. I also use it a lot when I am writing, because I often forget if some of the words I am writing are correct or not. This can be simpler and more convenient than pulling up gmail or pastebin or something, and you can use it without an internet connection.

The other function is to simply make the text big enough that you can easily read it. Japanese text (kanji particularly) is, in my opinion, quite hard to read in comparison to English text. If you don’t immediately recognize a kanji, you might have to strain to discern the strokes or radicals that it is made up of. Sometimes I wonder if the reason so many Japanese people have bad eyesight is due to having to strain to read kanji?

But anyways, here it is. You can just right-click on the link and save it to your desktop.

Japanese Games

Learn Japanese Through Video Games

Many people would like to be able to integrate video games into their Japanese studies, but it’s often easier said than done. It’s very easy to feel like you are in well over your head when it comes to most games. There are a lot of different factors that can make things more difficult than you would imagine, so I would like to discuss some of these things, and talk about what you should look for when trying to choose a game to get started with. And before we get into this, let’s be honest here–most games are fairly difficult. Your Japanese ability needs to be at least equivalent to JLPT N4 level before even a small handful of games will begin to be accessible to you. For the majority of games, your Japanese probably needs to be at least at N2 level. But the purpose of this article is to try to break down some of those barriers, and open up more games to less advanced learners. So, if you are ready to learn Japanese through video games, let’s get to it!

Choosing a game

First of all, the most important thing is to choose a game that is on your level. If you can’t understand the majority of the game without having to look things up constantly, then you won’t learn much. If you feel lost, then it will only make you frustrated, and you will start trying to play the game without even reading most of the text. As I said above, your Japanese level probably needs to be about equivalent to JLPT N4 level before you even begin to think about playing through anything in Japanese, and N3 equivalency is probably more realistic. So in other words, you should be fairly proficient in Japanese before you can even hope to understand a real game! Prior to that, you will be stuck just using a handful of games specifically designed for learning Japanese, though the effectiveness and entertainment value of most of those is rather questionable. And when you do get to the point you can begin playing games, you are likely going to have to focus on games aimed at children first.

If you are still in the very beginning stages of learning Japanese, you might want to look at Influent, which is a game designed to help teach you about 400-500 words of beginner vocabulary. The Nintendo DS also had a Japanese learning game called My Japanese Coach, which teaches beginner level Japanese. Some critics have said that My Japanese Coach does contain a few errors, primarily regarding kanji stroke order, but I believe it should be alright for the most part. The real question though, is whether you should even bother with these as opposed to learning through traditional means? Since I learned through traditional means, I really can’t answer that for you. But, give them a shot if you like.

Now, when your Japanese ability starts coming together and you think you might be reaching the point where you could try playing something, you are going to have to think very carefully about what you will be able to play. If you are going to be spending money on a game, you definitely want to do your research before plopping down a large sum on something that might be way out of your league! First of all, you want a game that has a fairly large amount of text, and is of a reasonable length. This cuts out a lot of the classic games from the NES era, and cuts out several genres of games almost entirely. We are mostly going to be limited to things like RPGs or adventure games. You also want to make sure the game displays the text onscreen, and lets you advance it with a button press. This cuts out many things like action games which might have a strong story focus. After all, if you don’t have time to read the text, what are you hoping to gain from this? The difficulty of the language used in the game is also critical. A tactical RPG based on historical storylines or a sci-fi epic might not be the best choices to start off. But something that has a simplistic story involving more typical everyday things might be a much better option. Look for something where you have about 80% or better comprehension.

Many older games have technical limitations that can make learning from them difficult. For instance, in the NES/Famicom era, cart sizes were too limited to display kanji most of the time, or even a large amount of text in most cases. Things would also sometimes have to be written strangely in order to fit in limited space. Throughout the 16-bit to 64-bit eras, things improved a lot, but games were still often produced at low resolutions. This means that though they began using kanji in most games, it can often be extremely difficult to read, as the strokes often just blur together. If you need to try looking up a kanji in a dictionary, you might not even be able to do so, because you are unsure exactly what it’s supposed to look like. And don’t even expect furigana!

More modern games bring a lot of improvements that often make them better to learn from. Higher resolution text, furigana on occasion, and even voice acting all serve to make things easier on the learner. As good as that sounds, these newer games can bring their own problems as well. For instance, the 3DS system is region locked, meaning that for many people the only options to play a Japanese game on it are either to buy a separate Japanese system or to resort to piracy, which could possibly get your system banned from Nintendo’s online services. This is a real shame too, because it has several games which are quite nice for learning from, such as Youkai Watch, which not only uses mostly simple Japanese, but has furigana as well.

Choosing a good game for learning Japanese turns out to be a pretty difficult task. After all, game creators are definitely not making their games with language learners in mind! But a little research up front will go a long way toward preventing a lot of frustration down the road. Now, let’s move on to how to actually go about playing and learning from Japanese games.

Use Scripts

When it comes to playing games in Japanese, scripts are your savior. Having a text file containing all of the Japanese text from the game you are playing makes things so much more comfortable, as it’s a lot easier and quicker to look up words and phrases that you might not know. But… there don’t seem to be a whole lot of Japanese scripts out there! This post on the Koohii forums links to a handful, but many of those games aren’t using the easiest Japanese to begin with. If you want to try your luck at searching for Japanese scripts online, the word for “script” is セリフ集.

If you can’t find a Japanese script, then the next best thing is an English script. While it doesn’t make looking up words any easier, it will help you to understand the storyline and give you some hints as to the meaning of some words and phrases that you have difficulty with. Loading up an English script into your tablet or phone and keeping it by your side while you play through a game can be a big help. You can sometimes find game scripts on GameFAQs, but it’s fairly hit-or-miss.

Another cool site is Learning Languages Through Video Games. This site has translated scripts for several games, mostly for the NES and SNES. Most of the games covered have very small amounts of text (Mario 3 isn’t exactly known for its intricate story!) so actually playing these games in Japanese probably won’t be terribly beneficial. But it can still be cool to go back and look through the translations for games that you might have played in your childhood.

Let’s YouTube!

But what if you can’t even find a script for the game you want to play? Well in this case, we can turn to “Let’s Play” videos on YouTube! While you play the Japanese game, you can follow along with a Let’s Play of the English version of the game. Or if you are lucky, you might find someone playing through the Japanese game while translating it to English in real time, such as on the RisingFunGaming channel. Many Let’s Play videos tend to have a lot of commentary over the gameplay, so if you prefer not to hear that, you might also want to search for videos with “longplay” in the title. These videos generally don’t have commentary.

If you really aren’t that keen on actually playing games, you might find that watching other people play through them is just as satisfying. By simply watching a Let’s Play or longplay of a Japanese game, you can pause and rewind in YouTube to take things at your own pace, and learn from a game just as effectively as you would by playing it on your own.

And for improving your listening skills, maybe you want to watch some Let’s Plays done by actual Japanese gamers? Just search on YouTube for the word 実況 along with the Japanese title of the game you are looking for!

Or maybe you are more into the competitive side of gaming? The YouTube channel Shi-G features Japanese Smash Brothers tournament play, sometimes with commentary, sometimes without. Adding 大会 to your YouTube searches can bring back results featuring tournament gameplay, though many of the videos tend to be from tournaments outside of Japan, so they may not be useful.

Visual Novels are games too! Sort of…

Ah, visual novels. The finest pornographic literature that the world of gaming has to offer! If you are over 18 and have become fairly proficient in Japanese, then these might be a good option. While most of these really aren’t suited to be classified as “games”, they usually do tend to offer some amount of interactivity and branching story paths. And to be fair, they aren’t all pornographic, and some of them even make their way onto mainstream gaming consoles. Furthermore, these types of games are great for learning Japanese, not only because there are a ton of them and they all have massive amounts of text, but there are also some amazing tools available to make reading them so much easier!

By using Interactive Text Hooker and Translation Aggregator, you can extract the text into a copy/pasteable format and get dictionary and translation assistance in real time! A newer application called Visual Novel Reader also looks like it has some amazing features, including many of the things you can get from the previous two tools, in addition to other features like crowd-sourced translations!

To get started with setting up the software and choosing some easy visual novels, you might want to check out this article at Visual Novel Tea Party, or the visualnovels subreddit, which has a list of easy VNs to start with, and a guide to getting your software set up.

Finally, if you just want to check this stuff out without having to spend a lot of time figuring out how it all works, you might want to take a look at The Asenheim Project, which has several older Visual Novels emulated through JavaScript, allowing you to play them through your web browser and look up words using Rikaichan. Most of the VNs listed here have English translations available, so you could open two browser windows side-by-side, with the Japanese version in one and English in the other!

Remember to Have Fun

Trying to play games in Japanese, especially when you aren’t that good at Japanese, can be really stressful. If it’s not working for you, don’t force it! Step back for a bit and study some more, and maybe you will be ready later on. Games are supposed to be fun, so don’t let learning take all of the fun out of them!

And every now and then, you just need to unwind and relax. Here are some things to check out when you need a break:

Game Center CX

Most gamers will probably enjoy watching Game Center CX, a Japanese television show dedicated to retro gaming. It’s been going for over 10 years, and most of the episodes have been fansubbed! While the Japanese tends to be on the more difficult side in my opinion, it’s a fun distraction that gives a lot of insight into the Japanese viewpoint on retro games.

Legends of Localization

A cool website that I stumbled across is Legends of Localization, which takes a look at various questions regarding the translation and localization of games, and goes back to look at the original Japanese, to see what the games were really saying. It also has some in-depth comparisons between the English and Japanese versions of several games, including Super Mario Bros., The Legend of Zelda, Earthbound, Final Fantasy IV, and more!

Alright, that’s it for now! In my next post, I will be listing my top 11 games for Japanese learners!