PNG2SRT (tool to OCR image subtitles)

Download on GitHub

This tool performs OCR (optical character recognition) on XML/PNG subtitles and outputs the result as an SRT file. It can be used for subtitles obtained from DVD, Blu-ray, and Netflix. The OCR is done by the Google Cloud Vision API, which has very good accuracy. This program is based on a Python script originally posted by zx573 on the kanji koohii forums.

Before using this program, you may need to get your subtitles into the XML/PNG format. I have previously written a guide on extracting Netflix subtitles here.

For DVD or Blu-ray, I’m not going to write a detailed guide on ripping subtitles from the disc, as there are plenty of other guides out there on the internet. I will assume that you can figure out how to obtain your subtitles in SUB/IDX or SUP format. From there, I recommend using a Windows program called Subtitle Edit to convert them into XML/PNG format. There may be other software that can do this, but Subtitle Edit is the one I am most familiar with.

Using Subtitle Edit to convert DVD or Blu-ray subs to XML/PNG

The File menu in Subtitle Edit has several options for importing subtitles that are in SUB/IDX or SUP format. Choose the appropriate one, and you will come to an import screen. From here, right-click on one of the subtitle lines, then select Export > BDN xml/png.

On the next screen that comes up, select “export all lines”, and select a folder to save to.

Now you should have a folder containing a bunch of PNG images and an XML file. The next step is to create an API key on the Google Cloud Platform.
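If you are curious what is inside that XML file, here is a rough Python sketch that reads one using only the standard library. It assumes the usual BDN layout, where each Event element carries InTC/OutTC timecodes in HH:MM:SS:FF form (FF being a frame number) and a Graphic child naming the PNG. The function names and the sample XML are made up for illustration, the conversion ignores drop-frame timecode, and this is not necessarily how PNG2SRT reads the file.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed BDN layout (one subtitle event).
SAMPLE_XML = """<BDN Version="0.93">
  <Description>
    <Format VideoFormat="1080p" FrameRate="24" DropFrame="False"/>
  </Description>
  <Events>
    <Event Forced="False" InTC="00:00:01:12" OutTC="00:00:03:00">
      <Graphic Width="600" Height="80" X="660" Y="900">0001.png</Graphic>
    </Event>
  </Events>
</BDN>"""


def timecode_to_ms(timecode, fps):
    """Convert an HH:MM:SS:FF timecode to milliseconds at the given frame rate."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    whole_seconds = hours * 3600 + minutes * 60 + seconds
    return round((whole_seconds + frames / fps) * 1000)


def read_events(xml_text):
    """Return a list of (start_ms, end_ms, png_name) tuples, one per Event."""
    root = ET.fromstring(xml_text)
    fmt = root.find("./Description/Format")
    # Fall back to 25 fps if the frame rate is missing (an arbitrary assumption).
    fps = float(fmt.get("FrameRate")) if fmt is not None else 25.0
    events = []
    for event in root.iter("Event"):
        graphic = event.find("Graphic")
        if graphic is None or not graphic.text:
            continue  # skip events that do not reference an image
        events.append((
            timecode_to_ms(event.get("InTC"), fps),
            timecode_to_ms(event.get("OutTC"), fps),
            graphic.text.strip(),
        ))
    return events


if __name__ == "__main__":
    for start_ms, end_ms, png_name in read_events(SAMPLE_XML):
        print(start_ms, end_ms, png_name)
```

Each tuple gives the start time, the end time, and the name of the image that would need to be sent through OCR.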

Create an API Key for Google Cloud Vision API

Google’s OCR is by far the most accurate I have seen, and works quite well. It is also free for a limited amount of use each month. According to their current pricing structure, you can OCR up to 1,000 items per month for free. My program can batch several PNG images into a single item, so you should be able to do several episodes or movies in a single month without having to pay anything. Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service.

If you sign up for the Google Cloud Platform, then after logging in, you need to enable the Cloud Vision API and generate an API key.

  1. In the left-hand menu, select APIs & Services > Dashboard
  2. Select Enable APIs & Services
  3. In the search box, type “vision”, and then select Google Cloud Vision API.
  4. Select Enable. It may walk you through setting up a billing profile at this point if one has not been created already. Again, there is no obligation to actually pay anything, as you can use this API a certain amount for free each month, and you may get free credits when signing up.
  5. Back at the APIs & Services Dashboard, select Credentials > Create Credentials > API Key.
  6. Once you have generated the API key, be sure to copy it or keep it open in your browser so you can access it later.

Use PNG2SRT to OCR the images

Now, we can use PNG2SRT to send the subtitle images through the Cloud Vision API.
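For anyone who wants to see roughly what such a request looks like, below is a minimal Python sketch of a call to the Cloud Vision REST endpoint (images:annotate) with an API key. The helper names are my own for illustration, and the choice of the TEXT_DETECTION feature is an assumption; PNG2SRT may build its requests differently.

```python
import base64
import json
import urllib.parse
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes, language="ja"):
    """Build the JSON body for a single-image text detection request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                # Hint the expected language to the OCR engine.
                "imageContext": {"languageHints": [language]},
            }
        ]
    }


def send_request(payload, api_key):
    """POST the payload to the Vision API and return the decoded JSON reply."""
    url = VISION_URL + "?key=" + urllib.parse.quote(api_key)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send_request needs a valid key and a network connection, and each image sent this way counts against your monthly quota.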

Download

Version 1.0.1 – May 12, 2018

Download on GitHub

Download the appropriate version for your computer, and then extract the archive.

Next, you need to paste your API Key into a text file named API_KEY.txt located in the same folder as the application (the file should contain ONLY your API key, and no other text).
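As an illustration of why the file must contain only the key, here is a small Python sketch of how a program might load it. The function name is hypothetical and this is not PNG2SRT's actual code; it just strips surrounding whitespace and rejects anything that looks like extra text.

```python
from pathlib import Path


def load_api_key(path="API_KEY.txt"):
    """Read the API key from a text file, rejecting files with extra content."""
    # "utf-8-sig" silently drops the byte order mark some Windows editors add.
    text = Path(path).read_text(encoding="utf-8-sig").strip()
    if not text:
        raise ValueError("API key file is empty")
    if any(ch.isspace() for ch in text):
        raise ValueError("API key file should contain only the key")
    return text
```

A trailing newline is harmless here, but a second line or a label like "key: ..." would be rejected.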

When you run the application, it should look like this:

First, you need to make sure that your API Key is displayed correctly in the top area. If not, make sure you did the previous step correctly.

Then, you just select a folder containing XML/PNG files, which is what will be converted to SRT.

Note: You may get an error if the folder name contains non-ASCII (Unicode) characters, such as Japanese text. In that case, please rename the folder using only English characters.
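If you want to check a folder name ahead of time, a quick Python test like the one below will tell you whether it contains anything outside plain ASCII. The function name is just for illustration.

```python
def has_non_ascii(path):
    """Return True if the path contains any character outside the ASCII range."""
    return any(ord(ch) > 127 for ch in path)


if __name__ == "__main__":
    for folder in ["subs/episode01", "字幕/第1話"]:
        print(folder, has_non_ascii(folder))
```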

There is also an option to select the language that you want Google to recognize. It defaults to Japanese, because that is what I use, but you can select whichever language you need. You can find a full list of language codes here.

The only other option is the chunk size. The default of 15 is usually fine. If you press the start button, and the program appears to begin working but then gives you an error message partway through, you might need to decrease the chunk size to a smaller value like 10 or even 5. I had previously stated that higher values here will use less of your credit/money on Google. This was false, and I apologize for the confusion.
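To give a feel for what a chunk size means, here is a generic Python sketch that splits a list of image names into groups of at most 15. This only illustrates the general idea of chunking; it is an assumption about how the setting behaves, not a copy of what PNG2SRT does internally.

```python
def chunked(items, size=15):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    names = [f"{n:04d}.png" for n in range(1, 41)]  # 40 made-up file names
    for group in chunked(names, 15):
        print(len(group), group[0], group[-1])
```

With 40 images and a chunk size of 15 you get groups of 15, 15, and 10, so a smaller chunk size simply means more, smaller groups.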

After you press start, if all goes well, the program should run and it will output an SRT file inside your input folder.
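In case you have never looked inside an SRT file, it is just numbered blocks of text, each with a start and end time written as HH:MM:SS,mmm. The Python sketch below writes that format from a list of (start, end, text) cues in milliseconds. The function names and the sample cues are made up for illustration.

```python
def format_timestamp(ms):
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Render (start_ms, end_ms, text) cues as SRT, numbered from 1."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(1500, 3000, "こんにちは"), (4000, 5250, "ありがとう")]))
```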

Extracting Subtitles from Netflix

Updated 3/18/2018

Having subtitle scripts from TV shows that you are watching is an excellent study aid. Not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people tend to watch Netflix more than a lot of the traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

When I originally wrote this post, it was because I had discovered a way of ripping Japanese subtitles from Netflix, which to my knowledge, no one else had figured out how to do at the time. My method was long and clunky though. But thanks to the help of several users, we have eventually arrived at newer methods that are MUCH easier and better. So now, it is really quite simple to rip subtitles from Netflix, to the point that just about anyone can do it.

Download Subtitles I’ve Already Ripped

I have already downloaded subtitles from over 30 Japanese shows and movies that are available on US Netflix, and you can grab them all here.

 

How to rip Japanese Subtitles from Netflix

Getting the subtitles from Netflix is quite simple now, due to a tool that does all the hard work for us! 

First, you will need to download an addon for your web browser which allows you to run userscripts. One such addon is called ViolentMonkey, and it works with either Firefox or Chrome (as well as some other browsers). There are several other similar addons as well, such as TamperMonkey and GreaseMonkey. These all mostly do the same thing, so just pick one. A simple Google search for any of those titles should easily lead you to a page that lets you install it in your web browser.

Next, you want to install the Netflix Subtitle Downloader. After installing it, you will notice some new options appear inside the subtitle selection menu on the Netflix website. Simply select the subtitle language that you want, and then click on one of the download buttons. It’s that simple! You might need to give it a moment after clicking the button while it begins downloading.

Note: On my system, I have run into some issues where the subtitle downloader will sometimes try to download the subtitle for the previous video that I was looking at. If you run into this issue, this can be resolved by hitting the “refresh” button in your browser after loading a video.

For many languages, especially ones with simple character sets like English and Spanish, the subtitles are downloaded as SRT files. However, for languages with more complex character sets like Japanese, Chinese, or Korean, the subtitles are stored as images. So in order to convert these into a text format, you need to perform OCR (optical character recognition).

To assist in performing this OCR, I have created a tool called PNG2SRT which makes it simple. You can see how to use PNG2SRT here.

New Method (download as text)

There is now a new method of downloading the subtitles directly as text rather than as images. However, the new method doesn’t work on every show. The method listed above will work on any show that has Japanese subtitles, so it is still useful in many cases. You can read about the new method of obtaining subtitles here.

Hukumusume Fairy Tale Collection

In my previous post, I had mentioned a website called the Hukumusume Fairy Tale Collection. While I suppose this site is fairly well-known among students of Japanese, I would like to take a bit of time to talk about it, because it is an absolutely massive site with a ton of content, and it can be easy to get lost, because the navigation menus change depending on what part of the site you are on. I think a lot of people might not even know about all of the different things offered on the site.

Put simply, this is a site with a lot of classic children’s stories. They have stories with text only, stories with audio, and even picture-book stories. There is also a section with many stories that have English translations. They also have multiple sections that each contain a different story for every single day of the year. There are thousands of stories here. Now, this might not necessarily be an ideal resource for absolute beginners in Japanese, because a lot of the stories use somewhat old-fashioned words and ideas that you may not be familiar with. However, with the sheer amount of content offered, there are plenty of really basic and easy stories to find if you are willing to dig around for a bit.

I don’t want to go into too much detail about exactly how to navigate the site and all, but I did find a pretty good write-up on another site here: http://nihongo-e-na.com/eng/site/id522.html

I’ll also point out a few direct links to what I find to be the most useful pages on the site–the daily stories:

Each of the categories above has a story for every day of the year! They are mostly Japanese-only though.

For those whose Japanese is more at the beginner level, I would suggest starting with the stories that have an English translation available, though there are fewer than 50 of them at the time of this writing. You can find all the English-translated stories catalogued here: http://hukumusume.com/douwa/English/index.html

I also came across a page which lists stories according to (Japanese) grade level: http://www.hukumusume.com/douwa/0_6/0nen.html

For the past several weeks, I have been making it a point to read at least one story every day. I found it sort of annoying to navigate through a bunch of links every day, particularly if I was reading on my phone, so I wrote a simple script that will automatically take me to today’s story for the “日本昔話 – Japanese Classical Stories”. If that story doesn’t look interesting to me, the right-hand sidebar beside the story contains links to today’s story from all of the other categories as well, and it even has links to a lot of daily trivia that you can read (the first section of the sidebar is trivia, the second section is the stories).

Here is a link to my script which takes you directly to today’s story: http://www.nihongonobaka.com/Files/fairytale.php

If you want to use it, just bookmark that link. On my phone, I had to temporarily disable my internet connection in order to bookmark it, because it immediately redirects once you click the link.
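The idea behind a script like this is simple: take today’s month and day and drop them into the address of the story page. Here is a rough Python sketch of that idea. The URL template is a placeholder I made up (it is not the real address pattern of the site), and this is not the actual script linked above.

```python
from datetime import date

# Hypothetical template; substitute the real pattern of the page you want.
URL_TEMPLATE = "http://example.com/stories/{month:02d}/{day:02d}.htm"


def story_url(day=None, template=URL_TEMPLATE):
    """Build the address of the story page for the given date (default: today)."""
    day = day or date.today()
    return template.format(month=day.month, day=day.day)


if __name__ == "__main__":
    print(story_url())
```

A web script would then send the browser to that address with a redirect instead of printing it.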

 

PotPlayer

Study Subtitled Videos Using PotPlayer

I previously wrote about studying Japanese through the use of Anime, Dramas, and Movies, but I always felt that there was still a step missing from the equation. I mean, sure, you can use great tools like Subs2SRS to ease the creation of Anki cards, but what about the process of actually watching the video? How do you efficiently look up words and try to understand sentences while you are watching it? This was a question that bugged me for a long time. There are some solutions, such as opening up the script in a text file and following along, loading the video and script into Aegisub to go line by line, or even rigging up AGTH to capture the text output from the player. However, all of these methods are pretty clunky and leave something to be desired.

But just recently, I came across PotPlayer, and discovered that it actually makes the whole process as smooth as you could ever imagine! It feels like some of the features in this player were practically designed for someone who is learning a language! A few great features that I love about it:

  • Click on words to either perform a search or copy it to the clipboard
  • Copy the entire subtitle line to the clipboard, can be assigned to a shortcut key
  • Shortcuts to seek to the next/previous subtitle, allowing you to easily replay a line
  • Subtitle explorer displays all lines in a separate window for you to browse and seek to a particular line
  • Load multiple subtitle streams, so you can have Japanese and English at the same time
  • It remembers the last file you had open as well as your position within it, making it easy to pick up where you left off
  • Has options for adjusting the synchronization of subtitles, as well as the font
  • Is an otherwise completely full featured player, with tons of options and advanced features

I honestly don’t know what else I could want or expect in regards to watching subtitled video. This works great in conjunction with JGlossator, which will automatically look up helpful information on any Japanese subtitles that get copied to the clipboard.

I’ve put together a short video showing how to get started with using PotPlayer to study Japanese from subtitles:

Do you know any other software or tools to help with studying Japanese while watching videos? Let me know in the comments!

echo.html – Rikai-chan assistant

Several years back, I put together a simple html page that I have found very helpful over the years. All it does is let you type or paste text into a box, and it outputs that same text in a larger font size below the text box.

This serves two purposes. Mainly, it gives you a place to paste text so you can use Rikai-chan on it. This is helpful when you are copying and pasting from a PDF or Word document, or some random app with Japanese text. I also use it a lot when I am writing, because I often forget whether some of the words I am writing are correct. This can be simpler and more convenient than pulling up Gmail or Pastebin or something, and you can use it without an internet connection.

The other function is to simply make the text big enough that you can easily read it. Japanese text (kanji particularly) is, in my opinion, quite hard to read in comparison to English text. If you don’t immediately recognize a kanji, you might have to strain to discern the strokes or radicals that it is made up of. Sometimes I wonder if the reason so many Japanese people have bad eyesight is due to having to strain to read kanji?

But anyways, here it is. You can just right-click on the link and save it to your desktop.