anime-manga.jp Anki Decks

Some time ago, The Japan Foundation created a website to help students of Japanese learn the type of Japanese that is often heard in anime and manga. While it’s got some decent content, I’ve rarely heard anyone mention the site. That’s probably because they stuck all of the content into a crappy Flash application. You can’t view it on mobile, you can’t copy and paste text, you can’t resize it, you can’t do ANYTHING useful.

So, I dumped some of their content into Anki decks so that it would be possible to actually learn something from it. I have made a deck containing phrases, and a deck with grammar points. This is only a portion of the total content from the site, but I felt that these parts would probably be the most useful and work the best in a flashcard format.

The grammar deck in particular is a bit dense with all of the information available, but I thought it best to include too much info rather than too little. You can of course customize which fields appear on your cards, since Anki gives you complete flexibility to display the cards as you like.

One cool aspect of the site is that it has 8 different Japanese character archetypes who all speak differently. I have kept this aspect in the flashcards by indicating which character each card is for. There is also full audio, so you can hear the personal spin that each character puts on the phrases.

After studying the cards, there is still some cool stuff worth going back to the website for. For instance, they have several manga stories that you can read, which utilize all of the phrases and grammar.

Content difficulty is probably Upper Beginner – Intermediate. You should probably have at least a good command of JLPT N5 grammar before tackling these.

Grammar Deck

Phrase Deck (updated 5/3/17 to fix image links)

http://www.anime-manga.jp/


Extracting Subtitles from Netflix

Having subtitle scripts from TV shows that you are watching is an excellent study aid. Not to mention that they can be used with Subs2SRS to easily import sentences into Anki! These days, many people tend to watch Netflix more than a lot of the traditional media. I’ve also seen numerous people talking about how the Netflix Original “Terrace House” is great for Japanese listening practice, because it is unscripted and captures natural dialog.

While it’s long been lamented that there was no way to download or rip the Japanese subtitles from Netflix (I even said as much in a previous post about Netflix), I have recently discovered a way!

In this post, I will provide a download for the subtitles that I have ripped, and I will also provide instructions on how to rip them yourself from other shows. However, the process of ripping subtitles is quite technical, and is probably not something that everyone can do.

Already Ripped Subtitles

First, some important details about Netflix subtitles: In most languages, the subtitles are just in a standard text-based subtitle format. So if you get English subtitles, you can just open them up in a text editor. With Japanese subtitles, however, the subtitles are all stored as images. This means that in order to do things like copy and paste the text to look up words, they would first have to be converted into text by OCR, which is unfortunately not perfect.

All of the subtitles that I have ripped have been OCR’ed using the Google Cloud Vision API. This is likely the most accurate Japanese OCR technology available at the moment, but the text still does contain a few mistakes here and there. So please keep this in mind if you are using these subtitles to study. If something looks wrong, it probably is. Go watch it on Netflix to see what the correct subtitle would look like.

You can download my Netflix subtitle pack from here (Mega) or here (MediaFire).

It contains subtitles in both English and Japanese for all of the following shows:

  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee’s Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)

If Netflix has another show that you would like Japanese subtitles for, or if you would like audio to accompany the subtitles, then you will have to rip it yourself. As I said, the process is quite technical, so I wouldn’t attempt it unless the instructions below at least halfway make sense to you.

How to rip Japanese Subtitles from Netflix

First, you need to choose a show that actually has Japanese subtitles available. I show how to do this in my previous post about Netflix.

Once you have found a show that has Japanese subtitles, you need to see if you will be able to download it. At the time of this writing, Netflix allows Android and iOS devices to download select shows to your device. You cannot download shows directly on a PC. I have also tested several Android emulators on PC, and did not have any success. I have no experience with iOS devices, so I cannot say for certain whether this is possible there. So basically I can only confirm that the following steps can be done on an Android device. I also believe your Android device will need to be rooted, but I’m not certain. If anyone manages to do this without a rooted device, please let me know.

So, assuming you have a rooted Android device, you will want to find a show that both has Japanese subtitles and allows the episodes to be downloaded. Then just download all of the episodes onto your device. Don’t download more than one series at a time! This is because the filenames do not contain the show title, so it’s difficult to figure out which files go with which show. If you stick to one show at a time, you won’t run into this problem.

Next, you will want to have the ADB tool which lets you transfer files between your Android device and your PC. These files will be hidden to a standard file browser, so that’s why you need this tool. ADB can be downloaded as part of Android’s standalone SDK Platform Tools. You also need to Enable USB Debugging on your Android device.

Now, you need to find where Netflix downloaded the files onto your Android device. A file manager app such as Amaze should let you find them. On my device (it’s probably the same on most devices) the files are located at /sdcard/Android/data/com.netflix.mediaclient/files/Download/.of
Inside that folder there will be a separate subfolder for each episode, and each one of those subfolders will have a name made up of seemingly random numbers. You can use the ADB tool to copy all of the subfolders to your PC. You first need to open up a command prompt in the folder that adb.exe is stored in, and then do something like this:

adb pull /sdcard/Android/data/com.netflix.mediaclient/files/Download/.of

After some time, all of the files will be copied to your PC (unless you get an error message). So now just browse into the “.of” folder on your PC to find all of the subfolders for each episode. The folders should sort in the correct order, as long as you sort by name. So the first folder should be the first episode, the second folder will be the second episode, and so on. Let’s take a look at the different types of files that you can find in each folder.

  • .manifest – Contains some metadata about the files. Not really useful.
  • .nfi – Unknown – I’m not sure about the contents of this file, but it does not appear to be useful.
  • .nfv – Netflix Video – Contains the video stream. It is encrypted so it is not much use to us.
  • .nfa – Netflix Audio – Contains an AAC audio stream. Change the file extension to .m4a and you should be able to play it. Can be used with Subs2SRS.
  • .nfs – Netflix Subtitle – Contains the subtitles. If the file size is smaller (about 10-100 KB) it is usually a text file and may contain the subtitles for English or some other language. Change the extension to .xml and you can open it in a text editor. If the file size is larger (a few MB), it is the Japanese subtitles. Change the extension to .zip, and you will be able to extract the contents.
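If you have a lot of episodes, renaming every file by hand gets tedious. Here is a rough Python 3 sketch of the renaming rules from the list above. The function names and the output folder are made up for illustration, and instead of guessing from the file size it uses zipfile.is_zipfile to tell image-based subtitles from text ones, which is my own substitution rather than anything Netflix documents. It also assumes each file has a name of the form something.nfs, something.nfa, and so on.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List, Optional


def target_extension(path: Path) -> Optional[str]:
    """Return the extension to copy `path` under, or None to skip the file."""
    suffix = path.suffix.lower()
    if suffix == ".nfa":
        # Audio stream: playable once it carries an .m4a extension.
        return ".m4a"
    if suffix == ".nfs":
        # Assumption: image-based subtitles are zip archives, text ones are XML.
        return ".zip" if zipfile.is_zipfile(path) else ".xml"
    # .manifest, .nfi and .nfv are not useful here.
    return None


def convert_episode_folder(folder: Path, out_dir: Path) -> List[Path]:
    """Copy the useful files of one episode folder into out_dir with new extensions."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(folder.iterdir()):
        if not src.is_file():
            continue
        ext = target_extension(src)
        if ext is None:
            continue
        dst = out_dir / (src.stem + ext)
        shutil.copy2(src, dst)  # copy, so the pulled originals stay untouched
        copied.append(dst)
    return copied
```

Calling convert_episode_folder(Path(".of/<episode folder>"), Path("episode01")) would leave the originals alone and give you renamed copies to work with.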

After finding the Japanese subs and changing the extension to .zip, extract them into a folder, and then rename the folder so you know what episode it is. You will have many PNG files which are the subtitle images, and you will also have a file named “manifest_ttml2.xml” which has all of the timing data. Congratulations, you have successfully extracted the subtitles! But for them to be a little more useful, we will need to OCR them.
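The extraction step can also be scripted. This is only a sketch in Python 3: the function names are hypothetical, and it simply unpacks an archive (an .nfs file already renamed to .zip) into a folder named after the episode, then reports how many PNG images it found and whether manifest_ttml2.xml is present.

```python
import zipfile
from pathlib import Path


def extract_subtitle_archive(archive: Path, episode_name: str) -> Path:
    """Unpack a renamed .nfs archive into a folder named after the episode."""
    dest = archive.parent / episode_name
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def summarize(folder: Path) -> dict:
    """Count the subtitle images and check that the timing file is there."""
    images = [p for p in folder.iterdir() if p.suffix.lower() == ".png"]
    return {
        "images": len(images),
        "has_timing": (folder / "manifest_ttml2.xml").is_file(),
    }
```

A folder with zero images or no timing file probably means you picked one of the text-based subtitle files by mistake.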

How to OCR using Google Cloud Vision API

There are several OCR tools out there that can handle Japanese text. Most of them suck and result in a lot of errors. Google’s OCR is by far the most accurate I have seen, and works quite well. Unfortunately, it’s only sort of free. According to their current pricing structure, you can OCR up to 1,000 images per month for free. Since a typical episode is a few hundred images, this is enough for a few episodes each month. However, Google also offers a great trial offer (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service. I opted for this option, and was able to OCR all of the episodes that you find in the download above. The free credit does expire if you don’t use it within a certain time.

If you sign up for the Google Cloud Platform, then after logging in, you first need to enable the Cloud Vision API. Just click the “Enable API” link at the top of your Dashboard, and then find “Vision API” under the “Google Cloud Machine Learning” heading. After that, you will also need to create an API key. Click “credentials” on the left side menu, and then click “create credentials”, and select “API Key”.

Now, we can use a Python script created by “zx573” from the Kanji Koohii forums to actually perform the work of sending the images to Google and generating a text-based subtitle file. You will need a 2.7.x version of Python (I don’t think it works on 3.x). You also need to install the packages Pillow and requests. These can be installed from the command line by typing:

pip install pillow
pip install requests

Next, you will need the Python script, which you can grab from here. You will then need to open up the file in a text editor and insert your API Key into the line that says AUTH_KEY = "YOUR API KEY HERE"

Now, we can run this python script from the command line, with the path of the folder containing your subtitle images as an argument, like so:

python generate_srt_from_netflix.py "Terrace House - Boys & Girls in the City 01"

If all goes well, you should see it processing the images, and then it will finally spit out an SRT file named “output.srt” for you! However, these srt files will contain some errors which we need to fix up before they can be opened in other applications.
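To give an idea of what happens for each image, here is a minimal Python 3 sketch of a single Cloud Vision request. This is not zx573's script, just my own illustration using the standard library and the public images:annotate REST endpoint. The function names are made up, and the "ja" language hint is an assumption on my part that it helps with Japanese text.

```python
import base64
import json
import urllib.request
from pathlib import Path

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes) -> dict:
    """Build the JSON body for one TEXT_DETECTION request."""
    return {
        "requests": [{
            # The image travels inline as base64 text.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            # Assumption: hinting Japanese improves recognition of these images.
            "imageContext": {"languageHints": ["ja"]},
        }]
    }


def ocr_image(png_path: Path, api_key: str) -> str:
    """Send one subtitle image to the API and return the recognized text."""
    body = json.dumps(build_request(png_path.read_bytes())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    annotations = result["responses"][0].get("textAnnotations", [])
    # The first annotation holds the full text found in the image.
    return annotations[0]["description"].strip() if annotations else ""
```

Every call to ocr_image counts as one image against your monthly quota, so it is worth testing on a single PNG before looping over a whole episode.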

Additional Processing

The srt files will have a problem in that they do not always contain timestamps that include milliseconds, and most applications that edit srt files will expect there to be milliseconds. However, this is an easy fix, using software that lets you do search and replace using regular expressions. I use Notepad++.

If you choose the Search > Find in Files menu option, you can search across all your subtitles at once!

Set the directory to the location where your srt files are, and then if you want, you can set the filters to *.srt to avoid accidentally picking up any other files. Make sure the search mode is set to “Regular expression” and that the “. matches newline” checkbox beside it is not checked.

Then, in the “Find what” field, you want to put: (\d\d:\d\d:\d\d,\d?\d?)(\s)
Replace with: \10\2

Press “replace in files”. Do this twice.

Then change “Find what” to: (\d\d:\d\d:\d\d)(\s)
Replace with: \1,000\2

Finally, press “replace in files” again. We have now corrected the srt files!
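If you would rather script this than click through Notepad++, the same fix can be sketched in Python 3. The function names are my own, and the file-handling part assumes the srt files are saved as UTF-8. Like the find/replace steps above, it pads a short millisecond field with zeros on the right and adds ,000 where the field is missing entirely.

```python
import re
from pathlib import Path

# HH:MM:SS, optionally followed by a comma and up to three digits.
_TIMESTAMP = re.compile(r"(\d\d:\d\d:\d\d)(?:,(\d{0,3}))?(?=\s|$)")


def _pad(match):
    """Right-pad the millisecond field with zeros to exactly three digits."""
    fraction = (match.group(2) or "").ljust(3, "0")
    return f"{match.group(1)},{fraction}"


def fix_timestamps(srt_text: str) -> str:
    """Normalize the timestamps on timing lines, leaving subtitle text alone."""
    fixed = []
    for line in srt_text.splitlines(keepends=True):
        if "-->" in line:  # only timing lines contain the arrow
            line = _TIMESTAMP.sub(_pad, line)
        fixed.append(line)
    return "".join(fixed)


def fix_folder(folder: Path) -> None:
    """Rewrite every .srt file in a folder in place (assumes UTF-8 files)."""
    for srt in folder.glob("*.srt"):
        text = srt.read_text(encoding="utf-8")
        srt.write_text(fix_timestamps(text), encoding="utf-8")
```

Since fix_folder overwrites the files, keep a backup copy of your srt files until you have checked the result.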

After that, there are some further optional things that you can do, but you don’t have to. The tool Subtitle Edit is quite helpful for fixing up your subtitles. You can use it to batch convert English .xml files into SRT files (Tools > Batch Convert). It can also remove hearing impaired text from the subtitles (text that describes sounds, or names which character is speaking). Sometimes it doesn’t work so well for removing hearing impaired text from the Japanese files, because the text is enclosed in Japanese parentheses rather than the expected English parentheses, but you can still accomplish it using the Search and Replace tool (or the same tool in Notepad++). After loading a Japanese subtitle file, you just want to go to Edit > Replace. Then select the “Regular Expression” option, type （.+） as your search term (make sure you use Japanese parentheses, not English parentheses!), leave the replacement empty, and press “Replace All.” That should get rid of any remaining Japanese hearing impaired text.
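The same cleanup can be sketched in a few lines of Python 3. The function name is made up, and the pattern is slightly stricter than the one above: it matches each pair of full-width parentheses separately, so text sitting between two bracketed notes on the same line is not swallowed along with them.

```python
import re

# One pair of full-width (Japanese) parentheses and whatever is inside it.
_HEARING_IMPAIRED = re.compile(r"（[^（）]*）")


def strip_hearing_impaired(text: str) -> str:
    """Remove text enclosed in Japanese parentheses from subtitle text."""
    return _HEARING_IMPAIRED.sub("", text)
```

A subtitle that consisted only of a bracketed note ends up empty after this, so you may still want a tool like Subtitle Edit to drop the blank entries.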


Hirogaru – Yet another source of beginner’s reading material!

I can’t believe how much reading material I have been finding recently. I remember my early days in Japanese, struggling to find anything at all that was on my level, but now I keep seeing more and more material becoming available. This latest resource is the newest website from The Japan Foundation. Called ひろがる (Hirogaru), it launched in 2016 and seems to have at least 50-60 easy articles on it so far.

The level of the material seems to be aimed at those who have perhaps completed about one year of studying (able to pass JLPT N5, or completed the first Genki textbook), but may still be somewhat challenging for more advanced students as well, due to the diverse range of topics that the articles cover. Topics include:

  • Astronomy
  • Outdoors
  • Martial Arts
  • Tea
  • Sweets
  • Shopping
  • Calligraphy
  • Anime/Manga
  • Books
  • Temples
  • Music
  • Aquarium

Each topic generally contains about 4-5 articles that you can read. I believe that they may be adding new articles from time to time, but it does not seem to be at a very fast pace. Besides just the articles, there is usually a short video about each topic, as well as some short commentary from Japanese people saying what that topic means to them. For some reason, most topics also have a section containing pictures of food. There is also a comment section in each topic, which allows you to write a Japanese response to three different questions.

The articles are really the main attraction of this site, so let’s talk about those for a bit. Each article is fairly short, so that a beginner student could probably read it in 5 or 10 minutes. The articles are broken up into several paragraphs, and each paragraph has audio so you can hear it read aloud. At the end of each article you will find a quiz with a couple of multiple choice questions, to test your comprehension.

At the top of the site, there are some controls which can assist you in reading the articles. One is a “Ruby” toggle, which turns furigana on or off for all of the kanji in the article. The other setting is an “English/Japanese” toggle. This seems to be poorly named, because it does not function how you might expect. If you set it to “English”, the articles remain fully in Japanese. The only things that really change are the navigation buttons, and when it is set to English there will also be a button under each paragraph that you can press to see a list of the difficult vocabulary. As such, I would recommend keeping it set to “English” at all times so you have access to the vocabulary words.

Overall it’s a nice site, and certainly worth spending some time on. My only real gripe is that the articles are kinda lame and boring (to me at least), but that’s sort of hard to avoid with these kinds of generic topics. But all in all, it’s a fantastic source of reading material at a level where such material has often been overlooked. Check it out!


What you need to know to learn a foreign language by Paul Nation (book)

I finally got around to reading this great book by Paul Nation, What you need to know to learn a foreign language. The book is offered as a free PDF from his website. If you are unfamiliar with Nation, he is a leading researcher in Foreign Language Education with an interest in vocabulary acquisition and teaching methodology. While most of his research is aimed at the classroom, with this book he attempts to bring the results of his research to the student who might be trying to learn a language on their own.

It’s a fairly short and easy-to-read book that gets right to the point rather than giving you long-winded anecdotes and motivational stories. It could easily be read in a single afternoon. Much of the book is influenced by his “Four Strands Principle”, in which he argues that the most effective way of learning a language involves balancing your study across four different types of learning.

The Four Strands consist of:

  1. learning from meaning-focused input (listening and reading)
  2. learning from meaning-focused output (speaking and writing)
  3. language-focused learning (studying pronunciation, vocabulary, grammar etc)
  4. fluency development (getting good at using what you already know).

The main meat of the book consists of descriptions of twenty different learning activities that you can do, with different activities fitting into each of the different strands. He also spends a short bit of time explaining exactly WHY certain activities can be helpful. For instance, did you know that doing just a bit of timed reading can quickly improve your overall reading speed by 50-200%?

Here is a list of the different types of activities described in the book:

  • Reading while listening
  • Extensive reading
  • Narrow reading
  • Role play
  • Prepared talks
  • Read and write
  • Transcription
  • Intensive reading
  • Memorized sentences or dialogues
  • Delayed copying
  • Repeated listening
  • 4/3/2
  • Repeated reading
  • Speed reading
  • 10 minute writing
  • Repeated writing
  • Word cards
  • Linked skills
  • Issue logs
  • Spelling practice

I mention this just to give you a general idea of what you can expect to read about in the book. For the details of what each activity actually entails, you’ll need to read the book (which again, is free).

There are a lot of different opinions out there about how to learn a language. There is one camp which advocates focusing solely on input, and not worrying about anything else. Nation, on the other hand, argues that a fully balanced course is the way to go. While there is research out there to argue a lot of different opinions, we may never know for sure exactly what is truly optimal. With that said, nothing that Nation writes in this book feels terribly controversial, and it all just seems to make sense. I can’t imagine that these ideas could really steer anyone wrong, so I highly recommend this book for anyone who is currently learning a language.


Pibo – Even more children’s books on your smart device

So I recently wrote about EhonNavi, which lets you read thousands of Japanese Children’s books for free, but did you know that there is also another service called Pibo which has hundreds more completely different children’s books which can also be read for free?

Pibo is a completely separate service from EhonNavi, and offers some different pros and cons. First of all, while EhonNavi is primarily a site for desktop computers, Pibo is designed primarily for phones and tablets. Upon visiting their website, you will see prominent links to get the app from either the iTunes App Store or the Google Play store. There is also no signup procedure–just download the app and you are ready to start using it!

While EhonNavi shows you scans of physical books, the books on Pibo are all digital. As such, the artwork is much more crisp and clear. The books on Pibo are also completely voiced. That’s right, you can follow along as the book is read aloud to you! There is also no limit to how many times a certain book can be read, unlike on EhonNavi, where you only get to read each book once. Also, like EhonNavi, books can be browsed according to their age level (although I feel that many books fall into too large of an age range).

There are a few downsides to the service as well. For one thing, there is no apparent way to see which books you have read already. So if your goal is to read every book that is offered, you might need to keep a list yourself. The books are also always displayed in a completely random order, which exacerbates the problem further. I have created a list of every book title, which you can grab here (updated Feb 5, 2017). The number of books available is also significantly smaller than what you could find on EhonNavi. However, with nearly 400 available already (and growing!), that isn’t a huge problem.

So now, it’s worth mentioning how the service operates. When you first install the app, you get a 1 week free trial to read as much as you want. After that free trial is up, you can still read up to 3 books for free every day, which seems quite generous. You can also purchase a subscription for less than $5 per month, which allows you to read all you want. Seems like a pretty fair price to me.

All in all, I think this is a good complement to EhonNavi. You don’t have to choose either-or. They both work great together! When I am at my desktop, I use EhonNavi, and when I am on my phone, I read 3 books on Pibo. I urge everyone to check out both of these free services to try them out and get some reading practice!
