This tool performs OCR (optical character recognition) on XML/PNG subtitles and outputs the result as an SRT file. It works with subtitles obtained from DVD and Blu-ray discs. OCR is done by the Google Cloud Vision API, which is very accurate. The program is based on a Python script originally posted by zx573 on the Kanji Koohii forums.
Before using this program, you may need to get your subtitles into the XML/PNG format. I’m not going to write a detailed guide on ripping subtitles from a DVD or Blu-ray disc, as there are plenty of guides on the internet already. I assume you can work out how to obtain your subtitles in SUB/IDX or SUP format. From there, I recommend using a Windows program called Subtitle Edit to convert them into XML/PNG format. Other software may be able to do this, but Subtitle Edit is the one I am most familiar with.
Using Subtitle Edit to convert DVD or Blu-ray subs to XML/PNG
The File menu in Subtitle Edit has several options for importing subtitles in SUB/IDX or SUP format. Choose the appropriate one, and you will come to an import screen. From here, right-click on one of the subtitle lines, then select Export > BDN xml/png.
On the next screen that comes up, select “export all lines” and choose a folder to save to.
Now you should have a folder containing a bunch of PNG images and an XML file. The next step is to create an API key on the Google Cloud Platform.
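For readers curious about what is in that folder, here is a minimal Python sketch that reads the timing and image name of each subtitle from the XML file. It assumes the usual BDN XML layout (a Format element with a FrameRate attribute, and Event elements with InTC/OutTC timecodes wrapping a Graphic element); the function names and the sample document are made up for illustration, and this is not necessarily how PNG2SRT reads the file.

```python
import xml.etree.ElementTree as ET


def timecode_to_ms(timecode: str, fps: float) -> int:
    """Convert an HH:MM:SS:FF timecode (FF = frame number) to milliseconds.

    Simplification: the seconds field is treated as real seconds, which is
    only approximate for fractional frame rates such as 23.976.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    whole_seconds = hours * 3600 + minutes * 60 + seconds
    return round((whole_seconds + frames / fps) * 1000)


def read_bdn_events(xml_text: str) -> list[dict]:
    """Return one dict per subtitle: start/end in ms plus the PNG file name."""
    root = ET.fromstring(xml_text)
    fmt = root.find("./Description/Format")
    # Assumption: fall back to 25 fps if the file does not state a frame rate.
    fps = float(fmt.get("FrameRate", "25")) if fmt is not None else 25.0
    events = []
    for event in root.iter("Event"):
        graphic = event.find("Graphic")
        if graphic is None or not graphic.text:
            continue  # skip events that have no image attached
        events.append(
            {
                "start_ms": timecode_to_ms(event.get("InTC"), fps),
                "end_ms": timecode_to_ms(event.get("OutTC"), fps),
                "image": graphic.text.strip(),
            }
        )
    return events


# Hypothetical sample in the assumed BDN layout.
SAMPLE = """<BDN Version="0.93">
  <Description>
    <Format VideoFormat="1080p" FrameRate="25" DropFrame="False"/>
  </Description>
  <Events>
    <Event InTC="00:00:01:00" OutTC="00:00:03:12" Forced="False">
      <Graphic Width="400" Height="60" X="760" Y="900">0001.png</Graphic>
    </Event>
  </Events>
</BDN>"""

events = read_bdn_events(SAMPLE)
```

Each entry pairs one PNG image with the time range during which it should be displayed, which is all the information needed to build an SRT file once the image text is known.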
Create an API Key for Google Cloud Vision API
Google’s OCR is by far the most accurate I have seen, and works quite well. It is also free for a limited amount of use each month. According to their current pricing structure, you can OCR up to 1,000 items per month for free. My program can batch several PNG images into a single item, so you should be able to do several episodes or movies in a single month without having to pay anything. Google also offers a generous free trial (at least at the time I write this). You can get $300 of free credit when you sign up, and you have no obligation to pay anything or continue using the service.
If you sign up for the Google Cloud Platform, then after logging in, you need to enable the Cloud Vision API and generate an API key.
- In the left-hand menu, select APIs & Services > Dashboard
- Select Enable APIs & Services
- In the search box, type “vision”, and then select Google Cloud Vision API.
- Select Enable. It may walk you through setting up a billing profile at this point if one has not been created already. Again, there is no obligation to actually pay anything, as you can use this API a certain amount for free each month, and you may get free credits when signing up.
- Back at the APIs & Services Dashboard, select Credentials > Create Credentials > API Key.
- Once you have generated the API key, be sure to copy it or keep it open in your browser so you can access it later.
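To show where the API key ends up, here is a rough Python sketch of a request to the Cloud Vision REST endpoint (images:annotate), with the key passed as a query parameter. The helper names are hypothetical, the choice of TEXT_DETECTION with a language hint is an assumption, and PNG2SRT may build its requests differently.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request_url(api_key: str) -> str:
    """Append the API key to the endpoint as the 'key' query parameter."""
    return f"{VISION_ENDPOINT}?{urlencode({'key': api_key})}"


def build_annotate_request(image_bytes: bytes, language: str = "ja") -> dict:
    """Build the JSON body for one image; the image is sent base64-encoded."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": [language]},
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Pull the recognized text out of the first response, or '' if none."""
    first = response.get("responses", [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")


def send_request(api_key: str, body: dict) -> dict:
    """POST the body to the API (needs network access and a valid key)."""
    request = urllib.request.Request(
        build_request_url(api_key),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like extract_text(send_request(api_key, build_annotate_request(png_bytes))). Since the key travels in the URL, anyone who has it can spend your quota, so treat it like a password.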
Use PNG2SRT to OCR the images
Now, we can use PNG2SRT to send the subtitle images through the Cloud Vision API.
Download
Version 1.0.1 – May 12, 2018
Download the appropriate version for your computer, and then extract the archive.
Next, you need to paste your API Key into a text file named API_KEY.txt located in the same folder as the application (the file should contain ONLY your API key, and no other text).
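As an illustration of why the file should hold nothing but the key, here is a small Python sketch of how a program might load it. The function name is hypothetical and this is only a guess at the approach, not PNG2SRT's actual code.

```python
from pathlib import Path


def load_api_key(folder: Path) -> str:
    """Read API_KEY.txt from the given folder and return the bare key."""
    key_file = folder / "API_KEY.txt"
    # "utf-8-sig" also tolerates the byte-order mark some Windows editors add.
    key = key_file.read_text(encoding="utf-8-sig").strip()
    # Anything other than a single token suggests extra text in the file.
    if not key or any(ch.isspace() for ch in key):
        raise ValueError(f"{key_file} should contain only the API key")
    return key
```

Stray spaces or newlines around the key are harmless in this sketch, but a label such as "key: ..." or a second line of text would be rejected.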
When you run the application, it should look like this:
First, you need to make sure that your API Key is displayed correctly in the top area. If not, make sure you did the previous step correctly.
Then select the folder containing the XML/PNG files that you want to convert to SRT.
Note: You may get an error if the folder name contains non-ASCII (Unicode) characters. In that case, please rename the folder using only plain English letters and numbers.
There is also an option to select the language that you want Google to recognize. It defaults to Japanese, because that is what I use, but you can select whichever language you need. You can find a full list of language codes in Google's Cloud Vision language support documentation.
The only other option is the chunk size. The default of 15 is usually fine. If you press the start button, and the program appears to begin working but then gives you an error message part way through, you might need to decrease the chunk size to a smaller value like 10 or even 5.
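To make the chunk size concrete, here is a minimal Python sketch that splits a list of image names into groups. It assumes the chunk size is simply the number of PNG images grouped together per request; the function name is made up, and the real program may group images differently.

```python
def chunked(items: list, size: int) -> list[list]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


# 35 hypothetical subtitle images: 0001.png ... 0035.png
names = [f"{n:04d}.png" for n in range(1, 36)]
groups = chunked(names, 15)
sizes = [len(group) for group in groups]  # [15, 15, 5]
```

Under that assumption, a smaller chunk size means more requests, each carrying less image data, which is why lowering it can get past errors that appear part way through a run.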
After you press start, if all goes well, the program should run and it will output an SRT file inside your input folder.
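For reference, an SRT file is plain text made of numbered cues, each with a start and end time and the subtitle text. The Python sketch below shows one way to produce that format from (start, end, text) tuples in milliseconds; the function names are hypothetical and it is not taken from PNG2SRT.

```python
def format_srt_time(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm timestamp used by SRT."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Render (start_ms, end_ms, text) cues as SRT, numbered from 1."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_srt_time(start_ms)} --> {format_srt_time(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


srt_text = to_srt([(1000, 3480, "こんにちは"), (4000, 5500, "ありがとう")])
```

When saving the result, writing the file as UTF-8 keeps Japanese and other non-Latin text intact in most players.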