Just got hold of the Criterion Collection (CC) edition of one of my favourite trilogies: Infernal Affairs. The original Blu-ray rips have always shown up just fine in Infuse, so I thought I’d use the new Smart Groups feature to add the CC edition to the same titles. But… that doesn’t seem to work as advertised.
Here’s what my folder structure looks like—
Infernal Affairs (2002)
> Extras
- folder.jpg
- Infernal Affairs (2002) {edition-1 The Criterion Collection}-poster.jpg
- Infernal Affairs (2002) {edition-1 The Criterion Collection}.mkv
- Infernal Affairs (2002) {edition-2 Theatrical Release}-poster.jpg
- Infernal Affairs (2002) {edition-2 Theatrical Release}.mkv
Infernal Affairs II (2003)
> Extras
- folder.jpg
- Infernal Affairs II (2003) {edition-1 The Criterion Collection}-poster.jpg
- Infernal Affairs II (2003) {edition-1 The Criterion Collection}.mkv
- Infernal Affairs II (2003) {edition-2 Theatrical Release}-poster.jpg
- Infernal Affairs II (2003) {edition-2 Theatrical Release}.mkv
Infernal Affairs III (2003)
> Extras
- folder.jpg
- Infernal Affairs III (2003) {edition-1 The Criterion Collection}-poster.jpg
- Infernal Affairs III (2003) {edition-1 The Criterion Collection}.mkv
- Infernal Affairs III (2003) {edition-2 Theatrical Release}-poster.jpg
- Infernal Affairs III (2003) {edition-2 Theatrical Release}.mkv
When I open Infuse (and I have waited for it to scan for changes), this is how it shows up—
I now see 2 entries for Infernal Affairs instead of the 3 that should show up (based on my folder structure). The 3rd Infernal Affairs poster at the end of that same row is actually a file from the Extras folder (which is supposed to be ignored, but, well, that’s another battle). Anyway, there are actually a couple of things wrong here.
I scrolled down to verify the file names, and they look right; the files just seem to be getting scraped wrongly. Infuse seems to think Infernal Affairs II and Infernal Affairs III are one and the same. They were released in the same year (2003), but they are 2 different titles.
For whatever reason, the top result for ‘Infernal Affairs II’ on TMDB is Infernal Affairs III.
What you can do is use the Edit Metadata option to select the correct title for the 2 copies of II, or simply replace the Roman numerals in the filenames with Arabic numerals.
E.g.:
Infernal Affairs 3 (2003) {edition-1 The Criterion Collection}.mkv
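If there are many files to rename, the swap can be scripted. Below is a minimal Python sketch, assuming the `Title NUMERAL (YYYY)` naming used in this thread. The helper `arabicize_sequel_numeral` and its numeral table are hypothetical; the helper only computes a new name, and the `__main__` block is a dry run that prints planned renames without touching anything on disk.

```python
import re
from pathlib import Path

# Hypothetical lookup table: Roman numerals II-X mapped to Arabic digits.
ROMAN_TO_ARABIC = {
    "II": "2", "III": "3", "IV": "4", "V": "5",
    "VI": "6", "VII": "7", "VIII": "8", "IX": "9", "X": "10",
}

# Match a standalone numeral only when it sits right before the " (YYYY)" year tag.
_NUMERAL_BEFORE_YEAR = re.compile(
    r"\b("
    + "|".join(sorted(ROMAN_TO_ARABIC, key=len, reverse=True))
    + r")\b(?= \(\d{4}\))"
)


def arabicize_sequel_numeral(name: str) -> str:
    """Return `name` with the numeral before the year replaced by digits."""
    return _NUMERAL_BEFORE_YEAR.sub(lambda m: ROMAN_TO_ARABIC[m.group(1)], name)


if __name__ == "__main__":
    # Dry run: print the renames this would make in the current folder.
    for path in sorted(Path(".").iterdir()):
        new_name = arabicize_sequel_numeral(path.name)
        if new_name != path.name:
            print(f"{path.name}  ->  {new_name}")
```

Anchoring the match on the year tag means a numeral-looking word elsewhere in the title, or inside the edition tag, is left alone.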
The scraping algorithm has shown difficulty distinguishing between similarly named titles when the only difference between them is a short word (or “word”, as in the case of II vs III), and especially so when a title consists entirely of a single short word. By default, the algorithm fails to prioritize exact title matches over near-matches that include what ought to be exclusionary “extraneous” content (such as an additional I).
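To make the “exact match first” idea concrete, here is a minimal Python sketch of a tie-breaker a client could apply to a list of search results instead of always taking the top one. It is purely illustrative: the `Candidate` type, the `pick_best` function, and the ranking order (exact normalized title, then matching year, then original result order) are assumptions, not how Infuse or TMDB actually rank results.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for one search result: a title and release year."""
    title: str
    year: Optional[int]


def normalize(title: str) -> str:
    """Casefold and collapse whitespace so trivial differences don't block an exact match."""
    return " ".join(unicodedata.normalize("NFKC", title).casefold().split())


def pick_best(query_title: str, query_year: Optional[int],
              candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Prefer an exact title match, then a year match, then the original result order."""
    if not candidates:
        return None
    wanted = normalize(query_title)

    def rank(indexed: Tuple[int, Candidate]) -> Tuple[bool, bool, int]:
        position, cand = indexed
        exact = normalize(cand.title) == wanted
        year_ok = query_year is None or cand.year == query_year
        # False sorts before True, so exact and year matches win;
        # remaining ties keep the order the search returned.
        return (not exact, not year_ok, position)

    return min(enumerate(candidates), key=rank)[1]


if __name__ == "__main__":
    # Mock results ordered as described in this thread: III comes back first.
    results = [Candidate("Infernal Affairs III", 2003), Candidate("Infernal Affairs II", 2003)]
    print(pick_best("Infernal Affairs II", 2003, results).title)  # Infernal Affairs II
```

Taking `results[0]` would pick III for the II query; the exact-title check is what lets the second result win.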
This works because TMDB users previously identified the issue above and created alternative title entries to work around it.
Infuse relies on the top result as provided by TMDB for automatic matching. This is usually pretty reliable, but there are some edge cases like this where issues can come up.
You can see this on the TMDB site as well by searching for Infernal Affairs II.
Yes, I know. I wasn’t throwing blame at either of you (Firecore or TMDB). My presumption was that TMDB controlled the search API and algorithm.
I’ve been posting screencaps of manual TMDB website searches here for nearly two years, both when demonstrating this specific issue and when helping other Infuse users troubleshoot their own scraping mismatches.
FWIW, the ‘scraping’ of names is done by Infuse. We filter out a bunch of irrelevant things that may be included in filenames and send the resulting title (with or without a year) to TMDB, and then we get a list of potential matches back.
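As a rough illustration of that filtering step, here is a minimal Python sketch that reduces one of the filenames from the original post to a title and an optional year. The `{edition-N Name}` and `(YYYY)` patterns come from the naming scheme in this thread; the function name `filename_to_query` and the choice of what to strip are assumptions, not Infuse's actual filter list.

```python
import re
from pathlib import PurePath
from typing import Optional, Tuple

# Assumed patterns, based only on the naming scheme used in this thread.
EDITION_TAG = re.compile(r"\{edition-[^}]*\}", re.IGNORECASE)
YEAR_TAG = re.compile(r"\((\d{4})\)")


def filename_to_query(filename: str) -> Tuple[str, Optional[int]]:
    """Reduce a media filename to (title, year) for a metadata search."""
    stem = PurePath(filename).stem          # drop the extension (.mkv)
    stem = EDITION_TAG.sub("", stem)        # drop {edition-N Name} tags
    year_match = YEAR_TAG.search(stem)
    year = int(year_match.group(1)) if year_match else None
    title = YEAR_TAG.sub("", stem)          # drop the (YYYY) tag from the title
    return " ".join(title.split()), year    # collapse leftover whitespace


if __name__ == "__main__":
    print(filename_to_query("Infernal Affairs II (2003) {edition-2 Theatrical Release}.mkv"))
    # ('Infernal Affairs II', 2003)
```

In this case the parsed query for the II files really is “Infernal Affairs II”, which fits the point made earlier in the thread: the mismatch comes from taking the top search result, not from parsing the filename.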
Important distinction, and I appreciate the correction. In the previous post (and probably several others) I have used the term to refer to the whole process: both the parsing of users’ media filenames to generate search queries for the TMDB API (scraping?), and Infuse’s assignment of metadata to users’ files based on pairing those filenames with the top search result TMDB returns for those search terms (matching?).
Which word would be best used to refer to the whole process?