Handling `AKA` in filenames for metadata matching

There is a common filename format that I think can be handled properly in Infuse.

Parasite AKA Gisaengchung 2019 some-encoding-tags.mkv

Perdita Durango AKA Dance with the Devil 1997 some-encoding-tags.mkv

Notice the different capitalizations of aka.

Chi bi Part II Jue zhan tian xia aka Red Cliff II 2009 International Version.mkv

El hombre de las mil caras aka The Man with Thousand Faces 2016 some-encoding-tags.mkv

Notice there can be multiple AKAs in a single filename.

Black Sunday AKA The Mask of Satan AKA La maschera del demonio 1960 some-encoding-tags.mkv

AKA is often used to separate alternate or bilingual names of a movie or TV show. This naming convention is widely used.

If Infuse used a regular expression to find the AKA(s), discarded everything up to and including the last AKA, and searched TMDB with only the text between the last AKA and the year, it would work as if the filename weren’t bilingual.
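Here is a rough sketch of what I mean, in Python just for illustration. The function name `title_after_last_aka` and both patterns are my own assumptions, not anything Infuse actually uses, and I am assuming the last year-like token in the filename is the release year.

```python
import os
import re

# Hypothetical patterns, for illustration only (not Infuse's real parser).
AKA = re.compile(r"\baka\b", re.IGNORECASE)   # isolated "aka", any capitalization
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")      # four-digit year, 1900-2099

def title_after_last_aka(filename):
    """Return (title, year) using only the text between the last AKA and the year."""
    stem = os.path.splitext(filename)[0]
    years = list(YEAR.finditer(stem))
    # Assumption: the last year-like token is the release year.
    year = years[-1] if years else None
    head = stem[:year.start()] if year else stem
    # Splitting on every AKA and keeping the final piece drops all earlier names.
    title = AKA.split(head)[-1].strip(" ._-")
    return title, (int(year.group()) if year else None)
```

With the multi-AKA example above, `title_after_last_aka("Black Sunday AKA The Mask of Satan AKA La maschera del demonio 1960 some-encoding-tags.mkv")` gives `("La maschera del demonio", 1960)`.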

Yo, what about my point that this would break searches for films and TV shows that include “AKA” or “aka” in their proper titles?

I forgot about that.

I guess the optimal engineering solution would be to run both searches separately and merge the results, ranked by how close each match is.

Hmm. So, thinking this through …

Infuse currently identifies content by scraping each filename, pulling a list of search terms from it to submit to TMDB, and then accepting the first (i.e. highest ranked by TMDB) result returned.

For your suggestion to be implemented, Infuse would first need to check every scanned filename for the isolated phrase “aka” (case-insensitive). When it finds one, it would need to process that file differently from all others by running both a traditional search (as described above) and your case-specific alternate search (this time discarding all search terms up to and including the last instance of “aka” in the filename), and storing the TMDB results for each search separately. That would likely require Firecore to code a new variable to hold the second search result alongside the first for use in the next step.
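To make that step concrete, here is a minimal Python sketch of the "two searches" idea. Everything in it is hypothetical: `candidate_queries` is a name I made up, and it assumes the title portion of the filename (the part before the year) has already been extracted.

```python
import re

# Hypothetical pattern: "aka" as an isolated word, case-insensitive.
AKA = re.compile(r"\baka\b", re.IGNORECASE)

def candidate_queries(title_part):
    """Return the search strings to try for the title portion of a filename."""
    full = " ".join(title_part.split())   # collapse stray whitespace
    queries = [full]                      # the traditional search always runs
    if AKA.search(full):
        # Alternate search: keep only what follows the last "aka".
        tail = AKA.split(full)[-1].strip()
        if tail:                          # skip it when nothing follows "aka"
            queries.append(tail)
    return queries
```

A title that is literally "AKA" yields only the traditional query, and a word like "Osaka" is not treated as a separator because the "aka" in it is not isolated.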

Firecore would then need to add logic to Infuse allowing it to choose the better of the two matches returned by TMDB (while referring back to the filename previously scraped, which under normal circumstances might already have been dropped from memory by this point) before sending the best match on to become the identification of the content.
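One way that selection step could look, again only as a Python sketch under my own assumptions: `search` is a stand-in for whatever lookup Infuse performs against TMDB (here just a callable returning titles, best first), and string similarity from the standard library's `difflib` is my guess at a "closeness" measure, not a known Infuse or TMDB ranking.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(queries, search):
    """Run every query and keep the top result that most resembles its query.

    `search` is a hypothetical callable: query string -> list of titles,
    best first. Ties go to the earlier query, i.e. the traditional search.
    """
    best, best_score = None, -1.0
    for query in queries:
        results = search(query)
        if not results:
            continue                      # this search found nothing
        score = similarity(query, results[0])
        if score > best_score:
            best, best_score = results[0], score
    return best
```

A real implementation would probably also want to weigh the year and TMDB's alternate titles, since a plain string comparison can favor the wrong candidate when both searches return something plausible.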

Laid out that way, I don’t see why the above couldn’t technically be a workable way around the issue I raised earlier. It could feasibly take care of the “included aka” cases, but I still don’t see an easy way to handle the bilingual / trilingual torrent filenames that don’t include distinct separators (like “aka”), which you originally posted about. So this solution would still only get you part of the way there.

How much time would scanning all filenames for an isolated “aka” add to each search? I don’t have a clue. Perhaps insignificant. Perhaps not. And I don’t imagine the hit rate would be very high across the vast majority of users’ collections. But like everything, I suppose integration of this capability would come down to Firecore weighing the effort required to implement a solution against the expected return on that investment. Not knowing the relative difficulty of the former (there’s perhaps a similar way to attack the issue), nor the accuracy of my estimate of how many users would benefit, I couldn’t say.