How many faces are needed to make an accurate facial average of a population?
Plan: Pre-experimentation write-up
I want to determine the minimum number of faces needed to create an accurate facial average or composite face of a population.
My plan is to gather 100 male faces from one ethnic group and create a composite of all 100 faces. Then I will gradually scale down the number of faces per composite to identify the point at which the smaller composites start deviating significantly from the 100-face average. A level is a composite size (100 faces, 33 faces, and so on). At each level, the faces will be randomized into cohorts, and no face will appear in more than one cohort at the same level. For example, the first level will likely consist of 33 faces per composite: I will randomize the 100 faces into three groups of 33, with one face left over, and each group will form its own facial average.
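The cohort randomization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool I actually used: the function name `split_into_cohorts`, the `face_001`-style IDs, and the fixed seed are all made up for the example.

```python
import random

def split_into_cohorts(face_ids, cohort_size, seed=None):
    """Shuffle the faces and cut them into non-overlapping cohorts.

    Returns (cohorts, leftover). Faces that do not fill a whole
    cohort are returned separately instead of being reused.
    """
    rng = random.Random(seed)
    shuffled = list(face_ids)
    rng.shuffle(shuffled)
    n_full = len(shuffled) // cohort_size
    cohorts = [
        shuffled[i * cohort_size:(i + 1) * cohort_size]
        for i in range(n_full)
    ]
    leftover = shuffled[n_full * cohort_size:]
    return cohorts, leftover

# Hypothetical IDs standing in for the 100 source photos.
faces = [f"face_{i:03d}" for i in range(1, 101)]
cohorts, leftover = split_into_cohorts(faces, 33, seed=0)
print(len(cohorts), [len(c) for c in cohorts], len(leftover))
# prints: 3 [33, 33, 33] 1
```

Calling the function again with a different `cohort_size` (25, 17, and so on) reshuffles all 100 faces, so a face can reappear at another level but never twice within the same level.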
I plan to start at 33 because, during a large project in which I created over 200 facial averages, I typically used 25 to 35 faces per composite. What I found was that, in many cases, even a composite of only 15 faces closely resembled one made from 28 faces. Based on this, I don’t believe it’s necessary to begin with as many as 50 faces.
After starting with 33, I will then test composites made from 25 faces, followed by 17, 12, and finally 9. If, for instance, the composites made from 17 faces remain accurate but those made from 12 begin to deviate too much, I will not proceed to 9 faces. Instead, I might test an intermediate size, like 14 faces.
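The step-down procedure can be written out as a small sketch. Two things here are assumptions for illustration only: `is_accurate` stands in for my own visual judgment of whether the composites at a given size still match the 100-face composite, and the intermediate size is taken as the midpoint between the last size that passed and the first that failed (the plan above only says "an intermediate size, like 14").

```python
def find_smallest_accurate_size(sizes, is_accurate):
    """Walk down the composite sizes and stop at the first failure.

    sizes       -- composite sizes in descending order
    is_accurate -- callable(size) -> bool; a stand-in for the
                   subjective comparison against the full composite
    Returns the smallest size judged accurate, or None if none were.
    """
    last_ok = None
    for size in sizes:
        if is_accurate(size):
            last_ok = size
            continue
        # First failure: skip the remaining smaller sizes and probe
        # one intermediate size between the last pass and this failure.
        if last_ok is not None:
            mid = (last_ok + size) // 2
            if mid not in (last_ok, size) and is_accurate(mid):
                return mid
        return last_ok
    return last_ok

planned_sizes = [33, 25, 17, 12, 9]

# Toy judgment: pretend anything with 15 or more faces looks accurate.
result = find_smallest_accurate_size(planned_sizes, lambda n: n >= 15)
print(result)  # prints: 17 (12 failed, and the intermediate size 14 failed too)
```

With these planned sizes, a failure at 12 after a pass at 17 leads to a midpoint of 14, which matches the example in the plan.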
The rationale behind this experiment is simple: creating a large number of facial averages is very time-consuming. If I can achieve accurate results using just 15 faces instead of 25, it would save a significant amount of time.
Results: picture
Results: post-experimentation write-up
I ended up using Tajik men.
From my perspective, using 17 faces to create a composite will reliably produce a result that looks similar enough to the 100-face composite. When we get down to 14, no doubt a few of them still look very similar to the 100-face composite, but in my opinion, a couple of them start to stray a little too far. I don’t mean far enough that they couldn’t pass as brothers or even fraternal twins, but some of them depict more so-called extreme or peripheral phenotypes for the population.
Of course, the right number depends on the goal and on what one wants to show. If the goal is to truly represent the facial average, then the more faces used, the better, and 33 is probably the minimum. However, if the aim is to provide a survey of what people look like globally, using 17 or even 15 faces is probably a better option. This would allow you to process more composites, cover ground more quickly, and not really lose much in terms of accuracy. The result would still look like a fairly average person from that place.
I also think that if someone were going to create a facial average from 35 faces, it would be more interesting to produce two composites of about 17 faces each rather than one composite of all 35. Perhaps the faces could even be manually sorted into two groups to represent a spectrum of the average.
A.I. facial similarity analysis
I did try running the composites through some AI tools that measure the similarity between two faces, but the readings were very high. For example, one tool, facecomparison.toolpie.com, gave a 94% similarity for the composite that, to me, looked the least like the 100-face composite. According to that website's metric, anything at 80% or higher is considered the same person.
I suspect the similarity readings were so high because of one of the techniques I used to improve the resolution and clarity of the facial composites. Specifically, I sharpened features like the nostrils, eyes, and eyebrows by moving them on each individual face to their average position. However, I only did this once, for the 100-face composite, and then reused those same cut-and-aligned faces for all the other composites. I assume the ratings are artificially high because the nostrils are in the same position, the eye spacing is identical, the eyebrows are at the same height, and the distance between the nostrils and lips is the same across all composites.
It takes a lot of work and time to cut each face and realign the features, which is why I didn’t do it for every single facial composite. It would have taken significantly longer. My intent was simply to see how it looked from my subjective perspective, as I hadn’t initially planned on using AI similarity detectors. I only used the detectors at the end to see if there was a way to quantify these results. However, I don’t think that data is useful for my purposes because of the alignment technique I mentioned.
94% was indeed the lowest percentage of all the composites, so the AI and I agreed on which composite differed the most, but the two faces clearly did not look like the same person. So the AI is useful for relative comparisons, but the absolute numbers it gives, like 94% or 97% similar, seem to be inflated. Perhaps dropping the leading 9 and keeping only the last digit would be more helpful, i.e., 94% = 4/10 similar, 97% = 7/10 similar.
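That "drop the leading 9" idea amounts to subtracting 90 from any score in the 90s. Here is a minimal sketch, assuming every reading falls between 90% and 100% as mine did. The function name `rescale_similarity` is my own invention and has nothing to do with the website's actual metric.

```python
def rescale_similarity(percent):
    """Map a similarity reading in the 90-100% range onto a 0-10 scale.

    This is just the rule of thumb of dropping the leading 9:
    94% -> 4/10, 97% -> 7/10. Readings below 90% are rejected
    because the rule was only ever meant for scores in the 90s.
    """
    if not 90 <= percent <= 100:
        raise ValueError("rule of thumb only covers readings from 90% to 100%")
    return percent - 90

for reading in (94, 97):
    print(f"{reading}% = {rescale_similarity(reading)}/10 similar")
# prints:
# 94% = 4/10 similar
# 97% = 7/10 similar
```

This only stretches the scale so differences are easier to read. It does not correct for the shared feature alignment that likely inflated the readings in the first place.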