This is actually reasonably easy. You can programatically detect skin tones - and porn images tend to have a lot of skin. This will create false positives but if this is a problem you can pass images so detected through actual moderation. This not only greatly reduces the the work for moderators but also gives you lots of free porn. It's win-win.
#!python
import os, glob
from PIL import Image
def get_skin_ratio(im):
im = im.crop((int(im.size[0]*0.2), int(im.size[1]*0.2), im.size[0]-int(im.size[0]*0.2), im.size[1]-int(im.size[1]*0.2)))
skin = sum([count for count, rgb in im.getcolors(im.size[0]*im.size[1]) if rgb[0]>60 and rgb[1]<(rgb[0]*0.85) and rgb[2]<(rgb[0]*0.7) and rgb[1]>(rgb[0]*0.4) and rgb[2]>(rgb[0]*0.2)])
return float(skin)/float(im.size[0]*im.size[1])
for image_dir in ('porn','clean'):
for image_file in glob.glob(os.path.join(image_dir,"*.jpg")):
skin_percent = get_skin_ratio(Image.open(image_file)) * 100
if skin_percent>30:
print "PORN {0} has {1:.0f}% skin".format(image_file, skin_percent)
else:
print "CLEAN {0} has {1:.0f}% skin".format(image_file, skin_percent)
This code measures skin tones in the center of the image. I've tested on 20 relatively tame "porn" images and 20 completely innocent images. It flags 100% of the "porn" and 4 out of the 20 of the clean images. That's a pretty high false positive rate but the script aims to be fairly cautious and could be further tuned. It works on light, dark and Asian skin tones.
It's main weaknesses with false positives are brown objects like sand and wood and of course it doesn't know the difference between "naughty" and "nice" flesh (like face shots).
Weakness with false negatives would be images without much exposed flesh (like leather bondage), painted or tattooed skin, B&W images, etc.
source code and sample images
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…