Posts: 915 | Thanked: 3,209 times | Joined on Jan 2011 @ Germany
#14
Originally Posted by pichlo
OSM's data is an amalgam of those and other sources, including digitizing paper maps which are copyrighted. This is the source of their headache.
Good point!

Originally Posted by pichlo
That will have to be a very clever hash, then, to avoid being fooled by slight modifications.
Not sure if "hash" is the right term here. If you have a checksum algorithm in mind (MD5, SHA), you'll end up with a lot of false negatives and virtually no false positives, because a single flipped bit changes the checksum completely. That is obviously not what you want for detecting copyright infringement.
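
To illustrate the checksum problem, here is a minimal Python sketch. The sample data and the helper names (flip_bit, differing_hex_digits) are made up for this example, and I picked SHA-256 simply because it ships with hashlib; any cryptographic checksum should behave the same way.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equally long hex digests differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up stand-in for a copyrighted map tile.
original = b"some copyrighted map tile data"
modified = flip_bit(original, 0)  # a "slight modification": one single bit

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(modified).hexdigest()

print(digest_a)
print(digest_b)
print("hex digits that differ:", differing_hex_digits(digest_a, digest_b))
```

The two digests share no useful structure, so an exact-match lookup against a database of checksums would miss the modified copy.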

Think of picture comparison algorithms instead! One I know of gradually reduces the resolution of both pictures and, at each step, checks whether the color info of the remaining pixels is identical (or similar).
Taken to the extreme, you end up with a single pixel per picture and just check whether their RGB (or HSV or whatever) values match.
With such an algorithm you don't get a binary true-or-false result but a fuzzy resemblance factor.
In this case the copyright DB would have to store the equivalent of a thumbnail of the copyrighted picture at the resolution you consider to be your resemblance threshold.
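
A rough sketch of that idea in plain Python, not the exact algorithm I have in mind above but something in its spirit. The assumptions are all mine: pictures are nested lists of (R, G, B) tuples, both have the same aspect ratio, downscaling is done by simple block averaging, and the names downscale and resemblance are made up for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def downscale(img: Image, size: int) -> Image:
    """Reduce img to size x size by averaging each block of pixels.

    Assumes size is no larger than the image's width or height.
    """
    height, width = len(img), len(img[0])
    thumb = []
    for ty in range(size):
        y0, y1 = ty * height // size, (ty + 1) * height // size
        row = []
        for tx in range(size):
            x0, x1 = tx * width // size, (tx + 1) * width // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in block) // len(block) for c in range(3)))
        thumb.append(row)
    return thumb


def resemblance(a: Image, b: Image, size: int) -> float:
    """Fuzzy resemblance factor in [0, 1]; 1.0 means identical thumbnails."""
    thumb_a, thumb_b = downscale(a, size), downscale(b, size)
    total = sum(
        abs(pa[c] - pb[c])
        for row_a, row_b in zip(thumb_a, thumb_b)
        for pa, pb in zip(row_a, row_b)
        for c in range(3)
    )
    return 1.0 - total / (255 * 3 * size * size)


# Made-up 8x8 test picture, a slightly modified copy, and a color-inverted one.
original = [[(x * 32, y * 32, 128) for x in range(8)] for y in range(8)]
tweaked = [row[:] for row in original]
tweaked[0][0] = (10, 0, 128)
inverted = [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in original]

print(resemblance(original, tweaked, 4))   # close to 1.0
print(resemblance(original, inverted, 4))  # clearly lower
```

Here the size argument plays the role of the resemblance threshold resolution: the copyright DB would only need to keep the output of downscale for each protected picture. Note that at size 1 even the inverted picture scores fairly high, because its average color happens to be close to the original's, so the single-pixel extreme is too coarse to be useful on its own.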
 

The Following 5 Users Say Thank You to sulu For This Useful Post: