
Hashing

Unstable API

0.8.0

@project-lakechain/hashing-image-processor

TypeScript

The hashing image processor makes it possible to enrich the metadata of images with hash values associated with the visual representation of an image. This middleware supports different hashing algorithms, including average hashing, perceptual hashing, difference hashing, wavelet hashing, and color hashing.

These hashing algorithms can be used to compare how visually different images are. They are more computationally efficient than vector embeddings, which also capture the semantic content of an image.
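As a rough illustration of such a comparison, hashes of this kind are commonly compared by counting the bits that differ between them (the Hamming distance). The sketch below assumes both hashes are hexadecimal strings of the same length and produced by the same algorithm; the `hammingDistance` helper and the sample values are hypothetical and not part of the middleware.

```typescript
/**
 * Counts the number of differing bits between two hex-encoded hashes
 * of the same length. A lower value means more similar images.
 */
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error('Hashes must have the same length to be compared');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    const x = parseInt(a.charAt(i), 16);
    const y = parseInt(b.charAt(i), 16);
    if (Number.isNaN(x) || Number.isNaN(y)) {
      throw new Error('Hashes must be hexadecimal strings');
    }
    // XOR keeps only the bits that differ; count them one by one.
    let diff = x ^ y;
    while (diff !== 0) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// Hypothetical hash values for two visually similar images.
const first = '00077ffbf2fefee0';
const second = '00077ffbf2fefce0';
console.log(hammingDistance(first, second)); // prints 1
```

What counts as "similar" depends on the algorithm and the use case, so any distance threshold should be tuned on your own images.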


An example using average hashing.
Credits Branislav Rodman on Unsplash
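To give an intuition of what average hashing does, here is a simplified sketch: each pixel of a small grayscale thumbnail is compared to the mean brightness, and the resulting bits are packed into a hexadecimal string. This is not the middleware's implementation. The `averageHash` function is hypothetical and assumes the image has already been resized to 8x8 and converted to grayscale.

```typescript
/**
 * Computes an average hash from 64 grayscale values (an 8x8 image,
 * listed row by row). Resizing and grayscale conversion are assumed
 * to have happened before this function is called.
 */
function averageHash(pixels: number[]): string {
  if (pixels.length !== 64) {
    throw new Error('Expected 64 grayscale values (8x8)');
  }
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  let hex = '';
  // Pack the bits four at a time into one hexadecimal digit.
  for (let i = 0; i < pixels.length; i += 4) {
    let nibble = 0;
    for (const p of pixels.slice(i, i + 4)) {
      // A bit is set when the pixel is brighter than the mean.
      nibble = (nibble << 1) | (p > mean ? 1 : 0);
    }
    hex += nibble.toString(16);
  }
  return hex;
}

// Hypothetical input: dark top half, bright bottom half.
const pixels = [
  ...new Array<number>(32).fill(10),
  ...new Array<number>(32).fill(240),
];
console.log(averageHash(pixels)); // prints "00000000ffffffff"
```

Because only the comparison to the mean matters, small changes in brightness or compression tend to leave most bits unchanged.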




#️⃣ Computing Hashes

To use this middleware, you import it in your CDK stack and instantiate it as part of a pipeline.

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { HashingImageProcessor } from '@project-lakechain/hashing-image-processor';
import { CacheStorage } from '@project-lakechain/core';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Cache storage shared by the middlewares of the pipeline.
    const cache = new CacheStorage(this, 'Cache');

    // Computes the different image hashes based on all supported algorithms.
    const hashing = new HashingImageProcessor.Builder()
      .withScope(this)
      .withIdentifier('HashingImageProcessor')
      .withCacheStorage(cache)
      .withSource(source) // 👈 Specify a data source
      .build();
  }
}


Selecting Algorithms

You can explicitly select which hashing algorithms to enable when enriching document metadata with image hashes.

💁 By default, all hashing algorithms are enabled.

import { HashingImageProcessor } from '@project-lakechain/hashing-image-processor';

const hashing = new HashingImageProcessor.Builder()
  .withScope(this)
  .withIdentifier('HashingImageProcessor')
  .withCacheStorage(cache)
  .withSource(source)
  // Optionally specify which algorithms to use.
  .withAverageHashing(true)
  .withPerceptualHashing(true)
  .withDifferenceHashing(false)
  .withWaveletHashing(false)
  .withColorHashing(false)
  .build();


📄 Output

The Hashing image processor does not modify or alter source images in any way. It instead enriches the metadata of processed documents by setting the hash values associated with each of the enabled hashing algorithms.


ℹ️ Below is an example of a CloudEvent emitted by the Hashing image processor.

{
  "specversion": "1.0",
  "id": "1780d5de-fd6f-4530-98d7-82ebee85ea39",
  "type": "document-created",
  "time": "2023-10-22T13:19:10.657Z",
  "data": {
    "chainId": "6ebf76e4-f70c-440c-98f9-3e3e7eb34c79",
    "source": {
      "url": "s3://bucket/image.png",
      "type": "image/png",
      "size": 245328,
      "etag": "1243cbd6cf145453c8b5519a2ada4779"
    },
    "document": {
      "url": "s3://bucket/image.png",
      "type": "image/png",
      "size": 245328,
      "etag": "1243cbd6cf145453c8b5519a2ada4779"
    },
    "metadata": {
      "properties": {
        "kind": "image",
        "attrs": {
          "hashes": {
            "average": "00077ffbf2fefee0",
            "perceptual": "f53a175d6848d9c4",
            "difference": "1c4ccea3269084c8",
            "wavelet": "000707d1f2fefee0",
            "color": "06e00000040"
          }
        }
      }
    },
    "callStack": []
  }
}
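A downstream consumer could read the hashes from such an event along the lines of the sketch below. The `ImageHashes` and `HashingEvent` types and the `getHashes` helper are hypothetical; they only mirror the fields visible in the example event above and are not types exported by Lakechain.

```typescript
// Shape of the hashes, mirroring the example event. Each entry is
// optional because individual algorithms can be disabled.
interface ImageHashes {
  average?: string;
  perceptual?: string;
  difference?: string;
  wavelet?: string;
  color?: string;
}

// Minimal, hypothetical view of the event: only the fields read below.
interface HashingEvent {
  data: {
    document: { url: string };
    metadata: {
      properties?: {
        kind: string;
        attrs: { hashes?: ImageHashes };
      };
    };
  };
}

/**
 * Returns the hashes attached to an image document, or an empty
 * object when the event carries no image properties or no hashes.
 */
function getHashes(event: HashingEvent): ImageHashes {
  const properties = event.data.metadata.properties;
  if (!properties || properties.kind !== 'image') {
    return {};
  }
  return properties.attrs.hashes ?? {};
}

// Trimmed-down version of the example event.
const sample: HashingEvent = {
  data: {
    document: { url: 's3://bucket/image.png' },
    metadata: {
      properties: {
        kind: 'image',
        attrs: {
          hashes: {
            average: '00077ffbf2fefee0',
            perceptual: 'f53a175d6848d9c4',
          },
        },
      },
    },
  },
};

console.log(getHashes(sample).average); // prints "00077ffbf2fefee0"
```

Treating every hash as optional keeps the consumer working when only a subset of algorithms is enabled.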


🏗️ Architecture

This middleware runs within a Lambda compute, and packages the imagehash library to compute the hash values of images.




🏷️ Properties


Supported Inputs

| Mime Type | Description |
| --- | --- |
| image/jpeg | JPEG image |
| image/png | PNG image |
| image/bmp | BMP image |
| image/webp | WebP image |

Supported Outputs

| Mime Type | Description |
| --- | --- |
| image/jpeg | JPEG image |
| image/png | PNG image |
| image/bmp | BMP image |
| image/webp | WebP image |

Supported Compute Types

| Type | Description |
| --- | --- |
| CPU | This middleware only supports CPU compute. |


📖 Examples