feat(server): near-duplicate detection (#8228)

* duplicate detection job, entity, config

* queueing

* job panel, update api

* use embedding in db instead of fetching

* disable concurrency

* only queue visible assets

* handle multiple duplicateIds

* update concurrent queue check

* add provider

* add web placeholder, server endpoint, migration, various fixes

* update sql

* select embedding by default

* rename variable

* simplify

* remove separate entity, handle re-running with different threshold, set default back to 0.02

* fix tests

* add tests

* add index to entity

* formatting

* update asset mock

* fix `upsertJobStatus` signature

* update sql

* formatting

* default to 0.03

* optimize clustering

* use asset's `duplicateId` if present

* update sql

* update tests

* expose admin setting

* refactor

* formatting

* skip if ml is disabled

* debug trash e2e

* remove from web

* remove from sidebar

* test if ml is disabled

* update sql

* separate duplicate detection from clip in config, disable by default for now

* fix doc

* lower minimum `maxDistance`

* update api

* Add and Use Duplicate Detection Feature Flag (#9364)

* Add Duplicate Detection Flag

* Use Duplicate Detection Flag

* Attempt Fixes for Failing Checks

* lower minimum `maxDistance`

* fix tests

---------

Co-authored-by: mertalev <101130780+mertalev@users.noreply.github.com>

* chore: fixes and additions after rebase

* chore: update api (remove new Role enum)

* fix: left join smart search so getAll works without machine learning

* test: trash e2e go back to checking length of assets is zero

* chore: regen api after rebase

* test: fix tests after rebase

* redundant join

---------

Co-authored-by: Nicholas Flamy <30300649+NicholasFlamy@users.noreply.github.com>
Co-authored-by: Zack Pollard <zackpollard@ymail.com>
Co-authored-by: Zack Pollard <zack@futo.org>
This commit is contained in:
Mert 2024-05-16 13:08:37 -04:00 committed by GitHub
parent 673e97e71d
commit 64636c0618
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
61 changed files with 1254 additions and 61 deletions

View file

@ -49,6 +49,7 @@ FROM
"SharedLinkEntity__SharedLinkEntity_assets"."originalFileName" AS "SharedLinkEntity__SharedLinkEntity_assets_originalFileName",
"SharedLinkEntity__SharedLinkEntity_assets"."sidecarPath" AS "SharedLinkEntity__SharedLinkEntity_assets_sidecarPath",
"SharedLinkEntity__SharedLinkEntity_assets"."stackId" AS "SharedLinkEntity__SharedLinkEntity_assets_stackId",
"SharedLinkEntity__SharedLinkEntity_assets"."duplicateId" AS "SharedLinkEntity__SharedLinkEntity_assets_duplicateId",
"9b1d35b344d838023994a3233afd6ffe098be6d8"."assetId" AS "9b1d35b344d838023994a3233afd6ffe098be6d8_assetId",
"9b1d35b344d838023994a3233afd6ffe098be6d8"."description" AS "9b1d35b344d838023994a3233afd6ffe098be6d8_description",
"9b1d35b344d838023994a3233afd6ffe098be6d8"."exifImageWidth" AS "9b1d35b344d838023994a3233afd6ffe098be6d8_exifImageWidth",
@ -115,6 +116,7 @@ FROM
"4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6"."originalFileName" AS "4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6_originalFileName",
"4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6"."sidecarPath" AS "4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6_sidecarPath",
"4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6"."stackId" AS "4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6_stackId",
"4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6"."duplicateId" AS "4a35f463ae8c5544ede95c4b6d9ce8c686b6bfe6_duplicateId",
"d9f2f4dd8920bad1d6907cdb1d699732daff3c2f"."assetId" AS "d9f2f4dd8920bad1d6907cdb1d699732daff3c2f_assetId",
"d9f2f4dd8920bad1d6907cdb1d699732daff3c2f"."description" AS "d9f2f4dd8920bad1d6907cdb1d699732daff3c2f_description",
"d9f2f4dd8920bad1d6907cdb1d699732daff3c2f"."exifImageWidth" AS "d9f2f4dd8920bad1d6907cdb1d699732daff3c2f_exifImageWidth",
@ -237,6 +239,7 @@ SELECT
"SharedLinkEntity__SharedLinkEntity_assets"."originalFileName" AS "SharedLinkEntity__SharedLinkEntity_assets_originalFileName",
"SharedLinkEntity__SharedLinkEntity_assets"."sidecarPath" AS "SharedLinkEntity__SharedLinkEntity_assets_sidecarPath",
"SharedLinkEntity__SharedLinkEntity_assets"."stackId" AS "SharedLinkEntity__SharedLinkEntity_assets_stackId",
"SharedLinkEntity__SharedLinkEntity_assets"."duplicateId" AS "SharedLinkEntity__SharedLinkEntity_assets_duplicateId",
"SharedLinkEntity__SharedLinkEntity_album"."id" AS "SharedLinkEntity__SharedLinkEntity_album_id",
"SharedLinkEntity__SharedLinkEntity_album"."ownerId" AS "SharedLinkEntity__SharedLinkEntity_album_ownerId",
"SharedLinkEntity__SharedLinkEntity_album"."albumName" AS "SharedLinkEntity__SharedLinkEntity_album_albumName",