feat(server): separate face clustering job (#5598)

* separate facial clustering job

* update api

* fixed some tests

* invert clustering

* hdbscan

* update api

* remove commented code

* wip dbscan

* cleanup

removed cluster endpoint

remove commented code

* fixes

updated tests

minor fixes and formatting

fixed queuing

refinements

* scale search range based on library size

* defer non-core faces

* optimizations

removed unused query option

* assign faces individually for correctness

fixed unit tests

remove unused method

* don't select face embedding

update sql

linting

fixed ml typing

* updated job mock

* paginate people query

* select face embeddings because typeorm

* fix setting face detection concurrency

* update sql

formatting

linting

* simplify logic

remove unused imports

* more specific delete signature

* more accurate typing for face stubs

* add migration

formatting

* chore: better typing

* don't select embedding by default

remove unused import

* updated sql

* use normal try/catch

* stricter concurrency typing and enforcement

* update api

* update job concurrency panel to show disabled queues

formatting

* check jobId in queueAll

fix tests

* remove outdated comment

* better facial recognition icon

* wording

wording

formatting

* fixed tests

* fix

* formatting & sql

* try to fix sql check

* more detailed description

* update sql

* formatting

* wording

* update `minFaces` description

---------

Co-authored-by: Jason Rasmussen <jrasm91@gmail.com>
Co-authored-by: Alex Tran <alex.tran1502@gmail.com>
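Taken together, the commits above describe a DBSCAN-flavored recognition pass that runs as its own queue: each face is assigned individually, a new person is only created from a dense-enough neighborhood of faces, and non-core faces are deferred and retried later. Below is a minimal TypeScript sketch of that shape. It is not the Immich implementation; `FaceRepository` and every method on it are hypothetical stand-ins for the real queries.

// A DBSCAN-flavored sketch of the separated clustering job described by the
// commit list above. NOT the Immich implementation: FaceRepository and all
// of its methods are hypothetical stand-ins for the real queries.

interface Face {
  id: string;
  personId: string | null;
  embedding: number[];
}

interface FaceRepository {
  // nearest already-assigned face within maxDistance, or null (hypothetical)
  searchNearestAssigned(embedding: number[], maxDistance: number): Promise<Face | null>;
  // number of unassigned faces within maxDistance of this embedding (hypothetical)
  countUnassignedWithin(embedding: number[], maxDistance: number): Promise<number>;
  assign(faceId: string, personId: string): Promise<void>;
  createPerson(): Promise<{ id: string }>;
  requeue(faceId: string): Promise<void>;
}

const MAX_DISTANCE = 0.6; // mirrors `maxDistance` in the config diff below
const MIN_FACES = 3; // mirrors `minFaces`: the core-point threshold

// Faces are processed one at a time ("assign faces individually for
// correctness"): a face either joins the person of its nearest clustered
// neighbor, seeds a new person if it is a core point, or is deferred until
// enough core faces have clustered.
async function recognizeFace(face: Face, repo: FaceRepository): Promise<void> {
  const neighbor = await repo.searchNearestAssigned(face.embedding, MAX_DISTANCE);
  if (neighbor?.personId) {
    await repo.assign(face.id, neighbor.personId);
    return;
  }

  const density = await repo.countUnassignedWithin(face.embedding, MAX_DISTANCE);
  if (density >= MIN_FACES) {
    // core point: neighborhood is dense enough to create a new person
    const person = await repo.createPerson();
    await repo.assign(face.id, person.id);
  } else {
    // non-core point: defer and retry after core faces have formed clusters
    await repo.requeue(face.id);
  }
}

Handling faces one at a time trades some throughput for correctness, presumably because batch assignment risks creating duplicate people for the same cluster.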
Mert 2024-01-18 00:08:48 -05:00, committed by GitHub
commit 68f52818ae (parent 44873b4224)
57 changed files with 1081 additions and 631 deletions


@@ -231,12 +231,12 @@ Immich optionally uses machine learning for several features. However, it can be

 ### Can I lower CPU and RAM usage?

-The initial backup is the most intensive due to the number of jobs running. The most CPU-intensive ones are transcoding and machine learning jobs (Tag Images, Smart Search, Recognize Faces), and to a lesser extent thumbnail generation. Here are some ways to lower their CPU usage:
+The initial backup is the most intensive due to the number of jobs running. The most CPU-intensive ones are transcoding and machine learning jobs (Smart Search, Face Detection), and to a lesser extent thumbnail generation. Here are some ways to lower their CPU usage:

 - Lower the job concurrency for these jobs to 1.
 - Under Settings > Transcoding Settings > Threads, set the number of threads to a low number like 1 or 2.
 - Under Settings > Machine Learning Settings > Facial Recognition > Model Name, you can change the facial recognition model to `buffalo_s` instead of `buffalo_l`. The former is a smaller and faster model, albeit not as good.
-  - You _must_ re-run the Recognize Faces job for all images after this for facial recognition on new images to work properly.
+  - You _must_ re-run the Face Detection job for all images after this for facial recognition on new images to work properly.
 - If these changes are not enough, see [below](/docs/FAQ#how-can-i-disable-machine-learning) for how you can disable machine learning.

 ### Can I limit the amount of CPU and RAM usage?
@@ -247,10 +247,10 @@ You can look at the [original docker docs](https://docs.docker.com/config/contai

 ### How can I boost machine learning speed?

 :::note
-This advice increases throughput, not latency. This is to say that it will make Smart Search jobs process more quickly, but it won't make searching faster.
+This advice improves throughput, not latency. This is to say that it will make Smart Search jobs process more quickly, but it won't make searching faster.
 :::

-You can increase throughput by increasing the job concurrency for machine learning jobs (Smart Search, Recognize Faces). With higher concurrency, the host will work on more assets in parallel. You can do this by navigating to Administration > Settings > Job Settings and increasing concurrency as needed.
+You can increase throughput by increasing the job concurrency for machine learning jobs (Smart Search, Face Detection). With higher concurrency, the host will work on more assets in parallel. You can do this by navigating to Administration > Settings > Job Settings and increasing concurrency as needed.

 :::danger
 On a normal machine, 2 or 3 concurrent jobs can probably max the CPU, so if you're not hitting those maximums with, say, 30 jobs.
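To make the first bullet in the hunk above concrete, here is a minimal sketch of what lowering job concurrency to 1 could look like in the config file whose default excerpt appears in the next diff. The shape of the `job` section and the key names are assumptions inferred from the queue names in the FAQ, not a verified schema; the same values can be set in the UI under Administration > Settings > Job Settings.

"job": {
  "smartSearch": { "concurrency": 1 },
  "faceDetection": { "concurrency": 1 },
  "videoConversion": { "concurrency": 1 },
  "thumbnailGeneration": { "concurrency": 1 }
}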


@@ -79,7 +79,7 @@ The default configuration looks like this:
       "modelName": "buffalo_l",
       "minScore": 0.7,
       "maxDistance": 0.6,
-      "minFaces": 1
+      "minFaces": 3
     }
   },
   "map": {