feat(server): separate face clustering job (#5598)

* separate facial clustering job

* update api

* fixed some tests

* invert clustering

* hdbscan

* update api

* remove commented code

* wip dbscan

* cleanup

removed cluster endpoint

remove commented code

* fixes

updated tests

minor fixes and formatting

fixed queuing

refinements

* scale search range based on library size

* defer non-core faces

* optimizations

removed unused query option

* assign faces individually for correctness

fixed unit tests

remove unused method

* don't select face embedding

update sql

linting

fixed ml typing

* updated job mock

* paginate people query

* select face embeddings because typeorm

* fix setting face detection concurrency

* update sql

formatting

linting

* simplify logic

remove unused imports

* more specific delete signature

* more accurate typing for face stubs

* add migration

formatting

* chore: better typing

* don't select embedding by default

remove unused import

* updated sql

* use normal try/catch

* stricter concurrency typing and enforcement

* update api

* update job concurrency panel to show disabled queues

formatting

* check jobId in queueAll

fix tests

* remove outdated comment

* better facial recognition icon

* wording

wording

formatting

* fixed tests

* fix

* formatting & sql

* try to fix sql check

* more detailed description

* update sql

* formatting

* wording

* update `minFaces` description

---------

Co-authored-by: Jason Rasmussen <jrasm91@gmail.com>
Co-authored-by: Alex Tran <alex.tran1502@gmail.com>
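Taken together, the commits above describe a DBSCAN-flavored recognition pass that runs as its own queue: each face is assigned individually, a new person is only created from a dense-enough neighborhood of faces, and non-core faces are deferred and retried later. Below is a minimal TypeScript sketch of that shape. It is not the Immich implementation; `FaceRepository` and every method on it are hypothetical stand-ins for the real queries.

// A DBSCAN-flavored sketch of the separated clustering job described by the
// commit list above. NOT the Immich implementation: FaceRepository and all
// of its methods are hypothetical stand-ins for the real queries.

interface Face {
  id: string;
  personId: string | null;
  embedding: number[];
}

interface FaceRepository {
  // nearest already-assigned face within maxDistance, or null (hypothetical)
  searchNearestAssigned(embedding: number[], maxDistance: number): Promise<Face | null>;
  // number of unassigned faces within maxDistance of this embedding (hypothetical)
  countUnassignedWithin(embedding: number[], maxDistance: number): Promise<number>;
  assign(faceId: string, personId: string): Promise<void>;
  createPerson(): Promise<{ id: string }>;
  requeue(faceId: string): Promise<void>;
}

const MAX_DISTANCE = 0.6; // mirrors `maxDistance` in the config diff below
const MIN_FACES = 3; // mirrors `minFaces`: the core-point threshold

// Faces are processed one at a time ("assign faces individually for
// correctness"): a face either joins the person of its nearest clustered
// neighbor, seeds a new person if it is a core point, or is deferred until
// enough core faces have clustered.
async function recognizeFace(face: Face, repo: FaceRepository): Promise<void> {
  const neighbor = await repo.searchNearestAssigned(face.embedding, MAX_DISTANCE);
  if (neighbor?.personId) {
    await repo.assign(face.id, neighbor.personId);
    return;
  }

  const density = await repo.countUnassignedWithin(face.embedding, MAX_DISTANCE);
  if (density >= MIN_FACES) {
    // core point: neighborhood is dense enough to create a new person
    const person = await repo.createPerson();
    await repo.assign(face.id, person.id);
  } else {
    // non-core point: defer and retry after core faces have formed clusters
    await repo.requeue(face.id);
  }
}

Handling faces one at a time trades some throughput for correctness, presumably because batch assignment risks creating duplicate people for the same cluster.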
Mert 2024-01-18 00:08:48 -05:00, committed by GitHub
commit 68f52818ae (parent 44873b4224)
57 changed files with 1081 additions and 631 deletions


@@ -231,12 +231,12 @@ Immich optionally uses machine learning for several features. However, it can be

 ### Can I lower CPU and RAM usage?

-The initial backup is the most intensive due to the number of jobs running. The most CPU-intensive ones are transcoding and machine learning jobs (Tag Images, Smart Search, Recognize Faces), and to a lesser extent thumbnail generation. Here are some ways to lower their CPU usage:
+The initial backup is the most intensive due to the number of jobs running. The most CPU-intensive ones are transcoding and machine learning jobs (Smart Search, Face Detection), and to a lesser extent thumbnail generation. Here are some ways to lower their CPU usage:

 - Lower the job concurrency for these jobs to 1.
 - Under Settings > Transcoding Settings > Threads, set the number of threads to a low number like 1 or 2.
 - Under Settings > Machine Learning Settings > Facial Recognition > Model Name, you can change the facial recognition model to `buffalo_s` instead of `buffalo_l`. The former is a smaller and faster model, albeit not as good.
-  - You _must_ re-run the Recognize Faces job for all images after this for facial recognition on new images to work properly.
+  - You _must_ re-run the Face Detection job for all images after this for facial recognition on new images to work properly.
 - If these changes are not enough, see [below](/docs/FAQ#how-can-i-disable-machine-learning) for how you can disable machine learning.

 ### Can I limit the amount of CPU and RAM usage?
@@ -247,10 +247,10 @@ You can look at the [original docker docs](https://docs.docker.com/config/contai

 ### How can I boost machine learning speed?

 :::note
-This advice increases throughput, not latency. This is to say that it will make Smart Search jobs process more quickly, but it won't make searching faster.
+This advice improves throughput, not latency. This is to say that it will make Smart Search jobs process more quickly, but it won't make searching faster.
 :::

-You can increase throughput by increasing the job concurrency for machine learning jobs (Smart Search, Recognize Faces). With higher concurrency, the host will work on more assets in parallel. You can do this by navigating to Administration > Settings > Job Settings and increasing concurrency as needed.
+You can increase throughput by increasing the job concurrency for machine learning jobs (Smart Search, Face Detection). With higher concurrency, the host will work on more assets in parallel. You can do this by navigating to Administration > Settings > Job Settings and increasing concurrency as needed.

 :::danger
 On a normal machine, 2 or 3 concurrent jobs can probably max the CPU, so if you're not hitting those maximums with, say, 30 jobs.
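To make the first bullet in the hunk above concrete, here is a minimal sketch of what lowering job concurrency to 1 could look like in the config file whose default excerpt appears in the next diff. The shape of the `job` section and the key names are assumptions inferred from the queue names in the FAQ, not a verified schema; the same values can be set in the UI under Administration > Settings > Job Settings.

"job": {
  "smartSearch": { "concurrency": 1 },
  "faceDetection": { "concurrency": 1 },
  "videoConversion": { "concurrency": 1 },
  "thumbnailGeneration": { "concurrency": 1 }
}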


@@ -79,7 +79,7 @@ The default configuration looks like this:
       "modelName": "buffalo_l",
       "minScore": 0.7,
       "maxDistance": 0.6,
-      "minFaces": 1
+      "minFaces": 3
     }
   },
   "map": {