[bug#42023] Retry deduplication on ENOENT
diff mbox series

Message ID 87sgbikclv.fsf_-_@cune.org
State New
Headers show
Series
  • [bug#42023] Retry deduplication on ENOENT
Related show

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch fail View Laminar job

Commit Message

Caleb Ristvedt Sept. 15, 2020, 8:29 p.m. UTC
Continued where left off from 42023:

Ludovic Courtès <ludo@gnu.org> writes:

>> From b992a3aaac7e3b30222e0bf1df09093f18e25e6a Mon Sep 17 00:00:00 2001
>> From: Caleb Ristvedt <caleb.ristvedt@cune.org>
>> Date: Sat, 8 Aug 2020 11:25:57 -0500
>> Subject: [PATCH 3/6] deduplication: retry on ENOENT.
>>
>> It's possible for the garbage collector to remove the "canonical" link after
>> it's been detected as existing by 'deduplicate'.  This would cause an ENOENT
>> error when replace-with-link attempts to create the temporary link.  This
>> changes it so that it will properly handle that by retrying.
>
> Would that ENOENT cause an error, or just a missed deduplication
> opportunity?

Depends on how we handle it. Previously the error would be uncaught
entirely; simply skipping deduplication of that file would work, though
it would be suboptimal.

>
>> * guix/store/deduplication.scm (replace-with-link): renamed to
>>   canonicalize-with-link, now also handles the case where the target link
>>   doesn't exist yet, and retries on ENOENT.
>>   (deduplicate): modified to use canonicalize-with-link.
>
> There’s an issue with this patch.  I gave it a spin (offloading a few
> builds) and it got stuck in a infinite loop:
>
> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
>

I believe I can explain this. In 'deduplicate' we currently treat
anything that isn't a directory as a hardlinkable thing. This includes
symlinks (although it's implementation-defined whether symlinks can be
hardlinked to - we use CAN_LINK_SYMLINK to test this in
nix/libstore/optimise-store.cc). This means that at present we
unconditionally attempt to deduplicate symlinks (which happens to work
on linux). However, 'file-exists?' uses stat, not lstat, to check for
file existence. Thus, if there is a dangling symlink, 'file-exists?'
will return #f when passed it, but of course attempting to call link()
to create it will fail with EEXIST. Attached is a modified patch that
tests for file existence with lstat instead. I expect that will fix the
problem.

We should probably still add a test in 'deduplicate' for whether
symlinks can be hardlinked to.

Tangent: I was curious why libwps-0.4.so would be a dangling symlink,
and it turns out that it's actually a relative symlink, so when
accessing it via /gnu/store/...-libwps-0.4.12/lib/libwps-0.4.so it isn't
dangling, but when accessing it via /gnu/store/.links/0k63r... it is.

> I think we should work on reducing the complexity of that code (e.g.,
> there are several layers of system-error handling).

I've since flattened it down into just one layer of catch'es. It adds a
bit of redundancy, but might make it clearer.

- reepca

Comments

Ludovic Courtès Sept. 16, 2020, 8:37 p.m. UTC | #1
Hi!

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

[...]

>> There’s an issue with this patch.  I gave it a spin (offloading a few
>> builds) and it got stuck in a infinite loop:
>>
>> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
>> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
>> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
>> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
>> stat("/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk", 0x7ffe43898cd0) = -1 ENOENT (Dosiero aŭ dosierujo ne ekzistas)
>> link("/gnu/store/83jy739bn644w3pnwgb5kwjig0kzs92f-libwps-0.4.12/lib/libwps-0.4.so", "/gnu/store/.links/0k63r0n3681r2gqd00blq4z5xd7cw1knv0x049p99f0pq31brhrk") = -1 EEXIST (Dosiero jam ekzistas)
>>
>
> I believe I can explain this. In 'deduplicate' we currently treat
> anything that isn't a directory as a hardlinkable thing. This includes
> symlinks (although it's implementation-defined whether symlinks can be
> hardlinked to - we use CAN_LINK_SYMLINK to test this in
> nix/libstore/optimise-store.cc). This means that at present we
> unconditionally attempt to deduplicate symlinks (which happens to work
> on linux). However, 'file-exists?' uses stat, not lstat, to check for
> file existence. Thus, if there is a dangling symlink, 'file-exists?'
> will return #f when passed it, but of course attempting to call link()
> to create it will fail with EEXIST. Attached is a modified patch that
> tests for file existence with lstat instead. I expect that will fix the
> problem.

Ah ha!

> We should probably still add a test in 'deduplicate' for whether
> symlinks can be hardlinked to.

If GNU/Linux and GNU/Hurd support it, it’s unnecessary.

> Tangent: I was curious why libwps-0.4.so would be a dangling symlink,
> and it turns out that it's actually a relative symlink, so when
> accessing it via /gnu/store/...-libwps-0.4.12/lib/libwps-0.4.so it isn't
> dangling, but when accessing it via /gnu/store/.links/0k63r... it is.

I see, good catch!

> From 12f5848e79b0ede95babebea240264b32e39812c Mon Sep 17 00:00:00 2001
> From: Caleb Ristvedt <caleb.ristvedt@cune.org>
> Date: Sat, 8 Aug 2020 11:25:57 -0500
> Subject: [PATCH] deduplication: retry on ENOENT.
>
> It's possible for the garbage collector to remove the "canonical" link after
> it's been detected as existing by 'deduplicate'.  This would cause an ENOENT
> error when replace-with-link attempts to create the temporary link.  This
> changes it so that it will properly handle that by retrying.
>
> * guix/store/deduplication.scm (replace-with-link): renamed to
>   canonicalize-with-link, now also handles the case where the target link
>   doesn't exist yet, and retries on ENOENT.  Also modified to support
>   canonicalizing symbolic links, though it is the caller's responsibility to
>   ensure that the system supports hardlinking to a symbolic link (on Linux it
>   does).
>   (deduplicate): modified to use canonicalize-with-link.

[...]

> +            (lambda args
> +              (let ((errno (system-error-errno args)))
> +                (cond
> +                 ((= errno ENOENT)
> +                  ;; either SWAP-DIRECTORY has missing directory
> +                  ;; components or TARGET was deleted - this is a
> +                  ;; fundamental ambiguity to the errno produced by
> +                  ;; link()
> +                  (if (file-exists? swap-directory)
> +                      ;; we must assume link failed because target doesn't
> +                      ;; exist, so create it.

Nitpick: Please capitalize sentences, add a period at the end, and write
“'link'” instead of “link()” or “link” for clarity.

Otherwise LGTM.

I think we’ll have to stress-test it through offloading to catch any
remaining issues.

Thank you!

Ludo’.

Patch
diff mbox series

From 12f5848e79b0ede95babebea240264b32e39812c Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 8 Aug 2020 11:25:57 -0500
Subject: [PATCH] deduplication: retry on ENOENT.

It's possible for the garbage collector to remove the "canonical" link after
it's been detected as existing by 'deduplicate'.  This would cause an ENOENT
error when replace-with-link attempts to create the temporary link.  This
changes it so that it will properly handle that by retrying.

* guix/store/deduplication.scm (replace-with-link): renamed to
  canonicalize-with-link, now also handles the case where the target link
  doesn't exist yet, and retries on ENOENT.  Also modified to support
  canonicalizing symbolic links, though it is the caller's responsibility to
  ensure that the system supports hardlinking to a symbolic link (on Linux it
  does).
  (deduplicate): modified to use canonicalize-with-link.
---
 guix/store/deduplication.scm | 125 ++++++++++++++++++++---------------
 1 file changed, 70 insertions(+), 55 deletions(-)

diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm
index 0655ceb890..42116ed47d 100644
--- a/guix/store/deduplication.scm
+++ b/guix/store/deduplication.scm
@@ -115,26 +115,63 @@  store."
 ;; things link to this" (EMLINK), "this link already exists" (EEXIST), and
 ;; "can't fit more stuff in this directory" (ENOSPC).
 
-(define* (replace-with-link target to-replace
-                            #:key (swap-directory (dirname target))
-                            (store (%store-directory)))
+(define* (canonicalize-with-link target to-replace
+                                 #:key (swap-directory (dirname target))
+                                 (store (%store-directory)))
   "Atomically replace the file TO-REPLACE with a link to TARGET.  Use
 SWAP-DIRECTORY as the directory to store temporary hard links.  Upon ENOSPC
 and EMLINK, TO-REPLACE is left unchanged.
 
 Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on the same file system."
-  (define temp-link
+  (define (file-or-symlink-exists? filename)
+    ;; Like 'file-exists?', but doesn't follow symlinks.
     (catch 'system-error
       (lambda ()
-        (get-temp-link target swap-directory))
+        (lstat filename)
+        #t)
       (lambda args
-        ;; We get ENOSPC when we can't fit an additional entry in
-        ;; SWAP-DIRECTORY.  If it's EMLINK, then TARGET has reached its
-        ;; maximum number of links.
-        (if (memv (system-error-errno args) `(,ENOSPC ,EMLINK))
+        (if (= (system-error-errno args) ENOENT)
             #f
             (apply throw args)))))
 
+  (define temp-link
+    (let retry ()
+      (if (file-or-symlink-exists? target)
+          (catch 'system-error
+            (lambda ()
+              (get-temp-link target swap-directory))
+            (lambda args
+              (let ((errno (system-error-errno args)))
+                (cond
+                 ((= errno ENOENT)
+                  ;; either SWAP-DIRECTORY has missing directory
+                  ;; components or TARGET was deleted - this is a
+                  ;; fundamental ambiguity to the errno produced by
+                  ;; link()
+                  (if (file-exists? swap-directory)
+                      ;; we must assume link failed because target doesn't
+                      ;; exist, so create it.
+                      (retry)
+                      (apply throw args)))
+                 ((memv errno `(,ENOSPC ,EMLINK))
+                  ;; We get ENOSPC when we can't fit an additional entry
+                  ;; in SWAP-DIRECTORY.  If it's EMLINK, then TARGET has
+                  ;; reached its maximum number of links.  In both cases,
+                  ;; skip deduplication.
+                  #f)
+                 (else (apply throw args))))))
+          (catch 'system-error
+            (lambda ()
+              (link to-replace target)
+              #f)
+            (lambda args
+              (cond
+               ((= (system-error-errno args) EEXIST)
+                (retry))
+               ((memv (system-error-errno args) `(,ENOSPC ,EMLINK))
+                #f)
+               (else (apply throw args))))))))
+
   ;; If we couldn't create TEMP-LINK, that's OK: just don't do the
   ;; replacement, which means TO-REPLACE won't be deduplicated.
   (when temp-link
@@ -155,49 +192,27 @@  under STORE."
   (define links-directory
     (string-append store "/.links"))
 
-    (mkdir-p links-directory)
-    (let loop ((path path)
-               (type (stat:type (lstat path)))
-               (hash hash))
-      (if (eq? 'directory type)
-          ;; Can't hardlink directories, so hardlink their atoms.
-          (for-each (match-lambda
-                      ((file . properties)
-                       (unless (member file '("." ".."))
-                         (let* ((file (string-append path "/" file))
-                                (type (match (assoc-ref properties 'type)
-                                        ((or 'unknown #f)
-                                         (stat:type (lstat file)))
-                                        (type type))))
-                           (loop file type
-                                 (and (not (eq? 'directory type))
-                                      (nar-sha256 file)))))))
-                    (scandir* path))
-          (let ((link-file (string-append links-directory "/"
-                                          (bytevector->nix-base32-string hash))))
-            (if (file-exists? link-file)
-                (replace-with-link link-file path
-                                   #:swap-directory links-directory
-                                   #:store store)
-                (catch 'system-error
-                  (lambda ()
-                    (link path link-file))
-                  (lambda args
-                    (let ((errno (system-error-errno args)))
-                      (cond ((= errno EEXIST)
-                             ;; Someone else put an entry for PATH in
-                             ;; LINKS-DIRECTORY before we could.  Let's use it.
-                             (replace-with-link path link-file
-                                                #:swap-directory
-                                                links-directory
-                                                #:store store))
-                            ((= errno ENOSPC)
-                             ;; There's not enough room in the directory index for
-                             ;; more entries in .links, but that's fine: we can
-                             ;; just stop.
-                             #f)
-                            ((= errno EMLINK)
-                             ;; PATH has reached the maximum number of links, but
-                             ;; that's OK: we just can't deduplicate it more.
-                             #f)
-                            (else (apply throw args)))))))))))
+  (mkdir-p links-directory)
+  (let loop ((path path)
+             (type (stat:type (lstat path)))
+             (hash hash))
+    (if (eq? 'directory type)
+        ;; Can't hardlink directories, so hardlink their atoms.
+        (for-each (match-lambda
+                    ((file . properties)
+                     (unless (member file '("." ".."))
+                       (let* ((file (string-append path "/" file))
+                              (type (match (assoc-ref properties 'type)
+                                      ((or 'unknown #f)
+                                       (stat:type (lstat file)))
+                                      (type type))))
+                         (loop file type
+                               (and (not (eq? 'directory type))
+                                    (nar-sha256 file)))))))
+                  (scandir* path))
+        (let ((link-file (string-append links-directory "/"
+                                        (bytevector->nix-base32-string
+                                         hash))))
+          (canonicalize-with-link link-file path
+                                  #:swap-directory links-directory
+                                  #:store store)))))
-- 
2.28.0