Preserve Skim notes

EF does not seem to be able to retain Skim notes. The extended attributes containing them are lost both when importing from the Finder and when using the capture button inside Skim.

Thanks for the report. Unfortunately, the OS file copying routine that EagleFiler is using does not preserve extended attributes. I’ll look into having EagleFiler do an extra step to copy them.

I too would like to see some kind of interaction with Skim notes.

Skim has an option to save the notes in a parallel file (.skim), which it will detect when opening the PDF. The advantage is that it does not require extended attributes to be copied when syncing EF libraries, which is good for me since, at least for now, I’m using a ditto-based sync program (Duover). But using this option when opening a PDF with Skim as an external viewer violates the Primary Rule of EF: it adds a .skim file within the library without notifying EF that it has done so.

Perhaps in future versions EF will detect when files it wasn’t expecting are within the Files hierarchy and provide an option to either add them to the library or move them out of the library to somewhere. This way, the new .skim files would get detected upon next opening of the library (or perhaps if there is a manual way to scan the library, upon next invocation of that function).

My view of extended attributes at the moment is that they’re kind of precariously attached to files, since a number of widely adopted methods of transferring files don’t transfer these attributes, so .skim files seem like a much safer route. This would introduce a lot of clutter, though, so it might also lead to a request to allow certain files (like *.skim) to be hidden, and a followup request to integrate Skim directly and provide a visual indicator for when a .skim file is present (but hidden) for a PDF.

Perhaps someone could also work on the Skim end, and have it send EF a notification that a .skim file has been added (or modified) when saved. If EF is scriptable in this way (I haven’t investigated) or can be made so, perhaps that’s another angle to work from, since Skim is open source. It might be better, in fact, for EF to implement the more general things (receive notifications, hide certain file types, indicators when files are hidden) while Skim implements the Skim-specific things (.skim is the file extension, the file has been modified, a file has been added), so that other apps might also be able to join in later.

I agree that extended attributes are currently precarious, but I think they’re the right solution to this problem. Right now, adding a .skim file is harmless; EagleFiler will simply ignore it. But there’s no way for EagleFiler to know that you want to move this file along with the similarly named PDF when you rearrange your library. Automatically adding files to the library would be useful for other reasons, but it doesn’t really address this problem; it only creates the additional one of whether the .skim file should be hidden.

Hmm. I see, of course. So, I guess what would be required is actually some kind of grouping of the .skim file with the PDF file, and traveling down that path would quickly lead to essentially re-implementing extended attributes.
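
To illustrate what such grouping would have to do, here is a minimal sketch of a move that keeps a PDF and its notes file together. It assumes the sidecar for "paper.pdf" is named "paper.skim" and sits in the same folder; the function name move_pdf_with_notes is made up for this example. Running it by hand inside an EagleFiler library would itself go behind EagleFiler’s back, so treat it purely as an illustration of the idea, not as a recommended procedure.

```shell
#!/bin/bash
# move_pdf_with_notes PDF DEST_DIR   (hypothetical helper)
# Moves PDF into DEST_DIR and, if a sidecar with the same base name and a
# .skim extension sits next to it, moves that file as well.
# Assumption: the sidecar for "paper.pdf" is named "paper.skim".
move_pdf_with_notes() {
    local pdf=$1 dest=$2
    local sidecar="${pdf%.pdf}.skim"

    # Stop if the PDF itself cannot be moved.
    mv "$pdf" "$dest"/ || return 1

    # The sidecar is optional; move it only when it exists.
    if [ -f "$sidecar" ]; then
        mv "$sidecar" "$dest"/
    fi
}
```

With extended attributes none of this bookkeeping is needed, because the notes travel inside the file’s own metadata.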

My concern about extended attributes is probably short-sighted. Now that they’re available, and starting to be used for metadata that people want to back up, there will be increasing demand for copying and synching procedures that preserve them. I imagine they will be more widely supported (and used) before too long.

A workaround that should preserve Skim notes inside an EF library: import the file into EF (losing the Skim notes), then, for each file with notes, reveal it in the Finder and re-copy the original file on top of the one inside the library. Finder copying does preserve extended attributes, and once the import has occurred, EF knows to look for the file in the library. I tested it, and moving a file around inside a library does not appear to destroy extended attributes. So this should be practical enough if the number of Skim-annotated PDF files is small and if one’s backup/sync programs respect and copy extended attributes.

Incidentally, EF currently doesn’t notice if extended attributes change (they are not part of the checksum), so if I update a PDF in Skim by attaching notes, the resulting file will still pass verification in EF. This also means that importing a second copy of a PDF file into a library, but with different Skim notes attached, would be rejected as a duplicate (even if the EF import preserved extended attributes). I’m not sure what the right behavior is – that might be the right behavior (since “allow duplicates” can be turned on if needed). I imagine this also means that the Skim notes are not at present searchable.

Exactly. I think extended attributes will be more widely supported soon. As I recall, even the Finder didn’t used to preserve them, and now it does.

Yes, that should work.

Right. I think extended attributes should probably not be part of the checksum. However, EagleFiler should notice when they change, so that it can keep its index up-to-date once it knows how to index the Skim notes for searching.
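
To make the checksum point concrete, here is a minimal sketch of a checksum computed from file contents only. It is not EagleFiler’s actual implementation, which is not shown in this thread; the function name content_digest and the attribute name com.example.note are illustrative assumptions. Because the file is read through standard input, nothing attached to it as an extended attribute can influence the result.

```shell
#!/bin/bash
# content_digest FILE   (hypothetical helper)
# Prints "<crc>-<byte count>" computed from the file's data only.
# Extended attributes are never read, so changing them cannot change
# the digest.
content_digest() {
    cksum < "$1" | awk '{ print $1 "-" $2 }'
}

# Example on macOS (requires the xattr tool):
#   before=$(content_digest paper.pdf)
#   xattr -w com.example.note "hello" paper.pdf
#   after=$(content_digest paper.pdf)
#   [ "$before" = "$after" ] && echo "digest unchanged"
```

Noticing attribute changes would need a second, separate fingerprint of the attributes themselves, kept alongside the content checksum rather than mixed into it.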

EagleFiler 1.2.4 adds preservation of extended attributes when importing files.

EagleFiler 1.3 can index Skim notes for searching.

I’ve tried it on a PDFD and it works great. However, as notes are not displayed, the found string is not highlighted. Could there be an indication that there was a note that matched (and one has to open the file with skim to see it)?

Thanks for the suggestion. I don’t think there’s a good way to do this at present, but I’ll keep it in mind.

Do you have a suggestion on how to identify pdfs not in an EF library that are duplicates (including identical skim notes) of those already imported?

I’m not aware of any tools that compare Skim notes.

I see that if a library contains a pdf with (without) skim notes, EF blocks importing a copy of that pdf without (with) them.

Could the following help EF resolve that?

#!/bin/bash
# Prints every pdf that has Skim notes; recurses into directories.
if [ -d "$1" ]; then
    find "$1" -type f -name "*.pdf" -exec "$0" "{}" ";"
elif [ ! -z "$(xattr "$1" | grep net_sourceforge_skim-app_notes)" ]; then
    echo "$1"
fi

[Posted by Christiaan as a shell script to call on a file, or recursively on directories, to find any pdf that has notes.]

No, because EagleFiler does not consider the extended attributes to be part of the file for the purposes of finding duplicates. I suggest saving your annotated PDFs in Skim’s PDFD format.

File Buddy can identify “.skim” notes files (both inside and outside pdfd bundles) that are duplicates (as I wrote to you via pm). I see now that Find Duplicate Files can do this as well. (File Buddy can also identify skim notes files that are unique.)

So, if one runs the above script to identify all pdfs that have notes, one can then use EF’s pdf-to-pdfd script for such files in EF libraries to convert per your recommendation.

For files outside of EF, here’s a bash script (aggregated with Christiaan’s help from other scripts he wrote) that converts all pdf files with skim notes in any folder or subfolder to pdfd format.

#!/bin/bash

# Converts pdfs with notes in specified directories and subdirectories to pdfds

if [ -d "$1" ]; then
    find "$1" -type f -name "*.pdf" -exec "$0" "{}" ";"
elif [ ! -z "$(xattr "$1" | grep net_sourceforge_skim-app_notes)" ]; then
    echo "$1"
    SAVEIFS=$IFS
    IFS=$(echo -en "\n\b")

    # Skip pdfs that already live inside a .pdfd bundle
    dir=$(dirname "$1")
    parentFolder=$(basename "$dir")
    parentFolderExt=$(echo "$parentFolder" | awk -F . '{print $NF}')
    if [ "$parentFolderExt" != "pdfd" ]; then
        /Applications/Skim.app/Contents/SharedSupport/skimnotes convert "$1" && rm "$1"
    fi
    IFS=$SAVEIFS
fi

Once all skim-annotated pdf files are in pdfd bundles, then File Buddy or Find Duplicate Files can identify those with identical skim notes and one can selectively delete those outside EF libraries. (To avoid false positives, one should also check whether the corresponding pdf files are marked as duplicate; I’m not sure if there’s a chance of false negatives.)
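
For anyone without File Buddy or Find Duplicate Files, the same comparison can be roughly sketched in the shell. This is an approximation under stated assumptions, not a replacement for those tools: it assumes the notes live in files ending in .skim (as they do inside pdfd bundles), the function name list_duplicate_notes is made up for this example, and a matching CRC and size is only a strong hint, so confirm with cmp before deleting anything.

```shell
#!/bin/bash
# list_duplicate_notes DIR   (hypothetical helper)
# Prints groups of .skim files under DIR whose contents have the same
# CRC and byte count. Each group is followed by a blank line.
list_duplicate_notes() {
    find "$1" -type f -name '*.skim' -exec cksum {} + |
    sort |
    awk '{
        key = $1 " " $2                      # checksum and size
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)     # keep only the path
        count[key]++
        files[key] = files[key] name "\n"
    }
    END {
        for (key in count)
            if (count[key] > 1)
                printf "%s\n", files[key]
    }'
}
```

Calling list_duplicate_notes on a folder such as ~/Documents prints one block of paths per set of matching notes files; files with unique notes are not listed.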

This might be a help to those, like me, who have hundreds of skim-annotated pdf files among thousands of pdfs, with some duplicates among the annotated files.

[Note: the script does not transfer non-Skim metadata (label, note text, etc.) from the pdf to the pdfd. Someone might want to fold that in. :>]

Here’s a version that also copies the label, Spotlight comments, and modification date from the pdf to the pdfd:

#!/bin/bash

# Converts each skim-annotated pdf in specified directories and subdirectories to pdfd
# Transfers label, comment, and modification date to pdfd

if [ -d "$1" ]; then
    find "$1" -type f -name "*.pdf" -exec "$0" "{}" ";"
elif [ ! -z "$(xattr "$1" | grep net_sourceforge_skim-app_notes)" ]; then
    echo "$1"
    SAVEIFS=$IFS
    IFS=$(echo -en "\n\b")

    # Make the path absolute so that AppleScript can resolve it
    file="$1"
    [ "${file:0:1}" == "/" ] || file="${PWD}/${file}"

    # Skip pdfs that already live inside a .pdfd bundle
    dir=$(dirname "$1")
    parentFolder=$(basename "$dir")
    parentFolderExt=$(echo "$parentFolder" | awk -F . '{print $NF}')
    if [ "$parentFolderExt" != "pdfd" ]; then
        /Applications/Skim.app/Contents/SharedSupport/skimnotes convert "$1"

        # The here-document terminator must start at the beginning of the line
        osascript > /dev/null <<-EOF
set thepdf to POSIX file "${file}"
set thepdfd to POSIX file "${file}d"
tell application "Finder"
    set label index of (file (thepdfd)) to label index of (file (thepdf))
    set comment of (file (thepdfd)) to comment of (file (thepdf))
    set modification date of (file (thepdfd)) to modification date of (file (thepdf))
end tell
EOF

        rm "$1"
    fi

    IFS=$SAVEIFS
fi