Introduction
Location-specific feedback has always been fundamental to collaboration. At Dropbox, we’ve recognized this need and implemented annotations on document previews. Our goal was to allow users to provide focused and clear feedback by drawing rectangles and highlighting text on their documents. We ran into a few main challenges along the way: How do we ensure annotations can be drawn and rendered accurately on any kind of document, with any viewport size, and using any platform? How can we maintain isolation of user documents for security? How can we keep performance smooth and snappy? Below, I’m going to answer these questions and dive a bit deeper into how annotations work at Dropbox.
FileViewer architecture
Before jumping into the annotations library, let’s take a look at our existing file preview architecture. On the web, files are previewed by our FileViewer module, which behaves differently depending on file type. Images and text files are relatively simple and can be inserted directly into the DOM. Previewing more complicated files (e.g. PDFs, Microsoft Office files, and Adobe Illustrator files) requires first generating a PDF preview on the back-end and then displaying that preview in an iframe within the FileViewer.
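To make the "behaves differently depending on file type" idea concrete, here is a minimal sketch of how a viewer could pick a preview strategy from a file's extension. The extension lists, the `PreviewStrategy` names, and `choosePreviewStrategy` are hypothetical and are not Dropbox's FileViewer code. The only idea taken from the text is the split between files rendered directly in the DOM and files shown as a generated PDF preview in an iframe.

```typescript
// Hypothetical sketch: choosing a preview strategy from a file's extension.
// The extension lists are illustrative assumptions, not Dropbox's configuration.
type PreviewStrategy =
  | "inline-image"          // inserted directly into the DOM
  | "inline-text"           // inserted directly into the DOM
  | "generated-pdf-iframe"; // PDF preview generated on the back-end, shown via PDF.js in an iframe

const IMAGE_EXTENSIONS = new Set(["png", "jpg", "jpeg", "gif"]);
const TEXT_EXTENSIONS = new Set(["txt", "md", "csv"]);
// Complex formats that first need a PDF preview generated on the back-end.
const GENERATED_PDF_EXTENSIONS = new Set(["pdf", "doc", "docx", "ppt", "pptx", "ai"]);

function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

function choosePreviewStrategy(filename: string): PreviewStrategy | null {
  const ext = extensionOf(filename);
  if (IMAGE_EXTENSIONS.has(ext)) return "inline-image";
  if (TEXT_EXTENSIONS.has(ext)) return "inline-text";
  if (GENERATED_PDF_EXTENSIONS.has(ext)) return "generated-pdf-iframe";
  return null; // no preview available for this type
}
```

For example, `choosePreviewStrategy("Q3 Report.docx")` yields `"generated-pdf-iframe"`, so that file would be isolated in an iframe rather than inserted into the page.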
User documents are incredibly variable and could potentially contain malicious content. Therefore, complicated file types with generated previews are shown in an iframe. Since the iframe’s source comes from a different domain, its context doesn’t have direct access to the main site’s DOM, CSS styles, JavaScript functions, cookies, or local storage. Thus, the user-generated content is effectively isolated. Within that iframe, Dropbox uses PDF.js to display the generated previews. To maintain isolation, PDF.js knows nothing about the user and has the sole purpose of rendering a PDF at a given URL. Using PDF.js in an iframe has some additional benefits besides increased security: it keeps the code simple and allows us to benefit from an existing technology with a large user base and continuous upgrades.
While this structure worked very well for vanilla read-only document previews, it presented some substantial challenges when it came time to enhance previews with inline annotations. We now needed to do more than simply view a document, so we had to establish communication between PDF.js and our FileViewer. This communication happens through FrameMessenger, a proprietary Dropbox message-passing module that sends information as JSON. Although annotations must work for arbitrary file types and platforms, the following discussion uses PDF previews on the web as an illustrative example.
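FrameMessenger is proprietary, so the following is only a sketch of what a JSON message envelope across the iframe boundary might look like. It assumes the transport is the browser's `window.postMessage`, where the receiver can check the sender's origin (`MessageEvent.origin`). The `{ payload: { action, parameters } }` shape mirrors the example payload shown later in this post. `encodeMessage`, `decodeMessage`, and the origin strings are hypothetical.

```typescript
// Hypothetical sketch of a JSON envelope for cross-frame messages.
// FrameMessenger's real API is not public; these names are assumptions.
interface FramePayload {
  action: string;
  parameters: Record<string, unknown>;
}

interface FrameEnvelope {
  payload: FramePayload;
}

function encodeMessage(action: string, parameters: Record<string, unknown>): string {
  const envelope: FrameEnvelope = { payload: { action, parameters } };
  return JSON.stringify(envelope);
}

// Parse an incoming message. Reject anything from an unexpected origin
// or anything that does not have the expected shape.
function decodeMessage(origin: string, expectedOrigin: string, raw: string): FramePayload | null {
  if (origin !== expectedOrigin) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const payload = (parsed as { payload?: unknown }).payload;
  if (typeof payload !== "object" || payload === null) return null;
  const { action, parameters } = payload as { action?: unknown; parameters?: unknown };
  if (typeof action !== "string") return null;
  if (typeof parameters !== "object" || parameters === null) return null;
  return { action, parameters: parameters as Record<string, unknown> };
}
```

Validating both the origin and the shape on receipt keeps the isolation boundary meaningful. Each side trusts only well-formed messages from the frame it expects.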
Annotations
At Dropbox we use the React JavaScript library for our front end. Annotations have two main React components, which can easily be reused across document types: the inline markup itself (the Annotation) and the corresponding comment bubble (the AnnotationBubble). The Annotation is a yellow overlay positioned within the document itself that refers to part of its content. Currently, this can be a text highlight or a rectangle. In the future we may add other types, such as freehand shapes or pointers. The Annotation is placed and sized based on user mouse events and must react smoothly while being created. Annotations must also move with the document when it’s scrolled or resized.
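One way to model the two current annotation types is a TypeScript discriminated union. This is a hypothetical sketch: the field names, the one-rectangle-per-line representation of highlights, and `rectsToDraw` are assumptions. Positions are kept in PDF points measured from the page's bottom-left corner, as described in the appendix.

```typescript
// Hypothetical data model for the two current annotation types.
// Field names are illustrative assumptions.
interface PdfRect {
  page: number;   // 1-based page number
  left: number;   // PDF points, measured from the page's bottom-left corner
  bottom: number;
  right: number;
  top: number;
}

type Annotation =
  | { kind: "region"; rect: PdfRect }                                   // a drawn rectangle
  | { kind: "text-highlight"; rects: PdfRect[]; selectedText: string }; // one rect per highlighted line

// The rectangles the yellow overlay must paint for an annotation.
function rectsToDraw(annotation: Annotation): PdfRect[] {
  switch (annotation.kind) {
    case "region":
      return [annotation.rect];
    case "text-highlight":
      return annotation.rects;
  }
}
```

The union also helps when adding future types such as freehand shapes. With `strictNullChecks` enabled, TypeScript reports that `rectsToDraw` no longer returns on every path until the new variant is handled.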
The second component is the corresponding AnnotationBubble, a popup “bubble” containing comment text that floats near the Annotation. The AnnotationBubble contains the original comment, along with any replies and a list of users relevant to the conversation. When a user @mentions someone, our CommentComposer React component brings the feedback directly to the recipient’s attention via an email and a popup notification. The AnnotationBubble must be visually attached to its Annotation, but it must also connect with other Dropbox components, such as the User object, the contact list popup, and the comments list side panel.
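Before anyone can be notified, a comment composer has to find the @mentions in the comment. The sketch below shows that first step. The mention syntax, an `@` at the start of the text or after whitespace followed by letters, digits, `_`, or `-`, is an assumption. `extractMentions` is hypothetical and is not the CommentComposer's actual logic.

```typescript
// Hypothetical sketch: find @mentions in a comment so the mentioned users can be notified.
// Assumed syntax: "@" at the start or after whitespace, then letters, digits, "_" or "-".
function extractMentions(comment: string): string[] {
  const pattern = /(^|\s)@([A-Za-z0-9_-]+)/g;
  const found = new Set<string>(); // de-duplicates while preserving first-seen order
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(comment)) !== null) {
    found.add(match[2].toLowerCase());
  }
  return Array.from(found);
}
```

Requiring whitespace (or the start of the text) before the `@` keeps addresses such as `a@b.com` from being treated as mentions. Each extracted handle would then be resolved against the user's contact list, which lives on the FileViewer side.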
Integrating annotations with the existing framework
One of the most challenging aspects of designing the architecture for annotations was deciding how to bridge the divide between the separate ecosystems of the Dropbox FileViewer and the PDF.js iframe. Where should we get mouse events from? Where should we draw the Annotation and AnnotationBubble? Ideally, the Annotation and AnnotationBubble should move smoothly while the mouse interacts with them or while the document scrolls or resizes. Also, the AnnotationBubble should be able to float over the edge of the document, but the Annotation should be clipped at the edge. For implementation simplicity, we wanted to limit modifications to the third-party PDF.js and do most of our development in Dropbox’s FileViewer. Finally, we needed to be careful about what data we send to and from the iframe. Too much data flow could adversely affect performance. More importantly, we didn’t want to compromise the security provided by the iframe’s encapsulation by sending sensitive user data across to the document.
Option 1: All components in the PDF.js iframe
One option would have been to customize PDF.js and implement everything within the iframe. The annotation components could be, in every sense, “inside” the document. This means that resizes and scrolls could immediately and seamlessly update the Annotation’s position, no calculations needed. Also, the Annotation would never overflow the bounds of the document, since it would be automatically clipped by the iframe. Although this has huge performance and simplicity benefits, it also has some serious drawbacks:
- The AnnotationBubble would also be clipped by the iframe, even though it should be allowed to extend past the document bounds to make the most of valuable viewport space.
- There would be a large cost to implementation simplicity. Since all the annotation components would be inside PDF.js, they would have to be compiled into this block of “vanilla” JavaScript, and would be very hard to maintain as PDF.js develops.
- These components would also be hard or impossible to generalize for non-iframe preview types.
- Finally, we would have to send all of the necessary information for the comment bubble from the FileViewer ecosystem into the iframe, including a user’s information and their contact list. Sending this sensitive user data across to the iframe would break the FileViewer’s security encapsulation. Although we could’ve overcome the other difficulties mentioned above, preserving security was the main reason an all-iframe implementation wasn’t chosen.
Option 2: All components in the parent FileViewer
Another option would have been to implement all the annotation components in the parent FileViewer, in an overlay on “top” of the iframe. Advantages would include aligning the development process more closely with the rest of the Dropbox website and allowing easier code reuse across other Dropbox systems and document types. Also, passing information between the AnnotationBubble and FileViewer would be trivial and would have no security implications. However, with this approach it becomes very hard to make the Annotation look like a part of the document. Instead of transmitting bulky user information in JSON via the FrameMessenger, as in Option 1, we’d have to send streams of fast-moving mouse, scroll, and resize events. The time required for this cross-document communication, along with translation between coordinate systems and manual repaints of the Annotation, would cause the Annotation to perceptibly lag behind a user’s mouse or the document’s scroll. Annotations could also flow outside the document’s edges, and the illusion that the Annotation was attached to the document would be impossible to maintain.
Option 3: Hybrid solution
We found that a compromise between these two options was the best solution, both for code quality and performance. The code for the Annotation is integrated into PDF.js so that relevant mouse events are captured and used right away. Since the Annotation is inside the iframe and attached to the document as a child div, it moves smoothly along with the document when it’s scrolled or resized. The Annotation is also automatically clipped when it overflows the iframe. The AnnotationBubble, however, is in the parent FileViewer and benefits greatly from direct access to other Dropbox components and data. It can also easily overflow the iframe window, allowing better use of viewport space. However, since its position needs to follow the Annotation in the iframe, any movements of the Annotation are sent up through the FrameMessenger and then translated to the viewport’s coordinates. This introduces a delay in the AnnotationBubble’s movements, which we mitigate by hiding the bubble while its Annotation is moving. There is also some necessary algorithmic complexity involved in translating positions between the iframe and FileViewer, which we describe in the appendix at the bottom of the post. (In fact, each type of preview has its own interface for accepting and translating movement events sent from the Annotation to the AnnotationBubble.)
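The "hide while moving" mitigation can be expressed as a small state reducer on the FileViewer side. This is a sketch under assumptions: the message names (`move-start`, `move`, `move-end`), `ViewportPoint`, and `BubbleState` are hypothetical. The sketch also assumes positions arrive already expressed in viewport pixels.

```typescript
// Hypothetical sketch of the bubble side of the hybrid design. Message names,
// ViewportPoint, and BubbleState are assumptions made for illustration.
interface ViewportPoint { x: number; y: number }

type AnnotationMoveMessage =
  | { type: "move-start" }
  | { type: "move"; anchor: ViewportPoint }      // intermediate position, viewport pixels
  | { type: "move-end"; anchor: ViewportPoint }; // final position, viewport pixels

interface BubbleState {
  visible: boolean;
  position: ViewportPoint | null;
}

function reduceBubble(state: BubbleState, msg: AnnotationMoveMessage): BubbleState {
  switch (msg.type) {
    case "move-start":
      // Hide rather than visibly trail behind the Annotation.
      return { visible: false, position: state.position };
    case "move":
      // Keep tracking the latest position, but stay hidden.
      return { visible: false, position: msg.anchor };
    case "move-end":
      // Movement finished: show the bubble at the final anchor.
      return { visible: true, position: msg.anchor };
  }
}
```

Because the bubble is invisible during the stream of intermediate positions, cross-frame latency never shows up as the bubble lagging behind the highlight.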
This table summarizes the three options above:
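Option | Pros | Cons |
---|---|---|
1: All components in the PDF.js iframe | Annotation moves seamlessly with scrolls and resizes; Annotation is automatically clipped at the document edge | AnnotationBubble is also clipped; components are hard to maintain inside PDF.js and hard to generalize to other preview types; sensitive user data must be sent into the iframe, breaking security encapsulation |
2: All components in the parent FileViewer | Development aligns with the rest of the Dropbox website; easier code reuse; passing data to the AnnotationBubble is trivial and secure | Streams of mouse, scroll, and resize events must cross the iframe boundary; Annotation perceptibly lags and can overflow the document edges |
3: Hybrid solution | Annotation moves smoothly and is clipped at the document edge; AnnotationBubble has direct access to Dropbox components and data and can overflow the iframe | AnnotationBubble movements are delayed (mitigated by hiding it while its Annotation moves); positions must be translated between the iframe and FileViewer |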
Example: an action flowing through the whole system
The following example shows how we isolate the preview and how we deal with communication across the iframe. In this scenario, the user has decided to place an annotation on a PDF and has already begun drawing a rectangle by clicking and dragging her mouse across a part of the screen. Now, the user releases the mouse, starting a flurry of events, summarized in the diagram below the animation.
iframe events
- PDF.js contains the actual document preview and has event listeners set up on the browser’s window object. It receives the mouseup event and informs PdfJsAnnotationInterface.
- PdfJsAnnotationInterface does all of the document type-specific communication between the preview, the more general AnnotationController, and the FileViewer.
- From here, the event gets passed to the AnnotationController, which determines which Annotation the event gets passed to next (this could be either a new Annotation or one that’s currently being drawn/edited). In this case, AnnotationController knows we’ve previously been dragging the mouse to create a region, so it calls the AnnotationRegion’s onMouseUp callback.
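The routing step can be sketched in a few lines. In this simplified, hypothetical version, a controller forwards each mouse event to the annotation currently being drawn or edited, and starts a new one on mousedown when nothing is in progress. `AnnotationControllerSketch`, `AnnotationHandler`, and the event shape are assumptions. Unlike the real system, this sketch ends the interaction at mouseup rather than keeping the placed region editable.

```typescript
// Hypothetical sketch of routing mouse events to the active annotation.
// Names and event shapes are assumptions, not Dropbox's AnnotationController API.
interface AnnotationMouseEvent {
  kind: "mousedown" | "mousemove" | "mouseup";
  x: number;
  y: number;
}

interface AnnotationHandler {
  onMouseDown(e: AnnotationMouseEvent): void;
  onMouseMove(e: AnnotationMouseEvent): void;
  onMouseUp(e: AnnotationMouseEvent): void;
}

class AnnotationControllerSketch {
  private active: AnnotationHandler | null = null;
  private readonly createHandler: () => AnnotationHandler;

  constructor(createHandler: () => AnnotationHandler) {
    this.createHandler = createHandler;
  }

  dispatch(e: AnnotationMouseEvent): void {
    if (e.kind === "mousedown" && this.active === null) {
      // Nothing in progress: start a new annotation (e.g. a region being drawn).
      this.active = this.createHandler();
    }
    const target = this.active;
    if (target === null) return; // stray move/up with nothing to route to
    if (e.kind === "mousedown") {
      target.onMouseDown(e);
    } else if (e.kind === "mousemove") {
      target.onMouseMove(e);
    } else {
      target.onMouseUp(e);
      this.active = null; // this simplified sketch ends the interaction on mouseup
    }
  }
}
```

Keeping the routing decision in one place is what lets the document-specific interface stay thin: it only needs to translate raw browser events into this common shape.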
The following is a simplified version of the CoffeeScript code in AnnotationRegion’s onMouseUp callback. In this example the region is being created for the first time, so the relevant code path is the one that calls onAnnotationPlaced:
```coffeescript
AnnotationRegion = React.createClass(
  ...

  # If dragging/resizing, we can stop now.
  # Otherwise, the click happened elsewhere and we just hide the rectangle
  onMouseUp: (event) ->
    if @_isModifying() # the mouse was just interacting with the region
      # Update the annotation
      @_updateAnnotationFromState() # updates @annotation dict based on state

      # Call "Annotation Placed" or "End Drag" depending on whether or not
      # we were just creating the region
      if @state.isInitialCreation
        @props.onAnnotationPlaced?(@annotation) # back to AnnotationController
      else
        @props.onAnnotationEndDrag?(@annotation)

      # Disable creation mode
      @setState {
        isInitialCreation: false
      }

    # the mouse was not interacting with the region,
    # so a click outside it tells it to hide.
    else
      @hideAnnotation(event)

  ...
)
```
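This is an example of the actual JSON that gets sent across the iframe boundary:

```json
{
  "payload": {
    "action": "annotation-placed",
    "parameters": {
      "pdf_coordinates": [
        {
          "page": 1,
          "page_size": {
            "height": 790,
            "width": 610
          },
          "coordinates": [
            { "x": 250.0, "y": 540.0 },
            { "x": 250.0, "y": 360.0 },
            { "x": 410.0, "y": 360.0 },
            { "x": 410.0, "y": 540.0 }
          ]
        }
      ],
      "type": 2,
      "text_highlight": null,
      "viewport_coordinates": [
        { "x": 530, "y": 510 },
        { "x": 530, "y": 870 },
        { "x": 830, "y": 870 },
        { "x": 830, "y": 510 }
      ]
    }
  }
}
```

Here, pdf_coordinates holds the original PDF location information (the page, its size, and the corner coordinates). A type of 2 denotes a region. text_highlight would contain the selected text for a highlight and is null for a region. viewport_coordinates holds the same corners translated into viewport pixels.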
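On the FileViewer side, it can help to give this payload a static type. The interfaces below follow the field names in the JSON above. The interface names and the `viewportBounds` helper, which computes the axis-aligned box used to place the bubble, are hypothetical.

```typescript
// Hypothetical typing of the example payload's parameters; field names follow the JSON,
// but the interfaces and helper are illustrative assumptions.
interface XY { x: number; y: number }

interface PdfLocation {
  page: number;
  page_size: { height: number; width: number };
  coordinates: XY[]; // PDF points, origin at the page's bottom-left
}

interface AnnotationPlacedParameters {
  pdf_coordinates: PdfLocation[];
  type: number;                  // 2 = region in the example
  text_highlight: string | null; // selected text for highlights, null for regions
  viewport_coordinates: XY[];    // viewport pixels, origin at the top-left
}

// Axis-aligned bounds of the annotation in viewport pixels, e.g. for positioning the bubble.
// Assumes viewport_coordinates is non-empty.
function viewportBounds(p: AnnotationPlacedParameters) {
  const xs = p.viewport_coordinates.map((c) => c.x);
  const ys = p.viewport_coordinates.map((c) => c.y);
  return {
    left: Math.min(...xs),
    top: Math.min(...ys),
    right: Math.max(...xs),
    bottom: Math.max(...ys),
  };
}
```

Applied to the example payload, this gives `{ left: 530, top: 510, right: 830, bottom: 870 }`.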
FileViewer events
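A simplified version of the Store’s onStartAnnotationCreation handler, which records the new Annotation in createAnnotationBubble:

```coffeescript
return Reflux.createStore({
  ...

  onStartAnnotationCreation: ({annotation}) ->
    @setState({
      # createAnnotationBubble contains information for creating
      # the AnnotationBubble for a new Annotation
      createAnnotationBubble: {
        annotation: annotation
        showBubble: true
      }
    })

  ...
})
```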
10. The new AnnotationBubble is created in FilePreviewOverlay, which listens for updates in the Store. When Store.createAnnotationBubble changes, FilePreviewOverlay receives this update and, as a result, creates and positions a new AnnotationBubble.
11. The user then types a comment into the AnnotationBubble and hits “Post”.
12. This triggers ActionCreators.addAnnotation. While the Store holds global state in the Flux paradigm, ActionCreators handles global actions, including side effects and I/O. ActionCreators.addAnnotation saves the annotation and comment to Dropbox’s back-end data centers.
13. If the save is successful, ActionCreators updates the Store, clearing Store.createAnnotationBubble.
14. Like before, FilePreviewOverlay hears this update and hides the AnnotationBubble.
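Steps 10 through 14 follow the usual Flux loop: a store holds state, listeners re-render on change, and an action creator performs I/O before updating the store. The sketch below mirrors only that data flow and does not use Reflux or React. `AnnotationStoreSketch`, `addAnnotation`'s signature, and the injected `saveAnnotation` function (a stand-in for the back-end call) are all hypothetical.

```typescript
// Hypothetical, framework-free sketch of the Flux-style flow in steps 10 to 14.
// `saveAnnotation` stands in for the back-end save and is an assumption.
interface PendingBubble { annotationId: string; showBubble: boolean }

interface StoreState { createAnnotationBubble: PendingBubble | null }

type Listener = (state: StoreState) => void;

class AnnotationStoreSketch {
  private state: StoreState = { createAnnotationBubble: null };
  private listeners: Listener[] = [];

  listen(listener: Listener): void {
    this.listeners.push(listener);
  }

  getState(): StoreState {
    return this.state;
  }

  setState(next: StoreState): void {
    this.state = next;
    for (const listener of this.listeners) listener(this.state); // e.g. FilePreviewOverlay re-renders
  }
}

// Action creator: performs the side effect (saving), then updates the Store on success.
async function addAnnotation(
  store: AnnotationStoreSketch,
  comment: string,
  saveAnnotation: (annotationId: string, comment: string) => Promise<boolean>,
): Promise<void> {
  const pending = store.getState().createAnnotationBubble;
  if (pending === null) return; // no bubble is waiting for a comment
  const saved = await saveAnnotation(pending.annotationId, comment);
  if (saved) {
    store.setState({ createAnnotationBubble: null }); // listeners hide the bubble
  }
}
```

If the save fails, the Store is left untouched, so the bubble and the user's draft comment stay on screen.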
Conclusion
The information cascade in the above example was started by a single user mouse action, and multitudes of other events are fired continuously as the user interacts with the preview. Events also go in the reverse direction, as FileViewer needs to inform the iframe of higher-level actions such as the user turning commenting on or off. To make the annotations system react smoothly and sensibly to all of this input, we needed to bridge the gap between our intentionally isolated document preview and the broader Dropbox environment. As explained above, we kept the purely visual Annotation simple and attached it directly to the document to maximize its performance. The information-heavy AnnotationBubble was kept outside and a flexible interface was made to connect them. This separation of components and use of interfaces made it easy to gracefully extend this implementation for image files, and will make annotations possible on many more file types in the future.
Try out annotations on a sample file today!
Appendix: Coordinate translation between iframe and FileViewer
For PDFs, the decision outlined above to split annotations between the iframe and FileViewer meant that coordinates would have to be translated between two different systems: PDF points and viewport pixels.
On PDFs, positions are expressed in relation to a physical printed document. Each position is measured from the bottom left corner of a page and expressed in “points” (one of which equals 1/72 of an inch on a printed page). Conversely, positions in the viewport are measured from the top left of the viewer’s viewport and expressed in pixels. When translating from PDF points in the iframe to pixels in the viewport, the current page and scroll position of the document both need to be taken into account to calculate an offset. Also, the vertical component of the point needs to be reversed. Finally, the zoom level of the document is used to determine the multiplier required to complete the translation to viewport pixels.
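The translation described above can be written out for a single page as follows. This is a sketch under stated assumptions. Every field of `PageLayout` (page offsets, scroll offsets, the viewer's position in the viewport, and a `scale` in viewport pixels per PDF point at the current zoom) is a hypothetical input. A real PDF.js integration would likely lean on the page viewport object's own conversion helpers for part of this.

```typescript
// Hypothetical sketch of PDF-point <-> viewport-pixel translation for one page.
// All layout inputs are assumptions about what the viewer would need to know.
interface PageLayout {
  pageHeightPt: number; // page height in PDF points (1 pt = 1/72 inch)
  scale: number;        // viewport pixels per PDF point at the current zoom level
  pageLeftPx: number;   // page's left edge within the scrollable document, in pixels
  pageTopPx: number;    // page's top edge within the scrollable document, in pixels
  scrollLeftPx: number; // current horizontal scroll of the document
  scrollTopPx: number;  // current vertical scroll of the document
  viewerLeftPx: number; // document container's offset within the viewport
  viewerTopPx: number;
}

interface Point { x: number; y: number }

function pdfToViewport(p: Point, l: PageLayout): Point {
  return {
    x: l.viewerLeftPx + l.pageLeftPx - l.scrollLeftPx + p.x * l.scale,
    // PDF y grows upward from the page bottom; viewport y grows downward from the top.
    y: l.viewerTopPx + l.pageTopPx - l.scrollTopPx + (l.pageHeightPt - p.y) * l.scale,
  };
}

// Exact inverse of pdfToViewport for the same layout.
function viewportToPdf(v: Point, l: PageLayout): Point {
  return {
    x: (v.x - l.viewerLeftPx - l.pageLeftPx + l.scrollLeftPx) / l.scale,
    y: l.pageHeightPt - (v.y - l.viewerTopPx - l.pageTopPx + l.scrollTopPx) / l.scale,
  };
}
```

As a worked example, take a 792 pt tall page at scale 1.5, whose top sits 1000 px into the document, scrolled down 900 px, offset 20 px horizontally, inside a viewer starting 50 px below the top of the viewport. The PDF point (100, 692) maps to x = 20 + 100 × 1.5 = 170 and y = 50 + 1000 − 900 + (792 − 692) × 1.5 = 300. `viewportToPdf` maps (170, 300) back to (100, 692).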
All of this translation is required every time the Annotation moves, whether the movement is caused by the user drawing the Annotation, scrolling or resizing the document, and so on. This position information is sent as a stream of messages from the iframe to the FileViewer. Information also flows in the other direction, from the FileViewer to the iframe. For example, a message is passed down when a user changes the visibility of all comments on a document, or when the user interacts with the AnnotationBubble to post or delete a specific comment. Fortunately, these messages are all small and fast to transmit, so they cause no performance issues.
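The downward messages are simple enough to model as a small union that the iframe side applies to its own state. The action names, `IframeAnnotationState`, and `applyDownwardMessage` below are hypothetical. They illustrate only the two kinds of messages mentioned above: toggling comment visibility and posting or deleting a comment.

```typescript
// Hypothetical sketch of FileViewer-to-iframe messages and how the iframe side
// could apply them. Action names and state fields are assumptions.
type DownwardMessage =
  | { action: "set-comments-visible"; visible: boolean }
  | { action: "annotation-posted"; annotationId: string }
  | { action: "annotation-deleted"; annotationId: string };

interface IframeAnnotationState {
  commentsVisible: boolean;       // whether Annotations are drawn at all
  savedAnnotationIds: Set<string>;
}

// Returns a new state object; the input state is never mutated.
function applyDownwardMessage(state: IframeAnnotationState, msg: DownwardMessage): IframeAnnotationState {
  const ids = new Set(state.savedAnnotationIds);
  switch (msg.action) {
    case "set-comments-visible":
      return { commentsVisible: msg.visible, savedAnnotationIds: ids };
    case "annotation-posted":
      ids.add(msg.annotationId);
      return { commentsVisible: state.commentsVisible, savedAnnotationIds: ids };
    case "annotation-deleted":
      ids.delete(msg.annotationId);
      return { commentsVisible: state.commentsVisible, savedAnnotationIds: ids };
  }
}
```

Each message carries only a flag or an identifier, never comment text or user details, which keeps sensitive data on the FileViewer side of the boundary.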