Hack Links

Proposal by John Walker
July 12th, 1995

Introduction

One essential part of Ted Nelson's original concept of Xanadu was that links between documents be bi-directional--when a link was made to a document, the linked-to text would become a link back to the document that referenced it. This was believed to be an essential component of an open hypertext system intended for discussion of complex issues, and a great improvement over current forms of scholarly publication.

With the advent of the World-Wide Web, links have come into the mainstream of writing and publishing, but these links are unidirectional--there's no way to know if a link has been made to your document, and no way to attach comments of your own to documents you read on the Web. In the Web, as it exists today, we have forward links but no back links.

Hack Links is a crude mechanism that provides a limited back link capability for Web documents. Despite its many shortcomings, it may prove useful to demonstrate the utility of back links and obtain practical experience in their use which can guide the evolution of a more practical and comprehensive facility for eventual inclusion in the HTML standard with Web client and server support.

Design Constraints

Many of the limitations in Hack Links arise from the need to satisfy the following design constraints. The reason I decided to impose each is given.

No browser changes
Hack Links requires no support at the browser level, and will work with even the simplest text-oriented browsers. Requiring a special browser would drastically reduce the audience who could access back links and would exclude users of machines on which the modified browser was not implemented. Browser support would help enormously, especially in making the creation of back links essentially painless; perhaps the existence of Hack Links will motivate one or more browser developers to prototype a friendly interface to it.

No server changes
Any site which wishes to experiment with back links can do so without replacing or modifying its existing HTTP dæmon. Again, few sites would take the risk or go to the bother of changing their server just to experiment with back links.

No HTML extensions
This falls out of the browser and server constraints.

Local installation not required to make back links in remote documents
Users can make back links in documents on machines with Hack Links installed without having Hack Links installed on their own local machine. There are many nice things we could do if this constraint were relaxed, but it would block access to users of commercial Internet Service Providers who might be reluctant to install Hack Links on their public servers.

Back links enabled on a document-by-document basis
Many Web site operators would be rightly concerned if not horrified at the prospect of anybody on the Web being able to place links in documents they publish. Xanadu envisioned extensive link filtering mechanisms not present in Hack Links. If my site contains, for example, stories for children, I'd be worried about some malicious soul making links in them to material to which I don't believe children should be exposed. To encourage experimentation with back links, I've required back links to be explicitly enabled by the creator of a document, based on the audience to which it is addressed and whatever disclaimers it may contain regarding the fact that any Web user can create links in the document.

No modifications to documents required
Back links can be enabled in any HTML document without requiring it to be modified. Again, I want to lower the barrier to experimenting with back links.

Assume a Unix server machine
Since the overwhelming majority of Web server machines are Unix based, we'll indulge ourselves in assuming that environment, and the ability to compile and run vanilla ANSI C programs. We'll need to do some things that are highly system-specific, such as TCP/IP communications and file locking, and no attempt will be made to implement non-Unix versions of this code. On the other hand, the code will not deliberately use Unix-specific features where portable alternatives exist.

Keep it simple
Most issues of elegance and efficiency have been ruthlessly jettisoned in favour of simplifying the implementation. The simpler the program, the easier it is to convince a site manager (or yourself, if you run your own site) that it's safe to install. Also, since it's intended only as a starting point for development of a genuine bidirectional link facility, there's no reason to build in a lot of speculative functionality until we get a feel for what's needed.

How it Works

Hack Links is implemented as a set of C programs which are installed on a server which wishes to provide back links to its users and executed via the Common Gateway Interface (CGI) mechanism. These programs maintain a database of extant back links in a file external to a document and permit retrieval of documents containing back links, addition of new back links, and following links when a given piece of text contains multiple overlapping links. The individual programs are described below.

hlxget local_file [ -t target ]
The HTML file named local_file, present in the directory from which httpd serves documents, is opened, along with the file containing its back links, local_file.bl. If no .bl file exists, back links are not available for this document, and it is returned without modifications. If a back link file is present it is read, along with the text file, and HTML anchors are inserted in the text for each back link found in the .bl file. If a segment of text is the object of only one href anchor, a direct link to the target is inserted. For text which contains links to multiple destinations, whether exclusively back links or a conventional link in the document and one or more overlapping back links, an executable link to the hlxchoose program is generated, with arguments that identify the links from the given text. The user's browser thus receives an HTML document in which all back links currently extant have been interpolated as conventional links.

If the -t option is specified, rather than returning the interpolated HTML directly, hlxget writes it into a temporary cache directory on the server, adding an anchor target for the word number range given by the entry named target in the *TARGETS section of the document's .bl file. A URL is returned which references this temporary file, using a "#" specification so the document is positioned to the given passage. Words within the range of the target are shown in bold face. A job is scheduled to delete the temporary file after a decent interval. (This is an inelegant approach, but there's no other way I know that allows us to position the user's browser to a given passage in the text while meeting all the design constraints. Keeping the targets in the .bl file allows their word ranges to be updated when the document is revised; otherwise remote documents would have local word numbers embedded in them with no way to know if they become invalid.)
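The interpolation step performed by hlxget might be sketched as follows. This is only an illustration under several assumptions of mine: the names hl_link and hl_interpolate are hypothetical, a "word" is taken to be a run of ASCII letters and digits outside <...> tags, and the links passed in are already sorted by first word and do not overlap (overlapping spans would first have been merged into a single link to hlxchoose). The sketch does not escape the URL and makes no attempt to keep the inserted anchor properly nested when a span crosses existing mark-up.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical in-memory form of one back link. */
struct hl_link {
    long first, last;   /* Word span in the document, counting from 1 */
    const char *url;    /* Destination of the back link */
};

/* Copy html to out, wrapping each link's word span in
   <A HREF="...">...</A>. The links array must be sorted by first
   word and the spans must not overlap. */
void hl_interpolate(const char *html, const struct hl_link *links,
                    int nlinks, FILE *out)
{
    long word_no = 0;   /* Number of the word most recently started */
    int cur = 0;        /* Index of the next (or currently open) link */
    int open = 0;       /* Nonzero while an anchor is open */
    int in_tag = 0;     /* Nonzero while inside <...> */
    int in_word = 0;    /* Nonzero while inside a word */
    const char *p;

    for (p = html; *p != '\0'; p++) {
        unsigned char c = (unsigned char) *p;
        int word_char = !in_tag && isalnum(c);

        if (word_char && !in_word) {
            /* A new word starts here: open an anchor if a link begins */
            word_no++;
            if (!open && cur < nlinks && word_no == links[cur].first) {
                fprintf(out, "<A HREF=\"%s\">", links[cur].url);
                open = 1;
            }
        } else if (!word_char && in_word) {
            /* A word has just ended: close the anchor if the link ends */
            if (open && word_no == links[cur].last) {
                fputs("</A>", out);
                open = 0;
                cur++;
            }
        }
        in_word = word_char;

        /* Track whether we are inside mark-up */
        if (in_tag) {
            if (c == '>') in_tag = 0;
        } else if (c == '<') {
            in_tag = 1;
        }
        fputc(c, out);
    }
    if (open) {
        fputs("</A>", out);   /* The link ran to the end of the text */
    }
}
```

Because the document is copied through character by character, everything outside the linked spans reaches the browser unchanged.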

hlxchoose local_file link_id1 link_id2 ... [ -a "url" ]
A CGI reference to this program is created by hlxget whenever a sequence of text in a document it is transmitting is found to be the object of two or more links, whether all back links or an explicit link in the document and one or more back links. The local_file.bl file is read, and an HTML document is returned to the user which explains that the link just followed goes to multiple destinations, listing them by document title. The applicable links are identified by their unique link_ids in the .bl file. Clicking on any of the titles sends the user to that document. A link allows returning to the original document, but it's generally better to use the Back button of most browsers since that preserves position in the file. The "-a" switch is used to supply the URL referenced by a forward link in the document, and will be clearly distinguished from back links made by others.
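The page returned by hlxchoose could be produced along these lines. The sketch assumes the URLs and titles have already been looked up in the .bl file by link_id; the names hl_choice, hl_put_escaped and hl_choose_page, and the wording of the page, are hypothetical. Only the "Content-type" header followed by a blank line is standard CGI practice.

```c
#include <stdio.h>

/* Hypothetical pairing of a destination URL with its stored title. */
struct hl_choice {
    const char *url;
    const char *title;
};

/* Write s to out, escaping the characters that are special in HTML. */
static void hl_put_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<': fputs("&lt;", out);   break;
        case '>': fputs("&gt;", out);   break;
        case '&': fputs("&amp;", out);  break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out);       break;
        }
    }
}

/* Emit a CGI response listing the n destinations of a multiple link,
   followed by a link back to the document the user came from. */
void hl_choose_page(FILE *out, const char *origin_url,
                    const struct hl_choice *choices, int n)
{
    int i;

    /* A CGI program precedes the document with a header and blank line */
    fputs("Content-type: text/html\n\n", out);
    fputs("<HTML><HEAD><TITLE>Multiple links</TITLE></HEAD><BODY>\n", out);
    fputs("<P>The link you followed leads to more than one document:</P>\n",
          out);
    fputs("<UL>\n", out);
    for (i = 0; i < n; i++) {
        fputs("<LI><A HREF=\"", out);
        hl_put_escaped(out, choices[i].url);
        fputs("\">", out);
        hl_put_escaped(out, choices[i].title);
        fputs("</A>\n", out);
    }
    fputs("</UL>\n<P><A HREF=\"", out);
    hl_put_escaped(out, origin_url);
    fputs("\">Return to the original document</A></P>\n", out);
    fputs("</BODY></HTML>\n", out);
}
```

Titles are escaped because they come from remote documents and may contain characters that would otherwise be read as mark-up.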

hlxmake local_file target_url args
If a local_file.bl file exists, a back link is created in it to the specified target_url. The link is created in the local document at the location given by args, which can be in any of the following formats. In all cases hlxmake will create a back link only if the given target_url is found to be accessible. hlxmake uses a lock file to prevent two concurrent back link requests from turning the .bl file into green slime. Lock files are kept in /tmp so they're automatically cleaned up if the system reboots.
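One common way to implement such a lock file on Unix is to create it with open() and the O_CREAT and O_EXCL flags, which fails if the file already exists. The sketch below assumes that approach; the function names hl_lock and hl_unlock, the retry count and the one-second wait are all my own choices and not part of the proposal.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take the lock by creating lock_path. Returns 0 on success,
   or -1 if the lock is still held after max_tries attempts or the
   file cannot be created for some other reason. */
int hl_lock(const char *lock_path, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        /* With O_CREAT | O_EXCL the call fails with EEXIST if the file
           already exists, so only one process can create it. */
        int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);

        if (fd >= 0) {
            close(fd);
            return 0;
        }
        if (errno != EEXIST) {
            return -1;          /* Unexpected error, e.g. bad directory */
        }
        if (i + 1 < max_tries) {
            sleep(1);           /* Lock is held: wait, then try again */
        }
    }
    return -1;
}

/* Release the lock by removing the lock file. Returns 0 on success. */
int hl_unlock(const char *lock_path)
{
    return unlink(lock_path);
}
```

hlxmake would call hl_lock before reading the .bl file and hl_unlock after rewriting it. A lock left behind by a crashed process stays in place until it is removed by hand or the system reboots.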

-t "text passage" [ -tb "text before" ] [ -ta "text after" ]
The local_file is searched for the given text passage, ignoring all HTML mark-up and punctuation. The link is placed at the unique occurrence of that passage. If the passage occurs more than once in the file, the user can indicate the precise location by specifying text before or after the passage to which the link is to be made.

This form permits creation of back links without any support in the browser. A back-linkable document can contain a button which pops up a form (or, alternatively, it can provide the form itself as part of the document). The server can provide a standard back link form which allows placing back links in eligible documents with no co-operation by the target document at all. The user cuts and pastes the passage to which the link is to be made, along with neighbouring text, if necessary, to specify a unique location, and the URL the back link goes to.

-w start end
The link is made to the passage of text composed of words numbered start through end in the document, ignoring all HTML mark-up and punctuation.

This request syntax is included to encourage intelligent browsers to support back links in a more convenient fashion. The user can simply highlight the passage to which the link is to be made, then when the user requests a back link there, the browser calculates the word numbers of the highlighted passage and passes them directly to hlxmake.

-a target_name
A target with the given name is added to the *TARGETS section of the .bl file, spanning the word range of the previous back link specification. The target_name must not already exist in the document's .bl file.

-b document_url args
If the -b switch is present, no target_url is specified. Instead, a second set of arguments in one of the forms described above follows, and is used to create a back link in the named document which points back to the first link. This allows creating bi-directional links between documents on remote servers, as long as both documents are enabled for back links. When this form is used, URLs are created in each document which request the other with hlxget, using the -t option to position the browser to the text at the other end of the link and display it in a highlighted form. Targets are added to the .bl files of both documents as needed. Local targets can be added directly, while targets in remote documents can be added by invoking hlxmake with the "-a" option.

This request form is intended to support intelligent browsers which permit the user to highlight passages of text in two concurrently-open documents and create a bidirectional link between them.
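Both the -t and -w forms come down to word numbers in the document with mark-up and punctuation removed. A minimal sketch of that word counting, and of resolving a -t passage to a word range, is given here. It rests on assumptions of mine: a word is a run of ASCII letters and digits outside <...> tags, comparison ignores case, character entities such as &amp; are not decoded, and the -tb and -ta refinements are left out. The names hl_next_word and hl_find_passage are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HL_MAX_WORD 64

/* Copy the next word of text, starting at *pos, into word (lower-cased
   and truncated to HL_MAX_WORD - 1 characters). Anything inside <...>
   and every non-alphanumeric character is skipped. Returns 1 if a word
   was found, 0 at the end of the text; *pos is left after the word. */
static int hl_next_word(const char *text, size_t *pos, char *word)
{
    size_t i = *pos, n = 0;
    int in_tag = 0;

    /* Skip mark-up and punctuation in front of the word */
    while (text[i] != '\0') {
        unsigned char c = (unsigned char) text[i];

        if (in_tag) {
            if (c == '>') in_tag = 0;
        } else if (c == '<') {
            in_tag = 1;
        } else if (isalnum(c)) {
            break;
        }
        i++;
    }
    /* Collect the word itself */
    while (isalnum((unsigned char) text[i])) {
        if (n < HL_MAX_WORD - 1) {
            word[n++] = (char) tolower((unsigned char) text[i]);
        }
        i++;
    }
    word[n] = '\0';
    *pos = i;
    return n > 0;
}

/* Search html for passage. Returns the number of places it occurs;
   the word numbers (from 1) of the first occurrence are stored in
   *first and *last. A result other than 1 means the passage is
   missing or ambiguous. */
int hl_find_passage(const char *html, const char *passage,
                    long *first, long *last)
{
    char dw[HL_MAX_WORD], pw[HL_MAX_WORD];
    size_t pos = 0;
    long word_no = 0;
    int matches = 0;

    for (;;) {
        size_t dpos = pos, ppos = 0;   /* Restart points for comparison */
        long len = 0;
        int ok = 1;

        if (!hl_next_word(html, &pos, dw)) break;
        word_no++;

        /* Compare the passage word by word against the document */
        while (hl_next_word(passage, &ppos, pw)) {
            if (!hl_next_word(html, &dpos, dw) || strcmp(dw, pw) != 0) {
                ok = 0;
                break;
            }
            len++;
        }
        if (ok && len > 0) {
            if (matches == 0) {
                *first = word_no;
                *last = word_no + len - 1;
            }
            matches++;
        }
    }
    return matches;
}
```

With this definition of a word, "don't" counts as two words. That is harmless as long as hlxget, hlxmake and any cooperating browser all count in the same way.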

What if the Document Changes?

One of the advantages of Xanadu's central back-end was its ability to automatically guarantee the validity of link locations when documents were revised. Web documents are created by a variety of external tools, and the externally stored back links used by Hack Links have no means of being automatically updated if a document changes. This places the burden on the creator of a document who has chosen to permit back links to deal with existing back links if the document is subsequently revised. Failure to do so will result in back links moving to locations in the document unintended by their creators, which is highly undesirable to all involved. Several different approaches to this problem are discussed below.

Do nothing: document is static.
Before investing a large amount of effort to resolve the revision vs. links problem, it's worth noting that many documents are essentially static. In paper publishing, anything published is static by definition, unless superseded by a subsequent edition with a different designation. When hypertext is used as a medium for debate and discussion, it may be the case that most contributions are as static as news postings or E-mail; they aren't revised at all, but rather re-written when appropriate and redistributed. In this case, the author would make whatever links he considered appropriate in the new edition, then add a link on the title of the old edition pointing to the revised document.

Automatic link update.
Since back links are stored as word number spans within a document, when revisions are made to a document, one could create a "word diff" program which attempts to identify the changes between the original and revised documents and then adjust the word numbers of the back links in the original document to the corresponding positions in the revised text. Like line-oriented diff, finding changes is a heuristic process which can become confused, particularly when a document is reorganised, moving large sections around. Still, despite its shortcomings, many source code control systems have been built upon diff, and a word diff may prove adequate for many cases of document revisions.
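One possible basis for such a word diff is the longest common subsequence of the two word lists, the same idea that underlies line-oriented diff. The sketch below uses it and is not a design from this proposal. The function names are hypothetical, the words are assumed to have been extracted into arrays already, and the table it builds needs memory proportional to the product of the two document lengths, so a real tool would want something more economical.

```c
#include <stdlib.h>
#include <string.h>

/* Fill map[1..n_old] with the position (from 1) in new_words of each
   old word, or 0 if the word has no counterpart in the revision.
   map must have room for n_old + 1 entries; map[0] is unused.
   Returns 0 on success, -1 if memory cannot be allocated. */
int hl_word_map(const char **old_words, long n_old,
                const char **new_words, long n_new, long *map)
{
    long i, j;
    long cols = n_new + 1;
    /* lcs[i * cols + j] is the length of the longest common
       subsequence of old_words[i..] and new_words[j..] */
    long *lcs = calloc((size_t) ((n_old + 1) * cols), sizeof *lcs);

    if (lcs == NULL) return -1;

    for (i = n_old - 1; i >= 0; i--) {
        for (j = n_new - 1; j >= 0; j--) {
            if (strcmp(old_words[i], new_words[j]) == 0) {
                lcs[i * cols + j] = lcs[(i + 1) * cols + j + 1] + 1;
            } else {
                long a = lcs[(i + 1) * cols + j];
                long b = lcs[i * cols + j + 1];
                lcs[i * cols + j] = a > b ? a : b;
            }
        }
    }

    /* Walk the table, pairing up the words that both versions share */
    i = 0;
    j = 0;
    while (i < n_old && j < n_new) {
        if (strcmp(old_words[i], new_words[j]) == 0) {
            map[i + 1] = j + 1;
            i++;
            j++;
        } else if (lcs[(i + 1) * cols + j] >= lcs[i * cols + j + 1]) {
            map[i + 1] = 0;     /* Old word was deleted */
            i++;
        } else {
            j++;                /* New word was inserted */
        }
    }
    while (i < n_old) {
        map[i + 1] = 0;
        i++;
    }
    free(lcs);
    return 0;
}

/* Move a link span to the revised document. The span shrinks to the
   words that survived; returns 1 if any did, or 0 if the whole
   passage was deleted and the link can no longer be placed. */
int hl_adjust_span(const long *map, long *first, long *last)
{
    long f = *first, l = *last;

    while (f <= l && map[f] == 0) f++;
    while (l >= f && map[l] == 0) l--;
    if (f > l) return 0;
    *first = map[f];
    *last = map[l];
    return 1;
}
```

A span whose words were all deleted is reported rather than silently moved, so the author can decide what to do with the orphaned link.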

Back link embedder/extractor.
When extensive editing of documents is contemplated, to a degree that the automatic link updating described above would get hopelessly lost (for example, if you're assembling a summary document from extracts of contributions by a variety of people), we could develop a program which merged a document and its back link file, inserting <BACKLINK HREF=...> and </BACKLINK> tags for each back link and target. These tags, not being valid HTML tags, would be ignored by browsers, allowing the author to preview the document as it was edited. When the final edition was complete, a second program would extract the BACKLINK tags into a separate .bl file and create a corresponding HTML file with the BACKLINK tags removed.

This approach has the disadvantage that new back links added to the document during the editing process will be lost. The author might, at the time of extraction, add a *BLOCK statement to the .bl file (see below), with a *MESSAGE notifying people who attempt to make back links that the document is currently being revised and back links should be made in the new document when it is published.

Back link file format

Back links are kept in an ASCII file in the following format. The file consists of various control records, each identified by an asterisk in the first column. All control records are case-insensitive.

*COMMENT any text
The line is a comment and is ignored.

*MESSAGE any text
The text is displayed in the confirmation box returned to a user who attempts to add a back link. For example, the message might notify the sender that the document has been superseded by a new edition, or invite the person who made the link to E-mail the author with a description of it.

*BLOCK
The addition of new back links is disabled. Existing back links can still be followed. This lets an author lock out back links to obsolete documents or documents currently being revised. A *MESSAGE would usually be included to explain the reason for the blockage.

*BACKLINKS count next_link_id
Following this item are count lines, each containing a back link. The first character of each back link line is a space, followed by the items listed below. Back links appear in ascending order of their first word number. next_link_id gives the next available unique link_id.

First word number
The starting word of the link, counting words after removal of all HTML mark-up tags and punctuation. The first word in the document is word 1.

Last word number
The last word of the link, counting words as above. A link to a single word will have the same first and last word numbers.

Link_id
A unique number, starting at 1, identifying the link. This number is used by hlxchoose to identify the links it wishes the user to choose among. These numbers are never reused, even if a link is deleted because it is found to be invalid.

URL
The URL of the back link destination. If the link is to another site running Hack Links, and the link is within a back link enabled document, this will be a CGI invocation of hlxget with a "-t" specification pointing to the target destination within the document. The URL is quoted, with any embedded quotes escaped.

Title
The title of the target document, obtained by accessing it via the URL at the time the link is created. This is used by hlxchoose to identify target documents when a link has multiple destinations. We store the title at link creation time rather than obtaining it when the link is followed to reduce network traffic and the attendant delay in the appearance of the hlxchoose results. If changes in document titles prove to be a problem, a utility which re-verifies all the URLs in a .bl file and refreshes their titles could be created and run periodically.

*TARGETS count
Following this item are count lines containing the word number range of link targets within this document which can be accessed with the "-t" option of hlxget. The first character of each target line is a space, followed by the items listed below. Targets are added as needed to local documents when hlxmake is invoked with the "-a" option.

Target name
The unique name by which the target is referred to in remote documents. This is generated by an algorithm similar to that used to generate unique message identifiers for news postings. Target names are quoted, with embedded quotes escaped.

First word number
The starting word of the target, counting words after removal of all HTML mark-up tags and punctuation. The first word in the document is word 1. If the first word number is -1, the target has been removed (usually because it pointed into a section of the document which has been deleted in a subsequent revision).

Last word number
The last word of the target, counting words as above. A target consisting of a single word will have the same first and last word numbers.
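To make the record layout concrete, here is a sketch of a parser for a single back link line. The proposal does not say how the items are separated or how embedded quotes are escaped, so the sketch assumes items separated by spaces and a backslash as the escape character. The sample line in the comment, the structure hl_backlink and the function names are invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one line in a *BACKLINKS section. */
struct hl_backlink {
    long first, last;   /* Word span, counting from 1 */
    long link_id;       /* Unique link number */
    char url[256];
    char title[256];
};

/* Read a double-quoted string starting at *pp into out, treating a
   backslash as an escape for the next character (an assumption).
   Advances *pp past the closing quote. Returns 0 on success, -1 if
   the string is malformed or does not fit in outsz bytes. */
static int hl_read_quoted(const char **pp, char *out, size_t outsz)
{
    const char *p = *pp;
    size_t n = 0;

    while (*p == ' ') p++;
    if (*p != '"') return -1;
    p++;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0') p++;   /* Keep escaped char */
        if (n + 1 >= outsz) return -1;
        out[n++] = *p++;
    }
    if (*p != '"') return -1;                   /* Missing close quote */
    out[n] = '\0';
    *pp = p + 1;
    return 0;
}

/* Parse a line such as
      12 15 3 "http://www.example.org/reply.html" "A Reply"
   (with its leading space) into bl. Returns 0 on success, -1 if the
   line is not a well-formed back link. */
int hl_parse_backlink(const char *line, struct hl_backlink *bl)
{
    const char *p = line;
    char *end;
    long *fields[3];
    int i;

    if (*p != ' ') return -1;   /* Back link lines begin with a space */

    fields[0] = &bl->first;
    fields[1] = &bl->last;
    fields[2] = &bl->link_id;
    for (i = 0; i < 3; i++) {
        *fields[i] = strtol(p, &end, 10);
        if (end == p) return -1;    /* No number where one was expected */
        p = end;
    }
    if (hl_read_quoted(&p, bl->url, sizeof bl->url) != 0) return -1;
    if (hl_read_quoted(&p, bl->title, sizeof bl->title) != 0) return -1;

    /* Words and link_ids are numbered from 1 and spans run forward */
    if (bl->first < 1 || bl->last < bl->first || bl->link_id < 1) {
        return -1;
    }
    return 0;
}
```

Control records such as *COMMENT begin with an asterisk rather than a space, so the same function rejects them and a caller can dispatch on the first character of each line.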

Conclusion

Hack Links allows any Unix-based Web site that's willing to install three public domain portable C programs in its CGI directory to provide a back link facility that enables authors of documents to selectively permit readers to make links in their documents to other documents on the Web. No modifications to the Web server are required, and back links are accessible to all existing Web browsers. Facilities are included which allow future back-link-aware browsers to make the process of back link creation much easier.

While the constraints imposed with the intent of making Hack Links completely compatible with existing Web documents and software require sacrificing the automatic revision updating contemplated in Xanadu, implementable solutions are proposed which allow back links to be maintained across most document revisions.

It is estimated that the software envisioned in this proposal (excluding the automatic revision tools) could be implemented by one person in one week.

