By Julienne Walker
License: Public Domain

I'm constantly dismayed at the tutorials and descriptions of linked lists that I find online, but I can let that slide because there's no real restriction on what can be put on websites in terms of quality. The real disappointment is that books written on data structures usually gloss over linked lists on the way to more interesting data structures. It's a disappointment because not only are linked lists a foundation concept, they're also quite interesting by themselves.

As a foundation concept, linked lists provide the basis for further linked data structure work. If you don't understand linked lists, you'll be lost when working with binary search trees, for example. A firm understanding of linked lists will make further learning much easier, so the rush to get past them that most resources do is harmful. Linked lists aren't a necessary evil.

Too often people are fooled by the apparent simplicity of linked lists. The core concept really is that simple, but when it comes to creating and using linked lists, the options are virtually limitless. Therefore, this tutorial will aim not only to provide the strongest foundation in the data structure possible, but also to tickle the creative nerve so that you can see more of the options available to you when it comes to this fascinating data structure.

 

Concept

From a conceptual standpoint, there's really nothing to a linked list. Imagine a train. Each railroad car on a train would be a single node in a linked list. A node contains the contents of the node, such as gravel or passengers, as well as a link to the next node. This gives us a sequential list of nodes where each node is completely independent. Textually, we can represent a list containing the set of items {3,1,4,2,5} like so:

3 -> 1 -> 4 -> 2 -> 5 -> ~

Each item is represented by a number, the links are represented by arrows, and the end of the list is represented by my preferred null designation character, the tilde. I use the tilde to signal something like "no node here". In reality it turns out to be a null pointer or a sentinel, but we'll get to that later. The order of the numbers depends on how the nodes are linked together. If 4 linked to 3 and 1 linked to 2, it might look something like this without changing the order of display:

         +-------+
         |       |
+-> 3 -> 1   4   +-> 2 -> 5 -> ~
|            |
+------------+

That's overly confusing, but if you display the items in the order that they're linked, you get this:

4 -> 3 -> 1 -> 2 -> 5 -> ~

This is an important concept when it comes to linked lists. Simply by changing links, you can re-order a linked list without moving around any data. This flexibility comes from the fact that, as said earlier, nodes are completely independent of each other. A node doesn't know or care where its links point to. But be careful, this can be a source of frustrating bugs if your links don't point where you think they do. ;-)

Because nodes are independent, it's very simple to add nodes to a list, remove them, insert them in between other nodes, splice lists into or out of another list, or any number of structural changes with relative ease. This is all compared to the same operations on an array, where all but the simplest of operations from the end of the array result in a lot of shifting and copying of data.

One drawback of a linked list is that of access. While working with a node is simple and efficient if you have immediate access to that node, getting to the node can be an issue. For example, in an array, you can use an index and get right to the item with a simple jump:

1 #include <stdio.h> 2 3 int main() 4 { 5 int a[] = {1,2,3,4,5}; 6 7 printf ( "%d\n", a[3] ); 8 9 return 0; 10 }

This code will print 4, but if the same numbers were stored in a linked list, you would have to follow the links from 1 to get to 4 before being able to print. This is because linked lists do not support random access, only sequential access from whichever node you happen to have a reference to. Usually the only guaranteed reference is the first node in the list, so printing the 3rd index of a linked list would look more like this:

1 node it = first_node; 2 size_t i = 0; 3 4 while ( i != 3 ) { 5 it = next_node ( it ); 6 ++i; 7 } 8 9 printf ( "%d\n", node_data ( it );

The exact details of the linked list code will be covered shortly, so I've kept the snippet to a minimum and used as little specifics as possible. Please accept my apologies and rest assured that you'll be writing linked lists in C, C++, C#, Java, or whatever your favorite language is like a pro by the end of the tutorial. :-) To be specific about the code, assuming we have a reference to the first node in the list, the only way to get to the 3rd index is to follow the link from the first node, then the second node, then the third, and finally the fourth (where the zero-based index is finally incremented to 3). For longer lists this can become a serious bottleneck.

In summary, a linked list is a sequential collection of nodes where each node is independent and contains a link to the next node in the list. Linked lists shine as a replacement for arrays where insertions and deletions all over the list are common and random access is uncommon.

Okay, enough concept. Let's get on to the good stuff.

 

Implementation Details

There are tons of ways to implement a linked list. By far the most common is using a self-referential structure as the node, with a data member and a next member marking the link. The data member in a practical implementation would be a pointer to void such that the list could be used with any number of types. A second structure would contain the first node in the list and a counter for the number of nodes in the list. Other members could also be added for more list-specific information, such as a flag specifying whether the list has a dummy head or not (see Insertion and Deletion).

1 struct jsw_node { 2 void *data; 3 struct jsw_node *next; 4 }; 5 6 struct jsw_list { 7 struct jsw_node *head; 8 int has_dummy_head; 9 size_t size; 10 };

However, this isn't the only way to create a linked list, and this tutorial will cover some other methods. A few helper functions are in order as well, because common operations like creating and destroying nodes and lists would bulk up the list algorithms. So we'll just toss them into functions to reduce redundancy.

1 /* 2 Create a new node with the given data and links 3 Returns a pointer to the new node, or NULL on error 4 */ 5 struct jsw_node *new_node ( void *data, struct jsw_node *next ) 6 { 7 struct jsw_node *rv = malloc ( sizeof *rv ); 8 9 if ( rv != NULL ) { 10 rv->data = data; 11 rv->next = next; 12 } 13 14 return rv; 15 } 16 17 /* 18 Create a new list with an optional dummy head 19 Returns a pointer to the new list, or NULL on error 20 */ 21 struct jsw_list *new_list ( int has_dummy_head ) 22 { 23 struct jsw_list *rv = malloc ( sizeof *rv ); 24 25 if ( rv != NULL ) { 26 rv->head = has_dummy_head ? new_node ( NULL, NULL ) : NULL; 27 rv->has_dummy_head = has_dummy_head; 28 rv->size = 0; 29 30 if ( has_dummy_head && rv->head == NULL ) { 31 /* Release the list if a dummy couldn't be allocated */ 32 free ( rv ); 33 rv = NULL; 34 } 35 } 36 37 return rv; 38 } 39 40 /* 41 Destroy a single given node, assuming it has been unlinked 42 Optionally destroy the data contained in the node 43 Returns the next node specified by the link 44 */ 45 struct jsw_node *destroy_node ( 46 struct jsw_node *node, 47 void (destroy_data) ( void * ) ) 48 { 49 struct jsw_node *rv = NULL; 50 51 if ( node != NULL ) { 52 /* 53 Save a reference to the next node 54 because we're about to destroy this one 55 */ 56 rv = node->next; 57 58 if ( destroy_data != NULL ) 59 destroy_data ( node->data ); 60 61 free ( node ); 62 } 63 64 return rv; 65 } 66 67 /* 68 Destroy all nodes in a given list 69 Optionally destroy all data in each node 70 */ 71 void destroy_list ( struct jsw_list *list, void (destroy_data) ( void * ) ) 72 { 73 while ( list->head != NULL ) 74 list->head = destroy_node ( list->head, destroy_data ); 75 }

We'll be assuming changes to these helpers later when we start to look at different variations of the linked list, but the changes are trivial and I'll describe what needs to be done.

Now that the basics are out of the way, let's look at inserting and deleting in a linked list!

 

Insertion and Deletion

To insert a node into a linked list, you need a reference to the node before where the new node is going to be inserted. Then it's a simple matter of re-linking the links in those two nodes. The new node's link should be set to the previous node's link, and the previous node's link should be set to the new node. Graphically the entire operation looks like this:

          3 -> ~

1 -> 2 ------> 4 -> 5 -> ~


          3
          |
1 -> 2 ---+--> 4 -> 5 -> ~


     +--> 3
     |    |
1 -> 2    +--> 4 -> 5 -> ~


1 -> 2 -> 3 -> 4 -> 5 -> ~

The fourth step is merely a cleanup of the representation so that you can see how the new node has been completely inserted into the list between 2 and 4. Keep in mind that 2's link cannot be reset to point to 3 until after 3's link has been set. Otherwise 2's link would be lost and you wouldn't know what to set 3's link to. Order of operations is very important here. The code to perform this magic is extremely simple:

1 /* 2 Insert a new node after the given node 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_after ( 6 struct jsw_list *list, 7 struct jsw_node *pos, 8 void *data ) 9 { 10 struct jsw_node *rv = NULL; 11 12 if ( list != NULL && pos != NULL ) { 13 /* Create a new node and set the next link */ 14 rv = new_node ( data, pos->next ); 15 16 if ( rv != NULL ) { 17 pos->next = rv; 18 ++list->size; 19 } 20 } 21 22 return rv; 23 }

There are a few problems by design in this function. The first problem is that if the list is empty, there's nothing to insert after. In such a case pos would be a null pointer and the insertion would fail. However, taking a null pointer and assuming that it refers to an empty list is potentially dangerous as well. If the list isn't actually empty, but the function assumes it is and terminates successfully, you have a difficult to trace bug.

The second problem is that it's impossible with to add a node to the head of the list with this function. Even if pos refers to the head of the list and the head of the list is not a null pointer, the new node will always be inserted after the given node. That is, the new node can only be the second in the list if added to the front. So the only way to use insert_after to add a new node to the head of the list is to make sure that the list always has a dummy node for the head (notice the parameter to new_list).

A dummy node is a node that has no value other than to simplify structural changes to the list. Dummy nodes are used in linked data structures to avoid special cases with the start or end of the data structure. The dummy head has no value, but it does have a link that can be followed to get to the "real" head of the list. The only problem now is that you have two heads to deal with: one for structural changes at the head of the list, and one for marking the first legitimate value in the list.

Now, you might be thinking "Well why not just write an insert_before function?", and that's an excellent question. In fact, we'll be doing that right now. However, for solving the problem of inserting at the head of the list without a dummy, how might you do it while avoiding the first problem?

Keep in mind that a null pointer representing an empty list is indistinguishable from a null pointer representing a programming oopsie. All null pointers are equal in the world of C (or C++, or Java, or C#, etc...), so the only way to guarantee that you're looking at a truly empty list is to use a sentinel node instead of a null pointer to mark an empty list. That adds unnecessary complexity because now you have two markers: one for an empty list and one for the end of the list. Using the sentinel as both does nothing to alleviate the problem, and you end up adding extra special cases.

The insert_before function introduces a whole new bundle of problems. Now, without the dummy node, we're stuck with the problem of trying to insert before a null pointer, with the option of assuming that a null pointer means to insert onto the end of the list. This time the operation won't simply succeed with unexpected results. We would have to specifically add special case code to handle this situation, and it introduces the problem of assuming a null pointer means the calling code wants to insert onto the end of the list.

Option 1 is to disallow insert_before to the end of the list. That's the safest solution, but it does make the users of the list think harder about which functions to use. Now, that's not necessarily a bad thing, but good library design says that things should work as expected. I'm not inclined to think that insert_before on the end of the list failing is expected, so let's call this option a last resort.

Option 2 is to fix the problem the same way as with insert_after, with a dummy node. This time the dummy node becomes a dummy tail. Add a new jsw_node pointer called tail to the jsw_list structure as well as a flag, has_dummy_tail. This requires changes to new_list so that the tail can be created properly:

1 /* 2 Create a new list with an optional dummy head and tail 3 Returns a pointer to the new list, or NULL on error 4 */ 5 struct jsw_list *new_list ( 6 int has_dummy_head, 7 int has_dummy_tail ) 8 { 9 struct jsw_list *rv = malloc ( sizeof *rv ); 10 11 if ( rv != NULL ) { 12 struct jsw_node *tail = 13 has_dummy_tail ? new_node ( NULL, NULL ) : NULL; 14 15 if ( has_dummy_tail && tail == NULL ) { 16 /* Release the list if a dummy couldn't be allocated */ 17 free ( rv ); 18 rv = NULL; 19 } 20 else { 21 rv->head = has_dummy_head ? new_node ( NULL, tail ) : NULL; 22 rv->tail = tail; 23 24 rv->has_dummy_head = has_dummy_head; 25 rv->has_dummy_tail = has_dummy_tail; 26 27 rv->size = 0; 28 29 if ( has_dummy_head && rv->head == NULL ) { 30 /* Release the list if a dummy couldn't be allocated */ 31 free ( rv ); 32 rv = NULL; 33 } 34 } 35 } 36 37 return rv; 38 }

The changes are straightforward, but take note that other insertion and deletion algorithms need changes as well. Another problem with insert_before is what happens when insert_before is called with the dummy head as pos? This is a legitimate special case because we clearly don't want a node that links to the dummy but the dummy doesn't link to. This is effectively a resource leak for the list and the new node will appear to vanish.

With the introduction of a dummy tail, insert_after will have the same problem, so both functions will need to be modified with a special case that changes the direction of the insert when on the dummies. The simplest solution is for insert_after to call insert_before when pos is the dummy tail, and insert before to call insert_after when pos is the dummy head:

1 /* 2 Insert a new node after the given node 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_after ( 6 struct jsw_list *list, 7 struct jsw_node *pos, 8 void *data ) 9 { 10 struct jsw_node *rv = NULL; 11 12 if ( list != NULL && pos != NULL ) { 13 if ( pos != list->tail ) { 14 /* Create a new node and set the next link */ 15 rv = new_node ( data, pos->next ); 16 17 if ( rv != NULL ) { 18 pos->next = rv; 19 ++list->size; 20 } 21 } 22 else 23 rv = insert_before ( list, pos, data ); 24 } 25 26 return rv; 27 } 28 29 /* 30 Insert a new node before the given node 31 Returns a pointer to the new node or NULL on failure 32 */ 33 struct jsw_node *insert_before ( 34 struct jsw_list *list, 35 struct jsw_node *pos, 36 void *data ) 37 { 38 struct jsw_node *rv = NULL; 39 40 if ( list != NULL && pos != NULL ) { 41 if ( pos != list->head ) { 42 /* Find the previous node */ 43 struct jsw_node *it = list->head; 44 45 while ( it->next != pos ) 46 it = it->next; 47 48 /* Create a new node and set the next link */ 49 rv = new_node ( data, it->next ); 50 51 if ( rv != NULL ) { 52 it->next = rv; 53 ++list->size; 54 } 55 } 56 else 57 rv = insert_after ( list, pos, data ); 58 } 59 60 return rv; 61 }

The insert_before function is largely the inverse of insert after, but I'd like to draw your attention to the while loop in insert_before. This represents a huge failing in this variation of the linked list, because the only way to insert before a node is to find the node that links to it. Because nodes only link to the next node, and not the previous node, this means that a complete search of the list from the very beginning is required.

Deletion from a linked list is extremely simple, provided you have a reference to the previous node. Deletion involves changing the links around a node so that they no longer refer to that node. Then the node itself can be freed. No link changes need to be made to the node being deleted unless it's being added to another list. Here's the graphical representation of deleting a node in a linked list:

1 -> 2 -> 3 -> 4 -> 5 -> ~


          3
          |
1 -> 2 ---+--> 4 -> 5 -> ~


          3 -> ~

1 -> 2 ------> 4 -> 5 -> ~

I've added the update of 3's link to null so that the removal is more obvious. The operation really is the inverse of insertion in that 2's next link is set to 3's next link, then 3's next link is set to null. In code it's equally simple, but we need to take several things into account.

First, deletion of the tail should probably remove the last node in the list since that's what a user might expect. Second, deletion of the head should probably remove the first node in the list for the same reason. Finally, we need to account for an empty list. Have a look at the code, and I'll explain the decisions behind it afterward.

1 /* 2 Remove the selected node 3 Returns the removed node or NULL on failure 4 */ 5 struct jsw_node *remove_node ( 6 struct jsw_list *list, 7 struct jsw_node *pos ) 8 { 9 struct jsw_node *rv = NULL; 10 11 if ( list != NULL && pos != NULL ) { 12 struct jsw_node *it = list->head; 13 struct jsw_node *prev = NULL; 14 15 /* Shift the head by one if removing the dummy */ 16 if ( pos == list->head ) 17 pos = pos->next; 18 19 /* Find the previous node and its previous node */ 20 while ( it->next != pos ) { 21 prev = it; 22 it = it->next; 23 } 24 25 if ( pos != list->tail ) { 26 /* Remove the selected node */ 27 rv = pos; 28 it->next = rv->next; 29 rv->next = NULL; 30 --list->size; 31 } 32 else if ( it != list->head ) { 33 /* Remove the node before the tail */ 34 rv = it; 35 prev->next = rv->next; 36 rv->next = NULL; 37 --list->size; 38 } 39 else { 40 /* The list was empty */ 41 } 42 } 43 44 return rv; 45 }

The first decision is how to handle the case where pos is the dummy head of the list. This turns out to be a relatively simple problem because we can simply shift the node down by one and be at the first value node in the list. If the list is empty, pos would become the dummy tail, and that falls into the next special case.

For the case of when pos is the dummy tail, the obvious solution is also the one to be avoided. Walk down the list once to find the last node, the walk down the list again to remove that node. This is an extremely inefficient solution for finding the previous node of the previous node of pos. Instead, I selected a nice little trick for saving back links. The iterator variable is used to find the previous node of pos starting at the dummy head. Now, if you have another one of those that gets set to the iterator just before it moves to the next link, you have the iterator's previous node. If that previous node is NULL, then the iterator didn't have a previous node. With this technique, we can find the dummy tail's previous node's previous node and save ourselves a second trip down the list.

If you're thinking that this is a complete pain in the butt and totally inefficient when you already have a reference to the node you want to delete, I agree thoroughly. In fact, I agree so much that I'll immediately start writing about how to fix that problem. :-)

 

Double Linked Lists

So far, we've been looking at a linked list with one link to the next node. This is known technically as a single linked list, and it's certainly not the only option. In fact, another named variation uses two links: one to the next node and one to the previous node. This variation is known as the double linked list, and it's only slightly more difficult than the single linked list. Graphically, I would represent a double linked list like so:

~ <- 1 <-> 2 <-> 3 <-> 4 <-> 5 -> ~

Notice how the links now go both ways, so any node can link to the previous and next nodes. This means quite a few things to us. It means that we can traverse a list in either direction or even change directions and go back if we missed something. It means that we don't need to traverse the list to find the previous node for insertion or deletion because the pos node can now give us that information directly. And of course, it means that we have to work harder to maintain links. ;-)

Let's begin with the changes needed to our helpers. First, add a new link to the jsw_node structure and call it prev. We'll also add another argument to new_node before next (of the same type) and call it prev. This way new nodes can be created with data, a previous link, and a next link. Finally, we'll modify new_list so that it properly links together the head and tail. This involves making sure that the next link of the head points to the tail and the prev link of the tail points to the head.

1 /* 2 Create a new list with an optional dummy head and tail 3 Returns a pointer to the new list, or NULL on error 4 */ 5 struct jsw_list *new_list ( 6 int has_dummy_head, 7 int has_dummy_tail ) 8 { 9 struct jsw_list *rv = malloc ( sizeof *rv ); 10 11 if ( rv != NULL ) { 12 struct jsw_node *tail = 13 has_dummy_tail ? new_node ( NULL, NULL, NULL ) : NULL; 14 15 if ( has_dummy_tail && tail == NULL ) { 16 /* Release the list if a dummy couldn't be allocated */ 17 free ( rv ); 18 rv = NULL; 19 } 20 else { 21 rv->head = has_dummy_head ? new_node ( NULL, NULL, tail ) : NULL; 22 23 /* Finish linking the tail to the head */ 24 rv->tail = tail; 25 rv->tail->prev = rv->head; 26 27 rv->has_dummy_head = has_dummy_head; 28 rv->has_dummy_tail = has_dummy_tail; 29 30 rv->size = 0; 31 32 if ( has_dummy_head && rv->head == NULL ) { 33 /* Release the list if a dummy couldn't be allocated */ 34 free ( rv ); 35 rv = NULL; 36 } 37 } 38 } 39 40 return rv; 41 }

The prev link of the tail relies on the head already existing in a non-null state, which isn't the case. So the code is a little awkward in that new_node isn't being used to it's fullest when creating the tail, and the prev link for the tail needs to be updated after the head has been created. This problem doesn't exist with the head because the tail has already been created, but for symmetry, one might choose to ignore new_node's capabilities and manually set all of the links in this case.

The insert_before and insert_after functions suddenly become exact inverses of each other because insert_before no longer needs to search the list. However, both need to take care to set all of the links (there are up to four that need to be set) without accidentally dereferencing a null pointer. The changes are shockingly simple:

1 /* 2 Insert a new node after the given node 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_after ( 6 struct jsw_list *list, 7 struct jsw_node *pos, 8 void *data ) 9 { 10 struct jsw_node *rv = NULL; 11 12 if ( list != NULL && pos != NULL ) { 13 if ( pos != list->tail ) { 14 /* Create a new node and set the next link */ 15 rv = new_node ( data, pos, pos->next ); 16 17 if ( rv != NULL ) { 18 if ( pos->next != NULL ) 19 pos->next->prev = rv; 20 21 pos->next = rv; 22 ++list->size; 23 } 24 } 25 else 26 rv = insert_before ( list, pos, data ); 27 } 28 29 return rv; 30 } 31 32 /* 33 Insert a new node before the given node 34 Returns a pointer to the new node or NULL on failure 35 */ 36 struct jsw_node *insert_before ( 37 struct jsw_list *list, 38 struct jsw_node *pos, 39 void *data ) 40 { 41 struct jsw_node *rv = NULL; 42 43 if ( list != NULL && pos != NULL ) { 44 if ( pos != list->head ) { 45 /* Create a new node and set the next link */ 46 rv = new_node ( data, pos->prev, pos ); 47 48 if ( rv != NULL ) { 49 if ( pos->prev != NULL ) 50 pos->prev->next = rv; 51 52 pos->prev = rv; 53 ++list->size; 54 } 55 } 56 else 57 rv = insert_after ( list, pos, data ); 58 } 59 60 return rv; 61 }

Finally, deletion becomes an exercise in the trivial. Now that we can access both sides from any node, we no longer need to do the awkward search that was necessary before. The majority of the code consists of dealing with pos being either the dummy head or tail, and making sure that the two remaining links in the list are properly set:

1 /* 2 Remove the selected node 3 Returns the removed node or NULL on failure 4 */ 5 struct jsw_node *remove_node ( 6 struct jsw_list *list, 7 struct jsw_node *pos ) 8 { 9 struct jsw_node *rv = NULL; 10 11 if ( list != NULL && pos != NULL ) { 12 /* Shift off of the dummies */ 13 if ( pos == list->head ) 14 pos = pos->next; 15 16 if ( pos == list->tail ) 17 pos = pos->prev; 18 19 if ( pos != list->head ) { 20 /* Remove a non-dummy node */ 21 rv = pos; 22 23 /* Reset the list links if necessary */ 24 if ( rv->prev != NULL ) 25 rv->prev->next = rv->next; 26 27 if ( rv->next != NULL ) 28 rv->next->prev = rv->prev; 29 30 /* Clean up the old node */ 31 rv->prev = rv->next = NULL; 32 --list->size; 33 } 34 } 35 36 return rv; 37 }

Notice that removing a non-dummy node only tests for the dummy head. While you could add a test for the dummy tail too, it would be unnecessary. There's no case where pos would be the dummy tail at that point because of the conditional immediately prior. It sets pos to pos->prev if it's the dummy tail.

Double linking solves so many problems with single linking, but maintaining the extra links can be a chore. Also note that with each new link, you're introducing more storage into the linked list. Sometimes it's better to restrict your operations in favor of less memory usage.

 

Circular References

Now that we've looked at single and double linked lists, null sentinels and dummy nodes, I'm sure you're convinced that linked lists are an exercise in annoying special cases. Now that you're disillusioned, let's simplify the world by changing the rules slightly. One simple modification to how a list ends can remove all of those annoying edge cases. Consider a linked list where the last node in the list wraps around to the first node:

+-> 1 <-> 2 <-> 3 <-> 4 <-> 5 <-+
|                               |
+-------------------------------+

This is called an indirect circular reference. It's kind of like recursion in that 5 refers to itself by way of referring to 1, which refers to 2, which refers to 3, which refers to 4, which then refers to 5. What we have here is an infinite loop by design called a circular linked list. :-)

Now, I imagine you're starting to wonder what possible use could something like this have, so I'll let the code do the talking. Here are the updated helper functions to take advantage of circular references (the structure is the simple one from the beginning of the tutorial without the has_dummy_head flag):

1 /* 2 Create a new list 3 Returns a pointer to the new list, or NULL on error 4 */ 5 struct jsw_list *new_list ( void ) 6 { 7 struct jsw_list *rv = malloc ( sizeof *rv ); 8 9 if ( rv != NULL ) { 10 rv->head = new_node ( NULL, NULL, NULL ); 11 rv->size = 0; 12 13 if ( rv->head != NULL ) { 14 /* Set up a circular reference situation */ 15 rv->head->next = rv->head; 16 rv->head->prev = rv->head; 17 } 18 else { 19 /* Release the list if a dummy couldn't be allocated */ 20 free ( rv ); 21 rv = NULL; 22 } 23 } 24 25 return rv; 26 } 27 28 /* 29 Destroy all nodes in a given list 30 Optionally destroy all data in each node 31 */ 32 void destroy_list ( struct jsw_list *list, void (destroy_data) ( void * ) ) 33 { 34 while ( list->size > 0 ) { 35 list->head = destroy_node ( list->head, destroy_data ); 36 --list->size; 37 } 38 }

The new_list function is greatly simplified because there's less work to do. As with the tail issue before, we can't set the prev or next links for head because they rely on head itself. The object has to be completely defined before the circular references can be set. The destroy_list function needed to be changed because it relied on a null pointer terminating the list. Since that's not the case anymore, I modified it to use the size of the list instead. This isn't an ideal solution, but it should suffice for this tutorial.

The benefit of a circular linked list is that edge cases no longer exist. None of the algorithms have to check for a null pointer terminating the list, and all (well, most) operations are guaranteed to succeed barring memory allocation errors and other catastrophic meltdowns that break just about any program. Let's look at how insert_before, insert_after, and remove_node were simplified through circular referencing:

1 /* 2 Insert a new node after the given node 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_after ( 6 struct jsw_list *list, 7 struct jsw_node *pos, 8 void *data ) 9 { 10 struct jsw_node *rv = NULL; 11 12 if ( list != NULL && pos != NULL ) { 13 /* Create a new node and set the next link */ 14 rv = new_node ( data, pos, pos->next ); 15 16 if ( rv != NULL ) { 17 pos->next->prev = rv; 18 pos->next = rv; 19 ++list->size; 20 } 21 } 22 23 return rv; 24 } 25 26 /* 27 Insert a new node before the given node 28 Returns a pointer to the new node or NULL on failure 29 */ 30 struct jsw_node *insert_before ( 31 struct jsw_list *list, 32 struct jsw_node *pos, 33 void *data ) 34 { 35 struct jsw_node *rv = NULL; 36 37 if ( list != NULL && pos != NULL ) { 38 /* Create a new node and set the next link */ 39 rv = new_node ( data, pos->prev, pos ); 40 41 if ( rv != NULL ) { 42 pos->prev->next = rv; 43 pos->prev = rv; 44 ++list->size; 45 } 46 } 47 48 return rv; 49 } 50 51 /* 52 Remove the selected node 53 Returns the removed node or NULL on failure 54 */ 55 struct jsw_node *remove_node ( 56 struct jsw_list *list, 57 struct jsw_node *pos ) 58 { 59 struct jsw_node *rv = NULL; 60 61 if ( list != NULL && pos != NULL ) { 62 if ( pos != list->head ) { 63 /* Remove a non-header node */ 64 rv = pos; 65 66 /* Reset the list links if necessary */ 67 rv->prev->next = rv->next; 68 rv->next->prev = rv->prev; 69 70 /* Clean up the old node */ 71 rv->prev = rv->next = NULL; 72 --list->size; 73 } 74 } 75 76 return rv; 77 }

All of those pesky special cases for the end of the list are gone with the exception of removing the header node. Now, it's possible to do without the header node entirely, but then you need to account somehow for a truly empty list, or guarantee that a node is present before doing any work. The header acts as a guaranteed node and makes life easier. I encourage you to try different variations that you can think of and see what happens.

One thing to note about circular references is that in some languages with garbage collection, special care must be taken to clean up the links if a node points to itself. Sometimes this can trip up garbage collectors and the node won't ever be released. This is a subtle memory leak.

 

Maintaining Order

Now for a little fun. Often it's useful to keep a linked list sorted, and sorting a linked list can get a little tricky, as EC's Art of Sorting tutorial shows. However, because the nodes are independent and insertion anywhere in the list is an efficient operation, we can write an insert_sorted function that maintains order in the list even when new items are being added. This is as simple as finding the sorted location of the new item and calling insert_before to place it:

1 /* 2 Insert a new node in sorted order 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_sorted ( struct jsw_list *list, void *data ) 6 { 7 struct jsw_node *rv = NULL; 8 9 if ( list != NULL ) { 10 /* Find the sorted position of the new node */ 11 struct jsw_node *it = list->head->next; 12 13 while ( it != list->head 14 && compare ( it->data, data ) < 0 ) 15 { 16 it = it->next; 17 } 18 19 /* Delegate the insertion to an existing function */ 20 rv = insert_before ( list, it, data ); 21 } 22 23 return rv; 24 }

The insert_sorted function is little more than a wrapper around insert_before that searches for the first node that has a greater or equal value to the new node's data. Once we have that node, we know that the new node's sorted position is directly in front of it. I'm assuming a compare function that knows how to take two pointers to void and compare the pointed-to values.

This technique is relevant to all of the linked list variations we've looked at so far. The number of links is irrelevant unless you want to be able to insert sorted nodes from back to front rather than front to back, and care must be taken to use the correct sentinel value for the end of the list. In the case of a circular list, we've started the search at the header's link and stop the search at the header itself. This produces the correct semantics.

With a dummy tail, the algorithm works properly, however, if the tail sentinel is a null pointer you need to change the algorithm slightly so that it uses insert_after and the search stops at the node before the sorted location. This is because insert_before would fail miserably with a null pointer. :-) With a dummy head, it would look like this:

1 /* 2 Insert a new node in sorted order 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_sorted ( struct jsw_list *list, void *data ) 6 { 7 struct jsw_node *rv = NULL; 8 9 if ( list != NULL ) { 10 /* Find the sorted position of the new node */ 11 struct jsw_node *it = list->head; 12 13 while ( it->next != NULL 14 && compare ( it->next->data, data ) < 0 ) 15 { 16 it = it->next; 17 } 18 19 /* Delegate the insertion to an existing function */ 20 rv = insert_after ( list, it, data ); 21 } 22 23 return rv; 24 }

Without a dummy head, the function would have no choice but to use a special case when the new node is to be inserted onto the front of the list. While the following function assumes a single linked list, the only difference between it and the algorithm for a double linked list is the number of parameters to the new_node function.

1 /* 2 Insert a new node in sorted order 3 Returns a pointer to the new node or NULL on failure 4 */ 5 struct jsw_node *insert_sorted ( struct jsw_list *list, void *data ) 6 { 7 struct jsw_node *rv = NULL; 8 9 if ( list != NULL ) { 10 if ( compare ( data, list->head->data ) < 0 ) { 11 rv = new_node ( data, list->head ); 12 13 if ( rv != NULL ) { 14 /* Only reset the head on success */ 15 list->head = rv; 16 } 17 } 18 else { 19 /* Find the sorted position of the new node */ 20 struct jsw_node *it = list->head; 21 22 while ( it->next != NULL 23 && compare ( it->next->data, data ) < 0 ) 24 { 25 it = it->next; 26 } 27 28 /* Delegate the insertion to an existing function */ 29 rv = insert_after ( list, it, data ); 30 } 31 } 32 33 return rv; 34 }

In the majority of cases, it's faster and easier to maintain order in a linked list than to explicitly sort an unsorted list, but what if you want to sort on multiple keys in the data? What if you want both the initial order as well as sorted order at the same time? This is where linked lists can get very complicated, because the sky is the limit when it comes to the number of links in a node and how you use them.

 

Threads

A linked list isn't restricted to one or two links, those are simply the most common variations. Further links that provide a different ordering of the nodes, or that offer a subset of nodes can be called threads because they're link chains that work their way through the nodes differently from the "main thread", which would be the prev and next links. For example, let's say you wanted a linked list that threaded the even and odd numbered nodes.

To do something like this we would need extra links for each node that specifies even or odd, and is a null pointer if not set. We would also need links to the start of each thread in the main list structure:

1 struct jsw_node { 2 void *data; 3 struct jsw_node *next; 4 struct jsw_node *link[2]; 5 }; 6 7 struct jsw_list { 8 struct jsw_node *head; 9 struct jsw_node *link[2]; 10 int has_dummy_head; 11 size_t size; 12 };

The new_node and new_list functions would need to be updated as well to handle these new links by setting them to null pointers or some other sentinel value. Creating the threads themselves is fairly simple. I'll break the operation out into a separate function for extra simplicity and because it makes sense to only create threads when they're needed. A truly generic approach would be to dynamically allocate new links as needed for the threads, with the thread logic being largely unknown to the list proper. Here is a create_threads function that builds the even and odd threads:

1 /* 2 Create the odd/even threads 3 */ 4 void create_threads ( struct jsw_list *list ) 5 { 6 struct jsw_node *it = 7 list->has_dummy_head ? list->head->next : list->head; 8 struct jsw_node *link[2] = {0}; 9 int i; 10 11 for ( i = 1; it != NULL; i++ ) { 12 int dir = ( i % 2 == 0 ); 13 14 /* Set the start of the thread if necessary */ 15 if ( link[dir] == NULL ) 16 list->link[dir] = it; 17 18 /* Set the next link in the thread */ 19 if ( link[dir] != NULL ) 20 link[dir]->link[dir] = it; 21 22 /* Update the walkers */ 23 link[dir] = it; 24 it = it->next; 25 } 26 }

Now, it might be difficult to see a situation where this kind of threading is needed, so let's look at something a bit more practical. Given a list of jobs, we want to sort those jobs by a certain name but also retain the original order that the jobs arrived. An example of this is a priority queue by insertion order that needs to be displayed onto a web site both by priority and by job name for user convenience.

To solve this problem, one might re-sort the queue as necessary for printing, or store a separate list with the different ordering. But with a threaded linked list, the same list can handle both orderings. This minimizes storage costs and maintains peformance. Like with the even-odd threads, we need to have both a next thread link in the node structure as well as a first thread link in the list structure. Since this example will be adding to the end of the threaded list, I've also added a tail link for the thread:

1 struct jsw_node { 2 void *data; 3 struct jsw_node *next; 4 struct jsw_node *next_added; 5 }; 6 7 struct jsw_list { 8 struct jsw_node *head; 9 struct jsw_node *head_added; 10 struct jsw_node *last_added; 11 int has_dummy_head; 12 size_t size; 13 };

The list will be built with insert_sorted, and the thread will handle the insertion priority. Be sure to select the thread that will be most commonly used as the main thread because this will make the list easier to understand as a whole. I chose the sorted list, but it's very likely that an application would be better off with the insertion priority list as the main thread.

With the structure changes (also the corresponding new_node and new_list updates), changing insert_sorted to add newly inserted nodes to the thread is bordering on trivial:

1 /* 2 Insert a new node in sorted order 3 Maintains the priority thread 4 Returns a pointer to the new node or NULL on failure 5 */ 6 struct jsw_node *insert_sorted ( struct jsw_list *list, void *data ) 7 { 8 struct jsw_node *rv = NULL; 9 10 if ( list != NULL ) { 11 /* Find the sorted position of the new node */ 12 struct jsw_node *it = list->head; 13 14 while ( it->next != NULL 15 && compare ( it->next->data, data ) < 0 ) 16 { 17 it = it->next; 18 } 19 20 /* Delegate the insertion to an existing function */ 21 rv = insert_after ( list, it, data ); 22 23 /* Add to the priority thread */ 24 if ( rv != NULL ) { 25 if ( list->head_added == NULL ) 26 list->head_added = rv; 27 else 28 list->last_added->next_added = rv; 29 30 list->last_added = rv; 31 } 32 } 33 34 return rv; 35 }

The big changes are at the end. Making sure that rv isn't a null pointer (which would signal that the node failed to be inserted), the algorithm sets the start of the thread if needed and otherwise links the new node onto the end of the thread. Finally, the end of the thread is set to the new node so that further thread links will be in the correct order.

Keep in mind that threading does not create new nodes! Given five numbers inserted in the order {2,5,3,1,4}, the sorted list would look like this:

head -> 1 -> 2 -> 3 -> 4 -> 5 -> ~

With the threads added, the nodes themselves do not change at all. The only difference is that now there are more links that represent the insertion order of {2,5,3,1,4}. Graphically, the threaded list could be represented like this:

head_added ----+
               |
               |
               |
          +----|---------+
          |    |         |
head -+-> 1 -> 2 -+-> 3 -+-> 4 -+-> 5 -> ~
      |        |  |   |      |  |   |
      |        +--|---|------|--+   |
      |           +---|------|------+
      +---------------+      +-> ~

Take some time to read that, because it's difficult to follow with my crude ASCII art. Start at head_added, then go down to 2, then down and over to 5, then down and back to 3, then down and back to 1, then up and over to 4, then down and over to null. The picture is convoluted, but the logic is extremely simple. And that's all there is to threading. The sky is the limit. :-)

 

Linked Lists vs. Arrays

The data structure that linked lists compete directly with is the array. There are four significant advantages to using linked lists over arrays that come to mind:

Often these advantages outweigh the disadvantages, but there are also several advantages for arrays when it comes to linked lists vs. arrays:

However, arrays have a big problem growing and shrinking, or adding or removing items anywhere except the end. So the usual advice is to use an array when the size is fairly set in stone, and adding or removing will only occur at the end of the array. Use a linked list when adding and removing is common all across the list, or the size of the list is unknown and likely to change often.

 

Node Storage

Up until now we've looked at the nodes in a linked list as little islands returned by malloc. This is by far the most common implementation, but the linked list is actually an abstraction. We're not restricted by an implementation details as long as the basic rule for a linked list is met: each node provides a link to the next.

This means that we can trade the advantages of one storage medium for the disadvantages of another. For example, a linked list where each node is allocated dynamically has awful cache performance. If we're writing a linked list that needs good cache performance but we're reasonably sure that it won't exceed a certain size, we can use an array to store the nodes.

I've introduced this concept to people before, and the typical response is "But that sounds like a lot of extra work and complication!". I'll let you be the judge. Converting all of the above linked list code so that it uses an array instead of dynamic nodes can be done by replacing new_node and destroy_node:

static struct jsw_node free_list[MAX_SIZE];
static size_t next_node = 0;

/*
  Create a new node with the given data and links
  Returns a pointer to the new node, or NULL on error
*/
struct jsw_node *new_node ( void *data, struct jsw_node *next )
{
  struct jsw_node *rv = NULL;

  if ( next_node < MAX_SIZE ) {
    rv = &free_list[next_node];
    rv->data = data;
    rv->next = next;
  }

  return rv;
}

/*
  Destroy a single given node, assuming it has been unlinked
  Optionally destroy the data contained in the node
  Returns the next node specified by the link
*/
struct jsw_node *destroy_node (
  struct jsw_node *node,
  void (destroy_data) ( void * ) )
{
  struct jsw_node *rv = node->next;

  if ( node != NULL ) {
    if ( destroy_data != NULL )
      destroy_data ( node->data );
  }

  return rv;
}

That's it! Well, not really. While this works to a point, the destruction of a node only destroys the data; it doesn't handle returning the node back to the free list for later use. This means that once you use a node, it's used up forever. The problem with handling this case is that new_node is doling out pointers to the nodes in the free list. The only way to avoid invalidating those pointers while still being able to return freed nodes to the pool is to search the pool for unused pointers.

static struct jsw_node free_list[MAX_SIZE];

/*
  Create a new node with the given data and links
  Returns a pointer to the new node, or NULL on error
*/
struct jsw_node *new_node ( void *data, struct jsw_node *next )
{
  struct jsw_node *rv = NULL;
  size_t i;

  for ( i = 0; i < MAX_SIZE; i++ ) {
    if ( !free_list[i].is_used )
      break;
  }

  if ( i != MAX_SIZE ) {
    rv = &free_list[i];
    rv->data = data;
    rv->is_used = 1;
    rv->next = next;
  }

  return rv;
}

/*
  Destroy a single given node, assuming it has been unlinked
  Optionally destroy the data contained in the node
  Returns the next node specified by the link
*/
struct jsw_node *destroy_node (
  struct jsw_node *node,
  void (destroy_data) ( void * ) )
{
  struct jsw_node *rv = node->next;

  if ( node != NULL ) {
    if ( destroy_data != NULL )
      destroy_data ( node->data );

    node->is_used = 0;
  }

  return rv;
}

Notice that I've added a flag to the node struct called is_used and that it's designed so that the default initialization for the static array will set it to false. This eliminates the need to make an initialization pass over the array to make sure that the flags have the correct default value.

Now, the problem with this is fairly obvious. First, we've added a lot of extra work to new_node, and with a long and largely full list we should expect it to take some time. Whether this time exceeds that of a call to malloc is up to debate, and I encourage benchmarking. However, the performance will fluctuate with this search, so we've cut ourselves off from certain speed guarantees. Fortunately, returning a node to the pool is very efficient.

Another problem deals with using an array to store the nodes in the first place. We've locked ourselves into one of the disadvantages of an array, and that's a bounded size. The linked list cannot grow larger than MAX_SIZE with this setup. If MAX_SIZE is chosen carefully, it probably won't be an issue, but wasted space and not enough space are two concerns.

As you may have noticed by now, when I raise an issue I usually have a solution or a workaround to the issue, and this one is no different. Unfortunately, there's not much you can do if your language of choice doesn't support dynamic allocation. However if it does, you can both minimize the number of allocations as well as keep mostly to arrays by using a linked list of array-based linked lists.

I'll repeat that because it bears repeating. The top level linked list will contain nodes where each node contains an array-based linked list like we defined above. Those inner linked lists will be linked together so that the outer linked list only serves as a container for the arrays. How about a drawing?

head -+
      |
      +-> {[0]->[1]->[2]->[3]->[4]->[5]} -+
                                          |
      +-----------------------------------+
      |
      +-> {[0]->[1]->[2]->[3]->[4]->[5]} -+
                                          |
      +-----------------------------------+
      |
      +-> {[0]->[1]->[2]->[3]->[4]->[5]} -> ~

At the end of each array-based list, the link will go either to a sentinel (or null pointer), or to the first node in the next array-based list. This creates a linked list of arrays where each array is a linked list, and they are all chained together into one big abstract linked list. The implementation isn't overly difficult, and it's a good way to test your understanding of linked lists, so I'll leave it as an exercise. :-)

 

<Postponed due to time constraints  -jsw>

Top

From the twisted mind of Julienne Walker
EC Logo