How to Filter and Traverse the DOM Tree with JavaScript

Did you know there’s a JavaScript API whose sole mission is to filter and iterate through the nodes we want from a DOM tree? In fact, there are two such APIs: NodeIterator and TreeWalker. They’re quite similar to one another, with some useful differences. Both let us step through the nodes under a given root node while applying any predefined and/or custom filter rules.

The predefined filters available in the APIs help us target different kinds of nodes, such as text nodes or element nodes, and custom filters (added by us) can narrow the selection further, for instance by looking for nodes with specific contents. The filtered nodes can then be visited one after another, so we can work with each individual node.

How to use the NodeIterator API

A NodeIterator object can be created using the createNodeIterator() method of the document object. This method takes three arguments. The first one is required; it’s the root node that holds all the nodes we want to filter.

The second and third arguments are optional. They are the predefined and custom filters, respectively. The predefined filters are available for use as constants of the NodeFilter object.

For example, if NodeFilter.SHOW_TEXT is passed as the second argument, the method returns an iterator over all the text nodes under the root node. NodeFilter.SHOW_ELEMENT yields only the element nodes. The full list of available constants is in the NodeFilter documentation.
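
Because these constants are bit flags, several of them can be combined with the bitwise OR operator to show more than one kind of node. The sketch below is only an illustration: it spells out the numeric values the DOM standard assigns to three of the constants so the arithmetic is visible, and isShown is a hypothetical helper (not part of any API) that mirrors how the mask is checked against a node's nodeType.

```javascript
// Numeric values defined by the DOM standard. In a browser, use
// NodeFilter.SHOW_ELEMENT, NodeFilter.SHOW_TEXT and NodeFilter.SHOW_COMMENT.
var SHOW_ELEMENT = 0x1;
var SHOW_TEXT = 0x4;
var SHOW_COMMENT = 0x80;

// Combine flags with bitwise OR to show both element and text nodes.
var elementsAndText = SHOW_ELEMENT | SHOW_TEXT; // 0x5

// Hypothetical helper: a node type is shown when bit (nodeType - 1) is set.
// nodeType is 1 for elements, 3 for text nodes and 8 for comments.
function isShown(mask, nodeType) {
  return (mask & (1 << (nodeType - 1))) !== 0;
}

// In a browser the combined mask is passed as the second argument:
//   document.createNodeIterator(
//     root,
//     NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT
//   );
```

With this mask the iterator would visit elements and text nodes but skip comments.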

The third argument (the custom filter) is a function that receives each candidate node and returns a NodeFilter constant saying whether that node should be accepted.

Here is an example HTML document:

<!doctype html>
<html lang='en'>
  <head>
    <meta charset='UTF-8'>
    <title>Document</title>
  </head>
  <body>
    <header><h1>title</h1></header>
    <div id='wrapper'>
      this is the page wrapper
      <p>Hello</p>
      <p>How are you?</p>
    </div>
    <span>txt</span>
    <a href='#'>some link</a>
    <footer>copyrights</footer>
  </body>
</html>

Assuming we want to extract the contents of all the text nodes that are inside the #wrapper div, this is how we go about it using NodeIterator:

var div = document.querySelector('#wrapper');
var nodeIterator = document.createNodeIterator(
  div,
  NodeFilter.SHOW_TEXT
);
while(nodeIterator.nextNode()) {
  console.log(nodeIterator.referenceNode.nodeValue.trim());
}
/* console output
[Log] this is the page wrapper
[Log] Hello
[Log]
[Log] How are you?
[Log]
*/

The nextNode() method of the NodeIterator API returns the next node in the sequence of text nodes, or null when there are none left, which is what ends the while loop. Inside the loop, we log the trimmed contents of every text node to the console. The referenceNode property of NodeIterator returns the node the iterator is currently attached to.

As you can see in the output, some text nodes contain nothing but whitespace. We can avoid showing these empty entries by using a custom filter:
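
Since nextNode() hands back the node itself, its return value can be used directly instead of reading referenceNode. A minimal sketch, where collectTrimmedText is a hypothetical helper name introduced here for illustration; it accepts any object with a nextNode() method that returns null when exhausted:

```javascript
// Hypothetical helper: gathers the trimmed text of every node the
// iterator yields into an array.
function collectTrimmedText(iterator) {
  var texts = [];
  var node;
  // nextNode() returns null once the iterator runs out of nodes
  while ((node = iterator.nextNode()) !== null) {
    texts.push(node.nodeValue.trim());
  }
  return texts;
}

// Browser usage, assuming the #wrapper markup shown earlier:
//   var it = document.createNodeIterator(
//     document.querySelector('#wrapper'),
//     NodeFilter.SHOW_TEXT
//   );
//   console.log(collectTrimmedText(it));
```

Collecting into an array first makes it easy to apply ordinary array methods such as filter() or map() afterwards.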

var div = document.querySelector('#wrapper');
var nodeIterator = document.createNodeIterator(
  div,
  NodeFilter.SHOW_TEXT,
  function(node) {
    return (node.nodeValue.trim() !== "") ?
    NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
  }
);
while(nodeIterator.nextNode()) {
  console.log(nodeIterator.referenceNode.nodeValue.trim());
}
/* console output
[Log] this is the page wrapper
[Log] Hello
[Log] How are you?
*/

The custom filter function returns the constant NodeFilter.FILTER_ACCEPT if the text node is not empty, which includes that node in the sequence the iterator walks over. Conversely, it returns NodeFilter.FILTER_REJECT to exclude the empty text nodes.
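
The same pattern works for any condition on a node, for instance matching specific contents. The sketch below is illustrative only: makeTextFilter is a hypothetical factory name, and the numeric fallbacks (1 for FILTER_ACCEPT, 2 for FILTER_REJECT, the values given in the DOM standard) are there just so the function can also be tried outside a browser, where NodeFilter is not defined.

```javascript
// Use the real constants in a browser; fall back to their standard
// numeric values elsewhere (an assumption made for this sketch only).
var ACCEPT = typeof NodeFilter !== 'undefined' ? NodeFilter.FILTER_ACCEPT : 1;
var REJECT = typeof NodeFilter !== 'undefined' ? NodeFilter.FILTER_REJECT : 2;

// Hypothetical factory: returns a custom filter that accepts only nodes
// whose text contents include the given keyword (case-insensitive).
function makeTextFilter(keyword) {
  var needle = keyword.toLowerCase();
  return function (node) {
    // nodeValue is null for non-text nodes such as elements
    var text = (node.nodeValue || '').toLowerCase();
    return text.indexOf(needle) !== -1 ? ACCEPT : REJECT;
  };
}

// Browser usage, assuming the #wrapper markup shown earlier:
//   var it = document.createNodeIterator(
//     document.querySelector('#wrapper'),
//     NodeFilter.SHOW_TEXT,
//     makeTextFilter('hello')
//   );
```

Building the filter with a factory keeps the keyword out of the filter body, so one function can serve several searches.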

How to use the TreeWalker API

As I mentioned before, the NodeIterator and TreeWalker APIs are similar to each other.

A TreeWalker can be created using the createTreeWalker() method of the document object. This method, just like createNodeIterator(), takes three arguments: the root node, a predefined filter, and a custom filter.

If we use the TreeWalker API instead of NodeIterator, the previous code snippet looks like the following:

var div = document.querySelector('#wrapper');
var treeWalker = document.createTreeWalker(
  div,
  NodeFilter.SHOW_TEXT,
  function(node) {
    return (node.nodeValue.trim() !== "") ?
    NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
  }
);
while(treeWalker.nextNode()) {
  console.log(treeWalker.currentNode.nodeValue.trim());
}
/* output
[Log] this is the page wrapper
[Log] Hello
[Log] How are you?
*/

Instead of referenceNode, the currentNode property of the TreeWalker API is used to access the node to which the walker is currently attached. In addition to the nextNode() method, TreeWalker has other useful methods. The previousNode() method (also present in NodeIterator) returns the node that comes before the one the walker is currently anchored to.

The parentNode(), firstChild(), lastChild(), previousSibling(), and nextSibling() methods move the walker in a similar way, to the parent, a child, or a sibling of the current node. These methods are only available in the TreeWalker API.
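
As a rough illustration of these navigation methods, the sketch below lists the children of the current node by calling firstChild() once and then nextSibling() until it returns null. listChildren is a hypothetical helper name; it relies only on the TreeWalker methods named above and on the fact that currentNode is writable, which is used to put the walker back where it started.

```javascript
// Hypothetical helper: returns the children of the walker's current node
// that pass the walker's filters, then restores the starting position.
function listChildren(walker) {
  var start = walker.currentNode;
  var children = [];
  // firstChild() and nextSibling() move the walker and return the new
  // current node, or return null (without moving) when there is none.
  for (var node = walker.firstChild(); node !== null; node = walker.nextSibling()) {
    children.push(node);
  }
  // currentNode can be assigned to, so reset the walker to where it began
  walker.currentNode = start;
  return children;
}

// Browser usage, assuming the #wrapper markup shown earlier:
//   var walker = document.createTreeWalker(
//     document.querySelector('#wrapper'),
//     NodeFilter.SHOW_ELEMENT
//   );
//   console.log(listChildren(walker));
```

With the #wrapper markup and NodeFilter.SHOW_ELEMENT, the returned array would contain the two p elements.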

Here’s a code example that outputs the last child of the node the iterator is anchored to:

var div = document.querySelector('#wrapper');
var treeWalker = document.createTreeWalker(
  div,
  NodeFilter.SHOW_ELEMENT
);
console.log(treeWalker.lastChild());
/*  output
[Log] <p>How are you?</p>
*/

Which API to choose

Choose the NodeIterator API when you just need a simple iterator to filter and loop through the selected nodes. Pick the TreeWalker API when you also need to reach the filtered nodes’ family, such as their parents, children, or immediate siblings.
