HTML

Extracts elements from an HTML document.

Where and why do we use the HTML node?

The HTML node parses HTML documents and extracts specific elements using CSS selectors. Use it when you need to scrape data from web pages, pull specific content out of HTML responses, or process HTML documents to retrieve structured information. Unlike the Template node, which generates HTML, this node only parses and extracts.

How it works

The HTML node uses CSS selectors to find and extract elements from the HTML content in msg.payload. You specify which elements to extract using standard CSS selector syntax, such as h1, .classname, #id, or more complex selectors. The node also supports some jQuery-style selector extensions; see the css-select documentation for the full syntax.

The selector can be configured in the node's edit panel or provided dynamically via msg.select.
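As a minimal sketch of the dynamic case, the logic below could sit in a Function node placed before the HTML node. It assumes the selector field in the HTML node's edit panel is left empty so that msg.select is used. The pageType property, the selector table, and the chooseSelector helper are hypothetical names chosen for illustration.

```javascript
// Sketch only: pick a selector per message and pass it to the HTML node via msg.select.
// `pageType` and the entries in `selectors` are hypothetical, not part of Node-RED.
function chooseSelector(msg) {
    const selectors = {
        article: "article h1",
        product: ".product-title",
    };
    // Fall back to a plain h1 selector when the page type is unknown.
    msg.select = selectors[msg.pageType] || "h1";
    return msg;
}

// Inside a Function node the body would simply be: return chooseSelector(msg);
const example = chooseSelector({ pageType: "product", payload: "<html>...</html>" });
console.log(example.select); // ".product-title"
```

Keeping the selector in the message lets a single HTML node serve several page layouts, at the cost of moving that configuration out of the edit panel.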

Modes of operation

The HTML node can output extracted content in different ways:

Single Message with Array

Returns one message where msg.payload contains an array of all matched elements. Use this when you want to process all results together or need to know the total count of matches.
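A downstream Function node can then work on the whole array at once. The sketch below assumes the HTML node is set to return text content, so each array item is a string. The summariseMatches helper and the count property are hypothetical names used for illustration.

```javascript
// Sketch only: tidy the array produced by the HTML node in single-message mode.
// Assumes each matched element arrives as a text string in msg.payload.
function summariseMatches(msg) {
    const matches = Array.isArray(msg.payload) ? msg.payload : [];
    // Trim whitespace and drop matches that contained no text.
    const cleaned = matches
        .map((text) => String(text).trim())
        .filter((text) => text.length > 0);
    // `count` is a hypothetical property recording how many matches remain.
    return { ...msg, payload: cleaned, count: cleaned.length };
}

const result = summariseMatches({ payload: ["  Node-RED ", "", "Docs"] });
console.log(result.count, result.payload); // 2 [ 'Node-RED', 'Docs' ]
```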

Multiple Messages

Sends a separate message for each matched element. Each message carries one matched element in msg.payload and a msg.parts property for sequence tracking. Use this when you want to process each match individually in subsequent nodes.

Return Format

For each matched element, you can choose to return:

HTML markup: the complete HTML, including tags and attributes
Text content: just the text, with all HTML tags stripped

How the node handles messages

The HTML node processes the HTML string in msg.payload. After parsing and extracting the specified elements, it outputs the results according to the configured mode.

When outputting multiple messages, the node automatically adds the msg.parts property to enable proper handling by downstream nodes like Join. This property includes the sequence identifier, message index, and total count.
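To make the shape of that sequence concrete, the sketch below models how an array of matches could map to individual messages and back. It is an illustration, not the node's own code: it includes only the three msg.parts fields named above (id, index, count), and the real node may set additional fields. The toSequence and fromSequence helpers are hypothetical.

```javascript
// Sketch only: model the per-match messages described above.
// Only the id, index and count fields of msg.parts are represented here.
function toSequence(matches, sequenceId) {
    return matches.map((match, index) => ({
        payload: match,
        parts: { id: sequenceId, index: index, count: matches.length },
    }));
}

// Rebuild the original array from a complete sequence, in index order.
// This mimics what a node such as Join can do with msg.parts.
function fromSequence(messages) {
    return [...messages]
        .sort((a, b) => a.parts.index - b.parts.index)
        .map((m) => m.payload);
}

const sequence = toSequence(["First", "Second", "Third"], "example-id");
console.log(sequence[1].parts); // { id: 'example-id', index: 1, count: 3 }
console.log(fromSequence(sequence.reverse())); // [ 'First', 'Second', 'Third' ]
```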

The node uses CSS selector syntax with jQuery extensions, so you can use:

Tag selectors: h1, div, span
Class selectors: .classname
ID selectors: #elementid
Attribute selectors: [href], [data-value="123"]
Complex selectors: div.content > p, ul li:first-child
jQuery extensions: :first, :last, :even, :odd

Examples

Extracting page titles

This example fetches the Node-RED homepage and extracts the text from the h1 tag. The HTTP Request node retrieves the page, and the HTML node parses it to find the heading.

Node Documentation

Extracts elements from an HTML document held in msg.payload using a CSS selector.

Inputs

payload string: the HTML string from which to extract elements.
select string: the selector to use, if one is not configured in the edit panel.

Output

payload array | string: the result is either a single message whose payload is an array of the matched elements, or multiple messages that each contain one matched element. If multiple messages are sent, each also has msg.parts set.

Details

This node supports a combination of CSS and jQuery selectors. See the css-select documentation for more information on the supported syntax.