Using html-parser for when an API returns HTML

Published on Tuesday, 28 February 2023.

As (very) eagle-eyed visitors may have noticed, I now have a Liked Posts section on my website.

I have been using the social features of Inoreader to curate a list of posts I have enjoyed reading from my subscribed RSS feeds.

So I thought:

Hey, Thom! Why not share this with your loyal followers?

I have seen other people do this, and I think it's a nice feature: a way of expanding my own reading list too. (My page notes that the posts are not my work, and you can ask me to unlike your post if you don't want it to be there!)

The Problem™️

Inoreader's free tier does not export the Liked posts in a JSON feed 😭

The Solution™️

There are a few possible solutions: pay for a higher Inoreader tier, move to a different feed reader, or make do and mend.

As it currently stands, I can't justify an Inoreader Enterprise account (sorry, folks!)

I have tried a few RSS feed readers and I do really like Inoreader, so I don't really want to move elsewhere. Not just yet, at least.

So that leaves make do and mend.

What do I actually get?

I get given a URL to hit, https://www.inoreader.com/stream/user/1005469327/tag/user-liked/view/html?cs=m, which returns a webpage that looks a little like this:

<body class="article_magazine_content_wrapper">
  <div class="wrapper">
    <div class="header">
      <!-- Nothing of use here -->
    </div>
    <div class="body" id="snip_body">
      <!-- This is where your liked posts actually are -->
    </div>
  </div>
</body>

Inside the <div class="body" id="snip_body" /> is a list of your liked posts, each in this format:

<div class="article_magazine_content_wrapper">
  <div class="article_magazine_picture_wrapper">
    <a href="https://css-irl.info/setting-up-a-newish-macbook/" target="_blank">
      <div class="article_magazine_picture" style="background-image:url('https://css-irl.info/setting-up-a-new-macbook.svg')">
      </div>
    </a>
  </div>
  <div class="article_magazine_title_content">
    <div>
      <div class="article_magazine_title">
        <a class="article_magazine_title_link" target="_blank" href="https://css-irl.info/setting-up-a-newish-macbook/" id="at_36518581135">Setting Up a New(ish) MacBook</a>
      </div>
    </div>
    <div class="article_magazine_content">
      I recently dusted off a relatively old (~5 years) MacBook and replaced the battery with the plan that I could use it as a secondary machine, for my “non-work” stuff. The last couple of times I’ve got a new Mac I’ve gone for the option of cloning my old setup, so I don’t need to install everything again. This time, however, the whole point was...
    </div>
  </div>
  <div class="article_magazine_footer">
    <div class="article_author">
      <span class="au1">posted 28m ago by <span style="font-style:italic">Michelle Barker</span></span>
      <span class="au1">via</span>
      <a class="feed_link" href="https://css-irl.info/" target="_blank">CSS { In Real Life }</a>
    </div>
  </div>
</div>

So far, so good! There are two things we could do now:

  1. Churn out the markup as is into our page and write new CSS to handle the styles
  2. Make a new data object from the HTML so we can style it however we want!

I chose the second option…

Enter html-parser

Parsing HTML is a horrible thing to have to deal with, so I'm glad someone took one for the team and released html-parser. Thanks, tmont 💜.

I'm using Eleventy to build my site so this guide may be quite specific. I'm sure the general principles work for a lot of other ways to build websites though!

Install the package

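npm install -D node-html-parser isomorphic-fetch

The package names here match the require() calls in the data file below: node-html-parser for the parsing and isomorphic-fetch for the request. We're using the -D flag because this data manipulation happens at build time, so both packages are development dependencies and we don't need access to them in the browser. This means we're not unnecessarily shipping JavaScript (or anything else) to the user.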

Create a _data file

// _src/_data/likedPosts.js (or wherever!)

const fetch = require('isomorphic-fetch')
const { parse } = require('node-html-parser')

const init = async () => {}

module.exports = init()

You're going to need some way to fetch the endpoint. I like using Isomorphic Fetch (mostly because I like the name 🤪) but you could use Node Fetch or Axios or whatever.

init()

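const init = async () => {
  try {
    const endpoint = 'YOUR_API_ENDPOINT'
    // The endpoint returns HTML, so read the body as text rather than JSON
    const html = await fetch(endpoint).then(r => r.text())
    // edit() is synchronous; it turns the HTML string into rendered cards
    const content = edit(html)
    return content
  }
  catch(error) {
    // If anything fails, log it and fall back to an empty list
    console.log('🤡', error)
    return []
  }
}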

Notice we're using r.text() instead of r.json() when the API returns successfully. This gives us a String of HTML to be manipulated in the edit() function.
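
One thing worth knowing is that fetch only rejects on network failures, so an HTTP error page would still be handed to edit() as if it were a list of posts. Here is a minimal sketch of a stricter fetch step. The names fetchHtml and fetchImpl are hypothetical (they are not in my actual code), and it assumes either Node 18+ with a global fetch or a fetch function you pass in yourself, such as Isomorphic Fetch.

```javascript
// Sketch only: fetchHtml and its fetchImpl parameter are hypothetical names.
// Assumes Node 18+ (global fetch) unless another fetch function is passed in.
const fetchHtml = async (endpoint, fetchImpl = globalThis.fetch) => {
  const response = await fetchImpl(endpoint)

  // fetch resolves even for 4xx/5xx responses, so check the status ourselves
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }

  // text() resolves to the raw HTML string; json() would reject on HTML
  return response.text()
}
```

Inside init() this would replace the fetch line with something like const html = await fetchHtml(endpoint), and the existing catch block would pick up the thrown error.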

edit()

const edit = (markup) => {
  const x = parse(markup)
  const y = [...x.querySelectorAll('.article_magazine_content_wrapper')]
  const z = y.map(a => render(a))
  return z
}

What makes html-parser vastly superior to RegEx is that it parses the given string into a version of the DOM.

Now we have the DOM, we can query it like we have access to the document - which is exactly what we want to do! Yay!

The edit function is doing a few things here, so let's dig in:

const x = parse(markup)

As described above, this line creates a DOM of the passed markup.

const y = [...x.querySelectorAll('.article_magazine_content_wrapper')]

This uses the (hopefully familiar) [...element.querySelectorAll('.identifier')] syntax to create an Array of (in this case) every element with a class of article_magazine_content_wrapper - our liked posts.

const z = y.map(a => render(a))

For each liked post found, render the new markup. In this case the render function is written specifically for this use case - keep reading!
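
As written, a single post with unexpected markup (say, a missing title element) makes the whole map throw, and init() then falls back to an empty list. A possible variation, sketched here under the assumption that skipping a broken post is preferable to losing them all, wraps each render call individually. renderAll is a hypothetical helper name, not something from my actual code.

```javascript
// Sketch only: renderAll is a hypothetical helper, not part of the original code.
// It takes the array of post nodes and a render function like the one below.
const renderAll = (nodes, render) =>
  nodes.flatMap((node) => {
    try {
      return [render(node)]
    } catch (error) {
      // e.g. querySelector returned null because a post has no title element
      console.warn('Skipping a post that could not be rendered:', error.message)
      return []
    }
  })
```

With this in place, the last step of edit() could become const z = renderAll(y, render).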

render()

You can render the response however you like (that's kind of the point) but, for completeness, here's what I chose to do…

const trim = (x) => x.replace(/\t|\n/g, '')

const el = {
  hero: (x) => x.querySelector('.article_magazine_picture_wrapper'),
  title: (x) => trim(x.querySelector('.article_magazine_title').rawText),
  content: (x) => trim(x.querySelector('.article_magazine_content').rawText),
  author: (x) => trim(x.querySelector('.article_magazine_footer .feed_link').innerText),
  link: (x) => x.querySelector('.article_magazine_footer .feed_link').attributes.href,
}

const render = (x) => trim(`<article class="card flow border shadow radius">
  <header class="content">
    <h3>${el.title(x)}</h3>
    <small>From ${el.author(x)}</small>
  </header>
  <div class="content flow">
    ${el.content(x)}
  </div>
  <footer class="content flow">
    <a href="${el.link(x)}" class="button breakout border shadow radius">Read more</a>
    <p><small>&nbsp;</small></p>
  </footer>
</article>
`)

Which should give you the cards you can see on the Liked Posts page 🎉

<article class="card flow border shadow radius">
<header class="content">
<h3> Setting Up a New(ish) MacBook </h3> <small>From CSS { In Real Life }</small>
</header>
<div class="content flow"> I recently dusted off a relatively old (~5 years) MacBook and replaced the battery with the plan that I could use it as a secondary machine, for my “non-work” stuff. The last couple of times I’ve got a new Mac I’ve gone for the option of cloning my old setup, so I don’t need to install everything again. This time, however, the whole point was... </div>
<footer class="content flow">
<a href="https://css-irl.info/" class="button breakout border shadow radius">Read more</a>
<p><small>&nbsp;</small></p>
</footer>
</article>
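
The stray spaces around the title and content in that output are there because trim only removes tabs and newlines, so any plain spaces in the source markup survive. If that bothers you, here is a small sketch of a stricter helper. The name tidy is hypothetical and this is not what my site currently uses.

```javascript
// Sketch only: tidy is a hypothetical alternative to the trim helper above.
// It collapses every run of whitespace to one space, then strips both ends.
const tidy = (text) => text.replace(/\s+/g, ' ').trim()

console.log(`<h3>${tidy('\n\t Setting Up a New(ish) MacBook \n')}</h3>`)
// prints: <h3>Setting Up a New(ish) MacBook</h3>
```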

There's probably loads of stuff that could be improved with this approach but it's working for me so far.
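
One possible improvement, closer to the "new data object" idea from earlier, is to return plain objects instead of HTML strings and leave the markup to a template. The sketch below is an assumption-laden illustration rather than my actual code: toPost is a hypothetical name, and it assumes each element exposes querySelector, text and getAttribute (which node-html-parser elements do). It also picks up the article's own URL from the title link, alongside the feed's home page.

```javascript
// Sketch only: toPost is a hypothetical name. `node` is assumed to be one
// .article_magazine_content_wrapper element exposing querySelector, whose
// child elements expose `text` and `getAttribute`.
const toPost = (node) => {
  const titleLink = node.querySelector('.article_magazine_title_link')
  const feedLink = node.querySelector('.article_magazine_footer .feed_link')
  const excerpt = node.querySelector('.article_magazine_content')

  return {
    title: titleLink.text.trim(),
    url: titleLink.getAttribute('href'), // the article itself
    excerpt: excerpt.text.trim(),
    feed: feedLink.text.trim(),
    feedUrl: feedLink.getAttribute('href'), // the site it came from
  }
}
```

Returning y.map(toPost) from edit() would then make the data file resolve to an array of objects that a template can loop over and style however it likes.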

I hope someone else finds it useful 😎


Fin

