Using node-html-parser
for when an API returns HTML
As (very) eagle-eyed visitors may have noticed, I now have a Liked Posts section on my website.
I have been using the social features of Inoreader to curate a list of posts I have enjoyed reading from my subscribed RSS feeds.
So I thought:
Hey, Thom! Why not share this with your loyal followers?
I have seen other people do this (and my page notes this is not my work and you can ask me to unlike your post if you don't want it to be there!) and I think it's a nice feature - a way of expanding my own reading list too.
The Problem™️
Inoreader's free tier does not export the Liked posts in a JSON feed 😭
The Solution™️
There are a few solutions:
- I could fork out for Inoreader's Enterprise tier
- I could jump ship to a competitor that offers JSON feeds for free (or cheaper)
- I could work with what I have
As it currently stands, I can't justify an Inoreader Enterprise account (sorry, folks!)
I have tried a few RSS feed readers, and I really do like Inoreader, so I don't want to move elsewhere (not just yet, at least).
So that leaves make do and mend…
What do I actually get?
I'm given a URL to hit: https://www.inoreader.com/stream/user/1005469327/tag/user-liked/view/html?cs=m
which returns a webpage that looks a little like this:
<body class="article_magazine_content_wrapper">
<div class="wrapper">
<div class="header">
<!-- Nothing of use here -->
</div>
<div class="body" id="snip_body">
<!-- This is where your liked posts actually are -->
</div>
</div>
</body>
Inside <div class="body" id="snip_body"> is a list of your liked posts in this format:
<div class="article_magazine_content_wrapper">
<div class="article_magazine_picture_wrapper">
<a href="https://css-irl.info/setting-up-a-newish-macbook/" target="_blank">
<div class="article_magazine_picture" style="background-image:url('https://css-irl.info/setting-up-a-new-macbook.svg')">
</div>
</a>
</div>
<div class="article_magazine_title_content">
<div>
<div class="article_magazine_title">
<a class="article_magazine_title_link" target="_blank" href="https://css-irl.info/setting-up-a-newish-macbook/" id="at_36518581135">Setting Up a New(ish) MacBook</a>
</div>
</div>
<div class="article_magazine_content">
I recently dusted off a relatively old (~5 years) MacBook and replaced the battery with the plan that I could use it as a secondary machine, for my “non-work” stuff. The last couple of times I’ve got a new Mac I’ve gone for the option of cloning my old setup, so I don’t need to install everything again. This time, however, the whole point was...
</div>
</div>
<div class="article_magazine_footer">
<div class="article_author">
<span class="au1">posted 28m ago by <span style="font-style:italic">Michelle Barker</span></span>
<span class="au1">via</span>
<a class="feed_link" href="https://css-irl.info/" target="_blank">CSS { In Real Life }</a>
</div>
</div>
</div>
So far, so good! There are two things we could do now:
- Churn out the markup as is into our page and write new CSS to handle the styles
- Make a new data object from the HTML so we can style it however we want!
I chose the second option…
Enter node-html-parser
Parsing HTML is a horrible thing to have to deal with, so I'm glad someone took one for the team and released node-html-parser. Thanks, taoqf 💜
I'm using Eleventy to build my site, so this guide may be quite specific. I'm sure the general principles work for a lot of other ways to build websites, though!
Install the package
npm install -D node-html-parser isomorphic-fetch
We're using the -D flag (which saves the packages as devDependencies) because we're doing this data manipulation at build time: the packages only need to be around while Eleventy builds the site. None of this code runs in the browser, so we're not unnecessarily shipping JavaScript (or anything else) to the user.
Create a _data file
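// _src/_data/likedPosts.js (or wherever!)
const fetch = require('isomorphic-fetch')
const { parse } = require('node-html-parser')
// Fetches the liked posts and reshapes them at build time (filled in below)
const init = async () => {}
// Templates see the resolved result as `likedPosts`, named after this file
module.exports = init()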
You're going to need some way to fetch the endpoint. I like using Isomorphic Fetch (mostly because I like the name 🤪), but you could use Node Fetch or Axios or whatever.
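If you're on Node 18 or later, `fetch` is available globally, so you may not need a fetch package at all. Here's a minimal sketch assuming Node 18+; `fetchHtml` is a hypothetical helper name, not part of Eleventy or any library:

```javascript
// Assumes Node 18+, where fetch is a global (no isomorphic-fetch needed).
// fetchHtml is a hypothetical helper, not part of Eleventy or any library.
const fetchHtml = async (url) => {
  const response = await fetch(url)

  // fetch only rejects on network errors, so HTTP errors need an explicit check
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`)
  }

  // The endpoint returns HTML, so read the body as text rather than JSON
  return response.text()
}
```

Inside `init()`, `await fetchHtml(endpoint)` would then stand in for the `fetch(...).then(...)` chain.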
init()
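const init = async () => {
  try {
    const endpoint = 'YOUR_API_ENDPOINT'
    const html = await fetch(endpoint).then(r => {
      // fetch only rejects on network errors, so treat HTTP errors (404, 500, etc.) as failures too
      if (!r.ok) throw new Error(`Request failed with status ${r.status}`)
      return r.text()
    })
    // edit() is synchronous, so there's nothing to await here
    return edit(html)
  }
  catch(error) {
    // Log the problem and fall back to an empty list so the build still succeeds
    console.log('🤡', error)
    return []
  }
}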
Notice we're using r.text() instead of r.json() when the API returns successfully. This gives us a String of HTML to be manipulated in the edit() function.
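To see why `r.text()` matters here: calling `r.json()` on an HTML body rejects with a `SyntaxError`, because HTML isn't valid JSON. A small illustration, assuming Node 18+ for the global `Response` class; the HTML string is a made-up stand-in for the Inoreader response:

```javascript
// Assumes Node 18+, where Response is available globally.
// The markup below is a made-up stand-in for the Inoreader response body.
const body = '<div class="body" id="snip_body"><div class="article_magazine_content_wrapper"></div></div>'

const readAsText = () => new Response(body).text()
const readAsJson = () => new Response(body).json()

const demo = async () => {
  const html = await readAsText()
  console.log(typeof html) // "string", ready to hand to edit()

  try {
    await readAsJson()
  } catch (error) {
    // HTML is not JSON, so parsing fails before we ever see any data
    console.log(error.name) // "SyntaxError"
  }
}

demo()
```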
edit()
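const edit = (markup) => {
  // Parse the HTML string into a queryable, DOM-like tree
  const x = parse(markup)
  // Collect every liked post
  const y = [...x.querySelectorAll('.article_magazine_content_wrapper')]
  // Turn each post into our own card markup
  const z = y.map(a => render(a))
  return z
}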
What makes node-html-parser vastly superior to using RegEx is that it parses the given string into a version of the DOM, so nested elements (like the divs within divs above) are handled properly instead of being pattern-matched.
Now that we have the DOM, we can query it as if we had access to the document, which is exactly what we want to do! Yay!
The edit function is doing a few things here, so let's dig in:
const x = parse(markup)
As described above, this line creates a DOM of the passed markup.
const y = [...x.querySelectorAll('.article_magazine_content_wrapper')]
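This uses the (hopefully familiar) [...element.querySelectorAll('.identifier')] syntax to create an Array of (in this case) every element with a class of article_magazine_content_wrapper: our liked posts. node-html-parser's querySelectorAll already returns an Array, so the spread isn't strictly necessary here; in the browser, it's what turns a NodeList into an Array.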
const z = y.map(a => render(a))
For each liked post found, render the new markup. In this case, the render function is written specifically for this use case. Keep reading!
render()
You can render the response however you like (that's kind of the point) but, for completeness, here's what I chose to do…
// Despite the name, this only strips tabs and newlines (spaces are kept)
const trim = (x) => x.replace(/\t|\n/g, '')
const el = {
hero: (x) => x.querySelector('.article_magazine_picture_wrapper'),
title: (x) => trim(x.querySelector('.article_magazine_title').rawText),
content: (x) => trim(x.querySelector('.article_magazine_content').rawText),
author: (x) => trim(x.querySelector('.article_magazine_footer .feed_link').innerText),
link: (x) => x.querySelector('.article_magazine_footer .feed_link').attributes.href,
}
const render = (x) => trim(`<article class="card flow border shadow radius">
<header class="content">
<h3>${el.title(x)}</h3>
<small>From ${el.author(x)}</small>
</header>
<div class="content flow">
${el.content(x)}
</div>
<footer class="content flow">
<a href="${el.link(x)}" class="button breakout border shadow radius">Read more</a>
<p><small> </small></p>
</footer>
</article>`)
Which should give you the cards you can see on the Liked Posts page 🎉
<article class="card flow border shadow radius">
<header class="content">
<h3> Setting Up a New(ish) MacBook </h3> <small>From CSS { In Real Life }</small>
</header>
<div class="content flow"> I recently dusted off a relatively old (~5 years) MacBook and replaced the battery with the plan that I could use it as a secondary machine, for my “non-work” stuff. The last couple of times I’ve got a new Mac I’ve gone for the option of cloning my old setup, so I don’t need to install everything again. This time, however, the whole point was... </div>
<footer class="content flow">
<a href="https://css-irl.info/" class="button breakout border shadow radius">Read more</a>
<p><small> </small></p>
</footer>
</article>
There's probably loads of stuff that could be improved with this approach but it's working for me so far.
I hope someone else finds it useful 😎
Fin