I'm building Resgen to create resumes tailored to job descriptions. A key feature is that users can make final adjustments before saving and then, once they are happy with the resume, save it as a PDF via the browser's print dialog.
One issue that became clear is that despite the user's PDF looking fine, when they uploaded the resume to an applicant tracking system (ATS) there were now a bunch of misspellings. What gives?
To get to the bottom of this, let's take a short typographic journey together and understand more about the browser's "Save to PDF" functionality.
A couple of users reached out to say they were seeing misspellings in their resumes after uploading them through job portals. Since it's pretty uncommon for GPT to make a typo (uncommon enough that it caused a bit of a stir when someone found one in ChatGPT), I felt comfortable believing the generated text wasn't misspelled. So I asked for some examples of what they were seeing and got back the following:
|Original text|Parsed text|
|workflow efficiency|workow eciency|
|Identified efficiency|Identied eciency|
The keen among you probably already see a pattern: the f's are missing! Specifically, f's followed by 'i', 't', 'l', or another 'f'. There are a few more strange issues, like missing q's and the 'gy' in strategy, but we will come to those later. So, what's going on here? Why are f's missing? To understand that, we need to learn about ligatures.
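The pattern in the table can be reproduced with a quick sketch. This is only a simulation of the symptom, assuming the parser silently drops each of the letter groups listed above; the function name and the exact list of sequences are my own choices for illustration, not anything an ATS is known to run.

```python
import re

# Letter sequences that start with 'f' and are commonly merged into one glyph.
# Longer sequences come first so 'ffi' is tried before 'ff' and 'fi'.
F_SEQUENCES = re.compile(r"ffi|ffl|ff|fi|fl|ft")


def drop_f_sequences(text: str) -> str:
    """Simulate a parser that loses every f-based letter group (hypothetical)."""
    return F_SEQUENCES.sub("", text)


print(drop_f_sequences("workflow efficiency"))    # workow eciency
print(drop_f_sequences("Identified efficiency"))  # Identied eciency
```

Running the original text from the table through this function gives exactly the parsed text users reported, which supports the idea that whole letter groups, not single f's, are going missing.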
A ligature is when two or more letters are joined to form a single glyph, or shape. It's really best understood through a couple examples.
Below, I have two example ligatures using the very popular Roboto font. Notice how the top of the 'f' changes shape when the 'i' or 'l' is typed? That's the 'f' turning into a ligature: it shifts from being an 'f' and an 'i' (or 'l') to a single combined shape.
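The "several letters, one shape" idea also exists at the text level. Unicode has a few dedicated ligature characters, and the small Python sketch below uses one of them to show how a single character can stand in for three letters. This is only an illustration: fonts like the ones in this post may swap in ligature glyphs without using these Unicode characters at all, so I'm not claiming this is what any browser writes into a PDF.

```python
import unicodedata

FFI = "\ufb03"  # a single character that represents the letters f, f, i
print(unicodedata.name(FFI))  # LATIN SMALL LIGATURE FFI

# 'office' written with the ligature character is only 4 characters long.
word = "o" + FFI + "ce"
print(len(word))  # 4

# Compatibility normalization (NFKC) expands the ligature back into letters.
plain = unicodedata.normalize("NFKC", word)
print(plain, len(plain))  # office 6
```

A plain string comparison treats the two spellings as different words, which is the same kind of mismatch that shows up when text is copied out of a PDF.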
Learning this gave me the first hint of what's going on under the hood. Next, let's look at how some browsers encode these ligatures when saving to PDF.
Resgen can generate resumes through a Chrome extension as well as through the site, so it works in several browsers. That also means each browser can potentially produce a different result.
Let's run a little experiment across 5 browsers to see how they encode the word 'office' using the Cormorant Garamond font (example below). I test the encoding by copying the output to the clipboard and pasting it. Every output displays the word 'office', but not every output encodes it that way, so the pasted text can differ from what is shown.
From this little experiment, Firefox seems to be the only one that correctly exports 'office'! Chrome, Chromium, and Arc (Chromium-based) drop the 'ffi' ligature, while Safari replaces it with a double quotation mark.
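The copy-and-paste check can be automated in a few lines. The sketch below compares the text you expect with the text you pasted and lists the words that did not survive. The function name and sample strings are hypothetical; the samples just mimic the outcomes described above (the 'ffi' dropped, or replaced with a double quotation mark).

```python
def missing_words(expected: str, pasted: str) -> list[str]:
    """Return the expected words that do not appear in the pasted text.

    Words are split on whitespace and compared case-insensitively.
    Punctuation is not stripped, so this is a rough check only.
    """
    pasted_words = {word.lower() for word in pasted.split()}
    return [word for word in expected.split() if word.lower() not in pasted_words]


expected = "office"

print(missing_words(expected, "office"))  # []
print(missing_words(expected, "oce"))     # ['office']
print(missing_words(expected, 'o"ce'))    # ['office']
```

An empty list means every expected word came through intact; anything else is a word worth inspecting for ligatures.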
The ligatures with 'f' are pretty common, common enough that CSS has a dedicated setting for them, which we'll get into next. But what's going on with the 'gy' and 'q'? Let's look more closely:
In Cormorant Garamond, when 'gy' is italicized, it turns into a ligature and the 'Qu' combination becomes a ligature as well. This clears up a lot now! All the font-specific and normal ligatures were not being properly encoded by Chrome. This is something that did not pop up in other fonts since many others did not have ligatures like those found in Cormorant Garamond.
The font-variant-ligatures property has a good number of options, but the two I needed to pay most attention to are: common-ligatures and discretionary-ligatures.
The common ligatures correspond to ones like we've seen for 'f': 'fi', 'ffi', 'ft', etc. They occur so frequently in writing that you can switch them on or off by setting font-variant-ligatures to common-ligatures or no-common-ligatures.
Discretionary ligatures are font-specific and have similar controls to common ligatures, with the values discretionary-ligatures and no-discretionary-ligatures.
So, through this little journey we've learned what ligatures are and how a handful of browsers encode them. This has a real-world impact on how applicant tracking systems ingest resumes and can potentially hurt an applicant.
The issue is resolved for Resgen, but if you use Word, Google Docs, or Overleaf, it's definitely worth checking if the resume really says what it shows when you save to PDF.