Originally published November 14, 2021 @ 2:56 am

WordPress is my favorite CMS, but the complexity and security issues introduced by PHP and the database are unnecessary in some cases. Migrating to a static Web site may be a better option when it is all you need.

There are plenty of choices like Gatsby, Hugo, and Jekyll. The trick is to convert your WP site to a compatible format while preserving the structure and as much of the original formatting as possible. Doing so isn’t just difficult – in some cases, it is impossible for an automated process and requires extensive manual editing.

I covered the process of converting WP articles to Markdown some years ago, and I think the time is ripe to revisit the matter. The wp2md utility I mentioned previously is pretty good, but there may be a better tool in the works – the wordpress-export-to-markdown Node.js script.

The installation process took a bit of figuring out, even though initially it seemed straightforward. The first step was to upgrade the Node.js version on my WSL2 Ubuntu 18.04.6 LTS from v.8-something to v.12. The process was simple enough:

apt -y upgrade
apt -y install curl dirmngr apt-transport-https lsb-release ca-certificates
curl -sL https://deb.nodesource.com/setup_12.x | bash -
apt -y install nodejs gcc g++ make
node --version
npm --version
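
Before moving on, it may help to confirm that the upgrade took effect. The following is a minimal sketch, not part of the original procedure: the node_major helper and the required variable are hypothetical names, and 12 is assumed as the target only because that is the version installed above.

```shell
# Extract the major version from a string such as "v12.22.7"
node_major() {
  ver="${1#v}"                 # drop the leading "v"
  printf '%s\n' "${ver%%.*}"   # keep everything before the first dot
}

required=12   # assumption: the major version installed above
current="$(node --version 2>/dev/null || echo v0)"   # v0 if node is missing

if [ "$(node_major "${current}")" -ge "${required}" ]; then
  echo "Node.js ${current} is recent enough"
else
  echo "Node.js ${current} is older than v${required}; upgrade first"
fi
```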

The second step was to install the wordpress-export-to-markdown script:

npm install -g wordpress-export-to-markdown

There was an issue with Windows-style EOL characters in the script – likely introduced by the Windows GitHub client used by the author. I let him know, but just in case you run into the /usr/bin/env: "node\r": No such file or directory error, here’s how to fix it:

dos2unix /usr/local/lib/node_modules/wordpress-export-to-markdown/index.js
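
If dos2unix is not installed, the same fix can probably be done with sed. The sketch below assumes GNU sed (for -i and \r) and works on a temporary stand-in file rather than the real index.js, so the file name and contents are purely illustrative.

```shell
# Hypothetical demo file standing in for the installed index.js
script="$(mktemp)"
printf '#!/usr/bin/env node\r\nconsole.log("hi");\r\n' > "${script}"

cr="$(printf '\r')"   # a literal carriage return for grep

# Count lines that end with a carriage return
crlf_before=$(grep -c "${cr}\$" "${script}" || true)

# Strip the trailing carriage return from every line, in place (GNU sed)
sed -i 's/\r$//' "${script}"

crlf_after=$(grep -c "${cr}\$" "${script}" || true)
echo "CRLF lines before: ${crlf_before}, after: ${crlf_after}"
```

To apply it for real, point the sed command at the index.js path shown above.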

Step number three is to export your WordPress content to XML using the “Export” function under the “Tools” menu. I just exported everything. Then go to the folder containing your XML archive and run something like this:

/usr/local/bin/wordpress-export-to-markdown --post-folders=true --prefix-date=false --input "igoroseledko.WordPress.2021-11-13.xml" --output "/mnt/c/zip/tmp/output" --year-folders=false --month-folders=false --save-attached-images=true --save-scraped-images=true --include-other-types=true

You can adjust the settings as you see fit. This script works just fine for your regular blog posts where you talk about cats and politics. However, the result leaves a lot to be desired if your posts contain more complicated formatting, such as code with syntax highlighting.
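
To get a rough idea of how many converted posts will need manual attention, you could search the output folder for leftover HTML. This is only a sketch: the temporary folder and the post-a and post-b names are made up for the demo, and the one-index.md-per-post layout is an assumption about what --post-folders=true produces.

```shell
out_dir="$(mktemp -d)"   # stand-in for the --output folder
mkdir -p "${out_dir}/post-a" "${out_dir}/post-b"
printf 'Plain text about cats.\n' > "${out_dir}/post-a/index.md"
printf '<pre class="EnlighterJSRAW">ls -l</pre>\n' > "${out_dir}/post-b/index.md"

# List Markdown files that still contain a raw <pre> tag
needs_cleanup=$(grep -rl --include='*.md' '<pre' "${out_dir}")
printf '%s\n' "${needs_cleanup}"
```

In this demo, only the post-b file is reported.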

A possible alternative is the “Export to Gatsby” plugin for WordPress. The plugin comes with a Web UI that should work just fine for smaller blogs. In my case, however, I found that the plugin seemed to lose track of things after running for a few minutes and offered me a broken ZIP archive. Probably something times out. It’s nothing you can fix by upping the max_execution_time in php.ini.

I found that to avoid any timeouts, a better option is to use the WordPress command-line utility, WP-CLI (the wp command). Here’s what I did:

wp --allow-root gatsby-markdown-export \
--path='/data/htdocs/igoroseledko.com' \
--directory='/mnt/nas04/backups/wp2md/' \
--post_types=post \
--post_status=publish

The result still needs some cleanup. I use the EnlighterJS syntax highlighter for displaying code snippets in my posts, and there’s no ready-made utility that can convert this correctly into Markdown format. In the Markdown files, the code sections are wrapped in ``` fences but also contain an extraneous <pre class="EnlighterJSRAW" data-enlighter-language="sh" data-enlighter-title=""> tag. So this gives me something to work with. I wrote a quick script to replace these Enlighter tags with the ones that work with my favorite Markdown editor (Typora):

f="/var/tmp/markdown_file.md"
sed -e 's/://' -e 's/\(<pre\).*\(>\)//' "${f}" | awk '{for(i=1; i<=NF; i++) if($i=="```") if(++count%2==1) $i="```bash"}1' | sponge "${f}"
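
For comparison, here is a line-based variant of the same idea that can be tried on a throwaway sample first. It is a sketch, not the script above: it assumes every code fence sits alone on its own line, it tags every opening fence as bash, it writes to a variable instead of using sponge, and it does not touch any closing </pre> tags.

```shell
sample="$(mktemp)"   # hypothetical sample post
printf '%s\n' 'Intro' '```' \
  '<pre class="EnlighterJSRAW" data-enlighter-language="sh" data-enlighter-title="">ls -l' \
  '```' 'Outro' > "${sample}"

# Drop opening <pre ...> tags, then tag every odd-numbered bare fence as bash
cleaned=$(sed -e 's/<pre[^>]*>//g' "${sample}" |
  awk '$0 == "```" { if (++count % 2 == 1) $0 = "```bash" } 1')

printf '%s\n' "${cleaned}"
```

Because awk compares whole lines here, the spacing inside the code itself is left alone.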

Whichever conversion tool you end up using, some cleanup work will be needed in most cases. The amount of this manual effort will depend on the complexity of your blog posts.

In case you need to convert from Markdown to PDF, HTML, or DOCX, you will need to install pandoc and texlive packages. The latter may require quite a bit of disk space, so be careful there:

yum -y install pandoc texlive texlive-latex pdflatex texlive-*.noarch 

# Or

apt -y install pandoc texlive texlive-*

Create a listings-setup.tex with the following configuration:

% Contents of listings-setup.tex
\usepackage{xcolor}

\lstset{
    language=Bash,
    basicstyle=\fontsize{8}{10}\ttfamily,
    numbers=left,
    keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries,
    stringstyle=\color[rgb]{0.31,0.60,0.02},
    commentstyle=\color[rgb]{0.56,0.35,0.01}\itshape,
    numberstyle=\footnotesize,
    stepnumber=1,
    numbersep=5pt,
    backgroundcolor=\color[RGB]{248,248,248},
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2,
    captionpos=b,
    breaklines=true,
    breakatwhitespace=true,
    breakautoindent=true,
    escapeinside={\%*}{*)},
    linewidth=\textwidth,
    basewidth=0.5em,
    showlines=true,
}

And now you can convert your Markdown file to PDF:

f="markdown_file.md"
# Note: pandoc 2.0 and later renamed --latex-engine to --pdf-engine
pandoc \
--listings \
--latex-engine=xelatex \
--highlight-style pygments \
-H listings-setup.tex "${f}" \
-V geometry:"left=2cm, top=2cm, right=2cm, bottom=3cm" \
-V fontsize=11pt -o "${f%.md}.pdf" 2>/dev/null
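
With a whole blog to convert, a loop over the Markdown files is the natural next step. The sketch below is a dry run under stated assumptions: the temporary folder and the two file names are invented for the demo, the run variable is a hypothetical switch, and only a subset of the options shown above is passed to keep the example short.

```shell
md_dir="$(mktemp -d)"   # hypothetical folder of converted posts
touch "${md_dir}/first-post.md" "${md_dir}/second-post.md"

run="echo"   # dry run: print each command; set run="" to execute pandoc

planned=$(
  for f in "${md_dir}"/*.md; do
    [ -e "${f}" ] || continue   # skip if the glob matched nothing
    ${run} pandoc --listings -H listings-setup.tex "${f}" -o "${f%.md}.pdf"
  done
)
printf '%s\n' "${planned}"
```

Printing the commands first makes it easy to check the output names before committing to a long conversion run.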