Originally published November 14, 2021 @ 2:56 am
WordPress is my favorite CMS, but the complexity and security issues introduced by PHP and the database are unnecessary in some cases. Migrating to a static Web site may be a better option when it is all you need.
There are plenty of choices like Gatsby, Hugo, and Jekyll. The trick is to convert your WP site to a compatible format while preserving the structure and as much of the original formatting as possible. Doing so isn’t just difficult: in some cases, it’s an impossible task for an automated process and would require extensive manual editing.
I covered converting WP articles to Markdown some years ago, and I think the time is ripe to revisit the matter. The wp2md utility I mentioned previously is pretty good, but there may be a better tool in the works: the wordpress-export-to-markdown Node.js script.
The installation process took a bit of figuring out, even though initially it seemed straightforward. The first step was to upgrade the Node.js version on my WSL2 Ubuntu 18.04.6 LTS from v.8-something to v.12. The process was simple enough:
apt -y upgrade
apt -y install curl dirmngr apt-transport-https lsb-release ca-certificates
curl -sL https://deb.nodesource.com/setup_12.x | bash -
apt -y install nodejs gcc g++ make
node --version
npm --version
The second step was to install the wordpress-export-to-markdown script:
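If you want to confirm the upgrade before moving on, a small check like the one below may help. It is only a sketch: the helper name node_major_at_least is made up for illustration, and the threshold of 12 simply mirrors the version installed above, not a documented requirement of any particular script.

```shell
# Hypothetical helper: succeed if a Node.js version string such as
# "v12.22.9" has a major version of at least the given number.
node_major_at_least() {
    version="${1#v}"        # drop the leading "v"
    major="${version%%.*}"  # keep everything before the first dot
    [ "$major" -ge "$2" ]
}

# Usage: check the installed interpreter, if there is one.
if command -v node >/dev/null 2>&1; then
    if node_major_at_least "$(node --version)" 12; then
        echo "Node.js is recent enough"
    else
        echo "Node.js is older than v12"
    fi
fi
```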
npm install -g wordpress-export-to-markdown
There was an issue with Windows-style EOL characters in the script, likely introduced by the Windows GitHub client used by the author. I let him know, but just in case you run into the /usr/bin/env: "node\r": No such file or directory error, here’s how to fix it:
dos2unix /usr/local/lib/node_modules/wordpress-export-to-markdown/index.js
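If dos2unix is not installed, roughly the same fix can be done with standard tools. The sketch below uses two hypothetical helpers, has_crlf_shebang and strip_cr, which are not part of any package. It assumes the only problem is a carriage return at the end of each line.

```shell
# Hypothetical helper: succeed if the first line (the shebang) ends with
# a carriage return, which is what produces the "node\r" error.
has_crlf_shebang() {
    head -n 1 "$1" | grep -q "$(printf '\r')\$"
}

# Hypothetical helper: remove a trailing carriage return from every line.
# The result goes through a temporary file and is copied back with cat,
# so the original file keeps its permissions.
strip_cr() {
    tmp="$(mktemp)" || return 1
    if sed "s/$(printf '\r')\$//" "$1" > "$tmp"; then
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (path taken from the dos2unix command above):
# script=/usr/local/lib/node_modules/wordpress-export-to-markdown/index.js
# has_crlf_shebang "$script" && strip_cr "$script"
```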
Step number three is to export your WordPress site to XML using the “Export” function under the “Tools” menu. I just exported everything. All you need to do is to go to the folder containing your XML archive and run something like this:
/usr/local/bin/wordpress-export-to-markdown \
  --post-folders=true \
  --prefix-date=false \
  --input "igoroseledko.WordPress.2021-11-13.xml" \
  --output "/mnt/c/zip/tmp/output" \
  --year-folders=false \
  --month-folders=false \
  --save-attached-images=true \
  --save-scraped-images=true \
  --include-other-types=true
You can adjust the settings as you see fit. This script works just fine for your regular blog posts where you talk about cats and politics. However, the result leaves a lot to be desired if your posts contain more complicated formatting, such as code with syntax highlighting.
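Since I exported everything, it can be useful to see what the XML file actually contains before judging the converted output. The function below is a rough sketch and count_post_types is a made-up name. It assumes each wp:post_type element sits on its own line, with or without a CDATA wrapper, which is how the export files I have seen are laid out.

```shell
# Hypothetical helper: print "type count" for every post type found in a
# WordPress export file, e.g. "page 12" and "post 340".
count_post_types() {
    grep -o '<wp:post_type>.*</wp:post_type>' "$1" |
        sed -e 's/<wp:post_type>//' -e 's/<\/wp:post_type>//' \
            -e 's/<!\[CDATA\[//' -e 's/]]>//' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
# count_post_types "igoroseledko.WordPress.2021-11-13.xml"
```

Comparing the number of posts with the number of Markdown files in the output folder is a quick way to spot items that were skipped.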
A possible alternative is the “Export to Gatsby” plugin for WordPress. The plugin comes with a Web UI that should work just fine for smaller blogs. In my case, however, I found that the plugin seemed to lose track of things after running for a few minutes and offered me a broken ZIP archive. Probably something times out. It’s nothing you can fix by upping the max_execution_time in php.ini.
I found that to avoid any timeouts, a better option is to use the WordPress CLI utility, wp. Here’s what I did:
wp --allow-root gatsby-markdown-export \
  --path='/data/htdocs/igoroseledko.com' \
  --directory='/mnt/nas04/backups/wp2md/' \
  --post_types=post \
  --post_status=publish
The result still needs some cleanup. I use the EnlighterJS syntax highlighter for displaying code snippets in my posts, and there’s no ready-made utility that can convert this correctly into Markdown format. In the Markdown files, the code sections are contained inside ``` fences but also carry an extraneous <pre class="EnlighterJSRAW" data-enlighter-language="sh" data-enlighter-title=""> tag.
So this gives me something to work with. I wrote a quick script to replace these Enlighter tags with the ones that work with my favorite Markdown editor (Typora):
f="/var/tmp/markdown_file.md"
sed -e 's/://' -e 's/\(<pre\).*\(>\)//' "${f}" |
  awk '{for(i=1; i<=NF; i++) if($i=="```") if(++count%2==1) $i="```bash"}1' |
  sponge "${f}"
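The fence-labeling step is the part most worth testing on its own. Here is a minimal sketch of just that step as a filter. It is a variation on my script, not a copy: it only touches lines that consist of a bare fence, the name label_fences is made up, and the language defaults to bash unless you pass another one.

```shell
# Hypothetical filter: read Markdown on stdin and add a language label to
# every opening fence (the 1st, 3rd, 5th, ... bare fence line).
label_fences() {
    fence="$(printf '\140\140\140')"   # three backticks, built from octal 140
    awk -v fence="$fence" -v lang="${1:-bash}" '
        $0 == fence { if (++count % 2 == 1) $0 = fence lang }
        { print }
    '
}

# Usage:
# label_fences < markdown_file.md > markdown_file.labeled.md
# label_fences python < markdown_file.md
```

Writing to a separate file first makes it easy to diff the result before replacing the original.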
Whichever conversion tool you end up using, some cleanup work will be needed in most cases. The amount of this manual effort will depend on the complexity of your blog posts.
In case you need to convert from Markdown to PDF, HTML, or DOCX, you will need to install the pandoc and texlive packages. The latter may require quite a bit of disk space, so be careful there:
yum -y install pandoc texlive texlive-latex pdflatex texlive-*.noarch
# Or, on Debian/Ubuntu:
apt -y install pandoc texlive texlive-*
Create a listings-setup.tex file with the following configuration:
% Contents of listings-setup.tex
\usepackage{xcolor}
\lstset{
  language=Bash,
  basicstyle=\fontsize{8}{10}\ttfamily,
  numbers=left,
  keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries,
  stringstyle=\color[rgb]{0.31,0.60,0.02},
  commentstyle=\color[rgb]{0.56,0.35,0.01}\itshape,
  numberstyle=\footnotesize,
  stepnumber=1,
  numbersep=5pt,
  backgroundcolor=\color[RGB]{248,248,248},
  showspaces=false,
  showstringspaces=false,
  showtabs=false,
  tabsize=2,
  captionpos=b,
  breaklines=true,
  breakatwhitespace=true,
  breakautoindent=true,
  escapeinside={\%*}{*)},
  linewidth=\textwidth,
  basewidth=0.5em,
  showlines=true,
}
And now you can convert your Markdown file to PDF:
f="markdown_file.md"
pandoc --listings --latex-engine=xelatex --highlight-style pygments \
  -H listings-setup.tex "${f}" \
  -V geometry:"left=2cm, top=2cm, right=2cm, bottom=3cm" \
  -V fontsize=11pt \
  -o "${f%.md}.pdf" 2>/dev/null
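One thing to watch for: pandoc 2.0 renamed the --latex-engine option to --pdf-engine, so the command above fails on newer releases. The sketch below picks the flag based on the first line of pandoc --version and also shows how the output name is derived. The function names pdf_engine_flag and pdf_name are made up for this example, and the version string is assumed to look like "pandoc 2.9.2.1".

```shell
# Hypothetical helper: choose the engine option for a pandoc version
# line such as "pandoc 1.19.2.4" or "pandoc 2.9.2.1".
pdf_engine_flag() {
    version="${1##* }"       # "pandoc 2.9.2.1" -> "2.9.2.1"
    major="${version%%.*}"   # "2.9.2.1" -> "2"
    if [ "$major" -ge 2 ]; then
        echo "--pdf-engine=xelatex"
    else
        echo "--latex-engine=xelatex"
    fi
}

# Hypothetical helper: swap a trailing ".md" for ".pdf".
pdf_name() {
    printf '%s\n' "${1%.md}.pdf"
}

# Usage: convert every Markdown file in the current folder.
# flag="$(pdf_engine_flag "$(pandoc --version | head -n 1)")"
# for f in *.md; do
#     pandoc --listings "$flag" -H listings-setup.tex "$f" -o "$(pdf_name "$f")"
# done
```

While debugging it may also help to drop the 2>/dev/null redirect, since it hides any LaTeX errors.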