(I’m going to edit this post a few times. Do not yet consider it complete.)
I get the Believer magazine as a couple of files in Dropbox, including a PDF (for reading/highlighting in GoodReader on iPad or Preview on Mac) and an IDML export of the original InDesign file.
Some non-printing characters are find&replace-d in InDesign.
Each article, hed, and footnote is clicked, Cmd-A, Cmd-E, saved as an RTF to a folder in Dropbox called Export RTFs. Filenames are important.
Each of those files is selected.
Filename is copied.
File is dragged onto an Automator script generously concocted by @waldojaquith that basically turns these gears:
— uses textutil to export from RTF to HTML
— normalizes the linebreaks with BBEdit
— runs tidy on the textutil output
— runs a bunch of Find&Replace for the markup textutil keeps (a bunch of spans and stuff)
— concatenates
— BBEdit opens with marked up text.
textutil -convert html -stdout "$1"
tidy --wrap 0 --drop-font-tags yes --output-html yes --show-body-only yes --drop-proprietary-attributes yes --char-encoding utf8 --logical-emphasis yes --vertical-space yes --show-warnings no -quiet; exit 0
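For the Find&Replace pass over the markup textutil keeps, here is a minimal sketch of what one such replacement could look like as a shell function. The name strip_spans and the pattern are my own illustration, not the actual rules in the Automator script; it only removes span wrappers and leaves everything else alone.

```shell
# Hypothetical cleanup pass (not the real Automator rules):
# remove opening and closing <span> tags, keeping the text inside them.
strip_spans() {
  # -E enables extended regex on both BSD (macOS) and GNU sed.
  # Matches <span>, <span ...attributes...> and </span>.
  sed -E 's#</?span( [^>]*)?>##g'
}

# Example: pipe converted HTML through it.
# textutil -convert html -stdout "article.rtf" | strip_spans
```

It reads stdin and writes stdout, so it can sit anywhere in the chain after textutil.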
Cmd-S to save; Cmd-V to paste the filename; Cmd-W to close;
Repeat for each file. (Yes, this should be a bash script or something like it, but given that this whole thing involves ~50 files and happens only once a month, it’s oddly meditative and quick.)
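If the per-file repetition ever did become a script, a rough sketch could look like the following. It assumes the two commands above are piped together (textutil's stdout into tidy's stdin), that both tools are on the PATH, and that each output file should keep the RTF's base filename. The function names and the example folder paths are hypothetical, and the tidy flags are copied from above; newer Tidy builds may not accept all of them.

```shell
#!/bin/bash
# Sketch only: a batch version of the drag-one-file-onto-Automator step.

# Convert one RTF to cleaned-up HTML on stdout, using the same flags as above.
rtf_to_html() {
  textutil -convert html -stdout "$1" |
    tidy --wrap 0 --drop-font-tags yes --output-html yes \
         --show-body-only yes --drop-proprietary-attributes yes \
         --char-encoding utf8 --logical-emphasis yes \
         --vertical-space yes --show-warnings no -quiet || true
  # "|| true" plays the role of "exit 0" above: tidy exits non-zero
  # on warnings, and that should not stop the batch.
}

# Convert every .rtf in src_dir, writing NAME.html into dest_dir.
batch_convert() {
  local src_dir dest_dir rtf name
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir"
  for rtf in "$src_dir"/*.rtf; do
    [ -e "$rtf" ] || continue          # folder had no .rtf files
    name=$(basename "$rtf" .rtf)       # filenames are important, so keep them
    rtf_to_html "$rtf" > "$dest_dir/$name.html"
  done
}

# Example (hypothetical destination folder):
# batch_convert "$HOME/Dropbox/Export RTFs" "$HOME/Dropbox/Export HTML"
```

This skips the BBEdit linebreak normalization and the Find&Replace steps, so it covers only the textutil and tidy part of the chain.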
Open all those HTML/TXT files in BBEdit at once.
A lot of regex/grep replaces in specific files and for specific errors the AppleScript hasn’t yet picked up on.
Here are those grepfors: https://gist.github.com/71f25235d290d3cb849b
That’s where I’m up to today.
– I have to read the whole PDF to look for smallcaps.
– Can’t yet batch the whole folder
– Some of those replacements in grepfor can be rolled back into the Automator
– Automator should become a Python or shell script
From there, each snippet file gets read and re-read a number of times to iron out the flaws.
Once OK, I copy/paste all the parts into the database through an app called Sequel Pro, which @kissane told me about. Before that, it was all phpMyAdmin, which had a habit of timing out. (Also, scary how hard it was to back up.)
Screenshot of Sequel Pro on a large screen.
Then some PHP stuff happens that actually generates a bunch of flat files. Scott (my predecessor) built this and it’s actually pretty amazing. He pretty much wrote an equivalent of Movable Type from scratch (this was 2004 or something!)
Recently we added CloudFront to take some load off the hosting and speed things up.
I dream of moving this to Casein or WordPress.