A simple hack to Andreas Mueller’s https://amueller.github.io/word_cloud that produces SVG output while making the bare minimum changes to the code.
Here are some examples to demonstrate that it really does work. The output is produced by a modified version of a_new_hope.py that prints SVG in addition to producing the PNG. Unless otherwise noted, the fonts are from https://fonts.google.com. I’ve tested the SVG files in Firefox, Chrome, and Safari on OS X, iOS, Pop!_OS (a variant of Ubuntu), and Windows, and they worked OK. The comparison examples did not work in Internet Explorer. You can click within the SVG and search for words, even on an iPhone (search within a webpage is located under the sharing button).
oswald.svg, amatic_sc_bold.svg, zcool_kuaile.svg, zilla_slab_highlight_bold.svg, libre_barcode_39_text.svg, stalinist_one.svg, black_ops_one.svg, press_start_2p.svg
stalinist_one.svg has an incorrect layout of words due to an error in the TTF font file in Google’s GitHub repository; the error has been corrected in the version at https://fonts.google.com. I left it as is to show what can happen when there is a mismatch between the font file that generated the PNG and the font file that renders the SVG. It is corrected in the example below. Oddly, the Google GitHub repository has two versions of Roboto Slab with slight differences in “g”, “k”, “K”, and “R”. Also, Black Ops One has a bug affecting “R”: it looks bad but doesn’t affect the wordcloud layout.
This example uses a font from ProPublica: https://github.com/propublica/weepeople
Some more examples. This time the PNG output is generated with red text and the SVG output is generated with blue text. In an HTML file, the results are then stacked and the mix-blend-mode set to multiply. The two circles show the individual text colors and the resulting color when they overlap.
Amatic SC Bold, Crushed, Mountains of Christmas, Permanent Marker, Roboto Slab Bold, Stalinist One, Syncopate Bold, WeePeople
In most cases the PNG and SVG match quite well. Generally, it seems that mismatches most commonly occur in the longer text strings, possibly because of accumulating error due to integer math. The WeePeople example has the most significant errors, but it is also a rather idiosyncratic typeface.
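The stacking described above can be sketched in Python as follows. This is only an illustration, not the page's actual markup: the function names, file names, and the choice of an object element for the SVG (so that web fonts can still load) are all assumptions. The small multiply_blend helper shows why overlapping red and blue text comes out black.

```python
def multiply_blend(top, bottom):
    """Multiply two 8-bit RGB colors channel by channel (the 'multiply' blend)."""
    return tuple(a * b // 255 for a, b in zip(top, bottom))


def overlay_html(png_name, svg_name):
    """Return an HTML page that stacks a PNG on top of an SVG.

    The file names are hypothetical placeholders. The PNG layer carries
    mix-blend-mode: multiply, so it is blended with the SVG underneath.
    """
    return f"""<!DOCTYPE html>
<html>
<head>
<style>
  .stack {{ position: relative; background: white; }}
  .stack > * {{ position: absolute; top: 0; left: 0; }}
  .stack img {{ mix-blend-mode: multiply; }}
</style>
</head>
<body>
<div class="stack">
  <object type="image/svg+xml" data="{svg_name}"></object>
  <img src="{png_name}" alt="PNG word cloud with red text">
</div>
</body>
</html>
"""


if __name__ == "__main__":
    red, blue, white = (255, 0, 0), (0, 0, 255), (255, 255, 255)
    print(multiply_blend(red, blue))   # where both renderings overlap
    print(multiply_blend(red, white))  # red text over white background
    print(overlay_html("roboto_red.png", "roboto_blue.svg"))
```

With a white background in both layers, any pixel where the two renderings agree turns black, and any red or blue fringe marks a mismatch between the PNG and the SVG.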
When the code that generates the PNG writes a word to an image at (x,y), (x,y) is the upper-left corner of the bounding box of the rendered word. When SVG text is rendered to a screen at (x,y), (x,y) is the start of the baseline of the rendered word. With some caveats, I wondered if that was the only significant difference between the two. So the first step was to transform the (x,y) of the upper-left corner of the bounding box into (xSVG, ySVG), the coordinates of the start of the text’s baseline, and see what it looked like. The result was OK, but clearly not correct either. I suspected that was because the browser renders SVG text in a more sophisticated manner. The PNG code would write a word, “first” for example, as “f” glyph, “i” glyph, “r” glyph, “s” glyph, “t” glyph, whereas the SVG renderer would write it as “fi” ligature glyph, “r” glyph, “s” glyph, “t” glyph, with kerning applied (and perhaps other transformations). That’s not easy to fix in the Python code, but it’s trivial to fix in the SVG style element: just turn kerning and ligatures off to match what the PNG code does.
text {
  font-kerning: none;
  font-variant-ligatures: none
}
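The shift from the bounding box corner to the baseline start can be written as a small pure function. This is a sketch that mirrors the arithmetic used in to_svg(); the function name and the sample numbers are made up for illustration, and it assumes the position is given as (row, column), which is how to_svg() consumes it.

```python
def bbox_to_baseline(row, col, ascent, offset_x, offset_y, rotated=False):
    """Map a word's upper-left bounding box corner to its SVG text origin.

    row, col   -- corner of the bounding box (vertical, horizontal), in pixels
    ascent     -- font ascent, as returned by ImageFont.getmetrics()
    offset_x/y -- glyph offsets reported by the font for this word
    rotated    -- True when the word is drawn vertically
    """
    if not rotated:
        # horizontal text: move down by the ascent to reach the baseline
        return (col - offset_x, row + ascent - offset_y)
    # vertical text: the ascent is applied along the horizontal axis instead
    return (col + ascent - offset_y, row + offset_x)


if __name__ == "__main__":
    # hypothetical numbers: corner at row 10, column 20, ascent 30, offsets (1, 5)
    print(bbox_to_baseline(10, 20, 30, 1, 5))                # (19, 35)
    print(bbox_to_baseline(10, 20, 30, 1, 5, rotated=True))  # (45, 11)
```

For rotated words the SVG also needs a rotate and translate transform around this point; the function above only covers the coordinate shift.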
Add a single line, self.to_svg(), to generate_from_text().
Here is the complete code for generate_from_text(). The additional line comes just before the return statement.
def generate_from_text(self, text):
    """Generate wordcloud from text.

    The input "text" is expected to be a natural text. If you pass a sorted
    list of words, words will appear in your output twice. To remove this
    duplication, set ``collocations=False``.

    Calls process_text and generate_from_frequencies.

    .. versionchanged:: 1.2.2
        Argument of generate_from_frequencies() is not return of
        process_text() any more.

    Returns
    -------
    self
    """
    words = self.process_text(text)
    self.generate_from_frequencies(words)
    self.to_svg()
    return self
Then add to_svg() before to_img() (not that it has to be in that particular location - it’s just where I placed it). to_svg() prints the words in the layout as SVG to standard output: just the words, not the opening/closing <svg><style></style>...</svg> elements.
def to_svg(self):
    for (word, count), font_size, position, orientation, color in self.layout_:
        # layout_ stores each position as (row, column), so x below is the
        # vertical coordinate and y is the horizontal one
        x = position[0]
        y = position[1]
        font = ImageFont.truetype(self.font_path, font_size)
        ascent, descent = font.getmetrics()
        # from stackoverflow - doesn't seem to be according to PIL docs
        # (should return height, width) but doesn't work otherwise...
        # https://stackoverflow.com/questions/43060479/how-to-get-the-font-pixel-height-using-pil-imagefont
        (getsize_width, baseline), (offset_x, offset_y) = font.font.getsize(word)
        # svg transform string - empty if no rotation (text horizontal),
        # otherwise contains rotate and translate numbers
        svgTransform = ""
        svgFill = ' fill="{}"'.format(color)
        # this is all it takes to transform x,y to svg space
        # it was arrived at using the methods of computer graphics programmers
        # https://twitter.com/erkaman2/status/1104105232034861056
        if orientation is None:
            svgX = y - offset_x
            svgY = x + ascent - offset_y
        else:
            svgX = y + ascent - offset_y
            svgY = x + offset_x
            svgTransform = ' transform="rotate(-90, {}, {}) translate({}, 0)"'.format(svgX, svgY, -getsize_width)
        # print SVG to standard output
        print('<text x="{}" y="{}" font-size="{}"{}{}>{}</text>'.format(svgX, svgY, font_size, svgTransform, svgFill, word))
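Because to_svg() prints only the text elements, something has to supply the surrounding svg and style elements. Here is one way that might look, as a minimal sketch: wrap_svg, capture_stdout, and fake_to_svg are hypothetical helpers, the width, height, and font family are placeholder values, and the background rectangle is an assumption about how the page is set up.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Run func and return everything it printed to standard output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def wrap_svg(text_elements, width, height, font_family, background="white"):
    """Wrap printed <text> elements in a complete SVG document.

    The style element switches off kerning and ligatures so the browser
    places glyphs the same simple way the PNG code does.
    """
    style = (
        "text{"
        f"font-family:'{font_family}';"
        "font-kerning:none;"
        "font-variant-ligatures:none"
        "}"
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"<style>{style}</style>\n"
        f'<rect width="100%" height="100%" fill="{background}"/>\n'
        f"{text_elements.strip()}\n"
        "</svg>"
    )


def fake_to_svg():
    """Stand-in for the patched WordCloud.to_svg(), for demonstration only."""
    print('<text x="10" y="40" font-size="32" fill="blue">first</text>')


if __name__ == "__main__":
    body = capture_stdout(fake_to_svg)
    print(wrap_svg(body, 400, 200, "Roboto"))
```

In real use, fake_to_svg would be replaced by a call that triggers the patched generate_from_text(), and the font family would need to match the font file passed to the word cloud.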
The examples at the top of the page were produced with this script - svg_a_new_hope_net.py.
It requires an internet connection. When run, it will download the required font, image mask, and text from github and output an SVG wordcloud using the Roboto typeface. So I’d suggest running it as:
python svg_a_new_hope_net.py >roboto.svg
The code contains a data structure holding data for several other typefaces. With the exception of the WeePeople typeface, the resulting output can be viewed in any web browser that has access to the internet, because all the font information is loaded from Google Fonts. To produce the output for the overlay examples, modify the script as follows: set the background color to white, set the SVG text to blue, and set the PNG text to red. If the typeface the SVG uses is installed on your system, you will be able to open the SVG in Adobe Illustrator, Affinity Designer, or Inkscape.
If you want to produce output from font files on your machine, you can use - svg_a_new_hope_local.py. As is, it will produce an SVG wordcloud using bold weighted text from the Roboto Slab typeface, because the Ubuntu variant I tested it on happened to have the font file - /usr/share/fonts/truetype/roboto-slab/RobotoSlab-Bold.ttf. Just substitute appropriate values from your system for these lines:
fontFILE = '/usr/share/fonts/truetype/roboto-slab/RobotoSlab-Bold.ttf'
fontFamily = 'Roboto Slab'
fontWeight = 'Bold'
And run the script similarly:
python svg_a_new_hope_local.py > roboto_slab_bold.svg
And remember that if your intention is to open the output in something like Adobe Illustrator, you must specify fonts that are in your system path or one of the other specific places that the application looks for font files, as opposed to some random folder where you may have squirreled away downloads.
The output can be opened by Adobe Illustrator, Affinity Designer, or Inkscape. There may be others, but these are the ones that I am most familiar with. Inkscape is open source and free but is lacking in some of the text/character features you will need. Adobe Illustrator is expensive but works well with SVG in most cases. Affinity Designer costs $50 as a one-time payment, with no recurring license fees, and it actually works better with TrueType files than Illustrator (so far, knock on wood). It has other oddities: sometimes SVG elements, for instance the background rectangles, are masked and you have to drag on a corner to unmask the element and make it visible.
Inkscape ignores the font-weight value set in the style element of the SVG. And you can’t just box select all the text and change the font weight - you can’t change one attribute of all the text elements. Instead it changes all the text to the same font size and weight. You could probably write a Python plugin to implement it.
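A possible workaround, short of a full plugin, is to rewrite the SVG before opening it so that every text element carries its own font-weight attribute. The sketch below uses only the standard library. The function name is hypothetical, and it rests on an untested assumption: that Inkscape honors a font-weight attribute set directly on each text element even though it ignores the rule in the style element.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def inline_font_weight(svg_source, weight="bold"):
    """Return the SVG with font-weight set on every <text> element.

    Works on namespaced and non-namespaced documents by comparing only
    the local part of each tag name.
    """
    # keep the output free of ns0: prefixes on the SVG elements
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_source)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "text":
            element.set("font-weight", weight)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">'
        "<style>text{font-weight:bold}</style>"
        '<text x="10" y="40" font-size="32">star</text>'
        '<text x="10" y="80" font-size="24">wars</text>'
        "</svg>"
    )
    print(inline_font_weight(sample))
```

Font sizes are left untouched because to_svg() already writes font-size on each element.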
Adobe Illustrator and Affinity Designer both ignore the ligature and kerning settings. For both applications, you can box select all the text and set the kerning to 0. In Affinity Designer, you can turn off ligatures in the same panel that lets you zero out kerning. Adobe Illustrator allows you to turn off ligatures for OpenType fonts but not TrueType fonts. So in Illustrator you have to use a clumsy workaround: box select the text, change it to an OpenType font (Myriad Pro happens to be the first one that appears for me), turn off ligatures, then change the font back to the original TrueType font. Illustrator “remembers” the ligature settings. I didn’t bother to investigate this in Inkscape; the font-weight issue was enough for me.
After opening roboto_slab_bold.svg in Inkscape and noting the font weight problem, I transferred it to my Mac and opened it in Affinity Designer. Here’s a screenshot (the red squiggles are spell-check warnings). Note the panel to the right with the various character/typeface settings.