This is a great reminder to follow the advice of a previous article here about the blurring of sensitive data.
1) If you care about the data, don't blur. Black it out, entirely.
2) If you don't want to black it out for aesthetic reasons, then you don't care enough about the data being private.
3) Black out every pixel, every time.
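For what it's worth, here is a minimal sketch of rule 3 in plain Python. The image is modeled as a list of rows of grayscale values, and `black_out` and the sample grid are made-up names for illustration, not any tool's API. The point is only that every pixel in the box is overwritten with a constant.

```python
def black_out(pixels, left, top, right, bottom):
    """Return a copy of `pixels` with the box [left, right) x [top, bottom) set to 0."""
    out = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 0  # constant value: nothing of the original pixel survives
    return out


# Hypothetical 4x6 grayscale image; the "secret" is the 2x3 patch in the middle.
image = [
    [255, 255, 255, 255, 255, 255],
    [255, 10, 200, 30, 255, 255],
    [255, 180, 20, 220, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
redacted = black_out(image, left=1, top=1, right=4, bottom=3)
```

The result still has to be exported as a flat raster for this to mean anything.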
Pixelation is better than blurring but also not sufficient unless you use huge blocks.
If it's an aesthetic thing, someone could make a tool that looks like it's blurring, but actually puts in fake data and then blurs that.
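A rough sketch of what such a tool might do, assuming a grayscale image stored as a list of rows. `fake_blur` and `box_blur` are hypothetical names, and the box blur is just the simplest stand-in for whatever blur looks nicest. The secret pixels are overwritten with random decoys before anything is blurred, so the blurred output cannot depend on them.

```python
import random


def box_blur(pixels, radius=1):
    """Plain box blur: each pixel becomes the mean of its neighborhood."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(h):
        for x in range(w):
            window = [
                pixels[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) // len(window)
    return out


def fake_blur(pixels, left, top, right, bottom, radius=1, rng=None):
    """Replace the box with random decoy pixels, then blur the decoys."""
    if rng is None:
        rng = random.SystemRandom()
    out = [row[:] for row in pixels]
    # Step 1: destroy the secret by overwriting it with decoy values.
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = rng.choice((0, 255))
    # Step 2: blur, and keep the blurred values only inside the box.
    blurred = box_blur(out, radius)
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = blurred[y][x]
    return out
```

Anyone who "deblurs" the result recovers the decoys, not the original.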
But anyone who knows or cares enough to go download a purpose-made tool for censoring images would probably just open the image in Paint and draw a black rectangle.
> If it's an aesthetic thing, someone could make a tool that looks like it's blurring
The classy thing to do would be to replace it with a real QR code that spells out "remember to drink your ovaltine" but is damaged enough you need about a million brute force attempts to decode it.
An alternative to blacking out is to interpolate inward from the pixels at the borders of the "blacked out" box. I've seen it done to suppress logos in images; in many cases it's barely intrusive.
Applied to that QR code, the currently blurred-out part would be entirely white, because the original QR code was surrounded by a white background.
That should take care of many aesthetic concerns, without using a single bit of secret information for the final rendering.
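A sketch of the idea, again on a list-of-rows grayscale image. `fill_from_border` is a made-up name, and averaging a horizontal and a vertical linear interpolation is just one simple choice; real inpainting tools do something fancier. The code only ever reads pixels just outside the box.

```python
def fill_from_border(pixels, left, top, right, bottom):
    """Fill the box [left, right) x [top, bottom) using only pixels just outside it.

    Assumes the box has at least a one-pixel margin on every side.
    """
    out = [row[:] for row in pixels]
    width = right - left + 1   # distance between the left and right border columns
    height = bottom - top + 1  # distance between the top and bottom border rows
    for y in range(top, bottom):
        for x in range(left, right):
            tx = (x - left + 1) / width   # 0 at the left border, 1 at the right
            ty = (y - top + 1) / height   # 0 at the top border, 1 at the bottom
            horiz = pixels[y][left - 1] * (1 - tx) + pixels[y][right] * tx
            vert = pixels[top - 1][x] * (1 - ty) + pixels[bottom][x] * ty
            out[y][x] = round((horiz + vert) / 2)
    return out


# Hypothetical image: a "secret" patch surrounded by white, as with the QR code.
image = [
    [255, 255, 255, 255, 255, 255],
    [255, 10, 200, 30, 255, 255],
    [255, 180, 20, 220, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
cleaned = fill_from_border(image, left=1, top=1, right=4, bottom=3)
```

With a white surround the box comes out entirely white, whatever was inside it.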
That works until you run into a government agency that thinks drawing a black box over something in a PDF is acceptable redaction, and then publishes the PDF in its full vector/metadata markup format rather than as a raster image. It's hilarious when you can just highlight and copy/paste to get whatever text is hidden. I've seen it several times.
What I'm saying is: be careful asking non-technical employees, without training, to "just draw a black box over it".
The govt should write themselves a virtual printer that would take in the data stream and output a PDF made of individual images. A typical print-to-PDF virtual printer, but government-sanctioned, and they could also make it do something like store the document in an audit database or whatever, just to give people a strong incentive not to skip it or use anything else.
It does open another risk: nation-state-style attacks that compromise the virtual print driver so that it does not redact information, or so that it encodes 'hidden' information into the format of the document. That seems easier to do with a digital print driver than with the messy digital-to-analog conversions that occur in a scanner/printer.
That would be interesting and not hard to set up. Install CUPS on some box, definitely not the host containing sensitive info, and you have a limited, auditable outbound channel.
Well, if you can blur an arbitrary section, then even if you don't want black-black you can just pick an average color and write it over everything (so adaptive grey).
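Something like this, as a minimal sketch on a list-of-rows grayscale image (`fill_with_mean` is a hypothetical name). Note that the mean is computed from the hidden pixels themselves, so one aggregate number derived from the secret does end up in the output.

```python
def fill_with_mean(pixels, left, top, right, bottom):
    """Overwrite the box [left, right) x [top, bottom) with its own mean value."""
    region = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
    mean = round(sum(region) / len(region))  # the single "adaptive grey" value
    out = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = mean
    return out


# Hypothetical 4x6 grayscale image with a 2x3 "secret" patch.
image = [
    [255, 255, 255, 255, 255, 255],
    [255, 10, 200, 30, 255, 255],
    [255, 180, 20, 220, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
greyed = fill_with_mean(image, left=1, top=1, right=4, bottom=3)
```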
I may be misinterpreting what you mean by “print it back as image to PDF printer”, but I doubt that removes the text. When printing the document, the program will likely just render the text first and then the black box over it. The reason? To be able to skip rendering the text, the computer must know:
- a black box will get drawn over it.
- that box completely covers the text.
- that box will be drawn opaquely.
Verifying that isn’t easy.
Save to .png is safer. It doesn’t support layers, so what you see is what you get.
PDF-XChange keeps text as text if printed normally (fully selectable, vector), but rasterizes it to an image if printed as image (pixelated and JPEG-compressed). At about 200% zoom it is clearly visible that the page is just pixels and nothing else.