GitHub's Camo Aggressive Caching Prevents Diagrams from Being Updated in Markdown

+1 vote
asked Feb 13, 2018 in Closed feature request by Marcio
Hey PlantUML Team,

I use your proxy server to show UML diagrams in GitHub markdown files and they work just great. GitHub caches the generated images using a service of their own named "Camo" though: https://help.github.com/articles/about-anonymized-image-urls/

That's a pain, given that the generated images don't get updated when I change my UML files.

I wonder if you guys could set the header "Cache-Control" to "no-cache" as a permanent solution to this issue?

(more details on the GitHub help page in the link)

Thanks, dudes.

Marcio.

1 Answer

0 votes
answered Feb 13, 2018 by plantuml (294,960 points)
Are you sure that the "Cache-Control" header is the source of this issue?

Changing your UML file means changing the generated URL (for example from http://www.plantuml.com/plantuml/png/SyfFKj2rKt3CoKnELR1Io4ZDoSa70000 to http://www.plantuml.com/plantuml/png/SoWkIImgAStDuNBAJrBGjLDmpCbCJbMmKiX8pSd9LqZEICnBJqtXSaZDIm4g0W00 )

If the UML source is different, then the generated URL is different. So "Cache-Control" should not be an issue here. Unless we are missing something...
commented Feb 14, 2018 by Marcio
Hi,

Thanks for the reply. I'm using the API (i.e., http://www.plantuml.com/plantuml/proxy?src=<plantuml file's URL in rawgit>). I don't want to regenerate the URL every time. What I want is to change the UML file in my GitHub repository and see the corresponding rendered UML diagram image updated.

I called the API in verbose mode (with cURL) and you are indeed setting "Cache-Control" with an expiration date. All you would need to do is set it to "no-cache" (as in the GitHub help page I linked previously).

Cheers,
Marcio.
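
As a rough illustration of the header check mentioned above, here is a minimal Python sketch that does roughly what the verbose cURL call does: it requests a URL and returns the parsed "Cache-Control" header. The function names `parse_cache_control` and `fetch_cache_control` and the default user-agent string are made up for this example; they are not part of PlantUML or GitHub.

```python
from urllib.request import Request, urlopen


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "no-cache") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def fetch_cache_control(url, user_agent="example-client/1.0"):
    """GET `url` and return its parsed Cache-Control header (needs network access)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return parse_cache_control(response.headers.get("Cache-Control", ""))


# Offline demonstration of the parsing step only.
print(parse_cache_control("no-cache"))
```

Calling `fetch_cache_control` with the proxy URL would show whether the response is marked "no-cache" or carries an expiration; what it actually returns depends on the server's current behaviour.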
commented Feb 14, 2018 by plantuml (294,960 points)
OK, so you are using /proxy?src=... We understand the issue better now.

We are very uncomfortable about setting "Cache-Control" to "no-cache" for the whole proxy servlet, because we try to limit the usage of our bandwidth.

However, we have implemented something that might solve your issue:

If the user-agent of the HTTP request contains the string "Camo" (which is likely the case for requests coming from GitHub's servers), then we remove the Date / ETag headers and add a "Cache-Control: no-cache" header instead. You can double-check this using cURL (you will need to set a specific user-agent).

Tell us if it helps with the GitHub issue!

Regards,
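
The behaviour described above could look roughly like this. It is only a sketch in Python, not the actual PlantUML servlet code (which is not shown in this thread); the function name, the sample headers, and the sample user-agent string are all hypothetical.

```python
def adjust_headers_for_camo(user_agent, headers):
    """Return a copy of `headers`, made uncacheable if the request looks like Camo."""
    adjusted = dict(headers)
    if "Camo" not in (user_agent or ""):
        return adjusted  # Other clients keep the normal caching headers.
    # Drop Date / ETag as described above. Any existing Cache-Control is
    # dropped too (an assumption of this sketch) so it is not duplicated.
    for name in list(adjusted):
        if name.lower() in ("date", "etag", "cache-control"):
            del adjusted[name]
    adjusted["Cache-Control"] = "no-cache"
    return adjusted


# Hypothetical response headers, for illustration only.
original = {
    "Date": "Wed, 14 Feb 2018 10:00:00 GMT",
    "ETag": '"abc123"',
    "Cache-Control": "public, max-age=604800",
    "Content-Type": "image/png",
}
print(adjust_headers_for_camo("Camo Asset Proxy 1.0", original))
```

A request with any other user-agent gets the headers back unchanged, which matches the stated goal of keeping bandwidth usage down for ordinary clients.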
commented Feb 14, 2018 by Marcio
Hi,

Thank you for looking into the issue! You guys rock! I have no idea what the user agent is, though...

May I suggest a different solution? You could create a query parameter named "no_cache" and default it to "false" (or "cache" and default it to "true"). If caching is an issue, users could simply set "no_cache=true" or "cache=false" in the request.

I think that this would be a more general solution which could possibly be applied to Camo or any other similar URL anonymizer or caching system calling your proxy server.

Cheers,
Marcio.
commented Feb 14, 2018 by plantuml (294,960 points)
Good idea!
So you can now have:

http://www.plantuml.com/plantuml/proxy?cache=no&src=https://raw.github.com/plantuml/plantuml-server/master/src/main/webapp/resource/test2diagrams.txt

The "Camo" user-agent check is also still in place.

Tell us if it solves your issue!
Thanks
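
For anyone generating these links from a script, the URL above can be assembled as in the following Python sketch. The helper name `build_proxy_url` is made up for this example; the endpoint and the "cache=no" and "src" parameters are the ones shown above. Leaving ":" and "/" unescaped in "src" only mirrors the shape of the example URL and is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

PROXY_ENDPOINT = "http://www.plantuml.com/plantuml/proxy"


def build_proxy_url(src, cache=True):
    """Build a proxy URL for the PlantUML file at `src`; cache=False adds cache=no."""
    params = {}
    if not cache:
        params["cache"] = "no"
    params["src"] = src
    # Keep ":" and "/" readable; other reserved characters are percent-encoded.
    return PROXY_ENDPOINT + "?" + urlencode(params, safe=":/")


print(build_proxy_url(
    "https://raw.github.com/plantuml/plantuml-server/master/"
    "src/main/webapp/resource/test2diagrams.txt",
    cache=False,
))
```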
commented Feb 14, 2018 by Marcio
Perfect! Thanks so much! It should solve the issue (I see "Cache-Control: no-cache" in the headers when I call your proxy server).
commented Feb 15, 2018 by Marcio
Hi Again,

I asked GitHub's support which user agent they use and here's the answer:


"Hi Marcio,

Thanks for reaching out to GitHub Support about camo.

Camo uses a user agent of "Camo Asset Proxy #{version}"

I hope that this is the information you were seeking. Let me know if I can be of further help.

Thanks,

Steve
@slgraff
GitHub Support"

I think you could simply remove the user agent checking altogether, given that the new query parameter would suffice, but here's the info anyway in case you want to keep it.

Cheers,
Marcio.
commented Jul 20, 2020 by steve
Hi,

It seems that the "Cache-Control" header is now set to "public, max-age=604800" despite passing "cache=no" in the query string. Any idea what's happening? Or do you perhaps customize it only for the "Camo" user agent?

Thanks,
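
To put the value quoted above in perspective: "max-age" is expressed in seconds, so 604800 corresponds to one week of allowed caching. A small Python sketch of that conversion follows; the helper name `max_age` is hypothetical.

```python
import re
from datetime import timedelta


def max_age(cache_control):
    """Return the max-age directive of a Cache-Control value as a timedelta, or None."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return timedelta(seconds=int(match.group(1))) if match else None


print(max_age("public, max-age=604800"))  # 7 days, 0:00:00
print(max_age("no-cache"))                # None
```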
commented Jul 21, 2020 by VGoncharuk
I'm having the same issue: the "cache=no" parameter is being ignored. Please fix it ASAP (if possible).
commented Jul 22, 2020 by plantuml (294,960 points)
This should be fixed now.

Tell us if it's not working for you!
commented Jan 13, 2021 by Charles
I'm using .../proxy?cache=no&src=... in a GitHub Markdown file and it's still not updating properly.

I'm using !include in the files referred to in the Markdown file, and I would like to be able to change an included PlantUML file and see the change in the Markdown file.
commented Jul 16, 2021 by Fuhrmanator (1,700 points)
@Charles - if you're trying this on a private repo, it won't work because there's a token generated every time.
commented Aug 14, 2021 by kirchsth (4,880 points)
Hi @plantuml,

I think that the PlantUML server's cache control is the problem (for details, see dynamic-content-changed-include-calculated-cached-for-days).

Best regards
Helmut

...