Skip to content

Conversation

JosePineiro
Copy link

@JosePineiro JosePineiro commented Aug 22, 2025

This PR updates AsyncStaticWebHandler::handleRequest to correctly manage caching headers when template processing introduces dynamic content. The previous implementation always set ETag and Last-Modified headers based on the underlying filesystem metadata, which was misleading when templates generated non-static responses (e.g., current time). Issue #237

The new implementation:

  • Uses file-based ETag when .gz files or template callback is not present.
  • Ensures gzip files generate consistent ETag values using _getEtag.
  • For non-gzip files, generate ETag values ​​using the timestamp or file size.
  • Returns 304 Not Modified only when If-None-Match matches the valid ETag.
  • Removed use of Variable Length Array and String, ensuring the same implementation works consistently on all platforms.

This prevents browsers from incorrectly reusing cached content when template output is dynamic, ensuring correctness while still allowing efficient caching for static resources.

This PR updates AsyncStaticWebHandler::handleRequest to correctly manage caching headers when template processing introduces dynamic content. The previous implementation always set ETag and Last-Modified headers based on the underlying filesystem metadata, which was misleading when templates generated non-static responses (e.g., current time). Issue ESP32Async#237 (ESP32Async#237)

The new implementation:
- Uses file-based ETag when .gz files or template callback is not present.
- Ensures gzip files generate consistent ETag values using _getEtag.
- For non-gzip files, generate ETag values ​​using the timestamp or file size.
- Returns 304 Not Modified only when If-None-Match matches the valid ETag.
- Removed use of Variable Length Array and String, ensuring the same implementation works consistently on all platforms.

This prevents browsers from incorrectly reusing cached content when template output is dynamic, ensuring correctness while still allowing efficient caching for static resources.
Copilot

This comment was marked as outdated.

JosePineiro and others added 3 commits August 22, 2025 19:37
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@mathieucarbou
Copy link
Member

@JosePineiro @me-no-dev @willmmiles : FYI the previous behavior was using as an etag value the last modification time or file size to speed up file serving and avoid too many file reads, especially on concurrent requests. The drawback is that this is incorrect as per what an etag should be, and also incorrect in regard to the callback option (templating).

This PR fixes that for gz file, keeps same behavior for non gz files and considers the callback value.

That's a good fix + improvement, even if at the expense of more file reading for gz files.

@mathieucarbou mathieucarbou requested a review from Copilot August 25, 2025 11:44
@mathieucarbou
Copy link
Member

Thanks a lot @JosePineiro 👍

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes incorrect ETag handling in AsyncStaticWebHandler to properly manage caching for dynamic template responses. The previous implementation always generated ETags based on file metadata, causing browsers to incorrectly cache dynamic content when templates were used.

  • Implements conditional ETag generation based on file type and template presence
  • Updates gzip file handling to extract ETag from CRC trailer data
  • Removes platform-specific Variable Length Array usage for better portability

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/literals.h Removes unused MJS file type constant and adds GZ_LEN constant for gzip file detection
src/WebHandlers.cpp Refactors handleRequest method to conditionally generate ETags and properly handle caching for template responses
src/ESPAsyncWebServer.h Adds AsyncStaticWebHandler as friend class to access private _getEtag method

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 262 to 265
response->addHeader(T_Cache_Control, T_no_cache, true);
}
else if (_cache_control.length()) {
response->addHeader(T_Cache_Control, _cache_control.c_str(), false);
Copy link
Preview

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The else-if condition creates an asymmetric caching behavior where files with ETags always get 'no-cache' directive, while files without ETags might get custom cache control. This logic should be clarified or the cache control handling should be more consistent.

Suggested change
response->addHeader(T_Cache_Control, T_no_cache, true);
}
else if (_cache_control.length()) {
response->addHeader(T_Cache_Control, _cache_control.c_str(), false);
}
if (_cache_control.length()) {
response->addHeader(T_Cache_Control, _cache_control.c_str(), false);
} else {
response->addHeader(T_Cache_Control, T_no_cache, true);

Copilot uses AI. Check for mistakes.

@willmmiles
Copy link

@JosePineiro @me-no-dev @willmmiles : FYI the previous behavior was using as an etag value the last modification time or file size to speed up file serving and avoid too many file reads, especially on concurrent requests. The drawback is that this is incorrect as per what an etag should be, and also incorrect in regard to the callback option (templating).

This PR fixes that for gz file, keeps same behavior for non gz files and considers the callback value.

That's a good fix + improvement, even if at the expense of more file reading for gz files.

+1 for using internal checksums for GZ files. A future PR could maybe generalize this for other kinds of content with some getETagFor(file) type function.

I'm glad to see the handling of templated content is getting some attention too. The previous behaviour of treating it as static was definitely not ideal!

I do have some concerns though:

  • This PR completely overrides any "Cache-Control" set by the user for non-templated static content. While I agree that typical best practices with ETags would be to require an "If-None-Match" transaction to validate that things haven't changed, I can easily imagine that there could be a client application that preferred using caching to reduce the number of HTTP calls against the microcontroller. For rarely updated content, some potential jankiness when it does change might be seen as the lesser evil than all the extra network operations all the time. If a client had that previously configured, it will break with this PR. Do we want to insist that ETags are the only supported cache management approach for non-templated content?

  • This PR will break the semantics of setLastModified() when used with filesystems that don't support getLastWrite(). Some clever user might have tried to use setLastModified() in conjunction with the templating engine -- ie. track when variables used in the template are updated, and call setLastModified() to apply that timestamp to the response.

I think what I'd ask for is:

  • If the user has supplied a cache control policy, always send it. If not set, for fully static content with an ETag, send Cache-Control: none.
  • If the user has supplied a cache control policy, try to send Last-Modified. Use the value from setLastModified() if set. If not set, for static content only (gzip and otherwise!), use the filesystem timestamp. (If no value is available, don't send, though.)

@JosePineiro
Copy link
Author

JosePineiro commented Aug 26, 2025

@willmmiles
1 - I think "ETag" and "If-None-Match" are the most efficient and secure ways to control browser cache.
2 - If I'm not mistaken, "Last-Modified" and "If-Modified-Since" make exactly the same amount of http requests as "ETag" and "If-None-Match"
3 - If you agree with points 1 and 2, ETags should be the only supported cache management approach for non-template content.
4 - I haven't considered that any user has used LastModified() in conjunction with the template engine (i.e., tracking when variables used in the template are updated).
Is it okay to add _last_modified in this case?
5 - I'd like to include disabling the template processor if the files are binary (webm, jpg, avif, wolf2, etc.). What do you think?

@mathieucarbou
Copy link
Member

@JosePineiro : I think we all agree Etags are better but Will mentioned a good point in the fact that a user could decide to control themselves their caching based on their own rules.

And I can say that this is typically what was often happening when serving files because caching management in the lib is quite new.

So instead of forcing Etags, a better approach could be to read the user request headers and answer appropriately according to the spec in the best way we can

In funcion void AsyncStaticWebHandler::handleRequest(AsyncWebServerRequest *request) add Last-Modified header if no GZ file or have template processor.
Sugeestion of @willmmiles
@JosePineiro
Copy link
Author

I think the ETag system used in this PR is ideal for files without templates or GZIP. I can't think of any circumstances where a user would be harmed, even if they use a different system.

In template files, we can respect user input in a generic way. This way, the user would have the best of both worlds: an automatic ETagsystem when possible and a manual system controlled by the user in all other cases.

Handling ETag and Last-Modified simultaneously complicates and slows down the code. It also leads to potential conflicts whose resolution can be very obscure for the user.

If the current behavior doesn't seem right, please tell me what you think is the best options:

  • If the user defines Cache-Control or Last-Modified, disable ETag?
  • If the user defines Cache-Control or Last-Modified, add them to ETag?
    And when there is a match:
  • If If-Modified-Since is OK and ETag is bad, do we send a 304 or send the file?
  • If If-Modified-Since is bad and ETag is OK, do we send a 304 or send the file?

@willmmiles
Copy link

Re cache_control: The most critical use case for cache_control is if the user has set "max-age" with the intent to reduce the number of transactions on static content. I think this must continue to be respected for static content. For microcontrollers, re-validation is not always the best policy -- every transaction has a cost in CPU time and memory and can affect the performance of other tasks. I think this decision should be left in the hands of the application author.

Or to put it another way: sometimes it is preferable to sacrifice correctness in rare cases (ie. the cache may become stale when you perform a firmware update) for performance in the common case (everyday UI access).

So I think the logic should be:

  if (_cache_control.length()) {
     response->addHeader(T_Cache_Control, _cache_control.c_str(), false);
  } else if (*etag != '\0') {
     response->addHeader(T_Cache_Control, T_no_cache, true);
  }

Re Last-Modified: I agree that trying to juggle both "If-Match" and "If-Modified-Since" would be problematic. I do think we should find a way to respect setLastModified() in some way, though. Probably the easiest thing to do is include the user value in the ETag if it was set by the user -- that way we need only one code path. Since there's no getLastModified(), we could replace the local String _lastModified with, say, a uint64_t, and change all of the setLastModified() implementations to populate it with a hash or somesuch. And, also, if it was explicitly set by the user, we should check/send it even with templated content.

Does that make sense?

@JosePineiro
Copy link
Author

I think I understand what you're saying.
Suppose the user has defined Cache-Control: private, max-age=3600

We would send to the browser:
Cache-Control: private, max-age=3600
ETag: "abc123"

This way, the browser won't request the page for the next hour.
Once the hour has passed, the browser will send us;
If-None-Match: "abc123"

If the etag remains, we will send:
HTTP/1.1 304 Not Modified
ETag: "abc123"

What to do if Cache-Control contains "no-store" or "immutable"? In both cases, an "ETag" is pointless, but sending it won't cause any problems.

I still think that if we can use "ETag," we shouldn't send "Last-Modified". If we can't use "ETag," we should send "Last-Modified" if it's user-defined.

Does this seem like an acceptable solution to you?

@willmmiles
Copy link

willmmiles commented Aug 29, 2025

I think I understand what you're saying. Suppose the user has defined Cache-Control: private, max-age=3600

We would send to the browser: Cache-Control: private, max-age=3600 ETag: "abc123"

This way, the browser won't request the page for the next hour. Once the hour has passed, the browser will send us; If-None-Match: "abc123"

If the etag remains, we will send: HTTP/1.1 304 Not Modified ETag: "abc123"

What to do if Cache-Control contains "no-store" or "immutable"? In both cases, an "ETag" is pointless, but sending it won't cause any problems.

You got it! :)

I still think that if we can use "ETag," we shouldn't send "Last-Modified". If we can't use "ETag," we should send "Last-Modified" if it's user-defined.

Does this seem like an acceptable solution to you?

I think we can do better. If we send the user "last modified time" as the ETag, then we don't have to parse "If-Modified-Since". Since ETags have no particular required format, I believe we can get away with this simplification.

So code like:

char etag_buf[9];
char* etag = etag_buf;
const char* tempFileName = request->_tempFile.name();
const size_t lenFilename = strlen(tempFileName);

if (_last_modified.length()) {
   etag = _last_modified.c_str();
} else if (lenFilename > T__GZ_LEN && memcmp(tempFileName + lenFilename - T__GZ_LEN, T__gz, T__GZ_LEN) == 0) {
   ...

(above code is not const-correct, but conveys the idea, I hope)

If the user has defined them, "Last-Modified" and "Cache-Control" are always sent.
@JosePineiro
Copy link
Author

@willmmiles
The user-defined "Cache-Control" is always sent. If not, "no-cache" is sent. Thank you for pointing out this improvement to me.

Implemented as specified in RFC 7232, section 2.4:

  • SHOULD send an entity-tag validator unless it is not feasible to generate one. (In this function, whenever the file is GZ or we have not a Template Processor)
  • SHOULD send a Last-Modified value if it is feasible to send one. (In this function,whenever the file is GZ or we have not a Template Processor, and the user has defined it)

In RFC 7232, section 3.3:
"A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field"
I have implemented exactly this behavior, since not following this instruction could be considered a bug.

Please note that "entity-tag" is ALWAYS sent when "Last-Modified" is sent.
Therefore, in a modern browser capable of handling "entity-tag," I can't think of any circumstances under which "Last-Modified" would have any effect.
In any case, I've implemented the code to validate "Last-Modified" as you requested.

Note for @willmmiles:
Please, answer to #220

Copy link

@willmmiles willmmiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more little things. Thanks for your perseverence on this, I'm excited to see this merged.

}
}
// 2. Otherwise, if there is no ETag and no Template processor but we have Last-Modified and Last-Modified matches
else if (*etag == '\0' && _callback == nullptr && _last_modified.length() > 0 && ims && *ims && strcmp(ims, _last_modified.c_str()) == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should respect _last_modified if the user has set it even if they're running the template processor. While this feature could be misused, it provides a mechanism to explicitly allow caching behaviour for templates if the user calls setLastModified() when the template variables are updated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see bug #237.
This PR attempts to fix that bug. The behavior you suggest is precisely what's causing problems for the user.
If you think the bug is incorrect, please let me know, and I'll discard this PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed code still fixes that bug. You have removed the call to setLastModified() in the generation path, so unless the user has explicitly called setLastModified(), no "Last-Modified" header will be produced in non-static cases. The default behaviour would be correct for #237, without defeating an advanced user's ability to enable caching based on external knowledge of template state.

Comment on lines 245 to 261
// Get raw header pointers to avoid creating temporary String objects
const char* inm = request->header(T_INM).c_str(); // If-None-Match
const char* ims = request->header(T_IMS).c_str(); // If-Modified-Since

AsyncWebServerResponse *response;
bool notModified = false;
// 1. If the client sent If-None-Match and we have an ETag → compare
if (*etag != '\0' && inm && *inm) {
if (strcmp(inm, etag) == 0) {
notModified = true;
}
}
// 2. Otherwise, if there is no ETag and no Template processor but we have Last-Modified and Last-Modified matches
else if (*etag == '\0' && _callback == nullptr && _last_modified.length() > 0 && ims && *ims && strcmp(ims, _last_modified.c_str()) == 0) {
log_d("_last_modified: %s", _last_modified.c_str());
log_d("ims: %s", ims);
notModified = true;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use references to avoid the use of raw pointers, eg. const String& inm = request->header(T_INM); -- but I'm not sure even that is actually required here, since the String needs to be referenced only once in both conditionals. We can omit the length check on the header() result as a missing header will return a vaild but empty string.

Also the log_d call needs to be gated on ESP32.

  bool notModified = false;
  // 1. If the client sent If-None-Match and we have an ETag → compare
  if (*etag != '\0' && request->header(T_INM) == etag) {
      notModified = true;
  }
  // 2. Otherwise, if there is no ETag but we have Last-Modified and Last-Modified matches
  else if (*etag == '\0' && _last_modified.length() > 0 && request->header(T_IMS) == _last_modified) {
    #ifdef ESP32
        log_d("_last_modified: %s", _last_modified.c_str());
    #endif
    notModified = true; 
  }

//File is a gz, get etag from CRC in trailer
if (!AsyncWebServerRequest::_getEtag(request->_tempFile, etag)) {
// File is corrupted or invalid
log_e("File is corrupted or invalid: %s", tempFileName);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ESP32 logging call must be #ifdef'd on ESP32.

request->_tempFile.close();
response = new AsyncBasicResponse(304); // Not modified
} else {
response = new AsyncFileResponse(request->_tempFile, filename, emptyString, false, _callback);
}
if (!response) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a regression here -- this check must also applied to the AsyncBasicResponse above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants