-
-
Notifications
You must be signed in to change notification settings - Fork 6k
Stream repo zip/tar.gz/bundle achives by default #35487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Here I kept the POST+GET negotiation, where the POST does nothing when enabled |
Will check later, I think it should be possible to turn into a GET. Benefit is that these downloads will be curl-able. |
They are curl-able for some amount of time already, just not via html inspect. I believe in 2021 this didn't work in gitea. e.g. We would get a cleaner html page if we remove the POST as long this setting on. |
I wonder if we even need the setting at all. Are there any benefits to non-streaming downloads? The only one I can think of is reduced CPU usage for instances that receive a lot of downloads, but it's rather niche I think, especially because if a repo is pushed to often, that cache will invalidate every push. |
I would not require any setting, but I also do not host gitea productively anywhere. Let's hear others opinions, given I assume this would be more a change for a future release like 1.26.0. If we decide to removing the repo-archive cache, then we also no longer need the repo-archive settings. I assume this way of doing this has been inherited from Gogs. |
I guess non-streaming does have a benefit for archived or otherwise rarely pushed-to repos. FWIW, Github does only implement streaming and I think this is also the only mode that can work on big instances without consuming copious amounts of disk space (think crawlers crawling archive links for all commit hashes). |
I pushed the UI change which will omit the POST when streaming is enabled. The archive links work natively without JS when streaming is enabled. |
* origin/main: Clean up npm dependencies (go-gitea#35484) Update eslint to v9 (go-gitea#35485) Revert the rspack change (go-gitea#35482) Replace gobwas/glob package (go-gitea#35478) Fix various typos in codebase (go-gitea#35480) Fix different behavior in status check pattern matching with double stars (go-gitea#35474) Replace webpack with rspack (go-gitea#35460) Don't store repo archives on `gitea dump` (go-gitea#35467)
web_src/js/features/repo-common.ts
Outdated
// when archive streaming is enabled, the links will work natively without JS | ||
if (!window.config.streamArchives) { | ||
queryElems(document, 'a.archive-link[href]', (el) => el.addEventListener('click', onDownloadArchive)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's too hacky and fragile, there will be bugs in the future if somebody touches the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what do you suggest? We could alternatively just remove non-streaming mode entirely which would make all the JS and probably a good chunk of Go code deletable (my preference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you need this JS? "GET" request can always start the downloading. So you can just make POST check return immediately for streaming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still a unneccesary round-trip to the server. It's better to not tamper with these links if we don't need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think we could eliminate the POST (and the JS) in both cases if we just delay the GET response in non-streaming mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any objection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me would be your plan ok as well.
On the other hand the current js change makes the file download start sightly faster without right click menu and stops showing me http 400 codes shortly after restarts (invalid csrf tokens? auto fixes itself with later page refresh, GET is not protected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make some changes according to my plan. I think eventually we will get rid of the repo archive and remove a lot of code.
the current js change ....
I would prefer not going with current JS. It couples a lot of modules (source files) together, a single "true/false" value is passed everywhere, which is not ideal. We should avoid such fragile "global variable".
And eventually, we will remove all the legacy JS
showing me http 400 codes shortly after restarts (invalid csrf tokens?
IIRC CSRF token is preserved between restarts, can you reproduce the CSRF token problem by other forms? For example: a comment form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead and revert the JS if you like. I still hope we eventually can change all use cases to GET only, but the POST+GET isn't bothering me too much I guess.
Edit: I see you did already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all. Let's keep the unnecessary "POST" for a short time, then completely remove it in next release 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After simplifing Archiverequest.Stream, I decided to call this as well when caching the repository but use the pipe as destination
Made some changes, and I merged the legacy copy-pasted code (since long time ago ...) into one Now There are tests covering "repo archive download", so as long as tests pass, I think it is good enough. |
Initial implementation of linked proposal.