Resume partial downloads. #317
A sample of what I'm currently (2024-12-17 02:37 UTC) experiencing (same work-unit):
I think this might be a non-issue related to #130. I was sure that I had seen upload progress on Windows previously, but I must have been mistaken. Being stuck on 0% must have been a consequence of this issue, where uploads were constantly restarting from 0%. And while three uploads were in progress simultaneously, no more work was being downloaded.
Could this be the MTU size problem?
All my hosts running fah are connected via Ethernet, and I verified that the MTU is the standard 1500. I suspect the problem might be at my end; as the output above shows, the failures were on multiple servers, and I doubt they are all broken. Also, today is much milder than recent days (it was over 40 °C yesterday), and I noticed network transmission seems more stable. (That repeatedly failing upload eventually completed overnight, at 17:34:48 UTC, over 21 hours after the task completed.) So that brings me back to my original question: is it possible to implement some sort of resume functionality for potentially unreliable connections? Spending several hours on a task and then not being able to upload it seems rather wasteful. (I've had a few tasks expire or be marked as 'failed'.)
This is possible and something I've had in mind for some time now. It will require changes to all of the Work Servers (WS) so they can save and continue partial uploads. Then the client itself must also support this. It would likely result in significant network bandwidth savings, but it's going to take some time and there are currently other higher-priority items.
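To make "save and continue partial uploads" concrete, here is a minimal, hypothetical sketch of the bookkeeping a Work Server might need: it persists each partial upload, reports how many bytes it already holds, rejects chunks that don't start at that offset, and accepts the result only once a whole-file SHA-256 matches. The class and method names (`PartialUploadStore`, `offset`, `append`, `finish`) are illustrative assumptions, not the real WS code; existing resumable-upload protocols such as tus use a similar offset-based design.

```python
import hashlib
import os
import tempfile


class PartialUploadStore:
    """Hypothetical WS-side store that keeps partial uploads on disk."""

    def __init__(self, directory: str) -> None:
        self.directory = directory

    def _path(self, upload_id: str) -> str:
        # Assumes upload_id is a trusted, filename-safe token.
        return os.path.join(self.directory, upload_id + ".part")

    def offset(self, upload_id: str) -> int:
        """Number of bytes already persisted for this upload (0 if none)."""
        path = self._path(upload_id)
        return os.path.getsize(path) if os.path.exists(path) else 0

    def append(self, upload_id: str, start: int, chunk: bytes) -> int:
        """Append a chunk only if it starts exactly where the stored data ends."""
        current = self.offset(upload_id)
        if start != current:
            raise ValueError(f"expected offset {current}, got {start}")
        with open(self._path(upload_id), "ab") as f:
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # so the bytes survive a server restart
        return current + len(chunk)

    def finish(self, upload_id: str, expected_sha256: str) -> bytes:
        """Verify the reassembled payload, then release it as a completed result."""
        path = self._path(upload_id)
        with open(path, "rb") as f:
            data = f.read()
        os.remove(path)  # on mismatch the client must start again from 0
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch")
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = PartialUploadStore(d)
        payload = b"x" * 1000
        store.append("wu-1", 0, payload[:400])  # connection drops after this
        resume_at = store.offset("wu-1")  # server reports what it already has
        store.append("wu-1", resume_at, payload[resume_at:])
        result = store.finish("wu-1", hashlib.sha256(payload).hexdigest())
        print(len(result))
```

The fsync per chunk trades some throughput for durability; a real server would also need to expire abandoned `.part` files.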
No worries, just good to know it's possible and part of the plan. Thanks.
The Windows client has an issue where Windows stabs it in the back every time the system is rebooted. So until the fix is sorted, we advise Windows users to pause their work before rebooting.
I did notice it was usually on Windows. It's a shame this has to be done manually; sometimes a reboot doesn't come at a good time, particularly if the user is on a recent version of Windows that reboots automatically all the time.
Unfortunately the issue is quite difficult to fix, since it has more to do with Windows than with FAHClient.
I feel this is getting off-topic for the actual issue about uploads, but what do you mean by 'more to do with Windows'? Uncontrolled rebooting? Yes, that's a Windows problem. But the Windows fah-client not resuming properly, as I mentioned above, didn't seem to be a problem with the v7 client. I don't know what the architectural changes between v7 and v8 are, but they must be pretty significant for this to be accepted as standard for Windows hosts.
It is not accepted as standard; the fix is just a bit elusive.
Pausing and then rebooting while uploading the WU would not have saved it.
The task with the interrupted upload would still have been marked as failed, but pausing would at least have saved the other task. Fortunately its progress was only 3.6%. Most of my hosts are Linux-based, but I was in such a hurry to reboot the Windows host in this particular instance that by the time I remembered to check what FaH was doing, it was too late. As I said, this was never an issue with the v7 client, so at the very least this sort of behaviour should be considered a regression.
Make no mistake, this is considered a bug, and a very serious one. This is in no way considered normal or acceptable behaviour for any software.
I see particularly large result uploads (e.g. ~150 MiB) running into
Failed response: EOF
fairly frequently, and it sometimes takes several hours (even more than a day, at the risk of missing the deadline) to complete a result upload, because with every 'failed response' it has to restart from 0%. I don't see any option to enable debug-level output, so I can't tell whether the issue is at my end or the server's.
I'm not sure if this is feasible, or even possible, with HTTP POST, but could we have the ability to ask the server what it did manage to receive and continue from there? (Or would that run into even more difficulty with respect to transmission integrity, etc.?)
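Asking the server what it received is roughly how resumable-upload schemes such as the tus protocol work: the client queries the current offset, sends only the missing tail, and a whole-file checksum at the end guards integrity. A rough sketch of the client loop, assuming a hypothetical server interface with `offset()`, `append()` and `finish()` calls. Here it is simulated in memory by `FlakyServer`, whose connection drops after a fixed number of bytes per request to mimic `Failed response: EOF`. None of these names come from fah-client.

```python
import hashlib


class FlakyServer:
    """In-memory stand-in for a resumable-upload endpoint (hypothetical)."""

    def __init__(self, drop_after: int) -> None:
        self.received = bytearray()
        self.drop_after = drop_after  # bytes accepted per request before the link dies

    def offset(self) -> int:
        return len(self.received)

    def append(self, start: int, data: bytes) -> None:
        if start != len(self.received):
            raise ValueError("offset mismatch")
        # Simulate the connection dying part-way: keep the prefix, then fail.
        if len(data) > self.drop_after:
            self.received += data[: self.drop_after]
            raise ConnectionError("Failed response: EOF")
        self.received += data

    def finish(self, sha256_hex: str) -> bool:
        return hashlib.sha256(self.received).hexdigest() == sha256_hex


def upload_with_resume(server: FlakyServer, payload: bytes,
                       max_attempts: int = 100) -> int:
    """Upload payload, resuming from the server's offset after each failure.

    Returns the number of requests (attempts) that were needed.
    """
    digest = hashlib.sha256(payload).hexdigest()
    for attempt in range(1, max_attempts + 1):
        start = server.offset()  # ask the server what it already has
        try:
            server.append(start, payload[start:])  # send only the missing tail
        except ConnectionError:
            continue  # reconnect and resume instead of restarting from 0%
        if not server.finish(digest):
            raise RuntimeError("checksum mismatch; restart from 0")
        return attempt
    raise RuntimeError("gave up after max_attempts")


if __name__ == "__main__":
    server = FlakyServer(drop_after=300)
    print(upload_with_resume(server, bytes(1000)))  # 4 requests: 300+300+300+100
```

In this simulation, a 1000-byte upload over a link that always drops after 300 bytes per request completes in four requests; with restart-from-0% semantics it would never complete.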
Perhaps a separate issue, but I have problems uploading on a lone Windows 7 host, with three results stuck at 0%. Previously I had trouble getting the v8 client to even download tasks in the first place, so I reverted to the v7 client. But since the v8.4 beta release I have managed to get some work downloaded, and have now run into stuck uploads. I'm not sure whether it could be something certificate-related in Windows 7.