Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster cleanup of output folder (PEP-0471) #3356

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

egberts
Copy link
Contributor

@egberts egberts commented Jul 1, 2024

Use os.scandir() instead of os.listdir()

In accordance with PEP-0471, os.listdir has been marked for discontinued sometime in the future.

https://peps.python.org/pep-0471/

Pull Request Checklist

Resolves: #issue-number-here

  • Ensured tests pass and (if applicable) updated functional test output
  • Conformed to code style guidelines by running appropriate linting tools
  • Added tests for changed code
  • Updated documentation for changed code

Use os.scandir() instead of os.listdir()

In accordance with PEP-0471, os.listdir has been marked for discontinued
sometime in the future.

https://peps.python.org/pep-0471/
@justinmayer justinmayer requested a review from a team July 2, 2024 08:02
@pauloxnet
Copy link
Member

pauloxnet commented Jul 2, 2024

Thanks for this improvement proposal, it seems ok for me.

Can you add a note in the Changelog?

It would be also great to see performance differences, before and after the patch.

@justinmayer
Copy link
Member

I agree it would be nice to quantify in this PR's description how much time this change saves.

There is no need to manually modify the change-log, which is automatically appended by AutoPub. Normally I would request that a RELEASE.md file be added to the PR in order for the new change-log entry to be recorded (and thus also triggering a new release), but we are in need of a bigger catch-up release soon, so I will handle the RELEASE.md file myself this time.

@boxydog
Copy link
Contributor

boxydog commented Jul 2, 2024

Do we have to worry about performance very much? What are the long-running operations, and how long do they take? Everything I've ever done is just a couple of seconds.

Personally, I'd focus on user friendliness, robust code, less code, and look for reasons people are using static site generators other than this one.

@boxydog
Copy link
Contributor

boxydog commented Jul 2, 2024

To be clear, I don't object to this PR. I'm just proposing high-level focus.

@egberts
Copy link
Contributor Author

egberts commented Jul 4, 2024

Personally, I'd focus on user friendliness, robust code, less code, and look for reasons people are using static site generators other than this one.

Yeah, I'm currently expanding the pytest unittest for settings.py. Lots of corner cases missed, notably with tilda (~) symbol, anchor ('.'), and parent ('..') as well as $HOME), also would like to see this expanded on Windows platform a bit better with configuration settings files.

Maybe after test_settings.py, they'll let me tackle the checking and wordings of "missing" files a bit more in settings.py.

@boxydog
Copy link
Contributor

boxydog commented Jul 4, 2024

Personally, I'd focus on user friendliness, robust code, less code, and look for reasons people are using static site generators other than this one.

Yeah, I'm currently expanding the pytest unittest for settings.py. Lots of corner cases missed, notably with tilda (~) symbol, anchor ('.'), and parent ('..') as well as $HOME), also would like to see this expanded on Windows platform a bit better with configuration settings files.

I don't understand, but increased test coverage sounds good. Perhaps you already know, you can run invoke coverage for a coverage report, see more here.

Maybe after test_settings.py, they'll let me tackle the checking and wordings of "missing" files a bit more in settings.py.

I also don't understand, but feel free to propose.

@egberts
Copy link
Contributor Author

egberts commented Jul 4, 2024

It would be also great to see performance differences, before and after the patch.

$ development/os.scandir/pythonProject/utils.py 
Original Delta time 0:00:00.953052
Latest   Delta time 0:00:00.715572
Total inodes: 0
Total Files   95976
Total Dirs    2

Process finished with exit code 0

Python Test file https://gist.github.com/egberts/f9f9f1f3156bc17a1c8718b3dfd068d3

@egberts
Copy link
Contributor Author

egberts commented Jul 5, 2024

Personally, I'd focus on user friendliness, robust code, less code, and look for reasons people are using static site generators other than this one.

Yeah, I'm currently expanding the pytest unittest for settings.py. Lots of corner cases missed, notably with tilda (~) symbol, anchor ('.'), and parent ('..') as well as $HOME), also would like to see this expanded on Windows platform a bit better with configuration settings files.

Maybe after test_settings.py, they'll let me tackle the checking and wordings of "missing" files a bit more in settings.py.

I also don't understand, but feel free to propose.

@boxydog, here's the details.

Relative Outside Subdirectory does not exist

Where the PATH is defined to 'content' and that (relative-type) 'content' subdirectory does NOT exist.

$ python -m pelican -D -s /tmp/pelicanconf.py

    CRITICAL Exception: `PATH` object in /tmp/pelicanconf.py is undefined. You need __init__.py:696
        to specify a path containing the content (see pelican --help for more                 
        information)     

An ERROR should have said: content subdirectory (as defined in 'PATH' variable) does not exist.

Undefined PATH

Where the PATH variable is undefined in pelicanconf.py

$ python -m pelican -s /tmp/pelicanconf.py -D content

       CRITICAL Exception: `PATH` object in /tmp/pelicanconf.py is undefined. You need __init__.py:696
                to specify a path containing the content (see pelican --help for more                 
                information)  

A WARNING should have said: 'PATH' variable is undefined; using 'content' as a default.

Absolute outside subdirectory does not exist

Where PATH="/tmp/no-such-directory" is in /tmp/pelicanconf-invalid-PATH-1.py and the (absolute-type) outside subdirectory is not accessible (a-rx file permission bit)

$ python -m pelican -v -s /tmp/pelicanconf-invalid-PATH-1.py 
[21:17:42] INFO     Loaded configuration settings from                                     settings.py:328
                '/tmp/pelicanconf-invalid-PATH-1.py' file.                                            
       WARNING  WRITE_SELECTED is present in settings but this functionality was       settings.py:656
                removed. It will have no effect.                                                      
       CRITICAL Exception: `PATH` object in /tmp/pelicanconf-invalid-PATH-1.py is      __init__.py:696
                undefined. You need to specify a path containing the content (see                     
                pelican --help for more information)      

An ERROR message should have said: '/tmp/no-such-directory' does not have read access.'

Unaccessible Directory

Where PATH="/tmp/pelican-unreadable-dir" and that subdirectory has no read access required for this user to enter that subdirectory:

$ python -m pelican -v -s /tmp/pelicanconf-invalid-PATH-2.py 
[21:12:53] INFO     Loaded configuration settings from                                     settings.py:328
                '/tmp/pelicanconf-invalid-PATH-2.py' file.                                            
       WARNING  WRITE_SELECTED is present in settings but this functionality was       settings.py:656
                removed. It will have no effect.      

An ERROR condition was missed and the program kept going along until it ran to an unrelated THEME error.

User/Group Access Denied

When PATH=/home/user-who-is-not-me is used:

$ python -m pelican -v -s /tmp/pelicanconf-invalid-PATH-3.py 
[20:49:28] INFO     Loaded configuration settings from                                     settings.py:328
                    '/tmp/pelicanconf-invalid-PATH-3.py' file.                                            
           WARNING  WRITE_SELECTED is present in settings but this functionality was       settings.py:656
                    removed. It will have no effect.                                                      
           CRITICAL Exception: Could not find the theme m.css/egbert-theme                 __init__.py:696

It did not error out.

An ERROR message should have said:

`/home/user-who-is-not-me' does not have directory ('x') file permission bit

Having the same error message for first three(3) distinctive error conditions seems rather confusing to even the most advanced end-user.

Also we failed to trap some error conditions at the most earliest opportune moment.

@boxydog
Copy link
Contributor

boxydog commented Jul 5, 2024

Gotcha. Sounds like you've thought this through. Consider filing the above as an issue if you haven't already.

@boxydog
Copy link
Contributor

boxydog commented Jul 5, 2024

Strictly speaking it's probably five issues, but I wouldn't want to discourage you with formality if you're fired up.

@egberts
Copy link
Contributor Author

egberts commented Jul 5, 2024

I know. Five issues or roll out as one.... Your call. I'm just tightening up the test_setting.py and plan to release it in my repo but not deliver into Pelican's repo, until settings.py are nice and user-friendly. But you did give me pause about the merits of 5 separate issues.

For now, trying for a single issue with #3357

@egberts
Copy link
Contributor Author

egberts commented Jul 5, 2024

A note to myself: add the full absolute resolved path of Pelican configuration settings file to each error/warning instance of variable setting, along with the variable name and the offending value.

And setup a Pelican test area on a Windows platform for thoroughness.

@egberts
Copy link
Contributor Author

egberts commented Jul 14, 2024

3 of 5 errors resolved by PR #3368

Wrote another issue #3369 for the remaining two with regard to the Python variable name PATH (which is outside the scope of my test_settings.py/load_source() focus that #3368 tackled):

Relative Outside Subdirectory does not exist

Where the PATH is defined to 'content' and that (relative-type) 'content' subdirectory does NOT exist.

$ python -m pelican -D -s /tmp/pelicanconf.py

    CRITICAL Exception: `PATH` object in /tmp/pelicanconf.py is undefined. You need __init__.py:696
        to specify a path containing the content (see pelican --help for more                 
        information)     

An ERROR should have said: content subdirectory (as defined in 'PATH' variable) does not exist.

Undefined PATH

Where the PATH variable is undefined in pelicanconf.py

$ python -m pelican -s /tmp/pelicanconf.py -D content

       CRITICAL Exception: `PATH` object in /tmp/pelicanconf.py is undefined. You need __init__.py:696
                to specify a path containing the content (see pelican --help for more                 
                information)  

A WARNING should have said: 'PATH' variable is undefined; using 'content' as a default.

@boxydog
Copy link
Contributor

boxydog commented Nov 3, 2024

I notice your work here stalled. Is there a smaller piece you can pull out, finish, and contribute? Best wishes.

@egberts
Copy link
Contributor Author

egberts commented Nov 6, 2024

Yes! it got stalled because pytest wasn't giving me consistent pass result with each test run so down the rabbit hole I went into pytest dev community.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants