Itinerant Iterators #4

janine9vn · 2023-09-28T15:43:51Z

No description provided.

…ntations

Fix #10 - force save as a PNG regardless of filename to ensure lossy compression.

Implement unit tests github action and add status badges to README. Fix #11 Squashed commit of the following: commit 0ae0e9d76fcf53d5ea526c144906502c967addcb Author: JJ Style <[email protected]> Date: Mon Sep 4 22:22:17 2023 +0100 remove files accidentally staged commit 9632a54d6f670deb51e1d3622d7378eedf532fa5 Author: JJ Style <[email protected]> Date: Mon Sep 4 22:20:56 2023 +0100 add workflow_dispatch trigger and add status badges commit 28caffae91d7bf08940d9990633f8fa30c5dca88 Author: JJ Style <[email protected]> Date: Mon Sep 4 22:18:00 2023 +0100 fix install commit ccb5dff65e6a31aeb6e5d9d128117fa984c9d9cf Author: JJ Style <[email protected]> Date: Mon Sep 4 22:09:25 2023 +0100 sudo commit adbb28050782314faf392071285b44cf9b6b9e91 Author: JJ Style <[email protected]> Date: Mon Sep 4 22:07:53 2023 +0100 add another branch trigger commit 165e2571a3093f3be6c7b51816ff0df4e8a8ce00 Author: JJ Style <[email protected]> Date: Mon Sep 4 22:04:13 2023 +0100 add test CI workflow

Also includes tests for colour box and bounding box validation. Fix #13 Squashed commit of the following: commit e09ee4376e718da7573eb9c2874fe5f92601afd7 Author: JJ Style <[email protected]> Date: Tue Sep 5 22:45:40 2023 +0100 add unimplemented blur box commit 770c37aa2c48e77fff650754204ffdbb0b30cf8b Author: JJ Style <[email protected]> Date: Tue Sep 5 22:42:31 2023 +0100 implement colour box obfuscation

Fix #16

Fix #17

…terators

…1437322ca9c' git-subtree-dir: itinerant-iterators git-subtree-mainline: 4bae50a git-subtree-split: 621ae80

ichard26

Hello, I'm Richard and I'm the assigned reviewer for your project! Congratulations on submitting a top 10 project.

My review is structured in five parts:

Use of Git and GitHub + Project infrastructure (GHA and other developer tools)
Code review for the application
Code review for the server
Code review for the tests
Final comments (spoiler: it's going be pretty short, the juicy details will live in the four other parts 🙂)

This review is the first part. And before I get into it, remember, if you have any questions or want to discuss my feedback, please let me know!

Use of Git and GitHub

The Git commit history is decent. I like how most commits are "atomic" and are reasonably sized and focused. However, I would like to see more context on commits that have vaguer titles. For example, smileyface12349/itinerant-iterators@ca0eb1c 's commit message doesn't explain what was broken about the obfuscator... which makes it hard to tell what the commit does at a glance.

Also, don't forget to squash or use interactive rebase to clean up local commits before pushing to avoid situations like this one:

I'm happy to see GitHub Issues used to keep track of work and discuss tasks. Keep that up! It's a skill you'll need a lot. If you want to take it further, I'd suggest using Pull Requests to encourage code review before a change is merged.

Project set up & infrastructure

The way the project is set up makes sense. Tooling like Black and pre-commit are appropriately used to maintain a consistent code style. Great job on using a GitHub Actions workflow to automatically run tests on each push.

I think the user experience would improve if the application was packaged into an installable Python package, but I'll discuss this further in Part 2.

ichard26 · 2023-10-09T14:53:23Z

itinerant-iterators/.github/workflows/tests.yaml

@@ -0,0 +1,40 @@
+# GitHub Action Workflow enforcing our code style.


Make sure when you're copying and pasting any sort of code that you make the necessary modifications so it fits the context. It's too easy to end up with confusing comments like this :P

ichard26 · 2023-10-09T14:53:59Z

itinerant-iterators/.github/workflows/tests.yaml

+  push:
+    branches:
+      - main
+  workflow_dispatch:


I love to see workflow_dispatch to be used here. It's a great feature.

ichard26 · 2023-10-09T15:20:37Z

itinerant-iterators/LICENSE

@@ -0,0 +1,7 @@
+Copyright 2021 Python Discord


We don't hold the copyright to your project's code, don't forget to change this next year if you're participating again :)

ichard26 · 2023-10-09T15:48:58Z

itinerant-iterators/tests/data/LICENSES

I'm pleased to see the license information was maintained. You don't see that too often these days!

ichard26 · 2023-10-09T15:49:37Z

itinerant-iterators/samples/Pipfile

You don't need this folder at all. 🙂

ichard26 · 2023-10-09T15:56:41Z

itinerant-iterators/dev-requirements.txt

+flake8-docstrings~=1.6.0
+
+pytest
+black


Pin the version of Black used to at least a major revision so you don't face sudden formatting changes when the new year rolls around. For more information, see Black's stability policy.

ichard26 · 2023-10-09T16:19:48Z

itinerant-iterators/justfile

I love the use of a task runner. It makes getting up and running with a new project so much easier and faster.

Although, while just is a perfectly capable task runner, I would suggest looking into Tox, Nox, or even Hatch's own task runner. They have automatic environment provisioning (ie. setting up virtual environments and installing dependencies as necessary) that is tailored to Python's ecosystem. They may be more intuitive and provide a more streamlined experience compared to just.

ichard26

(Part. 2: Code Review of the application)

Code quality & structure

I appreciate and encourage the use of type annotations! It helps massively with the readability and maintainability of the code. However, there are certain places where type annotations are incomplete or lacking. Some APIs expect callables or complex data structures and it'd be better to thoroughly type them.

On top of that, I'd highly suggest using type aliases. Not only do they help with reducing repetition, but they give names to often complex types. For example, a Bound type alias for tuple[int, int, int, int] would be amazing!

For example, here's the beginning of the Lsb class but with the feedback above incorporated. Notice how the type annotations further explain what the code does at a glance, improving readability ("what does pixels_generator do?" ... "ah I know, it takes a list of Pixels and yields pixels in a certain order").

from collections.abc import Iterable
from typing import Callable

# This type alias should be defined somewhere else in practice,
# but for sake of the example, it'll live here.
Pixel = tuple[int, int, int]

class Lsb(Steganography):
    def __init__(
        self, pixels_generator: Callable[[list[Pixel], bytes], Iterable[Pixel]] = pixels_top_left
     ): ...

Finally, I forgot to mention this earlier, but I'd firmly encourage using a type checker (eg. mypy or pyright) to check and enforce types. Using type annotations is one thing, but having machine checked type annotations (and by extension, type documentation) is even better!

The application code is generally well organised. I like how the obfuscation and steganography code are sectioned off in their own subpackages. Most functions and classes are well designed, naturally documenting and delineating sections of logic (in a way that follows DRY and the SRP).

Certain functions and modules could be renamed and moved, but there isn't anything truly terrible. I do think some portions are overengineered and can be simplified (eg. some classes could be replaced with a module with a set of functions). Overall, it was generally easy to read the code and figure out the logic flow across the various modules.

Anyhow, that concludes my overall comments. Please see the comments below for specific feedback.

Code documentation

I'm pleased to see docstrings. They're consistently used and complete with all parameters and return values documented. However, many arguments and return values could be further specified, and some docstrings only state the obvious. Overall, they are decent and I would rather have them than not, but there is some room to improve.

More importantly, the codebase would strongly benefit from more comments. There are some areas that implement very involved logic and are challenging to read and follow. Adding comments that explain what is going on, especially the why would be a good next step for improving readability. Cases where this problem was particularly bad are noted in this review's comments.

The README is short, but it's clear and and does a good job at highlighting the features and the project can be used. I'm glad examples and installation docs are included. Kudos!

User interface (CLI/GUI)

Attention: I have never used PySimpleGUI so I was unable to make specific comments on its use. I tried my best to provide feedback on the other parts though.

The CLI and GUI are well designed and work well, but I would suggest a different way to handle the GUI/CLI distinction. Since the command line interface is (according to the README) the primary way the application is meant to be used, instead of implementing a "launcher"-type CLI, I would suggest placing the GUI start up code in a ui.py module guarded by if __name__ == "__main__":.

This simplifies the whole CLI, removing a layer of complexity. Instead of commands like python -m app cli watermark decode image.png, an user can drop the (clunky) cli part and write python -m app watermark decode image.png. (All while starting the GUI is still easy: python -m app.ui.)

In addition, if the application was packaged using say, flit, so it could be installed using pip, project scripts could be defined so once installed, the user could run (pretend the application is called "imgedit") imgedit or imgedit-gui in their terminal to launch the CLI/GUI for a smooth terminal UX.

Finally, to bring it back to the CLI, it could do with some better documentation. Command docstrings should include rich descriptions and examples and most options should have a description too. I left a comment on app/cli.py that goes into further detail. (TL;DR, it shows when --help is passed`)

ichard26 · 2023-10-09T18:32:36Z

itinerant-iterators/app/__main__.py

@@ -0,0 +1,4 @@
+from app.cli import main


I don't have a preference between relative and absolute imports, but being consistent is important. This breaks running the application from a different directory that isn't the repository root. I'd suggest using relative imports as those seem to be a better fit (the project isn't installable):

from .cli import main

ichard26 · 2023-10-09T18:36:49Z

itinerant-iterators/app/byteutils.py

+def iter_bits(bytes_: bytes) -> Generator[int, None, None]:
+    """Enumerate over each bit in the given bytes.
+
+    Args:
+        bytes_ (bytes): bytes to enumerate over
+
+    Yields:
+        Generator[int, None, None]: Return each bit as an int (0/1)
+    """


Avoid repeating the argument and return type annotations in the docstring. An IDE should be able to use both to provide rich help. Down the road, it's very likely the docstring types will be out of date with the actual type annotations.

(This applies to the rest of the project, but this was the first example of it.)

ichard26 · 2023-10-09T18:45:21Z

itinerant-iterators/app/byteutils.py

+    """Set the bit as position on the value.
+
+    Args:
+        value (int): Value to set bit on
+        bit (int): Bit position to set
+
+    Returns:
+        int: value with the bit at position `bit` set
+    """


This is what I mean by unclear docstrings. It's good that they exist, but it's no good if they do little to help the reader with what the function does. How about this?

Suggested change

"""Set the bit as position on the value.

Args:

value (int): Value to set bit on

bit (int): Bit position to set

Returns:

int: value with the bit at position `bit` set

"""

"""Set a bit in a byte to one.

Args:

value: Byte to set bit on

bit: Bit index to set

Returns:

int: Value with the bit at :bit: location set.

"""

ichard26 · 2023-10-09T18:46:58Z

itinerant-iterators/app/byteutils.py

It's great that the bit manipulation functions are located in their own module, but this should probably live in app.steganography since the obfuscation code makes no use of this code.

ichard26 · 2023-10-09T18:48:49Z

itinerant-iterators/app/cli.py

Excellent usage of Click. Command groups, parameter types and other Click features are effectively used to compose a clear CLI. The application's UI could still use some work though. Please see feedback written above.

ichard26 · 2023-10-10T01:59:43Z

itinerant-iterators/app/ui.py

The application's GUI works very nicely (barring a few bugs). Well done on using PySimpleGUI!

A note on code structure, I would suggest moving the UI layout definitions to be module level constants to avoid a level of indentation/nesting. I would also like to see the various "actions" to be handled by individual functions, in lieu of having all of the UI code in a single massive function.

ichard26 · 2023-10-10T02:00:22Z

itinerant-iterators/app/ui.py

+class UIKey(StrEnum):
+    """Collection of keys on the UI"""
+
+    WATERMARK_TEXT_INPUT = "-WATERMARK-TEXT-INPUT-"
+    WATERMARK_ENCODING_MODE = "-WATERMARK-ENCODING-MODE-"
+    APPLY_WATERMARK = "-APPLY-WATERMARK-"
+    VIEW_WATERMARK = "-VIEW-WATERMARK-"
+    APPLY_OBFUSCATE = "-APPLY-OBFUSCATE-"
+    OBFUSCATE_TEXT_INPUT = "-OBFUSCATE-TEXT-INPUT-"
+    WATERMARK_FRAME = "-WATERMARK-"
+    OBFUSCATE_FRAME = "-OBFUSCATE-"
+    IMAGE = "-IMAGE-"
+    IMAGE2 = "-IMAGE2-"


Enums are underrated IMO, it's great to see them used here.

ichard26 · 2023-10-10T02:14:31Z

itinerant-iterators/app/ui.py

+        elif event == UIKey.VIEW_WATERMARK:
+            lsb = steganography.Lsb()
+            steg_image = steganography.Image(filename)
+            decoded_text = lsb.decode(steg_image)
+            if decoded_text is None:
+                sg.Popup("Image does not contain any encoded data!")
+            else:
+                window[UIKey.WATERMARK_TEXT_INPUT].Update(decoded_text)


So the reason why applying watermark doesn't seem to work is that this code uses the original image (path). The image variable is correctly updated after applying the watermark (so saving and then using the CLI to decode works), but that's ignored here.

This is a bit annoying to fix because steganography.Image doesn't accept a Pillow Image instance. Ideally, you would be able to pass the image variable so any updates made to it would be picked up. This is one of the risks of implementing class-based abstractions, they work great if they fit the needs, but are an absolute pain if they don't quite match. Using a set of functions is generally more flexible. But, updating the steganography.Image class to accommodate this usecase wouldn't be a bad option either.

ichard26 · 2023-10-10T02:21:18Z

itinerant-iterators/app/byteutils.py

+    Yields:
+        Generator[int, None, None]: Return each bit as an int (0/1)
+    """
+    for byte in bytes_:


I'm impressed to see the terminating underscore convention for names that would (otherwise) shadow a built-in name, kudos!

ichard26 · 2023-10-10T03:11:46Z

itinerant-iterators/app/steganography/steganography.py

+        """Get the image"""
+        return self.__image
+
+    @cached_property


Solid use of functools.cached_propery!

ichard26

(Part. 3: Code Review of the server)

It's neat how your project incorporates Google Cloud Vision. I like how regex is used here to build a really cool feature (regex-based obfuscation). However, the server-side code needs improvement. As-is, it's hard to read and is dubiously written. Other than my specific comments, my two big suggestions are to:

Break out logical sections into individual functions. This will encourage code reuse (and there's a fair of fairly repetitive code here) and cleaner variable use (several variables have multiple purposes which is bad). Smaller functions tend to be also easier to read since there's just less going on.¹
Write more comments, I really mean it :)

It's a balancing act though. Sometimes too many functions is a bad thing. It's like blank lines in an essay. You shouldn't write your whole essay without a single blank line, but you also don't want to have a blank line for every 4 sentences. A general rule of thumb is each function should implement a single "step" of your program's overall logic (the little details usually don't matter/warrant their own tiny functions). ↩

ichard26 · 2023-10-10T22:07:27Z