Nbformat/nbformat_minor not well extracted with HTTP handler #727

Open
LetMeR00t opened this issue Jul 12, 2023 · 1 comment

LetMeR00t commented Jul 12, 2023

🐛 Bug

I'm building a connector between Jupyter (using papermill) and another product named "Cortex" from the Strangee project.
I ran into an issue during development. I'm testing the HTTP handler by executing a notebook located on a JupyterHub instance, which has a "demo" user for whom a "cortex_job" server is configured.

import papermill as pm

pm.execute_notebook(
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
    parameters = dict(var1 = "toto")
)

The notebook is retrieved successfully, but I then get an error message:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[1], line 3
      1 import papermill as pm
----> 3 pm.execute_notebook(
      4     "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
      5     "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
      6     parameters = dict(var1 = "toto")
      7 )

File /usr/local/lib/python3.10/dist-packages/papermill/execute.py:89, in execute_notebook(input_path, output_path, parameters, engine_name, request_save_on_cell_execute, prepare_only, kernel_name, language, progress_bar, log_output, stdout_file, stderr_file, start_timeout, report_mode, cwd, **engine_kwargs)
     86 if cwd is not None:
     87     logger.info("Working directory: {}".format(get_pretty_path(cwd)))
---> 89 nb = load_notebook_node(input_path)
     91 # Parameterize the Notebook.
     92 if parameters:

File /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:512, in load_notebook_node(notebook_path)
    502 def load_notebook_node(notebook_path):
    503     """Returns a notebook object with papermill metadata loaded from the specified path.
    504 
    505     Args:
   (...)
    510 
    511     """
--> 512     nb = nbformat.reads(papermill_io.read(notebook_path), as_version=4)
    513     nb_upgraded = nbformat.v4.upgrade(nb)
    514     if nb_upgraded is not None:

File /usr/local/lib/python3.10/dist-packages/nbformat/__init__.py:91, in reads(s, as_version, capture_validation_error, **kwargs)
     89 nb = reader.reads(s, **kwargs)
     90 if as_version is not NO_CONVERT:
---> 91     nb = convert(nb, as_version)
     92 try:
     93     validate(nb)

File /usr/local/lib/python3.10/dist-packages/nbformat/converter.py:62, in convert(nb, to_version)
     60 except AttributeError as e:
     61     msg = f"Notebook could not be converted from version {version} to version {step_version} because it's missing a key: {e}"
---> 62     raise ValidationError(msg) from None
     64 # Recursively convert until target version is reached.
     65 return convert(converted, to_version)

ValidationError: Notebook could not be converted from version 1 to version 2 because it's missing a key: cells

Looking at the code, the HTTP handler returns the entire response body unchanged:

[image: screenshot of the HTTP handler code]

For this URL, the response body is:

{
   "name":"notebook1.ipynb",
   "path":"notebook1.ipynb",
   "last_modified":"2023-07-12T11:43:37.265003Z",
   "created":"2023-07-12T11:43:37.265003Z",
   "content":{
      "cells":[
         {
            "cell_type":"markdown",
            "id":"e0882b67",
            "metadata":{
               
            },
            "source":"# My title\n\n## My subtitle\n\nHello world!"
         },
         {
            "cell_type":"code",
            "execution_count":1,
            "id":"e92789a6",
            "metadata":{
               "tags":[
                  "parameters"
               ],
               "trusted":true
            },
            "outputs":[
               
            ],
            "source":"var1 = 3\nvar2 = 5"
         },
         {
            "cell_type":"code",
            "execution_count":2,
            "id":"d49d5a2b",
            "metadata":{
               "trusted":true
            },
            "outputs":[
               {
                  "name":"stdout",
                  "output_type":"stream",
                  "text":"var1 is 3, var2 is 5\n"
               }
            ],
            "source":"print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
         }
      ],
      "metadata":{
         "celltoolbar":"Tags",
         "kernelspec":{
            "display_name":"Python 3 (ipykernel)",
            "language":"python",
            "name":"python3"
         },
         "language_info":{
            "codemirror_mode":{
               "name":"ipython",
               "version":3
            },
            "file_extension":".py",
            "mimetype":"text/x-python",
            "name":"python",
            "nbconvert_exporter":"python",
            "pygments_lexer":"ipython3",
            "version":"3.10.6"
         }
      },
      "nbformat":4,
      "nbformat_minor":5
   },
   "format":"json",
   "mimetype":"None",
   "size":1188,
   "writable":true,
   "type":"notebook"
}

Note that nbformat is set to 4 in the response, but papermill detected version 1 (the default value).

The version is determined here, in the nbformat library that reads the notebook:

[image: screenshot of the nbformat version lookup]

The version is read from the root key "nbformat" instead of "content.nbformat", which causes the issue.
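
To illustrate the mismatch, here is a minimal sketch that imitates the lookup described above. The function `get_version` is a stand-in written for this example (it is not the nbformat code itself), and the response dict is a trimmed-down version of the one shown earlier:

```python
def get_version(nb):
    """Read the version from the top-level keys, defaulting to (1, 0) when absent."""
    return nb.get("nbformat", 1), nb.get("nbformat_minor", 0)


# Trimmed-down contents API response: the notebook is nested under "content".
contents_api_response = {
    "name": "notebook1.ipynb",
    "type": "notebook",
    "content": {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5},
}

# The wrapper has no top-level "nbformat" key, so the default applies.
wrapper_version = get_version(contents_api_response)
# The nested notebook carries the real version.
notebook_version = get_version(contents_api_response["content"])

print(wrapper_version)   # (1, 0)
print(notebook_version)  # (4, 5)
```

This would explain why the conversion starts from version 1 and then fails on the missing "cells" key.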

Do you know whether this is a bug on your side or in the nbformat library? I tested it with a LocalHandler and it works fine, since the output is:

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e0882b67",
   "metadata": {},
   "source": [
    "# My title\n",
    "\n",
    "## My subtitle\n",
    "\n",
    "Hello world!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e92789a6",
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "var1 = 3\n",
    "var2 = 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d49d5a2b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "var1 is 3, var2 is 5\n"
     ]
    }
   ],
   "source": [
    "print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

A solution could be to load the JSON response and extract the "content" node before returning the result in the HTTP handler.
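
A rough sketch of that idea, kept free of network calls so it can be tried in isolation. The helper name `unwrap_contents_response` is hypothetical, and the check on `"type": "notebook"` is an assumption based on the response shown above; anything that does not look like a wrapped notebook is returned unchanged so that plain .ipynb files served over HTTP would still work:

```python
import json


def unwrap_contents_response(text):
    """Return notebook JSON text, unwrapping a contents API model if one is detected."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return text
    content = data.get("content")
    # Assumption: a wrapped notebook has type "notebook" and a dict under "content".
    if data.get("type") == "notebook" and isinstance(content, dict):
        return json.dumps(content)
    # Otherwise assume the body already is the notebook itself.
    return text


wrapped = json.dumps({
    "name": "notebook1.ipynb",
    "type": "notebook",
    "content": {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5},
})
raw = json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5})

unwrapped = json.loads(unwrap_contents_response(wrapped))
passthrough = unwrap_contents_response(raw)
```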

Thank you

LetMeR00t changed the title from "Nbformat/nbformat_minor not well extracted" to "Nbformat/nbformat_minor not well extracted with HTTP handler" on Jul 12, 2023

LetMeR00t commented Jul 13, 2023

Fix working on my side:

papermill/iorw.py

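import json

import requests

from .exceptions import PapermillException


class HttpHandler(object):
    @classmethod
    def read(cls, path):
        # The contents API wraps the notebook in a model;
        # return only the notebook itself, found under "content".
        response = requests.get(path, headers={'Accept': 'application/json'})
        return json.dumps(response.json()["content"])

    @classmethod
    def listdir(cls, path):
        raise PapermillException('listdir is not supported by HttpHandler')

    @classmethod
    def write(cls, buf, path):
        # Wrap the notebook back into a contents model before uploading it.
        payload = {"type": "notebook", "format": "json", "path": path}
        payload["content"] = json.loads(buf)
        result = requests.put(path, json=payload)
        result.raise_for_status()

    @classmethod
    def pretty_path(cls, path):
        return path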
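
The write side of this fix can be checked without a server. The sketch below isolates the payload construction in a hypothetical helper, `build_contents_payload`, which mirrors the fields used in `write` above; the relative path passed in the example is an assumption for illustration (the fix itself passes the full URL as "path"):

```python
import json


def build_contents_payload(buf, path):
    """Wrap serialized notebook JSON into the model sent by the patched write()."""
    return {
        "type": "notebook",
        "format": "json",
        "path": path,
        "content": json.loads(buf),  # the notebook goes under "content" as a dict
    }


notebook_text = json.dumps(
    {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
)
payload = build_contents_payload(notebook_text, "Folder1/notebook2.ipynb")
```

Reading `payload["content"]` back yields the original notebook, which is the same nesting the patched `read` unwraps.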
