Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is it necessary to check if the first two bytes of data are \r\n? #3300

Open
kwsy opened this issue Sep 30, 2024 · 1 comment
Open

Why is it necessary to check if the first two bytes of data are \r\n? #3300

kwsy opened this issue Sep 30, 2024 · 1 comment

Comments

@kwsy
Copy link

kwsy commented Sep 30, 2024

`def parse(self, unreader):
buf = io.BytesIO()
self.get_data(unreader, buf, stop=True)

  # get request line
  line, rbuf = self.read_line(unreader, buf, self.limit_request_line)

  # proxy protocol
  if self.proxy_protocol(bytes_to_str(line)):
      # get next request line
      buf = io.BytesIO()
      buf.write(rbuf)
      line, rbuf = self.read_line(unreader, buf, self.limit_request_line)

  self.parse_request_line(line)           # 解析请求行, 也就是第一行
  buf = io.BytesIO()
  buf.write(rbuf)

  # Headers   接下来解析headers
  data = buf.getvalue()
  idx = data.find(b"\r\n\r\n")

  done = data[:2] == b"\r\n"
  while True:
      idx = data.find(b"\r\n\r\n")
      done = data[:2] == b"\r\n"    # 这条语句的依据是什么?

      if idx < 0 and not done:
          self.get_data(unreader, buf)
          data = buf.getvalue()
          if len(data) > self.max_buffer_headers:
              raise LimitRequestHeaders("max buffer headers")
      else:
          break

  if done:
      self.unreader.unread(data[2:])
      return b""

  self.headers = self.parse_headers(data[:idx])

  ret = data[idx + 4:]    # body的部分
  buf = None
  return ret

`

parse is the method of Request class ,
It reads the message header part of the HTTP request. There is always \r\n\r\n between the message header and the message body. However, the method checks whether the first two bytes of data are \r\n. This is strange. Isn't the end marker of the message header only \r\n\r\n? Why, even if \r\n\r\n is not found, is the message header considered to have ended when data starts with \r\n? Is there any basis for doing this?

@pajod
Copy link
Contributor

pajod commented Oct 3, 2024

Why, even if \r\n\r\n is not found, is the message header considered to have ended?

Because on reading the request (and PROXY) line, its trailing \r\n was discarded from buf. Thus done distinguishes a) expecting two consecutive \r\n marking the last header and b) just expecting a single \r\n at the start to signal a HTTP/1.0 request with zero headers.

The code could be refactored to remove most of the complexity and be more efficient for invalid/oversize/partial input - but I do not see a logical flaw. If you do, please show the request that you consider incorrectly parsed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants