[Bug Report] Data Flow Interruption with Function Parameters and Variable Arguments in Python #17753
Comments
Hi @gravingPro, Thanks for the bug report.
Hi @gravingPro, A quick follow-up question. How is your custom sink defined?
It's just a simple example. Any sink can be used here, whether for SQL injection or SSRF.
Hi @gravingPro, Is
Oh, I made a mistake there. I left out the tainted data. The correct code is:

    def read_sql(sql_data):
        spark.sql(sql_data)  # A simple sink for example. Any other official sink can be applied here.

    def process(func, args):
        func(*args)

    sql = request.json['data']  # Source
    process(func=read_sql, args=(sql,))  # args is a tuple so that *args passes sql as a single argument
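To confirm that the tainted value really does reach the sink at runtime, the pattern can be reproduced without Spark or a web framework. The following is only a minimal sketch: `fake_sql_sink` and `tainted_input` are hypothetical stand-ins for `spark.sql` and `request.json['data']`, not part of the original report.

```python
# Hypothetical stand-ins for the real source and sink in the report.
executed = []

def fake_sql_sink(query):
    """Record the query instead of running it (stands in for spark.sql)."""
    executed.append(query)

def read_sql(sql_data):
    fake_sql_sink(sql_data)

def process(func, args):
    # The indirect call with unpacked arguments that the report is about.
    func(*args)

tainted_input = "SELECT 1; DROP TABLE users"  # stands in for request.json['data']
process(func=read_sql, args=(tainted_input,))

print(executed)
```

Note that `args` has to be a tuple (or another iterable of arguments) here: unpacking a bare string with `*` would pass each character as a separate positional argument.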
As said in the description, this shows up in projects that use concurrency APIs such as "threading.Thread", "concurrent.futures.Executor" and "multiprocessing.Process".

    # example
    import threading
    import time

    def print_greeting(greeting):
        sink(greeting)

    taint_data = ...  # tainted source
    thread = threading.Thread(target=print_greeting, args=(taint_data, ))
    thread.start()

We inspected the underlying code of the threading module and found that the call self._target(*self._args, **self._kwargs) in the run method is the step that CodeQL does not track correctly.

    class Thread(BaseThread):
        # ...
        def run(self):
            """A virtual method to be overridden by the subclass."""
            if self._target is not None:
                self._target(*self._args, **self._kwargs)
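The runtime behavior described above can be checked with a small self-contained script. This is a sketch under assumptions: `sink` is a hypothetical recording function and the string assigned to `taint_data` is a placeholder for whatever the real source would return.

```python
import threading

received = []

def sink(value):
    """Hypothetical sink that just records what it is given."""
    received.append(value)

def print_greeting(greeting):
    sink(greeting)

taint_data = "attacker-controlled"  # placeholder for a real taint source
thread = threading.Thread(target=print_greeting, args=(taint_data,))
thread.start()
thread.join()  # wait until run() has invoked the target

print(received)
```

The value stored in `args` is forwarded to `print_greeting` only when the thread runs, which is the indirect step a static analysis has to model in order to connect the source to the sink.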
@gravingPro Thanks for clarifying your example.
I've encountered data flow interruptions in CodeQL's Python analysis. Here are the details:

1. Function Parameter Passing Interruption

When a helper such as `process(func, args)` receives the function `read_sql` together with its argument and then calls `func(*args)`, CodeQL fails to detect that the tainted variable `sql` is passed into `read_sql`. This shows an interruption in data flow tracking when a function is passed as a parameter and subsequently invoked with variable arguments.

2. `*args` and `**kwargs` Interruption

When `*args` (variable positional arguments) or `**kwargs` (variable keyword arguments) carry data into a call, CodeQL can't track the flow accurately. In the given example, using `*args` in the `process` function breaks the data flow for `sql`. The issue extends to similar scenarios involving these constructs.

Moreover, these problems also occur with multithreading and multiprocessing APIs such as `threading.Thread`, `multiprocessing.Process`, `concurrent.futures.ThreadPoolExecutor`, and `concurrent.futures.ProcessPoolExecutor`.

I hope this description helps in identifying and resolving these problems. Looking forward to a timely fix or further guidance on handling such complex data flow tracking scenarios.
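For reference, the keyword-argument and executor variants mentioned above can be exercised the same way. This is a rough sketch, not a definitive test case: `sink`, `handler`, and the three-parameter `process` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

received = []

def sink(value):
    """Hypothetical sink that records its input."""
    received.append(value)

def handler(greeting, punctuation="!"):
    sink(greeting + punctuation)

def process(func, args, kwargs):
    # Indirect call that unpacks both positional and keyword arguments.
    func(*args, **kwargs)

tainted = "attacker-controlled"  # placeholder for a real taint source

# Taint travels through *args, then through **kwargs.
process(handler, args=(tainted,), kwargs={"punctuation": "?"})
process(handler, args=(), kwargs={"greeting": tainted})

# Taint travels through Executor.submit, which forwards its arguments to the callable.
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(handler, tainted, punctuation=".").result()

print(received)
```

In all three calls the tainted string reaches `sink` at runtime, so each one is a flow that the analysis would ideally report.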
Best regards