docs: MVP plotly-express docs #554
Nothing huge jumps out on first read. I haven't tried running every block of code. @jnumainville should look through the code blocks with sharper eyes.
jobs = dx.data.jobs()  # import the ticking jobs dataset

# the `by` argument is used to color the bars by another categorical variable
jobs_resource_tracking = dx.timeline(jobs, x_start="StartTime", x_end="EndTime", y="Job")
This example is identical to the prior example and is not doing what it says it does.
I didn't notice this initially. Getting the example "right" gives some pretty bad results. Putting the code in but leaving this comment open.
# Conflicts:
#   plugins/plotly-express/docs/scatter.md
#   plugins/plotly-express/docs/sub-plots.md
Sorry - I hadn't mentioned this previously but one of the reasons I pushed for all our example data sets to be deterministic was for testing. This one example is not.
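To illustrate why determinism matters for testing, here is a minimal sketch of a seeded generator. The function `make_prices` and its parameters are hypothetical and are not how the `dx.data` datasets are built; the point is only that a fixed seed makes every run produce the same values, so a test can compare against them.

```python
import random


def make_prices(n, seed=42):
    """Hypothetical example: build n pseudo-random prices from a fixed seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    price = 100.0
    prices = []
    for _ in range(n):
        price += rng.uniform(-1.0, 1.0)  # random walk step
        prices.append(round(price, 2))
    return prices


# Two calls with the same seed yield identical data, which a test can rely on.
first_run = make_prices(5)
second_run = make_prices(5)
```

An unseeded generator would give different values on each run, so any snapshot or assertion on the data would be flaky.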
cat_dog = stocks.where("sym in `CAT`, `DOG`")

# use `by` to specify the grouping column and order axes left to right with yaxis_sequence
line_plot_by = dx.line(cat_dog, x="timestamp", y="price", by="sym", yaxis_sequence=[1, 2])
This needs a ticket to assess the library. It is not a problem with your example.
### Multiple columns

When two or more response variables appear in separate columns, passing multiple column names to `x` or `y` is the recommended way to create multiple axes.
Take this example:
import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
# get a specific country
brazil = gapminder.where("country == `Brazil`")
# specify multiple y-axis columns and order axes left to right with yaxis_sequence
line_plot_multi = dx.line(brazil, x="year", y=["pop", "gdpPercap"], yaxis_sequence=[1, 2])
line_plot_multi_2 = dx.line(brazil, x="year", y=["pop", "gdpPercap"])
Passing multiple values to `y` is NOT how the multiple axes are created. That creates two lines. To create separate axes for the lines, you need to specify `yaxis_sequence`. This interaction is not clear in the prose or the example. The example would be better with two plots like I have: show that providing `y=[a,b]` gives two lines, and then that adding `yaxis_sequence` puts them on different axes. As it is, the prose is incorrect and doesn't show readers how to do these two common cases.
@@ -1,5 +1,146 @@
 # Plot By

-To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart.
+To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names denoting other variables of interest in the dataset. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart.
It should be clarified that `by` is not simply an alias for `color`: it can be tweaked by using `by_vars` and passing in other parameters such as `symbol` and `size`. It also behaves slightly differently, though. Take this example:
import deephaven.plot.express as dx
tips = dx.data.tips()  # import the example tips data set
by_list = dx.scatter(tips, x="TotalBill", y="Tip", by=["Time", "Smoker"], by_vars=["color", "symbol"])
by_prod = dx.scatter(tips, x="TotalBill", y="Tip", by="Time", symbol="Smoker")
`by_list` just loops through color/symbol combos (jointly), whereas `by_prod` assigns colors and symbols to specific column values. So, of the four joint values, the two `Lunch` series share a color, as do the two `Dinner` series, and the two `Yes` series share a symbol, as do the two `No` series.
The first method is more useful to emphasize differences, whereas the second is more useful to emphasize similarities.
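The difference between the two assignments can be sketched in plain Python. This is only one reading of "jointly" and is not the library's implementation: the palette, the symbol list, and both helper functions are hypothetical names made up for illustration.

```python
from itertools import cycle

COLORS = ["blue", "red", "green", "purple"]      # hypothetical palette
SYMBOLS = ["circle", "diamond", "square", "x"]   # hypothetical symbol list


def joint_styles(combos):
    """Like by=[...], by_vars=[...]: each combo takes the next color and symbol together."""
    styles = zip(cycle(COLORS), cycle(SYMBOLS))
    return {combo: style for combo, style in zip(combos, styles)}


def per_column_styles(combos):
    """Like by="Time", symbol="Smoker": color follows Time, symbol follows Smoker."""
    color_of, symbol_of, out = {}, {}, {}
    for time, smoker in combos:
        # Assign the next unused color/symbol the first time a value is seen.
        color_of.setdefault(time, COLORS[len(color_of) % len(COLORS)])
        symbol_of.setdefault(smoker, SYMBOLS[len(symbol_of) % len(SYMBOLS)])
        out[(time, smoker)] = (color_of[time], symbol_of[smoker])
    return out


combos = [("Dinner", "No"), ("Dinner", "Yes"), ("Lunch", "No"), ("Lunch", "Yes")]
joint = joint_styles(combos)
per_column = per_column_styles(combos)
```

In `joint`, all four series get a distinct color and a distinct symbol. In `per_column`, both `Dinner` series share a color and both `No` series share a symbol, which is what makes related groups easy to spot.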
I removed the "for which `by` is an alias" part - hopefully that cleans up the confusion. As far as using `by_vars`, that seems like something that should belong in the expanded version of this doc once the full thing is written, and not necessarily in the introduction. What do you think?
Yeah, shouldn't be in the intro, that's fine
-Under the hood, the Deephaven query engine performs a `parition_by` table operation on the given color column to create each series. This efficient implementation means that plots with multiple groups can easily scale to tens of millions or billions of rows with ease.
+Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given grouping column to create each series. This efficient implementation means that plots with multiple groups can easily scale to tens of millions or billions of rows with ease.
Ultimately, `partition_by` should be linked, but I don't know that @dsmmcken is far enough down the new impl to worry about this yet.
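For readers unfamiliar with the idea, the following is a conceptual sketch of what partitioning by a grouping column means: rows are split into one group per unique key, and each group becomes a series. It is plain Python over a list of dicts, not the Deephaven engine's `partition_by`, and the helper `partition_rows` is a hypothetical name.

```python
from collections import defaultdict


def partition_rows(rows, by):
    """Split rows (dicts) into one list per unique value of the `by` column(s)."""
    keys = [by] if isinstance(by, str) else list(by)
    partitions = defaultdict(list)
    for row in rows:
        # The key is a tuple so that one or several grouping columns work alike.
        partitions[tuple(row[k] for k in keys)].append(row)
    return dict(partitions)


rows = [
    {"sym": "CAT", "price": 10.0},
    {"sym": "DOG", "price": 20.0},
    {"sym": "CAT", "price": 11.0},
]
series = partition_rows(rows, "sym")  # one entry per unique `sym` value
```

Each value in `series` holds the rows for one plotted series, which mirrors the "one series for each unique value" wording in the doc.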
Minimum required docs for the plotly-express plugin. Here are the outstanding items:
- `ecdf` once it is implemented.

Directions for testing:
As of 7/17, everything needed for testing is baked into a release. Here's a simple testing environment using pip-installed DH.