-
-
Notifications
You must be signed in to change notification settings - Fork 943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(number): add exponentialDistribution function #3375
base: next
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for fakerjs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## next #3375 +/- ##
==========================================
- Coverage 99.97% 99.97% -0.01%
==========================================
Files 2811 2811
Lines 217035 217062 +27
Branches 940 950 +10
==========================================
+ Hits 216982 217007 +25
- Misses 53 55 +2
|
src/modules/number/index.ts
Outdated
* The following table shows the rough distribution of values generated using `Math.floor(exponentialDistribution({ min: 0, max: 10, base: x }))`: | ||
* | ||
* | Value | Base 0.1 | Base 0.5 | Base 1 | Base 2 | Base 10 | | ||
* | :---: | -------: | -------: | -----: | -----: | ------: | | ||
* | 0 | 4.1% | 7.4% | 10.0% | 13.8% | 27.8% | | ||
* | 1 | 4.5% | 7.8% | 10.0% | 12.5% | 16.9% | | ||
* | 2 | 5.0% | 8.2% | 10.0% | 11.5% | 12.1% | | ||
* | 3 | 5.7% | 8.7% | 10.0% | 10.7% | 9.4% | | ||
* | 4 | 6.6% | 9.3% | 10.0% | 10.0% | 7.8% | | ||
* | 5 | 7.8% | 9.9% | 10.0% | 9.3% | 6.6% | | ||
* | 6 | 9.4% | 10.7% | 10.0% | 8.8% | 5.7% | | ||
* | 7 | 12.1% | 11.5% | 10.0% | 8.2% | 5.0% | | ||
* | 8 | 16.9% | 12.6% | 10.0% | 7.8% | 4.5% | | ||
* | 9 | 27.9% | 13.8% | 10.0% | 7.5% | 4.1% | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have these tables here or in the parameter descriptions?
If we have them here we can show them on the documentation.
If we have them in the options we cannot display them.
IMO we should not have them in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should leave the tables here so they show up in the docs.
I'm also not sure whether |
I don't like calling it "exponentialDistribution " because you're not returning a distribution, you are returning a single value from that distribution. So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value. |
* faker.number.exponentialDistribution(10) // 1.656598169056771 | ||
* faker.number.exponentialDistribution({ min: 10, max: 100 }) // 88.7273250669911 | ||
* faker.number.exponentialDistribution({ min: 0, max: 100, base: 10 }) // 6.9442760672808745 | ||
* faker.number.exponentialDistribution({ min: 0, max: 100, bias: 10 }) // 67.03715679154617 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure adding a second way (bias) of achieving the same results is that useful; would prefer picking either bias or base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base
is the mathematical basis that the function uses, however it is kind of hard to use because to invert the effect you have to base = 1/base
and big base => probably low value.
Bias is intended as a non mathematical way to define the base, where if you want to invert the value you can just bias = -bias
and big bias => probably big value.
I would prefer to keep base, because it is the actual foundation as the implementation,
and I would prefer to keep bias, due to its ease of use.
But that's just my opinion and could be changed.
I'm not even sure that is the case.
Good idea. I'm not finally sure which term I would like to use but stepping away from calling it I'm currently considering one of these:
|
* | ||
* @since 9.5.0 | ||
*/ | ||
exponentialDistribution( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes from my side (I'll leave a comment at the code to have a separate discussion block):
- The functions seems out of place in this module. All function names so far have had a technical nature to them while this one is more mathmatically named. Following that schema
number.int
would benumber.natural
ornumber.float
would benumber.rational
. Not that dramatic, but something I noticed after we tried organizing the modules in v8 so neatly. - Is it correct to make a new function at all. Reading the original issue, the usecase described a distribution of integers along an exponential scale. This function however returns floats. IMO, integrating the parameters into the existing methods is the most logical option. The default could be set to return a unionized value distribution (base = 1) to not introduce breaking changes.
2.1. When the parameters are intrgrated into the other functions they should maybe grouped in a separate, optional object. Since I would consider this API pretty advanced for the usual user, I'd "hide" it in an extra nested layer. Additionally, I could think of faker providing some utility function or constants that have predefined, commonly used distributions:
import { faker, exponentialDistibution, linearDistibution } from '@faker-js/faker';
console.log(exponentialDistibution()); // { bias: -1, base: 10 }
console.log(exponentialDistibution({ inverted: true })); // { bias: -1, base: 0.1 }
faker.nuber.int({ distibution: exponentialDistibution() });
faker.nuber.float({ distibution: exponentialDistibution({ inverted: true }) });
faker.nuber.float({ distibution: linearDistibution() });
What do you think? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there is logic in making it an option for faker.number.float() etc instead
The only downside is when the really basic functions like float() and int() have too many options it could make it overwhelming for new users to read the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree on your reasoning. Thats why I propsed an additional nested layer in the options. That way the arguments bias
and base
do not bleed into the first level of options. Additionally, adding some overloads to the function could work. Make one overload (the default) with less arguments in the options that are used more often and one overload with an options object that represents a superset of the default, simple one.
type SimpleOptions = { min: number; max: number };
type AdvancedOptions = SimpleOptions & { distibution: { base: number; bias: number } };
class NumberModule {
int(): number; // no args = default args
int(options: AdvancedOptions): number; // advanced arguments, put first to not get picked up by IDE (IDE usually picks last overload as default)
int(options: SimpleOptions): number; // simple arguments, last to be the default
int(options?: AdvancedOptions = {}): number { // implementation can just use optional advanced options since it includes all possible cases
// ...
}
// ...
}
The types are separated for representation purposes here, I'm not sure if our docs already allow for non inline type definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distribution: exponentialDistribution({base: 2})
option seems like a good solution.
type Distribution = (options: {random, min, max}): number
That way other developers can add their own distributions as needed.
I'll try it soon.
I fear AdvancedOptions
might be complicated to implement for our api docs.
Currently we just use the last (api) signature.
faker/scripts/apidocs/output/page.ts
Line 207 in 51dddb8
const signatureData = required(signatures.at(-1), 'method signature'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way other developers can add their own distributions as needed.
Thats the idea. Although, I'd propably provide some common use caes out of the box. Thats not a requirement for me, tho.
I fear AdvancedOptions might be complicated to implement for our api docs.
Currently we just use the last (api) signature.
No worries. Lets try to focus on designing the best possible API for our users first. We can discuss the handling of the documentation afterwards, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a ton of different probability distributions and they might all have different numbers of parameters and traditional names for their parameters: https://en.wikipedia.org/wiki/Probability_distribution - not to mention some make sense for discrete distributions (like ints) and some for continuous distributions (like floats).
So i dont think you'll be able to have a standardised way to refer to the parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good argument. I was probably looking too naively at the problem at hand.
It would still be awesome if we could solve the problem without introducing a new method. I stated the module specification problem in my first comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats why I try to implement the distribution option as a function, not data.
function int(options: { distribution: (fakerCore) => number, ....}) {
f() ∈ [0, 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another thing that occurred to me; in many cases when generating a small finite number of different ints, developers may find it easier to just use weightedArrayElement and just manually set the percentage chance of getting each value.
So i think this is actually much more useful in the float case, when you can't easily do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the usecases for this method is the regexp method: specifically for *
lengths.
Team decision The feature will be implemented by extending the already existing methods ( We are currently unsure how to name the parameter. While distribution is okay, we are not sure if this is the mathmatical correct term. |
Implements #1862
Adds a function that generates values according to an exponential distribution.
The following table shows the rough distribution of values generated using
Math.floor(exponentialDistribution({ min: 0, max: 10, base: x }))
:The exponential distribution is achieved by using
base ** exponent
where exponent is a random float between 0 and 1.The result is then stretched evenly to fit to
min
andmax
.