Data parameterization is the process of turning test values into reusable parameters, for example, through variables and shared arrays.
This page gives some examples of how to parameterize data in a test script. Parameterization is typically necessary when Virtual Users (VUs) will make a POST, PUT, or PATCH request in a test. You can also use parameterization when you need to add test data from a separate file.
Parameterization helps to prevent server-side caching from impacting your load test. This will, in turn, make your test more realistic.
Each VU in k6 is a separate JS VM, so without sharing, each VU would hold its own copy of the whole data file. SharedArray was added to prevent this. Accessing its elements has some CPU overhead compared to a normal, non-shared array, but the difference is negligible compared to the time it takes to make requests. The overhead matters even less with large files, because without SharedArray k6 could use so much memory that your script fails to start or aborts midway when system resources are exhausted.
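As a minimal sketch of how this can look in a script, the loader below parses a JSON file once and returns the array that SharedArray would share between VUs. The file name users.json, its users key, and the helper name parseUsers are assumptions for illustration. The k6-specific wiring is shown in comments because it only runs inside k6.

```javascript
// Hypothetical helper: turns the text of a JSON data file into the array
// that will be shared between VUs. The "users" key is an assumed layout.
function parseUsers(jsonText) {
  const parsed = JSON.parse(jsonText);
  if (!Array.isArray(parsed.users)) {
    // The SharedArray callback has to return an array.
    throw new TypeError('expected the data file to contain a "users" array');
  }
  return parsed.users;
}

// Inside a k6 script, the wiring would look roughly like this (init context):
//
//   import { SharedArray } from 'k6/data';
//
//   const users = new SharedArray('users', function () {
//     // open() is only available in the init context
//     return parseUsers(open('./users.json'));
//   });
//
//   export default function () {
//     const user = users[0]; // elements are read like a normal array
//   }
```

Keeping the parsing in a plain function makes it possible to check the data layout outside of k6 before running a load test.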
For example, the Cloud service allocates 8GB of memory for every 300 VUs. If your files are large enough and you are not using SharedArray, your script might run out of memory at some point. Additionally, even if there is enough memory, k6 has a garbage collector (as it's written in Go) that walks through all accessible objects (including JS ones) to figure out which need to be collected. For big JS arrays copied hundreds of times, this adds quite a lot of additional work.
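A back-of-the-envelope calculation makes the figures above concrete. This is only a rough sketch: the function names are hypothetical, and the in-memory size of parsed data is something you would need to measure yourself, since it is often larger than the file on disk.

```javascript
// Rough per-VU memory budget in MB, given a total allocation in GB.
function memoryPerVuMB(totalGB, vus) {
  return (totalGB * 1024) / vus;
}

// Rough memory in MB taken by per-VU copies of a data set.
// parsedSizeMB is the measured in-memory size of the parsed data (assumed input).
function copiesCostMB(parsedSizeMB, vus) {
  return parsedSizeMB * vus;
}

// 8GB shared by 300 VUs leaves a little over 27MB per VU.
const budget = memoryPerVuMB(8, 300);

// A data set that takes 30MB once parsed, copied into 300 VUs, needs 9000MB,
// which already exceeds the 8192MB allocation.
const copies = copiesCostMB(30, 300);
```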
A note on performance characteristics of SharedArray can be found within its API documentation.
k6 doesn't parse CSV files natively, but you can use an external library, Papa Parse.
You can download the library and import it locally like this:
Or you can grab it directly from jslib.k6.io like this.
Here's an example using Papa Parse to parse a CSV file of username/password pairs and using that data to log in to the test.k6.io test site:
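The sketch below illustrates the shape of the data in such a script. To keep it runnable outside k6, it uses a naive CSV parser as a stand-in for Papa Parse (no quoted fields or escaped commas), so treat it as an illustration rather than a replacement. The file name data.csv, the local papaparse.js path, the login URL, and the form field names in the comments are assumptions.

```javascript
// Naive stand-in for papaparse.parse(text, { header: true }).data.
// It does not handle quoted fields or escaped commas.
function parseSimpleCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0].split(',');
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    const row = {};
    header.forEach((name, i) => {
      row[name] = cells[i];
    });
    return row;
  });
}

// Pick a random row, for example a random username/password pair per iteration.
function pickRandom(rows) {
  return rows[Math.floor(Math.random() * rows.length)];
}

// In a k6 script, the real parser and the data file could be wired roughly as:
//
//   import http from 'k6/http';
//   import { SharedArray } from 'k6/data';
//   import papaparse from './papaparse.js'; // assumed local copy of Papa Parse
//
//   const csvData = new SharedArray('users', function () {
//     return papaparse.parse(open('./data.csv'), { header: true }).data;
//   });
//
//   export default function () {
//     const user = pickRandom(csvData);
//     // URL and field names are placeholders for the site under test
//     http.post('https://test.k6.io/login.php', {
//       login: user.username,
//       password: user.password,
//     });
//   }
```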
It is often a requirement not to use the same data more than once in a test. With the help of k6/execution, which includes a property scenario.iterationInTest, you can retrieve unique rows from your data set.
The scenario.iterationInTest property is unique per scenario, not across the overall test. If your test has multiple scenarios, you might need to split your data per scenario.
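A minimal sketch of both ideas follows: one row per iteration, and a separate slice of the data per scenario. The helper names, the scenario name, and the executor settings in the comments are assumptions for illustration.

```javascript
// Returns the row reserved for a given zero-based iteration number, and
// throws when the data set has fewer rows than the test has iterations.
function rowForIteration(rows, iterationInTest) {
  if (iterationInTest >= rows.length) {
    throw new RangeError(`no unused row left for iteration ${iterationInTest}`);
  }
  return rows[iterationInTest];
}

// Hypothetical helper for tests with several scenarios: gives the scenario
// at scenarioIndex (zero-based) its own contiguous slice of the data.
function sliceForScenario(rows, scenarioIndex, scenarioCount) {
  const size = Math.ceil(rows.length / scenarioCount);
  return rows.slice(size * scenarioIndex, size * (scenarioIndex + 1));
}

// In a k6 script, the wiring could look roughly like this:
//
//   import { scenario } from 'k6/execution';
//
//   export const options = {
//     scenarios: {
//       'use-all-the-data': {
//         executor: 'shared-iterations',
//         vus: 10,
//         iterations: users.length, // one iteration per row
//         maxDuration: '1h',
//       },
//     },
//   };
//
//   export default function () {
//     const user = rowForIteration(users, scenario.iterationInTest);
//   }
```

Setting the number of iterations to the number of rows is one way to make sure the index never runs past the end of the data.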
Alternatively, if your use case requires using a unique data set per VU, you could leverage a property called vu.idInTest.
The following example uses the per-vu-iterations executor to ensure that every VU completes a fixed number of iterations.
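As a sketch of this per-VU approach, the helpers below map each VU to its own row and build matching options. The helper names, the scenario name login, and the maxDuration value are assumptions; the k6 wiring is shown in comments.

```javascript
// vu.idInTest starts at 1, so subtract 1 to get a zero-based array index.
function rowForVU(rows, idInTest) {
  const index = idInTest - 1;
  if (index < 0 || index >= rows.length) {
    throw new RangeError(`no row available for VU ${idInTest}`);
  }
  return rows[index];
}

// Builds options with one VU per row, each running a fixed number of iterations.
function buildOptions(rows, iterationsPerVU) {
  return {
    scenarios: {
      login: {
        executor: 'per-vu-iterations',
        vus: rows.length,
        iterations: iterationsPerVU,
        maxDuration: '1h30m',
      },
    },
  };
}

// In a k6 script, the wiring could look roughly like this:
//
//   import { vu } from 'k6/execution';
//
//   export const options = buildOptions(users, 20);
//
//   export default function () {
//     const user = rowForVU(users, vu.idInTest);
//   }
```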
See this example project on GitHub showing how to use faker.js to generate realistic data at runtime.
The following section is kept for historical reasons. Between v0.27.0 and v0.30.0, the techniques below were the only way to lower the memory usage of k6 while still having access to a lot of parameterization data, with some caveats. They should probably not be used anymore, as SharedArray should be sufficient.
After k6 v0.27.0, there was still no way to share memory between VUs, but the __VU variable became defined during the init context. This meant the data could be split between the VUs during initialization, without multiple copies of it during the test run. This is not useful now that SharedArray exists. Combining both will likely not bring any more performance benefit than using just SharedArray.
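The splitting described above can be sketched as a small helper that picks one part of the data for each VU. This is an illustration under assumptions, not the original script: the helper name, the file name users.csv, and the local papaparse.js path are hypothetical, and the value of splits is only an example.

```javascript
// Returns the part of the data assigned to a VU. The data is divided into
// `splits` parts, and VUs are mapped onto parts by VU number modulo splits.
function sliceForVU(rows, vuId, splits) {
  const partSize = Math.ceil(rows.length / splits);
  const part = vuId % splits;
  return rows.slice(partSize * part, partSize * (part + 1));
}

// In a k6 script, this would run in the init context, roughly like this:
//
//   import papaparse from './papaparse.js'; // assumed local copy of Papa Parse
//
//   const splits = 100; // into how many parts the data is split
//   const csvData = sliceForVU(
//     papaparse.parse(open('./users.csv'), { header: true }).data,
//     __VU,
//     splits
//   );
```

Each VU still parses the whole file during initialization, but keeps only its own slice afterwards.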
With a data file of 100k lines and a total of 4.8MB, the script uses 3.5GB to start 300 VUs, while without splitting, 100 VUs (with all the data for each VU) require nearly 10GB. For direct comparison, 100 VUs with splitting used close to 2GB of memory.
Playing with the value for splits will give a different balance between memory used and the amount of data each VU has.
A second approach is to pre-split the data into separate files and load and parse only the one needed by each VU.
The files have 10k lines and are 128KB in total. Running 100 VUs with this script takes around 2GB, while running the same with a single file takes upwards of 15GB.
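A minimal sketch of the pre-split approach is shown below. The file names and the helper name are hypothetical, and the k6 wiring appears in comments because open() and __VU only exist inside k6.

```javascript
// Picks the data file for a VU by VU number modulo the number of files, so
// each VU loads and parses a single, smaller file.
function dataFileForVU(files, vuId) {
  return files[vuId % files.length];
}

// In a k6 script, this would run in the init context, roughly like this:
//
//   const dataFiles = ['./data1.json', './data2.json', './data3.json'];
//   const data = JSON.parse(open(dataFileForVU(dataFiles, __VU)));
```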
Either approach works for both JSON and CSV files, and the two can be combined, which will probably reduce the memory pressure during initialization even further.