Some do not change too much, but the Tobacco ones really change a lot.
I love that Black Honey fresh.
A good way to truly test this would be to work with one, maybe two flavors at a time. Mix a bottle of Test A, which will go in the cleaner for 'x' amount of time. After that time has passed, mix a bottle of Test B. This bottle will follow Test A into the 'steep' chamber for a designated time (obviously less time than Test A, but more than Test C). Test C will be mixed after Test A and Test B have been given their allotted time in the 'steep' chamber, and it will be the 'Fresh Test'.
Now, have 3 separate batteries (if feasible; if not, then have at least 3 attys/cartos) to taste test each batch separately. It might be easier just to do two tests, one steeped and one fresh. However, if you were to add the third, to see whether the prolonged steeping helped or not, I would recommend making the time differences relatively substantial, so that the extra steeping time has a real chance to make a noticeable difference.
Science
So true. And it doesn't help that this is a very virgin technology, if you will.
I think that in situations like ours, with literally hundreds of flavorings to be tested and tried by the public, each flavoring would need to be assessed on its own, rather than grouping them all into one category (for example, alcohol-based flavorings).