Ad Hoc Examples
Before we look at a real-world example, let's look at the structure of two programs that implement ad-hoc (application-level) C/R, written in pseudocode. In both examples, the functions save and load are used to save and load a single value, respectively (this value could be of any type, such as a matrix of decimal numbers). In the first example, the program performs a couple of long operations, where each operation depends on the result of the previous one. As a tradeoff between the complexity of implementation and the granularity of save points, only the final result of each large operation is saved. It is assumed that each variable is saved to a file of the same name with the ".save" extension.
if (not exists intermediateResult) {
    if exists File("intermediateResult.save") {
        intermediateResult = load("intermediateResult.save")
    } else {
        intermediateResult = longComputation(inputData)
        save(intermediateResult)
    }
}
finalResult = computeFinalResult(intermediateResult)
save(finalResult)
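For readers who prefer something runnable, the same two-stage pattern can be sketched in MATLAB. This is only an illustration under stated assumptions: longComputation and computeFinalResult are hypothetical placeholders for the real work, and the sketch uses MATLAB's native ".mat" extension rather than ".save".

```matlab
% Sketch of the two-stage pattern above. longComputation and
% computeFinalResult are hypothetical stand-ins for the real work.
longComputation    = @(x) cumsum(x);   % placeholder "expensive" step
computeFinalResult = @(x) sum(x);      % placeholder final step

inputData = 1:10;
ckptFile  = 'intermediateResult.mat';  % assumed checkpoint file name

if exist(ckptFile, 'file') == 2
    % Restore: reuse the result of an earlier (possibly interrupted) run.
    s = load(ckptFile, 'intermediateResult');
    intermediateResult = s.intermediateResult;
else
    % No checkpoint yet: do the long computation, then save its result.
    intermediateResult = longComputation(inputData);
    save(ckptFile, 'intermediateResult');
end

finalResult = computeFinalResult(intermediateResult);
save('finalResult.mat', 'finalResult');
```

If the script is interrupted after the first save and then rerun, the long computation is skipped and only the final step is repeated.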
In the next example, data is saved incrementally during a loop. One apparent problem is that the save function, as called, will likely write the entire outputData vector on every iteration. Additional care should be taken when using or implementing a save procedure to avoid extreme and unnecessary overhead.
startIndex = 0
if (not exists outputData) {
    if exists File("outputData.save") {
        outputData = load("outputData.save")
    }
}
if (exists outputData) {
    startIndex = first ii such that outputData(ii) is uninitialized
}
for (ii=startIndex; ii < 10000; ii++) {
    outputData(ii) = runComputation(inputData(ii))
    save(outputData)
}
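One way to reduce the save overhead mentioned above is to checkpoint only every so many iterations and to store the resume index explicitly rather than searching for the first uninitialized entry. The following MATLAB sketch shows the idea; runComputation, saveEvery, nextIndex, and the file name are illustrative assumptions, not part of the original example.

```matlab
% Sketch: checkpoint every saveEvery iterations instead of every iteration.
% runComputation, saveEvery and the file name are illustrative assumptions.
runComputation = @(x) x.^2;            % placeholder for the real work
inputData = 1:1000;
nTotal    = numel(inputData);
saveEvery = 100;                       % trade-off: lost work vs. save overhead
ckptFile  = 'outputData.mat';

if exist(ckptFile, 'file') == 2
    % Resume: load the partial results and the index to continue from.
    s = load(ckptFile, 'outputData', 'nextIndex');
    outputData = s.outputData;
    startIndex = s.nextIndex;
else
    % Fresh start: preallocate and begin at the first element.
    outputData = nan(1, nTotal);
    startIndex = 1;
end

for ii = startIndex:nTotal
    outputData(ii) = runComputation(inputData(ii));
    if mod(ii, saveEvery) == 0 || ii == nTotal
        nextIndex = ii + 1;            % first index still to be computed
        save(ckptFile, 'outputData', 'nextIndex');
    end
end
```

With this choice, an interruption loses at most saveEvery iterations of work. MATLAB's matfile interface, which can write part of a variable to a version 7.3 MAT-file, is another option that is not shown here.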
In the following MATLAB example, we show some excerpts of code from a research project that used ad-hoc C/R. It is not necessary to read the code in detail; it merely shows what an undesirable example may look like. Of the two pseudocode listings above, it more closely resembles the first. The relevant parts are the save procedures, which use MATLAB's save function. The first two calls to save are meant to store incrementally generated data, whereas the last two calls to save merely store the final results of the computation performed by the script.
function epiData = variedUptakeEpiSim( ...
    model, method, savename, rxnid, A, fredux, WTFlux, grData ...
)
%%% initialization code removed from snippet %%%
% optional "checkpoint" arg WTFlux was not provided, so recompute
if nargin < 7
    WTFlux = zeros(lA,nrxn);
    parfor i=1:lA
        % ...
        disp(strcat('Finished geometric WT sim ',num2str(i)));
    end
    potentialFail = abs(WTFlux(:,[536,1577])) < 1e-7;
    pFsum = sum(potentialFail(:,1) & potentialFail(:,2));
    save(strcat(savename,'_Flux.mat'),'WTFlux');
    if pFsum > 0
        error(strcat(num2str(pFsum),' unlikely solutions encountered from geoFBA'))
    end
end
% optional "checkpoint" arg grData was not provided, so recompute
if nargin < 8
    grData = zeros(lA,ngen,ngen);
    disp(lA)
    for i=1:lA
        mtmp = model;
        % ...
        dtmp = doubleGeneMutationIsoAvg(mtmp, method, [fredux], [fredux], ...
            squeeze(WTFlux(i,:)), dlvl);
        grData(i,:,:) = dtmp;
        disp(i);
    end
    save(strcat(savename,'_gr.mat'),'grData');
end
disp('Starting Epistasis Loop');
for i=1:lA
    % ...
    epiData(i,:,:) = squeeze(eEffect(:,:,cellij,cellij));
    dlmwrite(strcat(savename,num2str(A(i))), squeeze(grRateKOTens(:,:,cellij,cellij)));
    dlmwrite(strcat('epi',savename,num2str(A(i))), squeeze(epiData(i,:,:)));
end
save(strcat(savename,'_epi.mat'),'epiData');
uptakeFlux=A;
save(strcat(savename,'_rxn',num2str(rxnid),'.mat'),'uptakeFlux');
Note that something as simple as a typo in one of the first two save statements would not only cause a run-time error but would also prevent the save statement itself from being executed. The result could be a severe loss of computation time, which is precisely what you wanted to avoid. To guard against this, test your code at a small scale before running it at a large scale.
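A small-scale test can be as simple as exercising the save and load calls on tiny stand-in data before launching the long run. The MATLAB sketch below is one possible way to do this; the savename prefix and the array shape are illustrative assumptions rather than values from the research code.

```matlab
% Sketch: exercise the save/restore path on tiny data before a long run.
% The savename prefix and the array shape are illustrative assumptions.
savename = fullfile(tempdir, 'smoketest');
WTFlux   = rand(2, 3);                    % tiny stand-in for the real array

ckptFile = strcat(savename, '_Flux.mat');
save(ckptFile, 'WTFlux');                 % same call shape as the real script

restored = load(ckptFile, 'WTFlux');      % read the checkpoint back
assert(isequal(restored.WTFlux, WTFlux), ...
    'Checkpoint round trip failed for %s', ckptFile);

delete(ckptFile);                         % clean up the test file
disp('Checkpoint round trip OK');
```

A typo in the file name or variable name would surface here in seconds rather than after hours of computation.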
Where is the restore procedure? In this case, there isn't one: the data variables (WTFlux and grData) are optional function arguments, so they must be loaded separately by the caller, as needed.
However, adding optional function arguments isn't enough: we must still check for these arguments with nargin conditionals to make sure we don't regenerate data unnecessarily. Because nargin depends on the order of the function arguments, this gives rise to inflexible and error-prone code. Instead, consider using varargin or the inputParser class as better alternatives to nargin, or similar approaches in languages other than MATLAB.
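As a rough illustration of the inputParser alternative, the sketch below accepts the two checkpoint variables as optional name/value pairs, so their presence no longer depends on argument position. The function name epiSimSketch and its placeholder computations are hypothetical; only the inputParser calls themselves are standard MATLAB.

```matlab
function out = epiSimSketch(model, savename, varargin)
% Sketch: name/value "checkpoint" arguments instead of nargin checks.
% epiSimSketch and its placeholder computations are hypothetical.
p = inputParser;
addRequired(p, 'model');
addRequired(p, 'savename', @ischar);
addParameter(p, 'WTFlux', [], @isnumeric);   % optional checkpoint
addParameter(p, 'grData', [], @isnumeric);   % optional checkpoint
parse(p, model, savename, varargin{:});

WTFlux = p.Results.WTFlux;
if isempty(WTFlux)                 % not supplied, so recompute and save
    WTFlux = zeros(2, 3);          % placeholder for the real computation
    save(strcat(savename, '_Flux.mat'), 'WTFlux');
end

grData = p.Results.grData;
if isempty(grData)                 % not supplied, so recompute and save
    grData = zeros(2, 2, 2);       % placeholder for the real computation
    save(strcat(savename, '_gr.mat'), 'grData');
end

out = struct('WTFlux', WTFlux, 'grData', grData);
end
```

A caller that already has one checkpoint on disk could load it and pass only that one, for example epiSimSketch(model, 'run1', 'grData', s.grData) after s = load('run1_gr.mat'). Either checkpoint can be supplied without the other, which the positional nargin checks do not allow.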