Transfer correlations between generated jobs
Different ATLAS jobs often need the same input files, so the majority of ATLAS jobs have transfer correlations with one another, sharing part or all of their input sets. Given the number of jobs and files, even checking this correlation for every job is very time- and resource-consuming, so we did not add this inter-job dependency to the parameters of our model. On the other hand, sharing input files is a distinctive feature of the ATLAS workflow, so we needed a way to check whether the developed model naturally reproduces some of these transfer correlations simply by sampling the other parameters from their custom CDFs. To do this, we assess transfer correlations within and between categories of jobs. As before, we group jobs by their number of input files, because this is the crucial job characteristic in our model. Using job categories does not give us the exact transfer correlation between every pair of jobs in the workflow, but it does let us assess some properties of these correlations indirectly.
Jobs-to-files ratio within categories
The first method computes, for each category of jobs, the ratio of the number of jobs to the number of unique input files they request. This gives some indication of how strongly jobs in a category tend to reuse their input files, and thus share them among themselves. The table below lists all job categories and this ratio for the real and generated workloads. Some categories almost never appear in the generated sample of 50,000 jobs; they are marked "N/A" in the table.
Category (input files per job) | Jobs/files ratio, real sample | Jobs/files ratio, generated |
---|---|---|
1 file | 1.01 | 1.00 |
2 files | 261 | 139.5 |
3 files | 1.2 | 1.06 |
4 files | 8.04 | 2.52 |
5 files | 1.26 | 0.93 |
6-8 files | 0.34 | 0.29 |
9-11 files | 0.21 | 0.17 |
12 files | 1.0 | 0.75 |
13 files | 0.123916 | 0.108 |
14-24 files | 0.103159 | 0.075 |
25-42 files | 0.073541 | 0.05 |
43 files | 0.025121 | 0.02 |
44 files | 0.032795 | 0.02 |
45 files | 0.104976 | 0.05 |
46 files | 0.025875 | 0.02 |
47-51 files | 0.027369 | 0.02 |
52 files | 0.030782 | 0.01 |
53 files | 0.039050 | 0.019 |
54 files | 0.023421 | 0.019 |
55-62 files | 0.022359 | 0.017 |
63 files | 0.032992 | 0.016 |
64-68 files | 0.023484 | 0.016 |
69 files | 0.015040 | 0.014 |
70-81 files | 0.019292 | 0.013 |
82-83 files | 0.035016 | 0.013 |
84 files | 0.024709 | 0.0125 |
85 files | 0.024903 | 0.0126 |
86-101 files | 0.014388 | 0.01 |
102-103 files | 0.011046 | 0.009 |
104 files | 0.013705 | 0.009 |
105 files | 0.019385 | 0.009 |
106 files | 0.017467 | 0.009 |
107 files | 0.013596 | N/A |
108-124 files | 0.014849 | 0.009 |
125 files | 0.018162 | N/A |
126-127 files | 0.023574 | 0.0082 |
128-133 files | 0.010216 | 0.0077 |
134 files | 0.014155 | 0.007 |
135-136 files | 0.010471 | N/A |
137 files | 0.013514 | N/A |
138-152 files | 0.008471 | 0.007 |
153 files | 0.039924 | 0.006 |
154-170 files | 0.007351 | 0.0064 |
171 files | 0.006311 | 0.005 |
172-202 files | 0.006680 | 0.005 |
203 files | 0.010716 | 0.006 |
204 files | 0.005179 | N/A |
205-404 files | 0.006162 | 0.003 |
405 files | 0.005258 | N/A |
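The per-category ratio can be sketched as follows. This is a minimal illustration, not the code used to produce the table: it assumes each job is represented as a Python set of input-file identifiers and that categories are given as inclusive (low, high) file-count bins such as (6, 8). The function names `category_label` and `jobs_to_files_ratio` are hypothetical. One point helps when reading the table: if every job in a k-file category requested k files that no other job used, the ratio would be exactly 1/k, so values in the many-file categories are best compared against that baseline rather than against 1.

```python
from collections import defaultdict


def category_label(n_files, bins):
    """Return a label such as "4" or "6-8" for the (low, high) bin containing n_files.

    Returns None if n_files falls outside every bin (hypothetical binning scheme).
    """
    for low, high in bins:
        if low <= n_files <= high:
            return f"{low}" if low == high else f"{low}-{high}"
    return None


def jobs_to_files_ratio(jobs, bins):
    """Per category: (number of jobs) / (number of unique input files they request)."""
    n_jobs = defaultdict(int)        # category -> number of jobs
    unique_files = defaultdict(set)  # category -> distinct file IDs requested
    for job in jobs:
        cat = category_label(len(job), bins)
        if cat is None:              # file count not covered by any bin: skip the job
            continue
        n_jobs[cat] += 1
        unique_files[cat].update(job)
    return {cat: n_jobs[cat] / len(unique_files[cat])
            for cat in n_jobs if unique_files[cat]}


if __name__ == "__main__":
    bins = [(1, 1), (2, 2), (3, 5)]
    jobs = [{"a"}, {"a"}, {"b"}, {"c", "d"}, {"c", "d"}, {"e", "f", "g"}]
    ratios = jobs_to_files_ratio(jobs, bins)
    print(ratios["1"])    # 1.5: three 1-file jobs share two distinct files
    print(ratios["2"])    # 1.0: two 2-file jobs request the same two files
    print(ratios["3-5"])  # one 3-file job with no sharing gives 1/3
```

In the small example, the single unshared 3-file job lands exactly on the 1/k baseline, while the 2-file category reaches 1.0 only because both of its jobs request the same pair of files.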
File sharing between categories
The second method checks what percentage of files each pair of job categories shares. This gives insight into file sharing across categories. The aggregated table shows, for each category, its overlap with every other category.
See the aggregated table
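A pairwise overlap of this kind can be sketched as below. This is an assumption-laden illustration: the text does not say how the percentage is normalized, so here the overlap of category A with category B is taken as the share of A's unique files that also appear in B, which makes the result asymmetric. A symmetric alternative would be the Jaccard index, |A ∩ B| / |A ∪ B|. Jobs are again assumed to be sets of file identifiers, and `files_by_category` and `category_overlap` are hypothetical names.

```python
from collections import defaultdict
from itertools import permutations


def files_by_category(jobs, category_of):
    """Collect the unique input files requested by the jobs of each category.

    category_of maps a job's input-file count to a category label (or None to skip).
    """
    groups = defaultdict(set)
    for job in jobs:
        cat = category_of(len(job))
        if cat is not None:
            groups[cat].update(job)
    return dict(groups)


def category_overlap(groups):
    """Percentage of category A's unique files that also appear in category B.

    ASSUMPTION: normalized by A's file count, so overlap[(A, B)] and
    overlap[(B, A)] generally differ.
    """
    overlap = {}
    for a, b in permutations(groups, 2):   # every ordered pair of distinct categories
        fa, fb = groups[a], groups[b]
        overlap[(a, b)] = 100.0 * len(fa & fb) / len(fa) if fa else 0.0
    return overlap


if __name__ == "__main__":
    jobs = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"x", "y", "z"}]
    groups = files_by_category(jobs, lambda n: f"{n} files")
    for pair, pct in sorted(category_overlap(groups).items()):
        print(pair, pct)
    # ('2 files', '3 files') 100.0
    # ('3 files', '2 files') 50.0
```

The asymmetry in the example is the point of this normalization choice: every file used by the 2-file jobs also appears among the 3-file jobs, but only half of the 3-file category's files are reused by 2-file jobs.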