Dmytro Karpenko

ATLAS Grid Workload on NDGF resources: analysis and modeling


Transfer correlations between generated jobs

Different ATLAS jobs often need the same input files, so the majority of ATLAS jobs are correlated through their transfers: they share their input sets partly or completely. Given the number of jobs and files, checking this correlation for every job is very time- and resource-consuming, so we did not include this dependency between jobs among the parameters of our model. On the other hand, sharing input files is a distinctive feature of the ATLAS workflow, so we needed a way to check whether the developed model naturally reproduces some of these transfer correlations simply by following the custom CDFs of the other parameters. To this end we assess transfer correlations within and between categories of jobs. As before, we divide jobs by number of input files, because this is the crucial job characteristic in our model. Job categories do not give the exact transfer correlation between every pair of jobs in the workflow, but they let us assess some properties of these correlations indirectly.

Jobs to files ratio inside category


The first method computes, for each category of jobs, the ratio of the number of jobs to the number of unique input files they request. This gives some impression of how much the jobs in a category tend to reuse, and thus share, their input files. The table below lists this ratio for every category in the real and generated workloads. Some categories of jobs almost never appear in the generated sample of 50000 jobs; they are marked "N/A" in the table.

Category (input files)  Real sample  Generated
1                       1.01         1.00
2                       261          139.5
3                       1.2          1.06
4                       8.04         2.52
5                       1.26         0.93
6-8                     0.34         0.29
9-11                    0.21         0.17
12                      1.0          0.75
13                      0.123916     0.108
14-24                   0.103159     0.075
25-42                   0.073541     0.05
43                      0.025121     0.02
44                      0.032795     0.02
45                      0.104976     0.05
46                      0.025875     0.02
47-51                   0.027369     0.02
52                      0.030782     0.01
53                      0.039050     0.019
54                      0.023421     0.019
55-62                   0.022359     0.017
63                      0.032992     0.016
64-68                   0.023484     0.016
69                      0.015040     0.014
70-81                   0.019292     0.013
82-83                   0.035016     0.013
84                      0.024709     0.0125
85                      0.024903     0.0126
86-101                  0.014388     0.01
102-103                 0.011046     0.009
104                     0.013705     0.009
105                     0.019385     0.009
106                     0.017467     0.009
107                     0.013596     N/A
108-124                 0.014849     0.009
125                     0.018162     N/A
126-127                 0.023574     0.0082
128-133                 0.010216     0.0077
134                     0.014155     0.007
135-136                 0.010471     N/A
137                     0.013514     N/A
138-152                 0.008471     0.007
153                     0.039924     0.006
154-170                 0.007351     0.0064
171                     0.006311     0.005
172-202                 0.006680     0.005
203                     0.010716     0.006
204                     0.005179     N/A
205-404                 0.006162     0.003
405                     0.005258     N/A
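
The ratio described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function name `jobs_to_files_ratio`, the representation of a job as a plain list of input file names, and the choice of one category per exact number of distinct input files (the table also groups some sizes into ranges) are all assumptions made for illustration.

```python
from collections import defaultdict


def jobs_to_files_ratio(jobs):
    """Return {category: number of jobs / number of unique input files}.

    `jobs` is an iterable of collections of input file names, one per job.
    Assumption: a job's category is its number of distinct input files.
    """
    job_count = defaultdict(int)      # category -> number of jobs
    unique_files = defaultdict(set)   # category -> distinct files requested

    for inputs in jobs:
        files = set(inputs)
        if not files:
            continue                  # jobs without input files are skipped
        category = len(files)
        job_count[category] += 1
        unique_files[category] |= files

    return {cat: job_count[cat] / len(unique_files[cat]) for cat in job_count}


# Hypothetical toy workload: three 2-file jobs reuse the same pair of files,
# while the two 1-file jobs share nothing.
example_jobs = [["a", "b"], ["a", "b"], ["a", "b"], ["c"], ["d"]]
print(jobs_to_files_ratio(example_jobs))
```

Under this definition, a category in which every job requests n files and no file is shared has a ratio of exactly 1/n, so a ratio above 1/n indicates that files are reused within the category.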

File sharing between categories


The second method checks what percentage of files each pair of job categories shares. This gives insight into file sharing between categories. The aggregated table shows, for each category, its overlap with every other category.

See the aggregated table
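
As a rough illustration of the second method, the sketch below computes a pairwise overlap between categories. The text does not state how the percentage is normalized, so the sketch assumes one possible definition: the share of the first category's unique files that are also requested by the second category, which makes the measure asymmetric. The function names and the job representation (a list of input file names per job) are hypothetical.

```python
from collections import defaultdict


def files_by_category(jobs):
    """Group the distinct input files by job category.

    Assumption: a job's category is its number of distinct input files.
    """
    by_cat = defaultdict(set)
    for inputs in jobs:
        files = set(inputs)
        if files:
            by_cat[len(files)] |= files
    return dict(by_cat)


def shared_percentage(by_cat):
    """Return {(a, b): percent of category a's files also used by category b}.

    Assumed normalization: the overlap is divided by the number of unique
    files in category a, so (a, b) and (b, a) generally differ.
    """
    overlap = {}
    for a, files_a in by_cat.items():
        for b, files_b in by_cat.items():
            if a != b:
                overlap[(a, b)] = 100.0 * len(files_a & files_b) / len(files_a)
    return overlap


# Hypothetical toy workload: both files used by 1-file jobs also appear
# in the inputs of 2-file jobs.
example_jobs = [["a"], ["b"], ["a", "c"], ["b", "d"], ["e", "f"]]
for pair, percent in sorted(shared_percentage(files_by_category(example_jobs)).items()):
    print(pair, round(percent, 1))
```

A symmetric alternative, such as dividing the overlap by the size of the union of the two file sets, would be equally easy to substitute if that matches the aggregated table better.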