Introduction
Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data arrives from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills. Big data usually includes data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage and process within a tolerable elapsed time. Big data is used to better understand customers and their behaviors and preferences. It is also used to optimize business processes, to support personal quantification and performance optimization, to improve healthcare and public health, sports performance, science and research, and security and law enforcement, and to optimize machine and device performance. The present study has therefore been undertaken to examine the growth and development of publications in the field of big data research as indexed in the Web of Science database.
Materials and Methods
Data were collected from the Web of Science database for the period 2011-2020, using the search term ‘big data’ in the ‘topic’ field. Web of Science is one of the most comprehensive bibliographic databases, covering all aspects of science and technology.1, 2 A total of 45,249 publications were retrieved and analysed using a spreadsheet application as per the objectives of the study.
Data analysis and interpretations
Form of publications
Table 1
Table 1 reveals that the major form of publication covered by the Web of Science database on big data research is Journal Articles with 36,958 publications (81.68%), followed by Reviews with 3278 publications (7.24%). Editorial Materials rank third with 2195 publications (4.85%), followed by Meeting Abstracts with 971 publications (2.15%) and Proceedings Papers with 880 publications (1.94%); each of the remaining forms accounts for less than one percent, as seen in the table. The results indicate that research output on the subject during the study period was mostly published in the form of journal articles.3, 4
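The percentage shares quoted for Table 1 follow from dividing each count by the 45,249 records retrieved. A minimal Python sketch of that arithmetic is given here; the function name `percentage_share` and the dictionary layout are illustrative assumptions, not part of the study's spreadsheet workflow, and only the counts stated in the text are used.

```python
def percentage_share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 2)


# Total number of records retrieved, as stated in Materials and Methods.
TOTAL = 45249

# Counts quoted in the text for the five most common forms of publication.
forms = {
    "Journal Articles": 36958,
    "Reviews": 3278,
    "Editorial Materials": 2195,
    "Meeting Abstracts": 971,
    "Proceedings Papers": 880,
}

# Share of each form in the total output.
shares = {form: percentage_share(n, TOTAL) for form, n in forms.items()}

for form, pct in shares.items():
    print(f"{form}: {pct}%")
```

The same helper applies to the author, institution, language, source title and subject shares reported in the later tables, since each is expressed against the same total.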
Annual Growth Rate (AGR) of publications
Table 2 provides the AGR and CAGR of the number of documents for the period 2011 to 2020.
Table 2
During the period 2011 to 2020, a total of 45,249 publications were published on big data research. The highest number of publications, 9389, was published in 2020, and the lowest, 1058, in 2011. The average number of publications per year was 4524.9. Table 2 shows that there was steady growth in research publications on big data during the study period.
Table 2 also provides the annual growth rate (AGR) of the total publications, calculated year-wise. The AGR decreased from 112.19 in 2012 to 26.18 in 2020, a downward trend in the growth rate that is also seen in Figure 2.
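The text does not state the formula behind the AGR values. A minimal sketch is given here, assuming the commonly used definition in which the AGR is the percentage change in publications from one year to the next. The function names and the yearly counts in the example are hypothetical and chosen only for illustration; they are not the values of Table 2.

```python
from typing import Dict


def annual_growth_rate(previous: int, current: int) -> float:
    """Percentage change from the previous year's count to the current one.

    Assumed definition: AGR = (current - previous) / previous * 100.
    """
    if previous <= 0:
        raise ValueError("previous year's count must be positive")
    return 100.0 * (current - previous) / previous


def agr_series(counts: Dict[int, int]) -> Dict[int, float]:
    """Compute the AGR for every year that has a preceding year in counts."""
    years = sorted(counts)
    return {
        year: round(annual_growth_rate(counts[prev], counts[year]), 2)
        for prev, year in zip(years, years[1:])
    }


# Hypothetical yearly counts, for illustration only (not the study's data).
example_counts = {2001: 100, 2002: 150, 2003: 180}
example_agr = agr_series(example_counts)
print(example_agr)
```

Under this definition the absolute number of publications can keep rising while the AGR falls, which is the pattern described for 2012 to 2020.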
Relative growth rate and doubling time
The Relative Growth Rate (RGR) is the increase in the number of articles or pages per unit of time. This definition is derived from the definition of relative growth rates used in growth analysis, and is applied here to publications in the field of big data. The mean relative growth rate (R) over a specific interval can be calculated from the following equation.
Relative Growth Rate (RGR):
$$R_{1-2} = \frac{\log_e W_2 - \log_e W_1}{T_2 - T_1}$$
where
$R_{1-2}$ (R) is the mean relative growth rate over the specific period of interval,
$\log_e W_1$ is the log of the initial number of articles,
$\log_e W_2$ is the log of the final number of articles after a specific period of interval,
$T_2 - T_1$ is the unit difference between the initial time and the final time.
The year can be taken here as the unit of time.
Doubling Time (DT):
$$DT = \frac{0.693}{R}$$
Table 3
Table 3 indicates that the RGR decreased from 0.76 in 2012 to 0.23 in 2020; the highest value, 0.76, corresponds to the year 2012 and the lowest, 0.23, to the year 2020. Correspondingly, the doubling time of the publications gradually increased from 0.91 years in 2012 to 3.01 years in 2020.
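The RGR and doubling-time formulas of this section can be sketched in Python as follows. The function names and default arguments are illustrative assumptions. The two doubling-time checks use the RGR values quoted from Table 3, while the underlying yearly counts are not reproduced here.

```python
import math


def relative_growth_rate(w1: float, w2: float,
                         t1: float = 0.0, t2: float = 1.0) -> float:
    """Mean relative growth rate: (ln W2 - ln W1) / (T2 - T1).

    w1 and w2 are the initial and final numbers of articles;
    t1 and t2 are the initial and final times (in years by default).
    """
    if w1 <= 0 or w2 <= 0:
        raise ValueError("article counts must be positive")
    if t2 == t1:
        raise ValueError("t2 must differ from t1")
    return (math.log(w2) - math.log(w1)) / (t2 - t1)


def doubling_time(r: float) -> float:
    """Doubling time DT = 0.693 / R, where 0.693 approximates ln 2."""
    if r <= 0:
        raise ValueError("growth rate must be positive")
    return 0.693 / r


# RGR values quoted in the text for 2012 and 2020.
dt_2012 = round(doubling_time(0.76), 2)
dt_2020 = round(doubling_time(0.23), 2)
print(dt_2012, dt_2020)
```

Because the doubling time is inversely proportional to the RGR, a falling RGR necessarily produces a rising doubling time, as reported for the study period.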
Most prolific authors
Table 4
The data on big data research publications during the 10 years between 2011 and 2020 reveal that, in total, 87,541 authors contributed to the 45,249 publications. The authors with 120 or more publications during 2011-2020 are shown in Table 4. Zhang Y is the most productive author with 268 (0.59%) publications, followed by Liu Y with 232 (0.51%) publications, Li Y with 197 (0.44%) publications, Wang Y with 191 (0.42%) publications and Wang J with 160 (0.35%) publications.
Highly productive institutions
Table 5
A total of 16,968 organizations contributed to the entire research output covered by the study. The scientometric profile of the top 10 institutions is presented in Table 5. The findings reveal that the Chinese Academy of Sciences, China, with 1423 (3.14%) publications, is the most productive institution in the field of big data research, followed by University of California System, USA with 1264 (2.79%) publications, University of London, UK with 812 (1.79%) publications, Harvard University, USA with 702 (1.55%) publications, Centre National De La Recherche Scientifique CNRS, France with 613 (1.35%) publications, and University of Texas System, USA with 539 (1.24%) publications.
Highly productive countries
Table 6
The publication share of highly productive countries (≥1600 publications) on big data is given in Table 6. In all, 172 countries were involved in big data research; the USA topped the list with the highest share (29.71%) of publications. China ranked second with a 26.21% share, followed by England with 9.12%, Germany with 6.52%, Australia with 5.76%, Italy with 4.85% and Spain with 4.69%.
Language wise distribution
Table 7
Publications on big data are spread over 22 languages. The study reveals that the maximum number of publications were published in English, with 44,201 (97.68%) publications, followed by Chinese with 284 (0.63%) publications, German in third position with 217 (0.48%) publications, Spanish with 172 (0.38%) publications and French with 96 (0.21%) publications. The remaining languages, such as Portuguese, Polish, Russian and Hungarian, account for a negligible percentage. English dominated the output on the subject in every year of the study period.
Major source title of publications
Table 8
The publication share of the most productive source titles (≥200 publications) on big data is given in Table 8. The scientific literature on big data is spread over 3951 different source journals and conference publications. IEEE Access tops the list with the highest number of publications, 1258 (2.78%), followed by Future Generation Computer Systems: The International Journal of eScience with 491 (1.09%) publications. PLOS ONE occupies third position with 489 (1.08%) publications. The fourth highest source title is Sustainability with 471 (1.04%) publications, followed by Sensors with 304 (0.67%) publications.
High productivity subject areas
Table 9
Table 9 shows the high productivity subject areas, each contributing more than 1400 articles. Computer Science has the highest number of articles with 11,671 (25.79%), followed by Engineering with 8743 (19.32%) articles. Environmental Sciences & Ecology occupies third position with 3359 (7.42%) articles, followed by Telecommunications with 3177 (7.02%), Business & Economics with 3170 (7.01%) and Science & Technology with 2669 (5.90%) articles.
Conclusions
A great deal of research is being carried out all over the world in this field. A total of 45,249 publications were published on big data research during 2011-2020, and the average number of publications per year was 4524.9. Research output was highest in 2020, with 9389 publications.
Among 172 countries, the USA topped the list with the highest share (29.71%) of publications. China ranked second with a 26.21% share, followed by England with 9.12%. The Chinese Academy of Sciences, China, with 1423 (3.14%) publications, is the most productive institution, followed by University of California System, USA with 1264 (2.79%) publications. Among source titles, IEEE Access tops the list with the highest number of publications, 1258 (2.78%), followed by Future Generation Computer Systems: The International Journal of eScience with 491 (1.09%) publications. The maximum number of publications were published in English, with 44,201 (97.68%) publications, followed by Chinese with 284 (0.63%) publications.