Hadoop framework implementation and performance analysis on a cloud

The Hadoop framework uses the MapReduce programming paradigm to process big data by distributing the data across a cluster and aggregating the results. MapReduce is one of the methods used to process big data hosted on large clusters. In this method, jobs are divided into small pieces that are distributed over the nodes. Parameters such as the method of distribution over the nodes, the number of jobs run in parallel, and the number of nodes in the cluster affect the execution time of jobs. The aim of this paper is to determine how the numbers of nodes, maps, and reduces affect the performance of the Hadoop framework in a cloud environment. For this purpose, tests were carried out on a 10-node Hadoop cluster hosted in a cloud environment by running the PiEstimator, Grep, Teragen, and Terasort benchmarking tools. Based on the test results, these benchmarking tools, which are available under the Hadoop framework, are classified as CPU-intensive or CPU-light applications. In CPU-light applications, increasing the numbers of nodes, maps, and reduces does not improve efficiency; it even increases the time spent on jobs by consuming system resources unnecessarily. Therefore, in CPU-light applications, keeping the numbers of nodes, maps, and reduces at a minimum is found to minimize the time spent on a process. In CPU-intensive applications, depending on the phase in which the small job pieces are processed, setting the number of maps or reduces equal to the total number of CPUs in the cluster is found to minimize the time spent on a process.

Keywords: Big Data, Hadoop, Cloud Computing, MapReduce, KVM Virtualization Environment, Benchmarking Tools, Ganglia. 
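
The tuning rule stated at the end of the abstract can be sketched in a few lines of Python. This is only an illustration of the stated conclusion, not the authors' code: the `Cluster` class, the `suggest_task_count` function, and the value of 2 CPUs per node are hypothetical, since the abstract gives the node count (10) but not the CPU count per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a cluster; not taken from the paper."""
    nodes: int
    cpus_per_node: int  # assumption: every node has the same CPU count

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node


def suggest_task_count(cluster: Cluster, cpu_intensive: bool) -> int:
    """Suggest a map (or reduce) task count following the abstract's rule.

    CPU-intensive job: one task per CPU in the cluster, applied to the
    phase (map or reduce) that does the heavy processing.
    CPU-light job: the minimum, taken here to be a single task.
    """
    if cluster.nodes < 1 or cluster.cpus_per_node < 1:
        raise ValueError("cluster must have at least one node and one CPU")
    return cluster.total_cpus if cpu_intensive else 1


if __name__ == "__main__":
    # 10 nodes as in the abstract; 2 CPUs per node is an assumed value.
    cluster = Cluster(nodes=10, cpus_per_node=2)
    print("CPU-intensive:", suggest_task_count(cluster, cpu_intensive=True))
    print("CPU-light:", suggest_task_count(cluster, cpu_intensive=False))
```

Whether the suggested count applies to maps or to reduces depends, as the abstract notes, on the phase in which the job pieces are processed; the sketch leaves that choice to the caller.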

Publication title
(dc.title)
Hadoop framework implementation and performance analysis on a cloud
Author(s)
(dc.contributor.yazarlar)
Göksu Zekiye ÖZEN, Rayımbek SULTANOV, Mehmet TEKEREK
Publication type
(dc.type)
Article
Language
(dc.language)
English
Publication year
(dc.date.issued)
2017
National/International
(dc.identifier.ulusaluluslararasi)
International
Source
(dc.relation.journal)
Turkish Journal of Electrical Engineering and Computer Sciences
Issue
(dc.identifier.issue)
2
Volume
(dc.identifier.volume)
25
Pages
(dc.identifier.startpage)
705-716
ISSN/ISBN
(dc.identifier.issn)
ISSN: 1300-0632; Online ISSN: 1303-6203
Publisher
(dc.publisher)
Tübitak
Databases
(dc.contributor.veritaban)
Web of Science Core Collection
Databases
(dc.contributor.veritaban)
Tübitak (Academic Journals)
Databases
(dc.contributor.veritaban)
Scopus
Index type
(dc.identifier.index)
SCI Expanded
Index type
(dc.identifier.index)
Scopus
Impact factor
(dc.identifier.etkifaktoru)
0.625 / 2018-WOS / 5 Year: 0.708
URL
(dc.rights)
https://journals.tubitak.gov.tr/elektrik/vol25/iss2/5/
DOI
(dc.identifier.doi)
10.3906/elk-1501-43
Faculty / Institute
(dc.identifier.fakulte)
Faculty of Engineering
Department
(dc.identifier.bolum)
Department of Computer Engineering
Institutional author(s)
(dc.contributor.author)
Göksu Zekiye ÖZEN
Institutional author(s)
(dc.contributor.author)
Rayımbek SULTANOV
Record number
(dc.identifier.kayitno)
BL5D4402D3
Record date
(dc.date.available)
2016-11-22
Note (publication year)
(dc.identifier.notyayinyili)
2017
WoS No
(dc.identifier.wos)
WOS:000399461300005
Subject
(dc.subject)
big data
Subject
(dc.subject)
hadoop
Subject
(dc.subject)
cloud computing
Subject
(dc.subject)
mapreduce
Subject
(dc.subject)
kvm virtualization environment
Subject
(dc.subject)
benchmarking tools
Subject
(dc.subject)
ganglia