Self-Learn Yourself Apache Spark in 21 Blogs – #8
In this blog let us discuss how to load data, what lambdas are, and how to transform data, with more on transformations to follow. And if you want a quick read of the other blogs in this learning series, do check them out.
Apache Spark can load data from many input sources, such as HDFS, S3, Cassandra, an RDBMS, Parquet, and Avro, and also from memory. Let's see how we can use these from the command line; a short sketch follows each list below.
Memory Loading Methods
- parallelize
- makeRDD
- range
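For example, here is a minimal sketch of the memory loading methods in the spark-shell, assuming the default SparkContext sc; the sample values are just placeholders:

val rdd1 = sc.parallelize(Seq(1, 2, 3, 4, 5)) // distribute a local collection across the cluster
val rdd2 = sc.makeRDD(Seq("a", "b", "c"))     // behaves like parallelize for a Seq
val rdd3 = sc.range(0, 100, step = 2)         // RDD[Long] of even numbers from 0 to 98
rdd1.count()                                  // 5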
External Loading Methods
- textFile
- wholeTextFiles
- sequenceFile
- objectFile
- hadoopFile
- newAPIHadoopFile
- hadoopRDD
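And a sketch of a few of the external loading methods; the paths here are hypothetical placeholders, so substitute your own files:

val lines  = sc.textFile("hdfs:///data/input.txt")            // RDD[String], one element per line
val files  = sc.wholeTextFiles("hdfs:///data/dir")            // RDD[(filePath, fileContent)]
val seqRdd = sc.sequenceFile[String, Int]("hdfs:///data/seq") // keyed SequenceFile
val objRdd = sc.objectFile[Int]("hdfs:///data/obj")           // data written earlier by saveAsObjectFile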
Now let's discuss what a lambda expression is, since lambdas will be used throughout the upcoming examples. A lambda expression is also known as an anonymous function. Below is a lambda expression:
rdd.flatMap(line => line.split(" "))
Let us now discuss how to convert a named method into a lambda expression.
Named method:
def addOne(item: Int) = {
  item + 1
}
val intList = List(1, 2)
for (item <- intList) yield {
  addOne(item)
}
Lambda:
def addOne(item: Int) = {
  item + 1
}
val intList = List(1, 2)
intList.map(x => {
  addOne(x)
})
This can still be fine-tuned further, like this:
val intList = List(1, 2)
intList.map(item => item + 1)
One more note: Scala supports multiline lambdas via curly braces, as in the sketch below.
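Here is a minimal sketch of a multiline lambda:

val intList = List(1, 2)
intList.map { item =>
  val doubled = item * 2 // intermediate steps can span several lines
  doubled + 1
}
// List(3, 5)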
Now let's discuss how to apply transformations to turn raw data into meaningful information.
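As a quick taste before the next chapter, here is the classic word count, a minimal sketch that chains a few transformations and ends with an action; the input path is a hypothetical placeholder:

val lines  = sc.textFile("hdfs:///data/input.txt")  // load the text file
val words  = lines.flatMap(line => line.split(" ")) // split each line into words
val pairs  = words.map(word => (word, 1))           // pair each word with a count of 1
val counts = pairs.reduceByKey(_ + _)               // sum the counts per word
counts.take(10).foreach(println)                    // action: triggers the computation

Note that nothing is computed until the final action; the transformations only build up a lineage of RDDs.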
Reference – Apache Spark Community. In the next blog we will see the next chapter in the series.
If you see something here that interests you, we'd love to have you involved. Please subscribe at www.dataottam.com to stay current and for future reads on Big Data, Analytics, and IoT.
As always, please feel free to reach us at coffee@dataottam.com with your comments, to help make this "The Best" for our Big Data Analytics community.