Spark Transformations
We all know the following facts:

1. RDDs are immutable.
2. Never modify an RDD in place.
3. Transform an RDD into another RDD.

There are two kinds of RDD transformations. The first is the narrow transformation:

http://static.yjs001.cn/uploadpic/3/9f/39f93b6facdd044792ca121abf9764b5.jpg

Transformations such as map, flatMap, and filter are all narrow transformations: each output partition is computed from a single input partition, so no shuffle happens and they are fast. Their speed depends only on:

1. availability of local memory
2. CPU speed

The second is the wide transformation:

http://static.yjs001.cn/uploadpic/a/c5/ac51e0b7b40b177285290dd87c6870e7.jpg

Transformations such as groupByKey, reduceByKey, and repartition are all wide transformations: data must be redistributed across partitions in a shuffle. The network speed during the shuffle is the key to their speed, so they are slower.

The final comparison:

http://static.yjs001.cn/uploadpic/f/9d/f9de0a92d322a0de7547fd99e811aabe.jpg

Source: http://www.yjs001.cn/bigdata/spark/40455850856940820301.html
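The difference between narrow and wide transformations can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's API: an "RDD" is assumed to be a list of partitions (lists), and the helper names map_partitions, filter_partitions, reduce_by_key, and partition_of are hypothetical, as is the choice of CRC32 as the partitioning hash. The narrow helpers touch one partition at a time and build new lists instead of modifying the input, while the wide helper has to route every record to a partition chosen by its key, which is the step that would cross the network in a real cluster.

```python
import zlib


def partition_of(key, num_partitions):
    # Hypothetical partitioner: a deterministic hash of the key picks the target partition.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def map_partitions(partitions, fn):
    # Narrow: each output partition is built from exactly one input partition.
    return [[fn(record) for record in part] for part in partitions]


def filter_partitions(partitions, pred):
    # Narrow: records are dropped in place, nothing moves between partitions.
    return [[record for record in part if pred(record)] for part in partitions]


def reduce_by_key(partitions, fn, num_partitions):
    # Wide: every (key, value) record is routed to the partition that owns its key.
    # In this toy model the routing loop stands in for the shuffle.
    shuffled = [{} for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            bucket = shuffled[partition_of(key, num_partitions)]
            bucket[key] = fn(bucket[key], value) if key in bucket else value
    return [sorted(bucket.items()) for bucket in shuffled]


# Three input partitions of words; the input lists are never modified.
words = [["spark", "rdd", "spark"], ["rdd", "shuffle"], ["spark"]]

pairs = map_partitions(words, lambda w: (w, 1))                   # narrow
long_words = filter_partitions(pairs, lambda kv: len(kv[0]) > 3)  # narrow
counts = reduce_by_key(long_words, lambda a, b: a + b, 2)         # wide

print(counts)
```

After the two narrow steps the data still sits in its original three partitions. Only reduce_by_key changes the partition layout, because all records for one key must end up in the same place before they can be combined.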