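Hands-On Hadoop Project: Big Data Log Analysis

0x00 Tutorial Contents

  1. Overview of the big data log analysis system
  2. Using UserAgentParser
  3. Preparation
  4. Hands-on project
  5. Results

0x01 Overview of the Big Data Log Analysis System

1. Requirement

a. Count how many times each browser appears in a website's access log.

2. Background and architecture

a. See the article "Background and Architecture of the Big Data Log Analysis System" (大数据日志分析系统背景及架构).

0x02 UserAgentParser

1. About UserAgentParser

a. A small ready-made third-party tool for parsing HTTP User-Agent strings.

2. The User-Agent string

a. Sample format
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
b. Viewing your own User-Agent: open any website, press <kbd>F12</kbd> to open the developer tools, and refresh the page. The User-Agent is listed among the request headers of each request.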

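To make the structure of the sample string concrete, here is a minimal sketch in plain Java that splits a User-Agent string into its product/version tokens and parenthesized comments. The class `UaTokens` and its `split` method are hypothetical names used only for illustration. This is not how UserAgentParser works internally; it is just a way to see what the string is made of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a User-Agent string into top-level parts.
class UaTokens {

    // Either a parenthesized comment "( ... )" or a run of non-space characters.
    private static final Pattern PART = Pattern.compile("\\([^)]*\\)|\\S+");

    static List<String> split(String userAgent) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(userAgent);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36";
        // Prints six parts, one per line, starting with Mozilla/5.0
        for (String part : split(ua)) {
            System.out.println(part);
        }
    }
}
```

Note that a Chrome string also carries AppleWebKit and Safari tokens, which is why a dedicated parser is preferable to naive substring checks.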

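0x03 Preparation

1. Download the UserAgentParser tool

a. Download location (clone it with git, or download the zip archive and extract it):
https://github.com/LeeKemp/UserAgentParser

2. Install the jar into the local Maven repository

a. Build the tool into a jar with Maven (run from the project root, e.g. E:\workspace\UserAgentParser-master)
mvn clean package -DskipTests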


b. Install the jar into the local Maven repository
mvn clean install -DskipTests

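0x04 Hands-On Project

1. Create the project

a. See section 0x01, "Create a new Maven project", of the article
"HDFS Operations with the Java API" (Java API实现HDFS的相关操作)

2. Add the dependency

a. Add the dependency (it cannot be resolved unless the UserAgentParser jar has been installed into the local repository as described in section 0x03)

<!-- UserAgentParser dependency -->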
<dependency>
    <groupId>com.kumkee</groupId>
    <artifactId>UserAgentParser</artifactId>
    <version>0.0.1</version>
</dependency>

b. The complete pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.shaonaiyi.hadoop</groupId>
    <artifactId>hadoop-learning</artifactId>
    <version>1.0</version>

    <name>hadoop-learning</name>
    <!-- FIXME change it to the project's website -->
    <url>http://www.example.com</url>

    <properties>
        <hadoop-version>2.7.5</hadoop-version>
    </properties>

    <dependencies>

        <!-- Hadoop dependency -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop-version}</version>
        </dependency>

        <!-- UserAgentParser dependency -->
        <dependency>
            <groupId>com.kumkee</groupId>
            <artifactId>UserAgentParser</artifactId>
            <version>0.0.1</version>
        </dependency>

        <!-- Unit test dependency -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

3. Write the test code

a. Create a test package (com.shaonaiyi.hadoop.project) under the Java test source directory:


b. Create the UserAgentTest class:

package com.shaonaiyi.hadoop.project;

import com.kumkee.userAgent.UserAgent;
import com.kumkee.userAgent.UserAgentParser;

/**
 * @Author: 邵奈一
 * @Date: 2019/03/27 2:45 PM
 * @Description: Test class for User-Agent parsing
 */
public class UserAgentTest {

    public static void main(String[] args) {
        String agentSource = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36";
        UserAgentParser userAgentParser = new UserAgentParser();
        UserAgent agent = userAgentParser.parse(agentSource);

        String browser = agent.getBrowser();
        String engine = agent.getEngine();
        String engineVersion = agent.getEngineVersion();
        String os = agent.getOs();
        String platform = agent.getPlatform();
        boolean isMobile = agent.isMobile();
        String version = agent.getVersion();

        System.out.println("Browser: " + browser);
        System.out.println("Engine: " + engine);
        System.out.println("Engine version: " + engineVersion);
        System.out.println("Operating system: " + os);
        System.out.println("Platform: " + platform);
        System.out.println("Is mobile: " + isMobile);
        System.out.println("Version: " + version);

    }

}

c. Run the test class; the parsed fields are printed to the console.

4. Write the project code

a. Create the package for the application code (the class below uses com.shaonaiyi.hadoop.project)

b. Create the ParseUserAgentApp class:

package com.shaonaiyi.hadoop.project;

import com.kumkee.userAgent.UserAgent;
import com.kumkee.userAgent.UserAgentParser;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @Author: 邵奈一
 * @Date: 2019/03/27 2:54 PM
 * @Description: Counts visits per browser with MapReduce
 */
public class ParseUserAgentApp {

    // Mapper implementation
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        LongWritable one = new LongWritable(1);
        private UserAgentParser userAgentParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            userAgentParser = new UserAgentParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

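            // One log record per call
            String line = value.toString();

            // The User-Agent field starts right after the 7th double quote.
            // The substring runs to the end of the line, so it also carries the fields that follow.
            String agentSource = line.substring(getCharacterPosition(line, "\"", 7) + 1);
            UserAgent agent = userAgentParser.parse(agentSource);
            String browser = agent.getBrowser();

            context.write(new Text(browser), one);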
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            userAgentParser = null;
        }
    }

    // Reducer implementation
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {

            long sum = 0; // long matches LongWritable and avoids int overflow on large logs
            for (LongWritable value: values){
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));

        }
    }

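    /**
     * Returns the index in value of the index-th occurrence of operator.
     * @param value the string to search
     * @param operator the marker to look for (matched literally)
     * @param index which occurrence to locate, counting from 1
     * @return the start index of that occurrence
     * @throws IllegalArgumentException if the marker occurs fewer than index times
     */
    private static int getCharacterPosition(String value, String operator, int index){

        // Pattern.quote makes regex metacharacters in the marker match literally
        Matcher slashMatcher = Pattern.compile(Pattern.quote(operator)).matcher(value);
        int matcherIndex = 0;
        while (slashMatcher.find()) {
            matcherIndex++;

            if (matcherIndex == index) {
                return slashMatcher.start();
            }
        }
        // Previously this fell through to start(), which throws an opaque IllegalStateException
        throw new IllegalArgumentException("Marker occurs fewer than " + index + " times: " + value);
    }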


    public static void main(String[] args) throws Exception{

        Configuration configuration = new Configuration();

        // Delete the output path first if it already exists
        Path outputPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(configuration);
        if(fileSystem.exists(outputPath)){
            fileSystem.delete(outputPath, true);
            System.out.println("Output path already existed and has been deleted");
        }

        Job job = Job.getInstance(configuration, "ParseUserAgentApp");

        job.setJarByClass(ParseUserAgentApp.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

}

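P.S. This code is adapted from the tutorial "Getting Started with MapReduce: Word Count" (MapReduce入门例子之单词计数); have a look at that one as well.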
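The data flow of the job can be tried out without a cluster. The sketch below mimics the three phases with plain Java collections: "map" emits one (browser, 1) pair per record, "shuffle" groups the pairs by key, and "reduce" sums each group. `LocalBrowserCount` is a hypothetical name, the input is assumed to be already-extracted browser names rather than raw log lines, and Hadoop itself is not involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of map -> shuffle -> reduce.
class LocalBrowserCount {

    // "Map" phase: one (browser, 1) pair per input record.
    static List<Map.Entry<String, Long>> map(List<String> browsers) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String browser : browsers) {
            pairs.add(new SimpleEntry<>(browser, 1L));
        }
        return pairs;
    }

    // "Shuffle" phase: group values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: sum the values of each key.
    static Map<String, Long> reduce(Map<String, List<Long>> grouped) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
            long sum = 0;
            for (long value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Long> count(List<String> browsers) {
        return reduce(shuffle(map(browsers)));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Chrome", "Unknown", "Chrome", "Firefox", "Unknown", "Unknown");
        System.out.println(count(input)); // prints {Chrome=2, Firefox=1, Unknown=3}
    }
}
```

In the real job the grouping step is done by the framework between MyMapper and MyReducer; only the first and last steps are code you write.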

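c. Package the project (any packaging method works; this one uses the assembly plugin configured in the pom.xml)
mvn assembly:assembly
Note: the assembly:assembly goal was removed in the 3.x line of maven-assembly-plugin. If your Maven resolves a 3.x version of the plugin, mvn clean package assembly:single is the usual equivalent.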


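0x05 Results

1. Upload the project to the server

a. After packaging, the target directory contains two jars:
hadoop-learning-1.0.jar: does not bundle the external dependencies (it only works where they are available by other means, such as the local Windows development environment)
hadoop-learning-1.0-jar-with-dependencies.jar: bundles the external dependencies (including the UserAgentParser jar)

Note: UserAgentParser-0.0.1.jar was built and installed only into the Maven repository on the local Windows machine; the server does not have it. Copy hadoop-learning-1.0-jar-with-dependencies.jar to the server and run that one. A Maven repository only matters at build time: at run time the UserAgentParser classes must be on the job's classpath, and the jar-with-dependencies takes care of that. With the plain jar you would have to provide UserAgentParser-0.0.1.jar on the server yourself.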
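Before copying a jar to the server, it can be worth checking that it really bundles the parser. The following sketch uses the JDK's `java.util.jar.JarFile` to test whether a jar contains a given class. `JarClassCheck`, `entryName`, and `containsClass` are hypothetical names, and the jar path is whatever your build produced.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar file contains a given class.
class JarClassCheck {

    // Converts a class name such as com.kumkee.userAgent.UserAgentParser
    // into the entry path used inside a jar.
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True if the jar at jarPath has an entry for the given class.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarClassCheck <path-to-jar> <fully.qualified.ClassName>
        if (args.length != 2) {
            System.err.println("Usage: java JarClassCheck <jar> <class>");
            System.exit(2);
        }
        System.out.println(containsClass(args[0], args[1]));
    }
}
```

Run against the two jars in target with the class name com.kumkee.userAgent.UserAgentParser, this should report true only for the jar that bundles the dependencies.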
b. Upload the project to the server

[hadoop-sny@master mr]$ pwd
/home/hadoop-sny/mr
[hadoop-sny@master mr]$ ll
total 352752
-rw-rw-r--. 1 hadoop-sny hadoop-sny 321100030 Dec 13 18:51 big_file_again.txt
-rw-rw-r--. 1 hadoop-sny hadoop-sny      8837 Mar 22 19:34 hadoop-learning-1.0.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny  39193853 Mar 27 15:34 hadoop-learning-1.0-jar-with-dependencies.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny    903971 Dec 19 10:45 mapreduce-course-1.0-SNAPSHOT.jar
-rw-rw-r--. 1 hadoop-sny hadoop-sny        30 Dec 19 17:12 small_file.txt
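
2. Upload the log file to HDFS

a. Log format. If you do not have a log file, copy the sample lines below several times into a file named access.log: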

183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getadv HTTP/1.1" 200 813 "www.imooc.com" "-" cid=0&timestamp=1478707261865&uid=2871142&marking=androidbanner&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=f51e97d1cb1a9caac669ea8acc162b96 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.134.244:80 200 0.027 0.027
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
117.35.88.11 - - [10/Nov/2016:00:01:02 +0800] "GET /article/ajaxcourserecommends?id=124 HTTP/1.1" 200 2345 "www.imooc.com" "http://www.imooc.com/code/1852" - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36" "-" 10.100.136.65:80 200 0.616 0.616
182.106.215.93 - - [10/Nov/2016:00:01:02 +0800] "POST /socket.io/1/ HTTP/1.1" 200 94 "chat.mukewang.com" "-" - "android-websockets-2.0" "-" 10.100.15.239:80 200 0.004 0.004
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/userdynamic HTTP/1.1" 200 19501 "www.imooc.com" "-" cid=0&timestamp=1478707261847&uid=2871142&touid=2871142&page=1&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=3837a5bf27ea718fe18bda6c53fbbc14 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.195 0.195
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
114.248.161.26 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getcourseintro HTTP/1.1" 200 2510 "www.imooc.com" "-" cid=283&secrect=86b720f312c2b25da3b20e59e7c89780&timestamp=1478707261951&token=4c144b3f4314178b9527d1e91ecc0fac&uid=3372975 "mukewang/5.0.2 (iPhone; iOS 8.4.1; Scale/2.00)" "-" 10.100.136.65:80 200 0.007 0.008
120.52.94.105 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getmediainfo_ver2 HTTP/1.1" 200 633 "www.imooc.com" "-" cid=608&secrect=e25994750eb2bbc7ade1a36708b999a5&timestamp=1478707261945&token=9bbdba949aec02735e59e0868b538e19&uid=4203162 "mukewang/5.0.2 (iPhone; iOS 10.0.1; Scale/3.00)" "-" 10.100.136.65:80 200 0.049 0.049
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
112.10.136.45 - - [10/Nov/2016:00:01:02 +0800] "POST /socket.io/1/ HTTP/1.1" 200 94 "chat.mukewang.com" "-" - "android-websockets-2.0" "-" 10.100.15.239:80 200 0.006 0.006
211.162.33.31 - - [10/Nov/2016:00:01:02 +0800] "GET /u/card HTTP/1.1" 200 331 "www.imooc.com" "http://www.imooc.com/code/2053" - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.65:80 200 0.371 0.371
116.22.196.70 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/code/3500" mid=3500&time=60 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0" "-" 10.100.134.244:80 200 0.026 0.026
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
113.47.86.12 - - [10/Nov/2016:00:01:02 +0800] "GET /socket.io/1/websocket/eHBhkZC47oY64iLMMeXm HTTP/1.1" 101 125 "chat.mukewang.com" "-" - "-" "-" 10.100.15.239:80 101 277.433 277.433
119.130.229.90 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/code/547" mid=547&time=60 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36" "-" 10.100.136.65:80 200 0.021 0.021
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
120.52.94.105 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/getrelevantcourse HTTP/1.1" 200 774 "www.imooc.com" "-" cid=608&secrect=e25994750eb2bbc7ade1a36708b999a5&timestamp=1478707262003&token=2b865e78535436df02fd3f986bb0cc08&uid=4203162 "mukewang/5.0.2 (iPhone; iOS 10.0.1; Scale/3.00)" "-" 10.100.136.65:80 200 0.048 0.048
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.44.115.163 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/savemediafinish HTTP/1.1" 200 103 "www.imooc.com" "-" is_offline=0&time=0&mid=2312&secrect=cc8506ee27115cd3c9d617730ea600d9&cid=0&plat_id=5&timestamp=1478707261086&uid=4356276&stay_time=0&token=22e4a2ec2c40a7c4375651c5020e7023 "mukewang/5.0.1 (Android 5.0.2; Xiaomi Redmi Note 2 Build/LRX22G),Network WIFI" "-" 10.100.136.64:80 200 0.068 0.068
211.162.33.31 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxusermediasstatus?cid=9 HTTP/1.1" 200 2954 "www.imooc.com" "http://www.imooc.com/code/2053" - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.64:80 200 0.030 0.030
218.58.205.220 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.009 0.009
114.246.57.116 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/userinfo HTTP/1.1" 200 151 "www.imooc.com" "-" secrect=9455e4679d68f107477a27d69cdf753c&timestamp=1478707262002&token=73bdcb218e48acd4869826afa320baf4&uid=4132795&uuid=0dd9c37bf4ac75031158349738b7612b "mukewang/5.0.2 (iPhone; iOS 10.1.1; Scale/2.00)" "-" 10.100.136.64:80 200 0.070 0.071
218.58.205.245 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.002 0.002
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
112.253.38.168 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.024 0.024
218.58.205.204 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.023 0.023
112.253.38.159 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.024 0.024
218.58.205.252 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.025 0.025
119.184.176.131 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.003 0.003
223.104.31.75 - - [10/Nov/2016:00:01:02 +0800] "GET /socket.io/1/websocket/szGk1G7hrpIe6RWHMfLK HTTP/1.1" 101 91 "chat.mukewang.com" "-" - "-" "-" 10.100.15.239:80 101 30.068 30.068
218.58.205.216 - - [10/Nov/2016:00:01:02 +0800] "HEAD /favicon.ico HTTP/1.1" 404 0 "chat.mukewang.com" "-" - "Go-http-client/1.1" "-" 10.100.15.239:80 404 0.022 0.022
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/beta HTTP/1.1" 200 16950 "www.imooc.com" "-" cid=0&timestamp=1478707261842&uid=2871142&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=4ea00393c5ac3588c5317cf9f28013fa "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.377 0.377
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
106.39.41.166 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser/ HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/video/8701" mid=8701&time=120.0010000000002&learn_time=16.1 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0" "-" 10.100.136.64:80 200 0.016 0.016
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
183.162.52.7 - - [10/Nov/2016:00:01:02 +0800] "POST /api3/searchindex HTTP/1.1" 200 1484 "www.imooc.com" "-" cid=0&words=premiere&timestamp=1478707261876&uid=2871142&secrect=a6e8e14701ffe9f6063934780d9e2e6d&token=1b4fcde08cb054e9077b2f316a7da0b0 "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G" "-" 10.100.136.65:80 200 0.110 0.110
39.186.247.142 - - [10/Nov/2016:00:01:02 +0800] "GET /video/3237 HTTP/1.1" 200 7227 "www.imooc.com" "http://www.imooc.com/ceping/4191" - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-" 10.100.136.64:80 200 0.198 0.198
113.140.11.123 - - [10/Nov/2016:00:01:02 +0800] "POST /course/ajaxmediauser/ HTTP/1.1" 200 54 "www.imooc.com" "http://www.imooc.com/video/5915/0" mid=5915&time=60.01200000000006&learn_time=284.9 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393" "-" 10.100.134.244:80 200 0.029 0.029
10.100.0.1 - - [10/Nov/2016:00:01:02 +0800] "HEAD / HTTP/1.1" 301 0 "117.121.101.40" "-" - "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" - - - 0.000
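
In these records the User-Agent is the fourth quoted field, so it begins right after the 7th double quote and ends at the 8th. The sketch below extracts exactly that field with `String.indexOf` and returns null for lines with too few quotes. `LogFields`, `nthIndexOf`, and `userAgentOf` are hypothetical names, and the sketch assumes that no field contains an escaped double quote. The mapper in section 0x04 locates the same 7th quote but takes everything up to the end of the line.

```java
// Hypothetical helper: pulls the User-Agent field out of one access-log line.
class LogFields {

    // Index of the n-th (1-based) occurrence of ch in s, or -1 if there are fewer.
    static int nthIndexOf(String s, char ch, int n) {
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = s.indexOf(ch, pos + 1);
            if (pos < 0) {
                return -1;
            }
        }
        return pos;
    }

    // Text between the 7th and 8th double quote, or null for malformed lines.
    static String userAgentOf(String line) {
        int start = nthIndexOf(line, '"', 7);
        int end = nthIndexOf(line, '"', 8);
        if (start < 0 || end < 0) {
            return null;
        }
        return line.substring(start + 1, end);
    }

    public static void main(String[] args) {
        String line = "182.106.215.93 - - [10/Nov/2016:00:01:02 +0800] "
                + "\"POST /socket.io/1/ HTTP/1.1\" 200 94 \"chat.mukewang.com\" \"-\" - "
                + "\"android-websockets-2.0\" \"-\" 10.100.15.239:80 200 0.004 0.004";
        System.out.println(userAgentOf(line)); // prints android-websockets-2.0
    }
}
```

Returning null instead of throwing lets a caller skip malformed records rather than fail the whole run; whether that is desirable depends on how clean the log is.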

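b. Upload the log to the /files directory on HDFS (if /files does not exist yet, create it first with hadoop fs -mkdir -p /files)
hadoop fs -put access.log /files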

3. Run the project

a. Make sure HDFS and YARN are running and that /files/access.log exists on HDFS
b. Change to the directory containing the jar (here: /home/hadoop-sny/mr)
cd /home/hadoop-sny/mr
c. Run the job:
hadoop jar ./hadoop-learning-1.0-jar-with-dependencies.jar com.shaonaiyi.hadoop.project.ParseUserAgentApp /files/access.log /projectout
d. View the result
hadoop fs -cat /projectout/*
Output:

[hadoop-sny@master mr]$ hadoop fs -cat /projectout/*
Chrome  2775
Firefox 327
MSIE    78
Safari  115
Unknown 6705
[hadoop-sny@master mr]$
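
The raw counts are easier to read as shares. The sketch below parses "name, whitespace, count" lines like the ones printed above and computes each browser's share of all records. `ResultShares` is a hypothetical name, the five lines are simply the output shown above hard-coded as input, and keys are assumed to contain no whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: turns "key<whitespace>count" lines into percentage shares.
class ResultShares {

    // Parses lines such as "Chrome  2775" into an insertion-ordered map.
    static Map<String, Long> parse(String[] lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            counts.put(fields[0], Long.parseLong(fields[1]));
        }
        return counts;
    }

    // Share of one key, in percent of the total over all keys.
    static double sharePercent(Map<String, Long> counts, String key) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        return 100.0 * counts.get(key) / total;
    }

    public static void main(String[] args) {
        String[] output = {"Chrome  2775", "Firefox 327", "MSIE    78", "Safari  115", "Unknown 6705"};
        Map<String, Long> counts = parse(output);
        for (String browser : counts.keySet()) {
            System.out.printf(Locale.ROOT, "%-8s %6.2f%%%n", browser, sharePercent(counts, browser));
        }
    }
}
```

For the output above the total is 10,000 records, so Chrome comes to 27.75% and Unknown to 67.05%. The large Unknown share is plausible given the sample lines, many of which come from non-browser clients such as curl, Go-http-client, and a mobile app.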

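0xFF Summary

  1. You can also open the YARN web UI at http://master:8088 to inspect the job that ran.
  2. The project code is an extended version of the tutorial "Getting Started with MapReduce: Word Count" (MapReduce入门例子之单词计数); work through that one too and level up step by step.
  3. Exercise: try to derive more metrics and implement more requirements. This project only counts visits per browser, but as the test class shows, the parser exposes more fields (engine, operating system, platform, mobile flag, version) that can be counted in the same way.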
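
As a starting point for the exercise, the sketch below counts records by operating system instead of browser. To stay runnable without the UserAgentParser jar it uses a deliberately crude, hypothetical classifier based on substring checks (`OsCount.osOf`), which is an assumption of this sketch and not the library's logic. In the real job you would call agent.getOs() in the mapper instead and leave the reducer unchanged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for agent.getOs(): crude substring-based OS detection.
class OsCount {

    static String osOf(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Order matters: Android strings often also mention Linux,
        // and iPhone strings often also mention Mac OS X.
        if (ua.contains("android")) return "Android";
        if (ua.contains("iphone") || ua.contains("ipad")) return "iOS";
        if (ua.contains("windows")) return "Windows";
        if (ua.contains("mac os x")) return "Mac OS X";
        if (ua.contains("linux")) return "Linux";
        return "Unknown";
    }

    // Counts user agents per detected OS, sorted by OS name.
    static Map<String, Long> countByOs(List<String> userAgents) {
        Map<String, Long> counts = new TreeMap<>();
        for (String ua : userAgents) {
            counts.merge(osOf(ua), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> agents = Arrays.asList(
                "mukewang/5.0.0 (Android 5.1.1; Xiaomi Redmi 3 Build/LMY47V),Network 2G/3G",
                "mukewang/5.0.2 (iPhone; iOS 10.0.1; Scale/3.00)",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36",
                "Go-http-client/1.1");
        System.out.println(countByOs(agents));
        // prints {Android=1, Mac OS X=1, Unknown=1, Windows=1, iOS=1}
    }
}
```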

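About the author: 邵奈一
University big data lecturer, university market observer, column editor
WeChat official account, Weibo, CSDN: 邵奈一

The column series "复制粘贴玩转大数据" (big data by copy and paste) is now complete; head over and work through it.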
