Java Example - Fetching a Web Page


The following example shows how to fetch a web page using the URL() constructor of the java.net.URL class:

    /*
     * author by shouce.ren
     * Main.java
     */

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Main {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.shouce.ren");
            // try-with-resources closes both streams even if an exception is thrown
            try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
                 BufferedWriter writer = Files.newBufferedWriter(
                     Paths.get("data.html"), StandardCharsets.UTF_8)) {
                String line;
                // Read the page line by line, echo it to the console, and save it to data.html
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }

Running the code above prints the page's source code and also saves it to the file data.html in the current directory:

    <!DOCTYPE html> <html> <head> <meta charset="UTF-8"/> <meta http-equiv="X-UA-Compatible" content="IE=11,IE=10,IE=9,IE=8"/>……
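
A network read can hang indefinitely if the server stops responding, so it may be worth setting timeouts before reading. The following is a minimal sketch of one way to do that, not part of the original example: the class name PageFetcher, the helper method fetch, and the 5-second timeout values are all hypothetical choices made for illustration. It uses URL.openConnection() in place of URL.openStream() so that the timeouts can be set on the URLConnection first.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PageFetcher {
    // Hypothetical helper: copies the text behind url into target
    // and returns the number of lines copied.
    public static int fetch(URL url, Path target) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(5_000); // assumed value, in milliseconds
        conn.setReadTimeout(5_000);    // assumed value, in milliseconds
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        int n = fetch(new URL("http://www.shouce.ren"), Paths.get("data.html"));
        System.out.println("Saved " + n + " lines to data.html");
    }
}
```

The sketch assumes the page is encoded as UTF-8, which matches the charset declared in the output shown above; a page in another encoding would need a different Charset. Because fetch takes any URL, it can also be tried offline against a file: URL.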