The following example shows how to fetch a web page using the URL() constructor of the java.net.URL class:
- /*
-  author by shouce.ren
-  Main.java
-  */
- import java.io.BufferedReader;
- import java.io.BufferedWriter;
- import java.io.InputStreamReader;
- import java.net.URL;
- import java.nio.charset.StandardCharsets;
- import java.nio.file.Files;
- import java.nio.file.Paths;
- public class Main {
-     public static void main(String[] args) throws Exception {
-         URL url = new URL("http://www.shouce.ren");
-         // try-with-resources closes both streams even if an exception is thrown
-         try (BufferedReader reader = new BufferedReader(
-                  new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
-              BufferedWriter writer = Files.newBufferedWriter(
-                  Paths.get("data.html"), StandardCharsets.UTF_8)) {
-             String line;
-             // Read the page line by line until the end of the stream
-             while ((line = reader.readLine()) != null) {
-                 System.out.println(line); // echo the line to the console
-                 writer.write(line);       // and append it to data.html
-                 writer.newLine();
-             }
-         }
-     }
- }
Running the code above prints the page's HTML source to the console and also saves it to data.html in the current directory:
- <!DOCTYPE html> <html> <head> <meta charset="UTF-8"/> <meta http-equiv="X-UA-Compatible" content="IE=11,IE=10,IE=9,IE=8"/>……
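If the page only needs to be saved and not printed, the read loop can be skipped by copying the stream returned by url.openStream() straight to a file. The following is a minimal sketch of that idea, not part of the original example: the class name PageSaver and the method save are hypothetical names chosen for illustration. When no URL is passed on the command line, main falls back to a local temporary file exposed as a file: URL, so the sketch can be run without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PageSaver {
    // Hypothetical helper: copies whatever the URL serves into target,
    // byte for byte, and returns the number of bytes written.
    public static long save(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws Exception {
        URL url;
        if (args.length > 0) {
            // Use the URL given on the command line, e.g. http://www.shouce.ren
            url = new URL(args[0]);
        } else {
            // Offline fallback (an assumption for this sketch): a small local file
            Path sample = Files.createTempFile("sample", ".html");
            Files.write(sample,
                    "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8));
            url = sample.toUri().toURL();
        }
        Path target = Files.createTempFile("data", ".html");
        long bytes = save(url, target);
        System.out.println("Saved " + bytes + " bytes to " + target);
    }
}
```

Because the bytes are copied unchanged, this variant does not need to know the page's character encoding, whereas the line-by-line version has to decode and re-encode the text.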