Saturday, 10 August 2013

Creating a Web Crawler in Java and Opening the URLs in a Web Browser

I have created a web crawler in Java, and I want to open each URL it finds
in the web browser and close each tab after a few seconds. Could you please
help me do that? A code example would be appreciated. I have created a
simple GUI for entering the URL and a separate class for crawling. Here is
the code for the GO button in the GUI:
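private void btnGoActionPerformed(java.awt.event.ActionEvent evt) {
    try {
        String url = txtUrl.getText();   // URL typed by the user
        jtaUrl.setText("");              // clear results from the previous run
        new SimpleCrawler(url, jtaUrl);  // crawl the page and list the links it finds
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}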
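btnGoActionPerformed runs on Swing's Event Dispatch Thread (EDT), so calling new SimpleCrawler(...) there freezes the window until the download finishes. One common remedy is a javax.swing.SwingWorker: the slow work goes in doInBackground() (a background thread) and the results are shown in done() (back on the EDT). The sketch below assumes a hypothetical class CrawlWorker and a hypothetical placeholder fetchLinks standing in for the real crawl; it is not the poster's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import javax.swing.SwingWorker;

// Sketch: do the slow crawl off the EDT, then report results on the EDT.
public class CrawlWorker extends SwingWorker<List<String>, Void> {
    private final String url;

    public CrawlWorker(String url) {
        this.url = url;
    }

    // Hypothetical placeholder for the real network crawl (e.g. SimpleCrawler's work).
    static List<String> fetchLinks(String url) {
        List<String> links = new ArrayList<>();
        links.add(url + "/page1");
        links.add(url + "/page2");
        return links;
    }

    @Override
    protected List<String> doInBackground() {
        // Runs on a background worker thread: blocking I/O is fine here.
        return fetchLinks(url);
    }

    @Override
    protected void done() {
        // Runs on the EDT: safe to update Swing components here.
        try {
            for (String link : get()) {
                System.out.println(link); // in the form this would be jtaUrl.append(link + "\n")
            }
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
    }
}
```

In the form, the button handler would then only start the task, for example new CrawlWorker(txtUrl.getText()).execute(), and return immediately so the GUI stays responsive.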
And here is the code for the crawler class:
package simplecrawler;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.JTextArea;

public class SimpleCrawler {

    // Matches an opening <a ...> tag that has an href attribute.
    // Group 1 captures the href value: "double-quoted", 'single-quoted' or unquoted.
    private static final Pattern LINK_PATTERN = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)[^>]*>",
            Pattern.CASE_INSENSITIVE);

    SimpleCrawler(String webUrl, JTextArea webLinks) throws IOException {
        // Download the page; try-with-resources closes the stream.
        // Assumes the page is UTF-8 encoded.
        StringBuilder input = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL(webUrl).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                input.append(line).append('\n');
            }
        }

        // Append each link's href value (quotes stripped) to the text area.
        Matcher matcher = LINK_PATTERN.matcher(input);
        while (matcher.find()) {
            String href = matcher.group(1).replaceAll("^[\"']|[\"']$", "");
            webLinks.append(href);
            webLinks.append("\n");
        }
    }
}
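For the part of the question about opening each URL in the browser, the JDK offers java.awt.Desktop.browse(URI), which hands a URI to the system's default browser. Two caveats: hrefs scraped from a page are often relative ("about.html", "/contact"), so they have to be resolved against the page URL first; and Desktop.browse returns no handle to the tab it opens, so the JDK alone cannot close that tab again after a few seconds. Closing tabs would need a browser-automation tool such as Selenium WebDriver. The sketch below covers resolving and opening only; the class BrowserOpener and its methods resolveLinks and openAll are hypothetical names, and the delay between URLs is an assumed parameter.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Sketch: resolve scraped hrefs and open them one by one in the default browser.
public class BrowserOpener {

    // Resolve raw href values against the page URL, keeping only http/https links.
    public static List<URI> resolveLinks(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        String path = base.getRawPath();
        if (path == null || path.isEmpty()) {
            // URI.resolve drops the '/' when the base has an empty path
            // ("http://example.com" + "a.html"), so normalise the base first.
            base = base.resolve("/");
        }
        List<URI> result = new ArrayList<>();
        for (String href : hrefs) {
            try {
                URI uri = base.resolve(new URI(href.trim()));
                String scheme = uri.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    result.add(uri);
                }
            } catch (URISyntaxException e) {
                // Skip hrefs that are not valid URIs (e.g. containing raw spaces).
            }
        }
        return result;
    }

    // Open each URI in the default browser, waiting delayMillis between them.
    // Desktop gives no way to close the tabs it opens.
    public static void openAll(List<URI> uris, long delayMillis)
            throws IOException, InterruptedException {
        if (!Desktop.isDesktopSupported()
                || !Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
            System.err.println("Opening a browser is not supported on this system.");
            return;
        }
        for (URI uri : uris) {
            Desktop.getDesktop().browse(uri);
            Thread.sleep(delayMillis);
        }
    }
}
```

A possible call is BrowserOpener.openAll(BrowserOpener.resolveLinks(pageUrl, hrefs), 5000). Because openAll sleeps between URLs, it should run on a background thread rather than directly inside the button handler.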