Installation

  • I tried to install the package, but it fails. What's wrong?
    Very hard to tell without more information!
  • Okay, I'll tell you more about how the package failed to install. It said something about not finding curl-config and then gave a line something like
       ERROR: configuration failed for package 'RCurl'
    
    (Mostly for UNIX/Linux/Mac OS X users) So this seems pretty clear. R is telling you that the configuration for RCurl failed. So you need to think about how it is configured. The fact that it couldn't find curl-config should suggest the problem. And what it means is that one or more of the following is true
    • curl-config is not found in your path
    • curl-config and related development libraries (libcurl) are not installed.
    So you need to make certain that libcurl is available. Do something like locate libcurl and see if it returns something like libcurl.so in the lines it emits. Alternatively, do locate curl-config and see if it is present.

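    If libcurl is not installed, use your binary package manager to install the libcurl development package (often called curl-dev; for example libcurl4-openssl-dev on Debian/Ubuntu or libcurl-devel on Fedora/Red Hat). This is different from the curl package, which is the command-line tool for using curl to download files. We need the "linkable" library.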
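
    The path check described above can also be done from within R. The following is a minimal sketch using only base R; the helper name find_curl_config is ours and is not part of RCurl.

```r
# Return the full path to curl-config, or "" if it is not on the PATH.
find_curl_config <- function() {
  unname(Sys.which("curl-config"))
}

path <- find_curl_config()
if (nzchar(path)) {
  message("curl-config found at ", path)
} else {
  message("curl-config not found; install the libcurl development package or adjust PATH")
}
```

    If the tool is installed in an unusual directory, adding that directory to PATH before starting R should be enough for the configuration script to find it.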

  • What version of libcurl do I need?
    I have only tested this back to versions of libcurl at 7.11.2. I would use the most recent version of libcurl. You can download the versions from http://curl.haxx.se/download.html.
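
    To compare an installed version against the oldest tested one, something like the following sketch may help. It assumes the version text has the form printed by curl-config --version (for example "libcurl 7.12.0") and that the version number is purely numeric; both helper names are hypothetical.

```r
# Extract the version number from text such as "libcurl 7.12.0".
parse_libcurl_version <- function(line) {
  sub("^libcurl[[:space:]]+", "", trimws(line))
}

# TRUE if `version` is the same as or newer than `minimum`.
libcurl_at_least <- function(version, minimum = "7.11.2") {
  compareVersion(version, minimum) >= 0
}

v <- parse_libcurl_version("libcurl 7.12.0")
libcurl_at_least(v)          # newer than 7.11.2
libcurl_at_least("7.10.5")   # older than 7.11.2

# Against the real tool, if it is on the PATH:
# parse_libcurl_version(system2("curl-config", "--version", stdout = TRUE))
```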
  • I get errors about CURLOPT_HTTPAUTH not being defined when the C code is being compiled. What's the problem?
    The version of libcurl is important. I have developed this package using curl-7.12. You should download and install that. If this becomes a problem, we can perform tests as to what features are available at installation time and customize to that.
  • I get errors such as
     curl.c: In function `RCurlVersionInfoToR':
     curl.c:613: structure has no member named `libidn'
     curl.c:613: structure has no member named `libidn'
    
    Again, this is a libcurl version problem. We have seen this with curl-7.11.2. Again, we want curl-7.12! However, I have added a test for this in the code so that this problem doesn't arise. The libidn field will be NA if there is no such field in the version structure. If it is present but empty, we return "".
  • I get errors about things not being defined, even basic curl structures, etc. What's the problem?
    Possibly, you are not using GNU make. The Makevars file in the src/ directory uses some GNU make constructs. Please set the environment variable MAKE to gmake.
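
    If you install from within an R session, the variable can be set there too. This is only a sketch: it assumes GNU make is installed under the name gmake on your system.

```r
# Tell R's build machinery to use GNU make for the rest of this session.
Sys.setenv(MAKE = "gmake")
Sys.getenv("MAKE")

# Then reinstall from source in the same session, for example:
# install.packages("RCurl", type = "source")
```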

Runtime

  • When I go to a page that is just a directory, I get results about "301 Moved Permanently". My browser shows the list of files. What's the difference?
    Not entirely certain yet! But try putting a / at the end of the URI, e.g. change
          http://www.omegahat.org/RCurl
    
    to
          http://www.omegahat.org/RCurl/
    
    That works for me.
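
    A likely explanation is that the server answers the request for the slash-less URI with a redirect to the slash-terminated one. A browser follows redirects automatically, while libcurl does so only when the followlocation option is set. The sketch below shows both workarounds; the helper names are ours, and add_trailing_slash assumes the URL names a directory and has no query string.

```r
# Append "/" to a directory URL unless it already ends with one.
add_trailing_slash <- function(url) {
  if (grepl("/$", url)) url else paste0(url, "/")
}

# Fetch a directory page, also asking libcurl to follow any redirect.
# (Defined but not called here; calling it needs RCurl and a network.)
fetch_directory <- function(url, ...) {
  RCurl::getURL(add_trailing_slash(url), followlocation = TRUE, ...)
}

add_trailing_slash("http://www.omegahat.org/RCurl")
```

    A call such as fetch_directory("http://www.omegahat.org/RCurl") would then request the corrected URI.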
  • Why does https not work for me?
    Probably because when you compiled/installed libcurl, you didn't have support for SSL. You can check this with the command
     curl-config --feature
    If ssl doesn't appear there, you don't have support for it. You should reinstall curl, having first installed SSL (e.g. openssl).
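
    The same check can be scripted from R. This sketch assumes curl-config prints one feature name per line; the helper name and the sample output are illustrative only.

```r
# TRUE if `feature` appears among the lines printed by `curl-config --feature`.
# The comparison ignores case and surrounding whitespace.
has_curl_feature <- function(feature_lines, feature = "SSL") {
  toupper(feature) %in% toupper(trimws(feature_lines))
}

# Illustrative output only; real output depends on how libcurl was built.
sample_output <- c("IPv6", "SSL", "libz")
has_curl_feature(sample_output)
has_curl_feature(c("IPv6", "libz"))

# Against the real tool, if it is on the PATH:
# has_curl_feature(system2("curl-config", "--feature", stdout = TRUE))
```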
  • I'm trying to use RCurl to do something in R that works in a Web browser and I can even reproduce using the command line program curl. (That's a big help as we know we are both using libcurl!) But when I do things in R, it fails. What's the problem and how can I fix it?
    Well, there are too many potential reasons and we would need to have more information about what you are attempting to do and what the error messages are. But for one, you might look at authentication and activating Basic authentication, e.g. add httpauth = 1L (i.e. "basic") to the curl options.

    But there is a general approach to trying to figure out how to get R to do the same thing as a browser or curl or wget. One approach is to make certain that both R and curl are giving us as much information as possible. So make sure both have verbose switched on. In R, this is the curl option verbose = TRUE and for curl, it is the command line switch -v. Then look at the header information both produce and see if anything is obviously missing or different in the R version.
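
    Once the two verbose transcripts are saved, the comparison of request headers can be done mechanically. The sketch below is one way to do it in base R; the helper names and the sample lines are made up for illustration, and it assumes outgoing lines from curl -v are prefixed with "> ".

```r
# Extract lower-cased header names from request lines,
# dropping the "> " prefix that curl -v puts on outgoing lines.
header_names <- function(lines) {
  lines <- sub("^> ", "", lines)
  lines <- lines[grepl("^[A-Za-z0-9-]+:", lines)]
  tolower(sub(":.*$", "", lines))
}

# Headers that curl sent but R did not.
missing_headers <- function(r_lines, curl_lines) {
  setdiff(header_names(curl_lines), header_names(r_lines))
}

# Made-up transcripts for illustration.
curl_lines <- c("> GET /RCurl/ HTTP/1.1",
                "> Host: www.omegahat.org",
                "> User-Agent: curl/7.19.7",
                "> Accept: */*")
r_lines <- c("GET /RCurl/ HTTP/1.1",
             "Host: www.omegahat.org",
             "Accept: */*")

missing_headers(r_lines, curl_lines)
```

    In this made-up case the only difference is the User-Agent header, which could then be supplied from R via the useragent curl option.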

    A different idea is somewhat advanced, but not very. When the browser or curl makes a request, it is sent across the network via your operating system. With the appropriate permissions on the computer, we can use a program such as tcpdump or wireshark (formerly named ethereal) to "sniff" or capture the packets as they go across the network device and then we can look at them. We can do this for curl or the browser and then for R, and compare what is being sent. This allows us to see the headers as we can with the verbose options, but it also allows us to see the content of the body of the request. This is only important for POST requests.

    We should also note that if you are using HTTPS, the body will be encrypted and you won't be able to make any sense of it. However, if the data in the post are not sensitive, you can send it via HTTP - not HTTPS - and curl and the Web browser will do the same thing and we will be able to see the contents. The server will likely be confused and upset and give an error, but we are trying to determine the problem on the initial client request so that is not a problem. (It is a problem if we are trying to understand why R is not handling the response correctly, but that's a different problem.)

    How do we use tcpdump and ethereal? First, start tcpdump just before you run the R or curl command

    sudo /usr/sbin/tcpdump -s 1518 -i eth1 -w r_packets.tcp
    
    If you do this and wait too long, you will capture all the background packets that are flying through your network interface that have nothing to do with your problem. This is not a problem, but it makes it harder to find the packets which we want to examine.

    So next, go back to R or curl and run the command. When this has completed, kill the tcpdump process, e.g. with Ctrl-C in the terminal in which it is running, or with kill and the relevant process id (see the ps command or the Mac Activity Monitor). Now run ethereal with the name of the file to which tcpdump serialized the packets

    ethereal r_packets.tcp
    
    You will then get a window listing the captured packets. Navigate the list in the top panel to find the HTTP entry (#64 in our example). Click on it and the details of this TCP interaction are displayed in the lower panel. You can then expand the elements by clicking on the lines that have an arrow on the left, and you will see the details of the header and the body.

    If the connection is via SSL, e.g. HTTPS, things are a little more complicated as the content is encrypted. There are a variety of ways and tools to deal with this. Some are (in no particular order)

    • ssldump
    • Burp
    • Paros Proxy
    • fiddler (Windows)
    • WebScarab (Java)
    • Charles (commercial)
    • HttpWatch (Windows only, Free and commercial)
  • I am trying to use the verbose = TRUE option in curlPerform() or some of the other higher-level functions, but there is no output appearing on the console. What's the problem?
    You're on Windows using the R GUI, aren't you? If so, choose the option in one of the menus that controls whether output is buffered and unselect it, i.e. so the output is not buffered. That should do it.
  • I can't use scp or sftp within RCurl but the documentation for curl seems to suggest that it can. So why does RCurl not support it?
    RCurl does support it, but the likely problem is that the version of libcurl you have installed does not support it. You can check which protocols your libcurl, and hence RCurl, supports via the R function curlVersion(). If scp and sftp are not there, reinstall libcurl with support for them via libssh2. You will need to have the libssh2 development libraries and headers installed before installing libcurl. On some OSes, you will need to rebuild RCurl from source.
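
    As a sketch of that check, assuming the supported protocols are available as a character vector (the protocols element of the curlVersion() result); the helper name and sample vector are hypothetical.

```r
# Protocols from `wanted` that are absent from `available`.
missing_protocols <- function(available, wanted = c("scp", "sftp")) {
  setdiff(tolower(wanted), tolower(available))
}

# Illustrative list only; a real list depends on how libcurl was built.
sample_protocols <- c("ftp", "http", "https", "file")
missing_protocols(sample_protocols)

# With RCurl installed:
# missing_protocols(RCurl::curlVersion()$protocols)
```

    An empty result means both protocols are available; anything else names the protocols that require a rebuilt libcurl.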
  • I am trying to download a file via FTP, but it takes a very long time. Any suggestions?
    We saw this in a posting to R-help by Zack Holden. There the issue was passive mode. When using R's own download.file() function, an error was raised. When using RCurl's getURL(), the file was correctly downloaded, but it took a long time. There was about 3 minutes of delay. This can be seen with
    getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt', 
            verbose = TRUE)
    
    One can see the request just wait for a long period of time and then eventually the contents of the file are displayed in the R session.

    The fix for this is to avoid extended passive mode and use the regular passive mode. This is controlled via the ftp.use.epsv option in calls to curlPerform() and we set this to FALSE so that PASV is used rather than EPSV.

    getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt', 
            ftp.use.epsv = FALSE)
    
  • I want to download all the files in an FTP directory. How do I do it?
    First, we can get a list of all the file names with
    url = 'ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/'
    filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE)
    filenames = paste(url, strsplit(filenames, "\n")[[1]], sep = "")
    
    Now we can download each of these in turn. It is advisable to create a curl handle just once and to set any options before downloading any of the files.
    con = getCurlHandle( ftp.use.epsv = FALSE)
    sapply(filenames, getURL, curl = con)
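
    The calls above return the file contents as a character vector named by URL. Two practical details are sketched below under our own assumptions: some servers end listing lines with a carriage return as well as a newline, and you will usually want the contents written to disk. Both helper names are hypothetical, and the demonstration uses made-up data rather than a real FTP request.

```r
# Turn the text of a directory listing into full URLs,
# tolerating "\r\n" line endings and dropping empty entries.
listing_to_urls <- function(listing, base_url) {
  files <- strsplit(listing, "\r*\n")[[1]]
  files <- files[nzchar(files)]
  paste0(base_url, files)
}

# Write each downloaded document to `dir`, named after the last URL component.
save_contents <- function(contents, dir = tempdir()) {
  paths <- file.path(dir, basename(names(contents)))
  for (i in seq_along(contents)) {
    cat(contents[[i]], file = paths[i])
  }
  invisible(paths)
}

# Offline demonstration with made-up data (no FTP request is made).
urls <- listing_to_urls("a.txt\r\nb.txt\r\n", "ftp://ftp.example.org/dir/")
fake <- c("first file\n", "second file\n")
names(fake) <- urls
saved <- save_contents(fake)
```

    With real data, the result of sapply(filenames, getURL, curl = con) could be passed to save_contents() directly.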
    
  • When I try to interact with a URL via https, I get an error of the form
    Error in curlPerform(url = url, headerfunction = header$update, curl = curl,  : 
      SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed    
    
    What can I do?

    Information about this came from http://ademar.name/blog/2006/04/curl-ssl-certificate-problem-v.html

    Basically, the remote server is sending us a certificate to say it is who it says it is. However, we have to trust that certificate. We do this by providing information about a collection of trusted signing authorities, e.g. Verisign, Entrust, Thawte. We can use the certificates from these agents from the Netscape collection, available via http://curl.netmirror.org/docs/caextract.html, but you can find other collections. We download this file or its equivalent. Next we need to tell libcurl to use that file and where to find it. We do this with the cainfo option.

    x = getURLContent("https://www.google.com",
                      cainfo = "/Users/duncan/cacert.pem")    
    
    Note that we cannot use a ~ in the file path; we have to expand it ourselves.
    x = getURLContent("https://www.google.com",
                      cainfo = path.expand("~/cacert.pem"))
    

    To avoid having to specify the location of the bundle in each call, you can place the file in a place where libcurl looks. This is usually the file /usr/local/share/curl/curl-ca-bundle.crt. If you have write permission for this directory, you can place the file there. (The file must be present before libcurl is configured and compiled.)

    On some versions of UNIX, the certificates will also be found in /usr/share/ssl/certs/ca-bundle.crt
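
    If you cannot write to those locations, another way to avoid repeating cainfo is to set it once on a curl handle and reuse the handle. This is a sketch under our own assumptions: the helper name is hypothetical and the default bundle path is just the one used in the example above.

```r
# Create a curl handle that verifies certificates against `ca_file`.
# The path is expanded here because libcurl does not understand "~".
make_verified_handle <- function(ca_file = "~/cacert.pem") {
  ca_file <- path.expand(ca_file)
  if (!file.exists(ca_file)) {
    stop("CA bundle not found: ", ca_file)
  }
  RCurl::getCurlHandle(cainfo = ca_file)
}

# Usage, with RCurl installed and the bundle downloaded:
# h <- make_verified_handle()
# x <- RCurl::getURLContent("https://www.google.com", curl = h)
```

    Failing early when the bundle is missing gives a clearer message than the SSL error libcurl would otherwise report.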

    If you don't have a certificate from an appropriate signing agent, you can suppress verifying the certificate with the ssl.verifypeer option:

    x = getURLContent("https://www.google.com",
                       ssl.verifypeer = FALSE)
    
    This does risk a 'man in the middle' attack.


  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Wed Mar 17 10:42:52 PDT 2010