What about a free, unlimited Google API? Google used to provide one, but it has since been deprecated (because of abuse?). The new Search API costs money ($5 per 1,000 queries), and the free tier is limited to 100 queries per day. Without any money, you won’t get far. After learning that, I dropped the project… until I started contributing to Wiktionary!
Extracting words from Malagasy daily newspapers for the Malagasy Wiktionary wasn’t an easy thing to program. The first version of the script could only parse RSS feeds, and it was very slow compared to what I was used to, because it loads approximately 400,000 words at each launch.
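To give an idea of what this kind of extraction involves, here is a minimal sketch in Python. It is not the actual script: the function names, the sample feed and the tiny `known` set are all made up for illustration, and a real run would load the roughly 400,000 known words into the set instead. It only reads the `title` and `description` of each RSS item and assumes they contain plain text, not embedded HTML.

```python
import re
import xml.etree.ElementTree as ET

# A word is a run of letters, optionally joined by an apostrophe or a hyphen.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def words_from_rss(rss_xml: str) -> set:
    """Return the lowercased words found in the item titles and descriptions."""
    root = ET.fromstring(rss_xml)
    words = set()
    for item in root.iter("item"):
        for tag in ("title", "description"):
            text = item.findtext(tag) or ""
            words.update(w.lower() for w in WORD_RE.findall(text))
    return words


def unknown_words(rss_xml: str, known_words: set) -> set:
    """Return the words of the feed that are not in the known-word set."""
    return words_from_rss(rss_xml) - known_words


# Hypothetical sample feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample</title>
<item>
  <title>Vaovao androany</title>
  <description>Ny vaovao farany eto Madagasikara</description>
</item>
</channel></rss>"""

# Stand-in for the full word list loaded at launch.
known = {"ny", "vaovao", "eto"}
print(sorted(unknown_words(SAMPLE_FEED, known)))
# prints ['androany', 'farany', 'madagasikara']
```

Keeping the known words in a set makes each lookup cheap. The slow part is more likely to be building that set at every launch, which is why caching it between runs would be worth trying.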
While doing that work, I noticed that plenty of words are actually compound words. This observation gave me an idea: use Google Search to check in advance whether a given word exists. From the 1,300 roots in the Malagasy Wiktionary, I can potentially make 1.7 million words by combining two roots, 2.2 billion with three, and about 2.8 trillion with four. That is enormous, and even at full speed I will never be able to look them all up: at 5 queries per second (the fastest rate I’ve ever had), it would take about 4 days for two roots, 14 years for three and some 177 centuries (17,700 years) for four. This is the first reason I decided to try hacking Google Search to see whether a word combination has already been used.
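The estimates above are simple arithmetic and can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that every ordered sequence of roots counts as a candidate (so 1,300 to the power of the number of roots) and that the rate of 5 queries per second can be sustained indefinitely. The function names are mine. Small differences from the figures quoted in the text come from rounding.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def combination_count(roots: int, length: int) -> int:
    """Number of ordered sequences of `length` roots, repetition allowed."""
    return roots ** length


def seconds_to_check(roots: int, length: int, queries_per_second: float = 5.0) -> float:
    """Time needed to send one query per candidate at a constant rate."""
    return combination_count(roots, length) / queries_per_second


for length in (2, 3, 4):
    count = combination_count(1300, length)
    seconds = seconds_to_check(1300, length)
    print(
        f"{length} roots: {count:,} candidates, "
        f"{seconds / SECONDS_PER_DAY:,.1f} days "
        f"({seconds / SECONDS_PER_YEAR:,.1f} years)"
    )
```

Each extra root multiplies the work by 1,300, so no realistic query rate makes the four-root case feasible.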
First, I looked at the page source, which is very, very complicated to understand. I even suspect the page was generated by a bot, since the names used in the HTML are not written in a human language. I also tried to work with the search URL, but it is very, very long, with characters that look more like hashes and keys (?), and I could not trace them back because they don’t explicitly appear in the main page’s form. At first sight, this kind of project looks likely to fail…
I found a post on the Web describing how to use Google Search without any API. But there was a problem: the discussion is almost three years old, and the search engine has visibly changed since then. Quite probably a Google employee reported that discussion, leading the company to take appropriate measures. When I ran the downloaded script, nothing worked: no search returned any results. I am still keeping an eye on the script and trying to find something that can solve this problem. At least it saved me from spending hours and hours reinventing a (square) wheel.
Once this problem is solved, at least temporarily, the source code will be released on SourceForge: Bot-Jagwar. It will quickly become outdated, so if anyone is willing to update the script, they’ll be welcome :).