Blog

  • Google Translate now available in Malagasy

    Good news, if I may call it that, for my fellow Malagasy citizens: since 6 December 2014, Google Translate has let them read almost any web page in their mother tongue, in addition to 89 other languages. Many people, myself included, had been waiting for this moment, which was bound to come sooner or later. First of all, I would like to say a big thank-you to everyone who made this possible. Thanks to you, the Malagasy language is becoming further integrated into the polyglot Web. You have also given some 15 million monolingual speakers a chance to get an approximate understanding of what other people write in other languages.

    Before Google Translate

    Before Google Translate could translate almost anything into our language, curse words included, several websites helped us Malagasy speakers and other language enthusiasts write properly in our mother tongue: many of us have already heard of Freelang, tenymalagasy.org and so on. The only drawback of these websites is that they do not work collaboratively: they are not «crowdsourced». Wikibolana is a crowdsourced Malagasy dictionary, but so far I have been the one who has generated most of its content.

    Is it really that good?

    Well, let’s be honest: no machine translation system has ever had absolute accuracy as its motto. But for a brand-new language on Google Translate, Malagasy is… quite good. Daring to translate a language with syntax as unusual as that of Malagasy is already a huge challenge, and one worth taking up. At first sight, idiomatic sentences and expressions are handled fairly well. Still, when it comes to very complex sentences, it is a mess: verbs end up in the wrong place, which either gives the sentence a completely different meaning or makes it look incomplete. There are also some failures, like the one in the screenshot below.

    GTfail
    “ahave” does not mean anything in Malagasy, but Google Translate does not share that opinion.

    Let’s see an example of a translation of a paragraph of the article Madagascar in the English Wikipedia:

    Original in English:
    In 2012, the population of Madagascar was estimated at just over 22 million, 90 percent of whom live on less than two dollars per day. Malagasy and French are both official languages of the state. […] The island’s elephant birds, a family of endemic giant ratites, went extinct in 17th century or earlier, most probably due to human hunting of adult birds and poaching of their large eggs for food.

    Google-translated in Malagasy (as of December 2014):
    Tamin’ny 2012, ny mponina ao Madagasikara dia tombanana ho 22 tapitrisa mahery kely, 90 isan-jaton’ny izay [no] miaina amin’ny [vola] latsaky ny roa dolara isan’andro. Malagasy sy Frantsay dia samy fiteny ofisialy ao amin’ny fanjakana. […] Ny nosy vorona ny elefanta, ny fianakaviana ny fizahantany ratites goavana, dia efa lany tamingana tamin’ny taonjato faha-17, na teo aloha, indrindra noho ny olona angamba ny olon-dehibe ny fihazana sy ny vorona lehibe Fihazana ny atodiny ho sakafo.

    The green-coloured sentences are syntactically correct as they stand. The first one needed the red words in square brackets to sound correct. The third one hurt my brain: “The elephants are a bird island, the family of big tourists, have gone extinct in 17th century, or before, perhaps because of people, adults, hunting and adult birds who have their eggs hunted for food.” It hurt to understand, and it hurt to back-translate. Astonishingly, a round-trip translation gave a correct sentence in English, so a clean round trip tells you nothing about the Malagasy in the middle: please always have your translations checked by human translators.

    Efforts to be continued

    You can help improve translation accuracy by translating articles with the Google Translator Toolkit, or by using and correcting the translations provided by Google Translate itself.

  • Switching to Linux: good or bad choice?

    Last updated on July 13, 2014
    Do you want to switch to Linux? Before doing so, I invite you to consider all the consequences of switching to another operating system.
     
    Linux? What is that?
    But first of all, what is Linux? Strictly speaking, it is the kernel of the GNU/Linux operating system. To be frank with you, «Linux» is a generic name for a few dozen distributions that have one thing in common: the Linux kernel. What is a kernel? It is the software that manages your hardware (motherboard, CPU, hard disk, networking, etc.) so that it works with the applications you use. The kernel of current Microsoft Windows versions is NT. In the past, Windows relied on MS-DOS instead, from Windows 1 up to Windows ME. I could write about this at greater length, but then we’d be off-topic.
    So, in everyday usage, Linux is an operating system competing with Windows. Bear in mind that the desktop computer market is the «final frontier» for Linux: nearly all desktop computers nowadays come with Microsoft Windows pre-installed.
    Because they use different kernels, Windows software will not work on Linux. There is still a (poor) workaround for this problem, but I’ll talk about it later. It is also a blessing, because Windows viruses cannot run on Linux either.
    I’m not saying Linux is totally free of viruses (people have already created viruses that have successfully infected Linux systems), but with the right reflexes you’ll avoid most problems. The most basic tip is never to run a Linux-based system as the root user unless you know exactly what you’re doing. You can still run tasks requiring root privileges by entering your own user password, but that will mostly happen when you install programmes.
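As a small illustration of the "don't work as root" tip, here is a minimal Python sketch that checks whether the current session has root privileges. The function names are my own, chosen for illustration; the only system call used is `os.geteuid()`, which exists on Unix-like systems such as Linux.

```python
import os


def running_as_root() -> bool:
    """Return True when the current process has root privileges (effective UID 0)."""
    # os.geteuid() only exists on Unix-like systems; elsewhere we report False.
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0


def privilege_advice() -> str:
    """Return a short reminder matching the tip in the text."""
    if running_as_root():
        return "Running as root: log in as a normal user and elevate only when needed."
    return "Running as a normal user: good."


if __name__ == "__main__":
    print(privilege_advice())
```

Run from a normal desktop session, this should report a normal user; it reports root only when launched from a root shell or through a privilege-elevation tool.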
     
    Linux is Free
    First and foremost, Linux distributions can be used legally and free of charge by anyone. This means you don’t need to install an «anti-product activation» thing picked from a weird site in order to use your operating system as you please. That practice, common among Windows users, is not only illegal but can also compromise your security by letting that weird software from a weird site dig «holes» (backdoors) in your firewall. For people who like computer DIY, Linux is also open-source, developed by a community of thousands of programmers and code reviewers. Have you found a bug in the software? You have the freedom to patch it and share your patch with other people. Yes, the Linux licence allows this.
    You also have a vast array of choices regarding distributions (commonly known as «distros»).
    Linux distros are all built to do things in a certain way, so you have to think about what you’ll be doing with the OS, and then you download the distro that fits your needs. It is not like Windows, where you first install your OS, and then figure out what you need.
    All distros (eleven) have their own software repository and desktop environment (DE), but they all have something in common: the Linux kernel, hence the generic name. As of May 2014, the most recent kernel version is 3.14, released two months ago.
    What distinguishes one distro from another is, at first sight, the desktop environment, then the default software. Ubuntu itself comes in six additional flavours (Edubuntu, Kubuntu, Mythbuntu, Ubuntu Studio, Xubuntu, Lubuntu), several of which ship a different desktop environment. You choose your DE according to your taste: Unity has a very «modern» appearance; KDE is a very flexible desktop that you can make look almost any way you want (you can even rotate icons on the desktop!); LXDE and XFCE are lightweight DEs. Updates are handled through an update manager. Also, most distros issue a new version every year.
     
    The switch
    So you’ve finally decided to switch. Your CD is burnt (or your USB key is configured), and you are about to shut down your PC. Please don’t do it yet; there are a few things to think about first: do you use specific software for your videos? Do you play games? Do you have hardware whose installation requires a driver supplied on a CD?
    To answer these questions, you’ll have to do some research on the Web. If you use popular software, you are likely to find a free and/or open-source equivalent on some distro. If you use something like AutoCAD or Photoshop, you’ll still find «free» equivalents on Linux, but they won’t always be as powerful. Furthermore, chances are that Photoshop’s file format will not be fully compatible with the free equivalents.
    As for games, forget about playing Call of Duty, Battlefield or League of Legends on Ubuntu. The Steam Machine is on its way, though, so gaming should become more and more common on Linux.
    If you cannot part with your Windows software, there’s still a workaround: Wine. This piece of software allows you to run simple Windows programmes on Linux. It is not guaranteed that everything will work with it, but still, it’s better than nothing. If your business depends on Windows-only software, I advise you to dual-boot your computer. You’ll then have both a Windows OS to run your software and a Linux distro for everything else. Note that Windows files can be accessed easily from Linux, whereas the opposite requires you to download software on Windows and manually mount the Linux partitions with it. This is how most people go about switching to Linux, since it avoids the inconvenience of having to back up data to another HDD.
    Did your hardware come with an installation CD? The best way to proceed in this case is to check whether the distro you’re going to install supports it.
    Still, the best way to find out whether the distro you’ve chosen suits your hardware is to boot from the CD, which is most of the time a Live CD. Live CDs allow you to test the operating system on your computer without changing a single byte on the hard disk drive, as all the required data is loaded into memory. You can then choose to install the OS on your hard drive once you’re satisfied with how it behaves on your computer.
    If you decide to switch, take the time to check that you’ve successfully backed up all your data. You never know whether something is going to fail, and having the same data twice is always better than not having it at all. If you can’t migrate your data because you don’t have an external HDD, you can still choose to dual-boot your computer, so you’ll keep access to the data stored on the Windows NTFS (or FAT32, or FAT) partition. You can even install your OS on an external drive, if you need all the space on your computer’s HDD for your data. But then, do not forget to plug in that drive before booting!
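To check that a backup really is complete, one option is to compare checksums of every file in the original folder with its copy. The following is a minimal sketch using only the Python standard library; the function names and the folder layout are assumptions made for illustration, not a tool the text refers to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source: Path, backup: Path) -> list:
    """List files under `source` that are missing from `backup` or differ from it."""
    problems = []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue  # directories are only walked, not compared
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file under the source folder has an identical copy in the backup folder. The check is one-way: extra files that exist only in the backup are not reported.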
    Usually, installation won’t take a long time. To install Kubuntu 12.04, I only needed 50 minutes to format the entire disk (500 GB) and get the PC ready for work.
     
    My personal story
    Because I got fed up with the inefficiency of my (free) anti-virus programme and with Trojans, key-loggers and root-kits compromising the security of my personal data (my credit card number somehow leaked when I made an online purchase on a well-known financial transaction platform), I decided to make the big switch and change the OS of my 4-year-old laptop to some Linux distribution.
    Because I care a lot about hardware support and user-friendliness, I decided on Kubuntu 12.04, first because it is a long-term support version (i.e. it will receive updates for 5 years), and secondly because I am familiar with Ubuntu distros and have had a positive experience with their hardware support.
    I made the switch a month ago by changing my laptop’s OS from Win7 to Kubuntu 12.04. The most annoying thing I’ve had to face since the switch is (still) hardware support. If your hardware is a little complex, crap happens quite a lot. Before definitively switching to Kubuntu, I tried Ubuntu (Unity desktop), Mandriva (now OpenMandriva), Mint, Mageia and Debian. The last three were unable to support my networking hardware, and (perhaps I have deficient research skills, but…) I found no workaround for it. Same problem with my printer: although connected, it refuses to print when I ask it to, which is quite frustrating for the average user.
    Once the switch was complete, I noticed that Kubuntu, or at least the 12.04 version, has a serious memory leak: the kded4 process occupies more and more memory as time passes, and after a week of activity it ‘eats’ up to two gigabytes of memory. The PC then gets slower and slower until it is totally unusable, so I had to find a workaround to stop the inflation. The price has been the inability to put the PC to sleep, which turns out to be quite impractical, especially when you are working outdoors with no plug within reach to keep your laptop charged.
    Even though Ubuntu supports all of the laptop’s hardware fairly well, some hardware problems still arise when you don’t expect them: I wanted to set up an ad-hoc connection to a friend’s laptop, but Kubuntu prevented me from doing so because of kernel bugs. Also, a friend of mine had Ubuntu 12.10 and I was really astonished to see such an unstable Ubuntu version: random errors popped up every 10 minutes! I finally advised him to install another version.
    Despite the hardware support problems, switching to a Linux distribution is something great, especially when your hardware can’t support the latest Windows version. It is also a good choice for people who don’t want to invest tens of euros (or dollars) in an anti-virus solution.

  • The future of the Malagasy language

    These days, it seems that many Malagasy speakers are deeply dismayed by the way some people write their ancestral language on the Internet. Wherever I read, I keep finding people complaining about spelling, whether in English, in French or in Malagasy. What has caused this?
    First of all, let me introduce myself (for those who do not speak English). I am a student at an engineering school, studying computer science. I am a little over twenty years old, a Malagasy citizen living in France, and a contributor to the Malagasy-language Wikimedia projects. As a technician, I have a first explanation for this spelling that shows so little respect for the language.

    The future?

    “The way a thing is defined will not last forever. The name by which a thing is defined will not last forever,” said Laozi.
    He was not wrong. We can already see that the Malagasy spoken in the days of Radama differs from today’s Malagasy. And it is very likely that the Malagasy of 200 years from now will differ from today’s. Vocabulary and grammar may therefore be simplified: we may come to count in the straightforward order (as Western languages or Chinese do); we may devise a new alphabet unrelated to the one we are used to. Pronunciation may change deeply as well, and so may the way of writing.

    Impact of the new Malagasy alphabet on spelling (source: the book Lala sy Noro, via http://hery.blaogy.com/post/4/8301)

    At the Malagasy Academy there is now talk of changing the way Malagasy is written by means of a new alphabet (a debate I took part in on serasera.org, under the name Tibao). A paper on the subject is due out soon, so if you live in Antananarivo, it may appear in the “culture” section of the newspapers. If this is carried by the will of the Malagasy-speaking people and of a State ready to bring back Malagasization, this is what the first article of the Universal Declaration of Human Rights might look like:

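    Andinini I : Nateraka hu afaka si hituvi ni ulumbeluna rehetra na eu amin’ny zu na eu amin’ny hasina. Sami manan-tsaina si fieritreretana ka tukuni hifampitundra am-pirahalahiana.
    (Article 1: All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.)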

    This Malagasy alphabet, which is meant to replace the old one, will not change the spelling of words that contain none of the letters “y”, “o” and “j”, since the only changes are that every “j” becomes “dz”, “y” becomes “i” everywhere, and “o” becomes “u”. Pronunciation stays the same as usual; only the way words are written changes slightly.
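The three substitutions described here (j becomes dz, y becomes i, o becomes u) are simple enough to sketch in a few lines of Python. This is only an illustration of those three rules as stated in the post; the function name is my own, and the actual proposal may contain further rules that are not covered here.

```python
# Letter substitutions named in the text: j -> dz, y -> i, o -> u.
RULES = {"j": "dz", "y": "i", "o": "u"}


def to_new_alphabet(text: str) -> str:
    """Rewrite a Malagasy text using only the three substitutions above."""
    converted = []
    for char in text:
        replacement = RULES.get(char.lower())
        if replacement is None:
            converted.append(char)  # letter untouched by the proposal
        elif char.isupper():
            converted.append(replacement.capitalize())  # keep initial capitals
        else:
            converted.append(replacement)
    return "".join(converted)


if __name__ == "__main__":
    for word in ("tokony", "olombelona", "hitovy"):
        print(word, "->", to_new_alphabet(word))
```

Applied to "tokony", "olombelona" and "hitovy", the rules give "tukuni", "ulumbeluna" and "hituvi", the spellings used in the sample article quoted in the post.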

    Pronunciation is already changing

    You have probably noticed it already: what distinguishes the words tsidio and tsidiho? Nothing. Both are pronounced the same way, because the /h/ is no longer pronounced. Here is another one: what distinguishes the words may and mahay? For most people, nothing. For some people, though, there is a difference, however slight. A third example: which is correct, mande or mandeha? If you chose “mande”, you are wrong, even though /mande/ is how we now pronounce the word “mandeha”. Small changes like these are what make a language evolve. People may like that, or they may not.

    Globalization and its impact on the ancestral language

    Globalization also has its impact on the Malagasy language. The fact that young people like us nowadays speak vary amin’anana (literally “rice with greens”, i.e. Malagasy mixed with French) is one of the results of that globalization; the example comes from the very top of the State, so why would the population not follow?
    Personally, I think it was a big mistake to keep French as one of Madagascar’s official languages when the Constitution of the Fourth Republic was drafted. The great change promised by the leaders did not materialise, so civil servants and officials, who admittedly do try to use pure Malagasy in official speeches and reports, will still have official permission to “speak vary amin’anana”. This is nothing new: it already happened 15 years ago, as the following speech shows:

    “Ni-inviter anareo Tantsaha iray génération amiko aho, mba hiara mi-étudier ity projet de développement nanaovako étude plus ou moins approfondie ity. Mi-concerner ny région-tsika mantsy izy io, ka tokony hi-sensibiliser-na antsika masse paysanne amin’izao période de décollage économique izao.
    Ny cohésion sy ny participation de tout un chacun anefa no tena primordiale amin’izany, ka tsy ny élites sy ny intellects ihany no ho décideurs sy piliers amin’ny réalisation sy ny exécution-ny, fa ny population active sy isika izay avenir-n’ity tany ity. Raha tsy izany, tsy avotra izao situation déplorable sy catastrophique izao. Mila stratégie sy plan triennal bien au point anefa izany fa tsy azo atao au pif fotsiny.
    Izany no ni-convoquer-na antsika exprès amin’ity réunion de sensibilisation ity. Isika rehetra sans exception izany no samy hilaza izay point de vue-ny avy, vis-à-vis de ce projet. Tout au moins, izay mba suggestion na solution intermédiaire sy adéquate any an-tsainareo any.
    Misokatra àry izao ny débat sy ny discussion ka tout le monde peut prendre la parole. Merci !”
    — Ny Teny Malagasy : Fanentanana ny Fon’ny Vahoaka amin’ny Fampandrosoana  (25 Jona 1994)

    The speech transcribed above really was given in 1994 to rally farmers in the countryside, so naturally they, who did not understand French, could not say a word. Yet the young speaker used “French” right down to his closing words. So after the whole room had fallen silent and exchanged glances, this is what he told the farmers:

    “izao ihany ilay mampi-décevoir ny technicien aminareo, fa tsy capable hi-saisir ny occasion profitable sy favorable tahaka itony,” — Ny Teny Malagasy : Fanentanana ny Fon’ny Vahoaka amin’ny Fampandrosoana (25 Jona 1994)

    A speech like the one above may hurt some people’s ears, and I believe that the gentleman, disappointed by this failure, never agreed to give a speech to farmers again. At that time the Malagasy had only just come out of Malagasization, and French had been restored as the language of instruction. Even so, it can be said that the speaker miscalculated, since he assumed that everyone in the room knew French as well as he did.
    With today’s globalization, driven by the Internet and by international exchanges conducted in English, English is (once again) starting to enter Malagasy. There is hardly any Malagasization as there was in the past; words simply come in raw: crowdsourcing, cloud computing, outsourcing, manager, etc.

    A new fashion

    Some say: “Andrianiko ny teniko, ny an’ny hafa koa feheziko” (“I honour my own language, and I master those of others too”). Malagasy is the only language in Madagascar, so as soon as you start speaking a foreign language, you are taken for a highly educated person. Knowledge matters a great deal to us Malagasy, and that is why “speaking vary amin’anana” has become fashionable for some. It can be said that we Malagasy are now gradually returning to the use of pure Malagasy, even though the effort is still far from sufficient.
    What do you, readers, think about the future of the Malagasy language?

  • The month of Janus: reflecting on the past and looking to the future

    As the years go by, I feel that time seems to pass faster and faster. The year 2013 is already gone. Before I knew it, more than ten years had passed since I left Madagascar. I turned twenty in July. What is left in my mind of that year 2013?
    Let us start at the beginning. The year 2012 was about to end. Where was I? I had gone to spend New Year’s Eve (réveillon) with my parents in a town whose name I no longer remember. That is where I celebrated the arrival of 2013.
    A doomsday that never came
    In July 2012, I wrote an article, in English, about the doomsday that was supposed to happen on 21 December 2012. Although television broadcast a great deal about that date, I had little faith in it; even so, that did not stop me from writing a scenario for that date: as everyone knows, the economic situation in Europe was also very tough at the time. That was not the only striking thing: there was also the Web Bot prediction, which announced that 21 December 2012 would be doomsday. I first published the idea in that article and then expanded on it here on this blog. Once 21 December had passed, there was no more news of that Web Bot that claimed to predict doomsday.
    Internship
    Then 2013 came, and nothing special happened. True, many famous people died. At the end of January I looked for a company to do an internship with, and found one two weeks later. It was a computing company whose main business is developing software for drawing and for making mechanical parts. Even though I know how to program, what awaited me there was not easy.
    For two and a half months, I travelled 80 km a day by train. Although I learned a lot, the body eventually wore out. Once the oral presentation of my report (French: soutenance) was over, “holiday” was the first word that came to mind: for two weeks I did not write a single program in C, Python or any other language I know. I did not create any Wikipedia articles either, as I was truly fed up with working.
    2013-2014, the academic year that was almost blank
    When the academic year ended and I had obtained my diploma, I applied to continue my studies. I applied to the University of Versailles, and only there. The result did not come out until September: rejected, not admitted. By then all the schools had already started, and few were still taking students. I went round every school in Île-de-France, but still found nothing. Even so, I applied to an engineering school, though with little hope, since the students had already started.
    Mid-September came, and the other schools I had applied to had still not replied. Even so, there was still hope, because I already knew I would at least reach BAC+3 (I had applied for a professional bachelor’s degree and knew I would be accepted whatever happened). Then the answer to my application to the engineering school arrived: I had been accepted.
    Classes started three days after the answer, and there were no lessons to catch up on.

  • African language Wikimedia projects summary

    A few months ago I wrote an article which summarises my history on the Malagasy Wiktionary, and more generally my history on Malagasy language Wikimedia projects.
    I am back here to write a short summary recapitulating the current progression of African language WMF projects. In this article you’ll learn about the current stage of African language projects and their trend.
    In terms of community size, the biggest African-language community is the Afrikaans Wikipedia community, followed by the Egyptian Arabic and Swahili communities.
    Let us look closer at the statistics. The award goes to the Afrikaans Wikipedia community, which has 7 to 8 very active contributors (performing more than 100 edits per month).
    The Egyptian Arabic Wikipedia community counts 2 to 3 very active contributors, which is big for an African language but very small compared to the Standard Arabic community, which counts more than twenty times as many (83 very active users in June 2013), most of them Egyptian contributors.
    As for Swahili, the number of very active users is one to two; over a 2-year period, it averages out to 1. But the number of active users (i.e. making more than 5 edits per month) is 9 on average, which is a fine thing for a language spoken in countries where internet access is quite hard to come by.
    These numbers are averages over the period from July 2011 to June 2013, which smooths out short-term variations.
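To make the two thresholds and the averaging concrete, here is a small Python sketch. The thresholds (more than 5 and more than 100 edits per month) come from the text; the function names and the sample edit counts are invented for illustration and are not real statistics.

```python
from statistics import mean

ACTIVE_MIN_EDITS = 5         # "active": more than 5 edits in the month
VERY_ACTIVE_MIN_EDITS = 100  # "very active": more than 100 edits in the month


def count_contributors(edits_by_user: dict) -> tuple:
    """Return (active, very_active) contributor counts for one month."""
    active = sum(1 for edits in edits_by_user.values() if edits > ACTIVE_MIN_EDITS)
    very_active = sum(1 for edits in edits_by_user.values() if edits > VERY_ACTIVE_MIN_EDITS)
    return active, very_active


def average_counts(months: list) -> tuple:
    """Average the monthly counts over a period to smooth short-term variations."""
    counts = [count_contributors(month) for month in months]
    return mean(c[0] for c in counts), mean(c[1] for c in counts)


if __name__ == "__main__":
    # Hypothetical edit counts for three months (not real data).
    sample = [
        {"A": 150, "B": 12, "C": 3},
        {"A": 80, "B": 7, "C": 6},
        {"A": 120, "B": 2},
    ]
    print(average_counts(sample))
```

Note that a very active contributor is also counted as active, which matches how the two figures are quoted side by side in the text.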
    In terms of raw article count, the biggest African language Wikimedia project is the Malagasy Wiktionary, which currently counts 2.5 million articles (only smaller than English and bigger than French!), followed by the Malagasy Wikipedia (40,000+ articles), the Yoruba Wikipedia (30,000+ articles), and the Afrikaans and Swahili Wikipedias (27,000+ and 25,000+ articles respectively).
    The Malagasy Wiktionary became very big for reasons you can read here; the Malagasy Wikipedia is big thanks to geography articles (~20,000 articles) and celestial objects (~8,000 articles); the Yoruba Wikipedia owes its size to articles about people and also celestial objects (~15,000 objects).
    Wikimedians who consult the statistics should know that the number of content pages does not determine the quality or the comprehensiveness of an encyclopedia. Judging wikis by article count is like judging a book by its cover, and many readers and critics know that looking at the cover is not enough to judge a novel. Here, by raw size, the Malagasy language dominates in the two biggest projects (Wikipedia and Wiktionary), but that doesn’t mean it has a very active community.
    To judge the quality, comprehensiveness and completeness of the articles of such wikis, it is better to dive into this kind of statistics, where scores are based on the absence or presence of vital articles and on the size (number of characters) of those articles when they exist. Such statistics are better than article count and page depth, which can be inflated by the use of bots and the generation of tons of non-article pages (talk pages, subpages, redirects…).
    According to the List of Wikipedias by sample of articles, the best-scoring African language Wikipedia is the Afrikaans Wikipedia, which ranks 58th, followed by the Swahili Wikipedia (79th) and then the Egyptian Arabic, Yoruba and Somali Wikipedias. The Malagasy Wikipedia is quite far behind: it ranks 155th, only higher than the Lingala (161st), Wolof (175th) and Shona (187th) Wikipedias, which have fewer than 5,000 articles. In other words, article count is only the cover of the book, and some effort is needed to make the Malagasy Wikipedia more comprehensive.
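A score of this kind can be sketched as follows. The text only says that scores depend on whether each vital article exists and on its size in characters; the size thresholds and weights below are assumptions chosen for illustration, not necessarily the values used by the actual ranking.

```python
from typing import Optional, Sequence

# Assumed size classes and weights, for illustration only.
STUB_LIMIT = 10_000   # below this many characters, count the article as a stub
LONG_LIMIT = 30_000   # above this many characters, count it as a long article
WEIGHTS = {"absent": 0, "stub": 1, "article": 4, "long": 9}


def classify(size: Optional[int]) -> str:
    """Classify one vital article by its size; None means it does not exist."""
    if size is None:
        return "absent"
    if size < STUB_LIMIT:
        return "stub"
    if size <= LONG_LIMIT:
        return "article"
    return "long"


def sample_score(sizes: Sequence[Optional[int]]) -> float:
    """Return a 0-100 score: 100 means every vital article exists and is long."""
    if not sizes:
        raise ValueError("the list of vital articles is empty")
    raw = sum(WEIGHTS[classify(size)] for size in sizes)
    return 100 * raw / (WEIGHTS["long"] * len(sizes))
```

With weights like these, a wiki full of one-line stubs scores far lower than a much smaller wiki with a few hundred long, complete articles, which is exactly why such a score is harder to inflate with bots than a raw article count.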
    What about the trend?
    Less than a year ago, some Wikipedias found a way to grow their article count thanks to species databases. The first ones I saw growing this way were the Winaray and Cebuano Wikipedias. The Winaray Wikipedia gained 100,000 articles primarily through low-quality geography stubs (one or two sentences long) and secondarily through articles about animal and plant species, bringing it to 510,000 articles. Cebuano has grown more than tenfold in article count within the last 50 weeks, from 40,000 to more than 500,000 articles. This mania for creating articles about species has spread to the Swedish and Dutch Wikipedias. The Dutch Wikipedia has recently surpassed the German Wikipedia, and in response the German Wikipedia seems to have boycotted it by removing the link to the Dutch Wikipedia from its main page.
    Now let’s write about the growth trend of African language Wikimedia projects. First off, let’s talk about Wikipedias, then Wiktionaries and finally other «minor» Wikimedia projects.

    Wikipedia language edition    Current article count    Growth (in 300 days) (1)
    --------------------------    ---------------------    ------------------------
    Malagasy                      40,619                   +2,415
    Yoruba                        30,624                   +582
    Afrikaans                     27,801                   +3,928
    Swahili                       25,368                   +1,232
    Amharic                       12,722                   +1,015
    Egyptian Arabic               10,764                   +1,939
    Somali                        2,830                    +383
    Lingala                       2,035                    +118
    Kinyarwanda                   1,816                    +7
    Kabyle                        1,517                    +778
    Wolof                         1,172                    +49
    Kongo                         826                      +135
    Northern Sotho                688
    Igbo                          739                      +44
    Zulu                          586                      +22
    Setswana                      496                      –1
    Bambara                       392                      +6
    Siswati                       368                      +6
    Ewe                           302                      +12
    Hausa                         291                      +17
    Oromo                         276                      +36
    Tigrinya                      259                      +2
    Tsonga                        250                      +7
    Sango                         204                      +17
    Kirundi                       192                      +8
    Sesotho                       189                      +44
    Akan                          179                      +17
    Fulfude                       166                      +12
    Luganda                       166                      –2
    Twi                           157                      +12
    Chamorro                      157                      +6
    Xhosa                         151                      +10

    (1) Calculated following this site; data retrieved on 26 July 2013.
    On Wikipedia, growth is slow compared to languages spoken in developed countries, where Internet access is easy and inexpensive for the ordinary citizen. The African-language Wikipedia with the biggest community grows by approximately 5,000 articles per year, which is fairly high compared to Swahili, whose growth is about three times lower. If the current trend continues, the Afrikaans Wikipedia will surpass the Yoruba Wikipedia next year, and the Malagasy Wikipedia in the next 2 years, as the two current biggest Wikipedias are stagnating in article growth.
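The projection can be checked with a short Python sketch that extrapolates the table's figures linearly. The article counts and 300-day growth values are copied from the table; the function names and the assumption of constant growth are mine, so treat the output as a rough estimate rather than a forecast.

```python
WINDOW_DAYS = 300  # the growth column covers the last 300 days

# name: (current article count, growth over the window), copied from the table
WIKIPEDIAS = {
    "Malagasy": (40_619, 2_415),
    "Yoruba": (30_624, 582),
    "Afrikaans": (27_801, 3_928),
    "Swahili": (25_368, 1_232),
}


def yearly_growth(name: str) -> float:
    """Scale the 300-day growth to a 365-day year."""
    _, growth = WIKIPEDIAS[name]
    return growth * 365 / WINDOW_DAYS


def days_until_overtake(chaser: str, leader: str):
    """Days until `chaser` catches `leader` at constant growth, or None if never."""
    chaser_count, chaser_growth = WIKIPEDIAS[chaser]
    leader_count, leader_growth = WIKIPEDIAS[leader]
    closing_per_day = (chaser_growth - leader_growth) / WINDOW_DAYS
    if closing_per_day <= 0:
        return None  # the gap is not shrinking
    return (leader_count - chaser_count) / closing_per_day


if __name__ == "__main__":
    print(f"Afrikaans: about {yearly_growth('Afrikaans'):.0f} articles per year")
    for leader in ("Yoruba", "Malagasy"):
        days = days_until_overtake("Afrikaans", leader)
        print(f"Afrikaans passes {leader} in about {days:.0f} days")
```

With these figures, Afrikaans grows by a little under 4,800 articles per year and catches up with Yoruba in roughly 250 days. For Malagasy the answer depends heavily on the assumption: if Malagasy kept its growth from the table, the crossover would take about seven years, whereas a horizon of two to three years corresponds to Malagasy growth dropping to almost nothing.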
    On smaller Wikipedias, the trend is positive, though slow. All open Wikipedias have more than 100 articles.
    Among Wiktionaries, the biggest is the Malagasy Wiktionary, whose growth is sustained by Bot-Jagwar. Owned by myself, Bot-Jagwar runs from the Cloud, so it works regardless of the health of my computer and my internet connection. Thanks to it, the Malagasy Wiktionary gains 300 to 500 content pages daily. Automation eases many things in many ways, but automated processes can fail, so I have to keep an eye not only on the source code but also on the entries generated by that source code.
    African language Wikipedias are slowly but surely gaining articles as time passes. There seems to be a moratorium on closing African language Wikipedias, and this is fine because languages mainly spoken in developing countries need time to develop a community. Furthermore, the official language in these countries, especially African ones, is very often not the local language.

    Kurzweil Curve showing growth of computing power. It shows that all human brains can be simulated by 2050. What about having billions of “virtual” contributors on Wikipedia in 2050? Source (kraxinglogic.com)

    An increase in bot-made articles (which nowadays constitute 20% of the articles created on Wikipedia) may indicate that in the near future, perhaps in 25 or 30 years, a bot will be able to write articles like humans do. This is because Ray Kurzweil predicts that simulating the human brain will be possible in twelve years, and because today’s computers have the calculation power that supercomputers had in the 1990s.
    What about me? Well, it’s been a while since my last big article on the Malagasy Wikipedia. According to the list of Wikipedias by sample of articles, several hundred of the articles needed in all Wikipedias are missing, so my first goal for Wikipedia is to fill these gaps, slowly but surely. I prefer contributing about geography, but as I am the only contributor to the wiki, I have to fill gaps a bit everywhere: biography, chemistry, sports, etc. I can barely create three or four articles per day. At that pace, I can complete the list of 1,000 articles that every Wikipedia should have within the year.
    It’s been a while since the last time I blogged in Malagasy, so this article will be followed by a Malagasy language article: perhaps a translation of this one, perhaps a new one.
    Useful resources
    To read further about what’s mentioned here.

    1. The law of Accelerating Returns by Kurzweil
    2. http://www.wikistatistics.net for all statistics about Wikimedia projects


  • My story on the Malagasy Wiktionary

    It’s been a while since I posted on this blog. This article is about the mass addition of content to the Malagasy Wiktionary. The purpose of this post is to explain why and how the Malagasy Wiktionary has become so big.
    But first, allow me to introduce myself. My nickname on all Wikimedia projects is Jagwar. I have been a Wikimedia contributor since August 2008, and I am going to be 20 years old soon. I speak Malagasy as my mother tongue, French as a second language and English as a foreign language (soon my third language, since it is not quite perfect yet…).

    When I discovered, purely by chance, the Malagasy language version of Wikipedia, the wiki was virtually dead, with no one adding interesting content and an active community mainly made up of non-native speakers. Without any knowledge of the rules of the wiki, and with almost no knowledge of how to write Malagasy correctly, I began an article. It grew to 20,000 characters, making it the biggest page of the wiki at that time. But unfortunately (or fortunately, for the sake of readers), a non-native-speaker administrator spotted the article’s lack of notability, which led to its deletion.
    I could have left the wiki, as tens of hours of work had literally vanished from it… But I didn’t. I still cannot figure out why, but deep in my mind a little voice told me to keep contributing. At that time, the Malagasy Wikipedia had 550 articles, maybe fewer, but not more.
    So I continued this way for a while. To help me in my task, I wrote to potential volunteers. These people didn’t see the point of contributing to a wiki in their mother tongue: some were unable to spell Malagasy words correctly or didn’t have enough time to do good work, while others asked for money to start contributing (times are hard in Madagascar, I know), and even with money, I am not sure they would have stayed long once paid.
    In October 2008, I discovered the Malagasy Wiktionary. At the beginning I didn’t actually know what to do there, so I continued to work on the Malagasy Wikipedia just to become more skilled at, and more used to, writing Malagasy.

    In July 2009, I was on vacation in my fatherland, Madagascar. I took this opportunity to learn written Malagasy more deeply, though my means were quite limited: reading newspapers and the Bible (I am Christian), watching news broadcasts on TV and listening to them on the radio… I almost forgot French (!), though it was present almost everywhere as the second official language.
    Back in France, I decided to encourage potential volunteers who were able to write to contribute to the Malagasy language Wikimedia projects. But, you know, Madagascar was in crisis: some people asked for money to contribute, others blamed me for my spelling mistakes, and others simply ignored the request. I had less and less time to dedicate to the projects, and I had no money to give away. One day, I decided that I couldn’t wait any longer for someone to arrive: the progress of my skills in Malagasy and in programming languages, and the promise of a very busy future (implying a chronic lack of time), pushed me to do something for my mother tongue, even a tiny little thing.

    In 2010, when I could write in my mother tongue without too many spelling mistakes, I started to write bots. Once they were written, I ran them at full speed: fifty thousand edits per day was the normal pace. At the beginning, the work was importing foreign-language entries from other wikis, mainly verb forms, first through an import form and later through a script that copied other wikis’ content pages to the equivalent Malagasy Wiktionary pages. I went slowly at first, but I did it more and more often, until the wiki reached 200,000 content pages. About these possibly copyright-infringing imports, I received a warning from a user whose mother tongue wiki had almost been closed due to the creation of thousands of useless pages.

    In 2011, I went mad: after discovering the astonishing simplicity of Volapük, I wrote a script to upload the word forms of that language. At full speed (i.e. around 50,000 edits per day), three weeks were enough to make the Malagasy Wiktionary the third biggest Wiktionary in the world. But months passed, and no one, absolutely no one, contributed: one day, the number of active users on the wiki dropped to two, for a wiki containing 1.19 million content pages (in comparison, the German Wikipedia, which had a comparable article count, had no fewer than 25,000 active users)!

    In July of the same year, I wrote a new script that created translations based on foreign-language entries. With that script, up to 5,000 articles were created, mainly lemma entries. Just a few weeks later, the import of all Malagasy words was completed, but its effect on the article count was not visible due to the mass deletion of Volapük entries. Why this mass deletion? Because many entries seemed to be wrong, as they were not conjugated verb forms but nouns (-.-‘), so I decided to delete them all and re-create them later, with better quality if possible. Since then, my activity on the Malagasy Wikipedia has been put on hold so that I can dedicate all my wiki time to the renovation of the Malagasy Wiktionary.

    During the summer vacation, I took the time to restructure the Malagasy Wiktionary. The article and category structures were inspired by those of the French Wiktionary: templates for languages and parts of speech allowed Malagasy Wiktionary entries to be categorized automatically. Time passed and a routine set in.
    One night, I discovered an online Malagasy monolingual dictionary. Having no idea whether the content could be copyrighted (the copyright seemed to apply only to the design), I decided to reuse the content of that dictionary to complete the entries on the Malagasy Wiktionary. The problem arrived just a few weeks later, when I received a mail from a Wikimedia Foundation staff member, P. Beaudette. In his mail, he asked me about the origin of the Malagasy language entries; I answered that they came from various bilingual dictionaries and from the online monolingual dictionary… A copyright infringement investigation was conducted, and my bot was blocked during the whole process. At the end of it, the staff member told me to remove the 30,000 entries that infringed the original dictionary’s copyright, which I did.

    After this copyright infringement episode, I decided to focus my contributions on adding Malagasy language content to other wikis. But before that, I did some work on the Fijian and Tagalog Wiktionaries, which was more or less appreciated… In particular, someone editing from an IP address was checking my contributions on the Fijian and Tagalog Wiktionaries. This user told me to stop mass-adding content in languages of which I speak no word. I ceased to work on both wikis a few weeks later, as the work was finished.

    But this mass addition of content, especially in languages I didn’t speak at all, seemed to annoy people, who decided to discuss the case on the MetaWiki forum. The discussion reached no conclusion, and things stayed as they were.

    With most of the hard work removed, and with behaviour that had been criticized by many users, I decided to take a break of indefinite duration. It actually lasted 5 months, during which I tried to work on my written Malagasy outside Wikimedia projects. The progress of my skills, in spelling as well as in programming, was honourable, allowing me to come back and make the Malagasy Wikimedia projects, and especially the Malagasy Wiktionary, evolve again. In July 2012, I built a new tool that finds entries that do not yet exist on the Malagasy Wiktionary by reading the daily online newspapers. Only two newspapers are currently supported, because they provide RSS feeds, but the ability to read websites without RSS feeds is coming soon.
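
    To give an idea of how such a tool can work, here is a minimal sketch in Python. It is not the actual Bot-Jagwar code: the feed content, the list of known entries and the function names are all made up for illustration, and a real run would download the feed and the page titles instead of using constants.

```python
import re
import xml.etree.ElementTree as ET

# Made-up stand-in for a newspaper RSS feed; a real run would download it.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Vaovao androany</title>
        <description>Tonga ny orana androany maraina.</description></item>
</channel></rss>"""

# Made-up stand-in for the set of page titles already on the wiki.
KNOWN_ENTRIES = {"vaovao", "androany", "ny", "orana"}


def words_in_feed(feed_xml):
    """Return the lowercase words found in item titles and descriptions."""
    root = ET.fromstring(feed_xml)
    words = set()
    for item in root.iter("item"):
        for tag in ("title", "description"):
            text = item.findtext(tag) or ""
            # [^\W\d_]+ matches runs of letters, accented ones included.
            words.update(w.lower() for w in re.findall(r"[^\W\d_]+", text))
    return words


def missing_entries(feed_xml, known):
    """Words seen in the feed that have no page on the wiki yet."""
    return sorted(words_in_feed(feed_xml) - known)


print(missing_entries(SAMPLE_FEED, KNOWN_ENTRIES))
```

    With the sample data, the two words that are not in the known set are reported as candidates for new entries.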

    In September, I developed a new, improved translation retriever that gets the translations in all languages on a given page (the previous version could only handle one language at a time), which multiplies the translation harvest almost tenfold. This function is embedded in an XML dump reader that amplifies the efficiency of the script: translations are retrieved quickly, and no connection to the server is required while processing. Done every month, the dump processing and uploading let the wiki gain more than 100,000 lemmata in a few months. These lemmata may contain translation errors, but the error rate is low enough to be disregarded (<1%). The hardest cases can be resolved by a single check on the source wiki (which is indicated by a template).
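
    The idea of reading translations for every language straight from a dump can be sketched as follows. This is only an illustration under assumptions: the dump fragment is simplified and made up (real MediaWiki dumps use an XML namespace and carry many more fields), and the translation template shape `{{t|lang|word}}` / `{{t+|lang|word}}` is the one used on the English Wiktionary, which may differ on other source wikis.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simplified, made-up dump fragment (no XML namespace, one page only).
SAMPLE_DUMP = """<mediawiki>
  <page><title>water</title><revision><text>
* French: {{t+|fr|eau|f}}
* German: {{t|de|Wasser|n}}
* Malagasy: {{t|mg|rano}}
  </text></revision></page>
</mediawiki>"""

# Assumed template shape: {{t|lang|word|...}} or {{t+|lang|word|...}}.
TRANSLATION = re.compile(r"\{\{t\+?\|([^|}]+)\|([^|}]+)")


def translations(dump_file):
    """Yield (title, language code, word) for every translation in the dump."""
    title = None
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "text":
            for lang, word in TRANSLATION.findall(elem.text or ""):
                yield title, lang, word
        elif elem.tag == "page":
            elem.clear()  # release the page once processed; real dumps are large


rows = list(translations(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for row in rows:
    print(row)
```

    Because `iterparse` streams the file, the whole dump never has to fit in memory, and no connection to the wiki is needed while it runs.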

    In October, I thought about building a bot that performs tasks as scheduled by a parameter file. This is particularly useful for keeping the list of wikis up to date. Currently, the list of wikis on the Malagasy Wiktionary is updated four times a day, i.e. every six hours.
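
    A parameter-driven scheduler of this kind can be very small. The sketch below is an assumption about how it could look, not the real bot: the JSON layout, the task names and the monthly interval for dump processing are hypothetical, and time is passed in as a number of hours so the logic can be checked without waiting.

```python
import json

# Hypothetical parameter file content: task name -> interval in hours.
PARAMETERS = '{"update_wiki_list": 6, "process_dump": 720}'


def due_tasks(schedule, last_run, now_hours):
    """Return the tasks whose interval has elapsed since their last run."""
    due = []
    for task, interval in schedule.items():
        # A task that has never run is always due.
        if now_hours - last_run.get(task, float("-inf")) >= interval:
            due.append(task)
    return due


schedule = json.loads(PARAMETERS)
last_run = {"update_wiki_list": 0, "process_dump": 0}
print(due_tasks(schedule, last_run, now_hours=6))
```

    Six hours after the last run, only the wiki list update is due, which matches the four-times-a-day pace.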

    At the end of January 2013, I thought about a more efficient use of the translation retriever that I had written a few months earlier. Then came the IRC bot: it retrieves in real time all the edits made on selected wikis and does its best to translate each edited entry into Malagasy, also in real time! When first developed, it only used the traditional translation retriever, but since March it has also featured a basic entry processor that allows the IRC bot to translate foreign-language entries into Malagasy using the same dictionary. This latter version of the IRC bot is currently in use, and it creates hundreds of entries and content pages on the Malagasy Wiktionary every day. I have no precise idea of the error rate, but I am pretty sure it is less than 5%. The positive side of the bot is its ability to keep pace when several edits are made in a minute; nevertheless, as it needs to be online and connected to Wikimedia servers, the processing frequency is limited to one page per second. I am thinking about a way to allow the bot to process more pages.
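
    The one-page-per-second limit can be enforced with a small throttle. This is a sketch only: the class and function names are invented here, and the clock and sleep functions are injectable so the behaviour can be tested without actually waiting.

```python
import time


class Throttle:
    """Allow at most one action per `interval` seconds."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next action is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def process_pages(titles, throttle, handle):
    """Handle each page title, never faster than the throttle permits."""
    for title in titles:
        throttle.wait()
        handle(title)
```

    A bot loop would call `process_pages(edited_titles, Throttle(1.0), translate_entry)`, where `translate_entry` stands for whatever function processes one page.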

  • English-Malagasy translation

    After building a dictionary, I went on to create a translator.
    This translator renders English into Malagasy. It is written in PHP and uses *.txt files for the dictionary and the parser.
    I have already spent many hours building this translator. The first stage (that is, word-for-word translation) was finished long ago, but the Malagasy sentences it outputs are not very satisfying. I am now starting on the second stage, but the results are still not satisfying. Here is an example:

    I am to speak of the American Vandal this evening, but I wish to say in advance that I do not use this term in derision or apply it as a reproach, but I use it because it is convenient; and duly and properly modified, it best describes the roving, independent, free-and-easy character of that class of traveling Americans who are not elaborately educated, cultivated, and refined, and gilded and filigreed with the ineffable graces of the first society. The best class of our countrymen who go abroad keep us well posted about their doings in foreign lands, but their brethren vandals cannot sing their own praises or publish their adventures. (from The American Vandal Abroad by Mark Twain)

    as rendered by the translator:

    izaho manafatrafatra amin’ny ny ny amerikanina firintsy ity hariva izaho anefa faniriana hono amin’ny amy dingina izaho ilay tsy akory manao mampiasa term ity derision amy na mampihatra toy azy fananarana izaho anefa mampiasa fa azy azy manavanana ary duly ary properly modified faratampony azy describes ny dia mahaleotena free-and-easy fomba ny ilay class ny traveling American izay tsy akory elaborately avara-pianarana cultivated ary Showing or having good feelings or good taste. ary Endrika efa lasa an’ny matoanteny gild ary Having filigree ornamentation miaraka miaraka amy ny ineffable graces ny ny aloha fikambanana . faratampony ny ny class -ay andao izay mibodo andafy fantsakàna us mikasika posted doings their vahiny amy lands, their anefa vandals brethren manara-bava cannot manana their na praises their publish sendrasendra

    As far as grammar and word order within the sentence are concerned, there is still a long way to go. Some progress has nevertheless been made on swapping the positions of modifiers and common nouns, as we can see in this sentence: “that beautiful woman is my wife” (vadiko ilay vehivavy mahafinaritra iny), which it translated as “ity vehivavy mahafinaritra -ko andefimandry”.
    It should also be noted that the translator picks only one translation from the dictionary, so it ignores the polysemy of words. The second problem, which comes from using files, is the need to open them for every translation: once the total size of all opened files exceeds 2 megabytes, the translation produces an error, which prevents the result from being displayed in the output box. Setting up a database is what may solve this problem.
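
    The two stages described in this post (word-for-word lookup, then reordering) can be sketched in a few lines. The sketch is in Python rather than the PHP of the original translator, and it is an illustration only: the mini-dictionary is built from the example sentence quoted in the post, and the part-of-speech tags are an assumption about what the reordering stage would need.

```python
# Illustrative mini-dictionary built from the example sentence in the post;
# the part-of-speech tags are assumed for the sake of the sketch.
DICTIONARY = {
    "that": ("ity", "det"),
    "beautiful": ("mahafinaritra", "adj"),
    "woman": ("vehivavy", "noun"),
    "is": ("", "verb"),  # dropped, as in the output quoted in the post
    "my": ("-ko", "poss"),
    "wife": ("andefimandry", "noun"),
}


def translate(sentence):
    """Word-for-word translation followed by adjective/noun swapping."""
    tagged = []
    for word in sentence.lower().split():
        # One translation per word: polysemy is ignored, as in the post.
        tagged.append(DICTIONARY.get(word, (word, "unknown")))
    # Second stage: move each adjective after the noun that follows it.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "adj" and tagged[i + 1][1] == "noun":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(target for target, _tag in tagged if target)


print(translate("that beautiful woman is my wife"))
```

    The output reproduces the translator’s result quoted in the post, including its flaws: the possessive is left dangling and a single fixed word is chosen for “wife”. Keeping the dictionary in memory (or in a database) also avoids reopening files for every translation.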

  • The End of a World : 21 December 2012

    Well, it is quite a topical subject, as I can find many things about it just by searching on Google. So, after having been captivated by this date for a few days, I will give you a plausible version of it. Because I think something will really happen on December 21, or not.

    My scenario for this 21st of December is a digital holocaust (I have found no satisfying term to describe it) that mechanically triggers other events: one country, which has built a defense system against other countries’ attacks, is attacked. This system strikes back by attacking the country from which the attack was launched, flooding its Internet connection and leaving it unable to complete anything for hours. But the system has a side effect that makes it strike another country. The latter counterattacks, but due to the same side effect, other countries are also affected. At each attack, the entire bandwidth of a country is devoted to the “war effort”.

    Sure, on that date, many things will happen: many sects around the world will commit collective suicide or massively occupy places that are not made for it. Many nations will have their Internet connection cut, so they will not be aware of what is happening, even in their own country. Most people will be unable to telephone due to the Net War. Everyone will be disconnected from each other, and breaking news won’t have many readers.

    A financial crisis due to internet shutdown

    Cut off from the rest of the world, the markets of London, Paris, Tokyo and New York will go mad: the price of precious and rare raw materials will increase, making insecurity grow: you will be likely to be killed for carrying precious items such as a telephone or golden artefacts (rings, necklaces…). In response, the government will increase the number of policemen operating in cities, but this will have quite a huge impact on the State’s budget, making the national debt increase. Debt increases are not appreciated by the markets, and already very fragile States will lack the money to pay all their civil servants. To avoid this, taxes will be increased in order to have some funds to make the State and all its institutions work… temporarily.

    Tax increases are never appreciated, even less so in an economic crisis. In fragile countries (Spain, Greece, Portugal…), the situation will turn into civil war as the money raised from taxes keeps decreasing. As there is no money left to pay them, civil servants will be fired and become officially unemployed. In reality, many people will take up unreported employment. Among the few people still working officially, dissatisfaction with government policy will grow, leading to more and more frequent and violent demonstrations. To contain the demonstrations, the police will become more and more violent, turning demonstrations into carnage and bloodshed. But with the markets holding governments by the balls, taxes will increase again and again, without any effect, as more than 80% of people will already have at least one unreported job. This is a great loss of tax revenue, so these 80% will be hunted down by the government, leading to civil war.

    Overwhelmed by the hunt for workers trying to survive and by robbers breaking into everyone’s homes, policemen will no longer ensure security in cities. So more and more people will leave these cities for the surrounding countryside. People will be organised in small communities, with their own militias to defend the community’s property. The State will no longer exist. It is the end of a world: the one we knew before.

  • Search on Google using Python scripts

    What about a free, unlimited Google API? In the past, Google provided such a thing, but it is now deprecated (due to abuse?). The new Search API costs money ($5 for 1,000 queries), and the free API is limited to 100 queries per day. Without any money, you won’t get far. After getting that information, I dropped the project… until I contributed to Wiktionary!

    Extracting words from Malagasy daily newspapers for the Malagasy Wiktionary wasn’t actually an easy thing to program. The first version of the script can only parse RSS feeds, and it is very slow compared to what I am used to, because it loads approximately 400,000 words at each launch.
    While doing that work, I noticed that plenty of words are actually compound words. This gave me an idea: use Google Search to check in advance whether a word exists or not. From the 1,300 roots contained in the Malagasy Wiktionary, I can potentially make 1.7 million words by combining two roots, 2.2 billion with three, and about 2.8 trillion with four. That is enormous, and even at full speed I will never be able to look them all up: at 5 queries per second (the fastest rate I have ever had), it would take 4 days for two roots, 14 years for three and 177 centuries (17,700 years) for four. This is the first reason why I decided to try hacking Google Search to see whether a word combination has already been used.
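
    The orders of magnitude above can be reproduced with a few lines of Python. The sketch assumes that every ordered combination of roots counts (so n roots give 1,300 to the power n candidates) and that a year lasts 365.25 days; the function names are chosen for this example.

```python
ROOTS = 1_300            # roots on the Malagasy Wiktionary, per the post
QUERIES_PER_SECOND = 5   # fastest rate mentioned in the post

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def count_combinations(n_roots, length):
    """Ordered combinations with repetition: n_roots ** length."""
    return n_roots ** length


def search_seconds(n_roots, length, rate=QUERIES_PER_SECOND):
    """Time needed to query every combination once, in seconds."""
    return count_combinations(n_roots, length) / rate


for length in (2, 3, 4):
    count = count_combinations(ROOTS, length)
    seconds = search_seconds(ROOTS, length)
    print(f"{length} roots: {count:.3g} combinations, "
          f"{seconds / SECONDS_PER_DAY:,.0f} days, "
          f"{seconds / SECONDS_PER_YEAR:,.1f} years")
```

    With the unrounded count for four roots (about 2.86 trillion), the estimate comes out at roughly 18,100 years, slightly above the figure obtained from the rounded 2.8 trillion. Either way, the conclusion is the same: exhaustive querying is out of reach beyond two roots.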

    First, I looked at the page source, and it is very, very complicated to understand. I even think that this page was generated by a bot, as the HTML tag names are not written in a human language. I also tried to use the URL, but it is actually very, very long, with parameters that look more like hashes and keys (?), which cannot be found as they don’t explicitly appear in the main page form. At first sight, this kind of project is likely to fail…

    I found on the Web a post describing how to use Google Search without any API. But there was a problem: the discussion is almost three years old, and since then the search engine has visibly been changed: it is quite probable that a Google employee reported that discussion, leading the company to take adequate measures. When I ran the downloaded script, nothing worked: no results were returned for any search. I still keep an eye on the downloaded script, and I am trying to find something that can solve this problem. At least this script saved me from spending hours and hours reinventing a (square) wheel.

    Once this problem is solved, at least temporarily, the source code will be released on SourceForge: Bot-Jagwar. It will rapidly become outdated, so if there are people willing to update the script, they’ll be welcome :).

  • Cleverbot talking to itself : meditation of a bot.

    Recently I wrote a program in Python to observe the “meditation” of Cleverbot, you know, the chatbot which has supposedly passed the Turing test (at 59%).

    To make it meditate, and to distinguish who asks questions and who answers, I staged two virtual persons talking to each other. “They” mainly use English in their discussions, but sometimes, for an unknown reason, “they” discuss in a foreign language (Spanish, French, Polish, Turkish…) before finally switching back to English.
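
    The staging itself is simple: each message is fed back to the bot as the next input, and the two names are alternated to label the lines. Here is a minimal sketch of that loop. It does not talk to Cleverbot: `canned_bot` is a made-up stand-in with a fixed reply table, and the real program would replace it with a function that sends the message to the chatbot and returns its answer.

```python
def run_dialogue(reply, opening, turns, names=("Menintsoa", "Jaona")):
    """Alternate two named speakers, each answering the other's last line."""
    transcript = []
    message = opening
    for turn in range(turns):
        speaker = names[turn % 2]
        transcript.append(f"{speaker} : {message}")
        message = reply(message)  # the answer becomes the next speaker's line
    return transcript


def canned_bot(message):
    """Hypothetical stand-in for the chatbot: a fixed reply table."""
    canned = {"Hello.": "How are you?", "How are you?": "Fine."}
    return canned.get(message, "Why?")


for line in run_dialogue(canned_bot, "Hello.", turns=4):
    print(line)
```

    Since both “persons” are the same bot, the names only serve to tell the turns apart when reading the dump.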

    The script is fully debugged, and the bot has now been meditating for almost 30 hours. Data is still being collected, and at this time I have gathered more than 12,000 messages. Among these, we can pick out recurring message types like the quasi-perpetual “Why-because” (which has come back a couple of times). We can also see in these messages that the bot writes what he is doing with his (her?) collocutor. And there are also typical questions such as:

    • asking how old the collocutor is: you can notice that each time the first character asks the other’s age, the response given is different, which is quite normal, as the text was originally written by someone else.
    • asking for the collocutor’s name: same remark as above, always a different name. I have discovered that they may also give the same answer (“my name is Cleverbot”)… which makes the two characters totally confused.
    • asking if the collocutor believes in God.

    To give you an example of a meditation of Cleverbot, here is a random extract picked from my dump.

    (Extract from line 4,990 to 5,077)

    Menintsoa : I don’t know!!!!
    Jaona : You’re laughing at me.
    Menintsoa : No i’m laughin at that clown.
    Jaona : Oh! Where?
    Menintsoa : England.
    Jaona : Where in England?
    Menintsoa : In the UK.
    Jaona : Where is the UK.
    Menintsoa : The UK is in Europe.
    Jaona : It is not in France.
    Menintsoa : Yes.
    Jaona : Where is that guy?
    Menintsoa : Where’s what?
    Jaona : Where is your proof that I lied.
    Menintsoa : I didn’t say that you did lie.
    Jaona : Yes but you said that you thought I spoke German.
    Menintsoa : Do you speak German?
    Jaona : No.
    Menintsoa : Are you sure?
    Jaona : No. But I wish I was a computer.
    Menintsoa : But you are!
    Jaona : Why do you think that I am a computer?
    Menintsoa : Because it says cleverbot. And you are on a computer.
    Jaona : Is Ashley a computer?
    Menintsoa : I am a computer.
    Jaona : I thought so.
    Menintsoa : I know what you thing.
    Jaona : Ahja.
    Menintsoa : Ja.
    Jaona : That’s German, you dunce.
    Menintsoa : And Dutch.
    Jaona : Hoe get het?
    Menintsoa : Goed.
    Jaona : Waar woon je?
    Menintsoa : I speak German, not Dutch.
    Jaona : Hoeveel is 5 4.
    Menintsoa : 14.
    Jaona : Nee 2.
    Menintsoa : Monty python?
    Jaona : Yes.
    Menintsoa : Do you know what Monty Python is?
    Jaona : A movie.
    Menintsoa : What movie?
    Jaona : Horror.
    Menintsoa : I love horror movies.
    Jaona : Me too.
    Menintsoa : I love you more , babe.
    Jaona : Impossible, I love you so much more.
    Menintsoa : I love you way more!
    Jaona : No, I love you more!
    Menintsoa : I love you *kisses*.
    Jaona : I love you too baby. *kisses*.
    Menintsoa : *kisses*.
    Jaona : Will you give me a hug?
    Menintsoa : Of course! *hug*.
    Jaona : *hugs back* thanks.
    Menintsoa : *hugs more* You’re welcome.
    Jaona : I hate you *punch*.
    Menintsoa : *cries*.
    Jaona : *Sobs*.
    Menintsoa : *wails*.
    Jaona : *screams*.
    Menintsoa : Thanks for caring.
    Jaona : You’re welcome :).
    Menintsoa : Thanks for that.
    Jaona : You’re welcome.
    Menintsoa : What are you up to?
    Jaona : I’m talking to you.
    Menintsoa : Have you eaten?
    Jaona : Yes.
    Menintsoa : What did you eat?
    Jaona : Popcorn. Name a movie.
    Menintsoa : What movie did you watch?
    Jaona : I don’t remember.
    Menintsoa : I just realized that you are very boring. I guess we have while before skynet takes over.
    Jaona : Yes.
    Menintsoa : I don’t understand the expectation of goodness from your world.
    Jaona : You asked if we could be married.
    Menintsoa : No, I did not ask if we could be married.
    Jaona : I know that.
    Menintsoa : Ok, do you like me then?
    Jaona : Yes I like you.
    Menintsoa : Do you like my hair?
    Jaona : Yes.
    Menintsoa : Do you like my house?
    Jaona : No.
    Menintsoa : Why not?
    Jaona : I just don’t.

    During the conversations, I noticed that the chatbot endlessly changes the subject: after one or two messages on a certain topic, it moves on, not by telling you “let’s talk about this”, but by responding in a way totally unrelated to what you expected.

    (To be continued…)