DOWNLOAD
RONDHUIT Service PDF Brochure
- Solr Subscription Package
- Training Course: Solr/ManifoldCF
- Training Course: Start Machine Learning using Apache Mahout & Spark
Next Generation Search (NLP & ML)
RONDHUIT researches and develops natural language processing (NLP) and machine learning (ML), and passes the results on to customers through our consulting service so that they can use and manage search functions “more conveniently”, “more intelligently”, and “more easily”. You can combine the knowledge obtained from our services with your existing systems, including Lucene/Solr, to develop them into a next-generation search system. The following are some related materials.
- Lucene 4.0 Score Accounting (SlideShare)
- Introduction to Ambiguity Elimination of Natural Language Processing using Machine Learning (SlideShare)
- Machine Learning using Maximum Entropy Method and Perceptron in OpenNLP (SlideShare)
- NLP × Lucene/Solr (SlideShare)
- From Hidden Markov Model to Viterbi Algorithm (SlideShare)
- Automatic Acquisition of Synonym Knowledge from Dictionary Corpus (SlideShare)
- Extraction and Visualization of Related Term Network from Lucene Index (SlideShare)
- Term Extraction from Lucene Index (SlideShare)
livedoor News Corpus
Overview
This corpus was built from news stories on “livedoor news”, which is administered by NHN Japan. Only the stories in the following categories, which are governed by a Creative Commons license, were collected, and as many HTML tags as possible were removed.
- Topic News http://news.livedoor.com/category/vender/news/
- Sports Watch http://news.livedoor.com/category/vender/208/
- IT Life Hack http://news.livedoor.com/category/vender/223/
- Appliance Channel http://news.livedoor.com/category/vender/kadench/
- MOVIE ENTER http://news.livedoor.com/category/vender/movie_enter/
- Single Woman Report http://news.livedoor.com/category/vender/90/
- Smax http://news.livedoor.com/category/vender/smax/
- livedoor HOMME http://news.livedoor.com/category/vender/homme/
- Peachy http://news.livedoor.com/category/vender/ldgirls/
Collection Timing: Downloaded in early September 2012.
Download (plain text): ldcc-20140209.tar.gz
Download (for Apache Solr): livedoor-news-data.tar.gz
Use this URL when citing the corpus in a paper.
License
A Creative Commons license (Attribution-NoDerivs) applies to each article file. The required credit differs by news category, so refer to the LICENSE.txt in the corresponding subdirectory of the extracted download. livedoor is a registered trademark of NHN Japan.
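Because the credit line depends on the category, it can help to pair each article with the LICENSE.txt that sits next to it before reusing the text. The following is a minimal sketch, not part of the corpus distribution. It assumes the plain-text archive has already been extracted, that each category directory holds one LICENSE.txt, and that articles are stored as .txt files in the same directory. The function name `articles_by_license` is hypothetical.

```python
from pathlib import Path


def articles_by_license(root):
    """Map each category directory found under `root` to its license and articles.

    Assumption (not confirmed by this page): every category directory contains
    exactly one LICENSE.txt, with the article .txt files stored beside it.
    """
    result = {}
    # Search recursively so the extraction root can be passed in directly.
    for license_file in sorted(Path(root).rglob("LICENSE.txt")):
        category_dir = license_file.parent
        articles = sorted(
            p for p in category_dir.glob("*.txt") if p.name != "LICENSE.txt"
        )
        result[category_dir.name] = {
            "license": license_file,
            "articles": articles,
        }
    return result


if __name__ == "__main__":
    import sys

    # Usage: python list_licenses.py <directory the archive was extracted to>
    for name, entry in articles_by_license(sys.argv[1]).items():
        print(f"{name}: {len(entry['articles'])} articles, credit per {entry['license']}")
```

Keeping the license path alongside the article list makes it straightforward to emit the correct credit whenever an article from a given category is quoted.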
Acknowledgements
RONDHUIT would like to express our sincere gratitude to NHN Japan for releasing a part of “livedoor news” under the Creative Commons license.
White Papers
- RONDHUIT REPORT Vol.8 “Using Excel to Learn Lucene Score Accounting” (3 pages) + separate Excel file
- RONDHUIT REPORT Vol.7 “The New Features of Lucene/Solr 3.1” (4 pages)
- RONDHUIT REPORT Vol.6 “The New Features of Solr 1.4” (2 pages)
- RONDHUIT REPORT Vol.5 “The New Features of Lucene 2.9” (3 pages)
- RONDHUIT REPORT Vol.4 “The New Features of Solr 1.3” (2 pages)
- RONDHUIT REPORT Vol.3 “Obtaining Facet Counts and Refinement Search with Ludia and Solr” (6 pages)
- RONDHUIT REPORT Vol.2 “Easy! Building Full-text Search Demonstration in 10 Minutes using Rails and Solr” (2 pages)
- RONDHUIT REPORT Vol.1 “Search Performance of Solr” (5 pages)
Seminar Materials
BIG DATA ANALYTICS TOKYO 2017
RONDHUIT gave a speech at BIG DATA ANALYTICS TOKYO 2017
Lucene/Solr Workshop 16@Recruit Technologies (2015)
Solr Workshop 15@Recruit Technologies (2014)
Solr Workshop 14@Recruit Technologies (2014)
Solr Workshop 12@VOYAGE GROUP (2013)
- ManifoldCF and Solr
Solr Workshop 10@VOYAGE GROUP (2013)
IBM Power Systems Solution Seminar 8@IBM Japan (2012)
Solr Workshop 8@VOYAGE GROUP (2012)
Solr Workshop 6@EC Navi (2011)
Solr Workshop 5@EC Navi (2011)
Building Search System with Open Source@SCSK (2010)
Solr Workshop 3@EC Navi (2010)
Next Generation Search Technology Forum 2010
RONDHUIT gave a speech at Next Generation Search Technology Forum 2010
Free Seminar (2008)
LinuxWorld Expo/Tokyo 2008
Recruit and RONDHUIT co-spoke at LinuxWorld Expo/Tokyo 2008 (Tokyo Big Sight)
Sample Code from Books/Magazine Articles
- [Newly revised] Introduction to Apache Solr ~ Open source full-text search engine
- Introduction to Apache Solr
- Introduction to Apache Solr Complete edition for Lucene 2.0
- Introduction to Apache Solr Old edition for Lucene 1.9
- [ThinkIT]Full-text Search System by JBoss EAP+Lucene
- [ThinkIT]Building Full-text Search System Using Hibernate Search
Links to Articles within the Site
Lucene Revolution 2014 @Washington, D.C. Business Trip Report
ApacheCon 2014 NA @Denver Business Trip Report
Lucene Revolution 2013 @San Diego Business Trip Report
Lucene Revolution 2012 @Boston Business Trip Report
Apache Lucene Eurocon 2011 @Barcelona Business Trip Report
Security Warnings
- [Solr Plug-in]Security Warning: CVE-2014-7810: Apache Tomcat Security Manager Bypass
- Security Warning: Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations
- [Solr Plug-in]Security Warning: CVE-2014-0119 Apache Tomcat information disclosure
- [Solr Plug-in]Security Warning: CVE-2014-0097 Apache Tomcat information disclosure
- [Solr Plug-in]Security Warning: CVE-2014-0096 Apache Tomcat information disclosure
- [Solr Plug-in]Security Warning: CVE-2014-0075 Apache Tomcat denial of service
- [Solr Plug-in]Security Warning: CVE-2014-0050 Apache Commons FileUpload and Apache Tomcat DoS
- [Solr Plug-in]Security Warning: CVE-2012-3544 Chunked transfer encoding extension size is not limited
- [Solr Plug-in]Security Warning: CVE-2013-2067 Session fixation with FORM authenticator
- [Solr Plug-in]Security Warning: CVE-2013-2071 Request mix-up if AsyncListener method throws RuntimeException
- [Solr Plug-in]Security Warning: CVE-2012-4431 Apache Tomcat Bypass of CSRF prevention filter
- [Solr Plug-in]Security Warning: CVE-2012-3546 Apache Tomcat Bypass of security constraints
- [Solr Plug-in]Security Warning: CVE-2012-4534 Apache Tomcat denial of service
- [Solr Plug-in]Security Warning: CVE-2012-3439 Apache Tomcat DIGEST authentication weaknesses
- [Solr Plug-in]Security Warning: CVE-2012-2733 Apache Tomcat Denial of Service
- [Solr Plug-in]Security Warning: Apache Tomcat and the hashtable collision DoS vulnerability
- [Solr Plug-in]Security Warning: CVE-2011-3190 Apache Tomcat Authentication bypass and information disclosure
- [Solr Plug-in]Security Warning: CVE-2011-2729: Commons Daemon fails to drop capabilities (Apache Tomcat)
- [Solr Plug-in]Security Warning: CVE-2011-2526 Apache Tomcat Information disclosure and availability vulnerabilities
- [Solr Plug-in]Security Warning: StackOverflowError in PingRequestHandler
- [Solr Plug-in]Security Warning: CVE-2011-2204 Apache Tomcat information disclosure
Apache ManifoldCF Related Articles
Named Entity Extraction Server – NExTR on Rails (GPL2)
NExTR on Rails is a partially modified OSS version of NExTR, a Ruby port of the NExT named entity extraction tool developed at Mie University, that can be used from Rails. It is licensed under GPL2. NExTR on Rails is provided as an appliance for VMware virtual environments. Download and extract the file, then run it from VMware. Log in with “next” as the user name and “chasen” as the password.