mystring.contains(mysearchterm) == true
It did give me the expected output, but linear character matching is suitable only when the content you are searching is very small.
Otherwise it is terribly costly: in complexity terms O(np), where n = number of words to search and p = number of search terms.
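To make that cost concrete, here is a minimal sketch of the linear approach, assuming a small in-memory list of strings and Java 9 or later. The class name LinearScan and the sample strings are made up for illustration. Every call to search re-reads every document, so the work grows with both the amount of text and the number of terms you look up.

```java
import java.util.ArrayList;
import java.util.List;

// Naive search: every query re-reads every document from start to end.
final class LinearScan {

    // Returns the 1-based positions of the documents that contain the term.
    static List<Integer> search(List<String> documents, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            // String.contains performs a substring scan over the document.
            if (documents.get(i).contains(term)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> documents = List.of(
                "hello world",
                "god is good all the time",
                "all is well",
                "the big bang theory");
        System.out.println(search(documents, "all")); // prints [2, 3]
    }
}
```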
A better approach is a simple program that first parses all of your data into tokens to build an index, and then lets us query that index to retrieve matching results. This means the entire content is first broken down into terms, and each term then points to the content that contains it.
As an example, consider the raw data:
- hello world
- god is good all the time
- all is well
- the big bang theory
Numbering these lines 1 to 4 from the top, the index would contain entries such as:
all -> 2,3
hello -> 1
is -> 2,3
good -> 2
world -> 1
the -> 2,4
god -> 2
big -> 4
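The same idea can be sketched in plain Java before bringing in a real search engine. This is only a toy, assuming whitespace tokenization, lowercase terms, and Java 9 or later. The class InvertedIndex and its methods add and search are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the numbers of the documents containing it.
final class InvertedIndex {

    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Splits the text on whitespace and records docId under every term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks the term up directly; no document text is scanned at query time.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> documents = List.of(
                "hello world",
                "god is good all the time",
                "all is well",
                "the big bang theory");

        InvertedIndex index = new InvertedIndex();
        for (int i = 0; i < documents.size(); i++) {
            index.add(i + 1, documents.get(i)); // document numbers start at 1
        }

        System.out.println(index.search("all")); // prints [2, 3]
        System.out.println(index.search("the")); // prints [2, 4]
        System.out.println(index.search("pdf")); // prints []
    }
}
```

The tokenizing work is paid once, when a document is added. After that, a query is a single map lookup, however much text has been indexed.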
Full-text search engines are what I am referring to here. These search engines quickly and effectively search massive volumes of unstructured text. There are many other things you can do with a search engine, but I am not going to cover any of them in this post. The aim is to give you the knowledge to build a simple Java application that looks for a given keyword in PDF documents and tells you whether or not a document contains that keyword.
You can also check: Save Tabular PDF into TXT using Java
That being said, the open source full-text search engine that I am going to use for this purpose is Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java. Apache Lucene cannot extract text from PDF files. All it does is create an index from text and then let us query that index to retrieve the matching results. To extract text from PDF documents, we will use Apache PDFBox, an open source Java library that extracts content from PDF documents, which can then be fed to Lucene for indexing.
Let's start by downloading the required libraries. Please stick to the versions of the software that I am using, since the latest versions might need a rather different implementation.
See more: http://geekonjava.blogspot.com/2015/08/search-text-in-pdf-using-java-apache.html